System Architecture
Actors & Trust Boundaries
The system has four actors, each operating at a different trust level:
- Browser extension — untrusted client, captures user selections and renders results
- CloudFlare edge — trusted perimeter, handles DNS, rate limiting, and origin authentication
- AWS Lambda — trusted compute, runs the orchestration pipeline
- AWS Bedrock — trusted model provider, generates etymological analysis
Trust decreases as you move toward the user. The extension is treated as adversarial — every input is validated, every claim is verified.
Request Pipeline
A single user selection traverses five stages before returning an answer:
CloudFlare proxies api.aletheia.study via CNAME to the Lambda Function URL. A Worker rewrites the Host header and injects a shared secret, so the Lambda can verify the request came through the authorized edge. Rate limiting (3 requests per 10 seconds per IP on POST) happens at this layer, before any compute cost is incurred.
Lambda Handler Pipeline
Inside the Lambda, the request passes through a sequential pipeline. Each stage can short-circuit with an appropriate HTTP response:
Much like the Unseen University Library, where every book must pass through several layers of cataloguing before reaching the shelves — and the Librarian knows exactly where each one belongs — every request is filed, classified, and stored with precision. Ook.
Context & Memory Strategy
Aletheia uses request-scoped context enrichment, not conversational memory. When a user selects text, the extension captures a context bundle of three signals:
- The selection — the word or phrase the user highlighted
- Surrounding text — the enclosing paragraph and nearby sentences
- Page metadata — the URL and document context
This bundle is the entire input to the model. A dictionary gives you every meaning of "cell" — Aletheia gives you the biological meaning because it can see "mitochondria" in the surrounding text. The context window is deliberately bounded to the visible page rather than accumulated across sessions.
DynamoDB stores each request with a 30-day TTL for conversation history features, but there is no cross-session memory or user profile building. This is a privacy decision as much as an architectural one: the system never knows what you were reading yesterday. The user's browser already has all the context the model needs; the system's job is to capture it faithfully and pass it through.
Model Selection
The system supports two models with automatic fallback:
- Amazon Nova Micro (default) — faster responses (~532ms median), lower cost
- Claude Haiku (fallback) — richer analysis (~1,469ms median), used when Nova is unavailable
Model choice is configuration-driven, not code-driven. The orchestrator treats the model as a parameter, not a dependency. This means swapping models requires changing an environment variable, not refactoring code.
Auth Architecture
Authentication runs on a separate Lambda (AletheiaAuth) to isolate auth failures from the core analysis pipeline:
OAuth Flow
- Extension opens LinkedIn OAuth consent page
- User authorizes → callback delivers auth code to extension
- Extension sends auth code to
/auth/token - Auth Lambda exchanges code for LinkedIn tokens, fetches OIDC identity
- Creates/updates user record in DynamoDB (
aletheia-users) - Issues JWT with tier, billing anchor, and 24-hour expiry
JWT Claims
Auth is feature-flagged via AUTH_ENABLED on the main Lambda. When disabled, the analysis pipeline runs without token validation — useful for development and when auth isn't yet needed.
Design Decisions
Why no LangChain?
LangChain adds abstraction layers between you and the model. For a system with a single, well-defined LLM interaction pattern, these layers add complexity without benefit. The naked orchestrator approach means:
- Every prompt is a string literal you can read
- Every model call is a direct boto3 invocation you can trace
- Error handling is explicit, not buried in framework middleware
- Testing mocks one thing: the Bedrock client
Why separate auth Lambda?
Blast radius isolation. If the auth system has a bug, the core analysis pipeline keeps working for unauthenticated users. If the analysis pipeline has a bug, users can still authenticate. Neither failure mode takes down both capabilities.
Why no RAG or vector database?
Aletheia generates etymological and contextual analysis from the model's parametric knowledge, grounded by the context bundle from the user's browser. There is no retrieval step and no vector database because:
- The knowledge source is stable — language, etymology, and usage patterns are exactly the kind of broad, well-established knowledge that large language models encode well during training
- The grounding signal comes from the page — the surrounding text provides the context that disambiguates meaning, eliminating the need to retrieve similar passages from a corpus
- The cost doesn't justify it — a vector store adds infrastructure, latency, and a maintenance burden for marginal quality improvement on a task the model already handles well
RAG is the right tool when the model lacks domain knowledge or when the knowledge changes faster than the model can be retrained. Neither applies here. Knowing when not to add infrastructure is as important as knowing how to build it.
Why CloudFlare Worker instead of API Gateway?
Cost. API Gateway charges per request. CloudFlare Workers have a generous free tier and provide edge-level rate limiting, DDoS protection, and SSL termination at no additional cost. The Worker also rewrites the Host header and injects a shared secret, acting as a lightweight authentication proxy.
Deployable Units
| Component | Runtime | Purpose |
|---|---|---|
AletheiaAgent | Python 3.12 Lambda | Core analysis pipeline |
AletheiaAuth | Python 3.12 Lambda | OAuth, JWT, user management |
AletheiaKillSwitch | Python 3.12 Lambda | Emergency shutdown (SNS-triggered) |
aletheia-api | CloudFlare Worker | Edge routing, rate limiting, origin auth |
| Chrome Extension | Manifest V3 | Text selection, UI overlay, auth |
| Firefox Extension | Manifest V3 | Text selection, UI overlay, auth |
Aletheia