System Architecture

        Design philosophy: Naked Python orchestrator. No LangChain, no framework abstractions. Every line of LLM orchestration is visible, testable, and auditable. The system does one thing well: turn selected text into contextual understanding.
      

Actors & Trust Boundaries

The system has four actors, each operating at a different trust level:

Browser extension — untrusted client, captures user selections and renders results
CloudFlare edge — trusted perimeter, handles DNS, rate limiting, and origin authentication
AWS Lambda — trusted compute, runs the orchestration pipeline
AWS Bedrock — trusted model provider, generates etymological analysis

Trust decreases as you move toward the user. The extension is treated as adversarial — every input is validated, every claim is verified.

Request Pipeline

A single user selection traverses five stages before returning an answer:

CloudFlare Edge → Worker (Host rewrite) → Lambda Handler → Bedrock (Nova/Haiku) → DynamoDB

CloudFlare proxies api.aletheia.study via CNAME to the Lambda Function URL. A Worker rewrites the Host header and injects a shared secret, so the Lambda can verify the request came through the authorized edge. Rate limiting (3 requests per 10 seconds per IP on POST) happens at this layer, before any compute cost is incurred.

Lambda Handler Pipeline

Inside the Lambda, the request passes through a sequential pipeline. Each stage can short-circuit with an appropriate HTTP response:

Input Validation Type checks, whitespace, length (max 20KB), UTF-8 encoding

Origin Verification Shared secret check — reject requests not via CloudFlare

Guardrails (Denylist) O(1) HashSet lookup against 802-term denylist

Guardrails (Semantic) LLM classifies across taxonomy — hard/soft/none block types

DynamoDB Persist Store request with 30-day TTL (2,592,000 seconds)

Bedrock Generation Etymology & context analysis via Nova Micro or Claude Haiku

Score Processing & Response Parse scores, apply thresholds, structure JSON response

Much like the Unseen University Library, where every book must pass through several layers of cataloguing before reaching the shelves — and the Librarian knows exactly where each one belongs — every request is filed, classified, and stored with precision. Ook.

Context & Memory Strategy

Aletheia uses request-scoped context enrichment, not conversational memory. When a user selects text, the extension captures a context bundle of three signals:

The selection — the word or phrase the user highlighted
Surrounding text — the enclosing paragraph and nearby sentences
Page metadata — the URL and document context

This bundle is the entire input to the model. A dictionary gives you every meaning of "cell" — Aletheia gives you the biological meaning because it can see "mitochondria" in the surrounding text. The context window is deliberately bounded to the visible page rather than accumulated across sessions.

DynamoDB stores each request with a 30-day TTL for conversation history features, but there is no cross-session memory or user profile building. This is a privacy decision as much as an architectural one: the system never knows what you were reading yesterday. The user's browser already has all the context the model needs; the system's job is to capture it faithfully and pass it through.

Model Selection

The system supports two models with automatic fallback:

Amazon Nova Micro (default) — faster responses (~532ms median), lower cost
Claude Haiku (fallback) — richer analysis (~1,469ms median), used when Nova is unavailable

Model choice is configuration-driven, not code-driven. The orchestrator treats the model as a parameter, not a dependency. This means swapping models requires changing an environment variable, not refactoring code.

Auth Architecture

Authentication runs on a separate Lambda (AletheiaAuth) to isolate auth failures from the core analysis pipeline:

OAuth Flow

Extension opens LinkedIn OAuth consent page
User authorizes → callback delivers auth code to extension
Extension sends auth code to /auth/token
Auth Lambda exchanges code for LinkedIn tokens, fetches OIDC identity
Creates/updates user record in DynamoDB (aletheia-users)
Issues JWT with tier, billing anchor, and 24-hour expiry

JWT Claims

        JWT Payload
{
  "user_id":            // LinkedIn OIDC 'sub' (stable identifier)
  "tier":               // "free" | "subscriber" | "admin"
  "billing_anchor_day": // Day of month for monthly cap window
  "exp":                // 24 hours from issuance
  "jti":                // Unique JWT ID (replay prevention)
}
      

Auth is feature-flagged via AUTH_ENABLED on the main Lambda. When disabled, the analysis pipeline runs without token validation — useful for development and when auth isn't yet needed.

Design Decisions

Why no LangChain?

LangChain adds abstraction layers between you and the model. For a system with a single, well-defined LLM interaction pattern, these layers add complexity without benefit. The naked orchestrator approach means:

Every prompt is a string literal you can read
Every model call is a direct boto3 invocation you can trace
Error handling is explicit, not buried in framework middleware
Testing mocks one thing: the Bedrock client

Why separate auth Lambda?

Blast radius isolation. If the auth system has a bug, the core analysis pipeline keeps working for unauthenticated users. If the analysis pipeline has a bug, users can still authenticate. Neither failure mode takes down both capabilities.

Why no RAG or vector database?

Aletheia generates etymological and contextual analysis from the model's parametric knowledge, grounded by the context bundle from the user's browser. There is no retrieval step and no vector database because:

The knowledge source is stable — language, etymology, and usage patterns are exactly the kind of broad, well-established knowledge that large language models encode well during training
The grounding signal comes from the page — the surrounding text provides the context that disambiguates meaning, eliminating the need to retrieve similar passages from a corpus
The cost doesn't justify it — a vector store adds infrastructure, latency, and a maintenance burden for marginal quality improvement on a task the model already handles well

RAG is the right tool when the model lacks domain knowledge or when the knowledge changes faster than the model can be retrained. Neither applies here. Knowing when not to add infrastructure is as important as knowing how to build it.

Why CloudFlare Worker instead of API Gateway?

Cost. API Gateway charges per request. CloudFlare Workers have a generous free tier and provide edge-level rate limiting, DDoS protection, and SSL termination at no additional cost. The Worker also rewrites the Host header and injects a shared secret, acting as a lightweight authentication proxy.

Deployable Units

Component	Runtime	Purpose
`AletheiaAgent`	Python 3.12 Lambda	Core analysis pipeline
`AletheiaAuth`	Python 3.12 Lambda	OAuth, JWT, user management
`AletheiaKillSwitch`	Python 3.12 Lambda	Emergency shutdown (SNS-triggered)
`aletheia-api`	CloudFlare Worker	Edge routing, rate limiting, origin auth
Chrome Extension	Manifest V3	Text selection, UI overlay, auth
Firefox Extension	Manifest V3	Text selection, UI overlay, auth