System Architecture

Design philosophy: Naked Python orchestrator. No LangChain, no framework abstractions. Every line of LLM orchestration is visible, testable, and auditable. The system does one thing well: turn selected text into contextual understanding.

Actors & Trust Boundaries

The system has four actors, each operating at a different trust level. Trust decreases as you move toward the user: the extension is treated as adversarial, so every input is validated and every claim is verified.

Request Pipeline

A single user selection traverses five stages before returning an answer:

CloudFlare Edge → Worker (Host rewrite) → Lambda Handler → Bedrock (Nova/Haiku) → DynamoDB

CloudFlare proxies api.aletheia.study via CNAME to the Lambda Function URL. A Worker rewrites the Host header and injects a shared secret, so the Lambda can verify the request came through the authorized edge. Rate limiting (3 requests per 10 seconds per IP on POST) happens at this layer, before any compute cost is incurred.
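On the Lambda side, the origin check reduces to a constant-time header comparison. A minimal sketch, assuming the Worker's secret arrives in a hypothetical `x-origin-secret` header and lives in an `EDGE_SHARED_SECRET` environment variable (both names illustrative):

```python
import hmac
import os

EDGE_SECRET_HEADER = "x-origin-secret"  # hypothetical header name set by the Worker


def verify_edge_origin(headers: dict) -> bool:
    """Reject requests that did not pass through the CloudFlare Worker."""
    expected = os.environ.get("EDGE_SHARED_SECRET", "")
    provided = headers.get(EDGE_SECRET_HEADER, "")
    # compare_digest avoids leaking the secret through timing differences
    return bool(expected) and hmac.compare_digest(provided, expected)
```

`hmac.compare_digest` is preferred over `==` so a mismatch takes the same time regardless of where the strings diverge.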

Lambda Handler Pipeline

Inside the Lambda, the request passes through a sequential pipeline. Each stage can short-circuit with an appropriate HTTP response:

1. Input Validation: Type checks, whitespace, length (max 20KB), UTF-8 encoding
2. Origin Verification: Shared secret check — reject requests not via CloudFlare
3. Guardrails (Denylist): O(1) HashSet lookup against 802-term denylist
4. Guardrails (Semantic): LLM classifies across taxonomy — hard/soft/none block types
5. DynamoDB Persist: Store request with 30-day TTL (2,592,000 seconds)
6. Bedrock Generation: Etymology & context analysis via Nova Micro or Claude Haiku
7. Score Processing & Response: Parse scores, apply thresholds, structure JSON response

Much like the Unseen University Library, where every book must pass through several layers of cataloguing before reaching the shelves — and the Librarian knows exactly where each one belongs — every request is filed, classified, and stored with precision. Ook.
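The short-circuit pattern above can be sketched as a list of stage functions, each returning either an HTTP response (stop) or None (continue). The stage names and the 20KB cap come from the pipeline; the handler shape, status codes, and the two-term stand-in denylist are illustrative:

```python
def validate_input(req: dict):
    """Stage 1: type, whitespace, and length checks."""
    text = req.get("text")
    if not isinstance(text, str) or not text.strip():
        return {"statusCode": 400, "body": "invalid selection"}
    if len(text.encode("utf-8")) > 20_000:  # 20KB cap
        return {"statusCode": 413, "body": "selection too large"}
    return None  # pass to the next stage


DENYLIST = {"badterm1", "badterm2"}  # stands in for the 802-term HashSet


def check_denylist(req: dict):
    """Stage 3: O(1) set membership per token."""
    if set(req["text"].lower().split()) & DENYLIST:
        return {"statusCode": 403, "body": "blocked"}
    return None


STAGES = [validate_input, check_denylist]  # the real pipeline has seven stages


def handle(req: dict) -> dict:
    for stage in STAGES:
        early = stage(req)
        if early is not None:  # short-circuit with an HTTP response
            return early
    return {"statusCode": 200, "body": "ok"}
```

Each stage stays a plain, independently testable function; adding or reordering stages means editing the `STAGES` list, not the control flow.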

Context & Memory Strategy

Aletheia uses request-scoped context enrichment, not conversational memory. When a user selects text, the extension captures a context bundle of three signals from the visible page.

This bundle is the entire input to the model. A dictionary gives you every meaning of "cell" — Aletheia gives you the biological meaning because it can see "mitochondria" in the surrounding text. The context window is deliberately bounded to the visible page rather than accumulated across sessions.

DynamoDB stores each request with a 30-day TTL for conversation history features, but there is no cross-session memory or user profile building. This is a privacy decision as much as an architectural one: the system never knows what you were reading yesterday. The user's browser already has all the context the model needs; the system's job is to capture it faithfully and pass it through.
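The 30-day expiry relies on DynamoDB's native TTL: each item carries an epoch timestamp that the table is configured to delete on. A sketch of the item shape — the attribute and table names are illustrative; only the 2,592,000-second window comes from the text:

```python
import time

TTL_SECONDS = 30 * 24 * 3600  # 2,592,000 seconds = 30 days


def build_request_item(request_id: str, selection: str) -> dict:
    """Item for the requests table; 'ttl' must match the table's TTL attribute."""
    return {
        "request_id": {"S": request_id},
        "selection": {"S": selection},
        "ttl": {"N": str(int(time.time()) + TTL_SECONDS)},
    }


# A boto3 client would then persist it with something like:
#   dynamodb.put_item(TableName="aletheia-requests", Item=build_request_item(...))
```

DynamoDB deletes expired items asynchronously, so the TTL is an upper bound on retention, not an exact deletion time.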

Model Selection

The system supports two models, Amazon Nova Micro and Claude Haiku, with automatic fallback between them.

Model choice is configuration-driven, not code-driven. The orchestrator treats the model as a parameter, not a dependency. This means swapping models requires changing an environment variable, not refactoring code.
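A sketch of that configuration seam, including fallback. The env-var name and the Bedrock model IDs are assumptions, not confirmed from the codebase:

```python
import os

# Assumed Bedrock model IDs; the real values are configuration, not code
MODELS = {
    "nova": "amazon.nova-micro-v1:0",
    "haiku": "anthropic.claude-3-haiku-20240307-v1:0",
}


def resolve_model() -> str:
    """The model is a parameter: change ALETHEIA_MODEL, not the orchestrator."""
    return MODELS.get(os.environ.get("ALETHEIA_MODEL", "nova"), MODELS["nova"])


def invoke_with_fallback(invoke_fn, order=("nova", "haiku")):
    """Try the primary model; on failure, fall through to the next one."""
    last_err = None
    for name in order:
        try:
            return invoke_fn(MODELS[name])
        except Exception as err:  # e.g. throttling or a model outage
            last_err = err
    raise last_err
```

Because `invoke_fn` is injected, the fallback logic can be unit-tested without touching Bedrock at all.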

Auth Architecture

Authentication runs on a separate Lambda (AletheiaAuth) to isolate auth failures from the core analysis pipeline:

OAuth Flow

  1. Extension opens LinkedIn OAuth consent page
  2. User authorizes → callback delivers auth code to extension
  3. Extension sends auth code to /auth/token
  4. Auth Lambda exchanges code for LinkedIn tokens, fetches OIDC identity
  5. Creates/updates user record in DynamoDB (aletheia-users)
  6. Issues JWT with tier, billing anchor, and 24-hour expiry
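Step 4 is a standard authorization-code exchange against LinkedIn's token endpoint. A sketch of the form body only; the env-var names for the client credentials are illustrative:

```python
import os

LINKEDIN_TOKEN_URL = "https://www.linkedin.com/oauth/v2/accessToken"


def build_token_request(auth_code: str) -> dict:
    """Form body POSTed to exchange the auth code for LinkedIn tokens."""
    return {
        "grant_type": "authorization_code",
        "code": auth_code,
        "redirect_uri": os.environ.get("LINKEDIN_REDIRECT_URI", ""),
        "client_id": os.environ.get("LINKEDIN_CLIENT_ID", ""),
        "client_secret": os.environ.get("LINKEDIN_CLIENT_SECRET", ""),
    }
```

The OIDC identity in step 4 (the stable `sub`) then comes from LinkedIn's userinfo endpoint, called with the returned access token.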

JWT Claims

JWT Payload:

    {
      "user_id":            // LinkedIn OIDC 'sub' (stable identifier)
      "tier":               // "free" | "subscriber" | "admin"
      "billing_anchor_day": // Day of month for monthly cap window
      "exp":                // 24 hours from issuance
      "jti":                // Unique JWT ID (replay prevention)
    }
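Issuing a token with exactly those claims needs only the standard library. HS256 is hand-rolled here for illustration; the real service may use a JWT library and a different algorithm:

```python
import base64
import hashlib
import hmac
import json
import time
import uuid


def _b64url(data: bytes) -> str:
    """JWT uses unpadded base64url segments."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def issue_jwt(user_id: str, tier: str, billing_anchor_day: int, secret: bytes) -> str:
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {
        "user_id": user_id,                    # LinkedIn OIDC 'sub'
        "tier": tier,                          # "free" | "subscriber" | "admin"
        "billing_anchor_day": billing_anchor_day,
        "exp": int(time.time()) + 24 * 3600,   # 24-hour expiry
        "jti": uuid.uuid4().hex,               # unique ID for replay prevention
    }
    signing_input = (
        _b64url(json.dumps(header).encode()) + "." + _b64url(json.dumps(payload).encode())
    )
    signature = hmac.new(secret, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(signature)
```

The `jti` gives each token a unique identity, so a revocation or replay check only needs to track IDs, not whole tokens.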

Auth is feature-flagged via AUTH_ENABLED on the main Lambda. When disabled, the analysis pipeline runs without token validation — useful for development and when auth isn't yet needed.
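A sketch of the flag check; which string values count as enabled is an assumption:

```python
import os


def auth_enabled() -> bool:
    """AUTH_ENABLED gates token validation on the main Lambda."""
    return os.environ.get("AUTH_ENABLED", "false").strip().lower() in {"1", "true", "yes"}
```

Defaulting to disabled means a missing or misconfigured variable fails open for development rather than locking out the pipeline.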

Design Decisions

Why no LangChain?

LangChain adds abstraction layers between you and the model. For a system with a single, well-defined LLM interaction pattern, these layers add complexity without benefit. The naked orchestrator approach means every line of LLM orchestration stays visible, testable, and auditable, with no framework indirection to debug through.

Why separate auth Lambda?

Blast radius isolation. If the auth system has a bug, the core analysis pipeline keeps working for unauthenticated users. If the analysis pipeline has a bug, users can still authenticate. Neither failure mode takes down both capabilities.

Why no RAG or vector database?

Aletheia generates etymological and contextual analysis from the model's parametric knowledge, grounded by the context bundle from the user's browser. There is no retrieval step and no vector database.

RAG is the right tool when the model lacks domain knowledge or when the knowledge changes faster than the model can be retrained. Neither applies here. Knowing when not to add infrastructure is as important as knowing how to build it.

Why CloudFlare Worker instead of API Gateway?

Cost. API Gateway charges per request. CloudFlare Workers have a generous free tier and provide edge-level rate limiting, DDoS protection, and SSL termination at no additional cost. The Worker also rewrites the Host header and injects a shared secret, acting as a lightweight authentication proxy.

Deployable Units

Component            Runtime              Purpose
AletheiaAgent        Python 3.12 Lambda   Core analysis pipeline
AletheiaAuth         Python 3.12 Lambda   OAuth, JWT, user management
AletheiaKillSwitch   Python 3.12 Lambda   Emergency shutdown (SNS-triggered)
aletheia-api         CloudFlare Worker    Edge routing, rate limiting, origin auth
Chrome Extension     Manifest V3          Text selection, UI overlay, auth
Firefox Extension    Manifest V3          Text selection, UI overlay, auth