Observability & Monitoring
Observability in LLM systems is fundamentally different from observability in traditional web services. You can't just log request/response bodies — those contain user-generated text that may be sensitive. Every metric, trace annotation, and dashboard widget must be designed with the assumption that the content is private, while the operational metadata is not.
Like the Clacks towers of the Grand Trunk, every message carries overhead bytes recording its journey — how long each tower held it, which ones dropped it, the cost of the semaphore time — but the message itself is never read by the operators. GNU Terry Pratchett.
X-Ray Distributed Tracing
AWS X-Ray traces requests end-to-end across Lambda, DynamoDB, and Bedrock. Sampling is set at 5% to balance visibility against cost.
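With the X-Ray SDK's local sampling rules, a 5% rate can be expressed as a rules document. The sketch below is illustrative — the reservoir of one traced request per second (`fixed_target`) is an assumption, not a documented value from this project:

```json
{
  "version": 2,
  "default": {
    "fixed_target": 1,
    "rate": 0.05
  },
  "rules": []
}
```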
What we trace (safe)
- Annotations (searchable): `model_id`, `status`, `error_type`
- Metadata (visible): `tokens_used`, `latency_ms`, Bedrock namespace details
- Subsegments: custom spans for guardrails, Bedrock calls, DynamoDB writes
What we never trace
- Prompt text or user selections
- Model completions or generated content
- Page URLs or user identifiers
- Any dimension that could identify what a user was reading
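One way to enforce the never-trace rule mechanically is an allowlist rather than a denylist, so any new field is dropped by default. A minimal sketch — the key names follow the lists above, but the helper itself is hypothetical, not the project's actual code:

```python
# Allowlist of operational fields that may be attached to a trace.
# Anything not listed here — prompt text, completions, URLs, user
# identifiers — is dropped by default rather than leaked.
SAFE_TRACE_KEYS = frozenset({
    "model_id", "status", "error_type",  # annotations (searchable)
    "tokens_used", "latency_ms",         # metadata (visible)
})

def scrub_trace_fields(fields: dict) -> dict:
    """Return only the fields that are safe to attach to a subsegment."""
    return {k: v for k, v in fields.items() if k in SAFE_TRACE_KEYS}
```

Because the filter is an allowlist, adding a new (possibly sensitive) field to the pipeline never silently adds it to traces.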
The X-Ray SDK is patched via patch_all() for automatic boto3 instrumentation. Custom subsegments wrap each pipeline stage so latency breakdowns show exactly where time is spent.
CloudWatch Metrics (EMF)
Six metrics are emitted via Embedded Metric Format (EMF) — structured log lines that CloudWatch automatically parses into queryable metrics, without the cost of PutMetricData API calls:
| Metric | Dimensions | Unit | Purpose |
|---|---|---|---|
| `RequestCount` | `[Tier]` | Count | Total requests by user tier |
| `CapUtilization` | `[Tier, Window]` | Percent | How close users are to rate limits |
| `CapDenied` | `[Tier]` | Count | Requests rejected by rate limiting |
| `BedrockCostEstimate` | — | None (USD) | Per-request cost estimate |
| `ErrorRate` | `[StatusCode]` | Count | Errors by HTTP status code |
| `Latency` | — | Milliseconds | End-to-end request latency |
All metrics land in the Aletheia/API namespace. Tier dimensions are validated against a fixed set (free, subscriber, admin, pro, enterprise) to prevent cardinality explosion from malformed JWTs.
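The cardinality guard reduces to a membership check before the tier value is used as a dimension. A sketch, assuming a fallback bucket of `"unknown"` (the actual fallback value is not specified here):

```python
# Fixed set of tiers accepted as a metric dimension. A malformed or
# attacker-controlled JWT claim must not be able to mint new dimension
# values, since each unique value creates a new CloudWatch time series.
VALID_TIERS = frozenset({"free", "subscriber", "admin", "pro", "enterprise"})

def metric_tier(claimed_tier: str) -> str:
    """Clamp the Tier dimension to the fixed set; bucket everything else."""
    return claimed_tier if claimed_tier in VALID_TIERS else "unknown"
```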
Component-Level Latency
Each request records timing breakdowns across seven stages, returned in the API response for client-side diagnostics:
| Component | What it measures |
|---|---|
| Input validation | Text checks, UTF-8 verification |
| Denylist check | HashSet lookup against term list |
| Semantic guardrail | LLM classification call |
| DynamoDB persist | Conditional write with TTL |
| Bedrock generation | Model invocation (dominant cost) |
| Score processing | Parse and threshold scores |
| Total | End-to-end Lambda execution |
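The per-stage breakdown above can be collected with a small context-manager timer. This is an illustrative sketch, not the project's implementation; note that the real Total row is measured end-to-end, whereas this helper's total is simply the sum of recorded stages:

```python
import time
from contextlib import contextmanager

class LatencyBreakdown:
    """Accumulates per-stage wall-clock timings in milliseconds."""

    def __init__(self):
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] = (time.perf_counter() - start) * 1000.0

    def as_dict(self) -> dict:
        # Simplification: real end-to-end latency should be measured
        # separately, since it includes time outside any named stage.
        return {**self.stages, "total_ms": sum(self.stages.values())}
```

Usage mirrors the pipeline stages, e.g. `with timings.stage("denylist_check"): ...` around each step, with the resulting dict returned in the API response.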
EMF Payload Structure
Each metric is emitted as a structured JSON log line, and CloudWatch extracts the _aws envelope automatically.
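A representative payload for RequestCount follows the EMF specification's shape — the timestamp and values here are illustrative:

```json
{
  "_aws": {
    "Timestamp": 1700000000000,
    "CloudWatchMetrics": [
      {
        "Namespace": "Aletheia/API",
        "Dimensions": [["Tier"]],
        "Metrics": [{"Name": "RequestCount", "Unit": "Count"}]
      }
    ]
  },
  "Tier": "free",
  "RequestCount": 1
}
```

The dimension value (`Tier`) and the metric value (`RequestCount`) live as top-level keys; the `_aws` envelope tells CloudWatch which keys to extract into which namespace.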
Client-Side Diagnostics
The extension popup includes a diagnostics panel that shows:
- Extension version and browser type
- Service worker status and last restart time
- API endpoint connectivity
- Auth state and token expiry (when auth is enabled)
- Last request latency breakdown
This lets users self-diagnose issues before filing a bug report, and gives the team immediate signal when debugging user-reported problems.
Health Check
The /health endpoint returns system status without invoking any LLM calls. It validates that the Lambda is responsive, DynamoDB is reachable, and the Bedrock client can be instantiated. This is the target for post-deploy smoke tests and uptime monitoring.
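A successful response might look like the following — the field names are illustrative, not the service's actual schema:

```json
{
  "status": "healthy",
  "checks": {
    "lambda": "ok",
    "dynamodb": "ok",
    "bedrock_client": "ok"
  }
}
```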
Admin Metrics Dashboard
The /metrics endpoint (admin-only) returns aggregate statistics: total requests, unique users, requests by tier, error rates, and cost estimates. No individual user data is exposed — all metrics are aggregated at the system level.