Observability & Monitoring

The constraint: NEVER log prompt text, user input, completion text, URLs, or user IDs. Observability for an LLM system must answer "is it working?" and "how much does it cost?" without ever seeing what users asked.

Observability in LLM systems is fundamentally different from observability in traditional web services. You can't just log request/response bodies — those contain user-generated text that may be sensitive. Every metric, trace annotation, and dashboard widget must be designed with the assumption that the content is private, while the operational metadata is not.

Like the Clacks towers of the Grand Trunk, every message carries overhead bytes recording its journey — how long each tower held it, which ones dropped it, the cost of the semaphore time — but the message itself is never read by the operators. GNU Terry Pratchett.

X-Ray Distributed Tracing

AWS X-Ray traces requests end-to-end across Lambda, DynamoDB, and Bedrock. Sampling is set at 5% to balance visibility against cost.
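A 5% rate can be expressed as a local sampling rules document for the X-Ray SDK. This is a sketch of the standard version-2 rules format; whether the project configures sampling this way (versus a centralized X-Ray sampling rule) is an assumption:

```json
{
  "version": 2,
  "default": {
    "fixed_target": 1,
    "rate": 0.05
  },
  "rules": []
}
```

`fixed_target: 1` guarantees at least one traced request per second even at low traffic, with 5% of the remainder sampled on top.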

What we trace (safe)

What we never trace

The X-Ray SDK is patched via patch_all() for automatic boto3 instrumentation. Custom subsegments wrap each pipeline stage so latency breakdowns show exactly where time is spent.
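The wrapping pattern looks roughly like the following. This is a pure-Python stand-in — the real handler would use `aws_xray_sdk`'s `xray_recorder.in_subsegment()` after `patch_all()` — and the stage name is illustrative:

```python
import time
from contextlib import contextmanager

# Stand-in for xray_recorder.in_subsegment(): records wall-clock
# duration per pipeline stage so latency breakdowns show where
# time is spent. Durations are metadata only; no content is captured.
timings_ms: dict[str, float] = {}

@contextmanager
def subsegment(name: str):
    start = time.perf_counter()
    try:
        yield
    finally:
        timings_ms[name] = (time.perf_counter() - start) * 1000.0

with subsegment("denylist_check"):
    time.sleep(0.01)  # stand-in for the real stage's work

print(sorted(timings_ms))  # → ['denylist_check']
```

Each `with` block becomes one timed stage; nesting them mirrors X-Ray's subsegment tree.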

CloudWatch Metrics (EMF)

Six metrics are emitted via Embedded Metric Format (EMF) — structured log lines that CloudWatch automatically parses into queryable metrics, without the cost of PutMetricData API calls:

| Metric | Dimensions | Unit | Purpose |
| --- | --- | --- | --- |
| RequestCount | [Tier] | Count | Total requests by user tier |
| CapUtilization | [Tier, Window] | Percent | How close users are to rate limits |
| CapDenied | [Tier] | Count | Requests rejected by rate limiting |
| BedrockCostEstimate | (none) | None (USD) | Per-request cost estimate |
| ErrorRate | [StatusCode] | Count | Errors by HTTP status code |
| Latency | (none) | Milliseconds | End-to-end request latency |

All metrics land in the Aletheia/API namespace. Tier dimensions are validated against a fixed set (free, subscriber, admin, pro, enterprise) to prevent cardinality explosion from malformed JWTs.

Component-Level Latency

Each request records timing breakdowns across seven stages, returned in the API response for client-side diagnostics:

| Component | What it measures |
| --- | --- |
| Input validation | Text checks, UTF-8 verification |
| Denylist check | HashSet lookup against term list |
| Semantic guardrail | LLM classification call |
| DynamoDB persist | Conditional write with TTL |
| Bedrock generation | Model invocation (dominant cost) |
| Score processing | Parse and threshold scores |
| Total | End-to-end Lambda execution |
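An illustrative shape for the timing block in the API response — field names and values here are assumptions, not the actual schema:

```json
{
  "timings_ms": {
    "input_validation": 1,
    "denylist_check": 0,
    "semantic_guardrail": 180,
    "dynamodb_persist": 12,
    "bedrock_generation": 2400,
    "score_processing": 3,
    "total": 2600
  }
}
```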

EMF Payload Structure

Each metric is a structured JSON log line. CloudWatch extracts the _aws envelope automatically:

Embedded Metric Format (EMF) payload:

```json
{
  "_aws": {
    "Timestamp": 1708900000000,
    "CloudWatchMetrics": [{
      "Namespace": "Aletheia/API",
      "Dimensions": [["Tier"]],
      "Metrics": [{ "Name": "RequestCount", "Unit": "Count" }]
    }]
  },
  "Tier": "free",
  "RequestCount": 1
}
```
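A minimal emitter producing this shape might look like the following sketch — the real code may hand-roll JSON like this or use a helper library; the function name is an assumption:

```python
import json
import time

def emit_metric(name: str, value: float, unit: str, tier: str) -> str:
    """Build one EMF log line. Printing it to stdout is enough:
    Lambda stdout lands in CloudWatch Logs, which parses the _aws
    envelope into a queryable metric with no PutMetricData call."""
    line = json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": "Aletheia/API",
                "Dimensions": [["Tier"]],
                "Metrics": [{"Name": name, "Unit": unit}],
            }],
        },
        "Tier": tier,   # dimension value (validated against the fixed set)
        name: value,    # the metric datum itself
    })
    print(line)
    return line
```

Note that only metadata (tier, count, unit) appears in the line — never prompt or completion text.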

Client-Side Diagnostics

The extension popup includes a diagnostics panel.

This lets users self-diagnose issues before filing a bug report, and gives the team immediate signal when debugging user-reported problems.

Health Check

The /health endpoint returns system status without invoking any LLM calls. It validates that the Lambda is responsive, DynamoDB is reachable, and the Bedrock client can be instantiated. This is the target for post-deploy smoke tests and uptime monitoring.
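The handler can be sketched as a list of cheap dependency checks that never touch the model. Check names and the response shape are assumptions; in the real Lambda the callables would wrap boto3 calls such as `describe_table`:

```python
# Sketch of a /health handler: run each dependency check, report
# per-check status, and never invoke an LLM.
def health(checks: dict) -> dict:
    results = {}
    for name, check in checks.items():
        try:
            check()
            results[name] = "ok"
        except Exception as exc:  # report the failure, don't crash /health
            results[name] = f"error: {type(exc).__name__}"
    status = "healthy" if all(v == "ok" for v in results.values()) else "degraded"
    return {"status": status, "checks": results}

# Real checks would be e.g.:
#   "dynamodb": lambda: table.meta.client.describe_table(TableName=...)
#   "bedrock":  lambda: boto3.client("bedrock-runtime")
demo = health({"lambda": lambda: None, "dynamodb": lambda: None})
print(demo["status"])  # → healthy
```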

Admin Metrics Dashboard

The /metrics endpoint (admin-only) returns aggregate statistics: total requests, unique users, requests by tier, error rates, and cost estimates. No individual user data is exposed — all metrics are aggregated at the system level.
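The aggregation behind the endpoint can be sketched as follows. Record field names (`tier`, `status`, `user_hash`) are assumptions; the point is that only counts leave the function, never identifiers:

```python
from collections import Counter

def aggregate(records: list[dict]) -> dict:
    """Fold per-request records into system-level statistics.
    unique_users counts hashed IDs, so no raw user ID is exposed."""
    by_tier = Counter(r["tier"] for r in records)
    errors = sum(1 for r in records if r["status"] >= 400)
    return {
        "total_requests": len(records),
        "unique_users": len({r["user_hash"] for r in records}),
        "requests_by_tier": dict(by_tier),
        "error_rate": errors / len(records) if records else 0.0,
    }
```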