Observability & Monitoring
Observability in LLM systems is fundamentally different from observability in traditional web services. You can't just log request/response bodies — those contain user-generated text that may be sensitive. Every metric, trace annotation, and dashboard widget must be designed with the assumption that the content is private, while the operational metadata is not.
Like the Clacks towers of the Grand Trunk, every message carries overhead bytes recording its journey — how long each tower held it, which ones dropped it, the cost of the semaphore time — but the message itself is never read by the operators. GNU Terry Pratchett.
X-Ray Distributed Tracing
AWS X-Ray traces requests end-to-end across Lambda, DynamoDB, and Bedrock. Sampling is set at 5% to balance visibility against cost.
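With the X-Ray SDK's local sampling rules, a 5% rate can be expressed as a rules document. The sketch below is illustrative — the reservoir of one traced request per second (`fixed_target`) is an assumption, not a documented value from this project:

```json
{
  "version": 2,
  "default": {
    "fixed_target": 1,
    "rate": 0.05
  },
  "rules": []
}
```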
What we trace (safe)
- Annotations (searchable): `model_id`, `status`, `error_type`
- Metadata (visible): `tokens_used`, `latency_ms`, Bedrock namespace details
- Subsegments: custom spans for guardrails, Bedrock calls, DynamoDB writes
What we never trace
- Prompt text or user selections
- Model completions or generated content
- Page URLs or user identifiers
- Any dimension that could identify what a user was reading
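One way to enforce the never-trace rule mechanically is an allowlist rather than a denylist, so any new field is dropped by default. A minimal sketch — the key names follow the lists above, but the helper itself is hypothetical, not the project's actual code:

```python
# Allowlist of operational fields that may be attached to a trace.
# Anything not listed here — prompt text, completions, URLs, user
# identifiers — is dropped by default rather than leaked.
SAFE_TRACE_KEYS = frozenset({
    "model_id", "status", "error_type",  # annotations (searchable)
    "tokens_used", "latency_ms",         # metadata (visible)
})

def scrub_trace_fields(fields: dict) -> dict:
    """Return only the fields that are safe to attach to a subsegment."""
    return {k: v for k, v in fields.items() if k in SAFE_TRACE_KEYS}
```

Because the filter is an allowlist, adding a new (possibly sensitive) field to the pipeline never silently adds it to traces.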
The X-Ray SDK is patched via patch_all() for automatic boto3 instrumentation. Custom subsegments wrap each pipeline stage so latency breakdowns show exactly where time is spent.
CloudWatch Metrics (EMF)
Six metrics are emitted via Embedded Metric Format (EMF) — structured log lines that CloudWatch automatically parses into queryable metrics, without the cost of PutMetricData API calls:
| Metric | Dimensions | Unit | Purpose |
|---|---|---|---|
| `RequestCount` | `[Tier]` | Count | Total requests by user tier |
| `CapUtilization` | `[Tier, Window]` | Percent | How close users are to rate limits |
| `CapDenied` | `[Tier]` | Count | Requests rejected by rate limiting |
| `BedrockCostEstimate` | — | None (USD) | Per-request cost estimate |
| `ErrorRate` | `[StatusCode]` | Count | Errors by HTTP status code |
| `Latency` | — | Milliseconds | End-to-end request latency |
All metrics land in the Aletheia/API namespace. Tier dimensions are validated against a fixed set (free, subscriber, admin, pro, enterprise) to prevent cardinality explosion from malformed JWTs.
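The cardinality guard reduces to a membership check before the tier value is used as a dimension. A sketch, assuming a fallback bucket of `"unknown"` (the actual fallback value is not specified here):

```python
# Fixed set of tiers accepted as a metric dimension. A malformed or
# attacker-controlled JWT claim must not be able to mint new dimension
# values, since each unique value creates a new CloudWatch time series.
VALID_TIERS = frozenset({"free", "subscriber", "admin", "pro", "enterprise"})

def metric_tier(claimed_tier: str) -> str:
    """Clamp the Tier dimension to the fixed set; bucket everything else."""
    return claimed_tier if claimed_tier in VALID_TIERS else "unknown"
```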
Component-Level Latency
Each request records timing breakdowns across seven stages, returned in the API response for client-side diagnostics:
| Component | What it measures |
|---|---|
| Input validation | Text checks, UTF-8 verification |
| Denylist check | HashSet lookup against term list |
| Semantic guardrail | LLM classification call |
| DynamoDB persist | Conditional write with TTL |
| Bedrock generation | Model invocation (dominant cost) |
| Score processing | Parse and threshold scores |
| Total | End-to-end Lambda execution |
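The per-stage breakdown above can be collected with a small context-manager timer. This is an illustrative sketch, not the project's implementation; note that the real Total row is measured end-to-end, whereas this helper's total is simply the sum of recorded stages:

```python
import time
from contextlib import contextmanager

class LatencyBreakdown:
    """Accumulates per-stage wall-clock timings in milliseconds."""

    def __init__(self):
        self.stages: dict[str, float] = {}

    @contextmanager
    def stage(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            self.stages[name] = (time.perf_counter() - start) * 1000.0

    def as_dict(self) -> dict:
        # Simplification: real end-to-end latency should be measured
        # separately, since it includes time outside any named stage.
        return {**self.stages, "total_ms": sum(self.stages.values())}
```

Usage mirrors the pipeline stages, e.g. `with timings.stage("denylist_check"): ...` around each step, with the resulting dict returned in the API response.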
EMF Payload Structure
Each metric is emitted as a structured JSON log line, and CloudWatch extracts the _aws envelope automatically.
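A representative payload for RequestCount follows the EMF specification's shape — the timestamp and values here are illustrative:

```json
{
  "_aws": {
    "Timestamp": 1700000000000,
    "CloudWatchMetrics": [
      {
        "Namespace": "Aletheia/API",
        "Dimensions": [["Tier"]],
        "Metrics": [{"Name": "RequestCount", "Unit": "Count"}]
      }
    ]
  },
  "Tier": "free",
  "RequestCount": 1
}
```

The dimension value (`Tier`) and the metric value (`RequestCount`) live as top-level keys; the `_aws` envelope tells CloudWatch which keys to extract into which namespace.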
Client-Side Diagnostics
The extension popup includes a diagnostics panel that shows:
- Extension version and browser type
- Service worker status and last restart time
- API endpoint connectivity
- Auth state and token expiry (when auth is enabled)
- Last request latency breakdown
This lets users self-diagnose issues before filing a bug report, and gives the team immediate signal when debugging user-reported problems.
Health Check
The /health endpoint returns system status without invoking any LLM calls. It validates that the Lambda is responsive, DynamoDB is reachable, and the Bedrock client can be instantiated. This is the target for post-deploy smoke tests and uptime monitoring.
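A successful response might look like the following — the field names are illustrative, not the service's actual schema:

```json
{
  "status": "healthy",
  "checks": {
    "lambda": "ok",
    "dynamodb": "ok",
    "bedrock_client": "ok"
  }
}
```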
Admin Metrics Dashboard
The /metrics endpoint (admin-only) returns aggregate statistics: total requests, unique users, requests by tier, error rates, and cost estimates. No individual user data is exposed — all metrics are aggregated at the system level.