Operations & Cost Control

The problem: LLM APIs charge per token. A single runaway loop or DDoS can generate an unbounded bill. Every layer of the system is designed to prevent this — not just detect it after the fact.

Lord Vetinari governs Ankh-Morpork not through force but through careful observation and the occasional, precise intervention. The city runs itself; he merely ensures nothing runs away with it. The same principle applies here.

Three-Layer Kill Switch

When something goes wrong, the system shuts itself down. Three independent mechanisms ensure that no single failure mode can result in unbounded spend:

  1. CloudWatch Alarm → Kill Switch Lambda: If invocations exceed 100 in 5 minutes, an SNS notification triggers AletheiaKillSwitch, which sets Lambda concurrency to 0. All requests immediately return 503.
  2. AWS Budget Alarm: A monthly budget threshold triggers notifications and alerts. It is independent of application code, so it works even if the Lambda itself is compromised.
  3. Manual Concurrency Override: aws lambda put-function-concurrency --function-name AletheiaAgent --reserved-concurrent-executions 0. This is the nuclear option, available via a CLI script (lambda-off.sh).

Restoration is intentionally manual: aws lambda delete-function-concurrency --function-name AletheiaAgent. You don't want an auto-heal mechanism that might restore service during an active attack.
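
As a rough illustration of the kill and restore operations, here is a minimal sketch using the boto3 Lambda client calls put_function_concurrency and delete_function_concurrency. The handler shape and the injected lambda_client parameter are assumptions made for this example, not the actual AletheiaKillSwitch source; only the function name AletheiaAgent comes from the restore command above.

```python
"""Sketch of a kill switch: zero out reserved concurrency on the target function."""

TARGET_FUNCTION = "AletheiaAgent"  # name taken from the restore command in the text


def kill(lambda_client, function_name=TARGET_FUNCTION):
    """Set reserved concurrency to 0 so Lambda throttles every new invocation."""
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=0,
    )
    return {"function": function_name, "reserved_concurrency": 0}


def restore(lambda_client, function_name=TARGET_FUNCTION):
    """Manual restore: remove the reserved-concurrency setting entirely."""
    lambda_client.delete_function_concurrency(FunctionName=function_name)
    return {"function": function_name, "reserved_concurrency": None}


def handler(event, context, lambda_client=None):
    """Assumed entry point for the SNS-triggered Lambda; the event body is ignored."""
    if lambda_client is None:
        import boto3  # imported lazily so the module loads without AWS dependencies

        lambda_client = boto3.client("lambda")
    return kill(lambda_client)
```

Keeping restore as a separate function that nothing calls automatically mirrors the manual-restoration rule: a human has to run it.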

Seven-Layer Defense in Depth

Cost control isn't a single feature — it's seven independent layers, any one of which can stop a runaway bill:

#  Layer                    Mechanism                  Fail Mode
1  Cloudflare Rate Limit    3 req / 10s / IP           429 at edge
2  Origin Secret            Shared secret in Worker    403 before compute
3  Input Validation         Max 20KB text, UTF-8       400 before LLM
4  Token Cap (Daily)        20 requests/day default    429 before LLM
5  Multi-Window Rate Limit  Hourly/Daily/Monthly DDB   429 before LLM
6  Kill Switch              Concurrency → 0            503 all requests
7  AWS Budget               Monthly spend threshold    Alert + manual action

The key insight is that layers 1–5 prevent expensive operations from starting. They reject requests before Bedrock is invoked. Layer 6 stops everything after the fact. Layer 7 is the human-in-the-loop backstop.
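
To make "reject before Bedrock is invoked" concrete, here is a minimal sketch of layers 2 and 3 as a single pre-check. The header name x-origin-secret, the function name precheck, and the reading of "20KB" as 20 × 1024 bytes are all assumptions for illustration; the source only states the limits and status codes.

```python
import hmac

MAX_BODY_BYTES = 20 * 1024         # "Max 20KB" from the table; exact byte count is assumed
ORIGIN_HEADER = "x-origin-secret"  # hypothetical header name set by the Worker


def precheck(headers, body, expected_secret):
    """Return an HTTP status for a rejected request, or None if it may proceed.

    headers: dict of lower-cased header names to values
    body:    raw request body as bytes
    """
    supplied = headers.get(ORIGIN_HEADER, "")
    # Constant-time comparison avoids leaking the secret through timing.
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return 403  # layer 2: origin secret missing or wrong
    if len(body) > MAX_BODY_BYTES:
        return 400  # layer 3: payload too large
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return 400  # layer 3: not valid UTF-8
    return None
```

Each check is cheap and needs no network call, which is the point: the expensive path is never reached for a request that fails here.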

Real Cost Model

LLM cost is often hand-waved. Here's the actual breakdown for a single Aletheia request using Nova Micro:

Component             Cost per Request  Notes
Bedrock (Nova Micro)  $0.000075         ~100 tokens at $0.75/1M
Lambda compute        $0.000042         ~700ms at 256MB
DynamoDB write        $0.0000125        1 WCU
X-Ray trace (5%)      $0.0000025        $0.000050 × 5% sampling
CloudWatch logs       $0.000005         ~500 bytes/request
Total                 $0.000137

At 100 daily active users making 5 requests each, that is about 15,000 requests and roughly $2.06 per 30-day month. At 1,000 DAU it is roughly $21/month. The cost model makes Aletheia viable as a free product: at 100 DAU the average load is under 2 invocations per 5 minutes, so a kill switch at 100 invocations/5min leaves a margin of well over an order of magnitude above normal traffic.
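
The arithmetic behind these figures can be checked with a few lines of Python. This sketch uses the per-request costs from the table verbatim; the 30-day month and the function names are assumptions, and Decimal is used only to avoid floating-point noise at this scale.

```python
from decimal import Decimal

# Per-request costs in USD, copied from the table above.
PER_REQUEST = {
    "bedrock": Decimal("0.000075"),
    "lambda": Decimal("0.000042"),
    "dynamodb": Decimal("0.0000125"),
    "xray": Decimal("0.0000025"),
    "cloudwatch_logs": Decimal("0.000005"),
}


def cost_per_request():
    """Sum of all per-request components."""
    return sum(PER_REQUEST.values(), Decimal("0"))


def monthly_cost(dau, requests_per_user=5, days=30):
    """Monthly cost for a given number of daily active users (30-day month assumed)."""
    return dau * requests_per_user * days * cost_per_request()


def avg_invocations_per_5min(dau, requests_per_user=5):
    """Average load per 5-minute window, assuming traffic is spread evenly over a day."""
    return Decimal(dau * requests_per_user) / Decimal(24 * 12)


if __name__ == "__main__":
    print("per request:", cost_per_request())
    print("100 DAU / month:", monthly_cost(100))
    print("1,000 DAU / month:", monthly_cost(1000))
    print("100 DAU avg per 5 min:", avg_invocations_per_5min(100))
```

Real traffic is bursty rather than evenly spread, so the 5-minute average is a lower bound on peak load, not a prediction of it.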

Multi-Window Rate Limiting

Rate limits use DynamoDB atomic transactions across three concurrent time windows:

Window   Key Format              TTL      Reset
Hourly   RATE#HOURLY#{ISO-hour}  2 hours  Next hour
Daily    RATE#DAILY#{ISO-date}   2 days   Midnight UTC
Monthly  RATE#MONTHLY#{YYYY-MM}  35 days  Billing anchor day

All three windows are checked and incremented in a single DynamoDB TransactWriteItems call. If any window is at capacity, the request is rejected before the LLM is invoked. TTLs ensure expired counters are automatically cleaned up.
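
One way the three keys and their TTLs might be derived from the current time is sketched below. The exact string formats for {ISO-hour} and {ISO-date}, measuring TTL from the request time rather than the window start, and keying the monthly window on the calendar month (rather than a billing anchor day) are all assumptions; the source only gives the key templates and TTL lengths.

```python
from datetime import datetime, timedelta, timezone


def window_items(now):
    """Return [(partition_key, ttl_epoch_seconds)] for the hourly, daily and monthly windows."""
    now = now.astimezone(timezone.utc)  # windows reset on UTC boundaries
    windows = [
        (f"RATE#HOURLY#{now:%Y-%m-%dT%H}", timedelta(hours=2)),
        (f"RATE#DAILY#{now:%Y-%m-%d}", timedelta(days=2)),
        (f"RATE#MONTHLY#{now:%Y-%m}", timedelta(days=35)),
    ]
    # DynamoDB TTL attributes hold an expiry time in epoch seconds.
    return [(key, int((now + ttl).timestamp())) for key, ttl in windows]


if __name__ == "__main__":
    for key, ttl in window_items(datetime.now(timezone.utc)):
        print(key, ttl)
```

Because each TTL is longer than its window, a counter always outlives the period it counts and is only cleaned up once it can no longer be read.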

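DynamoDB transactional rate limit (simplified)

    # Single atomic transaction checks and increments all three windows.
    # Simplified: TableName and the ExpressionAttributeNames/Values maps
    # (#count, #ttl, :cap, :one, :ttl) are omitted from each Update.
    response = dynamodb.transact_write_items(
        TransactItems=[
            {   # Hourly window
                "Update": {
                    "Key": {"pk": f"RATE#HOURLY#{hour_key}"},
                    # attribute_not_exists lets the first request in a
                    # window create the counter; a bare "#count < :cap"
                    # is false when the attribute is missing.
                    "ConditionExpression": "attribute_not_exists(#count) OR #count < :cap",
                    "UpdateExpression": "ADD #count :one SET #ttl = :ttl",
                }
            },
            {   # Daily window
                "Update": {
                    # same pattern as the hourly window
                }
            },
            {   # Monthly window
                "Update": {
                    # same pattern as the hourly window
                }
            },
        ]
    )
    # If any ConditionExpression fails → TransactionCanceledException
    # → 429 returned, no Bedrock invocation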

The fail mode is fail-closed: if DynamoDB is unreachable, requests are denied (503) rather than allowed through. This is a deliberate choice — a temporary denial of service is preferable to an unbounded bill.
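
The fail-closed rule can be expressed as a small wrapper around the transaction call. This is a sketch under assumptions: check_rate_limit and run_transaction are hypothetical names, and every TransactionCanceledException is treated as "a window is at capacity" even though DynamoDB can also cancel a transaction for other reasons, such as a conflict with a concurrent write.

```python
def check_rate_limit(run_transaction):
    """Return (allowed, http_status).

    run_transaction is a zero-argument callable that performs the
    TransactWriteItems call and raises on any failure.
    """
    try:
        run_transaction()
    except Exception as exc:
        # botocore's ClientError exposes the service error code via exc.response.
        code = getattr(exc, "response", {}).get("Error", {}).get("Code")
        if code == "TransactionCanceledException":
            return False, 429  # treated here as: some window is at capacity
        # Fail closed: timeouts, network errors and anything unexpected deny the request.
        return False, 503
    return True, 200
```

The important property is that there is no code path where an error results in the request being allowed through.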

Feature Flags

Operational controls are feature-flagged via Lambda environment variables, not code deploys.

Changing a feature flag requires updating the Lambda environment and waiting for the next cold start. There is no hot-reload — this is intentional. You want flag changes to be visible in deployment history, not silently applied.
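
A minimal sketch of how flags read once at cold start might look in Python follows. The flag names RATE_LIMIT_ENABLED and XRAY_ENABLED and the set of accepted truthy strings are hypothetical; the source does not list the actual flags.

```python
import os

TRUTHY = {"1", "true", "yes", "on"}


def read_flag(name, default=False, environ=os.environ):
    """Parse a boolean flag from an environment mapping."""
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in TRUTHY


# Module-level code runs once per execution environment (at cold start),
# so these values stay fixed until the environment is replaced.
FLAGS = {
    "rate_limit_enabled": read_flag("RATE_LIMIT_ENABLED", default=True),  # hypothetical name
    "xray_enabled": read_flag("XRAY_ENABLED"),                            # hypothetical name
}
```

Reading flags at import time rather than per request is what makes the behaviour match the "no hot-reload" rule: a running environment never sees a changed value.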

Post-Deploy Smoke Testing

Every deployment runs a two-step verification:

  1. Health check: GET /health — verifies Lambda is responsive and dependencies are reachable
  2. Analysis test: POST / with a known-safe test payload — verifies the full pipeline end-to-end, including Bedrock invocation

If either fails, the deployment is considered broken. The CI pipeline runs these automatically after infrastructure provisioning.
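
The two checks could be scripted along these lines. This is a sketch, not the project's CI step: the payload shape, the assumption that HTTP 200 means success for both endpoints, and the function names are all illustrative. The fetch parameter exists so the logic can be exercised without a live deployment.

```python
import json
import urllib.error
import urllib.request


def http_status(method, url, payload=None, timeout=10):
    """Send one request and return the HTTP status, or None if the host is unreachable."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    req = urllib.request.Request(
        url, data=data, method=method, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:  # non-2xx responses
        return err.code
    except urllib.error.URLError:          # DNS failure, refused connection, etc.
        return None


def smoke_test(base_url, fetch=http_status):
    """Run both checks in order and return the names of the steps that failed."""
    steps = [
        ("health", "GET", f"{base_url}/health", None),
        # Hypothetical known-safe payload; the real request schema is not shown in the text.
        ("analysis", "POST", f"{base_url}/", {"text": "The sky is blue."}),
    ]
    failures = []
    for name, method, url, payload in steps:
        if fetch(method, url, payload) != 200:
            failures.append(name)
    return failures
```

A CI job would treat a non-empty return value as a broken deployment and exit non-zero.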

Rollback

Lambda supports instant rollback to any previous version. The rollback procedure is documented in the cost incident response runbook and can be executed in under 60 seconds:

  1. Identify the last known-good version from Lambda version history
  2. Update the function alias to point to that version
  3. Run smoke tests to verify rollback succeeded

No code redeploy required. No build pipeline. Just an AWS CLI command.
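
For teams that prefer a script over a raw CLI call, steps 1 and 2 map onto two boto3 Lambda client methods, list_versions_by_function and update_alias. The sketch below assumes an alias named "live", which is hypothetical, and reads only the first page of versions; the client is passed in so the logic can be tested with a stub.

```python
def published_versions(lambda_client, function_name):
    """Step 1: list published version numbers (first page only), excluding $LATEST."""
    resp = lambda_client.list_versions_by_function(FunctionName=function_name)
    return [v["Version"] for v in resp["Versions"] if v["Version"] != "$LATEST"]


def rollback(lambda_client, function_name, alias, good_version):
    """Step 2: repoint the alias at a known-good version and return the version now served."""
    resp = lambda_client.update_alias(
        FunctionName=function_name,
        Name=alias,
        FunctionVersion=good_version,
    )
    return resp["FunctionVersion"]


if __name__ == "__main__":
    import boto3  # only needed when run against a real account

    client = boto3.client("lambda")
    print(published_versions(client, "AletheiaAgent"))
    # "live" is a hypothetical alias name; "41" stands in for the chosen version.
    # print(rollback(client, "AletheiaAgent", "live", "41"))
```

Choosing which version is "known good" stays a human decision here; the script only makes the repointing itself quick and repeatable.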