Operations & Cost Control
Lord Vetinari governs Ankh-Morpork not through force but through careful observation and the occasional, precise intervention. The city runs itself; he merely ensures nothing runs away with it. The same principle applies here.
Three-Layer Kill Switch
When something goes wrong, the system shuts itself down. Three independent mechanisms ensure that no single failure mode can result in unbounded spend:
AletheiaKillSwitch, which sets Lambda concurrency to 0. All requests immediately return 503.
aws lambda put-function-concurrency --reserved-concurrent-executions 0 — the nuclear option, available via CLI script (lambda-off.sh).
Restoration is intentionally manual: aws lambda delete-function-concurrency --function-name AletheiaAgent. You don't want an auto-heal mechanism that might restore service during an active attack.
Seven-Layer Defense in Depth
Cost control isn't a single feature — it's seven independent layers, any one of which can stop a runaway bill:
| # | Layer | Mechanism | Fail Mode |
|---|---|---|---|
| 1 | CloudFlare Rate Limit | 3 req / 10s / IP | 429 at edge |
| 2 | Origin Secret | Shared secret in Worker | 403 before compute |
| 3 | Input Validation | Max 20KB text, UTF-8 | 400 before LLM |
| 4 | Token Cap (Daily) | 20 requests/day default | 429 before LLM |
| 5 | Multi-Window Rate Limit | Hourly/Daily/Monthly DDB | 429 before LLM |
| 6 | Kill Switch | Concurrency → 0 | 503 all requests |
| 7 | AWS Budget | Monthly spend threshold | Alert + manual action |
The key insight is that layers 1–5 prevent expensive operations from starting. They reject requests before Bedrock is invoked. Layer 6 stops everything after the fact. Layer 7 is the human-in-the-loop backstop.
Real Cost Model
LLM cost is often hand-waved. Here's the actual breakdown for a single Aletheia request using Nova Micro:
| Component | Cost per Request | Notes |
|---|---|---|
| Bedrock (Nova Micro) | $0.000075 | ~100 tokens at $0.75/1M |
| Lambda compute | $0.000042 | ~700ms at 256MB |
| DynamoDB write | $0.0000125 | 1 WCU |
| X-Ray trace (5%) | $0.0000025 | $0.000050 × 5% sampling |
| CloudWatch logs | $0.000005 | ~500 bytes/request |
| Total | $0.000137 |
At 100 daily active users making 5 requests each: $2.10/month. At 1,000 DAU: $21/month. The cost model makes Aletheia viable as a free product — the ceiling is low enough that a kill switch at 100 invocations/5min provides a safety margin of orders of magnitude above normal traffic.
Multi-Window Rate Limiting
Rate limits use DynamoDB atomic transactions across three concurrent time windows:
| Window | Key Format | TTL | Reset |
|---|---|---|---|
| Hourly | RATE#HOURLY#{ISO-hour} | 2 hours | Next hour |
| Daily | RATE#DAILY#{ISO-date} | 2 days | Midnight UTC |
| Monthly | RATE#MONTHLY#{YYYY-MM} | 35 days | Billing anchor day |
All three windows are checked and incremented in a single DynamoDB TransactWriteItems call. If any window is at capacity, the request is rejected before the LLM is invoked. TTLs ensure expired counters are automatically cleaned up.
The fail mode is fail-closed: if DynamoDB is unreachable, requests are denied (503) rather than allowed through. This is a deliberate choice — a temporary denial of service is preferable to an unbounded bill.
Feature Flags
Operational controls are feature-flagged via Lambda environment variables, not code deploys:
AUTH_ENABLED— toggles JWT validation on the main pipelineMODEL_ID— switches between Nova Micro and Haiku without redeployKILL_SWITCH_THRESHOLD— invocation count that triggers auto-shutdown
Changing a feature flag requires updating the Lambda environment and waiting for the next cold start. There is no hot-reload — this is intentional. You want flag changes to be visible in deployment history, not silently applied.
Post-Deploy Smoke Testing
Every deployment runs a two-step verification:
- Health check:
GET /health— verifies Lambda is responsive and dependencies are reachable - Analysis test:
POST /with a known-safe test payload — verifies the full pipeline end-to-end, including Bedrock invocation
If either fails, the deployment is considered broken. The CI pipeline runs these automatically after infrastructure provisioning.
Rollback
Lambda supports instant rollback to any previous version. The rollback procedure is documented in the cost incident response runbook and can be executed in under 60 seconds:
- Identify the last known-good version from Lambda version history
- Update the function alias to point to that version
- Run smoke tests to verify rollback succeeded
No code redeploy required. No build pipeline. Just an AWS CLI command.
Aletheia