Operations & Cost Control

The problem: LLM APIs charge per token. A single runaway loop or DDoS can generate an unbounded bill. Every layer of the system is designed to prevent this, not just detect it after the fact.

Lord Vetinari governs Ankh-Morpork not through force but through careful observation and the occasional, precise intervention. The city runs itself; he merely ensures nothing runs away with it. The same principle applies here.

Three-Layer Kill Switch

When something goes wrong, the system shuts itself down. Three independent mechanisms ensure that no single failure mode can result in unbounded spend:

CloudWatch Alarm → Kill Switch Lambda: If invocations exceed 100 in 5 minutes, the alarm publishes to an SNS topic that invokes AletheiaKillSwitch, which sets the function's reserved concurrency to 0. All requests immediately return 503.

AWS Budget Alarm: A monthly budget threshold triggers notifications and alerts. It is independent of application code, so it works even if the Lambda itself is compromised.

Manual Concurrency Override: aws lambda put-function-concurrency --function-name AletheiaAgent --reserved-concurrent-executions 0. This is the nuclear option, available via a CLI script (lambda-off.sh).

Restoration is intentionally manual: aws lambda delete-function-concurrency --function-name AletheiaAgent. You don't want an auto-heal mechanism that might restore service during an active attack.
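The automatic path (alarm → SNS → kill switch) can be sketched in Python with boto3. This is a minimal sketch under assumptions, not the deployed AletheiaKillSwitch: the alarm name and the `TARGET_FUNCTION` environment variable are hypothetical, and the handler only acts on SNS messages whose `NewStateValue` is `ALARM`, so an alarm returning to OK can never touch concurrency. Matching the manual-restore policy, nothing here ever deletes the concurrency setting.

```python
import json
import os

# Hypothetical env var; AletheiaAgent is the function name used in the restore command.
TARGET_FUNCTION = os.environ.get("TARGET_FUNCTION", "AletheiaAgent")


def alarm_params(sns_topic_arn: str) -> dict:
    """Keyword arguments for CloudWatch put_metric_alarm: >100 invocations in 5 minutes."""
    return {
        "AlarmName": "AletheiaInvocationSpike",  # hypothetical alarm name
        "Namespace": "AWS/Lambda",
        "MetricName": "Invocations",
        "Dimensions": [{"Name": "FunctionName", "Value": TARGET_FUNCTION}],
        "Statistic": "Sum",
        "Period": 300,  # one 5-minute window
        "EvaluationPeriods": 1,
        "Threshold": 100.0,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [sns_topic_arn],  # topic the kill-switch Lambda subscribes to
    }


def handler(event, context, lambda_client=None):
    """SNS-triggered kill switch: reserved concurrency 0 on ALARM; never restores."""
    if lambda_client is None:
        import boto3  # lazy import so the logic can be exercised with a fake client
        lambda_client = boto3.client("lambda")

    killed = False
    for record in event.get("Records", []):
        message = json.loads(record["Sns"]["Message"])
        if message.get("NewStateValue") == "ALARM" and not killed:
            lambda_client.put_function_concurrency(
                FunctionName=TARGET_FUNCTION,
                ReservedConcurrentExecutions=0,
            )
            killed = True
    return {"killed": killed}
```

The alarm itself would be created once, e.g. `boto3.client("cloudwatch").put_metric_alarm(**alarm_params(topic_arn))`, with the kill-switch function subscribed to that topic.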

Seven-Layer Defense in Depth

Cost control isn't a single feature. It is seven independent layers, each able to stop a runaway bill or, in the case of the budget alarm, flag one for a human:

#	Layer	Mechanism	Fail Mode
1	Cloudflare Rate Limit	3 req / 10s / IP	429 at edge
2	Origin Secret	Shared secret in Worker	403 before compute
3	Input Validation	Max 20KB text, UTF-8	400 before LLM
4	Token Cap (Daily)	20 requests/day default	429 before LLM
5	Multi-Window Rate Limit	Hourly/Daily/Monthly DDB	429 before LLM
6	Kill Switch	Concurrency → 0	503 all requests
7	AWS Budget	Monthly spend threshold	Alert + manual action

The key insight is that layers 1–5 prevent expensive operations from starting. They reject requests before Bedrock is invoked. Layer 6 stops everything after the fact. Layer 7 is the human-in-the-loop backstop.
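Layers 2 to 4 are cheap local checks, so they can run in table order before anything billable happens. A minimal sketch, assuming a hypothetical `x-origin-secret` header, reading "20KB" as 20 KiB, and taking the caller's usage count as an argument: a real counter needs shared storage such as DynamoDB, because Lambda execution environments don't share memory.

```python
import hmac

MAX_TEXT_BYTES = 20 * 1024  # assumption: "20KB" read as 20 KiB
DAILY_CAP = 20              # default from the table above


def gate(headers: dict, body: bytes, expected_secret: str, usage_today: int):
    """Return (status, message) for the pre-LLM checks, cheapest first."""
    # Layer 2: origin secret (hypothetical header name), constant-time comparison
    supplied = headers.get("x-origin-secret", "")
    if not hmac.compare_digest(supplied.encode(), expected_secret.encode()):
        return 403, "forbidden"
    # Layer 3: input validation, size before decoding
    if len(body) > MAX_TEXT_BYTES:
        return 400, "payload too large"
    try:
        body.decode("utf-8")
    except UnicodeDecodeError:
        return 400, "payload must be UTF-8"
    # Layer 4: daily cap; usage_today would come from shared storage
    if usage_today >= DAILY_CAP:
        return 429, "daily limit reached"
    return 200, "ok"  # only now would layer 5 and then Bedrock run
```

Comparing the secret with `hmac.compare_digest` avoids leaking it through timing, and checking size before decoding means an oversized body never costs even a decode.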

Real Cost Model

LLM cost is often hand-waved. Here's the actual breakdown for a single Aletheia request using Nova Micro:

Component	Cost per Request	Notes
Bedrock (Nova Micro)	$0.000075	~100 tokens at $0.75/1M
Lambda compute	$0.000042	~700ms at 256MB
DynamoDB write	$0.0000125	1 WCU
X-Ray trace (5%)	$0.0000025	$0.000050 × 5% sampling
CloudWatch logs	$0.000005	~500 bytes/request
Total	$0.000137

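At 100 daily active users making 5 requests each (15,000 requests in a 30-day month), that works out to about $2.06/month; at 1,000 DAU, about $21/month. The cost model makes Aletheia viable as a free product. The kill-switch threshold of 100 invocations per 5 minutes also sits well above normal traffic: 100 DAU average under 2 invocations per 5-minute window, a margin of roughly 58×. At 1,000 DAU (about 17 invocations per window) the margin narrows to roughly 6×, so the threshold needs to grow with usage.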
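The projections are simple multiplication, and a small calculator makes the assumptions explicit. It reuses the table's per-request figures as given and assumes a 30-day month with requests spread evenly across the day, which real traffic will not follow.

```python
from decimal import Decimal

# Per-request figures copied from the cost table above (USD)
COSTS = {
    "bedrock_nova_micro": Decimal("0.000075"),
    "lambda_compute": Decimal("0.000042"),
    "dynamodb_write": Decimal("0.0000125"),
    "xray_trace_sampled": Decimal("0.0000025"),
    "cloudwatch_logs": Decimal("0.000005"),
}
PER_REQUEST = sum(COSTS.values())  # Decimal, exact: 0.000137


def monthly_cost(dau: int, requests_per_user: int = 5, days: int = 30) -> Decimal:
    """Projected monthly spend in USD, assuming a 30-day month by default."""
    return PER_REQUEST * dau * requests_per_user * days


def kill_switch_margin(dau: int, requests_per_user: int = 5, threshold: int = 100) -> float:
    """How many times average 5-minute traffic fits under the alarm threshold."""
    per_window = dau * requests_per_user / 288  # 288 five-minute windows per day
    return threshold / per_window
```

Bedrock is just over half of the per-request total, so the model choice dominates the bill.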

Multi-Window Rate Limiting

Rate limits use DynamoDB atomic transactions across three concurrent time windows:

Window	Key Format	TTL	Reset
Hourly	`RATE#HOURLY#{ISO-hour}`	2 hours	Next hour
Daily	`RATE#DAILY#{ISO-date}`	2 days	Midnight UTC
Monthly	`RATE#MONTHLY#{YYYY-MM}`	35 days	1st of the month, 00:00 UTC

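All three windows are checked and incremented in a single DynamoDB TransactWriteItems call, so either every counter is incremented or none is. If any window is at capacity, the whole transaction is cancelled and the request is rejected before the LLM is invoked. TTLs clean up expired counters automatically. DynamoDB deletes expired items in the background, possibly days later, but correctness never depends on that timing: each window writes to a fresh key, so an old counter is simply never read again.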

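DynamoDB transactional rate limit (simplified; table name and caps are illustrative)
```python
import time
from datetime import datetime, timezone

import boto3

dynamodb = boto3.client("dynamodb")
TABLE = "AletheiaRateLimits"                            # hypothetical table name
CAPS = {"HOURLY": 50, "DAILY": 500, "MONTHLY": 10_000}  # hypothetical caps


def window_update(key, cap, ttl_seconds):
    """Increment one window's counter, but only while it is below cap."""
    return {"Update": {
        "TableName": TABLE,
        "Key": {"pk": {"S": key}},
        # attribute_not_exists admits the first request in a fresh window
        "ConditionExpression": "attribute_not_exists(#count) OR #count < :cap",
        "UpdateExpression": "SET #ttl = :ttl ADD #count :one",
        "ExpressionAttributeNames": {"#count": "count", "#ttl": "ttl"},
        "ExpressionAttributeValues": {
            ":cap": {"N": str(cap)},
            ":one": {"N": "1"},
            ":ttl": {"N": str(int(time.time()) + ttl_seconds)},
        },
    }}


now = datetime.now(timezone.utc)
# Single atomic transaction checks and increments all three windows
dynamodb.transact_write_items(TransactItems=[
    window_update(f"RATE#HOURLY#{now:%Y-%m-%dT%H}", CAPS["HOURLY"], 2 * 3600),
    window_update(f"RATE#DAILY#{now:%Y-%m-%d}", CAPS["DAILY"], 2 * 86400),
    window_update(f"RATE#MONTHLY#{now:%Y-%m}", CAPS["MONTHLY"], 35 * 86400),
])
# If any ConditionExpression fails → TransactionCanceledException
# → 429 returned, no Bedrock invocation
```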

The fail mode is fail-closed: if DynamoDB is unreachable, requests are denied (503) rather than allowed through. This is a deliberate choice — a temporary denial of service is preferable to an unbounded bill.
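The fail-closed rule can be expressed as a mapping from the transaction outcome to an HTTP status. A minimal sketch: `transact` would be `boto3.client("dynamodb").transact_write_items`, and the sketch assumes botocore exposes the per-item `CancellationReasons` list on the error response. Only a `ConditionalCheckFailed` reason (a window at capacity) maps to 429; everything else, including an unreachable endpoint or a `TransactionConflict` from concurrent writers on the same counter, is denied with 503.

```python
def rate_limit_status(transact, items) -> int:
    """Map the rate-limit transaction outcome to an HTTP status, failing closed."""
    try:
        transact(TransactItems=items)
        return 200  # every window had capacity and was incremented
    except Exception as err:  # any failure denies the request
        response = getattr(err, "response", None) or {}
        code = response.get("Error", {}).get("Code")
        reasons = response.get("CancellationReasons", [])
        if code == "TransactionCanceledException" and any(
            r.get("Code") == "ConditionalCheckFailed" for r in reasons
        ):
            return 429  # a window is at capacity
        return 503  # unreachable, throttled, or conflicting write: deny anyway
```

If the cancellation reasons were unavailable, capacity rejections would surface as 503 instead of 429. The request is still denied, which is the property that matters for cost.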

Feature Flags

Operational controls are feature-flagged via Lambda environment variables, not code deploys:

AUTH_ENABLED — toggles JWT validation on the main pipeline
MODEL_ID — switches between Nova Micro and Haiku without redeploy
KILL_SWITCH_THRESHOLD — invocation count that triggers auto-shutdown

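Changing a feature flag means updating the function's configuration, after which Lambda serves new invocations from fresh execution environments (cold starts) that read the new values. If traffic reaches the function through an alias pointing at a published version, the change also requires publishing a new version and moving the alias. There is no hot-reload, and this is intentional: you want flag changes to be visible in deployment history, not silently applied.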
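Reading flags once at module load is what ties a flag change to new execution environments. A minimal sketch; the default values are assumptions, not from the source (including the Nova Micro model ID shown). `AUTH_ENABLED` is disabled only by an explicit `false`, so a typo in the value keeps authentication on.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Flags:
    auth_enabled: bool
    model_id: str
    kill_switch_threshold: int


def load_flags(env: Mapping) -> Flags:
    """Parse flags from an environment mapping; defaults are hypothetical."""
    return Flags(
        # Only an explicit "false" turns auth off; anything else keeps it on
        auth_enabled=env.get("AUTH_ENABLED", "true").strip().lower() != "false",
        model_id=env.get("MODEL_ID", "amazon.nova-micro-v1:0"),  # assumed default
        kill_switch_threshold=int(env.get("KILL_SWITCH_THRESHOLD", "100")),
    )


# Evaluated at import, i.e. once per execution environment (each cold start)
FLAGS = load_flags(os.environ)
```

When changing a single flag with `aws lambda update-function-configuration --environment`, pass the full variable set: the call replaces all environment variables rather than merging them.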

Post-Deploy Smoke Testing

Every deployment runs a two-step verification:

Health check: GET /health — verifies Lambda is responsive and dependencies are reachable
Analysis test: POST / with a known-safe test payload — verifies the full pipeline end-to-end, including Bedrock invocation

If either fails, the deployment is considered broken. The CI pipeline runs these automatically after infrastructure provisioning.
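The two-step check can be scripted with only the standard library. A minimal sketch: the base URL is supplied by CI, and the JSON field name `text` in the test payload is hypothetical. The `fetch` parameter exists so the logic can be exercised without a live endpoint.

```python
import json
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple

# fetch(method, url, body) -> (status, response_body)
Fetch = Callable[[str, str, Optional[bytes]], Tuple[int, bytes]]


def urllib_fetch(method: str, url: str, body: Optional[bytes]) -> Tuple[int, bytes]:
    req = urllib.request.Request(url, data=body, method=method,
                                 headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return resp.status, resp.read()
    except urllib.error.HTTPError as err:  # non-2xx responses still carry a status
        return err.code, err.read()
    except urllib.error.URLError:          # unreachable host counts as failure
        return 0, b""


def smoke_test(base_url: str, fetch: Fetch = urllib_fetch) -> bool:
    """Step 1: GET /health. Step 2: POST / with a known-safe payload."""
    status, _ = fetch("GET", f"{base_url}/health", None)
    if status != 200:
        return False
    # Hypothetical request schema; the payload is deliberately uncontroversial
    payload = json.dumps({"text": "Water boils at 100 degrees Celsius at sea level."}).encode()
    status, body = fetch("POST", f"{base_url}/", payload)
    return status == 200 and len(body) > 0
```

In CI, exiting non-zero when `smoke_test(url)` returns False (for example `sys.exit(0 if smoke_test(url) else 1)`) marks the deployment as broken. Each run makes one real Bedrock call, which at the per-request cost above is negligible.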

Rollback

Lambda supports instant rollback to any previously published version, provided callers invoke the function through an alias rather than $LATEST. The rollback procedure is documented in the cost incident response runbook and can be executed in under 60 seconds:

Identify the last known-good version from Lambda version history
Update the function alias to point to that version
Run smoke tests to verify rollback succeeded

No code redeploy required. No build pipeline. Just an AWS CLI command.
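The alias flip can also be scripted with boto3, which makes it easy to refuse a version that was never published. A minimal sketch, assuming traffic reaches AletheiaAgent through an alias (the name `live` in the usage below is hypothetical) and that the known-good version number comes from the runbook's first step.

```python
def published_versions(lambda_client, function_name: str) -> list:
    """All published version numbers (excluding $LATEST), following pagination."""
    versions, marker = [], None
    while True:
        kwargs = {"FunctionName": function_name}
        if marker:
            kwargs["Marker"] = marker
        page = lambda_client.list_versions_by_function(**kwargs)
        versions += [v["Version"] for v in page["Versions"] if v["Version"] != "$LATEST"]
        marker = page.get("NextMarker")
        if not marker:
            return versions


def rollback(lambda_client, function_name: str, alias: str, good_version: str) -> str:
    """Point the alias at a known-good published version; return the previous target."""
    if good_version not in published_versions(lambda_client, function_name):
        raise ValueError(f"version {good_version} is not published")
    previous = lambda_client.get_alias(FunctionName=function_name, Name=alias)["FunctionVersion"]
    lambda_client.update_alias(FunctionName=function_name, Name=alias,
                               FunctionVersion=good_version)
    return previous
```

For example, `rollback(boto3.client("lambda"), "AletheiaAgent", "live", "41")`, followed by the smoke tests. The CLI equivalent of the alias flip is `aws lambda update-alias --function-name AletheiaAgent --name live --function-version 41`.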