Scaling Notes

14.10.1 API tier (stateless)

  • Run 3+ replicas behind the LB.
  • HPA target: CPU 60%, memory 75%.
  • Steady-state CPU per pod for ~100 RPS: ~250–500 m.
  • Memory ~256–512 Mi per pod under steady load.
  • Connection-pool size: min(50, n_postgres_connections / n_pods) — see pgxpool.Config.MaxConns.
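The pool-size rule above can be sketched as a small helper. The function name and parameters are illustrative, not part of the codebase; the computed value is what you would assign to pgxpool.Config.MaxConns at startup.

```go
package main

import "fmt"

// poolSize applies the rule min(50, n_postgres_connections / n_pods):
// divide the Postgres max_connections budget evenly across API pods,
// but never let a single pod hold more than 50 connections.
func poolSize(maxPostgresConns, nPods int) int {
	perPod := maxPostgresConns / nPods
	if perPod > 50 {
		return 50
	}
	return perPod
}

func main() {
	// e.g. Postgres max_connections=200 shared by 3 API pods:
	// 200/3 = 66, capped at 50.
	fmt.Println(poolSize(200, 3)) // prints 50
}
```

Capping per-pod connections keeps an HPA scale-out from exhausting max_connections before a pooler is in place.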

14.10.2 Postgres

  • Vertical first: start with 4 vCPU / 16 GB RAM, increase before sharding.
  • Read replicas: for issuer-analytics + dashboard reads, deploy 1–2 read replicas; route via DSN logic in pkg/repository.
  • Connection pooler: PgBouncer in transaction-pooling mode if total clients × MaxConns > Postgres max_connections.
  • Indexes: the 002_* migrations cover the hot paths; profile new queries with EXPLAIN ANALYZE before adding.
  • Vacuum: autovacuum on; tune autovacuum_vacuum_scale_factor=0.05 for the large audit_events and credentials tables.
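The read-replica routing above can be sketched as a round-robin DSN picker. This is a sketch of the idea only; the type and method names are hypothetical and do not reflect the actual pkg/repository API.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// dsnRouter sends writes to the primary and round-robins reads across
// replicas, falling back to the primary when none are deployed.
type dsnRouter struct {
	primary  string
	replicas []string
	next     atomic.Uint64
}

func (r *dsnRouter) writeDSN() string { return r.primary }

func (r *dsnRouter) readDSN() string {
	if len(r.replicas) == 0 {
		return r.primary // no replicas deployed: fall back to primary
	}
	i := r.next.Add(1)
	return r.replicas[int(i)%len(r.replicas)]
}

func main() {
	r := &dsnRouter{
		primary: "postgres://primary:5432/ida",
		replicas: []string{
			"postgres://replica-1:5432/ida",
			"postgres://replica-2:5432/ida",
		},
	}
	fmt.Println(r.writeDSN())
	fmt.Println(r.readDSN(), r.readDSN()) // alternates between the two replicas
}
```

Only analytics and dashboard queries should take the read path; anything that reads its own recent writes must stay on the primary to avoid replication-lag anomalies.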

14.10.3 Redis

  • A single instance is fine for OTP, rate-limit, and session workloads up to ~5k RPS.
  • Replicated (1 primary + 2 replicas) for HA — failover via Sentinel or managed (ElastiCache).
  • Cluster mode only if memory > 100 GB or > 50k RPS — IDA’s working set is small (TTL-bounded keys).
  • Eviction: allkeys-lru (the compose default); safe because all keys have natural TTLs, except rate-limit buckets, which clean themselves up.

14.10.4 Blockchain RPC

  • Self-hosted node preferred for hot workloads — avoids per-call quota.
  • RPC pool: maintain 2–3 RPC URLs and round-robin in the chain client; failover on 5xx or RPC error.
  • WebSocket subscription for events (DIDRegistry, RevocationRegistry) to keep off-chain cache fresh.
  • Gas budgeting: monitor eth_gasPrice; alert if it exceeds 2× the baseline. Bulk operations should retry with an EIP-1559 fee bump.

14.10.5 Capacity planning checklist

| Question | Where to find the answer |
| --- | --- |
| What is the peak request rate? | Prometheus http_requests_total, 95th percentile / 1 min |
| What is the worst-case latency? | http_request_duration_seconds p99 |
| Is the DB the bottleneck? | pg_stat_statements, slow-query log |
| Is the chain the bottleneck? | RPC error rate + tx confirmation time dashboard |
| Can the API scale linearly? | HPA replica count vs. RPS; if RPS does not scale linearly, find the shared lock |