Observability


The API server uses Uber Zap (go.uber.org/zap) for structured JSON logging. Verified at cmd/server/main.go:27 (logger, _ := zap.NewProduction()).

14.8.1 Log levels

| Level | When to use |
|-------|-------------|
| debug | Trace-level: request bodies, full SQL. Dev only. |
| info  | Default: request completed, service ready, on-chain tx receipt. |
| warn  | Recoverable issues: chain client unavailable, OAuth provider degraded. |
| error | Bug or environment failure: handler panic, DB pool exhausted. |
| fatal | Boot-time only: config load failed, DB ping failed. Process exits. |

Set with LOG_LEVEL=debug|info|warn|error.
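
A minimal sketch of how the LOG_LEVEL value might be validated at boot, before it is handed to zap's level configuration. The function name parseLogLevel, the case-insensitive matching, and the fallback to info for an empty value are assumptions for illustration, not confirmed server behaviour.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// allowedLevels lists the values this section documents for LOG_LEVEL.
var allowedLevels = map[string]bool{"debug": true, "info": true, "warn": true, "error": true}

// parseLogLevel (hypothetical helper) normalises raw and falls back to
// "info", the documented default, when raw is empty. Any other value
// outside the documented set is rejected.
func parseLogLevel(raw string) (string, error) {
	level := strings.ToLower(strings.TrimSpace(raw))
	if level == "" {
		return "info", nil
	}
	if !allowedLevels[level] {
		return "", fmt.Errorf("invalid LOG_LEVEL %q: want debug, info, warn or error", raw)
	}
	return level, nil
}

func main() {
	level, err := parseLogLevel(os.Getenv("LOG_LEVEL"))
	if err != nil {
		// A bad level is a configuration error, so fail at boot.
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	fmt.Println("log level:", level)
}
```

Rejecting unknown values at startup keeps a typo such as LOG_LEVEL=verbose from silently running at the default level.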

14.8.2 Health probes

Two HTTP endpoints (handlers in internal/handler/health.go):

| Endpoint    | Used by   | Behaviour |
|-------------|-----------|-----------|
| GET /health | Liveness  | Always returns 200 { "status": "ok" } if the process is up. |
| GET /ready  | Readiness | Pings Postgres + Redis; returns 200 only when both are up. Returns 503 otherwise. |
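
The two behaviours can be sketched with the standard library alone. This is not the code in internal/handler/health.go; the Pinger interface, staticPinger, newHealthMux, probe, and the 2-second timeout are all hypothetical names and values chosen for illustration, and method matching is left to the router.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// Pinger is a hypothetical abstraction over the Postgres and Redis clients.
type Pinger interface {
	Ping(ctx context.Context) error
}

// staticPinger is a stand-in dependency that always returns err.
type staticPinger struct{ err error }

func (p staticPinger) Ping(context.Context) error { return p.err }

var errDown = errors.New("dependency down")

// newHealthMux registers the liveness and readiness endpoints.
func newHealthMux(postgres, redis Pinger) *http.ServeMux {
	mux := http.NewServeMux()

	// Liveness: no dependency checks, so it answers whenever the process is up.
	mux.HandleFunc("/health", func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type", "application/json")
		fmt.Fprint(w, `{"status":"ok"}`)
	})

	// Readiness: every dependency must answer before the timeout (assumed 2s).
	mux.HandleFunc("/ready", func(w http.ResponseWriter, r *http.Request) {
		ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
		defer cancel()
		for _, dep := range []Pinger{postgres, redis} {
			if err := dep.Ping(ctx); err != nil {
				http.Error(w, "not ready", http.StatusServiceUnavailable)
				return
			}
		}
		w.WriteHeader(http.StatusOK)
	})
	return mux
}

// probe issues an in-memory GET and returns the response status code.
func probe(h http.Handler, path string) int {
	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, path, nil))
	return rec.Code
}

func main() {
	// Postgres up, Redis down: liveness passes, readiness fails.
	mux := newHealthMux(staticPinger{}, staticPinger{err: errDown})
	fmt.Println(probe(mux, "/health"), probe(mux, "/ready")) // prints: 200 503
}
```

Keeping dependency checks out of the liveness path means a database outage takes the instance out of rotation without restarting it.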

Map these to Kubernetes:

    livenessProbe:
      httpGet: { path: /health, port: 8080 }
    readinessProbe:
      httpGet: { path: /ready, port: 8080 }

14.8.3 Metrics

VERIFY: confirm whether the API exposes a Prometheus /metrics endpoint. If present, recommended scrape:

    - job_name: ida-api
      metrics_path: /metrics
      static_configs:
        - targets: ["ida-api.ida.svc.cluster.local:8080"]

Recommended dashboards:

  • Request rate / status / latency (RED method) — http_requests_total, http_request_duration_seconds, by path, method, status.
  • Database pool — open / idle / in-use connections.
  • Redis — command rate, keyspace size, evictions.
  • Chain client — RPC call rate, RPC error rate, mean tx confirmation time.
  • Auth funnel — OTP issued / verified / failed; refresh token rotated / replayed.
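
As one illustration of where the database pool numbers could come from: if the server uses database/sql (an assumption; a different Postgres driver would expose its own statistics), DB.Stats() already reports open, idle, and in-use connections. The sketch below renders them as gauges in the Prometheus text exposition format. The metric names, PoolStats, fromDBStats, and renderPoolMetrics are hypothetical, and a real integration would more likely register gauges with a Prometheus client library.

```go
package main

import (
	"database/sql"
	"fmt"
	"strings"
)

// PoolStats is a hypothetical, driver-neutral view of the pool.
type PoolStats struct {
	Open, Idle, InUse int
}

// fromDBStats copies the relevant fields from database/sql's pool statistics.
func fromDBStats(s sql.DBStats) PoolStats {
	return PoolStats{Open: s.OpenConnections, Idle: s.Idle, InUse: s.InUse}
}

// renderPoolMetrics writes one gauge per value in the Prometheus text format.
// The metric names are illustrative, not names the API is known to export.
func renderPoolMetrics(p PoolStats) string {
	gauges := []struct {
		name  string
		value int
	}{
		{"db_pool_open_connections", p.Open},
		{"db_pool_idle_connections", p.Idle},
		{"db_pool_in_use_connections", p.InUse},
	}
	var b strings.Builder
	for _, g := range gauges {
		fmt.Fprintf(&b, "# TYPE %s gauge\n%s %d\n", g.name, g.name, g.value)
	}
	return b.String()
}

func main() {
	// In a server these values would come from db.Stats() on a live *sql.DB.
	stats := sql.DBStats{OpenConnections: 5, Idle: 3, InUse: 2}
	fmt.Print(renderPoolMetrics(fromDBStats(stats)))
}
```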

14.8.4 Log shipping

The server writes JSON logs to stdout. Ship them with:

  • Cluster: Fluent Bit / Vector → Loki / Elasticsearch / OpenSearch.
  • Single-host: docker compose logs, journald, or files mounted at /var/log/ida/.

Required fields to keep searchable: request_id, auth_subject, auth_role, path, status, latency_ms.
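
A small check like the following could guard against a pipeline or code change dropping one of these fields, for example in a test that inspects sample log lines. It is a sketch under stated assumptions: missingFields is a hypothetical helper, and it only tests that each key is present, not that its value is meaningful.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// requiredFields mirrors the searchable fields listed in this section.
var requiredFields = []string{
	"request_id", "auth_subject", "auth_role", "path", "status", "latency_ms",
}

// missingFields (hypothetical helper) parses one JSON log line and returns
// the required keys that are absent, in the order they are listed above.
func missingFields(line []byte) ([]string, error) {
	var entry map[string]json.RawMessage
	if err := json.Unmarshal(line, &entry); err != nil {
		return nil, fmt.Errorf("invalid JSON log line: %w", err)
	}
	var missing []string
	for _, field := range requiredFields {
		if _, ok := entry[field]; !ok {
			missing = append(missing, field)
		}
	}
	return missing, nil
}

func main() {
	// A sample line that lacks the auth and latency fields.
	line := []byte(`{"level":"info","msg":"request completed","request_id":"r-1","path":"/ready","status":200}`)
	missing, err := missingFields(line)
	fmt.Println(missing, err) // prints: [auth_subject auth_role latency_ms] <nil>
}
```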

14.8.5 Tracing

Recommended (not yet wired): OpenTelemetry — wrap Chi router with otelhttp.NewMiddleware. Propagate traceparent from clients (Portal already sets X-Request-Id).
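
For reference, the traceparent header follows the W3C Trace Context layout version-traceid-parentid-flags. An OpenTelemetry propagator would normally do this parsing once tracing is wired in; the standalone sketch below only illustrates what the header carries. It handles version 00 only, and TraceParent and parseTraceParent are hypothetical names.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// TraceParent holds the fields of a version-00 traceparent header.
type TraceParent struct {
	TraceID  string
	ParentID string
	Sampled  bool
}

// isLowerHex reports whether s is exactly n lowercase hex characters.
func isLowerHex(s string, n int) bool {
	if len(s) != n {
		return false
	}
	for _, c := range s {
		if !((c >= '0' && c <= '9') || (c >= 'a' && c <= 'f')) {
			return false
		}
	}
	return true
}

// allZero reports whether s consists only of '0' characters.
func allZero(s string) bool { return strings.Trim(s, "0") == "" }

// parseTraceParent validates a header of the form
// 00-<32 hex trace id>-<16 hex parent id>-<2 hex flags>.
func parseTraceParent(header string) (TraceParent, error) {
	parts := strings.Split(strings.TrimSpace(header), "-")
	if len(parts) != 4 || parts[0] != "00" {
		return TraceParent{}, fmt.Errorf("unsupported traceparent %q", header)
	}
	traceID, parentID, flags := parts[1], parts[2], parts[3]
	if !isLowerHex(traceID, 32) || allZero(traceID) {
		return TraceParent{}, fmt.Errorf("invalid trace id %q", traceID)
	}
	if !isLowerHex(parentID, 16) || allZero(parentID) {
		return TraceParent{}, fmt.Errorf("invalid parent id %q", parentID)
	}
	if !isLowerHex(flags, 2) {
		return TraceParent{}, fmt.Errorf("invalid flags %q", flags)
	}
	// Cannot fail: flags was validated as two hex characters above.
	bits, _ := strconv.ParseUint(flags, 16, 8)
	// The lowest flag bit marks the trace as sampled.
	return TraceParent{TraceID: traceID, ParentID: parentID, Sampled: bits&1 == 1}, nil
}

func main() {
	tp, err := parseTraceParent("00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01")
	fmt.Printf("%+v %v\n", tp, err)
}
```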

VERIFY: OTel integration status in pkg/telemetry/.