telemetry: production /api/stats with persistent query_log + Vercel deploy fix #4
Conversation
The Vercel build log showed: `Bundle size (324.02 MB) exceeds limit. Enabling runtime dependency installation.` When that fallback triggers, deps get pip-installed at cold-start time inside the function rather than baked into the bundle. The install overran the cold-start budget on every cold lambda, so every request to a cold instance returned 500. The runtime logs confirm consistent 500s on / and /api/stats since the heavy deps were added.

Fix:
- Move boto3 (declared but never imported anywhere), google-cloud-aiplatform, google-cloud-discoveryengine, and langgraph from default dependencies to optional dependency groups (`agent`, `vertex`, `aws`) in pyproject.toml.
- Mirror the trim in requirements.txt so Vercel's auto-detection installs only the API-skeleton deps.
- Lazy-import langgraph inside `build_graph()` in nexusrag/agent/graph.py so module load no longer pulls langgraph into the boot path (sketched below).

Bundle size drops from 324MB to ~100MB, well under the 250MB cap. Runtime dependency installation is no longer needed; cold start imports cleanly. The /v1/run route still works on deployments where the optional `agent` extra is installed (e.g. Cloud Run or Fly.io for the actual LLM gateway). On Vercel, calling /v1/run will fail with a clean ImportError, which is the honest signal that LLM heavy-lifting is intentionally not on this serverless surface: Vercel hosts the API skeleton, telemetry, docs, and lightweight routes.
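For reference, a minimal sketch of the lazy-import pattern; the real `build_graph()` in nexusrag/agent/graph.py wires actual nodes, and the state schema and error message here are illustrative:

```python
# Sketch only: langgraph is imported inside the function so that merely
# importing this module never pulls langgraph into the serverless boot path.
def build_graph():
    try:
        from langgraph.graph import END, StateGraph
    except ImportError as exc:
        # Slim deploys (e.g. Vercel) without the `agent` extra land here,
        # giving /v1/run a clean, explicit failure instead of a boot crash.
        raise ImportError(
            "langgraph is not installed; install the `agent` extra to use /v1/run"
        ) from exc

    graph = StateGraph(dict)  # real state schema elided in this sketch
    # ... node/edge wiring elided ...
    return graph
```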
The bundle-slim commit moved boto3, google-cloud-aiplatform, and google-cloud-discoveryengine to optional dependency groups so the Vercel deploy fits under the 250MB cap. But two provider modules had top-level imports of the supporting libraries' exception types (`botocore.exceptions.BotoCoreError`/`ClientError`, `google.api_core.exceptions.GoogleAPICallError`/`RetryError`), breaking module load on the slim deploy.

Fix: wrap each import in try/except ImportError and define empty stub classes when the optional package isn't installed. The stubs are unreachable in normal flow because no code path that would raise them runs without the underlying SDK; they exist purely so the `isinstance(exc, ...)` guard expressions stay valid (a sketch follows below).

Sites:
- nexusrag/providers/retrieval/bedrock_kb.py — BotoCoreError, ClientError
- nexusrag/providers/retrieval/vertex_ai.py — GoogleAPICallError, RetryError

After this fix, the FastAPI app imports cleanly with only the slim default dependency set installed (verified locally: 399 routes register, no ModuleNotFoundError).
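A sketch of the guard as applied at the bedrock_kb.py site; the vertex_ai.py site mirrors it for the Google exception types:

```python
try:
    from botocore.exceptions import BotoCoreError, ClientError
except ImportError:
    # Slim deploy without the optional `aws` extra: define inert stand-ins so
    # expressions like `isinstance(exc, (BotoCoreError, ClientError))` still
    # evaluate. Nothing can raise these without the real SDK installed.
    class BotoCoreError(Exception):
        pass

    class ClientError(Exception):
        pass
```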
Implements the Tier-A telemetry contract from https://github.com/IgnazioDS/IgnazioDS/blob/main/TELEMETRY_SCHEMA.md for consumption by the Production Telemetry panel on https://eleventh.dev. The widget polls every 30s.

Schema changes:
- New alembic migration 0031_query_log adds the query_log table with columns (id, query_id, started_at, completed_at, latency_ms, retrieved_chunks, status) and a CHECK constraint on status. A descending index on completed_at supports the windowed aggregations without sequential scans.
- New domain model `QueryLog` (nexusrag/domain/models.py) registers the table on the existing Base.metadata so alembic autogeneration stays consistent with hand-written migrations.

Recording (write path):
- A small `record_query()` helper in nexusrag/persistence/repos/query_log.py performs the insert and computes latency_ms from started_at/completed_at.
- The existing `request_context_middleware` in nexusrag/apps/api/main.py is extended to schedule a fire-and-forget query_log write for every request whose path starts with /v1/run or /run. The DB write opens its own session via SessionLocal so the request path never blocks on telemetry, and any failure is swallowed with a warning log — telemetry must not break the request. (A condensed sketch follows this description.)
- Routes can populate request.state.retrieved_chunks to surface the actual chunk count for the row; otherwise the field is recorded as 0.

Aggregation (read path):
- nexusrag/persistence/repos/query_log.py also exposes `aggregate()`, which runs windowed counts, percentile_cont(0.50/0.95) over the 24h window for p50/p95 latency, AVG over 24h for retrieval size, COUNT(*) on chunks for indexed_chunks, and MAX(completed_at) for last_active_at. Each rollup is a separate cheap query so a transient failure in one doesn't blow up the whole response.

Route:
- nexusrag/apps/api/routes/stats.py registers GET /api/stats and OPTIONS /api/stats (CORS preflight, returns 204). The route is mounted outside the /v1 prefix so the public path stays canonical.
- main.py adds /api/stats to _LEGACY_EXEMPT_PREFIXES so the deprecation Sunset/Link headers don't get attached to a public endpoint.
- All counters are clamped by SAFETY_CAPS in the repo module to prevent runaway exposure (queries_total <= 10M, queries_24h <= 1M, etc).

Privacy:
- query_log stores ONLY id, query_id, timestamps, latency, chunk count, and a constrained status string. No prompt text, model output, tenant ids, or user ids ever land in this table.
- /api/stats returns ONLY counts and percentiles, never row-level data.

Failure mode:
- /api/stats never returns HTTP 5xx. On aggregator failure the route returns HTTP 200 with status="degraded", zeroed metrics, and an envelope that stays contract-compliant. Internal error messages are logged but never appear in the response body (a public endpoint must not leak detail).

Tests (nexusrag/tests/unit/test_stats_route.py):
- happy path: response shape matches the Tier-A contract, metrics surfaced
- degraded path: aggregator raises -> HTTP 200 with status=degraded, zero metrics, no internal error string in the body
- header coverage: Cache-Control, CORS-* headers present
- OPTIONS preflight: 204 with CORS headers
- field-type parametrization for the response envelope
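Returning to the write path above, a condensed sketch of the fire-and-forget recording, assuming the `SessionLocal` factory and `record_query()` helper named in the description (exact signatures are illustrative):

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

async def _write_query_log(**row):
    """Persist one query_log row on a session of its own."""
    try:
        with SessionLocal() as session:  # never borrows the request's session
            record_query(session, **row)
            session.commit()
    except Exception:
        # Telemetry must not break the request: swallow and warn.
        logger.warning("query_log write failed", exc_info=True)

# Inside request_context_middleware, after the response is built:
#     if request.url.path.startswith(("/v1/run", "/run")):
#         asyncio.get_running_loop().create_task(_write_query_log(...))
```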
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f7d00fa3ef
```toml
agent = [
    "langgraph>=0.0.59",
```
Keep langgraph in base dependencies

This commit makes langgraph optional, but /v1/run still depends on it unconditionally via `build_graph()` at runtime. Our default runtime install path (`pip install .` in the Dockerfile) does not install extras, so deployments built from base dependencies will return stream errors for every run request once `from langgraph.graph import END, StateGraph` executes. Keep langgraph in dependencies, or make runtime installs explicitly include the `agent` extra.
```python
response = await call_next(request)
completed_at_dt = datetime.now(timezone.utc)
latency_ms = (time.monotonic() - start) * 1000.0
```
Measure query completion after SSE stream ends

The query log timestamps are captured immediately after `call_next`, but /run returns a `StreamingResponse` whose generation continues after this point, so `completed_at` and `latency_ms` are recorded before the actual query finishes. This makes the published p50/p95 latency metrics systematically too low for real streamed runs and breaks the intended end-to-end timing semantics.
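One way to get end-of-stream timing, sketched under the assumption that the middleware sees a Starlette streaming response with an async `body_iterator`; `record_completion` is a hypothetical callback, not existing code:

```python
from datetime import datetime, timezone

async def _timed_body(body_iterator, on_complete):
    """Yield the original SSE chunks, stamping completion when exhausted."""
    try:
        async for chunk in body_iterator:
            yield chunk
    finally:
        # Fires after the last event is sent (or the client disconnects),
        # so completed_at/latency_ms cover the whole streamed run.
        on_complete(datetime.now(timezone.utc))

# In the middleware, instead of stamping right after call_next():
#     response.body_iterator = _timed_body(response.body_iterator, record_completion)
```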
```python
started_at=started_at_dt,
completed_at=completed_at_dt,
retrieved_chunks=retrieved_chunks,
status="ok" if response.status_code < 400 else "error",
```
Derive query status from run outcome, not HTTP code

For /run, many failures are encoded as SSE error events while the HTTP status remains 200, so this status mapping marks failed runs as ok. That skews `last_active_at` (which filters on `status='ok'`) and any future success/error reporting from query_log. The status should be set from the stream's logical outcome, not only `response.status_code`.
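A possible shape for this, where the /run handler flags the logical outcome on request state; `run_failed` is a hypothetical field, not existing code:

```python
# In the /run event generator, when a failure is emitted as an SSE event:
#     request.state.run_failed = True
#     yield error_event  # HTTP status stays 200

# In the middleware, prefer the logical outcome over the HTTP code:
run_failed = getattr(request.state, "run_failed", False)
status = "error" if run_failed or response.status_code >= 400 else "ok"
```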
```python
chunks_attr = getattr(request.state, "retrieved_chunks", 0)
try:
    retrieved_chunks = int(chunks_attr)
except (TypeError, ValueError):
    retrieved_chunks = 0
```
Persist actual retrieved chunk counts

The middleware falls back to 0 unless `request.state.retrieved_chunks` is set, but there is no assignment to that request-state field in the request handlers, so every inserted row records `retrieved_chunks=0`. As a result, /api/stats will report `avg_retrieval_size` as zero regardless of real retrieval behavior.
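The missing assignment would be a one-liner in the retrieval step of the run handler; `retriever` and `chunks` here are illustrative names, not existing code:

```python
# After retrieval, surface the real count for the telemetry middleware:
chunks = await retriever.retrieve(query)
request.state.retrieved_chunks = len(chunks)
```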
Summary
Three commits on this branch:
- `b4c5651` fix: shrink Vercel bundle below 250MB cap to fix production 500. Heavy deps (`boto3`, `google-cloud-aiplatform`, `google-cloud-discoveryengine`, `langgraph`) moved to optional dependency groups; `langgraph` lazy-imported inside `build_graph()`. Bundle drops 324MB → ~100MB.
- `2eca850` fix: stub `botocore.exceptions` and `google.api_core.exceptions` exception types when those optional packages aren't installed, so module load on the slim deploy doesn't crash on `ImportError`.
- `f7d00fa` feat: Tier-A telemetry — alembic migration `0031_query_log` for the `query_log` table, FastAPI middleware that records `(query_id, started_at, completed_at, latency_ms, retrieved_chunks, status)` per `/v1/run` request, an aggregator that computes `queries_total`, `queries_24h`, `queries_7d`, `p50/p95_latency_ms`, `avg_retrieval_size`, `indexed_chunks`, `last_active_at`, and the public `/api/stats` route per the TELEMETRY_SCHEMA Tier-A contract.

Why two fix commits before the feat
The original prod 500 had two layers:
1. Bundle over the cap (`Bundle size (324.02 MB) exceeds limit. Enabling runtime dependency installation.`), with runtime installs timing out cold starts. Commit 1 fixes this.
2. Top-level `from botocore.exceptions import ...` / `from google.api_core.exceptions import ...` imports, breaking import on the slim runtime. Commit 2 stubs the exception types behind `try/except ImportError`.

Separately, a `model_validator` at `nexusrag/core/config.py:392` refuses to start the app with the default `UI_CURSOR_SECRET`. Resolved by setting the env var on the Vercel project (this is environment work, not code).

Database migration
Adds the `query_log` table with `(id, query_id, started_at, completed_at, latency_ms, retrieved_chunks, status)` and a descending index on `completed_at` for cheap windowed aggregations.
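The rough shape of the migration, assuming standard alembic ops and an `('ok', 'error')` status domain (the committed 0031_query_log may differ in types and naming):

```python
import sqlalchemy as sa
from alembic import op

def upgrade() -> None:
    op.create_table(
        "query_log",
        sa.Column("id", sa.Integer, primary_key=True),
        sa.Column("query_id", sa.String, nullable=False),
        sa.Column("started_at", sa.DateTime(timezone=True), nullable=False),
        sa.Column("completed_at", sa.DateTime(timezone=True), nullable=False),
        sa.Column("latency_ms", sa.Float, nullable=False),
        sa.Column("retrieved_chunks", sa.Integer, nullable=False),
        sa.Column("status", sa.String, nullable=False),
        sa.CheckConstraint("status IN ('ok', 'error')", name="ck_query_log_status"),
    )
    # Descending index so the windowed aggregations avoid sequential scans.
    op.create_index(
        "ix_query_log_completed_at_desc",
        "query_log",
        [sa.text("completed_at DESC")],
    )

def downgrade() -> None:
    op.drop_index("ix_query_log_completed_at_desc", table_name="query_log")
    op.drop_table("query_log")
```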
Privacy

`query_log` stores only id/timestamps/latency/chunk-count/status. No prompt text, model output, tenant id, user id, or IP ever lands there. The `/api/stats` endpoint never returns row-level data.

Failure mode
`/api/stats` never returns HTTP 5xx. On aggregator failure (DB down, schema drift, etc.) the route returns HTTP 200 with `status: "degraded"`, zeroed metrics, and the envelope stays contract-compliant. Internal error messages are logged but never appear in the response body.
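In code, the never-5xx envelope looks roughly like this; the field names follow the Tier-A metrics listed in the summary, `aggregate` stands in for the repo helper, and the exact envelope shape is an assumption:

```python
import logging
from fastapi import APIRouter
from fastapi.responses import JSONResponse

logger = logging.getLogger(__name__)
router = APIRouter()

_ZEROED = {
    "queries_total": 0, "queries_24h": 0, "queries_7d": 0,
    "p50_latency_ms": 0, "p95_latency_ms": 0,
    "avg_retrieval_size": 0, "indexed_chunks": 0, "last_active_at": None,
}

@router.get("/api/stats")
def get_stats() -> JSONResponse:
    try:
        body = {"schema_version": 1, "status": "ok", **aggregate()}
    except Exception:
        # Log the internals, leak nothing: public endpoint, always HTTP 200.
        logger.exception("stats aggregation failed")
        body = {"schema_version": 1, "status": "degraded", **_ZEROED}
    return JSONResponse(body)
```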
Tests

`nexusrag/tests/unit/test_stats_route.py` — happy path, degraded path (assertion: no internal error string in body), header coverage, OPTIONS preflight, field-type parametrization.
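The degraded-path test reduces to something like this, assuming FastAPI's TestClient; the `app` import and monkeypatch target path are illustrative:

```python
from fastapi.testclient import TestClient

def test_stats_degraded_never_5xx(monkeypatch):
    def boom():
        raise RuntimeError("internal detail that must not leak")

    # Force the aggregator to fail inside the route module.
    monkeypatch.setattr("nexusrag.apps.api.routes.stats.aggregate", boom)
    resp = TestClient(app).get("/api/stats")

    assert resp.status_code == 200
    assert resp.json()["status"] == "degraded"
    assert "internal detail" not in resp.text  # no leakage in the body
```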
Test plan

- Local: the app imports cleanly on the slim dependency set (no `ModuleNotFoundError`)
- Vercel build: bundle under the cap (no `Bundle size exceeds limit` warning)
- Set the `UI_CURSOR_SECRET` env var on the production project
- `curl https://nexusrag-lyart.vercel.app/api/stats` returns HTTP 200 with `schema_version: 1` and the Tier-A metric shape
- Verify the panel in the https://eleventh.dev console
- After `/v1/run` traffic: `queries_total`, `queries_24h` start incrementing