
telemetry: production /api/stats with persistent query_log + Vercel deploy fix #4

Merged
IgnazioDS merged 3 commits into main from telemetry-public-stats on Apr 27, 2026

Conversation

@IgnazioDS (Owner)

Summary

Three commits on this branch:

  1. b4c5651 fix: shrink Vercel bundle below 250MB cap to fix production 500. Heavy deps (boto3, google-cloud-aiplatform, google-cloud-discoveryengine, langgraph) moved to optional dependency groups; langgraph lazy-imported inside build_graph(). Bundle drops 324MB → ~100MB.
  2. 2eca850 fix: stub botocore.exceptions and google.api_core.exceptions exception types when those optional packages aren't installed, so module load on the slim deploy doesn't crash on ImportError.
  3. f7d00fa feat: Tier-A telemetry — alembic migration 0031_query_log for the query_log table, FastAPI middleware that records (query_id, started_at, completed_at, latency_ms, retrieved_chunks, status) per /v1/run request, aggregator that computes queries_total, queries_24h, queries_7d, p50/p95_latency_ms, avg_retrieval_size, indexed_chunks, last_active_at, and the public /api/stats route per the TELEMETRY_SCHEMA Tier-A contract.

Why two fix commits before the feat

The original prod 500 had two layers:

  • Layer 1 (bundle size): Vercel was hitting "Bundle size (324.02 MB) exceeds limit. Enabling runtime dependency installation." and timing out cold starts. Commit 1 fixes this.
  • Layer 1.5 (top-level exception imports): While testing locally with the slimmed deps, two provider modules were doing from botocore.exceptions import ... / from google.api_core.exceptions import ... at module top-level, breaking import on the slim runtime. Commit 2 stubs the exception types behind try/except ImportError.
  • Layer 2 (production secrets): A deliberate model_validator at nexusrag/core/config.py:392 refuses to start the app with the default UI_CURSOR_SECRET. Resolved by setting the env var on the Vercel project (this is environment work, not code).

Database migration

alembic upgrade head

Adds query_log table with (id, query_id, started_at, completed_at, latency_ms, retrieved_chunks, status) and a descending index on completed_at for cheap windowed aggregations.

Privacy

query_log stores only id/timestamps/latency/chunk-count/status. No prompt text, model output, tenant id, user id, or IP ever lands there. The /api/stats endpoint never returns row-level data.

Failure mode

/api/stats never returns HTTP 5xx. On aggregator failure (DB down, schema drift, etc.) the route returns HTTP 200 with status: "degraded", zeroed metrics, and the envelope stays contract-compliant. Internal error messages are logged but never appear in the response body.

Tests

nexusrag/tests/unit/test_stats_route.py — happy path, degraded path (assertion: no internal error string in body), header coverage, OPTIONS preflight, field-type parametrization.

Test plan

  • Local: FastAPI app imports cleanly with slim deps (399 routes register, no ModuleNotFoundError)
  • Vercel preview: build succeeds (no "Bundle size exceeds limit" warning)
  • Vercel preview: cold start succeeds (gated on UI_CURSOR_SECRET env var on the production project)
  • curl https://nexusrag-lyart.vercel.app/api/stats returns HTTP 200 with schema_version: 1 and the Tier-A metric shape
  • CORS preflight succeeds from the https://eleventh.dev console
  • After first /v1/run traffic: queries_total, queries_24h start incrementing

The Vercel build log showed:
  Bundle size (324.02 MB) exceeds limit. Enabling runtime dependency installation.

When that fallback triggers, deps get pip-installed at cold-start time
inside the function rather than baked into the bundle. The install
overran the cold-start budget on every cold lambda, so every request
to a cold instance returned 500. The runtime logs confirm consistent
500s on / and /api/stats since the heavy deps were added.

Fix:
- Move boto3 (declared but never imported anywhere), google-cloud-aiplatform,
  google-cloud-discoveryengine, and langgraph from default dependencies to
  optional dependency groups (`agent`, `vertex`, `aws`) in pyproject.toml.
- Mirror the trim in requirements.txt so Vercel's auto-detection
  installs only the API-skeleton deps.
- Lazy-import langgraph inside build_graph() in nexusrag/agent/graph.py
  so module load no longer pulls langgraph into the boot path.

Bundle size drops from 324MB to ~100MB, well under the 250MB cap.
Runtime dependency installation is no longer needed; cold start
imports cleanly.
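The lazy-import move can be sketched as follows. This is a hypothetical illustration of the pattern described for build_graph() in nexusrag/agent/graph.py, not the repo's actual function body:

```python
def build_graph():
    """Build the agent graph, importing langgraph only when called.

    Keeping the import inside the function keeps langgraph out of the
    module-load path, so the slim deploy imports the app cleanly even
    when the optional `agent` extra is not installed.
    """
    try:
        from langgraph.graph import END, StateGraph  # heavy optional dep
    except ImportError as exc:
        # Fail loudly, but only when /v1/run is actually invoked.
        raise ImportError(
            "langgraph is required for /v1/run; install the 'agent' extra"
        ) from exc
    # The real code assembles the StateGraph nodes/edges here (elided).
    return StateGraph, END
```

On a slim deploy the ImportError surfaces only on the first /v1/run call, which matches the "clean ImportError" behavior described below.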

The /v1/run route still works on deployments where the optional `agent`
extra is installed (e.g. Cloud Run, Fly.io for the actual LLM gateway).
On Vercel, calling /v1/run will fail with a clean ImportError, which
is the honest signal that LLM heavy-lifting is intentionally not on
this serverless surface — Vercel hosts the API skeleton, telemetry,
docs, and lightweight routes.
The bundle-slim commit moved boto3, google-cloud-aiplatform, and
google-cloud-discoveryengine to optional dependency groups so the
Vercel deploy fits under the 250MB cap. But two provider modules had
top-level imports of the supporting libraries' exception types
(`botocore.exceptions.BotoCoreError`/`ClientError`,
`google.api_core.exceptions.GoogleAPICallError`/`RetryError`),
breaking module load on the slim deploy.

Wrap each in try/except ImportError and define empty stub classes
when the optional package isn't installed. The stubs are unreachable
in normal flow because no code path that would raise them runs without
the underlying SDK; they exist purely so the `isinstance(exc, ...)`
guard expressions stay valid syntax.

Sites:
- nexusrag/providers/retrieval/bedrock_kb.py — BotoCoreError, ClientError
- nexusrag/providers/retrieval/vertex_ai.py — GoogleAPICallError, RetryError
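The stub pattern at one of those sites looks roughly like this (the exception names are the real botocore ones; the surrounding module code is a minimal sketch, not the repo's file):

```python
try:
    from botocore.exceptions import BotoCoreError, ClientError
except ImportError:
    # botocore isn't installed on the slim deploy. Define inert
    # stand-ins so `except (BotoCoreError, ClientError):` clauses and
    # isinstance() guards stay valid; no code path ever raises them
    # without the real SDK present.
    class BotoCoreError(Exception):
        pass

    class ClientError(Exception):
        pass
```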

After this fix, the FastAPI app imports cleanly with only the slim
default dependency set installed (verified locally: 399 routes register,
no ModuleNotFoundError).
Implements the Tier-A telemetry contract from
https://github.com/IgnazioDS/IgnazioDS/blob/main/TELEMETRY_SCHEMA.md
for consumption by the Production Telemetry panel on
https://eleventh.dev. The widget polls every 30s.

Schema changes:
- New alembic migration 0031_query_log adds the query_log table with
  columns (id, query_id, started_at, completed_at, latency_ms,
  retrieved_chunks, status) and a CHECK constraint on status.
  Descending index on completed_at supports the windowed aggregations
  without sequential scans.
- New domain model `QueryLog` (nexusrag/domain/models.py) registers the
  table on the existing Base.metadata so alembic autogeneration stays
  consistent with hand-written migrations.
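The table the migration creates is roughly equivalent to this DDL, sketched here with stdlib sqlite3 for illustration (the actual migration is Alembic against the project's SQLAlchemy metadata, and on Postgres the descending index is a true DESC btree):

```python
import sqlite3

# Roughly the shape migration 0031_query_log creates; types are
# sketched for SQLite rather than the project's real dialect.
DDL = """
CREATE TABLE query_log (
    id               INTEGER PRIMARY KEY,
    query_id         TEXT NOT NULL,
    started_at       TIMESTAMP NOT NULL,
    completed_at     TIMESTAMP,
    latency_ms       REAL,
    retrieved_chunks INTEGER NOT NULL DEFAULT 0,
    status           TEXT NOT NULL CHECK (status IN ('ok', 'error'))
);
CREATE INDEX ix_query_log_completed_at_desc
    ON query_log (completed_at DESC);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
```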

Recording (write path):
- A small `record_query()` helper in
  nexusrag/persistence/repos/query_log.py performs the insert and
  computes latency_ms from started_at/completed_at.
- The existing `request_context_middleware` in
  nexusrag/apps/api/main.py is extended to schedule a fire-and-forget
  query_log write for every request whose path starts with /v1/run or
  /run. The DB write opens its own session via SessionLocal so the
  request path never blocks on telemetry, and any failure is swallowed
  with a warning log — telemetry must not break the request.
- Routes can populate request.state.retrieved_chunks to surface the
  actual chunk count for the row; otherwise the field is recorded as 0.
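The fire-and-forget scheduling can be sketched with plain asyncio; the DB insert itself (record_query via SessionLocal in the PR text) is stubbed out here, and the function names are illustrative:

```python
import asyncio
import logging

logger = logging.getLogger("telemetry")

async def _write_query_log(row: dict) -> None:
    """Insert one query_log row; failures never propagate."""
    try:
        # The real helper opens its own session via SessionLocal and
        # calls record_query(); stubbed here as a no-op await.
        await asyncio.sleep(0)
    except Exception:
        # Telemetry must not break the request: warn and move on.
        logger.warning("query_log write failed for %s", row.get("query_id"))

def schedule_query_log(row: dict) -> None:
    """Fire-and-forget: schedule the insert and return immediately."""
    asyncio.create_task(_write_query_log(row))
```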

Aggregation (read path):
- nexusrag/persistence/repos/query_log.py also exposes `aggregate()`
  which runs windowed counts, percentile_cont(0.50/0.95) over the 24h
  window for p50/p95 latency, AVG over 24h for retrieval size, COUNT(*)
  on chunks for indexed_chunks, and MAX(completed_at) for last_active_at.
  Each rollup is a separate cheap query so a transient failure in one
  doesn't blow up the whole response.
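percentile_cont does linear interpolation between the two closest ranks; a pure-Python equivalent of that semantics (a sketch of what the SQL computes, not the repo's query code):

```python
def percentile_cont(values, fraction):
    """Linear-interpolation percentile matching SQL percentile_cont."""
    xs = sorted(values)
    if not xs:
        return None  # empty window, e.g. no queries in the last 24h
    k = (len(xs) - 1) * fraction
    lo = int(k)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (k - lo)
```

Calling it with fraction 0.50 and 0.95 over the 24h latency window yields the p50/p95 values the aggregator publishes.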

Route:
- nexusrag/apps/api/routes/stats.py registers GET /api/stats and
  OPTIONS /api/stats (CORS preflight, returns 204). The route is mounted
  outside the /v1 prefix so the public path stays canonical.
- main.py adds /api/stats to _LEGACY_EXEMPT_PREFIXES so the deprecation
  Sunset/Link headers don't get attached to a public endpoint.
- All counters are clamped by SAFETY_CAPS in the repo module to prevent
  runaway exposure (queries_total <= 10M, queries_24h <= 1M, etc).
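The SAFETY_CAPS clamping amounts to a per-metric min(); the cap values below echo the ones quoted above, while the dict shape and function name are illustrative:

```python
# Cap values echo the PR text; uncapped metrics pass through unchanged.
SAFETY_CAPS = {
    "queries_total": 10_000_000,
    "queries_24h": 1_000_000,
}

def clamp_metrics(metrics: dict) -> dict:
    """Clamp each capped counter to prevent runaway exposure."""
    return {
        key: min(value, SAFETY_CAPS[key]) if key in SAFETY_CAPS else value
        for key, value in metrics.items()
    }
```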

Privacy:
- query_log stores ONLY id, query_id, timestamps, latency, chunk count,
  and a constrained status string. No prompt text, model output, tenant
  ids, or user ids ever land in this table.
- /api/stats returns ONLY counts and percentiles, never row-level data.

Failure mode:
- /api/stats never returns HTTP 5xx. On aggregator failure the route
  returns HTTP 200 with status="degraded", zeroed metrics, and the
  envelope stays contract-compliant. Internal error messages are
  logged but never appear in the response body (a public endpoint
  must not leak detail).

Tests (nexusrag/tests/unit/test_stats_route.py):
- happy path: response shape matches Tier-A contract, metrics surfaced
- degraded path: aggregator raises -> HTTP 200 with status=degraded,
  zero metrics, no internal error string in the body
- header coverage: Cache-Control, CORS-* headers present
- OPTIONS preflight: 204 with CORS headers
- field-type parametrization for the response envelope
@vercel

vercel Bot commented Apr 27, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project: nexusrag · Deployment: Ready · Actions: Preview, Comment · Updated (UTC): Apr 27, 2026 6:59pm


@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f7d00fa3ef


Comment thread pyproject.toml
Comment on lines +30 to +31
agent = [
"langgraph>=0.0.59",

P1: Keep langgraph in base dependencies

This commit makes langgraph optional, but /v1/run still depends on it unconditionally via build_graph() at runtime. Our default runtime install path (pip install . in Dockerfile) does not install extras, so deployments built from base dependencies will return stream errors for every run request once from langgraph.graph import END, StateGraph executes. Keep langgraph in dependencies or make runtime installs explicitly include the agent extra.


Comment thread nexusrag/apps/api/main.py
Comment on lines 119 to 121
response = await call_next(request)
completed_at_dt = datetime.now(timezone.utc)
latency_ms = (time.monotonic() - start) * 1000.0

P2: Measure query completion after SSE stream ends

The query log timestamps are captured immediately after call_next, but /run returns a StreamingResponse whose generation continues after this point, so completed_at and latency_ms are recorded before the actual query finishes. This makes the published p50/p95 latency metrics systematically too low for real streamed runs and breaks the intended end-to-end timing semantics.


Comment thread nexusrag/apps/api/main.py
started_at=started_at_dt,
completed_at=completed_at_dt,
retrieved_chunks=retrieved_chunks,
status="ok" if response.status_code < 400 else "error",

P2: Derive query status from run outcome, not HTTP code

For /run, many failures are encoded as SSE error events while the HTTP status remains 200, so this status mapping marks failed runs as ok. That skews last_active_at (which filters on status='ok') and any future success/error reporting from query_log. The status should be set from the stream’s logical outcome, not only response.status_code.


Comment thread nexusrag/apps/api/main.py
Comment on lines +134 to +138
chunks_attr = getattr(request.state, "retrieved_chunks", 0)
try:
retrieved_chunks = int(chunks_attr)
except (TypeError, ValueError):
retrieved_chunks = 0

P2: Persist actual retrieved chunk counts

The middleware falls back to 0 unless request.state.retrieved_chunks is set, but there is no assignment to that request-state field in the request handlers, so every inserted row records retrieved_chunks=0. As a result, /api/stats will report avg_retrieval_size as zero regardless of real retrieval behavior.


@IgnazioDS IgnazioDS merged commit 810c990 into main Apr 27, 2026
2 of 6 checks passed
@IgnazioDS IgnazioDS deleted the telemetry-public-stats branch April 27, 2026 19:05
