
fix: enable DB spans and metrics in telemetry#452

Open
turisanapo wants to merge 7 commits into main from fix/otel-db-instrumentation

Conversation


@turisanapo turisanapo commented Apr 30, 2026

Summary

Fixes #449 — DB operations were invisible in telemetry (no db_* metric tables, no @opentelemetry/instrumentation-pg spans).

Two root causes, both fixed in packages/shared-api/lib/otel.ts:

  • Missing metrics (NoopMeter): registerInstrumentations() registered the DB instrumentations separately from the NodeSDK, so it never called setMeterProvider() on them — they kept a permanent NoopMeter. Fixed by creating them inside getOtelConfig() and returning them via the instrumentations field, so the NodeSDK owns them and sets a real MeterProvider after sdk.start().

  • Missing pg spans (RITM bypass): Bun resolves all static ESM imports before any module body runs (oven-sh/bun#3775), so the RITM hooks registered by PgInstrumentation never intercept pg (already loaded via @prisma/adapter-pg). Fixed by directly calling _patchPgClient(pg.Client) inside getOtelConfig(), which runs after the full module graph is available.
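
The first root cause can be pictured with a toy model. The sketch below is not the OpenTelemetry API: `ToyInstrumentation`, `ToySdk`, `NoopMeter`, and `RecordingMeter` are hypothetical stand-ins that only mimic the ownership relationship described above, assuming an instrumentation keeps a no-op meter until its owner hands it a real one.

```typescript
// Toy model only: every class here is a hypothetical stand-in, not OpenTelemetry.
interface Meter {
  record(name: string, value: number): void;
}

class NoopMeter implements Meter {
  record(): void {
    // Intentionally drops every measurement.
  }
}

class RecordingMeter implements Meter {
  readonly points: Array<[string, number]> = [];
  record(name: string, value: number): void {
    this.points.push([name, value]);
  }
}

class ToyInstrumentation {
  // Starts with a no-op meter, like an instrumentation nobody has configured yet.
  private meter: Meter = new NoopMeter();
  setMeter(meter: Meter): void {
    this.meter = meter;
  }
  observeQuery(durationSeconds: number): void {
    this.meter.record("db.client.operation.duration", durationSeconds);
  }
}

class ToySdk {
  constructor(private readonly instrumentations: ToyInstrumentation[]) {}
  // Only instrumentations passed to the constructor receive the real meter.
  start(meter: Meter): void {
    for (const instrumentation of this.instrumentations) {
      instrumentation.setMeter(meter);
    }
  }
}

const registeredSeparately = new ToyInstrumentation(); // never handed to the SDK
const ownedBySdk = new ToyInstrumentation();
const realMeter = new RecordingMeter();
new ToySdk([ownedBySdk]).start(realMeter);

registeredSeparately.observeQuery(0.01); // silently dropped
ownedBySdk.observeQuery(0.02); // recorded
console.log(realMeter.points);
```

In this model only the instrumentation handed to the SDK produces data points, which mirrors why the fix returns the DB instrumentations through the instrumentations field.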

Test plan

  • bun run typecheck — 0 errors across all 8 workspaces
  • Verified locally: started hebo-auth, created users, queried GreptimeDB
    • @opentelemetry/instrumentation-pg: 59 spans (SELECT, INSERT, UPDATE, BEGIN, COMMIT)
    • @8monkey/opentelemetry-instrumentation-bun-sql: 2 spans
    • db_client_operation_duration_seconds_{bucket,count,sum} tables created with data
    • HTTP metrics (http_server_request_duration_seconds_*) unaffected

🤖 Generated with Claude Code

…rumentation

Closes #449

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

coderabbitai Bot commented Apr 30, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: bd6c1360-e485-49d4-8432-8fbe46d2fd0a

📥 Commits

Reviewing files that changed from the base of the PR and between d853188 and af7d508.

📒 Files selected for processing (1)
  • packages/shared-api/lib/otel.ts
📝 Walkthrough


Refactored OpenTelemetry and Greptime DB initialization: instrumentation registration moved from import-time to on-demand inside getOtelConfig(), Bun/pg client instrumentation is patched dynamically per call, and Greptime DB client is lazily initialized via getGreptimeSqlClient() with exported GREPTIME_HOST.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| OpenTelemetry instrumentation: packages/shared-api/lib/otel.ts | Removed module-level registerInstrumentations(...). getOtelConfig(serviceName) now constructs PgInstrumentation and BunSqlInstrumentation per call, attempts a Bun-specific dynamic patch by loading pg from @prisma/adapter-pg and invoking _patchPgClient(...) (errors ignored), and uses GREPTIME_HOST directly for GREPTIME_OTLP_ENDPOINT. |
| Greptime DB client (lazy init): packages/shared-api/db/greptime.ts | Replaced eager connection/client creation with export const GREPTIME_HOST and export function getGreptimeSqlClient() that lazily initializes a module-scoped Bun SQL client on first call; removed pre-created client and connection-string builder. |
| Middleware type & runtime change: apps/api/src/middlewares/greptime.ts | Middleware now acquires greptimeDb by calling getGreptimeSqlClient() during request resolution; exported GreptimeDb type changed from BunSqlClient to Bun.SQL to match returned client type. |
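
The lazy-initialization pattern summarized for the Greptime client can be sketched in isolation. This is a minimal illustration, not the code in packages/shared-api/db/greptime.ts: `FakeSqlClient`, `getSqlClient`, and the placeholder URL are hypothetical, and the fake class merely counts constructions so the timing is observable.

```typescript
// Hypothetical stand-in for a database client; it is not Bun's SQL class.
class FakeSqlClient {
  static created = 0;
  constructor(readonly url: string) {
    FakeSqlClient.created += 1;
  }
}

// Module-scoped cache: nothing is constructed when the module is evaluated.
let cachedClient: FakeSqlClient | undefined;

function getSqlClient(): FakeSqlClient {
  // The constructor runs on the first call only, so anything that wraps the
  // constructor before that call still gets a chance to see it.
  if (cachedClient === undefined) {
    cachedClient = new FakeSqlClient("postgres://example.invalid/placeholder");
  }
  return cachedClient;
}

console.log(FakeSqlClient.created); // no client exists before the first call
const first = getSqlClient();
const second = getSqlClient();
console.log(first === second, FakeSqlClient.created);
```

The design choice is that construction time moves from import time to first use; every later call returns the same instance.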

Sequence Diagram(s)

sequenceDiagram
  participant App as Application (request)
  participant Middleware as Greptime Middleware
  participant OTEL as getOtelConfig()
  participant BunSQL as Bun.SQL client
  participant PgPkg as `@prisma/adapter-pg` (pg)

  App->>Middleware: incoming request
  Middleware->>BunSQL: call getGreptimeSqlClient()
  BunSQL->>BunSQL: initialize client if not exists
  Middleware->>OTEL: call getOtelConfig(serviceName)
  OTEL->>OTEL: instantiate PgInstrumentation & BunSqlInstrumentation
  OTEL->>PgPkg: dynamic require('@prisma/adapter-pg') (try)
  PgPkg-->>OTEL: (if present) provide pg client
  OTEL->>PgPkg: call PgInstrumentation._patchPgClient(pg) (errors ignored)
  OTEL-->>Middleware: return instrumentation config
  Middleware-->>App: attach greptimeDb and continue

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐰
I hop in late, not at import time,
Patching pg with a nimble rhyme.
Lazy clients snug in burrows deep,
Spans awake from their morning sleep.
Hooray — the traces leap and climb!

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The PR title 'fix: enable DB spans and metrics in telemetry' directly aligns with the main objective of the changeset: fixing database telemetry by ensuring database spans and metrics are properly captured in OpenTelemetry instrumentation. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |


@turisanapo turisanapo marked this pull request as draft April 30, 2026 02:49

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/shared-api/lib/otel.ts (1)

90-100: Debug logging would aid troubleshooting when pg manual patching fails silently.

The workaround is well-documented and addresses a real Bun limitation (GitHub issue #3775). The version is already pinned to 0.66.0, which mitigates the risk of API breakage. However, the silent catch {} block makes it hard to debug if the patching fails. Consider adding a debug log:

   try {
     // oxlint-disable no-unsafe-assignment no-unsafe-call no-unsafe-member-access
     const { createRequire } = require("module");
     const pg = createRequire(require.resolve("@prisma/adapter-pg"))("pg");
     // @ts-expect-error _patchPgClient is a private method on PgInstrumentation
     pgInstrumentation._patchPgClient(pg.Client);
     // oxlint-enable no-unsafe-assignment no-unsafe-call no-unsafe-member-access
-  } catch {}
+  } catch (err) {
+    if (!IS_PRODUCTION) console.debug("[otel] pg manual patching failed:", err);
+  }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared-api/lib/otel.ts` around lines 90 - 100, The empty catch
around the manual pg patching hides failures; update the try/catch in the otel
patch block so the catch logs debug-level details including the caught error and
context (e.g., that createRequire(require.resolve("@prisma/adapter-pg"))("pg")
and pgInstrumentation._patchPgClient(pg.Client) failed), using the existing
logger (or console.debug if none) so failures to patch pg.Client are visible for
troubleshooting.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8e845411-5f16-4cc9-b3d8-046d88cd626e

📥 Commits

Reviewing files that changed from the base of the PR and between e727c36 and c667f24.

⛔ Files ignored due to path filters (1)
  • bun.lock is excluded by !**/*.lock
📒 Files selected for processing (1)
  • packages/shared-api/lib/otel.ts

@turisanapo turisanapo marked this pull request as ready for review April 30, 2026 02:58
The bun:sql client was created at module-eval time, before
BunSqlInstrumentation could wrap the SQL constructor. Make it lazy
so the first access goes through the instrumented constructor.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@turisanapo turisanapo force-pushed the fix/otel-db-instrumentation branch from 26b0c42 to ad05bd3 Compare April 30, 2026 03:14
@turisanapo turisanapo force-pushed the fix/otel-db-instrumentation branch from 20cc603 to ca30fc1 Compare April 30, 2026 03:20
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@turisanapo turisanapo force-pushed the fix/otel-db-instrumentation branch from 1dee971 to 1501084 Compare April 30, 2026 03:27
@turisanapo turisanapo force-pushed the fix/otel-db-instrumentation branch from 62caa11 to a61932d Compare April 30, 2026 03:39
@turisanapo turisanapo requested a review from heiwen April 30, 2026 03:56

@coderabbitai coderabbitai Bot left a comment


🧹 Nitpick comments (1)
packages/shared-api/lib/otel.ts (1)

85-100: Empty catch block pattern is intentional for resilience—logging suggestion should align with structured logger pattern.

The empty catch {} at line 100 mirrors similar graceful fallbacks elsewhere in the codebase (e.g., secret.ts, postgres.ts), confirming this is intentional. If added, observability logging should use the codebase's structured logger (already imported in this file and used consistently with logger.info/error/warn) rather than console.debug.

Additionally, _patchPgClient is a private API that may change between @opentelemetry/instrumentation-pg versions, though this is already mitigated by the @ts-expect-error comment.
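
A minimal sketch of the logging suggestion, under stated assumptions: the `Logger` interface, `memoryLogger`, and `tryPatch` below are hypothetical and do not reflect the codebase's actual logger signature. The point is only that a best-effort patch can stay non-fatal while still leaving a trace when it fails.

```typescript
// Hypothetical logger shape; the real structured logger may differ.
interface Logger {
  warn(context: Record<string, unknown>, message: string): void;
}

// In-memory logger used here so the behaviour is observable without I/O.
const logged: string[] = [];
const memoryLogger: Logger = {
  warn: (_context, message) => {
    logged.push(message);
  },
};

// Runs a best-effort patch: failures are reported, never rethrown.
function tryPatch(patch: () => void, logger: Logger): boolean {
  try {
    patch();
    return true;
  } catch (error) {
    logger.warn({ error }, "[otel] manual pg patching failed; pg spans may be missing");
    return false;
  }
}

const patched = tryPatch(() => {
  // A patch that succeeds does not log anything.
}, memoryLogger);
const failed = tryPatch(() => {
  throw new Error("pg could not be resolved");
}, memoryLogger);
console.log(patched, failed, logged.length);
```

Returning a boolean is an illustrative choice; it lets a caller decide whether to surface the degraded state elsewhere.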

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@packages/shared-api/lib/otel.ts` around lines 85 - 100, The empty catch
swallowing errors when attempting to patch pg.Client should emit a structured
log via the existing logger instead of doing nothing; update the try/catch
around the createRequire + pgInstrumentation._patchPgClient call to catch the
error (e.g., catch (err)) and call logger.debug or logger.warn with a short
message like "failed to patch pg.Client for Bun: falling back" and include the
caught error object, referencing the pgInstrumentation,
createRequire(require.resolve("@prisma/adapter-pg"))("pg"), and _patchPgClient
call so maintainers can locate the change.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d858a7dd-35e9-4aaf-b6ca-9db395cbd51e

📥 Commits

Reviewing files that changed from the base of the PR and between c667f24 and d853188.

📒 Files selected for processing (3)
  • apps/api/src/middlewares/greptime.ts
  • packages/shared-api/db/greptime.ts
  • packages/shared-api/lib/otel.ts

 .as("scoped");

-export type GreptimeDb = BunSqlClient;
+export type GreptimeDb = Bun.SQL;

We don't know this; it's completely encapsulated by the shared-api layer. Please revert.

url,
let _client: Bun.SQL;

/** Lazily created so BunSqlInstrumentation wraps `bun:sql` before the first `new SQL()` call. */

The previous createBunSqlClient was lazy as well. What exactly is the change here?

Comment on lines +105 to +111
pgInstrumentation,
new BunSqlInstrumentation({
  requireParentSpan: true,
  ignoreConnectionSpans: true,
  // FUTURE: set to true to avoid leaking sensitive information
  maskStatement: false,
}),

Align coding pattern across pg and bun instrumentation.

Comment on lines +90 to +100
// Pg is already loaded via @prisma/adapter-pg before OTel's module-load
// hooks can patch it. Patch pg.Client directly as a workaround, while still
// passing the instrumentation to NodeSDK so providers are configured correctly.
try {
  // oxlint-disable no-unsafe-assignment no-unsafe-call no-unsafe-member-access
  const { createRequire } = require("module");
  const pg = createRequire(require.resolve("@prisma/adapter-pg"))("pg");
  // @ts-expect-error _patchPgClient is a private method on PgInstrumentation
  pgInstrumentation._patchPgClient(pg.Client);
  // oxlint-enable no-unsafe-assignment no-unsafe-call no-unsafe-member-access
} catch {}

This looks like a big workaround and is very hacky. Is this either (a) something that needs to be fixed upstream in @prisma/adapter-pg or @opentelemetry/instrumentation-pg, or (b) a case where the initialisation needs to follow a lazy pattern similar to the one we use for our Greptime client?
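
For context on the trade-off raised here, the general idea behind patching an already-loaded class can be shown with a toy example. This is an assumption-laden sketch, not how `_patchPgClient` is implemented: `ToyClient`, `patchQuery`, and `spans` are hypothetical. It only demonstrates that wrapping a prototype method after the class has been loaded still affects instances created earlier, which is why a direct patch can work where a load-time hook was missed.

```typescript
// Hypothetical client class standing in for an already-loaded driver.
class ToyClient {
  query(sql: string): string {
    return `rows for ${sql}`;
  }
}

// Collected "spans": here just the SQL text of each observed query.
const spans: string[] = [];

// Wraps the prototype method in place; no module-load hook is involved.
function patchQuery(target: typeof ToyClient): void {
  const original = target.prototype.query;
  target.prototype.query = function (this: ToyClient, sql: string): string {
    spans.push(sql);
    return original.call(this, sql);
  };
}

const existing = new ToyClient(); // created before the patch is applied
patchQuery(ToyClient);
const result = existing.query("SELECT 1");
console.log(result, spans);
```

The cost of this approach is coupling to internals of the patched class, which is the concern the review comment raises.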


Development

Successfully merging this pull request may close these issues.

DB operations only partially visible in telemetry — no DB spans or metric tables in production

2 participants