Skip to content

Enhance health_check tool with always-on counts, chunks check, and embedding reachability#72

Merged
jeffgreendesign merged 4 commits intomainfrom
claude/add-health-check-tool-fQN5Y
Apr 4, 2026
Merged

Enhance health_check tool with always-on counts, chunks check, and embedding reachability#72
jeffgreendesign merged 4 commits intomainfrom
claude/add-health-check-tool-fQN5Y

Conversation

@jeffgreendesign
Copy link
Copy Markdown
Owner

  • Return row counts unconditionally (not just in verbose mode) so agents
    get actionable diagnostics in a single call
  • Add chunks table check with count
  • Add memory_observations count alongside entities
  • Add proactive_insights count
  • Add embedding service reachability check (tries generateEmbedding)
  • Add provider field to embeddings check output
  • Update Zod output schema with observations and provider fields
  • Add tests for new functionality (16 total, up from 13)

https://claude.ai/code/session_01JtWxbWQKXXxmE51EM7dmJh

…bedding reachability

- Return row counts unconditionally (not just in verbose mode) so agents
  get actionable diagnostics in a single call
- Add chunks table check with count
- Add memory_observations count alongside entities
- Add proactive_insights count
- Add embedding service reachability check (tries generateEmbedding)
- Add provider field to embeddings check output
- Update Zod output schema with observations and provider fields
- Add tests for new functionality (16 total, up from 13)

https://claude.ai/code/session_01JtWxbWQKXXxmE51EM7dmJh
@vercel
Copy link
Copy Markdown

vercel Bot commented Apr 4, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
dashboard Building Building Preview, Comment Apr 4, 2026 6:20pm
textrawl Ready Ready Preview, Comment Apr 4, 2026 6:20pm

Request Review

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 4, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 0332d029-03ce-4b51-a6b7-e591fc64b71e

📥 Commits

Reviewing files that changed from the base of the PR and between 31745de and 51c9588.

📒 Files selected for processing (1)
  • src/utils/health-helpers.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/health-helpers.ts

Walkthrough

Health checks now invoke generateEmbedding('health check') when embeddings are configured and include an optional provider in the embeddings check; embedding-call failures mark embeddings ok: false and can set overall status to degraded. A new chunks component is checked for accessibility and row count. Row counts for documents, chunks, memory.entities, memory.observations, conversations.sessions, and insights.count use queryCount when verbose is true and estimateRowCount otherwise. insightQueue.pending is queried via queryOne('chunks_pending') on the non-verbose path. ComponentCheckSchema adds optional provider and observations fields.

Sequence Diagram(s)

sequenceDiagram
  participant Client
  participant HealthSvc as HealthService
  participant DB as Database
  participant Emb as EmbeddingProvider

  Client->>HealthSvc: GET /health?verbose=(true|false)
  HealthSvc->>DB: check DB connection & checkTable(documents, chunks, memory, conversations, insights)
  DB-->>HealthSvc: table existence / accessibility
  alt table accessible
    alt verbose
      HealthSvc->>DB: queryCount("SELECT count(*) FROM <table>")
    else not verbose
      HealthSvc->>DB: estimateRowCount(<table>)
    end
    DB-->>HealthSvc: count or estimate
  else inaccessible
    DB-->>HealthSvc: error -> HealthSvc marks check ok:false
  end
  HealthSvc->>DB: queryOne('chunks_pending') for insightQueue (non-verbose also)
  DB-->>HealthSvc: pending count or error
  alt embeddings configured
    HealthSvc->>Emb: generateEmbedding("health check")
    alt reachable
      Emb-->>HealthSvc: vector -> embeddings.ok:true, include provider
    else unreachable
      Emb-->>HealthSvc: error -> embeddings.ok:false, set degraded
    end
  end
  HealthSvc-->>Client: aggregated health JSON with checks: documents, chunks, memory, conversations, insights, insightQueue, embeddings
Loading
🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

Check name Status Explanation Resolution
Description check ❓ Inconclusive The pull request description covers the key changes (counts, chunks check, embeddings reachability, provider field, schema updates) and mentions test additions, but is missing checklist completion and related issues information required by the template. Complete the checklist items (typecheck, testing, code style verification) and fill in the Related Issues section to fully match the template requirements.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The pull request title 'Enhance health_check tool with always-on counts, chunks check, and embedding reachability' is specific and accurately captures the main enhancements: always-on row counts, chunks table check, and embedding reachability verification.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/add-health-check-tool-fQN5Y

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/tools/__tests__/health.test.ts`:
- Around line 218-228: The test "returns degraded when embedding service is
unreachable" asserts embeddings.ok and embeddings.error but doesn't assert the
overall status; update the test (in the it block that calls callHealth(false)
and inspects result.structuredContent) to also expect
result.structuredContent?.status === 'degraded' (assuming you fixed the
implementation to set hasFailure on embedding failure); alternatively, if you
don't change the implementation, rename the test to "reports embeddings failure
when service is unreachable" to match the current assertions.

In `@src/tools/health.ts`:
- Around line 108-125: The embedding failure branch currently sets
checks.embeddings.ok = false but doesn't update the overall health flag; modify
the catch in the generateEmbedding call so it also sets the aggregate failure
marker (the same variable used elsewhere to compute status, e.g., hasFailure) to
true when an embedding error occurs, ensuring overall status becomes degraded;
update the catch that alters checks.embeddings.error and checks.embeddings.ok to
also set hasFailure = true (or the equivalent overall-status flag) so the health
endpoint reflects the embedding service outage.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7e778c30-5cb9-4899-8196-c820723ba1c9

📥 Commits

Reviewing files that changed from the base of the PR and between 5e2de8e and 1e8f4db.

📒 Files selected for processing (2)
  • src/tools/__tests__/health.test.ts
  • src/tools/health.ts

Comment thread src/tools/__tests__/health.test.ts
Comment thread src/tools/health.ts
The catch block for generateEmbedding and the !embeddingsConfigured
branch were not setting hasFailure = true, so an embedding service
outage would leave overall status as 'healthy' instead of 'degraded'.
Also added the missing status assertion in the corresponding test.

https://claude.ai/code/session_01JtWxbWQKXXxmE51EM7dmJh
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
src/tools/health.ts (1)

233-249: ⚠️ Potential issue | 🟠 Major

Mark insightQueue failures as degraded.

Unlike the other component checks, neither failure path updates hasFailure, so the tool can still return healthy while checks.insightQueue.ok is false.

🔧 Proposed fix
 						if (!queueOk) {
 							checks.insightQueue.error = "Table 'insight_queue' not accessible";
+							hasFailure = true;
 						} else {
 							const row = await queryOne<{ chunks_pending: number }>(
 								'SELECT chunks_pending FROM insight_queue WHERE id = 1',
 							);
@@
 						} catch (err) {
 							checks.insightQueue = {
 								ok: false,
 								error: err instanceof Error ? err.message : 'Check failed',
 							};
+							hasFailure = true;
 						}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/health.ts` around lines 233 - 249, The insight_queue branch doesn't
set the global failure flag so the service can be considered healthy despite
checks.insightQueue.ok being false; after detecting a bad table (when queueOk is
false) and inside the catch block where checks.insightQueue is set to ok: false,
set the same failure marker used elsewhere (hasFailure = true) so the overall
health is marked degraded/failing; update the logic around
checkTable('insight_queue') and the catch handling that assigns
checks.insightQueue to also flip hasFailure to true, referencing
checks.insightQueue, hasFailure, checkTable and queryOne to locate the code to
change.
🧹 Nitpick comments (1)
src/tools/health.ts (1)

139-139: Consider estimated counts on the default path.

These exact COUNT(*) probes now run on every health check. On larger documents, chunks, memory_*, and proactive_insights tables, that makes the diagnostic path itself expensive. If always-on counts are required, I'd return catalog estimates by default and reserve exact counts for verbose.

Based on learnings: The health_check tool is always registered and provides per-component diagnostics. Use it as the first step when diagnosing infrastructure issues.

Also applies to: 157-157, 176-179, 199-201, 221-221

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/tools/health.ts` at line 139, The health check currently runs expensive
exact COUNT(*) queries (e.g., queryCount used to set checks.documents.count) on
every run; change the default path to use inexpensive estimated counts (PG
catalog reltuples or equivalent) for the large tables (documents, chunks,
memory_* tables, proactive_insights) and only execute exact queryCount('SELECT
count(*) ...') when the health check is invoked with verbose mode. Update the
assignments at checks.documents.count and the analogous spots (the other
occurrences you noted) to branch on the verbose flag (use an estimate helper or
lightweight catalog query when !verbose, and call queryCount for exact numbers
when verbose is true) so diagnostics remain cheap by default.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Outside diff comments:
In `@src/tools/health.ts`:
- Around line 233-249: The insight_queue branch doesn't set the global failure
flag so the service can be considered healthy despite checks.insightQueue.ok
being false; after detecting a bad table (when queueOk is false) and inside the
catch block where checks.insightQueue is set to ok: false, set the same failure
marker used elsewhere (hasFailure = true) so the overall health is marked
degraded/failing; update the logic around checkTable('insight_queue') and the
catch handling that assigns checks.insightQueue to also flip hasFailure to true,
referencing checks.insightQueue, hasFailure, checkTable and queryOne to locate
the code to change.

---

Nitpick comments:
In `@src/tools/health.ts`:
- Line 139: The health check currently runs expensive exact COUNT(*) queries
(e.g., queryCount used to set checks.documents.count) on every run; change the
default path to use inexpensive estimated counts (PG catalog reltuples or
equivalent) for the large tables (documents, chunks, memory_* tables,
proactive_insights) and only execute exact queryCount('SELECT count(*) ...')
when the health check is invoked with verbose mode. Update the assignments at
checks.documents.count and the analogous spots (the other occurrences you noted)
to branch on the verbose flag (use an estimate helper or lightweight catalog
query when !verbose, and call queryCount for exact numbers when verbose is true)
so diagnostics remain cheap by default.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 7485f842-cc11-441a-9b8c-318e808b2065

📥 Commits

Reviewing files that changed from the base of the PR and between 1e8f4db and 1d30b90.

📒 Files selected for processing (2)
  • src/tools/__tests__/health.test.ts
  • src/tools/health.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/tools/tests/health.test.ts

- Set hasFailure = true when insight_queue table check fails or throws,
  matching the pattern used by every other table check
- Use pg_class.reltuples for cheap estimated row counts in default mode;
  only run exact count(*) queries when verbose=true
- Add estimateRowCount helper to health-helpers.ts
- Add tests for insight_queue failure propagation and estimated vs exact
  count behavior (18 tests total, up from 16)

https://claude.ai/code/session_01JtWxbWQKXXxmE51EM7dmJh
Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (2)
src/utils/health-helpers.ts (2)

40-43: Query may return rows from multiple schemas.

The relname lookup doesn't include a schema filter, so if tables with identical names exist in different schemas (e.g., public.documents and archive.documents), the query returns multiple rows and rows[0] picks one non-deterministically.

For the current health-check use case with well-known table names this is unlikely to cause issues, but consider joining with pg_namespace if stricter schema scoping is ever needed.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/health-helpers.ts` around lines 40 - 43, The query in
health-helpers uses pgQuery to select reltuples by relname, which can return
multiple rows when identical table names exist across schemas; update the query
in the function using tableName to scope by schema (e.g., join pg_namespace and
filter on nspname) or require/accept a schema-qualified identifier and use a
schema-aware lookup (join pg_namespace on pg_class.relnamespace or use
to_regclass with schema-qualified name) so the row returned is deterministic and
references the correct table.

38-47: Consider adding try/catch for consistency with checkTable.

Unlike checkTable (lines 12-17), this function doesn't wrap the pgQuery call in a try/catch. If pgQuery throws (connection error, permission denied), the exception propagates to the caller where it becomes a generic "Check failed" message, losing the specific error context.

Additionally, checkTable defensively calls isDatabaseConfigured() at line 10, but this function doesn't—though the caller in health.ts does gate on dbOk, adding the check here would be more consistent.

♻️ Suggested refactor for consistency
 /** Estimated row count from pg_class (cheap, no table scan). */
 export async function estimateRowCount(tableName: string): Promise<number> {
+	if (!isDatabaseConfigured()) return -1;
 	if (!SAFE_IDENTIFIER.test(tableName)) return -1;
+	try {
 	const { rows } = await pgQuery<{ reltuples: string }>(
 		'SELECT reltuples FROM pg_class WHERE relname = $1',
 		[tableName],
 	);
 	if (!rows[0]) return -1;
 	const est = Number(rows[0].reltuples);
 	return est < 0 ? 0 : Math.round(est);
+	} catch {
+		return -1;
+	}
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/health-helpers.ts` around lines 38 - 47, The estimateRowCount
function should mirror checkTable's defensive behavior: first call
isDatabaseConfigured() and return -1 if it is false, and wrap the pgQuery
invocation in a try/catch so DB errors don't leak as generic failures; on catch
return -1 and log the error with context (use the same logger pattern as
checkTable) so callers still get -1 but the real error is recorded. Ensure you
update the estimateRowCount function (referencing pgQuery and
isDatabaseConfigured) to implement this try/catch and early-config check.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@src/utils/health-helpers.ts`:
- Around line 40-43: The query in health-helpers uses pgQuery to select
reltuples by relname, which can return multiple rows when identical table names
exist across schemas; update the query in the function using tableName to scope
by schema (e.g., join pg_namespace and filter on nspname) or require/accept a
schema-qualified identifier and use a schema-aware lookup (join pg_namespace on
pg_class.relnamespace or use to_regclass with schema-qualified name) so the row
returned is deterministic and references the correct table.
- Around line 38-47: The estimateRowCount function should mirror checkTable's
defensive behavior: first call isDatabaseConfigured() and return -1 if it is
false, and wrap the pgQuery invocation in a try/catch so DB errors don't leak as
generic failures; on catch return -1 and log the error with context (use the
same logger pattern as checkTable) so callers still get -1 but the real error is
recorded. Ensure you update the estimateRowCount function (referencing pgQuery
and isDatabaseConfigured) to implement this try/catch and early-config check.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 92c63b14-b6c5-4c75-8bb2-66e756890952

📥 Commits

Reviewing files that changed from the base of the PR and between 1d30b90 and 31745de.

📒 Files selected for processing (3)
  • src/tools/__tests__/health.test.ts
  • src/tools/health.ts
  • src/utils/health-helpers.ts
🚧 Files skipped from review as they are similar to previous changes (2)
  • src/tools/tests/health.test.ts
  • src/tools/health.ts

…ndling

- Scope pg_class query to public schema via pg_namespace join, avoiding
  ambiguity when identical table names exist across schemas
- Add isDatabaseConfigured() early return matching checkTable's pattern
- Wrap pgQuery in try/catch returning -1 on failure, preventing DB
  errors from propagating as unhandled exceptions

https://claude.ai/code/session_01JtWxbWQKXXxmE51EM7dmJh
@jeffgreendesign jeffgreendesign merged commit 04db08c into main Apr 4, 2026
7 of 8 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants