Skip to content

fix: add timeout to SDK metadata extraction#99

Open
Spirotot wants to merge 3 commits intohappier-dev:devfrom
Spirotot:fix/metadata-extraction-timeout
Open

fix: add timeout to SDK metadata extraction#99
Spirotot wants to merge 3 commits intohappier-dev:devfrom
Spirotot:fix/metadata-extraction-timeout

Conversation

@Spirotot
Copy link

@Spirotot Spirotot commented Feb 28, 2026

Summary

  • extractSDKMetadata() has no timeout — if the spawned SDK process hangs (e.g. due to inaccessible filesystems or missing credentials), the extraction waits indefinitely, leaking a process for the session lifetime
  • Adds a 10-second timeout via setTimeout + AbortController
  • If the SDK query doesn't produce an init message within the timeout, the query is aborted and empty metadata is returned gracefully
  • Normal extraction completes in ~1-2 seconds, so 10 seconds provides ample headroom

Context

Discovered while debugging daemon-spawned sessions on macOS where the metadata extraction subprocess would hang indefinitely due to filesystem access issues (symlinks to cloud storage). While extractSDKMetadataAsync is fire-and-forget and doesn't block session startup, the leaked process persists for the session lifetime.

Test plan

  • Verify normal metadata extraction still completes successfully (should be well under 10s)
  • Verify that when the SDK query hangs, extraction times out after 10s and returns empty metadata gracefully
  • Verify existing test mocks (extractSDKMetadataAsync: vi.fn()) still pass

Summary by CodeRabbit

  • Bug Fixes

    • Added a metadata extraction timeout to prevent hangs and ensure stuck processes are aborted.
    • Improved logging for timeouts, aborts, and successful metadata capture to aid diagnosis and observability.
  • Tests

    • Added sequential integration tests validating metadata capture, child-process termination, and configurable timeout behavior under normal and hanging scenarios.

extractSDKMetadata() has no timeout — if the spawned SDK process hangs
(e.g. due to inaccessible filesystems or missing credentials), the
extraction waits indefinitely, leaking a process for the session lifetime.

Add a 10-second timeout via setTimeout + AbortController. If the SDK
query doesn't produce an init message within the timeout, the query is
aborted and empty metadata is returned gracefully. Normal extraction
completes in ~1-2 seconds, so 10 seconds provides ample headroom.
@coderabbitai
Copy link

coderabbitai bot commented Feb 28, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f20fc78 and a8990bf.

📒 Files selected for processing (1)
  • apps/cli/src/backends/claude/sdk/metadataExtractor.test.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/cli/src/backends/claude/sdk/metadataExtractor.test.ts

Walkthrough

Adds a configurable timeout mechanism for SDK metadata extraction (DEFAULT/MIN/MAX and env parsing), aborting the SDK query on timeout or after capturing init metadata; ensures timeout cleanup on all paths and enhances logging for timeout/abort scenarios.

Changes

Cohort / File(s) Summary
Metadata extraction logic
apps/cli/src/backends/claude/sdk/metadataExtractor.ts
Adds DEFAULT/MIN/MAX timeout constants and resolveMetadataExtractionTimeoutMs(); integrates timeout-driven control flow that aborts the SDK query on timeout or after capturing init-system-message metadata; unrefs and always clears timeout; treats AbortError as normal abort; expands logging.
Tests for timeout and abort behavior
apps/cli/src/backends/claude/sdk/metadataExtractor.test.ts
Adds tests that spawn fake Claude processes to validate metadata capture and process termination, and to assert configurable extraction timeout behavior; includes env-scoped helpers and PID/file wait utilities.

Sequence Diagram(s)

sequenceDiagram
    participant Extractor
    participant ClaudeProcess as "Claude\n(child)"
    participant Timer
    participant OS

    Extractor->>ClaudeProcess: spawn process & start read loop
    Note right of ClaudeProcess: may emit INIT metadata to stdout
    Timer->>Extractor: start timeout
    ClaudeProcess-->>Extractor: INIT metadata (optional)
    alt INIT received
        Extractor->>Timer: clear timeout
        Extractor->>ClaudeProcess: send abort (signal/abort controller)
        ClaudeProcess->>OS: exit
        ClaudeProcess-->>Extractor: exit confirmation
        Extractor-->>Extractor: return extracted metadata
    else timeout occurs
        Timer->>Extractor: trigger timeout
        Extractor->>ClaudeProcess: send abort (signal/abort controller)
        ClaudeProcess->>OS: exit
        ClaudeProcess-->>Extractor: exit confirmation
        Extractor-->>Extractor: return empty metadata
    end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 28.57% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and concisely summarizes the main change: adding a timeout mechanism to SDK metadata extraction to prevent process hangs.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
apps/cli/src/backends/claude/sdk/metadataExtractor.ts (1)

59-71: Consolidate timeout cleanup in a finally block.

clearTimeout(timeoutId) is repeated across multiple branches. A finally block keeps cleanup centralized and reduces maintenance risk.

Suggested refactor
-    try {
+    try {
         logger.debug('[metadataExtractor] Starting SDK metadata extraction')
@@
-                clearTimeout(timeoutId)
                 abortController.abort()
@@
-        clearTimeout(timeoutId)
         logger.debug('[metadataExtractor] No init message received from SDK')
         return {}
-
     } catch (error) {
-        clearTimeout(timeoutId)
         // Check if it's an abort error (expected — either from timeout or after capture)
         if (error instanceof Error && error.name === 'AbortError') {
             logger.debug('[metadataExtractor] SDK query aborted (timeout or after capturing metadata)')
             return {}
         }
         logger.debug('[metadataExtractor] Error extracting SDK metadata:', error)
         return {}
+    } finally {
+        clearTimeout(timeoutId)
     }

As per coding guidelines: "Apply DRY, SOLID, and KISS principles - avoid speculative or YAGNI code".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/cli/src/backends/claude/sdk/metadataExtractor.ts` around lines 59 - 71,
The repeated clearTimeout(timeoutId) (and abortController.abort()) should be
centralized in a finally block inside the metadata extraction routine so cleanup
always runs regardless of return or thrown error; refactor the function
containing the try/catch (e.g., the metadata extraction function in
metadataExtractor.ts that returns metadata or {} and currently calls
clearTimeout in multiple places) to declare timeoutId and abortController in the
outer scope, move clearTimeout(timeoutId) and abortController.abort() into a
finally block, and remove the duplicate clearTimeout calls in the normal return
paths while keeping existing logging (logger.debug('[metadataExtractor] No init
message received from SDK')) and preserving the same return values.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@apps/cli/src/backends/claude/sdk/metadataExtractor.ts`:
- Around line 15-16: The constant METADATA_EXTRACTION_TIMEOUT_MS is hardcoded;
change it to be configurable by reading from environment/config with a safe
default of 10000ms (e.g., use process.env.METADATA_EXTRACTION_TIMEOUT_MS or your
project's config loader to parse an integer and fall back to 10_000). Replace
the current const with code that reads and validates the configured value
(coerce to number, handle NaN) and keep the same exported/used symbol name
METADATA_EXTRACTION_TIMEOUT_MS so existing callers in metadataExtractor.ts
continue to work.

---

Nitpick comments:
In `@apps/cli/src/backends/claude/sdk/metadataExtractor.ts`:
- Around line 59-71: The repeated clearTimeout(timeoutId) (and
abortController.abort()) should be centralized in a finally block inside the
metadata extraction routine so cleanup always runs regardless of return or
thrown error; refactor the function containing the try/catch (e.g., the metadata
extraction function in metadataExtractor.ts that returns metadata or {} and
currently calls clearTimeout in multiple places) to declare timeoutId and
abortController in the outer scope, move clearTimeout(timeoutId) and
abortController.abort() into a finally block, and remove the duplicate
clearTimeout calls in the normal return paths while keeping existing logging
(logger.debug('[metadataExtractor] No init message received from SDK')) and
preserving the same return values.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1037536 and 746c1b7.

📒 Files selected for processing (1)
  • apps/cli/src/backends/claude/sdk/metadataExtractor.ts

Comment on lines +15 to +16
/** Maximum time to wait for SDK metadata extraction before giving up. */
const METADATA_EXTRACTION_TIMEOUT_MS = 10_000
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Make metadata timeout configurable instead of hardcoded.

METADATA_EXTRACTION_TIMEOUT_MS is hardcoded. Please source it from environment/config (with a safe default) so operators can tune behavior per environment.

Suggested change
-/** Maximum time to wait for SDK metadata extraction before giving up. */
-const METADATA_EXTRACTION_TIMEOUT_MS = 10_000
+/** Maximum time to wait for SDK metadata extraction before giving up. */
+const DEFAULT_METADATA_EXTRACTION_TIMEOUT_MS = 10_000
+const METADATA_EXTRACTION_TIMEOUT_MS = (() => {
+    const raw = process.env.HAPPIER_METADATA_EXTRACTION_TIMEOUT_MS
+    const parsed = raw ? Number.parseInt(raw, 10) : NaN
+    return Number.isFinite(parsed) && parsed > 0 ? parsed : DEFAULT_METADATA_EXTRACTION_TIMEOUT_MS
+})()

As per coding guidelines: "Avoid hardcoded behavior - load configuration from YAML/environment variables instead".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/cli/src/backends/claude/sdk/metadataExtractor.ts` around lines 15 - 16,
The constant METADATA_EXTRACTION_TIMEOUT_MS is hardcoded; change it to be
configurable by reading from environment/config with a safe default of 10000ms
(e.g., use process.env.METADATA_EXTRACTION_TIMEOUT_MS or your project's config
loader to parse an integer and fall back to 10_000). Replace the current const
with code that reads and validates the configured value (coerce to number,
handle NaN) and keep the same exported/used symbol name
METADATA_EXTRACTION_TIMEOUT_MS so existing callers in metadataExtractor.ts
continue to work.

@greptile-apps
Copy link
Contributor

greptile-apps bot commented Feb 28, 2026

Greptile Summary

Added a 10-second timeout to SDK metadata extraction to prevent indefinite process hangs when the spawned SDK subprocess blocks on inaccessible filesystems or missing credentials.

Key changes:

  • Introduced METADATA_EXTRACTION_TIMEOUT_MS constant (10 seconds) with proper cleanup via clearTimeout() in all code paths (success, normal completion, and error)
  • Updated error logging to distinguish between timeout aborts and post-capture aborts
  • Maintains graceful degradation by returning empty metadata on timeout

The implementation properly handles all edge cases and follows the codebase's established patterns for error handling and abort controllers.

Confidence Score: 5/5

  • This PR is safe to merge with no risk
  • Clean implementation that properly handles timeouts with correct cleanup in all code paths, follows codebase conventions, and addresses a real production issue (leaked processes from hanging SDK queries)
  • No files require special attention

Important Files Changed

Filename Overview
apps/cli/src/backends/claude/sdk/metadataExtractor.ts Added 10s timeout to prevent indefinite hangs during SDK metadata extraction - proper cleanup in all code paths

Last reviewed commit: 746c1b7

Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
apps/cli/src/backends/claude/sdk/metadataExtractor.test.ts (1)

58-67: Add afterAll to clean up the last test's temporary directory.

The beforeEach hook cleans up tmpRoot from a previous test, but after the final test runs, its temporary directory is never removed. This leaves test artifacts on disk.

🧹 Proposed fix to add final cleanup
 import { beforeEach, describe, expect, it, vi } from 'vitest';
+import { afterAll } from 'vitest';
   beforeEach(() => {
     vi.resetModules();
     if (tmpRoot) {
       rmSync(tmpRoot, { recursive: true, force: true });
       tmpRoot = '';
     }
   });
+
+  afterAll(() => {
+    if (tmpRoot) {
+      rmSync(tmpRoot, { recursive: true, force: true });
+    }
+  });

Based on learnings: "Use temporary directories for filesystem tests with automatic cleanup".

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@apps/cli/src/backends/claude/sdk/metadataExtractor.test.ts` around lines 58 -
67, Add an afterAll cleanup to the describe.sequential('claude sdk metadata
extractor') block to remove the final test's temporary directory: check the
existing tmpRoot variable and, if set, call rmSync(tmpRoot, { recursive: true,
force: true }) (the same cleanup logic as in beforeEach) inside an afterAll hook
so the last test's temp directory is removed when the suite finishes.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@apps/cli/src/backends/claude/sdk/metadataExtractor.test.ts`:
- Around line 58-67: Add an afterAll cleanup to the describe.sequential('claude
sdk metadata extractor') block to remove the final test's temporary directory:
check the existing tmpRoot variable and, if set, call rmSync(tmpRoot, {
recursive: true, force: true }) (the same cleanup logic as in beforeEach) inside
an afterAll hook so the last test's temp directory is removed when the suite
finishes.

ℹ️ Review info

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 746c1b7 and f20fc78.

📒 Files selected for processing (2)
  • apps/cli/src/backends/claude/sdk/metadataExtractor.test.ts
  • apps/cli/src/backends/claude/sdk/metadataExtractor.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • apps/cli/src/backends/claude/sdk/metadataExtractor.ts

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants