feat(heartbeat): automatic diagnostic retry for failed runs#1

Merged
Noesis-Boss merged 1 commit into master from diagnostic-retry on Apr 23, 2026

@Noesis-Boss (Owner)
Summary

When a run fails due to an adapter error (network timeout, rate limit, API error, etc.), Paperclip now automatically classifies the failure and can retry it up to 2 times using the heartbeat service as a diagnostic agent.

What changed

New column: diagnosticRetryCount

  • Added to heartbeat_runs via migration 0046_curious_deadpool.sql
  • Tracks how many diagnostic retries have been attempted for a run
  • Max 2 retries: count=0 on the original attempt, count=1 on the first retry, count=2 on the second retry, after which no further retries are enqueued

New function: maybeEnqueueDiagnosticRetry in heartbeat.ts

After every failed run in finalizeCompletedRun, this function:

  1. Checks if diagnosticRetryCount < 2
  2. Classifies the failure by parsing error, exitCode, and signal
  3. If canRetry = true (transient errors only), enqueues a diagnostic retry run
  4. Updates diagnosticRetryCount on the new run
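The four steps above can be sketched roughly as follows. This is a minimal illustration, not the PR's actual code: the `FailedRun` shape, the `classify` and `enqueueRun` callbacks, and the return value are all assumptions.

```typescript
// Hypothetical sketch of the diagnostic-retry gate run after a failed run.
// Field names and callback signatures are illustrative assumptions.
const MAX_DIAGNOSTIC_RETRIES = 2;

interface FailedRun {
  id: string;
  diagnosticRetryCount: number;
  error: string | null;
  exitCode: number | null;
  signal: string | null;
}

function maybeEnqueueDiagnosticRetry(
  run: FailedRun,
  classify: (
    error: string | null,
    exitCode: number | null,
    signal: string | null,
  ) => { canRetry: boolean },
  enqueueRun: (diagnosticRetryCount: number) => void,
): boolean {
  // Step 1: stop once two diagnostic retries have already been attempted.
  if (run.diagnosticRetryCount >= MAX_DIAGNOSTIC_RETRIES) return false;
  // Steps 2-3: classify the failure; only transient errors are retried.
  const { canRetry } = classify(run.error, run.exitCode, run.signal);
  if (!canRetry) return false;
  // Step 4: the new run carries an incremented diagnosticRetryCount.
  enqueueRun(run.diagnosticRetryCount + 1);
  return true;
}
```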

Failure classification (classifyFailure)

Each failure is categorized and marked retryable or not; only transient categories are retried:

| Category | Reason |
| --- | --- |
| process_killed | SIGKILL/SIGTERM |
| rate_limit | API rate limit |
| timeout | Network timeout |
| network_error | Connection reset, EOF errors |
| eof_error | Empty response |
| not_found | Missing file |
| permission_denied | Permissions issue |
| oom | Out of memory |
| unknown | Unclassified |
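A classifier over these categories could look like the sketch below. The string-matching heuristics and the per-category `canRetry` flags are assumptions inferred from the "transient errors only" rule, not the PR's exact implementation.

```typescript
// Hypothetical sketch of classifyFailure; matching rules and retryable
// flags are assumptions, not the PR's actual logic.
type FailureCategory =
  | "process_killed" | "rate_limit" | "timeout" | "network_error"
  | "eof_error" | "not_found" | "permission_denied" | "oom" | "unknown";

interface FailureClassification {
  category: FailureCategory;
  canRetry: boolean; // true only for transient failures
}

function classifyFailure(
  error: string | null,
  exitCode: number | null,
  signal: string | null,
): FailureClassification {
  const msg = (error ?? "").toLowerCase();
  if (signal === "SIGKILL" || signal === "SIGTERM")
    return { category: "process_killed", canRetry: true }; // assumed transient (external kill)
  if (exitCode === 137 || msg.includes("out of memory"))
    return { category: "oom", canRetry: false };
  if (msg.includes("rate limit") || msg.includes("429"))
    return { category: "rate_limit", canRetry: true };
  if (msg.includes("timeout") || msg.includes("etimedout"))
    return { category: "timeout", canRetry: true };
  if (msg.includes("econnreset") || msg.includes("socket hang up"))
    return { category: "network_error", canRetry: true };
  if (msg.includes("unexpected eof") || msg.includes("empty response"))
    return { category: "eof_error", canRetry: true };
  if (msg.includes("enoent") || msg.includes("not found"))
    return { category: "not_found", canRetry: false };
  if (msg.includes("eacces") || msg.includes("permission denied"))
    return { category: "permission_denied", canRetry: false };
  return { category: "unknown", canRetry: false };
}
```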

No changes for process_lost

The existing processLossRetryCount retry mechanism is unchanged.

Testing

  • Migration applied, server restarts successfully
  • TypeScript compiles (pre-existing upstream errors unrelated to this PR)

- Add diagnosticRetryCount column to heartbeat_runs table
- Add maybeEnqueueDiagnosticRetry function that runs after each failed run
- Diagnostic agent analyzes error (signal, exitCode, error message) and classifies failure type
- Retry is enqueued as a new run with context from the failed run (task, session, workspace)
- Max 2 diagnostic retries per failed run before giving up
- Includes migration for diagnostic_retry_count column
- Future upstream updates: rebase origin/master and re-apply migration if needed
Noesis-Boss merged commit 96a1bbf into master on Apr 23, 2026
1 of 3 checks passed
Noesis-Boss deleted the diagnostic-retry branch on April 23, 2026 at 10:06
