Summary
A Coder Hub sandbox session failed to resume. The Hub surfaced a generic error, but a direct manual resume via the Agentuity CLI reproduced a platform-side 409 with a more specific message:
Cannot resume sandbox: it is suspended but has no checkpoint data
This looks inconsistent with the sandbox lifecycle history, which shows prior evacuation/checkpoint activity for the same sandbox, including one event with a concrete checkpoint id. The expectation is that this sandbox should be resumable, or at minimum the platform state should be self-consistent.
Affected IDs
- Hub session: codesess_938b33a6bf93
- Sandbox: sbx_aa2c0c5d2c92b74e8f890ad57ca57f454ad8b52f05461d0f0ad23263e88c
- Driver job: job_d9f56dde7973877928a81271
What I checked
Hub side
- Hub session row still exists and is paused.
- In-memory runtime is gone, which is expected after a paused session is evicted.
- Hub sandbox tracker latched this error: "Sandbox resume failed while platform status is suspended (HTTP 409 conflict). This looks like a platform checkpoint/resume failure."
- Replay / DB activity shows the last persisted assistant reply completed cleanly; no new user_prompt, turn_start, task, or tool activity was recorded after that (rough query sketch below).
- This means the failed prompt never made it into a live turn; the failure happened during wake/resume, before agent execution.
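For reference, the rough shape of that replay/DB check is below; the table and column names are hypothetical placeholders, not the actual Hub schema.

# Hypothetical sketch of the replay/DB check. session_events, event_kind,
# and created_at are placeholder names; the real Hub schema may differ.
psql "$HUB_DB_URL" -c "
  SELECT event_kind, created_at
  FROM session_events
  WHERE session_id = 'codesess_938b33a6bf93'
    AND event_kind IN ('user_prompt', 'turn_start', 'task', 'tool')
  ORDER BY created_at DESC
  LIMIT 20;"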
Platform side
Current sandbox status: suspended
Manual resume via CLI:
agentuity cloud sandbox resume sbx_aa2c0c5d2c92b74e8f890ad57ca57f454ad8b52f05461d0f0ad23263e88c \
--org-id org_2u8RgDTwcZWrZrZ3sZh24T5FCtz --json
Result:
{
  "error": {
    "code": "API_ERROR",
    "message": "Cannot resume sandbox: it is suspended but has no checkpoint data",
    "exitCode": 14,
    "details": {
      "tag": "APIErrorResponse",
      "status": 409,
      "sessionId": "sess_c3233f64db553bfd2dfa6d9cc4fcb095"
    }
  }
}
Sandbox status remained suspended after the manual resume attempt.
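A minimal repro sketch that pulls the specific platform message out of the CLI's JSON error; this assumes the error JSON is written to stdout, which I have not verified across CLI versions:

# Re-run the manual resume and surface the platform-side message.
# Assumes the --json error shape shown above arrives on stdout.
agentuity cloud sandbox resume \
  sbx_aa2c0c5d2c92b74e8f890ad57ca57f454ad8b52f05461d0f0ad23263e88c \
  --org-id org_2u8RgDTwcZWrZrZ3sZh24T5FCtz --json \
  | jq -r '"HTTP \(.error.details.status): \(.error.message)"'

Given the response above, that yields: HTTP 409: Cannot resume sandbox: it is suspended but has no checkpoint data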
Relevant sandbox lifecycle timeline
All times UTC on 2026-04-03.
- 14:55:42Z sandbox created / started
- 14:55:44Z tracked driver job created and reported running
- 15:17:19Z lifecycle:suspended + evacuation:state-update(status=suspended, evacuation_phase=checkpoint-received)
- 15:18:23Z another suspend event emitted with:
  checkpoint_id=ckpt_fc6d480abf57f5f3
  checkpoint_bucket=ago-d066e3-checkpoints
  checkpoint_size=65519891
- 15:19:21Z lifecycle:reconcile(previous_status=suspended)
- 15:23:07Z lifecycle:resumed
- 15:24:03Z another evacuation/suspend sequence:
  lifecycle:suspended(phase=pre-suspend, suspension_reason=evacuation)
  evacuation:state-update(status=suspended, evacuation_phase=checkpoint-received)
  another lifecycle:suspended(phase=pre-suspend, suspension_reason=evacuation)
Additional suspicious signals
- The platform still reports the original tracked driver job as running, with no completion or replacement job.
- A direct GET /sandbox/:id succeeds normally.
- A direct GET /sandbox/checkpoints/:id?orgId=... timed out after 12s with zero bytes returned (repro sketch below).
- I am not over-claiming on this one, but that timeout smells related given the resume error.
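A repro sketch for that checkpoint fetch. The base URL and auth header are assumptions (placeholders); the path shape and orgId parameter come from the failing request:

# Probe the checkpoint endpoint with the same 12s budget.
# BASE and the Authorization header are assumed placeholders, and I am
# assuming :id here is the sandbox id rather than a checkpoint id.
BASE=https://api.agentuity.example
curl -sS --max-time 12 \
  -H "Authorization: Bearer $AGENTUITY_API_KEY" \
  "$BASE/sandbox/checkpoints/sbx_aa2c0c5d2c92b74e8f890ad57ca57f454ad8b52f05461d0f0ad23263e88c?orgId=org_2u8RgDTwcZWrZrZ3sZh24T5FCtz" \
  -o /dev/null \
  -w 'HTTP %{http_code}, %{size_download} bytes in %{time_total}s\n'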
Why this looks wrong
The platform is simultaneously telling us:
- the sandbox is suspended
- earlier lifecycle events included checkpoint-received
- one suspend emitted a concrete checkpoint id
- a manual resume now fails because the sandbox allegedly has no checkpoint data
That combination should not happen in a healthy checkpoint/resume flow.
Expected behavior
One of these should be true:
- the sandbox resumes successfully from its latest checkpoint, or
- the sandbox transitions into a terminal/invalid state with a consistent reason and the checkpoint/event surfaces agree about the missing checkpoint, or
- the resume endpoint returns a more precise failure mode tied to the actual checkpoint object that is missing/corrupt/unreadable (illustrative shape below).
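For the third option, an illustrative shape of a more precise 409; this is entirely hypothetical, not an actual platform response, and only reuses the checkpoint id and bucket already present in the timeline:

{
  "error": {
    "code": "CHECKPOINT_UNREADABLE",
    "message": "Checkpoint ckpt_fc6d480abf57f5f3 exists in metadata but could not be read from ago-d066e3-checkpoints",
    "details": { "status": 409, "checkpointId": "ckpt_fc6d480abf57f5f3" }
  }
}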
Notes
Coder Hub currently collapses any resume 409 that leaves the sandbox in the suspended state into a single generic message, so the direct CLI/manual resume result above is the most useful raw signal I found.
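A sketch of the distinction the Hub could draw before collapsing, written as a jq filter over the CLI's --json error output. The field paths match the response above; the message test and the resume_error.json filename are my own placeholders, not existing Hub logic:

# Distinguish the "no checkpoint data" 409 from other resume conflicts.
# Field paths match the CLI error above; the message test is a heuristic.
jq -r 'if .error.details.status == 409
         and (.error.message | test("no checkpoint data"))
       then "platform checkpoint missing: \(.error.message)"
       else "resume conflict: \(.error.message)"
       end' resume_error.json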