runtime: reconnect resumed step/checkpoint trace edges in resume_run

Problem

In the conversation graph, resumed runtime artifacts are written as orphan nodes:

WF step N: client_sandbox_resume
WF checkpoint N

They are missing the normal runtime trace edges:

wf_next_step_exec
persist_checkpoint

This makes resumed runs look broken in CDC viewers even though the workflow state is otherwise correct.

Root Cause

In runtime.py, resume_run(...) currently persists the resumed step and checkpoint with last_exec_node=None.

That bypasses the normal edge-writing behavior already used in the main run(...) path.

The problematic calls in runtime.py are:

_persist_step_exec(... last_exec_node=None)
_persist_checkpoint(... last_exec_node=None)

An inline comment next to them even acknowledges the gap:

not strongly linked to previous right now

Expected Behavior

When resume_run(...) persists:

WF step N: client_sandbox_resume

it should attach a wf_next_step_exec edge from the previous execution node, usually:

wf_step|<run_id>|N-1

Then, when it persists:

WF checkpoint N

it should attach a persist_checkpoint edge from the resumed step node:

wf_step|<run_id>|N

This should match the runtime trace semantics of the normal non-resume execution path.

Proposed Delta

In runtime.py:

In resume_run(...), recover the previous execution node before calling _persist_step_exec(...).
    Prefer:
        wf_step|<run_id>|step_seq_current-1
    Fall back to:
        wf_run|<run_id>
Pass that node as last_exec_node into _persist_step_exec(...).
Capture the returned resumed exec node.
Pass that returned node as last_exec_node into _persist_checkpoint(...).

Conceptually:

previous_exec_node = lookup wf_step|run_id|step_seq_current-1
if not found:
    previous_exec_node = lookup wf_run|run_id

resumed_exec_node = self._persist_step_exec(
    ...,
    last_exec_node=previous_exec_node,
)

self._persist_checkpoint(
    ...,
    last_exec_node=resumed_exec_node,
)
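
The conceptual steps above can be exercised against a toy model. This is only a sketch under stated assumptions, not Kogwistar's actual code: ToyTraceStore, its persist method, resolve_previous_exec_node, and resume are hypothetical names, and the in-memory store stands in for the real conversation graph. The node ID shapes (wf_step, wf_run, wf_ckpt) and edge labels are taken from this issue.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class ToyTraceStore:
    """Hypothetical in-memory stand-in for the conversation graph."""

    nodes: Set[str] = field(default_factory=set)
    edges: List[Tuple[str, str, str]] = field(default_factory=list)  # (label, src, dst)

    def persist(self, node_id: str, edge_label: str,
                last_exec_node: Optional[str]) -> str:
        """Write a node; write the trace edge only if a predecessor is given."""
        self.nodes.add(node_id)
        if last_exec_node is not None:
            self.edges.append((edge_label, last_exec_node, node_id))
        return node_id


def resolve_previous_exec_node(store: ToyTraceStore, run_id: str,
                               step_seq_current: int) -> Optional[str]:
    """Prefer the previous step node, fall back to the run node."""
    preferred = f"wf_step|{run_id}|{step_seq_current - 1}"
    if preferred in store.nodes:
        return preferred
    fallback = f"wf_run|{run_id}"
    return fallback if fallback in store.nodes else None


def resume(store: ToyTraceStore, run_id: str, step_seq_current: int) -> str:
    """Persist the resumed step and its checkpoint with connected edges."""
    previous_exec_node = resolve_previous_exec_node(store, run_id, step_seq_current)
    resumed_exec_node = store.persist(
        f"wf_step|{run_id}|{step_seq_current}",
        "wf_next_step_exec",
        last_exec_node=previous_exec_node,
    )
    store.persist(
        f"wf_ckpt|{run_id}|{step_seq_current}",
        "persist_checkpoint",
        last_exec_node=resumed_exec_node,
    )
    return resumed_exec_node
```

In this toy model, passing last_exec_node=None to persist reproduces the orphan behavior described under Root Cause: the node is written but no edge is.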

Why This Is Upstream

This is not bridge-specific governance logic. It is a generic workflow runtime resume trace issue in Kogwistar itself.

Any product using:

resume_run(...)
conversation runtime trace
CDC / graph viewers

can hit the same orphaned resumed-step/checkpoint problem.

Regression Test To Add

In test_workflow_suspend_resume.py, extend the existing suspend/resume test to assert that after resume:

the resumed step has:
    wf_next_step_exec|<run_id>|2|last::wf_step|<run_id>|1|to::wf_step|<run_id>|2
the resumed checkpoint has:
    persist_checkpoint|<run_id>|2|last::wf_step|<run_id>|2|to::wf_ckpt|<run_id>|2

That pins the invariant:

resumed runtime trace must be connected just like normal runtime trace
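
One possible shape for the assertion helper is sketched below. It assumes the edge ID layout generalizes from the two examples in this issue (label, run ID, step sequence, then last:: and to:: node IDs). The function names are hypothetical, and how the test collects the observed edge IDs from the graph is left to the existing test fixtures.

```python
from typing import Dict, Iterable, List


def expected_resume_edge_ids(run_id: str, step_seq: int) -> Dict[str, str]:
    """Build the two edge IDs a resumed step at step_seq should produce.

    Assumes the previous execution node is wf_step|<run_id>|step_seq-1,
    i.e. the preferred (non-fallback) case described in this issue.
    """
    prev_step = f"wf_step|{run_id}|{step_seq - 1}"
    step = f"wf_step|{run_id}|{step_seq}"
    ckpt = f"wf_ckpt|{run_id}|{step_seq}"
    return {
        "wf_next_step_exec": (
            f"wf_next_step_exec|{run_id}|{step_seq}|last::{prev_step}|to::{step}"
        ),
        "persist_checkpoint": (
            f"persist_checkpoint|{run_id}|{step_seq}|last::{step}|to::{ckpt}"
        ),
    }


def missing_resume_edges(observed_edge_ids: Iterable[str], run_id: str,
                         step_seq: int) -> List[str]:
    """Return the expected edge IDs that are absent, sorted for stable output."""
    observed = set(observed_edge_ids)
    expected = expected_resume_edge_ids(run_id, step_seq).values()
    return sorted(edge_id for edge_id in expected if edge_id not in observed)
```

In the extended test, an assertion such as `assert missing_resume_edges(edge_ids, run_id, 2) == []` would fail with the list of missing edge IDs, which makes an orphaned resumed step or checkpoint easy to spot in the failure message.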

User-visible Impact

Without the fix:

CDC shows resumed workflow nodes as orphans
the trace looks semantically broken

With the fix:

resumed steps/checkpoints stay connected
workflow trace in CDC is continuous and understandable
