Problem
In the conversation graph, resumed runtime artifacts are written as orphan nodes:
WF step N: client_sandbox_resume
WF checkpoint N
They are missing the normal runtime trace edges:
wf_next_step_exec
persist_checkpoint
This makes resumed runs look broken in CDC viewers even though the workflow state is otherwise correct.
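In a CDC viewer, a node is orphaned when no edge touches it. The sketch below is a minimal illustration, not Kogwistar code. It assumes the trace can be flattened into node IDs plus (kind, source, target) edge tuples, and it borrows the node ID shapes from the regression test expectations. The find_orphans helper and the sample run "r1" are hypothetical.

```python
from collections.abc import Iterable


def find_orphans(nodes: Iterable[str], edges: Iterable[tuple[str, str, str]]) -> set[str]:
    """Return node IDs that no edge touches (what a viewer would draw as orphans)."""
    # Collect every node that appears as the source or target of any edge.
    touched: set[str] = set()
    for _kind, src, dst in edges:
        touched.add(src)
        touched.add(dst)
    return set(nodes) - touched


# Hypothetical trace of run "r1": suspended after step 1, resumed at step 2.
nodes = ["wf_run|r1", "wf_step|r1|1", "wf_ckpt|r1|1", "wf_step|r1|2", "wf_ckpt|r1|2"]
edges = [
    ("wf_next_step_exec", "wf_run|r1", "wf_step|r1|1"),
    ("persist_checkpoint", "wf_step|r1|1", "wf_ckpt|r1|1"),
    # The resumed step and checkpoint were persisted with last_exec_node=None,
    # so no edges exist for them.
]

print(sorted(find_orphans(nodes, edges)))  # ['wf_ckpt|r1|2', 'wf_step|r1|2']
```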
Root Cause
In runtime.py, resume_run(...) currently persists the resumed step and checkpoint with last_exec_node=None.
That bypasses the normal edge-writing behavior already used in the main run(...) path.
The problematic calls in runtime.py are:
_persist_step_exec(... last_exec_node=None)
_persist_checkpoint(... last_exec_node=None)
The code even carries an inline comment acknowledging this:
not strongly linked to previous right now
Expected Behavior
When resume_run(...) persists:
WF step N: client_sandbox_resume
it should attach a wf_next_step_exec edge from the previous execution node, usually:
wf_step|<run_id>|N-1
Then when it persists:
WF checkpoint N
it should attach a persist_checkpoint edge from the resumed step node:
wf_step|<run_id>|N
The resumed trace should then follow the same runtime trace semantics as the normal, non-resume execution path.
Proposed Delta
In runtime.py:
In resume_run(...), recover the previous execution node before calling _persist_step_exec(...).
Prefer:
wf_step|<run_id>|step_seq_current-1
Fall back to:
wf_run|<run_id>
Pass that node as last_exec_node into _persist_step_exec(...).
Capture the returned resumed exec node.
Pass that returned node as last_exec_node into _persist_checkpoint(...).
Conceptually:
previous_exec_node = lookup wf_step|<run_id>|step_seq_current-1
if not found:
    previous_exec_node = lookup wf_run|<run_id>
resumed_exec_node = self._persist_step_exec(
    ...,
    last_exec_node=previous_exec_node,
)
self._persist_checkpoint(
    ...,
    last_exec_node=resumed_exec_node,
)
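The conceptual delta can be exercised end to end against a toy in-memory store. Everything here is hypothetical: ToyRuntime, _lookup and _add_edge are stand-ins, and the real _persist_step_exec(...) and _persist_checkpoint(...) take more arguments (the elided ...). The sketch assumes, as the Root Cause section describes, that an edge is written only when last_exec_node is not None. It formats edge IDs after the pattern pinned in the regression test.

```python
from typing import Optional


class ToyRuntime:
    """In-memory stand-in for the runtime; NOT Kogwistar's real classes or signatures."""

    def __init__(self) -> None:
        self.nodes: set[str] = set()
        self.edges: set[str] = set()

    def _lookup(self, node_id: str) -> Optional[str]:
        # Hypothetical lookup: return the ID if that node was persisted, else None.
        return node_id if node_id in self.nodes else None

    def _add_edge(self, kind: str, run_id: str, seq: int, src: Optional[str], dst: str) -> None:
        # No predecessor means no edge, which is exactly how orphans appear.
        if src is not None:
            self.edges.add(f"{kind}|{run_id}|{seq}|last::{src}|to::{dst}")

    def _persist_step_exec(self, run_id: str, seq: int, last_exec_node: Optional[str]) -> str:
        node = f"wf_step|{run_id}|{seq}"
        self.nodes.add(node)
        self._add_edge("wf_next_step_exec", run_id, seq, last_exec_node, node)
        return node  # returned so the caller can chain the checkpoint to it

    def _persist_checkpoint(self, run_id: str, seq: int, last_exec_node: Optional[str]) -> str:
        node = f"wf_ckpt|{run_id}|{seq}"
        self.nodes.add(node)
        self._add_edge("persist_checkpoint", run_id, seq, last_exec_node, node)
        return node

    def resume_run(self, run_id: str, step_seq_current: int) -> None:
        # Prefer the previous step node; fall back to the run node.
        previous_exec_node = self._lookup(f"wf_step|{run_id}|{step_seq_current - 1}")
        if previous_exec_node is None:
            previous_exec_node = self._lookup(f"wf_run|{run_id}")
        resumed_exec_node = self._persist_step_exec(
            run_id, step_seq_current, last_exec_node=previous_exec_node
        )
        self._persist_checkpoint(run_id, step_seq_current, last_exec_node=resumed_exec_node)


rt = ToyRuntime()
rt.nodes.update({"wf_run|r1", "wf_step|r1|1"})  # state left behind before suspension
rt.resume_run("r1", 2)
for edge_id in sorted(rt.edges):
    print(edge_id)
# persist_checkpoint|r1|2|last::wf_step|r1|2|to::wf_ckpt|r1|2
# wf_next_step_exec|r1|2|last::wf_step|r1|1|to::wf_step|r1|2
```

With wf_step|r1|1 present, resuming at step 2 writes both edges the regression test expects. If that node is absent, the same code links the resumed step to wf_run|<run_id> instead.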
Why This Is Upstream
This is not bridge-specific governance logic. It is a generic workflow runtime resume trace issue in Kogwistar itself.
Any product using:
resume_run(...)
conversation runtime trace
CDC / graph viewers
can hit the same orphaned resumed-step/checkpoint problem.
Regression Test To Add
In test_workflow_suspend_resume.py, extend the existing suspend/resume test to assert that after resume:
the resumed step has:
wf_next_step_exec|<run_id>|2|last::wf_step|<run_id>|1|to::wf_step|<run_id>|2
the resumed checkpoint has:
persist_checkpoint|<run_id>|2|last::wf_step|<run_id>|2|to::wf_ckpt|<run_id>|2
That pins the invariant:
resumed runtime trace must be connected just like normal runtime trace
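One possible shape for the new assertions, assuming the existing test can obtain the set of edge IDs in the conversation graph after resume. How it reads them is left to that test. resume_trace_edge_ids and assert_resume_trace_connected are hypothetical helper names. The ID format is copied from the two expected edges listed above.

```python
from collections.abc import Collection


def resume_trace_edge_ids(run_id: str, step: int) -> tuple[str, str]:
    """Build the two edge IDs the regression test pins for a resume at `step`."""
    prev_step = f"wf_step|{run_id}|{step - 1}"
    step_node = f"wf_step|{run_id}|{step}"
    ckpt_node = f"wf_ckpt|{run_id}|{step}"
    return (
        f"wf_next_step_exec|{run_id}|{step}|last::{prev_step}|to::{step_node}",
        f"persist_checkpoint|{run_id}|{step}|last::{step_node}|to::{ckpt_node}",
    )


def assert_resume_trace_connected(edge_ids: Collection[str], run_id: str, step: int = 2) -> None:
    """Fail with a readable message if either resume-trace edge is missing."""
    missing = [e for e in resume_trace_edge_ids(run_id, step) if e not in edge_ids]
    assert not missing, f"resumed trace is orphaned; missing edges: {missing}"
```

After resume completes, the test would call assert_resume_trace_connected(edge_ids, run_id). Listing the missing IDs makes a regression read as an orphaned trace rather than a bare assertion failure.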
User-visible Impact
Without the fix:
CDC shows resumed workflow nodes as orphans
the trace looks semantically broken
With the fix:
resumed steps/checkpoints stay connected
workflow trace in CDC is continuous and understandable