Fix- Multipart parameter processing file dicts.#140
Conversation
…Go to step in runner.
There was a problem hiding this comment.
Pull request overview
This PR fixes multipart handling for “file dict” payloads, improves CLI/log output safety when binary data is present, and updates Arazzo workflow execution to support goto to a specific step (instead of always advancing sequentially).
Changes:
- Recognize multipart file dicts (
{content, file_name/filename}) and avoid accidental JSON serialization of those structures. - Add output sanitization/truncation to reduce terminal/log pollution from binary/large payloads.
- Implement step-level
gotohandling by introducing apending_goto_step_idexecution-state flag.
Reviewed changes
Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| runner/tests/fixtures/bnpl/test_config.yaml | Updates expected API call count for BNPL fixture to reflect goto behavior. |
| runner/arazzo_runner/runner.py | Adds JSON-sanitization for callback outputs and implements pending goto step selection logic. |
| runner/arazzo_runner/models.py | Adds pending_goto_step_id to workflow execution state. |
| runner/arazzo_runner/http.py | Improves multipart file field detection (file_name + legacy filename) and validates file content type. |
| runner/arazzo_runner/executor/parameter_processor.py | Prevents file dicts from being serialized in multipart payloads; adds embedded expression substitution in string params. |
| runner/arazzo_runner/main.py | Truncates large/binary outputs when printing workflow results to the terminal. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
char0n
left a comment
There was a problem hiding this comment.
Overall Assessment
This PR bundles three distinct concerns into one: (1) multipart file dict handling, (2) GOTO step navigation, (3) terminal output sanitization. I'd recommend splitting these into separate PRs for easier review and safer
rollback, but I'll review each part on its merits.
1. GOTO Step Logic (runner.py, models.py) — Has bugs
Critical: The GOTO step implementation has a structural issue with for/else.
In runner.py, the GOTO handling in the ActionType.GOTO branch with step_id uses a for/else construct, but there are return statements placed after the else block. The flow appears to be:
for idx, step in enumerate(steps):
if step.get("stepId") == target_step_id:
next_step_idx = idx
break
state.current_step_id = steps[next_step_idx].get("stepId")
else:
logger.warning(...)
state.current_step_id = step_id
return { "status": STEP_COMPLETE, ... } # This runs when target IS found — wrong status
state.pending_goto_step_id = steps[next_step_idx].get("stepId")
state.current_step_id = step_id
return { "status": GOTO_STEP, ... } # This is unreachableThe return STEP_COMPLETE after the for/else will execute when the target step is found (via break), returning STEP_COMPLETE instead of GOTO_STEP. And the code below it (pending_goto_step_id assignment and
GOTO_STEP return) is unreachable. This means the GOTO logic effectively never works as intended.
Suggested fix: The logic should be:
target_step_id = next_action["step_id"]
found = False
for idx, step in enumerate(steps):
if step.get("stepId") == target_step_id:
next_step_idx = idx
found = True
break
if found:
state.pending_goto_step_id = steps[next_step_idx].get("stepId")
state.current_step_id = step_id # step we just completed
return {
"status": WorkflowExecutionStatus.GOTO_STEP,
"step_id": state.pending_goto_step_id,
}
else:
logger.warning(f"GOTO target step '{target_step_id}' not found")
# Fall through to CONTINUE behaviorAlso: The pending_goto_step_id approach introduces stateful coupling between execute_next_step calls. Consider whether it's simpler to just set state.current_step_id to the step before the target (so the
existing sequential logic picks up the right step), or directly set current_step_id to the target and add a flag to skip the "advance to next" logic. The current two-field approach (current_step_id +
pending_goto_step_id) is fragile — if anything resets state between calls, the GOTO is lost.
models.py: The pending_goto_step_id field docstring is placed as a string literal above the field — this looks like it's intended as a comment but will be interpreted as a standalone expression (not a docstring for
the field, since dataclass fields don't have docstrings). Use a # comment instead.
2. Multipart File Dict Handling (parameter_processor.py, http.py) — Mostly good, some issues
parameter_processor.py — _process_multipart_payload:
- ✅ Good: Detecting pre-formatted file dicts with
content+file_name/filenameto avoid double-serialization. ⚠️ Issue: The PR introducesfile_name(underscore) as the "preferred" key while the existing codebase usesfilename(no underscore). This creates an inconsistency. The_rehydrate_blob_referencemethod at line 71 still
produces"filename", andhttp.pypreviously only checked"filename". Now both must be supported everywhere. I'd recommend picking one canonical key and normalizing at a single boundary rather than supporting both
throughout.⚠️ Issue: The newelif isinstance(value, bytes | bytearray)block (wrapping binary data) uses"file_name"key, but the existing code block just above it (line 98-104 in the current file) uses"filename". This will
be confusing.
http.py multipart handling:
- ✅ Good: Adding validation that file content is actually
bytes/bytearraybefore upload, with clear error messages. - ✅ The
has_file_namevariable extraction is clean. ⚠️ Minor: The fallback for rawbytes | bytearray(lines 223-225) was already present in the current codebase (elif isinstance(value, bytes | bytearray)at line 198 of currenthttp.py). Verify this isn't a duplicate.
3. Output Sanitization (__main__.py, runner.py) — Good idea, implementation notes
__main__.py — _truncate_for_display:
- ✅ Good approach for CLI readability.
⚠️ Issue:_OMIT_CONTENT_KEYShardcodes specific key names (htmlContent,fileContent,body,content,responseBody,data). The key"body"and"data"are extremely common and will be truncated even when they
contain short, useful values (e.g.,{"data": "ok"}). Consider only truncating when the value actually exceedsmax_len, not unconditionally based on key name.- The function handles
bytesandstrbut doesn't handle other non-serializable types (e.g., custom objects). Not critical but worth noting.
runner.py — _sanitize_for_json:
- ✅ Good: prevents
json.dumpsfailures in event callbacks. ⚠️ Issue: Only applied tostep_completeevents. If binary data flows throughworkflow_completeor other events, the same crash will occur. Should be applied uniformly to all events withoutputsin kwargs.
4. Embedded Expression Substitution (parameter_processor.py)
The new elif re.search(r"\$inputs\.\w+|\$steps\.\w+", value) block in prepare_parameters:
⚠️ The regex\$steps\.\w+only matches one segment after$steps.— it won't match$steps.myStep.outputs.myField. This needs to be\$steps\.\w+(?:\.\w+)*or similar.⚠️ replace_embeddedreturns""forNonevalues silently. This could mask bugs. At minimum, log a warning.- There's potential overlap with the existing
" $"handler above it. Document when each branch triggers.
5. generate_env_mappings normalization (runner.py)
✅ The change to accept a single Arazzo doc dict (not just a list) is a good usability improvement. The duck-typing check isinstance(arazzo_docs, dict) and "workflows" in arazzo_docs is reasonable.
This PR introduces a fix to incorrect serialisation of file dicts within multipart payload processing. Previously the file dict was being treated as a generic dict in case handling and serialised incorrectly. This caused failure in file upload API calls with file dicts e.g. file: { content, file_name } and incorrect parameter structure for the final request. File dicts for file upload API calls are now recognised correctly and pass through processing without unnecessary serialisation.