Fix- Multipart parameter processing file dicts. by sophie-jentic · Pull Request #140 · jentic/arazzo-engine

sophie-jentic · 2026-03-10T16:39:48Z

This PR fixes incorrect serialisation of file dicts in multipart payload processing. Previously a file dict was handled as a generic dict and serialised, which broke file upload API calls that pass file dicts (e.g. file: { content, file_name }) and produced an incorrect parameter structure in the final request. File dicts for file upload API calls are now recognised and pass through processing without unnecessary serialisation.
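A minimal sketch of the behaviour described above, assuming a file dict is identified by a `content` key plus `file_name` (or the legacy `filename`). The helper names `is_file_dict` and `process_multipart_field` are hypothetical and are not taken from the PR's code:

```python
import json
from typing import Any

# Key names follow the PR description; the helper names are illustrative only.
FILE_NAME_KEYS = ("file_name", "filename")


def is_file_dict(value: Any) -> bool:
    """Return True for dicts shaped like {content, file_name} (or legacy filename)."""
    return (
        isinstance(value, dict)
        and "content" in value
        and any(key in value for key in FILE_NAME_KEYS)
    )


def process_multipart_field(value: Any) -> Any:
    """Pass file dicts and raw bytes through untouched; JSON-encode other containers."""
    if is_file_dict(value) or isinstance(value, (bytes, bytearray)):
        return value
    if isinstance(value, (dict, list)):
        return json.dumps(value)
    return value
```

With this shape, `{"content": b"...", "file_name": "a.txt"}` reaches the HTTP layer as a dict, while a generic dict such as `{"a": 1}` is still serialised to a JSON string.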

Copilot

Pull request overview

This PR fixes multipart handling for “file dict” payloads, improves CLI/log output safety when binary data is present, and updates Arazzo workflow execution to support goto to a specific step (instead of always advancing sequentially).

Changes:

Recognize multipart file dicts ({content, file_name/filename}) and avoid accidental JSON serialization of those structures.
Add output sanitization/truncation to reduce terminal/log pollution from binary/large payloads.
Implement step-level goto handling by introducing a pending_goto_step_id execution-state flag.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| runner/tests/fixtures/bnpl/test_config.yaml | Updates expected API call count for BNPL fixture to reflect `goto` behavior. |
| runner/arazzo_runner/runner.py | Adds JSON-sanitization for callback outputs and implements pending `goto` step selection logic. |
| runner/arazzo_runner/models.py | Adds `pending_goto_step_id` to workflow execution state. |
| runner/arazzo_runner/http.py | Improves multipart file field detection (`file_name` + legacy `filename`) and validates file content type. |
| runner/arazzo_runner/executor/parameter_processor.py | Prevents file dicts from being serialized in multipart payloads; adds embedded expression substitution in string params. |
| runner/arazzo_runner/__main__.py | Truncates large/binary outputs when printing workflow results to the terminal. |

char0n

Overall Assessment

This PR bundles three distinct concerns into one: (1) multipart file dict handling, (2) GOTO step navigation, (3) terminal output sanitization. I'd recommend splitting these into separate PRs for easier review and safer rollback, but I'll review each part on its merits.

1. GOTO Step Logic (`runner.py`, `models.py`) — Has bugs

Critical: The GOTO step implementation has a structural issue with for/else.

In runner.py, the GOTO handling in the ActionType.GOTO branch with step_id uses a for/else construct, but there are return statements placed after the else block. The flow appears to be:

for idx, step in enumerate(steps):
    if step.get("stepId") == target_step_id:
        next_step_idx = idx
        break
    state.current_step_id = steps[next_step_idx].get("stepId")
else:
    logger.warning(...)
    state.current_step_id = step_id

return { "status": STEP_COMPLETE, ... }  # This runs when target IS found — wrong status

state.pending_goto_step_id = steps[next_step_idx].get("stepId")
state.current_step_id = step_id

return { "status": GOTO_STEP, ... }      # This is unreachable

The return STEP_COMPLETE after the for/else executes when the target step is found (via break), so the caller receives STEP_COMPLETE instead of GOTO_STEP. The code below it (the pending_goto_step_id assignment and the GOTO_STEP return) is unreachable, which means the GOTO logic never works as intended.

Suggested fix: The logic should be:

target_step_id = next_action["step_id"]
found = False
for idx, step in enumerate(steps):
    if step.get("stepId") == target_step_id:
        next_step_idx = idx
        found = True
        break

if found:
    state.pending_goto_step_id = steps[next_step_idx].get("stepId")
    state.current_step_id = step_id  # step we just completed
    return {
        "status": WorkflowExecutionStatus.GOTO_STEP,
        "step_id": state.pending_goto_step_id,
    }
else:
    logger.warning(f"GOTO target step '{target_step_id}' not found")
    # Fall through to CONTINUE behavior
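A self-contained version of the suggested fix can be sketched as follows. `Status` and `State` are hypothetical stand-ins for the runner's `WorkflowExecutionStatus` and execution state, and `resolve_goto` is an illustrative name, not a function from the codebase. Returning `STEP_COMPLETE` when the target is missing is an assumption about what "fall through to CONTINUE behavior" means:

```python
import logging
from dataclasses import dataclass
from enum import Enum
from typing import Any, Optional

logger = logging.getLogger(__name__)


class Status(str, Enum):
    """Hypothetical stand-in for WorkflowExecutionStatus."""

    GOTO_STEP = "goto_step"
    STEP_COMPLETE = "step_complete"


@dataclass
class State:
    """Hypothetical stand-in for the workflow execution state."""

    current_step_id: Optional[str] = None
    pending_goto_step_id: Optional[str] = None


def resolve_goto(
    steps: list[dict[str, Any]],
    state: State,
    completed_step_id: str,
    target_step_id: str,
) -> dict[str, Any]:
    """Record a pending goto if the target exists; otherwise continue sequentially."""
    target = next((s for s in steps if s.get("stepId") == target_step_id), None)
    state.current_step_id = completed_step_id  # the step we just completed
    if target is None:
        logger.warning("GOTO target step %r not found", target_step_id)
        # Assumption: with no valid target, behave like a normal step completion.
        return {"status": Status.STEP_COMPLETE, "step_id": completed_step_id}
    state.pending_goto_step_id = target_step_id
    return {"status": Status.GOTO_STEP, "step_id": target_step_id}
```

Each branch has exactly one return, so neither status can shadow the other the way the return after the for/else did.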

Also: The pending_goto_step_id approach introduces stateful coupling between execute_next_step calls. Consider whether it's simpler to just set state.current_step_id to the step before the target (so the
existing sequential logic picks up the right step), or directly set current_step_id to the target and add a flag to skip the "advance to next" logic. The current two-field approach (current_step_id +
pending_goto_step_id) is fragile — if anything resets state between calls, the GOTO is lost.

models.py: The pending_goto_step_id field docstring is placed as a string literal above the field — this looks like it's intended as a comment but will be interpreted as a standalone expression (not a docstring for
the field, since dataclass fields don't have docstrings). Use a # comment instead.
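To illustrate the point about string literals in a dataclass body, here is a small sketch. `ExecutionState` is a hypothetical class, not the one in models.py. The bare string is evaluated and discarded: it does not become a field or the class docstring.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ExecutionState:  # hypothetical example class
    current_step_id: Optional[str] = None
    """Looks like a docstring, but this is just an expression statement."""
    # Step to jump to on the next execute_next_step call, if any.
    pending_goto_step_id: Optional[str] = None


# Only the annotated names are fields; the string literal left no trace.
field_names = [f.name for f in fields(ExecutionState)]
```

Some documentation tools do pick up such strings as attribute docs, but the interpreter itself ignores them, so a `#` comment states the intent more plainly.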

2. Multipart File Dict Handling (`parameter_processor.py`, `http.py`) — Mostly good, some issues

parameter_processor.py — _process_multipart_payload:

✅ Good: Detecting pre-formatted file dicts with content + file_name/filename to avoid double-serialization.
⚠️ Issue: The PR introduces file_name (underscore) as the "preferred" key while the existing codebase uses filename (no underscore). This creates an inconsistency: the _rehydrate_blob_reference method at line 71 still produces "filename", and http.py previously only checked "filename", so both must now be supported everywhere. I'd recommend picking one canonical key and normalizing at a single boundary rather than supporting both throughout.
⚠️ Issue: The new elif isinstance(value, bytes | bytearray) block (wrapping binary data) uses the "file_name" key, but the existing code block just above it (lines 98-104 in the current file) uses "filename". This will be confusing.
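One way to normalize at a single boundary might look like the sketch below. It assumes `file_name` is the canonical key (as the later commits in this PR settle on) and that the legacy `filename` is renamed once on entry. `normalize_file_dict` is a hypothetical helper name:

```python
from typing import Any


def normalize_file_dict(value: dict[str, Any]) -> dict[str, Any]:
    """Return a copy that uses the canonical file_name key.

    A legacy filename key is renamed; if both keys are present,
    the existing file_name wins and filename is dropped.
    """
    normalized = dict(value)  # shallow copy, the caller's dict is left untouched
    if "filename" in normalized:
        legacy = normalized.pop("filename")
        normalized.setdefault("file_name", legacy)
    return normalized
```

If every file dict passes through such a function once, downstream code only needs to check for `file_name`.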

http.py multipart handling:

✅ Good: Adding validation that file content is actually bytes/bytearray before upload, with clear error messages.
✅ The has_file_name variable extraction is clean.
⚠️ Minor: The fallback for raw bytes | bytearray (lines 223-225) was already present in the current codebase (elif isinstance(value, bytes | bytearray) at line 198 of current http.py). Verify this isn't a duplicate.
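The content validation praised above could be sketched roughly like this. `validate_file_field` and its error messages are hypothetical and are not the code in http.py; the sketch assumes a file dict carries `content` and `file_name` (or legacy `filename`):

```python
from typing import Any


def validate_file_field(field_name: str, value: dict[str, Any]) -> tuple[str, bytes]:
    """Return (file name, content bytes) for a file dict, or raise with a clear message."""
    file_name = value.get("file_name", value.get("filename"))
    if not file_name:
        raise ValueError(f"Multipart field '{field_name}': file dict has no file name")
    content = value.get("content")
    if not isinstance(content, (bytes, bytearray)):
        raise TypeError(
            f"Multipart field '{field_name}': file content must be bytes or "
            f"bytearray, got {type(content).__name__}"
        )
    return str(file_name), bytes(content)
```

Failing before the request is built keeps the error close to the workflow input that caused it.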

3. Output Sanitization (`__main__.py`, `runner.py`) — Good idea, implementation notes

__main__.py — _truncate_for_display:

✅ Good approach for CLI readability.
⚠️ Issue: _OMIT_CONTENT_KEYS hardcodes specific key names (htmlContent, fileContent, body, content, responseBody, data). The keys "body" and "data" are extremely common and will be truncated even when they contain short, useful values (e.g., {"data": "ok"}). Consider only truncating when the value actually exceeds max_len, not unconditionally based on key name.
The function handles bytes and str but doesn't handle other non-serializable types (e.g., custom objects). Not critical but worth noting.
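A length-based variant of the truncation could be sketched as below. This is an illustration under assumptions, not the PR's `_truncate_for_display`: the function name, the default `max_len`, and the placeholder formats are all hypothetical.

```python
from typing import Any


def truncate_for_display(value: Any, max_len: int = 200) -> Any:
    """Shorten long strings and summarise binary data, regardless of key name."""
    if isinstance(value, (bytes, bytearray)):
        return f"<{len(value)} bytes>"
    if isinstance(value, str):
        if len(value) <= max_len:
            return value  # short values such as "ok" are shown in full
        return value[:max_len] + f"... [{len(value) - max_len} more chars]"
    if isinstance(value, dict):
        return {key: truncate_for_display(item, max_len) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [truncate_for_display(item, max_len) for item in value]
    return value  # numbers, booleans, None and other objects are left as they are
```

Because the decision depends only on the value, `{"data": "ok"}` prints unchanged while a large `body` string is still shortened.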

runner.py — _sanitize_for_json:

✅ Good: prevents json.dumps failures in event callbacks.
⚠️ Issue: Only applied to step_complete events. If binary data flows through workflow_complete or other events, the same crash will occur. Should be applied uniformly to all events with outputs in kwargs.
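Applying the sanitisation uniformly might look like the following sketch. `sanitize_for_json` and `emit_event` are hypothetical names, and the callback signature `(event_name, **kwargs)` is an assumption about how the runner invokes event callbacks:

```python
import json
from typing import Any, Callable


def sanitize_for_json(value: Any) -> Any:
    """Replace values that json.dumps cannot encode with readable placeholders."""
    if isinstance(value, (bytes, bytearray)):
        return f"<binary: {len(value)} bytes>"
    if isinstance(value, dict):
        return {str(key): sanitize_for_json(item) for key, item in value.items()}
    if isinstance(value, (list, tuple, set)):
        return [sanitize_for_json(item) for item in value]
    if value is None or isinstance(value, (str, int, float, bool)):
        return value
    return repr(value)  # fallback for custom objects


def emit_event(callback: Callable[..., None], event_name: str, **kwargs: Any) -> None:
    """Sanitise outputs for every event type before handing them to the callback."""
    if "outputs" in kwargs:
        kwargs["outputs"] = sanitize_for_json(kwargs["outputs"])
    callback(event_name, **kwargs)
```

Routing all events through one emitter means a binary output in `workflow_complete` is handled the same way as one in `step_complete`.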

4. Embedded Expression Substitution (`parameter_processor.py`)

The new elif re.search(r"\$inputs\.\w+|\$steps\.\w+", value) block in prepare_parameters:

⚠️ The regex \$steps\.\w+ only matches one segment after $steps.: for $steps.myStep.outputs.myField it matches just the $steps.myStep prefix, not the full expression. This needs to be \$steps\.\w+(?:\.\w+)* or similar.
⚠️ replace_embedded returns "" for None values silently. This could mask bugs. At minimum, log a warning.
There's potential overlap with the existing " $" handler above it. Document when each branch triggers.
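The two regex points can be sketched together. The pattern below extends the review's suggested `\$steps\.\w+(?:\.\w+)*` to `$inputs` as well; `lookup` and `substitute_embedded` are hypothetical helpers, and resolving expressions by walking a nested dict is a simplification of real Arazzo expression evaluation:

```python
import logging
import re
from typing import Any

logger = logging.getLogger(__name__)

# Multi-segment pattern: $inputs.a or $steps.myStep.outputs.myField
EMBEDDED_EXPR = re.compile(r"\$(?:inputs|steps)\.\w+(?:\.\w+)*")


def lookup(expression: str, context: dict[str, Any]) -> Any:
    """Walk a dotted path such as $steps.myStep.outputs.myField through nested dicts."""
    node: Any = context
    for part in expression[1:].split("."):  # drop the leading "$"
        if not isinstance(node, dict) or part not in node:
            return None
        node = node[part]
    return node


def substitute_embedded(text: str, context: dict[str, Any]) -> str:
    """Replace embedded expressions, warning when one resolves to None."""

    def replace(match: re.Match) -> str:
        resolved = lookup(match.group(0), context)
        if resolved is None:
            logger.warning("Expression %s resolved to None", match.group(0))
            return ""
        return str(resolved)

    return EMBEDDED_EXPR.sub(replace, text)
```

Since each `.` must be followed by word characters to extend the match, a sentence-ending period after an expression is left in place rather than swallowed.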

5. `generate_env_mappings` normalization (`runner.py`)

✅ The change to accept a single Arazzo doc dict (not just a list) is a good usability improvement. The duck-typing check isinstance(arazzo_docs, dict) and "workflows" in arazzo_docs is reasonable.
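The duck-typing check described above can be sketched as a small normalisation step. `normalize_arazzo_docs` is a hypothetical helper name, and raising `TypeError` for other input types is an assumption rather than the PR's behaviour:

```python
from typing import Any


def normalize_arazzo_docs(arazzo_docs: Any) -> list[dict[str, Any]]:
    """Accept a single Arazzo document dict or a list of them; always return a list."""
    if isinstance(arazzo_docs, dict) and "workflows" in arazzo_docs:
        return [arazzo_docs]  # a single document was passed directly
    if isinstance(arazzo_docs, list):
        return arazzo_docs
    raise TypeError("expected an Arazzo document dict or a list of documents")
```

Callers can then iterate over the result without caring which form was supplied.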

Fix for file dict in multipart payload processing. Sanitised output. …

c0319ec

…Go to step in runner.

sophie-jentic requested review from Killian-Jentic and char0n March 10, 2026 16:39

Test config now uses go to steps so more API calls than previous conf…

c324494

…ig expectation.

char0n requested a review from Copilot March 11, 2026 10:23

Copilot started reviewing on behalf of char0n March 11, 2026 10:23

Copilot AI reviewed Mar 11, 2026

char0n approved these changes Mar 11, 2026

Revert changes to separate PR concerns.

a4220b6

sophie-jentic changed the title ~~Fix- Multipart parameter processing. Go to arazzo step.~~ Fix- Multipart parameter processing file dicts. Mar 12, 2026

sophie-jentic added 5 commits March 12, 2026 10:59

Revert sanitisation change for other PR.

5440f08

expression substitution resulting in none fix

e51cf82

Test config reverted. file_name canonical key

4c6861a

Param processor test. Lint

f1eb540

Use file_name as key name

471bf4d

sophie-jentic merged commit f54d6be into main Mar 12, 2026
10 checks passed

sophie-jentic deleted the Multipart-payload-file-dict-fix,-sanitize-terminal-output branch March 12, 2026 13:54

sophie-jentic merged 8 commits into main from Multipart-payload-file-dict-fix,-sanitize-terminal-output

3 participants
