Skip to content

Fix- Multipart parameter processing file dicts.#140

Merged
sophie-jentic merged 8 commits intomainfrom
Multipart-payload-file-dict-fix,-sanitize-terminal-output
Mar 12, 2026
Merged

Fix- Multipart parameter processing file dicts.#140
sophie-jentic merged 8 commits intomainfrom
Multipart-payload-file-dict-fix,-sanitize-terminal-output

Conversation

@sophie-jentic
Copy link
Copy Markdown
Member

@sophie-jentic sophie-jentic commented Mar 10, 2026

This PR introduces a fix to incorrect serialisation of file dicts within multipart payload processing. Previously the file dict was being treated as a generic dict in case handling and serialised incorrectly. This caused failure in file upload API calls with file dicts e.g. file: { content, file_name } and incorrect parameter structure for the final request. File dicts for file upload API calls are now recognised correctly and pass through processing without unnecessary serialisation.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes multipart handling for “file dict” payloads, improves CLI/log output safety when binary data is present, and updates Arazzo workflow execution to support goto to a specific step (instead of always advancing sequentially).

Changes:

  • Recognize multipart file dicts ({content, file_name/filename}) and avoid accidental JSON serialization of those structures.
  • Add output sanitization/truncation to reduce terminal/log pollution from binary/large payloads.
  • Implement step-level goto handling by introducing a pending_goto_step_id execution-state flag.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 6 comments.

Show a summary per file
File Description
runner/tests/fixtures/bnpl/test_config.yaml Updates expected API call count for BNPL fixture to reflect goto behavior.
runner/arazzo_runner/runner.py Adds JSON-sanitization for callback outputs and implements pending goto step selection logic.
runner/arazzo_runner/models.py Adds pending_goto_step_id to workflow execution state.
runner/arazzo_runner/http.py Improves multipart file field detection (file_name + legacy filename) and validates file content type.
runner/arazzo_runner/executor/parameter_processor.py Prevents file dicts from being serialized in multipart payloads; adds embedded expression substitution in string params.
runner/arazzo_runner/main.py Truncates large/binary outputs when printing workflow results to the terminal.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Copy Markdown
Member

@char0n char0n left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall Assessment

This PR bundles three distinct concerns into one: (1) multipart file dict handling, (2) GOTO step navigation, (3) terminal output sanitization. I'd recommend splitting these into separate PRs for easier review and safer
rollback, but I'll review each part on its merits.


1. GOTO Step Logic (runner.py, models.py) — Has bugs

Critical: The GOTO step implementation has a structural issue with for/else.

In runner.py, the GOTO handling in the ActionType.GOTO branch with step_id uses a for/else construct, but there are return statements placed after the else block. The flow appears to be:

for idx, step in enumerate(steps):
    if step.get("stepId") == target_step_id:
        next_step_idx = idx
        break
    state.current_step_id = steps[next_step_idx].get("stepId")
else:
    logger.warning(...)
    state.current_step_id = step_id

return { "status": STEP_COMPLETE, ... }  # This runs when target IS found — wrong status

state.pending_goto_step_id = steps[next_step_idx].get("stepId")
state.current_step_id = step_id

return { "status": GOTO_STEP, ... }      # This is unreachable

The return STEP_COMPLETE after the for/else will execute when the target step is found (via break), returning STEP_COMPLETE instead of GOTO_STEP. And the code below it (pending_goto_step_id assignment and
GOTO_STEP return) is unreachable. This means the GOTO logic effectively never works as intended.

Suggested fix: The logic should be:

target_step_id = next_action["step_id"]
found = False
for idx, step in enumerate(steps):
    if step.get("stepId") == target_step_id:
        next_step_idx = idx
        found = True
        break

if found:
    state.pending_goto_step_id = steps[next_step_idx].get("stepId")
    state.current_step_id = step_id  # step we just completed
    return {
        "status": WorkflowExecutionStatus.GOTO_STEP,
        "step_id": state.pending_goto_step_id,
    }
else:
    logger.warning(f"GOTO target step '{target_step_id}' not found")
    # Fall through to CONTINUE behavior

Also: The pending_goto_step_id approach introduces stateful coupling between execute_next_step calls. Consider whether it's simpler to just set state.current_step_id to the step before the target (so the
existing sequential logic picks up the right step), or directly set current_step_id to the target and add a flag to skip the "advance to next" logic. The current two-field approach (current_step_id +
pending_goto_step_id) is fragile — if anything resets state between calls, the GOTO is lost.

models.py: The pending_goto_step_id field docstring is placed as a string literal above the field — this looks like it's intended as a comment but will be interpreted as a standalone expression (not a docstring for
the field, since dataclass fields don't have docstrings). Use a # comment instead.


2. Multipart File Dict Handling (parameter_processor.py, http.py) — Mostly good, some issues

parameter_processor.py_process_multipart_payload:

  • ✅ Good: Detecting pre-formatted file dicts with content + file_name/filename to avoid double-serialization.
  • ⚠️ Issue: The PR introduces file_name (underscore) as the "preferred" key while the existing codebase uses filename (no underscore). This creates an inconsistency. The _rehydrate_blob_reference method at line 71 still
    produces "filename", and http.py previously only checked "filename". Now both must be supported everywhere. I'd recommend picking one canonical key and normalizing at a single boundary rather than supporting both
    throughout.
  • ⚠️ Issue: The new elif isinstance(value, bytes | bytearray) block (wrapping binary data) uses "file_name" key, but the existing code block just above it (line 98-104 in the current file) uses "filename". This will
    be confusing.

http.py multipart handling:

  • ✅ Good: Adding validation that file content is actually bytes/bytearray before upload, with clear error messages.
  • ✅ The has_file_name variable extraction is clean.
  • ⚠️ Minor: The fallback for raw bytes | bytearray (lines 223-225) was already present in the current codebase (elif isinstance(value, bytes | bytearray) at line 198 of current http.py). Verify this isn't a duplicate.

3. Output Sanitization (__main__.py, runner.py) — Good idea, implementation notes

__main__.py_truncate_for_display:

  • ✅ Good approach for CLI readability.
  • ⚠️ Issue: _OMIT_CONTENT_KEYS hardcodes specific key names (htmlContent, fileContent, body, content, responseBody, data). The key "body" and "data" are extremely common and will be truncated even when they
    contain short, useful values (e.g., {"data": "ok"}). Consider only truncating when the value actually exceeds max_len, not unconditionally based on key name.
  • The function handles bytes and str but doesn't handle other non-serializable types (e.g., custom objects). Not critical but worth noting.

runner.py_sanitize_for_json:

  • ✅ Good: prevents json.dumps failures in event callbacks.
  • ⚠️ Issue: Only applied to step_complete events. If binary data flows through workflow_complete or other events, the same crash will occur. Should be applied uniformly to all events with outputs in kwargs.

4. Embedded Expression Substitution (parameter_processor.py)

The new elif re.search(r"\$inputs\.\w+|\$steps\.\w+", value) block in prepare_parameters:

  • ⚠️ The regex \$steps\.\w+ only matches one segment after $steps. — it won't match $steps.myStep.outputs.myField. This needs to be \$steps\.\w+(?:\.\w+)* or similar.
  • ⚠️ replace_embedded returns "" for None values silently. This could mask bugs. At minimum, log a warning.
  • There's potential overlap with the existing " $" handler above it. Document when each branch triggers.

5. generate_env_mappings normalization (runner.py)

✅ The change to accept a single Arazzo doc dict (not just a list) is a good usability improvement. The duck-typing check isinstance(arazzo_docs, dict) and "workflows" in arazzo_docs is reasonable.


@sophie-jentic sophie-jentic changed the title Fix- Multipart parameter processing. Go to arazzo step. Fix- Multipart parameter processing file dicts. Mar 12, 2026
@sophie-jentic sophie-jentic merged commit f54d6be into main Mar 12, 2026
10 checks passed
@sophie-jentic sophie-jentic deleted the Multipart-payload-file-dict-fix,-sanitize-terminal-output branch March 12, 2026 13:54
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants