Skip to content

[Engine] Native python step type — eliminate bash-wrapped Python programs #84

@ameet

Description

@ameet

Problem

Many workflows need data transformation logic that's too complex for JavaScript code steps but doesn't warrant a separate service. Today, the only option is embedding Python inside bash steps:

{
  "id": "analyzeData",
  "type": "bash",
  "bash": {
    "command": "python3 -c \"\nimport json, sys\nfrom datetime import datetime, timedelta\n\ndays = {{$.input.days}}\ncutoff = (datetime.now() - timedelta(days=days)).isoformat()\n\nruns = []\ntry:\n    with open('data.jsonl') as f:\n        for line in f:\n            try:\n                r = json.loads(line.strip())\n                if r.get('ts', '') >= cutoff:\n                    runs.append(r)\n            except: pass\nexcept: pass\n\nsectors = {}\nfor r in runs:\n    if r.get('sector'):\n        sectors[r['sector']] = sectors.get(r['sector'], 0) + 1\n\nprint(json.dumps({'count': len(runs), 'sectors': sectors}))\n\"",
    "timeout": 15000,
    "parseJson": true
  }
}

This pattern appears in 6+ flows in our codebase, with embedded Python ranging from 40-100+ lines.

Problems with bash-wrapped Python

  1. Escaping hell — double quotes must be escaped as \", template interpolation {{...}} inside Python strings is fragile, triple-quotes (''') can be injected if interpolated values contain them
  2. No syntax validation — Python syntax errors only surface at runtime; static validators can't easily parse Python from inside JSON strings
  3. No line numbers — when Python errors occur, the traceback references python3 -c "<string>" with no useful line numbers
  4. Template interpolation risks{{$.input.companyName}} injected into Python code breaks if the value contains quotes or special characters
  5. IDE support — zero syntax highlighting, no autocomplete, no linting for embedded Python

Proposal

Add a native Python step type:

{
  "id": "analyzeData",
  "type": "python",
  "python": {
    "source": "import json\nfrom datetime import datetime, timedelta\n\ndays = inputs['days']\n...\nreturn {'count': len(runs), 'sectors': sectors}",
    "module": "steps/analyze_data.py",
    "timeout": 15000
  }
}

Where:

  • source is inline Python (like code steps today)
  • module references an external .py file (like code step modules)
  • inputs dict provides access to $.input.*
  • steps dict provides access to $.steps.*
  • Return value becomes the step output (auto-serialized to JSON)
  • Template interpolation happens on the inputs, not inside the Python source

Why this matters

Python is the natural language for data transformation, file parsing, API orchestration, and ML pipelines. Teams that use One CLI for these workflows are forced into the most fragile pattern in the platform. A native step type would:

  • Eliminate escaping issues entirely
  • Enable static validation (ast.parse on the source)
  • Support external module files for better IDE tooling
  • Separate template interpolation from code execution

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions