
feat(workflow): add DAG orchestration engine for multi-agent workflows #73

Open
Ernest-Wu wants to merge 7 commits into HKUDS:main from Ernest-Wu:feature/workflow-dag-engine

Conversation

@Ernest-Wu

Problem

OpenHarness currently supports only single-agent loops with simple subagent spawning. There is no structured way to define complex multi-step workflows with dependencies, automatic retries, failure propagation, or execution observability.

Solution

Add a complete Workflow DAG Engine that enables multi-agent orchestration with:

  • Dependency-aware execution: Nodes run only when all dependencies complete
  • Parallel execution: Independent nodes run concurrently via topological layering
  • Automatic retry: Exponential backoff with jitter (configurable per node)
  • Failure propagation: Downstream nodes auto-skipped when upstream fails
  • Observability: JSON, Graphviz DOT, and HTML report exports
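The dependency-aware, layered scheduling described above can be sketched as follows. This is a minimal illustration of the topological-layering idea, not the engine's actual API; the function name `layer_nodes` and the dict-of-dependencies input shape are hypothetical:

```python
from collections import defaultdict, deque

def layer_nodes(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group DAG nodes into layers (Kahn-style). Nodes in the same
    layer have no dependency path between them and can run concurrently."""
    indegree = {node: len(parents) for node, parents in deps.items()}
    children = defaultdict(list)
    for node, parents in deps.items():
        for parent in parents:
            children[parent].append(node)
    ready = deque(node for node, d in indegree.items() if d == 0)
    layers: list[list[str]] = []
    while ready:
        layer = sorted(ready)  # everything currently unblocked runs together
        ready.clear()
        layers.append(layer)
        for node in layer:
            for child in children[node]:
                indegree[child] -= 1
                if indegree[child] == 0:
                    ready.append(child)
    if sum(map(len, layers)) != len(deps):
        raise ValueError("cycle detected in workflow DAG")
    return layers

# analyze and lint are independent; refactor waits on analyze; review waits on both paths
deps = {"analyze": set(), "lint": set(), "refactor": {"analyze"}, "review": {"refactor", "lint"}}
print(layer_nodes(deps))  # [['analyze', 'lint'], ['refactor'], ['review']]
```

Failure propagation falls out of the same structure: when a node in one layer fails, its transitive children (reachable via `children`) can be marked skipped instead of scheduled.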

Built-in Templates

  1. refactor — Code analysis → Refactor → Review
  2. feature-dev — Planning → Implementation → Tests → Docs
  3. test-and-docs — Run tests → Fix failures → Verify → Update docs
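A template of the kind listed above might look roughly like this. This is a hypothetical sketch of the YAML schema; field names such as `depends_on` and `max_retries`, and the `{{ var }}` interpolation style, are illustrative rather than the engine's confirmed format:

```yaml
# refactor.yaml — illustrative only; the shipped template schema may differ
name: refactor
variables:
  target_path: src/main.py
nodes:
  - id: analyze
    prompt: "Analyze {{ target_path }} and list refactoring opportunities."
  - id: refactor
    depends_on: [analyze]
    max_retries: 2
    prompt: "Apply the refactorings identified upstream to {{ target_path }}."
  - id: review
    depends_on: [refactor]
    prompt: "Review the refactored code for regressions."
```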

Changes

New Files (17)

  • src/openharness/workflow/ — Core engine (types, scheduler, executor, parser, trace, recovery)
  • src/openharness/commands/workflow.py — CLI commands
  • src/openharness/workflow/templates/*.yaml — 3 built-in templates
  • tests/test_workflow/ — 40 unit tests + 4 E2E tests (real MiniMax-M2.7 API)
  • docs/workflow-engine.md — Complete usage documentation

Modified Files (3)

  • src/openharness/cli.py — Added workflow subcommand registration
  • src/openharness/workflow/executor.py — ErrorEvent handling, restricted tool context
  • CHANGELOG.md — Added entry under Unreleased

Verification

  • uv run ruff check src tests scripts — All checks passed
  • uv run pytest -q — 546 passed, 10 skipped, 1 xfailed
  • E2E tests with real MiniMax-M2.7 API on AutoAgent workspace:
    • basic_execution: 3414 chars output, glob/grep/bash tools called
    • parallel_execution: 2 nodes run concurrently in 8.6s
    • dependency_chain: upstream results flow to downstream nodes
    • failure_propagation: error handling works correctly

CLI Usage

# List available templates
oh workflow list

# Show template details
oh workflow show refactor

# Run a workflow (dry run)
oh workflow run refactor -v target_path=src/main.py --dry-run

# Export workflow structure
oh workflow export refactor -f json
oh workflow export refactor -f html -o report.html
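The per-node retry behavior called out in the Solution section (exponential backoff with jitter) could look roughly like this. A minimal sketch only; `RetryPolicy`, `backoff_delay`, and `run_with_retry` are hypothetical names, not the engine's API:

```python
import random
import time
from dataclasses import dataclass

@dataclass
class RetryPolicy:
    max_retries: int = 3     # retry attempts after the initial failure
    base_delay: float = 1.0  # seconds before the first retry
    max_delay: float = 30.0  # cap on any single backoff interval

def backoff_delay(policy: RetryPolicy, attempt: int) -> float:
    """Exponential backoff (base * 2^attempt), capped, with full jitter."""
    capped = min(policy.max_delay, policy.base_delay * (2 ** attempt))
    return random.uniform(0.0, capped)

def run_with_retry(fn, policy: RetryPolicy):
    """Call fn, sleeping a jittered backoff between failures; re-raise on exhaustion."""
    for attempt in range(policy.max_retries + 1):
        try:
            return fn()
        except Exception:
            if attempt == policy.max_retries:
                raise
            time.sleep(backoff_delay(policy, attempt))
```

Full jitter (a uniform draw over the whole capped interval) spreads concurrent retries apart, which matters when several parallel nodes hit the same overloaded upstream API, such as the 529 responses mentioned in the commit history.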

Ernest-Wu and others added 7 commits April 8, 2026 11:09
Add a complete workflow DAG engine that enables multi-agent orchestration
with dependency-aware, parallel-capable node execution.

Problem:
- OpenHarness only supports single agent loops with simple subagent spawning
- No structured way to define complex multi-step workflows with dependencies
- Missing automatic retry, failure propagation, and execution observability

Changes:
- Workflow DAG scheduler with topological sort and layered parallel execution
- YAML workflow definition parser with variable interpolation
- Node execution engine integrating with the existing Agent Loop
- Automatic retry with exponential backoff and jitter
- Failure propagation: downstream nodes skipped when upstream fails
- Execution tracing with JSON, Graphviz DOT, and HTML report exports
- 3 built-in templates: refactor, feature-dev, test-and-docs
- CLI commands: oh workflow list|show|run|export
- 40 unit/integration tests covering types, scheduler, parser, and tracing

Verification:
- uv run ruff check src tests scripts (all passed)
- uv run pytest -q (546 passed, 6 skipped, 1 xfailed)
- All new code follows existing project conventions and typing style

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Add end-to-end test suite following harness-eval skill patterns:
- Uses real MiniMax M2.7 API calls (no mocks)
- Tests on unfamiliar AutoAgent workspace
- 4 test scenarios: basic execution, parallel nodes, dependency chains, failure propagation
- Validates actual tool execution and result collection

Also fixes:
- parser.py: correctly distinguish YAML content from file paths
- executor.py: properly collect AssistantTextDelta events for output
- executor.py: avoid deepcopy of QueryContext (unpickleable HTTP clients)

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Remove unused imports (ToolRegistry, tempfile)
- Fix f-string warnings without placeholders
- Mark E2E tests with @pytest.mark.skip (require real API key)
- E2E tests should be run directly with python, not via pytest

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- Fix base_url construction for OpenAI-compatible client (ensure /v1 suffix)
- Use MiniMax-M2.5 model (M2.7 returns 529 overloaded errors)
- Add ErrorEvent handling in node executor
- Clean up debug logging and temporary test files
- All 4 E2E scenarios now pass with real API calls:
  - basic_execution: 3414 chars output, glob/grep/bash tools called
  - parallel_execution: 2 nodes run concurrently
  - dependency_chain: upstream results flow to downstream nodes
  - failure_propagation: error handling works correctly

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
- MiniMax-M2.7 API is working (earlier 529 was temporary overload)
- Rewrite E2E test file to clean structure
- All 4 E2E scenarios verified with M2.7:
  - basic_execution: 3414 chars output
  - parallel_execution: 8.6s concurrent
  - dependency_chain: 118s with context passing
  - failure_propagation: proper error handling

Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
Co-authored-by: Qwen-Coder <qwen-coder@alibabacloud.com>
