FlowSteer addresses three critical challenges in agentic workflow orchestration: high manual cost, reliance on specific operators and LLMs, and sparse reward signals. It is an end-to-end reinforcement learning (RL) framework that pairs a lightweight policy model (the agent) with an executable canvas environment, automating workflow orchestration through multi-turn interaction. In each turn, the policy model analyzes execution states and selects editing actions, while the canvas executes operators and returns feedback for iterative refinement.
By integrating Canvas Workflow Relative Policy Optimization (CWRPO) with diversity-constrained rewards, FlowSteer offers a plug-and-play framework that supports diverse operator libraries and interchangeable LLM backends.
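At a high level, each episode alternates between the policy proposing a workflow edit and the canvas executing it and returning feedback. The sketch below illustrates that loop only; `CanvasEnv`, `apply`, and `propose_action` are hypothetical stand-ins, not FlowSteer's actual interfaces.

```python
# Illustrative multi-turn orchestration loop; all names here are
# hypothetical stand-ins for FlowSteer's actual API.

class CanvasEnv:
    """Toy canvas: a workflow is a list of operator names."""
    def __init__(self, max_turns=4):
        self.workflow, self.max_turns, self.turn = [], max_turns, 0

    def apply(self, action):
        # Execute the edit, then return (feedback, done).
        self.workflow.append(action)
        self.turn += 1
        done = self.turn >= self.max_turns or action == "STOP"
        return {"state": list(self.workflow)}, done

def propose_action(feedback):
    # Stand-in for the policy model: choose the next editing action
    # from the observed execution state.
    n = len(feedback["state"])
    ops = ["generate", "review", "revise"]
    return ops[n] if n < len(ops) else "STOP"

env = CanvasEnv()
feedback, done = {"state": []}, False
while not done:
    feedback, done = env.apply(propose_action(feedback))
print(feedback["state"])  # the trajectory of editing actions
```

During training, each such trajectory is scored by the reward model and used to update the policy.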
- End-to-End RL Training: Learns workflow orchestration through real execution feedback
- Plug-and-Play Design: Supports diverse operator libraries and interchangeable LLM backends
- CWRPO Algorithm: Novel training algorithm with diversity-constrained rewards and conditional release
- Multi-Turn Interaction: Iteratively builds and refines workflows through canvas environment
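One way to read "diversity-constrained rewards" is that trajectories whose edits collapse onto a few repeated actions earn less than the raw task reward alone would suggest. The sketch below is only an illustration of that idea (an entropy-based bonus), not CWRPO's actual reward formula.

```python
import math

def diversity_bonus(actions):
    """Normalized entropy of the action distribution in a trajectory:
    ~1.0 when all actions differ, 0.0 when one action repeats."""
    if len(actions) <= 1:
        return 0.0
    counts = {}
    for a in actions:
        counts[a] = counts.get(a, 0) + 1
    probs = [c / len(actions) for c in counts.values()]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(actions))

def constrained_reward(task_reward, actions, weight=0.2):
    # Blend the execution-based task reward with the diversity term.
    return task_reward + weight * diversity_bonus(actions)

print(constrained_reward(1.0, ["add", "add", "add"]))   # 1.0: repeats, no bonus
print(constrained_reward(1.0, ["add", "edit", "run"]))  # ~1.2: all distinct
```

The weight on the diversity term is a hypothetical knob here; in CWRPO the constraint is combined with conditional release during training.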
```
flowsteer/
├── train_interactive.py        # Main training script
├── eval_only.py                # Evaluation script
├── merge_and_upload.py         # Model merge and upload utility
├── config/
│   ├── training_interactive.yaml   # Training configuration
│   ├── datasets.yaml               # Dataset configuration
│   └── operator.json               # Operator definitions
├── data/
│   ├── train_balanced_12k_humaneval36_fixed.jsonl   # Training data
│   └── test_balanced_768_no_overlap.jsonl           # Test data
├── scripts/
│   ├── operators.py            # Workflow operators
│   └── prompts/                # Prompt templates
└── src/
    └── interactive/            # Core interactive training modules
        ├── workflow_env.py         # Workflow environment
        ├── cwrpo_trainer.py        # CWRPO trainer
        ├── action_parser.py        # Action parsing
        └── trajectory_reward.py    # Reward computation
```
- Python 3.10+
- CUDA 11.8+
- GPU: NVIDIA A100 80GB (recommended) or RTX 3090 24GB
- Memory: 64GB+
Create the environment and install dependencies:

```bash
conda create -n flowsteer python=3.10 -y
conda activate flowsteer
pip install -r requirements.txt
pip install "vllm>=0.6.0"
```

Download the base model:

```bash
# Using huggingface
huggingface-cli download Qwen/Qwen3-8B

# Or using modelscope
pip install modelscope
python -c "from modelscope import snapshot_download; snapshot_download('Qwen/Qwen3-8B')"
```

Start the vLLM server:

```bash
CUDA_VISIBLE_DEVICES=0 python -m vllm.entrypoints.openai.api_server \
    --model /path/to/Qwen3-8B \
    --served-model-name Qwen3-8B \
    --port 8003 \
    --gpu-memory-utilization 0.85 \
    --max-model-len 16384 \
    --enable-lora \
    --max-loras 2 \
    --max-lora-rank 64 \
    --trust-remote-code \
    --dtype bfloat16
```

Run training:

```bash
CUDA_VISIBLE_DEVICES=2 python train_interactive.py \
    --config config/training_interactive.yaml
```

Run evaluation:

```bash
python eval_only.py --config config/training_interactive.yaml \
    --checkpoint checkpoints/interactive/checkpoint_step_100
```

Edit `config/training_interactive.yaml`:

```yaml
# Model configuration
base_model: "/path/to/Qwen3-8B"
vllm_base_url: "http://localhost:8003/v1"

# Training parameters
learning_rate: 1.0e-5
samples_per_source: 6
max_steps: 300

# LoRA configuration
use_lora: true
lora_rank: 64
lora_alpha: 64
```

This project is for research purposes.
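The configuration values above can be sanity-checked before a run by loading them with PyYAML (a minimal sketch; the real `training_interactive.yaml` may contain more fields than shown here):

```python
import yaml  # PyYAML; assumed available alongside the project's dependencies

# The same fields shown in the configuration snippet above.
CONFIG = """
base_model: "/path/to/Qwen3-8B"
vllm_base_url: "http://localhost:8003/v1"
learning_rate: 1.0e-5
samples_per_source: 6
max_steps: 300
use_lora: true
lora_rank: 64
lora_alpha: 64
"""

cfg = yaml.safe_load(CONFIG)
# Basic type and range checks before launching training.
assert isinstance(cfg["learning_rate"], float) and cfg["learning_rate"] > 0
assert cfg["use_lora"] is True and cfg["lora_rank"] > 0
print(cfg["base_model"], cfg["max_steps"])
```

Note that PyYAML only parses scientific notation such as `1.0e-5` as a float when the mantissa has a decimal point; `1e-5` would load as a string.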


