Separating Reasoning from Execution in Agent Systems
Most agent systems merge decision-making and execution into a single loop.
That design obscures failure modes:
- You cannot tell whether a failure came from bad reasoning or bad tool execution
- You cannot evaluate planning quality independently of outcomes
- You cannot safely add memory, replanning, or observability without compounding ambiguity
Analysis consolidated in rag-systems-foundations and extended in agent-tool-retriever showed that control decisions must be explicit and inspectable.
This repository enforces a hard architectural boundary:
Planning decides what should happen. Execution decides how it happens.
No component is allowed to violate that separation.
Reasoning must be inspectable without running tools. Execution must be debuggable without re-reasoning.
If either condition fails, the system is incorrectly designed.
The system is composed of three explicit layers, each with a single responsibility.
User
↓
Runtime (orchestration only)
↓
Planner → Executor
Responsibility: Decide what should happen.
The planner:
- Consumes: the user question
- Produces: a structured, machine-readable Plan
- Guarantees: no tool calls, no execution, no side effects
The planner is evaluated only on plan quality, not on task success.
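A plan of this shape can be modeled with plain dataclasses. The following is a hypothetical sketch of what plan_schema.py's Plan / PlanStep definitions might look like; the field names are taken from the example output below, everything else is an assumption:

```python
from dataclasses import dataclass, field
from typing import Any

@dataclass
class PlanStep:
    step_id: int
    action: str                      # e.g. "retrieve" or "noop"
    args: dict[str, Any] = field(default_factory=dict)
    rationale: str = ""              # why the planner chose this action

@dataclass
class Plan:
    objective: str
    steps: list[PlanStep] = field(default_factory=list)

# Construct the example plan shown in this README
plan = Plan(
    objective="What BLEU score did Vaswani et al. report for EN-DE translation?",
    steps=[
        PlanStep(
            step_id=1,
            action="retrieve",
            args={"question": "...", "k": 4},
            rationale="source-bound factual request",
        )
    ],
)
```

Because the plan is a pure data object, it can be serialized, logged, and scored without ever touching a tool.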
Example planner output:
{
  "objective": "What BLEU score did Vaswani et al. report for EN–DE translation?",
  "steps": [
    {
      "step_id": 1,
      "action": "retrieve",
      "args": {"question": "...", "k": 4},
      "rationale": "source-bound factual request"
    }
  ]
}
Responsibility: Execute the plan exactly as written.
The executor:
- Consumes: the planner’s plan
- Performs: only the tool calls specified in the plan
- Produces: a raw execution trace
Constraints are strict:
- ❌ no replanning
- ❌ no goal reinterpretation
- ❌ no hidden reasoning
The executor is evaluated on faithful adherence, not intelligence.
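A faithful executor is mostly a dispatch loop. This is a minimal sketch, not the repository's executor.py; the stand-in tool registry and the fake retriever are assumptions for illustration:

```python
from typing import Any, Callable

def fake_retrieve(question: str, k: int) -> list[str]:
    # Stand-in for a real retriever tool
    return [f"doc-{i}" for i in range(k)]

TOOLS: dict[str, Callable[..., Any]] = {
    "retrieve": fake_retrieve,
    "noop": lambda **_: None,
}

def execute(plan: dict) -> list[dict]:
    """Run only the steps the plan specifies; never replan or reinterpret."""
    trace = []
    for step in plan["steps"]:
        action = step["action"]
        if action not in TOOLS:
            # Surface the planner's bad assumption instead of working around it
            trace.append({"step_id": step["step_id"],
                          "error": f"unknown tool {action!r}"})
            continue
        result = TOOLS[action](**step.get("args", {}))
        trace.append({"step_id": step["step_id"],
                      "action": action, "result": result})
    return trace

trace = execute({"steps": [{"step_id": 1, "action": "retrieve",
                            "args": {"question": "...", "k": 2}}]})
```

Note that an unknown tool produces an error entry in the trace rather than a recovery attempt: execution failures stay visible and attributable to the planner.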
Responsibility: Orchestrate, not decide.
The runtime:
- Calls the planner
- Passes the plan to the executor
- Assembles retrieved context (if any)
- Delegates answer generation
- Writes structured traces
It is the only layer allowed to:
- serialize objects
- assemble outputs
- write logs
The runtime contains no decision logic.
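The runtime's job can be sketched as pure wiring. This hypothetical sketch (not the repository's run.py) shows a runtime that only sequences the layers and writes a trace; the lambda stand-ins for planner, executor, and generator are assumptions:

```python
import json

def run(question: str, plan_fn, execute_fn, generate_fn, log: list) -> str:
    """Orchestrate the layers; make no decisions of its own."""
    plan = plan_fn(question)               # planner: decide what should happen
    steps = execute_fn(plan)               # executor: do exactly that
    answer = generate_fn(question, steps)  # answer generation (opaque)
    log.append(json.dumps({"question": question, "plan": plan,
                           "execution": steps, "answer": answer}))
    return answer

log: list[str] = []
answer = run(
    "What is 2 + 2?",
    plan_fn=lambda q: {"steps": []},      # stand-in planner: emits a noop plan
    execute_fn=lambda p: [],              # nothing to execute
    generate_fn=lambda q, steps: "4",     # stand-in generator
    log=log,
)
```

Every branch-free line here is the point: if the runtime ever needs an `if` that inspects the question, that decision belongs in the planner instead.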
agent-planner-executor/
├── planner/
│ ├── planner.py # Pure plan generation
│ ├── plan_schema.py # Plan / PlanStep definitions
│ └── __init__.py
│
├── executor/
│ ├── executor.py # Faithful plan execution
│ └── __init__.py
│
├── runtime/
│ ├── run.py # Orchestration only
│ └── __init__.py
│
├── response/
│ └── generate.py # Answer generation (opaque)
│
├── utils/
│ └── logging_utils.py # Trace writing
│
├── logs/
│ └── traces.jsonl # Structured execution traces
│
└── main.py # Thin entrypoint
Each module exists for a single reason. No module performs work outside its assigned role.
This system was evaluated using the same question set as agent-tool-retriever.
Observed behavior:
- Final answers are identical
- Retrieval decisions are unchanged
- The planner emits `noop` for parametric questions and `retrieve` for source-dependent questions
- The executor executes plans faithfully
The only change is observability.
This demonstrates that the planner / executor split is an architectural refactor, not a behavioral modification.
Each run produces a structured trace containing:
- the user question
- the planner’s full plan
- executor step-by-step execution
- the final answer
This makes previously invisible distinctions explicit:
- parametric vs evidence-based answers
- intentional retrieval skips
- exact tool outputs used downstream
Monolithic agent loops collapse these signals. This system preserves them.
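Because traces are structured JSONL, these distinctions can be recovered mechanically. A hypothetical sketch, assuming trace records shaped like the contents listed above (the exact field names in traces.jsonl may differ):

```python
import json

# Two illustrative trace lines in a traces.jsonl-like format
records = [
    '{"question": "Capital of France?", '
    '"plan": {"steps": [{"action": "noop"}]}, "answer": "Paris"}',
    '{"question": "What BLEU score did Vaswani et al. report?", '
    '"plan": {"steps": [{"action": "retrieve"}]}, "answer": "..."}',
]

def used_retrieval(record: dict) -> bool:
    """Did the plan for this run include a retrieve step?"""
    return any(s["action"] == "retrieve" for s in record["plan"]["steps"])

parsed = [json.loads(line) for line in records]
parametric = [r["question"] for r in parsed if not used_retrieval(r)]
evidence_based = [r["question"] for r in parsed if used_retrieval(r)]
```

In a monolithic loop this classification would require re-running the agent; here it is a log query.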
This architecture is designed to expose, not mask, failures such as:
- logically valid plans that are operationally impossible
- over- or under-retrieval decisions
- incorrect planner assumptions about tools
- silent execution drift from intended plans
In a single-loop agent, these failures are indistinguishable. Here, they are isolatable.
Deliberately excluded:
- memory or persistence
- replanning or self-correction
- reflection loops
- multi-agent coordination
- tool optimization or learning
These features are unsafe to add until reasoning and execution are separable.
In real systems:
- compilers are separate from runtimes
- query planners are separate from operators
- schedulers are separate from workers
Agent systems should follow the same discipline.
This repository treats agents as a systems engineering problem, not a prompt-design exercise.
This architecture is a prerequisite for agent-memory-systems, where persistence and recall are added without collapsing reasoning, control, and execution into a single loop.
This repository is for readers who prioritize:
- debuggability over demos
- architecture over cleverness
- failure analysis over happy paths
If you want a chatbot, look elsewhere. If you want to understand how agent systems actually break, this is the right place.