A minimal agent architecture that cleanly separates planning from execution, making reasoning, failures, and tool use fully inspectable.

Arnav-Ajay/agent-planner-executor

agent-planner-executor

Separating Reasoning from Execution in Agent Systems


Why This Repository Exists

Most agent systems merge decision-making and execution into a single loop.

That design obscures failure modes:

  • You cannot tell whether a failure came from bad reasoning or bad tool execution
  • You cannot evaluate planning quality independently of outcomes
  • You cannot safely add memory, replanning, or observability without compounding ambiguity

Earlier analysis, consolidated in rag-systems-foundations and extended in agent-tool-retriever, showed that control decisions must be explicit and inspectable.

This repository enforces a hard architectural boundary:

Planning decides what should happen. Execution decides how it happens.

No component is allowed to violate that separation.


Core Design Principle

Reasoning must be inspectable without running tools. Execution must be debuggable without re-reasoning.

If either condition fails, the system is incorrectly designed.


System Architecture

The system is composed of three explicit layers, each with a single responsibility.

User
  ↓
Runtime (orchestration only)
  ↓
Planner → Executor

Planner

Responsibility: Decide what should happen.

The planner:

  • Consumes:

    • the user question
  • Produces:

    • a structured, machine-readable Plan
  • Guarantees:

    • no tool calls
    • no execution
    • no side effects

The planner is evaluated only on plan quality, not on task success.

Example planner output:

{
  "objective": "What BLEU score did Vaswani et al. report for EN–DE translation?",
  "steps": [
    {
      "step_id": 1,
      "action": "retrieve",
      "args": {"question": "...", "k": 4},
      "rationale": "source-bound factual request"
    }
  ]
}
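
The plan shape above could be captured with a small schema. The following is a minimal sketch, not the repository's actual plan_schema.py or planner.py: the field names mirror the example output, while `make_plan` and the `SOURCE_CUES` keyword heuristic are hypothetical stand-ins for whatever decision rule the real planner uses.

```python
from dataclasses import asdict, dataclass
from typing import Any


@dataclass(frozen=True)
class PlanStep:
    step_id: int
    action: str              # "retrieve" or "noop"
    args: dict[str, Any]
    rationale: str


@dataclass(frozen=True)
class Plan:
    objective: str
    steps: tuple[PlanStep, ...]


# Hypothetical cue list; an assumption made for illustration only.
SOURCE_CUES = ("report", "according to", "paper", "et al.")


def make_plan(question: str, k: int = 4) -> Plan:
    """Pure function: builds a plan, calls no tools, has no side effects."""
    needs_source = any(cue in question.lower() for cue in SOURCE_CUES)
    if needs_source:
        step = PlanStep(1, "retrieve", {"question": question, "k": k},
                        "source-bound factual request")
    else:
        step = PlanStep(1, "noop", {}, "answerable from parametric knowledge")
    return Plan(objective=question, steps=(step,))


plan = make_plan("What BLEU score did Vaswani et al. report for EN-DE translation?")
plan_as_dict = asdict(plan)  # plain data, ready for the runtime to serialize
```

Because the planner returns frozen, plain data, a plan can be reviewed or diffed without any tool being invoked.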

Executor

Responsibility: Execute the plan exactly as written.

The executor:

  • Consumes:

    • the planner’s plan
  • Performs:

    • only the tool calls specified in the plan
  • Produces:

    • a raw execution trace

Constraints are strict:

  • ❌ no replanning
  • ❌ no goal reinterpretation
  • ❌ no hidden reasoning

The executor is evaluated on faithful adherence, not intelligence.
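
One way to meet these constraints is an executor that only dispatches on the plan's `action` field and records what happened. This is a sketch under assumptions: `execute_plan`, the tool registry, and the trace record fields (`status`, `output`, `error`) are hypothetical names, and the choice to record a failure and continue rather than halt is an illustrative one.

```python
from typing import Any, Callable

ToolFn = Callable[..., Any]


def execute_plan(plan: dict[str, Any], tools: dict[str, ToolFn]) -> list[dict[str, Any]]:
    """Run each step exactly as written and return a raw trace.

    No replanning: a failed or unknown step is recorded, never repaired.
    """
    trace = []
    for step in plan["steps"]:
        record = {"step_id": step["step_id"], "action": step["action"],
                  "args": step["args"]}
        if step["action"] == "noop":
            record.update(status="skipped", output=None)
        elif step["action"] not in tools:
            record.update(status="error", output=None, error="unknown action")
        else:
            try:
                output = tools[step["action"]](**step["args"])
                record.update(status="ok", output=output)
            except Exception as exc:  # surface the failure in the trace
                record.update(status="error", output=None, error=repr(exc))
        trace.append(record)
    return trace


def fake_retrieve(question: str, k: int) -> list[str]:
    """Stand-in tool used only for this example."""
    return [f"chunk-{i}" for i in range(k)]


demo_plan = {
    "objective": "demo",
    "steps": [
        {"step_id": 1, "action": "retrieve",
         "args": {"question": "demo", "k": 2}, "rationale": "example"},
        {"step_id": 2, "action": "summarize", "args": {}, "rationale": "example"},
    ],
}
trace = execute_plan(demo_plan, {"retrieve": fake_retrieve})
```

The second step names a tool the executor does not have. It is logged as an error rather than silently substituted, which keeps a planner mistake visible as a planner mistake.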


Runtime

Responsibility: Orchestrate, not decide.

The runtime:

  • Calls the planner
  • Passes the plan to the executor
  • Assembles retrieved context (if any)
  • Delegates answer generation
  • Writes structured traces

It is the only layer allowed to:

  • serialize objects
  • assemble outputs
  • write logs

The runtime contains no decision logic.
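
The orchestration steps above can be sketched as a single function that wires the other layers together. The signatures here are assumptions, not the contents of run.py: the planner, executor, and answer generator are passed in as callables, and the stubs exist only to make the example runnable.

```python
from typing import Any, Callable

Plan = dict[str, Any]
Trace = list[dict[str, Any]]


def run(question: str,
        planner: Callable[[str], Plan],
        executor: Callable[[Plan], Trace],
        generate: Callable[[str, list[str]], str]) -> dict[str, Any]:
    """Orchestrate one request; no branching on what the plan says."""
    plan = planner(question)
    trace = executor(plan)
    # Assemble retrieved context, if any step produced output.
    context = [chunk for record in trace for chunk in (record.get("output") or [])]
    answer = generate(question, context)
    return {"question": question, "plan": plan,
            "execution": trace, "answer": answer}


# Stubs for illustration only.
def stub_planner(question: str) -> Plan:
    return {"objective": question,
            "steps": [{"step_id": 1, "action": "noop", "args": {},
                       "rationale": "parametric"}]}


def stub_executor(plan: Plan) -> Trace:
    return [{"step_id": s["step_id"], "action": s["action"], "output": None}
            for s in plan["steps"]]


def stub_generate(question: str, context: list[str]) -> str:
    return f"answer from {len(context)} chunks"


result = run("What is 2 + 2?", stub_planner, stub_executor, stub_generate)
```

Injecting the three callables keeps the runtime free of planner or tool knowledge, so each layer can be swapped or tested in isolation.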


Repository Structure

agent-planner-executor/
├── planner/
│   ├── planner.py       # Pure plan generation
│   ├── plan_schema.py   # Plan / PlanStep definitions
│   └── __init__.py
│
├── executor/
│   ├── executor.py      # Faithful plan execution
│   └── __init__.py
│
├── runtime/
│   ├── run.py           # Orchestration only
│   └── __init__.py
│
├── response/
│   └── generate.py      # Answer generation (opaque)
│
├── utils/
│   └── logging_utils.py # Trace writing
│
├── logs/
│   └── traces.jsonl     # Structured execution traces
│
└── main.py              # Thin entrypoint

Each module exists for a single reason. No module performs work outside its assigned role.


Behavioral Equivalence

This system was evaluated using the same question set as agent-tool-retriever.

Observed behavior:

  • Final answers are identical

  • Retrieval decisions are unchanged

  • Planner emits:

    • noop for parametric questions
    • retrieve for source-dependent questions
  • Executor executes plans faithfully

The only change is observability.

On this question set, the planner / executor split therefore behaves as an architectural refactor, not a behavioral modification.


Traces as First-Class Outputs

Each run produces a structured trace containing:

  • the user question
  • the planner’s full plan
  • executor step-by-step execution
  • the final answer

This makes previously invisible distinctions explicit:

  • parametric vs evidence-based answers
  • intentional retrieval skips
  • exact tool outputs used downstream

Monolithic agent loops collapse these signals. This system preserves them.
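
A trace with these four fields fits naturally into one JSON object per line, matching the traces.jsonl file in the repository layout. The sketch below is an assumption about how such a file could be written and read back. `append_trace` and `read_traces` are hypothetical helpers, and the record contents are invented for the example.

```python
import json
import tempfile
from pathlib import Path
from typing import Any


def append_trace(path: Path, record: dict[str, Any]) -> None:
    """Append one run as a single JSON line."""
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_traces(path: Path) -> list[dict[str, Any]]:
    """Load every non-empty line back into a dict."""
    with path.open(encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "logs" / "traces.jsonl"
    append_trace(log_path, {
        "question": "What is 2 + 2?",
        "plan": {"objective": "What is 2 + 2?",
                 "steps": [{"step_id": 1, "action": "noop", "args": {},
                            "rationale": "parametric"}]},
        "execution": [{"step_id": 1, "status": "skipped", "output": None}],
        "answer": "4",
    })
    append_trace(log_path, {
        "question": "demo source question",
        "plan": {"objective": "demo source question",
                 "steps": [{"step_id": 1, "action": "retrieve",
                            "args": {"question": "demo source question", "k": 1},
                            "rationale": "source-bound"}]},
        "execution": [{"step_id": 1, "status": "ok", "output": ["chunk-0"]}],
        "answer": "demo answer",
    })
    traces = read_traces(log_path)

# An intentional retrieval skip is directly visible in the stored plan.
skipped = [t["question"] for t in traces
           if all(s["action"] == "noop" for s in t["plan"]["steps"])]
```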


Failure Modes Made Visible

This architecture is designed to expose, not mask, failures such as:

  • logically valid plans that are operationally impossible
  • over- or under-retrieval decisions
  • incorrect planner assumptions about tools
  • silent execution drift from intended plans

In a single-loop agent, these failures are indistinguishable. Here, each can be isolated to the layer that caused it.
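
As one illustration, silent execution drift can be checked by comparing the plan against the execution trace. This is a hedged sketch, not a tool shipped with the repository. `find_drift` is a hypothetical helper, and it assumes each trace record echoes the `step_id`, `action`, and `args` it actually ran.

```python
from typing import Any


def find_drift(plan: dict[str, Any], trace: list[dict[str, Any]]) -> list[str]:
    """Report steps that were planned but not run as written, and vice versa."""
    planned = [(s["step_id"], s["action"], s["args"]) for s in plan["steps"]]
    executed = [(r["step_id"], r["action"], r["args"]) for r in trace]
    issues = []
    for step in planned:
        if step not in executed:
            issues.append(f"step {step[0]} planned but not executed as written")
    for step in executed:
        if step not in planned:
            issues.append(f"step {step[0]} executed but not in plan")
    return issues


plan = {"objective": "demo",
        "steps": [{"step_id": 1, "action": "retrieve",
                   "args": {"question": "demo", "k": 4}, "rationale": "example"}]}

faithful_trace = [{"step_id": 1, "action": "retrieve",
                   "args": {"question": "demo", "k": 4}}]
drifted_trace = [{"step_id": 1, "action": "retrieve",
                  "args": {"question": "demo", "k": 8}}]  # k changed silently

no_issues = find_drift(plan, faithful_trace)
drift_issues = find_drift(plan, drifted_trace)
```

A check like this is only possible because the plan exists as data before execution starts. In a single loop there is no prior artifact to compare against.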


What This System Does Not Include

Deliberately excluded:

  • memory or persistence
  • replanning or self-correction
  • reflection loops
  • multi-agent coordination
  • tool optimization or learning

These features are unsafe to add until reasoning and execution are separable.


Why This Matters

In real systems:

  • compilers are separate from runtimes
  • query planners are separate from operators
  • schedulers are separate from workers

Agent systems should follow the same discipline.

This repository treats agents as a systems engineering problem, not a prompt-design exercise.

This architecture is a prerequisite for agent-memory-systems, where persistence and recall are added without collapsing reasoning, control, and execution into a single loop.


Intended Audience

This repository is for readers who prioritize:

  • debuggability over demos
  • architecture over cleverness
  • failure analysis over happy paths

If you want a chatbot, look elsewhere. If you want to understand how agent systems actually break, this is the right place.

