diff --git a/rfcs/000-project-phases.md b/rfcs/000-project-phases.md index 1a6b0f71..379b2769 100644 --- a/rfcs/000-project-phases.md +++ b/rfcs/000-project-phases.md @@ -1,10 +1,31 @@ -# RFC: OpenEnv layering +# RFC: Design Principles and Broad Roadmap **Status**: In Review **Created**: 10/17/2025 +**Amended**: November 12, 2025 **Authors**: @Darktex, @pankit-eng, @jspisak, @zkwentz **RFC ID:** 000 +## Amendment History + +**November 12, 2025**: Added design principles, target audience, and updated roadmap to reference RFCs 005-007. + +## Design Principles + +These principles guide every decision in OpenEnv: + +1. **Minimize deltas across lifecycle**: Training → Evals → Production should use identical tool interfaces +2. **Minimize human-agent divergence**: Tools that work for humans should work for agents +3. **Be hands-on**: Provide ready-to-use implementations, not just specs +4. **Design for LLMs**: Optimize for context efficiency, in-distribution behavior, and token costs + +## Target Audience + +- **Environment builders**: Get reach across projects without custom adapters +- **Model builders**: Access massive inventory of environments and tools +- **Researchers**: Reproducible setups with versioned tools, rewards, and evals +- **Infrastructure engineers**: Clear contracts enabling backend optimization + ## Summary Before jumping into the actual concrete proposals, this RFC introduces how we are going to approach this problem space, what problems we want to solve with this project, and how we plan to prioritize and solve them systematically. @@ -27,9 +48,14 @@ We will group development from now till version 1.0 into three phases. In the **first phase** of this project, we will focus **exclusively** on the narrowest definition of environments, without even worrying about rewards nor evals. Instead, the focus in this phase (and in the RFCs you find in this directory) is going to be on: 1. 
Establishing a convention on what is an environment and where we draw the "environment" box (RFC 001). 2. Landing the basics of _sandboxing_, _versioning_, _binary distribution_, _dependency management_ (RFC 002). -3. Nailing our tools support through MCP (Model Context Protocol) integration for both remote and local tools (RFC 003). -4. Defining a unified action interface for all environment types (RFC 004). -5. Exploring RPC communication patterns beyond HTTP for long-running sessions (particularly for interpreted languages like Python, Bash, Ruby, etc.). Coming in an upcoming RFC. +3. Nailing our tools support through MCP (Model Context Protocol) integration: + - RFC 003: Traditional tool calling (ListToolsAction, CallToolAction) + - RFC 004: CodeAct support (agents write executable code) + - RFC 005: MCP as the universal interface (policy and rationale) +4. Establishing tool registry and distribution patterns via Hugging Face Hub (upcoming RFC). +5. Enabling production performance simulation to minimize training-production delta (RFC 006). +6. Exploring MCP protocol interception for observability (RFC 007). +7. Exploring RPC communication patterns beyond HTTP for long-running sessions (particularly for interpreted languages like Python, Bash, Ruby, etc.). Coming in an upcoming RFC. We will conclude this phase with version 0.3. diff --git a/rfcs/001-abstractions.md b/rfcs/001-abstractions.md index 630d8a49..c2a29020 100644 --- a/rfcs/001-abstractions.md +++ b/rfcs/001-abstractions.md @@ -2,9 +2,14 @@ **Status**: In Review **Created**: 10/20/2025 +**Amended**: November 12, 2025 **Authors**: @Darktex, @pankit-eng, @jspisak, @zkwentz **RFC ID:** 001 +## Amendment History + +**November 12, 2025**: Added two-interface model (MCP for agents, HTTP for operations), simulation layer clarity, event queues, state management, and "The Time Problem" section. 
+ ## Summary This document defines what we call an "Environment", what its responsibilities are, and how we expect our customers to use our environments in their systems. @@ -65,40 +70,150 @@ This is the contract that we are proposing. We feel it strikes a good balance be These are the key abstractions that we expect. Note that in this project we only implement the "Environment" abstraction under our meaning. You can map to other "agents" or "environment" abstractions by writing adapters to and from OpenEnvs. Key assumptions: -1. We separate tasks from environments. While it is a good idea to package up a dataset with an environment and evals, we expect this wrapping to be done *outside* the env box. This allows for the reuse of environments across tasks. +1. The Environment bundles everything needed for agent interaction: tools (MCP servers), sandboxing, code execution, reward computation, tasks/datasets, and evals. This packaging makes environments self-contained and reusable. 2. We hold the state of everything **external** to the agent in the Environment. For example, if your agent defines `a = 4` with an action and wants to read `a` some time in the future, the environment will persist the interpreter state and remember variable assignments. 3. We expect a _thin_ Agent abstraction around your model that holds the state of everything pertaining to your model, such as conversation history, tokenizer etc. 
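Assumption 2 can be sketched in a few lines. This is a toy illustration only (the `ToyEnv` class and its `step()` signature are hypothetical, not part of the proposed interfaces): the environment, not the agent, owns the interpreter namespace, so state set by one action survives to the next.

```python
class ToyEnv:
    """Toy sketch: the environment owns all external state."""

    def __init__(self):
        # Persisted interpreter state, per assumption 2 above.
        self._namespace = {}

    def step(self, code: str):
        # Agent code runs against the environment's namespace,
        # so assignments made in earlier steps remain visible.
        exec(code, self._namespace)
        return self._namespace.get("_result")


env = ToyEnv()
env.step("a = 4")                     # one action defines a variable
result = env.step("_result = a + 1")  # a later action can still read it
print(result)  # 5
```

The same principle carries over to files, database rows, or browser sessions: anything outside the model lives in (and is remembered by) the Environment.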
+```mermaid +flowchart TB + subgraph outer["OUTER SYSTEM (RL Training Infrastructure)"] + agent["Agent (Thin Wrapper) + + - Model/Policy + - Tokenizer + - Conversation History"] + + env["Environment (Docker Container) + + - MCP Servers + - Sandbox + - Code Execution + - Reward Pipeline + - External State + - Task/Dataset Loader + - Evals (aggregated)"] + + orchestration["RL Orchestration (Training Loop) + + - reset, step, get_state + - Simulation control + - Metrics and monitoring"] + + agent <-->|"MCP + (Tool Calls)"| env + orchestration -->|"HTTP + (Orchestration)"| env + end + + classDef agentBox fill:#e1f5ff,stroke:#333,stroke-width:2px + classDef envBox fill:#fff4e1,stroke:#333,stroke-width:2px + classDef orchBox fill:#f0f0f0,stroke:#333,stroke-width:2px + + class agent agentBox + class env envBox + class orchestration orchBox ``` -┌──────────────────────────────────────────────────────────────────────────┐ -│ OUTER SYSTEM │ -│ │ -│ ┌──────────────────┐ ┌───────────────────────────┐ │ -│ │ Dataset/Task │ │ Agent │ │ -│ │ Loader │───────────────────>│ (thin wrapper) │ │ -│ │ │ Provides task │ │ │ -│ └──────────────────┘ │ • Model/Policy │ │ -│ │ • Tokenizer │ │ -│ ┌──────────────────┐ │ • Conversation History │ │ -│ │ Evals │ └───────┬───────────────────┘ │ -│ │ (data-dependent, │ │ ^ │ -│ │ aggregated) │ │ Action │ │ -│ └──────────────────┘ │ │Observation │ -│ v │ │ -│ ┌─────────────────┴───────────┐│ -│ │ Environment ││ -│ │ ││ -│ │ • Tools (MCP) ││ -│ │ • Sandbox (Docker) ││ -│ │ • Code Execution ││ -│ │ • Reward Pipeline ││ -│ │ • External State ││ -│ │ (e.g., interpreter vars) ││ -│ └─────────────────────────────┘│ -│ │ -└──────────────────────────────────────────────────────────────────────────┘ + +**Key Interfaces:** +- **MCP (Agent ↔ Environment)**: Agent-environment tool interaction (training AND production) +- **HTTP (Orchestration ↔ Environment)**: Simulation control + operations (training AND production) + + +**Critical insight**: The Agent uses 
**MCP exclusively** to interact with the Environment. The HTTP interface is for orchestration (simulation control in training, operations in production), never for agent actions. + +## Two Interfaces, Two Purposes + +A critical insight shapes OpenEnv's architecture: **environments expose two distinct interfaces** serving fundamentally different purposes. + +**1. MCP (Agent Interface)** +- Agent ↔ Environment tool interaction +- Present in training AND production +- Operations: Tool calls (`search()`, `execute_sql()`, etc.) +- **This is the ONLY interface agents use** (see RFC 005) + +**2. HTTP (Service/Operations Interface)** +- RL Orchestration ↔ Environment control +- Present in training AND production (different purposes) +- Operations: + - Training: `reset()`, `step()`, `get_state()` (simulation control) + - Production: Health checks, metrics, logs (operations) +- **Agents NEVER access this directly** + +**Key principle**: MCP for agent actions, HTTP for orchestration. See RFC 002 for detailed specification of how these interfaces work in practice, including graceful degradation from training to production. + +**Special note**: Simulation control methods (`.reset()`, `.step()`) are **never** exposed as MCP tools. This ensures agents never learn they can reset reality—critical for safe production deployment. + +## The Time Problem: Simulation vs Production + +A critical insight that shapes our entire design: + +**Simulation Time (Training/Eval)**: +- Time only advances when we say so (via `.step()`) +- Agent can "think" for arbitrary real-world time - simulation is paused +- Environment state is frozen until agent acts +- Can reset to initial state infinitely +- Code execution blocks execute atomically from environment's perspective + +**Real Time (Production)**: +- Time flows continuously +- Events arrive on their own schedule (people get hired *now*, not when agent is ready) +- Agent must react with bounded latency +- Cannot reset (it's the real world). 
Deleting records is a one-way door.
+- No "turns" in the traditional sense - continuous stream of events
+
+**Key insight**: You can simulate production (via event queues), but you can't "productionize" simulation (can't pause reality).
+
+This temporal duality drives the need for two distinct interfaces:
+- **Simulation control** (HTTP): Reset, step, reward computation (training/eval only)
+- **Agent-environment interaction** (MCP): Tool calls (training AND production)
+
+**See RFC 006** for how we simulate production performance characteristics (latency, reliability) during training to minimize the training-production delta.
+
+## Event Queues: First-Class Abstraction
+
+Environments fall into two categories:
+
+1. **Static environments**: State only changes when the agent acts (chess, coding puzzles)
+2. **Dynamic environments**: State changes independently (database with external events, customer service)
+
+We make the event queue a **first-class abstraction**:
+- **Empty queue** = static environment
+- **Populated queue** = dynamic environment with external events
+
+```python
+class Environment:
+    def __init__(
+        self,
+        mode: str,  # "sim" or "prod"
+        mcp_servers: List[MCPServerConfig],
+        event_queue: EventQueue,  # Empty for static, populated for dynamic
+        ...
+    ):
+        self.event_queue = event_queue
+        self.mode = mode
+```
+
+The event queue delivers external events (e.g., "new employee hired", "API request received") that change the environment state independently of agent actions. This enables realistic simulation of production scenarios where the world doesn't wait for the agent.
+
+## State Management: Why It's Separate
+
+**State** is a distinct concept from both **tools** and **data**:
+
+1. **Not part of the dataset**: While datasets contain tasks, the initial state snapshot (e.g., database contents) is separate. You can have many different tasks operate on the same state snapshot!
+
+2.
**Not part of MCP tools**: Tools query and mutate state, but state itself isn't defined by MCP. MCP only deals with the interface to state.
+
+3. **Simulation-specific reset capability**: During training, we need the ability to reset state to its original snapshot. **Crucially**, the agent absolutely cannot trigger this reset—it's exclusively for the training loop via `.reset()` (HTTP). If the agent could reset state, it would learn that every error is recoverable, creating a huge training-production delta.
+
+**Example**: Database maintenance environment
+- Initial state: SQLite database with employee records
+- Agent calls `execute_sql("DELETE FROM employees")` → receives penalty in reward
+- Training loop calls `env.reset()` → database restored to initial snapshot
+- Agent learns not to delete records (because it can't undo the damage)
+
+In production, there is no reset. The agent must live with the consequences of its actions.
+
## Python Interfaces

Below are the core Python interfaces that define the contract between agents and environments.
@@ -444,7 +559,7 @@ for batch_of_tasks in dataloader:

3. **PyTorch DataLoader compatibility**: `TaskDataset` follows the PyTorch `IterableDataset` interface (implements `__iter__`), making it seamlessly compatible with PyTorch's `DataLoader` for streaming data, multiprocess loading, etc. This is ideal for sequential data access and large datasets.

-4. **Flexibility**: Environments can support both traditional tool calling (where each tool call is a separate action) and CodeAct (where an action contains code that may call multiple tools). See RFC 004 for details on unified action interface and RFC 003 for MCP integration.
+4. **Flexibility**: Environments can support both traditional tool calling (where each tool call is a separate action) and CodeAct (where an action contains code that may call multiple tools).
See RFC 005 for details on unified action interface, RFC 003 for traditional MCP integration, and RFC 004 for CodeAct. 5. **State ownership**: The Environment owns all external state (file system, interpreter state, tool outputs). The Agent owns internal state (conversation history, model hidden states, etc.). diff --git a/rfcs/002-env-spec.md b/rfcs/002-env-spec.md index 54e01184..dedeee90 100644 --- a/rfcs/002-env-spec.md +++ b/rfcs/002-env-spec.md @@ -2,9 +2,14 @@ **Status**: In Review **Created**: 10/14/2025 +**Amended**: November 12, 2025 **Authors**: @Darktex, @pankit-eng, @jspisak, @zkwentz **RFC ID:** 002 +## Amendment History + +**November 12, 2025**: Added tool duality (sim vs prod), Docker Compose patterns, positioning framework (OpenEnv vs systems built on top), and graceful degradation principles. + ## Summary An e2e framework for creating, deploying and using isolated execution environments for agentic RL training, built using Gymnasium style APIs. It provides a clean client-server architecture where environments run as FastAPI servers in Docker containers, and clients interact with them via type-safe HTTP APIs. @@ -52,6 +57,8 @@ Building execution environments for AI agents, code execution, or computational └─────────────────────────────────────────────────────────┘ ``` +**Important**: This diagram shows the **HTTP interface** used by RL orchestration for simulation control (`reset()`, `step()`, `get_state()`). The **MCP interface** for agent-environment tool interaction is separate and runs alongside (see "Graceful Degradation to Production" section below and RFC 005). + ### Core Abstractions(Already available on the master) #### 1. Environment (Server-Side) @@ -149,7 +156,7 @@ In this RFC, we want to align on four decisions that will shape the overall desi These three APIs establish the minimum viable interface for environment interaction and are sufficient for basic RL training workflows. 
They align with established patterns from Gymnasium and similar frameworks, making them immediately familiar to practitioners. -**Scope**: This RFC focuses exclusively on these baseline APIs. Additional APIs (e.g., `render()`, `seed()`) will be explored in follow-up RFCs. The `actions()` method for action discovery is defined in RFC 004. +**Scope**: This RFC focuses exclusively on these baseline APIs. Additional APIs (e.g., `render()`, `seed()`) will be explored in follow-up RFCs. The `actions()` method for action discovery is defined in RFC 005. #### Decision 2: Environment-Computed Rewards @@ -218,3 +225,214 @@ print(result.observation.stdout) # "Hello, World!\n" print(result.observation.exit_code) # 0 client.close() ``` + +## Tool Duality: Simulation vs Production + +Many tools need different implementations in training vs production while maintaining identical interfaces: + +**Examples**: +- **Search API**: Production calls actual search; training uses mock +- **Email**: Production sends real emails; training logs to file +- **Database**: Production hits real DB; training uses containerized instance + +**Key principle**: The **MCP interface must be identical** to maintain training/production parity (see RFC 005). 
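The key principle above can be operationalized as a parity check in CI. The sketch below is hypothetical (the example schemas and the `check_parity` helper are illustrative, not a proposed API): given the tool schemas reported by a sim server and a prod server via `tools/list`, the check passes only when the two interfaces match exactly.

```python
# Hypothetical tool schemas, as they might be returned by each
# server's tools/list call. Identical here, as the principle requires.
sim_tools = {
    "search": {"query": "string", "max_results": "integer"},
    "send_email": {"to": "string", "subject": "string", "body": "string"},
}
prod_tools = {
    "search": {"query": "string", "max_results": "integer"},
    "send_email": {"to": "string", "subject": "string", "body": "string"},
}


def check_parity(sim: dict, prod: dict) -> list:
    """Return human-readable interface mismatches (empty list = parity)."""
    problems = []
    for name in sim.keys() | prod.keys():
        if name not in sim:
            problems.append(f"{name}: missing in sim")
        elif name not in prod:
            problems.append(f"{name}: missing in prod")
        elif sim[name] != prod[name]:
            problems.append(f"{name}: schema differs between sim and prod")
    return problems


assert check_parity(sim_tools, prod_tools) == []
```

Running such a check against both servers before shipping an environment catches interface drift early, before it shows up as a training-production delta.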
+ +### Three-Phase Ecosystem Evolution + +**Phase 1 (Current)**: Community provides sim-only tools +- Environment builders create MCP servers for their simulated environments +- Production deployment uses different tooling (acceptable for research) +- Example: SQLite MCP for training, Postgres connector for production + +**Phase 2 (6-12 months)**: Tool registry emerges +- Community-maintained mappings: "search_tool (sim) → Algolia (prod)" +- Hugging Face Hub hosts these registries (see future tool registry RFC) +- Still requires manual prod setup, but mapping is documented + +**See future tool registry RFC for detailed specification of tool registry format, HF Hub structure, and community contribution workflows.** + +**Phase 3 (12+ months)**: Tool providers participate +- Major SaaS companies provide official sim/prod server pairs +- One-line deployment: Specify registry entry, get both modes +- Example: `search: algolia/search-mcp` pulls both sim and prod servers +- Tool providers shipping dual-mode servers becomes standard practice + +### Dual-Mode Server Pattern + +Tool providers can ship servers that handle both modes: + +```python +class SendGridMCPServer: + def __init__(self): + self.mode = os.getenv("MODE", "prod") # "sim" or "prod" + + if self.mode == "sim": + self.client = MockEmailClient() # Logs to file + elif self.mode == "prod": + self.client = SendGridAPIClient() # Real API + + @mcp_tool + def send_email(self, to: str, subject: str, body: str): + # Same interface, different implementation + return self.client.send(to, subject, body) +``` + +**Benefits**: +- Single package to maintain +- Tool provider owns simulation quality +- Realistic test data from source + +## Docker Compose: Dual-Mode Deployment + +Production and simulation may have different dependency requirements. 
We use Docker Compose to manage these cleanly: + +### Simulation Mode + +```yaml +# docker-compose.sim.yml +services: + env: + image: openenv/my-env:v1 + environment: + MODE: sim + ports: + - "8080:8080" + + # Lightweight mocks + mock-db: + image: postgres:15 + environment: + POSTGRES_DB: testdb + + mock-email: + image: openenv/mock-email:v1 +``` + +**Characteristics**: +- Mock services (in-memory database, fake email server) +- Lightweight, fast startup +- No external dependencies +- No API keys required + +### Production Mode + +```yaml +# docker-compose.prod.yml +services: + env: + image: openenv/my-env:v1 # Same image! + environment: + MODE: prod + DB_CONNECTION: ${PROD_DB_URL} + EMAIL_API_KEY: ${SENDGRID_KEY} + ports: + - "8080:8080" +``` + +**Characteristics**: +- Real services (Postgres, SendGrid) +- API keys, credentials +- Network access, higher latency +- Production-grade reliability + +**Key insight**: The environment code is identical—only configuration differs. + +## Graceful Degradation to Production + +When deploying to production, OpenEnv environments **gracefully degrade** into pure MCP servers: + +**Training Mode**: +``` +┌─────────────────────────────────────┐ +│ HTTP Layer (Simulation + Ops) │ +│ - reset(), step(), get_state() │ +│ - Health checks, metrics │ +├─────────────────────────────────────┤ +│ MCP Layer (Agent Tools) │ +│ - search(), execute_sql(), etc. │ +│ - SAME as production │ +└─────────────────────────────────────┘ +``` + +**Production Mode**: +``` +┌─────────────────────────────────────┐ +│ HTTP Layer (Ops Only) │ +│ - Health checks, metrics, logs │ +│ - NO reset/step (not simulation) │ +├─────────────────────────────────────┤ +│ MCP Layer (Agent Tools) │ +│ - search(), execute_sql(), etc. │ +│ - IDENTICAL interface │ +└─────────────────────────────────────┘ +``` + +The agent sees the same MCP interface in both modes. The HTTP layer shifts from simulation control to operational monitoring. 
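One way to realize this degradation is to register simulation routes conditionally on the mode. The sketch below is illustrative (the route names and the `build_routes` helper are hypothetical; a real environment would register handlers with its HTTP framework of choice): operational endpoints exist in both modes, while `reset`/`step` are mounted only in simulation.

```python
def build_routes(mode: str) -> dict:
    """Return the HTTP routes an environment exposes for a given mode."""
    routes = {
        # Operations endpoints: present in BOTH sim and prod.
        "/health": lambda: {"status": "ok"},
        "/metrics": lambda: {"steps_served": 0},
    }
    if mode == "sim":
        # Simulation control: training/eval only, and never exposed
        # as MCP tools, so the agent cannot learn to reset reality.
        routes["/reset"] = lambda: {"state": "initial snapshot"}
        routes["/step"] = lambda: {"observation": None, "reward": 0.0}
    return routes


sim_routes = build_routes("sim")
prod_routes = build_routes("prod")
assert {"/reset", "/step"} <= sim_routes.keys()
assert {"/reset", "/step"}.isdisjoint(prod_routes.keys())
```

Because the MCP layer is untouched by the mode switch, the agent's view of the world is identical either way; only the orchestrator's view changes.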
+ +## Dependency Management: Sim vs Prod + +**Approach**: Use separate Docker Compose files for different modes + +**Training workflow**: +```bash +docker-compose -f docker-compose.sim.yml up +# Fast startup, mock services, no credentials needed +``` + +**Production workflow**: +```bash +export PROD_DB_URL="postgresql://..." +export SENDGRID_KEY="..." +docker-compose -f docker-compose.prod.yml up +# Real services, production credentials +``` + +The environment code remains unchanged. Only the orchestration layer differs. + +## Positioning: OpenEnv vs Systems Built on OpenEnv + +### OpenEnv: The Standard + +**Mission**: Source maximum high-quality environment contributions from community + +**Characteristics**: +- **Flexible**: Supports both traditional tool calling (RFC 003) and CodeAct (RFC 004) paradigms +- **Open**: Anyone can contribute environments +- **Quality-focused**: High bar for useful, production-relevant environments +- **MCP-native**: Universal interface for all environments (see RFC 005) + +**Design philosophy**: Make frontier practices (CodeAct, production-first, progressive disclosure) EASY, not MANDATORY. + +**What we optimize for**: +- ✅ Environments that reflect real-world use cases +- ✅ Environments with clear reward signals +- ✅ Environments that work in both training and production +- ❌ Toy environments with no production analog +- ❌ Environments with made-up APIs that don't match real services + +This isn't about being exclusive—it's about maintaining a quality bar that makes the ecosystem valuable. 
+
### Systems Built on OpenEnv
+
+**Mission**: Build best-in-class agent training infrastructure for specific use cases
+
+**Characteristics**:
+- **Opinionated**: May choose CodeAct-only, specific training algorithms, specific toolsets
+- **Customized**: Optimized for particular workloads (e.g., reasoning, coding, customer service)
+- **Closed or open**: May be internal systems or community projects
+- **Add layers**: Build on OpenEnv foundation with additional infrastructure
+
+**Example**: Internal RL training stack
+- 100% CodeAct (no tool-calling mode)
+- Custom training infrastructure integration (e.g., TorchForge for async RL)
+- Production-first by default (no simulation-only quirks)
+- Advanced features (e.g., TimeWalk for tree search, tool-aware checkpointing)
+- Uses OpenEnv environments but adds opinionated layers
+
+**The relationship**:
+- **OpenEnv provides the foundation**: Environment standard, MCP interface, community contributions
+- **Systems add opinions**: Optimizations, integrations, constraints on top
+- **Both benefit**: OpenEnv gets community contributions, systems get ecosystem reach
+
+**Analogy**: OpenEnv is like Linux (flexible kernel), systems built on it are like Ubuntu or Red Hat (opinionated distributions).
+
diff --git a/rfcs/003-mcp-support.md b/rfcs/003-mcp-support.md
index f8cd6f3c..6b96c08c 100644
--- a/rfcs/003-mcp-support.md
+++ b/rfcs/003-mcp-support.md
@@ -1,727 +1,550 @@
# RFC: MCP (Model Context Protocol) Support
-**Status**: In Review
-**Created**: 10/21/2025
-**Authors**: @Darktex, @pankit-eng
-**RFC ID:** 003
+**Status**: In Review
+**Created**: 10/21/2025
+**Amended**: November 12, 2025
+**Authors**: @Darktex, @pankit-eng
+**RFC ID:** 003
-## Summary
+## Amendment History
-This RFC defines how OpenEnv integrates with MCP (Model Context Protocol) to expose external tools to agents.
We propose supporting both traditional function-calling paradigms and CodeAct-style execution by implementing an MCP client that exposes remote MCP server tools as Python functions within our execution environments.
+**November 12, 2025**:
-## Motivation
+- Restructured to start with MCP primer showing REST-like API (tools/list, tools/call)
+- Added Traditional MCP Interface section with ListToolsAction and CallToolAction
+- Mentioned resources and prompts in passing (focused on tools)
+- Moved progressive disclosure to FAQ (MCP protocol layer concern)
+- Made Docker Compose deployment pattern explicitly optional
+- Added FAQ section for common questions
+- Removed CodeAct content (moved to RFC 004)
-### Problem Statement
+**Note on RFC 007**: MCP Protocol Interception will define how to intercept MCP calls at the protocol layer for observability, monitoring, and metadata injection. This enables features like logging all tool calls, injecting performance metadata (see RFC 006), and A/B testing tool implementations.
-Modern AI agents need access to external tools (web search, file operations, database queries, etc.). While MCP provides a standardized protocol for defining and exposing these tools, there are two distinct usage patterns that need support:
## Summary
-1. **Traditional Tool Calling**: Agent explicitly calls a tool by name with structured parameters (e.g., `call_tool("search_web", {"query": "python patterns"})`)
-2. **CodeAct Paradigm**: Agent writes Python code that directly imports and calls tools as if they were native Python functions (e.g., `from tools import search_web; results = search_web(query="python patterns")`)
+This RFC defines how OpenEnv integrates with MCP (Model Context Protocol) to expose external tools to agents using traditional tool calling. We map MCP's REST-like API (tools/list, tools/call) to Gym-style action types, creating a standardized interface for tool discovery and invocation.
-MCP's RPC-based, language-agnostic design works naturally for the first pattern but requires additional infrastructure for the second. +To limit the focus on finalizing the API interface, we are intentionally choosing to focus only on local MCP tools via this RFC (and respective changes). This means that any MCP tool shall be packaged inside the OpenEnv sandbox (docker container) for the purpose of this RFC. -### Goals +## Problem Statement -1. **MCP Compatibility**: Support standard MCP servers without modification -2. **Dual Paradigm Support**: Enable both traditional tool calling and CodeAct execution styles -3. **Language Independence**: Leverage MCP's language-agnostic design to support tools written in any language -4. **Deployment Simplicity**: Provide patterns for deploying MCP servers alongside environments -5. **Developer Experience**: Make tools feel native to Python in CodeAct mode +Modern AI agents need access to tools (web search, file operations, database queries, etc.). OpenEnv environments follow the Gym-style specifications with defined action spaces. We need a standardized way to surface tools as part of the environment's interface to policy for RL training and at the same time, use the same tools during inference. -## Background: MCP Architecture +## High Level Approach -### Overview +There are primarily two problems in relation to tools that we want to solve with this proposal. -MCP (Model Context Protocol) is a protocol for exposing tools to AI models. It uses: -- **JSON Schema** for tool definitions and parameter validation -- **RPC-based communication** (typically over stdio, HTTP, or SSE) -- **Language independence** - servers can be written in any language +1. Action Discovery: This entails the mechanism & interface by which RL training code as well as inference discovers the tools available at its disposal. +2. 
Actions & Tool Calling: This entails the mechanism & interface by which RL training code as well inference calls the tools it discovered as part of Tool discovery. -### Standard MCP Flow +In terms of design principles, we want to approach this with: -``` -┌─────────────┐ ┌─────────────┐ -│ │ 1. list_tools() │ │ -│ MCP Client │─────────────────────────>│ MCP Server │ -│ │ │ │ -│ │<─────────────────────────│ │ -│ │ 2. Tool definitions │ │ -│ │ (JSON Schema) │ │ -│ │ │ │ -│ │ 3. call_tool(name, │ │ -│ │ params) │ │ -│ │─────────────────────────>│ │ -│ │ │ │ -│ │<─────────────────────────│ │ -│ │ 4. Tool result (JSON) │ │ -└─────────────┘ └─────────────┘ -``` +1. **Be as close as possible to MCP protocol** when it comes to tool support in OpenEnv. This reduces the barrier for adoption. +2. **Minimize the difference between training vs inference**. This helps achieve the same or expected level of performance from the agents. +3. **Continue supporting simple gym-style APIs for RL** but augment where needed to support the same env in inference. -### Why MCP Works for Traditional Tool Calling +### Proposed Solution -In RFC 004, we defined `ToolCallAction` with a `tool_name` and `parameters` structure. This maps naturally to MCP: +We propose to **adopt** the MCP interface for **all actions exposed to the Agent** in OpenEnv. Note that this does **not** include what we instead expose to the training/serving infrastructure (the control plane), which we will talk about later. -```python -# RFC 004 style -action = ToolCallAction( - tool_name="search_web", - parameters={"query": "python patterns", "max_results": 5} -) -observation = env.step(action) - -# Maps to MCP call_tool RPC -mcp_client.call_tool( - name="search_web", - arguments={"query": "python patterns", "max_results": 5} -) -``` +While not every action may use what we could call “tools”, the general need to list all possible actions and to execute one overlaps exactly with MCP. 
So, this RFC proposes to just standardize on the MCP protocol for all actions as providing a consistent interface helps models. -The environment can act as an MCP client, forwarding tool calls to the MCP server and returning results in observations. +There are two ways to perform actions: -### Why MCP Needs Adaptation for CodeAct +1. **Tool-calling** makes a single tool call per agent action (and thus env step) +2. **CodeAct** allows the agent to write code blocks as an action and exposes tools as methods callable within the code block, thus enabling multiple tool call per single action -In CodeAct, agents write Python code that executes directly. **Best Practice**: Tools should be pre-imported in the execution environment, and the model should be informed of available tools via a system prompt. Any import statements written by the model should be ignored. +Our proposal supports both: Env builders will **build once** and will be compatible with either style out of the box. -```python -# Agent generates this code (no imports!) -# Tools are already available: search_web, read_file +Let’s go through some scenarios. -results = search_web(query="python patterns", max_results=5) -config = read_file(path="/workspace/config.json") -print(f"Found {len(results)} results") -``` +#### Scenario 1: Perform actions/call MCP tools (they become the same thing) -**Pros**: -- **Security**: Prevents arbitrary module imports -- **Determinism**: Environment controls exactly what's available -- **Simplicity**: Model doesn't need to guess import syntax or module names -- **Reliability**: Avoids import errors and version conflicts +| Tool call style | CodeAct style | +| :---- | :---- | +| `obs = env.step( ToolCallAction( tool="read_text_file", path="~/file.txt", head=100 ) ) ` | `obs = env.step( CodeAction(""" text = read_text_file('~/file.txt', head=100) return [l for l in text.split("\n") if "needle" in l] """) )` | -This requires: -1. 
**Pre-import tools**: Inject tool functions into the execution namespace before running agent code -2. **Function call translation**: Converting Python function calls to MCP RPC calls -3. **Type marshaling**: Converting between Python types and JSON for MCP communication -4. **Import filtering**: Strip or ignore import statements from agent-generated code +#### Scenario 2: Discover actions/MCP tools (they become the same thing) -## Design +| Tool call style | CodeAct style | +| :---- | :---- | +| `obs = env.step(ListToolsAction()) # obs contains the list. We provide this action for you, you don't have to code it.` | There is only one action: `CodeAction`, but you can call `list_tools()` as a function inside it. `obs = env.step( CodeAction(""" return [t for t in list_tools() if 'browser' in t] """) ) ` Again, we will do this plumbing for you. | -### Architecture Overview +#### Scenario 3: I already have my own MCP client for inference -#### 3. System Prompt for Tool Availability +A lot of what `env.step()` provides can be performed by an MCP client, assuming you don't care about the other things we do like e.g. reward computation, evals etc (which you probably don't in prod). In this case, we expose an MCP endpoint alongside the existing HTTP API. -When using CodeAct with pre-imported tools, the agent needs to know what's available. We provide this via a system prompt: + **Transport:** -```python -def generate_tool_system_prompt(registry: MCPToolRegistry) -> str: - """Generate a system prompt describing available tools. +* We reuse the same HTTP transport via fastAPI that we are using for the Env to also expose the MCP server (content will follow MCP's JSON-RPC message format). + * NOTE: MCP’s default transport protocols are stdio and streamable HTTP. However, [Custom transport protocols](https://modelcontextprotocol.io/specification/2025-06-18/basic/transports#custom-transports) are allowed and still in-spec. 
+* Future: We plan to support standard Streamable HTTP transport to reduce latency and be more aligned with the spec. - Args: - registry: Tool registry containing available tools +```py +# Environment exposes both interfaces + env = MyEnvironment() + app = create_fastapi_app(env, action_cls, observation_cls) - Returns: - System prompt text describing available tools - """ - prompt_parts = [ - "You are writing Python code that will be executed.", - "The following tools are pre-imported and available for use:", - "" - ] + # Training/Eval: Use step API + POST http://localhost:8000/step + {"action": {...}} - for tool_name in registry.list_tools(): - tool = registry.get_tool(tool_name) - prompt_parts.append(f"- {tool_name}: {tool.__doc__ or 'No description'}") - - prompt_parts.extend([ - "", - "IMPORTANT: Do NOT write import statements. All tools are already available.", - "Simply call the functions directly by name.", - "", - "Example:", - "results = search_web(query='python async', max_results=5)", - "content = read_file(path='/workspace/config.json')", - ]) - - return "\n".join(prompt_parts) -``` + # Production/Inference: Use MCP API + POST http://localhost:8000/mcp + {"jsonrpc": "2.0", "method": "tools/list", "id": 1} + POST http://localhost:8000/mcp + {"jsonrpc": "2.0", "method": "tools/call", "params": {"name": "...", "arguments": {...}}, "id": 2} ``` -┌────────────────────────────────────────────────────────────────┐ -│ Docker Environment │ -│ │ -│ ┌─────────────────────────────────────────────────────────┐ │ -│ │ Agent Code (Python) │ │ -│ │ │ │ -│ │ # Traditional style (RFC 003) │ │ -│ │ action = ToolCallAction( │ │ -│ │ tool_name="search_web", │ │ -│ │ parameters={"query": "..."} │ │ -│ │ ) │ │ -│ │ env.step(action) │ │ -│ │ │ │ -│ │ # CodeAct style (NEW) │ │ -│ │ # Tools pre-imported, no import statements needed │ │ -│ │ results = search_web(query="...") │ │ -│ └────────────────────┬────────────────────────────────────┘ │ -│ │ │ -│ │ Python import/call │ -│ ▼ 
│ -│ ┌─────────────────────────────────────────────────────────┐ │ -│ │ MCP Client (Python Library) │ │ -│ │ │ │ -│ │ - Tool discovery & caching │ │ -│ │ - Dynamic Python function generation │ │ -│ │ - Type marshaling (Python ↔ JSON) │ │ -│ │ - RPC communication with MCP servers │ │ -│ └────────────────────┬────────────────────────────────────┘ │ -│ │ HTTP/SSE │ -└───────────────────────┼────────────────────────────────────────┘ - │ - ┌───────────────┼───────────────┐ - │ │ │ - ▼ ▼ ▼ -┌──────────────┐ ┌──────────────┐ ┌──────────────┐ -│ MCP Server 1 │ │ MCP Server 2 │ │ MCP Server N │ -│ (Search) │ │ (Files) │ │ (Database) │ -└──────────────┘ └──────────────┘ └──────────────┘ -``` - -### Core Components - -#### 1. MCP Client Library - -We need an MCP client that can run inside our Python execution environments. We have three options: -**Option A: Build our own** - Full control but requires maintenance -**Option B: Use FastMCP** - Popular, well-maintained Python MCP client -**Option C: Use mcp-use** - Alternative Python MCP library -**Recommendation**: Start with **FastMCP** or **mcp-use** as they provide: -- Standard MCP protocol implementation -- Tool discovery and schema parsing -- RPC communication primitives -- Active maintenance and community support + Limitations (to be addressed later): + \- No SSE streaming (single request/response only) + \- No server-initiated messages + \- No session management + \- These will be added when we implement standard Streamable HTTP transport -#### 2. Tool Registry & Namespace Injection +## CodeAct-specific performance improvement for local MCP servers {#codeact-specific-performance-improvement-for-local-mcp-servers} -```python -from typing import Any, Callable, Dict -import inspect +MCP is a client-server architecture that requires JSON-RPC for cross-process communication. This design maps naturally to remote MCP servers and tool calling-style. 
When we go into CodeAct-style, there’s an extra issue to think about whenever the MCP server is local: **double-marshaling**.
-class MCPToolRegistry:
-    """Registry that exposes MCP tools as Python functions.
+If you write a Python function for local execution, e.g. `fibonacci(n)`, and then decorate it with a fastmcp `@mcp.tool` decorator, you expose it via MCP through the FastMCP server. The function will now receive RPCs and marshal types from Python → JSON-RPC. If you then want to use CodeAct, you want to pass this MCP tool to a Python interpreter for local execution, thus doing another conversion, JSON-RPC → Python\! You did a whole round trip just to come back to square one.
-    This bridges MCP's RPC-based tool calling to Python's function
-    call syntax, enabling CodeAct-style tool usage.
+Existing implementations of MCP \+ CodeAct from [Anthropic](https://www.anthropic.com/engineering/code-execution-with-mcp) and [Cloudflare](https://blog.cloudflare.com/code-mode/) do not address this, but they likely have more remote MCP servers (while we focus on local MCP servers at this stage). Many of our environments come from gym-style, no-tool RL envs, which will become local MCP servers. With CodeAct becoming mainstream, performance becomes a concern for us.
-    Tools are injected into the execution namespace before running
-    agent code, eliminating the need for import statements.
-    """
+### Proposal: use introspection?
-    def __init__(self, mcp_clients: list[MCPClient]):
-        """Initialize registry with one or more MCP clients.
+We can build an MCP server similar to FastMCP, but with the ability to register local tool calls, e.g.
- Args: - mcp_clients: List of MCP clients connected to different servers - """ - self.mcp_clients = mcp_clients - self._tool_map: Dict[str, tuple[MCPClient, ToolDefinition]] = {} - self._discover_tools() +```py +from mcp.server import Server - def _discover_tools(self) -> None: - """Discover all tools from connected MCP servers.""" - for client in self.mcp_clients: - tools = client.list_tools() - for tool in tools: - self._tool_map[tool.name] = (client, tool) + class LocalMCPServer: + def __init__(self): + self.mcp = Server("myserver") + self._callables = {} # Store underlying functions - def get_tool(self, name: str) -> Callable: - """Get a Python-callable wrapper for an MCP tool. + def tool(self, func): + """Decorator that registers both MCP tool AND stores callable""" + # Store the actual Python function + self._callables[func.__name__] = func - Args: - name: Tool name + # Also register for MCP protocol + @self.mcp.call_tool() + async def mcp_handler(name: str, arguments: dict): + if name == func.__name__: + return func(**arguments) - Returns: - Callable that executes the MCP tool + return func - Example: - # Get a single tool wrapper - search = registry.get_tool("search_web") - results = search(query="python", max_results=5) - """ - if name not in self._tool_map: - raise ValueError(f"Tool '{name}' not found in registry") + def get_callables(self) -> dict[str, Callable]: + """Return dict of {function_name: function}""" + return self._callables.copy() - client, tool_def = self._tool_map[name] + # Define tools + server = LocalMCPServer() - def tool_wrapper(**kwargs: Any) -> Any: - """Generated wrapper function that calls MCP tool.""" - # Validate parameters against JSON schema - self._validate_params(tool_def, kwargs) + @server.tool + def fibonacci(n: int) -> int: + if n <= 1: return n + return fibonacci(n-1) + fibonacci(n-2) - # Call MCP server - result = client.call_tool(name, kwargs) - - # Parse and return result - return result +``` - # Set function metadata 
for better introspection - tool_wrapper.__name__ = name - tool_wrapper.__doc__ = tool_def.description +Then we can inject these functions into the namespace of the Python interpreter: - # Generate type hints from JSON schema (optional enhancement) - # tool_wrapper.__annotations__ = self._schema_to_annotations(tool_def) +```py +class CodeActEnvironment: + def __init__(self): + self.mcp_server = self._create_mcp_server() - return tool_wrapper + def _get_local_tools_as_callables(self) -> dict[str, Callable]: + """Extract Python callables from local MCP server""" + tools = {} - def list_tools(self) -> list[str]: - """List all available tool names.""" - return list(self._tool_map.keys()) + # Introspect the MCP server to get underlying Python functions + # (assumes local MCP server exposes this - see above) + for tool_name, func in self.mcp_server.get_callables().items(): + tools[tool_name] = func - def get_all_tools(self) -> Dict[str, Callable]: - """Get all tools as a dictionary for namespace injection. + return tools - Returns: - Dictionary mapping tool names to callable wrappers + def execute_code(self, code: str): + """Execute agent's code with tools injected directly""" - Example: - tools = registry.get_all_tools() - # Inject into execution namespace - exec_globals = {**globals(), **tools} - exec(agent_code, exec_globals) - """ - return {name: self.get_tool(name) for name in self.list_tools()} + # Build execution namespace with tools as direct functions + namespace = self._get_local_tools_as_callables() - def _validate_params(self, tool_def: ToolDefinition, params: Dict[str, Any]) -> None: - """Validate parameters against tool's JSON schema.""" - # Use jsonschema library for validation - # This ensures type safety even with dynamic calls - pass + # Agent's code now has direct access! + exec(code, namespace) + return namespace.get('result') # Or however you capture output -def filter_imports(code: str) -> str: - """Remove import statements from agent-generated code. 
+``` - This prevents models from attempting to import modules, - since all tools are pre-imported into the namespace. - - Args: - code: Python code that may contain import statements +Of course, **remote** MCP servers would go through the standard route – even if they are written in Python, there isn’t much we can do since we need to send data over the wire anyway. This optimization only applies to local servers—remote MCP servers will continue to use the standard protocol since data must be serialized over the network anyway. This approach preserves MCP's benefits for tool discovery and model prompting while eliminating redundant marshaling for local execution. - Returns: - Code with import statements removed +## Design - Example: - code = ''' - from tools import search_web - import os - - result = search_web(query="test") - ''' - - filtered = filter_imports(code) - # filtered = "result = search_web(query='test')" - """ - import re - # Remove 'import ...' and 'from ... import ...' lines - lines = code.split('\n') - filtered_lines = [ - line for line in lines - if not re.match(r'^\s*(import\s+|from\s+.*\s+import\s+)', line) - ] - return '\n'.join(filtered_lines) +### Architecture Overview \- Tool-call mode + +```mermaid +flowchart TB + Inference["Inference Server + (e.g., vLLM)"] + + subgraph icp["Infrastructure Control Plane"] + Reset["Use Case 1: + Reset Environment"] + RunStep["Use Case 2: + Run Agent Step + (Standard Flow)"] + DirectMCP["Use Case 3: + Call Tool Directly + (Alternative Flow)"] + end + + subgraph container["OpenEnv Container"] + ResetEndpoint["reset()"] + + StepEndpoint["step() + + MCP Client + Reward"] + + ParentMCP["Parent MCP Server + (composition)"] + + subgraph children["Child MCP Servers"] + FS["Filesystem"] + Git["Git"] + Browser["Browser"] + end + end + + %% Use Case 1: Reset + Reset -->|"POST /reset"| ResetEndpoint + + %% Use Case 2: Agent Step (Standard) + RunStep -->|"request text"| Inference + Inference -->|"generated text"| RunStep + 
RunStep -->|"POST /step + (with text)"| StepEndpoint + StepEndpoint -->|"Observation + + Reward"| RunStep + + %% Use Case 3: Direct MCP (Alternative) + DirectMCP -.->|"POST /mcp + (bypass step)"| ParentMCP + + %% MCP Flow inside container + StepEndpoint -->|"JSON-RPC"| ParentMCP + + %% MCP Composition + ParentMCP <-->|"delegates"| FS + ParentMCP <-->|"delegates"| Git + ParentMCP <-->|"delegates"| Browser + + %% Styling + classDef icpBox fill:#d6eaff,stroke:#0066cc,stroke-width:3px + classDef useCaseBox fill:#e3f2fd,stroke:#1976d2,stroke-width:1px + classDef inferenceBox fill:#fff3e0,stroke:#f57c00,stroke-width:2px + classDef containerBox fill:#fffacd,stroke:#ff9800,stroke-width:3px + classDef endpointBox fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px + classDef childBox fill:#f3e5f5,stroke:#9c27b0,stroke-width:1px + + class Inference inferenceBox + class Reset,RunStep,DirectMCP useCaseBox + class ResetEndpoint,StepEndpoint,ParentMCP endpointBox + class FS,Git,Browser childBox ``` -### Integration with Environment Interface - -#### Traditional Tool Calling (RFC 003 Style) +### Architecture Overview \- CodeAct mode + +```mermaid +flowchart TB + Inference["Inference Server + (e.g., vLLM)"] + + subgraph icp["Infrastructure Control Plane"] + Reset["Use Case 1: + Reset Environment"] + RunStep["Use Case 2: + Run Agent Step + (CodeAct Mode)"] + DirectMCP["Use Case 3: + Call Tool Directly + (Alternative Flow)"] + end + + subgraph container["OpenEnv Container"] + ResetEndpoint["reset()"] + + StepEndpoint["step() + + Code Sandbox + Reward"] + + ParentMCP["Parent MCP Server + (composition)"] + + subgraph local["Local Child Servers + (Direct Python calls)"] + FS["Filesystem + ⚡ Direct callable"] + Git["Git + ⚡ Direct callable"] + Python["Python REPL + ⚡ Direct callable"] + end + + subgraph remote["Remote Child Servers + (JSON-RPC)"] + Slack["Slack API + 🌐 JSON-RPC"] + External["External APIs + 🌐 JSON-RPC"] + end + end + + %% Use Case 1: Reset + Reset -->|"POST /reset"| 
ResetEndpoint + + %% Use Case 2: Agent Step (CodeAct) + RunStep -->|"request text"| Inference + Inference -->|"generated Python code"| RunStep + RunStep -->|"POST /step + (with code)"| StepEndpoint + StepEndpoint -->|"Observation + + Reward"| RunStep + + %% Use Case 3: Direct MCP (Alternative) + DirectMCP -.->|"POST /mcp + (bypass step)"| ParentMCP + + %% Code execution - local direct calls (NO JSON-RPC) + StepEndpoint -->|"Direct Python call + NO marshaling"| FS + StepEndpoint -->|"Direct Python call + NO marshaling"| Git + StepEndpoint -->|"Direct Python call + NO marshaling"| Python + + %% Code execution - remote JSON-RPC + StepEndpoint -->|"JSON-RPC + from within code"| Slack + StepEndpoint -->|"JSON-RPC + from within code"| External + + %% All servers registered with Parent + FS -->|"registered"| ParentMCP + Git -->|"registered"| ParentMCP + Python -->|"registered"| ParentMCP + Slack -->|"registered"| ParentMCP + External -->|"registered"| ParentMCP + + %% Styling + classDef icpBox fill:#d6eaff,stroke:#0066cc,stroke-width:3px + classDef useCaseBox fill:#e3f2fd,stroke:#1976d2,stroke-width:1px + classDef inferenceBox fill:#fff3e0,stroke:#f57c00,stroke-width:2px + classDef containerBox fill:#fffacd,stroke:#ff9800,stroke-width:3px + classDef endpointBox fill:#f3e5f5,stroke:#9c27b0,stroke-width:2px + classDef localBox fill:#e8f5e9,stroke:#4caf50,stroke-width:2px + classDef remoteBox fill:#e3f2fd,stroke:#2196f3,stroke-width:2px + + class Inference inferenceBox + class Reset,RunStep,DirectMCP useCaseBox + class ResetEndpoint,StepEndpoint,ParentMCP endpointBox + class FS,Git,Python localBox + class Slack,External remoteBox +``` -Environments act as MCP clients: +### Frequently Asked Questions -```python -from core.env_server import Environment, Observation -from mcp_client import MCPClient +### FAQ 001\. How do we handle progressive disclosure with 100+ tools? 
-class ToolCallingEnvironment(Environment): - """Environment that forwards ToolCallActions to MCP servers.""" +Progressive disclosure (showing a subset of tool schemas to save context) is handled at the **MCP protocol layer**, not by OpenEnv. MCP servers can implement progressive disclosure patterns, and MCP clients can provide `get_tool_schema()` meta-tools for on-demand schema loading. - def __init__(self, mcp_servers: list[str]): - self.mcp_clients = [MCPClient(url) for url in mcp_servers] - self.registry = MCPToolRegistry(self.mcp_clients) +OpenEnv simply consumes MCP's `list_tools()` and `call_tool()` APIs. How the MCP server manages tool schemas internally is outside our scope. - def step(self, action: Action) -> Observation: - if isinstance(action, ToolCallAction): - # Forward to MCP server - tool = self.registry.get_tool(action.tool_name) - result = tool(**action.parameters) - - # Convert result to observation - return self._make_observation(result) - else: - raise ValueError(f"Expected ToolCallAction, got {type(action)}") - - def tools(self) -> list[ToolDefinition]: - """RFC 003 tool discovery API.""" - return [tool_def for _, tool_def in self.registry._tool_map.values()] -``` +**Resources**: -#### CodeAct Style +- [MCP Specification on Resource Discovery](https://spec.modelcontextprotocol.io/) +- Future work (RFC 007\) will explore MCP protocol interception for caching and metadata injection -Python code execution environments pre-import tools into the execution namespace: +### FAQ 002\. Wait, so every action is a “tool” now?\! -```python -from core.env_server import Environment -from core.tools import PyExecutor -from mcp_client import MCPClient, MCPToolRegistry, filter_imports +Yes. So, if e.g. you are playing a game of chess, moving pieces becomes a “tool” of sorts. Ultimately what MCP considers a “tool” is simply a function, so as long as you had a function before, you are just a Python decorator away from migrating to this brand new world. 
What you get is worth it: models are primed to call MCP servers, so they benefit from this standardization because it is much more in distribution. We propose a [way](#codeact-specific-performance-improvement-for-local-mcp-servers) to mitigate the performance impact for local MCP servers.
-class CodeActEnvironment(Environment):
-    """Environment for CodeAct with MCP tool access."""
+### FAQ 003\. How many servers do you have?
-    def __init__(self, mcp_servers: list[str]):
-        self.executor = PyExecutor()
+It’s up to env builders to decide how many servers they want to expose – ultimately, it’s a question of how you want to group your tools.
-        # Initialize MCP clients and registry
-        mcp_clients = [MCPClient(url) for url in mcp_servers]
-        self.registry = MCPToolRegistry(mcp_clients)
+However, we should expose **a single MCP server** by combining them as appropriate: FastMCP (and most MCP implementations) support [composition](https://gofastmcp.com/servers/composition) for this reason.
-        # Pre-import all tools into execution namespace
-        self.tool_namespace = self.registry.get_all_tools()
+# Appendix
-    def step(self, action: CodeAction) -> Observation:
-        # Filter out any import statements from agent code
-        filtered_code = filter_imports(action.code)
+### MCP Primer: What We Need to Implement
-        # Execute with tools pre-injected into namespace
-        result = self.executor.run(
-            filtered_code,
-            extra_globals=self.tool_namespace
-        )
-        return self._make_observation(result)
+The [Model Context Protocol (MCP)](https://modelcontextprotocol.io/) is an open standard that provides a REST-like API for AI agents to interact with external systems. To be MCP-compliant, we need to implement a client that can communicate with MCP servers.
- def get_system_prompt(self) -> str: - """Get system prompt describing available tools.""" - return generate_tool_system_prompt(self.registry) -``` +#### Core MCP APIs -### MCP Server Deployment - -#### Deployment Pattern - -MCP servers should be deployed alongside environment containers. We propose a Docker Compose pattern: - -```yaml -# docker-compose.yml -version: '3.8' - -services: - # Main environment container - environment: - build: ./environment - ports: - - "8000:8000" - environment: - - MCP_SERVERS=http://mcp-search:8001,http://mcp-files:8002 - depends_on: - - mcp-search - - mcp-files - networks: - - agent-network - - # MCP server for web search - mcp-search: - image: mcp-search-server:latest - ports: - - "8001:8001" - environment: - - SEARCH_API_KEY=${SEARCH_API_KEY} - networks: - - agent-network - - # MCP server for file operations - mcp-files: - image: mcp-files-server:latest - ports: - - "8002:8002" - volumes: - - workspace:/workspace:ro - networks: - - agent-network - -networks: - agent-network: - driver: bridge - -volumes: - workspace: -``` +MCP servers expose three main capabilities through JSON-RPC: -#### Environment Configuration +##### 1\. Tools (Primary Focus) -Environments specify required MCP servers in their configuration: +Tools are functions that agents can call to perform actions or retrieve information. 
-```python -# environment_config.py -from dataclasses import dataclass -from typing import List +**Discovery**: `tools/list` -@dataclass -class MCPServerConfig: - """Configuration for an MCP server.""" - name: str - image: str - port: int - env_vars: dict[str, str] = None +```json +{ + "jsonrpc": "2.0", + "method": "tools/list", + "id": 1 +} +``` -@dataclass -class EnvironmentConfig: - """Environment configuration including MCP dependencies.""" - name: str - image: str - mcp_servers: List[MCPServerConfig] - -# Example configuration -CODING_ENV_CONFIG = EnvironmentConfig( - name="coding-env", - image="coding-env:latest", - mcp_servers=[ - MCPServerConfig( - name="search", - image="mcp-search-server:latest", - port=8001, - env_vars={"SEARCH_API_KEY": "${SEARCH_API_KEY}"} - ), - MCPServerConfig( - name="files", - image="mcp-files-server:latest", - port=8002, - ), +**Response**: + +```json +{ + "jsonrpc": "2.0", + "result": { + "tools": [ + { + "name": "search_web", + "description": "Search the web for information", + "inputSchema": { + "type": "object", + "properties": { + "query": { "type": "string" }, + "max_results": { "type": "integer", "default": 5 } + }, + "required": ["query"] + } + } ] -) + }, + "id": 1 +} ``` -#### Build & Deployment Tools - -We provide utilities to generate Docker Compose files from environment configs: - -```python -from pathlib import Path -import yaml - -def generate_compose_file(config: EnvironmentConfig, output_path: Path) -> None: - """Generate docker-compose.yml from environment config.""" - compose = { - "version": "3.8", - "services": {}, - "networks": {"agent-network": {"driver": "bridge"}}, +**Invocation**: `tools/call` + +```json +{ + "jsonrpc": "2.0", + "method": "tools/call", + "params": { + "name": "search_web", + "arguments": { + "query": "python patterns", + "max_results": 5 } - - # Main environment service - mcp_urls = [f"http://{s.name}:{s.port}" for s in config.mcp_servers] - compose["services"]["environment"] = { - 
"image": config.image, - "ports": ["8000:8000"], - "environment": { - "MCP_SERVERS": ",".join(mcp_urls) - }, - "depends_on": [s.name for s in config.mcp_servers], - "networks": ["agent-network"], - } - - # MCP server services - for server in config.mcp_servers: - compose["services"][server.name] = { - "image": server.image, - "ports": [f"{server.port}:{server.port}"], - "networks": ["agent-network"], - } - if server.env_vars: - compose["services"][server.name]["environment"] = server.env_vars - - # Write compose file - output_path.write_text(yaml.dump(compose)) - -# Usage -generate_compose_file(CODING_ENV_CONFIG, Path("docker-compose.yml")) + }, + "id": 2 +} ``` -## Key Design Decisions - -### Decision 1: MCP Client Implementation +**Full specification**: [MCP Tools API](https://modelcontextprotocol.io/specification/2025-06-18/server/tools) -**Chosen Approach**: Use existing Python MCP client library (FastMCP or mcp-use) rather than building our own. +##### 2\. Resources (Secondary \- mentioned for completeness) -**Rationale**: -- **Faster development**: Leverage existing, tested implementations -- **Standard compliance**: These libraries follow MCP spec changes -- **Community support**: Benefit from community bug fixes and features -- **Focus on value-add**: Spend effort on CodeAct integration, not protocol details +Resources provide read-only data that agents can access (files, database queries, API responses, etc.). -**Trade-offs**: -- External dependency (mitigated by vendoring if needed) -- Less control over implementation details +- `resources/list` \- Discover available resources +- `resources/read` \- Read resource contents -### Decision 2: Pre-Import Tools vs Import Statements +**Full specification**: [MCP Resources API](https://modelcontextprotocol.io/specification/2025-06-18/server/resources) -**Chosen Approach**: Pre-import all tools into the execution namespace and filter out import statements from agent code. +##### 3\. 
Prompts (Secondary \- mentioned for completeness) -**Rationale**: -- **Security**: Prevents arbitrary module imports that could access system resources -- **Determinism**: Environment has full control over available tools -- **Reliability**: Eliminates import errors and module not found issues -- **Simplicity**: Model doesn't need to know correct import syntax -- **Best Practice**: Aligns with sandboxed code execution principles +Prompts are reusable prompt templates that agents can retrieve and instantiate. -**Trade-offs**: -- Requires filtering/stripping import statements from agent code -- Need clear system prompts to inform model of available tools -- Less "natural" than writing actual imports (but safer and more reliable) +- `prompts/list` \- Discover available prompts +- `prompts/get` \- Retrieve a prompt template -### Decision 3: Docker Compose for MCP Server Orchestration +**Full specification**: [MCP Prompts API](https://modelcontextprotocol.io/specification/2025-06-18/server/prompts) -**Chosen Approach**: Use Docker Compose to deploy MCP servers alongside environment containers. +**Note**: This RFC focuses primarily on **tools** as they are the most common use case for agent-environment interaction. Resources and prompts are mentioned for completeness but will be addressed in future RFCs if needed. -**Rationale**: -- **Declarative**: Clear specification of dependencies -- **Standard tooling**: Docker Compose is widely understood -- **Networking**: Built-in network isolation and service discovery -- **Development experience**: Easy local testing with `docker-compose up` +#### Traditional MCP Interface: Function Calling -**Trade-offs**: -- Additional complexity for simple environments without tools -- May need adaptation for Kubernetes deployments (future RFC) +The most direct way to support MCP is to map the protocol directly to action types. 
-## Examples +##### ListToolsAction and CallToolAction -### Example 1: Traditional Tool Calling with MCP +We introduce two action types that correspond directly to MCP's API: -```python -from envs.tool_calling_env import ToolCallingEnv, ToolCallAction +```py +@dataclass +class ListToolsAction(Action): + """Request list of available tools from MCP servers.""" + pass # No parameters needed -# Start environment with MCP servers -env = ToolCallingEnv.from_docker_compose("docker-compose.yml") +@dataclass +class CallToolAction(Action): + """Call a specific tool via MCP.""" + tool_name: str + parameters: Dict[str, Any] +``` -# Discover available tools -tools = env.tools() -print(f"Available tools: {[t.name for t in tools]}") -# Output: ['search_web', 'read_file', 'write_file'] +##### How It Works -# Reset environment -obs = env.reset() +```py +# Agent discovers available tools +action = ListToolsAction() +obs = env.step(action) +# obs.tools = [ +# {"name": "search_web", "description": "...", "inputSchema": {...}}, +# {"name": "read_file", "description": "...", "inputSchema": {...}} +# ] -# Agent makes tool call -action = ToolCallAction( +# Agent calls a tool +action = CallToolAction( tool_name="search_web", - parameters={"query": "Python async patterns", "max_results": 5} + parameters={"query": "python patterns", "max_results": 5} ) - -obs = env.step(action) -print(obs.metadata["tool_result"]) -# Output: [{"title": "...", "url": "...", ...}, ...] - -env.close() -``` - -### Example 2: CodeAct with MCP Tools - -```python -from envs.codeact_env import CodeActEnv, CodeAction - -# Start environment (MCP servers started automatically) -env = CodeActEnv.from_docker_compose("docker-compose.yml") - -# Reset environment -obs = env.reset() - -# Get system prompt to inform agent of available tools -system_prompt = env.get_system_prompt() -print(system_prompt) -# Output: -# You are writing Python code that will be executed. 
-# The following tools are pre-imported and available for use: -# -# - search_web: Search the web for information -# - read_file: Read contents of a file -# ... - -# Agent generates code that uses MCP tools (no imports!) -agent_code = """ -# Tools are already available, just use them directly -results = search_web(query="Python async patterns", max_results=5) -print(f"Found {len(results)} results") - -# Read configuration -config = read_file(path="/workspace/config.json") -print(f"Config: {config}") -""" - -# Execute code (tools available transparently) -action = CodeAction(code=agent_code) obs = env.step(action) - -print(obs.stdout) -# Output: -# Found 5 results -# Config: {...} - -env.close() +# obs.result = {"results": [...]} ``` -### Example 3: Building MCP Server for Custom Tools - -```python -# my_tools_server.py -from fastmcp import FastMCP - -mcp = FastMCP("My Custom Tools") - -@mcp.tool() -def analyze_sentiment(text: str) -> dict: - """Analyze sentiment of text. - - Args: - text: Text to analyze - - Returns: - Dictionary with sentiment scores - """ - # Your implementation - return { - "positive": 0.8, - "negative": 0.1, - "neutral": 0.1 - } - -@mcp.tool() -def summarize_text(text: str, max_length: int = 100) -> str: - """Summarize text to specified length. +##### Environment Implementation - Args: - text: Text to summarize - max_length: Maximum length of summary +```py +from mcp_client import MCPClient - Returns: - Summarized text - """ - # Your implementation - return text[:max_length] + "..." 
+class MCPEnvironment(Environment): + def __init__(self, mcp_server_urls: list[str]): + self.mcp_clients = [MCPClient(url) for url in mcp_server_urls] -if __name__ == "__main__": - mcp.run() + def step(self, action: Action) -> Observation: + if isinstance(action, ListToolsAction): + # Call tools/list on all MCP servers + all_tools = [] + for client in self.mcp_clients: + tools = client.list_tools() + all_tools.extend(tools) + + return Observation( + done=False, + metadata={"tools": all_tools} + ) + + elif isinstance(action, CallToolAction): + # Find the right MCP server and call tools/call + for client in self.mcp_clients: + if client.has_tool(action.tool_name): + result = client.call_tool( + name=action.tool_name, + arguments=action.parameters + ) + return Observation( + done=False, + metadata={"result": result} + ) + + raise ValueError(f"Tool '{action.tool_name}' not found") ``` -```dockerfile -# Dockerfile for custom MCP server -FROM python:3.10-slim - -WORKDIR /app - -RUN pip install fastmcp +This approach is **immediately MCP-compliant** \- we're just exposing the MCP REST API as Gym-style actions. -COPY my_tools_server.py . - -EXPOSE 8001 - -CMD ["python", "my_tools_server.py"] -``` ## Open Questions 1. **Caching**: Should we cache tool results, and if so, what's the invalidation strategy? @@ -729,6 +552,8 @@ CMD ["python", "my_tools_server.py"] 3. **Error Handling**: Should MCP errors be propagated as exceptions or returned in observations? 4. **Versioning**: How to handle version compatibility between MCP clients and servers? +**Note**: Observability and protocol-level interception for MCP will be addressed in RFC 007 (MCP Protocol Interception), which will define patterns for monitoring, logging, and metadata injection at the MCP protocol layer. 
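On Open Question 3, one possible convention (purely illustrative, not a decision) is to surface JSON-RPC errors as observation data rather than Python exceptions, so failed tool calls stay visible to the policy during training. The `Observation` shape and `observation_from_mcp_reply` helper below are assumptions for the sketch, not the framework's actual classes.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Observation:
    """Stand-in for the framework's Observation type (illustrative only)."""
    done: bool = False
    metadata: dict[str, Any] = field(default_factory=dict)


def observation_from_mcp_reply(reply: dict) -> Observation:
    """Map a JSON-RPC reply to an Observation, surfacing errors as data.

    A JSON-RPC response carries either a "result" or an "error" member;
    here an error does not raise -- it becomes part of the observation,
    so the agent (and the reward function) can react to it.
    """
    if "error" in reply:
        err = reply["error"]
        return Observation(
            done=False,
            metadata={"error": {"code": err.get("code"), "message": err.get("message")}},
        )
    return Observation(done=False, metadata={"result": reply.get("result")})
```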
+ ## References - [Model Context Protocol Specification](https://spec.modelcontextprotocol.io/) @@ -737,4 +562,3 @@ CMD ["python", "my_tools_server.py"] - RFC 000: OpenEnv Project Phases - RFC 001: OpenEnv Basic Abstractions - RFC 002: OpenEnv Framework Spec -- RFC 004: Support multiple tool calls via Action wrapper abstraction diff --git a/rfcs/004-actions-as-tool-calls.md b/rfcs/004-actions-as-tool-calls.md deleted file mode 100644 index c3434f5b..00000000 --- a/rfcs/004-actions-as-tool-calls.md +++ /dev/null @@ -1,479 +0,0 @@ -# RFC: Support multiple tool calls via Action wrapper abstraction - -**Status**: In Review -**Created**: 10/15/2025 -**Authors**: @Darktex, @pankit-eng -**RFC ID**: 004 - -**Note**: This RFC defines the unified action interface that applies to all environment types. RFC 003 describes how MCP tools integrate with this action system to enable tool calling in both traditional and CodeAct paradigms. - -## Summary - -This RFC proposes treating environment actions using a standardized pattern inspired by MCP (Model Context Protocol), where each action represents a discrete, named operation with typed parameters. This approach aligns OpenEnv with modern LLM agent frameworks while maintaining type safety and providing better introspection capabilities for agent training and debugging. - -Instead of arbitrary `Action` subclasses with domain-specific fields, actions follow a tool-call pattern with a `tool_name` and structured `parameters`, making the framework more composable and easier to integrate with tool-using agents. - -**Important**: While inspired by MCP's tool-calling pattern, this abstraction extends beyond external tools and code execution to **any environment action** - including game moves, navigation commands, configuration changes, and domain-specific operations that don't involve tools at all. 
-
-## Motivation
-
-### Problem Statement
-
-Current action design in OpenEnv treats actions as dataclasses:
-
-```python
-@dataclass
-class CodeAction(Action):
-    code: str
-
-@dataclass
-class BashAction(Action):
-    command: str
-    cwd: Optional[str] = None
-
-@dataclass
-class MoveAction(Action):
-    direction: str  # "up", "down", "left", "right"
-
-@dataclass
-class GameAction(Action):
-    action_id: int
-    player_id: str
-```
-
-This approach has several limitations:
-
-1. **Lack of Introspection**: No standard way to discover what actions an environment supports
-2. **LLM Integration Friction**: Modern LLM agents use tool-calling patterns with JSON schemas, requiring translation layers
-3. **Inconsistent Patterns**: Each environment invents its own action structure without standardization
-4. **Poor Discoverability**: Agents can't programmatically determine valid actions and their parameters
-
-### Goals
-
-1. **Standardize Action Structure**: Define a consistent pattern for representing all environment actions, inspired by MCP's tool-calling design
-2. **Enable Action Discovery**: Provide APIs to introspect available actions in any environment
-3. **Improve LLM Integration**: Native compatibility with tool-calling patterns used by Claude, GPT-4, and other models
-4. **Maintain Type Safety**: Preserve strong typing while adopting the unified action pattern
-5. **Universal Applicability**: Support any type of action - tools, code execution, game moves, navigation, configuration, etc.
-
-### Inspiration: MCP (Model Context Protocol)
-
-This RFC is heavily inspired by [MCP](https://spec.modelcontextprotocol.io/), which standardized how external tools are exposed to AI agents. MCP introduced:
-
-- Standardized tool definitions with JSON Schema
-- Tool discovery via `list_tools()` API
-- Tool execution via `call_tool(name, parameters)` RPC
-- Language-agnostic design
-
-We adopt these principles but **generalize beyond tools** to cover all environment actions.
-For example:
-
-- A chess environment's "move piece" action is not a "tool" in the MCP sense
-- A navigation environment's "go_north" action doesn't involve external tool calls
-- A configuration environment's "set_parameter" action isn't code execution
-
-Yet all benefit from the same standardized action pattern.
-
-## Design
-
-### Architecture Overview
-
-```
-┌─────────────────────────────────────────────────────────┐
-│ Agent/RL Code                                           │
-│                                                         │
-│   # Action discovery (works for ANY environment)        │
-│   actions = env.actions()                               │
-│   # -> [ActionDefinition(name="execute_code", ...)]     │
-│   # -> [ActionDefinition(name="move_piece", ...)]       │
-│   # -> [ActionDefinition(name="set_config", ...)]       │
-│                                                         │
-│   # Execute action (unified interface)                  │
-│   action = ToolCallAction(                              │
-│       tool_name="execute_code",                         │
-│       parameters={"code": "print('Hello')"}             │
-│   )                                                     │
-│   observation = env.step(action)                        │
-└─────────────────────────────────────────────────────────┘
-                           │ HTTP
-                           ▼
-┌─────────────────────────────────────────────────────────┐
-│ Environment (Docker Container)                          │
-│                                                         │
-│   class PythonCodeActEnv(Environment):                  │
-│                                                         │
-│       @action("execute_code")                           │
-│       def execute_code(self, code: str) -> CodeResult:  │
-│           return self._executor.run(code)               │
-│                                                         │
-│       def step(self, action: ToolCallAction):           │
-│           action_fn = self._get_action(action.tool_name)│
-│           result = action_fn(**action.parameters)       │
-│           return self._make_observation(result)         │
-└─────────────────────────────────────────────────────────┘
-```
-
-### Core Abstractions
-
-#### 1. ToolCallAction
-
-```python
-from typing import Any, Dict
-from dataclasses import dataclass, field
-
-@dataclass(kw_only=True)
-class ToolCallAction(Action):
-    """Action representing a named operation with typed parameters.
-
-    Inspired by MCP's tool-calling pattern, but generalized to represent
-    ANY environment action - not just tool calls or code execution.
-
-    Examples:
-    - Tool calls: tool_name="search_web", parameters={"query": "..."}
-    - Code execution: tool_name="execute_code", parameters={"code": "..."}
-    - Game moves: tool_name="move_piece", parameters={"from": "e2", "to": "e4"}
-    - Navigation: tool_name="go_north", parameters={}
-    - Configuration: tool_name="set_timeout", parameters={"seconds": 30}
-
-    This is the standard action type for all OpenEnv environments.
-    Environments dispatch based on tool_name to handle different action types.
-    """
-
-    tool_name: str
-    parameters: Dict[str, Any] = field(default_factory=dict)
-```
-
-#### 2. ToolDefinition
-
-```python
-from typing import Any, Callable, Dict, List
-from dataclasses import dataclass
-
-@dataclass
-class ToolParameter:
-    """Definition of a tool parameter."""
-
-    name: str
-    type: str  # JSON Schema type: "string", "number", "boolean", "object", "array"
-    description: str
-    required: bool = True
-    default: Any = None
-
-@dataclass
-class ToolDefinition:
-    """Specification of an action that can be taken in an environment.
-
-    Inspired by MCP's tool definition format and compatible with LLM tool-calling
-    APIs (Claude, OpenAI, etc.), but represents ANY action type - not just tools.
-
-    This can describe:
-    - External tool calls (search_web, read_file)
-    - Code execution (execute_python, run_bash)
-    - Game actions (move_piece, attack, defend)
-    - Navigation commands (go_north, turn_left)
-    - Configuration changes (set_parameter, update_config)
-    - Any domain-specific action
-    """
-
-    name: str
-    description: str
-    parameters: List[ToolParameter]
-
-    def to_json_schema(self) -> Dict[str, Any]:
-        """Convert to JSON Schema format for LLM tool calling."""
-        return {
-            "name": self.name,
-            "description": self.description,
-            "input_schema": {
-                "type": "object",
-                "properties": {
-                    p.name: {
-                        "type": p.type,
-                        "description": p.description,
-                    }
-                    for p in self.parameters
-                },
-                "required": [p.name for p in self.parameters if p.required],
-            },
-        }
-```
-
-#### 3. Enhanced Environment Interface
-
-```python
-from typing import List, Optional
-
-class Environment(ABC):
-    """Base class for all environment servers."""
-
-    @abstractmethod
-    def reset(self) -> Observation:
-        """Reset the environment and return initial observation."""
-        pass
-
-    @abstractmethod
-    def step(self, action: Action) -> Observation:
-        """Take a step in the environment."""
-        pass
-
-    @property
-    @abstractmethod
-    def state(self) -> State:
-        """Get current environment state."""
-        pass
-
-    def actions(self) -> List[ToolDefinition]:
-        """Return list of available actions in this environment.
-
-        This method enables action discovery for any environment type.
-        Actions can represent tools, code execution, game moves, navigation,
-        or any domain-specific operations.
-
-        For backward compatibility, environments can return an empty list,
-        though implementing this method is strongly encouraged.
-        """
-        return []
-
-    def tools(self) -> List[ToolDefinition]:
-        """Alias for actions() for backward compatibility with RFC 003.
-
-        Deprecated: Use actions() instead.
-        """
-        return self.actions()
-```
-
-### Key Design Decisions
-
-#### Decision 1: Unified Action Type vs. Per-Tool Action Classes
-
-**Chosen Approach**: Use a single `ToolCallAction` class with `tool_name` and `parameters` fields rather than creating separate action classes per tool.
-
-**Rationale**:
-
-- **Simplicity**: Single action type is easier to understand and work with
-- **Flexibility**: Adding new actions doesn't require new action classes
-- **MCP Compatibility**: Matches the structure used by MCP for tool calling, enabling easy integration
-- **Type Safety**: JSON Schema validation can still enforce parameter types
-- **Universality**: Works for any action type - tools, game moves, navigation, configuration, etc.
-- **Composability**: Multi-action environments work naturally
-
-**Trade-offs**:
-
-- Advantages:
-  - Less boilerplate (no action class per tool)
-  - Natural support for dynamic tool sets
-- Disadvantages:
-  - Action parameters are `Dict[str, Any]` instead of strongly-typed fields (mitigated by JSON Schema validation)
-
-#### Decision 2: Action Discovery via `actions()` Method
-
-**Chosen Approach**: Add an `actions()` method to the `Environment` base class that returns `List[ToolDefinition]`.
-
-**Rationale**:
-
-- **Universal Introspection**: Agents can discover available actions in any environment type
-- **LLM Integration**: Action definitions can be passed directly to LLM APIs (they use the same format as tool definitions)
-- **Documentation**: Self-documenting environments via decorator pattern
-- **MCP Alignment**: Follows MCP's `list_tools()` pattern but generalized to all actions
-
-**Note**: We keep `ToolDefinition` as the return type name for compatibility with LLM APIs and MCP, even though it represents any action type.
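To make the `to_json_schema()` conversion described above concrete, here is a standalone sketch using minimal stand-ins for `ToolParameter` and `ToolDefinition` (re-declared here so the snippet runs on its own; the `default` field and docstrings from the RFC's version are trimmed):

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class ToolParameter:
    name: str
    type: str          # JSON Schema type, e.g. "string"
    description: str
    required: bool = True


@dataclass
class ToolDefinition:
    name: str
    description: str
    parameters: List[ToolParameter]

    def to_json_schema(self) -> Dict[str, Any]:
        # Same shape as the conversion shown in the RFC text above.
        return {
            "name": self.name,
            "description": self.description,
            "input_schema": {
                "type": "object",
                "properties": {
                    p.name: {"type": p.type, "description": p.description}
                    for p in self.parameters
                },
                "required": [p.name for p in self.parameters if p.required],
            },
        }


move = ToolDefinition(
    name="move_piece",
    description="Move a chess piece",
    parameters=[
        ToolParameter("from_square", "string", "Starting square"),
        ToolParameter("to_square", "string", "Destination square"),
    ],
)
schema = move.to_json_schema()
print(schema["input_schema"]["required"])  # -> ['from_square', 'to_square']
```

The resulting dictionary matches the `input_schema` shape that LLM tool-calling APIs accept, which is why a non-tool action such as a chess move can be handed to an LLM unchanged.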
- - -## Examples - -### Example 1: Code Execution Environment - -```python -from core.env_server import Environment, Observation, State, ToolCallAction -from core.tools import PyExecutor - -class PythonCodeActEnv(Environment): - """Environment for executing Python code via tool calls.""" - - def __init__(self): - self._executor = PyExecutor() - self._state = CodeState() - - @action("execute_code", "Execute Python code and return stdout, stderr, and exit code") - def execute_code(self, code: str) -> Dict[str, Any]: - """Execute Python code. - - Args: - code: Python code to execute - - Returns: - Dict with stdout, stderr, and exit_code keys - """ - result = self._executor.run(code) - return { - "stdout": result.stdout, - "stderr": result.stderr, - "exit_code": result.exit_code, - } - - def reset(self) -> Observation: - self._state = CodeState(episode_id=str(uuid.uuid4())) - return CodeObservation(stdout="", stderr="", exit_code=0) - - def step(self, action: Action) -> Observation: - if not isinstance(action, ToolCallAction): - raise ValueError(f"Expected ToolCallAction, got {type(action)}") - - # Dispatch to action method - if action.tool_name == "execute_code": - result = self.execute_code(**action.parameters) - reward = 1 if result["exit_code"] == 0 else -1 - self._state.step_count += 1 - return CodeObservation(reward=reward, **result) - else: - raise ValueError(f"Unknown action: {action.tool_name}") - - @property - def state(self) -> State: - return self._state -``` - - -### Example 2: Game Environment (Non-Tool Actions) - -```python -from core.env_server import Environment, Observation, State, ToolCallAction - -class ChessEnv(Environment): - """Chess environment - actions are game moves, not tools.""" - - def __init__(self): - self._board = chess.Board() - self._state = GameState() - - @action("move_piece", "Move a chess piece from one square to another") - def move_piece(self, from_square: str, to_square: str) -> Dict[str, Any]: - """Move a chess piece. 
-
-        Args:
-            from_square: Starting square (e.g., "e2")
-            to_square: Destination square (e.g., "e4")
-
-        Returns:
-            Dict with move validity and game state
-        """
-        move = chess.Move.from_uci(f"{from_square}{to_square}")
-        if move in self._board.legal_moves:
-            self._board.push(move)
-            return {
-                "valid": True,
-                "game_over": self._board.is_game_over(),
-                "fen": self._board.fen(),
-            }
-        return {"valid": False, "error": "Illegal move"}
-
-    def reset(self) -> Observation:
-        self._board = chess.Board()
-        self._state = GameState(episode_id=str(uuid.uuid4()))
-        return ChessObservation(fen=self._board.fen(), legal_moves=list(self._board.legal_moves))
-
-    def step(self, action: Action) -> Observation:
-        if not isinstance(action, ToolCallAction):
-            raise ValueError(f"Expected ToolCallAction, got {type(action)}")
-
-        # Dispatch to action method
-        if action.tool_name == "move_piece":
-            result = self.move_piece(**action.parameters)
-            reward = 1 if result.get("valid") else -1
-            done = result.get("game_over", False)
-            self._state.step_count += 1
-            return ChessObservation(
-                reward=reward,
-                done=done,
-                fen=result.get("fen"),
-                valid_move=result.get("valid"),
-            )
-        else:
-            raise ValueError(f"Unknown action: {action.tool_name}")
-
-    @property
-    def state(self) -> State:
-        return self._state
-
-    def actions(self) -> List[ToolDefinition]:
-        """Return available actions (game moves, not tools)."""
-        return [
-            ToolDefinition(
-                name="move_piece",
-                description="Move a chess piece from one square to another",
-                parameters=[
-                    ToolParameter(name="from_square", type="string", description="Starting square (e.g., 'e2')"),
-                    ToolParameter(name="to_square", type="string", description="Destination square (e.g., 'e4')"),
-                ],
-            )
-        ]
-```
-
-### Example 3: Client-Side Usage with LLM
-
-```python
-from anthropic import Anthropic
-from envs.coding_env import CodingEnv
-
-# Initialize environment
-env = CodingEnv.from_docker_image("coding-env:latest")
-
-# Get available actions
-actions = env.actions()  # Returns List[ToolDefinition]
-
-# Convert to Claude's tool format (works for any action type!)
-claude_tools = [action.to_json_schema() for action in actions]
-
-# Initialize Claude client
-client = Anthropic()
-
-# Agent loop
-observation = env.reset()
-messages = [{"role": "user", "content": "Calculate fibonacci(10)"}]
-
-while not observation.done:
-    # Get model response with tools
-    response = client.messages.create(
-        model="claude-3-5-sonnet-20241022",
-        messages=messages,
-        tools=claude_tools,
-    )
-
-    # If model wants to take an action
-    if response.stop_reason == "tool_use":
-        tool_use = response.content[0]
-
-        # Create action from LLM's tool call
-        # (works for code execution, game moves, or any action type)
-        action = ToolCallAction(
-            tool_name=tool_use.name,
-            parameters=tool_use.input,
-            tool_call_id=tool_use.id,
-        )
-
-        # Execute in environment
-        observation = env.step(action)
-
-        # Add action result to messages
-        messages.append({
-            "role": "assistant",
-            "content": response.content,
-        })
-        messages.append({
-            "role": "user",
-            "content": [{
-                "type": "tool_result",  # LLM APIs still call it "tool_result"
-                "tool_use_id": tool_use.id,
-                "content": str(observation),
-            }],
-        })
-        print(observation.reward)
-    else:
-        break
-
-env.close()
-```
-
-## References
-
-- [Model Context Protocol (MCP) Specification](https://spec.modelcontextprotocol.io/) - Primary inspiration for this RFC
-- [Anthropic Tool Use Documentation](https://docs.anthropic.com/claude/docs/tool-use)
-- [OpenAI Function Calling](https://platform.openai.com/docs/guides/function-calling)
-- RFC 000: OpenEnv Project Phases
-- RFC 001: OpenEnv Basic Abstractions
-- RFC 002: OpenEnv Framework Spec
-- RFC 003: MCP Support
diff --git a/rfcs/README.md b/rfcs/README.md
index a2af899d..c90de3de 100644
--- a/rfcs/README.md
+++ b/rfcs/README.md
@@ -79,11 +79,20 @@ Each RFC should include the following sections:

 ## Current RFCs

-- [000-project-phases.md](./000-project-phases.md) - OpenEnv layering
-- [001-abstractions.md](./001-abstractions.md) - OpenEnv Basic Abstractions
-- [002-env-spec.md](./002-env-spec.md) - OpenEnv Framework Spec for agent execution environments
-- [003-mcp-support.md](./003-mcp-support.md) - MCP (Model Context Protocol) Support
-- [004-actions-as-tool-calls.md](./004-actions-as-tool-calls.md) - Support multiple tool calls via Action wrapper abstraction
+### Core Abstractions & Design
+- [000-project-phases.md](./000-project-phases.md) - Design Principles and Broad Roadmap
+- [001-abstractions.md](./001-abstractions.md) - Basic Abstractions (Environment, Agent, State)
+- [002-env-spec.md](./002-env-spec.md) - Framework Spec for Agent Execution Environments
+
+### MCP Integration
+- [003-mcp-support.md](./003-mcp-support.md) - MCP Support: Traditional Tool Calling
+- [004-codeact-with-mcp.md](./004-codeact-with-mcp.md) - CodeAct Support with MCP
+- [005-mcp-universal-interface.md](./005-mcp-universal-interface.md) - MCP as THE Universal Interface (Policy)
+
+### Tool Ecosystem & Performance
+- [005-tool-registry.md](./005-tool-registry.md) - Tool Registry & Distribution via Hugging Face Hub (Note: Will be renumbered)
+- [006-performance-simulation.md](./006-performance-simulation.md) - Production Performance Simulation
+- [007-mcp-interception.md](./007-mcp-interception.md) - MCP Protocol Interception (Draft - Future Work)

 ## Questions?