Replace RLM library with Daytona-backed code execution #1

OCWC22 wants to merge 12 commits into eesb99:main from …
Conversation
- Add DaytonaInterpreter as drop-in replacement for ModalInterpreter
- Rewrite server.py with Fleet RLM-style iterative code execution loop
- LLM writes Python → Daytona sandbox executes → output feeds back → iterate
- Add new MCP tools: sandbox_exec, sandbox_upload, sandbox_files
- Update dependencies: daytona-sdk replaces rlm library requirement
- No local code execution, no Modal dependency, no Deno/Pyodide needed

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
- Fix Daytona SDK import: `from daytona import ...` (not daytona_sdk)
- Fix requirements.txt: `daytona>=0.10.0` (not daytona-sdk)
- Fix server.py: use relative import for DaytonaInterpreter
- Add src/__main__.py: enables `python -m src` entry point
- Add pyproject.toml: proper packaging with `fleet-rlm-mcp` script
- Add .env.example: template for required env vars
- Fix run_server.sh: use `python -m src` entry point
- Fix setup.sh: verify `import daytona` not `import daytona_sdk`

Verified: all imports pass, all 8 MCP tools register, server starts.

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
Three ways to run:

```bash
./run_server.sh           # local stdio (Claude Desktop / CLI)
./run_server.sh --http    # HTTP on 0.0.0.0:8000/mcp (remote clients)
./run_server.sh --deploy  # deploy to Daytona workspace (always-on)
```

Changes:
- server.py: add MCP_TRANSPORT/MCP_PORT env vars, bind 0.0.0.0, pass transport to mcp.run() (stdio | streamable-http | sse)
- deploy.py: creates Daytona sandbox, uploads server code, installs deps, starts MCP on HTTP in background session
- run_server.sh: --http and --deploy flags, proper .env.local sourcing
- requirements.txt/pyproject.toml: add uvicorn for HTTP transport
- README: document all three run modes

Verified: stdio starts, HTTP starts (uvicorn on :8000), deploy imports OK.

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
- session_execute → execute_session_command
- request= → req=
- create_session(session_id=x) → create_session(x)

Verified against installed daytona SDK source.

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
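For illustration, the renames as they appear at a call site (a sketch assembled from the list above; `sandbox` and `session_id` are hypothetical placeholders):

```python
# Before → after, per the rename list above.
sandbox.process.create_session(session_id)         # was: create_session(session_id=session_id)
sandbox.process.execute_session_command(           # was: session_execute(...)
    session_id=session_id,
    req=SessionExecuteRequest(command="echo ok"),  # was: request=SessionExecuteRequest(...)
)
```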
- deploy.py: stop printing partial API key in "To stop" instructions, read from env var instead
- .gitignore: use `.env.*` glob with `!.env.example` exception to block all env files (not just .env and .env.local)

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
Pull request overview
This PR refactors the RLM MCP Server to replace the external RLM library with a custom implementation backed by Daytona sandboxes for secure, remote Python code execution. The core RLM iterative loop (LLM writes code → execute in sandbox → inspect output → repeat until completion) is reimplemented directly in the server, eliminating the RLM library dependency while adding Daytona as the execution backend.
Changes:
- Replaced RLM library with custom `DaytonaInterpreter` class for sandbox-based code execution
- Reimplemented the RLM iterative execution loop with LLM completion via litellm and Daytona sandbox execution
- Updated configuration from OpenRouter/Grok to OpenAI/Anthropic with Daytona credentials, and added HTTP transport options for remote access
Reviewed changes
Copilot reviewed 9 out of 11 changed files in this pull request and generated 30 comments.
| File | Description |
|---|---|
| src/server.py | Reimplements RLM loop with Daytona execution, updates MCP tools, adds sandbox file operations, changes configuration from OpenRouter to OpenAI/Anthropic |
| src/daytona_interpreter.py | New class providing Daytona sandbox lifecycle management, code execution with variable injection, SUBMIT protocol, and file operations |
| src/main.py | New entry point to support python -m src execution |
| deploy.py | New deployment script to create Daytona workspace, upload server code, install dependencies, and start MCP server over HTTP |
| setup.sh | Removed RLM library installation, added Daytona SDK and litellm verification |
| run_server.sh | Added --http and --deploy modes, updated environment variable handling |
| requirements.txt | Added daytona>=0.10.0 and uvicorn>=0.29.0 dependencies |
| pyproject.toml | New package metadata file with project configuration |
| .env.example | New environment variable template with Daytona and LLM API key configuration |
| README.md | Updated documentation to reflect Daytona architecture, new deployment options, and revised prerequisites |
```python
sandbox.process.execute_session_command(
    session_id=session_id,
    req=SessionExecuteRequest(
        command=f"cd {base} && python -m src",
        run_async=True,
    ),
)
```
The deployment script starts the MCP server with run_async=True but doesn't verify that the process actually stays running. If the server crashes immediately after starting (e.g., due to missing dependencies or configuration errors), the deployment would still report success. Consider checking the session status or process list after the health check to ensure the server is still running.
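One way to add such a check, reusing the session API already used elsewhere in this PR (a sketch; the attribute holding the command output is an assumption to verify against the installed daytona SDK):

```python
import time

# Give the server a moment to boot, then verify the process is still alive
# by searching for it inside the sandbox. If pgrep finds nothing, the server
# exited right after start and the deploy should fail loudly.
time.sleep(5)
check = sandbox.process.execute_session_command(
    session_id=session_id,
    req=SessionExecuteRequest(command='pgrep -f "python -m src" || echo MISSING'),
)
# Attribute name is an assumption — the installed SDK may expose .output or .result.
output = getattr(check, "output", None) or getattr(check, "result", None) or ""
if "MISSING" in output:
    raise RuntimeError("MCP server exited shortly after starting; inspect sandbox logs")
```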
```python
# ---------------------------------------------------------------------------
# MCP Server
# ---------------------------------------------------------------------------
mcp = FastMCP("fleet-rlm-daytona", host="0.0.0.0", port=MCP_PORT)
```
The MCP server is configured to listen on 0.0.0.0 which exposes it to all network interfaces. When combined with the --http mode, this could expose the RLM code execution capabilities to the public internet without any authentication or authorization. Consider adding documentation about network security, firewall rules, or implementing authentication for the HTTP transport mode.
Suggested change:

```diff
+MCP_HOST = os.getenv("MCP_HOST", "127.0.0.1")
 # ---------------------------------------------------------------------------
 # MCP Server
 # ---------------------------------------------------------------------------
-mcp = FastMCP("fleet-rlm-daytona", host="0.0.0.0", port=MCP_PORT)
+mcp = FastMCP("fleet-rlm-daytona", host=MCP_HOST, port=MCP_PORT)
```
```toml
version = "0.1.0"
description = "Fleet RLM MCP Server — Daytona Edition"
readme = "README.md"
license = "MIT"
```
The license field specifies "MIT" as a plain string. Older packaging tooling (PEP 621) expects a table here: use license = {text = "MIT"} or reference a LICENSE file with license = {file = "LICENSE"}. The plain SPDX-string form license = "MIT" is only valid under PEP 639 with sufficiently recent build backends.
| license = "MIT" | |
| license = {text = "MIT"} |
```python
parts.append('''
def SUBMIT(**kwargs):
    """Signal structured output from the RLM execution."""
    import json
    print("__SUBMIT__:" + json.dumps(kwargs))
    sys.exit(0)
''')
```
The SUBMIT() helper function calls sys.exit(0) after printing the marker. Terminating immediately is intended, but sys.exit() raises SystemExit, so a bare except: or except BaseException: in user code can swallow it and keep executing, leading to unexpected behavior. Consider documenting this behavior clearly in the docstring.
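A docstring along the lines the comment suggests might read (wording is illustrative, not from the PR):

```python
def SUBMIT(**kwargs):
    """Signal structured output from the RLM execution.

    WARNING: this prints the __SUBMIT__ marker and then calls sys.exit(0),
    which raises SystemExit. A bare `except:` or `except BaseException:` in
    surrounding user code can swallow that exception and keep running, so
    call SUBMIT() only as the final step, outside broad exception handlers.
    """
    import json
    print("__SUBMIT__:" + json.dumps(kwargs))
    sys.exit(0)
```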
```python
sandbox_status = "Running" if (interp and interp._started) else "Not started"
sandbox_id = interp._sandbox.id if (interp and interp._sandbox) else "N/A"
```
The code tries to access _interpreter._started and _interpreter._sandbox.id which are private attributes (prefixed with underscore). This violates encapsulation and creates tight coupling. Consider adding public getter methods in DaytonaInterpreter like is_started() and get_sandbox_id() to access this information properly.
Suggested change:

```diff
-sandbox_status = "Running" if (interp and interp._started) else "Not started"
-sandbox_id = interp._sandbox.id if (interp and interp._sandbox) else "N/A"
+# Prefer public accessors if available to avoid relying on private fields.
+if interp:
+    if hasattr(interp, "is_started") and callable(getattr(interp, "is_started")):
+        started = bool(interp.is_started())
+    else:
+        # Fallback to private attribute for backward compatibility.
+        started = bool(getattr(interp, "_started", False))
+    if hasattr(interp, "get_sandbox_id") and callable(getattr(interp, "get_sandbox_id")):
+        sandbox_id = interp.get_sandbox_id() or "N/A"
+    else:
+        # Fallback to private sandbox attribute for backward compatibility.
+        sandbox = getattr(interp, "_sandbox", None)
+        sandbox_id = getattr(sandbox, "id", "N/A") if sandbox is not None else "N/A"
+else:
+    started = False
+    sandbox_id = "N/A"
+sandbox_status = "Running" if started else "Not started"
```
```python
# Also check for the __SUBMIT__ marker that got JSON-decoded already
if "__SUBMIT__:" not in output and "SUBMIT" in code and output.strip():
    # SUBMIT was in code and we got clean output — probably the answer
    return {
        "answer": output.strip(),
        "trajectory": trajectory,
        "execution_time": time.time() - start_time,
    }
```
The condition if "__SUBMIT__:" not in output and "SUBMIT" in code and output.strip(): is attempting to detect successful SUBMIT calls but the logic is unclear. If SUBMIT was called and produced output, it should have the __SUBMIT__: marker. This fallback case might incorrectly return regular print output as the final answer. Consider clarifying this logic or removing this fallback branch entirely.
Suggested change:

```diff
-# Also check for the __SUBMIT__ marker that got JSON-decoded already
-if "__SUBMIT__:" not in output and "SUBMIT" in code and output.strip():
-    # SUBMIT was in code and we got clean output — probably the answer
-    return {
-        "answer": output.strip(),
-        "trajectory": trajectory,
-        "execution_time": time.time() - start_time,
-    }
```
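If the branch is kept rather than removed, a less ambiguous formulation would key strictly off the marker line (a sketch using the variable names from the loop above):

```python
# Scan output line by line for the explicit marker; never treat plain
# stdout as the final answer.
for line in output.splitlines():
    if line.startswith("__SUBMIT__:"):
        payload = json.loads(line[len("__SUBMIT__:"):])
        return {
            "answer": payload,
            "trajectory": trajectory,
            "execution_time": time.time() - start_time,
        }
```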
```python
# Inject variables
if variables:
    for name, value in variables.items():
        parts.append(f"{name} = {json.dumps(value)}")
```
The variable injection uses json.dumps(value) which may not work correctly for all Python objects. For example, complex objects, functions, or custom classes cannot be JSON-serialized. This will raise a TypeError at runtime. Consider using pickle for more robust serialization, or document that only JSON-serializable values are supported and add validation.
Suggested change:

```diff
-            parts.append(f"{name} = {json.dumps(value)}")
+            try:
+                serialized = json.dumps(value)
+            except (TypeError, ValueError) as exc:
+                raise TypeError(
+                    f"Variable '{name}' of type {type(value).__name__} is not JSON-serializable. "
+                    "Only JSON-serializable values are supported for variable injection."
+                ) from exc
+            parts.append(f"{name} = {serialized}")
```
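The pickle alternative the comment mentions could look like the following sketch (`inject_variable` is a hypothetical helper; it assumes compatible Python versions on host and sandbox, and the usual pickle trust caveats apply to values the server provides):

```python
import base64
import pickle

def inject_variable(name: str, value) -> str:
    """Emit a line of sandbox Python that reconstructs `value` under `name`.

    Unlike json.dumps, this handles arbitrary picklable objects.
    """
    blob = base64.b64encode(pickle.dumps(value)).decode("ascii")
    return f"import base64, pickle; {name} = pickle.loads(base64.b64decode('{blob}'))"
```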
```python
if self.summarize_stdout and len(output) > 10000:
    output = output[:5000] + "\n...[truncated]...\n" + output[-2000:]
```
The server reads RLM_MAX_OUTPUT_CHARS from the environment (defaulting to 10000), but the truncation logic here hardcodes its own 10000/5000/2000 thresholds instead of using that configurable value; only the self.summarize_stdout flag is consulted. This inconsistency could lead to confusion. Consider passing RLM_MAX_OUTPUT_CHARS as a parameter to DaytonaInterpreter or storing it as a class attribute.
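A sketch of the configurable threshold the comment asks for (parameter and method names are assumptions):

```python
import os

class DaytonaInterpreter:
    def __init__(self, summarize_stdout: bool = True, max_output_chars: int | None = None):
        self.summarize_stdout = summarize_stdout
        # Single source of truth: fall back to the same env var the server reads.
        self.max_output_chars = max_output_chars or int(os.getenv("RLM_MAX_OUTPUT_CHARS", "10000"))

    def _truncate(self, output: str) -> str:
        """Keep the head and tail of oversized output, proportional to the limit."""
        if self.summarize_stdout and len(output) > self.max_output_chars:
            head, tail = self.max_output_chars // 2, self.max_output_chars // 5
            return output[:head] + "\n...[truncated]...\n" + output[-tail:]
        return output
```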
```python
# ---------------------------------------------------------------------------
# Load .env.local so deploy.py can find keys without exporting manually
# ---------------------------------------------------------------------------
for env_path in [Path(".env.local"), Path(__file__).parent / ".env.local"]:
    if env_path.exists():
        with open(env_path, "r") as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, value = line.split("=", 1)
                    value = value.strip("\"'")
                    os.environ.setdefault(key, value)
```
The same .env.local parsing code is duplicated in both src/server.py (lines 29-41) and deploy.py (lines 24-33). This code duplication violates the DRY principle and makes maintenance harder. Consider extracting this into a shared utility function or using a library like python-dotenv.
Suggested change:

```diff
+from dotenv import load_dotenv
+
 # ---------------------------------------------------------------------------
 # Load .env.local so deploy.py can find keys without exporting manually
 # ---------------------------------------------------------------------------
 for env_path in [Path(".env.local"), Path(__file__).parent / ".env.local"]:
-    if env_path.exists():
-        with open(env_path, "r") as f:
-            for line in f:
-                line = line.strip()
-                if line and not line.startswith("#") and "=" in line:
-                    key, value = line.split("=", 1)
-                    value = value.strip("\"'")
-                    os.environ.setdefault(key, value)
+    # load_dotenv returns True if it loaded any values from the file
+    if load_dotenv(dotenv_path=env_path, override=False):
```
```bash
export MCP_TRANSPORT="streamable-http"
export MCP_PORT="${MCP_PORT:-8000}"
echo "Starting Fleet RLM MCP on http://0.0.0.0:${MCP_PORT}/mcp" >&2
```
The --http mode starts the MCP server on http://0.0.0.0:${MCP_PORT}/mcp without any authentication or transport security at this layer, making it reachable by any host on the network when the port is exposed. Because this server drives code execution in Daytona sandboxes using your LLM API keys, a remote attacker who can reach this port can invoke tools and run arbitrary code or exfiltrate those keys. Restrict the HTTP binding to localhost by default and/or require running behind an authenticated, TLS-terminating reverse proxy instead of exposing it directly on 0.0.0.0.
Suggested change:

```diff
+# HTTP transport is potentially unsafe when exposed directly on 0.0.0.0.
+# Require an explicit opt-in via MCP_HTTP_ALLOW_INSECURE_REMOTE=1.
+if [ "${MCP_HTTP_ALLOW_INSECURE_REMOTE:-0}" != "1" ]; then
+  echo "Refusing to start HTTP MCP server without explicit opt-in." >&2
+  echo "This mode starts an unauthenticated MCP HTTP server that can drive code" >&2
+  echo "execution in Daytona sandboxes using your LLM API keys." >&2
+  echo "" >&2
+  echo "If you really intend to expose this over the network (e.g. behind an" >&2
+  echo "authenticated, TLS-terminating reverse proxy), re-run with:" >&2
+  echo "  MCP_HTTP_ALLOW_INSECURE_REMOTE=1 $0 --http" >&2
+  echo "" >&2
+  echo "For safer setups, prefer binding to localhost and using an SSH tunnel or" >&2
+  echo "a properly secured reverse proxy instead of exposing 0.0.0.0 directly." >&2
+  exit 1
+fi
 export MCP_TRANSPORT="streamable-http"
 export MCP_PORT="${MCP_PORT:-8000}"
-echo "Starting Fleet RLM MCP on http://0.0.0.0:${MCP_PORT}/mcp" >&2
+echo "WARNING: Starting Fleet RLM MCP with insecure HTTP on http://0.0.0.0:${MCP_PORT}/mcp" >&2
+echo "Ensure this is only reachable via an authenticated, TLS-terminating reverse proxy." >&2
```
Add Dockerfile, docker-compose.yml, and --docker flag in run_server.sh so the MCP server can be built and run as a container with one command.

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
- Fix DaytonaInterpreter: correct execute_stateful() to use ExecutionResult.stdout (not .result), add atexit cleanup, proper error handling for code_interpreter.run_code()
- Fix RLM loop: use proper chat-style messages instead of concatenated strings, robust SUBMIT detection
- Add new MCP tools: sandbox_exec_stateful (stateful REPL), sandbox_download, sandbox_shell
- Add graceful SIGTERM handling with interpreter cleanup
- Clean up deploy.py: explicit SessionExecuteRequest import

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
- Dockerfile: install curl, add HEALTHCHECK that hits /mcp endpoint
- docker-compose.yml: add healthcheck, clean up env var ordering
- README: add Docker mode (Option C), document new tools (sandbox_exec_stateful, sandbox_download, sandbox_shell)

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
Summary
Refactor the RLM MCP server to use Daytona sandboxes for secure, remote Python code execution instead of the RLM library. This removes the dependency on the external RLM package and implements the core RLM loop (iterative code execution with LLM reasoning) directly in the server.
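A minimal sketch of that loop (helper names, return shapes, and message format are assumptions for illustration, not the PR's exact code):

```python
import litellm

def rlm_loop(task: str, interp, model: str = "openai/gpt-4o", max_iters: int = 10) -> str:
    """Iterate: the LLM writes Python, the Daytona sandbox runs it, output feeds back."""
    messages = [{"role": "user", "content": task}]
    for _ in range(max_iters):
        reply = litellm.completion(model=model, messages=messages)
        content = reply.choices[0].message.content
        code = extract_code_block(content)  # hypothetical helper
        result = interp.execute(code)       # hypothetical sandbox call
        if result.get("submitted"):         # the SUBMIT() marker was seen
            return result["answer"]
        # Feed the sandbox output back so the model can refine its code.
        messages.append({"role": "assistant", "content": content})
        messages.append({"role": "user", "content": f"Output:\n{result['output']}"})
    raise RuntimeError("Max iterations reached without SUBMIT()")
```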
Key Changes
- New `DaytonaInterpreter` class (`src/daytona_interpreter.py`): Drop-in replacement for Modal/RLM sandboxes. Manages the Daytona sandbox lifecycle (create, execute, upload/download files, shutdown) and provides a stateless execution model with variable injection and a SUBMIT() helper for structured output.
- Reimplemented RLM loop (`src/server.py::rlm_loop`): Core iterative execution loop: the LLM writes Python, the Daytona sandbox executes it, output feeds back, and the cycle repeats until completion.
- Updated MCP tools:
  - `rlm_execute`: Execute tasks with full RLM loop
  - `rlm_analyze`: Analyze data with code execution
  - `rlm_code`: Generate and test code
  - `rlm_decompose`: Break complex tasks into subtasks
  - `sandbox_exec`: Direct code execution (no RLM loop)
  - `sandbox_upload`, `sandbox_files`: File operations
  - `rlm_status`: System status
- Configuration changes: switched from OpenRouter/Grok to OpenAI/Anthropic, with Daytona credentials and HTTP transport options added.
- New deployment script (`deploy.py`): Automates deployment to a Daytona workspace; uploads server code, installs dependencies, and starts the MCP server on HTTP for remote access.
- Enhanced shell scripts:
  - `run_server.sh`: Added `--http` and `--deploy` modes
  - `setup.sh`: Simplified to install from requirements.txt (no RLM library)
- Project files:
  - `pyproject.toml` for package metadata
  - `.env.example` for configuration template
  - `src/__main__.py` to support `python -m src`
  - `requirements.txt` to include daytona and uvicorn

Implementation Details
- `get_interpreter()` creates a single Daytona sandbox per server process, reused across all RLM calls.
- `_extract_code_block()` extracts Python code from LLM responses (fenced blocks or plain code).
- `_resolve_api_key()` picks the correct API key based on model provider (OpenAI vs Anthropic).
- Sandbox output is truncated beyond a configurable limit (`RLM_MAX_OUTPUT_CHARS`).
- The LLM calls `SUBMIT(answer=...)` to signal completion; the output is JSON-parsed and returned immediately.
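For instance, fenced-block extraction in the spirit of `_extract_code_block()` might look like this sketch (the regex and plain-code fallback are assumptions, not the PR's implementation):

```python
import re

# Matches a fenced block: three backticks, optional "python" tag, then content
# up to the closing fence. Written as `{3} to avoid nesting literal fences
# inside this example.
FENCE_RE = re.compile(r"`{3}(?:python)?\s*\n(.*?)`{3}", re.DOTALL)

def extract_code_block(text: str) -> str:
    """Return the first fenced Python block in an LLM reply, else the raw text."""
    match = FENCE_RE.search(text)
    return match.group(1).strip() if match else text.strip()
```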
Migration Notes
- Model names move to provider-native identifiers (e.g. `openai/gpt-4o` instead of `openrouter/...`).

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R