Replace RLM library with Daytona-backed code execution#1

Open
OCWC22 wants to merge 12 commits into eesb99:main from OCWC22:claude/integrate-fleet-daytona-bSOUB

Conversation

@OCWC22 OCWC22 commented Feb 18, 2026

Summary

Refactor the RLM MCP server to use Daytona sandboxes for secure, remote Python code execution instead of the RLM library. This removes the dependency on the external RLM package and implements the core RLM loop (iterative code execution with LLM reasoning) directly in the server.

Key Changes

  • New DaytonaInterpreter class (src/daytona_interpreter.py): Drop-in replacement for Modal/RLM sandboxes. Manages Daytona sandbox lifecycle (create, execute, upload/download files, shutdown) and provides a stateless execution model with variable injection and SUBMIT() helper for structured output.

  • Reimplemented RLM loop (src/server.py::rlm_loop): Core iterative execution loop that:

    1. Builds a system prompt with task + context
    2. Asks LLM to write Python code
    3. Executes code in Daytona sandbox
    4. Appends code + output to conversation history
    5. Repeats until SUBMIT() is called or max iterations reached
    6. Falls back to LLM for final answer extraction
  • Updated MCP tools:

    • rlm_execute: Execute tasks with full RLM loop
    • rlm_analyze: Analyze data with code execution
    • rlm_code: Generate and test code
    • rlm_decompose: Break complex tasks into subtasks
    • sandbox_exec: Direct code execution (no RLM loop)
    • sandbox_upload, sandbox_files: File operations
    • rlm_status: System status
  • Configuration changes:

    • Replaced OpenRouter with OpenAI/Anthropic API keys
    • Changed default models from Grok to GPT-4o
    • Added Daytona-specific env vars (API key, URL, target region)
    • Added MCP transport options (stdio, streamable-http, SSE)
    • Reduced max iterations from 20 to 15
  • New deployment script (deploy.py): Automates deployment to a Daytona workspace—uploads server code, installs dependencies, and starts the MCP server on HTTP for remote access.

  • Enhanced shell scripts:

    • run_server.sh: Added --http and --deploy modes
    • setup.sh: Simplified to install from requirements.txt (no RLM library)
  • Project files:

    • Added pyproject.toml for package metadata
    • Added .env.example for configuration template
    • Added src/__main__.py to support python -m src
    • Updated requirements.txt to include daytona and uvicorn
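The six-step loop listed under "Reimplemented RLM loop" can be sketched as follows. This is a minimal illustration rather than the server's actual code: `ask_llm` and `run_in_sandbox` are hypothetical stand-ins for the litellm completion call and the Daytona execution call.

```python
import json

MAX_ITERATIONS = 15
SUBMIT_MARKER = "__SUBMIT__:"

def rlm_loop(task, ask_llm, run_in_sandbox):
    """Iterate: the LLM writes Python, the sandbox runs it, output feeds back."""
    messages = [{
        "role": "system",
        "content": f"Task: {task}\nWrite Python. Call SUBMIT(answer=...) when done.",
    }]
    for _ in range(MAX_ITERATIONS):
        code = ask_llm(messages)            # 2. LLM writes Python code
        output = run_in_sandbox(code)       # 3. execute in the Daytona sandbox
        if SUBMIT_MARKER in output:         # 5. SUBMIT() signals completion
            payload = output.split(SUBMIT_MARKER, 1)[1].splitlines()[0]
            return json.loads(payload)
        # 4. append code + output to the conversation history
        messages.append({"role": "assistant", "content": code})
        messages.append({"role": "user", "content": f"Output:\n{output}"})
    # 6. fall back to the LLM for final answer extraction
    messages.append({"role": "user", "content": "Give the final answer."})
    return {"answer": ask_llm(messages)}
```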

Implementation Details

  • Lazy initialization: get_interpreter() creates a single Daytona sandbox per server process, reused across all RLM calls.
  • Code extraction: _extract_code_block() intelligently extracts Python code from LLM responses (fenced blocks or plain code).
  • API key resolution: _resolve_api_key() picks the correct API key based on model provider (OpenAI vs Anthropic).
  • Output truncation: Long stdout is summarized to prevent context blowup (configurable via RLM_MAX_OUTPUT_CHARS).
  • SUBMIT() protocol: Sandbox code calls SUBMIT(answer=...) to signal completion; output is JSON-parsed and returned immediately.
  • Stateless execution: Each code block runs in isolation with variable injection, avoiding state persistence issues.
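The code-extraction step can be pictured like this. It is a sketch assuming fenced-block-first behavior; the real `_extract_code_block()` may differ in detail.

```python
import re

# Triple-backtick fence built without literal backticks, so this example can
# itself live inside a fenced code block.
FENCE = chr(96) * 3

def extract_code_block(text: str) -> str:
    """Return the first fenced Python block, else treat the whole reply as code."""
    pattern = FENCE + r"(?:python)?\s*\n(.*?)" + FENCE
    match = re.search(pattern, text, re.DOTALL)
    if match:
        return match.group(1).strip()
    return text.strip()
```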

Migration Notes

  • Removed dependency on the RLM library entirely
  • Removed OpenRouter API key requirement
  • Daytona API key is now required for sandbox creation
  • LLM models now use litellm format (e.g., openai/gpt-4o instead of openrouter/...)

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R

OCWC22 and others added 6 commits February 9, 2026 20:25
- Add DaytonaInterpreter as drop-in replacement for ModalInterpreter
- Rewrite server.py with Fleet RLM-style iterative code execution loop
- LLM writes Python → Daytona sandbox executes → output feeds back → iterate
- Add new MCP tools: sandbox_exec, sandbox_upload, sandbox_files
- Update dependencies: daytona-sdk replaces rlm library requirement
- No local code execution, no Modal dependency, no Deno/Pyodide needed

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
- Fix Daytona SDK import: `from daytona import ...` (not daytona_sdk)
- Fix requirements.txt: `daytona>=0.10.0` (not daytona-sdk)
- Fix server.py: use relative import for DaytonaInterpreter
- Add src/__main__.py: enables `python -m src` entry point
- Add pyproject.toml: proper packaging with `fleet-rlm-mcp` script
- Add .env.example: template for required env vars
- Fix run_server.sh: use `python -m src` entry point
- Fix setup.sh: verify `import daytona` not `import daytona_sdk`

Verified: all imports pass, all 8 MCP tools register, server starts.

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
Three ways to run:
  ./run_server.sh           # local stdio (Claude Desktop / CLI)
  ./run_server.sh --http    # HTTP on 0.0.0.0:8000/mcp (remote clients)
  ./run_server.sh --deploy  # deploy to Daytona workspace (always-on)

Changes:
- server.py: add MCP_TRANSPORT/MCP_PORT env vars, bind 0.0.0.0,
  pass transport to mcp.run() (stdio | streamable-http | sse)
- deploy.py: creates Daytona sandbox, uploads server code, installs
  deps, starts MCP on HTTP in background session
- run_server.sh: --http and --deploy flags, proper .env.local sourcing
- requirements.txt/pyproject.toml: add uvicorn for HTTP transport
- README: document all three run modes

Verified: stdio starts, HTTP starts (uvicorn on :8000), deploy imports OK.

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
- session_execute → execute_session_command
- request= → req=
- create_session(session_id=x) → create_session(x)

Verified against installed daytona SDK source.

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
Copilot AI review requested due to automatic review settings February 18, 2026 13:51
- deploy.py: stop printing partial API key in "To stop" instructions,
  read from env var instead
- .gitignore: use `.env.*` glob with `!.env.example` exception to
  block all env files (not just .env and .env.local)

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R

Copilot AI left a comment

Pull request overview

This PR refactors the RLM MCP Server to replace the external RLM library with a custom implementation backed by Daytona sandboxes for secure, remote Python code execution. The core RLM iterative loop (LLM writes code → execute in sandbox → inspect output → repeat until completion) is reimplemented directly in the server, eliminating the RLM library dependency while adding Daytona as the execution backend.

Changes:

  • Replaced RLM library with custom DaytonaInterpreter class for sandbox-based code execution
  • Reimplemented the RLM iterative execution loop with LLM completion via litellm and Daytona sandbox execution
  • Updated configuration from OpenRouter/Grok to OpenAI/Anthropic with Daytona credentials, and added HTTP transport options for remote access

Reviewed changes

Copilot reviewed 9 out of 11 changed files in this pull request and generated 30 comments.

File Description
src/server.py Reimplements RLM loop with Daytona execution, updates MCP tools, adds sandbox file operations, changes configuration from OpenRouter to OpenAI/Anthropic
src/daytona_interpreter.py New class providing Daytona sandbox lifecycle management, code execution with variable injection, SUBMIT protocol, and file operations
src/__main__.py New entry point to support python -m src execution
deploy.py New deployment script to create Daytona workspace, upload server code, install dependencies, and start MCP server over HTTP
setup.sh Removed RLM library installation, added Daytona SDK and litellm verification
run_server.sh Added --http and --deploy modes, updated environment variable handling
requirements.txt Added daytona>=0.10.0 and uvicorn>=0.29.0 dependencies
pyproject.toml New package metadata file with project configuration
.env.example New environment variable template with Daytona and LLM API key configuration
README.md Updated documentation to reflect Daytona architecture, new deployment options, and revised prerequisites

Comment thread deploy.py
Comment on lines +125 to +131
sandbox.process.execute_session_command(
    session_id=session_id,
    req=SessionExecuteRequest(
        command=f"cd {base} && python -m src",
        run_async=True,
    ),
)
Copilot AI Feb 18, 2026

The deployment script starts the MCP server with run_async=True but doesn't verify that the process actually stays running. If the server crashes immediately after starting (e.g., due to missing dependencies or configuration errors), the deployment would still report success. Consider checking the session status or process list after the health check to ensure the server is still running.
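One way to harden the deploy step, sketched with only the standard library (the URL, attempt count, and delay are assumptions): poll the server after starting it and fail the deploy if it never answers.

```python
import time
import urllib.error
import urllib.request

def wait_until_healthy(url: str, attempts: int = 5, delay: float = 2.0) -> bool:
    """Return True once the server answers HTTP at all; False if it never does."""
    for _ in range(attempts):
        try:
            urllib.request.urlopen(url, timeout=3)
            return True
        except urllib.error.HTTPError:
            # An HTTP error status still means the process is up and serving.
            return True
        except (urllib.error.URLError, OSError):
            time.sleep(delay)
    return False
```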

Comment thread src/server.py
Comment on lines +60 to +64

# ---------------------------------------------------------------------------
# MCP Server
# ---------------------------------------------------------------------------
mcp = FastMCP("fleet-rlm-daytona", host="0.0.0.0", port=MCP_PORT)
Copilot AI Feb 18, 2026

The MCP server is configured to listen on 0.0.0.0 which exposes it to all network interfaces. When combined with the --http mode, this could expose the RLM code execution capabilities to the public internet without any authentication or authorization. Consider adding documentation about network security, firewall rules, or implementing authentication for the HTTP transport mode.

Suggested change
# ---------------------------------------------------------------------------
# MCP Server
# ---------------------------------------------------------------------------
mcp = FastMCP("fleet-rlm-daytona", host="0.0.0.0", port=MCP_PORT)
MCP_HOST = os.getenv("MCP_HOST", "127.0.0.1")
# ---------------------------------------------------------------------------
# MCP Server
# ---------------------------------------------------------------------------
mcp = FastMCP("fleet-rlm-daytona", host=MCP_HOST, port=MCP_PORT)

Comment thread pyproject.toml
version = "0.1.0"
description = "Fleet RLM MCP Server — Daytona Edition"
readme = "README.md"
license = "MIT"
Copilot AI Feb 18, 2026

The license field specifies "MIT" but should use a valid SPDX identifier format. According to PEP 639, use either license = {text = "MIT"} or license = "MIT License" with the full SPDX identifier, or reference a LICENSE file with license = {file = "LICENSE"}.

Suggested change
license = "MIT"
license = {text = "MIT"}

Comment thread src/daytona_interpreter.py Outdated
Comment on lines +207 to +213
parts.append('''
def SUBMIT(**kwargs):
    """Signal structured output from the RLM execution."""
    import json
    print("__SUBMIT__:" + json.dumps(kwargs))
    sys.exit(0)
''')
Copilot AI Feb 18, 2026

The SUBMIT() helper function calls sys.exit(0) after printing the marker. This immediately terminates the script, which is correct, but if the user code has exception handlers or finally blocks that try to continue execution, this could lead to unexpected behavior. Consider documenting this behavior clearly in the docstring.

Comment thread src/server.py Outdated
Comment on lines +453 to +455
sandbox_status = "Running" if (interp and interp._started) else "Not started"
sandbox_id = interp._sandbox.id if (interp and interp._sandbox) else "N/A"

Copilot AI Feb 18, 2026

The code tries to access _interpreter._started and _interpreter._sandbox.id which are private attributes (prefixed with underscore). This violates encapsulation and creates tight coupling. Consider adding public getter methods in DaytonaInterpreter like is_started() and get_sandbox_id() to access this information properly.

Suggested change
sandbox_status = "Running" if (interp and interp._started) else "Not started"
sandbox_id = interp._sandbox.id if (interp and interp._sandbox) else "N/A"
# Prefer public accessors if available to avoid relying on private fields.
if interp:
    if hasattr(interp, "is_started") and callable(getattr(interp, "is_started")):
        started = bool(interp.is_started())
    else:
        # Fallback to private attribute for backward compatibility.
        started = bool(getattr(interp, "_started", False))
    if hasattr(interp, "get_sandbox_id") and callable(getattr(interp, "get_sandbox_id")):
        sandbox_id = interp.get_sandbox_id() or "N/A"
    else:
        # Fallback to private sandbox attribute for backward compatibility.
        sandbox = getattr(interp, "_sandbox", None)
        sandbox_id = getattr(sandbox, "id", "N/A") if sandbox is not None else "N/A"
else:
    started = False
    sandbox_id = "N/A"
sandbox_status = "Running" if started else "Not started"

Comment thread src/server.py Outdated
Comment on lines +228 to +236
# Also check for the __SUBMIT__ marker that got JSON-decoded already
if "__SUBMIT__:" not in output and "SUBMIT" in code and output.strip():
# SUBMIT was in code and we got clean output — probably the answer
return {
"answer": output.strip(),
"trajectory": trajectory,
"execution_time": time.time() - start_time,
}

Copilot AI Feb 18, 2026

The condition if "__SUBMIT__:" not in output and "SUBMIT" in code and output.strip(): is attempting to detect successful SUBMIT calls but the logic is unclear. If SUBMIT was called and produced output, it should have the __SUBMIT__: marker. This fallback case might incorrectly return regular print output as the final answer. Consider clarifying this logic or removing this fallback branch entirely.

Suggested change
# Also check for the __SUBMIT__ marker that got JSON-decoded already
if "__SUBMIT__:" not in output and "SUBMIT" in code and output.strip():
# SUBMIT was in code and we got clean output — probably the answer
return {
"answer": output.strip(),
"trajectory": trajectory,
"execution_time": time.time() - start_time,
}

# Inject variables
if variables:
    for name, value in variables.items():
        parts.append(f"{name} = {json.dumps(value)}")
Copilot AI Feb 18, 2026

The variable injection uses json.dumps(value) which may not work correctly for all Python objects. For example, complex objects, functions, or custom classes cannot be JSON-serialized. This will raise a TypeError at runtime. Consider using pickle for more robust serialization, or document that only JSON-serializable values are supported and add validation.

Suggested change
parts.append(f"{name} = {json.dumps(value)}")
try:
    serialized = json.dumps(value)
except (TypeError, ValueError) as exc:
    raise TypeError(
        f"Variable '{name}' of type {type(value).__name__} is not JSON-serializable. "
        "Only JSON-serializable values are supported for variable injection."
    ) from exc
parts.append(f"{name} = {serialized}")

Comment thread src/daytona_interpreter.py Outdated
Comment on lines +158 to +159
if self.summarize_stdout and len(output) > 10000:
    output = output[:5000] + "\n...[truncated]...\n" + output[-2000:]
Copilot AI Feb 18, 2026

The hardcoded truncation threshold of 10000 characters is checked against RLM_MAX_OUTPUT_CHARS in the environment but defaults to 10000. However, the truncation logic here uses a hardcoded value instead of reading from self.summarize_stdout flag and a configurable threshold. This inconsistency could lead to confusion. Consider passing RLM_MAX_OUTPUT_CHARS as a parameter to DaytonaInterpreter or using a class attribute.
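A configurable version of this truncation, as a sketch (the env var name comes from the PR description; the head/tail split ratios are assumptions):

```python
import os

MAX_OUTPUT_CHARS = int(os.getenv("RLM_MAX_OUTPUT_CHARS", "10000"))

def truncate_output(output: str, max_chars: int = MAX_OUTPUT_CHARS) -> str:
    """Keep the head and tail of overly long stdout to avoid context blowup."""
    if len(output) <= max_chars:
        return output
    head, tail = max_chars // 2, max_chars // 5
    return output[:head] + "\n...[truncated]...\n" + output[-tail:]
```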

Comment thread deploy.py Outdated
Comment on lines +20 to +32

# ---------------------------------------------------------------------------
# Load .env.local so deploy.py can find keys without exporting manually
# ---------------------------------------------------------------------------
for env_path in [Path(".env.local"), Path(__file__).parent / ".env.local"]:
    if env_path.exists():
        with open(env_path, "r") as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, value = line.split("=", 1)
                    value = value.strip("\"'")
                    os.environ.setdefault(key, value)
Copilot AI Feb 18, 2026

The same .env.local parsing code is duplicated in both src/server.py (lines 29-41) and deploy.py (lines 24-33). This code duplication violates the DRY principle and makes maintenance harder. Consider extracting this into a shared utility function or using a library like python-dotenv.

Suggested change
# ---------------------------------------------------------------------------
# Load .env.local so deploy.py can find keys without exporting manually
# ---------------------------------------------------------------------------
for env_path in [Path(".env.local"), Path(__file__).parent / ".env.local"]:
    if env_path.exists():
        with open(env_path, "r") as f:
            for line in f:
                line = line.strip()
                if line and not line.startswith("#") and "=" in line:
                    key, value = line.split("=", 1)
                    value = value.strip("\"'")
                    os.environ.setdefault(key, value)
from dotenv import load_dotenv
# ---------------------------------------------------------------------------
# Load .env.local so deploy.py can find keys without exporting manually
# ---------------------------------------------------------------------------
for env_path in [Path(".env.local"), Path(__file__).parent / ".env.local"]:
    # load_dotenv returns True if it loaded any values from the file
    if load_dotenv(dotenv_path=env_path, override=False):

Comment thread run_server.sh
Comment on lines +41 to +43
export MCP_TRANSPORT="streamable-http"
export MCP_PORT="${MCP_PORT:-8000}"
echo "Starting Fleet RLM MCP on http://0.0.0.0:${MCP_PORT}/mcp" >&2
Copilot AI Feb 18, 2026

The --http mode starts the MCP server on http://0.0.0.0:${MCP_PORT}/mcp without any authentication or transport security at this layer, making it reachable by any host on the network when the port is exposed. Because this server drives code execution in Daytona sandboxes using your LLM API keys, a remote attacker who can reach this port can invoke tools and run arbitrary code or exfiltrate those keys. Restrict the HTTP binding to localhost by default and/or require running behind an authenticated, TLS-terminating reverse proxy instead of exposing it directly on 0.0.0.0.

Suggested change
export MCP_TRANSPORT="streamable-http"
export MCP_PORT="${MCP_PORT:-8000}"
echo "Starting Fleet RLM MCP on http://0.0.0.0:${MCP_PORT}/mcp" >&2
# HTTP transport is potentially unsafe when exposed directly on 0.0.0.0.
# Require an explicit opt-in via MCP_HTTP_ALLOW_INSECURE_REMOTE=1.
if [ "${MCP_HTTP_ALLOW_INSECURE_REMOTE:-0}" != "1" ]; then
echo "Refusing to start HTTP MCP server without explicit opt-in." >&2
echo "This mode starts an unauthenticated MCP HTTP server that can drive code" >&2
echo "execution in Daytona sandboxes using your LLM API keys." >&2
echo "" >&2
echo "If you really intend to expose this over the network (e.g. behind an" >&2
echo "authenticated, TLS-terminating reverse proxy), re-run with:" >&2
echo " MCP_HTTP_ALLOW_INSECURE_REMOTE=1 $0 --http" >&2
echo "" >&2
echo "For safer setups, prefer binding to localhost and using an SSH tunnel or" >&2
echo "a properly secured reverse proxy instead of exposing 0.0.0.0 directly." >&2
exit 1
fi
export MCP_TRANSPORT="streamable-http"
export MCP_PORT="${MCP_PORT:-8000}"
echo "WARNING: Starting Fleet RLM MCP with insecure HTTP on http://0.0.0.0:${MCP_PORT}/mcp" >&2
echo "Ensure this is only reachable via an authenticated, TLS-terminating reverse proxy." >&2

Add Dockerfile, docker-compose.yml, and a --docker flag in run_server.sh
so the MCP server can be built and run as a container with one command.

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
- Fix DaytonaInterpreter: correct execute_stateful() to use
  ExecutionResult.stdout (not .result), add atexit cleanup,
  proper error handling for code_interpreter.run_code()
- Fix RLM loop: use proper chat-style messages instead of
  concatenated strings, robust SUBMIT detection
- Add new MCP tools: sandbox_exec_stateful (stateful REPL),
  sandbox_download, sandbox_shell
- Add graceful SIGTERM handling with interpreter cleanup
- Clean up deploy.py: explicit SessionExecuteRequest import

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R
- Dockerfile: install curl, add HEALTHCHECK that hits /mcp endpoint
- docker-compose.yml: add healthcheck, clean up env var ordering
- README: add Docker mode (Option C), document new tools
  (sandbox_exec_stateful, sandbox_download, sandbox_shell)

https://claude.ai/code/session_01LSYRbTtZpDiENzs15ZbS3R