feat: Tensorlake Firecracker microVM sandbox for isolated shell execution #2707

Open
ajjimeno wants to merge 11 commits into antinomyhq:main from
ajjimeno:feat/tensorlake-sandbox-integration

@ajjimeno

Summary

Add Tensorlake Firecracker microVM sandbox as an opt-in backend for shell command execution, isolating all LLM-issued shell calls from the host machine.

Context

Forge currently executes every shell tool call directly on the user's host machine. For agentic workflows this creates risk: a misguided or adversarial command runs with the user's full OS permissions. Tensorlake provides Firecracker microVM sandboxes that boot in sub-second time, persist filesystem state across commands within a session, and auto-suspend on inactivity — making them a natural fit for containing Forge's shell execution without sacrificing the stateful environment agents rely on.

Changes

  • TensorlakeCommandExecutor (crates/forge_infra/src/tensorlake.rs): new CommandInfra implementation that provisions a Tensorlake sandbox lazily on the first command, reuses it for the entire session, and terminates it on drop via a best-effort background task.
  • ForgeInfra (crates/forge_infra/src/forge_infra.rs): introduces a CommandExecutor enum (Local | Tensorlake) and a new ForgeInfra::new_with_tensorlake() constructor; all existing CommandInfra delegation is unchanged.
  • forge_infra/src/lib.rs: exports the tensorlake module and re-exports TensorlakeConfig.
  • ForgeAPI (crates/forge_api/src/forge_api.rs): adds ForgeAPI::init_with_tensorlake() factory that mirrors init() but selects the Tensorlake executor.
  • forge_api/src/lib.rs: re-exports TensorlakeConfig so callers don't need to import forge_infra directly.
  • CLI (crates/forge_main/src/cli.rs + main.rs): adds --tensorlake <API_KEY> flag (also reads TENSORLAKE_API_KEY env var); when provided, init_with_tensorlake() is used instead of init().
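
The enum-based dispatch described in the list above could look roughly like the following. This is a minimal sketch rather than the PR's code: the synchronous `CommandInfra` trait, the `LocalExecutor` and `TensorlakeExecutor` structs, and the `select_executor` helper are simplified stand-ins assumed for illustration.

```rust
// Simplified stand-in for the project's CommandInfra trait (assumed shape;
// the real trait is likely async and has more methods).
trait CommandInfra {
    fn execute_command(&self, command: &str) -> Result<String, String>;
}

// Hypothetical backends; they only tag the command so the dispatch is visible.
struct LocalExecutor;
struct TensorlakeExecutor {
    api_key: String,
}

impl CommandInfra for LocalExecutor {
    fn execute_command(&self, command: &str) -> Result<String, String> {
        Ok(format!("local: {command}"))
    }
}

impl CommandInfra for TensorlakeExecutor {
    fn execute_command(&self, command: &str) -> Result<String, String> {
        if self.api_key.is_empty() {
            return Err("missing Tensorlake API key".to_string());
        }
        Ok(format!("sandbox: {command}"))
    }
}

// One enum variant per backend; callers only ever see CommandInfra.
enum CommandExecutor {
    Local(LocalExecutor),
    Tensorlake(TensorlakeExecutor),
}

impl CommandInfra for CommandExecutor {
    fn execute_command(&self, command: &str) -> Result<String, String> {
        match self {
            Self::Local(executor) => executor.execute_command(command),
            Self::Tensorlake(executor) => executor.execute_command(command),
        }
    }
}

// Mirrors the CLI behaviour: an API key selects the sandbox backend.
fn select_executor(api_key: Option<String>) -> CommandExecutor {
    match api_key {
        Some(api_key) => CommandExecutor::Tensorlake(TensorlakeExecutor { api_key }),
        None => CommandExecutor::Local(LocalExecutor),
    }
}
```

Because the enum itself implements the trait, code that already delegates to `CommandInfra` does not need to know which backend is active.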

Key Implementation Details

  • The sandbox ID is stored in Arc<Mutex<Option<String>>> so clones of the executor share the same lazily-provisioned sandbox.
  • execute_command_raw (used for interactive/TTY commands) returns an explicit error in Tensorlake mode — raw stdin cannot be forwarded over HTTP.
  • Sandbox cleanup is best-effort: a tokio::spawn is fired from Drop to call DELETE /v2/sandboxes/{id} without blocking the caller.
  • No changes to domain, services, or tool executor layers — the integration is confined entirely to the infrastructure layer, consistent with the project's clean architecture.
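
As an illustration of the first two points, the sketch below shows how clones sharing an `Arc<Mutex<Option<String>>>` end up with one lazily provisioned sandbox, and how a raw/TTY call can be rejected explicitly. The `LazySandbox` type, the `provision_calls` counter, and the generated IDs are hypothetical; the real executor provisions over the Tensorlake REST API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Hypothetical executor core: clones share the same sandbox slot.
#[derive(Clone)]
struct LazySandbox {
    sandbox_id: Arc<Mutex<Option<String>>>,
    // Counts simulated provisioning calls (stands in for the HTTP request).
    provision_calls: Arc<AtomicUsize>,
}

impl LazySandbox {
    fn new() -> Self {
        Self {
            sandbox_id: Arc::new(Mutex::new(None)),
            provision_calls: Arc::new(AtomicUsize::new(0)),
        }
    }

    // Returns the shared sandbox ID, provisioning it on first use only.
    fn ensure_sandbox(&self) -> String {
        let mut slot = self.sandbox_id.lock().unwrap();
        if let Some(id) = slot.as_ref() {
            return id.clone();
        }
        let n = self.provision_calls.fetch_add(1, Ordering::SeqCst) + 1;
        let id = format!("sandbox-{n}");
        *slot = Some(id.clone());
        id
    }

    // Interactive commands are refused: stdin cannot be forwarded over HTTP.
    fn execute_command_raw(&self, _command: &str) -> Result<(), String> {
        Err("raw/TTY commands are not supported in Tensorlake mode".to_string())
    }
}
```

Holding the lock across the check and the assignment is what keeps two clones from provisioning two sandboxes at the same time.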

Use Cases

# Route all shell commands through a Tensorlake microVM
forge --tensorlake sk-tl-xxxx

# Or via environment variable
TENSORLAKE_API_KEY=sk-tl-xxxx forge

Testing

# Verify forge_infra compiles cleanly
cargo check -p forge_infra

# Run unit tests for the new executor
cargo test -p forge_infra tensorlake

# End-to-end: set a real API key and run a shell-heavy task
TENSORLAKE_API_KEY=<key> forge --prompt "list files in /tmp"

Links

…hell execution

Adds a Tensorlake sandbox backend as an alternative to local shell execution.
When enabled via --tensorlake <API_KEY>, all shell tool calls are routed through
an isolated Firecracker microVM instead of running directly on the host machine.

Changes:
- crates/forge_infra/src/tensorlake.rs: new TensorlakeCommandExecutor implementing
  CommandInfra; lazy sandbox provisioning via Tensorlake REST API; best-effort
  cleanup on Drop; unit tests for config defaults and executor creation
- crates/forge_infra/src/forge_infra.rs: CommandExecutor enum dispatches between
  Local and Tensorlake backends; ForgeInfra::new_with_tensorlake() constructor
- crates/forge_infra/src/lib.rs: export tensorlake module and TensorlakeConfig
- crates/forge_api/src/forge_api.rs: ForgeAPI::init_with_tensorlake() factory
- crates/forge_api/src/lib.rs: re-export TensorlakeConfig for callers
- crates/forge_main/src/cli.rs: --tensorlake <API_KEY> flag (also reads
  TENSORLAKE_API_KEY env var)
- crates/forge_main/src/main.rs: wire CLI flag to init_with_tensorlake()

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
@CLAassistant

CLAassistant commented Mar 26, 2026

CLA assistant check
All committers have signed the CLA.

ajjimeno and others added 2 commits March 26, 2026 14:11
The Drop impl calls tokio::spawn, which requires a Tokio runtime context.
Switching to #[tokio::test] provides that runtime during the test.

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
- Remap macOS/home paths to /tmp when executing commands in the remote
  Linux microVM so process spawn never fails with ENOENT
- Fix test_sandbox_proxy_url to use #[tokio::test] since Drop calls
  tokio::spawn and panics outside a runtime
- Add 'env' feature to clap so TENSORLAKE_API_KEY env var is respected
- Fix borrow issue in tensorlake key closure in main.rs
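
The path remapping in the first bullet might be sketched as follows. The exact rule used in the PR is not shown here, so the prefix list (`/Users`, `/home`) and the function name `remap_working_dir` are assumptions made for illustration.

```rust
use std::path::{Path, PathBuf};

// Hypothetical rule: host-only directories are replaced by /tmp, which is
// expected to exist inside the Linux microVM; other paths pass through.
fn remap_working_dir(host_cwd: &Path) -> PathBuf {
    // Assumed prefixes for macOS and Linux home directories.
    const HOST_ONLY_PREFIXES: [&str; 2] = ["/Users", "/home"];
    if HOST_ONLY_PREFIXES
        .iter()
        .any(|prefix| host_cwd.starts_with(prefix))
    {
        PathBuf::from("/tmp")
    } else {
        host_cwd.to_path_buf()
    }
}
```

`Path::starts_with` compares whole path components, so a directory such as `/homestead` is not mistaken for something under `/home`.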

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
ajjimeno and others added 3 commits March 26, 2026 15:34
- Add AtomicBool cleanup_scheduled guard to Drop so that clones sharing
  the same sandbox_id Arc only schedule one DELETE request, not one per
  clone
- Parse env_vars (Vec<String> of KEY=VALUE) into a HashMap and forward
  to the Tensorlake process API which accepts {"KEY": "value"} dicts
- Add env field to StartProcessRequest with skip_serializing_if so
  requests without env vars are unaffected
- Add tests for both: clone-dedup guard and env var parsing
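
The env var conversion could be sketched like this. The function name `parse_env_vars` and the choice to skip malformed entries silently are assumptions; the commit only states that `KEY=VALUE` strings are turned into a map.

```rust
use std::collections::HashMap;

// Converts ["KEY=VALUE", ...] into {"KEY": "VALUE", ...}.
// Assumption: entries without '=' or with an empty key are skipped.
fn parse_env_vars(env_vars: &[String]) -> HashMap<String, String> {
    env_vars
        .iter()
        .filter_map(|entry| {
            // Split on the first '=' only, so values may contain '='.
            let (key, value) = entry.split_once('=')?;
            if key.is_empty() {
                return None;
            }
            Some((key.to_string(), value.to_string()))
        })
        .collect()
}
```

Splitting on the first `=` keeps values such as connection strings intact, and an empty value (`KEY=`) is preserved as an empty string.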

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
The AtomicBool cleanup_scheduled fired the DELETE on the first clone
to be dropped, while other clones sharing the same Arc<Mutex<sandbox_id>>
could still be alive and using the sandbox.

Fix by introducing a SandboxGuard inner struct that owns the sandbox
state and implements Drop. Wrapping it in Arc inside
TensorlakeCommandExecutor means the DELETE is issued exactly once —
when Arc::strong_count reaches zero (last clone dropped).
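
A minimal sketch of that guard pattern, using only the standard library: the `Executor` type and the `deletes_issued` counter are hypothetical, and the counter stands in for the real DELETE request.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};

// Owns the sandbox state; Drop runs only when the last Arc is released.
struct SandboxGuard {
    sandbox_id: Mutex<Option<String>>,
    // Hypothetical counter standing in for the DELETE request.
    deletes_issued: Arc<AtomicUsize>,
}

impl Drop for SandboxGuard {
    fn drop(&mut self) {
        // Drop has exclusive access, so get_mut avoids taking the lock.
        if let Ok(id) = self.sandbox_id.get_mut() {
            if id.take().is_some() {
                self.deletes_issued.fetch_add(1, Ordering::SeqCst);
            }
        }
    }
}

// Cloning the executor clones the Arc, not the guard.
#[derive(Clone)]
struct Executor {
    guard: Arc<SandboxGuard>,
}

impl Executor {
    fn with_sandbox(id: &str, deletes_issued: Arc<AtomicUsize>) -> Self {
        Self {
            guard: Arc::new(SandboxGuard {
                sandbox_id: Mutex::new(Some(id.to_string())),
                deletes_issued,
            }),
        }
    }

    fn sandbox_id(&self) -> Option<String> {
        self.guard.sandbox_id.lock().unwrap().clone()
    }
}
```

Dropping one clone leaves the sandbox usable by the others; cleanup happens once, when the final clone goes away.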

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
… them

- wait_for_process_exit: return Err on non-success HTTP status instead of
  continuing to poll, so API failures fail fast rather than spinning for
  150 s before timing out
- get_process_output: return Err on non-success HTTP status instead of
  returning an empty string, preventing silent output loss
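
The fail-fast polling behaviour can be sketched without any HTTP client by passing the poll step in as a closure. The `PollResponse`, `ProcessState`, and `PollError` types and the `max_attempts` parameter are assumptions for illustration, not the PR's actual types.

```rust
#[derive(Debug, PartialEq)]
enum PollError {
    // The API answered with a non-success status code.
    Http(u16),
    // The process did not exit within the allowed attempts.
    TimedOut,
}

enum ProcessState {
    Running,
    Exited(i32),
}

// Hypothetical shape of one poll result: HTTP status plus process state.
struct PollResponse {
    status: u16,
    state: ProcessState,
}

// Polls until the process exits, but returns immediately on an HTTP error
// instead of retrying until the timeout.
fn wait_for_process_exit(
    mut poll: impl FnMut() -> PollResponse,
    max_attempts: usize,
) -> Result<i32, PollError> {
    for _ in 0..max_attempts {
        let response = poll();
        if !(200..300).contains(&response.status) {
            return Err(PollError::Http(response.status));
        }
        if let ProcessState::Exited(code) = response.state {
            return Ok(code);
        }
    }
    Err(PollError::TimedOut)
}
```

A real implementation would sleep between attempts; the sleep is omitted here to keep the sketch focused on the early return.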

Addresses Graphite review comments on PR antinomyhq#2707.

Co-Authored-By: ForgeCode <noreply@forgecode.dev>
ajjimeno and others added 3 commits March 27, 2026 09:19
Co-authored-by: graphite-app[bot] <96075541+graphite-app[bot]@users.noreply.github.com>
- forge_api.rs: drop removed InitAuth/LoginInfo imports; drop 'restricted' param from init() and init_with_tensorlake() to match main's updated ForgeEnvironmentInfra::new signature; keep TensorlakeConfig import and init_with_tensorlake constructor
- forge_infra.rs: drop 'restricted' param from new() and new_with_tensorlake(); rename environment_service → config_infra to match main; fix service_url field name
- main.rs: remove cli.restricted reference (flag removed from CLI); keep tensorlake branching logic with updated ForgeAPI call signatures

Co-Authored-By: ForgeCode <noreply@forgecode.dev>

3 participants