Skip to content

did the sandbox integration#1585

Open
FireFistisDead wants to merge 3 commits intomofa-org:mainfrom
FireFistisDead:vibe
Open

did the sandbox integration#1585
FireFistisDead wants to merge 3 commits intomofa-org:mainfrom
FireFistisDead:vibe

Conversation

@FireFistisDead
Copy link
Copy Markdown
Contributor

@FireFistisDead FireFistisDead commented Apr 7, 2026

📋 Summary

This PR adds sandboxed execution support for the Shell tool so agent-driven commands run in a more isolated and controlled environment, and also adds Semantic Caching for gateway chat inference to reduce latency and cost for repetitive prompts.

Together, these changes improve both:

  • Security (sandboxed shell execution)
  • Efficiency (semantic cache hits bypassing expensive inference)

🔗 Related Issues

Closes #1584 #1611
Related to #1584


🧠 Context

Shell Sandbox

The Shell tool previously executed commands directly on the host. In an agent framework, that is risky because model-generated actions can access sensitive files, environment variables, or broader system resources.

Semantic Cache

Standard cache behavior is usually exact-match only. In production, users often ask semantically similar questions with slightly different wording, which still triggers full inference unnecessarily. Semantic caching reduces repeated inference by matching prompt meaning, not only exact text.


🛠️ Changes

Sandbox execution (Shell tool)

  • Added sandbox execution mode for the Shell tool.
  • Added Linux sandbox support with isolated execution and resource limits.
  • Added non-Linux sandbox fallback through the WASM runtime for supported module-based execution.
  • Added sandbox: true support in agent.yml shell tool config.
  • Added SDK helpers so shell tool config is applied end to end.
  • Added tests for sandbox config parsing, directory restrictions, and sandbox downgrade prevention.

Semantic caching (Gateway)

  • Added semantic cache middleware for gateway chat flow.
  • Added embedding-based prompt similarity lookup before agent execution.
  • Added cache write-back after successful non-cached execution.
  • Added configurable threshold/top_k/provider/model wiring in server config.
  • Added response metadata fields to indicate cache usage and similarity score.
  • Added tests for:
    • semantic hit behavior
    • agent-boundary isolation
    • disabled-cache behavior

🧪 How You Tested

  1. Ran focused Shell tests:

    • cargo test -p mofa-plugins shell::tests -- --nocapture
  2. Ran the full mofa-plugins library test suite:

    • cargo test -p mofa-plugins --lib -- --nocapture
  3. Ran semantic cache tests in gateway:

    • cargo test -p mofa-gateway middleware::semantic_cache::tests:: -- --nocapture
  4. Ran gateway compile checks:

    • cargo check -p mofa-gateway
  5. Attempted full workspace suite:

    • cargo test --workspace --all-features

⚠️ Breaking Changes

  • No breaking changes
  • Breaking change

🧹 Checklist

Code Quality

  • Code follows Rust idioms and project conventions
  • cargo fmt run
  • cargo clippy passes without warnings

Testing

  • Tests added/updated
  • cargo test passes locally without any error

Documentation

  • Public APIs documented
  • README / docs updated if needed

PR Hygiene

  • PR is small and focused
  • Branch is up to date with main
  • No unrelated commits
  • Commit messages explain why, not only what

🚀 Deployment Notes

No migration is required.

  • For sandboxed Shell execution, set sandbox: true in Shell tool config in agent.yml.
  • For semantic caching, enable semantic cache in gateway/server config and tune threshold/top_k/provider/model as needed.

Linux sandbox mode depends on bubblewrap and optional resource-limit tooling.


🧩 Additional Notes for Reviewers

  • Sandbox mode cannot be downgraded per call once enabled globally.
  • Fallback behavior is fail-closed, not silent host-execution fallback.
  • Semantic cache hits are bounded by configured similarity threshold and scoped to avoid cross-agent leakage.

Copilot AI review requested due to automatic review settings April 7, 2026 17:59
@FireFistisDead
Copy link
Copy Markdown
Contributor Author

@BH3GEI @lijingrs you can check the pr when you are available

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds sandbox-aware execution to the built-in shell tool and wires agent.yml tool configuration through the SDK so built-in tools can be created/configured automatically from YAML.

Changes:

  • Introduces sandbox execution paths for ShellCommandTool (Linux bubblewrap + non-Linux WASM fallback), plus config parsing and related tests.
  • Adds SDK helpers to build a built-in ToolPlugin / ToolExecutor / LLMAgentBuilder from AgentYamlConfig tool settings.
  • Updates example agent.yml to demonstrate shell sandbox configuration.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
examples/config/agent.yml Adds example shell tool config showing sandbox: true and resource limits.
crates/mofa-sdk/src/llm_tools.rs Adds helpers to construct built-in tool plugin/executor and attach it to an LLM builder from YAML.
crates/mofa-sdk/src/lib.rs Re-exports the new YAML/built-in-tools helper functions from mofa_sdk::llm.
crates/mofa-plugins/src/tools/shell.rs Implements sandbox config, directory restrictions, timeouts, and sandboxed execution for the shell tool (plus tests).
crates/mofa-plugins/src/tools/mod.rs Adds a config-aware built-in tool plugin factory (currently only applies per-tool JSON config).

Comment on lines +18 to +23
/// Shell sandbox execution mode.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ShellExecutionMode {
Host,
Sandbox,
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShellExecutionMode is a public enum but is missing #[non_exhaustive], which makes it a breaking-change surface if new execution modes are added later. Consider adding #[non_exhaustive] (and updating any exhaustive matches accordingly) to keep the public API forward-compatible.

Copilot uses AI. Check for mistakes.
Comment on lines 180 to 187
@@ -67,6 +185,282 @@ impl ShellCommandTool {
.iter()
.any(|allowed| command == allowed || command.starts_with(&format!("{} ", allowed)))
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is_command_allowed still accepts commands like "ls -la" via the starts_with("ls ") check, but execution uses Command::new(command) with args passed separately. This will treat "ls -la" as an executable name and fail unexpectedly; the allowlist check should validate only the executable token (and/or reject whitespace in command).

Copilot uses AI. Check for mistakes.
))
})?;

if self.sandbox.mode == ShellExecutionMode::Sandbox {
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve_working_dir only applies allowed_working_dirs when the tool is globally configured with mode == Sandbox. If the tool is configured with mode == Host but a call requests sandbox ({"sandbox": true}), the sandboxed execution path is used but directory restrictions are skipped. The allowlist check should be based on sandbox_enabled_for_call(arguments) rather than only the global mode.

Suggested change
if self.sandbox.mode == ShellExecutionMode::Sandbox {
if self.sandbox_enabled_for_call(arguments) {

Copilot uses AI. Check for mistakes.

fn ensure_working_dir_allowed(&self, working_dir: &Path) -> PluginResult<()> {
if self.sandbox.allowed_working_dirs.is_empty() {
return Ok(());
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In sandbox mode, an empty allowed_working_dirs list currently means "allow any working_dir". Because working_dir is caller-controlled, this allows binding arbitrary host paths into the sandbox (e.g. setting working_dir: "/"), which defeats the intent of isolating access to sensitive files. Consider default-denying when mode == Sandbox unless an allowlist is configured, or default the allowlist to a safe base directory and disallow absolute/parent-traversal working dirs.

Suggested change
return Ok(());
return Err(mofa_kernel::plugin::PluginError::ExecutionFailed(
"Sandbox mode requires a non-empty allowed_working_dirs configuration".to_string(),
));

Copilot uses AI. Check for mistakes.
Comment on lines +201 to +214
fn resolve_working_dir(&self, arguments: &serde_json::Value) -> PluginResult<PathBuf> {
let candidate = arguments
.get("working_dir")
.and_then(|d| d.as_str())
.map(PathBuf::from)
.unwrap_or_else(|| std::env::current_dir().unwrap_or_else(|_| PathBuf::from(".")));

let resolved = std::fs::canonicalize(&candidate).map_err(|e| {
mofa_kernel::plugin::PluginError::ExecutionFailed(format!(
"Failed to resolve working_dir '{}': {}",
candidate.display(),
e
))
})?;
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

resolve_working_dir uses std::fs::canonicalize (blocking filesystem IO) from an async tool execution path. In a Tokio runtime this can block the executor thread under load. Consider switching to tokio::fs::canonicalize or wrapping canonicalization in tokio::task::spawn_blocking.

Copilot uses AI. Check for mistakes.
Comment on lines +299 to +305
let mut wrapped_args: Vec<String> = vec![
"--die-with-parent".to_string(),
"--new-session".to_string(),
"--unshare-all".to_string(),
"--clearenv".to_string(),
"--proc".to_string(),
"/proc".to_string(),
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ShellSandboxConfig.clear_env is parsed from config, but sandboxed execution always uses bubblewrap --clearenv regardless of this flag. This makes clear_env: false misleading for sandbox mode; either honor the flag in the sandbox path or document/rename it to clarify it only affects host execution.

Copilot uses AI. Check for mistakes.
Comment on lines +384 to +386
_args: &[String],
working_dir: &Path,
) -> PluginResult<std::process::Output> {
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On non-Linux platforms, execute_wasm_fallback silently ignores _args. If arguments are not supported for the WASM fallback, it should return an explicit error when args are provided (or implement passing them through), otherwise callers will see surprising behavior differences across OSes.

Suggested change
_args: &[String],
working_dir: &Path,
) -> PluginResult<std::process::Output> {
args: &[String],
working_dir: &Path,
) -> PluginResult<std::process::Output> {
if !args.is_empty() {
return Err(mofa_kernel::plugin::PluginError::ExecutionFailed(
"Non-Linux WASM sandbox fallback does not support command arguments".to_string(),
));
}

Copilot uses AI. Check for mistakes.
Comment on lines +68 to +93
/// Build a built-in ToolPlugin from `agent.yml` tools configuration.
///
/// Only `enabled: true` tools are included in the generated config map.
pub fn builtin_tool_plugin_from_agent_yaml(
plugin_id: &str,
config: &AgentYamlConfig,
) -> LLMResult<ToolPlugin> {
let mut tool_configs: HashMap<String, serde_json::Value> = HashMap::new();

if let Some(tools) = &config.tools {
for tool in tools.iter().filter(|t| t.enabled) {
tool_configs.insert(
tool.name.clone(),
serde_json::to_value(&tool.config).map_err(|e| {
LLMError::Other(format!(
"Failed to serialize tool config for '{}': {}",
tool.name, e
))
})?,
);
}
}

create_builtin_tool_plugin_with_config(plugin_id, &tool_configs)
.map_err(|e| LLMError::Other(format!("Failed to create builtin ToolPlugin: {}", e)))
}
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

builtin_tool_plugin_from_agent_yaml filters tools[] by enabled, but the resulting plugin is still created via create_builtin_tool_plugin_with_config, which registers all built-in tools unconditionally (including shell). This means enabling any tool in agent.yml effectively enables every built-in tool, and enabled: false has no effect. The builder should respect the enabled list by only registering enabled tools (or explicitly removing/omitting disabled ones).

Copilot uses AI. Check for mistakes.
Comment on lines +42 to 61
/// Create a ToolPlugin with built-in tools and per-tool JSON config.
///
/// `tool_configs` is keyed by tool name, e.g. `"shell"`.
pub fn create_builtin_tool_plugin_with_config(
plugin_id: &str,
tool_configs: &HashMap<String, serde_json::Value>,
) -> PluginResult<ToolPlugin> {
let mut tool_plugin = ToolPlugin::new(plugin_id);

// Register all built-in tools
tool_plugin.register_tool(HttpRequestTool::new());
tool_plugin.register_tool(FileSystemTool::new_with_defaults()?);
tool_plugin.register_tool(ShellCommandTool::new_with_defaults());
let shell_tool = tool_configs
.get("shell")
.map(ShellCommandTool::new_with_defaults_from_config)
.unwrap_or_else(ShellCommandTool::new_with_defaults);
tool_plugin.register_tool(shell_tool);
tool_plugin.register_tool(DateTimeTool::new());
tool_plugin.register_tool(CalculatorTool::new());
tool_plugin.register_tool(RhaiScriptTool::new()?);
Copy link

Copilot AI Apr 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

create_builtin_tool_plugin_with_config only applies JSON config to tools but still registers every built-in tool. When used with agent.yml tooling, this makes tools[].enabled ineffective and can unintentionally expose tools that were meant to be disabled (notably shell). Consider changing this helper to accept a set/list of enabled tool names and only register those tools.

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Secure Shell Tool Execution in Sandboxed Environment

2 participants