Add NemoClaw sandbox + LiteLLM proxy integration by kosaku-sim · Pull Request #2 · simount/OpenClaw-on-AWS

kosaku-sim · 2026-03-23T11:33:27Z

概要

NVIDIA NemoClaw（OpenShell）サンドボックスと LiteLLM プロキシを Linux CloudFormation テンプレートに統合。OpenClaw エージェントをカーネルレベルで隔離されたサンドボックス内で実行し、OpenShell の managed inference proxy 経由で Amazon Bedrock にアクセスします。

ホストに OpenClaw (22GB) をインストールせず、サンドボックスイメージ内の OpenClaw のみを使用することで、ディスク使用量を約 5GB に抑えます。

EnableSandbox=false（従来）と EnableSandbox=true（本PR）の比較

項目	EnableSandbox=false（従来）	EnableSandbox=true（本PR）
OpenClaw の場所	ホストに直接インストール（npm, 22GB）	sandbox イメージに内蔵（ホストにインストール不要, ~5GB）
コード実行の隔離	OpenClaw 内蔵の Docker sandbox（アプリレベル）	Landlock + seccomp + Network Namespace（カーネルレベル）
ネットワーク隔離	Docker network（基本的にインターネット到達可能）	デフォルト全遮断、`https://inference.local` のみ許可
ファイルシステム隔離	Docker volume mount	Landlock LSM で `/sandbox` と `/tmp` のみ書き込み可
API キーの保護	コンテナ内の環境変数に存在	sandbox 内に存在しない（OpenShell プロキシがホスト側で注入）
LLM アクセス経路	OpenClaw → Bedrock（直接）	OpenClaw → inference.local → LiteLLM → Bedrock
ディスク使用量	~25GB（Chromium含むnpmパッケージ）	~5GB（sandbox image + LiteLLM venv）

従来の Docker sandbox は OpenClaw のアプリケーション機能で、エージェントのコード実行を Docker コンテナ内で行う仕組みです。NemoClaw sandbox はそれとは異なり、OpenClaw 自体をカーネルレベルで隔離された環境に閉じ込めるため、Docker-in-Docker は不要です。

アーキテクチャ

Browser → SSM Port Forward → host:18789 (SSH LocalForward)
  → Sandbox:18789 (OpenClaw Gateway)
  → https://inference.local (OpenShell managed inference proxy)
  → host.openshell.internal:4000 (LiteLLM on host)
  → Amazon Bedrock

セキュリティモデル（3層のLinuxカーネル隔離）

Landlock LSM: ファイルシステムアクセス制御（/sandbox と /tmp のみ書き込み可）
seccomp-BPF: 危険なシステムコールをブロック
Network Namespace: デフォルトで全アウトバウンド通信を遮断。https://inference.local のみ許可
クレデンシャル注入: API キーはホスト側の OpenShell プロキシが保持。サンドボックス内には存在しない

変更内容

`scripts/setup-nemoclaw-litellm.sh` — 完全書き直し

10ステップの自動化スクリプト:

inotify制限設定（k3s安定動作に必須）
LiteLLM プロキシのインストールと systemd サービス化
OpenShell CLI のインストール（NemoClaw installer経由）
OpenShell ゲートウェイ起動
host.openshell.internal のIP修正（Docker network gateway IP を動的検出）
LiteLLM をOpenAI互換プロバイダーとして登録 + inference route 設定
Sandbox policy 作成（network_policies で host.openshell.internal:4000 を許可）
サンドボックス作成 + CRD hostAliases パッチ + Pod再作成
SSH経由でOpenClaw設定配信（https://inference.local/v1 をbaseUrlに使用）+ ゲートウェイ起動
SSH LocalForward の systemd サービス化（host:18789 → sandbox:18789）

`clawdbot-bedrock.yaml` — CloudFormation テンプレート

EnableSandbox パラメータ（デフォルト: true）で NemoClaw+LiteLLM を制御
sandbox mode 時に inotify 制限を自動設定
フォールバック処理: openshell-forward サービス再起動に修正
sandbox 名を openclaw に統一
SOUL.md をsandbox mode ではSSH経由で配信
ダッシュボードURL形式: ?token= → #token= に修正

ドキュメント

SECURITY.md: NemoClaw + LiteLLM アーキテクチャのセキュリティドキュメント
TROUBLESHOOTING.md: NemoClaw/LiteLLM トラブルシューティングセクション追加
README.md: アーキテクチャ図とパラメータ表を更新
DEPLOYMENT.md: NemoClaw/LiteLLM の確認手順を追加

解決した技術課題

課題	原因	解決策
LLM request timeout	sandbox から LiteLLM に直接到達不可（HTTP proxy が 403）	`https://inference.local`（managed inference proxy）経由に変更
host.openshell.internal 到達不可	OpenShell が docker0 (172.17.0.1) を設定するが、k3s クラスタは別 Docker network	Docker network gateway IP を動的検出し、CRD hostAliases をパッチ
device token mismatch	sandbox 内の `.openclaw/identity` ディレクトリ権限不足	SSH 経由で内部からディレクトリ作成（overlayfs キャッシュ問題を回避）
gateway token missing	`?token=` がリダイレクトで消失	`#token=`（フラグメント）形式に変更
k3s namespace not ready	inotify インスタンス制限（デフォルト128）を k3s が枯渇	`fs.inotify.max_user_instances=512` に設定
OpenClaw config invalid	2026.3.11 のスキーマが異なる（`provider`, `auth.mode` 等は無効）	正しいスキーマ（`gateway`/`models.providers`/`agents.defaults`）を使用

テスト計画

https://inference.local 経由で LiteLLM → Bedrock 接続確認
Control UI (Health OK, Version 2026.3.11) で日本語チャット動作確認
sandbox 内 OpenClaw ゲートウェイ起動確認（port 18789）
EnableSandbox=true で新規 CloudFormation スタックデプロイ（エンドツーエンド）
EnableSandbox=false で既存 Bedrock 直接接続フローが正常動作
SSM ポートフォワーディング経由で Web UI にアクセス確認

Closes #1

🤖 Generated with Claude Code

… AI execution Integrates NVIDIA NemoClaw (OpenShell) and LiteLLM into the Linux CloudFormation template to provide defense-in-depth isolation: OpenClaw runs inside a network-restricted sandbox that can only reach the LiteLLM proxy on localhost:4000, which proxies all model requests to Amazon Bedrock via IAM role. Closes #1 Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Replace heredoc configs with python/printf generation, remove comments and blank lines from UserData to reduce from ~40KB to ~25KB base64. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Move inline NemoClaw install, LiteLLM config, network policy, and sandbox gateway service code from CloudFormation UserData to external script (scripts/setup-nemoclaw-litellm.sh). UserData now downloads and executes the script when EnableSandbox=true, reducing raw size from ~21KB to ~12KB (well within the 16KB EC2 limit). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

This script is downloaded and executed by UserData when EnableSandbox=true. Contains LiteLLM proxy install/config, NemoClaw sandbox setup, network policy, and systemd services. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

raw.githubusercontent.com requires %2F for branch names containing slashes (feature/nemoclaw-litellm-integration). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

NemoClaw installer (nc.sh) requires HOME to be set. CloudFormation UserData runs as root without HOME exported, causing 'unbound variable' error with set -e. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Use `nemoclaw onboard --non-interactive` instead of manual sandbox creation - Register LiteLLM as OpenShell provider via `openshell provider create` - Set inference route via `openshell inference set` (not config file) - Use `host.openshell.internal` for sandbox-to-host LiteLLM access - Bind LiteLLM on 0.0.0.0 so sandbox can reach it - Add persistent port forward via systemd wrapper service - Pre-stage OpenClaw config with allowedOrigins for SSM access - Update fallback restart to use openshell-forward service Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…all steps - Move OpenClaw config writing BEFORE nc.sh install (onboard copies it) - Remove explicit `nemoclaw onboard` (nc.sh --non-interactive does it) - Add /root/.local/bin to PATH after NemoClaw install - Add PATH to systemd service environment - Fix sandbox name detection Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Remove set -e (installer stops at [4/7] without NVIDIA_API_KEY, expected) - Find NVM-installed node and add to PATH after install - Wait for sandbox to become ready before configuring provider - Add comments explaining NIM API key skip behavior Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

The NemoClaw installer retry (||) re-runs onboard which recreates the gateway and destroys the existing sandbox. Run once only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…rol UI NemoClaw sandbox handles agent execution (messaging, CLI, tools). Host OpenClaw gateway serves Control UI on port 18789 (auth=none). This avoids the device identity bug in NemoClaw's OpenClaw 2026.3.11. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Change api from "openai" to "openai-completions" (valid enum value) - Remove invalid "auth":"none" from provider config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

OpenClaw npm package alone uses ~22GB (includes Chromium, Control UI, plugins). Combined with NemoClaw Docker images and LiteLLM, 30GB is insufficient and causes disk full issues. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

- Skip Node.js/npm install on host when EnableSandbox=true (saves 22GB) - Update OpenClaw inside NemoClaw sandbox to latest (fixes device identity bug) - Patch sandbox config to auth.mode=none via overlayfs - Set up persistent openshell forward for port 18789 - Run messaging plugin enablement inside sandbox - Revert EBS to 30GB (sufficient without host npm install) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

…ture Replace the NemoClaw onboard-dependent setup with a direct openshell CLI workflow that uses the managed inference proxy (https://inference.local). Setup script changes: - Use openshell gateway/provider/inference/sandbox commands directly - Route LLM requests via https://inference.local (bypasses sandbox proxy) - Fix host.openshell.internal IP (detect Docker network gateway dynamically) - Patch Sandbox CRD hostAliases for correct host resolution - Deliver config via SSH tee (replaces brittle overlayfs patching) - SSH LocalForward systemd service (replaces openshell forward) - Add inotify sysctl limits (prevents k3s "too many open files" crash) CloudFormation changes: - Add inotify limits before Docker install in sandbox mode - Fix fallback to restart openshell-forward service - Standardize sandbox name to "openclaw" - Deliver SOUL.md via SSH in sandbox mode - Fix dashboard URL format (?token= -> #token=) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

NemoClaw + LiteLLM setup takes longer than 20 minutes due to: - apt-get upgrade - LiteLLM pip install - NemoClaw installer + Docker image pulls - OpenShell gateway + sandbox creation Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Three issues found during end-to-end CloudFormation deploy test: 1. openshell sandbox create --no-tty hangs on SSH session → Run in background, wait for Ready status, then kill 2. Port 18789 conflict: docker-proxy (openshell gateway) occupies it → SSH LocalForward binds on 18790 instead → SSM port forward targets 18790, maps to local 18789 3. openshell-forward service (User=ubuntu) can't find gateway metadata → Run as root with HOME=/root and PATH including nvm node → SSH config placed in /root/.ssh/ (not ubuntu's) Also fix all SSH commands to run as root (consistent with setup context). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

kosaku-sim and others added 18 commits March 23, 2026 20:33

Fix UserData size limit: compress to fit 25600 byte base64 limit

81b8862

Replace heredoc configs with python/printf generation, remove comments and blank lines from UserData to reduce from ~40KB to ~25KB base64. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

add external NemoClaw+LiteLLM setup script

13ea8ca

This script is downloaded and executed by UserData when EnableSandbox=true. Contains LiteLLM proxy install/config, NemoClaw sandbox setup, network policy, and systemd services. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: URL-encode branch slash in external script URL

8ec149c

raw.githubusercontent.com requires %2F for branch names containing slashes (feature/nemoclaw-litellm-integration). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: export HOME in NemoClaw setup script

375e2f8

NemoClaw installer (nc.sh) requires HOME to be set. CloudFormation UserData runs as root without HOME exported, causing 'unbound variable' error with set -e. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

fix: remove nc.sh retry that destroys sandbox on second run

2450bd6

The NemoClaw installer retry (||) re-runs onboard which recreates the gateway and destroys the existing sandbox. Run once only. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: correct OpenClaw provider config (api and auth fields)

f94add8

- Change api from "openai" to "openai-completions" (valid enum value) - Remove invalid "auth":"none" from provider config Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

fix: escape shell variable in Fn::Sub for CloudFormation compatibility

d475264

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

kosaku-sim mentioned this pull request Mar 26, 2026

S3ベースのセキュリティポリシー一括管理: 複数EC2への再デプロイ不要な設定配信 #3

Open

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add NemoClaw sandbox + LiteLLM proxy integration#2

Add NemoClaw sandbox + LiteLLM proxy integration#2
kosaku-sim wants to merge 18 commits intomainfrom
feature/nemoclaw-litellm-integration

kosaku-sim commented Mar 23, 2026 •

edited

Loading

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

kosaku-sim commented Mar 23, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

概要

EnableSandbox=false（従来）と EnableSandbox=true（本PR）の比較

アーキテクチャ

セキュリティモデル（3層のLinuxカーネル隔離）

変更内容

scripts/setup-nemoclaw-litellm.sh — 完全書き直し

clawdbot-bedrock.yaml — CloudFormation テンプレート

ドキュメント

解決した技術課題

テスト計画

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

kosaku-sim commented Mar 23, 2026 •

edited

Loading

`scripts/setup-nemoclaw-litellm.sh` — 完全書き直し

`clawdbot-bedrock.yaml` — CloudFormation テンプレート