## Quick Start

```bash
# Download and review the installer before running
curl -fsSL https://nvidia.com/nemoclaw.sh -o nemoclaw-install.sh
less nemoclaw-install.sh # review the script
sudo bash nemoclaw-install.sh

# Or clone and install manually
git clone https://github.com/NVIDIA/NemoClaw.git
cd NemoClaw

```

## What's Different on Spark

DGX Spark ships **Ubuntu 24.04 (Noble) + Docker 28.x/29.x** on **aarch64 (Grace CPU + GB10 GPU)** but no k8s/k3s. OpenShell embeds k3s inside a Docker container, which hits two problems on Spark:

### 1. Docker permissions

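A quick, non-destructive way to check whether your user can reach the Docker socket without `sudo` (a sketch assuming the stock socket path; the group-membership fix mentioned in the output is the usual remedy on Ubuntu):

```shell
# Check Docker socket access (assumes the default /var/run path)
SOCK=/var/run/docker.sock
if [ -w "$SOCK" ]; then
  echo "docker socket writable; no sudo needed"
else
  echo "no access to $SOCK"
  echo "usual fix: sudo usermod -aG docker \$USER, then log out and back in"
fi
```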
| Issue | Status | Workaround / Notes |
| --- | --- | --- |
| CoreDNS CrashLoop after setup | Fixed in `fix-coredns.sh` | Uses container gateway IP, not 127.0.0.11 |
| Image pull failure (k3s can't find built image) | OpenShell bug | `openshell gateway destroy && openshell gateway start`, re-run setup |
| GPU passthrough | Untested on Spark | Should work with `--gpu` flag if NVIDIA Container Toolkit is configured |
| `pip install` fails with system packages | Known | Use a venv (recommended) or `--break-system-packages` (last resort, can break system tools) |
| Port 3000 conflict with AI Workbench | Known | AI Workbench Traefik proxy uses port 3000 (and 10000); use a different port for other services |
| Network policy blocks NVIDIA cloud API | By design | Ensure `integrate.api.nvidia.com` is in the sandbox network policy if using cloud inference |
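For the `pip install` row above, a minimal venv setup looks like this (the path is an arbitrary choice):

```shell
# Create an isolated environment instead of touching system site-packages
python3 -m venv "$HOME/.venvs/nemoclaw"
. "$HOME/.venvs/nemoclaw/bin/activate"
python -m pip --version   # now resolves inside the venv
```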

## Verifying Your Install

```bash
nemoclaw-start openclaw agent --agent main --local -m 'hello' --session-id test
openshell term
```

## Web Dashboard

The OpenClaw gateway includes a built-in web UI. Access it at:

```
http://127.0.0.1:18789/#token=<your-gateway-token>
```

Find your gateway token in `~/.openclaw/openclaw.json` under `gateway.auth.token` inside the sandbox.

> **Important**: Use `127.0.0.1` (not `localhost`) — the gateway's origin check requires an exact match. External dashboards like Mission Control cannot currently connect due to the gateway resetting `controlUi.allowedOrigins` on every config reload (see [openclaw#49950](https://github.com/openclaw/openclaw/issues/49950)).
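If `jq`-style tooling isn't available in the sandbox, Python can pull the token out. Here it runs against a stand-in file with the same shape — `/tmp/openclaw.json` and the `abc123` value are placeholders; point the command at `~/.openclaw/openclaw.json` inside the sandbox:

```shell
# Stand-in config mirroring the gateway.auth.token layout
cat > /tmp/openclaw.json <<'EOF'
{ "gateway": { "auth": { "token": "abc123" } } }
EOF
# Extract the token (swap in ~/.openclaw/openclaw.json inside the sandbox)
python3 -c 'import json; print(json.load(open("/tmp/openclaw.json"))["gateway"]["auth"]["token"])'
# → abc123
```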

## Using Local LLMs

DGX Spark has 128 GB unified memory shared between CPU and GPU. You can run local models alongside the sandbox:

```bash
# Build llama.cpp for GB10 (sm_121)
git clone https://github.com/ggml-org/llama.cpp.git
cd llama.cpp
PATH=/usr/local/cuda/bin:$PATH cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=121
cmake --build build --config Release -j$(nproc)

# Run a model (e.g. Nemotron-3-Super-120B Q4_K_M ~78 GB)
./build/bin/llama-server --model <path-to-gguf> --host 0.0.0.0 --port 8000 \
--n-gpu-layers 999 --ctx-size 32768
```
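Once the server is up, it speaks an OpenAI-compatible API. A quick smoke test, assuming the `--port 8000` flag above (`llama-server` exposes a `/health` endpoint):

```shell
# Probe llama-server before pointing agents at it
if curl -fsS http://127.0.0.1:8000/health >/dev/null 2>&1; then
  curl -s http://127.0.0.1:8000/v1/models   # lists the loaded model
else
  echo "llama-server not reachable on :8000"
fi
```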

Then configure your sandbox to use the local model by updating `~/.openclaw/openclaw.json` inside the sandbox:

```json
{
  "models": {
    "providers": {
      "local": {
        "baseUrl": "http://host.containers.internal:8000/v1",
        "apiKey": "not-needed",
        "api": "openai-completions",
        "models": [{ "id": "my-model", "name": "Local Model" }]
      }
    }
  },
  "agents": {
    "defaults": { "model": { "primary": "local/my-model" } }
  }
}
```
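A connectivity check for the `baseUrl` above, run from inside the sandbox (the fallback message is just a hint printed by this sketch, not output from any tool):

```shell
# Verify the host-mapped endpoint answers from inside the sandbox
BASE="http://host.containers.internal:8000/v1"   # matches the config above
curl -fsS "$BASE/models" \
  || echo "endpoint not reachable; the egress proxy may be blocking the host network"
```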

> **Note**: The sandbox egress proxy blocks direct access to the host network. Use `inference.local` with `"apiKey": "openshell-managed"` if your model is configured via NIM or `nemoclaw setup-spark`.
>
> **Note**: Some NIM containers (e.g., Nemotron-3-Super-120B-A12B) ship native arm64 images and run on the Spark. However, many NIM images are amd64-only and will fail with `exec format error`. Check the image architecture before pulling. GGUF models with llama.cpp are a reliable alternative for models without arm64 NIM support.
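To check an image's architectures before pulling (the image name below is a placeholder, not a real NIM tag):

```shell
# Spark hosts report aarch64; amd64-only images hit "exec format error"
uname -m
# Inspect a tag's architectures before pulling (placeholder image name)
if command -v docker >/dev/null 2>&1; then
  docker manifest inspect nvcr.io/nim/example/nim-llm:latest 2>/dev/null \
    | grep -o '"architecture": *"[^"]*"' | sort -u
else
  echo "docker not available"
fi
```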
## Architecture Notes

```text
DGX Spark (Ubuntu 24.04, aarch64, cgroup v2, 128 GB unified memory)
├── Docker (28.x/29.x, cgroupns=host)
│   └── OpenShell gateway container (k3s embedded)
│       └── nemoclaw sandbox pod
│           └── OpenClaw agent + NemoClaw plugin
└── llama-server (optional, local inference on GB10 GPU)
```