Merged
27 changes: 23 additions & 4 deletions spark-install.md
@@ -8,7 +8,7 @@ This guide walks you through installing and running NemoClaw on an NVIDIA DGX Spark

Before starting, make sure you have:

- - **Docker** (pre-installed on DGX Spark)
+ - **Docker** (pre-installed on DGX Spark, v28.x/29.x)
- **Node.js 22** (installed automatically by the NemoClaw installer)
- **OpenShell CLI** (must be installed separately before running NemoClaw — see the Quick Start below)
- **API key** (cloud inference only) — the onboarding wizard prompts for a provider and key during setup. For example, an NVIDIA API key from [build.nvidia.com](https://build.nvidia.com) for NVIDIA Endpoints, or an OpenAI, Anthropic, or Gemini key for those providers. **If you plan to use local inference with Ollama instead, no API key is needed** — see [Local Inference with Ollama](#local-inference-with-ollama) to set up Ollama before installing NemoClaw.
@@ -163,6 +163,9 @@ openclaw agent --agent main --local -m "Which model and GPU are in use?" --sessi
| CoreDNS CrashLoop after setup | Fixed in `fix-coredns.sh` | Uses container gateway IP, not 127.0.0.11 |
| Image pull failure (k3s can't find built image) | OpenShell bug | `openshell gateway destroy && openshell gateway start`, re-run setup |
| GPU passthrough | Untested on Spark | Should work with `--gpu` flag if NVIDIA Container Toolkit is configured |
+| `pip install` fails with system packages | Known | Use a venv (recommended) or `--break-system-packages` (last resort, can break system tools) |
+| Port 3000 conflict with AI Workbench | Known | AI Workbench Traefik proxy uses port 3000 (and 10000); use a different port for other services |
+| Network policy blocks NVIDIA cloud API | By design | Ensure `integrate.api.nvidia.com` is in the sandbox network policy if using cloud inference |
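
The venv route in the `pip install` row above can be sketched as follows; the venv path is illustrative, and Ubuntu 24.04 marks the system Python as externally managed per PEP 668:

```shell
# One-time: create an isolated environment so pip never touches the
# externally managed system Python (PEP 668 on Ubuntu 24.04).
python3 -m venv ~/.venvs/nemoclaw

# Per shell: activate it so python/pip resolve inside the venv.
. ~/.venvs/nemoclaw/bin/activate

# pip install <your-package>   # would install into the venv, not the system
python -c 'import sys; print(sys.prefix)'   # prints the venv path
```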

### Manual Setup (if setup-spark doesn't work)

@@ -197,9 +200,25 @@ newgrp docker # or log out and back in

## Technical Reference

### Web Dashboard

The OpenClaw gateway includes a built-in web UI. Access it at:

```text
http://127.0.0.1:18789/#token=<your-gateway-token>
```

Find your gateway token under `gateway.auth.token` in `/sandbox/.openclaw/openclaw.json` inside the sandbox; for example, run `jq -r '.gateway.auth.token' /sandbox/.openclaw/openclaw.json` from a sandbox shell.

> **Important**: Use `127.0.0.1` (not `localhost`) — the gateway's origin check requires an exact match. External dashboards like Mission Control cannot currently connect due to the gateway resetting `controlUi.allowedOrigins` on every config reload (see [openclaw#49950](https://github.com/openclaw/openclaw/issues/49950)).
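
Putting the two pieces together, a small sketch (assuming `jq` is installed; a throwaway demo config stands in for the real one so the commands are self-contained):

```shell
# Demo config standing in for the real openclaw.json inside the sandbox.
cfg=/tmp/openclaw-demo.json
printf '%s' '{"gateway":{"auth":{"token":"demo-token"}}}' > "$cfg"

# Extract the token and splice it into the dashboard URL. Use 127.0.0.1,
# not localhost, to satisfy the gateway's exact-match origin check.
token=$(jq -r '.gateway.auth.token' "$cfg")
url="http://127.0.0.1:18789/#token=$token"
echo "$url"   # → http://127.0.0.1:18789/#token=demo-token
```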
**Contributor** commented on lines +211 to +213 (refactor suggestion, major):

Repository: NVIDIA/NemoClaw



Specify the command or method to retrieve the gateway token for host users.

Line 211 mixes host path notation (~/.openclaw/) with "inside the sandbox," leaving users unclear on how to actually obtain the token for the Web Dashboard URL. Replace the vague reference with an explicit retrieval method. For example:

To retrieve your gateway token, run this command inside the sandbox: `jq -r '.gateway.auth.token' /sandbox/.openclaw/openclaw.json`. Or use `openshell sandbox download /sandbox/.openclaw/openclaw.json` from the host if the sandbox supports it.


**Contributor (author)** replied:
Good point — updated to show the explicit jq command for extracting the token from inside the sandbox. The ~/.openclaw/ path was misleading since the config lives at /sandbox/.openclaw/openclaw.json inside the container.


### NIM Compatibility on arm64

Some NIM containers (e.g., Nemotron-3-Super-120B-A12B) ship native arm64 images and run on the Spark. However, many NIM images are amd64-only and will fail with `exec format error`. Check the image architecture before pulling. For models without arm64 NIM support, consider using Ollama or [llama.cpp](https://github.com/ggml-org/llama.cpp) with GGUF models as alternatives.
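
One way to check, sketched with a stub manifest so it runs anywhere (in practice, pipe in the output of `docker manifest inspect <image>`; requires `jq`):

```shell
# Stub manifest list standing in for `docker manifest inspect <image>` output.
manifest='{"manifests":[{"platform":{"architecture":"amd64"}},{"platform":{"architecture":"arm64"}}]}'

# jq -e exits non-zero when no arm64 entry is found, so the if branches on it.
if printf '%s' "$manifest" \
  | jq -e '.manifests[].platform.architecture | select(. == "arm64")' > /dev/null; then
  result="arm64 image available"
else
  result="amd64-only: expect 'exec format error' on Spark"
fi
echo "$result"   # → arm64 image available
```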

### What's Different on Spark

-DGX Spark ships **Ubuntu 24.04 + Docker** but no k8s/k3s. OpenShell embeds k3s inside a Docker container, which hits two problems on Spark:
+DGX Spark ships **Ubuntu 24.04 (Noble) + Docker 28.x/29.x** on **aarch64 (Grace CPU + GB10 GPU, 128 GB unified memory)** but no k8s/k3s. OpenShell embeds k3s inside a Docker container, which hits two problems on Spark:
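
A quick sanity check of the environment described above (standard coreutils; on a Spark this should report `aarch64` and `cgroup2fs`):

```shell
arch=$(uname -m)                                                  # aarch64 on DGX Spark
cgroup=$(stat -fc %T /sys/fs/cgroup 2>/dev/null || echo unknown)  # cgroup2fs => cgroup v2
echo "arch=$arch cgroup=$cgroup"
```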

#### Docker permissions

@@ -226,8 +245,8 @@ Failed to start ContainerManager: failed to initialize top level QOS containers
### Architecture

```text
-DGX Spark (Ubuntu 24.04, cgroup v2)
-└── Docker (cgroupns=host)
+DGX Spark (Ubuntu 24.04, aarch64, cgroup v2, 128 GB unified memory)
+└── Docker (28.x/29.x, cgroupns=host)
└── OpenShell gateway container
└── k3s (embedded)
└── nemoclaw sandbox pod