Changes from all commits · 18 commits
158 changes: 103 additions & 55 deletions CLAUDE.md
@@ -1,52 +1,108 @@
# CLAUDE.md
# go-rocm — AMD ROCm GPU Inference

## What This Is

AMD ROCm GPU inference for Linux. Module: `forge.lthn.ai/core/go-rocm`
AMD ROCm GPU inference for Linux via a managed `llama-server` subprocess. Module: `dappco.re/go/rocm`.

Implements `inference.Backend` and `inference.TextModel` (from `core/go-inference`) using llama.cpp compiled with HIP/ROCm. Targets AMD RDNA 3+ GPUs.
Implements `inference.Backend` and `inference.TextModel` (from `core/go-inference`) using llama.cpp compiled with `-DGGML_HIP=ON`. Targets AMD RDNA 2+ GPUs (tested on Radeon RX 7800 XT, gfx1100).

## Target Hardware
Sibling to `go-mlx` (Metal on macOS). Both expose the same interface; users select at runtime based on `Available()`.

- **GPU**: AMD Radeon RX 7800 XT (gfx1100, RDNA 3, 16 GB VRAM) — confirmed gfx1100, not gfx1101
- **OS**: Ubuntu 24.04 LTS (linux/amd64)
- **ROCm**: 7.2.0 installed
- **Kernel**: 6.17.0
## Key Facts

## Commands
- **Subprocess model:** llama-server runs as an isolated process and communicates via HTTP/SSE
- **GGUF parser:** Reads model metadata (v2/v3) without loading tensors — enables fast discovery
- **VRAM monitoring:** sysfs-based (no ROCm runtime library dependency)
- **iGPU masking:** `HIP_VISIBLE_DEVICES=0` hardcoded — Ryzen 9 iGPU crashes llama-server if exposed
- **Auto-register:** `init()` registers the backend via `inference.Register()` on linux && amd64 (see the sketch below)
- **Platform stubs:** Exports no-op funcs on non-Linux/amd64 to avoid build failures
- **Error wrapping:** All errors use `coreerr.E(scope, msg, cause)` from `go-log`
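
A minimal sketch of the auto-register hook. Hedged: the exact signature of `inference.Register()` and the contents of the real `register_rocm.go` are assumptions here.

```go
//go:build linux && amd64

package rocm

import "dappco.re/go/inference"

// init makes the "rocm" backend selectable at runtime; callers only need a
// blank import of dappco.re/go/rocm. Register's signature is assumed.
func init() {
	inference.Register(&rocmBackend{})
}
```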

```bash
go test ./... # Unit tests (no GPU required)
go test -tags rocm ./... # Integration tests + benchmarks (GPU required)
go test -tags rocm -v -run TestROCm ./... # Full GPU tests only
go test -tags rocm -bench=. -benchtime=3x ./... # Benchmarks
```
## Hardware & OS

## Architecture
| Component | Value |
|-----------|-------|
| GPU | Radeon RX 7800 XT (gfx1100, RDNA 3, 16 GB) |
| CPU | Ryzen 9 9950X |
| OS | Ubuntu 24.04 LTS |
| ROCm | 7.2.0 |
| Kernel | 6.17.0 |

See `docs/architecture.md` for full detail.
## Architecture

```
go-rocm/
├── backend.go inference.Backend (linux && amd64)
├── model.go inference.TextModel (linux && amd64)
├── server.go llama-server subprocess lifecycle
├── vram.go VRAM monitoring via sysfs
├── discover.go GGUF model discovery
├── register_rocm.go auto-registers via init() (linux && amd64)
├── rocm_stub.go stubs for non-linux/non-amd64
└── internal/
├── llamacpp/ llama-server HTTP client + health check
└── gguf/ GGUF v2/v3 binary metadata parser
dappco.re/go/rocm/
├── Public:
│   ├── rocm.go             [VRAMInfo, ModelInfo types]
│   ├── discover.go         [DiscoverModels(dir) -> []ModelInfo]
│   └── register_rocm.go    [init() register]
├── Backend/Model (linux && amd64):
│   ├── backend.go          [rocmBackend impl]
│   ├── model.go            [rocmModel impl, metrics, streaming]
│   ├── server.go           [subprocess lifecycle, port mgmt]
│   ├── vram.go             [GetVRAMInfo() via sysfs]
│   └── rocm_stub.go        [stubs for other platforms]
└── Internal:
    ├── internal/gguf/
    │   └── gguf.go         [GGUF v2/v3 binary header parser]
    └── internal/llamacpp/
        ├── client.go       [HTTP client, Complete, ChatComplete]
        └── health.go       [/health endpoint polling]
```
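
A caller-side sketch of the public surface listed above. Hedged: the tree only states `DiscoverModels(dir) -> []ModelInfo` and `GetVRAMInfo()`, so the error returns, the directory path, and the printed fields below are assumptions.

```go
package main

import (
	"fmt"

	"dappco.re/go/rocm"
)

func main() {
	// Metadata-only GGUF discovery (no tensors loaded); directory and the
	// (models, err) return shape are illustrative assumptions.
	models, err := rocm.DiscoverModels("/srv/models")
	if err != nil {
		panic(err)
	}
	fmt.Println("found", len(models), "GGUF models")

	// sysfs VRAM snapshot (linux/amd64 only); return shape assumed.
	vram, err := rocm.GetVRAMInfo()
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", vram)
}
```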

## Critical: iGPU Crash
## Critical Rules

1. **iGPU always masked:** `serverEnv()` enforces `HIP_VISIBLE_DEVICES=0`. This is non-negotiable; do not accept a config or env-var override (see the sketch after this list).

2. **Platform-specific:** Build tags `linux && amd64` for GPU code. Stubs on other platforms prevent build errors.

3. **Subprocess isolation:** llama-server is not trusted. It runs with default permissions and a minimal environment, and is killed automatically on exit.

The Ryzen 9 9950X iGPU appears as ROCm Device 1. llama-server crashes trying to split tensors across it. `serverEnv()` always sets `HIP_VISIBLE_DEVICES=0`. Do not remove or weaken this.
4. **Error scope:** All errors use `coreerr.E()`. No `fmt.Errorf`, no `errors.New`, no `log` package.

## Building llama-server with ROCm
5. **Banned imports:** `fmt`, `log`, `errors`, `os/exec`; use their core.* equivalents. (Exception: `os` is used directly for file/env ops; `go-io`'s transitive deps are too heavy for a GPU inference module.)

6. **Metrics best-effort:** VRAM stats are read non-atomically from sysfs; under heavy churn, transient gaps are expected. Recording is not real-time.
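
A sketch of what rule 1 implies for `server.go`. Hypothetical: the real `serverEnv()` may assemble the environment differently, but it always pins `HIP_VISIBLE_DEVICES=0`.

```go
package rocm

import "os"

// serverEnv is a hypothetical sketch only; see server.go for the real code.
func serverEnv() []string {
	return []string{
		// Mask the Ryzen iGPU (ROCm device 1): llama-server crashes when it
		// tries to split tensors across it. Never configurable.
		"HIP_VISIBLE_DEVICES=0",
		// Minimal environment for the untrusted subprocess (rule 3).
		"PATH=" + os.Getenv("PATH"),
	}
}
```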

## Spec Index

See `/sessions/vibrant-sharp-fermat/mnt/plans/code/core/go/rocm/RFC.md`:

- **§1–2:** Overview & package layout
- **§3:** Type definitions (VRAMInfo, ModelInfo, rocmBackend, rocmModel, server)
- **§4:** Inference pipeline (Load, Generate, Chat, metrics)
- **§5:** GGUF parser internals
- **§6:** llama-server HTTP bridge
- **§7–9:** VRAM discovery, model discovery, platform support
- **§10–16:** Error handling, config, quantisation, design notes, cross-refs

## Working Commands

```bash
# Unit tests (no GPU required)
go test ./...

# Integration tests + benchmarks (GPU required, gfx1100)
go test -tags rocm ./...

# Full GPU tests only
go test -tags rocm -v -run TestROCm ./...

# Benchmarks
go test -tags rocm -bench=. -benchtime=3x ./...

# Format
go fmt ./...
```

## Building llama-server

```bash
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build \
-DGGML_HIP=ON \
-DAMDGPU_TARGETS=gfx1100 \
@@ -56,34 +112,26 @@ cmake --build build --parallel $(nproc) -t llama-server
sudo cp build/bin/llama-server /usr/local/bin/llama-server
```

## Environment Variables
## Coordination

| Variable | Default | Purpose |
|----------|---------|---------|
| `ROCM_LLAMA_SERVER_PATH` | PATH lookup | Path to llama-server binary |
| `HIP_VISIBLE_DEVICES` | overridden to `0` | Always forced to 0 — do not rely on ambient value |
- **Virgil** (forge.lthn.ai/core) — orchestrator, task writer, PR reviewer
- **go-mlx** — sibling Metal backend (same interface contract)
- **go-inference** — shared TextModel/Backend interface definitions
- **go-ml** — scoring engine wrapping both backends
- **LEM training** — uses go-rocm for model eval on Charon homelab

## Coding Standards
## Test Naming

- UK English
- Tests: testify assert/require
- Build tags: `linux && amd64` for GPU code, `rocm` for integration tests
- Errors: `coreerr.E("pkg.Func", "what failed", err)` via `go-log`, never `fmt.Errorf` or `errors.New`
- File I/O: `os` package used directly — `go-io` not imported (its transitive deps are too heavy for a GPU inference module)
- Conventional commits
- Co-Author: `Co-Authored-By: Virgil <virgil@lethean.io>`
- Licence: EUPL-1.2
Format: `TestFilename_Function_{Good,Bad,Ugly}` — all three categories mandatory.

## Coordination
Example: `TestModel_Generate_Good`, `TestModel_Generate_Bad`, `TestModel_Generate_Ugly`.
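
A sketch of the pattern against `resolveContextLength` from `backend.go`, using testify per the existing standards. Hedged: the test file's package name, build tags, and the `gguf.Metadata` literal are assumptions.

```go
//go:build linux && amd64

package rocm

import (
	"testing"

	"github.com/stretchr/testify/assert"

	"dappco.re/go/rocm/internal/gguf"
)

// Good: an explicit request wins over the metadata value.
func TestBackend_ResolveContextLength_Good(t *testing.T) {
	assert.Equal(t, 8192, resolveContextLength(8192, gguf.Metadata{ContextLength: 131072}))
}

// Bad: no request and no metadata falls back to the 4096 cap.
func TestBackend_ResolveContextLength_Bad(t *testing.T) {
	assert.Equal(t, 4096, resolveContextLength(0, gguf.Metadata{}))
}

// Ugly: a huge native context is clamped to the cap to protect VRAM.
func TestBackend_ResolveContextLength_Ugly(t *testing.T) {
	assert.Equal(t, 4096, resolveContextLength(0, gguf.Metadata{ContextLength: 131072}))
}
```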

- **Virgil** (core/go) is the orchestrator — writes tasks and reviews PRs
- **go-mlx** is the sibling — Metal backend on macOS, same interface contract
- **go-inference** defines the shared TextModel/Backend interfaces both backends implement
- **go-ml** wraps both backends into the scoring engine
## Commit Style

## Documentation
```
type(scope): description

Co-Authored-By: Virgil <virgil@lethean.io>
```

- `docs/architecture.md` — component design, data flow, interface contracts
- `docs/development.md` — prerequisites, test commands, benchmarks, coding standards
- `docs/history.md` — completed phases, commit hashes, known limitations
- `docs/plans/` — phase design documents (read-only reference)
Example: `feat(rocm): add VRAM monitoring via sysfs`
4 changes: 2 additions & 2 deletions README.md
@@ -2,7 +2,7 @@

AMD ROCm GPU inference for Linux via a managed llama-server subprocess. Implements the `inference.Backend` and `inference.TextModel` interfaces from go-inference for AMD RDNA 3+ GPUs (validated on RX 7800 XT with ROCm 7.2). Uses llama-server's OpenAI-compatible streaming API rather than direct HIP CGO bindings, giving access to 50+ GGUF model architectures with GPU crash isolation. Includes a GGUF v2/v3 binary metadata parser, sysfs VRAM monitoring, and model discovery. Platform-restricted: `linux/amd64` only; a safe stub compiles everywhere else.

**Module**: `forge.lthn.ai/core/go-rocm`
**Module**: `dappco.re/go/rocm`
**Licence**: EUPL-1.2
**Language**: Go 1.25

@@ -11,7 +11,7 @@ AMD ROCm GPU inference for Linux via a managed llama-server subprocess. Implemen
```go
import (
"forge.lthn.ai/core/go-inference"
_ "forge.lthn.ai/core/go-rocm" // registers "rocm" backend via init()
_ "dappco.re/go/rocm" // registers "rocm" backend via init()
)

// Requires llama-server compiled with HIP/ROCm on PATH
115 changes: 66 additions & 49 deletions backend.go
@@ -6,14 +6,16 @@ import (
"os"
"strings"

coreerr "forge.lthn.ai/core/go-log"
"forge.lthn.ai/core/go-inference"
"forge.lthn.ai/core/go-rocm/internal/gguf"
"dappco.re/go/inference"
coreerr "dappco.re/go/log"
"dappco.re/go/rocm/internal/gguf"
)

// rocmBackend implements inference.Backend for AMD ROCm GPUs.
type rocmBackend struct{}

const defaultContextLengthCap = 4096

func (b *rocmBackend) Name() string { return "rocm" }

// Available reports whether ROCm GPU inference can run on this machine.
@@ -30,68 +32,83 @@

// LoadModel loads a GGUF model onto the AMD GPU via llama-server.
// Model architecture is read from GGUF metadata (replacing filename-based guessing).
// If no context length is specified, defaults to min(model_context_length, 4096)
// to prevent VRAM exhaustion on models with 128K+ native context.
// If no context length is specified, defaults to min(model_context_length,
// 4096). When metadata omits the native context, it falls back to 4096 to
// keep the load path on the safe side of VRAM usage.
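// For example (illustrative values): requested 8192 returns 8192; requested 0
// with a 131072-token native context returns 4096; requested 0 with no
// metadata returns 4096.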
func (b *rocmBackend) LoadModel(path string, opts ...inference.LoadOption) (inference.TextModel, error) {
cfg := inference.ApplyLoadOpts(opts)
loadConfig := inference.ApplyLoadOpts(opts)

binary, err := findLlamaServer()
if err != nil {
return nil, err
}

meta, err := gguf.ReadMetadata(path)
metadata, err := gguf.ReadMetadata(path)
if err != nil {
return nil, coreerr.E("rocm.LoadModel", "read model metadata", err)
}

ctxLen := cfg.ContextLen
if ctxLen == 0 && meta.ContextLength > 0 {
ctxLen = int(min(meta.ContextLength, 4096))
}
contextLength := resolveContextLength(loadConfig.ContextLen, metadata)

srv, err := startServer(binary, path, cfg.GPULayers, ctxLen, cfg.ParallelSlots)
modelServer, err := startServer(serverStartConfig{
BinaryPath: binary,
ModelPath: path,
GPULayerCount: loadConfig.GPULayers,
ContextSize: contextLength,
ParallelSlotCount: loadConfig.ParallelSlots,
})
if err != nil {
return nil, err
}

// Map quantisation file type to bit width.
quantBits := 0
quantGroup := 0
ftName := gguf.FileTypeName(meta.FileType)
switch {
case strings.HasPrefix(ftName, "Q4_"):
quantBits = 4
quantGroup = 32
case strings.HasPrefix(ftName, "Q5_"):
quantBits = 5
quantGroup = 32
case strings.HasPrefix(ftName, "Q8_"):
quantBits = 8
quantGroup = 32
case strings.HasPrefix(ftName, "Q2_"):
quantBits = 2
quantGroup = 16
case strings.HasPrefix(ftName, "Q3_"):
quantBits = 3
quantGroup = 32
case strings.HasPrefix(ftName, "Q6_"):
quantBits = 6
quantGroup = 64
case ftName == "F16":
quantBits = 16
case ftName == "F32":
quantBits = 32
}

return &rocmModel{
srv: srv,
modelType: meta.Architecture,
modelInfo: inference.ModelInfo{
Architecture: meta.Architecture,
NumLayers: int(meta.BlockCount),
QuantBits: quantBits,
QuantGroup: quantGroup,
},
server: modelServer,
modelType: metadata.Architecture,
modelInfo: modelInfoFromMetadata(metadata),
}, nil
}

func resolveContextLength(requestedContextLength int, metadata gguf.Metadata) int {
if requestedContextLength > 0 {
return requestedContextLength
}
if metadata.ContextLength == 0 {
return defaultContextLengthCap
}
return min(int(metadata.ContextLength), defaultContextLengthCap)
}

func modelInfoFromMetadata(metadata gguf.Metadata) inference.ModelInfo {
quantBits, quantGroup := quantisationFromFileType(metadata.FileType)
return inference.ModelInfo{
Architecture: metadata.Architecture,
NumLayers: int(metadata.BlockCount),
QuantBits: quantBits,
QuantGroup: quantGroup,
}
}

func quantisationFromFileType(fileType uint32) (bits, groupSize int) {
fileTypeName := gguf.FileTypeName(fileType)

switch {
case strings.HasPrefix(fileTypeName, "Q4_"):
return 4, 32
case strings.HasPrefix(fileTypeName, "Q5_"):
return 5, 32
case strings.HasPrefix(fileTypeName, "Q8_"):
return 8, 32
case strings.HasPrefix(fileTypeName, "Q2_"):
return 2, 16
case strings.HasPrefix(fileTypeName, "Q3_"):
return 3, 32
case strings.HasPrefix(fileTypeName, "Q6_"):
return 6, 64
case fileTypeName == "F16":
return 16, 0
case fileTypeName == "F32":
return 32, 0
default:
return 0, 0
}
}