feat(remote): auto-detect GPU and configure passthrough in devc-remote.sh #468

@gerchowl

Summary

devc-remote.sh should detect GPU availability on the remote host during preflight and auto-configure GPU passthrough in the compose stack. Currently GPU setup is fully manual.

Problem

Projects like duplet_patients_analysis (JAX, PyMC) need GPU access. Today the user must manually add GPU config to docker-compose.local.yaml, which is error-prone and runtime-dependent (podman CDI vs. docker deploy.resources).

Detection (preflight)

Add GPU detection to the existing remote preflight heredoc:

# GPU detection
if command -v nvidia-smi &>/dev/null; then
    GPU_AVAILABLE=1
    GPU_NAME=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -1)
    GPU_MEM=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader | head -1)
    # Check NVIDIA Container Toolkit
    if command -v nvidia-ctk &>/dev/null; then
        NVIDIA_CTK=1
        # Check CDI spec (podman >= 4.1). grep -c already prints 0 on no
        # match (while exiting non-zero), so use `|| true`; a `|| echo 0`
        # fallback would emit a second "0" line into the substitution.
        CDI_AVAILABLE=$(nvidia-ctk cdi list 2>/dev/null | grep -c nvidia || true)
    fi
fi
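The detection results also have to travel back over SSH to the local side of the script. One option, a sketch only (the `detect_gpu` helper name and the KEY=VALUE line protocol are illustrative, not part of the current script):

```shell
# Hypothetical helper: run on the remote host, emit results as KEY=VALUE
# lines so the local side can parse them out of the SSH output.
detect_gpu() {
    available=0 name="" ctk=0
    if command -v nvidia-smi >/dev/null 2>&1; then
        available=1
        name=$(nvidia-smi --query-gpu=name --format=csv,noheader | head -1)
        command -v nvidia-ctk >/dev/null 2>&1 && ctk=1
    fi
    printf 'GPU_AVAILABLE=%s\n' "$available"
    printf 'GPU_NAME=%s\n' "$name"
    printf 'NVIDIA_CTK=%s\n' "$ctk"
}
```

The local side can then filter the SSH output for these keys and parse them line by line.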

Injection (compose config)

GPU config varies by runtime:

Podman with CDI (preferred, modern)

services:
  devcontainer:
    devices:
      - nvidia.com/gpu=all

Requires: nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml (one-time setup, can be part of --bootstrap).

Podman without CDI (legacy)

services:
  devcontainer:
    security_opt:
      - label=disable
    hooks:
      prestart:
        - path: /usr/bin/nvidia-container-toolkit

Docker Compose

services:
  devcontainer:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
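The three variants above could be selected by a single dispatch helper that prints the matching fragment to stdout. A sketch, assuming preflight has already determined the runtime name and CDI count (the `gpu_compose_fragment` name is illustrative):

```shell
# Emit the compose GPU fragment for the detected runtime.
# $1 = runtime (podman|docker), $2 = CDI device count from preflight.
gpu_compose_fragment() {
    runtime=$1 cdi=${2:-0}
    if [ "$runtime" = podman ] && [ "$cdi" -gt 0 ]; then
        # Podman with CDI (preferred)
        cat <<'EOF'
services:
  devcontainer:
    devices:
      - nvidia.com/gpu=all
EOF
    elif [ "$runtime" = podman ]; then
        # Podman without CDI (legacy)
        cat <<'EOF'
services:
  devcontainer:
    security_opt:
      - label=disable
    hooks:
      prestart:
        - path: /usr/bin/nvidia-container-toolkit
EOF
    else
        # Docker Compose
        cat <<'EOF'
services:
  devcontainer:
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
EOF
    fi
}
```

Printing to stdout keeps the helper composable: the caller decides which file the fragment lands in.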

Proposed UX

Auto-detect + prompt

$ ./scripts/devc-remote.sh ksb-meatgrinder:/path/to/project
ℹ  GPU detected: NVIDIA T400 4GB (nvidia-container-toolkit available)
ℹ  Enable GPU passthrough? [Y/n]
✓  GPU passthrough configured (CDI: nvidia.com/gpu=all)

Flags

--gpu           # Enable GPU passthrough (auto-detect method)
--gpu=all       # All GPUs
--gpu=0         # Specific GPU index
--no-gpu        # Explicitly disable (skip detection)
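A minimal sketch of how these flags could be parsed (the `parse_gpu_flags` helper and the `prompt`/`auto`/`off` mode values are illustrative; other arguments would pass through to the existing argument handling):

```shell
# Resolve the proposal's GPU flags into a single mode variable.
# Default is "prompt" (auto-detect + ask), matching the proposed UX.
parse_gpu_flags() {
    GPU_MODE=prompt
    for arg in "$@"; do
        case $arg in
            --gpu)    GPU_MODE=auto ;;          # enable, auto-detect method
            --gpu=*)  GPU_MODE=${arg#--gpu=} ;; # "all" or a GPU index
            --no-gpu) GPU_MODE=off ;;           # skip detection entirely
        esac
    done
}
```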

Where to inject

GPU config should go into docker-compose.local.yaml (personal/machine-specific, not committed). Same injection pattern as Tailscale key injection.

Projects that always need a GPU can add the config to docker-compose.project.yaml (team-shared, committed).
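Following the Tailscale-style injection pattern, the fragment could be appended to the local override file idempotently. A sketch, simplified to the CDI case and guarded by a hypothetical marker comment (the `inject_gpu_config` name and `# devc-remote: gpu` marker are illustrative):

```shell
# Append the GPU fragment to the uncommitted local override file,
# skipping the write if the marker shows it was already injected.
inject_gpu_config() {
    file=docker-compose.local.yaml
    grep -q '# devc-remote: gpu' "$file" 2>/dev/null && return 0
    cat >> "$file" <<'EOF'
# devc-remote: gpu
services:
  devcontainer:
    devices:
      - nvidia.com/gpu=all
EOF
}
```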

Bootstrap integration

--bootstrap should:

  1. Detect NVIDIA toolkit
  2. Generate CDI spec if missing: sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
  3. Report GPU status
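The three steps above could be sketched as follows (the `bootstrap_gpu` helper name is illustrative; the nvidia-ctk invocation is the one from step 2):

```shell
# Bootstrap sketch: detect the toolkit, generate the CDI spec once, report.
bootstrap_gpu() {
    if ! command -v nvidia-ctk >/dev/null 2>&1; then
        echo "nvidia-ctk not found; skipping GPU bootstrap"
        return 0
    fi
    # Generate the CDI spec only when it does not exist yet
    if [ ! -f /etc/cdi/nvidia.yaml ]; then
        sudo nvidia-ctk cdi generate --output=/etc/cdi/nvidia.yaml
    fi
    # Report GPU status
    nvidia-ctk cdi list
}
```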

Acceptance Criteria

  • GPU detection in remote preflight (nvidia-smi + nvidia-ctk + CDI)
  • Auto-configure compose GPU based on runtime (podman CDI / podman legacy / docker)
  • --gpu / --no-gpu flags
  • CDI spec generation in --bootstrap
  • Inject GPU config into docker-compose.local.yaml
  • Support docker-compose.project.yaml for always-GPU projects
  • Report GPU info in preflight output
  • Tests (mock nvidia-smi / nvidia-ctk responses)
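The mocking approach from the last criterion can be sketched by shadowing nvidia-smi with a stub on PATH (the stub's contents are illustrative; it only answers the two queries the preflight issues):

```shell
# Put a fake nvidia-smi ahead of the real one on PATH for the test run.
mockdir=$(mktemp -d)
cat > "$mockdir/nvidia-smi" <<'EOF'
#!/bin/sh
case $1 in
    --query-gpu=name*)         echo "NVIDIA T400" ;;
    --query-gpu=memory.total*) echo "4096 MiB" ;;
esac
EOF
chmod +x "$mockdir/nvidia-smi"
PATH="$mockdir:$PATH"

nvidia-smi --query-gpu=name --format=csv,noheader   # prints: NVIDIA T400
```

The same pattern covers nvidia-ctk: a second stub echoing canned `cdi list` output exercises the CDI branch without hardware.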

Context

  • ksb-meatgrinder: NVIDIA T400 4GB, nvidia-container-toolkit installed, podman with crun
  • duplet_patients_analysis: JAX + PyMC (GPU-accelerated Bayesian inference)
