GPU CI — ephemeral RunPod runner for CUDA tests #39

@SaschaOnTour

Description

Problem / Motivation

CUDA tests (cargo nextest run --features cuda) can only run on machines with NVIDIA GPUs. Currently these must be run manually on RunPod. An ephemeral GPU CI runner would catch GPU regressions automatically on every PR.

Solution

Add a GitHub Actions workflow that:

  1. Spins up an ephemeral RunPod instance (or a similar GPU cloud instance) on each PR
  2. Runs cargo nextest run --features cuda
  3. Optionally runs the benchmark protocol script
  4. Tears down the instance after completion
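The steps above could be sketched as a workflow like the following. This is only an illustration: the `scripts/runpod-up.sh`, `runpod-exec.sh`, and `runpod-down.sh` helpers and the `RUNPOD_API_KEY` secret name are hypothetical placeholders, not an existing RunPod integration. The one load-bearing detail is `if: always()` on the teardown step, which runs even when the tests fail and so covers the "no idle costs" criterion.

```yaml
# Sketch of .github/workflows/gpu-ci.yml (helper scripts are placeholders)
name: GPU CI

on:
  pull_request:

jobs:
  cuda-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Provision ephemeral GPU instance
        run: ./scripts/runpod-up.sh   # hypothetical: create pod, wait until reachable
        env:
          RUNPOD_API_KEY: ${{ secrets.RUNPOD_API_KEY }}

      - name: Run CUDA tests on the instance
        run: ./scripts/runpod-exec.sh cargo nextest run --features cuda

      - name: Tear down instance
        if: always()   # runs even on test failure, so no pod is left billing
        run: ./scripts/runpod-down.sh
        env:
          RUNPOD_API_KEY: ${{ secrets.RUNPOD_API_KEY }}
```

A self-hosted-runner variant would instead register the provisioned pod as a GitHub runner and set `runs-on` to its label; the teardown-on-`always()` pattern applies either way.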

Options

  • RunPod serverless endpoints
  • GitHub-hosted GPU runners (if available)
  • Self-hosted runner on an always-on GPU instance (cheapest long-term)

Key files

  • NEW: .github/workflows/gpu-ci.yml
  • Existing: .github/workflows/ci.yml — CPU-only CI

Acceptance criteria

  • GPU tests run automatically on PR
  • Instance is torn down after completion (no idle costs)
  • CI passes for the current codebase
  • Cost per run is reasonable (< $1)
