Fix #2683: Fallback to nvidia-smi -L when /proc/driver/nvidia/gpus unavailable #2759

Open
wadeKeith wants to merge 1 commit into Genesis-Embodied-AI:main from wadeKeith:fix/nvidia-smi-fallback

@wadeKeith

Summary

Fixes #2683 by adding a GPU enumeration fallback for cloud providers where /proc/driver/nvidia/gpus is unavailable.

Problem

On certain cloud providers (e.g., packet.ai RTX 6000 PRO instances), the /proc/driver/nvidia/gpus interface is not available, causing benchmark tests and multi-GPU detection to fail.

Solution

Added nvidia-smi -L as a fallback in two functions in tests/conftest.py:

  1. _get_gpu_indices(): When reading /proc/driver/nvidia/gpus raises FileNotFoundError, enumerate GPUs by parsing the nvidia-smi -L output (counting lines that start with "GPU ").

  2. _torch_get_gpu_idx(device): When reading /proc/driver/nvidia/gpus raises FileNotFoundError, find the GPU index by matching the device UUID against the nvidia-smi -L output and extracting the GPU number from the "GPU N:" prefix.
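The two parsing steps described above can be sketched roughly as follows. This is an illustration, not the code from this PR: the helper names `parse_gpu_indices` and `find_gpu_index_by_uuid` are hypothetical, the sample output uses made-up UUIDs, and it assumes each `nvidia-smi -L` line has the form `GPU N: <name> (UUID: GPU-<uuid>)`.

```python
import re

# Hypothetical sample of `nvidia-smi -L` output; the UUIDs are made up.
SAMPLE_OUTPUT = """\
GPU 0: NVIDIA RTX A6000 (UUID: GPU-11111111-2222-3333-4444-555555555555)
GPU 1: NVIDIA RTX A6000 (UUID: GPU-aaaaaaaa-bbbb-cccc-dddd-eeeeeeeeeeee)
"""


def parse_gpu_indices(output):
    """Enumerate GPUs by counting lines that start with "GPU "."""
    count = sum(1 for line in output.splitlines() if line.startswith("GPU "))
    return list(range(count))


def find_gpu_index_by_uuid(output, uuid):
    """Return N from the "GPU N:" prefix of the line containing `uuid`, or None."""
    needle = str(uuid).lower()
    # Accept the UUID with or without the "GPU-" prefix (an assumption about
    # how the caller formats it).
    if needle.startswith("gpu-"):
        needle = needle[len("gpu-"):]
    if not needle:
        return None
    for line in output.splitlines():
        match = re.match(r"GPU (\d+):", line)
        if match and needle in line.lower():
            return int(match.group(1))
    return None
```

Keeping the parsing separate from the subprocess call makes it possible to test the logic on a captured string without a GPU present.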

The warning message is updated to note when both the proc interface and nvidia-smi fallback fail.
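As a rough sketch of that fallback order (proc interface first, then `nvidia-smi -L`, then a warning naming both), something like the following could work. The function `count_gpus`, its parameters, the timeout, and the warning text are all hypothetical and are not taken from tests/conftest.py.

```python
import os
import subprocess
import warnings

PROC_GPUS_DIR = "/proc/driver/nvidia/gpus"


def count_gpus(proc_dir=PROC_GPUS_DIR, smi_cmd=("nvidia-smi", "-L")):
    """Count GPUs via the proc interface, falling back to `nvidia-smi -L`."""
    # Preferred path: one directory entry per GPU under the proc interface.
    try:
        return len(os.listdir(proc_dir))
    except FileNotFoundError:
        pass  # Interface unavailable, e.g. on some cloud providers.

    # Fallback: count lines of the listing that start with "GPU ".
    try:
        result = subprocess.run(
            list(smi_cmd), capture_output=True, text=True, check=True, timeout=10
        )
    except (FileNotFoundError, subprocess.CalledProcessError, subprocess.TimeoutExpired):
        # Both mechanisms failed, so the warning mentions both.
        warnings.warn(
            f"Could not enumerate GPUs: '{proc_dir}' is unavailable and "
            f"'{' '.join(smi_cmd)}' failed."
        )
        return 0
    return sum(1 for line in result.stdout.splitlines() if line.startswith("GPU "))
```

Because the proc path is tried first, systems where it exists never reach the subprocess call.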

Testing

This change affects only Linux systems and activates only when the /proc/driver/nvidia/gpus path is unavailable; existing behavior is fully preserved on systems where the path exists.

@claude claude Bot left a comment

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

Development

Successfully merging this pull request may close these issues.

[Feature]: Fallback to nvidia-smi -L if no /proc/driver/nvidia/gpus
