6 changes: 6 additions & 0 deletions init/eessi_archdetect.sh
@@ -181,6 +181,12 @@ accelpath() {
nvidia_smi_out=$(mktemp -p /tmp nvidia_smi_out.XXXXX)
nvidia-smi --query-gpu=gpu_name,count,driver_version,compute_cap --format=csv,noheader 2>&1 > $nvidia_smi_out
if [[ $? -eq 0 ]]; then
if grep -q "Failed to initialize NVML: Driver/library version mismatch" $nvidia_smi_out; then
log "ERROR" "accelpath: nvidia-smi command failed with 'Failed to initialize NVML: Driver/library version mismatch'"

Thanks for this PR, @terjekv!

We discussed this at a support meeting and suggest the following:

  • print the unaltered error message (e.g., just cat $nvidia_smi_out)
  • print a hint on how this could be fixed (you mentioned that you fixed this)

Does this sound ok?
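
A minimal sketch of what the two suggestions above could look like, written as a standalone function rather than as a patch to accelpath(). The function name report_nvml_mismatch and the wording of the hint are assumptions for illustration. The thread does not say how the mismatch was actually fixed, so the hint text is only a placeholder based on a common cause (kernel module and user-space driver libraries out of sync after a driver update). The return code 4 mirrors the exit 4 in the diff.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of eessi_archdetect.sh.
# Takes the path to a file holding captured nvidia-smi output.
report_nvml_mismatch() {
    local out_file="$1"
    if grep -q "Failed to initialize NVML: Driver/library version mismatch" "$out_file"; then
        # Suggestion 1: print the unaltered nvidia-smi output.
        cat "$out_file" >&2
        # Suggestion 2: print a hint. Placeholder wording (assumption);
        # replace it with the fix that actually worked.
        echo "Hint: the loaded NVIDIA kernel module and the user-space driver libraries may be out of sync; reloading the driver or rebooting the node often resolves this." >&2
        return 4
    fi
    return 0
}
```

In accelpath() the caller would still be responsible for removing the temporary file and exiting with the returned code.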

rm -f $nvidia_smi_out
exit 4
fi

nvidia_smi_info=$(head -1 $nvidia_smi_out)
cuda_cc=$(echo $nvidia_smi_info | sed 's/, /,/g' | cut -f4 -d, | sed 's/\.//g')
log "DEBUG" "accelpath: CUDA compute capability '${cuda_cc}' derived from nvidia-smi output '${nvidia_smi_info}'"
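
For reference, the cuda_cc pipeline above can be tried in isolation. This is a sketch only: the wrapper name cuda_cc_from_line and the sample line are made up for illustration and are not real nvidia-smi output from this PR. The sample merely follows the comma-separated layout requested by --query-gpu=gpu_name,count,driver_version,compute_cap.

```shell
# Hypothetical wrapper around the pipeline used in the diff.
cuda_cc_from_line() {
    # Drop the space after each comma, take the 4th field (compute_cap),
    # then remove the dot, so that "8.0" becomes "80".
    echo "$1" | sed 's/, /,/g' | cut -f4 -d, | sed 's/\.//g'
}

# Made-up sample line in the csv,noheader layout (assumption).
sample="NVIDIA A100-SXM4-40GB, 1, 535.54.03, 8.0"
cuda_cc_from_line "$sample"   # prints 80
```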