Wrong return of TestName in DCGM Diag Run #97

@g-rayban

Description

When results, err := dcgm.RunDiag(dcgm.DiagQuick, dcgm.GroupAllGPUs()) is called and the results are printed, the output is:

{Software:[{Status:pass TestName:presence of drivers on the denylist (e.g. nouveau) TestOutput:Allocated 83618558100 bytes (98.4%) ErrorCode:0 ErrorMessage:} {Status:pass TestName:presence of drivers on the denylist (e.g. nouveau) TestOutput:Allocated 83618558100 bytes (98.4%) ErrorCode:0 ErrorMessage:} {Status:pass TestName:presence of drivers on the denylist (e.g. nouveau) TestOutput:Allocated 83618558100 bytes (98.4%) ErrorCode:0 ErrorMessage:}]}

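Shouldn't the TestName values be "software", "memory", and "pcie", as displayed by the dcgmi command? Instead, all three entries carry the same TestName and the same TestOutput. I also see an unused function, gpuTestName, in diag.go, which looks like the right source for the test name.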

dcgmi diag -r 2
Successfully ran diagnostic for group.
+---------------------------+------------------------------------------------+
| Diagnostic                | Result                                         |
+===========================+================================================+
|-----  Metadata  ----------+------------------------------------------------|
| DCGM Version              | 4.4.1                                          |
| Driver Version Detected   | 580.76.05                                      |
| GPU Device IDs Detected   | 2330                                           |
|-----  Deployment  --------+------------------------------------------------|
| software                  | Pass                                           |
|                           | GPU0: Pass                                     |
+-----  Hardware  ----------+------------------------------------------------+
| memory                    | Pass                                           |
|                           | GPU0: Pass                                     |
+-----  Integration  -------+------------------------------------------------+
| pcie                      | Pass                                           |
|                           | GPU0: Pass                                     |
+---------------------------+------------------------------------------------+
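
To make the symptom easy to check in a test, here is a minimal sketch that flags repeated TestName values in a slice of results. It does not import go-dcgm: the DiagResult struct below is a local stand-in that only mirrors the field names visible in the printed output above, the ErrorCode type is an assumption, and duplicateTestNames is a hypothetical helper, not part of the library.

```go
package main

import "fmt"

// DiagResult is a local stand-in that mirrors the fields visible in the
// printed output above. It is NOT the go-dcgm type; the ErrorCode type is assumed.
type DiagResult struct {
	Status       string
	TestName     string
	TestOutput   string
	ErrorCode    uint
	ErrorMessage string
}

// duplicateTestNames (hypothetical helper) returns every TestName that occurs
// more than once in results, mapped to its number of occurrences.
func duplicateTestNames(results []DiagResult) map[string]int {
	counts := make(map[string]int)
	for _, r := range results {
		counts[r.TestName]++
	}
	dups := make(map[string]int)
	for name, n := range counts {
		if n > 1 {
			dups[name] = n
		}
	}
	return dups
}

func main() {
	// Values copied from the output reported in this issue.
	const name = "presence of drivers on the denylist (e.g. nouveau)"
	const out = "Allocated 83618558100 bytes (98.4%)"
	reported := []DiagResult{
		{Status: "pass", TestName: name, TestOutput: out},
		{Status: "pass", TestName: name, TestOutput: out},
		{Status: "pass", TestName: name, TestOutput: out},
	}

	// Each distinct test should appear once; any entry here indicates the bug.
	for n, count := range duplicateTestNames(reported) {
		fmt.Printf("TestName %q appears %d times\n", n, count)
	}
}
```

With the expected names ("software", "memory", "pcie") the helper would return an empty map, so a check like this could serve as a regression test once the TestName mapping is fixed.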
