Open
Description
I am using go-dcgm to receive policy violation events, but the PolicyViolation struct does not contain the GPU index:
// PolicyViolation represents a detected violation of a policy condition
type PolicyViolation struct {
	// Condition specifies the type of policy that was violated
	Condition policyCondition
	// Timestamp indicates when the violation occurred
	Timestamp time.Time
	// Data contains violation-specific details
	Data any
}
However, the dcgmPolicyCallbackResponse_v2 struct, from which PolicyViolation is generated, does contain a gpuId field:
/**
 * Define the structure that is given to the callback function
 */
typedef struct
{
    // version must always be first
    unsigned int version;            //!< version number (dcgmPolicyCallbackResponse_version)
    dcgmPolicyCondition_t condition; //!< Condition that was violated
    union
    {
        dcgmPolicyConditionDbe_t dbe;         //!< ECC DBE return structure
        dcgmPolicyConditionPci_t pci;         //!< PCI replay error return structure
        dcgmPolicyConditionMpr_t mpr;         //!< Max retired pages limit return structure
        dcgmPolicyConditionThermal_t thermal; //!< Thermal policy violations return structure
        dcgmPolicyConditionPower_t power;     //!< Power policy violations return structure
        dcgmPolicyConditionNvlink_t nvlink;   //!< Nvlink policy violations return structure
        dcgmPolicyConditionXID_t xid;         //!< XID policy violations return structure
    } val;
    unsigned int gpuId; //!< GPU ID of GPU which violated the policy.
} dcgmPolicyCallbackResponse_v2;
Could we add the GPU index to PolicyViolation so that consumers can see which GPU a violation belongs to? It should be enough to add something like gpuID = uint(response.gpuId) in the following code:
Lines 202 to 207 in 7c92211
func ViolationRegistration(data unsafe.Pointer) int {
	var con policyCondition
	var timestamp time.Time
	var val any
	response := *(*C.dcgmPolicyCallbackResponse_t)(data)