provide gpuid in PolicyViolation #84

@ilyee

I am using go-dcgm to receive policy violation events, but the PolicyViolation struct does not contain the GPU index:

// PolicyViolation represents a detected violation of a policy condition
type PolicyViolation struct {
	// Condition specifies the type of policy that was violated
	Condition policyCondition
	// Timestamp indicates when the violation occurred
	Timestamp time.Time
	// Data contains violation-specific details
	Data any
}

However, the dcgmPolicyCallbackResponse_v2 struct, from which PolicyViolation is generated, does contain a gpuId field:

/**
 * Define the structure that is given to the callback function
 */
typedef struct
{
    // version must always be first
    unsigned int version; //!< version number (dcgmPolicyCallbackResponse_version)

    dcgmPolicyCondition_t condition; //!< Condition that was violated
    union
    {
        dcgmPolicyConditionDbe_t dbe;         //!< ECC DBE return structure
        dcgmPolicyConditionPci_t pci;         //!< PCI replay error return structure
        dcgmPolicyConditionMpr_t mpr;         //!< Max retired pages limit return structure
        dcgmPolicyConditionThermal_t thermal; //!< Thermal policy violations return structure
        dcgmPolicyConditionPower_t power;     //!< Power policy violations return structure
        dcgmPolicyConditionNvlink_t nvlink;   //!< Nvlink policy violations return structure
        dcgmPolicyConditionXID_t xid;         //!< XID policy violations return structure
    } val;

    unsigned int gpuId; //!< GPU ID of GPU which violated the policy.
} dcgmPolicyCallbackResponse_v2;

Could we add the GPU index to PolicyViolation, so that callers can see which GPU a violation belongs to? It should only take a simple gpuID = uint(response.gpuId) in the following code:

go-dcgm/pkg/dcgm/policy.go

Lines 202 to 207 in 7c92211

func ViolationRegistration(data unsafe.Pointer) int {
	var con policyCondition
	var timestamp time.Time
	var val any
	response := *(*C.dcgmPolicyCallbackResponse_t)(data)
