collectors: add sriov metrics collector #32
aharivel wants to merge 3 commits into openstack-k8s-operators:main from
Add a new collector that exports SR-IOV metrics for physical functions (PFs) and virtual functions (VFs). Supports VFs bound either to network drivers or to vfio-pci, reading stats from the parent PF when direct VF stats are unavailable.

Exported metrics include traffic counters (rx/tx bytes, unicast, multicast, broadcast), error counters (dropped, allocation failures), and TX performance metrics. Each metric includes NUMA node information for topology-aware monitoring.

Parses per-VF statistics from Intel PF drivers (ixgbe, i40e, ice), which each use different naming conventions for VF stats.

Signed-off-by: Anthony Harivel <aharivel@redhat.com>
edfab9a to 87f576a
rjarry left a comment:
I wonder if it would make sense and/or be simpler to query the statistics using https://github.com/vishvananda/netlink.
The stats are parsed here: https://github.com/vishvananda/netlink/blob/c6faf428e8f84dcb73774e7c77a1e4fe38bbdb4d/link_linux.go#L3991
That way, there would be no need to deal with hardware-specific ethtool extended stats.
The problem arises when the VF is bound to vfio-pci, since netlink only communicates with the kernel. That's why I used "ethtool -S" to get the metrics directly from the PF. From what I see, the netlink approach only works with the mlx5 driver, because each VF has a representor on the eswitch (e.g., eth0_0, eth0_1, or enp3s0f0_0), and when the VF is bound to vfio-pci, the representor still exists and has stats. So it's going to be tricky to tell which VFs are Intel and which are mlx5 in order to retrieve the metrics the right way. Any other idea?
Replace exec.Command("ethtool", "-S", ...) with the safchain/ethtool
Go library, which uses ioctl directly. This removes the overhead of
spawning an external process and provides cleaner error handling
while keeping the same functionality.
Signed-off-by: Anthony Harivel <aharivel@redhat.com>
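Once the PF counters are available as a flat map (the `map[string]uint64` shape that safchain/ethtool's `Stats` method returns, keyed by the names "ethtool -S" prints), per-VF counters can be grouped by key pattern. A minimal sketch: `perVFStats` is a hypothetical helper, and the `vf<N>_<counter>` layout in `vfStatPattern` is purely illustrative. The real ixgbe, i40e and ice key names differ per driver, which is why the pattern is a parameter so each driver can get its own regular expression.

```go
package main

import (
	"regexp"
	"strconv"
)

// vfStatPattern is a HYPOTHETICAL key layout such as "vf3_rx_bytes".
// Real ixgbe/i40e/ice key names differ per driver and must be checked
// against "ethtool -S" output on the target hardware.
var vfStatPattern = regexp.MustCompile(`^vf(\d+)_(\w+)$`)

// perVFStats groups a flat ethtool stats map into per-VF counter maps.
// Group 1 of pattern must capture the VF index, group 2 the counter name.
// Keys that do not match (PF-wide counters) are ignored.
func perVFStats(stats map[string]uint64, pattern *regexp.Regexp) map[int]map[string]uint64 {
	out := make(map[int]map[string]uint64)
	for key, val := range stats {
		m := pattern.FindStringSubmatch(key)
		if m == nil {
			continue
		}
		vf, err := strconv.Atoi(m[1])
		if err != nil {
			continue // index too large to parse; skip rather than guess
		}
		if out[vf] == nil {
			out[vf] = make(map[string]uint64)
		}
		out[vf][m[2]] = val
	}
	return out
}
```

For example, given {"rx_bytes": 100, "vf0_rx_bytes": 10, "vf12_tx_dropped": 3}, the PF-wide "rx_bytes" is skipped and the result holds entries for VF 0 and VF 12 only.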