Skip to content

feat(nic-stats): added NIC statistics in shared memory with FFI and RPC#674

Open
kt0ns wants to merge 3 commits intomainfrom
feature-nic-counter-dpdk
Open

feat(nic-stats): added NIC statistics in shared memory with FFI and RPC#674
kt0ns wants to merge 3 commits intomainfrom
feature-nic-counter-dpdk

Conversation

@kt0ns
Copy link
Copy Markdown
Collaborator

@kt0ns kt0ns commented Apr 24, 2026

Closes #673

Copilot AI review requested due to automatic review settings April 24, 2026 13:14
@kt0ns kt0ns linked an issue Apr 24, 2026 that may be closed by this pull request
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to add NIC-level statistics collection in the dataplane and expose those counters through shared memory, FFI, and a new gRPC endpoint, aligning with issue #673’s goal of making hardware/driver-level metrics available to the controlplane.

Changes:

  • Added NIC stat counter fields to the dataplane worker shared structures and updated the dataplane worker loop to periodically read rte_eth_stats_get.
  • Added a new C API (yanet_get_nic_counters) and Go FFI wrapper (DPConfig.NicCounters) for retrieving NIC counters.
  • Extended the Counters gRPC proto with a Nic RPC.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 12 comments.

Show a summary per file
File Description
lib/dataplane/config/zone.h Adds NIC counter pointer fields to struct dp_worker.
lib/controlplane/config/zone.h Adds cp_config_gen_get_nic_counter_storage() helper.
lib/controlplane/agent/agent.c Adds yanet_get_nic_counters() counter-handle listing function.
dataplane/worker.c Registers new counters, wires worker fields to counter storage, and periodically updates NIC stats.
dataplane/dataplane.c Comments out basic rte_eth_stats_get logging in stat_thread().
controlplane/ynpb/counters.proto Adds rpc Nic(NicCounterRequest).
controlplane/ffi/shm.go Adds DPConfig.NicCounters() FFI wrapper.
api/counter.h Exposes yanet_get_nic_counters() in the public C header.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +1689 to +1714
struct counter_handle_list *list = (struct counter_handle_list *)malloc(
sizeof(struct counter_handle_list) +
sizeof(struct counter_handle) * count
);

if (list == NULL)
return NULL;
list->instance_count =
ADDR_OF(&counter_storage->allocator)->instance_count;
list->count = count;
struct counter_handle *handlers = list->counters;

uint64_t out_idx = 0;
for (uint64_t idx = 0; idx < count; ++idx) {
if (!counter_name_matches_query(
names[idx].name, query, query_count
)) {
continue;
}
strtcpy(handlers[out_idx].name, names[idx].name, 60);
handlers[out_idx].size = names[idx].size;
handlers[out_idx].gen = names[idx].gen;
handlers[out_idx].value_handle =
counter_get_value_handle(idx, counter_storage);
out_idx++;
}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yanet_get_nic_counters allocates count handlers and sets list->count = count, but only fills entries that match the query. If there are fewer (or zero) matches, the remaining handlers are left uninitialized and will be encoded/returned as garbage. Compute match_count first (like the existing query-based counter listing logic earlier in this file), allocate based on match_count, and set list->count = match_count (return NULL when match_count == 0).

Copilot uses AI. Check for mistakes.
Comment thread lib/controlplane/config/zone.h Outdated
cp_config_gen_get_nic_counter_storage(
struct cp_config_gen *config_gen, const char *device_name
) {
return cp_config_gen_get_device_counter_storage(config_gen, device_name);
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cp_config_gen_get_nic_counter_storage() currently just aliases cp_config_gen_get_device_counter_storage(), but NIC counters are named separately ("nic_*") and aren’t registered in the device counter registry by default. As-is, NIC lookups will return the same storage as device counters and won’t contain NIC metrics. Introduce a dedicated NIC counter storage entry in the counter storage registry (or explicitly extend device counters with NIC fields) so this function returns the correct storage.

Suggested change
return cp_config_gen_get_device_counter_storage(config_gen, device_name);
return cp_config_counter_storage_registry_lookup_nic(
&config_gen->counter_storage_registry, device_name
);

Copilot uses AI. Check for mistakes.
Comment thread dataplane/worker.c Outdated
{"rx", 2, (const size_t[]){offsetof(struct dp_worker, rx_count), offsetof(struct dp_worker, rx_size)}},
{"tx", 2, (const size_t[]){offsetof(struct dp_worker, tx_count), offsetof(struct dp_worker, tx_size)}},
{"remote_rx", 1, (const size_t[]){offsetof(struct dp_worker, remote_rx_count)}},
{"remote_tx", 1, (const size_t[]){offsetof(struct dp_worker, remote_tx_count)},},
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Initializer for the worker_counter_info entry "remote_tx" has an extra trailing comma inside the struct initializer (..., (const size_t[]){...},},). This provides too many initializers for the anonymous struct and should fail compilation; remove the extra comma so the entry has exactly 3 fields like the others.

Suggested change
{"remote_tx", 1, (const size_t[]){offsetof(struct dp_worker, remote_tx_count)},},
{"remote_tx", 1, (const size_t[]){offsetof(struct dp_worker, remote_tx_count)}},

Copilot uses AI. Check for mistakes.
Comment thread dataplane/worker.c Outdated
Comment on lines +363 to +367
uint64_t iterations = 0;

while (1) {
iterations = (iterations + 1) % NIC_STATS_FREAQUENCY;
if (iterations == 0) worker_unload_nic_stats(worker);
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

iterations is a local variable initialized to 0 on each worker_loop_round() call, so the (iterations + 1) % NIC_STATS_FREAQUENCY counter resets every round. Since the inner while (1) typically exits once the packet fronts are empty, this likely never reaches 0 and worker_unload_nic_stats() won’t run (NIC stats won’t be updated). Persist the counter across rounds (e.g., store it in struct dataplane_worker/dp_worker, or use a time-based check).

Copilot uses AI. Check for mistakes.
Comment thread dataplane/worker.c Outdated

#include <rte_ethdev.h>

#define NIC_STATS_FREAQUENCY 1000
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Macro name NIC_STATS_FREAQUENCY is misspelled; rename to NIC_STATS_FREQUENCY to avoid propagating the typo into call sites and future code.

Suggested change
#define NIC_STATS_FREAQUENCY 1000
#define NIC_STATS_FREQUENCY 1000

Copilot uses AI. Check for mistakes.
Comment thread dataplane/dataplane.c Outdated
Comment on lines 620 to 625
// struct rte_eth_stats stats0[dataplane->device_count];
for (uint16_t idx = 0; idx < dataplane->device_count; ++idx) {
rte_eth_stats_get(
dataplane->devices[idx].port_id, &stats0[idx]
);
// rte_eth_stats_get(
// dataplane->devices[idx].port_id, &stats0[idx]
// );
rte_eth_xstats_get(
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Large blocks of commented-out rte_eth_stats_get/fprintf logic were left in stat_thread(). If this logging is no longer needed, remove the dead code; if it is still useful for debugging, consider guarding it behind a build flag (to keep the runtime path and source clean).

Copilot uses AI. Check for mistakes.
Comment on lines +66 to +67
// trade-off: nic stats unload not often
// bcs put it in second cache line
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment uses informal abbreviations ("bcs", "unload not often") and is hard to parse. Please rephrase in clear English (e.g., explain the cache-line placement trade-off explicitly).

Suggested change
// trade-off: nic stats unload not often
// bcs put it in second cache line
// Trade-off: place NIC statistics in the second cache line
// because they are accessed less frequently than the fields above.

Copilot uses AI. Check for mistakes.
Comment thread lib/controlplane/agent/agent.c Outdated
Comment on lines +1672 to +1684
struct counter_storage *cs = cp_config_gen_get_nic_counter_storage(
cp_config_gen, device_name
);

if (cs == NULL) {
cp_config_unlock(cp_config);
return NULL;
}
counter_storage = cs;
counter_registry = ADDR_OF(&counter_storage->registry);

uint64_t count = counter_registry->count;
struct counter *names = ADDR_OF(&counter_registry->names);
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yanet_get_nic_counters looks up counter storage via cp_config_gen_get_nic_counter_storage(), which currently aliases cp_config_gen_get_device_counter_storage(). Device counter registries only register "rx/tx/rx_bytes/tx_bytes" (see lib/controlplane/config/cp_device.c), so the NIC-prefixed counters won’t exist in this storage and the NIC RPC will return no real data. Either register these NIC counters in the device counter registry/storage, or change this lookup to point at the actual storage where NIC stats are written.

Copilot uses AI. Check for mistakes.
Comment on lines 11 to 15
rpc Function(FunctionCountersRequest) returns (CountersResponse) {}
rpc Chain(ChainCountersRequest) returns (CountersResponse) {}
rpc Module(ModuleCountersRequest) returns (CountersResponse) {}
rpc Nic(NicCounterRequest) returns (CountersResponse) {}
rpc Perf(PerfCountersRequest) returns (PerfCountersResponse) {}
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new rpc Nic is added to the public gRPC service, but there is no corresponding handler implemented in controlplane/internal/gateway/counters_service.go (so clients will get Unimplemented unless/ until it’s wired up). Please add the server-side method that calls DPConfig.NicCounters() (or remove the RPC until it’s supported end-to-end).

Copilot uses AI. Check for mistakes.
Comment thread dataplane/worker.c Outdated
Comment on lines +276 to +311
struct nic_stat {
uint64_t ibytes;
uint64_t obytes;
uint64_t ipackets;
uint64_t opackets;
uint64_t ierrors;
uint64_t oerrors;
uint64_t rx_nombuf;
};

void worker_unload_nic_stats(struct dataplane_worker *worker) {
struct rte_eth_stats stats1;
struct dp_worker *dp_worker = worker->dp_worker;

// const struct nic_stat stats0 = {
// .ibytes = *dp_worker->nic_rx_bytes,
// .obytes = *dp_worker->nic_tx_bytes,
// .ipackets = *dp_worker->nic_rx_packets,
// .opackets = *dp_worker->nic_tx_packets,
// .ierrors = *dp_worker->nic_rx_errors,
// .oerrors = *dp_worker->nic_tx_errors,
// .rx_nombuf = *dp_worker->nic_rx_nombuf
// };


rte_eth_stats_get(worker->port_id, &stats1);

// struct nic_stat diff = {
// .ibytes = stats1.ibytes - stats0.ibytes,
// .obytes = stats1.obytes - stats0.obytes,
// .ipackets = stats1.ipackets - stats0.ipackets,
// .opackets = stats1.opackets - stats0.opackets,
// .ierrors = stats1.ierrors - stats0.ierrors,
// .oerrors = stats1.oerrors - stats0.oerrors,
// .rx_nombuf = stats1.rx_nombuf - stats0.rx_nombuf
// };
Copy link

Copilot AI Apr 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

struct nic_stat is currently unused (all related logic is commented out). Removing the unused type and commented code will keep the dataplane hot-path file easier to maintain.

Suggested change
struct nic_stat {
uint64_t ibytes;
uint64_t obytes;
uint64_t ipackets;
uint64_t opackets;
uint64_t ierrors;
uint64_t oerrors;
uint64_t rx_nombuf;
};
void worker_unload_nic_stats(struct dataplane_worker *worker) {
struct rte_eth_stats stats1;
struct dp_worker *dp_worker = worker->dp_worker;
// const struct nic_stat stats0 = {
// .ibytes = *dp_worker->nic_rx_bytes,
// .obytes = *dp_worker->nic_tx_bytes,
// .ipackets = *dp_worker->nic_rx_packets,
// .opackets = *dp_worker->nic_tx_packets,
// .ierrors = *dp_worker->nic_rx_errors,
// .oerrors = *dp_worker->nic_tx_errors,
// .rx_nombuf = *dp_worker->nic_rx_nombuf
// };
rte_eth_stats_get(worker->port_id, &stats1);
// struct nic_stat diff = {
// .ibytes = stats1.ibytes - stats0.ibytes,
// .obytes = stats1.obytes - stats0.obytes,
// .ipackets = stats1.ipackets - stats0.ipackets,
// .opackets = stats1.opackets - stats0.opackets,
// .ierrors = stats1.ierrors - stats0.ierrors,
// .oerrors = stats1.oerrors - stats0.oerrors,
// .rx_nombuf = stats1.rx_nombuf - stats0.rx_nombuf
// };
void worker_unload_nic_stats(struct dataplane_worker *worker) {
struct rte_eth_stats stats1;
struct dp_worker *dp_worker = worker->dp_worker;
rte_eth_stats_get(worker->port_id, &stats1);

Copilot uses AI. Check for mistakes.
@yanet-platform yanet-platform deleted a comment from Copilot AI Apr 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

feat: add NIC-level statistics collection and exposure

2 participants