Conversation

@rjarry (Collaborator) commented Jan 9, 2026

Some devices report fewer TX queues than the number of datapath workers. Cap the TX queue count to the device maximum and distribute workers across available queues using modulo assignment.

When multiple workers are assigned to the same TX queue, protect rte_eth_tx_burst() with a per-queue spinlock. The lock pointer in tx_node_ctx is only set for queues that are actually shared, avoiding locking overhead when each worker has exclusive access to its queue.

Only assign TX queues to workers that have at least one RX queue since workers without RX queues will never transmit packets.

Add a workaround for the iavf driver which reports 256 max queues but fails to configure more than 16 when the PF does not support large VF.
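The distribution described above can be sketched in a few lines. This is an illustrative standalone sketch with hypothetical names (`txq_for_worker`, `txq_is_shared`); the actual logic lives in `worker_txq_distribute` and the graph build code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the modulo assignment: worker i (counting only workers that own
 * at least one RX queue) transmits on queue i % n_txq, where n_txq has
 * already been capped to the device maximum. Hypothetical names, not the
 * actual grout API. */
static uint16_t txq_for_worker(uint16_t worker_idx, uint16_t n_txq)
{
	return worker_idx % n_txq; /* caller guarantees n_txq > 0 */
}

/* A queue needs a spinlock only when more than one worker maps onto it,
 * i.e. when the number of transmitting workers exceeds n_txq. */
static bool txq_is_shared(uint16_t queue_id, uint16_t n_workers, uint16_t n_txq)
{
	uint16_t users = 0;

	for (uint16_t w = 0; w < n_workers; w++) {
		if (txq_for_worker(w, n_txq) == queue_id)
			users++;
	}
	return users > 1;
}
```

With 5 workers and 4 queues, workers 0 and 4 both land on queue 0, so only queue 0 gets a lock pointer.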

Summary by CodeRabbit

  • New Features

    • Device-aware TX queue configuration with driver-specific limits, validation, and notices.
    • Per-TX-queue locks added and used when queues are shared across workers.
  • Refactor

    • Centralized TX-queue distribution and two-phase worker reload for safer redistribution.
    • Graph and TX path updated to track and wire queues explicitly, respecting shared-queue locking.
  • Tests

    • Unit tests updated for new distribution and device limits.
  • Chores

    • Upstream patch applied to correct reported max queue counts.


@coderabbitai bot commented Jan 9, 2026

📝 Walkthrough

Adds per-TX-queue spinlocks and propagates TX-queue-aware context and wiring through control and datapath. iface_info_port now contains txq_locks sized RTE_MAX_QUEUES_PER_PORT. Introduces struct port_queue and a new tx_node_ctx holding a txq and rte_spinlock_t *lock. Graph build now finds ports by id, tracks TXQ ownership across workers and assigns locks for shared TXQs. Worker graph reload is two-phase and TXQ assignment is centralized in worker_txq_distribute. Port configuration now respects device-reported RX/TX limits; a DPDK patch fixes iavf reported max queues.


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c70405 and 4ffc11d.

📒 Files selected for processing (7)
  • modules/infra/control/gr_port.h
  • modules/infra/control/graph.c
  • modules/infra/control/port.c
  • modules/infra/control/worker.c
  • modules/infra/control/worker_test.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_tx.c
🚧 Files skipped from review as they are similar to previous changes (2)
  • modules/infra/control/graph.c
  • modules/infra/control/gr_port.h
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Never suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/infra/control/worker.c
  • modules/infra/datapath/port_tx.c
  • modules/infra/control/worker_test.c
  • modules/infra/control/port.c
  • modules/infra/datapath/gr_rxtx.h
🧠 Learnings (3)
📚 Learning: 2025-11-05T14:28:28.817Z
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.

Applied to files:

  • modules/infra/datapath/port_tx.c
📚 Learning: 2025-11-03T13:28:19.489Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

Applied to files:

  • modules/infra/datapath/port_tx.c
📚 Learning: 2025-12-17T17:32:21.746Z
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.

Applied to files:

  • modules/infra/datapath/port_tx.c
🧬 Code graph analysis (4)
modules/infra/control/worker.c (3)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
modules/infra/control/iface.c (1)
  • iface_next (239-254)
modules/infra/control/graph.c (1)
  • worker_graph_reload_all (332-357)
modules/infra/control/worker_test.c (1)
modules/infra/control/worker.c (2)
  • worker_rxq_assign (251-312)
  • worker_count (125-133)
modules/infra/control/port.c (1)
api/gr_errno.h (1)
  • errno_set (10-13)
modules/infra/datapath/gr_rxtx.h (1)
modules/infra/control/gr_iface.h (1)
  • iface (16-25)
🪛 Cppcheck (2.19.0)
modules/infra/control/worker.c

[error] 231-231: Expression 'worker->txqs?free(__gr_vec_hdr(worker->txqs))

(unknownEvaluationOrder)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: rpm
  • GitHub Check: deb
🔇 Additional comments (7)
modules/infra/control/worker_test.c (2)

29-34: LGTM!

The test mock now correctly includes max_rx_queues and max_tx_queues to test the device-aware queue configuration logic added in this PR.


408-412: LGTM!

Test expectations correctly verify TXQ sharing behavior. With 5 workers but only 4 max TXQs, workers w3 and w4 both receive queue_id=0, validating the modulo distribution logic in worker_txq_distribute.

modules/infra/datapath/port_tx.c (1)

57-63: LGTM!

The conditional locking correctly protects rte_eth_tx_burst() only when the queue is shared (lock pointer non-NULL), avoiding unnecessary locking overhead for exclusively-owned queues.
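The lock-only-when-shared pattern can be shown in isolation. The sketch below uses a mock lock type standing in for rte_spinlock_t so it compiles without DPDK, and stubs out the rte_eth_tx_burst() call; the structure mirrors the PR description, but the exact tx_node_ctx layout is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Mock lock standing in for rte_spinlock_t. */
typedef struct {
	int locked;
} mock_spinlock_t;

static void mock_lock(mock_spinlock_t *l) { l->locked = 1; }
static void mock_unlock(mock_spinlock_t *l) { l->locked = 0; }

struct tx_ctx_sketch {
	uint16_t queue_id;
	mock_spinlock_t *lock; /* NULL when this worker owns the queue exclusively */
};

/* Returns the number of packets "sent"; *took_lock records whether the
 * shared-queue path was taken. */
static uint16_t tx_burst_sketch(struct tx_ctx_sketch *ctx, uint16_t n_pkts, int *took_lock)
{
	uint16_t tx_ok;

	if (ctx->lock != NULL) {
		mock_lock(ctx->lock);
		*took_lock = 1;
	}
	tx_ok = n_pkts; /* stand-in for rte_eth_tx_burst(port, queue, mbufs, n) */
	if (ctx->lock != NULL)
		mock_unlock(ctx->lock);
	return tx_ok;
}
```

On the exclusive path the only cost is a predictable NULL check, which is what makes the optimization worthwhile.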

modules/infra/control/port.c (1)

94-118: LGTM!

The device-aware queue configuration correctly:

  • Caps TX queues to device maximum (with logging when limited)
  • Returns explicit error for RX queue overflow (user-requested vs. auto-computed)
  • Handles driver-specific constraints for net_tap/net_virtio
  • Initializes per-TXQ spinlocks within bounds
modules/infra/datapath/gr_rxtx.h (1)

21-35: LGTM!

Clean structure design. Using a pointer for lock enables the optimization of skipping locking for exclusively-owned queues (NULL pointer check in datapath).

modules/infra/control/worker.c (2)

225-249: LGTM!

The centralized TXQ distribution correctly implements the PR objectives:

  • Clears existing assignments before redistribution
  • Assigns TXQs only to workers with RX queues
  • Uses modulo to handle sharing when workers exceed available TXQs

The static analysis hint about gr_vec_free evaluation order is a false positive per coding guidelines.


300-311: LGTM!

Correctly rebuilds the full port list before redistributing TXQs and reloading all graphs, ensuring consistent state after RXQ reassignment.



@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @modules/infra/control/port.c:
- Around line 113-121: When forcing TX queue count for drivers "net_tap" or
"net_virtio" (check info.driver_name), validate that p->n_rxq <=
info.max_tx_queues before setting p->n_txq = p->n_rxq; if it exceeds, return a
clear error (mirror the RX overflow handling) instead of assigning and proceed
only when valid, and only initialize txq_locks after the successful assignment.
Use the same error/return style as the RX overflow path and reference p->n_rxq,
p->n_txq, and info.max_tx_queues in the message so the failure is explicit (this
change affects the block that currently assigns p->n_txq and the subsequent
rte_spinlock_init loop).

In @modules/infra/control/worker.c:
- Around line 234-245: The loop that assigns tx queue IDs can divide by zero
because it uses txq_idx % port->n_txq; in gr_vec_foreach (port, ports) before
computing queue_id add a guard that checks if port->n_txq == 0 and skip mapping
for that port (or set enabled=false) so you don't perform the modulo; update the
block around STAILQ_FOREACH (worker, &workers, next) where struct queue_map is
constructed to only compute .queue_id = txq_idx % port->n_txq when port->n_txq >
0 and otherwise handle the case by continuing the worker loop or setting a safe
default, then increment txq_idx only when a mapping was actually added to
worker->txqs.
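The guard suggested above can be sketched with hypothetical flat arrays in place of the gr_vec/STAILQ iteration in assign_txqs; the point is only that ports reporting zero TX queues are skipped before the modulo, so the division by zero cannot occur:

```c
#include <stdint.h>

struct port_sketch {
	uint16_t port_id;
	uint16_t n_txq;
};

struct qmap_sketch {
	uint16_t port_id;
	uint16_t queue_id;
};

/* Writes at most cap mappings into out[] and returns how many were added.
 * Hypothetical helper, not the actual grout function. */
static int assign_txqs_sketch(const struct port_sketch *ports, int n_ports,
			      uint16_t txq_idx, struct qmap_sketch *out, int cap)
{
	int n = 0;

	for (int p = 0; p < n_ports && n < cap; p++) {
		if (ports[p].n_txq == 0)
			continue; /* guard: no TX queues on this port, skip */
		out[n].port_id = ports[p].port_id;
		out[n].queue_id = txq_idx % ports[p].n_txq;
		n++;
	}
	return n;
}
```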
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c10bc6 and fd0451d.

📒 Files selected for processing (6)
  • modules/infra/control/gr_port.h
  • modules/infra/control/graph.c
  • modules/infra/control/port.c
  • modules/infra/control/worker.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_tx.c
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Never suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/infra/control/worker.c
  • modules/infra/control/port.c
  • modules/infra/datapath/port_tx.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/control/gr_port.h
  • modules/infra/control/graph.c
🧠 Learnings (5)
📚 Learning: 2025-11-05T14:28:28.817Z
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/gr_port.h
📚 Learning: 2025-11-03T13:28:19.489Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/graph.c
📚 Learning: 2025-12-17T17:32:21.746Z
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/gr_port.h
📚 Learning: 2025-09-05T08:58:53.047Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:533-539
Timestamp: 2025-09-05T08:58:53.047Z
Learning: DPDK functions rte_mempool_free() and rte_hash_free() are NULL-safe - they explicitly handle NULL pointers and do nothing when passed NULL, unlike many C standard library functions.

Applied to files:

  • modules/infra/control/gr_port.h
📚 Learning: 2025-09-09T12:08:36.790Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:621-637
Timestamp: 2025-09-09T12:08:36.790Z
Learning: DPDK's rte_hash supports safe deletion during iteration via rte_hash_iterate, unlike many other hash table implementations that require two-phase deletion patterns.

Applied to files:

  • modules/infra/control/gr_port.h
🧬 Code graph analysis (3)
modules/infra/control/worker.c (1)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
modules/infra/control/port.c (1)
api/gr_errno.h (1)
  • errno_set (10-13)
modules/infra/control/graph.c (2)
modules/infra/control/worker_test.c (1)
  • q (150-157)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
🪛 Cppcheck (2.19.0)
modules/infra/control/worker.c

[error] 231-231: Expression 'worker->txqs?free(__gr_vec_hdr(worker->txqs))

(unknownEvaluationOrder)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (1)
modules/infra/control/worker.c (1)

230-232: The macro gr_vec_free(v) uses the comma operator to sequence operations: ((v) ? free(__gr_vec_hdr(v)) : (void)0, (v) = NULL). The comma operator is a sequence point, ensuring the conditional free is fully evaluated before the assignment. No undefined behavior exists when passing a member lvalue like worker->txqs. Cppcheck's report is a false positive.

Likely an incorrect or invalid review comment.
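The comma-operator sequencing discussed above can be demonstrated with an illustrative re-creation of the macro shape. The real gr_vec_free() in main/gr_vec.h frees a hidden vector header; this sketch frees the pointer directly but keeps the same structure:

```c
#include <stdlib.h>

/* Sketch of the gr_vec_free() macro shape: the comma operator is a sequence
 * point, so the conditional free is fully evaluated before the assignment.
 * Passing a member lvalue such as worker->txqs is well defined, and a second
 * call is a no-op because the pointer is already NULL. */
#define VEC_FREE_SKETCH(v) ((v) ? free(v) : (void)0, (v) = NULL)

struct worker_sketch {
	int *txqs; /* stands in for the gr_vec member */
};
```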

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @modules/infra/control/worker.c:
- Around line 225-248: The assign_txqs function can divide by zero when
computing queue_id with txq_idx % port->n_txq; fix it by checking port->n_txq
before using the modulus: inside assign_txqs (and the inner STAILQ_FOREACH loop
over worker), if port->n_txq == 0 either skip creating/adding the txq for that
port (recommended) or set txq.queue_id = 0 and txq.enabled = false to avoid the
modulus; adjust the code that creates the struct queue_map (port->n_txq,
txq_idx, queue_id, enabled) accordingly so no modulo by zero occurs.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd0451d and 4a3a05d.

📒 Files selected for processing (6)
  • modules/infra/control/gr_port.h
  • modules/infra/control/graph.c
  • modules/infra/control/port.c
  • modules/infra/control/worker.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_tx.c
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Never suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/infra/control/worker.c
  • modules/infra/datapath/port_tx.c
  • modules/infra/control/port.c
  • modules/infra/control/gr_port.h
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/control/graph.c
🧠 Learnings (5)
📚 Learning: 2025-11-05T14:28:28.817Z
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/gr_port.h
📚 Learning: 2025-11-03T13:28:19.489Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/graph.c
📚 Learning: 2025-12-17T17:32:21.746Z
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.

Applied to files:

  • modules/infra/datapath/port_tx.c
📚 Learning: 2025-09-05T08:58:53.047Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:533-539
Timestamp: 2025-09-05T08:58:53.047Z
Learning: DPDK functions rte_mempool_free() and rte_hash_free() are NULL-safe - they explicitly handle NULL pointers and do nothing when passed NULL, unlike many C standard library functions.

Applied to files:

  • modules/infra/control/gr_port.h
📚 Learning: 2025-09-09T12:08:36.790Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:621-637
Timestamp: 2025-09-09T12:08:36.790Z
Learning: DPDK's rte_hash supports safe deletion during iteration via rte_hash_iterate, unlike many other hash table implementations that require two-phase deletion patterns.

Applied to files:

  • modules/infra/control/gr_port.h
🧬 Code graph analysis (2)
modules/infra/control/port.c (1)
api/gr_errno.h (1)
  • errno_set (10-13)
modules/infra/control/graph.c (2)
modules/infra/control/worker_test.c (1)
  • q (150-157)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
🪛 Cppcheck (2.19.0)
modules/infra/control/worker.c

[error] 231-231: Expression 'worker->txqs?free(__gr_vec_hdr(worker->txqs))

(unknownEvaluationOrder)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (1)
modules/infra/control/port.c (1)

94-126: No fix needed. DPDK's ethdev layer guarantees that info.max_tx_queues will not exceed RTE_MAX_QUEUES_PER_PORT, so p->n_txq cannot exceed the array bounds of txq_locks[RTE_MAX_QUEUES_PER_PORT].

}

tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
if (ctx->lock != NULL)
Collaborator

Wouldn't it be possible to have a dedicated function tx_process_locked to avoid the if (ctx->lock != NULL)?

TX_NODE_BASE_LOCKED and TX_NODE_BASE.

That would avoid the two branches in the hot path.

// force number of TX queues equal to requested RX queues
p->n_txq = p->n_rxq;
if (p->n_txq > info.max_tx_queues)
return errno_set(EOVERFLOW);
Collaborator

Why not a TX queue with N virtio rings?

}

if (p->n_rxq > info.max_rx_queues)
return errno_set(EOVERFLOW);
Collaborator

With virtio?

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
modules/infra/control/graph.c (1)

196-213: TX queue user count may include disabled queues.

The loop counting txq_users doesn't check q->enabled, so disabled queue mappings are counted. This won't cause data races (overcounting is safe—just adds an unnecessary lock), but the warning log may be misleading.

If this is intentional (defensive locking), no action needed. Otherwise:

🔧 Optional: filter by enabled queues
 		gr_vec_foreach_ref (struct queue_map *q, w->txqs) {
-			if (q->port_id == qmap->port_id && q->queue_id == qmap->queue_id)
+			if (q->enabled && q->port_id == qmap->port_id && q->queue_id == qmap->queue_id)
 				txq_users++;
 		}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c78547 and 2f697ac.

📒 Files selected for processing (9)
  • modules/infra/control/gr_port.h
  • modules/infra/control/graph.c
  • modules/infra/control/port.c
  • modules/infra/control/worker.c
  • modules/infra/control/worker_test.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_tx.c
  • subprojects/dpdk.wrap
  • subprojects/packagefiles/dpdk/iavf-fix-reported-max-TX-and-RX-queues-in-ethdev-inf.patch
✅ Files skipped from review due to trivial changes (1)
  • subprojects/dpdk.wrap
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Never suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/infra/control/gr_port.h
  • modules/infra/datapath/port_tx.c
  • modules/infra/control/port.c
  • modules/infra/control/graph.c
  • modules/infra/control/worker_test.c
  • modules/infra/control/worker.c
  • modules/infra/datapath/gr_rxtx.h
🧠 Learnings (5)
📚 Learning: 2025-09-05T08:58:53.047Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:533-539
Timestamp: 2025-09-05T08:58:53.047Z
Learning: DPDK functions rte_mempool_free() and rte_hash_free() are NULL-safe - they explicitly handle NULL pointers and do nothing when passed NULL, unlike many C standard library functions.

Applied to files:

  • modules/infra/control/gr_port.h
📚 Learning: 2025-11-05T14:28:28.817Z
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.

Applied to files:

  • modules/infra/control/gr_port.h
  • modules/infra/datapath/port_tx.c
📚 Learning: 2025-09-09T12:08:36.790Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:621-637
Timestamp: 2025-09-09T12:08:36.790Z
Learning: DPDK's rte_hash supports safe deletion during iteration via rte_hash_iterate, unlike many other hash table implementations that require two-phase deletion patterns.

Applied to files:

  • modules/infra/control/gr_port.h
📚 Learning: 2025-11-03T13:28:19.489Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/graph.c
📚 Learning: 2025-12-17T17:32:21.746Z
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.

Applied to files:

  • modules/infra/datapath/port_tx.c
🧬 Code graph analysis (2)
modules/infra/control/port.c (1)
api/gr_errno.h (1)
  • errno_set (10-13)
modules/infra/control/worker_test.c (1)
modules/infra/control/worker.c (2)
  • worker_rxq_assign (251-312)
  • worker_count (125-133)
🪛 Cppcheck (2.19.0)
modules/infra/control/worker.c

[error] 231-231: Expression 'worker->txqs?free(__gr_vec_hdr(worker->txqs))

(unknownEvaluationOrder)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (10)
modules/infra/datapath/port_tx.c (1)

57-63: LGTM! Conditional locking for shared TX queues is correctly implemented.

The lock-only-when-shared pattern avoids spinlock overhead for exclusively-owned queues while properly protecting concurrent access when multiple workers share a TX queue.

subprojects/packagefiles/dpdk/iavf-fix-reported-max-TX-and-RX-queues-in-ethdev-inf.patch (1)

38-44: LGTM! Correctly caps max queue reporting based on VF capability.

The fix properly checks VIRTCHNL_VF_LARGE_NUM_QPAIRS capability before reporting 256 queues, falling back to 16 when the PF doesn't support large VF.

modules/infra/control/gr_port.h (1)

36-36: LGTM! Per-TX-queue spinlock array for shared queue synchronization.

The array size matches DPDK's RTE_MAX_QUEUES_PER_PORT, ensuring coverage for all possible queue configurations.

modules/infra/control/port.c (1)

94-118: LGTM! Device-aware TX queue configuration with proper driver quirks handling.

The implementation correctly:

  • Caps TX queues to device maximum
  • Handles net_tap/net_virtio requirement for equal RX/TX counts
  • Logs a notice when queue count is reduced
  • Initializes per-queue spinlocks for the sharing mechanism
modules/infra/datapath/gr_rxtx.h (1)

21-35: LGTM! Clean separation of TX node context with optional locking.

The tx_node_ctx structure properly encapsulates the TX queue info and lock pointer, enabling conditional locking only for shared queues (when lock != NULL).

modules/infra/control/worker_test.c (1)

408-412: LGTM! Test correctly validates TX queue sharing via modulo distribution.

With 5 workers and 4 TX queues per port, the modulo assignment causes workers to share TXQ 0 (w3 and w4), which is the expected behavior protected by the spinlocks in port_tx.c.

modules/infra/control/worker.c (1)

225-249: LGTM! Clean TX queue distribution with proper sharing semantics.

The implementation correctly:

  • Clears all TXQ assignments before redistribution
  • Only assigns TXQs to workers that have RXQs (workers without RXQs won't transmit)
  • Uses modulo assignment to distribute workers across available TXQs

The static analysis warning on line 231 is a false positive - per coding guidelines, gr_vec_free() is safe and sets the pointer to NULL.

modules/infra/control/graph.c (3)

77-87: LGTM!

Helper function cleanly encapsulates port lookup logic. The assert is appropriate here since callers should only pass valid port IDs from the existing queue mappings.


161-168: LGTM!

Proper initialization of TX node contexts to invalid values prevents accidental transmission on unassigned queues.


215-223: LGTM!

Edge wiring correctly maps port_id to the corresponding TX node edge. The index is bounded by DPDK's port limits.

This fix has been submitted upstream.

Link: https://patches.dpdk.org/project/dpdk/patch/20260109142129.1550058-2-rjarry@redhat.com/
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Replace worker_txq_id() with worker_txq_distribute() which clears all
TX queue assignments and reassigns them only to workers that have at
least one RX queue. Workers without RX queues do not process packets
and therefore have no need for TX queues.

Call worker_txq_distribute() from both worker_queue_distribute() during
initial setup and worker_rxq_assign() when RX queues are moved at
runtime. Since worker_rxq_assign() now calls worker_graph_reload_all()
instead of reloading individual workers, remove the now-redundant
worker_graph_reload() call from worker_queue_distribute().

To prevent two workers from transmitting on the same TX queue during
graph reloads, modify worker_graph_reload_all() to first stop all
workers by reloading them with NULL ports, then reload with the actual
configuration. Adjust worker_graph_new() to treat NULL ports as a
request to create a NULL graph.

Update unit tests to expect empty TX queue assignments for workers
without RX queues.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Replace direct use of port_queue as the TX node context with a new
tx_node_ctx structure that wraps port_queue as a member. This allows
extending the TX node context with additional fields in future commits.

Convert port_queue from a GR_NODE_CTX_TYPE macro to a plain struct since
it is no longer used directly as a node context. It remains used for
packet tracing where only port_id and queue_id are recorded.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Extract the port lookup loop into a dedicated find_port() function.
This will be reused in the next commit when configuring TX node context.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Some network devices report fewer TX queues than the number of datapath
workers. Cap the TX queue count to the device maximum and distribute
workers across available queues using modulo assignment.

When multiple workers share the same TX queue, protect rte_eth_tx_burst()
with a per-queue spinlock stored in iface_info_port. The lock pointer in
tx_node_ctx is only set for shared queues to avoid locking overhead when
a worker has exclusive access.

Update unit tests to set max_tx_queues in the mock device info and
adjust expected TX queue assignments for modulo wrapping.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Unlike TX queues which are silently capped to the device maximum, RX
queues are explicitly requested by the user. Return EOVERFLOW if the
requested count exceeds what the device supports.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
The net_tap and net_virtio drivers require equal RX and TX queue counts.
Set TX queue count to match RX queues for these drivers. This may result
in fewer TX queues than workers, which is now handled by the TX queue
sharing support.

Signed-off-by: Robin Jarry <rjarry@redhat.com>