Conversation

@rjarry (Collaborator) commented Jan 9, 2026

Some devices report fewer TX queues than the number of datapath workers. Cap the TX queue count to the device maximum and distribute workers across available queues using modulo assignment.

When multiple workers are assigned to the same TX queue, protect rte_eth_tx_burst() with a per-queue spinlock. The lock pointer in tx_node_ctx is only set for queues that are actually shared, avoiding locking overhead when each worker has exclusive access to its queue.

Only assign TX queues to workers that have at least one RX queue since workers without RX queues will never transmit packets.

Add a workaround for the iavf driver which reports 256 max queues but fails to configure more than 16 when the PF does not support large VF.
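The distribution described above can be sketched in a few lines. This is an illustrative standalone sketch with hypothetical names (`txq_for_worker`, `txq_is_shared`); the actual logic lives in `worker_txq_distribute` and the graph build code:

```c
#include <stdbool.h>
#include <stdint.h>

/* Sketch of the modulo assignment: worker i (counting only workers that own
 * at least one RX queue) transmits on queue i % n_txq, where n_txq has
 * already been capped to the device maximum. Hypothetical names, not the
 * actual grout API. */
static uint16_t txq_for_worker(uint16_t worker_idx, uint16_t n_txq)
{
	return worker_idx % n_txq; /* caller guarantees n_txq > 0 */
}

/* A queue needs a spinlock only when more than one worker maps onto it,
 * i.e. when the number of transmitting workers exceeds n_txq. */
static bool txq_is_shared(uint16_t queue_id, uint16_t n_workers, uint16_t n_txq)
{
	uint16_t users = 0;

	for (uint16_t w = 0; w < n_workers; w++) {
		if (txq_for_worker(w, n_txq) == queue_id)
			users++;
	}
	return users > 1;
}
```

With 5 workers and 4 queues, workers 0 and 4 both land on queue 0, so only queue 0 gets a lock pointer.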

Summary by CodeRabbit

  • New Features

    • Device-aware TX queue configuration with driver-specific limits, validation, and notices.
    • Per-TX-queue locks added and used when queues are shared across workers.
  • Refactor

    • Centralized TX-queue distribution and two-phase worker reload for safer redistribution.
    • Graph and TX path updated to track and wire queues explicitly, respecting shared-queue locking.
  • Tests

    • Unit tests updated for new distribution and device limits.
  • Chores

    • Upstream patch applied to correct reported max queue counts.


@coderabbitai bot commented Jan 9, 2026

📝 Walkthrough

Adds per-TX-queue spinlocks and propagates TX-queue-aware context and wiring through control and datapath. iface_info_port now contains txq_locks sized RTE_MAX_QUEUES_PER_PORT. Introduces struct port_queue and a new tx_node_ctx holding a txq and rte_spinlock_t *lock. Graph build now finds ports by id, tracks TXQ ownership across workers and assigns locks for shared TXQs. Worker graph reload is two-phase and TXQ assignment is centralized in worker_txq_distribute. Port configuration now respects device-reported RX/TX limits; a DPDK patch fixes iavf reported max queues.


📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 9c70405 and 4ffc11d.

📒 Files selected for processing (7)
  • modules/infra/control/gr_port.h
  • modules/infra/control/graph.c
  • modules/infra/control/port.c
  • modules/infra/control/worker.c
  • modules/infra/control/worker_test.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_tx.c
🚧 Files skipped from review as they are similar to previous changes (2)
  • modules/infra/control/graph.c
  • modules/infra/control/gr_port.h
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Never suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/infra/control/worker.c
  • modules/infra/datapath/port_tx.c
  • modules/infra/control/worker_test.c
  • modules/infra/control/port.c
  • modules/infra/datapath/gr_rxtx.h
🧠 Learnings (3)
📚 Learning: 2025-11-05T14:28:28.817Z
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.

Applied to files:

  • modules/infra/datapath/port_tx.c
📚 Learning: 2025-11-03T13:28:19.489Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

Applied to files:

  • modules/infra/datapath/port_tx.c
📚 Learning: 2025-12-17T17:32:21.746Z
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.

Applied to files:

  • modules/infra/datapath/port_tx.c
🧬 Code graph analysis (4)
modules/infra/control/worker.c (3)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
modules/infra/control/iface.c (1)
  • iface_next (239-254)
modules/infra/control/graph.c (1)
  • worker_graph_reload_all (332-357)
modules/infra/control/worker_test.c (1)
modules/infra/control/worker.c (2)
  • worker_rxq_assign (251-312)
  • worker_count (125-133)
modules/infra/control/port.c (1)
api/gr_errno.h (1)
  • errno_set (10-13)
modules/infra/datapath/gr_rxtx.h (1)
modules/infra/control/gr_iface.h (1)
  • iface (16-25)
🪛 Cppcheck (2.19.0)
modules/infra/control/worker.c

[error] 231-231: Expression 'worker->txqs?free(__gr_vec_hdr(worker->txqs))

(unknownEvaluationOrder)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: rpm
  • GitHub Check: deb
🔇 Additional comments (7)
modules/infra/control/worker_test.c (2)

29-34: LGTM!

The test mock now correctly includes max_rx_queues and max_tx_queues to test the device-aware queue configuration logic added in this PR.


408-412: LGTM!

Test expectations correctly verify TXQ sharing behavior. With 5 workers but only 4 max TXQs, workers w3 and w4 both receive queue_id=0, validating the modulo distribution logic in worker_txq_distribute.

modules/infra/datapath/port_tx.c (1)

57-63: LGTM!

The conditional locking correctly protects rte_eth_tx_burst() only when the queue is shared (lock pointer non-NULL), avoiding unnecessary locking overhead for exclusively-owned queues.
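The lock-only-when-shared pattern can be shown in isolation. The sketch below uses a mock lock type standing in for rte_spinlock_t so it compiles without DPDK, and stubs out the rte_eth_tx_burst() call; the structure mirrors the PR description, but the exact tx_node_ctx layout is an assumption:

```c
#include <stddef.h>
#include <stdint.h>

/* Mock lock standing in for rte_spinlock_t. */
typedef struct {
	int locked;
} mock_spinlock_t;

static void mock_lock(mock_spinlock_t *l) { l->locked = 1; }
static void mock_unlock(mock_spinlock_t *l) { l->locked = 0; }

struct tx_ctx_sketch {
	uint16_t queue_id;
	mock_spinlock_t *lock; /* NULL when this worker owns the queue exclusively */
};

/* Returns the number of packets "sent"; *took_lock records whether the
 * shared-queue path was taken. */
static uint16_t tx_burst_sketch(struct tx_ctx_sketch *ctx, uint16_t n_pkts, int *took_lock)
{
	uint16_t tx_ok;

	if (ctx->lock != NULL) {
		mock_lock(ctx->lock);
		*took_lock = 1;
	}
	tx_ok = n_pkts; /* stand-in for rte_eth_tx_burst(port, queue, mbufs, n) */
	if (ctx->lock != NULL)
		mock_unlock(ctx->lock);
	return tx_ok;
}
```

On the exclusive path the only cost is a predictable NULL check, which is what makes the optimization worthwhile.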

modules/infra/control/port.c (1)

94-118: LGTM!

The device-aware queue configuration correctly:

  • Caps TX queues to device maximum (with logging when limited)
  • Returns explicit error for RX queue overflow (user-requested vs. auto-computed)
  • Handles driver-specific constraints for net_tap/net_virtio
  • Initializes per-TXQ spinlocks within bounds
modules/infra/datapath/gr_rxtx.h (1)

21-35: LGTM!

Clean structure design. Using a pointer for lock enables the optimization of skipping locking for exclusively-owned queues (NULL pointer check in datapath).

modules/infra/control/worker.c (2)

225-249: LGTM!

The centralized TXQ distribution correctly implements the PR objectives:

  • Clears existing assignments before redistribution
  • Assigns TXQs only to workers with RX queues
  • Uses modulo to handle sharing when workers exceed available TXQs

The static analysis hint about gr_vec_free evaluation order is a false positive per coding guidelines.


300-311: LGTM!

Correctly rebuilds the full port list before redistributing TXQs and reloading all graphs, ensuring consistent state after RXQ reassignment.



@coderabbitai bot left a comment

Actionable comments posted: 2

🤖 Fix all issues with AI agents
In @modules/infra/control/port.c:
- Around line 113-121: When forcing TX queue count for drivers "net_tap" or
"net_virtio" (check info.driver_name), validate that p->n_rxq <=
info.max_tx_queues before setting p->n_txq = p->n_rxq; if it exceeds, return a
clear error (mirror the RX overflow handling) instead of assigning and proceed
only when valid, and only initialize txq_locks after the successful assignment.
Use the same error/return style as the RX overflow path and reference p->n_rxq,
p->n_txq, and info.max_tx_queues in the message so the failure is explicit (this
change affects the block that currently assigns p->n_txq and the subsequent
rte_spinlock_init loop).

In @modules/infra/control/worker.c:
- Around line 234-245: The loop that assigns tx queue IDs can divide by zero
because it uses txq_idx % port->n_txq; in gr_vec_foreach (port, ports) before
computing queue_id add a guard that checks if port->n_txq == 0 and skip mapping
for that port (or set enabled=false) so you don't perform the modulo; update the
block around STAILQ_FOREACH (worker, &workers, next) where struct queue_map is
constructed to only compute .queue_id = txq_idx % port->n_txq when port->n_txq >
0 and otherwise handle the case by continuing the worker loop or setting a safe
default, then increment txq_idx only when a mapping was actually added to
worker->txqs.
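The guard suggested above can be sketched with hypothetical flat arrays in place of the gr_vec/STAILQ iteration in assign_txqs; the point is only that ports reporting zero TX queues are skipped before the modulo, so the division by zero cannot occur:

```c
#include <stdint.h>

struct port_sketch {
	uint16_t port_id;
	uint16_t n_txq;
};

struct qmap_sketch {
	uint16_t port_id;
	uint16_t queue_id;
};

/* Writes at most cap mappings into out[] and returns how many were added.
 * Hypothetical helper, not the actual grout function. */
static int assign_txqs_sketch(const struct port_sketch *ports, int n_ports,
			      uint16_t txq_idx, struct qmap_sketch *out, int cap)
{
	int n = 0;

	for (int p = 0; p < n_ports && n < cap; p++) {
		if (ports[p].n_txq == 0)
			continue; /* guard: no TX queues on this port, skip */
		out[n].port_id = ports[p].port_id;
		out[n].queue_id = txq_idx % ports[p].n_txq;
		n++;
	}
	return n;
}
```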
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3c10bc6 and fd0451d.

📒 Files selected for processing (6)
  • modules/infra/control/gr_port.h
  • modules/infra/control/graph.c
  • modules/infra/control/port.c
  • modules/infra/control/worker.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_tx.c
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Never suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/infra/control/worker.c
  • modules/infra/control/port.c
  • modules/infra/datapath/port_tx.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/control/gr_port.h
  • modules/infra/control/graph.c
🧠 Learnings (5)
📚 Learning: 2025-11-05T14:28:28.817Z
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/gr_port.h
📚 Learning: 2025-11-03T13:28:19.489Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/graph.c
📚 Learning: 2025-12-17T17:32:21.746Z
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/gr_port.h
📚 Learning: 2025-09-05T08:58:53.047Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:533-539
Timestamp: 2025-09-05T08:58:53.047Z
Learning: DPDK functions rte_mempool_free() and rte_hash_free() are NULL-safe - they explicitly handle NULL pointers and do nothing when passed NULL, unlike many C standard library functions.

Applied to files:

  • modules/infra/control/gr_port.h
📚 Learning: 2025-09-09T12:08:36.790Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:621-637
Timestamp: 2025-09-09T12:08:36.790Z
Learning: DPDK's rte_hash supports safe deletion during iteration via rte_hash_iterate, unlike many other hash table implementations that require two-phase deletion patterns.

Applied to files:

  • modules/infra/control/gr_port.h
🧬 Code graph analysis (3)
modules/infra/control/worker.c (1)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
modules/infra/control/port.c (1)
api/gr_errno.h (1)
  • errno_set (10-13)
modules/infra/control/graph.c (2)
modules/infra/control/worker_test.c (1)
  • q (150-157)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
🪛 Cppcheck (2.19.0)
modules/infra/control/worker.c

[error] 231-231: Expression 'worker->txqs?free(__gr_vec_hdr(worker->txqs))

(unknownEvaluationOrder)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (1)
modules/infra/control/worker.c (1)

230-232: The macro gr_vec_free(v) uses the comma operator to sequence operations: ((v) ? free(__gr_vec_hdr(v)) : (void)0, (v) = NULL). The comma operator is a sequence point, ensuring the conditional free is fully evaluated before the assignment. No undefined behavior exists when passing a member lvalue like worker->txqs. Cppcheck's report is a false positive.

Likely an incorrect or invalid review comment.
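The comma-operator sequencing discussed above can be demonstrated with an illustrative re-creation of the macro shape. The real gr_vec_free() in main/gr_vec.h frees a hidden vector header; this sketch frees the pointer directly but keeps the same structure:

```c
#include <stdlib.h>

/* Sketch of the gr_vec_free() macro shape: the comma operator is a sequence
 * point, so the conditional free is fully evaluated before the assignment.
 * Passing a member lvalue such as worker->txqs is well defined, and a second
 * call is a no-op because the pointer is already NULL. */
#define VEC_FREE_SKETCH(v) ((v) ? free(v) : (void)0, (v) = NULL)

struct worker_sketch {
	int *txqs; /* stands in for the gr_vec member */
};
```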

@coderabbitai bot left a comment

Actionable comments posted: 1

🤖 Fix all issues with AI agents
In @modules/infra/control/worker.c:
- Around line 225-248: The assign_txqs function can divide by zero when
computing queue_id with txq_idx % port->n_txq; fix it by checking port->n_txq
before using the modulus: inside assign_txqs (and the inner STAILQ_FOREACH loop
over worker), if port->n_txq == 0 either skip creating/adding the txq for that
port (recommended) or set txq.queue_id = 0 and txq.enabled = false to avoid the
modulus; adjust the code that creates the struct queue_map (port->n_txq,
txq_idx, queue_id, enabled) accordingly so no modulo by zero occurs.
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between fd0451d and 4a3a05d.

📒 Files selected for processing (6)
  • modules/infra/control/gr_port.h
  • modules/infra/control/graph.c
  • modules/infra/control/port.c
  • modules/infra/control/worker.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_tx.c
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Never suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/infra/control/worker.c
  • modules/infra/datapath/port_tx.c
  • modules/infra/control/port.c
  • modules/infra/control/gr_port.h
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/control/graph.c
🧠 Learnings (5)
📚 Learning: 2025-11-05T14:28:28.817Z
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/gr_port.h
📚 Learning: 2025-11-03T13:28:19.489Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/graph.c
📚 Learning: 2025-12-17T17:32:21.746Z
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.

Applied to files:

  • modules/infra/datapath/port_tx.c
📚 Learning: 2025-09-05T08:58:53.047Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:533-539
Timestamp: 2025-09-05T08:58:53.047Z
Learning: DPDK functions rte_mempool_free() and rte_hash_free() are NULL-safe - they explicitly handle NULL pointers and do nothing when passed NULL, unlike many C standard library functions.

Applied to files:

  • modules/infra/control/gr_port.h
📚 Learning: 2025-09-09T12:08:36.790Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:621-637
Timestamp: 2025-09-09T12:08:36.790Z
Learning: DPDK's rte_hash supports safe deletion during iteration via rte_hash_iterate, unlike many other hash table implementations that require two-phase deletion patterns.

Applied to files:

  • modules/infra/control/gr_port.h
🧬 Code graph analysis (2)
modules/infra/control/port.c (1)
api/gr_errno.h (1)
  • errno_set (10-13)
modules/infra/control/graph.c (2)
modules/infra/control/worker_test.c (1)
  • q (150-157)
main/gr_vec.h (1)
  • gr_vec_len (47-51)
🪛 Cppcheck (2.19.0)
modules/infra/control/worker.c

[error] 231-231: Expression 'worker->txqs?free(__gr_vec_hdr(worker->txqs))

(unknownEvaluationOrder)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (1)
modules/infra/control/port.c (1)

94-126: No fix needed. DPDK's ethdev layer guarantees that info.max_tx_queues will not exceed RTE_MAX_QUEUES_PER_PORT, so p->n_txq cannot exceed the array bounds of txq_locks[RTE_MAX_QUEUES_PER_PORT].

}

tx_ok = rte_eth_tx_burst(ctx->port_id, ctx->queue_id, mbufs, nb_objs);
if (ctx->lock != NULL)
Collaborator

Wouldn't it be possible to have a dedicated function tx_process_locked to avoid the if (ctx->lock != NULL)?

TX_NODE_BASE_LOCKED and TX_NODE_BASE.

That would avoid the two branches in the hot path.

// force number of TX queues equal to requested RX queues
p->n_txq = p->n_rxq;
if (p->n_txq > info.max_tx_queues)
return errno_set(EOVERFLOW);
Collaborator

Why not a TX queue with N virtio rings?

}

if (p->n_rxq > info.max_rx_queues)
return errno_set(EOVERFLOW);
Collaborator

With virtio?

@coderabbitai bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
modules/infra/control/graph.c (1)

196-213: TX queue user count may include disabled queues.

The loop counting txq_users doesn't check q->enabled, so disabled queue mappings are counted. This won't cause data races (overcounting is safe—just adds an unnecessary lock), but the warning log may be misleading.

If this is intentional (defensive locking), no action needed. Otherwise:

🔧 Optional: filter by enabled queues
 		gr_vec_foreach_ref (struct queue_map *q, w->txqs) {
-			if (q->port_id == qmap->port_id && q->queue_id == qmap->queue_id)
+			if (q->enabled && q->port_id == qmap->port_id && q->queue_id == qmap->queue_id)
 				txq_users++;
 		}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 1c78547 and 2f697ac.

📒 Files selected for processing (9)
  • modules/infra/control/gr_port.h
  • modules/infra/control/graph.c
  • modules/infra/control/port.c
  • modules/infra/control/worker.c
  • modules/infra/control/worker_test.c
  • modules/infra/datapath/gr_rxtx.h
  • modules/infra/datapath/port_tx.c
  • subprojects/dpdk.wrap
  • subprojects/packagefiles/dpdk/iavf-fix-reported-max-TX-and-RX-queues-in-ethdev-inf.patch
✅ Files skipped from review due to trivial changes (1)
  • subprojects/dpdk.wrap
🧰 Additional context used
📓 Path-based instructions (1)
**/*.{c,h}

⚙️ CodeRabbit configuration file

**/*.{c,h}: - gr_vec_*() functions cannot fail. No need to check their return value.

  • gr_vec_free(x) always sets x = NULL. There is no risk of double free.
  • ec_node_*() functions consume their ec_node arguments. No leaks on error.
  • rte_node->ctx is an uint8_t array of size 16, not a pointer.
  • Never suggest to replace assert() with graceful error checking.
  • We compile with -std=gnu2x. Unnamed parameters in function definitions are valid.

Files:

  • modules/infra/control/gr_port.h
  • modules/infra/datapath/port_tx.c
  • modules/infra/control/port.c
  • modules/infra/control/graph.c
  • modules/infra/control/worker_test.c
  • modules/infra/control/worker.c
  • modules/infra/datapath/gr_rxtx.h
🧠 Learnings (5)
📚 Learning: 2025-09-05T08:58:53.047Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:533-539
Timestamp: 2025-09-05T08:58:53.047Z
Learning: DPDK functions rte_mempool_free() and rte_hash_free() are NULL-safe - they explicitly handle NULL pointers and do nothing when passed NULL, unlike many C standard library functions.

Applied to files:

  • modules/infra/control/gr_port.h
📚 Learning: 2025-11-05T14:28:28.817Z
Learnt from: rjarry
Repo: DPDK/grout PR: 372
File: modules/infra/datapath/loop_input.c:53-56
Timestamp: 2025-11-05T14:28:28.817Z
Learning: In modules/infra/datapath/loop_input.c, the l3_edges array is intentionally indexed using network byte order (rte_be16_t) for EtherType values. Both loopback_input_add_type() registration and loopback_input_process() lookup use network byte order consistently, so no byte order conversion is needed when accessing l3_edges.

Applied to files:

  • modules/infra/control/gr_port.h
  • modules/infra/datapath/port_tx.c
📚 Learning: 2025-09-09T12:08:36.790Z
Learnt from: rjarry
Repo: DPDK/grout PR: 294
File: modules/policy/control/conntrack.c:621-637
Timestamp: 2025-09-09T12:08:36.790Z
Learning: DPDK's rte_hash supports safe deletion during iteration via rte_hash_iterate, unlike many other hash table implementations that require two-phase deletion patterns.

Applied to files:

  • modules/infra/control/gr_port.h
📚 Learning: 2025-11-03T13:28:19.489Z
Learnt from: rjarry
Repo: DPDK/grout PR: 379
File: modules/infra/datapath/port_tx.c:36-46
Timestamp: 2025-11-03T13:28:19.489Z
Learning: In DPDK graph node process callbacks, the return value is only used for statistics and does not affect packet flow or scheduling through the graph. Nodes can return 0 when they haven't processed packets (e.g., when dropping or redirecting due to port state).

Applied to files:

  • modules/infra/datapath/port_tx.c
  • modules/infra/control/graph.c
📚 Learning: 2025-12-17T17:32:21.746Z
Learnt from: christophefontaine
Repo: DPDK/grout PR: 466
File: modules/srv6/datapath/l2_encap.c:26-32
Timestamp: 2025-12-17T17:32:21.746Z
Learning: The GR_MBUF_PRIV_DATA_TYPE macro in modules/infra/datapath/gr_mbuf.h automatically adds `const struct iface *iface` as the first field to any structure defined with it. All types defined using this macro (e.g., srv6_dx2_mbuf_data, mbuf_data, queue_mbuf_data) will have the iface field available, followed by any custom fields provided as the second macro argument.

Applied to files:

  • modules/infra/datapath/port_tx.c
🧬 Code graph analysis (2)
modules/infra/control/port.c (1)
api/gr_errno.h (1)
  • errno_set (10-13)
modules/infra/control/worker_test.c (1)
modules/infra/control/worker.c (2)
  • worker_rxq_assign (251-312)
  • worker_count (125-133)
🪛 Cppcheck (2.19.0)
modules/infra/control/worker.c

[error] 231-231: Expression 'worker->txqs?free(__gr_vec_hdr(worker->txqs))

(unknownEvaluationOrder)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (7)
  • GitHub Check: build-and-tests (clang-16, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (clang-18, none, debugoptimized, ubuntu-24.04, false)
  • GitHub Check: build-and-tests (gcc-13, none, debugoptimized, ubuntu-24.04, -Dfrr=enabled, false)
  • GitHub Check: build-and-tests (clang-15, none, debugoptimized, ubuntu-22.04, false)
  • GitHub Check: build-and-tests (gcc-14, address, debug, ubuntu-24.04, -Dfrr=enabled, true)
  • GitHub Check: deb
  • GitHub Check: rpm
🔇 Additional comments (10)
modules/infra/datapath/port_tx.c (1)

57-63: LGTM! Conditional locking for shared TX queues is correctly implemented.

The lock-only-when-shared pattern avoids spinlock overhead for exclusively-owned queues while properly protecting concurrent access when multiple workers share a TX queue.

subprojects/packagefiles/dpdk/iavf-fix-reported-max-TX-and-RX-queues-in-ethdev-inf.patch (1)

38-44: LGTM! Correctly caps max queue reporting based on VF capability.

The fix properly checks VIRTCHNL_VF_LARGE_NUM_QPAIRS capability before reporting 256 queues, falling back to 16 when the PF doesn't support large VF.

modules/infra/control/gr_port.h (1)

36-36: LGTM! Per-TX-queue spinlock array for shared queue synchronization.

The array size matches DPDK's RTE_MAX_QUEUES_PER_PORT, ensuring coverage for all possible queue configurations.

modules/infra/control/port.c (1)

94-118: LGTM! Device-aware TX queue configuration with proper driver quirks handling.

The implementation correctly:

  • Caps TX queues to device maximum
  • Handles net_tap/net_virtio requirement for equal RX/TX counts
  • Logs a notice when queue count is reduced
  • Initializes per-queue spinlocks for the sharing mechanism
modules/infra/datapath/gr_rxtx.h (1)

21-35: LGTM! Clean separation of TX node context with optional locking.

The tx_node_ctx structure properly encapsulates the TX queue info and lock pointer, enabling conditional locking only for shared queues (when lock != NULL).

modules/infra/control/worker_test.c (1)

408-412: LGTM! Test correctly validates TX queue sharing via modulo distribution.

With 5 workers and 4 TX queues per port, the modulo assignment causes workers to share TXQ 0 (w3 and w4), which is the expected behavior protected by the spinlocks in port_tx.c.

modules/infra/control/worker.c (1)

225-249: LGTM! Clean TX queue distribution with proper sharing semantics.

The implementation correctly:

  • Clears all TXQ assignments before redistribution
  • Only assigns TXQs to workers that have RXQs (workers without RXQs won't transmit)
  • Uses modulo assignment to distribute workers across available TXQs

The static analysis warning on line 231 is a false positive - per coding guidelines, gr_vec_free() is safe and sets the pointer to NULL.

modules/infra/control/graph.c (3)

77-87: LGTM!

Helper function cleanly encapsulates port lookup logic. The assert is appropriate here since callers should only pass valid port IDs from the existing queue mappings.


161-168: LGTM!

Proper initialization of TX node contexts to invalid values prevents accidental transmission on unassigned queues.


215-223: LGTM!

Edge wiring correctly maps port_id to the corresponding TX node edge. The index is bounded by DPDK's port limits.

This fix has been submitted upstream.

Link: https://patches.dpdk.org/project/dpdk/patch/20260109142129.1550058-2-rjarry@redhat.com/
Signed-off-by: Robin Jarry <rjarry@redhat.com>
Replace worker_txq_id() with worker_txq_distribute() which clears all
TX queue assignments and reassigns them only to workers that have at
least one RX queue. Workers without RX queues do not process packets
and therefore have no need for TX queues.

Call worker_txq_distribute() from both worker_queue_distribute() during
initial setup and worker_rxq_assign() when RX queues are moved at
runtime. Since worker_rxq_assign() now calls worker_graph_reload_all()
instead of reloading individual workers, remove the now-redundant
worker_graph_reload() call from worker_queue_distribute().

To prevent two workers from transmitting on the same TX queue during
graph reloads, modify worker_graph_reload_all() to first stop all
workers by reloading them with NULL ports, then reload with the actual
configuration. Adjust worker_graph_new() to treat NULL ports as a
request to create a NULL graph.

Update unit tests to expect empty TX queue assignments for workers
without RX queues.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Replace direct use of port_queue as the TX node context with a new
tx_node_ctx structure that wraps port_queue as a member. This allows
extending the TX node context with additional fields in future commits.

Convert port_queue from a GR_NODE_CTX_TYPE macro to a plain struct since
it is no longer used directly as a node context. It remains used for
packet tracing where only port_id and queue_id are recorded.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Extract the port lookup loop into a dedicated find_port() function.
This will be reused in the next commit when configuring TX node context.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Some network devices report fewer TX queues than the number of datapath
workers. Cap the TX queue count to the device maximum and distribute
workers across available queues using modulo assignment.

When multiple workers share the same TX queue, protect rte_eth_tx_burst()
with a per-queue spinlock stored in iface_info_port. The lock pointer in
tx_node_ctx is only set for shared queues to avoid locking overhead when
a worker has exclusive access.

Update unit tests to set max_tx_queues in the mock device info and
adjust expected TX queue assignments for modulo wrapping.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
Unlike TX queues which are silently capped to the device maximum, RX
queues are explicitly requested by the user. Return EOVERFLOW if the
requested count exceeds what the device supports.

Signed-off-by: Robin Jarry <rjarry@redhat.com>
The net_tap and net_virtio drivers require equal RX and TX queue counts.
Set TX queue count to match RX queues for these drivers. This may result
in fewer TX queues than workers, which is now handled by the TX queue
sharing support.

Signed-off-by: Robin Jarry <rjarry@redhat.com>