[PWCI] "net/mlx5: future HW devargs defaults and fixes"#619
[PWCI] "net/mlx5: future HW devargs defaults and fixes"#619
Conversation
SWS (software steering) will be disabled on future hardware generations. This fix should check for SWS capability bits and return relevant error. Update the defaults for the dv_flow_en and allow_duplicate_pattern devargs accordingly. - Default dv_flow_en devarg value will be chosen based on whether NIC supports SW steering and/or HW steering. - If DV flow is not supported and allow_duplicate_pattern is set by the user, forcibly disable it and emit a clear log message. This change improves reliability by ensuring only valid configurations are applied, and provides clear feedback to the user when fallbacks are triggered. Fixes: 1b55eeb ("common/mlx5: add ConnectX-9 SuperNIC") Cc: stable@dpdk.org Signed-off-by: Maayan Kashani <mkashani@nvidia.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
Commit [1] has changed the default behavior of flow engine selection
in mlx5 PMD to accommodate for new NIC generations.
Whenever underlying device does not support SWS (e.g., ConnectX-9
or untrusted VFs/SFs) and device does support HWS,
default flow engine would be HWS (dv_flow_en=2) which also supports
sync flow API.
This behavior change had consequence in memory usage whenever
SFs are probed by DPDK. In default HWS configuration supporting
sync flow API (i.e. without calling rte_flow_configure())
mlx5 PMD allocated 4 rte_ring objects per port:
- indir_iq and indir_cq - For handling indirect action completions.
- flow_transfer_pending and flow_transfer_completed - For handling
template table resizing.
This has not happened previously with SWS as default flow engine.
Since a dedicated memzone is allocated for each rte_ring object,
this lead to exhaustion of default memzone limit
on setups with ~1K SFs to probe.
It resulted in the following error on port start:
EAL: memzone_reserve_aligned_thread_unsafe():
Number of requested memzone segments exceeds maximum 2560
RING: Cannot reserve memory
mlx5_net: Failed to start port 998 mlx5_core.sf.998:
fail to configure port
Since template table resizing is allowed if and only if
async flow API was configured, 2 of the aforementioned rings
are never used in the default sync flow API configuration.
This patch removes allocation of flow_transfer_pending and
flow_transfer_completed rings in default sync flow API configuration
of mlx5 PMD to reduce memzone usage and allow DPDK probing
to succeed on setups with ~1K SFs to probe.
[1] commit d1ac7b6
("net/mlx5: update flow devargs handling for future HW")
Fixes: 27d171b ("net/mlx5: abstract flow action and enable reconfigure")
Cc: stable@dpdk.org
Signed-off-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
Signed-off-by: 0-day Robot <robot@bytheb.org>
HWS pattern template creation tries to build a table with the tested items after basic verifications to check if the pattern is valid. Time consumed in that table creation can be critical for applications that require fast PMD initialization. The patch separates pattern templates to internal and external. Internal templates are created by the PMD and are considered safe and can skip some validations. Pattern templates provided by applications will be fully validated. Fixes: a190f25 ("net/mlx5: improve pattern template validation") Cc: stable@dpdk.org Signed-off-by: Gregory Etelson <getelson@nvidia.com> Signed-off-by: Maayan Kashani <mkashani@nvidia.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
When promiscuous mode is enabled, the device receives all traffic regardless of destination MAC address. Previously, the code was setting both the promiscuous flag AND the DMAC/multicast control flow rules, which is redundant. This patch makes the DMAC and multicast/broadcast control flow rules conditional on NOT being in promiscuous mode. When promiscuous mode is enabled, only the MLX5_CTRL_PROMISCUOUS flag is set. Fixes: 9fa7c1c ("net/mlx5: create control flow rules with HWS") Cc: stable@dpdk.org Signed-off-by: Maayan Kashani <mkashani@nvidia.com> Signed-off-by: 0-day Robot <robot@bytheb.org>
Reviewer's Guidemlx5 HW steering (HWS) defaults and capabilities handling are updated: HWS-specific rings are now optional and asserted only for async APIs, pattern template creation distinguishes between internal and external templates for validation, device arguments and capabilities now derive the default flow engine (DV/SW/HWS) and allow_duplicate_pattern behavior from HCA capabilities and user kvargs, and new capability bits for SW-owner flow tables and steering logic formats are wired in; minor control-flow fixes are added in HWS traffic enablement. Sequence diagram for internal vs external pattern template creation and validationsequenceDiagram
participant Caller as Caller
participant Ops as mlx5_flow_driver_ops
participant ExtCreate as flow_hw_external_pattern_template_create
participant PTCreate as flow_hw_pattern_template_create
participant Dev as rte_eth_dev
participant Priv as mlx5_priv
participant IT as mlx5_flow_hw_itt_entry
Caller->>Ops: pattern_template_create(dev, attr, items, error)
Ops->>ExtCreate: flow_hw_external_pattern_template_create(dev, attr, items, error)
ExtCreate->>PTCreate: flow_hw_pattern_template_create(dev, attr, items, external=true, error)
PTCreate->>Dev: access dev->data->dev_private
Dev-->>PTCreate: priv
PTCreate->>Priv: allocate and init IT
PTCreate->>IT: refcnt++
alt external template
PTCreate->>Priv: pattern_template_validate(dev, &IT, 1, error)
alt validation fails
PTCreate-->>ExtCreate: NULL (error path)
ExtCreate-->>Ops: NULL
Ops-->>Caller: error
else validation succeeds
PTCreate->>Priv: LIST_INSERT_HEAD(flow_hw_itt, IT)
PTCreate-->>ExtCreate: IT
ExtCreate-->>Ops: IT
Ops-->>Caller: IT
end
end
Caller->>PTCreate: internal callers
Caller->>PTCreate: flow_hw_create_*_pattern_template(dev, attr, items, error)
PTCreate->>PTCreate: called with external=false
PTCreate->>IT: refcnt++
PTCreate->>Priv: skip pattern_template_validate
PTCreate->>Priv: LIST_INSERT_HEAD(flow_hw_itt, IT)
PTCreate-->>Caller: IT
Updated class diagram for mlx5_hca_attr and flow configurationclassDiagram
class mlx5_hca_attr {
uint8_t rx_sw_owner
uint8_t rx_sw_owner_v2
uint8_t tx_sw_owner
uint8_t tx_sw_owner_v2
uint8_t esw_sw_owner
uint8_t esw_sw_owner_v2
uint64_t system_image_guid
uint8_t log_max_conn_track_offload
}
class mlx5_dev_shared_config {
uint8_t dv_esw_en
uint8_t dv_flow_en
uint8_t decap_en
uint8_t allow_duplicate_pattern
uint8_t fdb_def_rule
struct cnt_svc_conf cnt_svc
}
class mlx5_dev_ctx_shared {
struct mlx5_common_device *cdev
struct mlx5_dev_shared_config config
struct dev_cap dev_cap
}
class mlx5_common_device {
struct mlx5_device_config config
}
class mlx5_device_config {
struct mlx5_hca_attr hca_attr
uint8_t devx
}
class mlx5_kvargs_ctrl {
struct rte_kvargs *kvlist
bool is_used[]
bool mlx5_kvargs_is_used(key)
}
class mlx5_flow_driver_ops {
pattern_template_create(dev, attr, items, error)
}
class flow_hw_pattern_template_create {
flow_hw_pattern_template_create(dev, attr, items, external, error)
}
class flow_hw_external_pattern_template_create {
flow_hw_external_pattern_template_create(dev, attr, items, error)
}
mlx5_dev_ctx_shared --> mlx5_common_device : has
mlx5_common_device --> mlx5_device_config : has
mlx5_device_config --> mlx5_hca_attr : has
mlx5_dev_ctx_shared --> mlx5_dev_shared_config : config
mlx5_kvargs_ctrl --> mlx5_dev_shared_config : influences
mlx5_flow_driver_ops --> flow_hw_external_pattern_template_create : uses
flow_hw_external_pattern_template_create --> flow_hw_pattern_template_create : delegates
class MLX5_STEERING_LOGIC_FORMAT {
<<enumeration>>
MLX5_STEERING_LOGIC_FORMAT_CONNECTX_5 = 0
MLX5_STEERING_LOGIC_FORMAT_CONNECTX_6DX = 1
MLX5_STEERING_LOGIC_FORMAT_CONNECTX_7 = 2
MLX5_STEERING_LOGIC_FORMAT_CONNECTX_8 = 3
}
Flow diagram for HWS queue ring setup with optional async ringsflowchart TD
A[flow_hw_queue_setup_rings
dev, queue, queue_size, nt_mode] --> B[MLX5_ASSERT hw_q container allocated]
B --> C[create indir_cq ring]
C -->|fail| E1[log error indir_act_cq
return -ENOMEM]
C -->|success| D[create indir_iq ring]
D -->|fail| E2[log error indir_act_iq
return -ENOMEM]
D -->|success| NT{nt_mode == true?}
NT -->|true| R0[return 0
only indirect rings needed for sync API]
NT -->|false| F[create flow_transfer_pending ring]
F -->|fail| E3[log error tx_pending
return -ENOMEM]
F -->|success| G[create flow_transfer_completed ring]
G -->|fail| E4[log error tx_done
return -ENOMEM]
G -->|success| R[return 0]
subgraph Usage_of_optional_rings
U1[__flow_hw_push_action] --> U2{flow_transfer_pending and
flow_transfer_completed not NULL?}
U2 -->|true| U3[mlx5_hw_push_queue
flow_transfer_pending,
flow_transfer_completed]
U2 -->|false| U4[skip push]
U5[mlx5_hw_pull_flow_transfer_comp] --> U6{flow_transfer_completed ring NULL?}
U6 -->|true| U7[return 0]
U6 -->|false| U8[dequeue results from ring]
U9[flow_hw_update_resized async only] --> U10[MLX5_ASSERT
flow_transfer_pending not NULL]
U10 --> U11[MLX5_ASSERT
flow_transfer_completed not NULL]
end
File-Level Changes
Tips and commandsInteracting with Sourcery
Customizing Your ExperienceAccess your dashboard to:
Getting Help
|
📝 WalkthroughWalkthroughThe pull request extends MLX5 driver support for hardware steering capabilities by refining device capability detection, adding new HCA attribute fields with fallback mechanisms, introducing steering logic format enums, improving pattern template creation with external flag support, and enhancing configuration logic for DV flow steering with capability-driven defaults. Changes
Sequence Diagram(s)sequenceDiagram
participant Config as Driver Configuration
participant DevX as DevX Interface
participant Device as MLX5 Device
participant User as User Configuration
User->>Config: Provide kvargs (optional dv_flow_en)
Config->>DevX: Query HCA capabilities
DevX->>Device: Get flow table capabilities
Device-->>DevX: Return capability flags
DevX-->>Config: Return hws_supported, sws_supported status
Config->>Config: Determine dv_flow_en default
alt SWS Supported
Config->>Config: dv_flow_en = 1 (SW/DV)
else HWS Supported Only
Config->>Config: dv_flow_en = 2 (HWS)
else Neither Supported
Config->>Config: dv_flow_en = 0
end
alt dv_flow_en in HWS mode
Config->>Config: Disable allow_duplicate_pattern
Config->>User: Warn about disabled duplicates
else dv_flow_en in SW mode
Config->>Config: Default allow_duplicate_pattern = 1
end
Config-->>User: Return configuration
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~35 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing touches
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Hey - I've found 2 issues, and left some high level feedback:
- In mlx5_traffic_enable_hws(), the new logic no longer sets MLX5_CTRL_DMAC when the port is in promiscuous mode, changing previous behavior; please double-check whether DMAC should still be enabled in promiscuous mode or clarify the intended steering semantics.
- The dv_flow_en field is now used with multiple semantic values (0/1/2, with 2 meaning HWS mode); consider introducing an enum or named constants and using them consistently instead of bare integers to make the new HWS/SWS selection logic clearer and less error-prone.
Prompt for AI Agents
Please address the comments from this code review:
## Overall Comments
- In mlx5_traffic_enable_hws(), the new logic no longer sets MLX5_CTRL_DMAC when the port is in promiscuous mode, changing previous behavior; please double-check whether DMAC should still be enabled in promiscuous mode or clarify the intended steering semantics.
- The dv_flow_en field is now used with multiple semantic values (0/1/2, with 2 meaning HWS mode); consider introducing an enum or named constants and using them consistently instead of bare integers to make the new HWS/SWS selection logic clearer and less error-prone.
## Individual Comments
### Comment 1
<location> `drivers/net/mlx5/mlx5_trigger.c:1685-1694` </location>
<code_context>
goto error;
}
- if (dev->data->promiscuous)
+ if (dev->data->promiscuous) {
flags |= MLX5_CTRL_PROMISCUOUS;
- if (dev->data->all_multicast)
- flags |= MLX5_CTRL_ALL_MULTICAST;
- else
- flags |= MLX5_CTRL_BROADCAST | MLX5_CTRL_IPV4_MULTICAST | MLX5_CTRL_IPV6_MULTICAST;
- flags |= MLX5_CTRL_DMAC;
+ } else {
+ if (dev->data->all_multicast)
+ flags |= MLX5_CTRL_ALL_MULTICAST;
+ else
+ flags |= (MLX5_CTRL_BROADCAST |
+ MLX5_CTRL_IPV4_MULTICAST |
+ MLX5_CTRL_IPV6_MULTICAST);
+ flags |= MLX5_CTRL_DMAC;
+ }
if (priv->vlan_filter_n)
</code_context>
<issue_to_address>
**issue (bug_risk):** Revised flag logic in mlx5_traffic_enable_hws drops DMAC/multicast flags when promiscuous is enabled, which may change behavior.
Previously, `MLX5_CTRL_DMAC` (and multicast/broadcast flags based on `all_multicast`) were set regardless of `promiscuous`. With the new logic, when `dev->data->promiscuous` is true, only `MLX5_CTRL_PROMISCUOUS` is set and DMAC/multicast/broadcast are never enabled. If FW or downstream logic still relies on `MLX5_CTRL_DMAC` or multicast/broadcast flags even in promiscuous mode, some traffic may be dropped. Consider retaining `MLX5_CTRL_DMAC` (and possibly multicast/broadcast flags) in promiscuous mode unless there is a clear requirement not to.
</issue_to_address>
### Comment 2
<location> `doc/guides/nics/mlx5.rst:701` </location>
<code_context>
- It is configured by default to 1 (DV flow steering) if supported.
- Otherwise, the value is 0 which indicates legacy Verbs flow offloading.
+ By default, the PMD will set this value according to capability.
+ If DV flow steering is supported, it will be set to 1.
+ If DV flow steering is not supported and HW steering is supported,
</code_context>
<issue_to_address>
**suggestion (typo):** Consider changing "according to capability" to "according to capabilities" for more natural wording.
Using "according to capabilities" (or "according to the device capabilities") would be more idiomatic and clearly indicate that multiple capabilities are taken into account.
```suggestion
By default, the PMD will set this value according to the device capabilities.
```
</issue_to_address>Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.
There was a problem hiding this comment.
Actionable comments posted: 5
🤖 Fix all issues with AI agents
In @drivers/common/mlx5/mlx5_devx_cmds.c:
- Around line 1307-1318: The code reads v2 flags into attr->rx_sw_owner_v2 and
attr->tx_sw_owner_v2 but leaves legacy attr->rx_sw_owner and attr->tx_sw_owner
untouched, which can leave stale bits when the same attr struct is reused;
update the v2-path so that when attr->rx_sw_owner_v2 is non-zero you explicitly
set attr->rx_sw_owner = 0 (and likewise set attr->tx_sw_owner = 0 when
attr->tx_sw_owner_v2 is non-zero) after the MLX5_GET assignments to clear the
legacy fields.
- Around line 1471-1477: attr->esw_sw_owner_v2 being non-zero must clear the
legacy flag to avoid stale carryover: in the block that sets
attr->esw_sw_owner_v2 (MLX5_GET(...
flow_table_properties_nic_esw_fdb.sw_owner_v2)), add an else branch that
explicitly sets attr->esw_sw_owner = 0 when esw_sw_owner_v2 is non-zero; apply
the same change for the RX/TX counterparts (rx_sw_owner_v2 -> rx_sw_owner and
tx_sw_owner_v2 -> tx_sw_owner) so legacy fields are cleared whenever the v2 flag
is present, and verify device spec to confirm whether sw_owner and sw_owner_v2
can be simultaneously set and adjust priority semantics if needed.
In @drivers/net/mlx5/mlx5.c:
- Around line 1469-1481: mlx5_kvargs_is_used currently dereferences
mkvlist->kvlist and indexes mkvlist->is_used without validation; add defensive
checks at the start of mlx5_kvargs_is_used to ensure mkvlist and mkvlist->kvlist
are non-NULL and that mkvlist->is_used is present and sized at least
mkvlist->kvlist->count (or use a safe min between count and stored is_used
length if available), and return false (or assert) on invalid state to avoid
NULL deref / OOB when iterating mkvlist->kvlist->pairs and accessing
mkvlist->is_used[i].
🧹 Nitpick comments (4)
doc/guides/nics/mlx5.rst (2)
701-705: Clarify when capability-based defaults apply.The documentation states "By default, the PMD will set this value according to capability" followed by a three-tier fallback logic. Consider clarifying:
- Does this logic apply only when
dv_flow_enis not explicitly specified by the user, or does it override user-specified values?- The term "capability" is used without definition. Specify whether this refers to hardware capability, firmware capability, driver support, or a combination.
- If a user explicitly sets
dv_flow_ento a value not supported by the device capabilities, will it be silently adjusted, cause a warning, or cause an error?Suggested clarification
- By default, the PMD will set this value according to capability. + By default, when not explicitly specified, the PMD will set this value based on device capabilities:And consider adding a note:
+ If DV flow steering is not supported and HW steering is supported, + then it will be set to 2. + Otherwise, it will be set to 0. + + .. note:: + + Explicitly setting ``dv_flow_en`` to a value not supported by the device + will result in a configuration error during device initialization.
840-844: Clarify the behavior when unsupported configuration is provided.The documentation contains potentially confusing statements:
- Line 840 states the option "is not supported in HWS mode"
- Line 841 states "If this option is set to 1 in HWS mode, it will be set to 0"
These statements appear contradictory. If the option is "not supported," users might expect an error rather than silent adjustment to 0. Additionally, line 843 states "the PMD will set this value according to capability" without explaining what capability-based logic applies to this parameter.
Consider clarifying:
- Will the driver log a warning when automatically adjusting the value?
- What does "according to capability" mean for this specific parameter?
- Should the documentation use "not supported" or "automatically disabled" for consistency?
Suggested clarification
- This option is not supported in :ref:`HWS mode <mlx5_hws>`. - If this option is set to 1 in HWS mode, it will be set to 0. + This option is not supported in :ref:`HWS mode <mlx5_hws>` + and will be automatically disabled (set to 0) with a warning message + if explicitly set to 1 when HWS mode is active. - By default, the PMD will set this value according to capability. + By default, when not explicitly specified, the PMD enables this option (set to 1) + if DV flow steering is supported, and disables it (set to 0) otherwise.drivers/net/mlx5/mlx5_flow_hw.c (2)
12037-12053: Good refactor: centralized ring setup per queue; ensure errors propagate with enough context.Minor: since
flow_hw_queue_setup_rings()logs but doesn’t setrte_errno, consider standardizing (either setrte_errnobefore returning or return-rte_errnoconsistently) to make upstream error reporting deterministic.
15398-15460: Asserts match intended usage (async-only), but consider returning a real error instead of asserting.
flow_hw_update_resized()now asserts the transfer rings exist. That’s fine for catching internal misuse, but if there’s any chance this can be reached via public ops when only sync/nt-mode is configured, returningENOTSUP/EINVALwould be safer than an assert-triggered abort in debug builds.
📜 Review details
Configuration used: defaults
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (7)
doc/guides/nics/mlx5.rstdrivers/common/mlx5/mlx5_devx_cmds.cdrivers/common/mlx5/mlx5_devx_cmds.hdrivers/common/mlx5/mlx5_prm.hdrivers/net/mlx5/mlx5.cdrivers/net/mlx5/mlx5_flow_hw.cdrivers/net/mlx5/mlx5_trigger.c
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Sourcery review
🔇 Additional comments (17)
drivers/net/mlx5/mlx5_trigger.c (1)
1685-1695: Summary inconsistency: DMAC is not added in promiscuous path.The AI summary incorrectly states "In promiscuous path: DMAC-related controls are still added within the same branch." The actual code shows
MLX5_CTRL_DMAC(line 1694) is only set inside theelseblock, meaning it is not added when promiscuous mode is enabled.This aligns with the PR objective: "when promiscuous is enabled, set only the promiscuous flag and skip adding DMAC/multicast control flow rules."
The implementation is correct—in promiscuous mode, DMAC-specific rules are redundant since all traffic is accepted regardless of destination MAC.
drivers/net/mlx5/mlx5_flow_hw.c (11)
4394-4415: Good defensive guard for absent flow-transfer completion ring.Returning 0 when
flow_transfer_completedis NULL avoids deref/crash in configurations that don’t allocate those rings.
4630-4650: Nice: only push flow-transfer queues when both rings exist.This matches the “don’t allocate transfer rings in nt/sync mode” approach and prevents NULL deref.
9209-9217: Good wrapper to force “external” semantics via driver ops.This makes the public
pattern_template_createpath consistently take the external-validation route.
9804-9825: Updated internal pattern template creation usesexternal=falseas intended.
10084-10117: Updated internal pattern template creation usesexternal=falseas intended.
10135-10171: Updated internal pattern template creation usesexternal=falseas intended.
10186-10208: Updated internal pattern template creation usesexternal=falseas intended.
10222-10245: Updated internal pattern template creation usesexternal=falseas intended.
11468-11489: Updated internal pattern template creation usesexternal=falseas intended.
15510-15562: Driver ops now routespattern_template_createthrough the external wrapper: good.
9027-9207: The concern aboutrte_ring_free(NULL)safety is invalid; no action needed.
rte_ring_free()in lib/ring/rte_ring.c (line 336–337) explicitly checksif (r == NULL) return;, so passing NULL pointers from rings skipped duringnt_modeinitialization is safe. The cleanup at lines 11791–11794 can remain unconditional.Likely an incorrect or invalid review comment.
drivers/common/mlx5/mlx5_prm.h (2)
1600-1606: LGTM!The new enum provides clear symbolic constants for the steering logic format versions across ConnectX hardware generations. Explicit value assignments are appropriate for hardware-defined constants, and this aligns with the
steering_format_versionfield inmlx5_ifc_cmd_hca_cap_bits.
2354-2363: LGTM!The structure layout changes correctly expose
flow_table_properties_nic_esw_fdbcapability field from previously reserved space. The offset calculations are verified:
reserved_at_0[0x200]ends at offset 0x200flow_table_properties_nic_esw_fdb(0x200 bits) ends at offset 0x400reserved_at_400[0x400]ends at offset 0x800Field naming conventions follow the established pattern in the file, and the total structure size remains consistent.
drivers/common/mlx5/mlx5_devx_cmds.h (1)
339-346: Confirmstruct mlx5_hca_attris not consumed as a stable ABI / serialized blob.Adding these bitfields changes
struct mlx5_hca_attrlayout/size (Line 340-345). That’s OK if it’s strictly internal and always rebuilt together, but risky if any part is shared across binary boundaries or persisted.drivers/net/mlx5/mlx5.c (2)
1451-1467: SWS detection depends onhca_attrbeing fully initialized (avoid false positives).
mlx5_sws_is_any_supported()(Line 1451-1467) ORs_v2and legacy fields; that’s fine, but it assumes the legacy fields are reliably 0 when_v2is present (and that the struct isn’t carrying stale data). This ties directly to the “clear legacy on v2 path” fix inmlx5_devx_cmd_query_hca_attr()population.
1523-1581: Minor: warning text is misleading; also validate dv_flow_en capability gating for HWS.
- The warning on Line 1564-1566 says “DV flow is not supported” but the condition is specifically “SWS is not supported while HWS is”; consider rewording to avoid confusing users.
- Please confirm that the earlier gate (Line 1559-1562:
!sh->dev_cap.dv_flow_en => dv_flow_en=0) is correct for dv_flow_en==2 as well (i.e.,dv_flow_encapability truly means “no DV at all”, not just “no SWS”). If it only reflects SWS availability, this would incorrectly disable HWS.
| attr->rx_sw_owner_v2 = MLX5_GET(flow_table_nic_cap, hcattr, | ||
| flow_table_properties_nic_receive.sw_owner_v2); | ||
| if (!attr->rx_sw_owner_v2) | ||
| attr->rx_sw_owner = MLX5_GET(flow_table_nic_cap, hcattr, | ||
| flow_table_properties_nic_receive.sw_owner); | ||
|
|
||
| attr->tx_sw_owner_v2 = MLX5_GET(flow_table_nic_cap, hcattr, | ||
| flow_table_properties_nic_transmit.sw_owner_v2); | ||
| if (!attr->tx_sw_owner_v2) | ||
| attr->tx_sw_owner = MLX5_GET(flow_table_nic_cap, hcattr, | ||
| flow_table_properties_nic_transmit.sw_owner); | ||
|
|
There was a problem hiding this comment.
Avoid stale legacy *_sw_owner when _v2 is present.
On Line 1307-1317, if attr->rx_sw_owner_v2/tx_sw_owner_v2 is non-zero, the legacy attr->rx_sw_owner/tx_sw_owner is left untouched (could be stale if attr is reused). Safer to explicitly clear legacy bits when v2 is present.
Proposed fix (clear legacy on v2 path)
- attr->rx_sw_owner_v2 = MLX5_GET(flow_table_nic_cap, hcattr,
- flow_table_properties_nic_receive.sw_owner_v2);
- if (!attr->rx_sw_owner_v2)
- attr->rx_sw_owner = MLX5_GET(flow_table_nic_cap, hcattr,
- flow_table_properties_nic_receive.sw_owner);
+ attr->rx_sw_owner_v2 = MLX5_GET(flow_table_nic_cap, hcattr,
+ flow_table_properties_nic_receive.sw_owner_v2);
+ if (!attr->rx_sw_owner_v2) {
+ attr->rx_sw_owner = MLX5_GET(flow_table_nic_cap, hcattr,
+ flow_table_properties_nic_receive.sw_owner);
+ } else {
+ attr->rx_sw_owner = 0;
+ }
- attr->tx_sw_owner_v2 = MLX5_GET(flow_table_nic_cap, hcattr,
- flow_table_properties_nic_transmit.sw_owner_v2);
- if (!attr->tx_sw_owner_v2)
- attr->tx_sw_owner = MLX5_GET(flow_table_nic_cap, hcattr,
- flow_table_properties_nic_transmit.sw_owner);
+ attr->tx_sw_owner_v2 = MLX5_GET(flow_table_nic_cap, hcattr,
+ flow_table_properties_nic_transmit.sw_owner_v2);
+ if (!attr->tx_sw_owner_v2) {
+ attr->tx_sw_owner = MLX5_GET(flow_table_nic_cap, hcattr,
+ flow_table_properties_nic_transmit.sw_owner);
+ } else {
+ attr->tx_sw_owner = 0;
+ }🤖 Prompt for AI Agents
In @drivers/common/mlx5/mlx5_devx_cmds.c around lines 1307 - 1318, The code
reads v2 flags into attr->rx_sw_owner_v2 and attr->tx_sw_owner_v2 but leaves
legacy attr->rx_sw_owner and attr->tx_sw_owner untouched, which can leave stale
bits when the same attr struct is reused; update the v2-path so that when
attr->rx_sw_owner_v2 is non-zero you explicitly set attr->rx_sw_owner = 0 (and
likewise set attr->tx_sw_owner = 0 when attr->tx_sw_owner_v2 is non-zero) after
the MLX5_GET assignments to clear the legacy fields.
|
|
||
| attr->esw_sw_owner_v2 = MLX5_GET(flow_table_esw_cap, hcattr, | ||
| flow_table_properties_nic_esw_fdb.sw_owner_v2); | ||
| if (!attr->esw_sw_owner_v2) | ||
| attr->esw_sw_owner = MLX5_GET(flow_table_esw_cap, hcattr, | ||
| flow_table_properties_nic_esw_fdb.sw_owner); | ||
| } |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# Get the context around lines 1471-1477 in the file
cat -n drivers/common/mlx5/mlx5_devx_cmds.c | sed -n '1450,1490p'Repository: ovsrobot/dpdk
Length of output: 1621
🏁 Script executed:
# Search for the NIC path mentioned in the comment
rg -A 5 -B 5 "nic_esw_fdb.sw_owner" drivers/common/mlx5/mlx5_devx_cmds.cRepository: ovsrobot/dpdk
Length of output: 548
🏁 Script executed:
# Look for initialization of esw_sw_owner and esw_sw_owner_v2
rg "esw_sw_owner" drivers/common/mlx5/mlx5_devx_cmds.cRepository: ovsrobot/dpdk
Length of output: 210
🏁 Script executed:
# Check for similar patterns with v2 fields and legacy fallback
rg -B 3 -A 3 "if.*v2.*==" drivers/common/mlx5/mlx5_devx_cmds.cRepository: ovsrobot/dpdk
Length of output: 39
🏁 Script executed:
# Search for NIC path equivalent - look for nic_fdb patterns
rg -B 5 -A 5 "nic_fdb.sw_owner" drivers/common/mlx5/mlx5_devx_cmds.cRepository: ovsrobot/dpdk
Length of output: 39
🏁 Script executed:
# Look for the struct definition to understand field initialization
rg -B 3 "esw_sw_owner" drivers/common/mlx5/mlx5.hRepository: ovsrobot/dpdk
Length of output: 123
🏁 Script executed:
# Search for patterns with sw_owner to find similar code
rg -B 10 -A 10 "sw_owner" drivers/common/mlx5/mlx5_devx_cmds.c | head -80Repository: ovsrobot/dpdk
Length of output: 2079
🏁 Script executed:
# Check if there's initialization of attr structure
rg "struct.*attr" drivers/common/mlx5/mlx5_devx_cmds.c | head -20Repository: ovsrobot/dpdk
Length of output: 982
🏁 Script executed:
# Search for memset or initialization of attr in this function
cat -n drivers/common/mlx5/mlx5_devx_cmds.c | sed -n '1340,1380p'Repository: ovsrobot/dpdk
Length of output: 2169
🏁 Script executed:
# Find where attr is allocated/initialized in the calling context
rg -B 20 "mlx5_devx_query_hca_attr" drivers/common/mlx5/mlx5_devx_cmds.c | head -60Repository: ovsrobot/dpdk
Length of output: 39
🏁 Script executed:
# Look for the struct mlx5_hca_attr definition
find drivers/common/mlx5 -name "*.h" -type fRepository: ovsrobot/dpdk
Length of output: 753
🏁 Script executed:
# Search for usage of these sw_owner fields
rg "sw_owner" drivers/ -A 2 -B 2 | grep -E "(esw_sw_owner|tx_sw_owner|rx_sw_owner)" | head -40Repository: ovsrobot/dpdk
Length of output: 1495
🏁 Script executed:
# Check if there's memset or zeroing of the attr structure
rg "memset.*attr|attr.*memset|calloc.*attr|malloc.*attr" drivers/common/mlx5/mlx5_devx_cmds.cRepository: ovsrobot/dpdk
Length of output: 39
🏁 Script executed:
# Find the function signature for mlx5_devx_query_hca_attr
rg -B 5 "^.*mlx5_devx_query_hca_attr" drivers/common/mlx5/mlx5_devx_cmds.c | head -30Repository: ovsrobot/dpdk
Length of output: 39
🏁 Script executed:
# Get full struct definition for mlx5_hca_attr
cat -n drivers/common/mlx5/mlx5_devx_cmds.h | grep -A 200 "struct mlx5_hca_attr" | head -250Repository: ovsrobot/dpdk
Length of output: 10180
🏁 Script executed:
# Check how this function is called - is attr allocated/zeroed before?
rg -B 10 "mlx5_devx_query_hca_attr" drivers/net/mlx5/ | head -50Repository: ovsrobot/dpdk
Length of output: 39
🏁 Script executed:
# Look for any documentation or comments about sw_owner semantics
rg -B 2 -A 2 "sw_owner" drivers/common/mlx5/mlx5_prm.h | head -40Repository: ovsrobot/dpdk
Length of output: 696
🏁 Script executed:
# Search for how attr is allocated in calling code
rg -B 15 "mlx5_devx_query_hca_attr" drivers/net/mlx5/mlx5.c | head -50Repository: ovsrobot/dpdk
Length of output: 39
🏁 Script executed:
# Check if attr is zero-initialized
rg -A 5 -B 5 "struct mlx5_hca_attr" drivers/net/mlx5/mlx5.c | head -40Repository: ovsrobot/dpdk
Length of output: 1063
🏁 Script executed:
# Look at the usage context - where sw_owner fields come from device
cat -n drivers/common/mlx5/mlx5_devx_cmds.c | sed -n '1400,1410p'Repository: ovsrobot/dpdk
Length of output: 565
🏁 Script executed:
# Check if function is called multiple times on same attr struct
rg "mlx5_devx_query_hca_attr.*attr" drivers/net/mlx5/mlx5.cRepository: ovsrobot/dpdk
Length of output: 39
Clear esw_sw_owner when esw_sw_owner_v2 is non-zero to prevent stale-legacy flag carryover.
The issue is valid: if attr->esw_sw_owner_v2 is non-zero, attr->esw_sw_owner is not cleared (lines 1474-1476), allowing a previously-set legacy value to persist. Since attr is a persistent config structure and the usage logic checks if (hca_attr->esw_sw_owner_v2 || hca_attr->esw_sw_owner), a stale esw_sw_owner bit from a prior query would incorrectly enable the capability. The same pattern affects both RX and TX paths (rx_sw_owner, tx_sw_owner).
Implement the proposed fix: explicitly set attr->esw_sw_owner = 0 in an else branch when esw_sw_owner_v2 is non-zero. Apply the same correction to the RX and TX paths.
Regarding the fallback assumption (0 means "unsupported"): confirm from device specification that a device supporting sw_owner_v2 cannot legitimately return both sw_owner_v2=1 and sw_owner=1 simultaneously. If both can be set, the priority and semantics need clarification.
🤖 Prompt for AI Agents
In @drivers/common/mlx5/mlx5_devx_cmds.c around lines 1471 - 1477,
attr->esw_sw_owner_v2 being non-zero must clear the legacy flag to avoid stale
carryover: in the block that sets attr->esw_sw_owner_v2 (MLX5_GET(...
flow_table_properties_nic_esw_fdb.sw_owner_v2)), add an else branch that
explicitly sets attr->esw_sw_owner = 0 when esw_sw_owner_v2 is non-zero; apply
the same change for the RX/TX counterparts (rx_sw_owner_v2 -> rx_sw_owner and
tx_sw_owner_v2 -> tx_sw_owner) so legacy fields are cleared whenever the v2 flag
is present, and verify device spec to confirm whether sw_owner and sw_owner_v2
can be simultaneously set and adjust priority semantics if needed.
| static int | ||
| flow_hw_queue_setup_rings(struct rte_eth_dev *dev, | ||
| uint16_t queue, | ||
| uint32_t queue_size, | ||
| bool nt_mode) | ||
| { | ||
| struct mlx5_priv *priv = dev->data->dev_private; | ||
|
|
||
| /* HWS queue info container must be already allocated. */ | ||
| MLX5_ASSERT(priv->hw_q != NULL); | ||
|
|
||
| /* Notice ring name length is limited. */ | ||
| priv->hw_q[queue].indir_cq = mlx5_hwq_ring_create | ||
| (dev->data->port_id, queue, queue_size, "indir_act_cq"); | ||
| if (!priv->hw_q[queue].indir_cq) { | ||
| DRV_LOG(ERR, "port %u failed to allocate indir_act_cq ring for HWS", | ||
| dev->data->port_id); | ||
| return -ENOMEM; | ||
| } | ||
|
|
||
| priv->hw_q[queue].indir_iq = mlx5_hwq_ring_create | ||
| (dev->data->port_id, queue, queue_size, "indir_act_iq"); | ||
| if (!priv->hw_q[queue].indir_iq) { | ||
| DRV_LOG(ERR, "port %u failed to allocate indir_act_iq ring for HWS", | ||
| dev->data->port_id); | ||
| return -ENOMEM; | ||
| } | ||
|
|
||
| /* | ||
| * Sync flow API does not require rings used for table resize handling, | ||
| * because these rings are only used through async flow APIs. | ||
| */ | ||
| if (nt_mode) | ||
| return 0; | ||
|
|
||
| priv->hw_q[queue].flow_transfer_pending = mlx5_hwq_ring_create | ||
| (dev->data->port_id, queue, queue_size, "tx_pending"); | ||
| if (!priv->hw_q[queue].flow_transfer_pending) { | ||
| DRV_LOG(ERR, "port %u failed to allocate tx_pending ring for HWS", | ||
| dev->data->port_id); | ||
| return -ENOMEM; | ||
| } | ||
|
|
||
| priv->hw_q[queue].flow_transfer_completed = mlx5_hwq_ring_create | ||
| (dev->data->port_id, queue, queue_size, "tx_done"); | ||
| if (!priv->hw_q[queue].flow_transfer_completed) { | ||
| DRV_LOG(ERR, "port %u failed to allocate tx_done ring for HWS", | ||
| dev->data->port_id); | ||
| return -ENOMEM; | ||
| } | ||
|
|
||
| return 0; | ||
| } |
There was a problem hiding this comment.
Potential NULL-free hazard after skipping transfer rings in nt_mode.
flow_hw_queue_setup_rings() intentionally leaves flow_transfer_pending/completed NULL when nt_mode is true. Please ensure the teardown path doesn’t assume they’re always allocated.
If rte_ring_free(NULL) is not guaranteed safe here, add NULL guards in __flow_hw_resource_release():
Proposed hardening diff
@@
for (i = 0; i < priv->nb_queue; i++) {
struct mlx5_hw_q *hwq = &priv->hw_q[i];
- rte_ring_free(hwq->indir_iq);
- rte_ring_free(hwq->indir_cq);
- rte_ring_free(hwq->flow_transfer_pending);
- rte_ring_free(hwq->flow_transfer_completed);
+ if (hwq->indir_iq)
+ rte_ring_free(hwq->indir_iq);
+ if (hwq->indir_cq)
+ rte_ring_free(hwq->indir_cq);
+ if (hwq->flow_transfer_pending)
+ rte_ring_free(hwq->flow_transfer_pending);
+ if (hwq->flow_transfer_completed)
+ rte_ring_free(hwq->flow_transfer_completed);
}Also applies to: 11788-11795
| static bool | ||
| mlx5_hws_is_supported(struct mlx5_dev_ctx_shared *sh) | ||
| { | ||
| return (sh->cdev->config.devx && | ||
| sh->cdev->config.hca_attr.wqe_based_flow_table_sup); | ||
| } |
There was a problem hiding this comment.
Guard HWS auto-defaulting with HAVE_MLX5_HWS_SUPPORT (avoid dv_flow_en=2 on non-HWS builds).
mlx5_hws_is_supported() (Line 1444-1449) only checks runtime caps. If this file is built without HAVE_MLX5_HWS_SUPPORT, the new defaulting (Line 1535-1538) can still set dv_flow_en=2, potentially selecting a mode the build can’t support.
Proposed fix (compile-time guard)
static bool
mlx5_hws_is_supported(struct mlx5_dev_ctx_shared *sh)
{
- return (sh->cdev->config.devx &&
- sh->cdev->config.hca_attr.wqe_based_flow_table_sup);
+#ifdef HAVE_MLX5_HWS_SUPPORT
+ return (sh->cdev->config.devx &&
+ sh->cdev->config.hca_attr.wqe_based_flow_table_sup);
+#else
+ RTE_SET_USED(sh);
+ return false;
+#endif
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| static bool | |
| mlx5_hws_is_supported(struct mlx5_dev_ctx_shared *sh) | |
| { | |
| return (sh->cdev->config.devx && | |
| sh->cdev->config.hca_attr.wqe_based_flow_table_sup); | |
| } | |
| static bool | |
| mlx5_hws_is_supported(struct mlx5_dev_ctx_shared *sh) | |
| { | |
| #ifdef HAVE_MLX5_HWS_SUPPORT | |
| return (sh->cdev->config.devx && | |
| sh->cdev->config.hca_attr.wqe_based_flow_table_sup); | |
| #else | |
| RTE_SET_USED(sh); | |
| return false; | |
| #endif | |
| } |
| static bool | ||
| mlx5_kvargs_is_used(struct mlx5_kvargs_ctrl *mkvlist, const char *key) | ||
| { | ||
| const struct rte_kvargs_pair *pair; | ||
| uint32_t i; | ||
|
|
||
| for (i = 0; i < mkvlist->kvlist->count; ++i) { | ||
| pair = &mkvlist->kvlist->pairs[i]; | ||
| if (strcmp(pair->key, key) == 0 && mkvlist->is_used[i]) | ||
| return true; | ||
| } | ||
| return false; | ||
| } |
There was a problem hiding this comment.
mlx5_kvargs_is_used() should defensively handle unexpected mkvlist internals.
mlx5_kvargs_is_used() (Line 1469-1481) assumes mkvlist->kvlist is non-NULL and mkvlist->is_used[] is sized for kvlist->count. If that’s guaranteed by mlx5_kvargs_process(), OK; otherwise add assertions (or NULL checks) to prevent OOB/NULL deref.
🤖 Prompt for AI Agents
In @drivers/net/mlx5/mlx5.c around lines 1469 - 1481, mlx5_kvargs_is_used
currently dereferences mkvlist->kvlist and indexes mkvlist->is_used without
validation; add defensive checks at the start of mlx5_kvargs_is_used to ensure
mkvlist and mkvlist->kvlist are non-NULL and that mkvlist->is_used is present
and sized at least mkvlist->kvlist->count (or use a safe min between count and
stored is_used length if available), and return false (or assert) on invalid
state to avoid NULL deref / OOB when iterating mkvlist->kvlist->pairs and
accessing mkvlist->is_used[i].
Auto-submission for "http://patchwork.dpdk.org/project/dpdk/list/?series=37015"
Summary by Sourcery
Adjust mlx5 flow steering defaults and capabilities handling while tightening HWS ring usage and pattern template validation based on hardware and API mode.
Bug Fixes:
Enhancements:
Summary by CodeRabbit
Release Notes
New Features
Bug Fixes
Documentation
✏️ Tip: You can customize this high-level summary in your review settings.