@maxime-leroy
Contributor

Add dplane ctx helpers to carry interface speed and skip ethtool polling when the dataplane provides a value. Avoid scheduling speed update timers in this case.

@frrbot frrbot bot added the zebra label Aug 14, 2025
@maxime-leroy maxime-leroy marked this pull request as draft August 14, 2025 12:58
@mjstapp mjstapp self-requested a review August 14, 2025 14:27
@mjstapp
Contributor

mjstapp commented Aug 14, 2025

A couple of questions:
This appears to make the dplane-sourced link speed info one-shot, only really available along with a complete set of interface attributes. Would it make more sense to make the speed path more distinct, with a dedicated path, to allow it to be more dynamic?
Or would it make more sense for the existing "query for the speed periodically" approach to be adapted to use the dplane instead of inline system calls? In that way, the kernel info could be checked as it is now, but a different dplane plugin could respond to that query event itself.

@maxime-leroy
Contributor Author

A couple of questions: This appears to make the dplane-sourced link speed info one-shot, only really available along with a complete set of interface attributes. Would it make more sense to make the speed path more distinct, with a dedicated path, to allow it to be more dynamic? Or would it make more sense for the existing "query for the speed periodically" approach to be adapted to use the dplane instead of inline system calls? In that way, the kernel info could be checked as it is now, but a different dplane plugin could respond to that query event itself.

Speed is always included in Grout’s interface notifications, updated when the interface goes up or down. In practice we receive all attributes needed for DPLANE_OP_INTF_INSTALL/DELETE, including speed.

Netlink is different: RTM_NEWLINK notifications do not carry speed, so FRR must call ethtool/ioctl, and often retry, to obtain a stable value.

Both of your proposals align with the kernel model where RTM_NEWLINK lacks speed. Your first option also removes the ioctl from the zebra main thread by doing it on the dplane thread, which is a better design overall.

This patch is the minimal change needed to consume Grout-provided speed, which is why it is a draft.

Anyway, introducing a dedicated API like DPLANE_OP_INTF_SPEED_UPDATE would require:

  • netlink_link_change to start a timer to query speed on link-up, similar to what zebra_if_dplane_ifp_handling does today.
  • zebra_if_dplane_ifp_handling currently sets the speed for a new interface via kernel_get_speed. That would no longer be possible, so we would need a default initial link speed (e.g. 0?) until DPLANE_OP_INTF_SPEED_UPDATE arrives.

For Grout, this would mean sending DPLANE_OP_INTF_UPDATE (for example, to set the link up) plus DPLANE_OP_INTF_SPEED_UPDATE (to set the link speed). It’s not optimal, but acceptable if we want a unified API used by both the kernel and Grout.

@mjstapp
Contributor

mjstapp commented Aug 18, 2025

I'm not trying to get into a "linux vs grout" struggle. I just want the dplane api to be something reasonably neutral, so if it's going to change, I'd rather it change in ways that could be somewhat general.

Assuming that there's never a speed available at creation seems limiting, as you say. But assuming that there's never a need to query is also limiting. I don't think you have to force "grout" to change to accommodate that - I'd just like there to be a path to moving the existing work into the dataplane - and making it available to plugins to decide what to do (if anything).

@donaldsharp
Member

This code change WILL break bonds/lags under linux. It is common for a bond/LAG to be slow coming up while FRR is already started and the interface speed will change.

This is the specific scenario I put this code in for:

a) System start
b) Interface comes up with Speed X ( say we are bonding 10 interfaces and only 3 of the 10 interfaces have been bonded )
c) FRR starts and receives speed X
d) Interface adds a new interface to the bond, speed becomes Y
e) Interface adds a final interface to the bond, speed becomes Z

With the code change you are proposing, zebra will no longer be able to handle this.

What you need to do is either have a grout version of kernel_get_speed ( that returns the interface speed again so it looks like nothing has changed ), or you need to add a dplane query for getting the speed that returns the new speed when asked for it.
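
To illustrate the scenario, here is a minimal self-contained C sketch. It is not FRR code: `toy_bond`, `toy_bond_speed()` and `toy_recheck()` are hypothetical names, and the model assumes that a bond reports the sum of its active members' speeds. It only shows why a value read once at startup goes stale while members are still joining, and why some form of re-query is needed.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy model of a bond; not an FRR or kernel structure. */
struct toy_bond {
	uint32_t member_speed_mbps;  /* speed of each member link */
	unsigned int active_members; /* members bonded so far */
};

/* Assumption: the bond reports the sum of its active members' speeds. */
static uint32_t toy_bond_speed(const struct toy_bond *bond)
{
	assert(bond != NULL);
	return bond->member_speed_mbps * bond->active_members;
}

/*
 * Re-query the speed and report whether the cached value changed.
 * A one-shot read at startup would never run this again and would keep
 * the first value (speed X in the scenario above).
 */
static int toy_recheck(const struct toy_bond *bond, uint32_t *cached_mbps)
{
	uint32_t now = toy_bond_speed(bond);

	if (now == *cached_mbps)
		return 0; /* unchanged, nothing to propagate */
	*cached_mbps = now;
	return 1; /* changed, the new value should be propagated */
}
```

With three 10G members the cached value is 30000; once a fourth member joins, `toy_recheck()` returns 1 and updates the cache, while a second call with no further change returns 0.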

Member

@donaldsharp donaldsharp left a comment

I'm NAK'ing this because in this current format, this will fundamentally break bonds being able to have the correct speed under linux. Which is no bueno :(

@maxime-leroy
Contributor Author

This code change WILL break bonds/lags under linux. It is common for a bond/LAG to be slow coming up while FRR is already started and the interface speed will change.

This is the specific scenario I put this code in for:

a) System start
b) Interface comes up with Speed X ( say we are bonding 10 interfaces and only 3 of the 10 interfaces have been bonded )
c) FRR starts and receives speed X
d) Interface adds a new interface to the bond, speed becomes Y
e) Interface adds a final interface to the bond, speed becomes Z

With the code change you are proposing, zebra will no longer be able to handle this.

What you need to do is either have a grout version of kernel_get_speed ( that returns the interface speed again so it looks like nothing has changed ), or you need to add a dplane query for getting the speed that returns the new speed when asked for it.

First, this PR is a draft; it is not intended to be merged.
Second, this commit only adds APIs to set link speed via DPLANE_OP_INTF_INSTALL/DELETE. These APIs are not used by the zebra/kernel path. The existing timer still collects link speed by calling kernel_get_speed, so behavior is unchanged and bonds/LAGs should continue to work as before.

Last point: I think it would be better to run the ioctl on the zebra dplane thread rather than the main thread—unless I’m missing something?

We could add DPLANE_OP_INTF_SPEED_UPDATE to obtain the speed from the dplane and update it on the main thread (as @mjstapp suggested). That should handle bond interfaces properly.

How the dplane obtains link speed (polling via ioctl or by notification) is an implementation detail that should remain in the dplane backend.
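
As a rough sketch of that separation, the C fragment below hides the speed source behind a function pointer. All names here (`toy_backend`, `toy_cached_get_speed`, `toy_polled_get_speed`, `toy_query_speed`) are hypothetical and do not correspond to FRR's provider API; the polling backend merely stands in for something like an ioctl that can fail while the link is still negotiating.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical backend interface; FRR's real provider API differs. */
struct toy_backend {
	const char *name;
	/* Returns 0 and fills *mbps on success, -1 if speed is unavailable. */
	int (*get_speed)(void *state, uint32_t *mbps);
	void *state;
};

/* Backend 1: speed is pushed by notifications and simply cached. */
static int toy_cached_get_speed(void *state, uint32_t *mbps)
{
	const uint32_t *cached = state;

	*mbps = *cached;
	return 0;
}

/* Backend 2: stands in for a polling backend whose read can fail. */
struct toy_poll_state {
	int link_ready;
	uint32_t mbps;
};

static int toy_polled_get_speed(void *state, uint32_t *mbps)
{
	const struct toy_poll_state *ps = state;

	if (!ps->link_ready)
		return -1; /* negotiation not finished, caller may retry */
	*mbps = ps->mbps;
	return 0;
}

/* The caller is identical for both backends. */
static int toy_query_speed(const struct toy_backend *be, uint32_t *mbps)
{
	assert(be != NULL && be->get_speed != NULL);
	return be->get_speed(be->state, mbps);
}
```

The caller never learns whether the value came from a cache or from a poll, which is the property argued for above.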

@github-actions github-actions bot added size/L rebase PR needs rebase and removed size/M labels Sep 2, 2025
@maxime-leroy
Contributor Author

Still a work in progress: I am just back from holiday and have not had time to test it yet.

Contributor

@mjstapp mjstapp left a comment

hmm, no, I think maybe some of the discussion wasn't clear?

the way the dplane works is by sending events down from zebra towards the dataplane(s), and sending results and other events "up" towards zebra from the dataplane(s).

it'd be fine to include a field for interface speed, and apply that if it's present

the interface update timer could emit a dplane event for processing. the kernel dplane would make the appropriate system call. some other dataplane plugin might just ignore the event. if the current notion of the speed were in the event, and if that was unchanged, the kernel plugin could also avoid making any change.

a dplane event with a speed update would be processed in the zebra main pthread context, which is the only pthread allowed to touch the zebra structs.
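
The flow described in the last few paragraphs can be modelled in a few lines of C. This is a sketch under stated assumptions, not FRR's dplane API: `toy_speed_event`, `toy_plugin_ignore`, `toy_kernel_provider`, `toy_zebra_if` and `toy_main_apply` are hypothetical names, and the "system call" is replaced by a value passed in through `arg`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical event; it stands in for a dplane context and is not FRR's. */
struct toy_speed_event {
	uint32_t current_mbps; /* zebra's current notion of the speed */
	uint32_t result_mbps;  /* filled in by a provider */
	bool changed;          /* set only if a provider found a new value */
};

/* Provider callbacks run "down" in the dataplane and only touch the event. */
typedef void (*toy_provider_fn)(struct toy_speed_event *ev, void *arg);

/* A plugin with nothing to say about speed simply ignores the event. */
static void toy_plugin_ignore(struct toy_speed_event *ev, void *arg)
{
	(void)ev;
	(void)arg;
}

/* Kernel-like provider: arg points at the value a system call would return. */
static void toy_kernel_provider(struct toy_speed_event *ev, void *arg)
{
	uint32_t measured = *(const uint32_t *)arg;

	if (measured == ev->current_mbps)
		return; /* unchanged, avoid reporting any change */
	ev->result_mbps = measured;
	ev->changed = true;
}

/* Hypothetical zebra-side state, touched only by the main thread. */
struct toy_zebra_if {
	uint32_t speed_mbps;
};

/* Result handler: runs "up" in the main context and applies the update. */
static bool toy_main_apply(struct toy_zebra_if *zif,
			   const struct toy_speed_event *ev)
{
	assert(zif != NULL && ev != NULL);
	if (!ev->changed)
		return false;
	zif->speed_mbps = ev->result_mbps;
	return true;
}
```

Only `toy_main_apply()` writes to the interface state, mirroring the rule that the main pthread is the only one allowed to touch the zebra structs.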

I don't think it's worth adding a header file just to rename a couple of status codes - is there a real benefit to changing those labels?

@maxime-leroy
Contributor Author

maxime-leroy commented Sep 5, 2025

hmm, no, I think maybe some of the discussion wasn't clear?

the way the dplane works is by sending events down from zebra towards the dataplane(s), and sending results and other events "up" towards zebra from the dataplane(s).

it'd be fine to include a field for interface speed, and apply that if it's present

the interface update timer could emit a dplane event for processing. the kernel dplane would make the appropriate system call. some other dataplane plugin might just ignore the event. if the current notion of the speed were in the event, and if that was unchanged, the kernel plugin could also avoid making any change.

a dplane event with a speed update would be processed in the zebra main pthread context, which is the only pthread allowed to touch the zebra structs.

I don't think it's worth adding a header file just to rename a couple of status codes - is there a real benefit to changing those labels?

Thanks for the feedback.
I have updated the PR accordingly. It's still WIP; see the notes section in the last commit.

@maxime-leroy maxime-leroy force-pushed the if_speed_dplane branch 2 times, most recently from 812ba4f to 8481e29 Compare September 8, 2025 07:07
@maxime-leroy
Contributor Author

For dplane_intf_speed, should we reuse dplane_intf_update_internal(ifp, DPLANE_OP_INTF_SPEED), or build its own dplane ctx? In the latter case, should it have its own dplane stats (e.g. .dg_intfs_in/errors)? Thanks.

@maxime-leroy
Contributor Author

This new API is used in a PR on the Grout side: DPDK/grout@9e24a62

@maxime-leroy maxime-leroy force-pushed the if_speed_dplane branch 3 times, most recently from df8b4ef to 15e016b Compare September 12, 2025 10:24
@maxime-leroy
Contributor Author

The PR has been updated to resolve conflicts. It's ready for review.

&zebra_if->speed_update);
event_ignore_late_timer(zebra_if->speed_update);

zebra_if_schedule_speed_update(zebra_if, 15);
Contributor

hmm, that 15 sticks out - do you know where that comes from?

Contributor Author

It comes from commit dc7b3ca.

Contributor Author

From my understanding, the 15s timer is needed when FRR is starting. FRR probes all interfaces from the kernel. If an interface is already up but the link negotiation is not finished yet, if_up() will not be called, and the speed will never be updated. Since the speed is only queried when the interface transitions to up in if_up(), we can’t rely on that during startup. That’s why a timer is scheduled to query the speed again 15 seconds later in if_zebra_new_hook. Maybe we could start this timer only at startup, but it doesn’t seem we have a startup field in this hook.

Donald could probably shed some light on this point since he added the initial code — @donaldsharp ?

@github-actions github-actions bot added the rebase PR needs rebase label Dec 12, 2025
@maxime-leroy maxime-leroy force-pushed the if_speed_dplane branch 2 times, most recently from caa8a18 to a727763 Compare December 12, 2025 11:31
@frrbot frrbot bot added the bugfix label Dec 12, 2025
@maxime-leroy maxime-leroy force-pushed the if_speed_dplane branch 4 times, most recently from 2b44b62 to 9a4179e Compare December 12, 2025 14:37
@maxime-leroy maxime-leroy requested a review from mjstapp December 23, 2025 20:13
@github-actions

github-actions bot commented Jan 7, 2026

This pull request has conflicts, please resolve those before we can evaluate the pull request.

@maxime-leroy
Contributor Author

ci:rerun

@mjstapp mjstapp removed the bugfix label Jan 13, 2026
Add dplane ctx helpers to carry interface speed.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>

kernel_get_speed() doesn't need the full interface object; it only needs
the interface name and the VRF id to open the right socket/ioctl. Update
the prototype and callers accordingly.

This will be used in the next commit.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>

This introduces DPLANE_OP_INTF_SPEED_GET so link speed is resolved in the
dataplane via ethtool and reported to zebra. Zebra no longer performs
synchronous speed reads; it simply applies the value provided by the
dataplane.

If speed is already known during interface creation or modification, it
can be included in INTF_INSTALL/INTF_UPDATE and zebra will use it
directly. If speed is not provided, the zebra main thread
issues a follow-up INTF_SPEED_GET to request the dataplane to fetch the
speed asynchronously.

For dataplane providers that implement only INTF_INSTALL/INTF_UPDATE and
do not support INTF_SPEED_GET, zebra relies on any speed value provided
by install/update. If speed is missing, zebra attempts a single
INTF_SPEED_GET query and stops if the operation is unsupported or fails.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>

The 15-second timer used to re-query interface speed is currently
scheduled in if_zebra_new_hook() for every newly created
interface. However, at that point the interface may not yet exist in the
OS, and in some cases it may never be created.

Because of this, the speed query will usually
fail (e.g. INTERFACE_SPEED_ERROR_READ) since the interface doesn't
exist. There is also a race condition: even if the interface is created,
the timer may run before the RTM_NEWLINK message is processed.

As a result, ifp->ifindex can remain IFINDEX_INTERNAL (0). When
if_add_update() calls zebra_ns_link_ifp(), the interface tree is updated
with this incorrect index. If this happens for multiple interfaces, the
tree can end up with duplicate keys, eventually causing a zebra crash.

A check was added to zebra_ns_link_ifp() to avoid adding an interface
with an IFINDEX_INTERNAL index, but the root cause remained.

This change fixes the underlying issue by scheduling the speed-update
timer only when a valid RTM_NEWLINK has been received. The scheduling
logic is moved from if_zebra_new_hook() to
zebra_if_dplane_ifp_handling(), and runs only once the interface has a
correct ifindex.

Fixes: dc7b3ca ("zebra: Add one-shot thread to recheck speed")
Signed-off-by: Maxime Leroy <maxime@leroys.fr>

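
A minimal model of the fix, assuming nothing about the real code beyond what the message states: the creation hook leaves the timer unarmed, and the link-notification path arms it only once a real index is known. `toy_ifp`, `toy_new_hook` and `toy_on_newlink` are hypothetical names, and `TOY_IFINDEX_INTERNAL` only mirrors the value 0 quoted above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_IFINDEX_INTERNAL 0 /* mirrors IFINDEX_INTERNAL (0) from the text */

/* Hypothetical interface record; not FRR's struct interface. */
struct toy_ifp {
	int ifindex;
	bool speed_timer_armed;
};

/* Creation hook: after the fix, it no longer arms the timer. */
static void toy_new_hook(struct toy_ifp *ifp)
{
	assert(ifp != NULL);
	ifp->ifindex = TOY_IFINDEX_INTERNAL;
	ifp->speed_timer_armed = false;
}

/*
 * Link-notification handling: arm the timer only once the OS has
 * reported the interface with a real index.
 */
static bool toy_on_newlink(struct toy_ifp *ifp, int ifindex)
{
	assert(ifp != NULL);
	if (ifindex == TOY_IFINDEX_INTERNAL)
		return false; /* still not a real interface, do nothing */
	ifp->ifindex = ifindex;
	ifp->speed_timer_armed = true;
	return true;
}
```

Two freshly created records share the same placeholder index, which is the duplicate-key hazard described above; neither can have a timer fire until `toy_on_newlink()` has assigned a real index.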
skip_kernel was only applied in the netlink batch send path. As a result,
operations processed by dedicated handlers (notably DPLANE_OP_INTF_SPEED_GET)
were still executed by the kernel provider even when a previous provider
plugin requested to skip kernel updates.

Handle skip_kernel early in kernel_dplane_process_func() so it applies to
all kernel provider operations, and remove the scattered checks from the
netlink helpers.

Signed-off-by: Maxime Leroy <maxime@leroys.fr>

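
The difference between checking the flag in one path and checking it once at the top can be shown with a toy dispatcher. This is an illustration only: `toy_op`, `toy_ctx`, `toy_process_old` and `toy_process_new` are hypothetical, and the sketch does not model whatever status the real code reports for a skipped context.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical operation kinds: one batched, one with a dedicated handler. */
enum toy_op { TOY_OP_ROUTE_BATCHED, TOY_OP_SPEED_GET_DEDICATED };

/* Hypothetical context; skip_kernel mirrors the flag named in the text. */
struct toy_ctx {
	enum toy_op op;
	bool skip_kernel;
	bool kernel_touched; /* set when the kernel provider acted on it */
};

/* Before the fix: only the batch path honoured the flag. */
static void toy_process_old(struct toy_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->op == TOY_OP_ROUTE_BATCHED) {
		if (ctx->skip_kernel)
			return;
		ctx->kernel_touched = true;
		return;
	}
	ctx->kernel_touched = true; /* dedicated handler ran regardless */
}

/* After the fix: one early check covers every operation. */
static void toy_process_new(struct toy_ctx *ctx)
{
	assert(ctx != NULL);
	if (ctx->skip_kernel)
		return;
	ctx->kernel_touched = true;
}
```

With `skip_kernel` set on a dedicated-handler operation, the old dispatcher still touches the kernel while the new one does not.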
@maxime-leroy
Contributor Author

I have updated this PR by removing the interface speed flag, as requested by Mark, to avoid introducing Linux-specific, special-case behavior. As a result, a dplane plugin (such as the Grout one) must now return either SUCCESS or FAILURE for DPLANE_OP_INTF_SPEED_GET, even though the speed is already reported via DPLANE_OP_INTF_INSTALL/UPDATE.

While doing this, I noticed that for DPLANE_OP_INTF_SPEED_GET the skip-kernel flag was ignored by the kernel provider. I therefore added an extra commit to fix this. With this change, skip-kernel applies to all kernel dplane operations (not only the netlink batch path), restoring the behavior from the original skip-kernel introduction (commit 9677961, “zebra: support skip-kernel for dataplane updates”).

This updated API has also been tested with Grout (see DPDK/grout PR #292: DPDK/grout#292).
