From ef58b9393336bf342fdcc756dc140f19098f7397 Mon Sep 17 00:00:00 2001 From: Amit Cohen Date: Thu, 24 Aug 2023 14:49:44 +0300 Subject: [PATCH 01/21] selftests: mlxsw: qos_pfc: Remove wrong description In the diagram of the topology, $swp3 and $swp4 are described as 1Gbps ports. This is wrong information, the test does not configure such speed. Fixes: bfa804784e32 ("selftests: mlxsw: Add a PFC test") Signed-off-by: Amit Cohen Reviewed-by: Ido Schimmel --- tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh | 1 - 1 file changed, 1 deletion(-) diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh index 42ce602d8d492..49bef76083b83 100755 --- a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh +++ b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh @@ -40,7 +40,6 @@ # | + $swp1 $swp3 + + $swp4 | # | | iPOOL1 iPOOL0 | | iPOOL2 | # | | ePOOL4 ePOOL5 | | ePOOL4 | -# | | 1Gbps | | 1Gbps | # | | PFC:enabled=1 | | PFC:enabled=1 | # | +-|----------------------|-+ +-|------------------------+ | # | | + $swp1.111 $swp3.111 + | | + $swp4.111 | | From a52d6d9152d7f7dd0d74835b7eb9eb13e72fcc83 Mon Sep 17 00:00:00 2001 From: Amit Cohen Date: Thu, 24 Aug 2023 14:49:45 +0300 Subject: [PATCH 02/21] selftests: mlxsw: qos_pfc: Adjust the test to support 8 lanes 'qos_pfc' test checks PFC behavior. The idea is to limit the traffic using a shaper somewhere in the flow of the packets. In this area, the buffer is smaller than the buffer at the beginning of the flow, so it fills up until there is no more space left. The test configures there PFC which is supposed to notice that the headroom is filling up and send PFC Xoff to indicate the transmitter to stop sending traffic for the priorities sharing this PG. The Xon/Xoff threshold is auto-configured and always equal to 2*(MTU rounded up to cell size). Even after sending the PFC Xoff packet, traffic will keep arriving until the transmitter receives and processes the PFC packet. This amount of traffic is known as the PFC delay allowance. Currently the buffer for the delay traffic is configured as 100KB. The MTU in the test is 10KB, therefore the threshold for Xoff is about 20KB. This allows 80KB extra to be stored in this buffer. 8-lane ports use two buffers among which the configured buffer is split, the Xoff threshold then applies to each buffer in parallel. The test does not take into account the behavior of 8-lane ports, when the ports are configured to 400Gbps with 8 lanes or 800Gbps with 8 lanes, packets are dropped and the test fails. Check if the relevant ports use 8 lanes, in such case double the size of the buffer, as the headroom is split half-half. Fixes: bfa804784e32 ("selftests: mlxsw: Add a PFC test") Signed-off-by: Amit Cohen Reviewed-by: Ido Schimmel --- .../selftests/drivers/net/mlxsw/qos_pfc.sh | 18 +++++++++++++++++- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh index 49bef76083b83..0f0f4f05807c9 100755 --- a/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh +++ b/tools/testing/selftests/drivers/net/mlxsw/qos_pfc.sh @@ -119,6 +119,9 @@ h2_destroy() switch_create() { + local lanes_swp4 + local pg1_size + # pools # ----- @@ -228,7 +231,20 @@ switch_create() dcb pfc set dev $swp4 prio-pfc all:off 1:on # PG0 will get autoconfigured to Xoff, give PG1 arbitrarily 100K, which # is (-2*MTU) about 80K of delay provision. - dcb buffer set dev $swp4 buffer-size all:0 1:$_100KB + pg1_size=$_100KB + + setup_wait_dev_with_timeout $swp4 + + lanes_swp4=$(ethtool $swp4 | grep 'Lanes:') + lanes_swp4=${lanes_swp4#*"Lanes: "} + + # 8-lane ports use two buffers among which the configured buffer + # is split, so double the size to get twice (20K + 80K). + if [[ $lanes_swp4 -eq 8 ]]; then + pg1_size=$((pg1_size * 2)) + fi + + dcb buffer set dev $swp4 buffer-size all:0 1:$pg1_size # bridges # ------- From 560b915852e4e8162467b36a197818d66795205f Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:00 +0200 Subject: [PATCH 03/21] devlink: Move private netlink flags to C file The flags are not used outside of the C file so move them there. Suggested-by: Jiri Pirko Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- net/devlink/devl_internal.h | 3 --- net/devlink/netlink.c | 3 +++ 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 183dbe3807ab3..2a9b263300a4b 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -111,9 +111,6 @@ int devlink_rel_devlink_handle_put(struct sk_buff *msg, struct devlink *devlink, bool *msg_updated); /* Netlink */ -#define DEVLINK_NL_FLAG_NEED_PORT BIT(0) -#define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT BIT(1) - enum devlink_multicast_groups { DEVLINK_MCGRP_CONFIG, }; diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index d0b90ebc8b152..7350138c8bb44 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -9,6 +9,9 @@ #include "devl_internal.h" +#define DEVLINK_NL_FLAG_NEED_PORT BIT(0) +#define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT BIT(1) + static const struct genl_multicast_group devlink_nl_mcgrps[] = { [DEVLINK_MCGRP_CONFIG] = { .name = DEVLINK_GENL_MCGRP_CONFIG_NAME }, }; From 4183315bfa09c63b3f68de7a901e18e28ce7de9e Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:01 +0200 Subject: [PATCH 04/21] devlink: Acquire device lock during netns dismantle Device drivers register with devlink from their probe routines (under the device lock) by acquiring the devlink instance lock and calling devl_register(). Drivers that support a devlink reload usually implement the reload_{down, up}() operations in a similar fashion to their remove and probe routines, respectively. However, while the remove and probe routines are invoked with the device lock held, the reload operations are only invoked with the devlink instance lock held. It is therefore impossible for drivers to acquire the device lock from their reload operations, as this would result in lock inversion. The motivating use case for invoking the reload operations with the device lock held is in mlxsw which needs to trigger a PCI reset as part of the reload. The driver cannot call pci_reset_function() as this function acquires the device lock. Instead, it needs to call __pci_reset_function_locked which expects the device lock to be held. To that end, adjust devlink to always acquire the device lock before the devlink instance lock when performing a reload. For now, only do that when reload is triggered as part of netns dismantle. Subsequent patches will handle the case where reload is explicitly triggered by user space. Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- net/devlink/core.c | 4 ++-- net/devlink/devl_internal.h | 15 +++++++++++++++ 2 files changed, 17 insertions(+), 2 deletions(-) diff --git a/net/devlink/core.c b/net/devlink/core.c index 6984877e9f10d..4275a2bc6d8e0 100644 --- a/net/devlink/core.c +++ b/net/devlink/core.c @@ -503,14 +503,14 @@ static void __net_exit devlink_pernet_pre_exit(struct net *net) * all devlink instances from this namespace into init_net. */ devlinks_xa_for_each_registered_get(net, index, devlink) { - devl_lock(devlink); + devl_dev_lock(devlink, true); err = 0; if (devl_is_registered(devlink)) err = devlink_reload(devlink, &init_net, DEVLINK_RELOAD_ACTION_DRIVER_REINIT, DEVLINK_RELOAD_LIMIT_UNSPEC, &actions_performed, NULL); - devl_unlock(devlink); + devl_dev_unlock(devlink, true); devlink_put(devlink); if (err && err != -EOPNOTSUPP) pr_warn("Failed to reload devlink instance into init_net\n"); diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 2a9b263300a4b..178abaf74a107 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -3,6 +3,7 @@ * Copyright (c) 2016 Jiri Pirko */ +#include #include #include #include @@ -96,6 +97,20 @@ static inline bool devl_is_registered(struct devlink *devlink) return xa_get_mark(&devlinks, devlink->index, DEVLINK_REGISTERED); } +static inline void devl_dev_lock(struct devlink *devlink, bool dev_lock) +{ + if (dev_lock) + device_lock(devlink->dev); + devl_lock(devlink); +} + +static inline void devl_dev_unlock(struct devlink *devlink, bool dev_lock) +{ + devl_unlock(devlink); + if (dev_lock) + device_unlock(devlink->dev); +} + typedef void devlink_rel_notify_cb_t(struct devlink *devlink, u32 obj_index); typedef void devlink_rel_cleanup_cb_t(struct devlink *devlink, u32 obj_index, u32 rel_index); From 1d72ddafdbdec1eda83db7f646cf20d5b6244ded Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:02 +0200 Subject: [PATCH 05/21] devlink: Enable the use of private flags in post_doit operations Currently, private flags (e.g., 'DEVLINK_NL_FLAG_NEED_PORT') are only used in pre_doit operations, but a subsequent patch will need to conditionally lock and unlock the device lock in pre and post doit operations, respectively. As a preparation, enable the use of private flags in post_doit operations in a similar fashion to how it is done for pre_doit operations. No functional changes intended. Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- net/devlink/netlink.c | 10 ++++++++-- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index 7350138c8bb44..5bb6624f3288e 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -141,8 +141,8 @@ int devlink_nl_pre_doit_port_optional(const struct genl_split_ops *ops, return __devlink_nl_pre_doit(skb, info, DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT); } -void devlink_nl_post_doit(const struct genl_split_ops *ops, - struct sk_buff *skb, struct genl_info *info) +static void __devlink_nl_post_doit(struct sk_buff *skb, struct genl_info *info, + u8 flags) { struct devlink *devlink; @@ -151,6 +151,12 @@ void devlink_nl_post_doit(const struct genl_split_ops *ops, devlink_put(devlink); } +void devlink_nl_post_doit(const struct genl_split_ops *ops, + struct sk_buff *skb, struct genl_info *info) +{ + __devlink_nl_post_doit(skb, info, 0); +} + static int devlink_nl_inst_single_dumpit(struct sk_buff *msg, struct netlink_callback *cb, int flags, devlink_nl_dump_one_func_t *dump_one, From b2691bf73d86bcb001ca46b7c79dec4f1d01ed7d Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:03 +0200 Subject: [PATCH 06/21] devlink: Allow taking device lock in pre_doit operations Introduce a new private flag ('DEVLINK_NL_FLAG_NEED_DEV_LOCK') to allow netlink commands to specify that they need to acquire the device lock in their pre_doit operation and release it in their post_doit operation. The reload command will use this flag in the subsequent patch. No functional changes intended. Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- net/devlink/devl_internal.h | 3 ++- net/devlink/health.c | 3 ++- net/devlink/netlink.c | 19 ++++++++++++------- net/devlink/region.c | 3 ++- 4 files changed, 18 insertions(+), 10 deletions(-) diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h index 178abaf74a107..5ea2e2012e930 100644 --- a/net/devlink/devl_internal.h +++ b/net/devlink/devl_internal.h @@ -152,7 +152,8 @@ typedef int devlink_nl_dump_one_func_t(struct sk_buff *msg, int flags); struct devlink * -devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs); +devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs, + bool dev_lock); int devlink_nl_dumpit(struct sk_buff *msg, struct netlink_callback *cb, devlink_nl_dump_one_func_t *dump_one); diff --git a/net/devlink/health.c b/net/devlink/health.c index 695df61f8ac2a..71ae121dc739d 100644 --- a/net/devlink/health.c +++ b/net/devlink/health.c @@ -1151,7 +1151,8 @@ devlink_health_reporter_get_from_cb_lock(struct netlink_callback *cb) struct nlattr **attrs = info->attrs; struct devlink *devlink; - devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs); + devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs, + false); if (IS_ERR(devlink)) return NULL; diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index 5bb6624f3288e..86f12531bf998 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -11,6 +11,7 @@ #define DEVLINK_NL_FLAG_NEED_PORT BIT(0) #define DEVLINK_NL_FLAG_NEED_DEVLINK_OR_PORT BIT(1) +#define DEVLINK_NL_FLAG_NEED_DEV_LOCK BIT(2) static const struct genl_multicast_group devlink_nl_mcgrps[] = { [DEVLINK_MCGRP_CONFIG] = { .name = DEVLINK_GENL_MCGRP_CONFIG_NAME }, @@ -64,7 +65,8 @@ int devlink_nl_msg_reply_and_new(struct sk_buff **msg, struct genl_info *info) } struct devlink * -devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs) +devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs, + bool dev_lock) { struct devlink *devlink; unsigned long index; @@ -78,12 +80,12 @@ devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs) devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]); devlinks_xa_for_each_registered_get(net, index, devlink) { - devl_lock(devlink); + devl_dev_lock(devlink, dev_lock); if (devl_is_registered(devlink) && strcmp(devlink->dev->bus->name, busname) == 0 && strcmp(dev_name(devlink->dev), devname) == 0) return devlink; - devl_unlock(devlink); + devl_dev_unlock(devlink, dev_lock); devlink_put(devlink); } @@ -93,11 +95,13 @@ devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs) static int __devlink_nl_pre_doit(struct sk_buff *skb, struct genl_info *info, u8 flags) { + bool dev_lock = flags & DEVLINK_NL_FLAG_NEED_DEV_LOCK; struct devlink_port *devlink_port; struct devlink *devlink; int err; - devlink = devlink_get_from_attrs_lock(genl_info_net(info), info->attrs); + devlink = devlink_get_from_attrs_lock(genl_info_net(info), info->attrs, + dev_lock); if (IS_ERR(devlink)) return PTR_ERR(devlink); @@ -117,7 +121,7 @@ static int __devlink_nl_pre_doit(struct sk_buff *skb, struct genl_info *info, return 0; unlock: - devl_unlock(devlink); + devl_dev_unlock(devlink, dev_lock); devlink_put(devlink); return err; } @@ -144,10 +148,11 @@ int devlink_nl_pre_doit_port_optional(const struct genl_split_ops *ops, static void __devlink_nl_post_doit(struct sk_buff *skb, struct genl_info *info, u8 flags) { + bool dev_lock = flags & DEVLINK_NL_FLAG_NEED_DEV_LOCK; struct devlink *devlink; devlink = info->user_ptr[0]; - devl_unlock(devlink); + devl_dev_unlock(devlink, dev_lock); devlink_put(devlink); } @@ -165,7 +170,7 @@ static int devlink_nl_inst_single_dumpit(struct sk_buff *msg, struct devlink *devlink; int err; - devlink = devlink_get_from_attrs_lock(sock_net(msg->sk), attrs); + devlink = devlink_get_from_attrs_lock(sock_net(msg->sk), attrs, false); if (IS_ERR(devlink)) return PTR_ERR(devlink); err = dump_one(msg, devlink, cb, flags | NLM_F_DUMP_FILTERED); diff --git a/net/devlink/region.c b/net/devlink/region.c index 0aab7b82d6780..e3bab458db940 100644 --- a/net/devlink/region.c +++ b/net/devlink/region.c @@ -883,7 +883,8 @@ int devlink_nl_region_read_dumpit(struct sk_buff *skb, start_offset = state->start_offset; - devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs); + devlink = devlink_get_from_attrs_lock(sock_net(cb->skb->sk), attrs, + false); if (IS_ERR(devlink)) return PTR_ERR(devlink); From c649be67168e2dbb5f14a25929034d22cc2b76ae Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:04 +0200 Subject: [PATCH 07/21] devlink: Acquire device lock during reload command Device drivers register with devlink from their probe routines (under the device lock) by acquiring the devlink instance lock and calling devl_register(). Drivers that support a devlink reload usually implement the reload_{down, up}() operations in a similar fashion to their remove and probe routines, respectively. However, while the remove and probe routines are invoked with the device lock held, the reload operations are only invoked with the devlink instance lock held. It is therefore impossible for drivers to acquire the device lock from their reload operations, as this would result in lock inversion. The motivating use case for invoking the reload operations with the device lock held is in mlxsw which needs to trigger a PCI reset as part of the reload. The driver cannot call pci_reset_function() as this function acquires the device lock. Instead, it needs to call __pci_reset_function_locked which expects the device lock to be held. To that end, adjust devlink to always acquire the device lock before the devlink instance lock when performing a reload. Do that when reload is explicitly triggered by user space by specifying the 'DEVLINK_NL_FLAG_NEED_DEV_LOCK' flag in the pre_doit and post_doit operations of the reload command. A previous patch already handled the case where reload is invoked as part of netns dismantle. Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- Documentation/netlink/specs/devlink.yaml | 4 ++-- net/devlink/netlink.c | 13 +++++++++++++ net/devlink/netlink_gen.c | 4 ++-- net/devlink/netlink_gen.h | 5 +++++ 4 files changed, 22 insertions(+), 4 deletions(-) diff --git a/Documentation/netlink/specs/devlink.yaml b/Documentation/netlink/specs/devlink.yaml index c6ba4889575a0..63f0d1c366c28 100644 --- a/Documentation/netlink/specs/devlink.yaml +++ b/Documentation/netlink/specs/devlink.yaml @@ -1480,8 +1480,8 @@ operations: dont-validate: [ strict ] flags: [ admin-perm ] do: - pre: devlink-nl-pre-doit - post: devlink-nl-post-doit + pre: devlink-nl-pre-doit-dev-lock + post: devlink-nl-post-doit-dev-lock request: attributes: - bus-name diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c index 86f12531bf998..fa9afe3e6d9b2 100644 --- a/net/devlink/netlink.c +++ b/net/devlink/netlink.c @@ -138,6 +138,12 @@ int devlink_nl_pre_doit_port(const struct genl_split_ops *ops, return __devlink_nl_pre_doit(skb, info, DEVLINK_NL_FLAG_NEED_PORT); } +int devlink_nl_pre_doit_dev_lock(const struct genl_split_ops *ops, + struct sk_buff *skb, struct genl_info *info) +{ + return __devlink_nl_pre_doit(skb, info, DEVLINK_NL_FLAG_NEED_DEV_LOCK); +} + int devlink_nl_pre_doit_port_optional(const struct genl_split_ops *ops, struct sk_buff *skb, struct genl_info *info) @@ -162,6 +168,13 @@ void devlink_nl_post_doit(const struct genl_split_ops *ops, __devlink_nl_post_doit(skb, info, 0); } +void +devlink_nl_post_doit_dev_lock(const struct genl_split_ops *ops, + struct sk_buff *skb, struct genl_info *info) +{ + __devlink_nl_post_doit(skb, info, DEVLINK_NL_FLAG_NEED_DEV_LOCK); +} + static int devlink_nl_inst_single_dumpit(struct sk_buff *msg, struct netlink_callback *cb, int flags, devlink_nl_dump_one_func_t *dump_one, diff --git a/net/devlink/netlink_gen.c b/net/devlink/netlink_gen.c index 9cbae01692498..41aad298ed4aa 100644 --- a/net/devlink/netlink_gen.c +++ b/net/devlink/netlink_gen.c @@ -846,9 +846,9 @@ const struct genl_split_ops devlink_nl_ops[73] = { { .cmd = DEVLINK_CMD_RELOAD, .validate = GENL_DONT_VALIDATE_STRICT, - .pre_doit = devlink_nl_pre_doit, + .pre_doit = devlink_nl_pre_doit_dev_lock, .doit = devlink_nl_reload_doit, - .post_doit = devlink_nl_post_doit, + .post_doit = devlink_nl_post_doit_dev_lock, .policy = devlink_reload_nl_policy, .maxattr = DEVLINK_ATTR_RELOAD_LIMITS, .flags = GENL_ADMIN_PERM | GENL_CMD_CAP_DO, diff --git a/net/devlink/netlink_gen.h b/net/devlink/netlink_gen.h index 0e9e89c31c314..02f3c0bfae0ee 100644 --- a/net/devlink/netlink_gen.h +++ b/net/devlink/netlink_gen.h @@ -22,12 +22,17 @@ int devlink_nl_pre_doit(const struct genl_split_ops *ops, struct sk_buff *skb, struct genl_info *info); int devlink_nl_pre_doit_port(const struct genl_split_ops *ops, struct sk_buff *skb, struct genl_info *info); +int devlink_nl_pre_doit_dev_lock(const struct genl_split_ops *ops, + struct sk_buff *skb, struct genl_info *info); int devlink_nl_pre_doit_port_optional(const struct genl_split_ops *ops, struct sk_buff *skb, struct genl_info *info); void devlink_nl_post_doit(const struct genl_split_ops *ops, struct sk_buff *skb, struct genl_info *info); +void +devlink_nl_post_doit_dev_lock(const struct genl_split_ops *ops, + struct sk_buff *skb, struct genl_info *info); int devlink_nl_get_doit(struct sk_buff *skb, struct genl_info *info); int devlink_nl_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb); From a486a988a08f53b7fc0aab15bf07f56225d68ff1 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:05 +0200 Subject: [PATCH 08/21] devlink: Add device lock assert in reload operation Add an assert to verify that the device lock is always held throughout reload operations. Tested the following flows with netdevsim and mlxsw while lockdep is enabled: netdevsim: # echo "10 1" > /sys/bus/netdevsim/new_device # devlink dev reload netdevsim/netdevsim10 # ip netns add bla # devlink dev reload netdevsim/netdevsim10 netns bla # ip netns del bla # echo 10 > /sys/bus/netdevsim/del_device mlxsw: # devlink dev reload pci/0000:01:00.0 # ip netns add bla # devlink dev reload pci/0000:01:00.0 netns bla # ip netns del bla # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove # echo 1 > /sys/bus/pci/rescan Signed-off-by: Ido Schimmel Reviewed-by: Jiri Pirko --- net/devlink/dev.c | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/net/devlink/dev.c b/net/devlink/dev.c index 4fc7adb326631..ea6a92f2e6a2e 100644 --- a/net/devlink/dev.c +++ b/net/devlink/dev.c @@ -4,6 +4,7 @@ * Copyright (c) 2016 Jiri Pirko */ +#include #include #include #include "devl_internal.h" @@ -433,6 +434,13 @@ int devlink_reload(struct devlink *devlink, struct net *dest_net, struct net *curr_net; int err; + /* Make sure the reload operations are invoked with the device lock + * held to allow drivers to trigger functionality that expects it + * (e.g., PCI reset) and to close possible races between these + * operations and probe/remove. + */ + device_lock_assert(devlink->dev); + memcpy(remote_reload_stats, devlink->stats.remote_reload_stats, sizeof(remote_reload_stats)); From 0298e47f64c0679146300385607d6e8f032e4bde Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:06 +0200 Subject: [PATCH 09/21] PCI: Add no PM reset quirk for NVIDIA Spectrum devices Spectrum-{1,2,3,4} devices report that a D3hot->D0 transition causes a reset (i.e., they advertise NoSoftRst-). However, this transition does not have any effect on the device: It continues to be operational and network ports remain up. Advertising this support makes it seem as if a PM reset is viable for these devices. Mark it as unavailable to skip it when testing reset methods. Before: # cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method pm bus After: # cat /sys/bus/pci/devices/0000\:03\:00.0/reset_method bus Signed-off-by: Ido Schimmel Acked-by: Bjorn Helgaas --- drivers/pci/quirks.c | 13 +++++++++++++ 1 file changed, 13 insertions(+) diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c index eeec1d6f90238..8b2f6d4e4763a 100644 --- a/drivers/pci/quirks.c +++ b/drivers/pci/quirks.c @@ -3784,6 +3784,19 @@ static void quirk_no_pm_reset(struct pci_dev *dev) DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_VENDOR_ID_ATI, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA, 8, quirk_no_pm_reset); +/* + * Spectrum-{1,2,3,4} devices report that a D3hot->D0 transition causes a reset + * (i.e., they advertise NoSoftRst-). However, this transition does not have + * any effect on the device: It continues to be operational and network ports + * remain up. Advertising this support makes it seem as if a PM reset is viable + * for these devices. Mark it as unavailable to skip it when testing reset + * methods. + */ +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcb84, quirk_no_pm_reset); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcf6c, quirk_no_pm_reset); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcf70, quirk_no_pm_reset); +DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_MELLANOX, 0xcf80, quirk_no_pm_reset); + /* * Thunderbolt controllers with broken MSI hotplug signaling: * Entire 1st generation (Light Ridge, Eagle Ridge, Light Peak) and part From 77462980661a6f1d748fa787250bad9e0f93b74b Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:07 +0200 Subject: [PATCH 10/21] PCI: Add debug print for device ready delay Currently, the time it took a PCI device to become ready after reset is only printed if it was longer than 1000ms ('PCI_RESET_WAIT'). However, for debugging purposes it is useful to know this time even if it was shorter. For example, with the device I am working on, hardware engineers asked to verify that it becomes ready on the first try (no delay). To that end, add a debug level print that can be enabled using dynamic debug. Example: # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset # dmesg -c | grep ready # echo "file drivers/pci/pci.c +p" > /sys/kernel/debug/dynamic_debug/control # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset # dmesg -c | grep ready [ 396.060335] mlxsw_spectrum4 0000:01:00.0: ready 0ms after bus reset # echo "file drivers/pci/pci.c -p" > /sys/kernel/debug/dynamic_debug/control # echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset # dmesg -c | grep ready Signed-off-by: Ido Schimmel Acked-by: Bjorn Helgaas --- drivers/pci/pci.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c index 59c01d68c6d5e..0a708e65c5c4e 100644 --- a/drivers/pci/pci.c +++ b/drivers/pci/pci.c @@ -1216,6 +1216,9 @@ static int pci_dev_wait(struct pci_dev *dev, char *reset_type, int timeout) if (delay > PCI_RESET_WAIT) pci_info(dev, "ready %dms after %s\n", delay - 1, reset_type); + else + pci_dbg(dev, "ready %dms after %s\n", delay - 1, + reset_type); return 0; } From 54a6f441ab96ae52c2997b2dc2070f373ca452d2 Mon Sep 17 00:00:00 2001 From: Amit Cohen Date: Tue, 31 Oct 2023 14:03:08 +0200 Subject: [PATCH 11/21] mlxsw: Extend MRSR pack() function to support new commands Currently mlxsw_reg_mrsr_pack() always sets 'command=1'. As preparation for support of new reset flow, pass the command as an argument to the function and add an enum for this field. For now, always pass 'command=1' to the pack() function. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/pci.c | 2 +- drivers/net/ethernet/mellanox/mlxsw/reg.h | 14 ++++++++++++-- 2 files changed, 13 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c index e4b25e187467a..7af37f78ed1a4 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c @@ -1491,7 +1491,7 @@ static int mlxsw_pci_sw_reset(struct mlxsw_pci *mlxsw_pci, return err; } - mlxsw_reg_mrsr_pack(mrsr_pl); + mlxsw_reg_mrsr_pack(mrsr_pl, MLXSW_REG_MRSR_COMMAND_SOFTWARE_RESET); err = mlxsw_reg_write(mlxsw_pci->core, MLXSW_REG(mrsr), mrsr_pl); if (err) return err; diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index 25b294fdeb3df..13c0ff9945378 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -10122,6 +10122,15 @@ mlxsw_reg_mgir_unpack(char *payload, u32 *hw_rev, char *fw_info_psid, MLXSW_REG_DEFINE(mrsr, MLXSW_REG_MRSR_ID, MLXSW_REG_MRSR_LEN); +enum mlxsw_reg_mrsr_command { + /* Switch soft reset, does not reset PCI firmware. */ + MLXSW_REG_MRSR_COMMAND_SOFTWARE_RESET = 1, + /* Reset will be done when PCI link will be disabled. + * This command will reset PCI firmware also. + */ + MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE = 6, +}; + /* reg_mrsr_command * Reset/shutdown command * 0 - do nothing @@ -10130,10 +10139,11 @@ MLXSW_REG_DEFINE(mrsr, MLXSW_REG_MRSR_ID, MLXSW_REG_MRSR_LEN); */ MLXSW_ITEM32(reg, mrsr, command, 0x00, 0, 4); -static inline void mlxsw_reg_mrsr_pack(char *payload) +static inline void mlxsw_reg_mrsr_pack(char *payload, + enum mlxsw_reg_mrsr_command command) { MLXSW_REG_ZERO(mrsr, payload); - mlxsw_reg_mrsr_command_set(payload, 1); + mlxsw_reg_mrsr_command_set(payload, command); } /* MLCR - Management LED Control Register From 1591a891576607021ba0e83d6af5b1321e79bf33 Mon Sep 17 00:00:00 2001 From: Amit Cohen Date: Tue, 31 Oct 2023 14:03:09 +0200 Subject: [PATCH 12/21] mlxsw: pci: Rename mlxsw_pci_sw_reset() In the next patches, mlxsw_pci_sw_reset() will be extended to support more reset types and will not necessarily issue a software reset. Rename the function to reflect that. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/pci.c | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c index 7af37f78ed1a4..fe0b8a38d44e3 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c @@ -1476,8 +1476,8 @@ static int mlxsw_pci_sys_ready_wait(struct mlxsw_pci *mlxsw_pci, return -EBUSY; } -static int mlxsw_pci_sw_reset(struct mlxsw_pci *mlxsw_pci, - const struct pci_device_id *id) +static int +mlxsw_pci_reset(struct mlxsw_pci *mlxsw_pci, const struct pci_device_id *id) { struct pci_dev *pdev = mlxsw_pci->pdev; char mrsr_pl[MLXSW_REG_MRSR_LEN]; @@ -1537,9 +1537,9 @@ static int mlxsw_pci_init(void *bus_priv, struct mlxsw_core *mlxsw_core, if (!mbox) return -ENOMEM; - err = mlxsw_pci_sw_reset(mlxsw_pci, mlxsw_pci->id); + err = mlxsw_pci_reset(mlxsw_pci, mlxsw_pci->id); if (err) - goto err_sw_reset; + goto err_reset; err = mlxsw_pci_alloc_irq_vectors(mlxsw_pci); if (err < 0) { @@ -1672,7 +1672,7 @@ static int mlxsw_pci_init(void *bus_priv, struct mlxsw_core *mlxsw_core, err_query_fw: mlxsw_pci_free_irq_vectors(mlxsw_pci); err_alloc_irq: -err_sw_reset: +err_reset: mbox_put: mlxsw_cmd_mbox_free(mbox); return err; From 1871ac3691753ac3250957d520bf8d01edae2e36 Mon Sep 17 00:00:00 2001 From: Amit Cohen Date: Tue, 31 Oct 2023 14:03:10 +0200 Subject: [PATCH 13/21] mlxsw: pci: Move software reset code to a separate function In general, the existing flow of software reset in the driver is: 1. Wait for system ready status. 2. Send MRSR command, to start the reset. 3. Wait for system ready status. This flow will be extended once a new reset command is supported. As a preparation, move step #2 to a separate function. Signed-off-by: Amit Cohen Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/pci.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c index fe0b8a38d44e3..080881c94c5a4 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c @@ -1476,11 +1476,18 @@ static int mlxsw_pci_sys_ready_wait(struct mlxsw_pci *mlxsw_pci, return -EBUSY; } +static int mlxsw_pci_reset_sw(struct mlxsw_pci *mlxsw_pci) +{ + char mrsr_pl[MLXSW_REG_MRSR_LEN]; + + mlxsw_reg_mrsr_pack(mrsr_pl, MLXSW_REG_MRSR_COMMAND_SOFTWARE_RESET); + return mlxsw_reg_write(mlxsw_pci->core, MLXSW_REG(mrsr), mrsr_pl); +} + static int mlxsw_pci_reset(struct mlxsw_pci *mlxsw_pci, const struct pci_device_id *id) { struct pci_dev *pdev = mlxsw_pci->pdev; - char mrsr_pl[MLXSW_REG_MRSR_LEN]; u32 sys_status; int err; @@ -1491,8 +1498,7 @@ mlxsw_pci_reset(struct mlxsw_pci *mlxsw_pci, const struct pci_device_id *id) return err; } - mlxsw_reg_mrsr_pack(mrsr_pl, MLXSW_REG_MRSR_COMMAND_SOFTWARE_RESET); - err = mlxsw_reg_write(mlxsw_pci->core, MLXSW_REG(mrsr), mrsr_pl); + err = mlxsw_pci_reset_sw(mlxsw_pci); if (err) return err; From 34deaacf541455e7cb34eef398447c6b7496587c Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:11 +0200 Subject: [PATCH 14/21] mlxsw: pci: Add support for new reset flow The driver resets the device during probe and during a devlink reload. The current reset method reloads the current firmware version or a pending one, if one was previously flashed using devlink. However, the current reset method does not result in a PCI hot reset, preventing the PCI firmware from being upgraded, unless the system is rebooted. To solve this problem, a new reset command (6) was implemented in the firmware. Unlike the current command (1), after issuing the new command the device will not start the reset immediately, but only after a PCI hot reset. Implement the new reset method by first verifying that it is supported by the current firmware version by querying the Management Capabilities Mask (MCAM) register. If supported, issue the new reset command (6) via MRSR register followed by a PCI reset by calling __pci_reset_function_locked(). Once the PCI firmware is operational, go back to the regular reset flow and wait for the entire device to become ready. That is, repeatedly read the "system_status" register from the BAR until a value of "FW_READY" (0x5E) appears. Tested: # for i in $(seq 1 10); do devlink dev reload pci/0000:01:00.0; done Signed-off-by: Ido Schimmel Reviewed-by: Petr Machata --- drivers/net/ethernet/mellanox/mlxsw/pci.c | 44 ++++++++++++++++++++++- drivers/net/ethernet/mellanox/mlxsw/reg.h | 2 ++ 2 files changed, 45 insertions(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c index 080881c94c5a4..726ed707e27b9 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c @@ -1476,6 +1476,33 @@ static int mlxsw_pci_sys_ready_wait(struct mlxsw_pci *mlxsw_pci, return -EBUSY; } +static int mlxsw_pci_reset_at_pci_disable(struct mlxsw_pci *mlxsw_pci) +{ + struct pci_dev *pdev = mlxsw_pci->pdev; + char mrsr_pl[MLXSW_REG_MRSR_LEN]; + int err; + + mlxsw_reg_mrsr_pack(mrsr_pl, + MLXSW_REG_MRSR_COMMAND_RESET_AT_PCI_DISABLE); + err = mlxsw_reg_write(mlxsw_pci->core, MLXSW_REG(mrsr), mrsr_pl); + if (err) + return err; + + device_lock_assert(&pdev->dev); + + pci_cfg_access_lock(pdev); + pci_save_state(pdev); + + err = __pci_reset_function_locked(pdev); + if (err) + pci_err(pdev, "PCI function reset failed with %d\n", err); + + pci_restore_state(pdev); + pci_cfg_access_unlock(pdev); + + return err; +} + static int mlxsw_pci_reset_sw(struct mlxsw_pci *mlxsw_pci) { char mrsr_pl[MLXSW_REG_MRSR_LEN]; @@ -1488,6 +1515,8 @@ static int mlxsw_pci_reset(struct mlxsw_pci *mlxsw_pci, const struct pci_device_id *id) { struct pci_dev *pdev = mlxsw_pci->pdev; + char mcam_pl[MLXSW_REG_MCAM_LEN]; + bool pci_reset_supported; u32 sys_status; int err; @@ -1498,10 +1527,23 @@ mlxsw_pci_reset(struct mlxsw_pci *mlxsw_pci, const struct pci_device_id *id) return err; } - err = mlxsw_pci_reset_sw(mlxsw_pci); + mlxsw_reg_mcam_pack(mcam_pl, + MLXSW_REG_MCAM_FEATURE_GROUP_ENHANCED_FEATURES); + err = mlxsw_reg_query(mlxsw_pci->core, MLXSW_REG(mcam), mcam_pl); if (err) return err; + mlxsw_reg_mcam_unpack(mcam_pl, MLXSW_REG_MCAM_PCI_RESET, + &pci_reset_supported); + + if (pci_reset_supported) { + pci_dbg(pdev, "Starting PCI reset flow\n"); + err = mlxsw_pci_reset_at_pci_disable(mlxsw_pci); + } else { + pci_dbg(pdev, "Starting software reset flow\n"); + err = mlxsw_pci_reset_sw(mlxsw_pci); + } + err = mlxsw_pci_sys_ready_wait(mlxsw_pci, id, &sys_status); if (err) { dev_err(&pdev->dev, "Failed to reach system ready status after reset. Status is 0x%x\n", diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index 13c0ff9945378..e26e9d06bd721 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -10594,6 +10594,8 @@ MLXSW_ITEM32(reg, mcam, feature_group, 0x00, 16, 8); enum mlxsw_reg_mcam_mng_feature_cap_mask_bits { /* If set, MCIA supports 128 bytes payloads. Otherwise, 48 bytes. */ MLXSW_REG_MCAM_MCIA_128B = 34, + /* If set, MRSR.command=6 is supported. */ + MLXSW_REG_MCAM_PCI_RESET = 48, }; #define MLXSW_REG_BYTES_PER_DWORD 0x4 From d6ebfd2a29f5b0eec71a397a64d188e800ae3c9c Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:12 +0200 Subject: [PATCH 15/21] mlxsw: pci: Implement PCI reset handlers Implement reset_prepare() and reset_done() handlers that are invoked by the PCI core before and after issuing a PCI reset, respectively. Specifically, implement reset_prepare() by calling mlxsw_core_bus_device_unregister() and reset_done() by calling mlxsw_core_bus_device_register(). This is the same implementation as the reload_{down,up}() devlink operations with the following differences: 1. The devlink instance is unregistered and then registered again after the reset. 2. A reset via the device's command interface (using MRSR register) is not issued during reset_done() as PCI core already issued a PCI reset. Tested: # for i in $(seq 1 10); do echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/reset; done Reviewed-by: Petr Machata Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/pci.c | 28 +++++++++++++++++++++++ 1 file changed, 28 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/pci.c b/drivers/net/ethernet/mellanox/mlxsw/pci.c index 726ed707e27b9..5b1f2483a3ccf 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/pci.c +++ b/drivers/net/ethernet/mellanox/mlxsw/pci.c @@ -130,6 +130,7 @@ struct mlxsw_pci { const struct pci_device_id *id; enum mlxsw_pci_cqe_v max_cqe_ver; /* Maximal supported CQE version */ u8 num_sdq_cqs; /* Number of CQs used for SDQs */ + bool skip_reset; }; static void mlxsw_pci_queue_tasklet_schedule(struct mlxsw_pci_queue *q) @@ -1527,6 +1528,10 @@ mlxsw_pci_reset(struct mlxsw_pci *mlxsw_pci, const struct pci_device_id *id) return err; } + /* PCI core already issued a PCI reset, do not issue another reset. */ + if (mlxsw_pci->skip_reset) + return 0; + mlxsw_reg_mcam_pack(mcam_pl, MLXSW_REG_MCAM_FEATURE_GROUP_ENHANCED_FEATURES); err = mlxsw_reg_query(mlxsw_pci->core, MLXSW_REG(mcam), mcam_pl); @@ -2107,11 +2112,34 @@ static void mlxsw_pci_remove(struct pci_dev *pdev) kfree(mlxsw_pci); } +static void mlxsw_pci_reset_prepare(struct pci_dev *pdev) +{ + struct mlxsw_pci *mlxsw_pci = pci_get_drvdata(pdev); + + mlxsw_core_bus_device_unregister(mlxsw_pci->core, false); +} + +static void mlxsw_pci_reset_done(struct pci_dev *pdev) +{ + struct mlxsw_pci *mlxsw_pci = pci_get_drvdata(pdev); + + mlxsw_pci->skip_reset = true; + mlxsw_core_bus_device_register(&mlxsw_pci->bus_info, &mlxsw_pci_bus, + mlxsw_pci, false, NULL, NULL); + mlxsw_pci->skip_reset = false; +} + +static const struct pci_error_handlers mlxsw_pci_err_handler = { + .reset_prepare = mlxsw_pci_reset_prepare, + .reset_done = mlxsw_pci_reset_done, +}; + int mlxsw_pci_driver_register(struct pci_driver *pci_driver) { pci_driver->probe = mlxsw_pci_probe; pci_driver->remove = mlxsw_pci_remove; pci_driver->shutdown = mlxsw_pci_remove; + pci_driver->err_handler = &mlxsw_pci_err_handler; return pci_register_driver(pci_driver); } EXPORT_SYMBOL(mlxsw_pci_driver_register); From 18aee495d69d4cc2c3fc40895d3ccbc18e789280 Mon Sep 17 00:00:00 2001 From: Ido Schimmel Date: Tue, 31 Oct 2023 14:03:13 +0200 Subject: [PATCH 16/21] selftests: mlxsw: Add PCI reset test Test that PCI reset works correctly by verifying that only the expected reset methods are supported and that after issuing the reset the ifindex of the port changes. Signed-off-by: Ido Schimmel Reviewed-by: Petr Machata --- .../selftests/drivers/net/mlxsw/pci_reset.sh | 58 +++++++++++++++++++ 1 file changed, 58 insertions(+) create mode 100755 tools/testing/selftests/drivers/net/mlxsw/pci_reset.sh diff --git a/tools/testing/selftests/drivers/net/mlxsw/pci_reset.sh b/tools/testing/selftests/drivers/net/mlxsw/pci_reset.sh new file mode 100755 index 0000000000000..fe0343b95e6c0 --- /dev/null +++ b/tools/testing/selftests/drivers/net/mlxsw/pci_reset.sh @@ -0,0 +1,58 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 +# +# Test that PCI reset works correctly by verifying that only the expected reset +# methods are supported and that after issuing the reset the ifindex of the +# port changes. + +lib_dir=$(dirname $0)/../../../net/forwarding + +ALL_TESTS=" + pci_reset_test +" +NUM_NETIFS=1 +source $lib_dir/lib.sh +source $lib_dir/devlink_lib.sh + +pci_reset_test() +{ + RET=0 + + local bus=$(echo $DEVLINK_DEV | cut -d '/' -f 1) + local bdf=$(echo $DEVLINK_DEV | cut -d '/' -f 2) + + if [ $bus != "pci" ]; then + check_err 1 "devlink device is not a PCI device" + log_test "pci reset" + return + fi + + if [ ! -f /sys/bus/pci/devices/$bdf/reset_method ]; then + check_err 1 "reset is not supported" + log_test "pci reset" + return + fi + + [[ $(cat /sys/bus/pci/devices/$bdf/reset_method) == "bus" ]] + check_err $? "only \"bus\" reset method should be supported" + + local ifindex_pre=$(ip -j link show dev $swp1 | jq '.[]["ifindex"]') + + echo 1 > /sys/bus/pci/devices/$bdf/reset + check_err $? "reset failed" + + # Wait for udev to rename newly created netdev. + udevadm settle + + local ifindex_post=$(ip -j link show dev $swp1 | jq '.[]["ifindex"]') + + [[ $ifindex_pre != $ifindex_post ]] + check_err $? "reset not performed" + + log_test "pci reset" +} + +swp1=${NETIFS[p1]} +tests_run + +exit $EXIT_STATUS From 49796ef9b08d8010c3682c754c143632f6d1f4b0 Mon Sep 17 00:00:00 2001 From: Danielle Ratson Date: Thu, 1 Jul 2021 12:24:16 +0300 Subject: [PATCH 17/21] TMP: selftests: net: Change the indication for iface is up According to Documentation/networking/operstates.rst, the administrative and operational state of an interface can be determined via both the 'ifi_flags' field in the ancillary header of a RTM_NEWLINK message and the 'IFLA_OPERSTATE' attribute in the message. When a driver signals loss of carrier via netif_carrier_off(), the change is reflected immediately in the 'ifi_flags' field, but not in the 'IFLA_OPERSTATE' attribute. This is because changes in 'IFLA_OPERSTATE' are performed in a link watch delayed work. From the document: "Whenever the driver CHANGES one of these flags, a workqueue event is scheduled to translate the flag combination to IFLA_OPERSTATE as follows" This means that it is possible for user space that is constantly querying the kernel for interface state to see the output below when the following commands are run. Their purpose is to simulate tearing down of a selftest and start of a new one. Commands: # ip link set dev veth1 down # ip link set dev veth0 down # ip link set dev veth0 up # ip link set dev veth1 up Output: 11: veth0@veth1: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff 11: veth0@veth1: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff 11: veth0@veth1: mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default qlen 1000 link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff 11: veth0@veth1: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff 11: veth0@veth1: mtu 1500 qdisc noqueue state UP mode DEFAULT group default qlen 1000 link/ether 06:49:03:9b:96:12 brd ff:ff:ff:ff:ff:ff In the third line, the operational state is set by rtnl_fill_ifinfo() to 'IF_OPER_DOWN' because the interface is administratively down, but 'dev->operstate' is still 'IF_OPER_UP' as the delayed work has yet to run. That is why the interface's operational state is reported as 'IF_OPER_UP' in the next line after it was put administratively up again. The output in the fourth line would make user space believe that the interface is fully operational if only the 'IFLA_OPERSTATE' attribute is considered. This is false as the interface does not have a carrier ('LOWER_UP' is not set) at this stage. Solve this by determining if an interface is operational solely based on the presence of the 'IFF_UP' and 'IFF_LOWER_UP' bits in the 'ifi_flags' field. Signed-off-by: Danielle Ratson Reviewed-by: Jiri Pirko Signed-off-by: Ido Schimmel --- tools/testing/selftests/net/forwarding/lib.sh | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/tools/testing/selftests/net/forwarding/lib.sh b/tools/testing/selftests/net/forwarding/lib.sh index e37a15eda6c24..ad2f5bde9c118 100755 --- a/tools/testing/selftests/net/forwarding/lib.sh +++ b/tools/testing/selftests/net/forwarding/lib.sh @@ -510,7 +510,7 @@ setup_wait_dev_with_timeout() for ((i = 1; i <= $max_iterations; ++i)); do ip link show dev $dev up \ - | grep 'state UP' &> /dev/null + | grep 'UP,LOWER_UP' &> /dev/null if [[ $? -ne 0 ]]; then sleep 1 else From b3a47b6888f89ccac84141998a1c37fc8d1714fd Mon Sep 17 00:00:00 2001 From: Danielle Ratson Date: Tue, 21 Dec 2021 16:29:18 +0200 Subject: [PATCH 18/21] TMP: mlxsw: reg: Extend MFGD register with new fields and parameters Extend MFGD (Monitoring FW General Debug Register) with a new field, en_debug_assert that enables INTR severity for fatal events sent by MFDE. In addition, add a fw_fatal_event_mode for enabling MFDE trap and stopping FW. Signed-off-by: Danielle Ratson Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/reg.h | 7 +++++++ 1 file changed, 7 insertions(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/reg.h b/drivers/net/ethernet/mellanox/mlxsw/reg.h index e26e9d06bd721..1efc9ac246cad 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/reg.h +++ b/drivers/net/ethernet/mellanox/mlxsw/reg.h @@ -11173,9 +11173,16 @@ static inline void mlxsw_reg_mtpcpc_pack(char *payload, bool pport, MLXSW_REG_DEFINE(mfgd, MLXSW_REG_MFGD_ID, MLXSW_REG_MFGD_LEN); +/* reg_mfgd_en_debug_assert + * Enables INTR severity for fatal events sent by MFDE + * Access: RW + */ +MLXSW_ITEM32(reg, mfgd, en_debug_assert, 0x00, 16, 1); + /* reg_mfgd_fw_fatal_event_mode * 0 - don't check FW fatal (default) * 1 - check FW fatal - enable MFDE trap + * 2 - check FW fatal stop FW - enable MFDE trap and stop FW * Access: RW */ MLXSW_ITEM32(reg, mfgd, fatal_event_mode, 0x00, 9, 2); From e2f82738de877bd118a9b9898e762c25e338bac1 Mon Sep 17 00:00:00 2001 From: Danielle Ratson Date: Tue, 21 Dec 2021 16:29:19 +0200 Subject: [PATCH 19/21] TMP: mlxsw: core: Enable debug assert in testing environment INTERNAL severity is unexpected state without systemic failure. Enable INTERNAL asserts in MFGD register to be used for better coverage. Signed-off-by: Danielle Ratson Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/core.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c index f23421f038f3f..2c425aa58fd94 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c @@ -2023,6 +2023,7 @@ static int mlxsw_core_health_fw_fatal_config(struct mlxsw_core *mlxsw_core, err = mlxsw_reg_query(mlxsw_core, MLXSW_REG(mfgd), mfgd_pl); if (err) return err; + mlxsw_reg_mfgd_en_debug_assert_set(mfgd_pl, enable); mlxsw_reg_mfgd_fatal_event_mode_set(mfgd_pl, enable); return mlxsw_reg_write(mlxsw_core, MLXSW_REG(mfgd), mfgd_pl); } From 0e986e7be884095a06a49609ce30f2a3e434bae2 Mon Sep 17 00:00:00 2001 From: Danielle Ratson Date: Tue, 21 Dec 2021 16:29:20 +0200 Subject: [PATCH 20/21] TMP: mlxsw: core: Set MFGD.event_mode to check FW fatal and stop FW MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit MFGD.fw_fatal_event_mode has a new mode for enabling MFDE trap and stopping FW on assert. Use that event mode in testing mode in order to create a more accurate dump. Suggested-by: Miryam Levy > I requested it to be 2 and I don’t see a reason why it should be changed, for > tests environment it is always 2, also for SDK verification. Signed-off-by: Danielle Ratson Signed-off-by: Ido Schimmel --- drivers/net/ethernet/mellanox/mlxsw/core.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/ethernet/mellanox/mlxsw/core.c b/drivers/net/ethernet/mellanox/mlxsw/core.c index 2c425aa58fd94..66c555001387c 100644 --- a/drivers/net/ethernet/mellanox/mlxsw/core.c +++ b/drivers/net/ethernet/mellanox/mlxsw/core.c @@ -2024,7 +2024,7 @@ static int mlxsw_core_health_fw_fatal_config(struct mlxsw_core *mlxsw_core, if (err) return err; mlxsw_reg_mfgd_en_debug_assert_set(mfgd_pl, enable); - mlxsw_reg_mfgd_fatal_event_mode_set(mfgd_pl, enable); + mlxsw_reg_mfgd_fatal_event_mode_set(mfgd_pl, 2); return mlxsw_reg_write(mlxsw_core, MLXSW_REG(mfgd), mfgd_pl); } From 66e24d11c4e0d765b98e6120a94eb4c233d33676 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Fri, 3 Nov 2023 15:51:10 +0000 Subject: [PATCH 21/21] build(deps): bump pip from 23.2.1 to 23.3 in /drivers/gpu/drm/ci/xfails Bumps [pip](https://github.com/pypa/pip) from 23.2.1 to 23.3. - [Changelog](https://github.com/pypa/pip/blob/main/NEWS.rst) - [Commits](https://github.com/pypa/pip/compare/23.2.1...23.3) --- updated-dependencies: - dependency-name: pip dependency-type: direct:production ... Signed-off-by: dependabot[bot] --- drivers/gpu/drm/ci/xfails/requirements.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/gpu/drm/ci/xfails/requirements.txt b/drivers/gpu/drm/ci/xfails/requirements.txt index d8856d1581fdb..7247ebc7589fa 100644 --- a/drivers/gpu/drm/ci/xfails/requirements.txt +++ b/drivers/gpu/drm/ci/xfails/requirements.txt @@ -5,7 +5,7 @@ termcolor==2.3.0 certifi==2023.7.22 charset-normalizer==3.2.0 idna==3.4 -pip==23.2.1 +pip==23.3 python-gitlab==3.15.0 requests==2.31.0 requests-toolbelt==1.0.0