-
Notifications
You must be signed in to change notification settings - Fork 126
netdev CI testing #6666
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Open
kuba-moo
wants to merge
1,060
commits into
kernel-patches:bpf-next_base
Choose a base branch
from
linux-netdev:to-test
base: bpf-next_base
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Open
netdev CI testing #6666
+41,713
−27,302
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
4f22ee0
to
8a9a8e0
Compare
64c403f
to
8da1f58
Compare
78ebb17
to
9325308
Compare
c8c7b2f
to
a71aae6
Compare
9325308
to
7940ae1
Compare
d8feb00
to
b16a6b9
Compare
7940ae1
to
8f1ff3c
Compare
4164329
to
c5cecb3
Compare
A physical device may consists of several PCI physical functions. Each of this PCI function's "serial_number" is same because they are part of single board. From this serial number, PCI function cannot be uniquely referenced in a system. Expanding this in slightly more complex system of multi-host "board.serial_number" is not even now unique across two hosts. Further expanding this for DPU based board, a DPU board has PCI functions on the external host as well as DPU internal host. Such DPU side PCI physical functions also have the same "serial_number". There is a need to identify each PCI function uniquely in a factory. We are presently missing this function unique identifier. Hence, introduce a function unique identifier, which is uniquely identifies a function across one or multiple hosts, also has unique identifier with/without DPU based NICs. Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Devlink info allows to expose function UID. Get the value from PCI VPD and expose it. $ devlink dev info pci/0000:08:00.0: driver mlx5_core serial_number e4397f872caeed218000846daa7d2f49 board.serial_number MT2314XZ00YA function.uid MT2314XZ00YAMLNXS0D0F0 versions: fixed: fw.psid MT_0000000894 running: fw.version 28.41.1000 fw 28.41.1000 stored: fw.version 28.41.1000 fw 28.41.1000 auxiliary/mlx5_core.eth.0: driver mlx5_core.eth pci/0000:08:00.1: driver mlx5_core serial_number e4397f872caeed218000846daa7d2f49 board.serial_number MT2314XZ00YA function.uid MT2314XZ00YAMLNXS0D0F1 versions: fixed: fw.psid MT_0000000894 running: fw.version 28.41.1000 fw 28.41.1000 stored: fw.version 28.41.1000 fw 28.41.1000 auxiliary/mlx5_core.eth.1: driver mlx5_core.eth Signed-off-by: Jiri Pirko <[email protected]> Reviewed-by: Parav Pandit <[email protected]> Reviewed-by: Simon Horman <[email protected]> Reviewed-by: Kalesh AP <[email protected]> Acked-by: Tariq Toukan <[email protected]> Signed-off-by: NipaLocal <nipa@local>
ism.h appears to be part of s390 networking drivers so add it to the corresponding section in MAINTAINERS. This aids developers, and tooling such as get_maintainer.pl alike to CC patches to the appropriate people and mailing lists. And is in keeping with an ongoing effort for NETWORKING entries in MAINTAINERS to more accurately reflect the way code is maintained. Signed-off-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>
These files are already correctly covered by the S390 NETWORKING DRIVERS section. In practice commits for these drivers feed into the Networking subsystem. So it seems appropriate to also list them under NETWORKING DRIVERS. This aids developers, and tooling such as get_maintainer.pl alike to CC patches to all the appropriate people and mailing lists. And is in keeping with an ongoing effort for NETWORKING entries in MAINTAINERS to more accurately reflect the way code is maintained. Signed-off-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>
GCC 14.2.0 reports that passing a non-string literal as the format argument of dev_set_name() is potentially insecure. drivers/s390/net/ism_drv.c: In function 'ism_probe': drivers/s390/net/ism_drv.c:615:2: warning: format not a string literal and no format arguments [-Wformat-security] 615 | dev_set_name(&ism->dev, dev_name(&pdev->dev)); | ^~~~~~~~~~~~ It seems to me that as pdev is a PCIE device then the dev_name call above should always return the device's BDF, e.g. 00:12.0. That this should not contain format escape sequences. And thus the current usage is safe. But, it seems better to be safe than sorry. And, as a bonus, compiler output becomes less verbose by addressing this issue. Compile tested only. No functional change intended. Signed-off-by: Simon Horman <[email protected]> Signed-off-by: NipaLocal <nipa@local>
At the time when bpf_xdp_adjust_tail() gained support for non-linear buffers, ENETC was already generating this kind of geometry on RX, due to its use of 2K half page buffers. Frames larger than 1472 bytes (without FCS) are stored as multi-buffer, presenting a need for multi buffer support to work properly even in standard MTU circumstances. Allow bpf_xdp_frags_increase_tail() to know the allocation size of paged data, so it can safely permit growing the tailroom of the buffer from XDP programs. Fixes: bf25146 ("bpf: add frags support to the bpf_xdp_adjust_tail() API") Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: NipaLocal <nipa@local>
This small snippet of code ensures that we do something with the array of RX software buffer descriptor elements after passing the skb to the stack. In this case, we see if the other half of the page is reusable, and if so, we "turn around" the buffers, making them directly usable by enetc_refill_rx_ring() without going to enetc_new_page(). We will need to perform this kind of buffer flipping from a new code path, i.e. from XDP_PASS. Currently, enetc_build_skb() does it there buffer by buffer, but in a subsequent change we will stop using enetc_build_skb() for XDP_PASS. Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: NipaLocal <nipa@local>
…P_PASS Vlatko Markovikj reported that XDP programs attached to ENETC do not work well if they use bpf_xdp_adjust_head() or bpf_xdp_adjust_tail(), combined with the XDP_PASS verdict. A typical use case is to add or remove a VLAN tag. The resulting sk_buff passed to the stack is corrupted, because the algorithm used by the driver for XDP_PASS is to unwind the current buffer pointer in the RX ring and to re-process the current frame with enetc_build_skb() as if XDP hadn't run. That is incorrect because XDP may have modified the geometry of the buffer, which we then are completely unaware of. We are looking at a modified buffer with the original geometry. The initial reaction, both from me and from Vlatko, was to shop around the kernel for code to steal that would calculate a delta between the old and the new XDP buffer geometry, and apply that to the sk_buff too. We noticed that veth and generic xdp have such code. The headroom adjustment is pretty uncontroversial, but what turned out severely problematic is the tailroom. veth has this snippet: __skb_put(skb, off); /* positive on grow, negative on shrink */ which on first sight looks decent enough, except __skb_put() takes an "unsigned int" for the second argument, and the arithmetic seems to only work correctly by coincidence. Second issue, __skb_put() contains a SKB_LINEAR_ASSERT(). It's not a great pattern to make more widespread. The skb may still be nonlinear at that point - it only becomes linear later when resetting skb->data_len to zero. To avoid the above, bpf_prog_run_generic_xdp() does this instead: skb_set_tail_pointer(skb, xdp->data_end - xdp->data); skb->len += off; /* positive on grow, negative on shrink */ which is more open-coded, uses lower-level functions and is in general a bit too much to spread around in driver code. Then there is the snippet: if (xdp_buff_has_frags(xdp)) skb->data_len = skb_shinfo(skb)->xdp_frags_size; else skb->data_len = 0; One would have expected __pskb_trim() to be the function of choice for this task. But it's not used in veth/xdpgeneric because the extraneous fragments were _already_ freed by bpf_xdp_adjust_tail() -> bpf_xdp_frags_shrink_tail() -> ... -> __xdp_return() - the backing memory for the skb frags and the xdp frags is the same, but they don't keep individual references. In fact, that is the biggest reason why this snippet cannot be reused as-is, because ENETC temporarily constructs an skb with the original len and the original number of frags. Because the extraneous frags are already freed by bpf_xdp_adjust_tail() and returned to the page allocator, it means the entire approach of using enetc_build_skb() is questionable for XDP_PASS. To avoid that, one would need to elevate the page refcount of all frags before calling bpf_prog_run_xdp() and drop it after XDP_PASS. There are other things that are missing in ENETC's handling of XDP_PASS, like for example updating skb_shinfo(skb)->meta_len. These are all handled correctly and cleanly in commit 539c1fb ("xdp: add generic xdp_build_skb_from_buff()"), added to net-next in Dec 2024, and in addition might even be quicker that way. I have a very strong preference towards backporting that commit for "stable", and that is what is used to fix the handling bugs. It is way too messy to go this deep into the guts of an sk_buff from the code of a device driver. Fixes: d1b1510 ("net: enetc: add support for XDP_DROP and XDP_PASS") Reported-by: Vlatko Markovikj <[email protected]> Signed-off-by: Vladimir Oltean <[email protected]> Signed-off-by: NipaLocal <nipa@local>
The function xdp_convert_buff_to_frame() may return NULL if it fails to correctly convert the XDP buffer into an XDP frame due to memory constraints, internal errors, or invalid data. Failing to check for NULL may lead to a NULL pointer dereference if the result is used later in processing, potentially causing crashes, data corruption, or undefined behavior. On XDP redirect failure, the associated page must be released explicitly if it was previously retained via get_page(). Failing to do so may result in a memory leak, as the pages reference count is not decremented. Found by Linux Verification Center (linuxtesting.org) with SVACE. Cc: [email protected] # v5.9+ Fixes: 6c5aa6f ("xen networking: add basic XDP support for xen-netfront") Signed-off-by: Alexey Nepomnyashih <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Leverage the new nlmsg_payload() helper to avoid checking for message size and then reading the nlmsg data. Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Leverage the new nlmsg_payload() helper to avoid checking for message size and then reading the nlmsg data. Signed-off-by: Breno Leitao <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Force the use of kmemleak to check everything's OK and report results for each test case. Also, useless sleeps were removed, and the bash script was renamed to something that makes more sense. Due to kmemleak, some tests may be false negatives. To mitigate that (i.e., to have more stable results), the solution of a kmemleak scan at the end (vs. for each test) was preferred. Signed-off-by: Justin Iurman <[email protected]> Signed-off-by: NipaLocal <nipa@local>
In the past %pK was preferable to %p as it would not leak raw pointer values into the kernel log. Since commit ad67b74 ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. Furthermore, restricted pointers ("%pK") were never meant to be used through printk(). They can still unintentionally leak raw pointers or acquire sleeping looks in atomic contexts. Switch to the regular pointer formatting which is safer and easier to reason about. There are still a few users of %pK left, but these use it through seq_file, for which its usage is safe. Acked-by: Przemek Kitszel <[email protected]> Signed-off-by: Thomas Weißschuh <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Signed-off-by: NipaLocal <nipa@local>
In the past %pK was preferable to %p as it would not leak raw pointer values into the kernel log. Since commit ad67b74 ("printk: hash addresses printed with %p") the regular %p has been improved to avoid this issue. Furthermore, restricted pointers ("%pK") were never meant to be used through tracepoints. They can still unintentionally leak raw pointers or acquire sleeping looks in atomic contexts. Switch to the regular pointer formatting which is safer and easier to reason about. There are still a few users of %pK left, but these use it through seq_file, for which its usage is safe. Signed-off-by: Thomas Weißschuh <[email protected]> Reviewed-by: Aleksandr Loktionov <[email protected]> Reviewed-by: Tariq Toukan <[email protected]> Signed-off-by: NipaLocal <nipa@local>
…functions When a bridge port STP state is changed from BLOCKING/DISABLED to FORWARDING, the port's igmp query timer will NOT re-arm itself if the bridge has been configured as per-VLAN multicast snooping. Solve this by choosing the correct multicast context(s) to enable/disable port multicast based on whether per-VLAN multicast snooping is enabled or not, i.e. using per-{port, VLAN} context in case of per-VLAN multicast snooping by re-implementing br_multicast_enable_port() and br_multicast_disable_port() functions. Before the patch, the IGMP query does not happen in the last step of the following test sequence, i.e. no growth for tx counter: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # bridge link set dev swp1 state 0 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge link set dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 After the patch, the IGMP query happens in the last step of the test: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # bridge link set dev swp1 state 0 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge link set dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 3 Signed-off-by: Yong Wang <[email protected]> Reviewed-by: Andy Roulin <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: NipaLocal <nipa@local>
When the vlan STP state is changed, which could be manipulated by "bridge vlan" commands, similar to port STP state, this also impacts multicast behaviors such as igmp query. In the scenario of per-VLAN snooping, there's a need to update the corresponding multicast context to re-arm the port query timer when vlan state becomes "forwarding" etc. Update br_vlan_set_state() function to enable vlan multicast context in such scenario. Before the patch, the IGMP query does not happen in the last step of the following test sequence, i.e. no growth for tx counter: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # sleep 1 # bridge vlan set vid 1 dev swp1 state 4 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge vlan set vid 1 dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 After the patch, the IGMP query happens in the last step of the test: # ip link add name br1 up type bridge vlan_filtering 1 mcast_snooping 1 mcast_vlan_snooping 1 mcast_querier 1 mcast_stats_enabled 1 # bridge vlan global set vid 1 dev br1 mcast_snooping 1 mcast_querier 1 mcast_query_interval 100 mcast_startup_query_count 0 # ip link add name swp1 up master br1 type dummy # sleep 1 # bridge vlan set vid 1 dev swp1 state 4 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # sleep 1 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 1 # bridge vlan set vid 1 dev swp1 state 3 # sleep 2 # ip -j -p stats show dev swp1 group xstats_slave subgroup bridge suite mcast | jq '.[]["multicast"]["igmp_queries"]["tx_v2"]' 3 Signed-off-by: Yong Wang <[email protected]> Reviewed-by: Andy Roulin <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: NipaLocal <nipa@local>
…e changes Change ALL_TESTS definition to "test-per-line". Add the test case of per vlan snooping with port stp state change to forwarding and also vlan equivalent case in both bridge_igmp.sh and bridge_mld.sh. Signed-off-by: Yong Wang <[email protected]> Reviewed-by: Andy Roulin <[email protected]> Reviewed-by: Ido Schimmel <[email protected]> Signed-off-by: Petr Machata <[email protected]> Signed-off-by: NipaLocal <nipa@local>
The "noqueue" qdisc can either be directly attached, or get default attached if net_device priv_flags has IFF_NO_QUEUE. In both cases, the allocated Qdisc structure gets it's enqueue function pointer reset to NULL by noqueue_init() via noqueue_qdisc_ops. This is a common case for software virtual net_devices. For these devices with no-queue, the transmission path in __dev_queue_xmit() will bypass the qdisc layer. Directly invoking device drivers ndo_start_xmit (via dev_hard_start_xmit). In this mode the device driver is not allowed to ask for packets to be queued (either via returning NETDEV_TX_BUSY or stopping the TXQ). The simplest and most reliable way to identify this no-queue case is by checking if enqueue == NULL. The vrf driver currently open-codes this check (!qdisc->enqueue). While functionally correct, this low-level detail is better encapsulated in a dedicated helper for clarity and long-term maintainability. To make this behavior more explicit and reusable, this patch introduce a new helper: qdisc_txq_has_no_queue(). Helper will also be used by the veth driver in the next patch, which introduces optional qdisc-based backpressure. This is a non-functional change. Reviewed-by: David Ahern <[email protected]> Reviewed-by: Toke Høiland-Jørgensen <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
In production, we're seeing TX drops on veth devices when the ptr_ring fills up. This can occur when NAPI mode is enabled, though it's relatively rare. However, with threaded NAPI - which we use in production - the drops become significantly more frequent. The underlying issue is that with threaded NAPI, the consumer often runs on a different CPU than the producer. This increases the likelihood of the ring filling up before the consumer gets scheduled, especially under load, leading to drops in veth_xmit() (ndo_start_xmit()). This patch introduces backpressure by returning NETDEV_TX_BUSY when the ring is full, signaling the qdisc layer to requeue the packet. The txq (netdev queue) is stopped in this condition and restarted once veth_poll() drains entries from the ring, ensuring coordination between NAPI and qdisc. Backpressure is only enabled when a qdisc is attached. Without a qdisc, the driver retains its original behavior - dropping packets immediately when the ring is full. This avoids unexpected behavior changes in setups without a configured qdisc. With a qdisc in place (e.g. fq, sfq) this allows Active Queue Management (AQM) to fairly schedule packets across flows and reduce collateral damage from elephant flows. A known limitation of this approach is that the full ring sits in front of the qdisc layer, effectively forming a FIFO buffer that introduces base latency. While AQM still improves fairness and mitigates flow dominance, the latency impact is measurable. In hardware drivers, this issue is typically addressed using BQL (Byte Queue Limits), which tracks in-flight bytes needed based on physical link rate. However, for virtual drivers like veth, there is no fixed bandwidth constraint - the bottleneck is CPU availability and the scheduler's ability to run the NAPI thread. It is unclear how effective BQL would be in this context. This patch serves as a first step toward addressing TX drops. Future work may explore adapting a BQL-like mechanism to better suit virtual devices like veth. Reported-by: Yan Zhai <[email protected]> Signed-off-by: Jesper Dangaard Brouer <[email protected]> Signed-off-by: NipaLocal <nipa@local>
tc_actions.sh keeps hanging the forwarding tests. sdf@: tdc & tdc-dbg started intermittenly failing around Sep 25th Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Signed-off-by: Jakub Kicinski <[email protected]> Signed-off-by: NipaLocal <nipa@local>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
Reusable PR for hooking netdev CI to BPF testing.