bmastbergen
Collaborator

Background

This PR started as a backport of 3f981138109f ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()") to address CVE-2025-38000. The backport hit a minor conflict caused by a prior backport, but the commit content is unchanged.

3f981138109f was later fixed upstream by 103406b38c60 ("net/sched: Always pass notifications when child class becomes empty"), which is itself associated with CVE-2025-38350.

103406b38c60 was in turn fixed upstream by 87c6efc5ce9c ("net/sched: ets: use old 'nbands' while purging unused classes"), which is itself associated with CVE-2025-38684. That fix needs three prerequisites (a7a15f39c682, d92adacdd8c2, c5f1dde7f731), one of which (d92adacdd8c2) is associated with CVE-2025-38107.

Commits

    net/sched: ets: use old 'nbands' while purging unused classes

    jira VULN-136263
    cve CVE-2025-38684
    commit-author Davide Caratti <dcaratti@redhat.com>
    commit 87c6efc5ce9c126ae4a781bc04504b83780e3650

    net_sched: sch_ets: implement lockless ets_dump()

    jira VULN-136263
    cve-pre CVE-2025-38684
    commit-author Eric Dumazet <edumazet@google.com>
    commit c5f1dde7f731e7bf2e7c169ca42cb4989fc2f8b9

    net_sched: ets: fix a race in ets_qdisc_change()

    jira VULN-71728
    cve CVE-2025-38107
    commit-author Eric Dumazet <edumazet@google.com>
    commit d92adacdd8c2960be856e0b82acc5b7c5395fddb

    sch_ets: make est_qlen_notify() idempotent

    jira VULN-71728
    cve-pre CVE-2025-38107
    commit-author Cong Wang <xiyou.wangcong@gmail.com>
    commit a7a15f39c682ac4268624da2abdb9114bdde96d5

    net/sched: Always pass notifications when child class becomes empty

    jira VULN-136684
    cve CVE-2025-38350
    commit-author Lion Ackermann <nnamrec@gmail.com>
    commit 103406b38c600fec1fe375a77b27d87e314aea09

    sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()

    jira VULN-68355
    cve CVE-2025-38000
    commit-author Cong Wang <xiyou.wangcong@gmail.com>
    commit 3f981138109f63232a5fb7165938d4c945cc1b9d
    upstream-diff Minor conflict in hfsc_enqueue because we have already
                  backported ac9fe7dd8e73 which changed !cl->cl_nactive
                  to cl_in_el_or_vttree(cl).  No changes to the commit
                  content.

Build Log

/home/brett/kernel-src-tree
Running make mrproper...
[TIMER]{MRPROPER}: 12s
x86_64 architecture detected, copying config
'configs/kernel-x86_64-rhel.config' -> '.config'
Setting Local Version for build
CONFIG_LOCALVERSION="-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035"
Making olddefconfig
--
  HOSTCC  scripts/kconfig/util.o
  HOSTLD  scripts/kconfig/conf
#
# configuration written to .config
#
Starting Build
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_32.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_64.h
  SYSHDR  arch/x86/include/generated/uapi/asm/unistd_x32.h
  SYSTBL  arch/x86/include/generated/asm/syscalls_32.h
  SYSHDR  arch/x86/include/generated/asm/unistd_32_ia32.h
--
  LD [M]  sound/xen/snd_xen_front.ko
  LD [M]  virt/lib/irqbypass.ko
  BTF [M] sound/x86/snd-hdmi-lpe-audio.ko
  BTF [M] virt/lib/irqbypass.ko
  BTF [M] sound/xen/snd_xen_front.ko
[TIMER]{BUILD}: 921s
Making Modules
  INSTALL /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+/kernel/arch/x86/crypto/blake2s-x86_64.ko
  INSTALL /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+/kernel/arch/x86/crypto/blowfish-x86_64.ko
  INSTALL /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+/kernel/arch/x86/crypto/camellia-aesni-avx-x86_64.ko
  INSTALL /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+/kernel/arch/x86/crypto/camellia-x86_64.ko
--
  SIGN    /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+/kernel/sound/virtio/virtio_snd.ko
  SIGN    /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+/kernel/sound/xen/snd_xen_front.ko
  SIGN    /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+/kernel/virt/lib/irqbypass.ko
  SIGN    /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+/kernel/sound/usb/snd-usb-audio.ko
  DEPMOD  /lib/modules/5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+
[TIMER]{MODULES}: 8s
Making Install
sh ./arch/x86/boot/install.sh \
	5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+ arch/x86/boot/bzImage \
	System.map "/boot"
[TIMER]{INSTALL}: 57s
Checking kABI
kABI check passed
Setting Default Kernel to /boot/vmlinuz-5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+ and Index to 2
Hopefully Grub2.0 took everything ... rebooting after time metrices
[TIMER]{MRPROPER}: 12s
[TIMER]{BUILD}: 921s
[TIMER]{MODULES}: 8s
[TIMER]{INSTALL}: 57s
[TIMER]{TOTAL} 1017s
Rebooting in 10 seconds

Testing

selftest-5.14.0-284.30.1.el9_2.92ciq_lts.10.1.x86_64.log

selftest-5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+.log

brett@lycia ~/ciq/many-92-vulns-9-19-25
 % grep ^ok selftest-5.14.0-284.30.1.el9_2.92ciq_lts.10.1.x86_64.log | wc -l
297
brett@lycia ~/ciq/many-92-vulns-9-19-25
 % grep ^ok selftest-5.14.0-bmastbergen_ciqlts9_2_many-vulns-9-19-25-dd196621c035+.log | wc -l
296
brett@lycia ~/ciq/many-92-vulns-9-19-25
 %

sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()

jira VULN-68355
cve CVE-2025-38000
commit-author Cong Wang <xiyou.wangcong@gmail.com>
commit 3f98113
upstream-diff Minor conflict in hfsc_enqueue because we have already
              backported ac9fe7d which changed !cl->cl_nactive
              to cl_in_el_or_vttree(cl).  No changes to the commit
              content.

When enqueuing the first packet to an HFSC class, hfsc_enqueue() calls the
child qdisc's peek() operation before incrementing sch->q.qlen and
sch->qstats.backlog. If the child qdisc uses qdisc_peek_dequeued(), this may
trigger an immediate dequeue and potential packet drop. In such cases,
qdisc_tree_reduce_backlog() is called, but the HFSC qdisc's qlen and backlog
have not yet been updated, leading to inconsistent queue accounting. This
can leave an empty HFSC class in the active list, causing further
consequences like use-after-free.

This patch fixes the bug by moving the increment of sch->q.qlen and
sch->qstats.backlog before the call to the child qdisc's peek() operation.
This ensures that queue length and backlog are always accurate when packet
drops or dequeues are triggered during the peek.
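
The ordering problem described above can be sketched in userspace C. This is a toy model, not the kernel code: `toy_parent`, `toy_reduce_backlog()` and `toy_child_peek_drops()` are hypothetical names, and the only thing carried over from the commit message is the order of "count the packet" versus "peek the child".

```c
#include <stdbool.h>

/* Toy stand-in for the parent qdisc; the counter is unsigned, so it wraps. */
struct toy_parent {
    unsigned int qlen;
    bool underflowed;
};

/* Models the backlog reduction as seen by the parent: subtract n packets. */
static void toy_reduce_backlog(struct toy_parent *p, unsigned int n)
{
    if (p->qlen < n)
        p->underflowed = true; /* parent had not counted the packet yet */
    p->qlen -= n;              /* unsigned arithmetic wraps around */
}

/* Models a child peek() that dequeues and drops the only packet. */
static void toy_child_peek_drops(struct toy_parent *p)
{
    toy_reduce_backlog(p, 1);
}

/*
 * Enqueue the first packet and report whether the parent's counter
 * underflowed while the child's peek was dropping that packet.
 */
bool toy_enqueue_underflows(bool account_before_peek)
{
    struct toy_parent p = { .qlen = 0, .underflowed = false };

    if (account_before_peek)
        p.qlen++;              /* fixed ordering: count first */
    toy_child_peek_drops(&p);
    if (!account_before_peek)
        p.qlen++;              /* old ordering: accounting came too late */
    return p.underflowed;
}
```

With the old ordering the model reports an underflow during the peek; with the accounting moved first it does not.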

Fixes: 12d0ad3 ("net/sched/sch_hfsc.c: handle corner cases where head may change invalidating calculated deadline")
	Reported-by: Mingi Cho <mincho@theori.io>
	Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
	Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://patch.msgid.link/20250518222038.58538-2-xiyou.wangcong@gmail.com
	Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
	Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit 3f98113)
	Signed-off-by: Brett Mastbergen <bmastbergen@ciq.com>

net/sched: Always pass notifications when child class becomes empty

jira VULN-136684
cve CVE-2025-38350
commit-author Lion Ackermann <nnamrec@gmail.com>
commit 103406b

Certain classful qdiscs may invoke their classes' dequeue handler on an
enqueue operation. This may unexpectedly empty the child qdisc and thus
make an in-flight class passive via qlen_notify(). Most qdiscs do not
expect such behaviour at this point in time and may re-activate the
class eventually anyways which will lead to a use-after-free.

The referenced fix commit attempted to fix this behavior for the HFSC
case by moving the backlog accounting around, though this turned out to
be incomplete since the parent's parent may run into the issue too.
The following reproducer demonstrates this use-after-free:

    tc qdisc add dev lo root handle 1: drr
    tc filter add dev lo parent 1: basic classid 1:1
    tc class add dev lo parent 1: classid 1:1 drr
    tc qdisc add dev lo parent 1:1 handle 2: hfsc def 1
    tc class add dev lo parent 2: classid 2:1 hfsc rt m1 8 d 1 m2 0
    tc qdisc add dev lo parent 2:1 handle 3: netem
    tc qdisc add dev lo parent 3:1 handle 4: blackhole

    echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888
    tc class delete dev lo classid 1:1
    echo 1 | socat -u STDIN UDP4-DATAGRAM:127.0.0.1:8888

Since backlog accounting issues leading to use-after-frees on stale
class pointers are a recurring pattern at this point, this patch takes
a different approach. Instead of trying to fix the accounting, the patch
ensures that qdisc_tree_reduce_backlog always calls qlen_notify when
the child qdisc is empty. This solves the problem because deletion of
qdiscs always involves a call to qdisc_reset() and / or
qdisc_purge_queue() which ultimately resets its qlen to 0 thus causing
the following qdisc_tree_reduce_backlog() to report to the parent. Note
that this may call qlen_notify on passive classes multiple times. This
is not a problem after the recent patch series that made all the
classful qdiscs qlen_notify() handlers idempotent.
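
A minimal userspace sketch of the "always notify when the child is empty" policy follows. All names are hypothetical, and the "old-style" branch is only an assumed simplification (skip the notification when no packets were removed); it is not a transcription of the previous kernel logic.

```c
#include <stdbool.h>

/* Toy parent class that is either on the active list or passive. */
struct toy_class {
    bool active;
};

/* Idempotent stand-in for a qlen_notify() handler. */
static void toy_qlen_notify(struct toy_class *cl)
{
    cl->active = false; /* safe to repeat on an already passive class */
}

/*
 * child_qlen is the child's queue length at call time, n the number of
 * packets just removed. With always_notify set, an empty child is always
 * reported to the parent class, even when n is 0.
 */
static void toy_reduce_backlog(struct toy_class *cl, unsigned int child_qlen,
                               unsigned int n, bool always_notify)
{
    bool notify = (child_qlen == 0);

    if (!always_notify && n == 0)
        notify = false; /* assumed old-style shortcut */
    if (notify)
        toy_qlen_notify(cl);
}

/* The child is already empty when it gets purged, so n is 0 both times. */
bool toy_class_still_active(bool always_notify)
{
    struct toy_class cl = { .active = true };

    toy_reduce_backlog(&cl, 0, 0, always_notify);
    toy_reduce_backlog(&cl, 0, 0, always_notify); /* repeat must be harmless */
    return cl.active;
}
```

In this model the stale class stays active under the assumed old policy and becomes passive once an empty child is always reported.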

Fixes: 3f98113 ("sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()")
	Signed-off-by: Lion Ackermann <nnamrec@gmail.com>
	Reviewed-by: Jamal Hadi Salim <jhs@mojatatu.com>
	Acked-by: Cong Wang <xiyou.wangcong@gmail.com>
	Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
Link: https://patch.msgid.link/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 103406b)
	Signed-off-by: Brett Mastbergen <bmastbergen@ciq.com>

sch_ets: make est_qlen_notify() idempotent

jira VULN-71728
cve-pre CVE-2025-38107
commit-author Cong Wang <xiyou.wangcong@gmail.com>
commit a7a15f3

est_qlen_notify() deletes its class from its active list with
list_del() when qlen is 0, therefore, it is not idempotent and
not friendly to its callers, like fq_codel_dequeue().

Let's make it idempotent to ease qdisc_tree_reduce_backlog() callers'
life. Also change other list_del()'s to list_del_init() just to be
extra safe.
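
Why list_del_init() makes a notify handler safe to call twice can be illustrated with a small userspace model. The list helpers below are simplified re-implementations written for this sketch, and `toy_class` / `toy_qlen_notify()` are hypothetical names; the guard on an empty link is an assumption about how such a handler could be written, not a copy of the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Minimal userspace stand-in for the kernel's struct list_head. */
struct list_head {
    struct list_head *next, *prev;
};

static void toy_list_init(struct list_head *h)
{
    h->next = h;
    h->prev = h;
}

static bool toy_list_empty(const struct list_head *h)
{
    return h->next == h;
}

static void toy_list_add_tail(struct list_head *entry, struct list_head *head)
{
    entry->prev = head->prev;
    entry->next = head;
    head->prev->next = entry;
    head->prev = entry;
}

/* Unlink the entry, then make it point at itself again. */
static void toy_list_del_init(struct list_head *entry)
{
    entry->prev->next = entry->next;
    entry->next->prev = entry->prev;
    toy_list_init(entry);
}

/* Hypothetical class with a link into the active list. */
struct toy_class {
    struct list_head alist;
};

/* Idempotent: a class that is already off the list is left alone. */
static void toy_qlen_notify(struct toy_class *cl)
{
    if (!toy_list_empty(&cl->alist))
        toy_list_del_init(&cl->alist);
}

/* Number of active classes left after notifying the same class twice. */
size_t toy_active_after_double_notify(void)
{
    struct list_head active, *pos;
    struct toy_class a, b;
    size_t count = 0;

    toy_list_init(&active);
    toy_list_init(&a.alist);
    toy_list_init(&b.alist);
    toy_list_add_tail(&a.alist, &active);
    toy_list_add_tail(&b.alist, &active);

    toy_qlen_notify(&a);
    toy_qlen_notify(&a); /* second call is a no-op */

    for (pos = active.next; pos != &active; pos = pos->next)
        count++;
    return count;
}
```

Because the removed entry points back at itself, a repeated removal cannot touch its former neighbours, which is the property the callers rely on.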

	Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
	Signed-off-by: Cong Wang <xiyou.wangcong@gmail.com>
Link: https://patch.msgid.link/20250403211033.166059-6-xiyou.wangcong@gmail.com
	Acked-by: Jamal Hadi Salim <jhs@mojatatu.com>
	Signed-off-by: Paolo Abeni <pabeni@redhat.com>
(cherry picked from commit a7a15f3)
	Signed-off-by: Brett Mastbergen <bmastbergen@ciq.com>

net_sched: ets: fix a race in ets_qdisc_change()

jira VULN-71728
cve CVE-2025-38107
commit-author Eric Dumazet <edumazet@google.com>
commit d92adac

Gerrard Tai reported a race condition in ETS, whenever SFQ perturb timer
fires at the wrong time.

The race is as follows:

CPU 0                                 CPU 1
[1]: lock root
[2]: qdisc_tree_flush_backlog()
[3]: unlock root
 |
 |                                    [5]: lock root
 |                                    [6]: rehash
 |                                    [7]: qdisc_tree_reduce_backlog()
 |
[4]: qdisc_put()

This can be abused to underflow a parent's qlen.

Calling qdisc_purge_queue() instead of qdisc_tree_flush_backlog()
should fix the race, because all packets will be purged from the qdisc
before releasing the lock.

Fixes: b05972f ("net: sched: tbf: don't call qdisc_put() while holding tree lock")
	Reported-by: Gerrard Tai <gerrard.tai@starlabs.sg>
	Suggested-by: Gerrard Tai <gerrard.tai@starlabs.sg>
	Signed-off-by: Eric Dumazet <edumazet@google.com>
Link: https://patch.msgid.link/20250611111515.1983366-5-edumazet@google.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit d92adac)
	Signed-off-by: Brett Mastbergen <bmastbergen@ciq.com>

net_sched: sch_ets: implement lockless ets_dump()

jira VULN-136263
cve-pre CVE-2025-38684
commit-author Eric Dumazet <edumazet@google.com>
commit c5f1dde

Instead of relying on RTNL, ets_dump() can use READ_ONCE()
annotations, paired with WRITE_ONCE() ones in ets_change().
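
The pairing of annotated reads and writes can be sketched in userspace as follows. The `TOY_READ_ONCE` / `TOY_WRITE_ONCE` macros are simplified approximations built on a volatile access (using the GCC/Clang `__typeof__` extension), and the struct fields are illustrative assumptions rather than the real qdisc layout.

```c
/* Userspace approximations of the kernel macros, for illustration only. */
#define TOY_READ_ONCE(x)     (*(const volatile __typeof__(x) *)&(x))
#define TOY_WRITE_ONCE(x, v) (*(volatile __typeof__(x) *)&(x) = (v))

/* Hypothetical subset of the scheduler's configuration. */
struct toy_ets {
    unsigned int nbands;
    unsigned int nstrict;
};

/* Writer side: every field a lockless reader may see is stored once. */
void toy_ets_change(struct toy_ets *q, unsigned int nbands,
                    unsigned int nstrict)
{
    TOY_WRITE_ONCE(q->nbands, nbands);
    TOY_WRITE_ONCE(q->nstrict, nstrict);
}

/* Reader side: load the field once instead of relying on a lock. */
unsigned int toy_ets_dump_nbands(struct toy_ets *q)
{
    return TOY_READ_ONCE(q->nbands);
}

/* Single-threaded usage: the reader observes the value the writer stored. */
unsigned int toy_dump_after_change(void)
{
    struct toy_ets q = { .nbands = 2, .nstrict = 1 };

    toy_ets_change(&q, 5, 2);
    return toy_ets_dump_nbands(&q);
}
```

The annotations do not add ordering between fields; they only keep each individual load and store from being torn or repeated by the compiler.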

	Signed-off-by: Eric Dumazet <edumazet@google.com>
	Reviewed-by: Simon Horman <horms@kernel.org>
	Signed-off-by: David S. Miller <davem@davemloft.net>
(cherry picked from commit c5f1dde)
	Signed-off-by: Brett Mastbergen <bmastbergen@ciq.com>

net/sched: ets: use old 'nbands' while purging unused classes

jira VULN-136263
cve CVE-2025-38684
commit-author Davide Caratti <dcaratti@redhat.com>
commit 87c6efc

Shuang reported sch_ets test-case [1] crashing in ets_class_qlen_notify()
after recent changes from Lion [2]. The problem is: in ets_qdisc_change()
we purge unused DWRR queues; the value of 'q->nbands' is the new one, and
the cleanup should be done with the old one. The problem is here since my
first attempts to fix ets_qdisc_change(), but it surfaced again after the
recent qdisc len accounting fixes. Fix it by purging idle DWRR queues before
assigning a new value of 'q->nbands', so that all purge operations find a
consistent configuration:

 - old 'q->nbands' because it's needed by ets_class_find()
 - old 'q->nstrict' because it's needed by ets_class_is_strict()
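
The ordering requirement can be sketched with a toy lookup that, like the class lookup described above, depends on the current band count. Everything here is hypothetical (`toy_ets`, `toy_class_find()`, the fixed-size array); the model only counts the lookups that would have returned no class, where the kernel would go on to dereference a NULL pointer.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_BANDS 16

/* Hypothetical scheduler state: a band count and one queue length per band. */
struct toy_ets {
    unsigned int nbands;
    unsigned int qlen[TOY_MAX_BANDS];
};

/* The lookup only knows about bands below the current nbands. */
static unsigned int *toy_class_find(struct toy_ets *q, unsigned int band)
{
    return band < q->nbands ? &q->qlen[band] : NULL;
}

/*
 * Shrink the band count and purge the bands that go away. Returns how many
 * purge lookups failed.
 */
static unsigned int toy_change_nbands(struct toy_ets *q,
                                      unsigned int new_nbands,
                                      bool purge_first)
{
    unsigned int old_nbands = q->nbands;
    unsigned int failed = 0;
    unsigned int i;

    if (!purge_first)
        q->nbands = new_nbands; /* buggy order: lookups now see the new value */

    for (i = new_nbands; i < old_nbands; i++) {
        unsigned int *cl = toy_class_find(q, i);

        if (cl == NULL) {
            failed++;           /* the real code would crash here */
            continue;
        }
        *cl = 0;                /* purge the unused band */
    }

    q->nbands = new_nbands;     /* fixed order: assign after purging */
    return failed;
}

/* Shrink from 8 bands to 4 and report the failed lookups. */
unsigned int toy_failed_lookups(bool purge_first)
{
    struct toy_ets q = { .nbands = 8 };

    return toy_change_nbands(&q, 4, purge_first);
}
```

Purging while the old band count is still in place lets every lookup succeed; assigning the new count first makes all four purged bands unreachable.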

 BUG: kernel NULL pointer dereference, address: 0000000000000000
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 0 P4D 0
 Oops: Oops: 0000 [#1] SMP NOPTI
 CPU: 62 UID: 0 PID: 39457 Comm: tc Kdump: loaded Not tainted 6.12.0-116.el10.x86_64 #1 PREEMPT(voluntary)
 Hardware name: Dell Inc. PowerEdge R640/06DKY5, BIOS 2.12.2 07/09/2021
 RIP: 0010:__list_del_entry_valid_or_report+0x4/0x80
 Code: ff 4c 39 c7 0f 84 39 19 8e ff b8 01 00 00 00 c3 cc cc cc cc 66 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 f3 0f 1e fa <48> 8b 17 48 8b 4f 08 48 85 d2 0f 84 56 19 8e ff 48 85 c9 0f 84 ab
 RSP: 0018:ffffba186009f400 EFLAGS: 00010202
 RAX: 00000000000000d6 RBX: 0000000000000000 RCX: 0000000000000004
 RDX: ffff9f0fa29b69c0 RSI: 0000000000000000 RDI: 0000000000000000
 RBP: ffffffffc12c2400 R08: 0000000000000008 R09: 0000000000000004
 R10: ffffffffffffffff R11: 0000000000000004 R12: 0000000000000000
 R13: ffff9f0f8cfe0000 R14: 0000000000100005 R15: 0000000000000000
 FS:  00007f2154f37480(0000) GS:ffff9f269c1c0000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: 0000000000000000 CR3: 00000001530be001 CR4: 00000000007726f0
 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
 PKRU: 55555554
 Call Trace:
  <TASK>
  ets_class_qlen_notify+0x65/0x90 [sch_ets]
  qdisc_tree_reduce_backlog+0x74/0x110
  ets_qdisc_change+0x630/0xa40 [sch_ets]
  __tc_modify_qdisc.constprop.0+0x216/0x7f0
  tc_modify_qdisc+0x7c/0x120
  rtnetlink_rcv_msg+0x145/0x3f0
  netlink_rcv_skb+0x53/0x100
  netlink_unicast+0x245/0x390
  netlink_sendmsg+0x21b/0x470
  ____sys_sendmsg+0x39d/0x3d0
  ___sys_sendmsg+0x9a/0xe0
  __sys_sendmsg+0x7a/0xd0
  do_syscall_64+0x7d/0x160
  entry_SYSCALL_64_after_hwframe+0x76/0x7e
 RIP: 0033:0x7f2155114084
 Code: 89 02 b8 ff ff ff ff eb bb 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 00 f3 0f 1e fa 80 3d 25 f0 0c 00 00 74 13 b8 2e 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 54 c3 0f 1f 00 48 83 ec 28 89 54 24 1c 48 89
 RSP: 002b:00007fff1fd7a988 EFLAGS: 00000202 ORIG_RAX: 000000000000002e
 RAX: ffffffffffffffda RBX: 0000560ec063e5e0 RCX: 00007f2155114084
 RDX: 0000000000000000 RSI: 00007fff1fd7a9f0 RDI: 0000000000000003
 RBP: 00007fff1fd7aa60 R08: 0000000000000010 R09: 000000000000003f
 R10: 0000560ee9b3a010 R11: 0000000000000202 R12: 00007fff1fd7aae0
 R13: 000000006891ccde R14: 0000560ec063e5e0 R15: 00007fff1fd7aad0
  </TASK>

 [1] https://lore.kernel.org/netdev/e08c7f4a6882f260011909a868311c6e9b54f3e4.1639153474.git.dcaratti@redhat.com/
 [2] https://lore.kernel.org/netdev/d912cbd7-193b-4269-9857-525bee8bbb6a@gmail.com/

	Cc: stable@vger.kernel.org
Fixes: 103406b ("net/sched: Always pass notifications when child class becomes empty")
Fixes: c062f2a ("net/sched: sch_ets: don't remove idle classes from the round-robin list")
Fixes: dcc68b4 ("net: sch_ets: Add a new Qdisc")
	Reported-by: Li Shuang <shuali@redhat.com>
Closes: https://issues.redhat.com/browse/RHEL-108026
	Reviewed-by: Petr Machata <petrm@nvidia.com>
Co-developed-by: Ivan Vecera <ivecera@redhat.com>
	Signed-off-by: Ivan Vecera <ivecera@redhat.com>
	Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Link: https://patch.msgid.link/7928ff6d17db47a2ae7cc205c44777b1f1950545.1755016081.git.dcaratti@redhat.com
	Signed-off-by: Jakub Kicinski <kuba@kernel.org>
(cherry picked from commit 87c6efc)
	Signed-off-by: Brett Mastbergen <bmastbergen@ciq.com>

🔍 Upstream Linux Kernel Commit Check

  • ⚠️ PR commit 25d8933619a6 (sch_hfsc: Fix qlen accounting bug when using peek in hfsc_enqueue()) references upstream commit
    3f981138109f which has been referenced by a Fixes: tag in the upstream
    Linux kernel:
    103406b38c60 net/sched: Always pass notifications when child class becomes empty (Lion Ackermann)
  • ⚠️ PR commit d7bc710743d1 (net/sched: Always pass notifications when child class becomes empty) references upstream commit
    103406b38c60 which has been referenced by a Fixes: tag in the upstream
    Linux kernel:
    87c6efc5ce9c net/sched: ets: use old 'nbands' while purging unused classes (Davide Caratti)

This is an automated message from the kernel commit checker workflow.

@bmastbergen
Collaborator Author

Both of these fixes are in the PR.

Collaborator

@PlaidCat PlaidCat left a comment

:shipit:

@bmastbergen bmastbergen merged commit b6ab14f into ciqlts9_2 Sep 22, 2025
4 checks passed
@bmastbergen bmastbergen deleted the bmastbergen_ciqlts9_2/many-vulns-9-19-25 branch September 22, 2025 17:17

3 participants