ACK: [SRU][J][PATCH 0/1] raid10: block discard causes a NULL pointer dereference after 5.15.0-144-generic
Massimiliano Pellizzer
massimiliano.pellizzer at canonical.com
Tue Jul 22 07:33:47 UTC 2025
On Tue, 22 Jul 2025 at 09:09, Stefan Bader <stefan.bader at canonical.com> wrote:
>
> On 22.07.25 06:10, Matthew Ruffell wrote:
> > BugLink: https://bugs.launchpad.net/bugs/2117395
> >
> > [Impact]
> >
> > The below commit was backported to 5.15.181 -stable, and introduced a NULL
> > pointer dereference in the raid10 subsystem, because io_acct_set is only
> > initialised in raid 0 and 456, and not in raid 1 or 10.
> >
> > commit d05af90d6218e9c8f1c2026990c3f53c1b41bfb0
> > Author: Yu Kuai <yukuai3 at huawei.com>
> > Date: Tue Mar 25 09:57:46 2025 +0800
> > Subject: md/raid10: fix missing discard IO accounting
> > Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d05af90d6218e9c8f1c2026990c3f53c1b41bfb0
> >
> > Kernel oops:
> >
> > kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
> > kernel: #PF: supervisor instruction fetch in kernel mode
> > kernel: #PF: error_code(0x0010) - not-present page
> > kernel: PGD 0 P4D 0
> > kernel: Oops: 0010 [#1] SMP PTI
> > kernel: CPU: 5 PID: 784107 Comm: fstrim Not tainted 5.15.0-144-generic #157-Ubuntu
> > kernel: RIP: 0010:0x0
> > kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
> > kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
> > kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
> > kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
> > kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
> > kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
> > kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
> > kernel: FS: 00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
> > kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
> > kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> > kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> > kernel: Call Trace:
> > kernel: <TASK>
> > kernel: mempool_alloc+0x61/0x1b0
> > kernel: ? __kmalloc+0x179/0x330
> > kernel: bio_alloc_bioset+0x9d/0x370
> > kernel: ? r10bio_pool_alloc+0x26/0x30 [raid10]
> > kernel: bio_clone_fast+0x1f/0x90
> > kernel: md_account_bio+0x42/0x80
> > kernel: raid10_handle_discard+0x56f/0x6b0 [raid10]
> > kernel: raid10_make_request+0x147/0x180 [raid10]
> > kernel: md_handle_request+0x12a/0x1b0
> > kernel: ? submit_bio_checks+0x1a5/0x580
> > kernel: md_submit_bio+0x76/0xc0
> > kernel: __submit_bio+0x1a2/0x220
> > kernel: ? mempool_alloc_slab+0x17/0x20
> > kernel: ? mempool_alloc+0x61/0x1b0
> > kernel: ? schedule_timeout+0x91/0x140
> > kernel: __submit_bio_noacct+0x85/0x200
> > kernel: submit_bio_noacct+0x4e/0x120
> > kernel: ? __cond_resched+0x1a/0x60
> > kernel: submit_bio+0x4a/0x130
> > kernel: submit_bio_wait+0x5a/0xc0
> > kernel: blkdev_issue_discard+0x7e/0xd0
> > kernel: ext4_try_to_trim_range+0x2db/0x520
> > kernel: ? ext4_mb_load_buddy_gfp+0x91/0x3e0
> > kernel: ext4_trim_fs+0x313/0x510
> > kernel: __ext4_ioctl+0x82c/0xef0
> > kernel: ext4_ioctl+0xe/0x20
> > kernel: __x64_sys_ioctl+0x92/0xd0
> > kernel: x64_sys_call+0x1e5f/0x1fa0
> > kernel: do_syscall_64+0x56/0xb0
> > kernel: entry_SYSCALL_64_after_hwframe+0x6c/0xd6
> >
> > A workaround is to disable the systemd weekly fstrim timer and to not fstrim /
> > discard blocks while the problem exists.
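
For anyone applying the workaround above, a minimal sketch follows. It assumes
the periodic trim is driven by the stock fstrim.timer unit; the run() helper
and the DRY_RUN variable are illustrative names only, not part of any package.
It does not cover filesystems mounted with the discard option.

```shell
#!/bin/sh
# Sketch: stop the periodic fstrim until a fixed kernel is installed.
# DRY_RUN defaults to 1, so the script only prints what it would do.
# Set DRY_RUN=0 and run as root to actually apply the change.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, else execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# Assumption: the weekly trim is scheduled by the fstrim.timer unit.
disable_fstrim_timer() {
    run systemctl disable --now fstrim.timer
}

disable_fstrim_timer
```

Once a fixed kernel is booted, the timer can be turned back on with
"systemctl enable --now fstrim.timer".
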
> >
> > [Fix]
> >
> > The below necessary commit was mainlined in 6.6-rc1 and needs to be backported
> > to jammy.
> >
> > commit c567c86b90d4715081adfe5eb812141a5b6b4883
> > Author: Yu Kuai <yukuai3 at huawei.com>
> > Date: Thu Jun 22 00:51:03 2023 +0800
> > Subject: md: move initialization and destruction of 'io_acct_set' to md.c
> > Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c567c86b90d4715081adfe5eb812141a5b6b4883
> >
> > This needs a minor backport, adjusting __md_stop() to md_stop().
> >
> > [Testcase]
> >
> > You will need a machine with at least 4x NVMe drives which support block
> > discard. I use an i3.8xlarge instance on AWS, since it has all of these things.
> >
> > $ lsblk
> > xvda 202:0 0 8G 0 disk
> > └─xvda1 202:1 0 8G 0 part /
> > nvme0n1 259:2 0 1.7T 0 disk
> > nvme1n1 259:0 0 1.7T 0 disk
> > nvme2n1 259:1 0 1.7T 0 disk
> > nvme3n1 259:3 0 1.7T 0 disk
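
Before building the array it may be worth confirming that the drives really
advertise discard. The sketch below reads queue/discard_max_bytes from sysfs,
where a non-zero value indicates discard support. The function name and the
SYSFS_ROOT override are hypothetical and exist only so the check can be tried
against a fake directory tree; the device names are the ones from the lsblk
output above.

```shell
#!/bin/sh
# Sketch: report whether a block device advertises discard support.
# SYSFS_ROOT is an illustrative override for testing; it defaults to /sys.
supports_discard() {
    discard_file="${SYSFS_ROOT:-/sys}/block/$1/queue/discard_max_bytes"
    # A missing or unreadable file is treated as "no discard".
    [ -r "$discard_file" ] || return 1
    discard_max=$(cat "$discard_file")
    # A non-zero discard_max_bytes means the device accepts discards.
    [ "$discard_max" -gt 0 ] 2>/dev/null
}

for disk in nvme0n1 nvme1n1 nvme2n1 nvme3n1; do
    if supports_discard "$disk"; then
        echo "$disk: discard supported"
    else
        echo "$disk: no discard (or device not present)"
    fi
done
```

"lsblk --discard" shows the same information in its DISC-GRAN and DISC-MAX
columns.
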
> >
> > Create a Raid10 array:
> >
> > $ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
> >
> > Format the array with XFS (use -K to disable initial discard):
> >
> > $ sudo mkfs.xfs -K /dev/md0
> >
> > $ sudo mkdir /mnt/disk
> > $ sudo mount /dev/md0 /mnt/disk
> >
> > Run fstrim:
> >
> > $ sudo fstrim /mnt/disk
> >
> > There are test packages available in the following ppa:
> >
> > https://launchpad.net/~mruffell/+archive/ubuntu/sf414897-test
> >
> > If you install the test kernel, the kernel will no longer panic on fstrim.
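
To tell the two outcomes apart after running fstrim, one option is to scan the
kernel log for the signature from the oops above. This is only a sketch: the
function name is made up, and it simply looks for the "NULL pointer
dereference" line together with a raid10_handle_discard frame, both taken from
the trace in this report.

```shell
#!/bin/sh
# Sketch: succeed (exit 0) if the log text on stdin contains both the NULL
# pointer dereference line and a raid10_handle_discard stack frame.
has_raid10_discard_oops() {
    awk '
        /BUG: kernel NULL pointer dereference/ { bug = 1 }
        /raid10_handle_discard/                { frame = 1 }
        END { exit !(bug && frame) }
    '
}

# Example usage (needs permission to read the kernel log):
#   sudo dmesg | has_raid10_discard_oops && echo "raid10 discard oops found"
```

On the test kernel the check should find nothing after an fstrim of the array.
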
> >
> > [Where problems can occur]
> >
> > This changes io_acct_set from being initialised only for some raid levels
> > (raid 0 and 456) to always being initialised for all raid types.
> >
> > If a regression were to occur, it would likely impact block discard on any raid
> > type, not just raid 10. Raid 10 would carry more risk, as we may be missing
> > more patches: discard on raid10 is comparatively new (added within the last 5
> > or so years), whereas raid 0 and 456 have had full discard for a decade or more.
> >
> > The workarounds would be the same, to disable the systemd block discard timer
> > or disable fstrim.
> >
> > [Other info]
> >
> > Upstream bug:
> > https://lists.linaro.org/archives/list/linux-stable-mirror@lists.linaro.org/thread/TM2PPS3XKE6M5H2FW63MLZV2T7HTM3QJ/
> >
> > Debian bug:
> > https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104460
> >
> > Yu Kuai (1):
> > md: move initialization and destruction of 'io_acct_set' to md.c
> >
> > drivers/md/md.c | 27 ++++++++++-----------------
> > drivers/md/md.h | 2 --
> > drivers/md/raid0.c | 16 ++--------------
> > drivers/md/raid5.c | 41 +++++++++++------------------------------
> > 4 files changed, 23 insertions(+), 63 deletions(-)
> >
> Backport looks good and good test results.
>
> Acked-by: Stefan Bader <stefan.bader at canonical.com>
>
> - Stefan
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
Acked-by: Massimiliano Pellizzer <massimiliano.pellizzer at canonical.com>
--
Massimiliano Pellizzer