APPLIED: [SRU][J][PATCH 0/1] raid10: block discard causes a NULL pointer dereference after 5.15.0-144-generic
Stefan Bader
stefan.bader at canonical.com
Tue Jul 22 15:07:31 UTC 2025
On 22.07.25 06:10, Matthew Ruffell wrote:
> BugLink: https://bugs.launchpad.net/bugs/2117395
>
> [Impact]
>
> The commit below was backported to 5.15.181 -stable and introduced a NULL
> pointer dereference in raid10 discard handling, because io_acct_set is only
> initialised for raid 0 and 456, and not for raid 1 or 10.
>
> commit d05af90d6218e9c8f1c2026990c3f53c1b41bfb0
> Author: Yu Kuai <yukuai3 at huawei.com>
> Date: Tue Mar 25 09:57:46 2025 +0800
> Subject: md/raid10: fix missing discard IO accounting
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d05af90d6218e9c8f1c2026990c3f53c1b41bfb0
>
> Kernel oops:
>
> kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
> kernel: #PF: supervisor instruction fetch in kernel mode
> kernel: #PF: error_code(0x0010) - not-present page
> kernel: PGD 0 P4D 0
> kernel: Oops: 0010 [#1] SMP PTI
> kernel: CPU: 5 PID: 784107 Comm: fstrim Not tainted 5.15.0-144-generic #157-Ubuntu
> kernel: RIP: 0010:0x0
> kernel: Code: Unable to access opcode bytes at RIP 0xffffffffffffffd6.
> kernel: RSP: 0018:ffffb576409c7858 EFLAGS: 00010206
> kernel: RAX: 0000000000000000 RBX: 0000000000092800 RCX: 0000000000000001
> kernel: RDX: ffff8e7e012426f0 RSI: 0000000000000000 RDI: 0000000000092800
> kernel: RBP: ffffb576409c78c8 R08: ffff8e884ec966c0 R09: ffff8e7e07c6b050
> kernel: R10: 0000000000002ecb R11: 00000000000030c8 R12: 0000000000092c00
> kernel: R13: 0000000000000400 R14: ffff8e7e01242708 R15: ffff8e7e10743400
> kernel: FS: 00007f6fff9f0800(0000) GS:ffff8e8cee540000(0000) knlGS:0000000000000000
> kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: CR2: ffffffffffffffd6 CR3: 00000001090f6005 CR4: 00000000003706e0
> kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> kernel: DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> kernel: Call Trace:
> kernel: <TASK>
> kernel: mempool_alloc+0x61/0x1b0
> kernel: ? __kmalloc+0x179/0x330
> kernel: bio_alloc_bioset+0x9d/0x370
> kernel: ? r10bio_pool_alloc+0x26/0x30 [raid10]
> kernel: bio_clone_fast+0x1f/0x90
> kernel: md_account_bio+0x42/0x80
> kernel: raid10_handle_discard+0x56f/0x6b0 [raid10]
> kernel: raid10_make_request+0x147/0x180 [raid10]
> kernel: md_handle_request+0x12a/0x1b0
> kernel: ? submit_bio_checks+0x1a5/0x580
> kernel: md_submit_bio+0x76/0xc0
> kernel: __submit_bio+0x1a2/0x220
> kernel: ? mempool_alloc_slab+0x17/0x20
> kernel: ? mempool_alloc+0x61/0x1b0
> kernel: ? schedule_timeout+0x91/0x140
> kernel: __submit_bio_noacct+0x85/0x200
> kernel: submit_bio_noacct+0x4e/0x120
> kernel: ? __cond_resched+0x1a/0x60
> kernel: submit_bio+0x4a/0x130
> kernel: submit_bio_wait+0x5a/0xc0
> kernel: blkdev_issue_discard+0x7e/0xd0
> kernel: ext4_try_to_trim_range+0x2db/0x520
> kernel: ? ext4_mb_load_buddy_gfp+0x91/0x3e0
> kernel: ext4_trim_fs+0x313/0x510
> kernel: __ext4_ioctl+0x82c/0xef0
> kernel: ext4_ioctl+0xe/0x20
> kernel: __x64_sys_ioctl+0x92/0xd0
> kernel: x64_sys_call+0x1e5f/0x1fa0
> kernel: do_syscall_64+0x56/0xb0
> kernel: entry_SYSCALL_64_after_hwframe+0x6c/0xd6
>
> A workaround is to disable the systemd weekly fstrim timer and to not fstrim /
> discard blocks while the problem exists.
>
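A minimal sketch of the workaround described above, assuming the weekly trim is driven by the stock fstrim.timer unit. The run helper and the DRY_RUN variable are illustrative names, not part of systemd; by default the commands are only printed.

```shell
#!/bin/sh
# Sketch: stop the periodic trim so nothing issues discards to the array.
# DRY_RUN defaults to 1, so commands are printed rather than executed;
# set DRY_RUN=0 to actually run them. (DRY_RUN and run are hypothetical
# helpers for this example only.)
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

disable_fstrim_timer() {
    # Disable the timer and stop it if it is currently active.
    run sudo systemctl disable --now fstrim.timer
    # Report the resulting enablement state.
    run systemctl is-enabled fstrim.timer
}

disable_fstrim_timer
```

Once a fixed kernel is installed, the timer can be restored with systemctl enable --now fstrim.timer.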
> [Fix]
>
> The below necessary commit was mainlined in 6.6-rc1 and needs to be backported
> to jammy.
>
> commit c567c86b90d4715081adfe5eb812141a5b6b4883
> Author: Yu Kuai <yukuai3 at huawei.com>
> Date: Thu Jun 22 00:51:03 2023 +0800
> Subject: md: move initialization and destruction of 'io_acct_set' to md.c
> Link: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=c567c86b90d4715081adfe5eb812141a5b6b4883
>
> This needs a minor backport, adjusting __md_stop() to md_stop().
>
> [Testcase]
>
> You will need a machine with at least 4x NVMe drives which support block
> discard. I use an i3.8xlarge instance on AWS, since it has all of these things.
>
> $ lsblk
> xvda 202:0 0 8G 0 disk
> └─xvda1 202:1 0 8G 0 part /
> nvme0n1 259:2 0 1.7T 0 disk
> nvme1n1 259:0 0 1.7T 0 disk
> nvme2n1 259:1 0 1.7T 0 disk
> nvme3n1 259:3 0 1.7T 0 disk
>
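Before building the array, it may be worth confirming that each drive really advertises discard. The sketch below reads queue/discard_max_bytes from sysfs and treats a nonzero value as support; SYSFS_ROOT is a hypothetical override added only so the function can be tried against a fake tree, and the device names are the ones from the lsblk output above.

```shell
#!/bin/sh
# Sketch: report whether a block device advertises discard support.
# SYSFS_ROOT is an illustrative override; on a real machine it defaults to /sys.
supports_discard() {
    dev=$1
    f="${SYSFS_ROOT:-/sys}/block/$dev/queue/discard_max_bytes"
    # A missing file or a value of 0 means the device cannot discard.
    [ -r "$f" ] && [ "$(cat "$f")" -gt 0 ]
}

for dev in nvme0n1 nvme1n1 nvme2n1 nvme3n1; do
    if supports_discard "$dev"; then
        echo "$dev: discard supported"
    else
        echo "$dev: no discard support"
    fi
done
```

Running lsblk --discard should show the same information in its DISC-GRAN and DISC-MAX columns.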
> Create a Raid10 array:
>
> $ sudo mdadm --create --verbose /dev/md0 --level=10 --raid-devices=4 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
>
> Format the array with XFS (use -K to disable initial discard):
>
> $ sudo mkfs.xfs -K /dev/md0
>
> $ sudo mkdir /mnt/disk
> $ sudo mount /dev/md0 /mnt/disk
>
> Run fstrim:
>
> $ sudo fstrim /mnt/disk
>
> There are test packages available in the following ppa:
>
> https://launchpad.net/~mruffell/+archive/ubuntu/sf414897-test
>
> If you install the test kernel, the kernel will no longer panic on fstrim.
>
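To tell an affected kernel from a fixed one after running the testcase, the kernel log can be searched for the oops signature. This is only a sketch: the function name is made up for this example, and it simply looks for two strings taken from the trace above in whatever log text arrives on stdin.

```shell
#!/bin/sh
# Sketch: detect the oops from this report in kernel log text read from stdin.
# Succeeds only if both the NULL dereference line and the raid10 discard
# frame are present.
has_raid10_discard_oops() {
    log=$(cat)
    printf '%s\n' "$log" | grep -q 'BUG: kernel NULL pointer dereference' &&
        printf '%s\n' "$log" | grep -q 'raid10_handle_discard'
}

# Demonstration on two lines copied from the trace in this report.
sample='kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: raid10_handle_discard+0x56f/0x6b0 [raid10]'

if printf '%s\n' "$sample" | has_raid10_discard_oops; then
    echo "oops signature found"
else
    echo "no oops signature"
fi
```

On the test machine this would be fed from the live log, for example sudo dmesg | has_raid10_discard_oops. If the machine had to be rebooted, journalctl -k -b -1 should provide the previous boot's kernel messages, assuming the journal is persistent.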
> [Where problems can occur]
>
> This changes io_acct_set from being initialised only for some raid types
> (raid 0 and 456) to always being initialised for all raid types.
>
> If a regression were to occur, it would likely impact block discard on any raid
> type, not just raid 10. Raid 10 carries more risk, though, as we may be missing
> further patches: discard on raid10 is relatively new (roughly the last 5
> years), whereas raid 0 and 456 have had full discard for a decade or more.
>
> The workarounds would be the same, to disable the systemd block discard timer
> or disable fstrim.
>
> [Other info]
>
> Upstream bug:
> https://lists.linaro.org/archives/list/linux-stable-mirror@lists.linaro.org/thread/TM2PPS3XKE6M5H2FW63MLZV2T7HTM3QJ/
>
> Debian bug:
> https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1104460
>
> Yu Kuai (1):
> md: move initialization and destruction of 'io_acct_set' to md.c
>
> drivers/md/md.c | 27 ++++++++++-----------------
> drivers/md/md.h | 2 --
> drivers/md/raid0.c | 16 ++--------------
> drivers/md/raid5.c | 41 +++++++++++------------------------------
> 4 files changed, 23 insertions(+), 63 deletions(-)
>
Applied to jammy:linux in s2025.06.16. Thanks.
-Stefan