APPLIED: [SRU][J][PATCH v4 0/11] Fix hung task when heavily accessing kernfs files

Mehmet Basaran mehmet.basaran at canonical.com
Fri Oct 10 20:18:24 UTC 2025


Applied to jammy:linux master-next branch. Thanks.

-------------- next part --------------
Ghadi Elie Rahme <ghadi.rahme at canonical.com> writes:

> BugLink: https://bugs.launchpad.net/bugs/2125142
>
> [impact]
>
> When running heavy IO operations on sysfs, the kernel will lock up and many hung tasks will be printed in dmesg.
> This issue is due to a bug in the kernfs driver and was fixed upstream through 4 commits:
>
>     06fb4736139f kernfs: change kernfs_rename_lock into a read-write lock.
>     c9f2dfb7b59e kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info.
>     9caf69614225 kernfs: Introduce separate rwsem to protect inode attributes.
>     393c3714081a kernfs: switch global kernfs_rwsem lock to per-fs lock
>
> I have backported these commits and also backported two other required commits that these commit depend on:
>
>     44a41882575b kernfs: dont take i_lock on inode attr read.
>     f44a3ca1e533 kernfs: move struct kernfs_root out of the public view.
>
> 5 other commits are required as well that contain fixes for the above mentioned commits:
>
>     0559f63057f9 kernfs: fix missing kernfs_iattr_rwsem locking
>     72b5d5aef246 kernfs: fix potential NULL dereference in __kernfs_remove
>     ad8d869343ae kernfs: fix NULL dereferencing in kernfs_remove
>     f3a690227f07 kernfs: remove redundant kernfs_rwsem declaration.
>     555a0ce4558d kernfs: prevent early freeing of root node
>
> In total 11 commits are required.
>
> One of the main signs of hitting this bug, is seeing a huge number of tasks in the Uninterruptible state when analyzing the core dump:
>
> crash> ps -S
>   RU: 99
>   UN: 7725
>   IN: 6113
>   ID: 444
>
> Stack traces can be very different based on the application locking up but what they all have in common is that they are waiting for a mutex to unlock:
>
> [308188.415026] INFO: task READACTED:15703 blocked for more than 360 seconds.
> [308188.415362] Not tainted 5.15.0-153-generic #163-Ubuntu
> [308188.415749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [308188.416054] task:REDACTED state:D stack: 0 pid:15703 ppid: 15160 flags:0x00000000
> [308188.416347] Call Trace:
> [308188.416634] <TASK>
> [308188.416917] __schedule+0x24e/0x590
> [308188.417202] schedule+0x69/0x110
> [308188.417506] schedule_preempt_disabled+0xe/0x20
> [308188.417789] __mutex_lock.constprop.0+0x267/0x490
> [308188.418052] ? rtnl_getlink+0x420/0x420
> [308188.418311] __mutex_lock_slowpath+0x13/0x20
> [308188.418575] mutex_lock+0x38/0x50
> [308188.418818] __netlink_dump_start+0xbf/0x2f0
> [308188.419061] ? rtnl_getlink+0x420/0x420
> [308188.419321] rtnetlink_rcv_msg+0x2af/0x400
> [308188.419564] ? rtnl_getlink+0x420/0x420
> [308188.419807] ? rtnl_calcit.isra.0+0x130/0x130
> [308188.420051] netlink_rcv_skb+0x53/0x100
> [308188.420297] rtnetlink_rcv+0x15/0x20
> [308188.420539] netlink_unicast+0x220/0x340
> [308188.420781] netlink_sendmsg+0x24b/0x4c0
> [308188.421023] __sock_sendmsg+0x66/0x70
> [308188.421262] __sys_sendto+0x113/0x190
> [308188.421511] ? __audit_syscall_exit+0x269/0x2d0
> [308188.421754] ? __audit_syscall_entry+0xde/0x120
> [308188.422024] __x64_sys_sendto+0x24/0x30
> [308188.422258] x64_sys_call+0x1bcb/0x1fa0
> [308188.422485] do_syscall_64+0x56/0xb0
> [308188.422711] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
> [308188.422946] RIP: 0033:0x48fbae
> [308188.423196] RSP: 002b:000000c0038b1480 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
> [308188.423454] RAX: ffffffffffffffda RBX: 000000000000008a RCX: 000000000048fbae
> [308188.423694] RDX: 0000000000000011 RSI: 000000c0074b84e0 RDI: 000000000000008a
> [308188.423935] RBP: 000000c0038b14c0 R08: 000000c0074b84d4 R09: 000000000000000c
> [308188.424180] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c0074b84e0
> [308188.424422] R13: 0000000000000000 R14: 000000c0039a2540 R15: 00000000000007c6
> [308188.424667] </TASK>
>
> The issue currently affects Jammy 5.15 kernel and is not present in the 6.8 kernel or above.
>
> [Test Plan]
>
> Reproducing the issue requires a machine that is able to run a high number of applications concurrently that are hammering kernfs with I/O requests.
>
> commit "9caf69614225 kernfs: Introduce separate rwsem to protect inode attributes" contains a C program snippet reproducer.
>
> Although in this loop the sysfs being accessed is for an infiniband device, having a any similar program that reads files from sysfs with multiple instances of the program running, will trigger the issue.
> An example could be reading data from "/sys/class/net/<interface>/statistics/*" instead.
>
> In other words running multiple instances of a program that does IO operations on sysfs would confirm if the issue remains or is resolved. To reliably reproduce or confirm the resolution of the issue, dozens if not hundreds of instances should be executed.
>
> [ Where problems could occur ]
>
> * Since the fix changes the semaphore being used to access kernfs files from one that protects an entire kernfs filesystem to one that protects individual inodes, there is a chance that this might cause race conditions on read/write.
> * This might also cause deadlocks similar to the ones this bug is reporting.
>
> Greg Kroah-Hartman (1):
>   kernfs: move struct kernfs_root out of the public view.
>
> Ian Kent (2):
>   kernfs: dont take i_lock on inode attr read
>   kernfs: fix missing kernfs_iattr_rwsem locking
>
> Imran Khan (4):
>   kernfs: Introduce separate rwsem to protect inode attributes.
>   kernfs: Use a per-fs rwsem to protect per-fs list of
>     kernfs_super_info.
>   kernfs: change kernfs_rename_lock into a read-write lock.
>   kernfs: remove redundant kernfs_rwsem declaration.
>
> Minchan Kim (3):
>   kernfs: switch global kernfs_rwsem lock to per-fs lock
>   kernfs: prevent early freeing of root node
>   kernfs: fix NULL dereferencing in kernfs_remove
>
> Yushan Zhou (1):
>   kernfs: fix potential NULL dereference in __kernfs_remove
>
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   4 +-
>  fs/kernfs/dir.c                        | 169 ++++++++++++++++---------
>  fs/kernfs/file.c                       |   6 +-
>  fs/kernfs/inode.c                      |  26 ++--
>  fs/kernfs/kernfs-internal.h            |  21 ++-
>  fs/kernfs/mount.c                      |  15 ++-
>  fs/kernfs/symlink.c                    |   5 +-
>  fs/sysfs/mount.c                       |   2 +-
>  include/linux/kernfs.h                 |   6 +
>  kernel/cgroup/cgroup.c                 |   4 +-
>  10 files changed, 173 insertions(+), 85 deletions(-)
>
> -- 
> 2.34.1
>
>
> -- 
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 873 bytes
Desc: not available
URL: <https://lists.ubuntu.com/archives/kernel-team/attachments/20251010/3d24991d/attachment.sig>


More information about the kernel-team mailing list