NACK: [SRU][J][PATCH v3 0/6] Fix hung task when heavily accessing kernfs files

Alessio Faina alessio.faina at canonical.com
Wed Oct 1 07:18:06 UTC 2025


v4 submitted

On Fri, Sep 19, 2025 at 03:52:57PM +0000, ghadi.rahme at canonical.com wrote:
> From: Ghadi Elie Rahme <ghadi.rahme at canonical.com>
> 
> BugLink: https://bugs.launchpad.net/bugs/2125142
> 
> [impact]
> 
> When running heavy I/O operations on sysfs, the kernel locks up and many hung-task warnings are printed in dmesg.
> This is caused by a bug in kernfs that was fixed upstream by the following 4 commits:
> 
>     06fb4736139f kernfs: change kernfs_rename_lock into a read-write lock.
>     c9f2dfb7b59e kernfs: Use a per-fs rwsem to protect per-fs list of kernfs_super_info.
>     9caf69614225 kernfs: Introduce separate rwsem to protect inode attributes.
>     393c3714081a kernfs: switch global kernfs_rwsem lock to per-fs lock
> 
> I have backported these commits, together with two other prerequisite commits that they depend on:
> 
> 44a41882575b kernfs: dont take i_lock on inode attr read.
> f44a3ca1e533 kernfs: move struct kernfs_root out of the public view.
> 
> One of the main signs of hitting this bug is a huge number of tasks in the uninterruptible (UN) state when analyzing
> the core dump:
> 
> crash> ps -S
>   RU: 99
>   UN: 7725
>   IN: 6113
>   ID: 444
> 
> Stack traces can differ considerably depending on which application locks up, but they all have one thing in common:
> the task is blocked waiting for a mutex:
> 
> [308188.415026] INFO: task REDACTED:15703 blocked for more than 360 seconds.
> [308188.415362] Not tainted 5.15.0-153-generic #163-Ubuntu
> [308188.415749] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> [308188.416054] task:REDACTED state:D stack: 0 pid:15703 ppid: 15160 flags:0x00000000
> [308188.416347] Call Trace:
> [308188.416634] <TASK>
> [308188.416917] __schedule+0x24e/0x590
> [308188.417202] schedule+0x69/0x110
> [308188.417506] schedule_preempt_disabled+0xe/0x20
> [308188.417789] __mutex_lock.constprop.0+0x267/0x490
> [308188.418052] ? rtnl_getlink+0x420/0x420
> [308188.418311] __mutex_lock_slowpath+0x13/0x20
> [308188.418575] mutex_lock+0x38/0x50
> [308188.418818] __netlink_dump_start+0xbf/0x2f0
> [308188.419061] ? rtnl_getlink+0x420/0x420
> [308188.419321] rtnetlink_rcv_msg+0x2af/0x400
> [308188.419564] ? rtnl_getlink+0x420/0x420
> [308188.419807] ? rtnl_calcit.isra.0+0x130/0x130
> [308188.420051] netlink_rcv_skb+0x53/0x100
> [308188.420297] rtnetlink_rcv+0x15/0x20
> [308188.420539] netlink_unicast+0x220/0x340
> [308188.420781] netlink_sendmsg+0x24b/0x4c0
> [308188.421023] __sock_sendmsg+0x66/0x70
> [308188.421262] __sys_sendto+0x113/0x190
> [308188.421511] ? __audit_syscall_exit+0x269/0x2d0
> [308188.421754] ? __audit_syscall_entry+0xde/0x120
> [308188.422024] __x64_sys_sendto+0x24/0x30
> [308188.422258] x64_sys_call+0x1bcb/0x1fa0
> [308188.422485] do_syscall_64+0x56/0xb0
> [308188.422711] entry_SYSCALL_64_after_hwframe+0x6c/0xd6
> [308188.422946] RIP: 0033:0x48fbae
> [308188.423196] RSP: 002b:000000c0038b1480 EFLAGS: 00000216 ORIG_RAX: 000000000000002c
> [308188.423454] RAX: ffffffffffffffda RBX: 000000000000008a RCX: 000000000048fbae
> [308188.423694] RDX: 0000000000000011 RSI: 000000c0074b84e0 RDI: 000000000000008a
> [308188.423935] RBP: 000000c0038b14c0 R08: 000000c0074b84d4 R09: 000000000000000c
> [308188.424180] R10: 0000000000000000 R11: 0000000000000216 R12: 000000c0074b84e0
> [308188.424422] R13: 0000000000000000 R14: 000000c0039a2540 R15: 00000000000007c6
> [308188.424667] </TASK>
> 
> The issue currently affects the Jammy 5.15 kernel and is not present in kernels 6.8 or later.
> 
> [Test Plan]
> 
> Reproducing the issue requires a machine capable of running a high number of applications concurrently, all hammering
> kernfs with I/O requests.
> 
> commit "9caf69614225 kernfs: Introduce separate rwsem to protect inode attributes" contains a C program snippet reproducer.
> 
> Although in that snippet the sysfs files being accessed belong to an InfiniBand device, any similar program that reads
> files from sysfs, with multiple instances of the program running, will trigger the issue.
> An example could be reading data from "/sys/class/net/<interface>/statistics/*" instead.
> 
> In other words, running multiple instances of a program that performs I/O operations on sysfs will confirm whether the
> issue remains or is resolved. To reliably reproduce the issue, or to confirm its resolution, dozens if not hundreds of
> instances should be executed.
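>
> A minimal sketch of such a program follows. This is not the snippet from the referenced commit; the file name, sysfs
> path and instance count are only examples. Each instance open()s, read()s and close()s a sysfs file in a tight loop,
> and many copies are launched in parallel:
>
>     /*
>      * sysfs-hammer.c - illustrative reproducer sketch (hypothetical name),
>      * not the snippet shipped with the upstream commit.
>      *
>      * Build:  gcc -O2 -o sysfs-hammer sysfs-hammer.c
>      * Run:    for i in $(seq 200); do
>      *             ./sysfs-hammer /sys/class/net/eth0/statistics/rx_bytes &
>      *         done
>      */
>     #include <fcntl.h>
>     #include <stdio.h>
>     #include <stdlib.h>
>     #include <unistd.h>
>
>     int main(int argc, char **argv)
>     {
>         /* Default to a network statistics file; any sysfs file works. */
>         const char *path = argc > 1 ? argv[1]
>                                     : "/sys/class/net/eth0/statistics/rx_bytes";
>         char buf[256];
>
>         for (;;) {
>             int fd = open(path, O_RDONLY);
>
>             if (fd < 0) {
>                 perror("open");
>                 return EXIT_FAILURE;
>             }
>             if (read(fd, buf, sizeof(buf)) < 0)
>                 perror("read");
>             close(fd);
>         }
>     }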
> 
> [ Where problems could occur ]
> 
> * Since the fix changes the locking used to access kernfs files from a single global semaphore to a per-filesystem lock
> plus a separate rwsem protecting inode attributes, there is a chance that this might cause race conditions on
> read/write (an illustrative sketch follows this list).
> * It might also cause deadlocks similar to the ones this bug is reporting.
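>
> The sketch below is purely illustrative (hypothetical names, userspace pthread locks, not the actual kernfs code). It
> only shows the general pattern of the series: replacing one global rwsem with a lock embedded in each filesystem root
> removes cross-filesystem contention, but also means paths that previously serialized on a single lock can now run
> concurrently.
>
>     /* lock-granularity.c - illustrative only, not kernel code. */
>     #include <pthread.h>
>     #include <stdio.h>
>
>     /* Before the series: one global rwsem serializes every kernfs filesystem. */
>     static pthread_rwlock_t global_rwsem = PTHREAD_RWLOCK_INITIALIZER;
>
>     /* After the series: each root carries its own rwsem (and a separate rwsem
>      * protects inode attributes), so unrelated filesystems no longer contend. */
>     struct fs_root {
>         pthread_rwlock_t rwsem;
>         long attr;
>     };
>
>     static long read_attr_global(struct fs_root *root)
>     {
>         pthread_rwlock_rdlock(&global_rwsem);   /* contended by all filesystems */
>         long v = root->attr;
>         pthread_rwlock_unlock(&global_rwsem);
>         return v;
>     }
>
>     static long read_attr_per_fs(struct fs_root *root)
>     {
>         pthread_rwlock_rdlock(&root->rwsem);    /* contends only within this fs */
>         long v = root->attr;
>         pthread_rwlock_unlock(&root->rwsem);
>         return v;
>     }
>
>     int main(void)
>     {
>         struct fs_root root = { .rwsem = PTHREAD_RWLOCK_INITIALIZER, .attr = 42 };
>
>         printf("%ld %ld\n", read_attr_global(&root), read_attr_per_fs(&root));
>         return 0;
>     }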
> 
> Greg Kroah-Hartman (1):
>   kernfs: move struct kernfs_root out of the public view.
> 
> Ian Kent (1):
>   kernfs: dont take i_lock on inode attr read
> 
> Imran Khan (3):
>   kernfs: Introduce separate rwsem to protect inode attributes.
>   kernfs: Use a per-fs rwsem to protect per-fs list of
>     kernfs_super_info.
>   kernfs: change kernfs_rename_lock into a read-write lock.
> 
> Minchan Kim (1):
>   kernfs: switch global kernfs_rwsem lock to per-fs lock
> 
>  arch/x86/kernel/cpu/resctrl/rdtgroup.c |   4 +-
>  fs/kernfs/dir.c                        | 145 ++++++++++++++++---------
>  fs/kernfs/file.c                       |   6 +-
>  fs/kernfs/inode.c                      |  26 +++--
>  fs/kernfs/kernfs-internal.h            |  20 ++++
>  fs/kernfs/mount.c                      |  15 ++-
>  fs/kernfs/symlink.c                    |   5 +-
>  fs/sysfs/mount.c                       |   2 +-
>  include/linux/kernfs.h                 |   6 +
>  kernel/cgroup/cgroup.c                 |   4 +-
>  10 files changed, 152 insertions(+), 81 deletions(-)
> 
> -- 
> 2.34.1
> 
> 
> -- 
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team


