[Bug 2062568] Re: nfsd gets unresponsive after some hours of operation
Jeff
2062568 at bugs.launchpad.net
Sun Apr 21 22:42:03 UTC 2024
This kind of issue appeared with Ubuntu 23.10 for me on the server
mostly using an HDD for bulk storage with a not exactly powerful CPU
also being occupied with using WireGuard to secure the NFS connection.
Mentioning the performance details because I have a feeling they matter. An also not exactly high performance client connecting over 1 Gb/s only very occasionally caused this problem, however given a 10 Gb/s connection, the issue appeared significantly more commonly. A higher performance setup utilizing a 2.5 Gb/s connection triggered this bug in a couple of days after setup.
The lockup always seem to occur with heavy NFS usage, suspiciously mostly when there's both reading and writing going on, at least I don't recall it happening with reading only, but I'm not confident in stating it didn't happen with a writing only load.
Found this bug report by the client error message, server side differs due to the different version:
```
[300146.046666] INFO: task nfsd:1426 blocked for more than 241 seconds.
[300146.046732] Not tainted 6.5.0-27-generic #28~22.04.1-Ubuntu
[300146.046770] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[300146.046813] task:nfsd state:D stack:0 pid:1426 ppid:2 flags:0x00004000
[300146.046827] Call Trace:
[300146.046832] <TASK>
[300146.046839] __schedule+0x2cb/0x750
[300146.046860] schedule+0x63/0x110
[300146.046870] schedule_timeout+0x157/0x170
[300146.046881] wait_for_completion+0x88/0x150
[300146.046894] __flush_workqueue+0x140/0x3e0
[300146.046908] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
[300146.047074] nfsd4_destroy_session+0x193/0x260 [nfsd]
[300146.047219] nfsd4_proc_compound+0x3b7/0x770 [nfsd]
[300146.047365] nfsd_dispatch+0xbf/0x1d0 [nfsd]
[300146.047497] svc_process_common+0x420/0x6e0 [sunrpc]
[300146.047695] ? __pfx_read_tsc+0x10/0x10
[300146.047706] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[300146.047848] ? __pfx_nfsd+0x10/0x10 [nfsd]
[300146.047977] svc_process+0x132/0x1b0 [sunrpc]
[300146.048157] nfsd+0xdc/0x1c0 [nfsd]
[300146.048287] kthread+0xf2/0x120
[300146.048299] ? __pfx_kthread+0x10/0x10
[300146.048310] ret_from_fork+0x47/0x70
[300146.048321] ? __pfx_kthread+0x10/0x10
[300146.048331] ret_from_fork_asm+0x1b/0x30
[300146.048341] </TASK>
```
This seems to be matching, but the previous lockups experienced may have been somewhat different.
I mostly remember the client whining about the server not responding instead of the message presented here, and the server call trace used to have btrfs in it which made me suspect it may be exclusive to that, although the issue was always with NFS, nothing else locked up despite having some other sources of heavy I/O.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/2062568
Title:
nfsd gets unresponsive after some hours of operation
Status in nfs-utils package in Ubuntu:
Confirmed
Bug description:
I installed the 24.04 Beta on two test machines that were running
22.04 without issues before. One of them exports two volumes that are
mounted by the other machine, which primarily uses them as a secondary
storage for ccache.
After being up for a couple of hours (happened twice since yesterday
evening) it seems that nfsd on the machine exporting the volumes hangs
on something.
From dmesg on the server (repeated a few times):
[11183.290548] INFO: task nfsd:1419 blocked for more than 1228 seconds.
[11183.290558] Not tainted 6.8.0-22-generic #22-Ubuntu
[11183.290563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11183.290582] task:nfsd state:D stack:0 pid:1419 tgid:1419 ppid:2 flags:0x00004000
[11183.290587] Call Trace:
[11183.290602] <TASK>
[11183.290606] __schedule+0x27c/0x6b0
[11183.290612] schedule+0x33/0x110
[11183.290615] schedule_timeout+0x157/0x170
[11183.290619] wait_for_completion+0x88/0x150
[11183.290623] __flush_workqueue+0x140/0x3e0
[11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
[11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
[11183.290744] nfsd4_proc_compound+0x3af/0x770 [nfsd]
[11183.290798] nfsd_dispatch+0xd4/0x220 [nfsd]
[11183.290851] svc_process_common+0x44d/0x710 [sunrpc]
[11183.290924] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[11183.290976] svc_process+0x132/0x1b0 [sunrpc]
[11183.291041] svc_handle_xprt+0x4d3/0x5d0 [sunrpc]
[11183.291105] svc_recv+0x18b/0x2e0 [sunrpc]
[11183.291168] ? __pfx_nfsd+0x10/0x10 [nfsd]
[11183.291220] nfsd+0x8b/0xe0 [nfsd]
[11183.291270] kthread+0xef/0x120
[11183.291274] ? __pfx_kthread+0x10/0x10
[11183.291276] ret_from_fork+0x44/0x70
[11183.291279] ? __pfx_kthread+0x10/0x10
[11183.291281] ret_from_fork_asm+0x1b/0x30
[11183.291286] </TASK>
From dmesg on the client (repeated a number of times):
[ 6596.911785] RPC: Could not send backchannel reply error: -110
[ 6596.972490] RPC: Could not send backchannel reply error: -110
[ 6837.281307] RPC: Could not send backchannel reply error: -110
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: nfs-kernel-server 1:2.6.4-3ubuntu5
ProcVersionSignature: Ubuntu 6.8.0-22.22-generic 6.8.1
Uname: Linux 6.8.0-22-generic x86_64
.etc.request-key.d.id_resolver.conf: create id_resolver * * /usr/sbin/nfsidmap -t 600 %k %d
ApportVersion: 2.28.1-0ubuntu1
Architecture: amd64
CasperMD5CheckResult: pass
Date: Fri Apr 19 14:10:25 2024
InstallationDate: Installed on 2024-04-16 (3 days ago)
InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Beta amd64 (20240410.1)
NFSMounts:
NFSv4Mounts:
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm-256color
XDG_RUNTIME_DIR=<set>
SourcePackage: nfs-utils
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568/+subscriptions
More information about the foundations-bugs
mailing list