[Bug 2062568] Re: nfsd gets unresponsive after some hours of operation

Stefan 2062568 at bugs.launchpad.net
Sat Aug 31 15:09:03 UTC 2024


Confirmed on 24.04.1 and previously on 23.10 (both server and client),
also using large files (1-100GB) and 10G networking to large/fast disk
arrays, which others have suggested to be a key factor. All mountpoints
are running BTRFS (in some cases a brand new filesystem) without any
LUKS.

My observations about throughput also match, e.g.:
- Host B, as a client of host A's NFS server, generates heavy traffic.
  This mount fails after ~12 hours, and server A has to be rebooted to
  recover.
- Host A, as a client of host B's NFS server, generates little traffic.
  This mount has not failed, even when neither server has been rebooted
  for days.

I have amended both nfs.conf and fstab on all machines to force NFSv3
only, as a workaround until there's a more permanent fix or we migrate
to Debian.
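
To confirm that the workaround actually took effect on a client, one
option is to look at the negotiated version in /proc/mounts. The
following is only a rough sketch in Python: it assumes the usual
/proc/mounts layout (device, mountpoint, fstype, options) and that the
version appears as a vers= or nfsvers= option. The host names and paths
in SAMPLE are made up for illustration.

```python
def nfs_mount_versions(mounts_text):
    """Map each NFS mountpoint to its version option, from /proc/mounts-style text."""
    versions = {}
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        _device, mountpoint, fstype, options = fields[:4]
        if fstype not in ("nfs", "nfs4"):
            continue  # skip non-NFS filesystems
        vers = None
        for opt in options.split(","):
            if opt.startswith(("vers=", "nfsvers=")):
                vers = opt.split("=", 1)[1]
        versions[mountpoint] = vers
    return versions


# Hypothetical sample; on a real client, read /proc/mounts instead.
SAMPLE = """\
hostA:/export/data /mnt/data nfs rw,relatime,vers=3,proto=tcp 0 0
hostA:/export/scratch /mnt/scratch nfs4 rw,relatime,vers=4.2,proto=tcp 0 0
/dev/sda1 / btrfs rw,relatime 0 0
"""

if __name__ == "__main__":
    for mountpoint, vers in nfs_mount_versions(SAMPLE).items():
        status = "NFSv3" if vers == "3" else "not NFSv3"
        print(f"{mountpoint}: vers={vers} ({status})")
```

Any mount that still reports a 4.x version has not picked up the
workaround and would presumably remain exposed to the hang.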

 <TASK>
 __schedule+0x27c/0x6b0
 ? __smp_call_single_queue+0xfd/0x180
 schedule+0x33/0x110
 schedule_timeout+0x157/0x170
 wait_for_completion+0x88/0x150
 __flush_workqueue+0x140/0x3e0
 ? nfsd4_run_cb+0x30/0x70 [nfsd]
 nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
 nfsd4_destroy_session+0x186/0x260 [nfsd]
 nfsd4_proc_compound+0x3b7/0x780 [nfsd]
 nfsd_dispatch+0xd7/0x220 [nfsd]
 svc_process_common+0x450/0x710 [sunrpc]
 ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
 svc_process+0x132/0x1b0 [sunrpc]
 svc_handle_xprt+0x4d3/0x5d0 [sunrpc]
 svc_recv+0x18b/0x2e0 [sunrpc]
 ? __pfx_nfsd+0x10/0x10 [nfsd]
 nfsd+0x8b/0xe0 [nfsd]
 kthread+0xf2/0x120
 ? __pfx_kthread+0x10/0x10
 ret_from_fork+0x47/0x70
 ? __pfx_kthread+0x10/0x10
 ret_from_fork_asm+0x1b/0x30
 </TASK>
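
For anyone comparing their own hang against this one, the frames that
matter seem to be __flush_workqueue, nfsd4_probe_callback_sync and
nfsd4_destroy_session, which appear in that order in both this trace
and the one in the original report. Below is a small Python sketch that
checks a dmesg excerpt for that sequence. The regular expression and
the choice of these three frames as a "signature" are my own
assumptions, not anything defined by the kernel. Frames prefixed with
"?" are skipped because the unwinder marks them as unreliable.

```python
import re

# Innermost frame first, as printed in the traces in this report.
SIGNATURE = ("__flush_workqueue", "nfsd4_probe_callback_sync", "nfsd4_destroy_session")

# Optional "[ 1234.567890]" timestamp, optional "? " marker, then symbol+0xoff/0xsize.
FRAME_RE = re.compile(
    r"^\s*(?:\[\s*\d+\.\d+\]\s*)?(\?\s+)?([A-Za-z_][\w.]*)\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def reliable_frames(trace_text):
    """Return symbol names of stack frames that are not marked with '?'."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m and not m.group(1):
            frames.append(m.group(2))
    return frames


def matches_signature(trace_text, signature=SIGNATURE):
    """True if the signature frames occur in order (gaps between them allowed)."""
    remaining = iter(reliable_frames(trace_text))
    return all(name in remaining for name in signature)


# Excerpt of the trace above.
SAMPLE = """\
 wait_for_completion+0x88/0x150
 __flush_workqueue+0x140/0x3e0
 ? nfsd4_run_cb+0x30/0x70 [nfsd]
 nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
 nfsd4_destroy_session+0x186/0x260 [nfsd]
 nfsd4_proc_compound+0x3b7/0x780 [nfsd]
"""

if __name__ == "__main__":
    print(matches_signature(SAMPLE))
```

The same function accepts lines with dmesg timestamps, so the output of
dmesg can be fed to it without stripping anything first.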

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/2062568

Title:
  nfsd gets unresponsive after some hours of operation

Status in nfs-utils package in Ubuntu:
  Confirmed

Bug description:
  I installed the 24.04 Beta on two test machines that were running
  22.04 without issues before. One of them exports two volumes that are
  mounted by the other machine, which primarily uses them as a secondary
  storage for ccache.

  After being up for a couple of hours (happened twice since yesterday
  evening) it seems that nfsd on the machine exporting the volumes hangs
  on something.

  From dmesg on the server (repeated a few times):

  [11183.290548] INFO: task nfsd:1419 blocked for more than 1228 seconds.
  [11183.290558]       Not tainted 6.8.0-22-generic #22-Ubuntu
  [11183.290563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
  [11183.290582] task:nfsd            state:D stack:0     pid:1419  tgid:1419  ppid:2      flags:0x00004000
  [11183.290587] Call Trace:
  [11183.290602]  <TASK>
  [11183.290606]  __schedule+0x27c/0x6b0
  [11183.290612]  schedule+0x33/0x110
  [11183.290615]  schedule_timeout+0x157/0x170
  [11183.290619]  wait_for_completion+0x88/0x150
  [11183.290623]  __flush_workqueue+0x140/0x3e0
  [11183.290629]  nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
  [11183.290689]  nfsd4_destroy_session+0x186/0x260 [nfsd]
  [11183.290744]  nfsd4_proc_compound+0x3af/0x770 [nfsd]
  [11183.290798]  nfsd_dispatch+0xd4/0x220 [nfsd]
  [11183.290851]  svc_process_common+0x44d/0x710 [sunrpc]
  [11183.290924]  ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
  [11183.290976]  svc_process+0x132/0x1b0 [sunrpc]
  [11183.291041]  svc_handle_xprt+0x4d3/0x5d0 [sunrpc]
  [11183.291105]  svc_recv+0x18b/0x2e0 [sunrpc]
  [11183.291168]  ? __pfx_nfsd+0x10/0x10 [nfsd]
  [11183.291220]  nfsd+0x8b/0xe0 [nfsd]
  [11183.291270]  kthread+0xef/0x120
  [11183.291274]  ? __pfx_kthread+0x10/0x10
  [11183.291276]  ret_from_fork+0x44/0x70
  [11183.291279]  ? __pfx_kthread+0x10/0x10
  [11183.291281]  ret_from_fork_asm+0x1b/0x30
  [11183.291286]  </TASK>

  From dmesg on the client (repeated a number of times):
  [ 6596.911785] RPC: Could not send backchannel reply error: -110
  [ 6596.972490] RPC: Could not send backchannel reply error: -110
  [ 6837.281307] RPC: Could not send backchannel reply error: -110
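
  The -110 in these messages is a negated Linux errno, and 110 is
  ETIMEDOUT on Linux, i.e. the client timed out while sending its
  backchannel reply. A rough Python sketch for pulling these events out
  of a dmesg capture follows. The regular expression is an assumption
  based only on the three lines above, and the errno table is
  deliberately limited to the one value seen here.

```python
import re

# Linux errno values seen in this report; 110 is ETIMEDOUT on Linux.
LINUX_ERRNO = {110: "ETIMEDOUT"}

BACKCHANNEL_RE = re.compile(
    r"^\s*\[\s*(\d+\.\d+)\]\s+RPC: Could not send backchannel reply error: (-?\d+)"
)


def backchannel_errors(dmesg_text):
    """Return (timestamp_seconds, errno_name) for each backchannel reply failure."""
    events = []
    for line in dmesg_text.splitlines():
        m = BACKCHANNEL_RE.match(line)
        if m:
            code = abs(int(m.group(2)))  # kernel reports errno values negated
            events.append((float(m.group(1)), LINUX_ERRNO.get(code, f"errno {code}")))
    return events


# The three client-side lines quoted above.
SAMPLE = """\
  [ 6596.911785] RPC: Could not send backchannel reply error: -110
  [ 6596.972490] RPC: Could not send backchannel reply error: -110
  [ 6837.281307] RPC: Could not send backchannel reply error: -110
"""

if __name__ == "__main__":
    for timestamp, name in backchannel_errors(SAMPLE):
        print(f"{timestamp:.6f}s {name}")
```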

  ProblemType: Bug
  DistroRelease: Ubuntu 24.04
  Package: nfs-kernel-server 1:2.6.4-3ubuntu5
  ProcVersionSignature: Ubuntu 6.8.0-22.22-generic 6.8.1
  Uname: Linux 6.8.0-22-generic x86_64
  .etc.request-key.d.id_resolver.conf: create	id_resolver	*	*	/usr/sbin/nfsidmap -t 600 %k %d
  ApportVersion: 2.28.1-0ubuntu1
  Architecture: amd64
  CasperMD5CheckResult: pass
  Date: Fri Apr 19 14:10:25 2024
  InstallationDate: Installed on 2024-04-16 (3 days ago)
  InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Beta amd64 (20240410.1)
  NFSMounts:

  NFSv4Mounts:

  ProcEnviron:
   LANG=en_US.UTF-8
   PATH=(custom, no user)
   SHELL=/bin/bash
   TERM=xterm-256color
   XDG_RUNTIME_DIR=<set>
  SourcePackage: nfs-utils
  UpgradeStatus: No upgrade log present (probably fresh install)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/nfs-utils/+bug/2062568/+subscriptions



