[Bug 2062568] Re: nfsd gets unresponsive after some hours of operation
Mehmet Basaran
2062568 at bugs.launchpad.net
Wed Sep 25 13:17:13 UTC 2024
Hi all,
I am from Canonical's kernel team and am currently investigating this
issue. jammy-hwe, mantic-hwe, and noble all use the 6.8 kernel by
default (a generic jammy or mantic installation uses the hwe version by
default). So the issue is with the 6.8 kernel rather than with a
particular series.
I was not able to reproduce the error with the generic 6.8.0-45.45
kernel after 1 hour of stress testing. I am still working on this. I
really appreciate all the feedback you have provided.
Meanwhile, for those who are having the problem, I have created an unofficial build of the 6.8.0-45.45 kernel that includes the upstream fix, commit 6ddc9deacc1312762c2edd9de00ce76b00f69f7c:
- for jammy: https://launchpad.net/~mehmetbasaran/+archive/ubuntu/linux-hwe-6.8-6.8.0-45.45-nfs-patch
- for noble: https://launchpad.net/~mehmetbasaran/+archive/ubuntu/linux-6.8.0-45.45-nfs-patch
Installation instructions:
Note that if you are using Secure Boot, you will not be able to boot
into these kernels. You will need to disable Secure Boot first.
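A minimal sketch for checking the Secure Boot state before installing, assuming the `mokutil` tool is available. `classify_sb_state` is a hypothetical helper name used only for illustration; it simply matches the "SecureBoot enabled" / "SecureBoot disabled" text that `mokutil --sb-state` prints.

```shell
#!/bin/sh
# Classify the text printed by `mokutil --sb-state`.
# classify_sb_state is a hypothetical helper, not part of mokutil.
classify_sb_state() {
    case "$1" in
        *"SecureBoot enabled"*)  echo "enabled" ;;
        *"SecureBoot disabled"*) echo "disabled" ;;
        *)                       echo "unknown" ;;
    esac
}

# Only query the firmware when mokutil is installed; otherwise report unknown.
if command -v mokutil >/dev/null 2>&1; then
    state=$(classify_sb_state "$(mokutil --sb-state 2>&1)")
else
    state="unknown"
fi
echo "Secure Boot: $state"
```

If this reports "enabled", Secure Boot has to be turned off in the firmware setup before the unofficial kernel will boot.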
# Add the unofficial ppa. Pick the correct one depending on your series
# For jammy: sudo add-apt-repository ppa:mehmetbasaran/linux-hwe-6.8-6.8.0-45.45-nfs-patch
# For noble: sudo add-apt-repository ppa:mehmetbasaran/linux-6.8.0-45.45-nfs-patch
$ sudo add-apt-repository ppa:mehmetbasaran/linux-6.8.0-45.45-nfs-patch
$ sudo apt update
$ sudo apt install linux-buildinfo-6.8.0-46-generic-nfs \
linux-cloud-tools-6.8.0-46-generic-nfs \
linux-cloud-tools-common \
linux-headers-6.8.0-46-generic-nfs \
linux-image-unsigned-6.8.0-46-generic-nfs \
linux-modules-6.8.0-46-generic-nfs \
linux-modules-extra-6.8.0-46-generic-nfs \
linux-modules-ipu6-6.8.0-46-generic-nfs \
linux-modules-iwlwifi-6.8.0-46-generic-nfs \
linux-modules-usbio-6.8.0-46-generic-nfs \
linux-nfs-6.8-cloud-tools-6.8.0-46 \
linux-nfs-6.8-headers-6.8.0-46 \
linux-nfs-6.8-tools-6.8.0-46 \
linux-tools-6.8.0-46-generic-nfs
After the next reboot, you will be running the patched 6.8.0-45.45 kernel:
$ uname -r
# 6.8.0-46-generic-nfs
To return to the previous kernel (official 6.8.0-45.45), you just need to update grub:
$ grep 'menuentry \|submenu ' /boot/grub/grub.cfg | cut -f2 -d "'" # Prints available kernels on your machine, in my case:
Ubuntu
Advanced options for Ubuntu
Ubuntu, with Linux 6.8.0-46-generic-nfs
Ubuntu, with Linux 6.8.0-46-generic-nfs (recovery mode)
Ubuntu, with Linux 6.8.0-45-generic
Ubuntu, with Linux 6.8.0-45-generic (recovery mode)
Ubuntu, with Linux 6.5.0-18-generic
Ubuntu, with Linux 6.5.0-18-generic (recovery mode)
# Change GRUB_DEFAULT in /etc/default/grub
# from GRUB_DEFAULT=0
# to GRUB_DEFAULT="Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-45-generic"
$ sudo update-grub
$ reboot
$ uname -r
# 6.8.0-45-generic
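The GRUB_DEFAULT change above can also be scripted. The following is only a sketch: `set_grub_default` is a hypothetical helper, and it assumes the file already contains a `GRUB_DEFAULT=` line and that the menu entry contains no `|`, `&`, or backslash characters. It is demonstrated on a temporary copy rather than on /etc/default/grub itself.

```shell
#!/bin/sh
# set_grub_default FILE ENTRY
# Rewrites the existing GRUB_DEFAULT line in FILE and keeps a FILE.bak backup.
# Hypothetical helper for illustration only.
set_grub_default() {
    file=$1
    entry=$2
    # '|' is used as the sed delimiter because the entry contains '>' and ','.
    sed -i.bak "s|^GRUB_DEFAULT=.*|GRUB_DEFAULT=\"$entry\"|" "$file"
}

# Demonstration on a scratch file that mimics /etc/default/grub.
tmp=$(mktemp)
printf 'GRUB_DEFAULT=0\nGRUB_TIMEOUT=5\n' > "$tmp"
set_grub_default "$tmp" "Advanced options for Ubuntu>Ubuntu, with Linux 6.8.0-45-generic"
grep '^GRUB_DEFAULT=' "$tmp"
rm -f "$tmp" "$tmp.bak"
```

On a real system the target would be /etc/default/grub (edited with root privileges), followed by `sudo update-grub` as shown above.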
After switching back to the previous kernel, you can completely remove the unofficial kernel:
# These packages are now safe to remove
$ sudo apt remove linux-buildinfo-6.8.0-46-generic-nfs \
linux-cloud-tools-6.8.0-46-generic-nfs \
linux-headers-6.8.0-46-generic-nfs \
linux-image-unsigned-6.8.0-46-generic-nfs \
linux-modules-6.8.0-46-generic-nfs \
linux-modules-extra-6.8.0-46-generic-nfs \
linux-modules-ipu6-6.8.0-46-generic-nfs \
linux-modules-iwlwifi-6.8.0-46-generic-nfs \
linux-modules-usbio-6.8.0-46-generic-nfs \
linux-nfs-6.8-cloud-tools-6.8.0-46 \
linux-nfs-6.8-headers-6.8.0-46 \
linux-nfs-6.8-tools-6.8.0-46 \
linux-tools-6.8.0-46-generic-nfs
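# Remove the unofficial ppa from the update list. Use the one you added earlier
# For jammy: sudo add-apt-repository --remove ppa:mehmetbasaran/linux-hwe-6.8-6.8.0-45.45-nfs-patch
# For noble: sudo add-apt-repository --remove ppa:mehmetbasaran/linux-6.8.0-45.45-nfs-patch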
$ sudo add-apt-repository --remove ppa:mehmetbasaran/linux-6.8.0-45.45-nfs-patch
# Restore grub settings
# Change GRUB_DEFAULT in /etc/default/grub to GRUB_DEFAULT=0
$ sudo update-grub
$ reboot
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/2062568
Title:
nfsd gets unresponsive after some hours of operation
Status in linux package in Ubuntu:
Confirmed
Status in nfs-utils package in Ubuntu:
Confirmed
Bug description:
I installed the 24.04 Beta on two test machines that were running
22.04 without issues before. One of them exports two volumes that are
mounted by the other machine, which primarily uses them as a secondary
storage for ccache.
After being up for a couple of hours (happened twice since yesterday
evening) it seems that nfsd on the machine exporting the volumes hangs
on something.
From dmesg on the server (repeated a few times):
[11183.290548] INFO: task nfsd:1419 blocked for more than 1228 seconds.
[11183.290558] Not tainted 6.8.0-22-generic #22-Ubuntu
[11183.290563] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[11183.290582] task:nfsd state:D stack:0 pid:1419 tgid:1419 ppid:2 flags:0x00004000
[11183.290587] Call Trace:
[11183.290602] <TASK>
[11183.290606] __schedule+0x27c/0x6b0
[11183.290612] schedule+0x33/0x110
[11183.290615] schedule_timeout+0x157/0x170
[11183.290619] wait_for_completion+0x88/0x150
[11183.290623] __flush_workqueue+0x140/0x3e0
[11183.290629] nfsd4_probe_callback_sync+0x1a/0x30 [nfsd]
[11183.290689] nfsd4_destroy_session+0x186/0x260 [nfsd]
[11183.290744] nfsd4_proc_compound+0x3af/0x770 [nfsd]
[11183.290798] nfsd_dispatch+0xd4/0x220 [nfsd]
[11183.290851] svc_process_common+0x44d/0x710 [sunrpc]
[11183.290924] ? __pfx_nfsd_dispatch+0x10/0x10 [nfsd]
[11183.290976] svc_process+0x132/0x1b0 [sunrpc]
[11183.291041] svc_handle_xprt+0x4d3/0x5d0 [sunrpc]
[11183.291105] svc_recv+0x18b/0x2e0 [sunrpc]
[11183.291168] ? __pfx_nfsd+0x10/0x10 [nfsd]
[11183.291220] nfsd+0x8b/0xe0 [nfsd]
[11183.291270] kthread+0xef/0x120
[11183.291274] ? __pfx_kthread+0x10/0x10
[11183.291276] ret_from_fork+0x44/0x70
[11183.291279] ? __pfx_kthread+0x10/0x10
[11183.291281] ret_from_fork_asm+0x1b/0x30
[11183.291286] </TASK>
From dmesg on the client (repeated a number of times):
[ 6596.911785] RPC: Could not send backchannel reply error: -110
[ 6596.972490] RPC: Could not send backchannel reply error: -110
[ 6837.281307] RPC: Could not send backchannel reply error: -110
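To check whether another server shows the same hung-task signature, the kernel log can be searched for the "blocked for more than" message quoted above. This is a rough sketch: `count_hung_nfsd` is a hypothetical helper, and the sample line fed to it is the one from this report.

```shell
#!/bin/sh
# count_hung_nfsd: read kernel log text on stdin and print how many
# hung-task reports mention an nfsd thread. Hypothetical helper.
count_hung_nfsd() {
    # grep -c prints 0 but exits non-zero when nothing matches; ignore that.
    grep -cE 'task nfsd:[0-9]+ blocked for more than [0-9]+ seconds' || true
}

# Demonstration using the log line quoted in this bug report.
printf '%s\n' \
    '[11183.290548] INFO: task nfsd:1419 blocked for more than 1228 seconds.' \
    | count_hung_nfsd
```

On a live server the input would come from the kernel log instead, for example `sudo dmesg | count_hung_nfsd`.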
ProblemType: Bug
DistroRelease: Ubuntu 24.04
Package: nfs-kernel-server 1:2.6.4-3ubuntu5
ProcVersionSignature: Ubuntu 6.8.0-22.22-generic 6.8.1
Uname: Linux 6.8.0-22-generic x86_64
.etc.request-key.d.id_resolver.conf: create id_resolver * * /usr/sbin/nfsidmap -t 600 %k %d
ApportVersion: 2.28.1-0ubuntu1
Architecture: amd64
CasperMD5CheckResult: pass
Date: Fri Apr 19 14:10:25 2024
InstallationDate: Installed on 2024-04-16 (3 days ago)
InstallationMedia: Ubuntu-Server 24.04 LTS "Noble Numbat" - Beta amd64 (20240410.1)
NFSMounts:
NFSv4Mounts:
ProcEnviron:
LANG=en_US.UTF-8
PATH=(custom, no user)
SHELL=/bin/bash
TERM=xterm-256color
XDG_RUNTIME_DIR=<set>
SourcePackage: nfs-utils
UpgradeStatus: No upgrade log present (probably fresh install)
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2062568/+subscriptions