[Bug 1074470] [NEW] NFSv4 client hang under network load
Konstantin L. Metlov
metlov at fti.dn.ua
Fri Nov 2 19:21:31 UTC 2012
Public bug reported:
While trying to upgrade some of my systems to Ubuntu 12.04 "Precise",
I'm seeing strange hangs of various processes working with files on an
NFSv4-mounted /home. KDE sessions in particular hang very often on
startup or after a short period of use.
In the hung state all processes accessing the NFS-mounted /home enter
uninterruptible sleep (D). Sometimes, after a long wait (around 10-15
minutes), some of these processes wake up and continue, but
realistically a reboot is the only way to bring the machine back
online, and then only for a brief period before the next hang. After a
hang, dmesg shows a number of kernel stack traces of the form "process
XXX blocked for more than YYY seconds", with "ktime_get_ts" and
"rpc_make_runnable" at the top of the call stack. It happens with both
TCP and UDP transports.
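To make the "uninterruptible sleep (D)" observation easier to confirm on
an affected client, here is a minimal sketch (not part of my original
diagnosis) that scans /proc for processes in state D. The function names
parse_state and uninterruptible_tasks are made up for illustration; the
only thing assumed about the system is the standard Linux
/proc/<pid>/stat layout, where the state letter follows the
parenthesised command name.

```python
import os


def parse_state(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so locate the last ')' instead of splitting on whitespace.
    open_idx = stat_line.index("(")
    close_idx = stat_line.rindex(")")
    pid = int(stat_line[:open_idx].strip())
    comm = stat_line[open_idx + 1:close_idx]
    state = stat_line[close_idx + 1:].split()[0]
    return pid, comm, state


def uninterruptible_tasks(proc_root="/proc"):
    """List (pid, comm) for every process currently in state D."""
    found = []
    if not os.path.isdir(proc_root):
        return found
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # not a process directory
        try:
            with open(os.path.join(proc_root, entry, "stat")) as fh:
                pid, comm, state = parse_state(fh.read())
        except (OSError, ValueError, IndexError):
            continue  # process exited or stat was unreadable mid-scan
        if state == "D":
            found.append((pid, comm))
    return found
```

Calling uninterruptible_tasks() during a hang should list the processes
stuck on /home; it only reads /proc, so it does not touch the NFS mount
itself.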
The hang happens only when the network is loaded. When the client is
connected directly to the NFS server (running Ubuntu Lucid with a
backported Oneiric kernel) via a separate Ethernet switch, NFS on it
works perfectly! But if there is network congestion, NFS accesses
hang at random.
It is also possible to reproduce the hang by running a large rsync file
transfer to the client while accessing the NFS-mounted /home. In this
case the processes reading from NFS hang almost instantly, even when
logging in via the console.
By all symptoms this hang resembles the one fixed by "SUNRPC: Fix a UDP
transport regression" in the 3.2.0-32.51 Ubuntu kernel (exactly the
kernel I'm using and seeing hangs on). RPC traces show a number of hung
requests in the "q:xprt_sending" state, like this:
Nov 2 20:22:51 XXX kernel: [15060.853376] -pid- flgs status -client- --rqstp- -timeout ---ops--
Nov 2 20:22:51 XXX kernel: [15060.853393] 9903 0821 -11 f243f000 f256d700 0 f870d0f4 nfsv4 READ a:call_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853401] 9904 0821 -11 f243f000 f256d600 0 f870d0f4 nfsv4 READ a:call_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853408] 9916 0080 -11 f243f000 f256d500 0 f86c1b18 nfsv4 STATFS a:call_connect_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853415] 9917 0080 -11 f243f000 f256d200 0 f86c1b18 nfsv4 ACCESS a:call_connect_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853423] 9914 0281 -11 f256d800 f256d300 0 f870d8ec nfsv4 RENEW a:call_status q:xprt_sending
The problem may be similar to the one fixed by "SUNRPC: Fix a UDP
transport regression", but in NFSv4.
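For anyone comparing traces, a small sketch of how the RPC task dump
above could be summarised per wait queue. It is only an illustration:
the regular expression is inferred from the column header and the lines
quoted in this report, not from any kernel documentation, and the names
TASK_RE, parse_tasks and SAMPLE are made up here.

```python
import re
from collections import Counter

# Column layout assumed from the quoted header line:
# pid, flags, status, client, rqstp, timeout, ops, program, procedure,
# then "a:<action>" and "q:<queue>".
TASK_RE = re.compile(
    r"\]\s+(?P<pid>\d+)\s+(?P<flags>[0-9a-f]{4})\s+(?P<status>-?\d+)\s+"
    r"(?P<client>[0-9a-f]+)\s+(?P<rqstp>[0-9a-f]+)\s+(?P<timeout>\d+)\s+"
    r"(?P<ops>[0-9a-f]+)\s+(?P<prog>\S+)\s+(?P<proc>\S+)\s+"
    r"a:(?P<action>\S+)\s+q:(?P<queue>\S+)"
)

# Header plus three of the task lines quoted in this report.
SAMPLE = """\
Nov 2 20:22:51 XXX kernel: [15060.853376] -pid- flgs status -client- --rqstp- -timeout ---ops--
Nov 2 20:22:51 XXX kernel: [15060.853393] 9903 0821 -11 f243f000 f256d700 0 f870d0f4 nfsv4 READ a:call_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853408] 9916 0080 -11 f243f000 f256d500 0 f86c1b18 nfsv4 STATFS a:call_connect_status q:xprt_sending
Nov 2 20:22:51 XXX kernel: [15060.853423] 9914 0281 -11 f256d800 f256d300 0 f870d8ec nfsv4 RENEW a:call_status q:xprt_sending
"""


def parse_tasks(text):
    """Return one dict per RPC task line; header and other lines are skipped."""
    tasks = []
    for line in text.splitlines():
        match = TASK_RE.search(line)
        if match is None:
            continue
        task = match.groupdict()
        task["pid"] = int(task["pid"])
        task["status"] = int(task["status"])
        tasks.append(task)
    return tasks


tasks = parse_tasks(SAMPLE)
by_queue = Counter(task["queue"] for task in tasks)
print(by_queue)
```

On the lines quoted here every task sits on xprt_sending with status -11
(-EAGAIN on Linux), which is the pattern that made me suspect the
transport send queue.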
I'm ready to provide more information on my configuration if necessary.
** Affects: nfs-utils (Ubuntu)
Importance: Undecided
Status: New
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1074470