[Bug 1074470] [NEW] NFSv4 client hang under network load

Fri Nov 2 19:21:31 UTC 2012

Public bug reported:

While trying to upgrade some of my systems to Ubuntu 12.04 "Precise", I'm
seeing strange hangs of various processes working with files on an
NFSv4-mounted /home. KDE sessions in particular hang very often, either on
startup or after brief use.

In the hung state, all processes accessing the NFS-mounted /home enter
uninterruptible sleep (state D). Sometimes, after a long wait (around
10-15 minutes), some of these processes wake up and continue, but in
practice a reboot is the only way to bring the machine back online, and
only briefly before the next hang. After the hang, dmesg displays a
number of kernel stack traces ("process XXX blocked for more than YYY
seconds") with "ktime_get_ts" and "rpc_make_runnable" at the top of the
call stack. It happens with both the TCP and UDP transports.
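To see which processes are stuck at a given moment, without waiting for the hung-task messages in dmesg, the process table can be scanned for state D. The following is a minimal Python sketch, assuming a Linux /proc filesystem; the function names are hypothetical. It only inspects each process's main thread (individual threads live under /proc/<pid>/task/), and /proc/<pid>/wchan may read as "0" or be unreadable depending on kernel version and permissions.

```python
import os
from typing import Iterator, Tuple


def parse_stat(stat_line: str) -> Tuple[int, str, str]:
    """Return (pid, comm, state) parsed from one /proc/<pid>/stat line.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so the split is done on the LAST ')' rather than on whitespace.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 1:].split()[0]
    return pid, comm, state


def read_wchan(proc_root: str, pid: int) -> str:
    """Kernel function the task is sleeping in, or '?' if unavailable."""
    try:
        with open(os.path.join(proc_root, str(pid), "wchan")) as f:
            return f.read().strip() or "?"
    except OSError:
        return "?"


def d_state_tasks(proc_root: str = "/proc") -> Iterator[Tuple[int, str, str]]:
    """Yield (pid, comm, wchan) for every process currently in state 'D'."""
    try:
        entries = os.listdir(proc_root)
    except OSError:
        return  # no procfs here (e.g. not running on Linux)
    for entry in entries:
        if not entry.isdigit():
            continue  # skip non-PID entries such as 'meminfo'
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # the process exited between listdir() and open()
        if state == "D":
            yield pid, comm, read_wchan(proc_root, pid)
```

For example, `for pid, comm, wchan in d_state_tasks(): print(pid, comm, wchan)` run during a hang should list the stuck processes that access /home.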

The hang happens only when the network is loaded. When the client is
connected directly to the NFS server (running Ubuntu Lucid with the
Oneiric backported kernel) via a separate Ethernet switch, NFS works
perfectly! But if there is network congestion, NFS accesses hang at
random.

It is also possible to reproduce the hang by running a large rsync
transfer to the client while accessing the NFS-mounted /home. In this
case, the processes reading from NFS hang almost instantly, even when
logging in via the console.
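When reproducing with rsync, it can help to measure how long a single NFS access takes instead of waiting for a login to stall. The sketch below is a hypothetical helper, not part of nfs-utils, assuming Python 3 on the client. A task in uninterruptible sleep cannot be cancelled from user space, so the access runs in a daemon thread that is abandoned if it does not finish within the timeout; that abandoned thread may still keep the process from exiting until the NFS call returns.

```python
import threading
import time
from typing import Callable, List, Optional


def timed_call(fn: Callable[[], object], timeout: float) -> Optional[float]:
    """Run fn() in a worker thread and return its duration in seconds,
    or None if it has not returned within `timeout` seconds.

    The worker is a daemon thread: a thread stuck in D state cannot be
    interrupted, so on timeout it is simply left behind.
    """
    done = threading.Event()
    durations: List[float] = []
    errors: List[Exception] = []

    def worker() -> None:
        start = time.monotonic()
        try:
            fn()
        except Exception as exc:  # re-raised in the caller below
            errors.append(exc)
        finally:
            durations.append(time.monotonic() - start)
            done.set()

    threading.Thread(target=worker, daemon=True).start()
    if not done.wait(timeout):
        return None  # still blocked after `timeout` seconds
    if errors:
        raise errors[0]
    return durations[0]


def read_first_block(path: str, size: int = 4096) -> bytes:
    """Read up to `size` bytes of `path`; if the file is not already in
    the client's page cache, this should result in an NFS READ."""
    with open(path, "rb") as f:
        return f.read(size)
```

For example, while the large rsync to the client is running, `timed_call(lambda: read_first_block(path), 30.0)` returning `None` for a file under /home that has not been read recently would match the hang described above. The path and the 30-second threshold are only illustrative.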

By all symptoms, this hang resembles the one fixed by "SUNRPC: Fix a UDP
transport regression" in the 3.2.0-32.51 Ubuntu kernel (exactly the kernel
I'm using and seeing the hangs on). RPC traces show a number of hung
requests in the "q:xprt_sending" state, like this:

Nov  2 20:22:51 XXX kernel: [15060.853376] -pid- flgs status -client- --rqstp- -timeout ---ops--
Nov  2 20:22:51 XXX kernel: [15060.853393]  9903 0821    -11 f243f000 f256d700        0 f870d0f4 nfsv4 READ a:call_status q:xprt_sending
Nov  2 20:22:51 XXX kernel: [15060.853401]  9904 0821    -11 f243f000 f256d600        0 f870d0f4 nfsv4 READ a:call_status q:xprt_sending
Nov  2 20:22:51 XXX kernel: [15060.853408]  9916 0080    -11 f243f000 f256d500        0 f86c1b18 nfsv4 STATFS a:call_connect_status q:xprt_sending
Nov  2 20:22:51 XXX kernel: [15060.853415]  9917 0080    -11 f243f000 f256d200        0 f86c1b18 nfsv4 ACCESS a:call_connect_status q:xprt_sending
Nov  2 20:22:51 XXX kernel: [15060.853423]  9914 0281    -11 f256d800 f256d300        0 f870d8ec nfsv4 RENEW a:call_status q:xprt_sending
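With more than a handful of tasks, a dump like the one above is easier to read when grouped. The following is a minimal Python sketch (the names `RpcTask`, `parse_tasks` and `summarize` are hypothetical, not an existing tool) that assumes the column layout given in the header line: pid, flags in hex, status, client, rqstp, timeout, ops, program and version, procedure, then the `a:` action and the `q:` wait queue. As far as I understand the SUNRPC code, `xprt_sending` is the queue of tasks waiting for their turn to transmit on the transport, and a status of -11 is -EAGAIN on Linux.

```python
import re
from collections import Counter
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class RpcTask:
    pid: int
    flags: int      # "flgs" column, printed in hex
    status: int     # negative errno; -11 is -EAGAIN on Linux
    client: str     # rpc_clnt pointer, identifies the RPC client
    program: str    # program name + version, e.g. "nfsv4"
    proc: str       # procedure, e.g. "READ", "RENEW"
    action: str     # a: field, the next step of the RPC state machine
    queue: str      # q: field, the wait queue the task sleeps on


# Matches the fields after the "[timestamp]" prefix of one task line.
# Pointer columns use \S+ because a NULL pointer may print as "(null)".
TASK_RE = re.compile(
    r"\]\s+(?P<pid>\d+)\s+(?P<flags>[0-9a-f]{4})\s+(?P<status>-?\d+)\s+"
    r"(?P<client>\S+)\s+\S+\s+-?\d+\s+\S+\s+"
    r"(?P<program>\S+)\s+(?P<proc>\S+)\s+a:(?P<action>\S+)\s+q:(?P<queue>\S+)"
)


def parse_tasks(log_lines: Iterable[str]) -> List[RpcTask]:
    """Extract RPC task entries, skipping the header and unrelated lines."""
    tasks = []
    for line in log_lines:
        m = TASK_RE.search(line)
        if m is None:
            continue
        tasks.append(RpcTask(
            pid=int(m["pid"]),
            flags=int(m["flags"], 16),
            status=int(m["status"]),
            client=m["client"],
            program=m["program"],
            proc=m["proc"],
            action=m["action"],
            queue=m["queue"],
        ))
    return tasks


def summarize(tasks: List[RpcTask]) -> Counter:
    """Count pending tasks per (wait queue, next action) pair."""
    return Counter((t.queue, t.action) for t in tasks)
```

Fed the six lines above, `summarize(parse_tasks(lines))` groups all five tasks under `xprt_sending`: three waiting in `call_status` and two in `call_connect_status`.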

The problem may be similar to the one fixed by "SUNRPC: Fix a UDP
transport regression", but affecting NFSv4.

I'm ready to provide more information on my configuration if necessary.

** Affects: nfs-utils (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to nfs-utils in Ubuntu.
https://bugs.launchpad.net/bugs/1074470