[Bug 506798] Re: du crashes when traversing nfs mounted .snapshot directories
Bug Watch Updater
506798 at bugs.launchpad.net
Fri Oct 27 03:04:41 UTC 2017
Launchpad has imported 14 comments from the remote bug at
https://bugzilla.redhat.com/show_bug.cgi?id=533569.
If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.
------------------------------------------------------------------------
On 2009-11-07T11:52:48+00:00 Jim wrote:
Description of problem: in the vicinity of a mount point directory, two
directories may have the same device and inode number. This is a
serious problem because many tools treat the condition as indicating a
hard directory cycle, which usually indicates file system corruption.
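For illustration, the kind of check such tools perform can be sketched roughly as follows. This is a simplified Python model, not the actual du or fts code; the names find_apparent_cycles and stat_fn are hypothetical and exist only for this sketch. A directory is flagged when its (st_dev, st_ino) pair matches that of one of its ancestors.

```python
import os


def find_apparent_cycles(root, stat_fn=os.lstat):
    """Return directories whose (st_dev, st_ino) equals that of an ancestor.

    stat_fn is a hypothetical hook so the check can be exercised without
    an NFS mount; by default the real os.lstat is used.
    """
    flagged = []

    def visit(path, ancestors):
        st = stat_fn(path)
        key = (st.st_dev, st.st_ino)
        if key in ancestors:
            # Same identity as an ancestor: looks like a hard directory cycle.
            flagged.append(path)
            return
        try:
            entries = sorted(os.listdir(path))
        except OSError:
            return
        for name in entries:
            child = os.path.join(path, name)
            # Only descend into real directories, never through symlinks.
            if os.path.isdir(child) and not os.path.islink(child):
                visit(child, ancestors | {key})

    visit(root, frozenset())
    return flagged
```

With the stat output shown in the steps to reproduce (both /tmp/mnt and /tmp/mnt/home reporting device 22, inode 2), a check of this shape would flag /tmp/mnt/home even though no real cycle exists.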
Version-Release number of selected component (if applicable):
2.6.31.5-122.fc12.x86_64
How reproducible: every time
Steps to Reproduce:
Based on the set-up from Kamil Dudka in https://bugzilla.redhat.com/show_bug.cgi?id=501848#c45
# mount | grep ^/
...
/dev/sda8 on /home type ext4 (rw,noatime)
...
# top=/home
# cat /etc/exports
# printf "/ *(fsid=0,crossmnt)\n$top *(crossmnt)\n" >> /etc/exports
# service nfs restart
...
# mkdir /tmp/mnt
# mount -t nfs4 localhost:/ /tmp/mnt
# stat --printf "%d %i %n\n" /tmp/mnt{,$top}
22 2 /tmp/mnt
22 2 /tmp/mnt/home
Then, using the very latest du from upstream coreutils.git,
I see this:
$ du /tmp/mnt > /dev/null
du: WARNING: Circular directory structure.
This almost certainly means that you have a corrupted file system.
NOTIFY YOUR SYSTEM MANAGER.
The following directory is part of the cycle:
`/tmp/mnt/home'
Actual results: above
Expected results: different dev and/or inode, no du failure
Additional info:
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/0
------------------------------------------------------------------------
On 2009-11-10T18:33:05+00:00 Steve wrote:
> # stat --printf "%d %i %n\n" /tmp/mnt{,$top}
> 22 2 /tmp/mnt
> 22 2 /tmp/mnt/home
I do see this... but
> $ du /tmp/mnt > /dev/null
> du: WARNING: Circular directory structure.
> This almost certainly means that you have a corrupted file system.
> NOTIFY YOUR SYSTEM MANAGER.
> The following directory is part of the cycle:
> `/tmp/mnt/home'
What kernel and nfs-utils are you using?
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/1
------------------------------------------------------------------------
On 2009-11-10T18:35:03+00:00 Steve wrote:
I meant to say... I don't see the du error... what kernel/nfs-utils are
you using..
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/2
------------------------------------------------------------------------
On 2009-11-10T18:37:09+00:00 Kamil wrote:
(In reply to comment #2)
> I meant to say... I don't see the du error... what kernel/nfs-utils are
> you using..
You need to compile GNU coreutils from git to see the error.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/3
------------------------------------------------------------------------
On 2009-11-10T18:39:56+00:00 Jim wrote:
Hi Steve, kernel version is listed above.
nfs-utils-1.2.0-18.fc12.x86_64
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/4
------------------------------------------------------------------------
On 2009-11-10T18:45:33+00:00 Jeff wrote:
I think I understand what the issue is here. I just don't think that
there's much we can do about it...
The stat program is doing a lstat() and that doesn't trigger a submount
(LOOKUP_FOLLOW isn't set). So we end up doing a GETATTR call that
returns info on the root inode of the /home mount. So the stat() syscall
gets the "real" st_ino of /tmp/mnt/home, but the st_dev is still that of
the parent (/tmp/mnt).
This is particularly evident here because the root of any ext3/4
filesystem has an st_ino of 2.
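A rough way to inspect this from user space is sketched below in Python. The helper names are hypothetical. Whether the following call (os.stat) actually triggers the submount on a given kernel is an assumption based on the explanation above, not something this sketch can guarantee.

```python
import os


def lstat_and_stat_ids(path):
    """Return ((st_dev, st_ino) via lstat, (st_dev, st_ino) via stat)."""
    no_follow = os.lstat(path)  # does not follow the final path component
    follow = os.stat(path)      # follows the final path component
    return ((no_follow.st_dev, no_follow.st_ino),
            (follow.st_dev, follow.st_ino))


def same_identity_as_parent(path):
    """True if lstat reports the same (st_dev, st_ino) for path and its parent.

    Note: for "/" the parent is "/" itself, so this is trivially True there.
    """
    child = os.lstat(path)
    parent = os.lstat(os.path.dirname(os.path.abspath(path)))
    return (child.st_dev, child.st_ino) == (parent.st_dev, parent.st_ino)
```

On an affected setup, same_identity_as_parent("/tmp/mnt/home") would be expected to return True, matching the "22 2" pair reported for both directories; on an ordinary local directory it returns False.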
I think our options are:
1) fix the kernel to trigger a submount even when LOOKUP_FOLLOW isn't
set (quite possibly very hard on performance)
2) fix the kernel to return a bit more info when we have a "potential
mountpoint" like this. My suggestion on LKML was to co-opt a new
st_mode/i_mode bit and use that to indicate that a directory is
potentially a new mountpoint if someone were to walk into it.
So far, my suggestion hasn't received any feedback upstream.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/5
------------------------------------------------------------------------
On 2009-11-16T15:17:01+00:00 Bug wrote:
This bug appears to have been reported against 'rawhide' during the Fedora 12 development cycle.
Changing version to '12'.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/6
------------------------------------------------------------------------
On 2010-03-16T07:48:36+00:00 Jim wrote:
AFAIK, nothing has changed, so I've reset "Version:" to rawhide.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/25
------------------------------------------------------------------------
On 2010-03-16T12:18:51+00:00 Bug wrote:
This bug appears to have been reported against 'rawhide' during the Fedora 13 development cycle.
Changing version to '13'.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/26
------------------------------------------------------------------------
On 2010-04-08T12:33:19+00:00 Jim wrote:
Still affects rawhide, too.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/27
------------------------------------------------------------------------
On 2010-07-30T10:46:43+00:00 Bug wrote:
This bug appears to have been reported against 'rawhide' during the Fedora 14 development cycle.
Changing version to '14'.
More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/28
------------------------------------------------------------------------
On 2010-09-02T07:02:44+00:00 Jim wrote:
Changing version back to 'rawhide'.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/29
------------------------------------------------------------------------
On 2012-10-17T07:45:33+00:00 Ric wrote:
Is this something that we can change in upstream or should we close this
out?
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/30
------------------------------------------------------------------------
On 2012-10-18T15:29:02+00:00 Jeff wrote:
Not much we can do, I don't think...
If anything, the automount semantics are even less likely to trigger a
mount these days. I think the only hope for this problem is the xstat()
work that dhowells was working on, but that has sort of died upstream.
I'll go ahead and close this WONTFIX for now. Please reopen it if you
want to discuss it further.
Reply at:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/comments/31
** Changed in: linux (Fedora)
Status: Unknown => Won't Fix
** Changed in: linux (Fedora)
Importance: Unknown => Medium
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to findutils in Ubuntu.
https://bugs.launchpad.net/bugs/506798
Title:
du crashes when traversing nfs mounted .snapshot directories
Status in coreutils package in Ubuntu:
Triaged
Status in findutils package in Ubuntu:
Triaged
Status in linux package in Ubuntu:
Confirmed
Status in coreutils package in Fedora:
Unknown
Status in linux package in Fedora:
Won't Fix
Bug description:
Binary package hint: coreutils
I'm getting a problem where du errors (and exits) with "du: fts_read
failed: no such file or directory" when traversing a directory with a
NetApp ".snapshot" directory.
My understanding (clarified by the discussions linked below) is that:
1) The device ID/inode of a directory is recorded before the submount is made.
2) The device ID of the directory changes after the directory has been read (via readdir, which causes the submount).
3) After examining the contents of the directory, du goes back up the tree (via '..'), finds that the device ID doesn't match what it recorded, assumes things have been moved around under it, and bails out for safety reasons.
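The three steps above can be modelled with a small Python sketch. This is a simplified illustration under assumptions, not the fts implementation: it re-stats the same path rather than going back up through '..', and the names identity, identity_changed_by_read, stat_fn and read_fn are hypothetical.

```python
import os


def identity(path, stat_fn=os.lstat):
    """Return the (st_dev, st_ino) pair that identifies a directory."""
    st = stat_fn(path)
    return (st.st_dev, st.st_ino)


def identity_changed_by_read(path, stat_fn=os.lstat, read_fn=os.listdir):
    """Record a directory's identity, read it, then compare again.

    Returns (changed, recorded, current). stat_fn and read_fn are
    hypothetical hooks so the logic can be tried without an NFS mount.
    """
    recorded = identity(path, stat_fn)  # step 1: recorded before the submount
    read_fn(path)                       # step 2: reading may trigger the submount
    current = identity(path, stat_fn)   # step 3: compare with what was recorded
    return recorded != current, recorded, current
```

Running identity_changed_by_read on a .snapshot directory would be one way to confirm whether the device ID really changes once the directory has been read; on an ordinary local directory the first element of the result is False.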
I've researched this online and it is an upstream bug. We're using
Ubuntu 9.10, so I feel it should also be tracked as a bug in Ubuntu.
The best information I've found is in Red Hat's Bugzilla:
https://bugzilla.redhat.com/show_bug.cgi?id=501848
https://bugzilla.redhat.com/show_bug.cgi?id=533569
This bug has also been discussed on the bug-gnulib mailing list:
http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00027.html
http://lists.gnu.org/archive/html/bug-gnulib/2009-11/msg00032.html
and LKML:
http://lkml.org/lkml/2009/11/4/451
Unfortunately none of these discussions has resulted in a widely
accepted solution.
We use NetApp .snapshots very extensively and can't afford for du to
be unreliable. At the moment we will either have to patch du or
downgrade all of coreutils to an older version.
For comparison, we are upgrading from Ubuntu 7.04, which works
perfectly.
There is a similar problem with find, but it has a --without-fts build
option which 'fixes' it.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/coreutils/+bug/506798/+subscriptions