[Bug 1313513] Re: mountall does not honour _netdev

Tue Apr 29 00:51:33 UTC 2014

On 29/04/14 10:32, Steve Langasek wrote:
> While the mount(8) manpage says that _netdev causes the mount to be
> deferred until the network is up, this manpage was written in a bygone
> era when "network up" was a discrete event, which it hasn't been for a
> long time.

Ahh, so out-of-date documentation strikes again.  Ah well, we should
perhaps amend that documentation, or reinstate an equivalent feature,
as I believe there are valid use cases for the old _netdev behaviour.

> The current behavior is that _netdev devices will be tried
> immediately on boot, and tried again each time a network interface comes
> up.  If this doesn't give the desired results, I think this is a bug in
> the ceph driver - not in mountall, which has been tested with _netdev
> (and network filesystems) repeatedly and shown to work correctly.

The trouble is it hangs waiting for a /dev/rbd device to appear, which
won't happen until the 'rbdmap' service is started.

Once 'rbdmap' has done its duty, mount works as expected (and thus,
mountall should also work).

>> As seen from the attached snapshot, it doesn't bother to wait,
>> and blindly tries to mount the RBD before connecting to Ceph:
>> this will never work.
> 
> If there is a specific connection that needs to be made before running
> the mount command, then I don't think that's something mountall can be
> expected to handle.  Something else on the system would need to
> intercept the request for a ceph mount, and block it until ceph is
> available.

How about not blocking the entire system boot, which leaves the machine
unresponsive and impossible to connect to remotely?

Some of the machines we look after are stuck in military bases or
underground in mines: it's not like we can just stroll up to the console
and press a button.

Had mountall not stalled the entire boot sequence, but allowed the
boot to proceed without /var/lib/one whilst continuing to retry, it
might have found that the device it needed appeared in time.

I can understand the "let's wait it out and see if it appears", but not
the "let's halt everything until the device magically appears".  The
latter is dangerous for any system for which local console access is
difficult or unavailable. (As is my case here, with the buggered keyboard.)

Regards,
-- 
Stuart Longland
Systems Engineer
     _ ___
\  /|_) |                           T: +61 7 3535 9619
 \/ | \ |     38b Douglas Street    F: +61 7 3535 9699
   SYSTEMS    Milton QLD 4064       http://www.vrt.com.au

https://bugs.launchpad.net/bugs/1313513

Title:
  mountall does not honour _netdev

Status in “mountall” package in Ubuntu:
  New

Bug description:
  Hi,

  This is a fresh install of Ubuntu 14.04 LTS AMD64.  I tried
  configuring a Ceph Rados Block Device (rbd) to be mounted during boot
  on /var/lib/one, containing my OpenNebula configuration and database.

  The idea being that should the machine go belly up, I'll have an up-
  to-date snapshot of the OpenNebula data on Ceph to mount on the new
  frontend machine.

  /etc/ceph/rbdmap is configured, and I set up /etc/fstab with an entry:

  /dev/rbd/pool/rbdname /var/lib/one xfs defaults,_netdev 0 1

  then rebooted.  According to mount(8), _netdev is supposed to tell
  mountall to skip mounting this device until the network is up.

  As seen from the attached snapshot, it doesn't bother to wait, and
  blindly tries to mount the RBD before connecting to Ceph: this will
  never work.

  mountall seems to rely on *knowing* a list of network file systems:
  this means when someone comes up with a new network file system, or
  uses a conventional disk file system with a remote block device,
  mountall's heuristic falls flat on its face as has been demonstrated
  here.  The problem would also exist for iSCSI, AoE, Fibre Channel, nbd
  and drbd devices.
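
  To illustrate the distinction drawn above, here is a minimal Python
  sketch.  It is not mountall's actual code: the set of known network
  filesystem types and both function names are assumptions made up for
  the example.  It only shows why a type-based check misses an xfs
  filesystem on an RBD device, while a check that also looks at the
  _netdev option catches it.

```python
# Hypothetical illustration only; this is NOT how mountall is implemented.
# The list of "known" network filesystem types is an assumption.
KNOWN_NETWORK_FSTYPES = {"nfs", "nfs4", "cifs", "smbfs", "ceph"}


def parse_fstab_line(line):
    """Split one fstab line into its first four fields, or return None."""
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    fields = stripped.split()
    if len(fields) < 4:
        return None
    device, mountpoint, fstype, options = fields[:4]
    return {
        "device": device,
        "mountpoint": mountpoint,
        "fstype": fstype,
        "options": options.split(","),
    }


def needs_network_by_type(entry):
    """Type-based heuristic: only recognises listed filesystem types."""
    return entry["fstype"] in KNOWN_NETWORK_FSTYPES


def needs_network(entry):
    """Trust an explicit _netdev option, then fall back to the type list."""
    return "_netdev" in entry["options"] or needs_network_by_type(entry)


entry = parse_fstab_line(
    "/dev/rbd/pool/rbdname /var/lib/one xfs defaults,_netdev 0 1"
)
print(needs_network_by_type(entry))  # False: xfs looks like a local filesystem
print(needs_network(entry))          # True: the user said _netdev
```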

  Due to bug 1313497, the keyboard is non-functional.  Recovery is
  useless as the keyboard is broken there too, and now the machine is
  waiting for a keypress it will never see due to that bug.  A headless
  system would similarly have this problem.

  Two suggestions I would have:
  1. mountall should honour _netdev to decide whether to mount a device
     or not: this gives the user the means to manually tell mountall
     that the device needs network access to operate even if the
     filesystem looks to be local.  I'd wager that if the user specified
     _netdev, they probably meant it and likely know better than
     mountall.
  2. mountall should time out after a predefined period and NEVER wait
     indefinitely: even if the disk is local.  If a disk goes missing,
     then it is better the machine tries to boot in its degraded state
     so it can be remotely managed and raise an alarm, than to wait for
     someone to notice the machine being down.
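
  The second suggestion amounts to a bounded wait.  A rough Python
  sketch of the idea follows; the function name wait_for_device and the
  30 second default are assumptions for illustration, not anything
  mountall provides.  The caller gets False back once the deadline
  passes and can carry on booting in a degraded state instead of
  blocking forever.

```python
import os
import time


def wait_for_device(path, timeout=30.0, interval=1.0):
    """Poll for `path` until it exists or `timeout` seconds have passed.

    Returns True if the path appeared, False if the wait timed out.
    Hypothetical helper: the name and defaults are illustrative only.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    device = "/dev/rbd/pool/rbdname"  # device path from the fstab entry above
    if wait_for_device(device, timeout=5.0):
        print("device present, safe to mount")
    else:
        print("device missing, continuing boot without it")
```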

  Unfortunately since the machine is now effectively bricked, I can only
  grep proxy server logs to see what packages got installed.
  mountall_2.53_amd64.deb seems to be the culprit.
