[Bug 1860046] Re: Race to mount seed device

Łukasz Zemczak 1860046 at bugs.launchpad.net
Fri Jan 24 18:10:38 UTC 2020


Hello Dave, or anyone else affected,

Accepted livecd-rootfs into bionic-proposed. The package will build now
and be available at https://launchpad.net/ubuntu/+source/livecd-rootfs/2.525.39
in a few hours, and then in the -proposed repository.

Please help us by testing this new package.  See
https://wiki.ubuntu.com/Testing/EnableProposed for documentation on how
to enable and use -proposed.  Your feedback will aid us in getting this
update out to other Ubuntu users.

If this package fixes the bug for you, please add a comment to this bug,
mentioning the version of the package you tested and change the tag from
verification-needed-bionic to verification-done-bionic. If it does not
fix the bug for you, please add a comment stating that, and change the
tag to verification-failed-bionic. In either case, without details of
your testing we will not be able to proceed.

Further information regarding the verification process can be found at
https://wiki.ubuntu.com/QATeam/PerformingSRUVerification .  Thank you in
advance for helping!

N.B. The updated package will be released to -updates after the bug(s)
fixed by this package have been verified and the package has been in
-proposed for a minimum of 7 days.

** Description changed:

- We've just come across a situation where cloud-init races systemd to
- mount a partition, with the result that either cloud-init fails to load
- its seed, or a relatively important partition fails to mount. I'm not
- entirely sure this is a bug - it could well be argued this is mis-
- configuration on our part, and there's a trivial workaround, but I'll
- leave the cloud-init devs to make that call.
+ [Impact]
+ 
+ We've just come across a situation where cloud-init races systemd to mount a partition, with the result that either cloud-init fails to load its seed, or a relatively important partition fails to mount.
+ This can lead to situations where the device will not boot.
+ 
+ [Test Case]
+ 
+ Boot the image built using the bionic-proposed livecd-rootfs multiple
+ times and confirm that every boot completes correctly, i.e.
+ /boot/firmware is mounted and cloud-init has applied its seed.
+ 
+ TODO: find a better test case?
+ 
+ [Regression Potential]
+ 
+ The change seems relatively safe, with no obvious ways for it to cause
+ a regression, but testers should be on the lookout for any cloud-init
+ misbehaviour or other issues at boot time.
+ 
+ [Original Description]
+ 
+ 
+ We've just come across a situation where cloud-init races systemd to mount a partition, with the result that either cloud-init fails to load its seed, or a relatively important partition fails to mount. I'm not entirely sure this is a bug - it could well be argued this is mis-configuration on our part, and there's a trivial workaround, but I'll leave the cloud-init devs to make that call.
  
  The configuration is as follows:
  
  1. On recent Ubuntu classic images for the Raspberry Pi, cloud-init's
  seed is located on the boot partition (/dev/mmcblk0p1, labelled "system-
  boot"). This is desirable as it's a straight-forward FAT partition, and
  thus easily accessible and editable on any OS (as opposed to the
  original seed location on the root ext4 partition).
  
  2. Cloud-init is configured to look for its seed with "fs_label: system-
  boot"
  
  3. Systemd is configured (via /etc/fstab) to mount
  /dev/disk/by-label/system-boot (/dev/mmcblk0p1) on /boot/firmware
  
  The result on boot-up is one of three situations:
  
  1. cloud-init mounts /dev/mmcblk0p1 on a temp path. While attempting to
  read its seed, systemd attempts to mount /dev/mmcblk0p1 on
  /boot/firmware but fails because the device is already mounted
  elsewhere. cloud-init succeeds in reading its seed, umounts
  /dev/mmcblk0p1 and the system boots successfully, but /boot/firmware
  isn't mounted (leading to issues further down the line with other things
  like flash-kernel that expect it to be mounted).
  
  2. cloud-init checks for /dev/mmcblk0p1 in /proc/mounts, and doesn't see
  it there. It goes to mount /dev/mmcblk0p1 on a temp path, but before it
  can do so, systemd mounts it on /boot/firmware. cloud-init's mount fails
  (device already mounted), so it fails to read its seed and uses a
  default config instead (which isn't entirely desirable for us as the
  default involves a long wait if ethernet is not connected; our default
  seed overrides that).
  
  3. Everything works fine. This occurs when either a) systemd mounts
  /boot/firmware first, then cloud-init sees this mount in /proc/mounts
  and uses it or b) cloud-init mounts /dev/mmcblk0p1 on a temp path, reads
  its seed and unmounts it, *then* systemd mounts /boot/firmware.
  
  Here's the relevant snippet of traceback from when the second result
  occurs:
  
  ...
  2020-01-14 20:48:01,337 - util.py[DEBUG]: Read 386 bytes from /etc/os-release
  2020-01-14 20:48:01,338 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr0'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,348 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr1'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,357 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=vfat', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,415 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=iso9660', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,432 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=SYSTEM-BOOT', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,448 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=system-boot', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,466 - DataSourceNoCloud.py[DEBUG]: Attempting to use data from /dev/mmcblk0p1
  2020-01-14 20:48:01,467 - util.py[DEBUG]: Reading from /proc/mounts (quiet=False)
  2020-01-14 20:48:01,467 - util.py[DEBUG]: Read 2062 bytes from /proc/mounts
  2020-01-14 20:48:01,468 - util.py[DEBUG]: Fetched {'sysfs': {'fstype': 'sysfs', 'mountpoint': '/sys', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'proc': {'fstype': 'proc', 'mountpoint': '/proc', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'udev': {'fstype': 'devtmpfs', 'mountpoint': '/dev', 'opts': 'rw,nosuid,relatime,size=1913472k,nr_inodes=156992,mode=755'}, 'devpts': {'fstype': 'devpts', 'mountpoint': '/dev/pts', 'opts': 'rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000'}, 'tmpfs': {'fstype': 'tmpfs', 'mountpoint': '/etc/machine-id', 'opts': 'ro,relatime,size=391432k,mode=755'}, '/dev/mmcblk0p2': {'fstype': 'ext4', 'mountpoint': '/', 'opts': 'rw,relatime'}, 'securityfs': {'fstype': 'securityfs', 'mountpoint': '/sys/kernel/security', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'cgroup': {'fstype': 'cgroup', 'mountpoint': '/sys/fs/cgroup/freezer', 'opts': 'rw,nosuid,nodev,noexec,relatime,freezer'}, 'pstore': {'fstype': 'pstore', 'mountpoint': '/sys/fs/pstore', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'systemd-1': {'fstype': 'autofs', 'mountpoint': '/proc/sys/fs/binfmt_misc', 'opts': 'rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1728'}, 'debugfs': {'fstype': 'debugfs', 'mountpoint': '/sys/kernel/debug', 'opts': 'rw,relatime'}, 'mqueue': {'fstype': 'mqueue', 'mountpoint': '/dev/mqueue', 'opts': 'rw,relatime'}, 'tracefs': {'fstype': 'tracefs', 'mountpoint': '/sys/kernel/debug/tracing', 'opts': 'rw,relatime'}, 'configfs': {'fstype': 'configfs', 'mountpoint': '/sys/kernel/config', 'opts': 'rw,relatime'}, 'fusectl': {'fstype': 'fusectl', 'mountpoint': '/sys/fs/fuse/connections', 'opts': 'rw,relatime'}} mounts from proc
  2020-01-14 20:48:01,468 - util.py[DEBUG]: Running command ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4'] with allowed return codes [0] (shell=False, capture=True)
  2020-01-14 20:48:01,529 - util.py[DEBUG]: Failed mount of '/dev/mmcblk0p1' as 'auto': Unexpected error while running command.
  Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4']
  Exit code: 32
  Reason: -
- Stdout: 
+ Stdout:
  Stderr: mount: /run/cloud-init/tmp/tmp2z8rueu4: /dev/mmcblk0p1 already mounted on /boot/firmware.
  2020-01-14 20:48:01,530 - util.py[WARNING]: Failed to mount /dev/mmcblk0p1 when looking for data
  2020-01-14 20:48:01,533 - util.py[DEBUG]: Failed to mount /dev/mmcblk0p1 when looking for data
  Traceback (most recent call last):
-   File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceNoCloud.py", line 128, in _get_data
-     pp2d_kwargs)
-   File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1758, in mount_cb
-     (device, tmpd, failure_reason))
+   File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceNoCloud.py", line 128, in _get_data
+     pp2d_kwargs)
+   File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1758, in mount_cb
+     (device, tmpd, failure_reason))
  cloudinit.util.MountFailedError: Failed mounting /dev/mmcblk0p1 to /run/cloud-init/tmp/tmp2z8rueu4 due to: Unexpected error while running command.
  Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4']
  Exit code: 32
  Reason: -
- Stdout: 
+ Stdout:
  Stderr: mount: /run/cloud-init/tmp/tmp2z8rueu4: /dev/mmcblk0p1 already mounted on /boot/firmware.
  2020-01-14 20:48:01,549 - __init__.py[DEBUG]: Datasource DataSourceNoCloud [seed=None][dsmode=net] not updated for events: New instance first boot
  2020-01-14 20:48:01,549 - handlers.py[DEBUG]: finish: init-local/search-NoCloud: SUCCESS: no local data found from DataSourceNoCloud
  2020-01-14 20:48:01,549 - main.py[DEBUG]: No local datasource found
  ...
  
  The relevant code is mount_cb() in util.py, which calls mounts() to
  read /proc/mounts and then attempts to mount the device when it is not
  found there (a classic race between the check and the action).
  
  It might be argued that it should simply attempt the mount and check
  /proc/mounts in the event of failure, but I'm wary that mounting
  filesystems is a potentially costly exercise (can the seed be on a
  remote mount?), and there are probably edge cases here I'm unaware of
  (is it possible, perhaps dangerous, to double-mount certain devices?).
  
  Perhaps it could be adjusted to check /proc/mounts, attempt the mount,
  then *re-check* /proc/mounts in the case of failure?
  
  Nevertheless, there's a trivial workaround in our case: just add an
  override for "RequiresMountsFor=/boot/firmware" to our systemd config
  for cloud-init-local, thus ensuring the mount is definitely available
  before cloud-init looks for its seed. This in turn suggests the issue
  could simply be viewed as mis-configuration on our part: we *know* the
  seed is on /dev/mmcblk0p1 as that's where we've configured it to be;
  should we not also configure cloud-init in such a way that the device
  containing its seed is guaranteed to be mounted before it runs?

** Changed in: cloud-init (Ubuntu)
       Status: Triaged => Fix Committed

** Changed in: cloud-init (Ubuntu)
       Status: Fix Committed => Invalid

** Also affects: livecd-rootfs (Ubuntu)
   Importance: Undecided
       Status: New

** Changed in: livecd-rootfs (Ubuntu)
       Status: New => Fix Committed

** Also affects: livecd-rootfs (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Also affects: cloud-init (Ubuntu Bionic)
   Importance: Undecided
       Status: New

** Changed in: livecd-rootfs (Ubuntu Bionic)
       Status: New => In Progress

** Changed in: livecd-rootfs (Ubuntu Bionic)
   Importance: Undecided => Critical

** Changed in: livecd-rootfs (Ubuntu)
   Importance: Undecided => High

** Changed in: livecd-rootfs (Ubuntu Bionic)
       Status: In Progress => Fix Committed

** Tags added: verification-needed verification-needed-bionic

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to livecd-rootfs in Ubuntu.
https://bugs.launchpad.net/bugs/1860046

Title:
  Race to mount seed device

Status in cloud-init package in Ubuntu:
  Invalid
Status in livecd-rootfs package in Ubuntu:
  Fix Committed
Status in cloud-init source package in Bionic:
  New
Status in livecd-rootfs source package in Bionic:
  Fix Committed

Bug description:
  [Impact]

  We've just come across a situation where cloud-init races systemd to mount a partition, with the result that either cloud-init fails to load its seed, or a relatively important partition fails to mount.
  This can lead to situations where the device will not boot.

  [Test Case]

  Boot the image built using the bionic-proposed livecd-rootfs multiple
  times and confirm that every boot completes correctly, i.e.
  /boot/firmware is mounted and cloud-init has applied its seed.

  TODO: find a better test case?
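
  As a rough sketch of what "booting up correctly" means here, each boot
  could be checked for the two failure modes described in this bug (the
  commands and the log path below are the usual defaults, not taken from
  the proposed test plan):

    # run after each boot of the image under test
    mountpoint -q /boot/firmware && echo "/boot/firmware is mounted"
    ! grep -q "Failed to mount /dev/mmcblk0p1 when looking for data" \
        /var/log/cloud-init.log && echo "cloud-init read its seed"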

  [Regression Potential]

  The change seems relatively safe, with no obvious ways for it to cause
  a regression, but testers should be on the lookout for any cloud-init
  misbehaviour or other issues at boot time.

  [Original Description]

  
  We've just come across a situation where cloud-init races systemd to mount a partition, with the result that either cloud-init fails to load its seed, or a relatively important partition fails to mount. I'm not entirely sure this is a bug - it could well be argued this is mis-configuration on our part, and there's a trivial workaround, but I'll leave the cloud-init devs to make that call.

  The configuration is as follows:

  1. On recent Ubuntu classic images for the Raspberry Pi, cloud-init's
  seed is located on the boot partition (/dev/mmcblk0p1, labelled
  "system-boot"). This is desirable as it's a straight-forward FAT
  partition, and thus easily accessible and editable on any OS (as
  opposed to the original seed location on the root ext4 partition).

  2. Cloud-init is configured to look for its seed with "fs_label:
  system-boot"

  3. Systemd is configured (via /etc/fstab) to mount
  /dev/disk/by-label/system-boot (/dev/mmcblk0p1) on /boot/firmware (a
  rough sketch of this configuration follows below)
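
  Roughly, the pieces of configuration above amount to something like the
  following (illustrative only: the labels and mountpoints are as
  described in this report, the file locations are the usual defaults
  rather than quoted from the image):

    # /etc/fstab: mount the FAT boot partition by label
    LABEL=system-boot  /boot/firmware  vfat  defaults  0  1

    # cloud-init datasource config (e.g. a file under /etc/cloud/cloud.cfg.d/)
    datasource:
      NoCloud:
        fs_label: system-boot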

  The result on boot-up is one of three situations:

  1. cloud-init mounts /dev/mmcblk0p1 on a temp path. While attempting
  to read its seed, systemd attempts to mount /dev/mmcblk0p1 on
  /boot/firmware but fails because the device is already mounted
  elsewhere. cloud-init succeeds in reading its seed, umounts
  /dev/mmcblk0p1 and the system boots successfully, but /boot/firmware
  isn't mounted (leading to issues further down the line with other
  things like flash-kernel that expect it to be mounted).

  2. cloud-init checks for /dev/mmcblk0p1 in /proc/mounts, and doesn't
  see it there. It goes to mount /dev/mmcblk0p1 on a temp path, but
  before it can do so, systemd mounts it on /boot/firmware. cloud-init's
  mount fails (device already mounted), so it fails to read its seed and
  uses a default config instead (which isn't entirely desirable for us
  as the default involves a long wait if ethernet is not connected; our
  default seed overrides that).

  3. Everything works fine. This occurs when either a) systemd mounts
  /boot/firmware first, then cloud-init sees this mount in /proc/mounts
  and uses it or b) cloud-init mounts /dev/mmcblk0p1 on a temp path,
  reads its seed and unmounts it, *then* systemd mounts /boot/firmware.

  Here's the relevant snippet of traceback from when the second result
  occurs:

  ...
  2020-01-14 20:48:01,337 - util.py[DEBUG]: Read 386 bytes from /etc/os-release
  2020-01-14 20:48:01,338 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr0'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,348 - util.py[DEBUG]: Running command ['blkid', '-odevice', '/dev/sr1'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,357 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=vfat', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,415 - util.py[DEBUG]: Running command ['blkid', '-tTYPE=iso9660', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,432 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=SYSTEM-BOOT', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,448 - util.py[DEBUG]: Running command ['blkid', '-tLABEL=system-boot', '-odevice'] with allowed return codes [0, 2] (shell=False, capture=True)
  2020-01-14 20:48:01,466 - DataSourceNoCloud.py[DEBUG]: Attempting to use data from /dev/mmcblk0p1
  2020-01-14 20:48:01,467 - util.py[DEBUG]: Reading from /proc/mounts (quiet=False)
  2020-01-14 20:48:01,467 - util.py[DEBUG]: Read 2062 bytes from /proc/mounts
  2020-01-14 20:48:01,468 - util.py[DEBUG]: Fetched {'sysfs': {'fstype': 'sysfs', 'mountpoint': '/sys', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'proc': {'fstype': 'proc', 'mountpoint': '/proc', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'udev': {'fstype': 'devtmpfs', 'mountpoint': '/dev', 'opts': 'rw,nosuid,relatime,size=1913472k,nr_inodes=156992,mode=755'}, 'devpts': {'fstype': 'devpts', 'mountpoint': '/dev/pts', 'opts': 'rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000'}, 'tmpfs': {'fstype': 'tmpfs', 'mountpoint': '/etc/machine-id', 'opts': 'ro,relatime,size=391432k,mode=755'}, '/dev/mmcblk0p2': {'fstype': 'ext4', 'mountpoint': '/', 'opts': 'rw,relatime'}, 'securityfs': {'fstype': 'securityfs', 'mountpoint': '/sys/kernel/security', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'cgroup': {'fstype': 'cgroup', 'mountpoint': '/sys/fs/cgroup/freezer', 'opts': 'rw,nosuid,nodev,noexec,relatime,freezer'}, 'pstore': {'fstype': 'pstore', 'mountpoint': '/sys/fs/pstore', 'opts': 'rw,nosuid,nodev,noexec,relatime'}, 'systemd-1': {'fstype': 'autofs', 'mountpoint': '/proc/sys/fs/binfmt_misc', 'opts': 'rw,relatime,fd=26,pgrp=1,timeout=0,minproto=5,maxproto=5,direct,pipe_ino=1728'}, 'debugfs': {'fstype': 'debugfs', 'mountpoint': '/sys/kernel/debug', 'opts': 'rw,relatime'}, 'mqueue': {'fstype': 'mqueue', 'mountpoint': '/dev/mqueue', 'opts': 'rw,relatime'}, 'tracefs': {'fstype': 'tracefs', 'mountpoint': '/sys/kernel/debug/tracing', 'opts': 'rw,relatime'}, 'configfs': {'fstype': 'configfs', 'mountpoint': '/sys/kernel/config', 'opts': 'rw,relatime'}, 'fusectl': {'fstype': 'fusectl', 'mountpoint': '/sys/fs/fuse/connections', 'opts': 'rw,relatime'}} mounts from proc
  2020-01-14 20:48:01,468 - util.py[DEBUG]: Running command ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4'] with allowed return codes [0] (shell=False, capture=True)
  2020-01-14 20:48:01,529 - util.py[DEBUG]: Failed mount of '/dev/mmcblk0p1' as 'auto': Unexpected error while running command.
  Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4']
  Exit code: 32
  Reason: -
  Stdout:
  Stderr: mount: /run/cloud-init/tmp/tmp2z8rueu4: /dev/mmcblk0p1 already mounted on /boot/firmware.
  2020-01-14 20:48:01,530 - util.py[WARNING]: Failed to mount /dev/mmcblk0p1 when looking for data
  2020-01-14 20:48:01,533 - util.py[DEBUG]: Failed to mount /dev/mmcblk0p1 when looking for data
  Traceback (most recent call last):
    File "/usr/lib/python3/dist-packages/cloudinit/sources/DataSourceNoCloud.py", line 128, in _get_data
      pp2d_kwargs)
    File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 1758, in mount_cb
      (device, tmpd, failure_reason))
  cloudinit.util.MountFailedError: Failed mounting /dev/mmcblk0p1 to /run/cloud-init/tmp/tmp2z8rueu4 due to: Unexpected error while running command.
  Command: ['mount', '-o', 'ro', '-t', 'auto', '/dev/mmcblk0p1', '/run/cloud-init/tmp/tmp2z8rueu4']
  Exit code: 32
  Reason: -
  Stdout:
  Stderr: mount: /run/cloud-init/tmp/tmp2z8rueu4: /dev/mmcblk0p1 already mounted on /boot/firmware.
  2020-01-14 20:48:01,549 - __init__.py[DEBUG]: Datasource DataSourceNoCloud [seed=None][dsmode=net] not updated for events: New instance first boot
  2020-01-14 20:48:01,549 - handlers.py[DEBUG]: finish: init-local/search-NoCloud: SUCCESS: no local data found from DataSourceNoCloud
  2020-01-14 20:48:01,549 - main.py[DEBUG]: No local datasource found
  ...

  The relevant code is mount_cb() in util.py, which calls mounts() to
  read /proc/mounts and then attempts to mount the device when it is not
  found there (a classic race between the check and the action).
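
  A stripped-down sketch of that check-then-mount pattern (not the actual
  mount_cb() code, just the shape of the race it describes):

    import subprocess
    import tempfile

    def read_seed_racy(device):
        # 1. Check /proc/mounts for the device...
        with open('/proc/mounts') as fp:
            mounted = any(line.split()[0] == device for line in fp)
        if not mounted:
            tmpd = tempfile.mkdtemp(prefix='cloud-init-seed-')
            # 2. ...but systemd can mount the device in this window, in
            # which case the mount below fails with exit code 32
            # ("already mounted on /boot/firmware"), as in the log above.
            subprocess.run(['mount', '-o', 'ro', device, tmpd], check=True)
            return tmpd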

  It might be argued that it should simply attempt the mount and check
  /proc/mounts in the event of failure, but I'm wary that mounting
  filesystems is a potentially costly exercise (can the seed be on a
  remote mount?), and there are probably edge cases here I'm unaware of
  (is it possible, perhaps dangerous, to double-mount certain devices?).

  Perhaps it could be adjusted to check /proc/mounts, attempt the mount,
  then *re-check* /proc/mounts in the case of failure?
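
  One possible shape for that suggestion (a sketch only, not a patch that
  was actually applied anywhere): fall back to whatever mountpoint
  /proc/mounts reports once the mount attempt fails.

    import subprocess
    import tempfile

    def find_seed_mountpoint(device):
        def existing_mountpoint(dev):
            with open('/proc/mounts') as fp:
                for line in fp:
                    fields = line.split()
                    if fields[0] == dev:
                        return fields[1]
            return None

        mp = existing_mountpoint(device)      # check /proc/mounts first
        if mp:
            return mp
        tmpd = tempfile.mkdtemp(prefix='cloud-init-seed-')
        try:
            subprocess.run(['mount', '-o', 'ro', device, tmpd], check=True)
            return tmpd                       # we won the race
        except subprocess.CalledProcessError:
            mp = existing_mountpoint(device)  # re-check after a failed mount
            if mp:
                return mp                     # systemd won; reuse its mountpoint
            raise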

  Nevertheless, there's a trivial workaround in our case: just add an
  override for "RequiresMountsFor=/boot/firmware" to our systemd config
  for cloud-init-local, thus ensuring the mount is definitely available
  before cloud-init looks for its seed. This in turn suggests the issue
  could simply be viewed as mis-configuration on our part: we *know* the
  seed is on /dev/mmcblk0p1 as that's where we've configured it to be;
  should we not also configure cloud-init in such a way that the device
  containing its seed is guaranteed to be mounted before it runs?
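
  For reference, that workaround boils down to a small systemd drop-in
  for cloud-init-local.service along these lines (the drop-in file name
  is illustrative):

    # /etc/systemd/system/cloud-init-local.service.d/boot-firmware.conf
    [Unit]
    RequiresMountsFor=/boot/firmware

  RequiresMountsFor= adds both Requires= and After= dependencies on the
  corresponding mount unit, so cloud-init-local only starts once
  /boot/firmware has been mounted.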

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/cloud-init/+bug/1860046/+subscriptions


