[Bug 1834875] Re: cloud-init growpart race with udev
Ryan Harper
1834875 at bugs.launchpad.net
Fri Aug 23 14:24:53 UTC 2019
The sequence is:
exec growpart
exec sgdisk --info # read-only
exec sgdisk --pretend # read-only
exec sgdisk --backup # read-only copy
# modification of disk starts
exec sgdisk --move-second-header \
--delete=PART \
--new=PART \
--typecode --partition-guid --change-name
# now that sgdisk has *closed* the filehandle on the disk, systemd-udevd will
# get an inotify event and run the udev rules on the disk.
# this includes the *removal* of symlinks due to the --delete portion of the sgdisk call;
# following the removal, the --new will trigger the add run of the rules, which
# recreates the symlinks.
# update kernel partition sizes; this is an ioctl so it does not trigger any udev events
exec partx --update
# the kernel has the new partition sizes, and udev scripts/events are all queued (and possibly in flight)
exit growpart
cloud-init then invokes get_size(), which does:
# this is where the race occurs if the symlink created by udev is *not* present
os.open(/dev/disk/by-id/fancy-symlink-with-partuuid-points-to-sdb1)
Dan had put a udevadm settle in this spot, like so:
def get_size(filename):
    util.subp(['udevadm', 'settle'])
    os.open(....)
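For reference, a minimal self-contained sketch of that settle-then-open pattern. It assumes get_size() reports the device size by seeking to the end of the opened file, and it uses plain subprocess in place of cloud-init's util.subp. The udevadm_settle helper and the injectable settle parameter are hypothetical names for illustration, not cloud-init's actual code.

```python
import os
import subprocess


def udevadm_settle(timeout=None, exists=None):
    """Block until the udev event queue is empty (hypothetical helper).

    timeout: optional maximum number of seconds to wait.
    exists:  optional path; udevadm stops waiting as soon as it exists.
    """
    cmd = ["udevadm", "settle"]
    if timeout is not None:
        cmd.append("--timeout=%d" % timeout)
    if exists is not None:
        cmd.append("--exit-if-exists=%s" % exists)
    subprocess.run(cmd, check=True)


def get_size(filename, settle=udevadm_settle):
    """Return the size in bytes of the file or block device at filename.

    settle is called first so that queued udev events (and the symlinks
    they recreate) are processed before the path is opened.
    """
    settle()
    fd = os.open(filename, os.O_RDONLY)
    try:
        # Seeking to the end yields the size for regular files and block devices.
        return os.lseek(fd, 0, os.SEEK_END)
    finally:
        os.close(fd)
```

Note that settle only waits for events already in the queue; it cannot wait for an event that has not been emitted yet.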
So, you're suggesting that somehow _not all_ of the uevents triggered by
the sgdisk command in growpart would have been queued before we
call udevadm settle?
If some other events are happening, how is cloud-init to know about them
so that it can take action to "handle this race" more robustly?
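To make the question concrete: without knowing which events are still outstanding, about the only generic thing userspace could do is poll for the symlink with a bounded timeout. The sketch below illustrates that and is not something cloud-init is known to do; wait_for_path and its default timeout and interval are hypothetical.

```python
import os
import time


def wait_for_path(path, timeout=5.0, interval=0.1):
    """Poll until path exists; return True if it appeared, False on timeout.

    Hypothetical helper: the defaults are arbitrary, which is exactly the
    weakness of this approach. It masks the race rather than explaining it.
    """
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A caller would run wait_for_path(partdev) before get_size(partdev) and treat a False return as the same failure it hits today.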
Lastly, if there is a *race* in the symlink creation/removal or a delay in
uevent propagation, why is that a userspace issue, let alone a cloud-init
issue? This isn't universally reproducible; rather, it occurs in pretty narrow
circumstances, with certain combinations of kernel and udev versions, all the
while the growpart/cloud-init code remains the same.
** Changed in: cloud-init
Status: New => Incomplete
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1834875
Title:
cloud-init growpart race with udev
Status in cloud-init:
Incomplete
Status in systemd package in Ubuntu:
New
Bug description:
On Azure, it happens regularly (20-30%), that cloud-init's growpart
module fails to extend the partition to full size.
Such as in this example:
========================================
2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
freq=freq)
File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
return self._runners.run(name, functor, args, freq, clear_on_fail)
File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
results = functor(*args)
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
func=resize_devices, args=(resizer, devices))
File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
ret = func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
(old, new) = resizer.resize(disk, ptnum, blockdev)
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
return (before, get_size(partdev))
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
fd = os.open(filename, os.O_RDONLY)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'
========================================
@rcj suggested this is a race with udev. This seems to only happen on
Cosmic and later.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions