[Bug 1834875] Re: cloud-init growpart race with udev
Dan Watkins
daniel.watkins at canonical.com
Fri Aug 23 14:49:36 UTC 2019
[N.B. I wrote the below before I saw Ryan's comment, so there is some
repetition.]
OK, I've spent some time catching up on this properly so I can
summarise: per comment #24, the issue is that when udev processes the
events emitted by the kernel, it (sometimes) doesn't determine the
correct partition information. The kernel _does_ emit all the events we
would expect, and udev _does_ handle all the events we would expect
(which is to say that `udevadm settle` doesn't change behaviour here, it
merely ensures that the broken behaviour has completed before we
proceed). The hypothesised race condition is somewhere between the
kernel and udev: I believe the kernel event is emitted before the
partition table has necessarily been fully updated, so when udev
processes the event and reads the partition table, sometimes it finds
the partition and sometimes it doesn't. To be clear, the kernel event
generation and the buggy udev event handling both happen as a result of
the resize command, _not_ as a result of anything else cloud-init runs
subsequently.
So as far as I can tell, this bug would occur regardless of what runs
the resize command, and no matter what commands are executed after the
resize command. (It might be possible to work around this bug by
issuing commands that force a re-read of the partition table on a disk,
for example, but this bug _would_ still have occurred before then.)
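For illustration only, a workaround of the kind mentioned above could
look roughly like the sketch below. `wait_for_path` and its `retrigger`
argument are hypothetical names, not anything cloud-init ships, and the
timeout values are arbitrary. What `retrigger` would actually run to
force a re-read of the partition table (something like
`blockdev --rereadpt`, perhaps) is an assumption and is left to the
caller.

```python
import os
import time


def wait_for_path(path, retrigger=None, timeout=5.0, interval=0.1):
    """Poll until `path` exists or `timeout` seconds have elapsed.

    Hypothetical helper, not part of cloud-init. `retrigger`, if given,
    is a caller-supplied callable invoked before each sleep; the
    assumption is that it forces a partition table re-read.

    Returns True if the path exists in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while not os.path.exists(path):
        if time.monotonic() >= deadline:
            return False
        if retrigger is not None:
            retrigger()
        time.sleep(interval)
    return True
```

Note that polling alone may well not be enough: if udev has already
settled on the wrong partition information, the symlink might never
appear without something like `retrigger`. Either way, this only hides
the symptom; the mishandled event would still have occurred.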
cloud-init could potentially work around a (kernel|systemd) that isn't
handling partitions correctly, but we really shouldn't have to. Until
we're satisfied that they cannot actually be fixed, we shouldn't do
that. (I am _not_ convinced that this cannot be fixed in (the
kernel|systemd), because using a different kernel and using a different
udevadm have both caused the issue to stop reproducing.)
So, let me be a little more categorical. The information we have at the
moment indicates an issue in the interactions between the kernel and
udev on partition resize. cloud-init's involvement is merely as the
initiator of that resize. Until we have more information that indicates
the issue to be in cloud-init, this isn't a valid cloud-init issue.
Once we have more information from the kernel and/or systemd folks, if
it indicates that cloud-init _is_ at fault, please move this back to
New.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1834875
Title:
  cloud-init growpart race with udev
Status in cloud-init:
  Incomplete
Status in systemd package in Ubuntu:
  New
Bug description:
On Azure, cloud-init's growpart module regularly (20-30% of the time)
fails to extend the partition to full size, as in this example:
========================================
2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
    freq=freq)
  File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
    results = functor(*args)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
    func=resize_devices, args=(resizer, devices))
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
    (old, new) = resizer.resize(disk, ptnum, blockdev)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
    return (before, get_size(partdev))
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
    fd = os.open(filename, os.O_RDONLY)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'
========================================
@rcj suggested this is a race with udev. This seems to only happen on
Cosmic and later.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions