[Bug 1834875] Re: cloud-init growpart race with udev
Scott Moser
ssmoser2+ubuntu at gmail.com
Thu Nov 7 16:55:23 UTC 2019
I really think you are all *way* over thinking this.
a. growpart made a change to the partition table (using sfdisk)
b. growpart called partx --update --nr 3 /dev/sda
c. growpart exited
With a and b growpart created udev events. If you create udev events,
you really need to wait for those events to finish, or your just pushing
complexity onto your consumer.
Now Daniel's updated cloud-init with output captured in comment 14
clearly called udevadm settle after 'c'. But the problem still existed.
So that means we have this sequence of events:
a.) growpart change partition table
b.) growpart call partx
c.) udev created and events being processed
d.) growpart exits
e.) cloud-init calls udevadm settle
f.) udev events occurring that removed the link
g.) cloud-init raced on reading that size - fail
To me, that means either udevadm settle called in 'e' didn't actually
do what it is suppsoed to do and wait for all events in the queue to
clear. Or, something else created new events. I have long suspected
that to be the case, and I think the thing doing it is udev rules.
If that is true, then udev events cause more udev events, so a single
'settle' is never enough. Nor can you actually ever know if you *have*
settled long enough.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1834875
Title:
cloud-init growpart race with udev
Status in cloud-init:
Incomplete
Status in cloud-utils:
New
Status in linux-azure package in Ubuntu:
New
Status in systemd package in Ubuntu:
Incomplete
Bug description:
On Azure, it happens regularly (20-30%), that cloud-init's growpart
module fails to extend the partition to full size.
Such as in this example:
========================================
2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
Traceback (most recent call last):
File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
freq=freq)
File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
return self._runners.run(name, functor, args, freq, clear_on_fail)
File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
results = functor(*args)
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
func=resize_devices, args=(resizer, devices))
File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
ret = func(*args, **kwargs)
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
(old, new) = resizer.resize(disk, ptnum, blockdev)
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
return (before, get_size(partdev))
File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
fd = os.open(filename, os.O_RDONLY)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'
========================================
@rcj suggested this is a race with udev. This seems to only happen on
Cosmic and later.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions
More information about the foundations-bugs
mailing list