[Bug 1834875] Re: cloud-init growpart race with udev
Ryan Harper
1834875 at bugs.launchpad.net
Mon Aug 26 15:10:47 UTC 2019
On Mon, Aug 26, 2019 at 4:05 AM Tobias Koch <1834875 at bugs.launchpad.net>
wrote:
> > (Odds are that whatever causes it to be recreated later in boot would be
> > blocked by cloud-init waiting.)
>
> But that's not happening. The instance does boot normally, the only
> service degraded is cloud-init and there is no significant delay either.
>
> So conversely, if I put a loop into cloud-init and just waited on the
> symlink to appear and if that worked with minimal delay, would that
> refute the above?
>
That's still a workaround for a race we don't yet understand, and we don't
know why it isn't more widespread. The code in cloud-init, growpart, sgdisk
and partx is stable (it has not changed significantly in some time).
We don't have a root cause for the race at this time. When cloud-init
invokes growpart the symlink exists, and when growpart returns it sometimes
does not. If anything, growpart should address the race itself; and at this
point, it would have to pick up a workaround as well.
Let's at least make sure we understand the actual race before we look
further into workarounds.
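
As a diagnostic rather than a fix, the wait loop Tobias describes could be
used to measure how long the symlink is actually missing. A minimal sketch,
assuming simple polling; wait_for_path and its parameters are hypothetical
names, not existing cloud-init code:

```python
import os
import time


def wait_for_path(path, timeout=5.0, interval=0.05):
    """Poll until `path` exists; return seconds waited, or None on timeout.

    os.path.exists() follows symlinks, so a dangling by-partuuid link
    counts as "not there yet".
    """
    start = time.monotonic()
    while True:
        if os.path.exists(path):
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            return None
        time.sleep(interval)
```

Logging the returned delay on the failing instances would tell us whether
the link comes back within milliseconds or only much later in boot.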
From what I can see of what growpart is doing, the sgdisk command will
clear the partition tables (this involves removing the partition and then
re-adding it), which triggers udev. Further, Dan has shown that partx --update
can also trigger a remove and an add. Looking at the partx update code,
it will *sometimes* remove and add; however, if the partition to be updated
*exists* then it will instead issue an update IOCTL which only updates the
size value in sysfs.
https://github.com/karelzak/util-linux/blob/53ae7d60cfeacd4e87bfe6fcc015b58b78ef4555/disk-utils/partx.c#L451
Which makes me think that in the successful path, we're seeing partx
--update take the partx_resize_partition path, which submits the resize
IOCTL
https://github.com/karelzak/util-linux/blob/917f53cf13c36d32c175f80f2074576595830573/include/partx.h#L54
which in the Linux kernel does:
https://elixir.bootlin.com/linux/latest/source/block/ioctl.c#L100
and just updates the size value in sysfs:
https://elixir.bootlin.com/linux/latest/source/block/ioctl.c#L146
which AFAICT does not emit any new uevents.
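
Since the resize path only touches the size value in sysfs, that value can
be read directly to confirm a resize without depending on the by-partuuid
symlink. A rough sketch; sysfs_partition_bytes is a hypothetical helper, and
the sysfs_root parameter exists only so it can be pointed at a test
directory:

```python
import os

SECTOR_BYTES = 512  # sysfs reports block device sizes in 512-byte sectors


def sysfs_partition_bytes(partition, sysfs_root="/sys/class/block"):
    """Return the kernel's view of a partition's size in bytes."""
    with open(os.path.join(sysfs_root, partition, "size")) as f:
        return int(f.read().strip()) * SECTOR_BYTES
```

Comparing sysfs_partition_bytes("sda1") before and after growpart would show
whether the kernel saw the new size even in runs where the symlink is gone.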
Lastly, in either path (partx updates vs. partx removes/adds), invoking
udevadm settle after the binary has exited is the reasonable way to ensure
that *if* any uevents were created, they are processed.
growpart could add udevadm settle code; so could cloud-init. We actually
did that in our first test package, and it did not ensure that the
symlink was present.
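
For reference, the settle-after-exit pattern looks roughly like the sketch
below. This is not the code from the test package; run_then_settle and its
injectable run parameter are hypothetical, chosen so the call order can be
checked without touching a real disk:

```python
import subprocess


def run_then_settle(cmd, settle_timeout=30, run=subprocess.run):
    """Run `cmd` to completion, then wait for udev's event queue to drain."""
    # The tool must have exited before settling, otherwise uevents it
    # generates late would not be covered by the settle.
    run(cmd, check=True)
    run(["udevadm", "settle", "--timeout=%d" % settle_timeout], check=True)
```

Called as run_then_settle(["growpart", "/dev/sda", "1"]), this only
guarantees that queued events were processed, not that the resulting rules
left the symlink in place, which matches what we observed.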
All of this suggests to me that *something* isn't processing the sequence
of uevents in such a way that, once they've all been processed, we have
the symlink.
We must be missing some other bit of information in the failing path, where
the symlink is eventually recreated (possibly due to some other write or
close on the disk which re-triggers rules).
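
One way to collect that missing information would be to capture `udevadm
monitor --kernel --udev` output around the growpart call and reduce it to
the event sequence for the partition. A sketch under the assumption that the
monitor output has its usual "SOURCE[timestamp] action devpath (subsystem)"
shape; block_events is a hypothetical helper:

```python
import re

# Matches event lines as printed by `udevadm monitor --kernel --udev`, e.g.
#   KERNEL[227.532180] remove   /devices/.../block/sda/sda1 (block)
EVENT_RE = re.compile(
    r"^(KERNEL|UDEV)\s*\[(\d+\.\d+)\]\s+(\S+)\s+(\S+)\s+\((\S+)\)\s*$"
)


def block_events(lines, devname):
    """Return (source, time, action) for block uevents on `devname`."""
    events = []
    for line in lines:
        m = EVENT_RE.match(line)
        if not m:
            continue  # skip the banner and any property lines
        source, stamp, action, devpath, subsystem = m.groups()
        if subsystem == "block" and devpath.rsplit("/", 1)[-1] == devname:
            events.append((source, float(stamp), action))
    return events
```

Comparing the sequences from a passing and a failing boot should show
whether the failing case ends on a remove, or sees a later add or change
that recreates the link.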
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/1834875
>
> Title:
> cloud-init growpart race with udev
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions
>
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1834875
Title:
cloud-init growpart race with udev
Status in cloud-init:
Incomplete
Status in cloud-utils:
New
Status in systemd package in Ubuntu:
New
Bug description:
On Azure, cloud-init's growpart module regularly (20-30%) fails to extend
the partition to full size, as in this example:
========================================
2019-06-28 12:24:18,666 - util.py[DEBUG]: Running command ['growpart', '--dry-run', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,157 - util.py[DEBUG]: Running command ['growpart', '/dev/sda', '1'] with allowed return codes [0] (shell=False, capture=True)
2019-06-28 12:24:19,726 - util.py[DEBUG]: resize_devices took 1.075 seconds
2019-06-28 12:24:19,726 - handlers.py[DEBUG]: finish: init-network/config-growpart: FAIL: running config-growpart with frequency always
2019-06-28 12:24:19,727 - util.py[WARNING]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
2019-06-28 12:24:19,727 - util.py[DEBUG]: Running module growpart (<module 'cloudinit.config.cc_growpart' from '/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py'>) failed
Traceback (most recent call last):
  File "/usr/lib/python3/dist-packages/cloudinit/stages.py", line 812, in _run_modules
    freq=freq)
  File "/usr/lib/python3/dist-packages/cloudinit/cloud.py", line 54, in run
    return self._runners.run(name, functor, args, freq, clear_on_fail)
  File "/usr/lib/python3/dist-packages/cloudinit/helpers.py", line 187, in run
    results = functor(*args)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 351, in handle
    func=resize_devices, args=(resizer, devices))
  File "/usr/lib/python3/dist-packages/cloudinit/util.py", line 2521, in log_time
    ret = func(*args, **kwargs)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 298, in resize_devices
    (old, new) = resizer.resize(disk, ptnum, blockdev)
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 159, in resize
    return (before, get_size(partdev))
  File "/usr/lib/python3/dist-packages/cloudinit/config/cc_growpart.py", line 198, in get_size
    fd = os.open(filename, os.O_RDONLY)
FileNotFoundError: [Errno 2] No such file or directory: '/dev/disk/by-partuuid/a5f2b49f-abd6-427f-bbc4-ba5559235cf3'
========================================
@rcj suggested this is a race with udev. This seems to only happen on
Cosmic and later.
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-init/+bug/1834875/+subscriptions