[Bug 1215911] Re: wait-for-root fails to wait for plain /dev/sdaX partitions.
Tetsuo Handa
1215911 at bugs.launchpad.net
Sat Sep 21 04:13:18 UTC 2013
Martin Pitt (pitti) wrote on 2013-08-26:
> I have never actually seen ENOBUFS, or uevents being missed due to it,
> so I think the chance of that is quite small. But I can't assert that
> all messages will be received after an ENOBUFS. But as you said,
> waiting longer in that case is a safer fallback than not waiting at
> all. My hope is that that ENOBUFS situation clears itself up
> automatically after some time, otherwise your whole system would be
> screwed (as you could never receive any uevent).
I think you can observe ENOBUFS, and target uevents being missed because of it,
if you try the change below. Using this change, I confirmed that wait-for-root
waits until SIGALRM when it fails to receive the target uevents. The current
code assumes that the socket buffer is large enough to queue the target uevents.
Anyway, although the possibility remains that wait-for-root waits longer than
it should, the possibility that wait-for-root waits for a shorter time than it
should has been fixed.
Thank you.
----------
--- a/src/wait-for-root.c
+++ b/src/wait-for-root.c
@@ -12,6 +12,9 @@
#include <unistd.h>
#include <fcntl.h>
+#include <sys/socket.h>
+#include <libudev.h>
+#include <errno.h>
static int device_queued (struct udev *udev, const char *path);
static int matching_device (struct udev_device *device, const char *path);
@@ -60,6 +63,11 @@ main (int argc,
*/
udev = udev_new ();
udev_monitor = udev_monitor_new_from_netlink (udev, "udev");
+ {
+ // Reduce socket buffer size.
+ int buff_size = 4096;
+ setsockopt(udev_monitor_get_fd(udev_monitor), SOL_SOCKET, SO_RCVBUF, &buff_size, sizeof(buff_size));
+ }
udev_monitor_filter_add_match_subsystem_devtype (udev_monitor, "block", NULL);
udev_monitor_enable_receiving (udev_monitor);
@@ -96,11 +104,15 @@ main (int argc,
/* When the device doesn't exist yet, or is still being processed
* by udev, use the monitor socket to wait it to be done.
*/
+ sleep(3); // Inject delay to make socket buffer overflow.
while (1) {
/* even though we use a blocking socket this might still fail
* due to ENOBUFS or similar. */
- while ((udev_device = udev_monitor_receive_device (udev_monitor)) == NULL)
- sleep (1);
+ while (errno = 0, (udev_device = udev_monitor_receive_device (udev_monitor)) == NULL) {
+ const int err = errno;
+ fprintf(stderr, "***** %s (%d)\n", strerror(err), err);
+ //sleep (1);
+ }
if (matching_device (udev_device, devpath)) {
type = udev_device_get_property_value (udev_device, "ID_FS_TYPE");
if (type) {
----------
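As a side note on the SO_RCVBUF step in the change above, the following is a minimal sketch for checking which receive buffer size the kernel actually applies after such a request. It uses a throwaway AF_UNIX datagram socket rather than the udev monitor socket, and effective_rcvbuf() is a hypothetical helper name, not part of wait-for-root. On Linux the reported value is normally larger than the requested one, because the kernel doubles the request to account for bookkeeping overhead.

```c
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: request a receive buffer of `requested` bytes on a
 * throwaway socket and return the size the kernel reports afterwards,
 * or -1 on error. */
static int effective_rcvbuf(int requested)
{
    int fd = socket(AF_UNIX, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    int actual = -1;
    socklen_t len = sizeof(actual);

    /* Ask for the small buffer, then read back what was really applied. */
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &requested, sizeof(requested)) < 0 ||
        getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &actual, &len) < 0)
        actual = -1;

    close(fd);
    return actual;
}
```

Calling effective_rcvbuf(4096) and printing the result shows how much room the reduced buffer really leaves for queued uevents on a given kernel.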
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to initramfs-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1215911
Title:
wait-for-root fails to wait for plain /dev/sdaX partitions.
Status in “initramfs-tools” package in Ubuntu:
Fix Released
Status in “initramfs-tools” source package in Precise:
Fix Committed
Status in “initramfs-tools” source package in Quantal:
Fix Committed
Status in “initramfs-tools” source package in Raring:
Fix Committed
Bug description:
SRU Justification:
[Impact]
* Boot failures can occur with the wait-for-root utility in P/Q/R due to a race condition.
* Because of this issue unattended reboots and boots can randomly fail.
* The original bug was submitted against Precise LTS.
[Test Case]
* Reboot the machine and look for "ALERT! /dev/sda1 does not exist. Dropping to a shell!". Entering "exit" at the prompt should boot the system normally.
* With the fix applied, repeated reboots should boot the machine normally without this alert.
[Regression Potential]
* This patch has already been uploaded into Saucy, and tested.
--
Moving the discussion from http://www.spinics.net/lists/hotplug/msg05769.html
to Launchpad, because I think that this bug needs to be handled in the
initramfs-tools package rather than in the udev package.
----------
I'm experiencing random boot failures with the wait-for-root utility in Ubuntu
12.04 ( ubuntu-12.04-server-amd64.iso ) on an HP ProLiant DL360p Gen8 server.
For example, wait-for-root waited for only 0.13 seconds before giving
up at the
FSTYPE=$(wait-for-root "${ROOT}" ${ROOTDELAY:-30})
line in scripts/local in the initramfs, and the script immediately reached the
panic "ALERT! ${ROOT} does not exist. Dropping to a shell!"
line.
This is a race condition, and manually entering "exit" at the panic prompt
boots the system normally. This is a critical bug for this environment because
unattended reboots (e.g. the automatic reboot after saving a kdump) will
randomly fail.
----------
I examined main() in wait-for-root using debug fprintf() and it turned out that
udev_monitor_receive_device() sometimes returns NULL immediately (although
wait-for-root uses a blocking socket).
I examined udev_monitor_receive_device() in libudev.so.0 using debug fprintf()
and it turned out that recvmsg() in udev_monitor_receive_device() (which is in
libudev-monitor.c in the udev package) fails with an ENOBUFS error before
recvmsg() returns information about the root partition.
The wait-for-root utility in the initramfs-tools package does not expect
recvmsg() to fail with ENOBUFS. But since ENOBUFS cannot always be avoided, I
think that wait-for-root (i.e. the caller of udev_monitor_receive_device())
should handle this error.
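
The handling suggested above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code that went into initramfs-tools: receive_fn is a hypothetical stand-in for udev_monitor_receive_device(), fake_receiver() is a demo stub that fails twice with ENOBUFS, and treating EINTR and EAGAIN as retryable alongside ENOBUFS is my assumption.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for udev_monitor_receive_device(): returns an
 * event pointer, or NULL with errno set on failure. */
typedef void *(*receive_fn)(void *ctx);

/* Assumption: these errors mean "nothing usable was received this time,
 * but the socket still works", so the caller should keep waiting. */
static int is_transient_receive_error(int err)
{
    return err == ENOBUFS || err == EINTR || err == EAGAIN;
}

/* Retry the receive call up to max_attempts times instead of giving up on
 * the first NULL. *dropped is set to 1 if ENOBUFS was seen, which means
 * some events (possibly the one being waited for) were lost. */
static void *receive_with_retry(receive_fn recv_fn, void *ctx,
                                int max_attempts, int *dropped)
{
    assert(recv_fn != NULL);
    for (int i = 0; i < max_attempts; i++) {
        errno = 0;
        void *event = recv_fn(ctx);
        if (event != NULL)
            return event;
        if (!is_transient_receive_error(errno))
            return NULL;            /* hard error: stop retrying */
        if (errno == ENOBUFS && dropped != NULL)
            *dropped = 1;           /* remember that events were lost */
    }
    return NULL;                    /* attempts exhausted */
}

/* Demo stub: fails twice with ENOBUFS, then "delivers" ctx as the event. */
static void *fake_receiver(void *ctx)
{
    int *calls = ctx;
    if ((*calls)++ < 2) {
        errno = ENOBUFS;
        return NULL;
    }
    return ctx;
}
```

When the dropped flag is set, the caller cannot rely on the target uevent still arriving, so it would probably need to re-check the device state directly (for example, by repeating the initial lookup) rather than only waiting for further events.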
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/initramfs-tools/+bug/1215911/+subscriptions