[Bug 1215911] Re: wait-for-root fails to wait for plain /dev/sdaX partitions.

Tetsuo Handa 1215911 at bugs.launchpad.net
Sat Sep 21 04:13:18 UTC 2013


Martin Pitt (pitti) wrote on 2013-08-26:
> I have never actually seen ENOBUFS, or uevents being missed due to it,
> so I think the chance of that is quite small. But I can't assert that
> all messages will be received after an ENOBUFS. But as you said,
> waiting longer in that case is a safer fallback than not waiting at
> all. My hope is that that ENOBUFS situation clears itself up
> automatically after some time, otherwise your whole system would be
> screwed (as you could never receive any uevent).

I think you can observe ENOBUFS, and target uevents being missed because of it,
if you try the change below. Using that change, I confirmed that wait-for-root
waits until SIGALRM when it fails to receive the target uevents. The current
code assumes that the socket buffer is large enough to queue the target uevents.
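
As a side note on the buffer size assumption: the sketch below (not part of
the test patch) requests a small receive buffer and reads back what the kernel
actually applied. The helper name shrink_rcvbuf is hypothetical, and the
sketch works on any socket descriptor rather than on the udev monitor
specifically.

```c
#include <assert.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: request a receive buffer size on fd and return
 * the size the kernel actually applied, or -1 on error. */
static int shrink_rcvbuf(int fd, int requested)
{
	int applied = 0;
	socklen_t len = sizeof(applied);

	assert(requested > 0);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF,
		       &requested, sizeof(requested)) < 0)
		return -1;

	/* Read the value back; the kernel may adjust the request. */
	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &applied, &len) < 0)
		return -1;

	return applied;
}
```

On Linux the value read back is typically larger than the requested one,
because the kernel doubles it to leave room for bookkeeping overhead.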

Anyway, although it is still possible for wait-for-root to wait longer than it
should, the case where wait-for-root waits for less time than it should has
been fixed.
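
To illustrate why "waiting longer" is bounded: assuming the timeout is driven
by alarm() and SIGALRM, a blocked wait ends when the alarm fires at the
latest. The sketch below is only an illustration under that assumption; the
names on_alarm and wait_until_alarm are hypothetical, and sigsuspend() stands
in for the blocking receive.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

static volatile sig_atomic_t timed_out;

static void on_alarm(int signo)
{
	(void)signo;
	timed_out = 1;
}

/* Hypothetical helper: block until SIGALRM arrives after `seconds`.
 * Returns 1 when the timeout expired, -1 on setup failure. */
static int wait_until_alarm(unsigned int seconds)
{
	struct sigaction sa;
	sigset_t block, old, wait_mask;

	assert(seconds > 0); /* alarm(0) would cancel the timer */

	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = on_alarm;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGALRM, &sa, NULL) < 0)
		return -1;

	/* Block SIGALRM so it cannot fire between the flag check and
	 * the wait; sigsuspend() unblocks it atomically. */
	sigemptyset(&block);
	sigaddset(&block, SIGALRM);
	if (sigprocmask(SIG_BLOCK, &block, &old) < 0)
		return -1;
	wait_mask = old;
	sigdelset(&wait_mask, SIGALRM);

	timed_out = 0;
	alarm(seconds);
	while (!timed_out)
		sigsuspend(&wait_mask);

	sigprocmask(SIG_SETMASK, &old, NULL);
	return 1;
}
```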

Thank you.

----------
--- a/src/wait-for-root.c
+++ b/src/wait-for-root.c
@@ -12,6 +12,9 @@
 #include <unistd.h>
 #include <fcntl.h>
 
+#include <sys/socket.h>
+#include <libudev.h>
+#include <errno.h>
 
 static int device_queued   (struct udev *udev, const char *path);
 static int matching_device (struct udev_device *device, const char *path);
@@ -60,6 +63,11 @@ main (int   argc,
 	 */
 	udev = udev_new ();
 	udev_monitor = udev_monitor_new_from_netlink (udev, "udev");
+	{
+		// Reduce socket buffer size.
+		int buff_size = 4096;
+		setsockopt(udev_monitor_get_fd(udev_monitor), SOL_SOCKET, SO_RCVBUF, &buff_size, sizeof(buff_size));
+	}
 
 	udev_monitor_filter_add_match_subsystem_devtype (udev_monitor, "block", NULL);
 	udev_monitor_enable_receiving (udev_monitor);
@@ -96,11 +104,15 @@ main (int   argc,
 	/* When the device doesn't exist yet, or is still being processed
 	 * by udev, use the monitor socket to wait it to be done.
 	 */
+	sleep(3); // Inject delay to make socket buffer overflow.
 	while (1) {
                 /* even though we use a blocking socket this might still fail
                  * due to ENOBUFS or similar. */
-                while ((udev_device = udev_monitor_receive_device (udev_monitor)) == NULL)
-                        sleep (1);
+		while (errno = 0, (udev_device = udev_monitor_receive_device (udev_monitor)) == NULL) {
+			const int err = errno;
+			fprintf(stderr, "***** %s (%d)\n", strerror(err), err);
+			//sleep (1);
+		}
 		if (matching_device (udev_device, devpath)) {
 			type = udev_device_get_property_value (udev_device, "ID_FS_TYPE");
 			if (type) {
----------

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to initramfs-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1215911

Title:
  wait-for-root fails to wait for plain /dev/sdaX partitions.

Status in “initramfs-tools” package in Ubuntu:
  Fix Released
Status in “initramfs-tools” source package in Precise:
  Fix Committed
Status in “initramfs-tools” source package in Quantal:
  Fix Committed
Status in “initramfs-tools” source package in Raring:
  Fix Committed

Bug description:
  SRU Justification:
  [Impact] 
    * Boot failures can occur with the wait-for-root utility in P/Q/R due to a race condition.
    * Because of this issue unattended reboots and boots can randomly fail.
    * The original bug was submitted against Precise LTS.

  [Test Case]
   * Reboot the machine and look for "ALERT! /dev/sda1 does not exist. Dropping to a shell!". Entering "exit" at the prompt should boot the system normally.
   * We expect that repeated reboots boot the machine normally without this alert.

  [Regression Potential] 
   * This patch has already been uploaded into Saucy, and tested.

  --

  Moving the discussion from http://www.spinics.net/lists/hotplug/msg05769.html
  to Launchpad, because I think that this bug needs to be handled in the
  initramfs-tools package rather than in the udev package.

  ----------

  I'm experiencing random boot failures with the wait-for-root utility in Ubuntu
  12.04 (ubuntu-12.04-server-amd64.iso) on an HP ProLiant DL360p Gen8 server.

  For example, wait-for-root waited for only 0.13 seconds before giving
  up at the

    FSTYPE=$(wait-for-root "${ROOT}" ${ROOTDELAY:-30})

  line in scripts/local in the initramfs, and immediately reached the

    panic "ALERT!  ${ROOT} does not exist.  Dropping to a shell!"

  line.

  This is a race condition, and manually entering "exit" at the panic prompt
  boots the system normally. This is a critical bug for this environment because
  unattended reboots (e.g. the automatic reboot after saving a kdump) will
  randomly fail.

  ----------

  I examined main() in wait-for-root using debug fprintf() and it turned out that
  udev_monitor_receive_device() sometimes returns NULL immediately (although
  wait-for-root is using a blocking socket).

  I examined udev_monitor_receive_device() in libudev.so.0 using debug fprintf()
  and it turned out that recvmsg() in udev_monitor_receive_device() (which is in
  libudev-monitor.c in the udev package) fails with ENOBUFS before recvmsg()
  returns the information for the root partition.

  The wait-for-root utility in the initramfs-tools package does not expect
  recvmsg() to fail with ENOBUFS. But since ENOBUFS is an unavoidable error, I
  think that wait-for-root (i.e. the caller of udev_monitor_receive_device())
  should handle this error.
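
  A minimal sketch of such caller-side handling follows. It is an illustration
  only, not the actual wait-for-root fix: the receiver is abstracted behind a
  hypothetical function pointer so the loop can be exercised without libudev,
  and the names receive_fn, wait_for_event, struct step, struct script and
  scripted_receive are all assumptions made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical receiver: returns an event name, or NULL with errno set. */
typedef const char *(*receive_fn)(void *ctx);

enum wait_result { WAIT_FOUND, WAIT_FAILED };

/* Keep receiving until `wanted` shows up. ENOBUFS (queue overflow) is
 * counted and treated as "keep waiting" instead of "give up". */
static enum wait_result wait_for_event(receive_fn receive, void *ctx,
				       const char *wanted, int *overflows)
{
	assert(receive != NULL && wanted != NULL);

	for (;;) {
		const char *event;

		errno = 0;
		event = receive(ctx);
		if (event == NULL) {
			if (errno == ENOBUFS) {
				if (overflows != NULL)
					++*overflows;
				continue;
			}
			if (errno == EINTR || errno == EAGAIN)
				continue;
			return WAIT_FAILED; /* unexpected error */
		}
		if (strcmp(event, wanted) == 0)
			return WAIT_FOUND;
	}
}

/* Scripted stand-in for the real receiver, for demonstration only. */
struct step { const char *event; int err; };
struct script { const struct step *steps; size_t count; size_t next; };

static const char *scripted_receive(void *ctx)
{
	struct script *s = ctx;
	const struct step *st;

	if (s->next >= s->count) {
		errno = EIO; /* script exhausted: report a hard error */
		return NULL;
	}
	st = &s->steps[s->next++];
	if (st->event == NULL)
		errno = st->err;
	return st->event;
}
```

  Since the dropped message may be the very uevent being waited for, a real
  caller might also re-check whether the device already exists after an
  overflow, and would still rely on the overall timeout as the upper bound.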
