[Bug 1188642] Re: Init aborts with the message: init: log.c:813: Assertion failed in log_clear_unflushed: ! log->io

regmka regmka at gmail.com
Sat Jun 8 05:36:47 UTC 2013


** Description changed:

  init (in Ubuntu 13.04) accidentally aborts with the following messages
  in the kernel log:
  
  [    4.981969] init: log.c:813: Assertion failed in log_clear_unflushed: ! log->io
  [    4.991031] init: Caught abort, core dumped
  [    4.995571] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600
  
- I've figured out that the probable cause is a race between the line 699 of the file init/log.c
- of the function log_read_watch() that sets the field log->remote_closed to 1 (the caller clears log->io)
- and the routine log_clear_unflushed() that causes the abort when log->remote_closed == 1 but log->io != NULL:
- log_clear_unflushed() may preempt log_read_watch() when log->remote_closed is set to 1 but
- before log->io is cleared for the same log.
+ I've figured out that the probable cause is a race between the line 699 of the file init/log.c of the function log_read_watch() that sets the field
+ log->remote_closed to 1 (the caller clears log->io) and the routine log_clear_unflushed() that causes the abort when log->remote_closed == 1 but log->io != NULL: log_clear_unflushed() may preempt log_read_watch() when
+ log->remote_closed is set to 1 but before log->io is cleared for
+ the same log.
  
  I applied the following patch to prove this assertion:
  
  @@ -264,6 +266,8 @@
-  			 */
-  			log->io->error_handler = NULL;
-  
+      */
+     log->io->error_handler = NULL;
+ 
  +			nih_fatal("log %p: clearing log->io\n", log);
  +
-  			nih_free (log->io);
-  			log->io = NULL;
-  		}
+     nih_free (log->io);
+     log->io = NULL;
+    }
  @@ -384,6 +388,8 @@
-  
-  	nih_free (err);
-  
+ 
+   nih_free (err);
+ 
  +	nih_fatal("log %p: clearing log->io\n", log);
  +
-  	/* Ensure the NihIo is closed */
-  	nih_free (log->io);
-  	log->io = NULL;
+   /* Ensure the NihIo is closed */
+   nih_free (log->io);
+   log->io = NULL;
  @@ -696,7 +702,11 @@
-  			 *
-  			 */
-  			if (saved && saved != EAGAIN && saved != EWOULDBLOCK)
+      *
+      */
+     if (saved && saved != EAGAIN && saved != EWOULDBLOCK)
  +			{
  +				nih_fatal("log %p: setting log->remote_closed\n", log);
-  				log->remote_closed = 1;
+      log->remote_closed = 1;
  +			}
-  
-  			close (log->fd);
-  			log->fd = -1;
+ 
+     close (log->fd);
+     log->fd = -1;
  @@ -806,6 +816,8 @@
-  		nih_assert (log->open_errno);
-  
-  		if (log->remote_closed) {
+    nih_assert (log->open_errno);
+ 
+    if (log->remote_closed) {
  +			nih_fatal("log %p: checking log->io\n", log);
  +
-  			/* Parent job has ended and unflushed data
-  			 * exists.
-  			 */
- 
+     /* Parent job has ended and unflushed data
+      * exists.
+      */
  
  I got the  following kernel log before the abort:
  
  [    4.635146] init: log 0xb882a660: clearing log->io
  [    4.640487] init: log 0xb88035e0: setting log->remote_closed
  [    4.654816] init: log 0xb882ce88: clearing log->io
  [    4.655263] init: log 0xb882a1d0: setting log->remote_closed
  [    4.706758] init: log 0xb8831370: setting log->remote_closed
  [    6.714543] init: log 0xb882a600: clearing log->io
  [    6.733596] init: log 0xb882c6a0: clearing log->io
  [    6.746124] init: log 0xb882cbd8: setting log->remote_closed
  [    6.765075] init: log 0xb882b1d0: clearing log->io
  [    6.787772] init: log 0xb8842b50: clearing log->io
  [    6.801163] init: log 0xb882cb78: clearing log->io
  [    6.810717] init: log 0xb882df58: clearing log->io
  [    6.864869] init: log 0xb883e418: clearing log->io
  [    6.871033] init: log 0xb882d2a8: clearing log->io
  [    6.899113] init: log 0xb882d268: clearing log->io
  [    6.927477] init: log 0xb884eb28: clearing log->io
  [    6.942432] init: log 0xb88519a8: clearing log->io
  [    6.997231] init: log 0xb8850c40: clearing log->io
  [    7.031685] init: log 0xb8850650: clearing log->io
  [    7.042562] init: log 0xb8856d10: clearing log->io
  [    7.042811] init: log 0xb8857cf8: clearing log->io
  [    7.063461] init: log 0xb885a850: setting log->remote_closed
  [    7.063996] init: log 0xb883e478: setting log->remote_closed
  [    7.095505] init: log 0xb8854960: setting log->remote_closed
  [    7.135581] init: log 0xb883e478: checking log->io
  [    7.135756] init: log.c:825: Assertion failed in log_clear_unflushed: ! log->io
  [    7.140537] init: Caught abort, core dumped
  [    7.143211] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600
  
- So the assertion has been proved for the log at 0xb883e478 which has the field remote_closed updated
- but the field io dirty.
+ So the assertion has been proved for the log at 0xb883e478 which has the
+ field remote_closed updated but the field io dirty.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to upstart in Ubuntu.
https://bugs.launchpad.net/bugs/1188642

Title:
  Init aborts with the message: init: log.c:813: Assertion failed in
  log_clear_unflushed: ! log->io

Status in Upstart:
  New
Status in “upstart” package in Ubuntu:
  New

Bug description:
  init (in Ubuntu 13.04) accidentally aborts with the following messages
  in the kernel log:

  [    4.981969] init: log.c:813: Assertion failed in log_clear_unflushed: ! log->io
  [    4.991031] init: Caught abort, core dumped
  [    4.995571] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

  I've figured out that the probable cause is a race between the line 699 of the file init/log.c of the function log_read_watch() that sets the field
  log->remote_closed to 1 (the caller clears log->io) and the routine log_clear_unflushed() that causes the abort when log->remote_closed == 1 but log->io != NULL: log_clear_unflushed() may preempt log_read_watch() when
  log->remote_closed is set to 1 but before log->io is cleared for
  the same log.

  I applied the following patch to prove this assertion:

  @@ -264,6 +266,8 @@
       */
      log->io->error_handler = NULL;

  +			nih_fatal("log %p: clearing log->io\n", log);
  +
      nih_free (log->io);
      log->io = NULL;
     }
  @@ -384,6 +388,8 @@

    nih_free (err);

  +	nih_fatal("log %p: clearing log->io\n", log);
  +
    /* Ensure the NihIo is closed */
    nih_free (log->io);
    log->io = NULL;
  @@ -696,7 +702,11 @@
       *
       */
      if (saved && saved != EAGAIN && saved != EWOULDBLOCK)
  +			{
  +				nih_fatal("log %p: setting log->remote_closed\n", log);
       log->remote_closed = 1;
  +			}

      close (log->fd);
      log->fd = -1;
  @@ -806,6 +816,8 @@
     nih_assert (log->open_errno);

     if (log->remote_closed) {
  +			nih_fatal("log %p: checking log->io\n", log);
  +
      /* Parent job has ended and unflushed data
       * exists.
       */

  I got the  following kernel log before the abort:

  [    4.635146] init: log 0xb882a660: clearing log->io
  [    4.640487] init: log 0xb88035e0: setting log->remote_closed
  [    4.654816] init: log 0xb882ce88: clearing log->io
  [    4.655263] init: log 0xb882a1d0: setting log->remote_closed
  [    4.706758] init: log 0xb8831370: setting log->remote_closed
  [    6.714543] init: log 0xb882a600: clearing log->io
  [    6.733596] init: log 0xb882c6a0: clearing log->io
  [    6.746124] init: log 0xb882cbd8: setting log->remote_closed
  [    6.765075] init: log 0xb882b1d0: clearing log->io
  [    6.787772] init: log 0xb8842b50: clearing log->io
  [    6.801163] init: log 0xb882cb78: clearing log->io
  [    6.810717] init: log 0xb882df58: clearing log->io
  [    6.864869] init: log 0xb883e418: clearing log->io
  [    6.871033] init: log 0xb882d2a8: clearing log->io
  [    6.899113] init: log 0xb882d268: clearing log->io
  [    6.927477] init: log 0xb884eb28: clearing log->io
  [    6.942432] init: log 0xb88519a8: clearing log->io
  [    6.997231] init: log 0xb8850c40: clearing log->io
  [    7.031685] init: log 0xb8850650: clearing log->io
  [    7.042562] init: log 0xb8856d10: clearing log->io
  [    7.042811] init: log 0xb8857cf8: clearing log->io
  [    7.063461] init: log 0xb885a850: setting log->remote_closed
  [    7.063996] init: log 0xb883e478: setting log->remote_closed
  [    7.095505] init: log 0xb8854960: setting log->remote_closed
  [    7.135581] init: log 0xb883e478: checking log->io
  [    7.135756] init: log.c:825: Assertion failed in log_clear_unflushed: ! log->io
  [    7.140537] init: Caught abort, core dumped
  [    7.143211] Kernel panic - not syncing: Attempted to kill init! exitcode=0x00000600

  So the assertion has been proved for the log at 0xb883e478 which has
  the field remote_closed updated but the field io dirty.

To manage notifications about this bug go to:
https://bugs.launchpad.net/upstart/+bug/1188642/+subscriptions




More information about the foundations-bugs mailing list