[Bug 1062159] Re: Raid is incorrectly determined as DEGRADED preventing boot in 12.04

ceg 1062159 at bugs.launchpad.net
Tue Dec 11 12:44:59 UTC 2012


** Changed in: mdadm (Ubuntu)
       Status: New => Confirmed

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/1062159

Title:
  Raid is incorrectly determined as DEGRADED preventing boot in 12.04

Status in “mdadm” package in Ubuntu:
  Confirmed

Bug description:
  After upgrading from 11.04 to 12.04 in two steps, my server failed to
  boot printing:

  "Could not start the RAID in degraded mode.", referring to /dev/md/3,
  and then dropping me into an initramfs shell.

  My RAID setup is the following:

  # cat /proc/mdstat 
  Personalities : [raid6] [raid5] [raid4] [raid1] [linear] [multipath] [raid0] [raid10] 
  md3 : active raid0 dm-2[0] sdc2[2] sdb2[1] sdd2[3]
        82075648 blocks super 1.2 1024k chunks
        
  md0 : active raid1 sdf1[1] sde1[0]
        530048 blocks [2/2] [UU]
        
  md4 : active raid5 sdf3[1] sdh3[4] sdg3[2] sde3[0]
        5856021120 blocks super 1.2 level 5, 128k chunk, algorithm 2 [4/4] [UUUU]
        
  md2 : active raid6 sdh2[3] sdf2[1] sdg2[2] sde2[0]
        1950720 blocks super 1.2 level 6, 512k chunk, algorithm 2 [4/4] [UUUU]
        
  md1 : active raid5 sda1[0] sdc1[2] sdb1[1] sdd1[3]
        11712000 blocks super 1.2 level 5, 512k chunk, algorithm 2 [4/4] [UUUU]

  The following are my mount points:

  # mount
  /dev/mapper/md1_crypt on / type ext4 (rw,noatime,errors=remount-ro)
  /dev/md0 on /boot type ext4 (rw)
  /dev/md3 on /something/else/irrelevant type xfs (rw,discard)

  # grep -e md -e mapper -e boot /etc/fstab
  /dev/mapper/md1_crypt /               ext4    noatime,errors=remount-ro 0       1
  UUID=9b199b09-078a-4f88-82dc-a099be4c6a09 /boot           ext4    defaults        0       2
  /dev/mapper/md2_crypt none            swap    sw              0       0
  /dev/mapper/md4_crypt /mnt ext4 noatime,defaults 0 0
  /dev/md3 /something/else/irrelevant xfs defaults,discard 0 0

  Current crypttab setup:

  # cat /etc/crypttab 
  md1_crypt /dev/md1 none luks,discard
  md2_crypt /dev/md2 /dev/urandom cipher=aes-cbc-essiv:sha256,size=256,swap
  md3_crypt /dev/md3 /some/key/file cipher=aes-cbc-essiv:sha256,size=256
  sda2_crypt /dev/sda2 /some/key/file cipher=aes-cbc-essiv:sha256,size=256,discard
  md4_crypt /dev/md4 /some/key/file cipher=aes-cbc-essiv:sha256,size=256

  As you can see, the first fact is that /dev/md3 should not be relevant
  for booting the system: it is not the rootfs, not the swap, and not
  /boot, which are all I need to get my system up and running.

  The second contributing factor is that /dev/md3 is a RAID0 (zero-drive
  fault tolerance) which includes a device that is set up via crypttab
  (sda2_crypt). So one slice of /dev/md3 is encrypted.

  During boot, this is what happens:

  1) The system mounts the initrd (which contains local copies of fstab,
  mdadm.conf and crypttab) and tries to determine what to do. It
  determines that the system has an encrypted rootfs and correctly
  prompts for the password.

  2) /dev/mapper/md1_crypt is unlocked from /dev/md1. /dev/md1 is
  assembled at this point, and operational.

  3) The system moves on, trying to determine how to assemble the rest
  of the RAIDs. It reads mdadm.conf (the problem persists even if I
  remove this file, although then my md3 is named md127). It finds
  definitions of md0, md1, md2, md3 & md4 and tries to run the logic
  from /usr/share/initramfs-tools/hooks/mdadm.

  4) /usr/share/initramfs-tools/hooks/mdadm runs before the rest of the
  encrypted devices are set up. That makes some sense, as encrypted
  devices may themselves sit on top of a RAID. However, md3 consists of
  chunks from raw block devices plus one device which is derived from
  crypttab. The hook uses /usr/share/initramfs-tools/scripts/mdadm-
  functions.

  md3 : active raid0 dm-2[0] sdc2[2] sdb2[1] sdd2[3]
        82075648 blocks super 1.2 1024k chunks

  Notice the first device.

  5) For some reason, even after adding BOOT_DEGRADED=true in
  /etc/default/mdadm, the setting is ignored for a degraded RAID0,
  presumably because the array is marked as faulty rather than degraded?

  6) The system halts and throws me into the initramfs shell.
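
  The check behind step 5 hinges on the exit status of "mdadm --misc
  --scan --detail --test": per mdadm(8), with --test the command exits 0
  when every array is functioning normally, 1 when one is degraded, and
  2 when one is dead, and the initramfs inverts that status. A RAID0
  with a missing member counts as failed rather than degraded, which
  would explain why BOOT_DEGRADED never applies. A sketch of the
  inversion from mdadm-functions (mdadm_status is a stub standing in
  for the real mdadm call):

  #!/bin/sh
  # "Success" (exit 0) from degraded_arrays means "something is wrong":
  # the arithmetic negation turns mdadm's 0 (all healthy) into 1, and
  # any non-zero status (degraded or failed) into 0.
  degraded_arrays() {
      mdadm_status "$@"
      return $((! $?))
  }

  # Stub exit codes mirroring mdadm(8): 0 healthy, 1 degraded, 2 dead.
  mdadm_status() { return "$1"; }

  degraded_arrays 0 && echo "problem found" || echo "all healthy"
  degraded_arrays 2 && echo "problem found" || echo "all healthy"

  Note that the check cannot distinguish a degraded-but-startable RAID1
  from a dead RAID0: both make degraded_arrays return 0.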


  I got the system to boot successfully by "hacking" the mdadm-functions
  file:

  
  --- usr/share/initramfs-tools/scripts/mdadm-functions	2012-02-10 04:04:54.000000000 +0100
  +++ /usr/share/initramfs-tools/scripts/mdadm-functions	2012-10-02 23:55:08.246402544 +0200
  @@ -3,8 +3,9 @@
   
   degraded_arrays()
   {
  -	mdadm --misc --scan --detail --test >/dev/null 2>&1
  -	return $((! $?))
  +#	mdadm --misc --scan --detail --test >/dev/null 2>&1
  +   return 0
  +#	return $((! $?))
   }
   
   mountroot_fail()
  @@ -83,10 +84,11 @@
   					echo "Started the RAID in degraded mode."
   					return 0
   				else
  +               mdadm --stop /dev/md3
   					echo "Could not start the RAID in degraded mode."
   				fi
   			fi
   		fi
   	fi
  -	return 1
  +	return 0
   }

  
  So basically I force mdadm-functions to always return 0 and never
  check for degraded arrays. In addition, I make it stop the wrongly
  assembled /dev/md3, which will be re-assembled after the initramfs
  completes anyhow.
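
  For what it's worth, a less invasive workaround than patching
  mdadm-functions might be to keep md3 out of the initramfs altogether.
  This is an untested sketch assuming the Debian/Ubuntu convention that
  the initramfs includes a copy of /etc/mdadm/mdadm.conf:

  # List only the arrays needed for boot in /etc/mdadm/mdadm.conf, e.g.
  #
  #   ARRAY /dev/md0 UUID=<uuid-of-md0>
  #   ARRAY /dev/md1 UUID=<uuid-of-md1>
  #   ARRAY /dev/md2 UUID=<uuid-of-md2>
  #
  # (UUIDs as reported by "mdadm --detail --scan"), then rebuild the
  # initramfs so the trimmed config is picked up:
  update-initramfs -u

  Whether auto-detection still assembles md3 early (as md127) would
  need to be verified, as noted in step 3 above.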

  This setup was working in 11.04.

  Lucky me, having a remote serial console to actually solve it... :)

  The problem should be quite easy to reproduce on any 12.04 setup.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/mdadm/+bug/1062159/+subscriptions

More information about the foundations-bugs mailing list