[Bug 1850540] Re: multi-zone raid0 corruption
dann frazier
dann.frazier at canonical.com
Wed Dec 4 21:26:11 UTC 2019
** Description changed:
Bug 1849682 tracks the temporary revert of the fix for this issue,
while this bug tracks the re-application of that fix once we have a full
solution.
- Fix checklist:
- [ ] Restore c84a1372df929 md/raid0: avoid RAID0 data corruption due to layout confusion.
- [ ] Also apply these fixes:
- 33f2c35a54dfd md: add feature flag MD_FEATURE_RAID0_LAYOUT
- 3874d73e06c9b md/raid0: fix warning message for parameter default_layout
- [ ] If upstream, include https://marc.info/?l=linux-raid&m=157239231220119&w=2
- [ ] mdadm update (see Comment #2)
- [ ] Packaging work to detect/aid admin before reboot
+ [Impact]
+ (cut & paste from https://marc.info/?l=linux-raid&m=157360088014027&w=2)
+ An unintentional RAID0 layout change was introduced in the v3.14 kernel. This effectively means there are 2 different layouts Linux will use to write data to RAID0 arrays in the wild - the “pre-3.14” way and the “3.14 and later” way. Mixing these layouts by writing to an array while booted on these different kernel versions can lead to corruption.
- Users of RAID0 arrays are susceptible to a corruption issue if:
- - The members of the RAID array are not all the same size[*]
- - Data has been written to the array while running kernels < 3.14 *and* >= 3.14.
+ Note that this only impacts RAID0 arrays that include devices of
+ different sizes. If your devices are all the same size, both layouts are
+ equivalent, and your array is not at risk of corruption due to this
+ issue.
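If you want to script that check, a minimal sketch follows (the helper name is illustrative, and the device paths in the usage note are examples, not from this bug):

```shell
# all_same_size: print "same" and succeed if every size argument (in bytes)
# equals the first one; otherwise print "different" and fail.
all_same_size() {
    first="$1"
    for s in "$@"; do
        if [ "$s" != "$first" ]; then
            echo "different"
            return 1
        fi
    done
    echo "same"
}
```

For example, feed it the member sizes reported by lsblk: `all_same_size $(lsblk -bno SIZE /dev/vdb1 /dev/vdc1)`. If it prints "same", both layouts are equivalent for your array.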
- This is because of a change in v3.14 that accidentally changed how data was written - as described in the upstream commit message:
- https://github.com/torvalds/linux/commit/c84a1372df929033cb1a0441fb57bd3932f39ac9
+ Unfortunately, the kernel cannot detect which layout was used for writes
+ to pre-existing arrays, and therefore requires input from the
+ administrator. This input can be provided via the kernel command line
+ with the raid0.default_layout=<N> parameter, or by setting the
+ default_layout module parameter when loading the raid0 module. With a
+ new enough version of mdadm (>= 4.2, or equivalent distro backports),
+ you can set the layout version when assembling a stopped array. For
+ example:
- That change has been applied to stable, but we reverted it to fix
- 1849682 until we have a full solution ready.
+ mdadm --stop /dev/md0
+ mdadm --assemble -U layout-alternate /dev/md0 /dev/sda1 /dev/sda2
+ See the mdadm manpage for more details. Once set in this manner, the layout will be recorded in the array and will not need to be explicitly specified in the future.
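As a sketch of the kernel-command-line route (this assumes a GRUB-based Ubuntu system; the value 2 below is purely an example - choose 1 or 2 based on which kernels wrote your data):

```shell
# Append raid0.default_layout=2 to the default kernel command line and
# regenerate the GRUB config. The layout value here is only an example.
sudo sed -i 's/^GRUB_CMDLINE_LINUX="/&raid0.default_layout=2 /' /etc/default/grub
sudo update-grub
# The setting takes effect on the next boot.
```

Unlike the mdadm route, this does not record the layout in the array, so the parameter must stay on the command line.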
- To summarize, upstream is dealing with this by adding a versioned layout
- in v5.4, and that is being backported to stable kernels - which is why
- we're now seeing it. Layout version 1 is the pre-3.14 layout, version 2
- is post 3.14. Mixing version 1 & version 2 layouts can cause corruption.
- However, until an mdadm exists that is able to set a layout in the
- array, there's no way for the kernel to know which version(s) was used
- to write the existing data. This undefined mode is considered "Version
- 0", and the kernel will now refuse to start these arrays w/o user
- intervention.
+ (The mdadm part of this SRU is for the above support ^)
- The user experience is pretty awful here. A user upgrades to the next
- SRU and all of a sudden their system stops at an (initramfs) prompt. A
- clueful user can spot something like the following in dmesg:
+ [Test Case]
+ = mdadm =
+ Confirm that a multi-zone raid0 created w/ older mdadm can be started on a fixed kernel by setting a layout.
+ 1) Ex: w/ old kernel/mdadm:
+ mdadm --create /dev/md0 --run --metadata=default \
+ --level=0 --raid-devices=2 /dev/vdb1 /dev/vdc1
+ 2) Reboot onto fixed kernel & update mdadm
+ 3) sudo mdadm --assemble -U layout-alternate \
+ /dev/md0 /dev/vdb1 /dev/vdc1
+ 4) Confirm that the array autostarts on reboot
+ 5) Confirm that w/ new kernel & new mdadm, a user can create and start an array in a backwards-compatible fashion (i.e. w/o an explicit layout).
+ 6) Verify that 'mdadm --detail /dev/md0' displays the layout
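When scripting step 4's autostart check, a small helper along these lines may be handy (the function name and the sample mdstat line in the usage note are illustrative):

```shell
# Succeed if the named array shows up as an active raid0 in
# /proc/mdstat-formatted text read from stdin.
mdstat_has_active_raid0() {
    grep -q "^$1 : active raid0 "
}
```

Usage after reboot: `mdstat_has_active_raid0 md0 < /proc/mdstat && echo "md0 autostarted"`.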
- Here's the message which, as you can see from the log in Comment #1, is
- hidden in a ton of other messages:
+ = linux =
+ Similar to above, but using kernel command line options.
- [ 72.720232] md/raid0:md0: cannot assemble multi-zone RAID0 with default_layout setting
- [ 72.728149] md/raid0: please set raid.default_layout to 1 or 2
- [ 72.733979] md: pers->run() failed ...
- mdadm: failed to start array /dev/md0: Unknown error 524
+ [Regression Risk]
+ The kernel side will break starting of pre-existing multi-zone arrays until a layout is specified. That's intentional.
- What that is trying to say is that you should determine if your data -
- specifically the data toward the end of your array - was most likely
- written with a pre-3.14 or post-3.14 kernel. Based on that, reboot with
- the kernel parameter raid0.default_layout=1 or raid0.default_layout=2 on
- the kernel command line. And note it should be *raid0.default_layout*
- not *raid.default_layout* as the message says - a fix for that message
- is now queued for stable:
-
- https://github.com/torvalds/linux/commit/3874d73e06c9b9dc15de0b7382fc223986d75571)
-
- IMHO, we should work with upstream to create a web page that clearly
- walks the user through this process, and update the error message to
- point to that page. I'd also like to see if we can detect this problem
- *before* the user reboots (debconf?) and help the user fix things. e.g.
- "We detected that you have RAID0 arrays that may be susceptible to a
- corruption problem", guide the user to choosing a layout, and update the
- mdadm initramfs hook to poke the answer in via sysfs before starting the
- array on reboot.
-
- Note that it also seems like we should investigate backporting this to <
- 3.14 kernels. Imagine a user switching between the trusty HWE kernel and
- the GA kernel.
-
- References from users of other distros:
- https://blog.icod.de/2019/10/10/caution-kernel-5-3-4-and-raid0-default_layout/
- https://www.linuxquestions.org/questions/linux-general-1/raid-arrays-not-assembling-4175662774/
-
- [*] Which surprisingly is not the case reported in this bug - the user
- here had a raid0 of 8 identically-sized devices. I suspect there's a bug
- in the detection code somewhere.
+ Although I've done due diligence to check for backwards-compatibility
+ issues, the mdadm side may still present some.
** Changed in: mdadm (Ubuntu Eoan)
Status: Confirmed => In Progress
** Changed in: mdadm (Ubuntu Eoan)
Assignee: (unassigned) => dann frazier (dannf)
** Changed in: mdadm (Ubuntu Disco)
Status: Confirmed => In Progress
** Changed in: mdadm (Ubuntu Disco)
Assignee: (unassigned) => dann frazier (dannf)
** Changed in: mdadm (Ubuntu Bionic)
Status: Confirmed => In Progress
** Changed in: mdadm (Ubuntu Bionic)
Assignee: (unassigned) => dann frazier (dannf)
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/1850540
Title:
multi-zone raid0 corruption
Status in Release Notes for Ubuntu:
New
Status in linux package in Ubuntu:
Confirmed
Status in mdadm package in Ubuntu:
Fix Released
Status in linux source package in Precise:
New
Status in mdadm source package in Precise:
New
Status in linux source package in Trusty:
Confirmed
Status in mdadm source package in Trusty:
Confirmed
Status in linux source package in Xenial:
Confirmed
Status in mdadm source package in Xenial:
Confirmed
Status in linux source package in Bionic:
Confirmed
Status in mdadm source package in Bionic:
In Progress
Status in linux source package in Disco:
Confirmed
Status in mdadm source package in Disco:
In Progress
Status in linux source package in Eoan:
Confirmed
Status in mdadm source package in Eoan:
In Progress
Status in linux source package in Focal:
Confirmed
Status in mdadm source package in Focal:
Fix Released
Status in mdadm package in Debian:
Fix Released
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu-release-notes/+bug/1850540/+subscriptions
More information about the foundations-bugs
mailing list