[Bug 1940207] [NEW] Mdadm slow RAID6 & RAID10 resync

Sergiu 1940207 at bugs.launchpad.net
Tue Aug 17 03:46:49 UTC 2021


Public bug reported:

I have a RAID 10 array made of 24 x Micron 9300 Pro 15.36TB drives. I
initialized the array with the following command:

mdadm --create /dev/md0 --raid-devices=24 --chunk=32 --level=raid10
/dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1 /dev/nvme4n1
/dev/nvme5n1 /dev/nvme6n1 /dev/nvme7n1 /dev/nvme8n1 /dev/nvme9n1
/dev/nvme10n1 /dev/nvme11n1 /dev/nvme12n1 /dev/nvme13n1 /dev/nvme14n1
/dev/nvme15n1 /dev/nvme16n1 /dev/nvme17n1 /dev/nvme18n1 /dev/nvme19n1
/dev/nvme20n1 /dev/nvme21n1 /dev/nvme22n1 /dev/nvme23n1

The array resyncs at a total of 1.3GB/s, and there does not appear to be
any way to speed it up. If I instead build 12 RAID1 arrays and stripe
them in a RAID0, each mirror resyncs at ~3.2GB/s, for a total throughput
of about 38.4GB/s, which is almost 30 times faster. However, in this
configuration random IOPS performance appears to be far more unstable
than in standard RAID10, which negates the advantage of the faster
resync. If mdadm offers a predefined RAID10 configuration, it should
offer the same resync behavior as RAID 1+0, but it does not, due to a
lack of parallelization in plain resync. The dev.raid.speed_limit_max
parameter was raised to 5000000 during this exercise.
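
For reference, the nested RAID 1+0 layout described above can be sketched
roughly as follows. This is a dry run that only prints the mdadm commands.
The pairing of adjacent drives, the mirror names /dev/md1 to /dev/md12, the
stripe name /dev/md100, and reusing --chunk=32 for the stripe are
assumptions for illustration, not details recorded from the actual test.

```shell
#!/bin/sh
# Dry run: prints the commands instead of executing them.
# Assumptions (hypothetical, not from the report): adjacent drives are
# paired, mirrors are /dev/md1../dev/md12, the stripe is /dev/md100.

print_nested_raid10() {
    pairs=12
    members=""
    i=0
    while [ "$i" -lt "$pairs" ]; do
        a=$((2 * i))        # first drive of the mirror
        b=$((2 * i + 1))    # second drive of the mirror
        md="/dev/md$((i + 1))"
        echo "mdadm --create $md --level=raid1 --raid-devices=2 /dev/nvme${a}n1 /dev/nvme${b}n1"
        members="$members $md"
        i=$((i + 1))
    done
    # Stripe across all mirrors.
    echo "mdadm --create /dev/md100 --level=raid0 --raid-devices=$pairs --chunk=32$members"
}

print_nested_raid10
```

Each RAID1 pair is a separate md device with its own resync thread, which
is why the 12 mirrors can resync in parallel in this layout.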

The same resync speed is also observed with RAID6. There, however,
setting group_thread_cnt to 12 increases the resync speed from 1.3GB/s
to about 10.5GB/s, which is still far from the theoretical speed of
72GB/s. Note that increasing group_thread_cnt above 12 does not scale
further; on the contrary, throughput starts to decrease.
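
A minimal sketch of how group_thread_cnt can be set through sysfs, assuming
the array is /dev/md0 and the shell runs as root. The SYSFS_ROOT variable
and the function name are hypothetical helpers added here so the snippet can
be tried against a scratch directory; they are not part of mdadm or the
kernel.

```shell
#!/bin/sh
# SYSFS_ROOT is a hypothetical override for testing; it defaults to /sys.

set_group_thread_cnt() {
    md_name=$1      # array name, e.g. md0
    threads=$2      # number of worker threads per group, e.g. 12
    attr="${SYSFS_ROOT:-/sys}/block/$md_name/md/group_thread_cnt"
    if [ ! -w "$attr" ]; then
        echo "cannot write $attr (not a RAID4/5/6 array, or not root?)" >&2
        return 1
    fi
    echo "$threads" > "$attr"
    # Read the value back to confirm what is now in effect.
    cat "$attr"
}

# Usage (as root): set_group_thread_cnt md0 12
```

The attribute only exists for RAID4/5/6 arrays, which is consistent with
there being no equivalent knob to try on the RAID10 array.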

Note that the server has more than enough CPU power (2 x 64 cores) and
all SSDs are directly attached NVMe, so there is no bus sharing of any
kind. This was confirmed by running fio on all devices concurrently and
observing the maximum theoretical speed.
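
A concurrent fio check of this kind could look roughly like the sketch
below, which generates a job file with one read-only job per drive. The
report does not record the fio parameters that were actually used, so the
block size, queue depth, runtime, and I/O engine here are assumptions, and
the function name is hypothetical.

```shell
#!/bin/sh
# Emits a fio job file on stdout: one sequential-read job per NVMe drive.
# All parameter values are illustrative assumptions, not the reporter's.

print_fio_jobfile() {
    count=${1:-24}      # number of drives, nvme0n1 .. nvme(count-1)n1
    cat <<'EOF'
[global]
ioengine=libaio
direct=1
rw=read
bs=1M
iodepth=32
runtime=30
time_based=1
group_reporting=1
EOF
    i=0
    while [ "$i" -lt "$count" ]; do
        # Each section becomes its own job, so all drives are read at once.
        printf '\n[nvme%d]\nfilename=/dev/nvme%dn1\n' "$i" "$i"
        i=$((i + 1))
    done
}

# Usage: print_fio_jobfile 24 > all-drives.fio && fio all-drives.fio
```

The jobs only read from the devices, so the check does not disturb data on
drives that are already members of an array.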

** Affects: mdadm (Ubuntu)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to mdadm in Ubuntu.
https://bugs.launchpad.net/bugs/1940207
