[Bug 2137589] [NEW] curtin fails to clear existing RAID setup during wipe

Tue Jan 6 20:55:20 UTC 2026

Public bug reported:

When installing Ubuntu 24.04 server on a machine with 9 disks, 8 of
which are in a pre-existing software RAID setup, the autoinstall/curtin
step that removes the pre-existing software RAID unexpectedly fails.

The error is almost always that the device or resource curtin is
trying to wipe is busy or in use.  curtin then tries to find out which
process is using it and prints that a python3.10 process is.

I believe curtin is, under the hood, somehow spawning a child process
that reads the partition table while the process trying to wipe the
device is still running. The file lock on the device is an exclusive
lock, but it's possible that the code that reads the partition table is
not treated as part of the critical section. I believe the issue is
somewhere around here:
https://github.com/canonical/curtin/blob/master/curtin/commands/block_meta.py#L804
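
To illustrate the kind of contention I am guessing at, here is a minimal
Python sketch. It is not curtin's code, and whether curtin takes an
flock() on the device at this point is an assumption on my part. A
temporary file stands in for the block device, and try_exclusive_lock()
and demo() are hypothetical names. It only shows that once one open
handle holds an exclusive lock, a second handle on the same file cannot
take it.

```python
import fcntl
import os
import tempfile


def try_exclusive_lock(path):
    """Open path and attempt a non-blocking exclusive flock().

    Returns the open file descriptor on success, or None when another
    open handle already holds a conflicting lock.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        # The lock is held elsewhere: the "busy" case.
        os.close(fd)
        return None
    return fd


def demo():
    # A temporary file stands in for the block device (assumption).
    with tempfile.NamedTemporaryFile() as tmp:
        wiper = try_exclusive_lock(tmp.name)   # stands in for the wipe step
        reader = try_exclusive_lock(tmp.name)  # stands in for the reader
        wiper_ok = wiper is not None
        reader_blocked = reader is None
        # Close whatever was opened; closing also releases the lock.
        for fd in (wiper, reader):
            if fd is not None:
                os.close(fd)
        return wiper_ok, reader_blocked
```

If something like the second handle is opened outside the locked
section, each side can see the other as the process keeping the device
busy.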

The workaround is to wipe all disks manually by running
`mdadm --stop --scan` and `wipefs --all --force /dev/nvme*n1`.
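
For anyone who wants to script the workaround, a rough Python sketch
follows. The function names and the dry_run switch are hypothetical, and
the /dev/nvme*n1 pattern matches our hardware only. It is destructive
when dry_run is False, so treat it as a starting point rather than a
tested tool.

```python
import glob
import subprocess


def build_wipe_commands(device_glob="/dev/nvme*n1"):
    """Return the workaround commands as argument lists."""
    devices = sorted(glob.glob(device_glob))
    # Stop every assembled md array first so the member disks are free.
    commands = [["mdadm", "--stop", "--scan"]]
    if devices:
        # Then erase all signatures on the matching disks.
        commands.append(["wipefs", "--all", "--force", *devices])
    return commands


def run_workaround(device_glob="/dev/nvme*n1", dry_run=True):
    """Print the commands, or execute them when dry_run is False."""
    commands = build_wipe_commands(device_glob)
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True aborts on the first failing command.
            subprocess.run(cmd, check=True)
    return commands
```

Calling run_workaround() with the defaults only prints what would be
run; pass dry_run=False (as root) to actually stop the arrays and wipe
the disks.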

Apologies for not having logs: the early environment + hypervisor we use
don't allow data to be ferried out.

** Affects: curtin
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/2137589
