[Bug 2110891] Re: System freeze on release upgrade 24.10 oracular to 25.04 plucky with root fs on ZFS

Seth Arnold 2110891 at bugs.launchpad.net
Thu Jul 10 23:00:56 UTC 2025


Well, there are a few different kinds of "real fix".

My message is mostly about trying to get our users to an upgrade path
that works with a minimum of fuss.

The upstream OpenZFS developers have never supported running userland
utilities of a different version than the kernel modules. It has just
happened to work fine for years because most people reboot into matching
kernel modules shortly after installing new userland utilities, and
don't exhaustively stress the entire user-to-kernel communication path
in that window.

But our scripts are doing something that exposes a mismatch. Probably
the OpenZFS developers didn't expect our scripts to use this
functionality in the moments between installing the utilities and
rebooting into the new kernel modules.
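
A rough way to spot this window on a running system is to compare the
two lines that `zfs version` prints: one for the userland tools and one
prefixed `zfs-kmod-` for the loaded kernel module. A minimal sketch,
assuming that output format; the helper name `zfs_versions_match` is
made up for illustration, and it compares only the upstream part of each
version string (everything before the first `-` packaging suffix):

```shell
# Hypothetical helper: reads `zfs version` output on stdin.
# Exits 0 when the upstream versions of userland and kernel module match,
# 1 when they differ or when either line is missing.
zfs_versions_match() {
    awk '
        # Kernel module line, e.g. "zfs-kmod-<version>"
        /^zfs-kmod-/ { kmod = $0; sub(/^zfs-kmod-/, "", kmod); sub(/-.*/, "", kmod); next }
        # Userland line, e.g. "zfs-<version>"
        /^zfs-/      { user = $0; sub(/^zfs-/, "", user); sub(/-.*/, "", user) }
        END          { exit !(user != "" && user == kmod) }
    '
}
```

Usage: `zfs version | zfs_versions_match || echo "userland and kernel module differ"`.
A match does not prove the two sides are compatible, so treat the result
as a hint only.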

Fully addressing backwards and forwards compatibility in both the zfs
kernel modules and the zfs userland utilities is probably expensive
enough that nobody will ever do it. (If it were cheap and easy, surely
someone would have done it.) Projects have to prioritize what they can
work on with the resources that they have -- on-disk compatibility is
far more important to the OpenZFS developers than backwards and forwards
compatibility on the command and control channel.

Yes, a full fix would be nice, but I expect it'll take far more
resources than I can contribute, and probably more than Canonical can
contribute. But we can probably figure out a way to get people past this
hurdle; it just might take a while.

I guess it's possible that whatever change broke the compatibility could
also be found and reverted, but that'd just re-introduce the problem for
whoever had already dealt with the problem. Depending upon how many
people are affected, it might or might not make sense to revert the
change and deal with the fallout of a *second* break.

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to ubuntu-release-upgrader in
Ubuntu.
https://bugs.launchpad.net/bugs/2110891

Title:
  System freeze on release upgrade 24.10 oracular to 25.04 plucky with
  root fs on ZFS

Status in linux package in Ubuntu:
  Confirmed
Status in ubuntu-release-upgrader package in Ubuntu:
  Deferred
Status in zfs-linux package in Ubuntu:
  Confirmed
Status in linux source package in Plucky:
  Confirmed
Status in ubuntu-release-upgrader source package in Plucky:
  Fix Released
Status in zfs-linux source package in Plucky:
  Confirmed

Bug description:
  [Impact]
  Upgrades to 25.04 (and newer) fail because the ZFS userspace utilities are newer than the kernel modules, and an incompatibility between them can cause the kernel to hang. In particular, the bug reporter observes that iterating over the snapshots of the root file system freezes the system.

  [Test plan: ubuntu-release-upgrader]
  1. Create a vm
  2. Check that the upgrade doesn't trigger the screen
     (no zpool exists)
  3. Install zfsutils-linux
  4. Check that the upgrade doesn't trigger the screen
     (zpool list -H will be empty)
  5. Create a zpool, for example in a file
      # fallocate /swap -l 2G
      # zpool create foo /swap
      # zpool list
      NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
      foo 1.88G 134K 1.87G - - 0% 0% 1.00x ONLINE -
      # zpool list -H
      foo 1.88G 110K 1.87G - - 0% 0% 1.00x ONLINE -
  6. Check that the upgrade aborts
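
  The decision this test plan exercises could be sketched roughly as
  below. This is not the actual ubuntu-release-upgrader code: both
  function names are hypothetical, and the only behaviour assumed is the
  one the plan itself relies on, namely that `zpool list -H` prints
  nothing when no pool exists.

```shell
# Hypothetical helper: succeeds when the captured `zpool list -H` output
# is non-empty, i.e. at least one pool exists.
pools_present() {
    [ -n "$1" ]
}

# Hypothetical wrapper: returns 0 when the upgrade should be blocked.
should_block_upgrade() {
    # No zpool binary (zfsutils-linux not installed): nothing to block.
    command -v zpool >/dev/null 2>&1 || return 1
    # Block as soon as any pool is listed, regardless of snapshots or root fs.
    pools_present "$(zpool list -H 2>/dev/null)"
}
```

  Steps 2 and 4 correspond to `should_block_upgrade` returning non-zero,
  and step 6 to it returning zero.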

  [Where problems could occur: ubuntu-release-upgrader]
  The change disables upgrades for all systems with a zpool, regardless of whether snapshots exist or the zpool is on root. This may also include systems that could upgrade successfully.

  If the code is wrong, it could inadvertently block systems without ZFS
  from upgrading; however, this seems reasonably mitigated by the test
  plan.

  [Original bug report]

  SUMMARY:
  I'm running Ubuntu 24.10 on a ZFS root fs on LUKS, which also has snapshots. As part of the release upgrade process, ubuntu-release-upgrader triggers a bug in ZFS: the update-grub step executes ls /.zfs/snapshot/… via 10_linux_zfs, which eventually causes a kernel deadlock on mount.zfs.

  As the `ls` statement in question works flawlessly before the upgrade
  process, I assume the hang is caused by ZoL having been replaced by
  apt earlier in the upgrade? That's why I feel this is an
  ubuntu-release-upgrader package bug.

  I've also reported this bug upstream at ZoL:
  https://github.com/openzfs/zfs/issues/17337

  DETAILED DESCRIPTION:
  I have now failed 7 times to upgrade my Ubuntu 24.10 to 25.04 (Beta, RC, First Release, Today). In all cases the upgrade ran into a complete system freeze (deadlock), with `zfs rollback` to the rescue. Today, using only a text console with screen and running `dmesg -Hxw`, `htop -d 5` and `do-release-upgrade` in parallel, I was finally able to pinpoint the problem:

    1. `do-release-upgrade` downloads and installs all updated .deb
       packages

    2. Eventually, the upgrade tries to run `update-grub`

    3. The grub script executes the hook `10_linux_zfs` to identify the
       available kernel versions for the grub boot menu

    4. As part of this discovery, an `ls /.zfs/snapshot/[snapshot]/etc`
       is executed, which causes a system freeze

    5. Eventually, after a long pause, the kernel reports `task ls:…
       blocked for more than 122 seconds.`

    6. The only option left for me is to shut down the server and revert
       to the ZFS snapshot I made before the upgrade
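
  When watching the kernel log as in step 5, a small filter can pull out
  which tasks the hung-task detector is complaining about. A minimal
  sketch, assuming the usual `task NAME:PID blocked for more than N
  seconds` wording; the function name `hung_tasks` is made up for
  illustration, and task names containing spaces are not handled:

```shell
# Hypothetical helper: reads kernel log text on stdin and prints
# "name:pid" for each task reported as blocked by the hung-task detector.
hung_tasks() {
    sed -n 's/.*task \([^ ]*:[0-9][0-9]*\) blocked for more than [0-9][0-9]* seconds.*/\1/p'
}
```

  Usage: `dmesg | hung_tasks`. Lines that do not match the pattern are
  dropped, so the output is empty on a healthy system.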

  REMARKS:
  * Executing "ls /.zfs/snapshot/[snapshot]/etc" and "update-grub" works successfully BEFORE upgrade
  * Rebooting into the partially upgraded system and running "strace update-grub" reproduces the problem.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2110891/+subscriptions



