[Bug 2110891] Re: System freeze on release upgrade 24.10 oracular to 25.04 plucky with root fs on ZFS

Helio Loureiro 2110891 at bugs.launchpad.net
Fri Jul 11 15:05:21 UTC 2025


Hi,

Please consider the ideas below as brainstorming.

Why not set aside the upgrade to plucky for a moment and ship a fix
on oracular instead?  That might sidestep the problem with people who
have already upgraded.

Another option: deliver special ISO media that upgrades only the kernel
and zfs while still on oracular, so that we can then proceed with the
release upgrade without hitting the bug.

Is either of these ideas doable?

Best Regards,
Helio Loureiro
https://helio.loureiro.eng.br
https://github.com/helioloureiro
https://mastodon.social/@helioloureiro


On Thu, 10 Jul 2025 at 23:10, Seth Arnold <2110891 at bugs.launchpad.net>
wrote:

> Well, there's a few different kinds of "real fix".
>
> My message is mostly about trying to get our users to an upgrade path
> that works with a minimum of fuss.
>
> The upstream OpenZFS developers have never supported different versions
> of userland utilities than the kernel modules. It's just happened to
> work fine for years because most people reboot into matching kernel
> modules shortly after they install new userland utilities and don't
> exhaustively stress the entire user-to-kernel communication in that
> window.
>
> But our scripts are doing something that exposes a mismatch. Probably
> the OpenZFS developers didn't expect our scripts to use this
> functionality in the moments between installing the utilities and
> rebooting into the new kernel modules.
>
> Fully addressing backwards and forwards compatibility in both the zfs
> kernel modules and the zfs userland utilities is probably expensive
> enough that nobody will ever do it. (If it were cheap and easy, surely
> someone would have done it.) Projects have to prioritize what they can
> work on with the resources that they have -- on-disk compatibility is
> far more important to the OpenZFS developers than backwards and forwards
> compatibility on the command and control channel.
>
> Yes, a full fix would be nice, but I expect it'll take far more
> resources than I can contribute, and probably more than Canonical can
> contribute. But we can probably figure out a way to get people past this
> hurdle, it just might take a while.
>
> I guess it's possible that whatever change broke the compatibility could
> also be found and reverted, but that'd just re-introduce the problem for
> whoever had already dealt with the problem. Depending upon how many
> people are affected, it might or might not make sense to revert the
> change and deal with the fallout of a *second* break.
>
> --
> You received this bug notification because you are subscribed to the bug
> report.
> https://bugs.launchpad.net/bugs/2110891
>
> Title:
>   System freeze on release upgrade 24.10 oracular to 25.04 plucky with
>   root fs on ZFS
>
> Status in linux package in Ubuntu:
>   Confirmed
> Status in ubuntu-release-upgrader package in Ubuntu:
>   Deferred
> Status in zfs-linux package in Ubuntu:
>   Confirmed
> Status in linux source package in Plucky:
>   Confirmed
> Status in ubuntu-release-upgrader source package in Plucky:
>   Fix Released
> Status in zfs-linux source package in Plucky:
>   Confirmed
>
> Bug description:
>   [Impact]
>   Upgrades to 25.10 (and newer) fail because the ZFS userspace utilities
> are newer than the kernel modules, and an incompatibility between them can
> cause the kernel to hang. In particular, the bug shows that iterating over
> the snapshots of the root file system freezes it.
>
>   [Test plan: ubuntu-release-upgrader]
>   1. Create a vm
>   2. Check that the upgrade doesn't trigger the screen
>      (no zpool exists)
>   3. Install zfsutils-linux
>   4. Check that the upgrade doesn't trigger the screen
>      (zpool list -H will be empty)
>   5. Create a zpool, for example in a file
>       # fallocate /swap -l 2G
>       # zpool create foo /swap
>       # zpool list
>       NAME SIZE ALLOC FREE CKPOINT EXPANDSZ FRAG CAP DEDUP HEALTH ALTROOT
>       foo 1.88G 134K 1.87G - - 0% 0% 1.00x ONLINE -
>       # zpool list -H
>       foo 1.88G 110K 1.87G - - 0% 0% 1.00x ONLINE -
>   6. Check that the upgrade aborts
>
>   [Where problems could occur: ubuntu-release-upgrader]
>   The change disables upgrades for all systems with a zpool, regardless of
> whether snapshots exist or the zpool is on root. This may also include
> systems that could upgrade successfully.
>
>   If the code is wrong it could inadvertently block systems without ZFS
>   from upgrading, however this seems reasonably mitigated by the test
>   plan.
>
>   [Original bug report]
>
>   SUMMARY:
>   I'm running Ubuntu 24.10 on a ZFS root fs on LUKS which also has
> snapshots. During the release upgrade, ubuntu-release-upgrader triggers a
> bug in ZFS: the update-grub step executes ls /.zfs/snapshot/… via
> 10_linux_zfs, which eventually causes a kernel deadlock on mount.zfs.
>
>   As the `ls` statement in question works flawlessly before the upgrade
>   process, I assume the hang is caused by ZoL having been replaced by apt
>   earlier in the upgrade? That's why I feel this is an
>   ubuntu-release-upgrader package bug.
>
>   I've reported this bug also upstream at ZoL:
>   https://github.com/openzfs/zfs/issues/17337
>
>   DETAILED DESCRIPTION:
>   I've now failed 7 times to upgrade my Ubuntu 24.10 to 25.04 (Beta, RC,
> First Release, Today). In all cases the upgrade runs into a complete system
> freeze (deadlock). `zfs rollback` to the rescue. Today, using only a text
> console with screen and running `dmesg -Hxw`, `htop -d 5` and
> `do-release-upgrade` in parallel, I was finally able to pinpoint the problem:
>
>     1. `do-release-upgrade` downloads & installs all updated .deb
>        packages
>
>     2. Eventually, the upgrade tries to run `update-grub`
>
>     3. The grub script executes the hook `10_linux_zfs` to identify the
>        available kernel versions for the grub boot menu
>
>     4. As part of this discovery an `ls /.zfs/snapshot/[snapshot]/etc`
>        is executed which causes a system freeze
>
>     5. Eventually, after a long pause the kernel reports `task ls:…
>        blocked for more than 122 seconds.`
>
>     6. The only option left for me is to shut down the server and revert
>        to the ZFS snapshot I made before the upgrade
>
>   REMARKS:
>   * Executing "ls /.zfs/snapshot/[snapshot]/etc" and "update-grub" works
> successfully BEFORE upgrade
>   * Rebooting into the partially upgraded system and running "strace
> update-grub" reproduces the problem.
>
> To manage notifications about this bug go to:
> https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2110891/+subscriptions
>
>

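For reference, the userland/kernel-module mismatch discussed in the quoted
message can be spotted before running anything that talks to the module.
Below is a minimal sketch, assuming the two-line output format of
`zfs version` (a `zfs-<version>` line for the utilities and a
`zfs-kmod-<version>` line for the loaded module). The function name
`zfs_versions_match` is hypothetical, and comparing only the upstream
version before the first hyphen is a deliberately coarse assumption of
mine, not something the bug report prescribes.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if the userland and kernel-module
# upstream versions found in `zfs version` style output are identical.
# Assumed input format (two lines), for example:
#   zfs-2.2.6-1ubuntu1
#   zfs-kmod-2.2.6-1ubuntu1
zfs_versions_match() {
    # Userland line: "zfs-" followed directly by a digit; keep the part
    # before the first hyphen of the version (the upstream version).
    userland=$(printf '%s\n' "$1" |
        sed -n 's/^zfs-\([0-9][^-]*\).*$/\1/p' | head -n 1)
    # Kernel-module line: "zfs-kmod-" followed by the version.
    kmod=$(printf '%s\n' "$1" |
        sed -n 's/^zfs-kmod-\([0-9][^-]*\).*$/\1/p' | head -n 1)
    # Treat missing lines as "no match" rather than guessing.
    [ -n "$userland" ] && [ -n "$kmod" ] && [ "$userland" = "$kmod" ]
}

# Example use on a system with ZFS installed (not executed here):
#   if zfs_versions_match "$(zfs version)"; then
#       echo "userland and kernel module agree"
#   else
#       echo "mismatch: avoid touching /.zfs/snapshot until after a reboot"
#   fi
```

A check like this only reports the mismatch; it does not make the two sides
compatible, so it would at most let a script skip the risky step.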
-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to ubuntu-release-upgrader in
Ubuntu.
https://bugs.launchpad.net/bugs/2110891




