[Bug 1891718] Re: [Regression] breaks GRUB install on an nvme device
dann frazier
1891718 at bugs.launchpad.net
Wed Oct 14 15:45:33 UTC 2020
Hi Robie,
Thank you for the review. I don't believe a surgical fix is practical here. I did look to see if we could somehow back out the code that introduced the regression. That was introduced upstream between versions v35 & v36 in this commit:
https://github.com/rhboot/efivar/commit/ff696a432bf92af1206e235a60ad28b58ff0ab92
That was a significant refactoring itself and taking a v37-based-package back to the "old way" didn't seem feasible. So I think we're stuck evolving the new code to deal with these devices, and I don't have an alternate solution to following what upstream has done. Here's some patch-by-patch justification/commentary:
= 0001-Fix-the-error-path-in-set_disk_and_part_name.patch =
TBH, this is one patch that could be dropped, and I could refactor later patches to leave the old behavior. It fixes an obvious bug - that is, even if set_disk_and_part_name() fails to parse the device, it will always return 0. The calling code in device_get() checks for non-zero before proceeding to dereference strings that set_disk_and_part_name() would only set if it *actually* succeeded. Since technically this bug is unrelated to what I'm fixing here, I'm open to dropping it if you prefer or, alternatively, I could file a new bug and update the patch comments to make it's justification appear less arbitrary.
= 0002-sysfs-parsers-make-all-the-sys-block-link-parsers-wo.patch =
While the upstream version of the patch I backported does change all parsers, I restricted the changes to the NVMe parser to reduce regression risk. This patch is part of the evolution to add support for nvme-subsys devices, and I think trying to avoid it would be riskier than including it.
= 0003-Try-even-harder-to-find-disk-device-symlinks-in-sysf.patch =
The refactoring here is introduced to change sysfs parsing behavior, where NVMe devices are the given example in the commit log/comments. I don't see a practical way of avoiding it.
= 0004-Handle-sys-devices-virtual-nvme-fabrics-nvme-subsyst.patch =
This adds the virtual/nvme-subsystem parsing code that is required to solve this issue.
= 0005-Fix-variable-sz-uninitialized-error.patch =
A bugfix for the code introduced above.
= 0006-Fix-parsing-for-nvme-subsystem-devices.patch =
Another bugfix that makes the nvme-subsystem code actually do what the existing comments assert it should do.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to efivar in Ubuntu.
https://bugs.launchpad.net/bugs/1891718
Title:
[Regression] breaks GRUB install on an nvme device
Status in Efivar:
New
Status in efivar package in Ubuntu:
Fix Released
Status in efivar source package in Focal:
In Progress
Status in efivar source package in Groovy:
Fix Released
Bug description:
[Impact]
Grub fails to install on systems with nvme-subsys storage when installing focal, or upgrading from bionic to focal. As symptom of the latter is shown below:
┌───────────────────────┤ Configuring shim-signed ├────────────────────────┐
│ │
│ GRUB failed to install to the following devices: │
│ │
│ /dev/nvme0n1p1 │
│ │
│ Do you want to continue anyway? If you do, your computer may not start │
│ up properly. │
│ │
│ Writing GRUB to boot device failed - continue? │
│ │
│ <Yes> <No>
[Test Case]
On a system with an EFI System Partition residing on an nvme-subsys block device, run grub-install:
$ sudo /usr/sbin/grub-install
Installing for x86_64-efi platform.
/usr/sbin/grub-install: warning: Internal error.
/usr/sbin/grub-install: error: failed to register the EFI boot entry: Operation not permitted.
Also, regression test on a system with a non-nvme-subsys NVMe device.
[Regression Potential]
There's a risk that a parsing bug will introduce a regression to other systems - most at risk are systems with NVMe block devices. Supposedly nvme-fabrics systems were not previously supported and, as a side-effect of this backport, will now be. However, it's possible - like was the case with bionic/nvme-subsys - that nvme-fabrics *used* to happen to work, and now will work differently or even break. Ideally we'd be able to test on such a system, but I don't know where to find that hardware.
This fix has been in groovy since just after beta (minus an innocuous
debug statement that upstream requested during review), and the same
patches apply cleanly to focal, which should help mitigate the risk by
way of some real world exposure.
To manage notifications about this bug go to:
https://bugs.launchpad.net/efivar/+bug/1891718/+subscriptions
More information about the foundations-bugs
mailing list