[Bug 1900796] Re: Moonshot ProLiant m400 fails to boot "Wrong Ramdisk Image Format"
dann frazier
1900796 at bugs.launchpad.net
Mon Jan 25 15:36:53 UTC 2021
** Description changed:
- Issue found on ARM64 node ms10-35-mcdivittb0-kernel with F-5.8.
- Installing the plymouth package from proposed causes the system to fail to boot, with the error message:
- Wrong Ramdisk Image Format
- Ramdisk image is corrupt or invalid
+ [Impact]
+ Due to a firmware (u-boot) bug in reading ext4 filesystem extents, ProLiant m400 systems may fail to boot after installing a new kernel. This seems to be exacerbated when there is limited free space on the /boot filesystem. HPE is no longer providing new firmware fixes for this platform.
- It's 100% reproducible.
+ [Test Case]
+ Install a new kernel and reboot. When this bug is triggered, you'll see the following errors (emphasis <<>> mine):
- Steps to reproduce this:
- 1. Deploy Focal on this node with the architecture set to arm64/xgene-uboot (it can't be deployed with arm64/generic: PXE boot fails with TFTP error: 'File not found' (1))
- 2. Enable proposed pocket.
- 3. Install the linux-generic-hwe-20.04-edge and reboot
- 4. It's now running with 5.8.0-25-generic on Focal
- 5. Install plymouth from proposed and reboot
-
- In step 5 it will generate the new boot image.
-
- $ sudo apt install plymouth
- Reading package lists... Done
- Building dependency tree
- Reading state information... Done
- The following additional packages will be installed:
- plymouth-theme-ubuntu-text
- Suggested packages:
- desktop-base plymouth-themes
- The following packages will be upgraded:
- plymouth plymouth-theme-ubuntu-text
- 2 upgraded, 0 newly installed, 0 to remove and 18 not upgraded.
- Need to get 121 kB of archives.
- After this operation, 0 B of additional disk space will be used.
- Do you want to continue? [Y/n]
- Get:1 http://ports.ubuntu.com/ubuntu-ports focal-proposed/main arm64 plymouth-theme-ubuntu-text arm64 0.9.4git20200323-0ubuntu6.1 [9148 B]
- Get:2 http://ports.ubuntu.com/ubuntu-ports focal-proposed/main arm64 plymouth arm64 0.9.4git20200323-0ubuntu6.1 [112 kB]
- Fetched 121 kB in 0s (326 kB/s)
- (Reading database ... 112559 files and directories currently installed.)
- Preparing to unpack .../plymouth-theme-ubuntu-text_0.9.4git20200323-0ubuntu6.1_arm64.deb ...
- Unpacking plymouth-theme-ubuntu-text (0.9.4git20200323-0ubuntu6.1) over (0.9.4git20200323-0ubuntu6) ...
- Preparing to unpack .../plymouth_0.9.4git20200323-0ubuntu6.1_arm64.deb ...
- Unpacking plymouth (0.9.4git20200323-0ubuntu6.1) over (0.9.4git20200323-0ubuntu6) ...
- Setting up plymouth (0.9.4git20200323-0ubuntu6.1) ...
- update-initramfs: deferring update (trigger activated)
- update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
- update-rc.d: warning: start and stop actions are no longer supported; falling back to defaults
- Setting up plymouth-theme-ubuntu-text (0.9.4git20200323-0ubuntu6.1) ...
- update-initramfs: deferring update (trigger activated)
- Processing triggers for man-db (2.9.1-1) ...
- Processing triggers for systemd (245.4-4ubuntu3.2) ...
- Processing triggers for initramfs-tools (0.136ubuntu6.3) ...
- update-initramfs: Generating /boot/initrd.img-5.8.0-25-generic
- flash-kernel: installing version 5.8.0-25-generic
- Generating kernel u-boot image... done.
- Taking backup of uImage.
- Installing new uImage.
- Generating initramfs u-boot image... done.
- Taking backup of uInitrd.
- Installing new uInitrd.
- Generating boot script u-boot image... done.
- Taking backup of boot.scr.
- Installing new boot.scr.
-
- From the console you will see:
- [ OK ] Finished Reboot.
- [ OK ] Reached target Reboot.
- [ 290.396974] pci_bus 0000:01: 2-byte config write to 0000:01:00.0 offset 0x4 may corrupt adjacent RW1C bits
- [ 290.512964] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x5c may corrupt adjacent RW1C bits
- [ 290.629960] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x48 may corrupt adjacent RW1C bits
- [ 290.746976] pci_bus 0000:00: 2-byte config write to 0000:00:00.0 offset 0x4 may corrupt adjacent RW1C bits
- [ 290.863165] reboot: Restarting system
-
- U-Boot 2013.04 (Mar 26 2015 - 11:31:01)
-
- ProLiant m400 Server Cartridge - U02 (02/26/2015)
- Copyright 2013 - 2015 Hewlett-Packard Development Company, L.P.
- Copyright 2000 - 2012 Wolfgang Denk, DENX Software Engineering, wd at denx.de
-
- CPU0: APM ARM 64-bit Potenza Rev B0 2400MHz PCP 2400MHz
- 32 KB ICACHE, 32 KB DCACHE
- SOC 2000MHz IOBAXI 400MHz AXI 250MHz AHB 200MHz GFC 125MHz
- Boot from SPI-NOR
- Slimpro FW: Ver: 2.3 (build 2015/03/16)
- PMD: 960 mV
- SOC: 950 mV
- I2C: ready
- DRAM: ECC 64 GiB @ 1333MHz
- relocation Address is: 0x4ffff27000
- Using default environment
-
- API sig @ 0x0000004ffdf17170
- In: serial
- Out: serial
- Err: serial
- CPUs: 11111111
- CPLD: 0B
- PCIE3: (RC) X8 GEN-2 link up
- 00:00.0 - 19aa:e008 - Bridge device
- 01:00.0 - 15b3:1007 - Network controller
- SF: Detected MX25L12805D with page size 64 KiB, total 16 MiB
- SF: 16384 KiB MX25L12805D at 0:0 is now current device
-
- SF: flash read success (19048 bytes @ 0xe0000)
- .
- SF: flash read success (65568 bytes @ 0xc0000)
- Node Boot Start Time: 2020-10-21T05:29:01
- Node Serial Number: CN7505VJ4B
- Cartridge Chassis Slot ID: 35
- Cartridge Serial Number: CN7505VJ4B
- Chassis Serial Number: USE42207F7
- Chassis Asset Tag:
- Node UUID: 9A026903-E1ED-5E39-9768-ED1FB68A9301
- Product ID: 721717-B21
- Timezone Name: America/New_York
- SCSI: Target spinup took 0 ms.
- AHCI2 0001.0300 32 slots 2 ports 6 Gbps 0x3 impl SATA mode
- flags: 64bit ncq pm only pmp fbss pio slum part ccc
- scanning bus for devices...
- Device 0: (4:0) Vendor: ATA Prod.: XR0240GEBLU Rev: HPS4
- Type: Hard Disk
- Capacity: 228936.5 MB = 223.5 GB (468862128 x 512)
- Found 1 device(s).
- Boot: PXE, M.2
- Mellanox ConnectX3 U-Boot driver version 1.1
- Mellanox ConnectX3 Firmware Version 2.32.5330
- Net: NIC1 [PRIME], NIC2
-
- Booting PXE
- Requesting DHCP address via NIC1
- BOOTP broadcast 1
- DHCP client bound to address 10.229.65.135
- Retrieving file: pxelinux.cfg/9A026903-E1ED-5E39-9768-ED1FB68A9301
- Using NIC1 device
- TFTP from server 10.229.32.21; our IP address is 10.229.65.135
- Filename 'pxelinux.cfg/9A026903-E1ED-5E39-9768-ED1FB68A9301'.
- Load address: 0x4000800000
- Loading: *
- TFTP error: 'File not found' (1)
- Not retrying...
- Retrieving file: pxelinux.cfg/01-14-58-d0-58-c3-c2
- Using NIC1 device
- TFTP from server 10.229.32.21; our IP address is 10.229.65.135
- Filename 'pxelinux.cfg/01-14-58-d0-58-c3-c2'.
- Load address: 0x4000800000
- Loading: #
- 0 Bytes/s
- done
- Bytes transferred = 41 (29 hex)
- Config file found
- 1: local
- PXE: executing localboot
- 288 bytes read in 34 ms (7.8 KiB/s)
## Executing script at 4004000000
11349894 bytes read in 312 ms (34.7 MiB/s)
- invalid extent block
+ <<invalid extent block>>
## Booting kernel from Legacy Image at 4002000000 ...
- Image Name: kernel 5.8.0-25-generic
- Created: 2020-10-21 5:26:34 UTC
- Image Type: ARM Linux Kernel Image (gzip compressed)
- Data Size: 11349830 Bytes = 10.8 MiB
- Load Address: 00080000
- Entry Point: 00080000
- Verifying Checksum ... OK
+ Image Name: kernel 5.8.0-25-generic
+ Created: 2020-10-21 5:26:34 UTC
+ Image Type: ARM Linux Kernel Image (gzip compressed)
+ Data Size: 11349830 Bytes = 10.8 MiB
+ Load Address: 00080000
+ Entry Point: 00080000
+ Verifying Checksum ... OK
Wrong Ramdisk Image Format
- Ramdisk image is corrupt or invalid
- Booting M.2
+ <<Ramdisk image is corrupt or invalid>>
- ProblemType: Bug
- DistroRelease: Ubuntu 20.04
- Package: plymouth 0.9.4git20200323-0ubuntu6.1
- ProcVersionSignature: Ubuntu 5.8.0-25.26~20.04.1-generic 5.8.14
- Uname: Linux 5.8.0-25-generic aarch64
- ApportVersion: 2.20.11-0ubuntu27.10
- Architecture: arm64
- BootLog: Error: [Errno 2] No such file or directory: '/var/log/boot.log'
- CasperMD5CheckResult: skip
- Date: Wed Oct 21 05:27:35 2020
- Lspci-vt: -[0000:00]---00.0-[01]----00.0 Mellanox Technologies MT27520 Family [ConnectX-3 Pro]
- Lsusb: Error: command ['lsusb'] failed with exit code 1:
- Lsusb-t:
-
- Lsusb-v: Error: command ['lsusb', '-v'] failed with exit code 1:
- ProcCmdLine: console=ttyS0,9600n8r ro
- ProcFB:
-
- ProcKernelCmdLine: console=ttyS0,9600n8r ro
- SourcePackage: plymouth
- TextPlymouth: /usr/share/plymouth/themes/ubuntu-text/ubuntu-text.plymouth
- UpgradeStatus: No upgrade log present (probably fresh install)
- acpidump:
+ [Where Problems Could Occur]
+ The workaround I've added here attempts to defragment the boot files so that the u-boot parsing bug is not triggered. It is only activated on machines tagged with a certain property, and only the m400 server is tagged with that property. A bug in detecting the platform or property could impact other platforms, though this code uses a well-established flash-kernel pattern. On the m400, the workaround is only applied if /boot is on an ext4 filesystem (the Ubuntu default). If the filesystem detection code is buggy, we may unintentionally run e4defrag on a non-ext4 filesystem, which could cause errors. Such errors currently only print a warning; they do not fail the script. Users who miss this warning could still end up with an unbootable system if the workaround fails, which it may if the disk is very close to full. Long term, we should consider making this error fatal.
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to plymouth in Ubuntu.
https://bugs.launchpad.net/bugs/1900796
Title:
Moonshot ProLiant m400 fails to boot "Wrong Ramdisk Image Format"
Status in ubuntu-kernel-tests:
New
Status in flash-kernel package in Ubuntu:
Fix Released
Status in plymouth package in Ubuntu:
Invalid
Status in flash-kernel source package in Xenial:
New
Status in plymouth source package in Xenial:
Invalid
Status in flash-kernel source package in Bionic:
Confirmed
Status in plymouth source package in Bionic:
Invalid
Status in flash-kernel source package in Focal:
Fix Committed
Status in plymouth source package in Focal:
Invalid
Status in flash-kernel source package in Groovy:
Fix Committed
Status in plymouth source package in Groovy:
Invalid
Status in flash-kernel source package in Hirsute:
Fix Released
Status in plymouth source package in Hirsute:
Invalid
Bug description:
[Impact]
Due to a firmware (u-boot) bug in reading ext4 filesystem extents, ProLiant m400 systems may fail to boot after installing a new kernel. This seems to be exacerbated when there is limited free space on the /boot filesystem. HPE is no longer providing new firmware fixes for this platform.
[Test Case]
Install a new kernel and reboot. When this bug is triggered, you'll see the following errors (emphasis <<>> mine):
## Executing script at 4004000000
11349894 bytes read in 312 ms (34.7 MiB/s)
<<invalid extent block>>
## Booting kernel from Legacy Image at 4002000000 ...
Image Name: kernel 5.8.0-25-generic
Created: 2020-10-21 5:26:34 UTC
Image Type: ARM Linux Kernel Image (gzip compressed)
Data Size: 11349830 Bytes = 10.8 MiB
Load Address: 00080000
Entry Point: 00080000
Verifying Checksum ... OK
Wrong Ramdisk Image Format
<<Ramdisk image is corrupt or invalid>>
[Where Problems Could Occur]
The workaround I've added here attempts to defragment the boot files so that the u-boot parsing bug is not triggered. It is only activated on machines tagged with a certain property, and only the m400 server is tagged with that property. A bug in detecting the platform or property could impact other platforms, though this code uses a well-established flash-kernel pattern. On the m400, the workaround is only applied if /boot is on an ext4 filesystem (the Ubuntu default). If the filesystem detection code is buggy, we may unintentionally run e4defrag on a non-ext4 filesystem, which could cause errors. Such errors currently only print a warning; they do not fail the script. Users who miss this warning could still end up with an unbootable system if the workaround fails, which it may if the disk is very close to full. Long term, we should consider making this error fatal.
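A minimal sketch of the workaround logic described above, assuming a POSIX shell with util-linux `findmnt` and e2fsprogs `e4defrag` installed. The function name `defrag_boot_files` and its messages are hypothetical; this is not the flash-kernel implementation, only an illustration of "defragment only on ext4, and warn rather than fail":

```shell
#!/bin/sh
# Hypothetical sketch, NOT the actual flash-kernel code.
# defrag_boot_files DIR FILE...
# Defragment each FILE under DIR with e4defrag, but only if DIR is on ext4.
# Failures only print a warning; the function always returns 0, matching
# the current non-fatal behaviour.
defrag_boot_files() {
    dir=$1
    shift
    # Ask findmnt which filesystem type backs DIR. If findmnt is missing
    # or fails, fstype stays empty and the workaround is skipped.
    fstype=$(findmnt -n -o FSTYPE --target "$dir" 2>/dev/null || true)
    if [ "$fstype" != "ext4" ]; then
        echo "skipping defrag: $dir is on '${fstype:-unknown}', not ext4" >&2
        return 0
    fi
    for f in "$@"; do
        # Assumes e4defrag reports failure through a non-zero exit status.
        if ! e4defrag "$dir/$f" >/dev/null 2>&1; then
            echo "warning: e4defrag failed on $dir/$f; system may not boot" >&2
        fi
    done
    return 0
}
```

For example, `defrag_boot_files /boot uImage uInitrd boot.scr` would cover the three images flash-kernel installs in the log above. Making the error fatal, as suggested, would amount to returning 1 from the warning branch instead of continuing.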