[Bug 1842320] Re: Out of Memory on boot with 5.2.0 kernel

jeremyszu 1842320 at bugs.launchpad.net
Tue Jun 28 14:59:13 UTC 2022


** Description changed:

+ [Impact]
+ 
+  * In some cases, if a user’s initramfs grows large enough, grub2 will
+ likely be unable to load it.
+ 
+  * Some real cases from OEM projects: on many laptops with a built-in 4K
+ display and nvidia drivers, u-d-c puts nvidia*.ko into the initramfs,
+ which grows the initramfs to ~120M. In addition, gfxpayload=auto keeps
+ the 4K resolution, since that is what EFI POST passed.
+ 
+ In this case, grub cannot load the initramfs because grub_memalign() cannot find a suitable block of memory for the larger file:
+ ```
+ #0 grub_memalign (align=1, size=592214020) at ../../../grub-core/kern/mm.c:376
+ #1 0x000000007dd7b074 in grub_malloc (size=592214020) at ../../../grub-core/kern/mm.c:408
+ #2 0x000000007dd7a2c8 in grub_verifiers_open (io=0x7bc02d80, type=131076)
+     at ../../../grub-core/kern/verifiers.c:150
+ #3 0x000000007dd801d4 in grub_file_open (name=0x7bc02f00 "/boot/initrd.img-5.17.0-1011-oem",
+     type=131076) at ../../../grub-core/kern/file.c:121
+ #4 0x000000007bcd5a30 in ?? ()
+ #5 0x000000007fe21247 in ?? ()
+ #6 0x000000007bc030c8 in ?? ()
+ #7 0x000000017fe21238 in ?? ()
+ #8 0x000000007bcd5320 in ?? ()
+ #9 0x000000007fe21250 in ?? ()
+ #10 0x0000000000000000 in ?? ()
+ ```
+ 
+ Based on grub_mm_dump, the memory is fragmented (some parts are probably in use because of the 4K resolution?) and there is no contiguous free block large enough for the larger file, which is what this check in grub_real_malloc() requires:
+ ```
+ grub_real_malloc(...)
+ …
+ if (cur->size >= n + extra)
+ ```
+ 
+ According to UEFI Specification Section 7.2[1] and the UEFI Driver
+ Writer’s Guide 4.2.3[2], AllocatePages() can be asked for memory above
+ the 32-bit range.
+ 
+ As most x86_64 platforms should support 64-bit addressing, we should
+ extend GRUB_EFI_MAX_USABLE_ADDRESS to 64 bits to make more memory
+ available.
+ 
+  * When the initramfs grows, users will probably see the initramfs
+ reported as not found, which is annoying and hurts the user experience
+ (the system is not able to boot).
+ 
+ [Test Plan]
+ 
+  * Detailed instructions on how to reproduce the bug:
+ Use any method that grows the initramfs, such as installing nvidia-driver.
+ Developers who want to reproduce it can use dd if=/dev/random of=... bs=1M count=500 in an initramfs-tools hook, something like:
+ ```
+ $ cat /usr/share/initramfs-tools/hooks/zzz-touch-a-file
+ #!/bin/sh
+ 
+ PREREQ=""
+ 
+ prereqs()
+ {
+         echo "$PREREQ"
+ }
+ 
+ case $1 in
+ # get pre-requisites
+ prereqs)
+         prereqs
+         exit 0
+         ;;
+ esac
+ 
+ . /usr/share/initramfs-tools/hook-functions
+ dd if=/dev/random of=${DESTDIR}/test-500M bs=1M count=500
+ ```
+ And then run update-initramfs.
+ 
+  * After applying my patches, the issue is gone.
+ 
+  * I also tested my test grubx64.efi on:
+ x86_64 qemu with
+ 60M initramfs + 5.15.0-37-generic kernel
+ 565M initramfs + 5.17.0-1011-oem kernel
+ amd64 HP mobile workstation with
+ 65M initramfs + 5.15.0-39-generic kernel
+ 771M initramfs + 5.17.0-1011-oem kernel
+ All of them work well.
+ 
+ [Where problems could occur]
+ 
+ * The changes are almost all in i386/efi, so the impact is limited to
+ i386 / x86_64 EFI systems. The other change modifies
+ “grub-core/kern/efi/mm.c”, but I keep the original addressing for
+ “arm/arm64/ia64/riscv32/riscv64”, so it should not affect them.
+ 
+ * There is a “#if defined(__x86_64__)” which is intended to keep the > 32-bit code out of i386 systems, and also:
+ ```
+  #if defined (__code_model_large__)
+ -#define GRUB_EFI_MAX_USABLE_ADDRESS 0xffffffff
+ +#define GRUB_EFI_MAX_USABLE_ADDRESS __UINTPTR_MAX__
+ +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x7fffffff
+  #else
+  #define GRUB_EFI_MAX_USABLE_ADDRESS 0x7fffffff
+ +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x3fffffff
+  #endif
+ ```
+ If everything works as expected, i386 should keep working.
+ If we are unlucky, then based on the UEFI Driver Writer’s Guide[2], i386 will get a memory region above 4GB and never be able to access it.
+ 
+ [Other Info]
+  
+  * Upstream grub2 bug #61058
+ https://savannah.gnu.org/bugs/index.php?61058
+  * Test PPA: https://launchpad.net/~os369510/+archive/ubuntu/lp1842320
+  * Test grubx64.efi: https://people.canonical.com/~jeremysu/lp1842320/grubx64.efi.lp1842320
+  * Test source code: https://github.com/os369510/grub2/tree/lp1842320
+  * If you build the package, the test grubx64.efi is under “obj/monolithic/grub-efi-amd64/grubx64.efi”, in my case: `/var/cache/pbuilder/build/276481/build/grub2-2.06/obj/monolithic/grub-efi-amd64/grubx64.efi`
+  * My build command: `sudo PBSHELL=1 pbuilder build --hookdir ~/hook-dir ubuntu-grub/grub2_2.06-2ubuntu7+jeremydev2.dsc 2>&1 | tee build.log`
+  * My qemu command: `qemu-system-x86_64 -bios edk2/Build/OvmfX64/DEBUG_GCC5/FV/OVMF.fd -hda Templates/grub.qcow2 -m 6G -vga cirrus -smp 8 -machine type=q35,accel=kvm -cpu host -enable-kvm -boot menu=on` (I built an edk2 binary with debugging log)
+  * You can use my grubx64.efi with debug symbols from https://people.canonical.com/~jeremysu/lp1842320/grubx64.efi.lp1842320-dev-with-debug-symbols and the source code is at https://github.com/os369510/grub2/tree/jeremy-dev . After building the package from source, you can attach gdb to the qemu session as follows:
+ ```
+ ubuntu@ubuntu-HP-ZBook-Fury-16-G9-Mobile-Workstation-PC [ /var/cache/pbuilder/build/35354/tmp/buildd/grub2-2.06/obj/grub-efi-amd64/grub-core ]
+ $ gdb -x gdb_grub # with “add-symbol-file kernel.img ${address}”
+ ```
+ The address above can be read from the qemu serial port: find the last
+ “Loading driver at 0x000xxxxxxxxxx EntryPoint=0x000xxxxxxxabc” line.
+ In the above case, use “0x000xxxxxxxabc” as ${address}.
+ 
+ 
+ [1] https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf
+ [2] https://edk2-docs.gitbook.io/edk-ii-uefi-driver-writer-s-guide/4_general_driver_design_guidelines/readme.2/423_use_uefi_memory_allocation_services
+ 
+ ---
+ 
  Upgraded from 19.04 to current 19.10 using "do-release-upgrade -d". Can
  still boot using the previous 5.0.0-25-generic kernel, but the
  5.2.0-15-generic fails to start.
  
  On selecting Ubuntu from Grub, the message "error: out of memory." is
  immediately shown. Pressing a key attempts to start boot-up but fails to
  mount root fs.
  
  Machine is HP Spectre X360 with 8GB RAM. Under kernel 5.0.0, free shows
  the following (run from Gnome terminal):
  
                total        used        free      shared  buff/cache   available
  Mem:        7906564     1761196     3833240     1020216     2312128     4849224
  Swap:       1003516           0     1003516
  
  Kernel packages installed:
  
  linux-generic                              5.2.0.15.16 amd64
  linux-headers-5.2.0-15                     5.2.0-15.16 all
  linux-headers-5.2.0-15-generic             5.2.0-15.16 amd64
  linux-headers-generic                      5.2.0.15.16 amd64
  linux-image-5.0.0-25-generic               5.0.0-25.26 amd64
  linux-image-5.2.0-15-generic               5.2.0-15.16+signed1 amd64
  linux-image-generic                        5.2.0.15.16 amd64
  linux-modules-5.0.0-25-generic             5.0.0-25.26 amd64
  linux-modules-5.2.0-15-generic             5.2.0-15.16 amd64
  linux-modules-extra-5.0.0-25-generic       5.0.0-25.26 amd64
  linux-modules-extra-5.2.0-15-generic       5.2.0-15.16 amd64
  
  Photo of kernel panic attached.
  
  NVMe drive partition layout (GPT):
  
  Device           Start        End   Sectors   Size Type
  /dev/nvme0n1p1    2048    1050623   1048576   512M EFI System
  /dev/nvme0n1p2 1050624    2549759   1499136   732M Linux filesystem
  /dev/nvme0n1p3 2549760 1000214527 997664768 475.7G Linux filesystem
  
  $ sudo pvs
    PV                          VG        Fmt  Attr PSize    PFree
    /dev/mapper/nvme0n1p3_crypt ubuntu-vg lvm2 a--  <475.71g    0
  
  $ sudo lvs
    LV     VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
    root   ubuntu-vg -wi-ao---- 474.75g
    swap_1 ubuntu-vg -wi-ao---- 980.00m
  
  Partition 3 is LUKS encrypted. Root LV is ext4.
- --- 
+ ---
  ProblemType: Bug
  ApportVersion: 2.20.11-0ubuntu7
  Architecture: amd64
  AudioDevicesInUse:
-  USER        PID ACCESS COMMAND
-  /dev/snd/controlC0:  gmckeown   1647 F.... pulseaudio
+  USER        PID ACCESS COMMAND
+  /dev/snd/controlC0:  gmckeown   1647 F.... pulseaudio
  CurrentDesktop: ubuntu:GNOME
  DistroRelease: Ubuntu 19.10
  InstallationDate: Installed on 2019-08-15 (18 days ago)
  InstallationMedia: Ubuntu 19.04 "Disco Dingo" - Release amd64 (20190416)
  Lsusb:
-  Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
-  Bus 001 Device 003: ID 8087:0a2b Intel Corp. 
-  Bus 001 Device 002: ID 04f2:b593 Chicony Electronics Co., Ltd HP Wide Vision FHD Camera
-  Bus 001 Device 004: ID 046d:c52b Logitech, Inc. Unifying Receiver
-  Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
+  Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
+  Bus 001 Device 003: ID 8087:0a2b Intel Corp.
+  Bus 001 Device 002: ID 04f2:b593 Chicony Electronics Co., Ltd HP Wide Vision FHD Camera
+  Bus 001 Device 004: ID 046d:c52b Logitech, Inc. Unifying Receiver
+  Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: HP HP Spectre x360 Convertible 13-ae0xx
  Package: linux (not installed)
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.0.0-25-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash
  ProcVersionSignature: Ubuntu 5.0.0-25.26-generic 5.0.18
  RelatedPackageVersions:
-  linux-restricted-modules-5.0.0-25-generic N/A
-  linux-backports-modules-5.0.0-25-generic  N/A
-  linux-firmware                            1.181
+  linux-restricted-modules-5.0.0-25-generic N/A
+  linux-backports-modules-5.0.0-25-generic  N/A
+  linux-firmware                            1.181
  Tags:  eoan
  Uname: Linux 5.0.0-25-generic x86_64
  UpgradeStatus: Upgraded to eoan on 2019-09-02 (0 days ago)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True
  dmi.bios.date: 05/17/2019
  dmi.bios.vendor: AMI
  dmi.bios.version: F.25
  dmi.board.asset.tag: Base Board Asset Tag
  dmi.board.name: 83B9
  dmi.board.vendor: HP
  dmi.board.version: 56.43
  dmi.chassis.type: 31
  dmi.chassis.vendor: HP
  dmi.chassis.version: Chassis Version
  dmi.modalias: dmi:bvnAMI:bvrF.25:bd05/17/2019:svnHP:pnHPSpectrex360Convertible13-ae0xx:pvr:rvnHP:rn83B9:rvr56.43:cvnHP:ct31:cvrChassisVersion:
  dmi.product.family: 103C_5335KV HP Spectre
  dmi.product.name: HP Spectre x360 Convertible 13-ae0xx
  dmi.product.sku: 2QH38EA#ABU
  dmi.sys.vendor: HP

** Changed in: oem-priority
       Status: In Progress => Triaged

-- 
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to initramfs-tools in Ubuntu.
https://bugs.launchpad.net/bugs/1842320

Title:
  Out of Memory on boot with 5.2.0 kernel

Status in grub:
  Unknown
Status in OEM Priority Project:
  Triaged
Status in grub2-signed package in Ubuntu:
  Confirmed
Status in initramfs-tools package in Ubuntu:
  Confirmed
Status in linux package in Ubuntu:
  Confirmed

Bug description:
  [Impact]

   * In some cases, if a user’s initramfs grows large enough, grub2 will
  likely be unable to load it.

   * Some real cases from OEM projects:

  On many laptops with a built-in 4K display and nvidia drivers, u-d-c
  puts nvidia*.ko into the initramfs, which grows the initramfs to ~120M.
  In addition, gfxpayload=auto keeps the 4K resolution, since that is
  what EFI POST passed.

  In this case, grub cannot load the initramfs because grub_memalign()
  cannot find a suitable block of memory for the larger file:

  ```
  #0 grub_memalign (align=1, size=592214020) at ../../../grub-core/kern/mm.c:376
  #1 0x000000007dd7b074 in grub_malloc (size=592214020) at ../../../grub-core/kern/mm.c:408
  #2 0x000000007dd7a2c8 in grub_verifiers_open (io=0x7bc02d80, type=131076)
      at ../../../grub-core/kern/verifiers.c:150
  #3 0x000000007dd801d4 in grub_file_open (name=0x7bc02f00 "/boot/initrd.img-5.17.0-1011-oem",
      type=131076) at ../../../grub-core/kern/file.c:121
  #4 0x000000007bcd5a30 in ?? ()
  #5 0x000000007fe21247 in ?? ()
  #6 0x000000007bc030c8 in ?? ()
  #7 0x000000017fe21238 in ?? ()
  #8 0x000000007bcd5320 in ?? ()
  #9 0x000000007fe21250 in ?? ()
  #10 0x0000000000000000 in ?? ()
  ```

  Based on grub_mm_dump, the memory is fragmented (some parts are
  probably in use because of the 4K resolution?) and there is no
  contiguous free block large enough for the larger file, which is what
  this check in grub_real_malloc() requires:

  ```
  grub_real_malloc(...)
  ...
  if (cur->size >= n + extra)
  ```
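
To make the fragmentation argument concrete, here is a minimal Python sketch of a first-fit search. It is an illustration only: the free-list layout and the block sizes are made up, and `first_fit` is a hypothetical helper, not GRUB's allocator. Only the request size (592214020 bytes, from the backtrace) and the shape of the size check come from this report.

```python
# Hypothetical model of a free list: each entry is the size in bytes of one
# contiguous free block. This is NOT GRUB's data structure, only an illustration.
def first_fit(free_blocks, n, extra=0):
    """Return the index of the first block that can hold n + extra bytes, or None."""
    for index, size in enumerate(free_blocks):
        if size >= n + extra:  # same shape as the check quoted in the report
            return index
    return None


MIB = 1024 * 1024
request = 592214020  # size from the backtrace, roughly 565 MiB

# Made-up fragmented layout: enough memory in total, but no single block fits.
fragmented = [300 * MIB, 200 * MIB, 250 * MIB]
print(sum(fragmented) >= request)      # True: enough memory in total
print(first_fit(fragmented, request))  # None: no contiguous block is big enough

# Made-up layout with one additional large block.
with_large_block = fragmented + [2048 * MIB]
print(first_fit(with_large_block, request))  # 3
```

The point of the sketch is that the total amount of free memory does not matter; a single request succeeds only if one contiguous block is large enough.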

  According to UEFI Specification Section 7.2[1] and the UEFI Driver
  Writer’s Guide 4.2.3[2], AllocatePages() can be asked for memory above
  the 32-bit range.

  As most x86_64 platforms should support 64-bit addressing, we should
  extend GRUB_EFI_MAX_USABLE_ADDRESS to 64 bits to make more memory
  available.

   * When the initramfs grows, users will probably see the initramfs
  reported as not found, which is annoying and hurts the user experience
  (the system is not able to boot).
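
The effect of raising the address cap can be sketched in a few lines of Python. The memory map below is invented for illustration and `largest_usable_block` is a hypothetical helper; the two caps are the old 0xffffffff value and the value of __UINTPTR_MAX__ assuming 64-bit pointers. The sketch does not model how GRUB actually walks the EFI memory map.

```python
def largest_usable_block(memory_map, max_usable_address):
    """Return the size in bytes of the largest free region at or below the cap.

    memory_map is a list of (start_address, length_in_bytes) tuples describing
    free regions. Regions that cross the cap are clipped; regions that start
    above it are skipped.
    """
    best = 0
    for start, length in memory_map:
        if start > max_usable_address:
            continue
        end = min(start + length - 1, max_usable_address)  # last usable byte
        best = max(best, end - start + 1)
    return best


GIB = 1024 ** 3
MIB = 1024 ** 2

# Made-up memory map: fragmented below 4 GiB, one large region above it.
memory_map = [
    (0x00100000, 300 * MIB),
    (0x40000000, 400 * MIB),
    (0x100000000, 2 * GIB),  # starts exactly at 4 GiB
]

old_cap = 0xFFFFFFFF          # previous GRUB_EFI_MAX_USABLE_ADDRESS (large code model)
new_cap = 0xFFFFFFFFFFFFFFFF  # __UINTPTR_MAX__, assuming 64-bit pointers

request = 592214020  # initramfs size from the backtrace
print(largest_usable_block(memory_map, old_cap) >= request)  # False
print(largest_usable_block(memory_map, new_cap) >= request)  # True
```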

  [Test Plan]

   * Detailed instructions on how to reproduce the bug:

  1. Use any method that grows the initramfs, such as installing
  nvidia-driver.

  2. Developers who want to reproduce it can use dd if=/dev/random
  of=... bs=1M count=500 in an initramfs-tools hook, something like:

  ```
  $ cat /usr/share/initramfs-tools/hooks/zzz-touch-a-file
  #!/bin/sh

  PREREQ=""

  prereqs()
  {
          echo "$PREREQ"
  }

  case $1 in
  # get pre-requisites
  prereqs)
          prereqs
          exit 0
          ;;
  esac

  . /usr/share/initramfs-tools/hook-functions
  dd if=/dev/random of=${DESTDIR}/test-500M bs=1M count=500
  ```

  And then run update-initramfs.
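
To confirm that the hook really grew the initramfs before rebooting, the image sizes under /boot can be listed. A minimal Python sketch, assuming the images are named initrd.img-* as in the backtrace; `initramfs_sizes` is a hypothetical helper:

```python
import glob
import os


def initramfs_sizes(boot_dir="/boot"):
    """Return {path: size_in_MiB} for every initrd.img-* file in boot_dir."""
    sizes = {}
    for path in sorted(glob.glob(os.path.join(boot_dir, "initrd.img-*"))):
        sizes[path] = os.path.getsize(path) / (1024 * 1024)
    return sizes


if __name__ == "__main__":
    # Print one line per image so an unexpectedly small file is easy to spot.
    for path, mib in initramfs_sizes().items():
        print(f"{path}: {mib:.0f} MiB")
```

With the hook above in place, the rebuilt image should be roughly 500M larger than before, since random data does not compress.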

   * After applying my patches, the issue is gone.

   * I also tested my test grubx64.efi on:

  1. x86_64 qemu with
  1.1. 60M initramfs + 5.15.0-37-generic kernel
  1.2. 565M initramfs + 5.17.0-1011-oem kernel

  2. amd64 HP mobile workstation with
  2.1. 65M initramfs + 5.15.0-39-generic kernel
  2.2. 771M initramfs + 5.17.0-1011-oem kernel

  All of them work well.

  [Where problems could occur]

  * The changes are almost all in i386/efi, so the impact is limited to i386 / x86_64 EFI systems.
  The other change modifies “grub-core/kern/efi/mm.c”, but I keep the original addressing for “arm/arm64/ia64/riscv32/riscv64”,
  so it should not affect them.

  * There is a “#if defined(__x86_64__)” which is intended to keep the >
  32-bit code out of i386 systems, and also:

  ```
   #if defined (__code_model_large__)
  -#define GRUB_EFI_MAX_USABLE_ADDRESS 0xffffffff
  +#define GRUB_EFI_MAX_USABLE_ADDRESS __UINTPTR_MAX__
  +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x7fffffff
   #else
   #define GRUB_EFI_MAX_USABLE_ADDRESS 0x7fffffff
  +#define GRUB_EFI_MAX_ALLOCATION_ADDRESS 0x3fffffff
   #endif
  ```

  If everything works as expected, i386 should keep working.

  If we are unlucky, then based on the UEFI Driver Writer’s Guide[2],
  i386 will get a memory region above 4GB and never be able to access it.
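
The i386 concern comes down to pointer width. A small Python sketch, assuming a flat address space and using hypothetical helper names, shows why a region above 4GB is out of reach for 32-bit pointers. It does not reproduce the patch logic; it only illustrates the arithmetic.

```python
def max_pointer_value(pointer_bits):
    """Largest address a pointer of the given width can hold (cf. __UINTPTR_MAX__)."""
    return (1 << pointer_bits) - 1


def region_is_addressable(start, length, pointer_bits):
    """True if every byte of [start, start + length) fits in a pointer of that width."""
    return length > 0 and start + length - 1 <= max_pointer_value(pointer_bits)


FOUR_GIB = 1 << 32
region = (FOUR_GIB, 512 * 1024 * 1024)  # made-up region that starts at 4 GiB

print(hex(max_pointer_value(32)))          # 0xffffffff
print(region_is_addressable(*region, 32))  # False
print(region_is_addressable(*region, 64))  # True
```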

  [Other Info]

   * Upstream grub2 bug #61058
  https://savannah.gnu.org/bugs/index.php?61058

   * Test PPA: https://launchpad.net/~os369510/+archive/ubuntu/lp1842320

   * Test grubx64.efi:
  https://people.canonical.com/~jeremysu/lp1842320/grubx64.efi.lp1842320

   * Test source code: https://github.com/os369510/grub2/tree/lp1842320

   * If you build the package, the test grubx64.efi is under
  “obj/monolithic/grub-efi-amd64/grubx64.efi”, in my case:
  `/var/cache/pbuilder/build/276481/build/grub2-2.06/obj/monolithic/grub-efi-amd64/grubx64.efi`

   * My build command: `sudo PBSHELL=1 pbuilder build --hookdir ~/hook-dir ubuntu-grub/grub2_2.06-2ubuntu7+jeremydev2.dsc 2>&1 | tee build.log`

   * My qemu command: `qemu-system-x86_64 -bios edk2/Build/OvmfX64/DEBUG_GCC5/FV/OVMF.fd -hda Templates/grub.qcow2 -m 6G -vga cirrus -smp 8 -machine type=q35,accel=kvm -cpu host -enable-kvm -boot menu=on` (I built an edk2 binary with debugging log)

   * You can use my grubx64.efi with debug symbols from
  https://people.canonical.com/~jeremysu/lp1842320/grubx64.efi.lp1842320-dev-with-debug-symbols
  and the source code is at
  https://github.com/os369510/grub2/tree/jeremy-dev .

  After building the package from source, you can attach gdb to the
  qemu session as follows:

  ```
  ubuntu@ubuntu-HP-ZBook-Fury-16-G9-Mobile-Workstation-PC [ /var/cache/pbuilder/build/35354/tmp/buildd/grub2-2.06/obj/grub-efi-amd64/grub-core ]
  $ gdb -x gdb_grub # with “add-symbol-file kernel.img ${address}”
  ```

  The address above can be read from the qemu serial port: find the last
  “Loading driver at 0x000xxxxxxxxxx EntryPoint=0x000xxxxxxxabc” line.

  In the above case, use “0x000xxxxxxxabc” as ${address}.
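
Picking the address out of a long serial log by hand is error-prone, so a small script may help. This Python sketch assumes the log contains lines in exactly the format quoted above; the sample log and its addresses are made up, and `last_entry_point` is a hypothetical helper:

```python
import re

# Pattern based on the message format quoted in this report; real logs may differ.
ENTRY_POINT_RE = re.compile(
    r"Loading driver at (0x[0-9A-Fa-f]+) EntryPoint=(0x[0-9A-Fa-f]+)"
)


def last_entry_point(log_text):
    """Return the EntryPoint value of the last matching log line, or None."""
    matches = ENTRY_POINT_RE.findall(log_text)
    return matches[-1][1] if matches else None


# Made-up serial log excerpt; the addresses are placeholders, not real values.
sample_log = """\
Loading driver at 0x0007E000000 EntryPoint=0x0007E001000
some other firmware output
Loading driver at 0x0007D000000 EntryPoint=0x0007D001234
"""

print(last_entry_point(sample_log))  # 0x0007D001234
```

The returned string is the value to substitute for ${address} in the add-symbol-file line.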

  [1] https://uefi.org/sites/default/files/resources/UEFI_Spec_2_9_2021_03_18.pdf
  [2] https://edk2-docs.gitbook.io/edk-ii-uefi-driver-writer-s-guide/4_general_driver_design_guidelines/readme.2/423_use_uefi_memory_allocation_services

  ---

  Upgraded from 19.04 to current 19.10 using "do-release-upgrade -d".
  Can still boot using the previous 5.0.0-25-generic kernel, but the
  5.2.0-15-generic fails to start.

  On selecting Ubuntu from Grub, the message "error: out of memory." is
  immediately shown. Pressing a key attempts to start boot-up but fails
  to mount root fs.

  Machine is HP Spectre X360 with 8GB RAM. Under kernel 5.0.0, free
  shows the following (run from Gnome terminal):

                total        used        free      shared  buff/cache   available
  Mem:        7906564     1761196     3833240     1020216     2312128     4849224
  Swap:       1003516           0     1003516

  Kernel packages installed:

  linux-generic                              5.2.0.15.16 amd64
  linux-headers-5.2.0-15                     5.2.0-15.16 all
  linux-headers-5.2.0-15-generic             5.2.0-15.16 amd64
  linux-headers-generic                      5.2.0.15.16 amd64
  linux-image-5.0.0-25-generic               5.0.0-25.26 amd64
  linux-image-5.2.0-15-generic               5.2.0-15.16+signed1 amd64
  linux-image-generic                        5.2.0.15.16 amd64
  linux-modules-5.0.0-25-generic             5.0.0-25.26 amd64
  linux-modules-5.2.0-15-generic             5.2.0-15.16 amd64
  linux-modules-extra-5.0.0-25-generic       5.0.0-25.26 amd64
  linux-modules-extra-5.2.0-15-generic       5.2.0-15.16 amd64

  Photo of kernel panic attached.

  NVMe drive partition layout (GPT):

  Device           Start        End   Sectors   Size Type
  /dev/nvme0n1p1    2048    1050623   1048576   512M EFI System
  /dev/nvme0n1p2 1050624    2549759   1499136   732M Linux filesystem
  /dev/nvme0n1p3 2549760 1000214527 997664768 475.7G Linux filesystem

  $ sudo pvs
    PV                          VG        Fmt  Attr PSize    PFree
    /dev/mapper/nvme0n1p3_crypt ubuntu-vg lvm2 a--  <475.71g    0

  $ sudo lvs
    LV     VG        Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
    root   ubuntu-vg -wi-ao---- 474.75g
    swap_1 ubuntu-vg -wi-ao---- 980.00m

  Partition 3 is LUKS encrypted. Root LV is ext4.
  ---
  ProblemType: Bug
  ApportVersion: 2.20.11-0ubuntu7
  Architecture: amd64
  AudioDevicesInUse:
   USER        PID ACCESS COMMAND
   /dev/snd/controlC0:  gmckeown   1647 F.... pulseaudio
  CurrentDesktop: ubuntu:GNOME
  DistroRelease: Ubuntu 19.10
  InstallationDate: Installed on 2019-08-15 (18 days ago)
  InstallationMedia: Ubuntu 19.04 "Disco Dingo" - Release amd64 (20190416)
  Lsusb:
   Bus 002 Device 001: ID 1d6b:0003 Linux Foundation 3.0 root hub
   Bus 001 Device 003: ID 8087:0a2b Intel Corp.
   Bus 001 Device 002: ID 04f2:b593 Chicony Electronics Co., Ltd HP Wide Vision FHD Camera
   Bus 001 Device 004: ID 046d:c52b Logitech, Inc. Unifying Receiver
   Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
  MachineType: HP HP Spectre x360 Convertible 13-ae0xx
  Package: linux (not installed)
  ProcFB: 0 inteldrmfb
  ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-5.0.0-25-generic root=/dev/mapper/ubuntu--vg-root ro quiet splash
  ProcVersionSignature: Ubuntu 5.0.0-25.26-generic 5.0.18
  RelatedPackageVersions:
   linux-restricted-modules-5.0.0-25-generic N/A
   linux-backports-modules-5.0.0-25-generic  N/A
   linux-firmware                            1.181
  Tags:  eoan
  Uname: Linux 5.0.0-25-generic x86_64
  UpgradeStatus: Upgraded to eoan on 2019-09-02 (0 days ago)
  UserGroups: adm cdrom dip lpadmin plugdev sambashare sudo
  _MarkForUpload: True
  dmi.bios.date: 05/17/2019
  dmi.bios.vendor: AMI
  dmi.bios.version: F.25
  dmi.board.asset.tag: Base Board Asset Tag
  dmi.board.name: 83B9
  dmi.board.vendor: HP
  dmi.board.version: 56.43
  dmi.chassis.type: 31
  dmi.chassis.vendor: HP
  dmi.chassis.version: Chassis Version
  dmi.modalias: dmi:bvnAMI:bvrF.25:bd05/17/2019:svnHP:pnHPSpectrex360Convertible13-ae0xx:pvr:rvnHP:rn83B9:rvr56.43:cvnHP:ct31:cvrChassisVersion:
  dmi.product.family: 103C_5335KV HP Spectre
  dmi.product.name: HP Spectre x360 Convertible 13-ae0xx
  dmi.product.sku: 2QH38EA#ABU
  dmi.sys.vendor: HP

To manage notifications about this bug go to:
https://bugs.launchpad.net/grub/+bug/1842320/+subscriptions



