[Bug 1978079] Re: EFI pstore not cleared on boot
Feysel Mohammed
1978079 at bugs.launchpad.net
Wed Sep 20 18:46:38 UTC 2023
root@bu-lab26v-oob:~# cat /etc/mlnx-release
DOCA_2.2.0_BSP_4.2.1_Ubuntu_20.04-2.sru.5.4.0-1070
root@bu-lab26v-oob:~# uname -a
Linux bu-lab26v-oob 5.4.0-1070-bluefield #76-Ubuntu SMP PREEMPT Wed Aug 30 16:56:35 UTC 2023 aarch64 aarch64 aarch64 GNU/Linux
root@bu-lab26v-oob:~# cat /sys/module/pstore/parameters/backend
efi
root@bu-lab26v-oob:~# systemctl status systemd-pstore.service
● systemd-pstore.service - Platform Persistent Storage Archival
Loaded: loaded (/lib/systemd/system/systemd-pstore.service; enabled; vendor preset: enabled)
Active: inactive (dead)
Condition: start condition failed at Tue 2023-09-19 14:16:32 UTC; 1 day 1h ago
└─ ConditionDirectoryNotEmpty=/sys/fs/pstore was not met
Docs: man:systemd-pstore(8)
Sep 19 14:16:28 localhost systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
Sep 19 14:16:32 localhost systemd[1]: Condition check resulted in Platform Persistent Storage Archival being skipped.
root@bu-lab26v-oob:~# echo 1 > /proc/sys/kernel/sysrq
root@bu-lab26v-oob:~# echo 1 > /proc/sys/kernel/panic
root@bu-lab26v-oob:~# echo "c" > /proc/sysrq-trigger
*system rebooted*
root@bu-lab26v-oob:~# ls /sys/fs/pstore
root@bu-lab26v-oob:~# ls /var/lib/systemd/pstore
169522441 169522442
root@bu-lab26v-oob:~# systemctl status systemd-pstore.service
● systemd-pstore.service - Platform Persistent Storage Archival
Loaded: loaded (/lib/systemd/system/systemd-pstore.service; enabled; vendor preset: enabled)
Active: active (exited) since Wed 2023-09-20 15:41:29 UTC; 56s ago
Docs: man:systemd-pstore(8)
Process: 485 ExecStart=/lib/systemd/systemd-pstore (code=exited, status=0/SUCCESS)
Main PID: 485 (code=exited, status=0/SUCCESS)
Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441409001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441409001
Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441408001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441408001
Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441407001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441407001
Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441406001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441406001
Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441405001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441405001
Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441304001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441304001
Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441303001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441303001
Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441302001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441302001
Sep 20 15:41:29 localhost systemd-pstore[485]: PStore dmesg-efi-169522441301001 moved to /var/lib/systemd/pstore/169522441/dmesg-efi-169522441301001
Sep 20 15:41:29 localhost systemd[1]: Finished Platform Persistent Storage Archival.
** Tags removed: verification-needed-focal-linux-bluefield
** Tags added: verification-done-focal-linux-bluefield
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to systemd in Ubuntu.
https://bugs.launchpad.net/bugs/1978079
Title:
EFI pstore not cleared on boot
Status in linux-bluefield package in Ubuntu:
New
Status in systemd package in Ubuntu:
Fix Released
Status in linux-bluefield source package in Focal:
Fix Committed
Status in systemd source package in Focal:
Fix Released
Status in systemd source package in Impish:
Won't Fix
Status in linux-bluefield source package in Jammy:
Fix Committed
Status in systemd source package in Jammy:
Fix Released
Status in systemd source package in Kinetic:
Fix Released
Bug description:
[Impact]
Systemd has a systemd-pstore component that scans the pstore on boot
and if non-empty, takes all previously created dumps, transfers them
into its journal and removes the pstore elements. This is very
important on UEFI systems, which only have a limited amount of space
for variables.
In Ubuntu, the kernel is configured with CONFIG_EFI_VARS_PSTORE=m
which means the EFI pstore support gets loaded dynamically. In all of
my boots, this dynamic module loading happened *after* systemd tried
to check for pstore variables. So systemd-pstore never starts and
never clears the UEFI variable store. I see this happening in AWS on
Graviton instances, which eventually run out of space to store the
dumps. On real hardware, this behavior may lead to unbootable systems.
```
$ systemctl status systemd-pstore
○ systemd-pstore.service - Platform Persistent Storage Archival
Loaded: loaded (/lib/systemd/system/systemd-pstore.service; enabled; vendor preset: enabled)
Active: inactive (dead)
Condition: start condition failed at Thu 2022-06-09 09:11:41 UTC; 29min ago
└─ ConditionDirectoryNotEmpty=/sys/fs/pstore was not met
Docs: man:systemd-pstore(8)
Jun 09 09:11:41 ip-172-31-0-61 systemd[1]: Condition check resulted in
Platform Persistent Storage Archival being skipped.
$ ls -la /sys/fs/pstore
total 0
drwxr-x--- 2 root root 0 Jun 9 09:11 .
drwxr-xr-x 8 root root 0 Jun 9 09:11 ..
-r--r--r-- 1 root root 1803 Jun 9 09:07 dmesg-efi-165476562001001
-r--r--r-- 1 root root 1777 Jun 9 09:07 dmesg-efi-165476562002001
-r--r--r-- 1 root root 1773 Jun 9 09:07 dmesg-efi-165476562003001
-r--r--r-- 1 root root 1815 Jun 9 09:07 dmesg-efi-165476562004001
-r--r--r-- 1 root root 1826 Jun 9 09:07 dmesg-efi-165476562005001
-r--r--r-- 1 root root 1754 Jun 9 09:07 dmesg-efi-165476562006001
-r--r--r-- 1 root root 1821 Jun 9 09:07 dmesg-efi-165476562007001
-r--r--r-- 1 root root 1767 Jun 9 09:07 dmesg-efi-165476562008001
-r--r--r-- 1 root root 1729 Jun 9 09:07 dmesg-efi-165476562009001
-r--r--r-- 1 root root 1819 Jun 9 09:07 dmesg-efi-165476562010001
-r--r--r-- 1 root root 1767 Jun 9 09:07 dmesg-efi-165476562011001
-r--r--r-- 1 root root 1775 Jun 9 09:07 dmesg-efi-165476562012001
-r--r--r-- 1 root root 1802 Jun 9 09:07 dmesg-efi-165476562013001
-r--r--r-- 1 root root 1812 Jun 9 09:07 dmesg-efi-165476562014001
-r--r--r-- 1 root root 1764 Jun 9 09:07 dmesg-efi-165476562015001
-r--r--r-- 1 root root 1795 Jun 9 09:11 dmesg-efi-165476589801001
-r--r--r-- 1 root root 1785 Jun 9 09:11 dmesg-efi-165476589802001
-r--r--r-- 1 root root 1683 Jun 9 09:11 dmesg-efi-165476589803001
-r--r--r-- 1 root root 1785 Jun 9 09:11 dmesg-efi-165476589804001
-r--r--r-- 1 root root 1771 Jun 9 09:11 dmesg-efi-165476589805001
-r--r--r-- 1 root root 1797 Jun 9 09:11 dmesg-efi-165476589806001
-r--r--r-- 1 root root 1805 Jun 9 09:11 dmesg-efi-165476589807001
-r--r--r-- 1 root root 1781 Jun 9 09:11 dmesg-efi-165476589808001
-r--r--r-- 1 root root 1806 Jun 9 09:11 dmesg-efi-165476589809001
-r--r--r-- 1 root root 1821 Jun 9 09:11 dmesg-efi-165476589810001
-r--r--r-- 1 root root 1763 Jun 9 09:11 dmesg-efi-165476589811001
-r--r--r-- 1 root root 1783 Jun 9 09:11 dmesg-efi-165476589812001
-r--r--r-- 1 root root 1788 Jun 9 09:11 dmesg-efi-165476589813001
-r--r--r-- 1 root root 1788 Jun 9 09:11 dmesg-efi-165476589814001
-r--r--r-- 1 root root 1786 Jun 9 09:11 dmesg-efi-165476589815001
```
This problem affects (at least) Ubuntu 20.04 and 22.04. A quick fix
would be to configure CONFIG_EFI_VARS_PSTORE=y so that it's always
available. A long-term fix would make systemd rescan the directory
after all module probing has settled.
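To see which case applies on a given machine, the sketch below reports how CONFIG_EFI_VARS_PSTORE is set. It assumes the kernel config is readable at /boot/config-$(uname -r), as is usual on Ubuntu; the function name efi_pstore_build_mode is made up for this illustration.

```shell
#!/bin/sh
# Hypothetical helper: report how CONFIG_EFI_VARS_PSTORE is set.
# Prints "builtin" (=y), "module" (=m), or "unset" (absent or unreadable).
# Assumption: the config lives at /boot/config-$(uname -r) unless a path is given.
efi_pstore_build_mode() {
    config_file="${1:-/boot/config-$(uname -r)}"
    # Only an uncommented assignment counts; "# ... is not set" does not match.
    case "$(grep -E '^CONFIG_EFI_VARS_PSTORE=' "$config_file" 2>/dev/null)" in
        *=y) echo "builtin" ;;
        *=m) echo "module" ;;
        *)   echo "unset" ;;
    esac
}
```

Calling efi_pstore_build_mode with no argument inspects the running kernel's config; "module" corresponds to the configuration described above.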
[Test Plan]
To reproduce this issue, the system must have an EFI-backed pstore.
To check which backend pstore is using, run `cat
/sys/module/pstore/parameters/backend`
If it prints `efi`, the steps below are applicable. Otherwise, find an
environment that has an EFI-backed pstore.
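The backend check can be wrapped in a small guard so that a test script stops early on unsuitable systems. This is only a sketch; the function names pstore_backend and is_efi_pstore are hypothetical, and the optional path argument exists purely so the logic can be exercised against a plain file.

```shell
#!/bin/sh
# Hypothetical helper: print the active pstore backend, or "unknown" if the
# parameter file cannot be read (for example, pstore is not loaded).
pstore_backend() {
    backend_file="${1:-/sys/module/pstore/parameters/backend}"
    if [ -r "$backend_file" ]; then
        cat "$backend_file"
    else
        echo "unknown"
    fi
}

# Succeeds (exit status 0) only when the backend is "efi".
is_efi_pstore() {
    [ "$(pstore_backend "$1")" = "efi" ]
}
```

For example, `is_efi_pstore || echo "not an EFI-backed pstore, skipping"` could precede the steps below.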
# Enable the pstore service. This service is supposed to move the data in /sys/fs/pstore
# to the `/var/lib/systemd/pstore` path on boot.
systemctl enable systemd-pstore.service # (or can be vendor enabled)
# Crash the kernel
echo 1 > /proc/sys/kernel/sysrq
echo 1 > /proc/sys/kernel/panic # this is usually set to zero, which makes the kernel hang on panic instead of rebooting
echo "c" > /proc/sysrq-trigger
# The system will reboot itself. Check `/sys/fs/pstore` path first:
ls /sys/fs/pstore # The path should not be empty, which means systemd-pstore has failed to do its job
ls /var/lib/systemd/pstore # The path should be empty.
# Apply the fix
sudo add-apt-repository ppa:mustafakemalgilor/lp-1978079-1
sudo apt upgrade
# Crash the kernel
echo 1 > /proc/sys/kernel/sysrq
echo 1 > /proc/sys/kernel/panic # this is usually set to zero, which makes the kernel hang on panic instead of rebooting
echo "c" > /proc/sysrq-trigger
# The system will reboot itself. After reboot, the contents of `/sys/fs/pstore` must have been moved to `/var/lib/systemd/pstore`.
ls /sys/fs/pstore # The path should be empty
ls /var/lib/systemd/pstore # The path should not be empty
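The two final `ls` checks can also be expressed as a single pass/fail test. The following is a minimal sketch, not part of the original test plan; dir_is_empty, dir_has_entries and check_pstore_archived are hypothetical names, and the directory arguments default to the paths used above.

```shell
#!/bin/sh
# True if the directory exists and contains no entries.
dir_is_empty() {
    [ -d "$1" ] && [ -z "$(ls -A "$1" 2>/dev/null)" ]
}

# True if the directory exists and contains at least one entry.
dir_has_entries() {
    [ -d "$1" ] && [ -n "$(ls -A "$1" 2>/dev/null)" ]
}

# Prints PASS when the pstore directory is empty and the archive directory
# is not, which is the expected state after rebooting with the fix applied.
# Prints FAIL and returns 1 otherwise.
check_pstore_archived() {
    pstore_dir="${1:-/sys/fs/pstore}"
    archive_dir="${2:-/var/lib/systemd/pstore}"
    if dir_is_empty "$pstore_dir" && dir_has_entries "$archive_dir"; then
        echo "PASS"
    else
        echo "FAIL"
        return 1
    fi
}
```

Run as root after the reboot, `check_pstore_archived` with no arguments checks the real paths.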
[Where problems could occur]
On some systems, even though the described bug is present, its effect
may not be observable. The nature of the issue suggests that this is
due to a timing issue.
To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux-bluefield/+bug/1978079/+subscriptions