[Bug 1927547] Re: seabios missing NMI disable in rtc_mask()
Heitor Alves de Siqueira
1927547 at bugs.launchpad.net
Fri Jun 18 18:02:11 UTC 2021
Since we don't have a reliable test procedure for triggering the KVM
emulation failures, I did basic smoke tests on VMs using seabios from
mitaka-proposed. Things look good, and general NMI functionality seems
to be working correctly.
Affected users have also confirmed that this version fixes the specific
KVM emulation failures they were seeing, so the fix is working as
intended.
** Tags removed: verification-mitaka-needed
** Tags added: verification-mitaka-done
--
You received this bug notification because you are a member of Ubuntu
OpenStack, which is subscribed to Ubuntu Cloud Archive.
https://bugs.launchpad.net/bugs/1927547
Title:
seabios missing NMI disable in rtc_mask()
Status in Ubuntu Cloud Archive:
Fix Released
Status in Ubuntu Cloud Archive mitaka series:
Fix Committed
Status in seabios package in Ubuntu:
Fix Released
Status in seabios source package in Trusty:
Fix Released
Status in seabios source package in Xenial:
Fix Released
Bug description:
[Impact]
On seabios before rel-1.9.0~47, there's a bug in rtc_mask() that can cause VMs to miss interrupts and get stuck in a 'PAUSED' state due to KVM emulation errors.
While reading PORT_CMOS_DATA, an NMI can "steal" execution before the
inb() call returns, which effectively leaves the guest waiting on the
port read forever. This can then trigger watchdogs, and usually
results in a KVM emulation error leaving the VM in the 'PAUSED'
state. Since the guest VM is broken due to the missed interrupts, the
only way to recover is restarting it.
[Test Plan]
Because the race window between the inb() call and an incoming NMI is small, this issue has been hard to reproduce consistently. Our test plan involves running the fixes on a heavily overcommitted OpenStack compute host where this issue has been reported multiple times, which also validates that no new regressions have been introduced.
[Where problems could occur]
The patch disables NMIs in rtc_mask(), so that it stays consistent with the other rtc_*() functions in seabios/src/hw/rtc.c. After the CMOS port access finishes and the guest resumes execution, we could see regressions with missed interrupts or NMIs not being handled if they are not re-enabled.
Since the patch is already present in all Ubuntu releases starting
with Bionic and there have been no 'fixes:' tags for this patch
upstream, the chance for new regressions should be fairly limited.
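For illustration, here is a minimal sketch of the kind of change described above. It is not the upstream patch itself. It assumes the conventional PC CMOS layout (index port 0x70, data port 0x71, bit 7 of the index byte masking NMIs), and the outb()/inb() helpers, the fake_cmos array and rtc_mask_demo() are hypothetical user-space stand-ins so the example can run outside a BIOS. It does not model re-enabling NMIs afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Conventional PC CMOS/RTC ports; bit 7 of the index byte masks NMIs. */
#define PORT_CMOS_INDEX 0x70
#define PORT_CMOS_DATA  0x71
#define NMI_DISABLE_BIT 0x80

/* Hypothetical stand-ins for real port I/O (not SeaBIOS code). */
static uint8_t fake_cmos[128];
static uint8_t last_index_written;

static void outb(uint8_t value, uint16_t port)
{
    if (port == PORT_CMOS_INDEX)
        last_index_written = value;
    else if (port == PORT_CMOS_DATA)
        fake_cmos[last_index_written & 0x7f] = value;
}

static uint8_t inb(uint16_t port)
{
    if (port == PORT_CMOS_DATA)
        return fake_cmos[last_index_written & 0x7f];
    return 0;
}

/* Read-modify-write of one CMOS register, selecting it with NMIs masked. */
static void rtc_mask(uint8_t index, uint8_t off, uint8_t on)
{
    index |= NMI_DISABLE_BIT;   /* the step the bug says was missing */
    outb(index, PORT_CMOS_INDEX);
    uint8_t val = inb(PORT_CMOS_DATA);
    outb((uint8_t)((val & ~off) | on), PORT_CMOS_DATA);
}

/* Small self-check against the fake CMOS array. */
static int rtc_mask_demo(void)
{
    fake_cmos[0x0b] = 0x42;
    rtc_mask(0x0b, 0x40, 0x01);  /* clear bit 6, set bit 0 */
    assert(fake_cmos[0x0b] == 0x03);
    assert(last_index_written & NMI_DISABLE_BIT);
    return 0;
}
```

Calling rtc_mask_demo() exercises the read-modify-write path and checks that the index byte written to the fake port carries the NMI-disable bit.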
[Other Info]
This has been fixed by the following upstream patch:
- 3156b71a535e (rtc: Disable NMI in rtc_mask()) [0]
$ git describe --contains 3156b71a535e661
rel-1.9.0~47
$ rmadison seabios -s trusty-updates,xenial,bionic
seabios | 1.7.4-4ubuntu1 | trusty-updates | source, all
seabios | 1.8.2-1ubuntu1 | xenial | source, all
seabios | 1.10.2-1ubuntu1 | bionic | source, all
Releases starting with Bionic already have this fix.
[0] https://review.coreboot.org/plugins/gitiles/seabios/+/3156b71a535e661%5E%21/#F0
To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1927547/+subscriptions