[SRU][N][PATCH 0/1] MultiVM - L2 guest(s) running stress-ng getting stuck at booting after triggering crash (LP: 2077722)

frank.heimes at canonical.com
Wed Jan 22 12:43:08 UTC 2025


BugLink: https://bugs.launchpad.net/bugs/2077722

SRU Justification:
==================

[Impact]

 * L2 guest(s) (nested virtualization) running stress-ng get stuck
   while booting after a crash is triggered.

 * For example, with two Ubuntu 24.04 guests running stress-ng
   (90% load) and a crash triggered on both simultaneously,
   the 1st guest gets stuck and does not boot up.

 * In one of the attempts, both guests got stuck while booting, with a console hang.

[Fix]

 * a373830f96db a373830f96db288a3eb43a8692b6bcd0bd88dfe1
   "KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to avoid spurious interrupts"

[Test Plan]

 * An Ubuntu Server 24.04 LPAR installation, acting as KVM host,
   on IBM Power10 hardware (with nested-KVM-capable firmware FW1060 or newer) is needed.

 * On top of it, two (or more) KVM guests (thus nested), again running 24.04,
   need to be set up.

 * Run the attached stress-ng.sh script on both KVM guests.

 * Trigger crash(es) on both KVM guests at the same time:
   echo c >/proc/sysrq-trigger

 * Without the above patch in place, at least one KVM guest
   (sometimes both) now gets stuck while rebooting.

[Where problems could occur]

 * The changes are in arch/powerpc/kvm/book3s_hv.c only,
   hence are powerpc-specific and do not affect any other architecture.

 * The net change is essentially two effective lines of code:
   an additional else case and the explicit masking off of the 'MER' bit.

 * Wrong assumptions could have unintended effects on KVM guests (L0),
   or interfere with any other virtualization level.

 * However, the commit is an accepted upstream fix
   [for ec0f6639fa88 ("KVM: PPC: Book3S HV nestedv2: Ensure LPCR_MER bit is passed to the L0")]
   that landed in kernel 6.12 and was also accepted as a stable update
   for kernels v6.8+.

[Other Info]

 * The fix/commit discussed here will be part of the planned
   target kernel for plucky, hence plucky/25.04 is not affected.

 * The fix/commit is already included in oracular master-next
   as 08cbc81b9a61, starting with kernel Ubuntu-6.11.0-17.17.
   
 * With that, only noble needs to be fixed (this nested virtualization
   scenario is not supported by Ubuntu releases prior to noble).

 * Since the fix is marked upstream as a stable update,
   it would usually be picked up by the kernel team automatically.
 
 * But to avoid missing the 24.04.2 window, I was asked
   to submit this patch separately.

Gautam Menghani (1):
  KVM: PPC: Book3S HV: Mask off LPCR_MER for a vCPU before running it to
    avoid spurious interrupts

 arch/powerpc/kvm/book3s_hv.c | 12 ++++++++++++
 1 file changed, 12 insertions(+)

-- 
2.43.0
