[SRU][N][PATCH 3/8] x86/vmscape: Add conditional IBPB mitigation
Massimiliano Pellizzer
massimiliano.pellizzer at canonical.com
Wed Sep 17 12:22:48 UTC 2025
From: Pawan Gupta <pawan.kumar.gupta at linux.intel.com>

BugLink: https://bugs.launchpad.net/bugs/2124105

Commit 2f8f173413f1cbf52660d04df92d0069c4306d25 upstream.

VMSCAPE is a vulnerability that exploits insufficient branch predictor
isolation between a guest and a userspace hypervisor (like QEMU). Existing
mitigations already protect kernel/KVM from a malicious guest. Userspace
can additionally be protected by flushing the branch predictors after a
VMexit.

Since it is the userspace that consumes the poisoned branch predictors,
conditionally issue an IBPB after a VMexit and before returning to
userspace. Workloads that frequently switch between hypervisor and
userspace will incur the most overhead from the new IBPB.

This new IBPB is not integrated with the existing IBPB sites. For
instance, a task can use the existing speculation control prctl() to
get an IBPB at context switch time. With this implementation, the
IBPB is doubled up: one at context switch and another before running
userspace.

The intent is to integrate and optimize these cases post-embargo.
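
For reference, a minimal userspace sketch of the existing speculation
control prctl() mentioned above (an illustration, not part of this patch;
it assumes a libc/uapi header set exposing PR_SET_SPECULATION_CTRL and a
spectre_v2_user= mitigation mode that allows per-task control):

  /* Illustration only: opt this task in to an IBPB at context switch. */
  #include <stdio.h>
  #include <sys/prctl.h>
  #include <linux/prctl.h>

  int main(void)
  {
          /*
           * Restrict indirect branch speculation for this task; the kernel
           * then applies STIBP and issues an IBPB when context switching
           * to/from it. This may fail depending on the spectre_v2_user=
           * mitigation mode.
           */
          if (prctl(PR_SET_SPECULATION_CTRL, PR_SPEC_INDIRECT_BRANCH,
                    PR_SPEC_DISABLE, 0, 0))
                  perror("prctl(PR_SET_SPECULATION_CTRL)");

          return 0;
  }
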
[ dhansen: elaborate on suboptimal IBPB solution ]
Suggested-by: Dave Hansen <dave.hansen at linux.intel.com>
Signed-off-by: Pawan Gupta <pawan.kumar.gupta at linux.intel.com>
Signed-off-by: Dave Hansen <dave.hansen at linux.intel.com>
Reviewed-by: Dave Hansen <dave.hansen at linux.intel.com>
Reviewed-by: Borislav Petkov (AMD) <bp at alien8.de>
Acked-by: Sean Christopherson <seanjc at google.com>
Signed-off-by: Borislav Petkov (AMD) <bp at alien8.de>
Signed-off-by: Greg Kroah-Hartman <gregkh at linuxfoundation.org>
(backported from commit f866eef8d1c65504d30923c3f14082ad294d0e6d linux-6.6.y)
[mpellizzer: context adjusted due to missing definitions and code
required by other CPU vulnerability mitigations (ITS and TSA)
that have not been backported yet]
CVE-2025-40300
Signed-off-by: Massimiliano Pellizzer <massimiliano.pellizzer at canonical.com>
---
arch/x86/include/asm/cpufeatures.h | 1 +
arch/x86/include/asm/entry-common.h | 7 +++++++
arch/x86/include/asm/nospec-branch.h | 2 ++
arch/x86/kernel/cpu/bugs.c | 8 ++++++++
arch/x86/kvm/x86.c | 9 +++++++++
5 files changed, 27 insertions(+)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index fb7c4479f20a5..f269d2eaf7fbf 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -468,6 +468,7 @@
#define X86_FEATURE_BHI_CTRL (21*32+ 2) /* "" BHI_DIS_S HW control available */
#define X86_FEATURE_CLEAR_BHB_HW (21*32+ 3) /* "" BHI_DIS_S HW control enabled */
#define X86_FEATURE_CLEAR_BHB_LOOP_ON_VMEXIT (21*32+ 4) /* "" Clear branch history at vmexit using SW loop */
+#define X86_FEATURE_IBPB_EXIT_TO_USER (21*32+14) /* Use IBPB on exit-to-userspace, see VMSCAPE bug */
/*
* BUG word(s)
diff --git a/arch/x86/include/asm/entry-common.h b/arch/x86/include/asm/entry-common.h
index fb2809b20b0ac..bb0a5ecc807fe 100644
--- a/arch/x86/include/asm/entry-common.h
+++ b/arch/x86/include/asm/entry-common.h
@@ -83,6 +83,13 @@ static inline void arch_exit_to_user_mode_prepare(struct pt_regs *regs,
* 8 (ia32) bits.
*/
choose_random_kstack_offset(rdtsc());
+
+ /* Avoid unnecessary reads of 'x86_ibpb_exit_to_user' */
+ if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER) &&
+ this_cpu_read(x86_ibpb_exit_to_user)) {
+ indirect_branch_prediction_barrier();
+ this_cpu_write(x86_ibpb_exit_to_user, false);
+ }
}
#define arch_exit_to_user_mode_prepare arch_exit_to_user_mode_prepare
diff --git a/arch/x86/include/asm/nospec-branch.h b/arch/x86/include/asm/nospec-branch.h
index a5f0ad76430bf..723d02455c69b 100644
--- a/arch/x86/include/asm/nospec-branch.h
+++ b/arch/x86/include/asm/nospec-branch.h
@@ -538,6 +538,8 @@ void alternative_msr_write(unsigned int msr, u64 val, unsigned int feature)
extern u64 x86_pred_cmd;
+DECLARE_PER_CPU(bool, x86_ibpb_exit_to_user);
+
static inline void indirect_branch_prediction_barrier(void)
{
alternative_msr_write(MSR_IA32_PRED_CMD, x86_pred_cmd, X86_FEATURE_USE_IBPB);
diff --git a/arch/x86/kernel/cpu/bugs.c b/arch/x86/kernel/cpu/bugs.c
index cdd8ef8fe2f1a..1e53559f3352e 100644
--- a/arch/x86/kernel/cpu/bugs.c
+++ b/arch/x86/kernel/cpu/bugs.c
@@ -58,6 +58,14 @@ EXPORT_SYMBOL_GPL(x86_spec_ctrl_base);
DEFINE_PER_CPU(u64, x86_spec_ctrl_current);
EXPORT_SYMBOL_GPL(x86_spec_ctrl_current);
+/*
+ * Set when the CPU has run a potentially malicious guest. An IBPB will
+ * be needed before running userspace. That IBPB will flush the branch
+ * predictor content.
+ */
+DEFINE_PER_CPU(bool, x86_ibpb_exit_to_user);
+EXPORT_PER_CPU_SYMBOL_GPL(x86_ibpb_exit_to_user);
+
u64 x86_pred_cmd __ro_after_init = PRED_CMD_IBPB;
EXPORT_SYMBOL_GPL(x86_pred_cmd);
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 761d625788735..a5fac89589f92 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -11031,6 +11031,15 @@ static int vcpu_enter_guest(struct kvm_vcpu *vcpu)
if (vcpu->arch.guest_fpu.xfd_err)
wrmsrl(MSR_IA32_XFD_ERR, 0);
+ /*
+ * Mark this CPU as needing a branch predictor flush before running
+ * userspace. Must be done before enabling preemption to ensure it gets
+ * set for the CPU that actually ran the guest, and not the CPU that it
+ * may migrate to.
+ */
+ if (cpu_feature_enabled(X86_FEATURE_IBPB_EXIT_TO_USER))
+ this_cpu_write(x86_ibpb_exit_to_user, true);
+
/*
* Consume any pending interrupts, including the possible source of
* VM-Exit on SVM and any ticks that occur between VM-Exit and now.
--
2.48.1