[SRU][J:linux][PATCH 1/1] userfaultfd: fix checks for huge PMDs
Emil Renner Berthing
emil.renner.berthing at canonical.com
Wed Jun 11 14:21:12 UTC 2025
Philip Cox wrote:
> From: Jann Horn <jannh at google.com>
>
> CVE-2024-46787
>
> Patch series "userfaultfd: fix races around pmd_trans_huge() check", v2.
>
> The pmd_trans_huge() code in mfill_atomic() is wrong in three different
> ways depending on kernel version:
>
> 1. The pmd_trans_huge() check is racy and can lead to a BUG_ON() (if you hit
> the right two race windows) - I've tested this in a kernel build with
> some extra mdelay() calls. See the commit message for a description
> of the race scenario.
> On older kernels (before 6.5), I think the same bug can even
> theoretically lead to accessing transhuge page contents as a page table
> if you hit the right 5 narrow race windows (I haven't tested this case).
> 2. As pointed out by Qi Zheng, pmd_trans_huge() is not sufficient for
> detecting PMDs that don't point to page tables.
> On older kernels (before 6.5), you'd just have to win a single fairly
> wide race to hit this.
> I've tested this on 6.1 stable by racing migration (with a mdelay()
> patched into try_to_migrate()) against UFFDIO_ZEROPAGE - on my x86
> VM, that causes a kernel oops in ptlock_ptr().
> 3. On newer kernels (>=6.5), for shmem mappings, khugepaged is allowed
> to yank page tables out from under us (though I haven't tested that),
> so I think the BUG_ON() checks in mfill_atomic() are just wrong.
>
> I decided to write two separate fixes for these (one fix for bugs 1+2, one
> fix for bug 3), so that the first fix can be backported to kernels
> affected by bugs 1+2.
>
> This patch (of 2):
>
> This fixes two issues.
>
> I discovered that the following race can occur:
>
> mfill_atomic other thread
> ============ ============
> <zap PMD>
> pmdp_get_lockless() [reads none pmd]
> <bail if trans_huge>
> <if none:>
> <pagefault creates transhuge zeropage>
> __pte_alloc [no-op]
> <zap PMD>
> <bail if pmd_trans_huge(*dst_pmd)>
> BUG_ON(pmd_none(*dst_pmd))
>
> I have experimentally verified this in a kernel with extra mdelay() calls;
> the BUG_ON(pmd_none(*dst_pmd)) triggers.
>
> On kernels newer than commit 0d940a9b270b ("mm/pgtable: allow
> pte_offset_map[_lock]() to fail"), this can't lead to anything worse than
> a BUG_ON(), since the page table access helpers are actually designed to
> deal with page tables concurrently disappearing; but on older kernels
> (<=6.4), I think we could probably theoretically race past the two
> BUG_ON() checks and end up treating a hugepage as a page table.
>
> The second issue is that, as Qi Zheng pointed out, there are other types
> of huge PMDs that pmd_trans_huge() can't catch: devmap PMDs and swap PMDs
> (in particular, migration PMDs).
>
> On <=6.4, this is worse than the first issue: If mfill_atomic() runs on a
> PMD that contains a migration entry (which just requires winning a single,
> fairly wide race), it will pass the PMD to pte_offset_map_lock(), which
> assumes that the PMD points to a page table.
>
> Breakage follows: First, the kernel tries to take the PTE lock (which will
> crash or maybe worse if there is no "struct page" for the address bits in
> the migration entry PMD - I think at least on X86 there usually is no
> corresponding "struct page" thanks to the PTE inversion mitigation, amd64
> looks different).
>
> If that didn't crash, the kernel would next try to write a PTE into what
> it wrongly thinks is a page table.
>
> As part of fixing these issues, get rid of the check for pmd_trans_huge()
> before __pte_alloc() - that's redundant, we're going to have to check for
> that after the __pte_alloc() anyway.
>
> Backport note: pmdp_get_lockless() is pmd_read_atomic() in older kernels.
>
> Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-0-5efa61078a41@google.com
> Link: https://lkml.kernel.org/r/20240813-uffd-thp-flip-fix-v2-1-5efa61078a41@google.com
> Fixes: c1a4de99fada ("userfaultfd: mcopy_atomic|mfill_zeropage: UFFDIO_COPY|UFFDIO_ZEROPAGE preparation")
> Signed-off-by: Jann Horn <jannh at google.com>
> Acked-by: David Hildenbrand <david at redhat.com>
> Cc: Andrea Arcangeli <aarcange at redhat.com>
> Cc: Hugh Dickins <hughd at google.com>
> Cc: Jann Horn <jannh at google.com>
> Cc: Pavel Emelyanov <xemul at virtuozzo.com>
> Cc: Qi Zheng <zhengqi.arch at bytedance.com>
> Cc: <stable at vger.kernel.org>
> Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
> (backport from commit 71c186efc1b2cf1aeabfeff3b9bd5ac4c5ac14d8)
> [philcox: replace calls to pmdp_get_lockless() with pmd_read_atomic()
> in __mcopy_atomic()i due to function being renamed]
There is an extra i here, and I assume the function that is patches was itself
also renanemd from __mcopy_atomic() to mcopy_atomic.
/Emil
> Signed-off-by: Philip Cox <philip.cox at canonical.com>
> ---
> mm/userfaultfd.c | 22 ++++++++++++----------
> 1 file changed, 12 insertions(+), 10 deletions(-)
>
> diff --git a/mm/userfaultfd.c b/mm/userfaultfd.c
> index 98a9d0ef2d917..50b76c34dd20e 100644
> --- a/mm/userfaultfd.c
> +++ b/mm/userfaultfd.c
> @@ -602,21 +602,23 @@ static __always_inline ssize_t __mcopy_atomic(struct mm_struct *dst_mm,
> }
>
> dst_pmdval = pmd_read_atomic(dst_pmd);
> - /*
> - * If the dst_pmd is mapped as THP don't
> - * override it and just be strict.
> - */
> - if (unlikely(pmd_trans_huge(dst_pmdval))) {
> - err = -EEXIST;
> - break;
> - }
> if (unlikely(pmd_none(dst_pmdval)) &&
> unlikely(__pte_alloc(dst_mm, dst_pmd))) {
> err = -ENOMEM;
> break;
> }
> - /* If an huge pmd materialized from under us fail */
> - if (unlikely(pmd_trans_huge(*dst_pmd))) {
> + dst_pmdval = pmd_read_atomic(dst_pmd);
> + /*
> + * If the dst_pmd is THP don't override it and just be strict.
> + * (This includes the case where the PMD used to be THP and
> + * changed back to none after __pte_alloc().)
> + */
> + if (unlikely(!pmd_present(dst_pmdval) || pmd_trans_huge(dst_pmdval) ||
> + pmd_devmap(dst_pmdval))) {
> + err = -EEXIST;
> + break;
> + }
> + if (unlikely(pmd_bad(dst_pmdval))) {
> err = -EFAULT;
> break;
> }
> --
> 2.43.0
>
>
> --
> kernel-team mailing list
> kernel-team at lists.ubuntu.com
> https://lists.ubuntu.com/mailman/listinfo/kernel-team
More information about the kernel-team
mailing list