ACK/Cmnt: [SRU][P:linux][PATCH 0/1 v2] deadlock on cpu_hotplug_lock in __accept_page()

Stefan Bader stefan.bader at canonical.com
Wed May 21 12:50:15 UTC 2025


On 15.05.25 18:49, Thibault Ferrante wrote:
> BugLink: https://bugs.launchpad.net/bugs/2109543
> 
> [ Impact ]
> 
>   * Boot hangs because of a deadlock on cpu_hotplug_lock in the mm
>     (memory management) code during CPU bring-up.
> 
> [ Fix ]
> 
>   * Upstream commit:
>     https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4067196a52278156d18d8d6fa7f43970611b1b49
> 
> [ Test Plan ]
> 
>   * This deadlock was uncovered during confidential VM work (AMD SNP and Intel TDX).
>     It became more frequent after another commit was introduced to fix an
>     unrelated issue when booting a TDX VM with a large memory profile.
>     (https://lore.kernel.org/all/20250310082855.2587122-1-kirill.shutemov@linux.intel.com/#t)
> 
>     Consequently, to have a better chance of reproducing the issue, run an AMD SNP or TDX VM
>     with a certain CPU configuration:
> 
>     For AMD SNP, you can follow the instructions in the submission:
>     https://lore.kernel.org/all/363f8293-23e3-44d3-8005-b31eb5b7f975@amd.com/#t
> 
>     For Intel TDX, the issue can also be reproduced in a TDX VM running our 6.14 -intel kernel,
>     which has the TDX feature enabled (the TDX feature is only in kernel main-next for now and not yet released).
> 
> [ Where problems could occur ]
> 
>   * Possible performance regression in memory allocation, or a regression at CPU bring-up time.
> 
> [ Further information ]
> 
>   * This patch needs to be backported to both the Plucky and Oracular kernels, since
>     the commit that exposes this deadlock is in Plucky 6.14 and is being backported
>     to the Oracular 6.11 kernel. However, as the backport for Oracular is not trivial,
>     it will come in a subsequent submission.
> 
> v1 -> v2:
>   Updated problem section.
>   Provenance/SOB added to the patch.
>   More explanation on Oracular inclusion.
>   
> Kirill A. Shutemov (1):
>    mm/page_alloc: fix deadlock on cpu_hotplug_lock in __accept_page()
> 
>   include/linux/mmzone.h |  3 +++
>   mm/internal.h          |  1 +
>   mm/mm_init.c           |  1 +
>   mm/page_alloc.c        | 28 ++++++++++++++++++++++++++--
>   4 files changed, 31 insertions(+), 2 deletions(-)
> 

The bug report also shows Oracular as affected. It is unclear whether the
bug report or the submission is wrong. But as we near the end of Oracular
support, this might intentionally be left unfixed...

Acked-by: Stefan Bader <stefan.bader at canonical.com>