APPLIED: [SRU][N/O/P][PATCH 0/1] MGLRU: page allocation failure on NUMA-enabled systems
Koichiro Den
koichiro.den at canonical.com
Fri Feb 14 07:09:36 UTC 2025
On Fri, Feb 14, 2025 at 04:08:34PM GMT, Koichiro Den wrote:
> On Sun, Feb 02, 2025 at 12:21:50PM GMT, Heitor Alves de Siqueira wrote:
> > BugLink: https://bugs.launchpad.net/bugs/2097214
> >
> > [Impact]
> > * On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause page
> > allocation failures
> > * This happens due to page reclaim not waking up flusher threads
> > * OOM can be triggered even if the system has enough available memory
> >
> > [Test Plan]
> > * For the bug to properly trigger, we should uninstall apport and use the
> > attached alloc_and_crash.c reproducer
> > * alloc_and_crash will mmap a huge range of memory, memset it and forcibly SEGFAULT
> > * The attached bash script will membind alloc_and_crash to NUMA node 0, so we
> > can see the allocation failures in dmesg
> > $ sudo apt remove --purge apport
> > $ sudo dmesg -c; ./lp2097214-repro.sh; sleep 2; sudo dmesg
> >
> > [Fix]
> > * The upstream patch wakes up flusher threads if there are too many dirty
> > entries in the coldest LRU generation
> > * This happens when trying to shrink lruvecs, so the flusher threads only get
> > woken up during high memory pressure
> > * Fix was introduced by commit:
> > 1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
> >
> > [Regression Potential]
> > * This commit fixes the memory reclaim path, so regressions would likely show
> > up during increased system memory pressure
> > * According to the upstream patch, increased SSD/disk wear is possible due
> > to waking up flusher threads, although this has not been observed in testing
> >
> > Zeng Jingxiang (1):
> > mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
> >
> > mm/vmscan.c | 25 ++++++++++++++++++++++---
> > 1 file changed, 22 insertions(+), 3 deletions(-)
> >
>
> Applied to oracular:linux, noble:linux master-next branches. Thanks!
The title should have been 'APPLIED [N/O] ...'