[SRU][N/O/P][PATCH 0/1] MGLRU: page allocation failure on NUMA-enabled systems
Heitor Alves de Siqueira
halves at canonical.com
Sun Feb 2 15:21:50 UTC 2025
BugLink: https://bugs.launchpad.net/bugs/2097214
[Impact]
* On MGLRU-enabled systems, high memory pressure on NUMA nodes will cause page
allocation failures
* This happens due to page reclaim not waking up flusher threads
* OOM can be triggered even if the system has enough available memory
[Test Plan]
* For the bug to properly trigger, we should uninstall apport and use the
attached alloc_and_crash.c reproducer
* alloc_and_crash will mmap a huge range of memory, memset it and forcibly SEGFAULT
* The attached bash script will membind alloc_and_crash to NUMA node 0, so we
can see the allocation failures in dmesg
$ sudo apt remove --purge apport
$ sudo dmesg -c; ./lp2097214-repro.sh; sleep 2; sudo dmesg
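
The attached alloc_and_crash.c is not reproduced in this mail. As a rough
sketch of the steps described above (map, memset, crash), something like the
following would do. The use of an anonymous private mapping, the fill byte and
the function names are assumptions for illustration, not taken from the
attachment:

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

/*
 * Map `len` bytes of anonymous memory and write to every byte so that each
 * page is actually faulted in. Returns NULL if the mapping fails.
 * (Hypothetical helper; the real reproducer may map memory differently.)
 */
char *alloc_and_touch(size_t len)
{
	void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	if (p == MAP_FAILED)
		return NULL;

	memset(p, 0x5a, len);
	return p;
}

/* Deliver SIGSEGV to the calling process to force the crash. */
void crash_now(void)
{
	raise(SIGSEGV);
}
```

A main() would call alloc_and_touch() with a size large enough to put node 0
under pressure and then crash_now(). Binding to node 0 could be done with
something like `numactl --membind=0`, which is an assumption about what the
attached script does.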
[Fix]
* The upstream patch wakes up flusher threads if there are too many dirty
entries in the coldest LRU generation
* This check runs while shrinking lruvecs, so flushers only get woken up
during high memory pressure
* Fix was introduced by commit:
1bc542c6a0d1 mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
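
The patch itself is not quoted in this cover letter. The conditional wakeup
can be pictured with the small userspace model below. It is only a sketch:
the struct, its field names and the exact meaning of "too many" (here, every
file folio taken from the coldest generation being dirty and not yet queued
for writeback) are assumptions, and this is not the code in mm/vmscan.c:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical counters for one reclaim pass; names are illustrative. */
struct reclaim_pass {
	size_t file_taken;     /* file folios isolated from the coldest generation */
	size_t unqueued_dirty; /* of those, dirty and not yet queued for writeback */
};

/*
 * Assumed rule: wake the flushers only when at least one file folio was
 * isolated and all of them were dirty and unqueued, i.e. the pass could not
 * reclaim any file cache without writeback happening first.
 */
bool should_wake_flushers(const struct reclaim_pass *pass)
{
	return pass->unqueued_dirty != 0 &&
	       pass->unqueued_dirty == pass->file_taken;
}
```

Under this model a pass with a mix of clean and dirty folios does not wake the
flushers, which matches the intent of limiting extra writeback to cases where
reclaim is otherwise stuck.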
[Regression Potential]
* This commit fixes the memory reclaim path, so regressions would likely show
up during increased system memory pressure
* According to the upstream patch, increased SSD/disk wear is possible due
to waking up flusher threads, although this has not been observed in testing
Zeng Jingxiang (1):
mm/vmscan: wake up flushers conditionally to avoid cgroup OOM
mm/vmscan.c | 25 ++++++++++++++++++++++---
1 file changed, 22 insertions(+), 3 deletions(-)
--
2.48.1