APPLIED: [SRU][N][PATCH 0/2] noble ubuntu_ftrace_smoke_test:mmiotrace timeout on aws:r5.metal (LP: #2121673)
Mehmet Basaran
mehmet.basaran at canonical.com
Thu Sep 18 08:02:08 UTC 2025
Applied to noble:linux master-next branch. Thanks.
Juerg Haefliger <juerg.haefliger at canonical.com> writes:
> BugLink: https://bugs.launchpad.net/bugs/2121673
>
> [Impact]
>
> This happens with the 6.8.0-80.80 (2025.08.11) generic kernel, and only on the aws:r5.metal instance. The 6.12 kernel works fine. Juerg found the offending commit to be:
>
> memcg: drain obj stock on cpu hotplug teardown
>
> BugLink: https://bugs.launchpad.net/bugs/2119458
>
> commit 9f01b4954490d4ccdbcc2b9be34a9921ceee9cbb upstream.
>
> Currently on cpu hotplug teardown, only memcg stock is drained but we
> need to drain the obj stock as well otherwise we will miss the stats
> accumulated on the target cpu as well as the nr_bytes cached. The stats
> include MEMCG_KMEM, NR_SLAB_RECLAIMABLE_B & NR_SLAB_UNRECLAIMABLE_B. In
> addition we are leaking reference to struct obj_cgroup object.
>
> Because nothing in the upstream patchset depends on this commit, we decided to delay applying this patch until the next SRU cycle.
>
> INFO | START ubuntu_ftrace_smoke_test.ftrace-smoke-test ubuntu_ftrace_smoke_test.ftrace-smoke-test timeout=900 timestamp=1756180477 localtime=Aug 26 03:54:37
> DEBUG| Persistent state client._record_indent now set to 2
> DEBUG| Persistent state client.unexpected_reboot now set to ('ubuntu_ftrace_smoke_test.ftrace-smoke-test', 'ubuntu_ftrace_smoke_test.ftrace-smoke-test')
> DEBUG| Waiting for pid 3906 for 900 seconds
> WARNI| System python is too old, crash handling disabled
> DEBUG| Running '/home/ubuntu/autotest/client/tests/ubuntu_ftrace_smoke_test/ubuntu_ftrace_smoke_test.sh'
> DEBUG| [stdout] PASSED (CONFIG_FUNCTION_TRACER=y in /boot/config-6.8.0-80-generic)
> DEBUG| [stdout] PASSED (CONFIG_FUNCTION_GRAPH_TRACER=y in /boot/config-6.8.0-80-generic)
> DEBUG| [stdout] PASSED (CONFIG_STACK_TRACER=y in /boot/config-6.8.0-80-generic)
> DEBUG| [stdout] PASSED (CONFIG_DYNAMIC_FTRACE=y in /boot/config-6.8.0-80-generic)
> DEBUG| [stdout] PASSED all expected /sys/kernel/debug/tracing files exist
> DEBUG| [stdout] PASSED (function_graph in /sys/kernel/debug/tracing/available_tracers)
> DEBUG| [stdout] PASSED (function in /sys/kernel/debug/tracing/available_tracers)
> DEBUG| [stdout] PASSED (nop in /sys/kernel/debug/tracing/available_tracers)
> DEBUG| [stdout] PASSED (tracer function can be enabled)
> DEBUG| [stdout] PASSED (tracer function_graph can be enabled)
> ERROR| [stderr] grep: /tmp/ftrace-kernel-trace-3910.tmp.log: binary file matches
> DEBUG| [stdout] - tracer function_graph got enough data
> DEBUG| [stdout] - tracer function_graph completed
> DEBUG| [stdout] - tracer function_graph being turned off
> ERROR| [stderr] grep: /tmp/ftrace-kernel-trace-3910.tmp.log: binary file matches
> DEBUG| [stdout] - tracer got 231 irq events
> DEBUG| [stdout] - tracer timerlat got enough data
> DEBUG| [stdout] - tracer timerlat completed
> DEBUG| [stdout] - tracer timerlat being turned off
> DEBUG| [stdout] - tracer nop being set as current tracer
> DEBUG| [stdout] PASSED (tracer timerlat can be enabled (got 660 lines of tracing output))
> DEBUG| [stdout] - tracer osnoise got enough data
> DEBUG| [stdout] - tracer osnoise completed
> DEBUG| [stdout] - tracer osnoise being turned off
> DEBUG| [stdout] - tracer nop being set as current tracer
> DEBUG| [stdout] PASSED (tracer osnoise can be enabled (got 11 lines of tracing output))
> DEBUG| [stdout] - tracer hwlat got enough data
> DEBUG| [stdout] - tracer hwlat completed
> DEBUG| [stdout] - tracer hwlat being turned off
> DEBUG| [stdout] - tracer nop being set as current tracer
> DEBUG| [stdout] PASSED (tracer hwlat can be enabled (got 13 lines of tracing output))
> DEBUG| [stdout] - tracer blk got enough data
> DEBUG| [stdout] - tracer blk completed
> DEBUG| [stdout] - tracer blk being turned off
> DEBUG| [stdout] - tracer nop being set as current tracer
> DEBUG| [stdout] PASSED (tracer blk can be enabled (got 2 lines of tracing output))
> DEBUG| [stdout] TIMER END Tue Aug 26 03:58:59 UTC 2025
> DEBUG| [stdout] TIMEOUT
> DEBUG| [stdout] FAILED: aborting, timeout, took way too long to complete
> INFO | Timer expired (900 sec.), nuking pid 3906
> INFO | ERROR ubuntu_ftrace_smoke_test.ftrace-smoke-test ubuntu_ftrace_smoke_test.ftrace-smoke-test timestamp=1756181377 localtime=Aug 26 04:09:37 Test timeout expired, rc=15
> INFO | END ERROR ubuntu_ftrace_smoke_test.ftrace-smoke-test ubuntu_ftrace_smoke_test.ftrace-smoke-test timestamp=1756181377 localtime=Aug 26 04:09:37
>
> Running 'sudo chcpu -d 1-95' results in:
>
> [ 82.891707] BUG: kernel NULL pointer dereference, address: 0000000000000000
> [ 82.891959] #PF: supervisor read access in kernel mode
> [ 82.891959] #PF: error_code(0x0000) - not-present page
> [ 82.891959] PGD 0 P4D 0
> [ 82.891959] Oops: 0000 [#1] PREEMPT SMP NOPTI
> [ 82.891959] CPU: 0 PID: 593 Comm: kworker/0:2 Not tainted 6.8.0-80-generic #80-Ubuntu
> [ 82.891959] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [ 82.891959] Workqueue: events work_for_cpu_fn
> [ 82.891959] RIP: 0010:memcg_hotplug_cpu_dead+0x65/0xc0
> [ 82.891959] Code: 44 00 00 48 89 df e8 5a ef ff ff 48 89 c3 41 f7 c5 00 02 00 00 74 06 fb 0f 1f 44 00 00 4c 89 e7 e8 f0 cd ff ff e8 6b d9 d0 ff <48> 8b 03 a8 03 75 1e 65 48 ff 08 e8 ab 35 d1 ff 31 c0 5b 41 5c 41
> [ 82.891959] RSP: 0018:ffffbd548170bd10 EFLAGS: 00000246
> [ 82.891959] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 82.891959] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 82.891959] RBP: ffffbd548170bd28 R08: 0000000000000000 R09: 0000000000000000
> [ 82.891959] R10: 000000000000001c R11: 0000000000000000 R12: ffff99183bcb0c00
> [ 82.891959] R13: 0000000000000286 R14: 0000000000000001 R15: 0000000000000000
> [ 82.891959] FS: 0000000000000000(0000) GS:ffff99183bc00000(0000) knlGS:0000000000000000
> [ 82.891959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 82.891959] CR2: 0000000000000000 CR3: 000000001c43c000 CR4: 00000000000006f0
> [ 82.891959] Call Trace:
> [ 82.891959] <TASK>
> [ 82.891959] ? show_regs+0x6d/0x80
> [ 82.891959] ? __die+0x24/0x80
> [ 82.891959] ? page_fault_oops+0x99/0x1b0
> [ 82.891959] ? kernelmode_fixup_or_oops.isra.0+0x69/0x90
> [ 82.891959] ? __bad_area_nosemaphore+0x19e/0x2c0
> [ 82.891959] ? bad_area_nosemaphore+0x16/0x30
> [ 82.891959] ? do_user_addr_fault+0x29d/0x670
> [ 82.891959] ? exc_page_fault+0x83/0x1b0
> [ 82.891959] ? asm_exc_page_fault+0x27/0x30
> [ 82.891959] ? memcg_hotplug_cpu_dead+0x65/0xc0
> [ 82.891959] ? __pfx_memcg_hotplug_cpu_dead+0x10/0x10
> [ 82.891959] cpuhp_invoke_callback+0x348/0x530
> [ 82.891959] __cpuhp_invoke_callback_range+0x80/0x100
> [ 82.891959] _cpu_down+0xfb/0x280
> [ 82.891959] __cpu_down_maps_locked+0x15/0x30
> [ 82.891959] work_for_cpu_fn+0x1a/0x30
> [ 82.891959] process_one_work+0x184/0x3a0
> [ 82.891959] worker_thread+0x306/0x440
> [ 82.891959] ? _raw_spin_lock_irqsave+0xe/0x20
> [ 82.891959] ? __pfx_worker_thread+0x10/0x10
> [ 82.891959] kthread+0xf2/0x120
> [ 82.891959] ? __pfx_kthread+0x10/0x10
> [ 82.891959] ret_from_fork+0x47/0x70
> [ 82.891959] ? __pfx_kthread+0x10/0x10
> [ 82.891959] ret_from_fork_asm+0x1b/0x30
> [ 82.891959] </TASK>
> [ 82.891959] Modules linked in: kvm_amd ccp kvm irqbypass input_leds psmouse ahci libahci serio_raw overlay 9pnet_virtio virtiofs 9p 9pnet netfs
> [ 82.891959] CR2: 0000000000000000
> [ 82.891959] ---[ end trace 0000000000000000 ]---
> [ 82.891959] RIP: 0010:memcg_hotplug_cpu_dead+0x65/0xc0
> [ 82.891959] Code: 44 00 00 48 89 df e8 5a ef ff ff 48 89 c3 41 f7 c5 00 02 00 00 74 06 fb 0f 1f 44 00 00 4c 89 e7 e8 f0 cd ff ff e8 6b d9 d0 ff <48> 8b 03 a8 03 75 1e 65 48 ff 08 e8 ab 35 d1 ff 31 c0 5b 41 5c 41
> [ 82.891959] RSP: 0018:ffffbd548170bd10 EFLAGS: 00000246
> [ 82.891959] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
> [ 82.891959] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 0000000000000000
> [ 82.891959] RBP: ffffbd548170bd28 R08: 0000000000000000 R09: 0000000000000000
> [ 82.891959] R10: 000000000000001c R11: 0000000000000000 R12: ffff99183bcb0c00
> [ 82.891959] R13: 0000000000000286 R14: 0000000000000001 R15: 0000000000000000
> [ 82.891959] FS: 0000000000000000(0000) GS:ffff99183bc00000(0000) knlGS:0000000000000000
> [ 82.891959] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 82.891959] CR2: 0000000000000000 CR3: 000000001c43c000 CR4: 00000000000006f0
> [ 82.891959] note: kworker/0:2[593] exited with irqs disabled
>
> [Fix]
>
> The offending commit relies on a NULL check introduced by an earlier commit which we don't have. Pull that in:
> 91b71e78b8e4 ("mm: memcg: add NULL check to obj_cgroup_put()")
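
A minimal userspace sketch of the pattern described in [Fix], assuming the put helper is called on whatever the drain step returns, which may be NULL when nothing was cached. The names objcg_model, drain_stock_model, objcg_model_put and objcg_model_demo are hypothetical stand-ins, not the kernel's code. In the oops above RBX is 0 and the faulting bytes 48 8b 03 decode to a load through RBX, which appears consistent with a put on a NULL pointer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for a reference-counted object. */
struct objcg_model {
    long refcnt;
};

/*
 * Models draining a per-CPU cache slot: hands back whatever was cached
 * (possibly NULL) and leaves the slot empty.
 */
static struct objcg_model *drain_stock_model(struct objcg_model **slot)
{
    struct objcg_model *old = *slot;

    *slot = NULL;
    return old;
}

/*
 * Models a put helper WITH a NULL guard. Without the guard, a caller
 * that passes the result of an empty drain dereferences address 0.
 */
static void objcg_model_put(struct objcg_model *objcg)
{
    if (objcg)
        objcg->refcnt--;
}

/*
 * Drains the same slot twice. The second drain returns NULL, and the
 * guarded put must treat that as a no-op. Returns the final refcount.
 */
static long objcg_model_demo(void)
{
    struct objcg_model obj = { .refcnt = 2 };
    struct objcg_model *slot = &obj;

    objcg_model_put(drain_stock_model(&slot)); /* drops one reference */
    assert(slot == NULL);
    objcg_model_put(drain_stock_model(&slot)); /* NULL: nothing to drop */

    return obj.refcnt;
}
```

In this model the caller no longer needs its own "if (old)" check before the put, which is the property the backported teardown path depends on.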
>
> [Test Case]
>
> Running 'sudo chcpu -d 1-95' should not trigger a kernel BUG.
>
> [Where Problems Could Occur]
>
> This touches the CPU hotplug code path. Any on- and off-lining of CPUs could cause issues.
>
> Shakeel Butt (1):
>   memcg: drain obj stock on cpu hotplug teardown
>
> Yosry Ahmed (1):
>   mm: memcg: add NULL check to obj_cgroup_put()
>
>  include/linux/memcontrol.h |  3 ++-
>  kernel/bpf/memalloc.c      |  6 ++----
>  mm/memcontrol.c            | 27 +++++++++++++++------------
>  mm/zswap.c                 |  3 +--
>  4 files changed, 20 insertions(+), 19 deletions(-)
>
> --
> 2.48.1
>