ACK: [SRU][J][PATCH v2 0/2] WARN in trc_wait_for_one_reader on Xen instances
Agathe Porte
agathe.porte at canonical.com
Mon Dec 2 13:28:50 UTC 2024
2024-11-22 18:53 CET, Krister Johansen:
> BugLink: https://bugs.launchpad.net/bugs/2089373
>
> [Impact]
>
> When ending bpf tracing, 5.15 kernels now report a warning in
> trc_wait_for_one_reader() on platforms that support hot-plugging CPUs,
> but that do not have all of their hotplug slots populated. In this
> submitter's environment, it reproduces on Xen EC2 instances, but not
> Nitro ones.
>
> The warning looks like this:
>
> kernel: [ 6416.920266] ------------[ cut here ]------------
> kernel: [ 6416.920272] trc_wait_for_one_reader(): smp_call_function_single() failed for CPU: 64
> kernel: [ 6416.920289] WARNING: CPU: 0 PID: 13 at kernel/rcu/tasks.h:1044 trc_wait_for_one_reader+0x2b8/0x300
> kernel: [ 6416.920299] Modules linked in: xt_state xt_connmark nf_conntrack_netlink nfnetlink xt_addrtype xt_statistic xt_nat xt_tcpudp ip_vs_sh ip_vs_wrr ip_vs_rr ip_vs nvidia_uvm(POE) nvidia_drm(POE) drm_kms_helper cec rc_core fb_sys_fops syscopyarea sysfillrect sysimgblt nvidia_modeset(POE) nvidia(POE) iptable_mangle ip6table_mangle ip6table_filter ip6table_nat ip6_tables xt_MASQUERADE xt_conntrack xt_comment iptable_filter xt_mark iptable_nat nf_nat bpfilter aufs overlay udp_diag tcp_diag inet_diag binfmt_misc nls_iso8859_1 dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua crct10dif_pclmul crc32_pclmul ghash_clmulni_intel sha256_ssse3 sha1_ssse3 aesni_intel input_leds psmouse crypto_simd cryptd serio_raw floppy sch_fq_codel nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 libcrc32c ena drm efi_pstore ip_tables x_tables autofs4
> kernel: [ 6416.920368] CPU: 0 PID: 13 Comm: rcu_tasks_trace Tainted: P OE 5.15.0-1071-aws #77~20.04.1-Ubuntu
> kernel: [ 6416.920372] Hardware name: Xen HVM domU, BIOS 4.11.amazon 08/24/2006
> kernel: [ 6416.920374] RIP: 0010:trc_wait_for_one_reader+0x2b8/0x300
> kernel: [ 6416.920376] Code: 00 00 00 4c 89 ef e8 37 ac 4e 00 eb 9f 44 89 fa 48 c7 c6 00 63 e2 b8 48 c7 c7 a0 9a 1e b9 c6 05 2f 2e 09 02 01 e8 15 2e b9 00 <0f> 0b e9 31 ff ff ff 4c 89 ee 48 c7 c7 20 df b7 b9 e8 a2 99 52 00
> kernel: [ 6416.920380] RSP: 0018:ffff9e048c4efe00 EFLAGS: 00010286
> kernel: [ 6416.920382] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000027
> kernel: [ 6416.920384] RDX: 0000000000000027 RSI: 0000000000000003 RDI: ffff93074ae20588
> kernel: [ 6416.920385] RBP: ffff9e048c4efe28 R08: ffff93074ae20580 R09: 0000000000000001
> kernel: [ 6416.920387] R10: 0000000000ffff0a R11: ffff93463feb2c7f R12: ffff92cbc6a1e600
> kernel: [ 6416.920389] R13: 0000000000000040 R14: 00000000000205a4 R15: 0000000000000040
> kernel: [ 6416.920390] FS: 0000000000000000(0000) GS:ffff93074ae00000(0000) knlGS:0000000000000000
> kernel: [ 6416.920393] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> kernel: [ 6416.920394] CR2: 00007f4a72b04098 CR3: 00000046c8964001 CR4: 00000000001706f0
> kernel: [ 6416.920399] Call Trace:
> kernel: [ 6416.920401] <TASK>
> kernel: [ 6416.920404] ? show_regs.cold+0x1a/0x1f
> kernel: [ 6416.920410] ? trc_wait_for_one_reader+0x2b8/0x300
> kernel: [ 6416.920412] ? __warn+0x8b/0xe0
> kernel: [ 6416.920418] ? trc_wait_for_one_reader+0x2b8/0x300
> kernel: [ 6416.920421] ? report_bug+0xd5/0x110
> kernel: [ 6416.920427] ? handle_bug+0x39/0x90
> kernel: [ 6416.920431] ? exc_invalid_op+0x19/0x70
> kernel: [ 6416.920434] ? asm_exc_invalid_op+0x1b/0x20
> kernel: [ 6416.920442] ? trc_wait_for_one_reader+0x2b8/0x300
> kernel: [ 6416.920446] rcu_tasks_trace_postscan+0x47/0x80
> kernel: [ 6416.920449] rcu_tasks_wait_gp+0x108/0x210
> kernel: [ 6416.920453] rcu_tasks_kthread+0x10f/0x1c0
> kernel: [ 6416.920456] ? wait_woken+0x60/0x60
> kernel: [ 6416.920462] ? show_rcu_tasks_trace_gp_kthread+0x80/0x80
> kernel: [ 6416.920464] kthread+0x12a/0x150
> kernel: [ 6416.920471] ? set_kthread_struct+0x50/0x50
> kernel: [ 6416.920476] ret_from_fork+0x22/0x30
> kernel: [ 6416.920485] </TASK>
> kernel: [ 6416.920486] ---[ end trace 0500611ddaff33a7 ]---
>
> The problem appears when:
>
> - The system is performing a rcu_tasks_trace grace period wait
> - The system has more hot plug CPU slots available than are populated
> - The rcu tasks postscan detects a holdout
>
> The problem is actually caused by a mismerge of 9b3c4ab304 ("sched,rcu:
> Rework try_invoke_on_locked_down_task()"). When that patch was applied,
> a conflict around the reader nesting count was resolved incorrectly; as
> a result, quiescent tasks get flagged as holdouts. This in turn results
> in more IPIs than necessary being sent to idle CPUs, as well as WARNs
> about failing to send IPIs to CPUs that aren't running.
>
> The fix is twofold: 1) manually correct the mismerge in the same way
> that mainline resolved the conflict, and 2) backport an additional RCU
> patch that confines the rcu_tasks postscan to CPUs that are online.
>
> [Backport]
>
> The upstream merge that shows the correct manual resolution of the merge
> conflicts is in this commit:
>
> commit 6fedc28076bbbb32edb722e80f9406a3d1d668a8
> Merge tag 'rcu.2021.11.01a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
>
> specifically:
>
> > @@ -951,18 +942,18 @@ static int trc_inspect_reader(struct task_struct *t, void *arg)
> > n_heavy_reader_updates++;
> > if (ofl)
> > n_heavy_reader_ofl_updates++;
> > - in_qs = true;
> > + nesting = 0;
> > } else {
> > // The task is not running, so C-language access is safe.
> > - in_qs = likely(!t->trc_reader_nesting);
> > + nesting = t->trc_reader_nesting;
> > }
> >
> > - // Mark as checked so that the grace-period kthread will
> > - // remove it from the holdout list.
> > - t->trc_reader_checked = true;
> > -
> > - if (in_qs)
> > - return 0; // Already in quiescent state, done!!!
> > + // If not exiting a read-side critical section, mark as checked
> > + // so that the grace-period kthread will remove it from the
> > + // holdout list.
> > + t->trc_reader_checked = nesting >= 0;
> > + if (nesting <= 0)
> > + return nesting ? -EINVAL : 0; // If in QS, done, otherwise try again later.
>
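> The resolution above hinges on the sign of t->trc_reader_nesting: zero
> means the task is already quiescent, positive means it is inside a
> read-side critical section, and negative means it is in the middle of
> exiting one. For clarity, here is a minimal standalone sketch of that
> decision (ordinary userspace C with made-up names, not the kernel code):
>
>   #include <stdio.h>
>
>   /* Model of the decision the resolved hunk makes from the sign of
>    * t->trc_reader_nesting (see the diff above). */
>   static const char *classify_reader(int nesting, int *checked)
>   {
>       /* t->trc_reader_checked = nesting >= 0; */
>       *checked = (nesting >= 0);
>
>       if (nesting == 0)
>           return "quiescent, done";       /* return 0 */
>       if (nesting < 0)
>           return "exiting, retry later";  /* return -EINVAL */
>       return "in a critical section";     /* falls through: reader reports a QS later */
>   }
>
>   int main(void)
>   {
>       int checked;
>       const char *what;
>
>       for (int nesting = -1; nesting <= 1; nesting++) {
>           what = classify_reader(nesting, &checked);
>           printf("nesting=%2d -> %s (checked=%d)\n", nesting, what, checked);
>       }
>       return 0;
>   }
>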
> The additional rcu_tasks patch that confines the postscan to online
> CPUs is:
>
> commit 5c9a9ca44fda41c5e82f50efced5297a9c19760d
> rcu-tasks: Idle tasks on offline CPUs are in quiescent states
>
> I've additionally reached out to upstream about including this in
> stable:
>
> https://lore.kernel.org/stable/cover.1732237776.git.kjlx@templeofstupid.com/
>
> [Test]
>
> A trivial reproducer for this problem is to use an up-to-date version of
> bpftrace to run a kfunc probe, which uses the rcu_tasks_trace facility
> for cleanup when it is destroyed:
>
> bpftrace -e 'kfunc:tcp_reset {@a = count();}'
> ^C
>
> Running that command and interrupting it is all that's necessary to
> reproduce the problem on a Xen EC2 system.
>
> I've run with and without the patches applied and can confirm that
> either patch alone, as well as both together, is sufficient to resolve
> the problem. Correcting the nesting ensures that idling CPUs don't get
> flagged as holdouts, and confining the scan to just online CPUs ensures
> that even if we incorrectly flag a CPU as a holdout, the warning won't
> trigger because sending the IPI won't fail.
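>
> To make that last point concrete, here is a tiny standalone model (plain
> userspace C with hypothetical names, not the kernel's code or APIs) of
> why restricting the scan to online CPUs makes the failure unreachable:
> an IPI can only fail for a CPU that isn't running, and such CPUs are
> simply never targeted once the scan is confined to online CPUs.
>
>   #include <stdbool.h>
>   #include <stdio.h>
>
>   #define NR_POSSIBLE_CPUS 128   /* hotplug slots the platform advertises */
>   #define NR_ONLINE_CPUS    64   /* slots actually populated */
>
>   /* Hypothetical stand-in for sending the IPI: it can only succeed
>    * for a CPU that is actually running. */
>   static bool send_ipi(int cpu)
>   {
>       return cpu < NR_ONLINE_CPUS;
>   }
>
>   static void scan(bool online_only)
>   {
>       int failures = 0;
>
>       for (int cpu = 0; cpu < NR_POSSIBLE_CPUS; cpu++) {
>           if (online_only && cpu >= NR_ONLINE_CPUS)
>               continue;          /* the fix: skip CPUs that aren't running */
>           if (!send_ipi(cpu))
>               failures++;        /* this is what produces the warning above */
>       }
>       printf("%s: %d failed IPIs\n",
>              online_only ? "online CPUs only" : "all possible CPUs", failures);
>   }
>
>   int main(void)
>   {
>       scan(false);   /* pre-fix behaviour: 64 failed IPIs */
>       scan(true);    /* post-fix behaviour: none */
>       return 0;
>   }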
>
> [Potential Regression]
>
> The regression potential is low. The corrected merge resolution has been
> present in mainline since 2021, and the fix to run the postscan only on
> online CPUs has been present since 2022.
>
>
> Krister Johansen (1):
> UBUNTU: SAUCE: rcu-tasks: fix mismerge in trc_inspect_reader
>
> Paul E. McKenney (1):
> rcu-tasks: Idle tasks on offline CPUs are in quiescent states
>
> kernel/rcu/tasks.h | 4 ++--
> 1 file changed, 2 insertions(+), 2 deletions(-)
Acked-by: Agathe Porte <agathe.porte at canonical.com>