ACK: [SRU][R][PATCH 0/1] System doesn't response with mt76 call trace
Yufeng Gao
yufeng.gao at canonical.com
Wed Apr 8 07:25:20 UTC 2026
On 8/4/26 10:43, AceLan Kao wrote:
> From: "Chia-Lin Kao (AceLan)" <acelan.kao at canonical.com>
>
> BugLink: https://bugs.launchpad.net/bugs/2137448
>
> [Impact]
> On Dell systems with MediaTek MT7925 WiFi cards (mt7925e driver), the system becomes unresponsive during firmware testing and high-load situations due to a deadlock in the mt76 driver. The system shows "workqueue hogging CPU" messages followed by system hang, preventing completion of certification testing.
>
> The issue occurs because:
> 1. Two workqueue functions (ps_work and mac_work) attempt to cancel each other using cancel_delayed_work_sync()
> 2. In high-load situations, both work items are queued but cannot execute until CPUs are available
> 3. When CPUs become available, both work functions may run simultaneously, each trying to synchronously cancel the other, resulting in a deadlock
>
> The call path that creates the circular dependency is:
> mt792x_mac_work() -> ... -> cancel_delayed_work_sync(&pm->ps_work);
> mt792x_pm_power_save_work() -> cancel_delayed_work_sync(&mphy->mac_work);
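
The circular dependency above can be pictured as a wait-for graph. What follows is a minimal userspace sketch, not kernel code: the enum values, has_wait_cycle(), model_deadlocks() and check_model() are hypothetical names made up for illustration, and the only assumption carried over from the report is that a synchronous cancel issued from inside one work function waits for the other work function to finish. Under that assumption, two synchronous cancels form a cycle, while making the cancel in the power-save work non-synchronous removes one edge.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only: the names mirror the two work items in the
 * report, but nothing here is taken from the driver. */
enum work_id { MAC_WORK, PS_WORK, NUM_WORKS };

/* waits_on[a][b] is true when work a blocks until work b has finished.
 * Returns true if any work item ends up (transitively) waiting on itself. */
static bool has_wait_cycle(bool waits_on[NUM_WORKS][NUM_WORKS])
{
	bool reach[NUM_WORKS][NUM_WORKS];
	int i, j, k;

	for (i = 0; i < NUM_WORKS; i++)
		for (j = 0; j < NUM_WORKS; j++)
			reach[i][j] = waits_on[i][j];

	/* Transitive closure (Warshall): i reaches j if i reaches k and k reaches j. */
	for (k = 0; k < NUM_WORKS; k++)
		for (i = 0; i < NUM_WORKS; i++)
			for (j = 0; j < NUM_WORKS; j++)
				if (reach[i][k] && reach[k][j])
					reach[i][j] = true;

	for (i = 0; i < NUM_WORKS; i++)
		if (reach[i][i])
			return true;
	return false;
}

/* ps_cancel_is_sync selects how the power-save work cancels mac_work. */
static bool model_deadlocks(bool ps_cancel_is_sync)
{
	bool waits_on[NUM_WORKS][NUM_WORKS] = { { false } };

	/* mac_work synchronously cancels ps_work, so it waits on ps_work. */
	waits_on[MAC_WORK][PS_WORK] = true;
	/* ps_work waits on mac_work only if its cancel is synchronous. */
	waits_on[PS_WORK][MAC_WORK] = ps_cancel_is_sync;

	return has_wait_cycle(waits_on);
}

/* Self-check: both cancels synchronous gives a cycle, one does not. */
static void check_model(void)
{
	assert(model_deadlocks(true));
	assert(!model_deadlocks(false));
}
```

Calling check_model() from a main() of your own exercises both cases. The model says nothing about whether the remaining non-synchronous cancel is safe; it only shows why a single synchronous edge cannot deadlock on its own.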
>
> [Fix]
> Replace cancel_delayed_work_sync() with cancel_delayed_work() in the mt792x_pm_power_save_work() function to eliminate the deadlock condition.
>
> Upstream commit (submitted to linux-wireless):
> https://patchwork.kernel.org/project/linux-wireless/patch/20251215122231.3180648-1-leon.yen@mediatek.com/
>
> In linux-next:
> bb2f07819d063 wifi: mt76: mt792x: Fix a potential deadlock in high-load situations
>
> The non-synchronous cancel is safe here because:
> - The work cancellation is part of the power-save flow, not a critical cleanup path
> - Avoiding synchronous wait prevents the circular dependency that causes the deadlock
> - The code becomes simpler and easier to maintain
>
> [Test Plan]
> On a Dell system with MediaTek MT7925 WiFi (or similar affected platform):
>
> 1. Install fwts if not already available:
> $ sudo apt-get install fwts
>
> 2. Monitor system logs in a separate terminal:
> $ sudo dmesg -w
>
> 3. Run the firmware test cases that previously triggered the deadlock:
>
> $ sudo fwts wakealarm
> $ sudo fwts uefirtvariable
> $ sudo fwts oops
>
> Or run a comprehensive diagnostic test:
> $ sudo fwts --log-level=high -r stdout
>
> 4. Check for symptoms during and after the tests:
>
> Without the fix, you would see:
> - "Message 00020080 (seq N) timeout" from mt7925e
> - "workqueue: vmstat_update hogged CPU for >10000us" warnings
> - "workqueue: psi_avgs_work hogged CPU for >10000us" warnings
> - WARNING traces in iommu_dma_unmap_page
> - System becoming unresponsive
>
> With the fix, these symptoms should not occur and the system should remain responsive.
>
> 5. Run extended stress testing with WiFi activity during high CPU load:
> $ stress-ng --cpu 128 --timeout 300s &
> $ ping -f <router_ip> # flood ping to generate WiFi traffic
>
> The system should remain stable without deadlocks.
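
The symptom strings listed in step 4 can also be checked mechanically when going through a saved log. This is only a sketch of one way to do it: line_shows_symptom() is a hypothetical helper, it does plain substring matching, and the substrings are copied from the list in step 4 rather than from the driver source, so they may need adjusting to the exact messages seen on a given kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Fixed substrings copied from the symptom list in the test plan. */
static const char *const symptom_markers[] = {
	"hogged CPU",
	"iommu_dma_unmap_page",
};

/* Returns true if a single log line contains one of the listed symptoms. */
static bool line_shows_symptom(const char *line)
{
	size_t i;

	/* "Message 00020080 (seq N) timeout": N varies, so match both ends. */
	if (strstr(line, "Message ") && strstr(line, ") timeout"))
		return true;

	for (i = 0; i < sizeof(symptom_markers) / sizeof(symptom_markers[0]); i++)
		if (strstr(line, symptom_markers[i]))
			return true;

	return false;
}
```

Feeding each line of the captured dmesg output through line_shows_symptom() and counting the hits gives a quick pass/fail signal. An unresponsive system still has to be noticed by hand.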
>
> [Where problems could occur]
> This change affects the MediaTek MT792x WiFi driver's power management and workqueue interaction on systems with mt7925e and similar chipsets.
>
> Potential issues if the non-synchronous cancel is not safe in this context:
> - If the code assumes that mac_work is fully stopped before proceeding, a non-synchronous cancel might allow mac_work to run concurrently with subsequent operations, potentially causing race conditions
> - mac_work might access hardware or data structures that ps_work assumes are quiescent after the cancel call, leading to unexpected behavior or crashes
> - Power management state transitions might become inconsistent if mac_work completes after ps_work has already proceeded with its power-save operations
>
> However, these risks are mitigated by:
> - The change is intentional and authored by MediaTek engineers who maintain the driver
> - The alternative (synchronous cancel) creates a known deadlock issue with a 60% reproduction rate
> - The workqueue subsystem normally guarantees that a given work item does not run concurrently with itself, which rules out one class of race conditions
> - Similar patterns are used elsewhere in the kernel where work items need to coordinate
>
> The impact is limited to:
> - Systems with MediaTek MT792x series WiFi chipsets (mt7921, mt7925, etc.)
> - Primarily affects high-load scenarios where both work items are queued simultaneously
> - Does not affect other wireless drivers or systems without these chipsets
>
> Leon Yen (1):
> wifi: mt76: mt792x: Fix a potential deadlock in high-load situations
>
> drivers/net/wireless/mediatek/mt76/mt792x_mac.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
Acked-by: Yufeng Gao <yufeng.gao at canonical.com>