[SRU][R][PATCH 0/1] System doesn't respond with mt76 call trace
AceLan Kao
acelan.kao at canonical.com
Wed Apr 8 00:43:32 UTC 2026
From: "Chia-Lin Kao (AceLan)" <acelan.kao at canonical.com>
BugLink: https://bugs.launchpad.net/bugs/2137448
[Impact]
On Dell systems with MediaTek MT7925 WiFi cards (mt7925e driver), the system becomes unresponsive during firmware testing and under high load because of a deadlock in the mt76 driver. The kernel logs "workqueue: ... hogged CPU" warnings and the system then hangs, which prevents certification testing from completing.
The issue occurs because:
1. Two workqueue functions (ps_work and mac_work) attempt to cancel each other using cancel_delayed_work_sync()
2. In high-load situations, both works get queued but cannot execute until CPUs are available
3. When CPUs become available, both work functions may run simultaneously, each trying to synchronously cancel the other, resulting in a deadlock
The call path that creates the circular dependency is:
mt792x_mac_work() -> ... -> cancel_delayed_work_sync(&pm->ps_work);
mt792x_pm_power_save_work() -> cancel_delayed_work_sync(&mphy->mac_work);
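The circular wait can be illustrated with a small userspace model. This is only a sketch in Python, not driver code: the function name both_complete, the timeout, and the use of threads and events are assumptions made for illustration. The model assumes both work items are already executing when each one reaches its cancel call. The real cancel_delayed_work_sync() has no timeout; the timeout here only lets the model report the hang instead of hanging itself.

```python
import threading


def both_complete(ps_sync: bool, timeout: float = 0.5) -> bool:
    """Model mac_work and ps_work running at the same time.

    Returns True if both modelled work items finish, False if they
    end up waiting on each other (the deadlock case).
    """
    running = threading.Barrier(2)  # make sure both items are executing
    done = {"mac_work": threading.Event(), "ps_work": threading.Event()}

    def work(me: str, other: str, sync: bool) -> None:
        running.wait()
        # Stand-in for cancel_delayed_work_sync(): block until the other
        # item has finished. The kernel would wait forever; the model
        # gives up after `timeout` seconds and never marks itself done.
        if sync and not done[other].wait(timeout):
            return
        # Stand-in for cancel_delayed_work(): no waiting at all.
        done[me].set()

    threads = [
        # mac_work keeps its synchronous cancel of ps_work.
        threading.Thread(target=work, args=("mac_work", "ps_work", True)),
        # ps_work cancels mac_work synchronously only if ps_sync is set.
        threading.Thread(target=work, args=("ps_work", "mac_work", ps_sync)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return all(event.is_set() for event in done.values())
```

In this model both_complete(ps_sync=True) returns False, because each side waits for the other, while both_complete(ps_sync=False) returns True, because ps_work no longer waits and mac_work's wait can then finish.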
[Fix]
Replace cancel_delayed_work_sync() with cancel_delayed_work() in the mt792x_pm_power_save_work() function to eliminate the deadlock condition.
Upstream commit (submitted to linux-wireless):
https://patchwork.kernel.org/project/linux-wireless/patch/20251215122231.3180648-1-leon.yen@mediatek.com/
In linux-next:
bb2f07819d063 wifi: mt76: mt792x: Fix a potential deadlock in high-load situations
The non-synchronous cancel is safe here because:
- The work cancellation is part of the power-save flow, not a critical cleanup path
- Avoiding synchronous wait prevents the circular dependency that causes the deadlock
- The code becomes simpler and easier to maintain
[Test Plan]
On a Dell system with MediaTek MT7925 WiFi (or similar affected platform):
1. Install fwts if not already available:
$ sudo apt-get install fwts
2. Monitor system logs in a separate terminal:
$ sudo dmesg -w
3. Run the firmware test cases that previously triggered the deadlock:
$ sudo fwts wakealarm
$ sudo fwts uefirtvariable
$ sudo fwts oops
Or run a comprehensive diagnostic test:
$ sudo fwts --log-level=high -r stdout
4. Check for symptoms during and after the tests:
Without the fix, you would see:
- "Message 00020080 (seq N) timeout" from mt7925e
- "workqueue: vmstat_update hogged CPU for >10000us" warnings
- "workqueue: psi_avgs_work hogged CPU for >10000us" warnings
- WARNING traces in iommu_dma_unmap_page
- System becoming unresponsive
With the fix, these symptoms should not occur and the system should remain responsive.
5. Run extended stress testing with WiFi activity during high CPU load:
$ stress-ng --cpu 128 --timeout 300s &
$ sudo ping -f <router_ip> # flood ping to generate WiFi traffic (flood mode requires root)
The system should remain stable without deadlocks.
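The symptom check in step 4 could be automated against a saved kernel log. The following is a minimal sketch, not part of the patch or of fwts: the names SYMPTOMS and find_symptoms are hypothetical, and the regular expressions are assumptions derived only from the message strings quoted in step 4, so they may need adjusting to the exact log format on a given kernel.

```python
import re

# Patterns derived from the messages listed in step 4 (assumed formats).
SYMPTOMS = {
    "mcu_timeout": re.compile(r"Message 00020080 \(seq \d+\) timeout"),
    "workqueue_hog": re.compile(r"workqueue: \S+ hogged CPU for >\d+us"),
    "iommu_warning": re.compile(r"iommu_dma_unmap_page"),
}


def find_symptoms(log_text: str) -> dict[str, list[str]]:
    """Return the log lines matching each known symptom pattern."""
    hits = {name: [] for name in SYMPTOMS}
    for line in log_text.splitlines():
        for name, pattern in SYMPTOMS.items():
            if pattern.search(line):
                hits[name].append(line)
    return hits
```

A log captured with "sudo dmesg > dmesg.txt" could be passed in as find_symptoms(open("dmesg.txt").read()). An empty list for every key means none of the listed messages appeared in that log; it does not by itself prove that the system stayed responsive.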
[Where problems could occur]
This change affects the MediaTek MT792x WiFi driver's power management and workqueue interaction on systems with mt7925e and similar chipsets.
Potential issues if the non-synchronous cancel is not safe in this context:
- If there are assumptions in the code that mac_work must be fully stopped before proceeding, using non-synchronous cancel might allow mac_work to run concurrently with subsequent operations, potentially causing race conditions
- The mac_work might access hardware or data structures that ps_work assumes are quiescent after the cancel call, leading to unexpected behavior or crashes
- Power management state transitions might become inconsistent if mac_work completes after ps_work has already proceeded with its power-save operations
However, these risks are mitigated by:
- The change is intentional and authored by MediaTek engineers who maintain the driver
- The alternative (synchronous cancel) creates a known deadlock with a 60% reproduction rate
- The workqueue subsystem provides inherent protection against most race conditions
- Similar patterns are used elsewhere in the kernel where work items need to coordinate
The impact is limited to:
- Systems with MediaTek MT792x series WiFi chipsets (mt7921, mt7925, etc.)
- Primarily affects high-load scenarios where both work items are queued simultaneously
- Does not affect other wireless drivers or systems without these chipsets
Leon Yen (1):
wifi: mt76: mt792x: Fix a potential deadlock in high-load situations
drivers/net/wireless/mediatek/mt76/mt792x_mac.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
--
2.53.0