ACK: [SRU][N][PATCH 0/2] BUG: kernel NULL pointer dereference in amdgpu
Masahiro Yamada
masahiro.yamada at canonical.com
Thu Apr 9 05:17:11 UTC 2026
On 4/8/26 11:00, AceLan Kao wrote:
> From: "Chia-Lin Kao (AceLan)" <acelan.kao at canonical.com>
>
> BugLink: https://bugs.launchpad.net/bugs/2144577
>
> [Impact]
> System freezes during boot on machines with AMD Southern Islands (SI) GPUs
> using the amdgpu driver.
> The amdgpu driver calls flush_gpu_tlb_pasid() in a workqueue, but on SI
> hardware this function pointer is NULL. The kernel hits a NULL pointer
> dereference in amdgpu_gmc_flush_gpu_tlb_pasid() and crashes.
>
> Error log:
> kernel: BUG: kernel NULL pointer dereference, address: 0000000000000000
> kernel: Workqueue: events amdgpu_tlb_fence_work [amdgpu]
> kernel: RIP: 0010:0x0
> kernel: Call Trace:
> kernel: amdgpu_gmc_flush_gpu_tlb_pasid+0xfd/0x480 [amdgpu]
> kernel: amdgpu_tlb_fence_work+0x77/0x110 [amdgpu]
>
> The crash occurs on every boot on affected hardware. This is a regression
> between 6.17.0-14 and 6.17.0-19.
>
> [Fix]
> Two patches fix this together:
> 1. f4db9913e4d3 ("drm/amdgpu: validate the flush_gpu_tlb_pasid()")
>    Adds a NULL check for flush_gpu_tlb_pasid before calling it.
>    Upstream in v7.0-rc1.
> 2. e3a6eff92bbd ("drm/amdgpu: Fix validating flush_gpu_tlb_pasid()")
>    Fixes the first patch: its early return skipped the unlock, causing
>    a deadlock. Changes the bare return to a goto that unlocks first.
>    Upstream in v7.0-rc1.
>    Fixes: f4db9913e4d3
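
For readers following along, the combined effect of the two patches can be
sketched in user-space C. This is only an illustration of the pattern
(NULL-check an optional callback, then leave through a single unlock label),
not the driver code. Every name prefixed with sketch_ is hypothetical, the
lock is modelled as a plain flag, and returning 0 when the callback is missing
is an assumption made for the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real amdgpu structures. */
struct sketch_gmc_funcs {
	/* NULL on hardware that provides no per-PASID flush (as on SI). */
	void (*flush_gpu_tlb_pasid)(int pasid);
};

struct sketch_dev {
	const struct sketch_gmc_funcs *funcs;
	int lock_held;	/* models the lock held around the flush */
};

/* Counts how often the (fake) hardware flush was invoked. */
static int sketch_flush_calls;

static void sketch_hw_flush(int pasid)
{
	(void)pasid;
	sketch_flush_calls++;
}

static int sketch_flush_pasid(struct sketch_dev *dev, int pasid)
{
	int ret = 0;	/* assumption: a missing callback is not an error */

	dev->lock_held = 1;

	/*
	 * Patch 1: never call through a NULL function pointer.
	 * Patch 2: leave via the unlock label; a bare "return" here
	 * would exit with the lock still held.
	 */
	if (!dev->funcs->flush_gpu_tlb_pasid)
		goto out_unlock;

	assert(dev->lock_held);
	dev->funcs->flush_gpu_tlb_pasid(pasid);

out_unlock:
	dev->lock_held = 0;
	return ret;
}
```

With a device whose callback is NULL, sketch_flush_pasid() returns without
calling anything and with lock_held back at 0. With a device whose callback
is set, the flush still runs, which is the behaviour the "gates too broadly"
concern in the cover letter is about.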
>
> [Test Plan]
> On a machine with an AMD SI GPU (Tahiti, Pitcairn, Verde, Oland, Hainan)
> booted with amdgpu.si_support=1:
>
> $ sudo reboot
>
> Without patches: kernel NULL pointer dereference during boot, system freezes.
> With patches: system boots normally, no crash or error in dmesg.
>
> Check dmesg after boot:
> $ dmesg | grep -i "BUG\|NULL pointer\|amdgpu"
>
> Without patches: "BUG: kernel NULL pointer dereference" present.
> With patches: no BUG or NULL pointer lines.
>
> [Where problems could occur]
> A mistake in either patch could break TLB flushing in the amdgpu driver.
>
> If the NULL check gates too broadly, TLB flushes could be skipped on GPUs
> that do have flush_gpu_tlb_pasid. This would cause stale TLB entries and
> GPU page faults or rendering corruption.
>
> The unlock path change in the second patch touches the reset/lock logic in
> amdgpu_gmc_flush_gpu_tlb_pasid(). A wrong goto target could leave the
> reset domain lock held, deadlocking the GPU.
>
> [Other Info]
> Both patches are upstream in v7.0-rc1.
>
> Prike Liang (1):
> drm/amdgpu: validate the flush_gpu_tlb_pasid()
>
> Timur Kristóf (1):
> drm/amdgpu: Fix validating flush_gpu_tlb_pasid()
>
> drivers/gpu/drm/amd/amdgpu/amdgpu_gmc.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
Acked-by: Masahiro Yamada <masahiro.yamada at canonical.com>