[PATCH 1/2] drm/i915: Fix ref->mutex deadlock in i915_active_wait()

Seth Forshee seth.forshee at canonical.com
Thu Apr 9 19:56:04 UTC 2020


On Tue, Apr 07, 2020 at 03:27:39PM -0700, Sultan Alsawaf wrote:
> From: Sultan Alsawaf <sultan at kerneltoast.com>
> 
> The following deadlock exists in i915_active_wait() due to a double lock
> on ref->mutex (call chain listed in order from top to bottom):
>  i915_active_wait();
>  mutex_lock_interruptible(&ref->mutex); <-- ref->mutex first acquired
>  i915_active_request_retire();
>  node_retire();
>  active_retire();
>  mutex_lock_nested(&ref->mutex, SINGLE_DEPTH_NESTING); <-- DEADLOCK
> 
> Fix the deadlock by skipping the second ref->mutex lock when
> active_retire() is called through i915_active_request_retire().
> 
> Note that this bug only affects 5.4 and has since been fixed in 5.5.
> Normally, a backport of the fix from 5.5 would be in order, but the
> patch set that fixes this deadlock involves massive changes that are
> neither feasible nor desirable for backporting [1][2][3]. Therefore,
> this small patch was made to address the deadlock specifically for 5.4.
> 
> [1] 274cbf20fd10 ("drm/i915: Push the i915_active.retire into a worker")
> [2] 093b92287363 ("drm/i915: Split i915_active.mutex into an irq-safe spinlock for the rbtree")
> [3] 750bde2fd4ff ("drm/i915: Serialise with remote retirement")
> 
> Fixes: 12c255b5dad1 ("drm/i915: Provide an i915_active.acquire callback")
> Cc: <stable at vger.kernel.org> # 5.4.x
> Signed-off-by: Sultan Alsawaf <sultan at kerneltoast.com>
> Signed-off-by: Sultan Alsawaf <sultan.alsawaf at canonical.com>

I think this patch essentially does what is intended, though I wish that
were more obviously the case. The code is hard to follow: several
structures each have their own retire callback, and it is difficult to
keep track of what is being retired at any given point.
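
To keep the shape of the problem straight, here is a minimal user-space
sketch of the double lock from the commit message and of the general idea
of skipping the second lock. It is a model under assumptions, not the
actual patch: ref_model, model_lock(), active_retire_model() and
active_wait_model() are hypothetical names, and the "mutex" is a plain
flag that reports a would-be self-deadlock instead of blocking.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the i915_active ref; not the kernel struct. */
struct ref_model {
    bool mutex_held;   /* models ref->mutex as a non-recursive lock */
    int retired;       /* counts completed retirements */
};

/* Returns false where a real non-recursive mutex would block forever. */
static bool model_lock(struct ref_model *ref)
{
    if (ref->mutex_held)
        return false;
    ref->mutex_held = true;
    return true;
}

static void model_unlock(struct ref_model *ref)
{
    ref->mutex_held = false;
}

/*
 * Models active_retire(): takes the lock unless the caller says it is
 * already held. The already_locked parameter is an assumption used to
 * illustrate "skip the second lock"; the real patch may do it differently.
 */
static bool active_retire_model(struct ref_model *ref, bool already_locked)
{
    if (!already_locked && !model_lock(ref))
        return false;              /* second acquisition: self-deadlock */
    ref->retired++;
    if (!already_locked)
        model_unlock(ref);
    return true;
}

/* Models i915_active_wait(): holds the lock while retiring requests. */
static bool active_wait_model(struct ref_model *ref, bool patched)
{
    bool ok;

    if (!model_lock(ref))
        return false;
    ok = active_retire_model(ref, patched);
    model_unlock(ref);
    return ok;
}
```

With patched set to false the inner call tries to take the lock a second
time and the model reports failure; with patched set to true the
retirement completes and the lock is released once by the outer caller.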

The i915_active_request struct has a retire callback that you are no
longer using. It seems that would almost always be set to node_retire(),
in which case having the callback in the first place seems pretty
pointless. But it's the "almost" which gives me pause -- in a couple of
cases I see INIT_ACTIVE_REQUEST() being used, which would set the
callback to i915_active_retire_noop(). I _think_ that the requests will
never be retired in that state, but I'm not 100% certain.
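
The concern can be sketched as follows. This is a simplified model with
hypothetical names (request_model, retire_noop_model(),
node_retire_model(), paths_agree()), and it assumes the patched path
calls the node retirement directly instead of going through the stored
callback. The two paths behave the same only when the callback is the
node-retire function.

```c
#include <assert.h>
#include <stddef.h>

struct request_model;
typedef void (*retire_fn)(struct request_model *req);

/* Hypothetical stand-in for i915_active_request; not the kernel struct. */
struct request_model {
    retire_fn retire;    /* callback stored at init time */
    int node_retired;    /* how many times node retirement ran */
};

/* Models i915_active_retire_noop(): does nothing. */
static void retire_noop_model(struct request_model *req)
{
    (void)req;
}

/* Models node_retire(): performs the actual retirement work. */
static void node_retire_model(struct request_model *req)
{
    req->node_retired++;
}

/* Pre-patch behaviour (as modelled): dispatch through the callback. */
static void retire_via_callback(struct request_model *req)
{
    assert(req->retire != NULL);
    req->retire(req);
}

/* Assumed post-patch behaviour: bypass the callback entirely. */
static void retire_direct(struct request_model *req)
{
    node_retire_model(req);
}

/* Returns 1 if both paths leave a request in the same state. */
int paths_agree(retire_fn cb)
{
    struct request_model a = { cb, 0 };
    struct request_model b = { cb, 0 };

    retire_via_callback(&a);
    retire_direct(&b);
    return a.node_retired == b.node_retired;
}
```

In this model paths_agree(node_retire_model) returns 1 and
paths_agree(retire_noop_model) returns 0, which is why it matters
whether a request can ever be retired while the noop callback is set.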

Are you 100% certain that the requests are never retired with the
callback set to i915_active_retire_noop()? If so, then I think the patch
accomplishes its purpose, even if it makes some already confusing code
even more confusing.

Thanks,
Seth


