[PATCH 1/2] drm/i915: Fix ref->mutex deadlock in i915_active_wait()

Seth Forshee seth.forshee at canonical.com
Thu Apr 9 20:20:22 UTC 2020


On Thu, Apr 09, 2020 at 01:09:49PM -0700, Sultan Alsawaf wrote:
> On Thu, Apr 09, 2020 at 02:56:04PM -0500, Seth Forshee wrote:
> > I think this patch essentially does what is intended, though I wish that
> > were more obviously the case. Going through the code makes me dizzy --
> > various structures with their own retire callbacks, and trying to keep
> > track of what's being retired.
> > 
> > The i915_active_request struct has a retire callback that you are no
> > longer using. It seems that would almost always be set to node_retire(),
> > in which case having the callback in the first place seems pretty
> > pointless. But it's the "almost" which gives me pause -- in a couple of
> > cases I see INIT_ACTIVE_REQUEST() being used, which would set the
> > callback to i915_active_retire_noop(). I _think_ that the requests will
> > never be retired in that state, but I'm not 100% certain.
> > 
> > Are you 100% certain that the requests are never retired with the
> > callback set to i915_active_retire_noop()? If so, then I think the patch
> > accomplished its purpose, even if it is making some already confusing
> > code even more confusing.
> 
> The trick is that only `struct active_node` objects are placed onto the
> `&ref->tree` rbtree, which is what is being retired:
> > 	rbtree_postorder_for_each_entry_safe(it, n, &ref->tree, node) {
> 
> `struct active_node` objects are only ever coupled with `node_retire()` when
> they are allocated in `i915_active_acquire_preallocate_barrier()` and
> `active_instance()`, so its retire callback always points to `node_retire()`.
> 
> The callback isn't exactly pointless because `struct active_node` objects can be
> retired through other generic code paths. It's just that in this instance,
> `i915_active_wait()` specifically only retires `struct active_node` objects.

Ok, thanks for the explanation. There's a lot of indirection and
subtlety in this code, which makes these patches pretty difficult to
review if one hasn't already spent time there (and also makes the code
prone to bugs, as I guess you've discovered).

I'm still working on patch 2 ...



More information about the kernel-team mailing list