[Pull Request][Lucid] Fix Xorg hangs with some i915 chipsets
Seth Forshee
seth.forshee at canonical.com
Wed Jun 22 15:44:10 UTC 2011
On Wed, Jun 22, 2011 at 04:50:34PM +0200, Stefan Bader wrote:
> On 17.06.2011 18:38, Seth Forshee wrote:
> > This is a rather large series of non-trivial backports to fix an Xorg
> > hang caused by the i915 driver. Patches 5 and 6 are the key patches that
> > implement an LRU-based eviction algorithm; patches 1-4 are
> > prerequisites, patch 7 fixes some of the clean-up, and patch 8 adds a
> > periodic flush to retire active buffers.
> >
> > This is a pretty severe bug for those affected by it, but it's also a
> > huge amount of change for a stable kernel, so let me know if it's beyond
> > hope for an SRU.
> >
> > SRU Justification:
> >
> > Impact: When the GPU aperture becomes sufficiently fragmented, two or
> > more mmapped buffers in active use can repeatedly push each other out
> > of the aperture. This results in a livelock, with Xorg appearing to be
> > hung with high CPU utilization.
> >
> > Fix: Backport of a series of patches that convert the i915 eviction
> > algorithm from a best-fit approach to one based on evicting the least
> > recently used objects, plus a patch that adds periodic flush requests
> > to retire active buffers when no client is active.
> >
> > Testcase: Without these patches this scenario can be triggered
> > occasionally when visiting certain web pages that utilize Flash or
> > viewing certain images in Firefox, as described in bugs #599017 and
> > #330460. Testing the same pages/images with these patches does not
> > induce the hang.
> >
> >
> > The following changes since commit 9cc333d02cb4620561e91a63ed86a2159ec32638:
> >
> > UBUNTU: Ubuntu-2.6.32-33.67 (2011-06-16 11:26:04 -0500)
> >
> > are available in the git repository at:
> > git://kernel.ubuntu.com/sforshee/ubuntu-lucid.git lp599017
> >
> > Chris Wilson (5):
> > drm/i915: Move the eviction logic to its own file.
> > drm/i915: Implement fair lru eviction across both rings. (v2)
> > drm/i915: Maintain LRU order of inactive objects upon access by CPU (v2)
> > drm/i915/evict: Ensure we completely cleanup on failure
> > drm/i915: Periodically flush the active lists and requests
> >
> > Daniel Vetter (3):
> > drm_mm: extract check_free_mm_node
> > drm: implement helper functions for scanning lru list
> > drm/i915: prepare for fair lru eviction
> >
> > drivers/gpu/drm/drm_mm.c | 236 +++++++++++++++++++++++++-----
> > drivers/gpu/drm/i915/Makefile | 1 +
> > drivers/gpu/drm/i915/i915_drv.h | 13 ++
> > drivers/gpu/drm/i915/i915_gem.c | 228 ++++-------------------------
> > drivers/gpu/drm/i915/i915_gem_evict.c | 253 +++++++++++++++++++++++++++++++++
> > include/drm/drm_mm.h | 15 ++-
> > 6 files changed, 510 insertions(+), 236 deletions(-)
> > create mode 100644 drivers/gpu/drm/i915/i915_gem_evict.c
> >
>
> I also tend to say the bug sounds serious enough to make an exception to the
> rule of limiting the change.
>
> I would probably add those to the 2.6.32+drm33 tree. Though there are two
> details which I think should be resolved first:
> 1. In the bug report (#599017) the last request for testing points to a place
> which only shows 6 patches. Does the kernel provided actually contain all of them?
No, I hadn't posted a link to that build on the bug, but I have done so
now. I have an affected machine though and have tested the full
series myself for several days. I was able to trigger the bug very
occasionally without the last patch but have never triggered it with
that patch.
> 2. It would be good to get some feedback about the test kernel in the bug.
I have received some offline feedback, but I'll ask for the feedback to
be posted to the bug.
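
For anyone reading along who wants the difference between the two
eviction strategies from the SRU justification spelled out, here is a
minimal sketch in C. It is a toy model and not the i915 code: struct
toy_obj, pick_best_fit(), pick_lru() and evict_lru() are all
hypothetical names, and it ignores the contiguity requirement that real
aperture allocation has. It only illustrates why choosing a victim by
size alone can keep selecting a buffer that is in active use, while
choosing by recency selects the cold one.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy model of a buffer bound into the GPU aperture. */
struct toy_obj {
	size_t size;            /* bytes occupied in the aperture */
	unsigned long last_use; /* larger value = used more recently */
	int bound;              /* 1 while the object occupies aperture space */
};

/*
 * Best-fit style selection: smallest bound object that is at least
 * `needed` bytes. Recency is ignored, so a hot buffer can be chosen.
 * Returns the index of the victim, or -1 if none qualifies.
 */
static int pick_best_fit(const struct toy_obj *objs, size_t n, size_t needed)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (!objs[i].bound || objs[i].size < needed)
			continue;
		if (best < 0 || objs[i].size < objs[best].size)
			best = (int)i;
	}
	return best;
}

/*
 * LRU selection: the bound object with the oldest last_use stamp.
 * Returns the index of the victim, or -1 if nothing is bound.
 */
static int pick_lru(const struct toy_obj *objs, size_t n)
{
	int oldest = -1;

	for (size_t i = 0; i < n; i++) {
		if (!objs[i].bound)
			continue;
		if (oldest < 0 || objs[i].last_use < objs[oldest].last_use)
			oldest = (int)i;
	}
	return oldest;
}

/*
 * Unbind least recently used objects until at least `needed` bytes
 * have been freed or nothing is left to evict. Returns bytes freed.
 */
static size_t evict_lru(struct toy_obj *objs, size_t n, size_t needed)
{
	size_t freed = 0;

	assert(objs != NULL);
	while (freed < needed) {
		int victim = pick_lru(objs, n);

		if (victim < 0)
			break;
		objs[victim].bound = 0;
		freed += objs[victim].size;
	}
	return freed;
}
```

With two small hot buffers and one larger cold buffer bound, a 4096-byte
request makes pick_best_fit() choose one of the hot buffers, whereas
evict_lru() unbinds the cold buffer and leaves both hot ones in place.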