[RFC] X Display Failures in Suspend and Resume
Jim Lieb
jim.lieb at canonical.com
Tue Nov 11 18:06:03 UTC 2008
On Tuesday 11 November 2008 09:18:34 Matthew Garrett wrote:
> On Mon, Nov 10, 2008 at 07:36:31PM -0800, Jim Lieb wrote:
> > On Monday 10 November 2008 17:25:24 Matthew Garrett wrote:
> > > Are you sure this diagnosis is correct? The resume from hibernation
> > > case is made rather more "interesting" due to usplash also being
> > > involved, but the suspend to RAM case is no more inherently racy than
> > > the normal VT switching case. Not having video on resume from RAM will
> > > generally be down to the kernel failing to restore graphical state,
> > > something it can currently only do for Intel hardware[1]. The userspace
> > > workarounds for reinitialising the graphics are certainly not
> > > guaranteed to work reliably.
> >
> > I've not gone into the details of the hibernation case. The issues that
> > I was originally looking into were a combination of the server not
> > resuming properly and some other desktop apps not resuming either, because
> > they got lost in a VT_WAITACTIVE. The end result is the same: races.
> > Mixing suspend/resume (or hibernate/resume) into this just makes
> > it worse.
>
> If you resume and have uninitialised graphics hardware, then there's a
> reasonable chance that the X server will attempt to switch back to
> graphics mode, hang and leave the VT system in VT_WAITACTIVE. The hang
> there is a symptom of the problem rather than the genuine cause. If it
> were possible to trigger in the general case then you'd also frequently
> see it when switching the console under normal use.
The normal ALT+Fn case works, as I mention in the note, because
I'm at the keyboard pounding the keys and the wait eventually returns;
besides, my fingers are considerably slower than the HT-enabled processor
that brought this to the fore. Potential hangs when switching into graphics
mode are also an issue we have seen. I haven't gotten down to that level of
detail in the video drivers, but there should be a prepare method there
as well. Part of what I see in the details is a way to detect errors at this
level and return an error from the handoff, in essence breaking the wait. The
handoff can then be retried after hammering the hardware again. BTW, my
new HP laptop would hang in Vista too, as did my old company Toshiba
under WinXP, so not all is despair ;)
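To make the "error return the handoff" idea concrete, here is a minimal sketch in C. Everything in it is hypothetical: struct display_driver, its prepare/reset_hw hooks and vt_handoff() are illustrative names, not an existing kernel or X interface. It only shows the shape of the proposal: a failed prepare is reported to the caller instead of blocking (as VT_WAITACTIVE can), the hardware is reset, and the handoff is retried a bounded number of times.

```c
#include <errno.h>

/* Hypothetical driver hooks; nothing here is an existing kernel or X API. */
struct display_driver {
    /* Returns 0 if the hardware is ready to change owner, else a negative errno. */
    int (*prepare)(struct display_driver *drv);
    /* Attempts to reinitialise ("hammer") the hardware; returns 0 on success. */
    int (*reset_hw)(struct display_driver *drv);
    int owner;     /* id of the display server currently holding the device */
    int failures;  /* simulation only: how many prepare() calls will still fail */
};

/* Hand the device to new_owner. Rather than waiting indefinitely, a failed
 * prepare() breaks the wait, the hardware is reset, and the handoff is
 * retried up to max_tries times. On failure the previous owner keeps the
 * device and the caller gets the error so it can start cleanup/reset. */
int vt_handoff(struct display_driver *drv, int new_owner, int max_tries)
{
    int err = -EIO;

    for (int attempt = 0; attempt < max_tries; attempt++) {
        err = drv->prepare(drv);
        if (err == 0) {
            drv->owner = new_owner;
            return 0;
        }
        /* Break the wait: try to recover the hardware, then retry. */
        if (drv->reset_hw(drv) != 0)
            break;
    }
    return err;
}

/* Simulated hooks: prepare() fails `failures` times, then succeeds. */
static int sim_prepare(struct display_driver *drv)
{
    if (drv->failures > 0) {
        drv->failures--;
        return -EBUSY;
    }
    return 0;
}

static int sim_reset(struct display_driver *drv)
{
    (void)drv;  /* a real driver would reprogram the video hardware here */
    return 0;
}
```

With two simulated failures and three tries, vt_handoff() succeeds on the third attempt; with more failures than tries it returns -EBUSY and the device stays with its previous owner, so the "innocent party" sees an error rather than a hang.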
>
> > There are some scenarios that have nothing to do with suspend/resume
> > that also break. F9 had similar problems switching from the boot X
> > server from what I've read.
>
> I also had some issues with usplash. It's very easy to get this wrong,
> but it's possible to write apps that avoid tickling the issues.
But even though it does not hang the kernel, we shouldn't depend on the apps
either. There are more of them, and at the least the interface should return
an error to the innocent party, which could then presumably start the
cleanup/reset.
>
> > Thanks for your comments. Do you think the design is a) workable and
> > b) worth the work to shift to it? Our goal here is to make this problem
> > "go away" by cleaning up the API races.
>
> I think the direction the kernel's going in means that VT switching will
> be a pretty uncommon case in the not too distant future. I'm also pretty
> sure that the problem you're seeing has VT switching issues as a
> symptom, not a cause.
Mostly agreed. That is one of the reasons I like the console daemon idea.
At the end point, all we have is a mechanism to hand a driver back and
forth among the display daemons. The whole of the VT switching logic
disappears. For those environments that still need console switching,
the console daemon covers that case at no code expense to the
kernel. If we dropped VTs altogether, the suspend could be a real
suspend as in, "X server, we just closed the lid. Clean up and prepare
for suspend." That is a slight but important difference.
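A rough sketch of the console daemon idea, under loudly stated assumptions: the message names, struct console_daemon, console_switch(), console_suspend() and the deliver() stand-in for IPC are all hypothetical, invented here for illustration. The daemon tracks which display server holds the device, moves it between servers with release/acquire messages that fail with an error instead of blocking, and on lid close sends a direct "prepare for suspend" message to the current holder with no VT switch involved.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical messages from the console daemon to display servers. */
enum console_msg { MSG_ACQUIRE, MSG_RELEASE, MSG_PREPARE_SUSPEND };

struct display_daemon {
    const char *name;
    bool responsive;     /* simulation only: does this server answer? */
    bool has_device;
    bool suspend_ready;
};

struct console_daemon {
    struct display_daemon *active;  /* current holder of the display device */
};

/* Stand-in for a request/reply over some IPC channel. Returns false if the
 * display server does not acknowledge, rather than blocking forever. */
static bool deliver(struct display_daemon *d, enum console_msg msg)
{
    if (!d->responsive)
        return false;
    switch (msg) {
    case MSG_ACQUIRE:         d->has_device = true;    break;
    case MSG_RELEASE:         d->has_device = false;   break;
    case MSG_PREPARE_SUSPEND: d->suspend_ready = true; break;
    }
    return true;
}

/* Hand the display device from the active server to next. */
bool console_switch(struct console_daemon *cd, struct display_daemon *next)
{
    struct display_daemon *prev = cd->active;

    if (prev == next)
        return true;
    if (prev && !deliver(prev, MSG_RELEASE))
        return false;                    /* holder did not let go: report it */
    if (!deliver(next, MSG_ACQUIRE)) {
        if (prev)
            deliver(prev, MSG_ACQUIRE);  /* give the device back */
        return false;
    }
    cd->active = next;
    return true;
}

/* Lid closed: ask the current holder to clean up for suspend directly. */
bool console_suspend(struct console_daemon *cd)
{
    return cd->active == NULL || deliver(cd->active, MSG_PREPARE_SUSPEND);
}
```

The design choice being illustrated is that every handoff step is a request that can fail, so an unresponsive server leaves the device where it was and the daemon gets an error to act on, instead of some party sitting in a wait.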
Thanks for the comments. Gives me a little more food for thought.
--
Jim Lieb
Ubuntu Kernel Team
Canonical Ltd.