SRU request for LP#208551

Thu Sep 11 02:57:25 UTC 2008

Colin Ian King wrote:
> https://bugs.launchpad.net/ubuntu/hardy/+source/linux/+bug/208551
> 
> SRU justification:
> 
> Impact: Processes using an md (mdadm) RAID 5 array get stuck in
> uninterruptible sleep under heavy I/O load. Copying data to a RAID 5 XFS
> partition permanently blocks several related processes in the D(+)
> state. This occurs when large quantities of data (10-40 GB) are copied.
> The affected processes cannot be killed, and the system cannot reboot,
> so the server has to be power cycled.
> 
> Fix: The patch from commit 6ed3003c19a96fe18edf8179c4be6fe14abbebbc. The
> fix is to not make any generic_make_request() calls in raid5
> make_request until all waiting has been done.  We do this by simply
> setting STRIPE_HANDLE instead of calling handle_stripe(). This causes a
> performance hit, so this patch also only calls raid5_activate_delayed()
> at unplug time, never in raid5.  This seems to bring back the
> performance numbers. [quoting the commit message]
> 
> Testing: Without the patch, RAID 5 using md with an XFS filesystem locks
> up under heavy data copying, and this is repeatable. With the patch, the
> lockup does not occur.
> 
> Patch tested in my PPA by Andrew Cholakian on two 64-bit servers:
> https://bugs.launchpad.net/ubuntu/hardy/+source/linux/+bug/208551/comments/16
> 
> Patch attached.
> 

ACK. How far back does this bug go? Might this patch be appropriate for
older releases?

-- 
Tim Gardner tim.gardner at canonical.com