[Lucid] SRU: Some testing on the larger writeback set
Stefan Bader
stefan.bader at canonical.com
Fri Aug 20 09:59:27 UTC 2010
The upstream discussion about which solution to take for the writeback umount
regression does not seem anywhere near a final conclusion. So I think we should
make a decision and move forward with the bigger set (which also seems to have
good effects on normal performance and responsiveness).
I ran two tests of my own, plus the xfstests suite on ext4, and saw no
regression compared to before. Run-times were usually shorter with the patchset
applied:
mount-umount of tmpfs with other IO: 0.33s     -> 0.02s
mount-cp-umount of ext4:             9.00s     -> 8.00s
xfstests on ext4:                    24m30.00s -> 19m40.00s
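For reference, the tmpfs number comes from timing a bare mount/umount cycle.
The following is only a minimal sketch of such a micro-benchmark, not the exact
harness I used; the mount point /mnt/tmp is a made-up example, and the
competing background IO is left out for brevity:

#include <stdio.h>
#include <time.h>
#include <sys/mount.h>

/* Monotonic timestamp in seconds. */
static double now(void)
{
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec + ts.tv_nsec / 1e9;
}

int main(void)
{
        double t0 = now();

        /* /mnt/tmp is a hypothetical, pre-created mount point; run as
         * root, with the background IO load active in parallel. */
        if (mount("none", "/mnt/tmp", "tmpfs", 0, NULL)) {
                perror("mount");
                return 1;
        }
        if (umount("/mnt/tmp")) {
                perror("umount");
                return 1;
        }

        printf("mount-umount took %.2fs\n", now() - t0);
        return 0;
}

(Build with "gcc -o mu mu.c -lrt"; on kernels of this vintage clock_gettime
still lives in librt.)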
xfstests failed two aio test cases (239 and 240) on both kernels, with very
similar-looking errors. My kernels are based on the 2.6.32-24.41 release, so
there may be ext4 fixes in upcoming stable updates.
Then I tried xfstests on xfs and got scared by this on the new kernel:
INFO: task xfs_io:5764 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
xfs_io D ffff880206439650 0 5764 3651 0x00000000
ffff8801f29dfd38 0000000000000082 0000000000015bc0 0000000000015bc0
ffff8801de759ab0 ffff8801f29dffd8 0000000000015bc0 ffff8801de7596f0
0000000000015bc0 ffff8801f29dffd8 0000000000015bc0 ffff8801de759ab0
Call Trace:
[<ffffffff815595f7>] __mutex_lock_slowpath+0xe7/0x170
[<ffffffff8114f1e1>] ? path_put+0x31/0x40
[<ffffffff81559033>] mutex_lock+0x23/0x50
[<ffffffff81152b59>] do_filp_open+0x3d9/0xba0
[<ffffffff810f4487>] ? unlock_page+0x27/0x30
[<ffffffff81112a19>] ? __do_fault+0x439/0x500
[<ffffffff81115b78>] ? handle_mm_fault+0x1a8/0x3c0
[<ffffffff8115e4ca>] ? alloc_fd+0x10a/0x150
[<ffffffff81142219>] do_sys_open+0x69/0x170
[<ffffffff81142360>] sys_open+0x20/0x30
[<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
However, the test run completes after 118m30s, failing 10 out of 146 tests
(017 109 194 198 225 229 232 238 239 240). I did not see the dump on the old
kernel, but that might just be because writeback there is too slow to expose
the race. I will re-run the test on the old kernel to get its run-time and the
list of failing tests, though the run-time is tremendous (which is why I forgot
to note things down yesterday).
All in all, though, I think the set should be safe enough for wider regression
testing in proposed. If there is no veto (and enough ACKs), I would add the set
to our master branch.
Stefan