Jaunty SRU #348836 - ext4: fix bb_prealloc_list corruption due to wrong group locking
Stefan Bader
stefan.bader at canonical.com
Tue Apr 14 16:53:24 UTC 2009
Does this indicate that Steve did not think this is worth another pre-release
upload? You wrote that you got positive feedback, so ACK from me. But as you
put up this and the other mail, I take it, this fall onto my head. Then we need
one more ACK.
Tim Gardner wrote:
> The following changes since commit 1901c1b176e0d8c62aeb1fd9512c45bc6010b180:
> Eric Sandeen (1):
> ext4: fix bb_prealloc_list corruption due to wrong group locking
>
> are available in the git repository at:
>
> git://kernel.ubuntu.com/rtg/ubuntu-jaunty lp348836
>
> From 1901c1b176e0d8c62aeb1fd9512c45bc6010b180 Mon Sep 17 00:00:00 2001
> From: Eric Sandeen <sandeen at redhat.com>
> Date: Mon, 16 Mar 2009 23:25:40 -0400
> Subject: [PATCH] ext4: fix bb_prealloc_list corruption due to wrong group locking
>
> Bug: #348836
>
> This is for Red Hat bug 490026: EXT4 panic, list corruption in
> ext4_mb_new_inode_pa
>
> ext4_lock_group(sb, group) is supposed to protect this list for
> each group, and a common code flow to remove an album is like
> this:
>
> ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
> ext4_lock_group(sb, grp);
> list_del(&pa->pa_group_list);
> ext4_unlock_group(sb, grp);
>
> so it's critical that we get the right group number back for
> this prealloc context, to lock the right group (the one
> associated with this pa) and prevent concurrent list manipulation.
>
> however, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a
> comment, "-1 is to protect from crossing allocation group".
>
> This makes sense for the group_pa, where pa_pstart is advanced
> by the length which has been used (in ext4_mb_release_context()),
> and when the entire length has been used, pa_pstart has been
> advanced to the first block of the next group.
>
> However, for inode_pa, pa_pstart is never advanced; it's just
> set once to the first block in the group and not moved after
> that. So in this case, if we subtract one in ext4_mb_put_pa(),
> we are actually locking the *previous* group, and opening the
> race with the other threads which do not subtract off the extra
> block.
>
> Signed-off-by: Eric Sandeen <sandeen at redhat.com>
> Signed-off-by: "Theodore Ts'o" <tytso at mit.edu>
> Signed-off-by: Tim Gardner <tim.gardner at canonical.com>
> ---
> fs/ext4/mballoc.c | 9 +++++++--
> 1 files changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index f8e923f..add854a 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3586,6 +3586,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
> struct super_block *sb, struct ext4_prealloc_space *pa)
> {
> unsigned long grp;
> + ext4_fsblk_t grp_blk;
>
> if (!atomic_dec_and_test(&pa->pa_count) || pa->pa_free != 0)
> return;
> @@ -3600,8 +3601,12 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
> pa->pa_deleted = 1;
> spin_unlock(&pa->pa_lock);
>
> - /* -1 is to protect from crossing allocation group */
> - ext4_get_group_no_and_offset(sb, pa->pa_pstart - 1, &grp, NULL);
> + grp_blk = pa->pa_pstart;
> + /* If linear, pa_pstart may be in the next group when pa is used up */
> + if (pa->pa_linear)
> + grp_blk--;
> +
> + ext4_get_group_no_and_offset(sb, grp_blk, &grp, NULL);
>
> /*
> * possible race:
--
When all other means of communication fail, try words!
More information about the kernel-team
mailing list