Jaunty SRU #348836 - ext4: fix bb_prealloc_list corruption due to wrong group locking

Stefan Bader stefan.bader at canonical.com
Tue Apr 14 16:53:24 UTC 2009


Does this indicate that Steve did not think this is worth another pre-release 
upload? You wrote that you got positive feedback, so ACK from me. But as you 
put up this and the other mail, I take it, this fall onto my head. Then we need 
one more ACK.

Tim Gardner wrote:
> The following changes since commit 1901c1b176e0d8c62aeb1fd9512c45bc6010b180:
>   Eric Sandeen (1):
>         ext4: fix bb_prealloc_list corruption due to wrong group locking
> 
> are available in the git repository at:
> 
>   git://kernel.ubuntu.com/rtg/ubuntu-jaunty lp348836
> 
> From 1901c1b176e0d8c62aeb1fd9512c45bc6010b180 Mon Sep 17 00:00:00 2001
> From: Eric Sandeen <sandeen at redhat.com>
> Date: Mon, 16 Mar 2009 23:25:40 -0400
> Subject: [PATCH] ext4: fix bb_prealloc_list corruption due to wrong group locking
> 
> Bug: #348836
> 
> This is for Red Hat bug 490026: EXT4 panic, list corruption in
> ext4_mb_new_inode_pa
> 
> ext4_lock_group(sb, group) is supposed to protect this list for
> each group, and a common code flow to remove an album is like
> this:
> 
>     ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
>     ext4_lock_group(sb, grp);
>     list_del(&pa->pa_group_list);
>     ext4_unlock_group(sb, grp);
> 
> so it's critical that we get the right group number back for
> this prealloc context, to lock the right group (the one
> associated with this pa) and prevent concurrent list manipulation.
> 
> however, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a
> comment, "-1 is to protect from crossing allocation group".
> 
> This makes sense for the group_pa, where pa_pstart is advanced
> by the length which has been used (in ext4_mb_release_context()),
> and when the entire length has been used, pa_pstart has been
> advanced to the first block of the next group.
> 
> However, for inode_pa, pa_pstart is never advanced; it's just
> set once to the first block in the group and not moved after
> that.  So in this case, if we subtract one in ext4_mb_put_pa(),
> we are actually locking the *previous* group, and opening the
> race with the other threads which do not subtract off the extra
> block.
> 
> Signed-off-by: Eric Sandeen <sandeen at redhat.com>
> Signed-off-by: "Theodore Ts'o" <tytso at mit.edu>
> Signed-off-by: Tim Gardner <tim.gardner at canonical.com>
> ---
>  fs/ext4/mballoc.c |    9 +++++++--
>  1 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
> index f8e923f..add854a 100644
> --- a/fs/ext4/mballoc.c
> +++ b/fs/ext4/mballoc.c
> @@ -3586,6 +3586,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
>  			struct super_block *sb, struct ext4_prealloc_space *pa)
>  {
>  	unsigned long grp;
> +	ext4_fsblk_t grp_blk;
>  
>  	if (!atomic_dec_and_test(&pa->pa_count) || pa->pa_free != 0)
>  		return;
> @@ -3600,8 +3601,12 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
>  	pa->pa_deleted = 1;
>  	spin_unlock(&pa->pa_lock);
>  
> -	/* -1 is to protect from crossing allocation group */
> -	ext4_get_group_no_and_offset(sb, pa->pa_pstart - 1, &grp, NULL);
> +	grp_blk = pa->pa_pstart;
> +	/* If linear, pa_pstart may be in the next group when pa is used up */
> +	if (pa->pa_linear)
> +		grp_blk--;
> +
> +	ext4_get_group_no_and_offset(sb, grp_blk, &grp, NULL);
>  
>  	/*
>  	 * possible race:


-- 

When all other means of communication fail, try words!






More information about the kernel-team mailing list