[3.11.y.z extended stable] Patch "xfs: block allocation work needs to be kswapd aware" has been added to staging queue
Dave Chinner
david at fromorbit.com
Fri Jun 27 02:17:01 UTC 2014
On Thu, Jun 26, 2014 at 11:36:31AM +0100, Luis Henriques wrote:
> This is a note to let you know that I have just added a patch titled
>
> xfs: block allocation work needs to be kswapd aware
>
> to the linux-3.11.y-queue branch of the 3.11.y.z extended stable tree
> which can be found at:
>
> http://kernel.ubuntu.com/git?p=ubuntu/linux.git;a=shortlog;h=refs/heads/linux-3.11.y-queue
>
> If you, or anyone else, feels it should not be added to this tree, please
> reply to this email.
>
> For more information about the 3.11.y.z tree, see
> https://wiki.ubuntu.com/Kernel/Dev/ExtendedStable
No, do not apply this commit to stable kernels - it is due to be
reverted as is causes major memory reclaim behaviour regressions.
http://oss.sgi.com/pipermail/xfs/2014-June/036851.html
Cheers,
Dave.
>
> Thanks.
> -Luis
>
> ------
>
> From 979242514e5dbb38d8c68e91bb7adcf9cffce846 Mon Sep 17 00:00:00 2001
> From: Dave Chinner <dchinner at redhat.com>
> Date: Fri, 6 Jun 2014 15:59:59 +1000
> Subject: xfs: block allocation work needs to be kswapd aware
>
> commit 1f6d64829db78a7e1d63e15c9f48f0a5d2b5a679 upstream.
>
> Upon memory pressure, kswapd calls xfs_vm_writepage() from
> shrink_page_list(). This can result in delayed allocation occurring
> and that gets deferred to the the allocation workqueue.
>
> The allocation then runs outside kswapd context, which means if it
> needs memory (and it does to demand page metadata from disk) it can
> block in shrink_inactive_list() waiting for IO congestion. These
> blocking waits are normally avoiding in kswapd context, so under
> memory pressure writeback from kswapd can be arbitrarily delayed by
> memory reclaim.
>
> To avoid this, pass the kswapd context to the allocation being done
> by the workqueue, so that memory reclaim understands correctly that
> the work is being done for kswapd and therefore it is not blocked
> and does not delay memory reclaim.
>
> To avoid issues with int->char conversion of flag fields (as noticed
> in v1 of this patch) convert the flag fields in the struct
> xfs_bmalloca to bool types. pahole indicates these variables are
> still single byte variables, so no extra space is consumed by this
> change.
>
> Reported-by: Tetsuo Handa <penguin-kernel at I-love.SAKURA.ne.jp>
> Signed-off-by: Dave Chinner <dchinner at redhat.com>
> Reviewed-by: Christoph Hellwig <hch at lst.de>
> Signed-off-by: Dave Chinner <david at fromorbit.com>
> [ luis: backported to 3.11: files rename:
> - fs/xfs/xfs_bmap_util.c -> fs/xfs/xfs_bmap.c
> - fs/xfs/xfs_bmap_util.h -> fs/xfs/xfs_bmap.h ]
> Signed-off-by: Luis Henriques <luis.henriques at canonical.com>
> ---
> fs/xfs/xfs_bmap.c | 16 +++++++++++++---
> fs/xfs/xfs_bmap.h | 13 +++++++------
> 2 files changed, 20 insertions(+), 9 deletions(-)
>
> diff --git a/fs/xfs/xfs_bmap.c b/fs/xfs/xfs_bmap.c
> index 05c698ccb238..ed6dc2d02573 100644
> --- a/fs/xfs/xfs_bmap.c
> +++ b/fs/xfs/xfs_bmap.c
> @@ -4763,14 +4763,23 @@ xfs_bmapi_allocate_worker(
> struct xfs_bmalloca *args = container_of(work,
> struct xfs_bmalloca, work);
> unsigned long pflags;
> + unsigned long new_pflags = PF_FSTRANS;
>
> - /* we are in a transaction context here */
> - current_set_flags_nested(&pflags, PF_FSTRANS);
> + /*
> + * we are in a transaction context here, but may also be doing work
> + * in kswapd context, and hence we may need to inherit that state
> + * temporarily to ensure that we don't block waiting for memory reclaim
> + * in any way.
> + */
> + if (args->kswapd)
> + new_pflags |= PF_MEMALLOC | PF_SWAPWRITE | PF_KSWAPD;
> +
> + current_set_flags_nested(&pflags, new_pflags);
>
> args->result = __xfs_bmapi_allocate(args);
> complete(args->done);
>
> - current_restore_flags_nested(&pflags, PF_FSTRANS);
> + current_restore_flags_nested(&pflags, new_pflags);
> }
>
> /*
> @@ -4789,6 +4798,7 @@ xfs_bmapi_allocate(
>
>
> args->done = &done;
> + args->kswapd = current_is_kswapd();
> INIT_WORK_ONSTACK(&args->work, xfs_bmapi_allocate_worker);
> queue_work(xfs_alloc_wq, &args->work);
> wait_for_completion(&done);
> diff --git a/fs/xfs/xfs_bmap.h b/fs/xfs/xfs_bmap.h
> index 1cf1292d29b7..035afc71a9ca 100644
> --- a/fs/xfs/xfs_bmap.h
> +++ b/fs/xfs/xfs_bmap.h
> @@ -130,12 +130,13 @@ typedef struct xfs_bmalloca {
> xfs_extlen_t total; /* total blocks needed for xaction */
> xfs_extlen_t minlen; /* minimum allocation size (blocks) */
> xfs_extlen_t minleft; /* amount must be left after alloc */
> - char eof; /* set if allocating past last extent */
> - char wasdel; /* replacing a delayed allocation */
> - char userdata;/* set if is user data */
> - char aeof; /* allocated space at eof */
> - char conv; /* overwriting unwritten extents */
> - char stack_switch;
> + bool eof; /* set if allocating past last extent */
> + bool wasdel; /* replacing a delayed allocation */
> + bool userdata;/* set if is user data */
> + bool aeof; /* allocated space at eof */
> + bool conv; /* overwriting unwritten extents */
> + bool stack_switch;
> + bool kswapd; /* allocation in kswapd context */
> int flags;
> struct completion *done;
> struct work_struct work;
> --
> 1.9.1
>
>
--
Dave Chinner
david at fromorbit.com
More information about the kernel-team
mailing list