[PATCH 092/104] mm: fix aio performance regression for database caused by THP
Greg Kroah-Hartman
gregkh at linuxfoundation.org
Mon Sep 30 15:00:02 UTC 2013
On Mon, Sep 30, 2013 at 07:31:35AM -0600, Khalid Aziz wrote:
> On 09/30/2013 07:26 AM, Greg Kroah-Hartman wrote:
> > On Mon, Sep 30, 2013 at 03:14:52PM +0200, Jack Wang wrote:
> >> On 09/30/2013 12:11 PM, Luis Henriques wrote:
> >>> 3.5.7.22 -stable review patch. If anyone has any objections, please let me know.
> >>>
> >>> ------------------
> >>>
> >>> From: Khalid Aziz <khalid.aziz at oracle.com>
> >>>
> >>> commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
> >>>
> >>> I am working with a tool that simulates oracle database I/O workload.
> >>> This tool (orion to be specific -
> >>> <http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24>)
> >>> allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag. It then
> >>> does aio into these pages from flash disks using various common block
> >>> sizes used by database. I am looking at performance with two of the most
> >>> common block sizes - 1M and 64K. aio performance with these two block
> >>> sizes plunged after Transparent HugePages was introduced in the kernel.
> >>> Here are performance numbers:
> >>>
> >>>              pre-THP       2.6.39        3.11-rc5
> >>> 1M read      8384 MB/s     5629 MB/s     6501 MB/s
> >>> 64K read     7867 MB/s     4576 MB/s     4251 MB/s
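In concrete terms, the workload described above boils down to something like the
sketch below. This is hypothetical code, not the actual orion source; the device
path /dev/sdX, the 2MB huge page size, and the single 1M read are assumptions for
illustration only (build with -laio).

#define _GNU_SOURCE                   /* for O_DIRECT */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <libaio.h>

#define SEG_SZ  (2UL * 1024 * 1024)   /* assumed 2MB huge page size */
#define IO_SZ   (1UL * 1024 * 1024)   /* the 1M block size from the test */

int main(void)
{
	/* Huge-page-backed SysV segment, allocated via SHM_HUGETLB. */
	int shmid = shmget(IPC_PRIVATE, SEG_SZ,
			   SHM_HUGETLB | IPC_CREAT | 0600);
	if (shmid < 0) { perror("shmget"); return 1; }

	void *buf = shmat(shmid, NULL, 0);
	if (buf == (void *)-1) { perror("shmat"); return 1; }

	/* O_DIRECT read from a disk; /dev/sdX is a placeholder path. */
	int fd = open("/dev/sdX", O_RDONLY | O_DIRECT);
	if (fd < 0) { perror("open"); return 1; }

	io_context_t ctx = 0;
	if (io_setup(1, &ctx) < 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

	/*
	 * One async 1M read straight into the hugetlbfs-backed buffer.
	 * The kernel takes a get_page() reference on each page at submit
	 * time and drops it with put_page() on completion; that is the
	 * hot path the commit message is talking about.
	 */
	struct iocb cb;
	struct iocb *cbs[1] = { &cb };
	struct io_event ev;

	io_prep_pread(&cb, fd, buf, IO_SZ, 0);
	if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }
	io_getevents(ctx, 1, 1, &ev, NULL);

	io_destroy(ctx);
	close(fd);
	shmdt(buf);
	shmctl(shmid, IPC_RMID, NULL);
	return 0;
}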
> >>>
> >>> I have narrowed the performance impact down to the overheads introduced by
> >>> THP in __get_page_tail() and put_compound_page() routines. perf top shows
> >>> >40% of cycles being spent in these two routines. Every time direct I/O
> >>> to hugetlbfs pages starts, kernel calls get_page() to grab a reference to
> >>> the pages and calls put_page() when I/O completes to put the reference
> >>> away. THP introduced a significant amount of locking overhead to get_page()
> >>> and put_page() when dealing with compound pages because hugepages can be
> >>> split underneath get_page() and put_page(). It added this overhead
> >>> irrespective of whether it is dealing with hugetlbfs pages or transparent
> >>> hugepages. This resulted in 20%-45% drop in aio performance when using
> >>> hugetlbfs pages.
> >>>
> >>> Since hugetlbfs pages cannot be split, there is no reason to go through
> >>> all the locking overhead for these pages from what I can see. I added
> >>> code to __get_page_tail() and put_compound_page() to bypass all the
> >>> locking code when working with hugetlbfs pages. This improved performance
> >>> significantly. Performance numbers with this patch:
> >>>
> >>>              pre-THP       3.11-rc5      3.11-rc5 + Patch
> >>> 1M read      8384 MB/s     6501 MB/s     8371 MB/s
> >>> 64K read     7867 MB/s     4251 MB/s     6510 MB/s
> >>>
> >>> Performance with 64K reads is still lower than it was before THP, but this
> >>> is a 53% improvement over unpatched 3.11-rc5. It does mean there is more
> >>> work to be done, but I will take a 53% improvement for now.
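The approach, in outline, is to short-circuit the compound-page refcounting as soon
as the page is known to be a hugetlbfs page. The following is a simplified sketch of
the put_page() side only; the actual patch to mm/swap.c also covers __get_page_tail()
and differs in detail.

static void put_compound_page(struct page *page)
{
	/*
	 * hugetlbfs pages cannot be split out from under us, so there is
	 * no need for the compound lock or the speculative tail-page
	 * refcounting: just drop the reference on the head page.
	 */
	if (PageHuge(page)) {
		page = compound_head(page);
		if (put_page_testzero(page))
			__put_compound_page(page);
		return;
	}

	/* ... unchanged THP-aware slow path for transparent hugepages ... */
}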
> >>>
> >>> Please take a look at the following patch and let me know if it looks
> >>> reasonable.
> >>>
> >>> [akpm at linux-foundation.org: tweak comments]
> >>> Signed-off-by: Khalid Aziz <khalid.aziz at oracle.com>
> >>> Cc: Pravin B Shelar <pshelar at nicira.com>
> >>> Cc: Christoph Lameter <cl at linux.com>
> >>> Cc: Andrea Arcangeli <aarcange at redhat.com>
> >>> Cc: Johannes Weiner <hannes at cmpxchg.org>
> >>> Cc: Mel Gorman <mel at csn.ul.ie>
> >>> Cc: Rik van Riel <riel at redhat.com>
> >>> Cc: Minchan Kim <minchan at kernel.org>
> >>> Cc: Andi Kleen <andi at firstfloor.org>
> >>> Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
> >>> Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
> >>> [ luis: backported to 3.5: adjusted context ]
> >>> Signed-off-by: Luis Henriques <luis.henriques at canonical.com>
> >> Hi Greg,
> >>
> >> I suppose this patch is also needed for 3.4, right?
> >
> > As it didn't originally apply there, I didn't apply it.
> >
> > If people think it should be applicable for 3.4, I'll take it.
> >
> > thanks,
> >
> > greg k-h
> >
>
> Hi Greg,
>
> I did send you a backported version of this patch to apply to 3.0, 3.2
> and 3.4 last Monday and cc'd stable at vger.kernel.org. That patch should
> apply cleanly to those three kernels.
Ah, you didn't specifically say that in the patch, so I just thought you
were reminding me to apply it to the 3.10 and 3.11 trees. Please be
more explicit in the future.
I'll queue it up for the next round of stable kernels after this one.
thanks,
greg k-h