[Bug 13144] New: Hoary 2.6.10 oom-killer

bugzilla-daemon at bugzilla.ubuntu.com bugzilla-daemon at bugzilla.ubuntu.com
Tue Aug 2 07:50:16 UTC 2005


Please do not reply to this email.  You can add comments at
http://bugzilla.ubuntu.com/show_bug.cgi?id=13144
Ubuntu | kernel-package

           Summary: Hoary 2.6.10 oom-killer
           Product: Ubuntu
           Version: unspecified
          Platform: i386
        OS/Version: Linux
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: kernel-package
        AssignedTo: fabbione at ubuntu.com
        ReportedBy: herbert at linuxhacker.at
         QAContact: kernel-bugs at lists.ubuntu.com


Description

    The oom-killer terminates processes even though a lot of free swap space is
    available. I see this error in two situations: a) on a machine with 4 CPUs,
    4 GB RAM, 8 GB swap and 1 TB of NFS-mounted space used for iozone and bonnie++
    tests, and b) on a 64 MB RAM system after one week of uptime (not much, but a
    similar case is referenced in the Red Hat Bugzilla, see below). The error in
    situation a) can be reproduced every time: it occurs whenever I start an
    I/O-intensive application (like iozone) on the 1 TB NFS-mounted filesystem (the
    messages log is below). The machine in situation b) is used for Internet
    access, and over time important processes go missing. I found the oom-killer
    messages in the system log files.

Details for situation a::

    $ free
                 total       used       free     shared    buffers     cached
    Mem:       3637100     109448    3527652          0      47588      13952
    -/+ buffers/cache:      47908    3589192
    Swap:      7815612       2744    7812868

    $ df
    ... 
    server:/volume
                          1.1T  6.2G  1.1T   1% /mnt

    Software: plain Hoary with kernel 2.6.10-5-686-smp

Testcase

    When I start iozone on the /mnt filesystem, I can observe with top that the
    free value on the Mem: line gets lower and lower and the cached value on the
    Swap: line gets higher and higher. After a few seconds the system "hangs". If
    the sshd process is not terminated, the system recovers after a few more
    seconds. In the messages log file I found the following::

        messages:

        Jul 27 14:46:54 localhost kernel: oom-killer: gfp_mask=0xd0
        Jul 27 14:46:54 localhost kernel: DMA per-cpu:
        Jul 27 14:46:54 localhost kernel: cpu 0 hot: low 2, high 6, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 0 cold: low 0, high 2, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 1 hot: low 2, high 6, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 1 cold: low 0, high 2, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 2 hot: low 2, high 6, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 2 cold: low 0, high 2, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 3 hot: low 2, high 6, batch 1
        Jul 27 14:46:54 localhost kernel: cpu 3 cold: low 0, high 2, batch 1
        Jul 27 14:46:54 localhost kernel: Normal per-cpu:
        Jul 27 14:46:54 localhost kernel: cpu 0 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 0 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 1 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 1 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 2 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 2 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 3 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 3 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: HighMem per-cpu:
        Jul 27 14:46:54 localhost kernel: cpu 0 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 0 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 1 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 1 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 2 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 2 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 3 hot: low 32, high 96, batch 16
        Jul 27 14:46:54 localhost kernel: cpu 3 cold: low 0, high 32, batch 16
        Jul 27 14:46:54 localhost kernel:
        Jul 27 14:46:54 localhost kernel: Free pages:        4668kB (896kB HighMem)
        Jul 27 14:46:54 localhost kernel: Active:2955 inactive:863904 dirty:0 writeback:367192 unstable:0 free:1167 slab:37359 mapped:2718 pagetables:116
        Jul 27 14:46:54 localhost kernel: DMA free:68kB min:68kB low:84kB high:100kB active:0kB inactive:12652kB present:16384kB pages_scanned:0 all_unreclaimable? no
        Jul 27 14:46:54 localhost kernel: protections[]: 0 0 0
        Jul 27 14:46:54 localhost kernel: Normal free:3704kB min:3756kB low:4692kB high:5632kB active:68kB inactive:710704kB present:901120kB pages_scanned:757 all_unreclaimable? no
        Jul 27 14:46:54 localhost kernel: protections[]: 0 0 0
        Jul 27 14:46:54 localhost kernel: HighMem free:896kB min:512kB low:640kB high:768kB active:11752kB inactive:2732316kB present:2752460kB pages_scanned:0 all_unreclaimable? no
        Jul 27 14:46:54 localhost kernel: protections[]: 0 0 0
        Jul 27 14:46:54 localhost kernel: DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 68kB
        Jul 27 14:46:54 localhost kernel: Normal: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3704kB
        Jul 27 14:46:54 localhost kernel: HighMem: 96*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 896kB
        Jul 27 14:46:54 localhost kernel: Swap cache: add 686, delete 1, find 0/0, race 0+0
        Jul 27 14:46:54 localhost kernel: Swap cache: add 686, delete 1, find 0/0, race 0+0
        Jul 27 14:46:55 localhost kernel: oom-killer: gfp_mask=0xd0
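
To cross-check the zone figures in the log above, here is a minimal Python sketch (a hypothetical helper, not a kernel or procps tool). It sums each buddy free list (count times block size per order) and compares each zone's free value with its min watermark. This is a simplification: the kernel's real allocation check also depends on the allocation order and the protections[] values, so the sketch only compares the numbers the log prints. With the situation a values, the free lists add up to the printed totals, and only the Normal zone (free:3704kB, min:3756kB) is below its min watermark.

```python
import re

# Zone summary and buddy free-list lines from the situation a log,
# with the mail-wrapped lines rejoined.
ZONE_LINES = [
    "DMA free:68kB min:68kB low:84kB high:100kB active:0kB inactive:12652kB present:16384kB",
    "Normal free:3704kB min:3756kB low:4692kB high:5632kB active:68kB inactive:710704kB present:901120kB",
    "HighMem free:896kB min:512kB low:640kB high:768kB active:11752kB inactive:2732316kB present:2752460kB",
]
BUDDY_LINES = [
    "DMA: 1*4kB 0*8kB 0*16kB 0*32kB 1*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 68kB",
    "Normal: 0*4kB 1*8kB 1*16kB 1*32kB 1*64kB 0*128kB 0*256kB 1*512kB 1*1024kB 1*2048kB 0*4096kB = 3704kB",
    "HighMem: 96*4kB 0*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 896kB",
]

FIELD_RE = re.compile(r"(\w+):(\d+)kB")   # e.g. "free:3704kB"
BLOCK_RE = re.compile(r"(\d+)\*(\d+)kB")  # e.g. "1*2048kB" = one free 2048kB block


def parse_zone(line):
    """Return (zone name, {field: value in kB}) for a zone summary line."""
    name = line.split()[0]
    return name, {key: int(val) for key, val in FIELD_RE.findall(line)}


def buddy_total(line):
    """Return (sum of count * block size, total printed after '=') in kB."""
    blocks, printed = line.rsplit("=", 1)
    computed = sum(int(n) * int(size) for n, size in BLOCK_RE.findall(blocks))
    return computed, int(re.search(r"(\d+)kB", printed).group(1))


def zones_below_min(lines):
    """Names of zones whose free value is below their min watermark."""
    below = []
    for line in lines:
        name, fields = parse_zone(line)
        if fields["free"] < fields["min"]:
            below.append(name)
    return below


if __name__ == "__main__":
    for line in BUDDY_LINES:
        computed, printed = buddy_total(line)
        print(line.split(":")[0], computed, printed)
    print("below min:", zones_below_min(ZONE_LINES))
```

The same functions can be fed the situation b lines, as long as the wrapped lines are rejoined first.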


References in the net

    I found the same problem described in the Red Hat Bugzilla:

    - "112891":https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=112891

    - "147832":https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=147832

    - "129156":https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=129156
    

Workaround

    With kernel 2.6.11-1-686-smp, I can't reproduce the error.
    

Details for situation b::

    $ free
                 total       used       free     shared    buffers     cached
    Mem:         61124      59504       1620          0       9252      25920
    -/+ buffers/cache:      24332      36792
    Swap:      1943736          0    1943736

    messages:

    Aug  1 22:42:05 localhost kernel: oom-killer: gfp_mask=0xd2
    Aug  1 22:42:05 localhost kernel: DMA per-cpu:
    Aug  1 22:42:05 localhost kernel: cpu 0 hot: low 2, high 6, batch 1
    Aug  1 22:42:05 localhost kernel: cpu 0 cold: low 0, high 2, batch 1
    Aug  1 22:42:05 localhost kernel: Normal per-cpu:
    Aug  1 22:42:05 localhost kernel: cpu 0 hot: low 4, high 12, batch 2
    Aug  1 22:42:05 localhost kernel: cpu 0 cold: low 0, high 4, batch 2
    Aug  1 22:42:05 localhost kernel: HighMem per-cpu: empty
    Aug  1 22:42:05 localhost kernel:
    Aug  1 22:42:05 localhost kernel: Free pages:        1604kB (0kB HighMem)
    Aug  1 22:42:05 localhost kernel: Active:749 inactive:81 dirty:0 writeback:3 unstable:0 free:401 slab:12926 mapped:807 pagetables:109
    Aug  1 22:42:05 localhost kernel: DMA free:316kB min:256kB low:320kB high:384kB active:0kB inactive:16kB present:16384kB pages_scanned:18 all_unreclaimable? yes
    Aug  1 22:42:05 localhost kernel: protections[]: 0 0 0
    Aug  1 22:42:05 localhost kernel: Normal free:1288kB min:752kB low:940kB high:1128kB active:2996kB inactive:308kB present:48128kB pages_scanned:899 all_unreclaimable? no
    Aug  1 22:42:05 localhost kernel: protections[]: 0 0 0
    Aug  1 22:42:05 localhost kernel: HighMem free:0kB min:128kB low:160kB high:192kB active:0kB inactive:0kB present:0kB pages_scanned:0 all_unreclaimable? no
    Aug  1 22:42:05 localhost kernel: protections[]: 0 0 0
    Aug  1 22:42:05 localhost kernel: DMA: 13*4kB 1*8kB 2*16kB 3*32kB 2*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 316kB
    Aug  1 22:42:05 localhost kernel: Normal: 124*4kB 5*8kB 1*16kB 1*32kB 1*64kB 1*128kB 0*256kB 1*512kB 0*1024kB 0*2048kB 0*4096kB = 1288kB
    Aug  1 22:42:05 localhost kernel: HighMem: empty
    Aug  1 22:42:05 localhost kernel: Swap cache: add 109074, delete 108905, find 21539/46436, race 0+8
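
The two logs report different allocation masks: gfp_mask=0xd0 in situation a and gfp_mask=0xd2 in situation b. The minimal Python sketch below splits such a mask into flag names. The bit values are assumed from include/linux/gfp.h in 2.6.10-era kernels and cover only the low bits that appear here, so they should be checked against the exact kernel source before relying on the result. Under that assumption, 0xd0 decodes to __GFP_WAIT | __GFP_IO | __GFP_FS (GFP_KERNEL), and 0xd2 additionally sets __GFP_HIGHMEM (GFP_HIGHUSER).

```python
# Low-order gfp flag bits, ASSUMED from 2.6.10-era include/linux/gfp.h.
GFP_FLAGS = [
    (0x01, "__GFP_DMA"),
    (0x02, "__GFP_HIGHMEM"),
    (0x10, "__GFP_WAIT"),
    (0x20, "__GFP_HIGH"),
    (0x40, "__GFP_IO"),
    (0x80, "__GFP_FS"),
]

# Composite masks built from the flags above (same assumption).
GFP_NAMES = {
    0x10 | 0x40 | 0x80: "GFP_KERNEL",
    0x10 | 0x40 | 0x80 | 0x02: "GFP_HIGHUSER",
}


def decode_gfp(mask):
    """Return the flag names set in mask, plus any bits this table does not know."""
    names = [name for bit, name in GFP_FLAGS if mask & bit]
    known = 0
    for bit, _ in GFP_FLAGS:
        known |= bit
    leftover = mask & ~known
    if leftover:
        names.append(hex(leftover))  # unknown bits are reported raw
    return names


if __name__ == "__main__":
    for mask in (0xD0, 0xD2):  # situation a, situation b
        combined = GFP_NAMES.get(mask, "?")
        print(hex(mask), combined, " | ".join(decode_gfp(mask)))
```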


Question:

    Will this error be fixed in Hoary's 2.6.10 kernel? I tried to apply the
    Red Hat patches, but I could not get them to work. With the 2.6.11 kernel
    everything seems to work.




