IRQ balancing problem on new Nehalem

Karl F. Larsen klarsen1 at gmail.com
Wed Oct 7 12:56:16 UTC 2009


DAHLBOKUM Markus (IVECO MOTORENFORSCHUNG AG) wrote:
> Hello all,
> 
> I have got a new server for our calculations department and I am having some trouble with its performance.
> The server is:
> 
> CPU: XEON X5570 (1 Quad CPU)
> 24 GB Memory
> 2 10k Disks for the OS
> 10 15k Disks for scratch data
> Adaptec RAID 51645
> 
> The server is used for numerical calculations, so there is always a high load on the CPUs and on the disks.
> Simple benchmarks on the disks only or on the CPUs only give good performance values, but under full load the performance is not acceptable.
> The scratch disks reach about 700 MB/s for reading and writing with bonnie++, which is OK. But within a calculation I only get 150 MB/s reading and 580 MB/s writing. Writing is OK because this is random-access performance, but reading is a disaster: the calculation spends 10 hours reading!
> What I have found out so far is that all interrupts are handled by the first CPU core, so all I/O activity (which can be a lot during a calculation) goes through one core.
> 
> What I did so far:
> First I was running Debian lenny (2.6.26), which was unstable -> several system freezes
> Then I updated the system via backports to kernel 2.6.30 -> the system became stable, but with the issues described above
> Now I'm running Ubuntu 9.10 beta (2.6.31) -> still the same
> Between the OS changes I tried Knoppix 6.0.1 -> the IRQs were balanced! (But I don't want to run Knoppix as the main OS on this machine :))
> 
> The ksoftirqd threads are running on the machine but with no effect (why?). I tried the irqbalance package, which balanced the interrupts a little more, but the I/O interrupts were still on one core.
> 
> Is this problem known? What can I do to balance the IRQs?
> 
> Thank you for your help.
> Markus
> 
> 
> 
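One way to confirm the observation quoted above (all interrupts landing on the first CPU) is to total the per-CPU columns of /proc/interrupts. What follows is only a minimal sketch, assuming a Linux system with the usual /proc/interrupts layout (a header row of CPU names, then one row per interrupt source); the function names parse_interrupts, cpu_shares and report are made up for illustration.

```python
def parse_interrupts(text):
    """Return {irq_label: [count_cpu0, count_cpu1, ...]} from /proc/interrupts text."""
    lines = text.strip().splitlines()
    ncpu = len(lines[0].split())  # header row: CPU0 CPU1 ...
    table = {}
    for line in lines[1:]:
        label, _, rest = line.partition(":")
        counts = rest.split()[:ncpu]
        # Summary rows such as ERR/MIS have fewer columns than CPUs; skip them.
        if len(counts) < ncpu or not all(c.isdigit() for c in counts):
            continue
        table[label.strip()] = [int(c) for c in counts]
    return table


def cpu_shares(table):
    """Fraction of all counted interrupts that each CPU handled."""
    ncpu = len(next(iter(table.values())))
    totals = [sum(row[i] for row in table.values()) for i in range(ncpu)]
    grand = sum(totals) or 1  # avoid dividing by zero on an empty table
    return [t / grand for t in totals]


def report(path="/proc/interrupts"):
    """Print the share of interrupts per CPU (path is the standard Linux location)."""
    with open(path) as fh:
        table = parse_interrupts(fh.read())
    for cpu, share in enumerate(cpu_shares(table)):
        print(f"CPU{cpu}: {share:6.1%}")
```

Calling report() before and after a change (a different kernel, irqbalance on or off) gives a quick comparison. The counters are cumulative since boot, so a share close to 100% on CPU0 only means that CPU has handled nearly everything so far.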

	You're changing too many things at once. As for Ubuntu, I would 
suggest you use 9.04 Jaunty and stay with it! This version is VERY 
stable.

	As for the odd results, once you stop changing the operating 
system, start checking the RAID system. Look at each hard 
drive and measure its read and write throughput. Things like 
this will help you find the problem.
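
	A rough way to take such a measurement is to sample /proc/diskstats twice and turn the sector counters into MB/s per device. This is only a sketch, assuming the standard Linux /proc/diskstats layout (device name in column 3, sectors read in column 6, sectors written in column 10, sectors counted in 512-byte units); the function names are made up for illustration. Behind a hardware RAID controller the member drives are usually not visible as separate block devices, so this would show the array as a whole and per-drive figures would have to come from the controller's own tools.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors


def parse_diskstats(text):
    """Return {device: (sectors_read, sectors_written)} from /proc/diskstats text."""
    stats = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 10:  # skip short or empty rows
            continue
        stats[fields[2]] = (int(fields[5]), int(fields[9]))
    return stats


def throughput_mb_s(before, after, seconds):
    """Per-device (read MB/s, write MB/s) between two snapshots."""
    rates = {}
    for dev, (r1, w1) in after.items():
        if dev not in before:
            continue
        r0, w0 = before[dev]
        rates[dev] = (
            (r1 - r0) * SECTOR_BYTES / 1e6 / seconds,
            (w1 - w0) * SECTOR_BYTES / 1e6 / seconds,
        )
    return rates


def sample(interval=5.0, path="/proc/diskstats"):
    """Take two snapshots `interval` seconds apart and return the rates."""
    with open(path) as fh:
        before = parse_diskstats(fh.read())
    time.sleep(interval)
    with open(path) as fh:
        after = parse_diskstats(fh.read())
    return throughput_mb_s(before, after, interval)
```

	Running sample() while the calculation is active, and again during a bonnie++ run, would show whether the read rate drops on the device itself or somewhere above it.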


73 Karl


-- 

	Karl F. Larsen, AKA K5DI
	Linux User
	#450462   http://counter.li.org.
         Key ID = 3951B48D




