100-200 load average on an idle system: Help!

Tue Jan 9 00:58:02 UTC 2007

Hi folks,

My Ubuntu Dapper (6.06.1) box is consistently showing a load average of 
100+ while completely idle, not swapping, and not particularly using its 
disks:

(from top)
top - 11:47:24 up 19:19,  1 user,  load average: 127.30, 126.59, 124.14
Tasks: 362 total,   2 running, 360 sleeping,   0 stopped,   0 zombie
Cpu(s):  1.0% us,  0.2% sy,  0.0% ni, 98.5% id,  0.3% wa,  0.0% hi,  0.0% si
Mem:    499648k total,   489568k used,    10080k free,    21220k buffers
Swap:  1502068k total,       88k used,  1501980k free,   159440k cached

   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 15560 thorin    15   0  6204 2444 1760 R  1.9  0.5   0:00.03 top
     1 root      16   0  1564  528  460 S  0.0  0.1   0:01.20 init
     2 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 migration/0
     3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
     4 root      RT   0     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
     5 root      10  -5     0    0    0 S  0.0  0.0   0:00.10 events/0
     6 root      10  -5     0    0    0 S  0.0  0.0   0:00.01 khelper
     7 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
     9 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kblockd/0
    10 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
   127 root      15   0     0    0    0 S  0.0  0.0   0:00.02 pdflush
   128 root      15   0     0    0    0 S  0.0  0.0   0:00.00 pdflush

The output from dmesg doesn't show any hardware faults or time-outs:
dmesg | tail
[42949392.820000] md: md driver 0.90.3 MAX_MD_DEVS=256, MD_SB_DISKS=27
[42949392.820000] md: bitmap version 4.39
[42949393.200000] device-mapper: 4.4.0-ioctl (2005-01-12) initialised: dm-devel@redhat.com
[42949396.240000] kjournald starting.  Commit interval 5 seconds
[42949396.240000] EXT3 FS on hda1, internal journal
[42949396.240000] EXT3-fs: mounted filesystem with ordered data mode.
[42949402.240000] eth1: no IPv6 routers present
[42949413.210000] Netfilter messages via NETLINK v0.30.
[42949413.250000] ip_conntrack version 2.4 (3967 buckets, 31736 max) - 232 bytes per conntrack
[42949413.660000] ip_tables: (C) 2000-2002 Netfilter core team

In fact, the system seems to be fine... except that uptime and top and 
/proc/loadavg all show a load average that would kill a 16-way monster.
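For reference, here is how I understand Linux to compute those three numbers (my reading of the kernel's behaviour, not something I have measured on this box). Roughly every 5 seconds it samples $n_k$, the number of tasks that are either runnable (state R) or in uninterruptible sleep (state D), and folds it into an exponentially damped moving average, with $T$ = 60, 300 and 900 seconds for the 1-, 5- and 15-minute figures:

$$
L_k = L_{k-1}\,e^{-5/T} + n_k\left(1 - e^{-5/T}\right), \qquad T \in \{60,\ 300,\ 900\}\ \text{s}
$$

If $n$ holds steady, $L$ converges to it, so three figures all sitting around 125 suggest that about 125 tasks are continuously in R or D. top only shows 2 running, and as far as I can tell it counts D-state tasks under "sleeping", so the rest would presumably be stuck in uninterruptible sleep.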

What can I do to diagnose this further? How do I find out what's causing 
it? What does it mean? Help!
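A minimal sketch of one way to look, assuming a Linux /proc layout and Python 3 (the function names are my own, not from any existing tool). It prints the /proc/loadavg figures and then every thread whose state is R or D, reading /proc/<pid>/task/<tid>/stat so that threads are included and not just processes. If the load really comes from stuck D-state tasks, they should show up here even though top seems to file them under "sleeping".

```python
import os


def parse_loadavg(text):
    """Parse /proc/loadavg, e.g. '127.30 126.59 124.14 2/362 15560'.

    Returns (load1, load5, load15, runnable, total). The fourth field is
    'currently runnable entities / total entities'; the fifth (last PID)
    is ignored.
    """
    fields = text.split()
    load1, load5, load15 = (float(x) for x in fields[:3])
    runnable, total = (int(x) for x in fields[3].split("/"))
    return load1, load5, load15, runnable, total


def parse_stat(text):
    """Return (id, comm, state) from the contents of a .../stat file.

    comm is wrapped in parentheses and may itself contain spaces or ')',
    so split on the LAST ')' instead of on whitespace.
    """
    left, _, right = text.rpartition(")")
    id_str, _, comm = left.partition(" (")
    state = right.split()[0]
    return int(id_str), comm, state


def counted_tasks(proc="/proc"):
    """Yield (tid, comm, state) for every thread in state R or D."""
    for pid in os.listdir(proc):
        if not pid.isdigit():
            continue
        task_dir = os.path.join(proc, pid, "task")
        try:
            tids = os.listdir(task_dir)
        except OSError:
            continue  # process exited while we were scanning
        for tid in tids:
            try:
                with open(os.path.join(task_dir, tid, "stat")) as f:
                    stat = parse_stat(f.read())
            except OSError:
                continue  # thread exited while we were scanning
            if stat[2] in ("R", "D"):
                yield stat


def main():
    with open("/proc/loadavg") as f:
        load1, load5, load15, runnable, total = parse_loadavg(f.read())
    print("load average: %.2f %.2f %.2f (runnable now: %d of %d)"
          % (load1, load5, load15, runnable, total))
    for tid, comm, state in counted_tasks():
        print("%6d %s %s" % (tid, state, comm))


# Only meaningful on Linux, where /proc/loadavg exists.
if __name__ == "__main__" and os.path.exists("/proc/loadavg"):
    main()
```

Running it a few times, a few seconds apart, should separate tasks that are permanently in D from ones just passing through; /proc/<pid>/wchan (or `ps -eo pid,stat,wchan:20,comm`) may then hint at what they are waiting on.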

Please note:
I just recently got my server hosted, and chose to stick with Ubuntu 
because that's what I know. This is relevant because it's very primitive 
hosting: Please don't suggest any strategy that involves having physical 
access to the machine, booting special media, or performing operations 
that can only be done on the console. I have a total of three kinds of 
access to this box:
1. I can get in over the network, e.g. by ssh.
2. I can have the system rebooted (for a sum of money per reboot).
3. I can get the system reinstalled from scratch.
This last option is presently kinda less than ideal, since I don't have 
a current backup.

Thanks in advance!

-- 
Thorne Huw Lawler         trouble.net.au
"Croup and Vandermar, the Old Firm. Obstacles
obliterated, nuisances eradicated, bothersome
limbs removed and tutelary dentistry undertaken"