LKCD

Wed Dec 22 22:53:42 UTC 2010

On 12/22/2010 05:25 PM, Joseph Salisbury wrote:
> On 12/22/2010 04:24 PM, Peter M. Petrakis wrote:
>> 
>> 
>> On 12/22/2010 04:04 PM, Joseph Salisbury wrote:
>>> On 12/22/2010 02:21 PM, Peter M. Petrakis wrote:
>>>> Hi,
>>>> 
>>>> On 12/21/2010 09:45 AM, Joseph Salisbury wrote:
>>>>> Hello,
>>>>> 
>>>>> I'm attempting to use linux-crashdump to debug an issue.
>>>>> I've been following the documentation at:
>>>>> 
>>>>> https://wiki.ubuntu.com/Kernel/CrashdumpRecipe
>>>>> 
>>>>> The exact steps I've done are:
>>>>> 
>>>>> Installed linux-crashdump:
>>>>>   sudo apt-get install linux-crashdump
>>>>> Rebooted system to enable crashdump.
>>>>> 
>>>>> My test to force a crash:
>>>>>   echo 1 | sudo tee /proc/sys/kernel/panic_on_oops
>>>>>   echo c | sudo tee /proc/sysrq-trigger
>>>>> 
>>>>> However, I never get anything in /var/crash.  In fact the
>>>>> /var/crash directory didn't exist until I created it.  I've
>>>>> tried this on Lucid, Maverick and Natty with the same
>>>>> results.
>>>>> 
>>>>> Has anyone successfully used linux-crashdump recently, or can
>>>>> anyone suggest another tool like kdump?  Maybe I'm missing a step?
>>>> 
>>>> No, that's about right. It should work, but the crashdump
>>>> package isn't very robust. Nor is it nearly as configurable as
>>>> the RHEL variant; you have to customize it yourself. Judging
>>>> from the bug list, it doesn't appear to be getting much
>>>> attention either.
>>>> 
>>>> Yes, we do use it, and it does work, but it doesn't always work
>>>> out of the box. So a few things:
>>>> 
>>>> 1) I've had issues using kexec in VirtualBox in the past. If
>>>> you're trying to sandbox it there, try bare metal instead.
>>>> 
>>>> 2) Can you do a "simple" kexec and succeed? See the man page on
>>>> how to prepare it. Just take what you're booting now, load
>>>> that, and kexec. If it works it'll be like a really fast reboot
>>>> :)
>>>> 
>>>> 3) kdump *is* linux-crashdump. The old, driver-specific methods
>>>> of dumping, like diskdump, are gone.
>>>> 
>>>> 4) Not all drivers take kindly to being thrown through
>>>> kdump/kexec. A lot of them you don't need. So if you have a
>>>> serial console, start taking note of all the peripherals that
>>>> give you problems, and compile a new kernel just for the
>>>> purposes of kdump without those things enabled.
>>>> 
>>>> 5) kexec/kdump doesn't always work, but with a solid,
>>>> reproducible test case, probability will usually grant you
>>>> a readable dump :)
>>>> 
>>>>> Thanks,
>>>>> 
>>>>> Joe
>>>> 
>>>> 
>>>> Peter
>>>> 
>>> 
>>> Thanks for the feedback, Peter.
>>> 
>>> 1) Yes, I've tried bare metal as well as KVM VMs
>>> 
>>> 2) I performed the following kexec, and it did do a really fast
>>> reboot :-)
>>> 
>>> /sbin/kexec
>>> --command-line="BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic 
>>> root=UUID=16a635bc-7110-4c13-97bf-1a3bb5931a96 ro vt.handoff=7
>>> quiet splash irqpoll maxcpus=1 nousb" 
>>> --initrd=/boot/initrd.img-2.6.37-11-generic 
>>> /boot/vmlinuz-2.6.37-11-generic
>> 
>> Good!
>> 
>>> So it seems like kexec is working, but it is not triggered when I
>>> do an "alt+sysrq c" or "echo c | sudo tee /proc/sysrq-trigger". In
>>> either case the system just freezes.
>> 
>> Just freezes... Damn, that's interesting :) What does your
>> /proc/cmdline look like before you issue the panic?
> 
> The following is /proc/cmdline before I initiated the panic:
> 
> $ cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic 
> root=UUID=16a635bc-7110-4c13-97bf-1a3bb5931a96 ro vt.handoff=7 
> crashkernel=384M-2G:64M,2G-:128M quiet splash
> 
> 
>> 
>> It doesn't take much from the kernel arg perspective, for example:
>> 
>> (ignore all the casper stuff)
>> 
>> $ cat /proc/cmdline
>> initrd=/casper/initrd.lz boot=casper live-config toram persistent
>> noprompt LIVEMEDIA=/dev/disk/by-label/DEBIAN_LIVE console=tty0
>> console=ttyS0,115200n8 apparmor=0 crashkernel=256M username=ubuntu
>> hostname=ubuntu exposedroot  BOOT_IMAGE=/casper/vmlinuz
>> 
>> The crashkernel= is really all you need; the init service, however,
>> should have primed the kexec kernel + initrd too. There's no magic
>> window (to my knowledge) of "when" you must prime the kdump kernel,
>> because the memory has already been reserved. So for example, you
>> could disable the kdump init script and set things up manually,
>> using the script as a guide, to ensure it is doing the right thing.
> 
> I can try disabling the kdump init script and see if I can set things
> up manually like you suggest.
> 
>> 
>> Also,
>> 
>> - CPU make, model, and #
>> - Current kernel
> 
> I'm running this on a cheap netbook, but I can also reproduce this on
> a server if you prefer.  The netbook has a single CPU:
> 
> Intel(R) Atom(TM) CPU N455   @ 1.66GHz
> 
> I'm also running the latest Natty kernel:
> 
> $ uname -r
> 2.6.37-11-generic
> 
> 
>> 
>> If you try booting with "nosmp" and then trigger the panic, does it
>> still hang?
> 
> Yes, I added nosmp to the end of GRUB_CMDLINE_LINUX_DEFAULT and ran
> update-grub.  The contents of /proc/cmdline changed to:
> 
> BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic 
> root=UUID=16a635bc-7110-4c13-97bf-1a3bb5931a96 ro vt.handoff=7 
> crashkernel=384M-2G:64M,2G-:128M quiet splash nosmp
> 
>> 
>> Perhaps you could send a magic sysrq key and dump the current
>> process list?
>> 
> 
> I took a screenshot of the console after I triggered the panic.
> Hopefully it is readable and/or useful.  I also attached the output of
> "alt+sysrq t"

So that NULL pointer dereference looks like a real bug. Please retry
the kdump test with an earlier kernel; I regularly use Lucid
(2.6.32-26) with kdump without issue. If that addresses the symptom,
please file a new bug against Natty. This is supposed to work.
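
Before retrying, it may be worth confirming that the crash kernel is
actually armed, so that a plain freeze can be told apart from kdump
never having been loaded. A minimal sketch, assuming the running kernel
exposes /sys/kernel/kexec_crash_loaded; check_kdump_ready is a
hypothetical helper name, not something shipped by linux-crashdump:

```shell
#!/bin/sh
# check_kdump_ready [CMDLINE_FILE] [LOADED_FILE]
# Hypothetical helper (an assumption, not part of any package).
# Returns 0 if a crashkernel= reservation is present and a panic
# kernel is loaded, 1 if the reservation is missing, 2 if no panic
# kernel is loaded.
check_kdump_ready() {
    cmdline_file=${1:-/proc/cmdline}
    loaded_file=${2:-/sys/kernel/kexec_crash_loaded}

    # Without crashkernel= no memory is reserved for the dump kernel.
    if ! grep -q 'crashkernel=' "$cmdline_file"; then
        echo "no crashkernel= on the kernel command line"
        return 1
    fi

    # The file reads "1" once a panic kernel has been loaded.
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "memory reserved, but no crash kernel is loaded"
        return 2
    fi

    echo "crash kernel is loaded"
    return 0
}
```

Called with no arguments, it reads the live /proc/cmdline and sysfs
file. A return status of 2 would suggest the init script never loaded
the panic kernel (kexec -p), which would point at the setup rather than
at the kernel under test.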

> 
>>> 3) Thanks for the info about kdump.
>> 
>> No problem. The more crashdump users the better.
>> 
>>> 4) Thanks for the suggestions, I will try this.
>>> 
>>> Thanks for the help, Peter!  I appreciate you taking the time,
>>> and sending me a response.
>> 
>> :)
>> 
>>> Joe
>> 
>> Peter
>> 
> 
> Thanks again!
> 
> Joe
>