LKCD

Joseph Salisbury joseph.salisbury at canonical.com
Wed Dec 22 23:03:43 UTC 2010


On 12/22/2010 05:53 PM, Peter M. Petrakis wrote:
>
>
> On 12/22/2010 05:25 PM, Joseph Salisbury wrote:
>> On 12/22/2010 04:24 PM, Peter M. Petrakis wrote:
>>>
>>>
>>> On 12/22/2010 04:04 PM, Joseph Salisbury wrote:
>>>> On 12/22/2010 02:21 PM, Peter M. Petrakis wrote:
>>>>> Hi,
>>>>>
>>>>> On 12/21/2010 09:45 AM, Joseph Salisbury wrote:
>>>>>> Hello,
>>>>>>
>>>>>> I'm attempting to use linux-crashdump to debug an issue.
>>>>>> I've been following the documentation at:
>>>>>>
>>>>>> https://wiki.ubuntu.com/Kernel/CrashdumpRecipe
>>>>>>
>>>>>> The exact steps I've taken are:
>>>>>>
>>>>>> 1) Installed linux-crashdump:
>>>>>>        sudo apt-get install linux-crashdump
>>>>>> 2) Rebooted the system to enable crashdump.
>>>>>>
>>>>>> My test to force a crash:
>>>>>>
>>>>>>     echo 1 | sudo tee /proc/sys/kernel/panic_on_oops
>>>>>>     echo c | sudo tee /proc/sysrq-trigger
>>>>>>
>>>>>> However, I never get anything in /var/crash.  In fact the
>>>>>> /var/crash directory didn't exist until I created it.  I've
>>>>>> tried this on Lucid, Maverick and Natty with the same
>>>>>> results.
>>>>>>
>>>>>> Has anyone successfully used linux-crashdump recently, or can
>>>>>> anyone suggest another tool, like kdump? Maybe I'm missing a step?
>>>>>
>>>>> No, that's about right; it should work, but the crashdump
>>>>> package isn't very robust. Nor is it nearly as configurable as
>>>>> the RHEL variant; you have to customize it yourself. Judging
>>>>> from the bug list, it doesn't appear to be getting much
>>>>> attention either.
>>>>>
>>>>> Yes, we do use it, and it does work, but it doesn't always work
>>>>> out of the box. So, a few things:
>>>>>
>>>>> 1) I've had issues using kexec in VirtualBox in the past. If
>>>>> you're trying to sandbox it there, try bare metal instead.
>>>>>
>>>>> 2) Can you do a "simple" kexec and succeed? See the kexec(8) man
>>>>> page for how to prepare it. Just take the kernel you're booting
>>>>> now, load it, and kexec. If it works, it'll be like a really fast
>>>>> reboot :)
>>>>>
>>>>> 3) kdump *is* linux-crashdump. The old, driver-specific methods
>>>>> of dumping, like diskdump, are gone.
>>>>>
>>>>> 4) Not all drivers take kindly to being thrown through
>>>>> kdump/kexec, and you don't need a lot of them. So if you have a
>>>>> serial console, start taking note of all the peripherals that
>>>>> give you problems, and compile a new kernel just for the
>>>>> purposes of kdump without those things enabled.
>>>>>
>>>>> 5) kexec/kdump doesn't always work, but with a solid,
>>>>> reproducible test case, probability will usually grant you
>>>>> a readable dump :)
>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Joe
>>>>>
>>>>>
>>>>> Peter
>>>>>
>>>>
>>>> Thanks for the feedback, Peter.
>>>>
>>>> 1) Yes, I've tried bare metal as well as KVM VMs
>>>>
>>>> 2) I performed the following kexec, and it did do a really fast
>>>> reboot :-)
>>>>
>>>>     /sbin/kexec \
>>>>         --command-line="BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic root=UUID=16a635bc-7110-4c13-97bf-1a3bb5931a96 ro vt.handoff=7 quiet splash irqpoll maxcpus=1 nousb" \
>>>>         --initrd=/boot/initrd.img-2.6.37-11-generic \
>>>>         /boot/vmlinuz-2.6.37-11-generic
>>>
>>> Good!
>>>
>>>> So it seems kexec is working, but the crash kernel is not triggered
>>>> when I press alt+sysrq+c or run "echo c | sudo tee /proc/sysrq-trigger".
>>>> In either case, the system just freezes.
>>>
>>> Just freezes... Damn, that's interesting :) What does your
>>> /proc/cmdline look like before you issue the panic?
>>
>> The following is /proc/cmdline before I initiated the panic:
>>
>>     $ cat /proc/cmdline
>>     BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic root=UUID=16a635bc-7110-4c13-97bf-1a3bb5931a96 ro vt.handoff=7 crashkernel=384M-2G:64M,2G-:128M quiet splash
>>
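The crashkernel=384M-2G:64M,2G-:128M value above uses the kernel's range syntax: each start-[end]:size entry reserves size when total RAM is at least start and below end, and an entry with no end (2G-) matches anything from start upward. A minimal POSIX sh sketch of that lookup follows, assuming only M and G suffixes, no @offset, and RAM given in MiB; the helper names to_mib and crashkernel_reserve are hypothetical and not part of kexec-tools or linux-crashdump.

```sh
#!/bin/sh
# Sketch: work out how much memory a crashkernel= value reserves.
# Assumptions: sizes use only M or G suffixes, no @offset is given,
# and RAM is passed in MiB. Helper names are hypothetical.

# Convert a size such as "384M" or "2G" to MiB.
to_mib() {
  case $1 in
    *G) echo $(( ${1%G} * 1024 )) ;;
    *M) echo "${1%M}" ;;
    *) echo "unsupported size: $1" >&2; return 1 ;;
  esac
}

# crashkernel_reserve SPEC RAM_MIB: print the reservation in MiB
# (0 if no range matches).
crashkernel_reserve() {
  spec=$1 ram=$2
  # Plain form, e.g. crashkernel=256M: reserve that much regardless of RAM.
  case $spec in
    *:*) ;;
    *) to_mib "$spec"; return ;;
  esac
  rest=$spec
  while [ -n "$rest" ]; do
    entry=${rest%%,*}              # next "start-[end]:size" entry
    case $rest in
      *,*) rest=${rest#*,} ;;
      *) rest= ;;
    esac
    range=${entry%%:*}
    size=${entry#*:}
    start=$(to_mib "${range%%-*}") || return 1
    end=${range#*-}
    if [ -z "$end" ]; then
      # Open-ended range such as "2G-": matches any RAM >= start.
      if [ "$ram" -ge "$start" ]; then to_mib "$size"; return; fi
    else
      # Closed range: start is inclusive, end is exclusive.
      end=$(to_mib "$end") || return 1
      if [ "$ram" -ge "$start" ] && [ "$ram" -lt "$end" ]; then
        to_mib "$size"; return
      fi
    fi
  done
  echo 0
}
```

For example, crashkernel_reserve '384M-2G:64M,2G-:128M' 1024 prints 64, and with 4096 it prints 128. What the kernel actually reserved shows up as a "Crash kernel" entry in /proc/iomem.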
>>
>>>
>>> It doesn't take much in terms of kernel arguments; for example:
>>>
>>> (ignore all the casper stuff)
>>>
>>>     $ cat /proc/cmdline
>>>     initrd=/casper/initrd.lz boot=casper live-config toram persistent noprompt LIVEMEDIA=/dev/disk/by-label/DEBIAN_LIVE console=tty0 console=ttyS0,115200n8 apparmor=0 crashkernel=256M username=ubuntu hostname=ubuntu exposedroot  BOOT_IMAGE=/casper/vmlinuz
>>>
>>> The crashkernel= argument is really all you need; the init service,
>>> however, should also have primed (loaded) the kdump kernel and
>>> initrd. There's no magic window (to my knowledge) for when you must
>>> prime the kdump kernel, because the memory has already been
>>> reserved. So, for example, you could disable the kdump init script
>>> and set things up manually, using the script as a guide, to make
>>> sure it is doing the right thing.
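Setting things up by hand, as suggested above, mostly comes down to loading a kernel into the reserved region with kexec -p (--load-panic) rather than the plain load used for the fast-reboot test. A minimal sketch, assuming it runs as root on a system booted with crashkernel=, that the running kernel's /boot/vmlinuz and /boot/initrd.img are reused as the capture kernel, and that the irqpoll maxcpus=1 nousb arguments from the kexec test earlier in the thread are appropriate. kdump_cmdline and load_kdump_kernel are hypothetical names, and this is not necessarily what the linux-crashdump init script does.

```sh
#!/bin/sh
# Sketch: prime the kdump (panic) kernel by hand.
# Assumptions: booted with crashkernel=, the running kernel's vmlinuz and
# initrd.img in /boot are reused, and this is run as root.

# Build the capture kernel's command line from the current one: drop
# BOOT_IMAGE= and crashkernel= (the capture kernel does not need its own
# reservation), squeeze spaces, and append the extra arguments that were
# used in the kexec test earlier in this thread.
kdump_cmdline() {
  printf '%s\n' "$1" |
    sed -e 's/BOOT_IMAGE=[^ ]*//' \
        -e 's/crashkernel=[^ ]*//' \
        -e 's/  */ /g' \
        -e 's/^ //' \
        -e 's/ *$/ irqpoll maxcpus=1 nousb/'
}

load_kdump_kernel() {
  kver=$(uname -r)
  # -p (--load-panic) only loads the kernel; it is started on a panic.
  kexec -p \
    --command-line="$(kdump_cmdline "$(cat /proc/cmdline)")" \
    --initrd="/boot/initrd.img-$kver" \
    "/boot/vmlinuz-$kver"
  # Prints 1 once a crash kernel is loaded.
  cat /sys/kernel/kexec_crash_loaded
}
```

Run load_kdump_kernel as root; if it prints 1, the crash kernel is armed and a subsequent sysrq-trigger crash should hand off to it.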
>>
>> I can try disabling the kdump init script and see if I can set things
>> up manually like you suggest.
>>
>>>
>>> Also,
>>>
>>> - CPU make, model, and count
>>> - Current kernel
>>
>> I'm running this on a cheap netbook, but I can also reproduce this on
>> a server if you prefer. The netbook's CPU is a single Intel(R)
>> Atom(TM) CPU N455 @ 1.66GHz.
>>
>> I'm also running the latest Natty kernel:
>>
>>     $ uname -r
>>     2.6.37-11-generic
>>
>>
>>>
>>> If you try booting with "nosmp" and then trigger the panic does it
>>> still hang?
>>
>> Yes, it still hangs. I added nosmp to the end of
>> GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub and ran update-grub.
>> The contents of /proc/cmdline changed to:
>>
>>     BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic root=UUID=16a635bc-7110-4c13-97bf-1a3bb5931a96 ro vt.handoff=7 crashkernel=384M-2G:64M,2G-:128M quiet splash nosmp
>>
>>>
>>> Perhaps you could send a magic sysrq key and dump the current
>>> process list?
>>>
>>
>> I took a screenshot of the console after I triggered the panic.
>> Hopefully it is readable and/or useful. I also attached the output of
>> "alt+sysrq t".
>
> That NULL pointer dereference looks like a real bug. Please retry the
> kdump test with an earlier kernel; I use kdump regularly on Lucid
> (2.6.32-26) without issue. If you find that addresses the symptom,
> please file a new bug against Natty; this is supposed to work.

I will do that. Thanks again for all your help!

>
>
>>
>>>> 3) Thanks for the info about kdump.
>>>
>>> No problem. The more crashdump users the better.
>>>
>>>> 4) Thanks for the suggestions, I will try this.
>>>>
>>>> Thanks for the help, Peter! I appreciate you taking the time to
>>>> send me a response.
>>>
>>> :)
>>>
>>>> Joe
>>>
>>> Peter
>>>
>>
>> Thanks again!
>>
>> Joe
>>
>





More information about the kernel-team mailing list