LKCD
Peter M. Petrakis
peter.petrakis at canonical.com
Wed Dec 22 22:53:42 UTC 2010
On 12/22/2010 05:25 PM, Joseph Salisbury wrote:
> On 12/22/2010 04:24 PM, Peter M. Petrakis wrote:
>>
>>
>> On 12/22/2010 04:04 PM, Joseph Salisbury wrote:
>>> On 12/22/2010 02:21 PM, Peter M. Petrakis wrote:
>>>> Hi,
>>>>
>>>> On 12/21/2010 09:45 AM, Joseph Salisbury wrote:
>>>>> Hello,
>>>>>
>>>>> I'm attempting to use linux-crashdump to debug an issue.
>>>>> I've been following the documentation at:
>>>>>
>>>>> https://wiki.ubuntu.com/Kernel/CrashdumpRecipe
>>>>>
>>>>> The exact steps I've taken are:
>>>>>
>>>>> Installed linux-crashdump:
>>>>>   sudo apt-get install linux-crashdump
>>>>> Rebooted the system to enable crashdump.
>>>>>
>>>>> My test to force a crash:
>>>>>   echo 1 | sudo tee /proc/sys/kernel/panic_on_oops
>>>>>   echo c | sudo tee /proc/sysrq-trigger
>>>>>
>>>>> However, I never get anything in /var/crash. In fact the
>>>>> /var/crash directory didn't exist until I created it. I've
>>>>> tried this on Lucid, Maverick and Natty with the same
>>>>> results.
>>>>>
>>>>> Has anyone successfully used linux-crashdump recently, or can anyone
>>>>> suggest another tool, like kdump? Maybe I'm missing a step?
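
A quick way to look for the missing step is to confirm, before forcing the crash, that the crashkernel= reservation actually took effect and that a panic kernel has been loaded into it. Below is a minimal POSIX sh sketch of that check. It assumes the reserved region shows up as "Crash kernel" in /proc/iomem and that the kernel exposes the /sys/kernel/kexec_crash_loaded flag, as kexec-enabled kernels do. The function names are hypothetical, and the file paths are optional arguments so the functions can be tried against sample files.

```sh
# Print the "Crash kernel" line(s) from an iomem listing (default: the
# live /proc/iomem). grep exits non-zero when no region was reserved.
crash_region() (
    grep -i 'crash kernel' "${1:-/proc/iomem}"
)

# Succeed only if the kexec_crash_loaded flag (default: the live sysfs
# file) is readable and reads 1, i.e. a panic kernel has been loaded.
crash_kernel_loaded() (
    flag="${1:-/sys/kernel/kexec_crash_loaded}"
    [ -r "$flag" ] && [ "$(cat "$flag")" = "1" ]
)
```

Run both as root, since non-root readers may see zeroed addresses in /proc/iomem. If crash_region prints nothing, the crashkernel= argument was not honored; if crash_kernel_loaded fails, nothing primed the capture kernel, so a panic has no kdump kernel to jump to.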
>>>>
>>>> No, that's about right. It should work, but the crashdump
>>>> package isn't very robust. Nor is it nearly as configurable as
>>>> the RHEL variant; you have to customize it yourself. Judging
>>>> from the bug list, it doesn't appear to be getting much
>>>> attention either.
>>>>
>>>> Yes, we do use it, and it does work, but it doesn't always work
>>>> out of the box. So a few things:
>>>>
>>>> 1) I've had issues using kexec in VirtualBox in the past. If
>>>> you're trying to sandbox it there, try bare metal instead.
>>>>
>>>> 2) Can you do a "simple" kexec and succeed? See the kexec(8) man
>>>> page for how to prepare it. Just take what you're booting now, load
>>>> that, and kexec. If it works, it'll be like a really fast reboot
>>>> :)
>>>>
>>>> 3) kdump *is* linux-crashdump. The old, driver-specific methods
>>>> of dumping, like diskdump, are gone.
>>>>
>>>> 4) Not all drivers take kindly to being thrown through
>>>> kdump/kexec, and a lot of them you don't need. So if you have a
>>>> serial console, start taking note of all the peripherals that
>>>> give you problems, and compile a new kernel just for the
>>>> purposes of kdump with those drivers disabled.
>>>>
>>>> 5) kexec/kdump doesn't always work, but with a solid,
>>>> reproducible test case, probability will usually grant you
>>>> a readable dump :)
>>>>
>>>>> Thanks,
>>>>>
>>>>> Joe
>>>>
>>>>
>>>> Peter
>>>>
>>>
>>> Thanks for the feedback, Peter.
>>>
>>> 1) Yes, I've tried bare metal as well as KVM VMs
>>>
>>> 2) I performed the following kexec, and it did do a really fast
>>> reboot :-)
>>>
>>> /sbin/kexec \
>>>   --command-line="BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic root=UUID=16a635bc-7110-4c13-97bf-1a3bb5931a96 ro vt.handoff=7 quiet splash irqpoll maxcpus=1 nousb" \
>>>   --initrd=/boot/initrd.img-2.6.37-11-generic \
>>>   /boot/vmlinuz-2.6.37-11-generic
>>
>> Good!
>>
>>> So seems like kexec is working, but it is not triggered when I do
>>> an "alt+sysrq c" or "echo c | sudo tee /proc/sysrq-trigger". In
>>> either case the system just freezes.
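
The two triggers are gated differently. Per the kernel's sysrq documentation, the kernel.sysrq value (/proc/sys/kernel/sysrq) only filters the alt+sysrq keyboard combination, while root's writes to /proc/sysrq-trigger are always accepted. A value of 0 disables the keyboard functions, 1 enables all of them, and any other value is a bitmask in which 8 enables the debugging dumps; judging from the handler definitions in the kernel's sysrq.c, both c (crash) and t (task list) fall under that bit. A small sketch of the check, with a hypothetical function name and an optional path argument for testing:

```sh
# Succeed if kernel.sysrq (default: the live procfs knob) lets the
# alt+sysrq keyboard combination run dump-class commands such as c and t:
# 1 enables everything, 0 disables everything, otherwise test bit 8.
sysrq_dump_keys_enabled() (
    val="$(cat "${1:-/proc/sys/kernel/sysrq}")" || exit 1
    [ "$val" -eq 1 ] || [ "$(( val & 8 ))" -ne 0 ]
)
```

Because both triggers produced the same freeze here, the mask evidently let the keystroke through; the check mostly matters when alt+sysrq seems to do nothing at all.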
>>
>> Just freezes... Damn, that's interesting :) What does your
>> /proc/cmdline look like before you issue the panic?
>
> The following is /proc/cmdline before I initiated the panic:
>
> $ cat /proc/cmdline
> BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic root=UUID=16a635bc-7110-4c13-97bf-1a3bb5931a96 ro vt.handoff=7 crashkernel=384M-2G:64M,2G-:128M quiet splash
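
The crashkernel= value above uses the range syntax from the kernel's kdump documentation: crashkernel=<range1>:<size1>[,<range2>:<size2>,...][@offset], where each range is start-[end], start inclusive and end exclusive, matched against total system RAM. So 384M-2G:64M,2G-:128M reserves nothing below 384M of RAM, 64M from 384M up to (but not including) 2G, and 128M from 2G up. The sketch below evaluates such a value for a given RAM size. The helper names are hypothetical, it only understands K/M/G suffixes, and the kernel's own idea of "system RAM" may differ slightly from what userspace reports.

```sh
# Convert a size with an optional K/M/G suffix (e.g. 64M) into bytes.
to_bytes() (
    n="${1%[KkMmGg]}"
    case "$1" in
        *[Kk]) echo "$(( n * 1024 ))" ;;
        *[Mm]) echo "$(( n * 1024 * 1024 ))" ;;
        *[Gg]) echo "$(( n * 1024 * 1024 * 1024 ))" ;;
        *)     echo "$(( n ))" ;;
    esac
)

# Print the size a crashkernel= value ($1) would reserve on a machine with
# $2 bytes of RAM, or 0 if no range matches. Ranges are start-[end], start
# inclusive, end exclusive; an empty end means "and above".
crash_reserve_for_ram() (
    spec="${1%@*}"                      # drop an optional @offset
    ram="$2"
    case "$spec" in
        *:*) ;;                         # range syntax, handled below
        *) echo "$spec"; exit 0 ;;      # plain form, e.g. crashkernel=256M
    esac
    set -f
    IFS=','
    for entry in $spec; do              # e.g. 384M-2G:64M, then 2G-:128M
        range="${entry%%:*}"
        size="${entry#*:}"
        start="${range%%-*}"
        end="${range#*-}"
        if [ "$ram" -ge "$(to_bytes "$start")" ] &&
           { [ -z "$end" ] || [ "$ram" -lt "$(to_bytes "$end")" ]; }; then
            echo "$size"
            exit 0
        fi
    done
    echo 0
)
```

For example, to evaluate the value above against MemTotal (which /proc/meminfo reports in kB):

    crash_reserve_for_ram 384M-2G:64M,2G-:128M \
        $(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) * 1024 ))

MemTotal is a little smaller than installed RAM, so near a range boundary the result may not match what the kernel chose.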
>
>
>>
>> It doesn't take much from the kernel arg perspective, for example:
>>
>> (ignore all the casper stuff)
>>
>> $ cat /proc/cmdline
>> initrd=/casper/initrd.lz boot=casper live-config toram persistent noprompt LIVEMEDIA=/dev/disk/by-label/DEBIAN_LIVE console=tty0 console=ttyS0,115200n8 apparmor=0 crashkernel=256M username=ubuntu hostname=ubuntu exposedroot BOOT_IMAGE=/casper/vmlinuz
>>
>> The crashkernel= argument is really all you need; the init service,
>> however, should also have primed the kexec kernel + initrd. There's no
>> magic window (to my knowledge) of "when" you must prime the kdump
>> kernel, because the memory has already been reserved. So, for example,
>> you could disable the kdump init script and set things up manually,
>> using the script as a guide, to make sure it is doing the right thing.
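
Setting things up by hand comes down to loading the capture kernel with kexec -p (--load-panic) rather than executing it. The sketch below assumes the capture kernel is the running kernel's own /boot/vmlinuz and initrd, reuses the irqpoll maxcpus=1 nousb options from the kexec test earlier in this thread, and drops crashkernel= from the capture kernel's command line; that last choice is an assumption, not a description of what the kdump init script actually does. The function names are hypothetical.

```sh
# Build the capture kernel's command line from the running one (default:
# /proc/cmdline): drop crashkernel= (ASSUMPTION: the capture kernel should
# not try to reserve memory again) and append extra options.
kdump_cmdline() (
    set -f                              # no globbing while splitting args
    current="${1:-$(cat /proc/cmdline)}"
    extra="${2:-irqpoll maxcpus=1 nousb}"
    out=""
    for arg in $current; do
        case "$arg" in
            crashkernel=*) ;;           # skip the reservation argument
            *) out="${out:+$out }$arg" ;;
        esac
    done
    echo "${out:+$out }$extra"
)

# Load, but do not execute, the panic kernel for the given release
# (default: the running one). Needs root; kexec only jumps to it on a panic.
load_kdump_kernel() (
    release="${1:-$(uname -r)}"
    kexec -p --initrd="/boot/initrd.img-$release" \
        --command-line="$(kdump_cmdline)" \
        "/boot/vmlinuz-$release"
)
```

After running load_kdump_kernel as root, /sys/kernel/kexec_crash_loaded should read 1 before you trigger the panic.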
>
> I can try disabling the kdump init script and see if I can set things
> up manually like you suggest.
>
>>
>> Also,
>>
>> - CPU make, model, and #
>> - Current kernel
>
> I'm running this on a cheap netbook, but I can also reproduce this on
> a server if you prefer. The netbook has a single CPU: Intel(R)
> Atom(TM) CPU N455 @ 1.66GHz
>
> I'm also running the latest Natty kernel:
>
> $ uname -r
> 2.6.37-11-generic
>
>
>>
>> If you try booting with "nosmp" and then trigger the panic does it
>> still hang?
>
> Yes, I added nosmp to the end of GRUB_CMDLINE_LINUX_DEFAULT and ran
> update-grub. The contents of /proc/cmdline changed to:
>
> BOOT_IMAGE=/boot/vmlinuz-2.6.37-11-generic root=UUID=16a635bc-7110-4c13-97bf-1a3bb5931a96 ro vt.handoff=7 crashkernel=384M-2G:64M,2G-:128M quiet splash nosmp
>
>>
>> Perhaps you could send a magic sysrq key and dump the current
>> process list?
>>
>
> I took a screenshot of the console after I triggered the panic.
> Hopefully it is readable and/or useful. I also attached the output of
> "alt+sysrq t".
So that NULL pointer dereference looks like a real bug. Please retry the
kdump test with an earlier kernel; I use Lucid (2.6.32-26) with kdump
regularly, without issue. If you find that addresses the symptom, then
please file a new bug against Natty; this is supposed to work.
>
>>> 3) Thanks for the info about kdump.
>>
>> No problem. The more crashdump users the better.
>>
>>> 4) Thanks for the suggestions, I will try this.
>>>
>>> Thanks for the help, Peter! I appreciate you taking the time,
>>> and sending me a response.
>>
>> :)
>>
>>> Joe
>>
>> Peter
>>
>
> Thanks again!
>
> Joe
>
More information about the kernel-team mailing list