Solaris Hangs
What if you have what appears to be a hang (Solaris only at this time)?
I've broken this out of the data collection section since it was beginning to overcrowd it. If you spot anything wrong or any glaring omissions, as ever, I'd be grateful for your contribution.
What follows is for the "next time" scenario. If your machine is hung right now, you more than likely can't do much about it.
It's best to start from the ILOM for this, so from here on I'll assume that you have.
When you experience a hang you will be asked for information, and using the script utility makes collecting command output a little easier for you. From a terminal window type
script -a <filename>
and then ssh to the ILOM and execute
start /SP/console
This gives you a console session on the host, and everything it shows is logged to <filename> (when you're done, leave the ssh session and type exit to stop the capture). Reboot the host using the ILOM command reset /SYS, or however else you want to or can. Note that reset /SP resets the service processor itself rather than the host, and will drop your console session. Then modify the GRUB boot entry that you normally use to put the machine into kernel debugger mode, as follows:
kernel /boot/multiboot kernel/unix -k
You might want to boot the machine single user too; that just needs you to add -s to the same line.
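If you'd rather make this change in the GRUB menu.lst file than type it at the GRUB menu on each boot, it comes down to appending flags to the kernel line. Below is a minimal Python sketch of that edit. The function name add_kmdb_flags is hypothetical, and it assumes the flags are written as separate tokens (-k -s) rather than combined.

```python
def add_kmdb_flags(entry, single_user=False):
    """Append -k (load kmdb) and optionally -s (single user) to the kernel
    line of a GRUB menu entry, skipping any flag that is already present.
    Whitespace on the kernel line is normalised to single spaces."""
    wanted = ["-k"] + (["-s"] if single_user else [])
    out = []
    for line in entry.splitlines():
        tokens = line.split()
        # Only the line starting with "kernel" is touched.
        if tokens and tokens[0] == "kernel":
            for flag in wanted:
                if flag not in tokens:
                    tokens.append(flag)
            line = " ".join(tokens)
        out.append(line)
    return "\n".join(out)


print(add_kmdb_flags("kernel /boot/multiboot kernel/unix", single_user=True))
```

Running it prints the line from the text above with -k -s appended. Skipping flags that are already there means running it twice does no harm.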
Once the machine appears to hang, send a break via the ILOM console (remember you should have come in from the ILOM at this point) by pressing Esc followed by Shift+B. You should now drop to the kmdb prompt. If you happen to be on the host itself, then as long as you have a functioning keyboard you can drop into kmdb with the key sequence Shift+Pause, and possibly F1+A, though be prepared for that not to work. Also, if you have a USB KVM, it would be as well not to start switching keyboards/screens, as that appears to lock up either the keyboard or kmdb (I'm not sure which). Make sure that you do not run the GUI on the console, as it tends to hide the console output.
Once at the kmdb prompt, please run the commands below and capture their output.
Don't forget to type the "::" (or the "$<" for the last one), otherwise you'll get either rubbish or nothing.
::cpuinfo -v
::ps
::ptree
::kmastat
::cpustack -c 0
::cpustack -c 1
::interrupts
::msgbuf -v
::fsinfo
::cpuregs
::swapinfo
::sysevent
$<threadlist
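The list above collects stacks for CPUs 0 and 1 only. On a machine with more CPUs you'll presumably want one ::cpustack per CPU, using the CPU ids shown in the ::cpuinfo -v output. Here is a small Python sketch that prints the full checklist for a given set of CPU ids. kmdb_commands is a hypothetical helper, not part of any Solaris tool.

```python
def kmdb_commands(cpu_ids):
    """Build the kmdb capture checklist, with one ::cpustack per CPU id."""
    before = ["::cpuinfo -v", "::ps", "::ptree", "::kmastat"]
    stacks = [f"::cpustack -c {cpu}" for cpu in cpu_ids]
    after = [
        "::interrupts", "::msgbuf -v", "::fsinfo", "::cpuregs",
        "::swapinfo", "::sysevent", "$<threadlist",
    ]
    return before + stacks + after


# e.g. a box whose ::cpuinfo -v output lists CPUs 0-3
for cmd in kmdb_commands(range(4)):
    print(cmd)
```

Passing the ids rather than a count keeps it usable when the ids aren't a contiguous 0..N-1 range.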
If you are able to collect a full crash dump, then you can also type $<systemdump whilst in the debugger.
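If the kernel isn't responding, it might be worth enabling the deadman kernel code to help you out. It periodically checks that the system clock is still advancing and panics the system (giving you a crash dump) if it isn't. This is done in /etc/system with the line:
set snooping=1
(There are other related settings, but SunSolve can help you out with those.)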
For particularly difficult hard hangs the following might help (in conjunction with set snooping). Modify /etc/system to have the following lines:
set pcplusmp:apic_panic_on_nmi=1
set pcplusmp:apic_kmdb_on_nmi=1
The lines above are intended to create a crash dump on an NMI and to drop into the kernel debugger.
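If you make these /etc/system changes on several machines, a small script avoids ending up with duplicate or conflicting set lines. This Python sketch works on the file's text, so you can try it against a copy first. set_system_tunables and HANG_SETTINGS are hypothetical names, and the parsing assumes the simple "set name=value" form used above. /etc/system is only read at boot, so the settings take effect at the next reboot. Keep a backup of the original file.

```python
def set_system_tunables(text, settings):
    """Return /etc/system text with each name=value in settings applied.

    Uncommented 'set name=...' lines are rewritten in place; settings not
    already present are appended at the end. Commented-out lines (e.g.
    starting with '*') don't match 'set' as the first word, so they are
    left alone.
    """
    lines = text.splitlines()
    seen = set()
    for i, line in enumerate(lines):
        parts = line.strip().split(None, 1)
        if len(parts) == 2 and parts[0] == "set" and "=" in parts[1]:
            name = parts[1].split("=", 1)[0].strip()
            if name in settings:
                lines[i] = f"set {name}={settings[name]}"
                seen.add(name)
    lines += [f"set {n}={v}" for n, v in settings.items() if n not in seen]
    return "\n".join(lines) + "\n"


HANG_SETTINGS = {
    "snooping": "1",
    "pcplusmp:apic_panic_on_nmi": "1",
    "pcplusmp:apic_kmdb_on_nmi": "1",
}

example = "* local tunables\nset snooping=0\n"
print(set_system_tunables(example, HANG_SETTINGS), end="")
```

In the example, the existing snooping=0 line is rewritten rather than left alongside a second snooping line, and running the function again on its own output changes nothing.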
---o---
What's an NMI?
Non-Maskable Interrupt: an interrupt that in most cases can't be stopped. It is delivered through int 2 (vector 2 of the IDT) and, unlike ordinary interrupts, it isn't blocked by clearing the CPU's interrupt flag, which makes it fairly close to invincible. An NMI is raised either by external hardware or by the system bus or an APIC. Only one NMI at a time is permitted: further NMIs are held off until the handler for the current one returns. The NMI reset button (if your machine has one) is tied to the INTR pin of the CPU; this is classed as a maskable interrupt though, so you might not get the same result as a true NMI event. An NMI has priority level 5 (as opposed to vector 2; they're different things, you see) and comes after the following:
1. hardware reset and/or machine check (MCE)
2. task switch trap
3. external hardware interventions (e.g. SMI, INIT)
4. breakpoints and debug traps
5. then our NMI
There are another 5 levels after this (making 10 in all, starting with ordinary maskable hardware interrupts at level 6), but we don't really need to care about them. All of this means that we have a way to make a machine interrupt anything it's doing and panic on certain events (hence the /etc/system lines above).
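To keep the two numbers apart: the priority level decides which event is serviced first when several are pending at once, while the vector number is where the handler sits in the IDT. Below is a Python sketch using the first six classes of the priority table in Intel's Software Developer's Manual. It makes the simplifying assumption that masking is ignored. PRIORITY, VECTOR and serviced_first are illustrative names only.

```python
# Intel's ordering for exceptions/interrupts pending at the same instruction
# boundary (1 = serviced first). Only the first six of the ten classes.
PRIORITY = {
    1: "hardware reset and machine checks",
    2: "task switch trap",
    3: "external hardware interventions (FLUSH, STOPCLK, SMI, INIT)",
    4: "traps on the previous instruction (breakpoints, debug traps)",
    5: "non-maskable interrupt (NMI)",
    6: "maskable hardware interrupts",
}

# Vector numbers are a separate thing: they index the IDT.
VECTOR = {"NMI": 2, "machine check": 18}


def serviced_first(pending):
    """Return the description of the highest-priority pending level.

    Simplification: masking is ignored, so a pending level-6 interrupt is
    treated as deliverable even if the interrupt flag is clear.
    """
    return PRIORITY[min(pending)]


print(serviced_first([6, 5]))  # an NMI beats an ordinary device interrupt
print(VECTOR["NMI"])           # ...yet its handler lives at vector 2
```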
Notes:
idt = interrupt descriptor table; it is indexed by vector number, and each entry points to the place where the interrupt service routine is held
interrupt = a notice for the processor to stop what it's doing and deal with this request (based on priority of course).
apic = advanced programmable interrupt controller (handles CPU pin-based interrupts, and CPU-to-CPU interrupts in multi-CPU systems).
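The idt note can be pictured as a lookup table. The following is a toy Python model, not how the hardware actually stores entries. ToyIDT and its methods are hypothetical names, and a real IDT entry is a gate descriptor (segment selector, handler address, type and privilege bits) rather than a function.

```python
class ToyIDT:
    """Toy model of the interrupt descriptor table: 256 slots indexed by
    vector number, each holding a callable that stands in for the
    interrupt service routine."""

    SIZE = 256  # x86 defines vectors 0-255; 0-31 are reserved for exceptions

    def __init__(self):
        self.slots = [None] * self.SIZE

    def _check(self, vector):
        if not 0 <= vector < self.SIZE:
            raise ValueError(f"vector {vector} out of range 0-{self.SIZE - 1}")

    def install(self, vector, handler):
        self._check(vector)
        self.slots[vector] = handler

    def dispatch(self, vector):
        """Look up the slot and run its handler. A real CPU would raise a
        fault on an empty slot rather than a Python exception."""
        self._check(vector)
        handler = self.slots[vector]
        if handler is None:
            raise LookupError(f"no handler installed for vector {vector}")
        return handler(vector)


NMI_VECTOR = 2  # fixed by the x86 architecture

idt = ToyIDT()
idt.install(NMI_VECTOR, lambda v: f"vector {v}: NMI service routine ran")
print(idt.dispatch(NMI_VECTOR))
```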
---o---
It's also possible that you will need to boot in 32-bit mode, and this is how you do that. Edit the GRUB entry that reads "kernel /platform/i86pc/multiboot" and change it to read "kernel /platform/i86pc/multiboot kernel/unix". If you want to boot 32-bit and single user, add "-s" to the end of that line.
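The 32-bit edit amounts to inserting kernel/unix after the multiboot path. Here is a minimal Python sketch of it. boot_32bit is a hypothetical name, and the sketch assumes the line starts exactly as quoted above.

```python
def boot_32bit(line, single_user=False):
    """Turn 'kernel /platform/i86pc/multiboot [flags]' into the 32-bit form
    by naming kernel/unix explicitly, optionally adding -s at the end."""
    tokens = line.split()
    if tokens[:2] != ["kernel", "/platform/i86pc/multiboot"]:
        raise ValueError("expected a 'kernel /platform/i86pc/multiboot' line")
    if "kernel/unix" not in tokens:
        # As in the -k example earlier, the kernel file comes before any flags.
        tokens.insert(2, "kernel/unix")
    if single_user and "-s" not in tokens:
        tokens.append("-s")
    return " ".join(tokens)


print(boot_32bit("kernel /platform/i86pc/multiboot", single_user=True))
```

Any flags already on the line (such as -k) are kept and end up after kernel/unix.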