An interesting Solaris problem to troubleshoot
One of my colleagues installed Solaris 10 U3 on a Metropolis machine.
The machine worked fine for some time.
Then it was powered off, and the next time it would not come up
because of a boot archive problem.
So she booted into the failsafe session and updated the boot archive.
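For reference, a boot archive update from the failsafe session usually
looks something like the sketch below. It assumes the failsafe boot
mounted the installed root filesystem on /a, which is what it normally
offers to do; the exact commands typed on this machine were not
recorded.

```shell
# Rebuild the boot archive of the root filesystem mounted on /a
# (assumption: failsafe mounted the installed root there).
bootadm update-archive -R /a

# Leave the failsafe session and boot normally.
reboot
```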
After that, a normal Solaris boot completed properly, but the system
hung within a minute. Nothing worked.
Shesha and I tried to troubleshoot this.
Initially we thought it had something to do with Xorg, so we booted to
the command line.
Even then it hung. What was frustrating was that every time it hung, a
reboot was the only option,
and the window in which we could work was very small. The problem was
not seen when we booted in single-user mode.
So next we suspected one of the services. The next time we booted to
milestone all,
the first command we ran was 'svcs -xv'. Everything looked OK.
By this time, we had already spent almost an hour debugging this
problem.
Even telnet and ssh were not working.
The last option at this point was to reinstall the OS.
We decided to give it one last try by booting with kmdb loaded and
debugging the problem from there.
So in GRUB we added the -k option and booted the OS. When the problem
hit, we dropped into kmdb and looked at the thread list.
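Roughly, the kmdb session can be sketched as follows. The kernel path
is an assumption (it is the usual one for Solaris 10 x86 releases of
this vintage), and F1-A is the standard console key sequence for
breaking into kmdb on x86; the dcmds are shown as typed at the kmdb
prompt, without any output.

```
# In the GRUB menu, press 'e' on the Solaris entry, edit the kernel
# line, and append -k so that kmdb is loaded at boot:
kernel /platform/i86pc/multiboot -k

# When the system hangs, press F1-A on the console to enter kmdb,
# then list the kernel threads with their stacks:
[0]> ::threadlist -v

# Resume the system when done:
[0]> :c
```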
We found that there was some activity in the 'acpi' module. That was
the culprit, so we disabled ACPI at boot time by specifying the option
acpi-user-options=0x2.
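As a sketch, the property is passed on the GRUB kernel line with -B and
is spelled acpi-user-options. The kernel path below is again an
assumption based on the usual Solaris 10 x86 layout.

```
# GRUB kernel line with ACPI disabled through a boot property:
kernel /platform/i86pc/multiboot -B acpi-user-options=0x2
```

Editing the line at the GRUB menu only affects that one boot. To keep
the setting across reboots, the same -B option can be added to the
kernel line in GRUB's menu.lst, or the property can be set with the
eeprom command.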
Now that machine is working fine and my colleague is happy.
Posted at 02:07PM Sep 25, 2006 by madhu in General