Madhu K R's Weblog

From the view finder
Monday Sep 25, 2006

An interesting Solaris problem to troubleshoot

One of my colleagues installed Solaris 10 U3 on a Metropolis machine. The machine worked fine for some time.
Then it was powered off, and on the next boot it failed to come up because of a boot archive problem.
So she booted into the failsafe session and updated the boot archive.
After that, a normal Solaris boot completed properly, but the system hung within a minute and nothing responded.

Shesha and I tried to troubleshoot this.
Initially we thought it had something to do with Xorg, so we booted to the command line.
Even then it hung. What was frustrating was that every time it hung, a reboot was the only option,
and the window in which we could work was very small. The problem was not seen when we booted in single-user mode.

Next we suspected one of the services. So the next time we booted to milestone all,
the first command we ran was 'svcs -xv'. Everything looked OK.

By this time we had already spent almost an hour debugging this problem.
Even telnet and ssh were not working.
The last option at this point was to reinstall the OS.
We decided to give it one last try by booting under kmdb and debugging the problem.

So in GRUB we added the -k option and booted the OS. When the problem hit, we dropped into kmdb and looked at the thread list.
We found that there was some activity in the 'acpi' module. That was the culprit, so we disabled ACPI at boot by specifying the option
acpi-user-options=0x2.
Now the machine is working fine and my colleague is happy.
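
For anyone who wants to try the same thing, here is a rough sketch of what the GRUB entry could look like with both options on the kernel line. This is not copied from the machine in question: the title, the root device (hd0,0,a) and the multiboot and boot archive paths are assumptions based on a default Solaris 10 x86 install, so adjust them to match your own menu.lst.

```text
# Hypothetical menu.lst entry; title, root device and paths are assumed
# defaults, not taken from the affected machine.
title Solaris 10 (kmdb loaded, ACPI disabled)
root (hd0,0,a)
# -k loads the kernel debugger (kmdb) at boot.
# -B passes boot properties; acpi-user-options=0x2 turns ACPI off.
kernel /platform/i86pc/multiboot -k -B acpi-user-options=0x2
module /platform/i86pc/boot_archive
```

The same options can also be typed in for a single boot by editing the kernel line from the GRUB menu. Once kmdb is loaded, the thread list we looked at is most likely what the ::threadlist dcmd prints, and :c resumes the system.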


