path_to_inst_trauma
So, I'm working along happily from home, when the vpn driver I'm using panics my Solaris box. Fine, no problem; I'm used to this from the vpn driver by now, although it doesn't happen all that often. So, the system crashes, and when it comes back up, shortly after the obpsym messages, it goes into a loop trying to read an obviously corrupt /etc/path_to_inst file.

Great.

Well, now I have to boot from cdrom. But my media's at the office. I don't have another Sun I can stick this disk into, and I don't have a boot server either. Sooooo, to the office, to bring back a cd. I come back home to find that my cdrom won't read the cd. I now remember that it gave me some hassle about the last cd I tried to read, but I shrugged it off at the time. Now I have media, but no way to use it.

Great.

OK. This isn't a terrible tragedy, I do have backups, and other computers I can use while I wait for a replacement cd drive, but it irks me that I can't fix this. A few minutes later, I've booted the system under kadb, and I'm looking at the area of memory that holds the string with the path_to_inst filename. It's 17 bytes long, and followed by 7 bytes all zero. This is a lucky alignment for me. Using the /v modifier in kadb, I tack on a ".old" to the filename, and continue past the initial breakpoint. (Note: I wouldn't ever recommend doing this on a system that had anything important on it, but I was prepared to reinstall and restore from backup, so I figured why not? It's not like my system could get less bootable...) For the morbidly curious, the memory layout from mdb on my recovered system looks like this:

> *instance_file,20::dump
           0 1 2 3  4 5 6 7 \/ 9 a b  c d e f  01234567v9abcdef
144ccc0:  68650000 00000000 2f657463 2f706174  he....../etc/pat
144ccd0:  685f746f 5f696e73 74000000 00000000  h_to_inst.......
144cce0:  726f6f74 6e657800 00000000 00000000  rootnex.........
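
For anyone who wants to check the arithmetic behind that patch, here is a minimal Python sketch. It assumes the first 0x20 bytes are exactly as shown in the dump above, and that the patch must leave at least one NUL terminator before the next string. The names `dump`, `START`, `SUFFIX`, and `patched` are illustrative only; nothing here is kadb or mdb syntax.

```python
# Bytes copied from the ::dump output above (0x144ccc0 through 0x144ccdf).
dump = bytes.fromhex(
    "68650000 00000000 2f657463 2f706174"
    "685f746f 5f696e73 74000000 00000000"
)

START = 8            # the filename begins at 0x144ccc8, 8 bytes into the dump
SUFFIX = b".old"     # the 4 bytes appended by hand

# Locate the end of the NUL-terminated filename.
end = dump.index(b"\x00", START)
name = dump[START:end]

# Zero bytes between the end of the filename and the next string at 0x144cce0.
padding = len(dump) - end
assert dump[end:] == bytes(padding), "expected only zero bytes after the name"

# The suffix fits only if at least one NUL terminator survives after it.
fits = len(SUFFIX) < padding

# Overwrite the leading zero bytes in place; the buffer length does not change.
patched = dump[:end] + SUFFIX + dump[end + len(SUFFIX):]
new_name = patched[START:patched.index(b"\x00", START)]

print(len(name), padding, fits, new_name.decode())
```

The suffix overwrites 4 of the 7 zero bytes, so 3 remain to terminate the string and the neighbouring "rootnex" string is untouched. A longer suffix, or a tighter alignment, would have run into it.
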
Anyway, once past the breakpoint, the system comes up, and I'm back in business.

Great.

The point here? There are several:

Oh, and one more: upgrade to Solaris 10! If I'd been running 10, and not 9, at home, this problem would have been found and fixed automagically, and even if it hadn't, doing what I wound up doing would have been far easier.

(2004-06-20 23:42:00.0)
