So I can boot back into Solaris just fine, but if I select my Windows partition it will not boot. I get:
A disk read error occurred
Press Ctrl+Alt+Del to restart
Now I'm actually okay with this for right now. Normally I'd be a bit put off, since you are supposed to install WinXP first and then your other OS. But, with the trick of booting up in single user mode from the DVD and then using the installgrub tip we learned from Derek (Solaris 11 GRUB), I'm willing to try to fix the WinXP partition and then recover. If not, it just means we need to restart the install process.
The two paths are:
Well, in my mind, even if the recovery fails, we are back to the first path. And before we then retry to install the Solaris partition, we can try to fix the MBR.
The evil thought in the back of my mind is that the WinXP registration process has nuked my install to teach me to not pirate software.
Okay, I did some reading, it could be a bad cable or a too large HD. Yes, in spite of Solaris booting okay. I'm going to add the ATA/133 card into the system. I'm unhappy with the DVD being on the same path as the ATA drive anyway.
We can see what a mess it is back there:
The ribbon could be twisted too much for WinXP. Also, it tends to end up back in the cooler fan.
I take this time to add a Soundblaster card:
I take the ATA drive off of the cable and I am neatly able to tuck it up in the unused space above the DVD:
With the controller added, both PCI slots are now in use:
And we can see the ribbon cables on the disks:
At this point I am expecting two problems:
And I am right on both counts. The first is easy to solve, thanks to the capabilities of the BIOS. The second I'll deal with when I fix the MBR.
But this did not fix the root problem, so I'm about to try and repair the WinXP partition. fixmbr worked, but chkdsk -r refused to do anything. I'll reboot to see if the MBR was fixed enough, but if not, time for a fresh install. Hmm, it booted into grub.
Okay, I'll add a new entry once I get WinXP reinstalled and I try to fix the Solaris booting.
Okay, the system booted after being off all night. Yes, this is a concern for me because of the labeling problems. Last year this step failed.
We want to create a large pool, so we need to find out what is available to us:
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1d0 <DEFAULT cyl 9565 alt 2 hd 255 sec 63>
/pci@0,0/pci-ide@c/ide@0/cmdk@0,0
1. c2d0 <DEFAULT cyl 30398 alt 2 hd 255 sec 63>
/pci@0,0/pci-ide@d/ide@0/cmdk@0,0
2. c3d0 <DEFAULT cyl 30398 alt 2 hd 255 sec 63>
/pci@0,0/pci-ide@d/ide@1/cmdk@0,0
3. c4d0 <DEFAULT cyl 30398 alt 2 hd 255 sec 63>
/pci@0,0/pci-ide@d,1/ide@0/cmdk@0,0
4. c5d0 <DEFAULT cyl 30398 alt 2 hd 255 sec 63>
/pci@0,0/pci-ide@d,1/ide@1/cmdk@0,0
Specify disk (enter its number): ^D
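As a rough sanity check on what format is reporting, the geometry can be turned into a size. This is only a sketch: the function name chs_to_gib is mine, and it assumes 512-byte sectors, which format does not state in the listing above.

```shell
#!/bin/sh
# Approximate capacity in GiB from the cylinder/head/sector geometry
# that format prints. ASSUMPTION: 512-byte sectors.
chs_to_gib() {
    cyl=$1
    heads=$2
    sectors=$3
    bytes=$((cyl * heads * sectors * 512))
    # Integer division, so the result is rounded down.
    echo $((bytes / 1073741824))
}

chs_to_gib 9565 255 63     # c1d0, the ATA boot disk: prints 73
chs_to_gib 30398 255 63    # c2d0 through c5d0: prints 232
```

Four disks at roughly 232 GiB each lands in the same neighborhood as the raw pool size that zpool list reports further down.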
And now we can create a pool (with raidz) for playing with. Note that I've given the pool the entire disks and I don't have a spare. This isn't a production system. I'm also not worried about silent data loss. These are all things I would challenge in a setting where I cared about my data. But, if you think about it, most home desktops have been running this way for years.
# zpool create zoo raidz c2d0 c3d0 c4d0 c5d0
# zpool list
NAME SIZE USED AVAIL CAP HEALTH ALTROOT
zoo 928G 147K 928G 0% ONLINE -
# zfs create zoo/isos
# zfs create zoo/home
# zfs set mountpoint=/export/zfs zoo/home
# zfs set sharenfs=on zoo/home
# zfs set compression=on zoo/home
# zfs create zoo/home/nfsv2
# zfs create zoo/home/nfsv3
# zfs create zoo/home/nfsv4
# zfs create zoo/home/tdh
# zfs list
NAME USED AVAIL REFER MOUNTPOINT
zoo 376K 683G 38.2K /zoo
zoo/home 189K 683G 42.6K /export/zfs
zoo/home/nfsv2 36.7K 683G 36.7K /export/zfs/nfsv2
zoo/home/nfsv3 36.7K 683G 36.7K /export/zfs/nfsv3
zoo/home/nfsv4 36.7K 683G 36.7K /export/zfs/nfsv4
zoo/home/tdh 36.7K 683G 36.7K /export/zfs/tdh
zoo/isos 36.7K 683G 36.7K /zoo/isos
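The gap between the 928G from zpool list and the 683G from zfs list is mostly the raidz parity. Here is a back-of-the-envelope sketch of that arithmetic. The function name raidz_usable is made up for illustration, and it assumes single-parity raidz simply costs one disk's worth of space.

```shell
#!/bin/sh
# Rough usable space for a raidz vdev.
# ASSUMPTION: parity consumes exactly `parity` disks' worth of raw space;
# pool metadata and reservations are ignored.
raidz_usable() {
    raw=$1       # raw pool size, e.g. in GB as shown by zpool list
    disks=$2     # number of disks in the raidz vdev
    parity=$3    # 1 for plain raidz
    echo $((raw * (disks - parity) / disks))
}

raidz_usable 928 4 1    # prints 696
```

That gives 696G against the 683G that zfs list shows. I'd guess the remaining difference is pool overhead, but I have not verified that.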
I didn't show it explicitly, but the NFS server was not yet enabled. You can do it with svcadm(1M) or count on the fact that either issuing a share(1M) command or setting the sharenfs property on a ZFS filesystem will cause the service and server to be started. We can check this on the server:
# share
-@zoo/home /export/zfs rw ""
-@zoo/home /export/zfs/nfsv2 rw ""
-@zoo/home /export/zfs/nfsv3 rw ""
-@zoo/home /export/zfs/nfsv4 rw ""
-@zoo/home /export/zfs/tdh rw ""
# zfs set sharenfs=on zoo/isos
# share
-@zoo/isos /zoo/isos rw ""
-@zoo/home /export/zfs rw ""
-@zoo/home /export/zfs/nfsv2 rw ""
-@zoo/home /export/zfs/nfsv3 rw ""
-@zoo/home /export/zfs/nfsv4 rw ""
-@zoo/home /export/zfs/tdh rw ""
And we can see that NFS gets automatically enabled by checking from a client:
[tdh@adept ~/tmp]> showmount -e kanigix
Export list for kanigix:
/export/zfs (everyone)
/export/zfs/nfsv2 (everyone)
/export/zfs/nfsv3 (everyone)
/export/zfs/nfsv4 (everyone)
/export/zfs/tdh (everyone)
/zoo/isos (everyone)
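If you would rather ask SMF directly than infer the state from share and showmount, something like the following should do it. It is a sketch, not a transcript from kanigix: the wrapper function nfs_server_state is hypothetical, and the guard is only there so the snippet does not fall over on a box without SMF.

```shell
#!/bin/sh
# Report the SMF state of the NFS server service, if svcs is available.
# To enable the service by hand instead of relying on sharenfs, the
# command would be: svcadm enable -r svc:/network/nfs/server:default
nfs_server_state() {
    if command -v svcs >/dev/null 2>&1; then
        # -H drops the header line, -o state prints only the state column.
        svcs -H -o state svc:/network/nfs/server:default
    else
        echo "svcs-not-found"
    fi
}

nfs_server_state
```

On a machine where sharing has kicked the server off, I would expect this to report online.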
We can create some user accounts. First I add the following to /etc/group:
users:x:100:
And then the following users are created:
# useradd -m -u 1094 -g 100 -c "Mr. NFSv2" -d /export/zfs/nfsv2 nfsv2
# useradd -m -u 1813 -g 100 -c "Mr. NFSv3" -d /export/zfs/nfsv3 nfsv3
# useradd -m -u 3530 -g 100 -c "Mr. NFSv4" -d /export/zfs/nfsv4 nfsv4
# useradd -m -u 1066 -g 10 -c "Tom Haynes" -d /export/zfs/tdh tdh
Note I could connect to my NIS server to get this stuff, but I prefer some local accounts.
I forgot to do my account such that I get tcsh as a shell:
useradd -m -u 1066 -g 10 -c "Tom Haynes" -s /bin/tcsh -d /export/zfs/tdh tdh
No biggie, I can edit that in /etc/passwd directly.
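For the record, the edit amounts to changing the seventh colon-separated field of my line. Here is a minimal sketch of that as a filter. The function name set_login_shell is made up, the sample line assumes /bin/sh was the default shell useradd gave me (I did not check), and on Solaris you would likely need nawk rather than the old /usr/bin/awk for the -v option. usermod -s is the less hands-on way to do the same thing.

```shell
#!/bin/sh
# Rewrite the login shell (field 7) for one user in passwd-format input.
# Reads stdin, writes stdout, never touches /etc/passwd itself.
set_login_shell() {
    user=$1
    shell=$2
    awk -F: -v OFS=: -v u="$user" -v s="$shell" \
        '$1 == u { $7 = s } { print }'
}

# ASSUMPTION: /bin/sh stands in for whatever shell the account was created with.
printf '%s\n' 'tdh:x:1066:10:Tom Haynes:/export/zfs/tdh:/bin/sh' |
    set_login_shell tdh /bin/tcsh
```

Run it against a copy of the file and diff the result before replacing anything.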
I use gid 10, staff, for accounts that get sudo without providing a password, and gid 100, users, for accounts that have to provide one. It lets me know when I'm in the wrong role. I've never learned the RBAC stuff.
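I have not shown my sudoers file, so take this as an illustration of the idea rather than a copy of it. The two rules below are the standard sudoers way to express "this group skips the password, that group does not". The function sudoers_fragment only prints the text.

```shell
#!/bin/sh
# Print a sudoers fragment matching the staff/users split described above.
# ASSUMPTION: illustrative rules, not the actual file from this system.
sudoers_fragment() {
    cat <<'EOF'
# gid 10 (staff): sudo without a password prompt
%staff ALL=(ALL) NOPASSWD: ALL
# gid 100 (users): sudo allowed, password required
%users ALL=(ALL) ALL
EOF
}

sudoers_fragment
```

Anything like this should go in through visudo so a syntax error does not lock you out.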
Let's get my environment over there:
[tdh@adept ~]> scp .tcshrc kanigix:/export/zfs/tdh
Password:
Whoops, it won't take a blank password. Need to set one up:
# passwd tdh
New Password:
Re-enter new Password:
passwd: password successfully changed for tdh
And back on the other box:
[tdh@adept ~]> scp .tcshrc kanigix:/export/zfs/tdh
Password:
scp: /export/zfs/tdh/.tcshrc: Permission denied
What is up with that? Even if the uids are different on the two boxes, it shouldn't matter. ssh uses the string names. We need to look at the permissions on the server:
# ls -la /export/zfs
total 22
drwxr-xr-x 6 root sys 6 Jan 14 14:11 .
drwxr-xr-x 4 root sys 512 Jan 14 14:10 ..
drwxr-xr-x 2 root sys 2 Jan 14 14:10 nfsv2
drwxr-xr-x 2 root sys 2 Jan 14 14:10 nfsv3
drwxr-xr-x 2 root sys 2 Jan 14 14:11 nfsv4
drwxr-xr-x 2 root sys 2 Jan 14 14:11 tdh
# chown -R nfsv2:users /export/zfs/nfsv2
# chown -R nfsv3:users /export/zfs/nfsv3
# chown -R nfsv4:users /export/zfs/nfsv4
# chown -R tdh:staff /export/zfs/tdh
# ls -la /export/zfs
total 22
drwxr-xr-x 6 root sys 6 Jan 14 14:11 .
drwxr-xr-x 4 root sys 512 Jan 14 14:10 ..
drwxr-xr-x 2 nfsv2 users 2 Jan 14 14:10 nfsv2
drwxr-xr-x 2 nfsv3 users 2 Jan 14 14:10 nfsv3
drwxr-xr-x 2 nfsv4 users 2 Jan 14 14:11 nfsv4
drwxr-xr-x 2 tdh staff 2 Jan 14 14:11 tdh
And now:
[tdh@adept ~]> scp .tcshrc kanigix:/export/zfs/tdh
Password:
.tcshrc 100% 5417 5.3KB/s 00:00
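The root cause was that the home directories already existed as ZFS filesystems owned by root, so they had to be handed over by hand. With more than a few accounts I would generate the chown commands from the passwd entries instead of typing them. A sketch, with the hypothetical helper home_chown_cmds and sample lines whose shell field is an assumption; it prints commands and runs nothing.

```shell
#!/bin/sh
# From passwd-format input, print a chown command for every account whose
# home directory lives under the given prefix. Uses the numeric gid from
# field 4, which chown accepts in place of a group name.
home_chown_cmds() {
    prefix=$1
    awk -F: -v p="$prefix" \
        'index($6, p) == 1 { printf "chown -R %s:%s %s\n", $1, $4, $6 }'
}

# ASSUMPTION: the shell fields in these sample lines are placeholders.
printf '%s\n' \
    'nfsv2:x:1094:100:Mr. NFSv2:/export/zfs/nfsv2:/bin/sh' \
    'tdh:x:1066:10:Tom Haynes:/export/zfs/tdh:/bin/tcsh' |
    home_chown_cmds /export/zfs/
```

Review the printed commands, then pipe them to sh as root if they look right.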
Can we get there?
[tdh@adept ~]> ssh kanigix
Password:
Last login: Sun Jan 14 14:24:24 2007 from adept.internal.
Sun Microsystems Inc. SunOS 5.11 snv_55 October 2007
[tdh@kanigix ~]> ls -la
total 16
drwxr-xr-x 2 tdh staff 3 Jan 14 14:24 .
drwxr-xr-x 6 root sys 6 Jan 14 14:11 ..
-rw------- 1 tdh staff 5417 Jan 14 14:24 .tcshrc
What would have happened if we hadn't fixed the permissions?
# zfs create zoo/home/monster
# useradd -m -u 2025 -g 100 -c "The Monster" -s /bin/tcsh -d /export/zfs/monster monster
# ls -la /export/zfs/monster
total 8
drwxr-xr-x 2 root sys 2 Jan 14 14:25 .
drwxr-xr-x 7 root sys 7 Jan 14 14:25 ..
# passwd monster
New Password:
Re-enter new Password:
passwd: password successfully changed for monster
And from the client:
[tdh@adept ~]> ssh moster@kanigix
Password:
Password:
Password:
[tdh@adept ~]> ssh monster@kanigix
Password:
Last login: Sun Jan 14 14:27:11 2007 from adept.internal.
Sun Microsystems Inc. SunOS 5.11 snv_55 October 2007
> touch foo
touch: foo cannot create
Notice there is no indication that moster is not a valid account.
I was expecting that perhaps we wouldn't be able to log in to that directory. But the permissions allowed us in. If we play with them a bit:
# chmod go-rwx /export/zfs/monster
# ls -la /export/zfs/monster
total 8
drwx------ 2 root sys 2 Jan 14 14:25 .
drwxr-xr-x 7 root sys 7 Jan 14 14:25 ..
We end up getting bounced:
[tdh@adept ~]> ssh monster@kanigix
Password:
Last login: Sun Jan 14 14:27:11 2007 from adept.internal.
Could not chdir to home directory /export/zfs/monster: Permission denied
Sun Microsystems Inc. SunOS 5.11 snv_55 October 2007
> pwd
/
Back to the zfs stuff. Time to reboot and see if I still have the pool. Note, with any other set of disks, I wouldn't even question this part. But after my experiences with them, I'm a doubter.
And no problems. After the reboot:
# df -h
Filesystem size used avail capacity Mounted on
/dev/dsk/c1d0s0 63G 5.2G 57G 9% /
/devices 0K 0K 0K 0% /devices
/dev 0K 0K 0K 0% /dev
ctfs 0K 0K 0K 0% /system/contract
proc 0K 0K 0K 0% /proc
mnttab 0K 0K 0K 0% /etc/mnttab
swap 4.3G 812K 4.3G 1% /etc/svc/volatile
objfs 0K 0K 0K 0% /system/object
/usr/lib/libc/libc_hwcap2.so.1
63G 5.2G 57G 9% /lib/libc.so.1
fd 0K 0K 0K 0% /dev/fd
swap 4.3G 40K 4.3G 1% /tmp
swap 4.3G 40K 4.3G 1% /var/run
/dev/dsk/c1d0s7 6.7G 6.8M 6.6G 1% /export/home
zoo/home 683G 44K 683G 1% /export/zfs
zoo/home/monster 683G 36K 683G 1% /export/zfs/monster
zoo/home/nfsv2 683G 36K 683G 1% /export/zfs/nfsv2
zoo/home/nfsv3 683G 36K 683G 1% /export/zfs/nfsv3
zoo/home/nfsv4 683G 36K 683G 1% /export/zfs/nfsv4
zoo/home/tdh 683G 40K 683G 1% /export/zfs/tdh
zoo 683G 38K 683G 1% /zoo
zoo/isos 683G 36K 683G 1% /zoo/isos
Okay, Solaris is installed, and we reboot the system. When it comes back up, it hangs and my stomach drops. Okay, it can't get DHCP on nge0 - probably a missing driver. No biggie. When loading the devices, it complains about the labels again on the SATA drives. Again, not an issue. Just booting is a big win.
Okay, the system is not on the network.
I go into format and get the SATA drives into shape. Basically I think I did a backup to get the EFI label loaded. I then did an fdisk to change the type. I then ran partition to give the bulk of the data to the first slice. Note that if I didn't do the backup step, I got a funky partition table.
Also, the fact that these drives were messed up is something unique to me. Most people would not suffer what I am about to go through.
Okay, I'm not going to install zfs just yet. I want to reboot and see if these new labels are hunky-dory. I power the system down (an issue I had with wont, so I wanted to make sure I had the power off) and took the DVD out. I also removed the post board. In my mind, I was getting close to wrapping it all up. I also added back the right side panel.
Power the puppy on, oh by the way, I went through will, phantom, corsair, and finally settled on kanigix as the name of the box. Okay, where were we? Oh yes, stuck on the dreaded BAD PBR sig. And did I mention I went back to my USB keyboard?
I got this on wont last year and I got sick. A quick search turned up: Re: [s-x86] bad PBR Sig. Okay, I rebooted and noticed that I couldn't see the boot drive in the list of attached drives. The cable was loose from when I took out the DVD. A quick fix and I still got the bad PBR sig. I also told the BIOS to no longer boot from the CDROM. No luck again.
I think I know what is going on. When I was fixing the SATA drives, they were being marked as the Active disk/partition. I'm not sure it mentions that disk is the boot disk. I'm pretty sure that the last SATA drive is what the system is trying to boot from.
Screw it all, I'm going to put that loud DVD drive in the case for right now. Okay, we need to pop off one of the black faceplates:
And now we start twisting the metal plate:
And now we have a place to put our drive:
We need to line the rails up on the drive. We can figure out which row of screws by sliding the drive into the opening and eyeballing it.
Okay, both rails are on. We want the latch to be about even with the edge of the drive:
When we mount it, it looks like there is too much lip. Who cares, it will come out sooner or later.
And we have to twist the hd ribbon to get the thing connected. Again, we don't want the DVD on the same chain as the HD. Oh well, for now it is okay.
I don't want to reinstall everything. I don't want to use WinXP to change the boot drive. I want to use Solaris if possible. The MBR is being controlled by GRUB. And I don't want to fudge with that because I want to keep the system as dual-boot.
The version of GRUB I have doesn't seem to have a maintenance mode. Of course I find this out after I reboot and can't move about in the menu with my USB keyboard!
I needed to get into the grub menu and edit one of the selections. I wanted the kernel line to be this:
kernel /boot/multiboot kernel/unix -s
Which tells it to boot in single user mode.
I then had it mount the first drive. I went into format and visited all of the drives. Three of them were marked as Active. I fixed them all to not be Active except for the Solaris partition on the first drive.
Reboot and I get "No active partition". Well, I'm on the right track!
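What format and fdisk are toggling here is a single byte per partition entry in the first sector of each disk. In the classic MBR layout the four 16-byte entries start at offset 446, a first byte of 0x80 means Active, and the sector ends with the signature bytes 55 aa. The sketch below reads those out of a saved sector. The function name mbr_report and the synthetic test sector are mine; on the real box you would first have to dd the first 512 bytes of each disk to a file, and the device path for that is something I am leaving out because I have not checked it here.

```shell
#!/bin/sh
# Show the boot flag of each of the four MBR partition entries and the
# two signature bytes, given a file holding the first 512-byte sector.
mbr_report() {
    img=$1
    sig=$(od -An -tx1 -j 510 -N 2 "$img" | tr -d ' \n')
    echo "signature: $sig"
    n=0
    while [ "$n" -lt 4 ]; do
        flag=$(od -An -tx1 -j $((446 + n * 16)) -N 1 "$img" | tr -d ' \n')
        echo "entry $((n + 1)): boot flag $flag"
        n=$((n + 1))
    done
}

# Build a synthetic sector: entry 1 active (0x80), signature 0x55 0xAA.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
printf '\200' | dd of="$img" bs=1 seek=446 conv=notrunc 2>/dev/null
printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null
mbr_report "$img"
rm -f "$img"
```

For the synthetic sector this reports signature 55aa, flag 80 on entry 1, and 00 on the other three. A disk with every flag at 00 is the situation the BIOS was complaining about.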
Regrub (press 'e' to edit any entry and press 'e' again to edit that line, hit return when done, and press 'b' to boot with your new change) and boot back up in single user mode. All of the drives are marked correctly in format.
Okay, can we rewrite the MBR? Derek thinks so here at Solaris 11 GRUB. (By the way, I forgot all about his great blog until Google.com revealed it again to me.)
# /sbin/installgrub -m /boot/grub/stage1 /boot/grub/stage2 /dev/rdsk/c0d0s0
Except I'm going to try:
# /a/sbin/installgrub -m /a/boot/grub/stage1 /a/boot/grub/stage2 /dev/rdsk/c1d0s1
Okay, another misstep. This time though, I select F12 at the BIOS prompt, which lets me pick the boot order and disk. So I get to the first disk and grub throws up a prompt. I think I messed up above and zapped the MBR.
Regrub and this time pay attention. When it boots in single user mode, it tells us that there is a Solaris installed in '/dev/dsk/c1d0s0'. So I was right to change the path, but not the slice. I do need:
# /a/sbin/installgrub -m /a/boot/grub/stage1 /a/boot/grub/stage2 /dev/rdsk/c1d0s0
Reboot and we get the same message about the "No active partition". But, reboot again and use F12 to get to the correct disk. And I get a login prompt. I think the system is trying to boot from one of the DVD drives and failing. I think clearing all of the active partitions earlier fixed things as far as the "bad PBR sig" went and if the DVDs had not been mounted, it would have booted.
A quick test is to change the boot order and reboot. And I am wrong. Okay, I can get the system to boot if I use F12 to get me to the correct disk. Hmm, I wonder if I have to tell the BIOS which one is the boot disk? YES! YES! YES! I found it. And the ATA drive was behind all of the SATA drives.
My guess is that before I fixed the labels in Solaris, the drives had not been showing up as bootable to the BIOS. Who knows what it saw with the corrupt labels? Anyway, I fixed the order and the system now boots again. The network is not up, but that is a battle I can fix when I drag myself out of bed in the afternoon. It is 4AM here.
I lied, I did a sys-unconfig and now kanigix is on the net!
# uname -a
SunOS kanigix 5.11 snv_55 i86pc i386 i86pc
When I fixed the labels, I had several which had active partitions, i.e., places we could boot from. When the BIOS added these in front of the ATA drive, it picked one to boot from and found no MBR. If I had changed the boot order of the disks first, I wouldn't have had all of this fun! Who knew?
It looks like I could have set the USB mode to 1.1 instead of 2.0 in the BIOS. That may have been a way to get by the install issue I was seeing. I'll try that later.
I lied again. I tried this and it still would not see the USB DVD to install from. Well, with the system coming up and networked, I can get the system information needed to file a bug.
A lot went on here and I am actually quite happy. I learned a lot about my new system and I've got working SATA drives. When I did wont, I couldn't get past this part and that was a big reason the machine ended up with my son. That and he needed to kill Rebel Scum or Imperial Plastic Soldiers. Don't knock the power of Lucas Arts in this house.
As far as noise goes, the DVD is the loudest component in the system. It doesn't matter if it is the external one or the exposed internal one. When it isn't going, I can't hear the system over my desktop Shuttle: adept.
As far as heat, when it was installing and running WinXP, the CPU cooler was not warm to the touch. The graphics cooler was warm. I remember seeing the BIOS stating that the case temperature was 25C and the CPU core was at 27-28C. Both of the case fans and the cooler fan were going.
When I flipped it onto its side, the rear case fan and cooler fan did not turn on when I started to install Solaris. The HD ribbon was stopping the cooler fan. The CPU cooler was warm to the touch and so was the video cooler. After 5-10 minutes of the two fans being back on, the CPU cooler is not as warm to the touch.
I put a cheap digital thermometer over there - it had said room temperature was about 23C. After 10 minutes on top of the DVD drive, it said 26.7C. Hmm, the DVD drive is pretty warm - not as warm as the video cooler, but warm.
I'll find a way to get at the temperature from Solaris.
Anyway, the Solaris install is done and I'm off to play with it. It is also 30 minutes later and the CPU cooler is not warm. The fans are working.
I realized that the Solaris DVD I was trying to install from was corrupt. I made sure that this time it made its way into the trash bucket. I tried a known good image and it failed the same way.
So I turned the machine off and flipped it on its side. Note, be sure to either unplug the power cord in the back or if possible turn the PSU power switch off. I managed to press the front power button on through the front door.
I started trying to cable the DVD in and realized that the data ribbon was too short. I was going to either try another cable or just pull the HD out as well. Too many things perched on the case would just be too much:
Anyway, you can notice in the above picture that the ribbon cable is upside down. Now I hadn't wanted to either flip the data ribbon or put the devices in a set master or slave mode. I didn't know what that would do to the disk device names under Solaris after I was done. But I was willing to fix this before I installed Solaris. So, we pull it off and see the correct side for connecting two drives:
We flip it over, reconnect everything and we have an ad-hoc installation.
Now we shouldn't be upset that beta software didn't boot up correctly. Not only am I trying to install cutting edge beta bits, but I've got the set currently being tested internally (okay, I just saw b56 bits get posted late yesterday). You can be sure I'll feed my results back to the other developers.
But, here comes the braindead aspect of WinXP registration - I didn't get the Solaris DVD into the drive in time. I ended up back in WinXP. And it declared that my system had changed drastically since the installation. Let's see, possibly the HD changed locations on a chain, the external USB drive was gone, and I added an internal DVD drive. Yes, that would never happen unless someone was trying to pirate the OS.
Fine, I've got a legal copy of the software, I'll let it register itself again. And that means putting in the CD key again. And that means being insulted by the registration process which has decided I'm trying to install the license on too many machines in too short of a time. I can call Microsoft up to explain why I am not pirating software and please, could I get my registration reset. Please! Aargh is too polite for what I feel right now.
What are you supposed to do in a lab situation where you have to reinstall WinXP all the time on the same machine to get to a stock system? I guess you are supposed to ghost the drive and never upgrade?
For some good news, the Solaris installation is going along just fine. It saw the SATA drives and decided it did not like the labels. Hehe, I knew that from way back in b34. Anyway, I'm hoping once I get the system back up (hehe again, I'm hoping it boots with those drives in) that I can quickly put a ZFS filesystem in place on them.