Friday Jul 04, 2008

Back when I got my first real break as a sysadmin, one of my first tasks was to upgrade the Uni's finance office server, a SPARCserver 1000. It was running Solaris 2.5 with a gaggle of external unipacks and multipacks for Oracle 7.$mumble. I organised an outage with the DBAs and the Finance stakeholders, practised installing Solaris 2.6 on a new system (we'd just got an E450), and at the appointed time on the Saturday morning I rocked up and got to work on my precisely specified upgrade plan.

That all went swimmingly (though looooooooowly) until the time came to reboot after the final SDS 4.1 mirror had been created. The primary system board decided that it really didn't like me, and promptly died, taking the boot PROM with it.


PANIC!!


At that point I didn't know all that much about the innards of the SS1000; otherwise I probably would have just engaged in some swaptronics with the other three boards. However, I was green, nervous, and (by that point) very tired of sitting in a cold, loud machine room for 12 hours. I turned the box off, rang the local Sun support office and left a message (we didn't have weekend coverage on any of our systems then), rang my boss and the primary stakeholder in the Finance unit, and went home.

Come Monday morning, all hell broke loose: the Accounts group was unable to do any work, and the DBAs had to do a very quick enable of the DR system so I could get time to work on the problem with Sun. If I remember correctly, that "quick enable" took around 4 hours. Fortunately for me, not only were the DBAs quite sympathetic and very quick to help, but Miriam on the support line (who later hired me) was able to diagnose the problem and organise a service call to replace the faulty board. She also calmed me down, which I really, really appreciated. (Thank you Miriam!)

So ... why am I dredging this up? Because I've just done a LiveUpgrade (LU) from Solaris Nevada build 91 to build 93, with a ZFS root, and it took me a shade under 90 minutes. Total. Including the post-installation reboot. Not only would I have gone all gooey at the idea of being able to do something like LU back in that job, but if I could have done it with ZFS, without having to reconfigure all the uni- and multi-pack devices, I probably could have had the whole upgrade done in around 4 or 5 hours rather than 12. (Remember, of course, that while the SS1000 could take quite a few CPUs, they were still very very very very sloooooooooow.)

Here's a transcript of this evening's upgrade:


# uname -a
SunOS gedanken 5.11 snv_91 i86pc i386 i86xpv

(remove the snv_91 LU packages)
pkgrm SUNWlu... packages from snv_91
(add the snv_93 LU packages)
pkgadd SUNWlu... packages from snv_93

(Create my LU config)
# lucreate -n snv_93 -p rpool
Checking GRUB menu...
Analyzing system configuration.
No name for current boot environment.
INFORMATION: The current boot environment is not named - assigning name .
Current boot environment is named .
Creating initial configuration for primary boot environment .
The device  is not a root device for any boot environment; cannot get BE ID.
PBE configuration successful: PBE name  PBE Boot Device .
Comparing source boot environment  file systems with the file 
system(s) you specified for the new boot environment. Determining which 
file systems should be in the new boot environment.
Updating boot environment description database on all BEs.
Updating system configuration files.
Creating configuration for boot environment .
Source boot environment is .
Creating boot environment .
Cloning file systems from boot environment  to create boot environment .
Creating snapshot for  on .
Creating clone for  on .
Setting canmount=noauto for  in zone  on .
Saving existing file  in top level dataset for BE  as //boot/grub/menu.lst.prev.
File  propagation successful
Copied GRUB menu from PBE to ABE
No entry for BE  in GRUB menu
Population of boot environment  successful.
Creation of boot environment  successful.
-bash-3.2# zfs list
NAME                             USED  AVAIL  REFER  MOUNTPOINT
rpool                           50.0G   151G    35K  /rpool
rpool/ROOT                      7.06G   151G    18K  legacy
rpool/ROOT/snv_91               7.06G   151G  7.06G  /
rpool/ROOT/snv_91@snv_93        71.5K      -  7.06G  -
rpool/ROOT/snv_93                128K   151G  7.06G  /tmp/.alt.luupdall.2695
rpool/WinXP-Host0-Vol0          3.57G   151G  3.57G  -
rpool/WinXP-Host0-Vol0@install  4.74M      -  3.57G  -
rpool/dump                      4.00G   151G  4.00G  -
rpool/export                    7.47G   151G    19K  /export
rpool/export/home               7.47G   151G  7.47G  /export/home
rpool/gate                      5.86G   151G  5.86G  /opt/gate
rpool/hometools                 2.10G   151G  2.10G  /opt/hometools
rpool/optcsw                     225M   151G   225M  /opt/csw
rpool/optlocal                  1.20G   151G  1.20G  /opt/local
rpool/scratch                   14.4G   151G  14.4G  /scratch
rpool/swap                         4G   155G  64.6M  -

# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
snv_91                     yes      yes    yes       no     -         
snv_93                     yes      no     no        yes    -         


Golly, that was so easy! Here I was RTFMing for the UFS-style LU syntax... not needed at all.
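The lucreate log above reports a snapshot and a clone being created, and zfs list shows the result: rpool/ROOT/snv_91@snv_93 plus an rpool/ROOT/snv_93 that uses only 128K, because a ZFS clone shares its blocks with the snapshot until something changes. That's why the new BE appeared almost instantly. As a rough illustration only, here's a dry-run sketch that prints the approximately equivalent zfs commands. The function name is hypothetical, and this is not a substitute for lucreate, which also updates the LU configuration and the GRUB menu.

```sh
#!/bin/sh
# Hypothetical helper: print (but do NOT run) the zfs commands that roughly
# correspond to the "Creating snapshot", "Creating clone" and
# "Setting canmount=noauto" steps in the lucreate log above.
# lucreate does more than this (LU database, GRUB menu), so don't use
# these commands in place of it.

print_be_clone_cmds() {
    be_root=$1   # e.g. rpool/ROOT
    src_be=$2    # the running BE, e.g. snv_91
    new_be=$3    # the BE to create, e.g. snv_93

    echo "zfs snapshot ${be_root}/${src_be}@${new_be}"
    echo "zfs clone ${be_root}/${src_be}@${new_be} ${be_root}/${new_be}"
    # Stop the new BE's root dataset from being mounted automatically.
    echo "zfs set canmount=noauto ${be_root}/${new_be}"
}

# Example (dry run only):
#   print_be_clone_cmds rpool/ROOT snv_91 snv_93
```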


# time luupgrade -u -s /media/SOL_11_X86 -n snv_93

No entry for BE  in GRUB menu
Copying failsafe kernel from media.
Uncompressing miniroot
Uncompressing miniroot archive (Part2)
13367 blocks
Creating miniroot device
miniroot filesystem is 
Mounting miniroot at 
Mounting miniroot Part 2 at 
Validating the contents of the media .
The media is a standard Solaris media.
The media contains an operating system upgrade image.
The media contains  version <11>.
Constructing upgrade profile to use.
Locating the operating system upgrade program.
Checking for existence of previously scheduled Live Upgrade requests.
Creating upgrade profile for BE .
Checking for GRUB menu on ABE .
Saving GRUB menu on ABE .
Checking for x86 boot partition on ABE.
Determining packages to install or upgrade for BE .
Performing the operating system upgrade of the BE .
CAUTION: Interrupting this process may leave the boot environment unstable 
or unbootable.
Upgrading Solaris: 100% completed
Installation of the packages from this media is complete.
Restoring GRUB menu on ABE .
Adding operating system patches to the BE .
The operating system patch installation is complete.
ABE boot partition backing deleted.
PBE GRUB has no capability information.
PBE GRUB has no versioning information.
ABE GRUB is newer than PBE GRUB. Updating GRUB.
GRUB update was successful.
Configuring failsafe for system.
Failsafe configuration is complete.
INFORMATION: The file  on boot 
environment  contains a log of the upgrade operation.
INFORMATION: The file  on boot 
environment  contains a log of cleanup operations required.
WARNING: <3> packages failed to install properly on boot environment .
INFORMATION: The file  on 
boot environment  contains a list of packages that failed to 
upgrade or install properly.
INFORMATION: Review the files listed above. Remember that all of the files 
are located on boot environment . Before you activate boot 
environment , determine if any additional system maintenance is 
required or if additional media of the software distribution must be 
installed.
The Solaris upgrade of the boot environment  is partially complete.
Installing failsafe
Failsafe install is complete.

real    83m24.299s
user    13m33.199s
sys     24m8.313s

# zfs list
NAME                             USED  AVAIL  REFER  MOUNTPOINT
rpool                           52.5G   148G  36.5K  /rpool
rpool/ROOT                      9.56G   148G    18K  legacy
rpool/ROOT/snv_91               7.07G   148G  7.06G  /
rpool/ROOT/snv_91@snv_93        18.9M      -  7.06G  -
rpool/ROOT/snv_93               2.49G   148G  5.53G  /tmp/.luupgrade.inf.2862
rpool/WinXP-Host0-Vol0          3.57G   148G  3.57G  -
rpool/WinXP-Host0-Vol0@install  4.74M      -  3.57G  -
rpool/dump                      4.00G   148G  4.00G  -
rpool/export                    7.47G   148G    19K  /export
rpool/export/home               7.47G   148G  7.47G  /export/home
rpool/gate                      5.86G   148G  5.86G  /opt/gate
rpool/hometools                 2.10G   148G  2.10G  /opt/hometools
rpool/optcsw                     225M   148G   225M  /opt/csw
rpool/optlocal                  1.20G   148G  1.20G  /opt/local
rpool/scratch                   14.4G   148G  14.4G  /scratch
rpool/swap                         4G   152G  64.9M  -
-bash-3.2# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
snv_91                     yes      yes    yes       no     -         
snv_93                     yes      no     no        yes    -         

# luactivate snv_93
System has findroot enabled GRUB
Generating boot-sign, partition and slice information for PBE 
Saving existing file  in top level dataset for BE  as //etc/bootsign.prev.
WARNING: <3> packages failed to install properly on boot environment .
INFORMATION:  on boot 
environment  contains a list of packages that failed to upgrade or 
install properly. Review the file before you reboot the system to 
determine if any additional system maintenance is required.

Generating boot-sign for ABE 
Saving existing file  in top level dataset for BE  as //etc/bootsign.prev.
Generating partition and slice information for ABE 
Copied boot menu from top level dataset.
Generating direct boot menu entries for PBE.
Generating xVM menu entries for PBE.
Generating direct boot menu entries for ABE.
Generating xVM menu entries for ABE.
Disabling splashimage
Re-enabling splashimage
No more bootadm entries. Deletion of bootadm entries is complete.
Changing GRUB menu default setting to <0>
Done eliding bootadm entries.

**********************************************************************

The target boot environment has been activated. It will be used when you 
reboot. NOTE: You MUST NOT USE the reboot, halt, or uadmin commands. You 
MUST USE either the init or the shutdown command when you reboot. If you 
do not use either init or shutdown, the system will not boot using the 
target BE.

**********************************************************************

In case of a failure while booting to the target BE, the following process 
needs to be followed to fallback to the currently working boot environment:

1. Boot from Solaris failsafe or boot in single user mode from the Solaris 
Install CD or Network.

2. Mount the Parent boot environment root slice to some directory (like 
/mnt). You can use the following command to mount:

     mount -Fzfs /dev/dsk/c1t0d0s0 /mnt

3. Run  utility with out any arguments from the Parent boot 
environment root slice, as shown below:

     /mnt/sbin/luactivate

4. luactivate, activates the previous working boot environment and 
indicates the result.

5. Exit Single User mode and reboot the machine.

**********************************************************************

Modifying boot archive service
Propagating findroot GRUB for menu conversion.
File  propagation successful
File  propagation successful
File  propagation successful
File  propagation successful
Deleting stale GRUB loader from all BEs.
File  deletion successful
File  deletion successful
File  deletion successful
Activation of boot environment  successful.

# date
Friday,  4 July 2008  9:45:41 PM EST


# init 6
propagating updated GRUB menu
Saving existing file  in top level dataset for BE  as //boot/grub/menu.lst.prev.
File  propagation successful
File  propagation successful
File  propagation successful
File  propagation successful



Here I reboot and then log in.


# lustatus
Boot Environment           Is       Active Active    Can    Copy      
Name                       Complete Now    On Reboot Delete Status    
-------------------------- -------- ------ --------- ------ ----------
snv_91                     yes      no     no        yes    -         
snv_93                     yes      yes    yes       no     -         

# lufslist -n snv_91
               boot environment name: snv_91

Filesystem              fstype    device size Mounted on          Mount Options
----------------------- -------- ------------ ------------------- --------------
/dev/zvol/dsk/rpool/swap swap       4294967296 -                   -
rpool/ROOT/snv_91       zfs          20630528 /                   -


# lufslist -n snv_93
               boot environment name: snv_93
               This boot environment is currently active.
               This boot environment will be active on next system boot.

Filesystem              fstype    device size Mounted on          Mount Options
----------------------- -------- ------------ ------------------- --------------
/dev/zvol/dsk/rpool/swap swap       4294967296 -                   -
rpool/ROOT/snv_93       zfs       10342821376 /                   -




Cor! That was so easy I think I need to fall off my chair.

Thinking about this for a moment, I needed just 6 commands (pkgrm, pkgadd, lucreate, luupgrade, luactivate and init) and around 90 minutes to upgrade my laptop. If only I'd had this technology available to me back then.
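If you wrap those commands in a script of your own, it's worth checking that luactivate really did flag the new BE before you run init 6. Here's a minimal sketch that reads lustatus output, assuming the layout in the transcript above (three header lines, then one row per BE, with no spaces in BE names). The function names are my own invention, not part of LU.

```sh
#!/bin/sh
# Sketch: parse `lustatus` output on stdin, assuming the column layout shown
# in the transcript:
#   $1=Name $2=Is Complete $3=Active Now $4=Active On Reboot $5=Can Delete $6=Copy Status
# The first three lines (two header lines plus the dashes) are skipped.
# On Solaris you may prefer nawk or /usr/xpg4/bin/awk to the old /usr/bin/awk.

be_active_on_reboot() {
    awk 'NR > 3 && $4 == "yes" { print $1 }'
}

be_active_now() {
    awk 'NR > 3 && $3 == "yes" { print $1 }'
}

# Hypothetical usage, just before rebooting:
#   target=$(lustatus | be_active_on_reboot)
#   [ "$target" = "snv_93" ] && init 6
```

Reading columns by position keeps the sketch short, but it will break if lustatus ever changes its layout, so treat it as a convenience rather than a safety net.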


Finally, let me send a massive, massive thank-you to the install team and the ZFS team for all their hard work to get these technologies integrated and working pretty darned smoothly together.

Tuesday Jun 17, 2008

I did a BIOS upgrade on my laptop the other day, from A05 to A08. I thought nothing of it until I re-installed the beast with build 91 to get some ZFS root goodness. (Note that currently you have to use the text-mode installer to do this.)

xVM told me, none too politely, that it couldn't find any virtualization capabilities in my CPUs, so it wasn't going to be my friend any more.

I logged 6714698 (snv_91 xVM spurious failure on VT-enabled hardware) and provided what I thought was enough info (prtpicl -v and prtconf -v output). Turns out I should have also provided the output from xm info and xm dmesg. When I did, I noticed these lines:
...
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p 
...

and
(xVM) Processor #0 6:15 APIC version 20
(xVM) Processor #1 6:15 APIC version 20
(xVM) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(xVM) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(xVM) Using scheduler: SMP Credit Scheduler (credit)
(xVM) Detected 2194.558 MHz processor.
(xVM) VMX disabled by Feature Control MSR.
(xVM) CPU0: Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz stepping 0b
(xVM) Booting processor 1/1 eip 90000
(xVM) VMX disabled by Feature Control MSR.
(xVM) CPU1: Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz stepping 0b
(xVM) Total of 2 processors activated.


What the...?


A quick jump into the BIOS revealed a new option: Virtualization support. It was, of course, turned off by default. Turning it on and booting the xVM kernel gave me much nicer output from those commands:
xen_caps               : xen-3.0-x86_64 xen-3.0-x86_32p
                             hvm-3.0-x86_32 hvm-3.0-x86_32p hvm-3.0-x86_64 

and
(xVM) Processor #0 6:15 APIC version 20
(xVM) Processor #1 6:15 APIC version 20
(xVM) IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23
(xVM) Enabling APIC mode:  Flat.  Using 1 I/O APICs
(xVM) Using scheduler: SMP Credit Scheduler (credit)
(xVM) Detected 2194.555 MHz processor.
(xVM) HVM: VMX enabled
(xVM) VMX: MSR intercept bitmap enabled
(xVM) CPU0: Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz stepping 0b
(xVM) Booting processor 1/1 eip 90000
(xVM) CPU1: Intel(R) Core(TM)2 Duo CPU     T7500  @ 2.20GHz stepping 0b
(xVM) Total of 2 processors activated.
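The tell-tale difference between the two xm dmesg outputs is "VMX disabled by Feature Control MSR" versus "HVM: VMX enabled". Here's a small sketch that checks for exactly those two messages. The function name is hypothetical, and other hypervisor versions, or AMD (SVM) hardware, will log something different and come back as unknown.

```sh
#!/bin/sh
# Sketch: classify VMX status from `xm dmesg` output read on stdin, using
# only the two messages seen in the output above.
# Output goes to /dev/null rather than using grep -q, since the older
# Solaris /usr/bin/grep may not accept -q.

vmx_status() {
    log=$(cat)
    if printf '%s\n' "$log" | grep 'VMX disabled by Feature Control MSR' >/dev/null; then
        echo "disabled-by-bios"
    elif printf '%s\n' "$log" | grep 'HVM: VMX enabled' >/dev/null; then
        echo "enabled"
    else
        echo "unknown"
    fi
}

# Hypothetical usage:
#   xm dmesg | vmx_status
```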


Now as soon as I get a spare cycle or three, I can go and see about building an S10 domU for backport builds. That'll be fun!

Thursday May 22, 2008

I met a bloke called Arjen Lentz years ago. And I do mean years ago. We met via HUMBUG, a user group I joined at its inception. He was with MySQL at the time, as either the company's only employee in Australia or one of two.

He really drove home to me that it was possible to work remotely from your boss and from every other colleague, develop software, and do so effectively.

Last year he moved on from MySQL to found his own consulting company, Open Query. Training, design, tuning... anything you want to do with MySQL (and PostgreSQL for that matter) he'll be able to help you do it. As far as I know, he's doing really well. I'm not sure that I'd like the startup-on-my-own thing - makes me just a little bit nervous right now - so I've got a lot of respect for those who do. (My best mate Leighton does this as well with Bare Metal Software).

Anyway... Arjen was interviewed by one of Australia's well-known open source commentators, Sam Varghese, and that interview is now live on ITWire.

If there's truly anybody out there who still wonders how you can make money from Open Source Software, Arjen is a great example - and he only lives about 10km from me :-)

Wednesday May 14, 2008

I run two non-global zones on my workstation: one for web/dns/blog, and one for my VPN connection to Sun. Yesterday I realised that there was an internal webcast I really needed to listen to, so I started playing around with audio in the zone. First off, there wasn't any audio output: no /dev/audio* or /dev/sound/* devices at all.

After a bit of searching, I found that I should add a "set match" option to my zonecfg:


# zonecfg -z knockout
zonecfg:knockout> add device
zonecfg:knockout:device> set match=/dev/sound/*
zonecfg:knockout:device> end
zonecfg:knockout> commit
zonecfg:knockout> exit
# zoneadm -z knockout boot

But that didn't work. I was rather annoyed at that point, so I logged 6701076 (zones should not be sound proof!). Perhaps I was a bit hasty - the RE updated the bug overnight (my time) asking "Why didn't you do the obvious thing and add a 'set match=/dev/audio*' ?"

Which was the "well, duh!" moment for me. Boy do I feel like a nong:


# zoneadm -z knockout halt
# zonecfg -z knockout
zonecfg:knockout> add device
zonecfg:knockout:device> set match=/dev/audio*
zonecfg:knockout:device> end
zonecfg:knockout> commit
zonecfg:knockout> exit
# zoneadm -z knockout boot

/me looks around sheepishly.... it works :-)
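zonecfg can also read its subcommands from a file with -f, which makes this easier to repeat on another zone. Here's a sketch that emits both device matches (the /dev/sound/* one from my first attempt and the /dev/audio* one that actually fixed it) as zonecfg subcommands. The function name and the /tmp file name are hypothetical, and the halt/boot around the change is the same as in the transcript above.

```sh
#!/bin/sh
# Sketch: print zonecfg subcommands that add both audio device matches.
# The patterns are single-quoted so the shell doesn't expand the globs.

audio_zonecfg_cmds() {
    printf '%s\n' \
        'add device' \
        'set match=/dev/audio*' \
        'end' \
        'add device' \
        'set match=/dev/sound/*' \
        'end' \
        'commit'
}

# Hypothetical usage for a zone that doesn't have either match yet:
#   audio_zonecfg_cmds > /tmp/knockout-audio.cfg
#   zoneadm -z knockout halt
#   zonecfg -z knockout -f /tmp/knockout-audio.cfg
#   zoneadm -z knockout boot
```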

Tuesday Mar 18, 2008

Just read this interview with James Gosling. On the front page of the SMH, no less.

My favourite quote from the article is this:


At 52, Mr Gosling is a researcher at Sun Microsystems where his main interest is software development tools. "The reason why I stay is it's filled with a bunch of nutcases. Sun is a (relatively) small organisation, so there is a culture of tolerating craziness. It is open and understanding to risk; to an idea that might not be what people are expecting."

Ain't that the truth!

I've often said (to myself, at least!) that I work with some scarysmart people here at Sun, and it's nice to know that I'm not the only one who thinks that we're more than a little wacky.

Sunday Mar 02, 2008

Care to get your hands on a Sun workstation? If you're going to the Sydney Tech Days or the OpenSolaris day and register on OpenSolaris.org during the event, then you'll go into a draw to win one of three Ultra 20 workstations.

I don't know the exact conditions attached - so you'll have to rock up to the event to find out :)

While you're attending, come along to the OpenSolaris booth where you'll find people like .... oh, me!... hanging around and talking about pretty much anything and everything related to OpenSolaris.


Tuesday Feb 26, 2008

Got me a new laptop two weeks ago: a spiffy new Dell XPS M1530 with a dual-core Intel T7500 CPU, 4GB RAM, a 320GB SATA disk, the ultrabright 1680x1050 screen, Intel 4965abg wireless and a built-in webcam. Very nice.

Except that the built-in wired NIC is a Marvell Yukon FE+. It isn't supported by skge or by Marvell's yukonx driver, and while there's a patch for FreeBSD, it hasn't been ported to or integrated into the myk driver that Masa Murayama wrote.

I logged 6660771 (need GLDv3 driver support for Marvell Yukon FE+ in Solaris), but it's not resolved yet.

Note for the unwary: when I tried the skge and yukonx drivers, I got system panics:


update_drv -v -a -i ' "pci11ab,22e" ' [skge|yukonx]

which results in a message like this:


ERROR: yukonx0: SkGeHwInnit: Currently not supported!

So being the Bright, Resourceful, Usually Correct and Exact person that I am, I emailed Masa directly asking for help.

A number of myk test iterations later, I've now got a working myk driver. I'm not totally sure when he's going to post the updated version to his website, but the version I've found success with is 2.6.0t9. It's still missing a few things, but it seems to be able to give me 11.mumble Mbyte/sec over my 100Mbit/sec switch to blinder (u40m2), which is pretty good indeed.
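For context on that number: 100Mbit/sec divided by 8 bits per byte is 12.5 Mbyte/sec of raw line rate, before Ethernet, IP and TCP headers take their share, so 11-and-a-bit Mbyte/sec is close to what the wire can carry. A tiny sketch of the arithmetic, assuming decimal units (1 Mbyte = 10^6 bytes) on both sides; the function name is made up.

```sh
#!/bin/sh
# Sketch: what percentage of the raw line rate does a measured throughput
# represent? Assumes decimal units: 1 Mbyte = 10^6 bytes, 1 Mbit = 10^6 bits.
# The old Solaris /usr/bin/awk lacks -v; use nawk or /usr/xpg4/bin/awk there.

pct_of_line_rate() {
    mbytes_per_sec=$1   # measured throughput, e.g. 11
    link_mbits=$2       # link speed, e.g. 100
    awk -v b="$mbytes_per_sec" -v l="$link_mbits" \
        'BEGIN { printf "%.0f\n", (b * 8) / l * 100 }'
}

# pct_of_line_rate 11 100    prints 88 (percent of 12.5 Mbyte/sec)
```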

I also needed to install the OpenSound drivers, but once PSARC/2008/043 is integrated I don't think that'll be necessary.

Now I can go off to the Sun TechDays conference next week with all the bits working together.

Thank you Masa - you're a champ!

Wednesday Jan 16, 2008

Got my travel, hotel and everything else booked today so that I can go and present (for Jim Walker) on OpenSolaris Testing at the upcoming Sun Tech Days conference at the start of March.

I'm really looking forward to it and I hope to catch up with my Sydney-based colleagues and SOSUG mates as well as our friends and family. (J's able to come with me, which is a real bonus).

Friday Dec 14, 2007

One of the things I'm working on at the moment is a firmware flashing utility. We've already got one in Solaris, called fwflash(1M), and one thing that PSARC made very clear is that They don't want a proliferation of firmware flashing utilities inside Solaris. So I'm working on making fwflash(1M) pluggable.

There's a good deal of work required to make this succeed, mostly in the implementation of a plugin interface, and a specific plugin for the area that has a requirement I need to solve.

That requirement pretty much mandates the use of SCSI Enclosure Services-2 (SES2), which is all well and good except when we get to section 6.1.13.3, which deals with the Additional Element Status descriptor protocol-specific information for SAS. I'm particularly annoyed at sections 6.1.13.3.3 (SAS Expanders) and 6.1.13.3.4 (SCSI Initiator Port, SCSI Target Port, Enclosure Services Controller Electronics).

The problem is that, as far as I can see after about a week's worth of serious and detailed investigation, these sections overlap in how they deliver a data payload to you. So figuring out whether you've got a SAS expander or one of a SCSI Initiator Port, SCSI Target Port or Enclosure Services Controller Electronics is actually incredibly difficult.

I could punt and look at the size of the data payload, except that there'll be cases where Expanders vs (the rest) will coincide in terms of payload size. Or I could assume that everything I see there is an Expander - which would be wrong. Or I could do a massive amount of extra engineering in order to approximate what is probably the answer. Or I could use a lookup table to match against the devices which I really want and need to get access to. Right now, the lookup table is winning - a fact about which I am *not* happy.

So, what used to be elegant code in my first prototype is now quite ugly. I'm not happy about it: this vagueness in SES2 has kicked my schedule around and has caused sleepless nights while I tried to figure out a way forward.

The SCSI family of standards are normally very well defined, very clear, and precise. I'm not impressed with SES2, that's for sure.
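To show the shape of the lookup-table fallback described above, here's a sketch in shell. It is not the fwflash plugin code (which is C), just an illustration of the idea: an element is classified only if its vendor/product pair is in a table of devices that have been positively identified, and anything else is reported as unknown rather than guessed from the payload size. Every table entry below is a hypothetical placeholder, not a real SES device.

```sh
#!/bin/sh
# Sketch of the "lookup table" fallback: rather than inferring the element
# type from the (ambiguous) SES-2 additional element status payload, only
# trust a table of known devices.
# All vendor/product strings here are HYPOTHETICAL placeholders.

classify_sas_element() {
    vendor=$1
    product=$2
    case "${vendor}:${product}" in
        "EXAMPLEVND:EXPANDER-A"|"EXAMPLEVND:EXPANDER-B")
            echo "expander" ;;
        "EXAMPLEVND:ESC-1")
            echo "esc-electronics" ;;
        *)
            # Not in the table: refuse to guess from payload size.
            echo "unknown" ;;
    esac
}
```

The obvious cost, and the reason it's ugly, is that the table has to be maintained by hand for every device you care about.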


Friday Dec 07, 2007

In the last few days I've been kinda-sorta prevented from successfully LiveUpgrading by a freakin' annoying bug in my Ultra20-M2's system BIOS:

6636511 u20m2 bios version 1.45.1 still can't distinguish disks on the same sata channel

(It's in a closed prod/cat/subcat, sorry).

The gist of the bug is that I've got two identical Seagate 320GB disks (ST3320620AS, 320072933376 bytes) in my system, providing /, /zroot (for my zones; it's UFS), and sink, my zpool. No matter which two SATA ports I plug those two disks into, Shidokht's /sbin/biosdev utility reports either that no disks were found or (if run with -d) that the matchcount for the devices is greater than 1.

This means that /usr/lib/lu/lumkboot, which is called as part of lucreate and friends, cannot do the needful. Hence LU fails.

Yesterday I finally cracked and went off to purchase two new 320GB disks (one Western Digital, the other a Samsung) to see how deep the bug goes. This became particularly important after JanD attempted to reproduce

6628268 u20 and u20m2 + snv_75a with non-global zones refuses to allow LU (lucreate)

with a u20m2 and two identical Hitachi 250GB disks. He wasn't able to, despite having the same model disk, with the same firmware version, in each slot.

At the moment my box is having a grand old time, 1hr10 into a zpool replace:


farnarkle:jmcp $ zpool status sink
  pool: sink
 state: DEGRADED
status: One or more devices is currently being resilvered. The pool will
        continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
 scrub: resilver in progress, 67.52% done, 0h28m to go
config:

        NAME                STATE     READ WRITE CKSUM
        sink                DEGRADED     0     0     0
          mirror            DEGRADED     0     0     0
            c2t0d0s7        ONLINE       0     0     0
            replacing       DEGRADED     0     0     0
              c3t0d0s7/old  FAULTED      0     0     0  corrupted data
              c3t0d0s7      ONLINE       0     0     0

errors: No known data errors

To get to the point where zpool could replace the device, I made sure the slices on the new disk were in order, then ran zpool replace sink c3t0d0s7. That's it - it's really nifty.
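If you'd rather not keep re-running zpool status by hand, the progress figure is easy to pull out from a script. A minimal sketch, assuming the "scrub: resilver in progress, NN.NN% done" wording shown above; later ZFS releases report this on a "scan:" line with different wording, in which case the sketch prints nothing. The function name is hypothetical.

```sh
#!/bin/sh
# Sketch: print the resilver percentage from `zpool status` output on stdin.
# Relies on the "resilver in progress, NN.NN% done" wording shown above;
# prints nothing if that wording isn't present.

resilver_pct() {
    sed -n 's/.*resilver in progress, \([0-9.]*\)% done.*/\1/p'
}

# Hypothetical usage:
#   zpool status sink | resilver_pct
```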

I've got one more thing to try (swapping the cables around for c3t0 and c3t1), which I think I'll have a go at in about 40 minutes. Whatever the results of that test, it's not looking good for the bios when it's got Seagate-branded disks attached.

Friday Nov 02, 2007

Wow ... what a day! I hung out with my friends (including Robs) at the HCTS (Hardware Certification Test Suite) booth for most of the day. We were showing off the latest prototype demo of the Slim Installer LiveCD, which provided a very nice segue to the OpenSolaris Device Detection Tool. I lost track of how many people I talked to and handed out August 2007 OpenSolaris starter kits to, but it was definitely a steady stream of people.

Just after lunch we managed to trip a surge protector, so the se3320 and t2k sitting in front of me were silent. Then a bunch of students from Beijing IT Institute rocked up and wanted to know all about the t2k. So I showed them the fan trays, swung the unit around and started talking about LOM and the various interfaces.... then figured what the heck, might as well yank the cover off entirely and show them the insides.

So, a good 45 minutes and many, many questions later, after much discussion about the benefits of multi-core computing and OpenSolaris, they invited me to visit their campus! Unfortunately I can't take them up on the offer since I'm flying home today (gee, it's late as I type this!), but my Beijing colleagues will definitely visit them.

The last session of the day for us was a demo of a device driver writing utility which our team is working on as a NetBeans / Sun Studio plugin. I think it went quite well, though I did get the impression that a lot of the attendees didn't really understand what a device driver was!

Other highlights of the day were meeting Josh Berkus of PostgreSQL fame, and (finally!) meeting Jim Grisanzio in the flesh, albeit briefly.

A few photos from today:


Crowds at registration

More crowds at registration

Queueing near the PostgreSQL booth

Steve talking about xVM

Josh Berkus

Fiona talking about HCTS and HCTLive

Josh Berkus

Ryan about to start the device driver demo session

Hands on experience in the demo

Example template code from the demo

Ada explaining during the demo

Kevin explaining during the demo

Javen explaining during the demo


Wednesday Oct 31, 2007

One part of this trip which I am really pleased about (apart from getting the backport patches released of course!) is that I'm going to get to go to the Beijing Sun Tech Day tomorrow (1st November).

The event is going to be at the Beijing International Conference Centre, next to the Bird's Nest, aka the Beijing Olympic Stadium.

I'm really looking forward to hearing Jim Hughes speak. Sun "acquired" him with the StorageTek acquisition, and since I have a huge bias towards tape (from years of doing netbackup and networker support), I'm keen to see if I can have a chat with him.

We'll see how it goes. I'll have the camera, I'll take photos and I'll be hanging out at whatever OpenSolaris-related booths I can find. Come and say hi if you're there.


 

 

Today I'm ecstatic to be able to announce that the S10 patches for our backport are finally available on sunsolve.sun.com. We've delivered PSARC 2006/703 MPxIO extension for Serial Attached SCSI, and (my personal favourite) PSARC 2007/046 stmsboot(1M) extension for mpt(7D).


The patches that you need to install are:

SPARC: 125081-10
(On SPARC we also recommend installing 127747-01, due to 6466248.)

x86/x64: 125082-10
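If you want to confirm from a script that the patches listed above are in place, something like this should do. It's a sketch that assumes the usual showrev -p line format ("Patch: NNNNNN-RR Obsoletes: ..."); the function name is my own.

```sh
#!/bin/sh
# Sketch: succeed (exit 0) if patch BASE at revision MIN_REV or later appears
# in `showrev -p` output read on stdin, assuming lines of the form
#   Patch: NNNNNN-RR Obsoletes: ... Requires: ... Packages: ...
# The old Solaris /usr/bin/awk lacks -v; use nawk or /usr/xpg4/bin/awk there.

have_patch() {
    base=$1      # e.g. 125081
    min_rev=$2   # e.g. 10
    awk -v base="$base" -v min="$min_rev" '
        $1 == "Patch:" {
            split($2, p, "-")
            if (p[1] == base && p[2] + 0 >= min + 0) found = 1
        }
        END { if (found) exit 0; exit 1 }'
}

# Hypothetical usage on a SPARC box:
#   showrev -p | have_patch 125081 10 && echo "125081-10 or later is installed"
```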

 

The full list of RFEs and bugs is as follows:


6443044 add mpxio support to SAS mpt driver
6502231 stmsboot needs to support SAS devices
6544226 mpt needs mdb module

6242789 primary path comes up as standby instead online even if auto-failback is enabled
6442215 mpt.conf maybe overwritten because filetype within SUNWckr package is 'f'
6449836 stmsboot -d failed to boot if several LUNs or targets map to same partition
6510425 properties "flow_control" and "queue" in mpt.conf are useless
6525558 untagged command unlikely to be sent to HBA during heavy I/O
6541750 CAM5.1.1b2: 2530, MPT2: Vdbench bailed out after I pull ctlr-A out
6545198 build should allow architecture-dependent class action scripts
6546164 stmsboot does not remove sun4u SMF service, erroneously lists parallel SCSI HBAs
6548867 mpxio-upgrade script has fatally mis-defined variable
6550585 mpt driver has a memory leak in mpt_send_tur
6550591 mpt should not print unnecessary messages
6550849 WARNING: mpt TEST_UNIT_READY failure
6554029 mpt should get maxdevice from portfacts, not IOCfacts
6554556 stmsboot's privilege message is not quite correct
6556832 after ctlr brought online, some paths failed to come back
6560371 mpt hangs during ST2530 firmware upgrade
6566097 mpt: sd targets under mpt are not power-manageable
6566815 changes for 6502231 broke g11n in stmsboot
6531069 SCSI2 (tc_mhioctkown test cases) testing are showing UNRESOLVED results for ST2530
6546465 mpt: kernel panic due to NULL pointer reference in an error code path
6556852 mpt needs to support Sun Fire x4540 platform
6588204 mpt_check_scsi_io_error() incorrectly tests IOCStatus register
6588278 mpt driver doesn't check GUID of LUN when the path online
6591973 panic in mdi_pi_free() when remapping devices
6613189 T125082-09 and T125081-09 don't work - missing misc/scsi module from deliverables



As an interesting side note, during the development process we stumbled across

6566270 Seagate Savvio 10k1 disks do not enumerate under scsi_vhci

You'll probably see this if you have a Galaxy or T2000/T1000 system. (Unfortunately you need a service contract to view the bug report due to its category).

 

And on a personal note, I'd like to thank the other members of our team for working so well together - with Greg in Melbourne, Javen and Dolpher up in Beijing, test teams in Beijing, Menlo Park, Broomfield and San Diego and yours truly in Sydney (and now Brisbane) - we have truly been a virtual team. I reckon we've demonstrated that physical distance does not get in the way of designing, developing, testing and (most importantly) delivering good software that provides solutions for our customers.

 



Sunday Oct 07, 2007

One thing I'd forgotten about my homeblog was that I'd registered it with Technorati.

Please excuse this post as a minor distraction while I get this blog claimed as well.




Technorati Profile



For the last few weeks I've been working with Jason King (jbk on #opensolaris) to integrate his clean-room re-implementation of libdisasm for SPARC.

Today, having received RTI approval, passed all the tests and checks, and run many, many nightly builds, I was able to putback the changes to the ON gate. The heads-up message is here.

 

The putback comments are


PSARC/2007/507 Unencumbered libdisasm for Sparc
6596739 need non-encumbered libdisasm for sparc
6396410 Update dis for preferred assembly language syntax
4751282 fp conversion ops decode registers incorrectly
4767086 fmovrq registers decoded wrong
4767091 pixel compare source registers decoded wrong
4767154 Registers for fmul8x16, fmul8sux16, fmul8ulx16 decoded wrong
4658958 dis misrepresents invalid opcodes
6193412 Support for new Olympus B/C instructions needed in disassemblers

 

I expect that there will be a few followup putbacks as people find edge cases, but the great thing about this putback is that *you* can make those changes if you want. You don't have to depend on Sun doing it for you :-)

 

Thank you Jason - you've helped make OpenSolaris more open.


Technorati tags: OpenSolaris Jason King OpenSPARC UltraSPARC Solaris disassembler jbk libdisasm

This blog copyright 2008 by jmcp