Tom Haynes

20080131 Thursday January 31, 2008
Getting a webrev generated for code review

I sponsor OpenSolaris contributions. I recently asked some contributors to provide a webrev on cr.opensolaris.org so that I could review their changes the same way we do internally, and I found there wasn't much usage documentation for it. Then the same subject came up here: Shooting my mouth off.

I was wondering what it would take to generate a webrev. So I decided on a little experiment. Note, I chose to do this without Mercurial because at least one poster couldn't get to a repository. This should be easy enough to do with 'hg' as well.

First, let's get the ON source and unpack it into two locations:

[tdh@silver ~]> mkdir os
[tdh@silver ~]> cd os
[tdh@silver ~/os]> wget http://dlc.sun.com/osol/on/downloads/current/on-src.tar.bz2
...
[tdh@silver ~/os]> bzcat on-src.tar.bz2 | tar xf -
...
[tdh@silver ~/os]> ls -la
total 132535
drwxr-xr-x   3 tdh      staff          5 Jan 30 17:25 .
drwxr-xr-x  54 tdh      staff        101 Jan 31 11:12 ..
-rw-r--r--   1 tdh      staff    67779916 Jan 30 17:43 on-src.tar.bz2
-rw-r--r--   1 tdh      staff      10420 Jan 30 17:25 README.opensolaris
drwxr-xr-x   3 tdh      staff          3 Jan 30 17:24 usr
[tdh@silver ~/os]> mkdir head
[tdh@silver ~/os]> mv usr README.opensolaris head/
[tdh@silver ~/os]> bzcat on-src.tar.bz2 | tar xf -
...
[tdh@silver ~/os]> mkdir tail
[tdh@silver ~/os]> mv usr README.opensolaris tail

And now set up our environment:

[tdh@silver ~/os]> cd tail/
[tdh@silver tail]> cp usr/src/tools/env/opensolaris.sh .
[tdh@silver tail]> chmod +w opensolaris.sh
[tdh@silver tail]> vi opensolaris.sh
[tdh@silver tail]> diff opensolaris.sh usr/src/tools/env/opensolaris.sh
43c43
< NIGHTLY_OPTIONS="-FNnaCDlmr";         export NIGHTLY_OPTIONS
---
> NIGHTLY_OPTIONS="-FNnaCDlmrt";                export NIGHTLY_OPTIONS
47c47
< GATE=tail;                    export GATE
---
> GATE=testws;                  export GATE
50c50
< CODEMGR_WS="/home/tdh/os/$GATE";                      export CODEMGR_WS
---
> CODEMGR_WS="/export/$GATE";                   export CODEMGR_WS
93c93
< STAFFER=tdh;                          export STAFFER
---
> STAFFER=nobody;                               export STAFFER

Note, I took out -t because I had earlier installed SUNWonbld:

[tdh@silver ~/os]> which webrev
/opt/onbld/bin/webrev

Okay, now actually set up the environment:

[tdh@silver tail]> bldenv -d opensolaris.sh
Build type   is  DEBUG
RELEASE      is
VERSION      is  tail
RELEASE_DATE is

The top-level 'setup' target is available to build headers and tools.

Using /bin/tcsh as shell.

Now a simple test (which, by the way, shows why I should use 'hg': chmod is not an acceptable change-tracking mechanism):

[tdh@silver tail]> cd usr/src/prototypes/
[tdh@silver prototypes]> chmod +w prototype.c
[tdh@silver prototypes]> cp prototype.c xxx
[tdh@silver prototypes]> vi prototype.c
[tdh@silver prototypes]> diff prototype.c xxx
23c23
<  * Copyright 2008 Sun Microsystems, Inc.  All rights reserved.
---
>  * Copyright 2007 Sun Microsystems, Inc.  All rights reserved.
51,55d50
<
< /*
<  * Get up, stand up, stand up for your rights!
<  * Get up, stand up, don't give up your rights!
<  */

Yes, Bob Marley is playing in the background.

[tdh@silver prototypes]> cd ../../..
[tdh@silver ~/os]> pwd
/home/tdh/os/tail
[tdh@silver tail]> vi changes.txt
[tdh@silver tail]> more !$
more changes.txt
usr/src/prototypes/prototype.c

And this does not work:

[tdh@silver tail]> webrev -p ../head -i changes.txt
Unable to determine SCM type currently in use.
For teamware: webrev looks for $CODEMGR_WS either in
              the environment or in the file list.

And I'll try this:

[tdh@silver tail]> vi changes.txt
[tdh@silver tail]> cat changes.txt
CODEMGR_WS=/home/tdh/os/tail
CODEMGR_PARENT=/home/tdh/os/head
usr/src/prototypes/prototype.c
[tdh@silver tail]> webrev changes.txt
   SCM detected: teamware
 File list from: changes.txt
      Workspace: /home/tdh/os/tail
Compare against: /home/tdh/os/head
      Output to: /home/tdh/os/tail/webrev
   Output Files:
        usr/src/prototypes/prototype.c
                 patch cdiffs udiffs wdiffs sdiffs frames ps old new
 Generating PDF: Done.
     index.html: Done.
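The file list that finally worked is just two workspace assignments followed by one workspace-relative path per line. If you make webrevs often, a tiny helper can write it for you. Here is a minimal C sketch, assuming exactly the format used in the session above; `write_file_list` is a hypothetical name, not something shipped in SUNWonbld:

```c
#include <stdio.h>

/*
 * Write a webrev file list (hypothetical helper): the CODEMGR_WS and
 * CODEMGR_PARENT assignments first, then one workspace-relative path
 * per line.  Returns 0 on success, -1 on any I/O error.
 */
int
write_file_list(const char *out, const char *ws, const char *parent,
    const char *const *files, size_t nfiles)
{
	FILE *fp = fopen(out, "w");
	size_t i;
	int rc = 0;

	if (fp == NULL)
		return (-1);
	if (fprintf(fp, "CODEMGR_WS=%s\n", ws) < 0 ||
	    fprintf(fp, "CODEMGR_PARENT=%s\n", parent) < 0)
		rc = -1;
	for (i = 0; rc == 0 && i < nfiles; i++) {
		if (fprintf(fp, "%s\n", files[i]) < 0)
			rc = -1;
	}
	if (fclose(fp) != 0)
		rc = -1;
	return (rc);
}
```

For the example above you would pass /home/tdh/os/tail, /home/tdh/os/head and usr/src/prototypes/prototype.c, then run `webrev changes.txt` as before.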

By the way, thanks to Frank Hoffman for pointing me to the man page, which you can read with:

[tdh@silver tail]> nroff -man usr/src/tools/scripts/webrev.1 | more

And we see we have output:

[tdh@silver tail]> ls -la webrev/
total 42
drwxr-xr-x   4 tdh      staff         11 Jan 31 11:35 .
drwxr-xr-x   4 tdh      staff          8 Jan 31 11:35 ..
-rw-r--r--   1 tdh      staff          2 Jan 31 11:35 TotalChangedLines
-rw-r--r--   1 tdh      staff       4983 Jan 31 11:35 ancnav.html
-rw-r--r--   1 tdh      staff       3053 Jan 31 11:35 ancnav.js
-rw-r--r--   1 tdh      staff         93 Jan 31 11:35 file.list
-rw-r--r--   1 tdh      staff       3760 Jan 31 11:35 index.html
drwxr-xr-x   4 tdh      staff          4 Jan 31 11:35 raw_files
-rw-r--r--   1 tdh      staff        489 Jan 31 11:35 tail.patch
-rw-r--r--   1 tdh      staff       2518 Jan 31 11:35 tail.pdf
drwxr-xr-x   3 tdh      staff          3 Jan 31 11:35 usr

At this point, you need to follow the instructions at cr.opensolaris.org to post your changes for review. Note that you could also tar and bzip the webrev to send out.

But I can finish this off with a simple example:

[tdh@silver tail]> scp -r webrev/ tdh@cr.opensolaris.org:example
The authenticity of host 'cr.opensolaris.org (72.5.123.19)' can't be established.
RSA key fingerprint is fa:ab:0c:a4:78:75:0b:bd:3d:eb:74:f1:b5:e1:98:a8.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'cr.opensolaris.org,72.5.123.19' (RSA) to the list of known hosts.
Enter passphrase for key '/home/tdh/.ssh/id_dsa':
...

Which means that the following link should work: http://cr.opensolaris.org/~tdh/example


Originally posted on Kool Aid Served Daily
Copyright (C) 2008, Kool Aid Served Daily

20071126 Monday November 26, 2007
OpenSolaris Developer's Reference Guide

I've been sponsoring Shawn Walker with what started as a relatively simple and bite-sized fix for CR 6397024. The usage output of /usr/lib/fs/nfs/umount was incorrect:

# /usr/lib/fs/nfs/umount
Usage: nfs umount [-o opts] {server:path | dir}

The only flag allowed is -f, so this is the desired output:

# /usr/lib/fs/nfs/umount
Usage: nfs umount [-f] {server:path | dir} 

All this takes is a small change to what is basically a printf() string, yet we've been going back and forth on it for some time. Shawn decided to add a new lint library, and his first attempt wasn't correct. I was fine with him doing that. I never told him he should pull it: he made a choice to improve lint-time checking, and that could help catch an issue later on. We want to encourage OpenSolaris developers to own their decisions.
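To make "basically a printf() string" concrete, here is a hypothetical C sketch of option parsing that accepts only -f and prints the corrected usage line on any error. It is not the actual /usr/lib/fs/nfs/umount source; `parse_umount_args` and `umount_usage` are made-up names:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical corrected usage string: -f is the only flag. */
static const char umount_usage[] =
    "Usage: nfs umount [-f] {server:path | dir}\n";

/*
 * Parse umount options.  Sets *forcep when -f is given and returns the
 * index of the single operand, or prints the usage line to stderr and
 * returns -1 for an unknown flag or a wrong number of operands.
 */
int
parse_umount_args(int argc, char *argv[], int *forcep)
{
	int c;

	*forcep = 0;
	optind = 1;		/* allow repeated calls */
	opterr = 0;		/* we print our own usage message */
	while ((c = getopt(argc, argv, "f")) != -1) {
		switch (c) {
		case 'f':
			*forcep = 1;
			break;
		default:	/* e.g. the old, unsupported -o */
			(void) fputs(umount_usage, stderr);
			return (-1);
		}
	}
	if (optind != argc - 1) {
		(void) fputs(umount_usage, stderr);
		return (-1);
	}
	return (optind);
}
```

Keeping the usage text next to the getopt() option string ("f") makes it easier to spot when the two disagree.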

The problem was that Shawn didn't understand how the build space interacted with the install space, so the new lint library was not being installed. I knew what was going on; I just didn't know how to describe it succinctly to him. (I was also under pressure to get the Mirror Mounts project out on time - try it, you will love it.) We went back and forth a couple of times, until he finally prodded me into saying it in a way he understood. Don't laugh: explaining a system without documentation, when you don't fully understand it yourself, is hard to do. I intuitively understood parts that he didn't and vice versa.

Anyway, he came back later with new changes which compiled and installed correctly, so I sent out a code review. And then I promptly stumbled across this resource: the OpenSolaris Developer's Reference Guide. I know it would have helped Shawn. I know it would have helped me in both the In-Kernel Sharetab and Mirror Mounts projects.


Originally posted on Kool Aid Served Daily
Copyright (C) 2007, Kool Aid Served Daily

20071031 Wednesday October 31, 2007
iGoLogic JBOX Java Embedded Developer's Kit powered by VIA 1 GHz Processor

I just installed Nevada b73 on a JBOX from iGoJava. BTW: You would think a company selling boxes for Java development would use something other than ASP for their online store.

The system comes with Solaris 10 installed by default, and supposedly special drivers are needed for it to run on this box. Anyway, I just installed from a DVD image. I had to use a PS2 keyboard and mouse because the installer refused to detect the USB ones. But once installed, the system came up fine.

I haven't physically opened the box up - it is a work machine and it has a little warranty voiding sticker. So I'm not sure if this is the VIA cpu or an Intel one.

We want to use these small form factor machines as clients at Connectathon and BakeAThons. The person who tried these out before me had sent some back because he couldn't get Nevada to boot on them. I think he traded the VIA models for an Intel one. And yes, I can confirm this one is Intel:

Oct 30 15:37:32 sharky unix: [ID 950921 kern.info] cpu0: x86 (GenuineIntel 6D8 family 6 model 13 step 8 clock 1500 MHz)
Oct 30 15:37:32 sharky unix: [ID 950921 kern.info] cpu0: Intel(r) Celeron(r) M processor         1.50GHz
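The "6D8" in that boot message appears to be the raw CPUID leaf 1 signature, from which the family, model and step values are decoded. A minimal C sketch of that decoding, following Intel's documented bit layout (stepping in bits 3:0, model in 7:4, family in 11:8, plus the extended model and family fields); it is an illustration, not the Solaris cpuid code:

```c
#include <stdint.h>

struct cpu_sig {
	unsigned family;
	unsigned model;
	unsigned stepping;
};

/*
 * Decode the processor signature returned in EAX by CPUID leaf 1.
 * The extended family is added only when the base family is 0xF; the
 * extended model is prepended when the base family is 0x6 or 0xF.
 */
struct cpu_sig
decode_cpu_sig(uint32_t eax)
{
	struct cpu_sig s;
	unsigned base_family = (eax >> 8) & 0xf;
	unsigned base_model = (eax >> 4) & 0xf;

	s.stepping = eax & 0xf;
	s.family = base_family;
	if (base_family == 0xf)
		s.family += (eax >> 20) & 0xff;
	s.model = base_model;
	if (base_family == 0x6 || base_family == 0xf)
		s.model += ((eax >> 16) & 0xf) << 4;
	return (s);
}
```

Fed 0x6D8, it yields family 6, model 13, step 8, matching the log line above.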

The system has a fan and it is loud. I wouldn't want it on my desk. So I must have the JBOX-ES, even though it has the orange faceplate. If this were something I had bought, I would be looking at replacing the system fans.

It has 2 network ports (rtls0 and rge0) and 3 serial ports. I like the power brick and the form factor.


20070808 Wednesday August 08, 2007
Just became an OpenSolaris Sponsor

I met the criteria for Becoming a sponsor last month, but I never got around to getting it officially approved. I was hip-deep in writing an article for SDN, The Management of NFS Performance With Solaris ZFS, with Doug McCallum. This is part of a concerted effort by the Storage Community to get content out there.

I'm also really busy with the Mirror Mount project (which is part of the OpenSolaris Project: NFSv4 namespace extensions). Rob Thurlow's Mirror-mount and referrals demo is a good melding of getting content out there and explaining Mirror Mounts.

Anyway, I saw Ram asking for a sponsor for 6428435 zfs rename failure can leave file systems unmounted. It sounded a lot like another I had worked on, and no one was replying. I knew I could be an intern sponsor, so I piped up that I could help. Well then Mark Musante sent me email stating that he wanted to work on it, but needed a sponsor he could intern with. And that led to me applying to be a sponsor.

Helping out sounds simple, but in practice it takes time to track down all of the details. I can't tell you how stressed I am between the end of a project and some additional bugs I own. Sometimes contributing to OpenSolaris is not about writing code, but about enabling others to do so. You have to find the time, even when pressed for it, to make sure others feel there are no impediments to contributing.


20070717 Tuesday July 17, 2007
mdb: failed to add breakpoint: operation not supported by target

This has been bugging me all morning:

[console] root@burr ( e2:64 ) > mdb -kw
Loading modules: [ unix genunix specfs dtrace cpu.AuthenticAMD.15 uppc pcplusmp scsi_vhci ufs mpt ip hook neti sctp arp usba qlc fctl nca lofs zfs random fcp nfs cpc ptm sppp ]
> nfs4_trigger_mount:b 
mdb: failed to add breakpoint: operation not supported by target
> :c
mdb: failed to continue target: operation not supported by target
> $q

My first step was a Google search, with no luck finding a solution. Other people had asked for help with it, but no one had addressed it directly. I then set the system to boot with kmdb (see eeprom hosed on an x86), but had no luck there either.

Okay, my next step was to make sure that I was really, really dropping to the kernel. I was suspicious that I wasn't getting there since '$q' was not attempting to reboot the box. In this case, the escape character was: '^]'.

[console] root@burr ( e2:64 ) >   
telnet> send brk

Welcome to kmdb
kmdb: unable to determine terminal type: assuming `vt100'
Loaded modules: [ scsi_vhci crypto cpc uppc i hook lofs genunix ip logindmux usba specfs pcplusmp md nfs 
random sctp arp cpu.AuthenticAMD.15 ]
[3]> nfs4_trigger_mount:b 
[3]> :c

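And we have liftoff! Lesson learned: mdb -k (even with -w) can only examine and modify a running kernel; run-control commands such as :b and :c need kmdb.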


20070606 Wednesday June 06, 2007
Allowing remote sendmail connections

I'm in the process of creating a gate and some clones for an internal project. Part of the gate maintenance requires mail to be sent to a specific host so that a checkin kicks off some sanity checks. I've got a working example on another host, but I can't get mine to work.

First we need to make sure that sendmail is running on the target box:

> svcs -a | grep smtp
online         Jun_04   svc:/network/smtp:sendmail
> netstat -a | grep smtp
localhost.smtp             *.*                0      0 49152      0 LISTEN

Okay, it appears to be up. Can we confirm that from a remote host?

> telnet kanigix 25
Trying 192.168.2.XXX...
telnet: connect to address 192.168.2.XXX: Connection refused

> sudo nmap kanigix

Starting Nmap 4.11 ( http://www.insecure.org/nmap/ ) at 2007-06-06 10:51 CDT
Interesting ports on kanigix.XXX (192.168.2.XXX):
Not shown: 1676 closed ports
PORT     STATE SERVICE
22/tcp   open  ssh
111/tcp  open  rpcbind
2049/tcp open  nfs
4045/tcp open  lockd
MAC Address: 00:03:47:B1:6E:45 (Intel)

Nmap finished: 1 IP address (1 host up) scanned in 45.244 seconds
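The telnet and nmap checks above boil down to attempting a TCP connection. A minimal C sketch of that probe, assuming a POSIX sockets environment; `tcp_port_open` is a hypothetical helper, and since it sets no connect timeout, a firewalled port that silently drops packets can make it block for a while:

```c
#define _POSIX_C_SOURCE 200809L
#include <netdb.h>
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Attempt a TCP connection to host:port, the same check as
 * "telnet kanigix 25".  Returns 1 if any resolved address accepted the
 * connection, 0 if every attempt failed (e.g. connection refused), and
 * -1 if the name could not be resolved.
 */
int
tcp_port_open(const char *host, const char *port)
{
	struct addrinfo hints, *res, *ai;
	int reachable = 0;

	memset(&hints, 0, sizeof (hints));
	hints.ai_family = AF_UNSPEC;		/* IPv4 or IPv6 */
	hints.ai_socktype = SOCK_STREAM;	/* TCP */
	if (getaddrinfo(host, port, &hints, &res) != 0)
		return (-1);
	for (ai = res; ai != NULL && !reachable; ai = ai->ai_next) {
		int fd = socket(ai->ai_family, ai->ai_socktype,
		    ai->ai_protocol);

		if (fd < 0)
			continue;
		if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0)
			reachable = 1;
		(void) close(fd);
	}
	freeaddrinfo(res);
	return (reachable);
}
```

Run from a remote client before the fix, `tcp_port_open("kanigix", "25")` would be expected to return 0, just like the refused telnet.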

Okay, internally it appears up and externally it appears down. A Google search didn't turn up anything interesting either, so it's time to check the sendmail man page:

     On an  unmodified  system,  access  to  sendmail  by  remote
     clients  is enabled and disabled through the service manage-
     ment facility (see smf(5)).  In particular, remote access is
     determined by the value of the local_only SMF property:

       svc:/network/smtp:sendmail/config/local_only = true


     A setting of true, as above, disallows remote access;  false
     allows remote access. The default value is true.

     The following example shows the  sequence  of  SMF  commands
     used to enable sendmail to allow access to remote systems:

       # svccfg -s svc:/network/smtp:sendmail setprop config/local_only = false
       # svcadm refresh svc:/network/smtp:sendmail

Okay, what is the current value of the property?

> svccfg -s svc:/network/smtp:sendmail listprop config/local_only
config/local_only  boolean  true

Time to correct it:

> sudo svccfg -s svc:/network/smtp:sendmail setprop config/local_only = false
> sudo svcadm refresh svc:/network/smtp:sendmail

And what do we see now? Nothing changed. Try this:

> sudo svcadm restart svc:/network/smtp:sendmail

And we see some changes start:

>  netstat -a | grep smtp
      *.smtp               *.*                0      0 49152      0 LISTEN
      *.smtp               *.*                0      0 49152      0 LISTEN
      *.smtp                            *.*                             0      0 49152      0 LISTEN   

And from the client:

> telnet kanigix 25
Trying 192.168.2.XXX...
Connected to kanigix.
Escape character is '^]'.
220 kanigix.XXX ESMTP Sendmail 8.14.1+Sun/8.14.1; Wed, 6 Jun 2007 11:12:54 -0500 (CDT)
^]
telnet> q
Connection closed.

Okay, that wasn't intuitive. And neither was the control for remote access being buried in a property. I will say that the man page was helpful.
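Judging from the two netstat listings, what local_only changes is the address sendmail listens on: localhost.smtp (loopback only) before, *.smtp (every address) after. Connections arriving on the host's network address find nothing listening, hence the "Connection refused". A minimal C sketch of the two kinds of listener, assuming IPv4 and POSIX sockets; `make_listener` is a hypothetical helper, not sendmail's code:

```c
#define _POSIX_C_SOURCE 200809L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Open an IPv4 TCP listener on the given port.  With local_only set,
 * bind to 127.0.0.1 so only local clients can connect; otherwise bind
 * to INADDR_ANY so every interface accepts connections.
 * Returns the listening descriptor, or -1 on error.
 */
int
make_listener(int local_only, uint16_t port)
{
	struct sockaddr_in sin;
	int fd = socket(AF_INET, SOCK_STREAM, 0);

	if (fd < 0)
		return (-1);
	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_port = htons(port);
	sin.sin_addr.s_addr = htonl(local_only ?
	    INADDR_LOOPBACK : INADDR_ANY);
	if (bind(fd, (struct sockaddr *)&sin, sizeof (sin)) != 0 ||
	    listen(fd, 5) != 0) {
		(void) close(fd);
		return (-1);
	}
	return (fd);
}
```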


20070510 Thursday May 10, 2007
I just opened a can of worms

It was the tail end of an internal OpenSolaris Storage Developer Program meeting. My office window was open, the AC wasn't on, and it was very muggy. In short, I wasn't paying full attention. They started talking about asking for something on one of the discussion forums, and I still wasn't paying close attention. Jeff was volunteered to do the post. He pointed out that another Jeff, say Bonwick, would have more credibility.

Now it may have been muggy, but I never pass up a chance to tease someone. I told Jeff this was how he could start to make a name for himself; one day he could be just as famous as Bonwick. Jeff replied that he had read my recent blog entry on how to make a name, What does it mean to contribute to OpenSolaris Communities and Projects?, and that I would be perfect to make the posting. I said I'd do it, still fuzzy on just what I had agreed to.

And yes, I understand Jeff is going to read this entry, that my management is going to see I was fuzzy in a meeting, etc. :> And I fully expect Jeff to tease me about this entry in the next meeting.

I quickly worked out that I had agreed to ask on storage-discuss what the community's wishlist was for new projects. You can read (and reply to) it here: Thread: OpenSolaris Storage Wishlist.

The title of this entry doesn't refer to the above part of the entry. I'm perfectly okay with people knowing I'm not perfect. No, the title refers to the fact that I had a hard time writing the request because I realized I was effectively doing product marketing. What else do you call it when you ask your customers what new features they would like in your next release?

I never signed up to do marketing. I'm okay going out to meet customers on an escalation, but I'm not trained for this type of interaction.

But this is exactly the way Sun has to change if we really want OpenSolaris to be, well, open. It isn't enough to just dump code out there and declare that we are open source. We have to build working communities, and we are hampered by the fact that the majority of our developer population is inside Sun. That rubs open source purists the wrong way.

The reality is that we need the communities to evolve and we need external contributions. My request describes a very internal process that development goes through periodically. We typically may not directly engage marketing in the low-level roadmap discussions. That isn't to say that marketing doesn't have a voice in our final roadmap - that would be suicidal. The final directional decisions are fully made with the marketplace in mind.

But the request is also an effort by Sun to change the process. We could still meet internally and push things over the wall, but we don't want to; we want to engage instead. I don't think we really know how to effectively engage the community. You can see that in my thinking that it is product marketing. It really isn't. Instead it is turning some of our customers into contributors. But instead of contributing code, they contribute ideas and ways that development can help them deliver solutions to their customers.

And finally, we need to help developers interact with external people through forums, blogs, and email. I may not have been trained for this type of customer/contributor/community member interaction, but I need to do it. The fact that external people can converse with internal developers is very empowering to the community.


20070504 Friday May 04, 2007
Honeycomb project goes live!

After some hard work, the OpenSolaris Storage Community has opened up the OpenSolaris Project: HoneyComb Fixed Content Storage.


What does it mean to contribute to OpenSolaris Communities and Projects?

I work for the NFS group at Sun. That means when things really align, I can help out with the NFS Community in OpenSolaris. That community fits under filesystems, network storage, storage, and appliances. And some of those align with my own interests. When I joined Sun, I told my manager I was going to contribute to OpenSolaris and I was going to blog.

I thought contributing to OpenSolaris would be in the form of coding. I couldn't have been more wrong, and that was why I got off to a slow start. I found that I was blogging away merrily about how to use OpenSolaris and how it differed from Linux. That experience got me invited as a contributor to the Immigrants Community. I didn't realize that type of activity was just what I needed to be doing for OpenSolaris.

When I finally realized that, I started coming out of my shell with respect to OpenSolaris. I helped push the NFS Server in non-Global Zones project out into the community (and I still need to get the requirements doc started).

I was also sitting in a staff meeting when my manager 3 levels up (Bev Crair) said, "We should get Richard McDougall to blog about Mirror Mounts." I sent her email pointing out that I had already done that twice in the last couple of months (see Some fun with NFSv4 and automount across a ssh tunnel and How NFSv4 should work when crossing filesystems). She wasn't aware that I was a blogger-boy. Now she knows it and I'm on an internal bloggers list for Software.

After both of these events, Lynn Rohrer contacted me and asked if I wanted to be on an internal committee which met on the OpenSolaris Storage Community. It actually does more than that: we talk about Storage on BigAdmin and the Sun Developer Network (SDN). Before I knew what was happening, I was heavily involved in rewriting the Storage Community web page and had been tricked into writing two articles for the SDN.

And I got volunteered to help open up the Honeycomb project. Again, it is site management and not coding.

Am I bragging here? Kinda. But what I'm really trying to point out is that you don't have to be a coder to help drive a Community or a Project. There are plenty of non-coding tasks that need to be done to make this open source experience succeed.

And when you look at the projects and communities I've mentioned, please realize that there are many people involved across the board in getting great content up there. For the Honeycomb project, I did a lot of cut-and-paste, some editing, some research in tracking down people and blogs, etc. I was mainly glue and free cycles.

It is funny, when I first started out in OpenSolaris, I would hound away at people, asking them to put me on the contributor's list for a community or a project. I had one person tell me, "Grasshopper, when you have contributed enough, we will make you a contributor." Now that people know I will help out, I have to almost fight not to be on leader's lists for things.

The moral is that you have something to contribute to the success of OpenSolaris. It may be coding, but then again there is a lot more than code that needs to be done to make this a vibrant open source community. Just help out where you can and it will be appreciated.


20070503 Thursday May 03, 2007
The In-Kernel Sharetab bits live, despite untimely rumors of their death by backout

I was reading a blog entry by James McPherson on Zones, automounter and nfs fun stuff when he indirectly blamed me for some of his zone issues:

Incidentally, the feature which I think is the cause of my problem is the In-kernel sharetab integration .... which was backed out.

It wasn't backed out - but I understand his confusion. After I checked it in, a bug was found in the BFU process by which '/etc/dfs/sharetab' was not being constructed properly. As build 62 was being respun for other problems, we decided to take the simple fix for this into build 62. But Helen Chao, our really great test engineer, also discovered what would become 6542714 user maybe able to see shares from global zones after sharetab. I think James put these two together to decide the iks bits had been backed out.

But how do I prove that they are still in the kernel?

Why, I point you to the source sitting in the tree on cvs.opensolaris.org: /onnv/onnv-gate/usr/src/uts/common/fs/sharefs. There is more to find from that putback, like the header files, but the point is that the code is in there and available for testing by people running current versions of OpenSolaris.
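The BFU bug mentioned above was about /etc/dfs/sharetab not being constructed properly, and a quick format check is handy when testing that kind of thing. A minimal C sketch, assuming the five-field layout described in sharetab(4) (pathname, resource, fstype, options, description); `parse_share_line` and `struct share_entry` are hypothetical names, not the libshare or sharefs code:

```c
#include <stdio.h>
#include <string.h>

/* Field buffer size; the %1023s widths below must stay in sync with it. */
#define	SHARE_FIELD_MAX	1024

/* Hypothetical parsed form of one sharetab line. */
struct share_entry {
	char path[SHARE_FIELD_MAX];
	char resource[SHARE_FIELD_MAX];
	char fstype[SHARE_FIELD_MAX];
	char options[SHARE_FIELD_MAX];
	char description[SHARE_FIELD_MAX];
};

/*
 * Parse one line, assuming pathname, resource, fstype and options are
 * whitespace-separated, followed by an optional description (the rest
 * of the line).  Returns 0 on success, -1 if a required field is missing.
 */
int
parse_share_line(const char *line, struct share_entry *se)
{
	int used = 0;	/* offset of the description, set by %n */
	size_t n;

	se->description[0] = '\0';
	if (sscanf(line, "%1023s %1023s %1023s %1023s %n", se->path,
	    se->resource, se->fstype, se->options, &used) != 4)
		return (-1);
	if (used > 0)
		(void) snprintf(se->description, sizeof (se->description),
		    "%s", line + used);
	/* Drop the trailing newline and any trailing blanks. */
	n = strlen(se->description);
	while (n > 0 && (se->description[n - 1] == '\n' ||
	    se->description[n - 1] == ' ' || se->description[n - 1] == '\t'))
		se->description[--n] = '\0';
	return (0);
}
```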


20070412 Thursday April 12, 2007
Great overview of what is going on in OpenSolaris with respect to Storage

Sun sprinkles FISH food for storage guppies: Open sourcing NetApp killer is an article on The Register which details what has just been released under the auspices of Storage in OpenSolaris.

I love it as a roadmap of where we have recently been - I'm not so hot on it as a conspiracy theory. But then again, I do work for Sun. :-)

I must say my favorite part is about:

YANFS (formerly known as WebNFS) - Java implementation of the client side of the XDR, RPC, NFSv2, and NFSv3 protocols

I happen to know that this code was released not as part of some master plan, but because Spencer Shepler likes working on that code as a side project late at night. Don't forget that the reason some Sun engineers are engaged in OpenSolaris is that they like working on non-work related projects. He was just able to turn an interest into something for the group.

Anyway, read the article - it is pretty fun to see Storage getting some airtime.


20070410 Tuesday April 10, 2007
We've updated the OpenSolaris Storage Community

I'm associated with several OpenSolaris Communities and I've come to learn that they are managed totally haphazardly (which is not to say that there is a lack of actual thought going on). One of the current approaches is to just have a consistent look and feel for all of the pages - it is quite clear that they are FDBD (for developers, by developers).

So I was quite surprised when Lynn Rohrer invited me to join the internal OpenSolaris Storage Developer Program meeting. I had no clue that a Community might have weekly meetings. The aim of this group is to tie together the Storage Community across OpenSolaris (community and discussion groups), blogs, BigAdmin, and the Sun Developer Network. In part it is marketing (at the grass roots), but for the most part it is about how we can deliver information to our primary consumers. And there are two sets of those: developers and administrators.

One of the first orders of business I became involved in was updating the OpenSolaris Community: Storage entry page. It was basically a project page for Network Storage (NWS). We wanted to create a project page for NWS, which was odd since it had been an OpenSolaris project for over a year. We went to the powers that be and they told us two things:

  1. We had to do a normal request for project.
  2. Since the elections had just ended, we might find out there were discussions about what was a project and what was a community. In other words, we could very well find out we were not active enough to be a community.

We didn't want this - we knew we were a strong community and were just lacking a presence on OpenSolaris. Also, we have a lot of sub-communities, like NFS and ZFS, which are very active. We still needed an umbrella organization to provide our story.

But most of this wasn't in our minds. No, we were busy branching off NWS to a real project page and getting the Storage community page in order. The new content that we added included:

So we updated the page and discovered that while we told you what we covered, we didn't tell you why we covered it or why Storage was important. I.e., why do we still need an umbrella organization?

So we did that, we told our story. And that was when we discovered that we were FDBD. I looked at the page and it was starkly functional. Non-developers in the group looked at the page and it lacked appeal. In large part, having the list of endorsed projects at the top really took away that potential splash area on the first chunk of real estate.

So we got rid of it. We realized that the sub-communities could endorse the projects for now and that we could place our list in tabular form at the bottom of the page. (It turns out that the endorsed projects links are automatically created and always appear at the start of the page.)

So the new page is up there and it is brilliant. We unveiled it to coincide with the launch of Free and Open Storage Software. Go and look at the OpenSolaris Community: Storage - I'm proud that I helped turn it from an inactive part of OpenSolaris to an active part of the larger community. And I'm glad that while it looks pretty - it is still for developers, by developers.


20070406 Friday April 06, 2007
Getting a locked up Gnome session

So I installed snv_61 on my home desktop. The first thing to go was my rge0, which stopped working. My nge0 and nge1 are still not working, so I slapped an iprb0 in place of the rge0. Right after that, I started getting a blank screen in Gnome (bottom menu bar but no screen icons). The only clue I have is: "The application gnome-volume-manager has crashed."

Except I get that all the time. Okay, what do I see in dmesg:

Apr  5 23:26:33 kanigix svc.startd[7]: [ID 652011 daemon.warning] svc:/system/hal:default: Method "/lib/svc/method/svc-hal start" failed with exit status 95.
Apr  5 23:26:33 kanigix svc.startd[7]: [ID 748625 daemon.error] system/hal:default failed fatally: transitioned to maintenance (see 'svcs -xv' for details)

It turns out that the 95 just means a fatal error ($SMF_EXIT_ERR_FATAL). Since svc-hal is a script, let's see what is causing it to whine:

[tdh@kanigix ~]> svcs -xv
svc:/system/hal:default (Hardware Abstraction Layer daemon)
 State: maintenance since Thu Apr 05 23:26:33 2007
Reason: Start method exited with $SMF_EXIT_ERR_FATAL.
   See: http://sun.com/msg/SMF-8000-KS
   See: man -M /usr/man -s 1M hal
   See: /var/svc/log/system-hal:default.log
Impact: 1 dependent service is not running:
        svc:/system/filesystem/rmvolmgr:default
[tdh@kanigix ~]> tail /var/svc/log/system-hal:default.log
[ Apr  5 23:22:20 Executing start method ("/lib/svc/method/svc-hal start") ]
hal failed to start: error 2
[ Apr  5 23:26:33 Method "start" exited with status 95 ]
[tdh@kanigix ~]> file /usr/lib/hal/hald
/usr/lib/hal/hald:      ELF 32-bit LSB executable 80386 Version 1 [FPU], dynamically linked, not stripped, no debugging information available

Okay, we need to find out why hald is returning 2, which should be ENOENT. It looks like parent_wait_for_child() is the code killing me. Why? Either the 250s timeout has expired or something else is going on. I'm getting an answer pretty quickly, so it isn't the timeout.
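Telling a fast failure apart from a timeout is the crux here. A minimal C sketch of the general pattern, in which a parent polls waitpid() with WNOHANG until the child exits or a deadline passes. It is not hald's parent_wait_for_child(); `wait_child_timeout` and the 10 ms poll interval are illustrative assumptions:

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/*
 * Wait up to timeout_sec seconds for child pid to exit, polling every
 * 10 ms.  Returns the child's exit status (0-255), -1 on timeout, or
 * -2 if waitpid() fails or the child was killed by a signal.
 */
int
wait_child_timeout(pid_t pid, int timeout_sec)
{
	struct timespec tick = { 0, 10 * 1000 * 1000 };	/* 10 ms */
	long polls = (long)timeout_sec * 100;
	int status;

	while (polls-- > 0) {
		pid_t r = waitpid(pid, &status, WNOHANG);

		if (r == pid)
			return (WIFEXITED(status) ?
			    WEXITSTATUS(status) : -2);
		if (r < 0 && errno != EINTR)
			return (-2);
		(void) nanosleep(&tick, NULL);
	}
	return (-1);	/* still running: timed out */
}
```

A child that exits right away with status 2 comes back as 2 (ENOENT is 2 on both Solaris and Linux), not -1, which is what a quick failure looks like as opposed to the 250s timeout.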

One of my SATA drives is not showing up when I boot - the bios is no longer detecting it. Could I have lost a drive? WinXP shows 6 disks - my boot, 4 internal SATA, and the external SATA. It is reporting the external SATA drive as USB. It was being reported as an internal drive earlier.

I pulled the drive - I had been using it for a dump device. And now I have a working system. Still get the warning message about the gnome-volume-manager!


Originally posted on Kool Aid Served Daily
Copyright (C) 2007, Kool Aid Served Daily

20070404 Wednesday April 04, 2007
Putback of In-Kernel Sharetab

Well, I finally putback the In-Kernel Sharetab (iks) bits. They will show up in build 62.

I was really impressed by the QA effort undertaken by Helen Chao and John Cui. They added a couple of weeks' worth of work by shaking out some really nasty corner cases. And I liked that when they asked how something should behave, they understood my answer of, "I don't know, you tell me."

What I meant, of course, was: go get Solaris 10 bits and tell me how the sharetab implementation currently works. Then tell me how my new code differs from the old code. And finally, create a test case so that a regression never creeps in. I've worked with other QA engineers who couldn't do this part of the job. They didn't understand that QA isn't just testing what a developer changed; it also means understanding how customers use our product and being their advocate to development.

It is really easy to chug away at code and find panics. It is much harder to find behavioral changes. That requires you to understand the baseline cases.

I also had to battle my way past my CRT advocate, Spencer Shepler. There were several panics on test systems in the two weeks before I could formally ask to putback. In one case, I was able to show that the panic occurred in code that did not have any iks changes. That one cost me a week of time. More frustrating was the second panic, a memory exhaustion. We had two cores, one taken after the iks code had run and one from before the iks code had started to run, and they had the same signature. I argued that since one occurred before the iks code had been loaded, the problem wasn't with that code. Oh, and this only happened on one machine in the whole company.

My position was: I know it isn't the iks code, but I can't formally prove it. I lost another week to this issue, and I learned a lot about looking at core files. One of the other key ways Helen was able to help me out was that she realized a testing statement she had made was incorrect. That mistake had us looking in the wrong area of the code. She also thought of a really sweet way to help isolate whether the problem was in the iks code or already present in the kernel. At all times, I felt she was fully engaged in understanding the problem and finding the root cause. I've had many QA engineers who felt that once they filed a bug report, they could move on to the next problem.

And then Bill Baker had to ruin it all by identifying a matching ZFS bug and finding its signature in the cores we had on hand. Bill is another developer (and much more) who took his own time to comb through the cores to help determine whether the problem was iks or something else. He knew how frustrated I was and was able to correlate another bug report in his inbox with what I was seeing. A quick deployment of that fix showed that it resolved the issue we were hitting.

The point of all of this is that Sun is deadly serious about the quality of its kernel. I don't think everyone gets that when they look at OpenSolaris and compare its development model with the way Linux proceeds. Sun kernel engineers (developers and QA) do not want to release control of the quality of the code. They take pride not only in their work, but in the processes that protect their investments.

This serious commitment to quality was what brought me to Sun. That and the chance to help develop OpenSolaris.


Originally posted on Kool Aid Served Daily
Copyright (C) 2007, Kool Aid Served Daily

20070318 Sunday March 18, 2007
A design choice with the In Kernel Sharetab

So, in "In Kernel Sharetab: Have a single file psuedo-fs working!", I wrote about how I was close to having that work done. Peter Tribble asked in the comments whether I would set the 'ignore' option so that normal df output would not show the sharetab. I replied yes, since it sounded reasonable.

Well, I was wrong. :-) It turns out that pseudo-filesystems like mntfs and objfs are loaded directly in vfs.c, and as such, they have no knowledge of /etc/vfstab. So they gleefully ignore 'ignore'.
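For context, the 'ignore' option acts as a filter in tools that read the mount table. By default, df skips any entry whose option list contains ignore, and df -a shows it anyway.

The Python sketch below illustrates that filtering. It assumes the Solaris /etc/mnttab layout of five tab-separated fields: special, mount point, fstype, options, and mount time. visible_mounts(), its show_all flag, and the sample entries are hypothetical illustrations, not df's actual source. The sharetab entry carrying ignore is what Peter's request would have looked like, not what shipped.

```python
# Field order in a Solaris /etc/mnttab line (tab-separated).
FIELDS = ("special", "mount_point", "fstype", "options", "time")


def parse_mnttab_line(line):
    """Split one mnttab line into a dict keyed by FIELDS."""
    values = line.rstrip("\n").split("\t")
    if len(values) != len(FIELDS):
        raise ValueError("malformed mnttab line: " + repr(line))
    return dict(zip(FIELDS, values))


def visible_mounts(lines, show_all=False):
    """Mount points a df-style listing would show.

    Entries whose comma-separated options include "ignore" are hidden
    unless show_all is set (the analogue of df -a).
    """
    shown = []
    for line in lines:
        entry = parse_mnttab_line(line)
        if show_all or "ignore" not in entry["options"].split(","):
            shown.append(entry["mount_point"])
    return shown


if __name__ == "__main__":
    # Illustrative entries only; device numbers and times are made up.
    sample = [
        "/dev/dsk/c0t0d0s0\t/\tufs\trw,intr,largefiles,logging\t1175829993",
        "mnttab\t/etc/mnttab\tmntfs\tdev=4c00001\t1175829993",
        "sharefs\t/etc/dfs/sharetab\tsharefs\tdev=4d40001,ignore\t1175829993",
    ]
    print(visible_mounts(sample))                 # ['/', '/etc/mnttab']
    print(visible_mounts(sample, show_all=True))  # adds '/etc/dfs/sharetab'
```

The mntfs line shows the situation described here: nothing sets 'ignore' on it, so it always appears. Because options are split on commas and compared whole, an option that merely contains the substring "ignore" would not hide an entry.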

I looked at changing the interface to vfs_mountfs(), but I hadn't run that past a design review. When I did so informally, the decision was that all of the pseudo-filesystems had to work the same way, and changing the others would require a PSARC review, since we would be changing the expectation that these filesystems appear in a published public interface.

So I decided to ship the code without setting 'ignore' and to work in the background on getting a new review approved. It turns out we have a collection of bugs requesting this change for the other filesystems.


Originally posted on Kool Aid Served Daily
Copyright (C) 2007, Kool Aid Served Daily
