Predictable
Stephen Hahn's blog at Sun Microsystems

Friday June 24, 2005

OpenSolaris BoF at JavaOne

On Monday evening, a few of us (Bryan, Mike, Adam, and myself, plus others, I hope) will be hosting an OpenSolaris birds-of-a-feather session during JavaOne. The BoF, "Java™ Technology on the OpenSolaris™ Platform", will run from 1930 to 2020 in Golden Gate B1 at the Marriott Hotel.

With Team DTrace there, the BoF should be an additional opportunity to ask questions about some of the preliminary Java–DTrace instrumentation techniques and how that might apply to your programming problems. (Of course, you should go to Adam's session for the full story first.) I'm there for a different reason. As Dave mentioned, we're planning a set of classes to make the smf(5) functionality available from the Java platform. It turns out that these might be the first public Solaris-specific Java interfaces we publish, which means we've reached a small unexplored architectural area. To start surveying this terrain, I need to ask the community a couple of questions:

Of course, you don't have to go to JavaOne to make suggestions—feel free to make them here. But if you're at the conference, please drop by and make your points in person. The BoF details once more:

Java™ Technology on the OpenSolaris™ Platform

Monday 6/27
7:30-8:20pm
Golden Gate B1
Marriott Hotel

Hope to see you there.

(2005-06-24 11:09:16.0)

Monday June 20, 2005

Manifests so far

To ward off unintentional duplication of effort, I thought I would list the service manifests submitted to date:

Service                         Description                    Author          Country
application/mysql               MySQL database server          Keith Lawson    Canada
application/oracle/[database]   Oracle database control        Joost Mulders   Netherlands [Sun]
application/oracle/[listener]   Oracle listener                Joost Mulders   Netherlands [Sun]
[application]/popfile           [POPFile mail classifier]      Iouri Goussev   Canada
[application]/xprint            X Window System Print server   Peter Eriksson  Sweden
network/ntp:openntpd            OpenNTPD daemon                Todd Carson     USA
network/[smtp:qmail]            qmail SMTP MTA                 Iouri Goussev   Canada

I hadn't heard of POPFile before, but a multiple bucket mail classifier might be the only way for me to keep up with opensolaris-discuss@opensolaris.org.

I've read each of the above manifests quickly, so I'm confident we'll be sending out mugs to each author—yes, even to you, Joost. We'll do a detailed read, send some suggestions (like my first thoughts on service names), and publish links at the contest end. Keep the manifests coming.

(2005-06-20 21:48:25.0)

Wednesday June 15, 2005

Contest progress; smf(5) discussion

I've been happily receiving submissions of manifests for various services over the past few days. With all the traffic around other news, I believe it only fair to extend the "Manifests sans frontières" contest to Canada Day (1 July), so that we can all monitor new mailing lists, follow more vigorous IRC discussion, track countless tags and feeds, and generally commune, and still write some useful code.

benr: I have an Oracle submission in hand.

One of those new forums is smf-discuss@opensolaris.org, where we'll be having development discussions about smf(5). There's big stuff and little stuff to do, so if you're not certain where to start or are looking for ways to contribute, come talk with us—our learning curve is relatively gentle.

To subscribe, send a blank message to smf-discuss-subscribe@opensolaris.org.

(2005-06-15 14:20:12.0)

Tuesday June 14, 2005

libuutil and designing for debuggability

Going into Solaris 10, I knew we were planning to develop a troupe of new daemons; we ultimately ended up with svc.startd(1M), svc.configd(1M), and a new implementation of inetd(1M). I wanted to make sure we made some progress on daemon implementation practice, and bounced some ideas around with the afternoon coffee group and also with Mike, and probably some others—I wander around a bit.

We anticipated that most of the daemons would be multithreaded, and it became apparent that they would all present large, complicated images for postmortem debugging [1]. To reduce the time to acquire familiarity with each of these daemons, we worked out three common requirements:

The problem was, of course, that there wasn't a library with such data structures in Solaris at the time [2, 3]. So we began to design libuutil, which combines a number of established utility functions used in authoring Solaris commands with these new "good" implementations of useful data structures.

The library in question was named in sympathy with libumem(3LIB)—libuutil for "userland utility functions". libuutil provides both a doubly linked list implementation and an AVL tree implementation. The list implementation is mostly located in lib/libuutil/common/uu_list.c; we'll use that to explore the debugging assistance we designed in.

The model used is that each program is likely to have multiple lists of common structures, and that there would be multiple such structures. This led us to create an interface that is expressed in terms of pools of lists. So, for each structure, you create a list pool using uu_list_pool_create(). Then, for each list of that structure, you create a list in the respective pool using uu_list_create().

That sounds complicated, but it's for a good reason: at each call to uu_list_pool_create(), we register the newly created pool on a global list, headed by the "null pool", uu_null_lpool:

uu_list_pool_t *
uu_list_pool_create(const char *name, size_t objsize,
    size_t nodeoffset, uu_compare_fn_t *compare_func, uint32_t flags)
{
	uu_list_pool_t *pp, *next, *prev;

	/* validate name, allocate storage, initialize members */

	(void) pthread_mutex_init(&pp->ulp_lock, NULL);

	pp->ulp_null_list.ul_next = &pp->ulp_null_list;
	pp->ulp_null_list.ul_prev = &pp->ulp_null_list;

	(void) pthread_mutex_lock(&uu_lpool_list_lock);
	pp->ulp_next = next = &uu_null_lpool;
	pp->ulp_prev = prev = next->ulp_prev;
	next->ulp_prev = pp;
	prev->ulp_next = pp;
	(void) pthread_mutex_unlock(&uu_lpool_list_lock);

	return (pp);
}

with similar code being used to connect each list to its pool on calls to uu_list_create().
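
The two-level registration can be tried out in isolation. What follows is a minimal, single-threaded sketch rather than libuutil itself: the toy_* names and structures are hypothetical, the locking shown in the real function is omitted, and only the ring-linking idiom is taken from the code above.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical types for illustration; these are not libuutil's definitions. */
typedef struct toy_list {
	struct toy_list *tl_next;	/* next list in the same pool */
	struct toy_list *tl_prev;
} toy_list_t;

typedef struct toy_pool {
	struct toy_pool *tp_next;	/* next pool on the global ring */
	struct toy_pool *tp_prev;
	const char *tp_name;
	toy_list_t tp_null_list;	/* sentinel heading this pool's lists */
} toy_pool_t;

/* Sentinel "null pool": an empty ring points back at itself. */
static toy_pool_t toy_null_pool =
	{ &toy_null_pool, &toy_null_pool, "null", { NULL, NULL } };

/* Create a pool and link it onto the global ring, just before the sentinel. */
toy_pool_t *
toy_pool_create(const char *name)
{
	toy_pool_t *pp, *next, *prev;

	assert(name != NULL);
	if ((pp = calloc(1, sizeof (*pp))) == NULL)
		return (NULL);
	pp->tp_name = name;
	pp->tp_null_list.tl_next = &pp->tp_null_list;
	pp->tp_null_list.tl_prev = &pp->tp_null_list;

	pp->tp_next = next = &toy_null_pool;
	pp->tp_prev = prev = next->tp_prev;
	next->tp_prev = pp;
	prev->tp_next = pp;
	return (pp);
}

/* Create a list and link it onto its pool's ring in the same way. */
toy_list_t *
toy_list_create(toy_pool_t *pp)
{
	toy_list_t *lp, *next, *prev;

	assert(pp != NULL);
	if ((lp = calloc(1, sizeof (*lp))) == NULL)
		return (NULL);
	lp->tl_next = next = &pp->tp_null_list;
	lp->tl_prev = prev = next->tl_prev;
	next->tl_prev = lp;
	prev->tl_next = lp;
	return (lp);
}

/* Count the lists registered in one pool by walking from its sentinel. */
size_t
toy_pool_nlists(const toy_pool_t *pp)
{
	size_t n = 0;
	const toy_list_t *lp;

	for (lp = pp->tp_null_list.tl_next; lp != &pp->tp_null_list;
	    lp = lp->tl_next)
		n++;
	return (n);
}

/* Print every pool with its list count; returns the number of pools seen. */
size_t
toy_dump_pools(FILE *out)
{
	size_t n = 0;
	const toy_pool_t *pp;

	for (pp = toy_null_pool.tp_next; pp != &toy_null_pool;
	    pp = pp->tp_next) {
		(void) fprintf(out, "%-12s %zu list(s)\n", pp->tp_name,
		    toy_pool_nlists(pp));
		n++;
	}
	return (n);
}
```

Calling toy_pool_create() a few times, then toy_list_create() on some of the results, and finally toy_dump_pools(stdout) prints the pools in creation order. Because everything is reachable from one well-known sentinel, a consumer that knows only that sentinel's address can find every pool and every list.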

So now we have an address space where each list pool is linked in a list, and each list in a pool is linked to a list headed at that pool. This leads us to the second part, which is to use the encoded information in a debugger. The typical debugger for kernel work in Solaris is mdb(1), the modular debugger. It's been shipping with Solaris since 5.8, and has a rich set of extensions for kernel debugging. For userland, the modules are rarer: libumem is probably the best known [4].

The source code for the libuutil module (or "dmod") is located at cmd/mdb/common/modules/libuutil/libuutil.c; the function that provides the dcmd itself, uutil_listpool, is just a wrapper around the walker for uu_list_pool_t structures. The pertinent portion is the initialization function, uutil_listpool_walk_init() [5]:

int
uutil_listpool_walk_init(mdb_walk_state_t *wsp)
{
        uu_list_pool_t null_lpool;
        uutil_listpool_walk_t *ulpw;
        GElf_Sym sym;

        bzero(&null_lpool, sizeof (uu_list_pool_t));

        if (mdb_lookup_by_obj("libuutil.so.1", "uu_null_lpool", &sym) ==
            -1) {
                mdb_warn("failed to find 'uu_null_lpool'\n");
                return (WALK_ERR);
        }

        if (mdb_vread(&null_lpool, sym.st_size, (uintptr_t)sym.st_value) ==
            -1) {
                mdb_warn("failed to read data from 'uu_null_lpool' address\n");
                return (WALK_ERR);
        }

        ulpw = mdb_alloc(sizeof (uutil_listpool_walk_t), UM_SLEEP);

        ulpw->ulpw_final = (uintptr_t)null_lpool.ulp_prev;
        ulpw->ulpw_current = (uintptr_t)null_lpool.ulp_next;
        wsp->walk_data = ulpw;

        return (WALK_NEXT);
}

which safely pulls out the value of the uu_null_lpool head element, and the relevant pieces we'll need to walk the list.
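
The module's step function is not reproduced here. As a rough sketch of how a walk could proceed from those two saved cursors, the following models the idea on an ordinary in-process ring. The node_t and walk_t types and the function names are hypothetical, the explicit empty-ring check is an assumption of this sketch, and a real dmod would fetch each node from the target with mdb_vread() rather than dereferencing pointers directly.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical node and walk state; not the types used by the real dmod. */
typedef struct node {
	struct node *n_next;
	struct node *n_prev;
	int n_id;
} node_t;

typedef struct walk {
	const node_t *w_current;	/* next node to visit */
	const node_t *w_final;		/* last node before the sentinel */
	int w_done;			/* set once w_final has been returned */
} walk_t;

/* Append np to the ring headed by the sentinel head. */
void
ring_append(node_t *head, node_t *np)
{
	assert(head != NULL && np != NULL);
	np->n_next = head;
	np->n_prev = head->n_prev;
	head->n_prev->n_next = np;
	head->n_prev = np;
}

/* Record the first and last real nodes, read from the sentinel's links. */
void
walk_init(walk_t *wp, const node_t *head)
{
	wp->w_current = head->n_next;
	wp->w_final = head->n_prev;
	wp->w_done = (head->n_next == head);	/* empty ring: nothing to visit */
}

/* Return the next node, or NULL once the final node has been handed out. */
const node_t *
walk_step(walk_t *wp)
{
	const node_t *np;

	if (wp->w_done)
		return (NULL);
	np = wp->w_current;
	if (np == wp->w_final)
		wp->w_done = 1;
	else
		wp->w_current = np->n_next;
	return (np);
}
```

Saving the final node up front means the walk can stop without ever comparing against the sentinel's own address, which is convenient when the sentinel was read into a local copy, as null_lpool is above.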

This means that, for any program linked with libuutil, we can attach with mdb(1) and display its list pools:

# mdb -p `pgrep -z global startd`
Loading modules: [ svc.startd ld.so.1 libumem.so.1 libnvpair.so.1 libsysevent.so.1 libuutil.so.1 libc.so.1 ]
> ::uu_list_pool
ADDR     NAME                            COMPARE FLAGS
080dcf08 wait_info                      00000000     D
080dce08 SUNW,libscf_datael             00000000     D
080dcd08 SUNW,libscf_iter               00000000     D
080dcc08 SUNW,libscf_transaction_entity c2b0476c     D
080dc808 dict                           0805749c     D
080dc908 timeouts                       0806ffab     D
080dca08 restarter_protocol_events      00000000     D
080dcb08 restarter_instances            0806ccd7     D
080dc708 restarter_instance_queue       00000000     D
080dc608 contract_list                  00000000     D
080dc508 graph_protocol_events          00000000     D
080dc408 graph_edges                    00000000     D
080dc308 graph_vertices                 08059844     D
>

and then drill down into constituent lists of interest.

Additional walkers are also provided, such that the lists and list nodes can be visited from the command line or programmatically. As an example, the ::vertex dcmd from the svc.startd module uses the walkers to display the various service graph nodes in a quasi-readable format [5].

So, by providing extra structured information in the library and support to consume that information in the debugger, we end up with a set of data structures that, if used, leads to more debuggable programs. More work up front for less later: welcome to OpenSolaris.

Footnotes

1. By postmortem debugging, I'm referring to the operation of debugging a failed application after its failure, from a core file or other memory image captured as soon after that failure as possible. Suitability for postmortem debugging is a standard expectation for software design in Solaris, as it reduces the time to diagnose and fix software failures. In particular, multiple engineers can debug a core file in parallel; this can be contrasted with the cost of setting up a duplicate installation and trying to reproduce the failure, let alone expecting the customer to risk further downtime experimenting with "try this" scenarios.

2. Please remember that we were making these decisions three years ago, and that this choice had to fit the then-applicable constraints on the product.

3. In contrast, the kernel has had a generic, modular hash table since 5.8/2000 (uts/common/os/modhash.c), a generic AVL tree since 5.9/2002 (common/avl/avl.c), and a generic list implementation early in 5.10/2005 (uts/common/os/list.c). Of course, the kernel has used the slab allocator (uts/common/os/kmem.c) since 5.4/1994.

4. A quick listing in /usr/lib/mdb/proc/ will display the other modules valid in the process target: beyond libumem and libuutil, there's support for the linker, libc, name-value pairs, system event, and the two main smf(5) daemons.

5. As an example, here's the output of "::vertex" on my current system, for those services related to my VNC server (and the service itself):

> ::vertex ! grep vnc
0x85d3380  212 I 1 svc:/application/vncserver:sch
0x85d3320  213 s - svc:/application/vncserver
0x85d3200  214 R - svc:/application/vncserver:sch>milestone
0x85d3260  215 R - svc:/application/vncserver:sch>autofs
0x85d32c0  216 R - svc:/application/vncserver:sch>nis
>

(2005-06-14 09:51:44.0)

Wednesday June 01, 2005

Bespoke services: application/vncserver

In honour of the "Mugs for Manifests" contest, I thought I would spin out another custom service description I wrote some months ago.

My setup for working from home, which was key during the last six months of Solaris 10, is to tunnel into Sun's network via one implementation or another of a virtual private network (VPN). In all cases, the VPN solution runs on Solaris. Although the VPN lets your system participate more or less like a regular host, I find it's easier to use VNC to remotely present an X11 display from my main workstation, muskoka. But, of course, machines running pre-production bits can fail or be rebooted or be reinstalled regularly, so I wanted the VNC server on my system to always be up: I wanted a VNC service.

What's distinct about running the VNC server is that it should run as me, with my environment, and not as root with init(1M)'s. svc.startd(1M), while it can run methods according to smf_method(5), doesn't populate the environment fully in the sense of login(1). So we will need to extract some data from the name service, which is cumbersome to perform in a shell script. We'll write our method in Perl, which implies

Tip 1: Methods need not be shell scripts.

In fact, the start method and the stop method can be totally separate commands: you could write one in Python, and one can be an executable Java .jar archive, or some even more bizarre combination.

The other trick is that, if VNC fails for some reason, I want to be aggressive about cleaning up its various leftover temporary files. For this purpose, I run the stop method with a different credential—the default of root—than the start method, which is done in our brief manifest by locating the <method_context> element on only the start method.

Tip 2: Methods need not be run with identical method contexts. Credentials, privileges, and the like may all differ from method to method.

Our manifest then looks like:

<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle type='manifest' name='export'>
  <service name='application/vncserver' type='service' version='0'>
    <single_instance/>
    <instance name='sch' enabled='true'>
      <dependency name='milestone' grouping='require_all' restart_on='none' type='service'>
        <service_fmri value='svc:/milestone/multi-user:default'/>
      </dependency>
      <dependency name='autofs' grouping='require_all' restart_on='none' type='service'>
        <service_fmri value='svc:/system/filesystem/autofs:default'/>
      </dependency>
      <dependency name='nis' grouping='require_all' restart_on='none' type='service'>
        <service_fmri value='svc:/network/nis/client:default'/>
      </dependency>
      <exec_method name='stop' type='method' exec='/home/sch/bin/vncserver_method stop' timeout_seconds='60'/>
      <exec_method name='start' type='method' exec='/home/sch/bin/vncserver_method start' timeout_seconds='300'>
        <method_context>
          <method_credential user='sch' group='staff' />
        </method_context>
      </exec_method>
    </instance>
  </service>
</service_bundle>

The dependencies above are needed if you use NFS for home directories and NIS for name services; they could be reduced for less networked setups.

And, for the method, we have a short Perl program. The complete list of environment variables in login(1) would include LOGNAME, PATH, MAIL, and TZ (timezone), and exclude my silly setting of LANG, but most of these will be set up by the shell that the VNC startup script (its analogue to .xinitrc) invokes. The various print calls are just to let the service log show a little activity, and could be removed.

#!/usr/perl5/bin/perl

require 5.8.3;
use strict;
use warnings;

use locale;

my ($name, $passwd, $uid, $gid,  $quota, $comment, $gcos, $dir, $shell,
    $expire) = getpwuid "$<";

$ENV{USER} = $name;
$ENV{HOME} = $dir;
$ENV{SHELL} = $shell;
$ENV{LANG} = "en_CA";           # Just to create havoc (i.e. expose bugs).


#
# The stop method is run as root so that it can cleanup.
#
if (defined($ARGV[0]) && $ARGV[0] eq "stop") {
        # ksh and sh specific
        print "stop method\n";
        system("$ENV{SHELL}", "-c", "/opt/csw/bin/vncserver -kill :1");

        if (-S "/tmp/.X11-unix/X1") {
                unlink("/tmp/.X11-unix/X1");
                unlink("/tmp/.X1-lock");
        }

        exit 0;
}

#
# The start method is run with the user's identity.
#
print "start method\n";

if (-f "/tmp/.X1-lock") {
        unlink("/tmp/.X1-lock");
}

if (-S "/tmp/.X11-unix/X1") {
        system("logger -p 1 application/vncserver requires " .
            "/tmp/.X11-unix/X1 be removed");
        exit 0;
}

# ksh and sh specific
{ exec "$ENV{SHELL}", "-c",
    "/opt/csw/bin/vncserver -pn -geometry 1600x1200 -depth 24 :1" };
system("logger -p 1 application/vncserver can't exec /opt/csw/bin/vncserver");
exit 1;
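
Since methods need not be shell scripts, the name-service lookup at the top of the script could equally be done from a compiled method. This is a minimal sketch in C, assuming a POSIX system; the function name populate_login_env is hypothetical, and it covers only the three variables the Perl version takes from getpwuid.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for setenv() under strict compilers */
#endif

#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Populate USER, HOME, and SHELL from the password entry for uid.
 * Returns 0 on success, or -1 if the name service has no entry for
 * uid or the environment could not be updated.
 */
int
populate_login_env(uid_t uid)
{
	struct passwd *pw = getpwuid(uid);

	if (pw == NULL)
		return (-1);

	if (setenv("USER", pw->pw_name, 1) != 0 ||
	    setenv("HOME", pw->pw_dir, 1) != 0 ||
	    setenv("SHELL", pw->pw_shell, 1) != 0)
		return (-1);

	return (0);
}
```

Passing getuid() matches the script's use of the real user ID ($<). Checking the return value before going any further lets the method fail cleanly if the name service is unreachable when it runs.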

And now we have an always-on VNC service for the regular telecommuter:

$ svcs -p vncserver
STATE          STIME    FMRI
online         13:01:01 svc:/application/vncserver:sch
	       13:01:00   100577 Xvnc
	       13:01:17   100625 xwrits
	       13:01:17   100626 ctrun
	       13:01:17   100632 xautolock
	       13:11:18   102348 xlock
$ uptime
 12:00pm  up 23 hr(s),  4 users,  load average: 0.04, 0.07, 0.07

Exercises

  1. Remove the hard coded display numbering (":1", "X1", etc.).
  2. Make the resolution, display depth, RGB encoding, and other standard options into properties.

(2005-06-01 12:00:57.0)
Stephen Hahn
Sun Microsystems
sch@sun.com
17 Network Circle
MS MPK17-301
Menlo Park CA 94025 USA