Friday June 24, 2005
OpenSolaris BoF at JavaOne
On Monday evening, a few of us, Bryan, Mike, Adam, and myself (plus others, I hope), will be hosting an OpenSolaris birds-of-a-feather session during JavaOne. The BoF, "Java Technology on the OpenSolaris Platform", will be from 1930 - 2020 in Golden Gate B1 at the Marriott Hotel.
With Team DTrace there, the BoF should be an additional opportunity to
ask questions about some of the preliminary Java–DTrace
instrumentation techniques and how that might apply to your programming
problems. (Of course, you should go to Adam's
session for the full story first.) I'm there for a different reason.
As Dave
mentioned, we're planning a set of classes to make the smf(5)
functionality available from the Java platform. It turns out that these
might be the first public Solaris-specific Java interfaces we publish,
which means we've reached a small unexplored architectural area. To
start surveying this terrain, I need to ask the community a couple of
questions:
- What else can we do to make Java a first class systems programming language for Solaris?
- For instance, what other classes, defaults, or conventions should we offer or modify to make Solaris-based Java development easier/more capable/more fabulous?
Of course, you don't have to go to JavaOne to make suggestions—feel free to make them here. But if you're at the conference, please drop by and make your points in person. The BoF details once more:
Java Technology on the OpenSolaris Platform
Monday 6/27
7:30-8:20pm
Golden Gate B1
Marriott Hotel
Hope to see you there.
[ T: OpenSolaris Solaris smf JavaOne2005 ]
(2005-06-24 11:09:16.0) Permalink Comments [1]
Manifests so far
To ward off unintentional duplication of effort, I thought I would list the service manifests submitted to date:
Service | Description | Author | Country |
application/mysql | MySQL database server | Keith Lawson | Canada |
application/oracle/database | Oracle database control | Joost Mulders | Netherlands [Sun] |
application/oracle/listener | Oracle listener | Joost Mulders | Netherlands [Sun] |
application/popfile | POPFile mail classifier | Iouri Goussev | Canada |
application/xprint | X Window System Print server | Peter Eriksson | Sweden |
network/ntp:openntpd | OpenNTPD daemon | Todd Carson | USA |
network/smtp:qmail | qmail SMTP MTA | Iouri Goussev | Canada |
I hadn't heard of POPFile before, but a multiple bucket mail classifier
might be the only way for me to keep up with
opensolaris-discuss@opensolaris.org.
I've read each of the above manifests quickly, so I'm confident we'll be sending out mugs to each author—yes, even to you, Joost. We'll do a detailed read, send some suggestions (like my first thoughts on service names), and publish links at the contest end. Keep the manifests coming.
[ T: OpenSolaris Solaris smf nationalism ]
(2005-06-20 21:48:25.0) Permalink Comments [3]
Contest progress; smf(5) discussion
I've been happily receiving submissions of manifests for various services over the past few days. With all the traffic around other news, I believe it only fair to extend the "Manifests sans frontières" contest to Canada Day (1 July), so that we can all monitor new mailing lists, join more vigorous IRC discussion, track countless tags and feeds, and generally commune, and still write some useful code.
benr: I have an Oracle submission in hand.
One of those new forums is smf-discuss@opensolaris.org,
where we'll be having development discussions about smf(5).
There's big stuff and little stuff to do, so if you're not certain where
to start or are looking for ways to contribute, come talk with
us—our learning curve is relatively gentle.
To subscribe, send a blank message to smf-discuss-subscribe@opensolaris.org.
libuutil and designing for debuggability
Going into Solaris 10, I knew we were planning to develop a
troupe of new daemons; we ultimately ended up with
svc.startd(1M), svc.configd(1M), and a new
implementation of inetd(1M). I wanted to make sure we made
some progress on daemon implementation practice, and bounced some ideas
around with the afternoon coffee group and
also with Mike, and probably some others—I wander
around a bit.
We anticipated that most of the daemons would be multithreaded, and it became apparent that they would all present large, complicated images for postmortem debugging1. To reduce the time to acquire familiarity with each of these daemons, we worked out three common requirements:
- include Compact C Type Format (CTF) information with each daemon,
- use libumem(3LIB) for memory allocation, and
- use standard, debuggable, MT-safe implementations of data structures.
The problem was, of course, that there wasn't a library with such data
structures in Solaris at the time.2, 3 So we began to design
libuutil, which combines a number of established utility
functions used in authoring Solaris commands with these new
"good" implementations of useful data structures.
The library in question was named in sympathy with
libumem(3LIB)—libuutil for "userland
utility functions". libuutil provides both a doubly linked
list implementation and an AVL tree implementation. The list
implementation is mostly located in lib/libuutil/common/uu_list.c;
we'll use that to explore the debugging assistance we designed in.
The model used is that each program is likely to have multiple lists of
common structures, and that there would be multiple such structures.
This led us to create an interface that is expressed in terms of pools
of lists. So, for each structure, you create a list pool using
uu_list_pool_create(). Then, for each list of that structure, you create
a list in the respective pool using uu_list_create().
That sounds complicated, but it's for a good reason: at each call to
uu_list_pool_create(), we register the newly created pool
on a global list, headed by the "null pool", uu_null_lpool:
uu_list_pool_t *
uu_list_pool_create(const char *name, size_t objsize,
    size_t nodeoffset, uu_compare_fn_t *compare_func, uint32_t flags)
{
        uu_list_pool_t *pp, *next, *prev;

        /* validate name, allocate storage, initialize members */

        (void) pthread_mutex_init(&pp->ulp_lock, NULL);

        pp->ulp_null_list.ul_next = &pp->ulp_null_list;
        pp->ulp_null_list.ul_prev = &pp->ulp_null_list;

        (void) pthread_mutex_lock(&uu_lpool_list_lock);
        pp->ulp_next = next = &uu_null_lpool;
        pp->ulp_prev = prev = next->ulp_prev;
        next->ulp_prev = pp;
        prev->ulp_next = pp;
        (void) pthread_mutex_unlock(&uu_lpool_list_lock);

        return (pp);
}
with similar code being used to connect each list to its pool on calls
to uu_list_create().
So now we have an address space where each list pool is linked in a
list, and each list in a pool is linked to a list headed at that pool.
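To make the two-level linkage concrete, here is a minimal, self-contained sketch of the same pattern. Everything in it (the toy_ names, the structure layouts, the helper functions) is hypothetical and invented for illustration; it is not the libuutil source or API, and it omits the locking that the real code performs, so it is only suitable for single-threaded experiments.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; NOT the libuutil definitions. */
typedef struct toy_pool toy_pool_t;

typedef struct toy_list {
        struct toy_list *tl_next;       /* next list in the same pool */
        struct toy_list *tl_prev;
        toy_pool_t *tl_pool;            /* owning pool */
} toy_list_t;

struct toy_pool {
        toy_pool_t *tp_next;            /* next pool on the global ring */
        toy_pool_t *tp_prev;
        const char *tp_name;
        toy_list_t tp_null_list;        /* sentinel heading this pool's lists */
};

/* Sentinel for the global ring; it plays the role of the "null pool". */
static toy_pool_t toy_null_pool = {
        &toy_null_pool, &toy_null_pool, "(null pool)", { NULL, NULL, NULL }
};

/* Create a pool and link it at the tail of the global ring (no locking). */
toy_pool_t *
toy_pool_create(const char *name)
{
        toy_pool_t *pp, *next, *prev;

        if ((pp = calloc(1, sizeof (*pp))) == NULL)
                return (NULL);
        pp->tp_name = name;
        pp->tp_null_list.tl_next = &pp->tp_null_list;
        pp->tp_null_list.tl_prev = &pp->tp_null_list;
        pp->tp_null_list.tl_pool = pp;

        pp->tp_next = next = &toy_null_pool;
        pp->tp_prev = prev = next->tp_prev;
        next->tp_prev = pp;
        prev->tp_next = pp;
        return (pp);
}

/* Create a list and link it at the tail of its pool's ring of lists. */
toy_list_t *
toy_list_create(toy_pool_t *pp)
{
        toy_list_t *lp, *next, *prev;

        assert(pp != NULL);
        if ((lp = calloc(1, sizeof (*lp))) == NULL)
                return (NULL);
        lp->tl_pool = pp;
        lp->tl_next = next = &pp->tp_null_list;
        lp->tl_prev = prev = next->tl_prev;
        next->tl_prev = lp;
        prev->tl_next = lp;
        return (lp);
}

/* Count every pool reachable from the global sentinel. */
size_t
toy_npools(void)
{
        size_t n = 0;
        toy_pool_t *pp;

        for (pp = toy_null_pool.tp_next; pp != &toy_null_pool; pp = pp->tp_next)
                n++;
        return (n);
}

/* Count the lists registered in one pool. */
size_t
toy_pool_nlists(const toy_pool_t *pp)
{
        size_t n = 0;
        const toy_list_t *lp;

        for (lp = pp->tp_null_list.tl_next; lp != &pp->tp_null_list;
            lp = lp->tl_next)
                n++;
        return (n);
}

/* Find a pool by name, starting only from the global sentinel. */
toy_pool_t *
toy_pool_lookup(const char *name)
{
        toy_pool_t *pp;

        for (pp = toy_null_pool.tp_next; pp != &toy_null_pool; pp = pp->tp_next)
                if (strcmp(pp->tp_name, name) == 0)
                        return (pp);
        return (NULL);
}
```

The point of the sketch is that toy_npools() and toy_pool_lookup() need nothing but the address of one well-known symbol to reach every pool, and from each pool every list. A debugger module is in the same position.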
This leads us to the second part, which is to use the encoded
information in a debugger. The typical debugger for kernel work in
Solaris is mdb(1), the modular debugger. It's been
shipping with Solaris since 5.8, and has a rich set of extensions for
kernel debugging. For userland, the modules are rarer:
libumem is probably the best known.4
The source code for the libuutil module (or "dmod") is
located at
cmd/mdb/common/modules/libuutil/libuutil.c;
the function that provides the dcmd itself, uutil_listpool,
is just a wrapper around the walker for uu_list_pool_t
structures. The pertinent portion is the initialization function,
uutil_listpool_walk_init():5
int
uutil_listpool_walk_init(mdb_walk_state_t *wsp)
{
        uu_list_pool_t null_lpool;
        uutil_listpool_walk_t *ulpw;
        GElf_Sym sym;

        bzero(&null_lpool, sizeof (uu_list_pool_t));

        if (mdb_lookup_by_obj("libuutil.so.1", "uu_null_lpool", &sym) ==
            -1) {
                mdb_warn("failed to find 'uu_null_lpool'\n");
                return (WALK_ERR);
        }

        if (mdb_vread(&null_lpool, sym.st_size, (uintptr_t)sym.st_value) ==
            -1) {
                mdb_warn("failed to read data from 'uu_null_lpool' address\n");
                return (WALK_ERR);
        }

        ulpw = mdb_alloc(sizeof (uutil_listpool_walk_t), UM_SLEEP);

        ulpw->ulpw_final = (uintptr_t)null_lpool.ulp_prev;
        ulpw->ulpw_current = (uintptr_t)null_lpool.ulp_next;
        wsp->walk_data = ulpw;

        return (WALK_NEXT);
}
which safely pulls out the value of the uu_null_lpool head
element, and the relevant pieces we'll need to walk the list.
This means that, for any program linked with libuutil,
we can attach with mdb(1) and display its list pools:
# mdb -p `pgrep -z global startd`
Loading modules: [ svc.startd ld.so.1 libumem.so.1 libnvpair.so.1 libsysevent.so.1 libuutil.so.1 libc.so.1 ]
> ::uu_list_pool
ADDR     NAME                            COMPARE  FLAGS
080dcf08 wait_info                       00000000 D
080dce08 SUNW,libscf_datael              00000000 D
080dcd08 SUNW,libscf_iter                00000000 D
080dcc08 SUNW,libscf_transaction_entity  c2b0476c D
080dc808 dict                            0805749c D
080dc908 timeouts                        0806ffab D
080dca08 restarter_protocol_events       00000000 D
080dcb08 restarter_instances             0806ccd7 D
080dc708 restarter_instance_queue        00000000 D
080dc608 contract_list                   00000000 D
080dc508 graph_protocol_events           00000000 D
080dc408 graph_edges                     00000000 D
080dc308 graph_vertices                  08059844 D
>
and then drill down into constituent lists of interest.
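The walker has to copy each structure out of the target's address space rather than follow pointers directly, since the target may be another process or a core file. The following self-contained sketch imitates that with a byte buffer standing in for the target image. All names here (img_pool_t, img_read(), pool_walk_init(), pool_walk_step()) are hypothetical; this is not the mdb module API, and the termination test (stop on returning to the sentinel's address) is my simplification rather than the dmod's actual step function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical target layout: links are target addresses, not pointers. */
typedef struct img_pool {
        uint32_t ip_next;       /* target address of the next pool */
        uint32_t ip_prev;       /* target address of the previous pool */
        char ip_name[16];
} img_pool_t;

/* A fake memory image: target address N is byte N of im_base. */
typedef struct image {
        const unsigned char *im_base;
        size_t im_size;
} image_t;

/* Walker state, kept in the debugger's own memory. */
typedef struct pool_walk {
        uint32_t pw_head;       /* address of the sentinel pool */
        uint32_t pw_current;    /* address of the next pool to visit */
} pool_walk_t;

/* Copy nbytes out of the image; -1 if the range is not in the image. */
static int
img_read(const image_t *im, void *buf, size_t nbytes, uint32_t addr)
{
        if (addr > im->im_size || nbytes > im->im_size - addr)
                return (-1);
        memcpy(buf, im->im_base + addr, nbytes);
        return (0);
}

/* Read the sentinel and position the walk on its first successor. */
int
pool_walk_init(const image_t *im, uint32_t head_addr, pool_walk_t *pw)
{
        img_pool_t head;

        assert(pw != NULL);
        if (img_read(im, &head, sizeof (head), head_addr) == -1)
                return (-1);
        pw->pw_head = head_addr;
        pw->pw_current = head.ip_next;
        return (0);
}

/*
 * Visit one pool.  Returns 1 with a private copy in *out, 0 when the walk
 * is back at the sentinel, and -1 if a link points outside the image.
 */
int
pool_walk_step(const image_t *im, pool_walk_t *pw, img_pool_t *out)
{
        if (pw->pw_current == pw->pw_head)
                return (0);
        if (img_read(im, out, sizeof (*out), pw->pw_current) == -1)
                return (-1);
        pw->pw_current = out->ip_next;
        return (1);
}
```

Because every read is bounds-checked and every structure is copied before use, a corrupted link in the image ends the walk with an error instead of crashing the debugger, which is the property you want when the image comes from a failed program.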
Additional walkers are also provided, such that the lists and list nodes
can be visited from the command line or programmatically. As an example,
the ::vertex dcmd from the svc.startd module
uses the walkers to display the various service graph nodes in a
quasi-readable format.5
So, by providing extra structured information in the library and support to consume that information in the debugger, we end up with a set of data structures that, if used, leads to more debuggable programs. More work up front for less later: welcome to OpenSolaris.
Footnotes
1. By postmortem debugging, I'm referring to the operation of debugging a failed application after its failure, from a core file or other memory image captured as soon after that failure as possible. Suitability for postmortem debugging is a standard expectation for software design in Solaris, as it reduces the time to diagnose and fix software failures. In particular, multiple engineers can debug a core file in parallel; this can be contrasted with the cost of setting up a duplicate installation and trying to reproduce the failure, let alone expecting the customer to risk further downtime experimenting with "try this" scenarios.
2. Please remember that we were making these decisions three years ago, and that this choice had to fit the then-applicable constraints on the product.
3. In contrast, the kernel has had a generic, modular hash table since 5.8/2000 (uts/common/os/modhash.c), a generic AVL tree since 5.9/2002 (common/avl/avl.c), and a generic list implementation early in 5.10/2005 (uts/common/os/list.c). Of course, the kernel has used the slab allocator (uts/common/os/kmem.c) since 5.4/1994.
4. A quick listing in
/usr/lib/mdb/proc/ will display the other modules valid in the
process target: beyond libumem and libuutil,
there's support for the linker, libc, name-value pairs,
system event, and the two main smf(5)
daemons.
5. As an example, here's the output
of ::vertex on my current system, for those services
related to my VNC server (and the service itself):
> ::vertex ! grep vnc
0x85d3380 212 I 1 svc:/application/vncserver:sch
0x85d3320 213 s - svc:/application/vncserver
0x85d3200 214 R - svc:/application/vncserver:sch>milestone
0x85d3260 215 R - svc:/application/vncserver:sch>autofs
0x85d32c0 216 R - svc:/application/vncserver:sch>nis
>
[ T: OpenSolaris Solaris mdb ]
(2005-06-14 09:51:44.0) Permalink Comments [1]
Bespoke services: application/vncserver
In honour of the "Mugs for Manifests" contest, I thought I would spin out another custom service description I wrote some months ago.
My setup for working from home, key during the last six months of Solaris 10, is to tunnel into Sun's network via one implementation or another of a virtual private network (VPN). In all cases, the VPN solution runs on Solaris. Although the VPN lets your system participate more or less like a regular host, I find it's easier to use VNC to remotely present an X11 display from my main workstation, muskoka. But, of course, machines running pre-production bits can fail or be rebooted or be reinstalled regularly, so I wanted the VNC server on my system to always be up: I wanted a VNC service.
What's distinct about running the VNC server is that it should run as
me, with my environment, and not as root with init(1M)'s.
svc.startd(1M), while it can run methods according to
smf_method(5), doesn't populate the environment fully in
the sense of login(1). So we will need to extract some
data from the name
service, which is cumbersome to perform in a shell script. We'll write
our method in Perl, which implies
Tip 1: Methods need not be shell scripts.
In fact, the start method and the stop method can be totally separate commands: you could write one in Python, and one can be an executable Java .jar archive, or some even more bizarre combination.
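Since a method can be any executable, the name service lookup could equally be done from a compiled program. Here is a minimal C sketch of just that step, assuming a POSIX system with getpwuid() and setenv(); the function name populate_login_env() is hypothetical, and it sets only USER, HOME, and SHELL rather than the full login(1) environment.

```c
#include <assert.h>
#include <pwd.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: look up uid in the name service and export the
 * account's name, home directory, and shell into the environment.
 * Returns 0 on success, -1 if there is no entry or setenv() fails.
 */
int
populate_login_env(uid_t uid)
{
        struct passwd *pw = getpwuid(uid);

        if (pw == NULL)
                return (-1);

        /* A passwd entry returned by getpwuid() always carries a name. */
        assert(pw->pw_name != NULL);

        if (setenv("USER", pw->pw_name, 1) != 0 ||
            setenv("HOME", pw->pw_dir, 1) != 0 ||
            setenv("SHELL", pw->pw_shell, 1) != 0)
                return (-1);

        return (0);
}
```

A method built around this would call populate_login_env(getuid()) before exec'ing the server, so the lookup runs under whatever credential the method context supplied.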
The other trick is that, if VNC fails for some reason, I want to be
aggressive about cleaning up its various leftover temporary files. For
this purpose, I run the stop method with a different
credential—the default of root—than the start
method, which is done in our brief manifest by locating the
<method_context> element on only the start method.
Tip 2: Methods need not be run with identical method contexts. Credentials, privileges, and the like may all differ from method to method.
Our manifest then looks like:
<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle type='manifest' name='export'>
  <service name='application/vncserver' type='service' version='0'>
    <single_instance/>
    <instance name='sch' enabled='true'>
      <dependency name='milestone' grouping='require_all' restart_on='none' type='service'>
        <service_fmri value='svc:/milestone/multi-user:default'/>
      </dependency>
      <dependency name='autofs' grouping='require_all' restart_on='none' type='service'>
        <service_fmri value='svc:/system/filesystem/autofs:default'/>
      </dependency>
      <dependency name='nis' grouping='require_all' restart_on='none' type='service'>
        <service_fmri value='svc:/network/nis/client:default'/>
      </dependency>
      <exec_method name='stop' type='method' exec='/home/sch/bin/vncserver_method stop' timeout_seconds='60'/>
      <exec_method name='start' type='method' exec='/home/sch/bin/vncserver_method start' timeout_seconds='300'>
        <method_context>
          <method_credential user='sch' group='staff' />
        </method_context>
      </exec_method>
    </instance>
  </service>
</service_bundle>
The dependencies above are needed if you use NFS for home directories and NIS for name services; they could be reduced for less networked setups.
And, for the method, we have a short Perl program. The complete list of
environment variables in login(1) would include
LOGNAME, PATH, MAIL, and
TZ (timezone), and exclude my silly setting of
LANG, but most of these will be set up by the shell that
runs the VNC startup script (its analogue to .xinitrc). The
various print calls are just to let the service log show a
little activity, and could be removed.
#!/usr/perl5/bin/perl

require 5.8.3;
use strict;
use warnings;
use locale;

my ($name, $passwd, $uid, $gid, $quota, $comment, $gcos, $dir, $shell,
    $expire) = getpwuid "$<";

$ENV{USER} = $name;
$ENV{HOME} = $dir;
$ENV{SHELL} = $shell;
$ENV{LANG} = "en_CA"; # Just to create havoc (i.e. expose bugs).

#
# The stop method is run as root so that it can cleanup.
#
if (defined($ARGV[0]) && $ARGV[0] eq "stop") {
        # ksh and sh specific
        print "stop method\n";
        system("$ENV{SHELL}", "-c", "/opt/csw/bin/vncserver -kill :1");

        if (-S "/tmp/.X11-unix/X1") {
                unlink("/tmp/.X11-unix/X1");
                unlink("/tmp/.X1-lock");
        }

        exit 0;
}

#
# The start method is run with the user's identity.
#
print "start method\n";

if (-f "/tmp/.X1-lock") {
        unlink("/tmp/.X1-lock");
}

if (-S "/tmp/.X11-unix/X1") {
        system("logger -p 1 application/vncserver requires " .
            "/tmp/.X11-unix/X1 be removed");
        exit 0;
}

# ksh and sh specific
{ exec "$ENV{SHELL}", "-c",
    "/opt/csw/bin/vncserver -pn -geometry 1600x1200 -depth 24 :1" };

system("logger -p 1 application/vncserver can't exec /opt/csw/bin/vncserver");

exit 1;
And now we have always-on VNC service for the regular telecommuter:
$ svcs -p vncserver
STATE          STIME    FMRI
online         13:01:01 svc:/application/vncserver:sch
               13:01:00   100577 Xvnc
               13:01:17   100625 xwrits
               13:01:17   100626 ctrun
               13:01:17   100632 xautolock
               13:11:18   102348 xlock
$ uptime
 12:00pm  up 23 hr(s),  4 users,  load average: 0.04, 0.07, 0.07
Exercises
- Remove the hard coded display numbering (":1", "X1", etc.).
- Make the resolution, display depth, RGB encoding, and other standard options into properties.