Thursday July 19, 2007
Observations on packaging
Over the past few months, a bunch of us have been exploring various options for packaging. (Actually, I suppose I've been pursuing this on a part-time basis for a year now, but it's only recently that it's made it to the top of the stack.) I've looked at a bunch of packaging systems, ported a few to OpenSolaris, run a bunch on their native platforms, and read a slew of manual pages, FAQs, blogs, and the like. In parallel with those efforts, Bart, Sanjay, and others have been analyzing the complexity around patching in the Solaris 10 patch stream, and improving the toolset to address the forms of risk that the larger features in the update releases presented.
In the course of those investigations, we've come up with a number of different approaches to understanding requirements and possibilities; I'll probably write those out more fully in a proper design document, but I thought it would be helpful to outline some of those constraints here. For instance, one way to look at how we might improve packaging is to separate the list of "design inputs" for a packaging system into "old" and "new".
When I make a list of this kind, I know it's bound to offend. (I already know it's more scattershot than the argument we'll present in a design document.) Feel free to send me the important inputs I've omitted, as I have a few follow-up posts on requirements—lower level, more specific intentions—and architectural thoughts where I can cover any issues not mentioned here.
"Old" inputs
The "old" inputs are those that are derived from current parts of the feature set around software packaging and installation. Many of these inputs will really be satisfied by the new installer work ("Caiman") Dave's leading, but the capabilities of the packaging system will change some from difficult and fragile to straightforward and robust. With effort, some might even achieve some elegance.
- Hands off. For a long time, via the JumpStart facility, it's been easy to provision systems via scripted installs. This technology is still sound, particularly when we consider that the Flash archives allow a mix of image-based provisioning with the customizations JumpStart offered.
- Virtualized systems. As long or longer, we've supported the notion of diskless systems, where the installed image is shared out, in portions, between multiple running environments. Zones was a direct evolutionary successor of the diskless environments, and shows that this approach can lead to some very economical deployment models. Lower-level virtualization methods can also benefit from the administrative precision that comes out of sharing known portions of the installation image.
- Availability and liveness. With Live Upgrade, it's been possible for a while now to update a second image on a running system while the first image is active—a command to switch images and a reboot is all that's required to be running the new software. This approach requires too much feature-specific knowledge at present, but provides a very safe approach to installing an upgraded kernel or larger software stack, as reverting to the previous version is just a matter of switching back to the previous image. So, a package installation option that doesn't ruin a working system is a must.
- Change control. In principle, with the current set of package and patch operations, it is possible to create a very specific configuration—that may never have been tested at any other site, mind you—from the issued stream of updates. From a service perspective, the variety of possible configuration combinations is too broad, but the ability to precisely know your configuration and make it a fixed point remains important.
- Migration and compatibility. There's a very large library of software packaged in the System V format that will run on most OpenSolaris-based systems. Providing well-defined compatibility and migration paths is an obvious constraint, given the goals of the development process.
- Networked behaviour. Believe it or not, the current installer and packaging support a large set of network-based operations—you can install specific packages and systems over the network, if you know where everything is. That is, the current components need to be assembled into some locally relevant architecture by each site to be useful—any replacement needs to make this assembly unnecessary, potentially via a number of mechanisms, but definitely via a list of known (trusted?) public package repositories.
"New" inputs
But, as you might expect, the efforts around Approachability, Modernization, and Indiana have brought to light some new qualities a packaging system must possess.
- Safety. One of the real complications from virtualized systems, at least in current packaging, is that a developer has to understand each of the image types his or her package might reach, and make sure that the set of pre- and post-install scripts, class action scripts, and the like are correct for each of these cases. When that doesn't happen, the package is at a minimum incorrectly installed; in certain real cases, this class of failures compromises other data on the system. Restrictions in this space, particularly during package operations on an inert image, seem like a promising trade-off to achieve greater safety.
- Developer burden. Current packaging requires the developer provide a description of the package, its dependencies, and contents across a minimum of three files, in addition to knowing a collection of rules and guidelines around files, directories, and package boundaries. Most of these conventions should be encoded and enforced by the package creation and installation tools themselves, and it should be possible to construct a package with only a single file—or from an existing package or archive.
- smf(5) aware. OpenSolaris-based systems have an understanding of services and service interdependencies, derived from the Service Management Facility's service descriptions. smf(5) service management during packaging operations is awkward under current packaging and very much needs to be improved but, more importantly, the service graph provides a rich set of relationships among larger components that should lead to a better understanding of the rules around consistent system assembly.
- Minimization. For a while, the current system has had
some very coarse package "blocks", from which one could construct a
system—the various metaclusters. These, with the exception of the
minimally required metacluster and the entire metacluster, split the
system along boundaries of interest only to workstation installs. Any
suitable packaging system must provide query and image management tools
to make image construction more flexible and less error-prone (and
eliminate the need for things like a "developer" metacluster, for that
matter).
It's also pretty clear that the package boundaries aren't optimized in any fashion, as evidenced by the differing rates of change of the binaries they currently enclose—in the form of the issued patches against Solaris 10.
- Multiple streams of change, of a single kind. Although
we noted the continued need to control change above, it's also important
to be able to subscribe to a stream of change consistent with one's
expectations around system stability. The package system needs to allow
one to bind to one or more streams of change, and limit how the
interfaces of the aggregate binaries from those streams evolve. That is,
it should be possible to subscribe to a stream of only the most
important updates, like security and data corruption fixes, or to the
development stream, so that one's system changes in a way consistent
with one's expectations.
Conversely, the tradeoff between complexity and space optimization in current patches—which introduce a separate and difficult-to-calculate dependency graph and distinct namespace entries for each platform, among other issues—has slid much too far, given the increase in system richness and the increases in disk space and bandwidth. There seems to be little long-term benefit in preserving the current patch mechanism, particularly since Sun never offered it in a form useful outside of Sun's own software.
- ZFS aligned. ZFS offers the administrator access to so
many sophisticated options around deployment and data management that it
would be foolish for a packaging system to not explore how to take
advantage of them: zfs snapshot and zfs clone are the most obvious capabilities that allow the packaging system to limit the risk of its modifications to an image, without excessive consumption of storage space or system time.
- Prevent bad stuff early. Another classic
OpenSolaris-style requirement is that the set of available packages
be known to be self-consistent and correct at all times, to spare the
propagation of incomplete configurations. In a packaging system, this
input expands on our intent to reduce the developer burden to assist the
developer in writing as correct a package as possible, and to enable the
repository operator to block incomplete packages from being made
available to clients. There's a rich set of data here, as we noted for
smf(5) service descriptions above.
- Friendly user deployment. Direct from Indiana,
but sensible and apparent to all, is that packaging systems have advanced
to a point where the usability of the client is expected, and not an
innovation. I haven't got the complete timeline of packaging
developments (the literature survey continues), but it's clear
that Debian's apt system marks the current expectations about client-side capability and ease of use.
In the course of the Indiana discussions, Bryan raised one point, which I'll paraphrase as "it's not an idea until there's code". That's a sound position, which I also happen to hold: Danek and I (and Bart and Daniel, I hope) have been quietly prototyping for a little while now, to see if we had a handle on this collection of inputs. I'd like to give a bit more background, in the form of requirements and assertions, in an upcoming post or two. Then we're hoping to start a project on opensolaris.org to take the idea all the way from notional fancy to functioning code.
[ T: OpenSolaris Solaris smf Indiana pkg ]
(2007-07-19 12:58:19.0)
OpenSolaris: Five updates conservative developers should make
It's been almost two-and-a-half years since Solaris 10 was released, and if we look at Nevada (via Developer Edition or one of the other distributions), we can see that many of the technologies introduced in S10 are becoming still more capable. At this point, even the most conservative software developer can assume that certain features are always present. So, for the conservative OpenSolaris application developer, here are the five low-risk, high-reward updates you should make to your application:
1. Provide x86 and SPARC versions. OpenSolaris has two primary instruction set targets, i386 and sparc. Each of these has both a 32- and a 64-bit variant. The metrics on Solaris 10 and SX:CE/DE downloads tell us that the Solaris volume is substantial on both targets so, for maximum uptake, you should attempt to offer software on both.
On x86, you should consider delivering both 32- and 64-bit versions, if your application can take advantage of a 64-bit address space. But there is a large contingent of 32-bit only users, so don't stop delivering appropriate binaries prematurely.
Of course, if you're writing at a hardware-independent level, like on a Java language platform, then you get x86 and SPARC (and presumably others) for free.
2. Make packages that deliver into sparse zones. The primary software delivery mechanism is still System V packages, but your software's already packaged properly, so that's not an issue. (Right?) With Solaris 10, the Zones feature offers a sparse variant that requires package support. Roughly, this support means that the package author shouldn't deliver into /usr and should add the three properties needed to the pkginfo file.
There are some fairly serious Zones deployers out there; Joyent is probably the most public example, but there are plenty of corporate datacentres using Zones to their advantage. If you want your software run by them or their customers, providing a Zones-compatible package seems like the easiest way to get it into their hands and onto their Zones.
3. Replace your init.d scripts with smf(5) manifests. The Service Management Facility (smf(5)) provides a collection of capabilities that make service administration easier, while also reducing the development burden on application authors. Converting your service from the rc*.d script to a service description and methods means that administrators get automatic restart (and higher service availability), an easy on/off switch, and a place to make site-specific annotations (using the various template properties). There's a free competitive advantage here, if your service runs under smf(5) and a rival's doesn't.
Of course, you can do more: placing key configuration values in the service repository means that various administrative utilities can be taught to make manipulating your application's feature set easy for the deploying administrator. But that won't happen without an initial service conversion.
(Once you write a manifest for your service, you'll also probably want to write a rights profile, so that administrative authority for your service and its instances can be easily delegated.)
4. Understand needed privileges. One of the more interesting features in Solaris 10 and later is the work Casper did to split out the absolute privilege owned by root into a specific collection of privileges. That means that you can take away a process's ability to fork or exec, change file ownership, or manipulate or utilize various subsystems of the operating system. If your application runs with the minimal set of privileges it needs to function, then the set of actions a hypothetical exploit against your application can invoke becomes limited, which reduces the impact of an intrusion. You can reduce your privileges via the smf(5) manifest you wrote for #3, via the role-based access control (RBAC) configuration, or via the privileges API.
5. Don't unnecessarily ship duplicate components. The various OpenSolaris distributions include a lot of software; most of these offer one or more update mechanisms for the components they include. Whether or not you prefer minimal patches to wholesale package replacements, if you ship a duplicate component, it's your responsibility to update it if a defect or security hole is found. Sometimes you have to ship a component (the distros don't update it often enough), but private libraries (or private copies of the Java runtime) have a collection of costs, many of which are imposed on your customer.
For specific kinds of software, there's more to investigate. Language
interpreters and byte-code virtual machines (and probably complex
daemons) should have
DTrace providers.
Network device drivers should write to the latest version of the generic
LAN device (gld) interface. Archival programs should be
ZFS-compatible—there's going to be a lot of data on
ZFS. Daemons should
investigate using libumem for memory allocation (and event ports in
place of poll(2) or select(3C)). And so on.
There are OpenSolaris
communities for each of these
topics but, if you're having trouble getting started, I would suggest an
email to
opensolaris-code,
that reads something like: "I have a [daemon/driver/app] that does
[practical purpose/amazing feat/curious entertainment]. Are there any
OpenSolaris features I can use to make it better?"
Looking forward to your mail.
Thanks to Dave for #5. Dave also confesses to being keen on #3.
[ T: OpenSolaris Solaris smf privilege RBAC zones ]
(2007-06-06 17:17:06.0)
SFW: Integrating coreutils and which variants
Last night, I finished up another task, in an attempt to reduce my
current multitasking factor: I integrated initial versions of
coreutils and which from the GNU Project into the Freeware
consolidation. As a lower priority task, it took longer than a
more dedicated developer might have managed, but it's reasonably
pleasing to look back:
- initial idea, in case form, proposed and repeatedly revised on companion-discuss,
- draft fast track circulated on sfwnv-discuss,
- fast tracks for /usr/gnu, coreutils, and which submitted as open PSARC fast track cases,
- preliminary, intermediate, and final code reviews on sfwnv-discuss, and,
- finally, an integration message to sfwnv-discuss.
There's still a bunch of process associated with SFW that requires redesign—legal review and Section 508 compliance, in particular—but I think, barring the latent intervals, this sequence was a reasonable consensus-driven open development experience.
If you look at other opensolaris.org mailing
lists during June – November and February – April, you'll be
able to verify that I was indeed working—just on other things, and
not just surfing the Web...
Of course, now that I know that these commands will start to show up more widely when Build 67 is released, I can update my dotfiles, so I get the versions I prefer:
$ svn diff
Index: sh-functions
===================================================================
--- sh-functions (revision 91)
+++ sh-functions (working copy)
@@ -36,6 +36,10 @@
PATH=$HOME/bin:$HOME/bin/$(/usr/bin/uname -p):$PATH
MANPATH=$HOME/man:$MANPATH
;;
+ gnu) # PREPEND: Bundled GNU command variants
+ PATH=/usr/gnu/bin:$PATH
+ MANPATH=/usr/gnu/share/man:$MANPATH
+ ;;
Index: bashrc
===================================================================
--- bashrc (revision 91)
+++ bashrc (working copy)
@@ -49,8 +49,12 @@
path clear home sfw csw
fi
-if hash gls > /dev/null 2>&1; then
- alias ls="gls --color -CF"
+if [ -x /usr/gnu/bin/ls ]; then
+ alias ls="/usr/gnu/bin/ls --color -CF"
fi
+if [ -x /usr/gnu/bin/which ]; then
+ alias which="/usr/gnu/bin/which"
+fi
+
export CDPATH=$MACHINE_CDPATH_PRE:$HOME/projects:$HOME:$MACHINE_CDPATH_POST
If you're using a distribution that offers SUNWgnu-coreutils,
SUNWgnu-which, and the other /usr/gnu packages, do share your
feedback with the maintainers on sfwnv-discuss—or become one and
pick your favourite package.
[ T: OpenSolaris Solaris SFW coreutils which ]
(2007-06-05 15:47:41.0)
Bespoke services: network/rmi/registry
Gary and I were recently
prototyping an application that uses Java RMI, and so I ended up
searching around to see if anyone has done a service conversion for
rmiregistry(1). (rmiregistry(1) is the daemon
that lets RMI clients find the available remote objects being served by
various virtual machines on a given system.) Turns out no one has (or
no one's published it), which means it's time to rev up the
convert-o-tron.
Since we're still developing our application and it's likely we'll
change a definition or two, and since we need to restart the registry to
cause the remote objects to be re-registered, we're going to make our prototype service
restart automatically if we restart the registry. That means our
prototype service has a dependency on network/rmi/registry
with specific restart_on behaviour, meaning that its
service description has a fragment like the following:
<!--
As an RMI server application, we expect to be able to
register our RMI classes with the registry server.
-->
<dependency
name='rmi-registry'
grouping='require_all'
restart_on='restart'
type='service'>
<service_fmri value='svc:/network/rmi/registry' />
</dependency>
Inject that fragment into your various RMI servers' descriptions (or the equivalent property group into the repository) and you'll save a bit of time on application reinitializations.
So, if you're interested, please feel free to take a copy of network/rmi/registry;
comments and corrections welcome.
[ T: OpenSolaris Solaris smf RMI ]
(2006-03-28 15:26:56.0)
store.sun.com to Niagara/Solaris 10
James Dickens spied a new BluePrint on
the planned redeployment of store.sun.com [PDF] to SunFire T2000 systems running Solaris 10.
Beyond the tremendous reduction in occupied space and the estimated reduction of around 90 percent in
input power and output heat, the document describes the use of Solaris Containers to consolidate
the middle tier servers, complete with invocations of zonecfg(1M) (for the zones the applications
run inside of) and poolcfg(1M) (for the resource pools the zones sit upon).
Business applications are complex—maybe some smf(5) service conversions will appear in the next version to make
the dependency and failure handling more precise.
LISA05 Tuesday: device errors, iostat, and logging
One of the questions raised at Tuesday night's BoF was "why do some of
the statistics that iostat -E displays result in a console message and
some do not?" I was sitting in the back with a copy of
Mike Kupfer's split ON source
tree,
and decided to have a look. iostat(1M) is a kstat
reader, with some simple processing and formatted output. The output
function is
show_disk_errors()
but requires understanding how iostat groups the disks and statistics
in its implementation; the code that acquires the device statistics is
located in cmd/stat/common/acquire_iodevs.c. Searching for error shows
that the critical function is
acquire_iodev_errors(),
and that two classes of kstats contribute to the error output:
device_error and iopath_error.
The most direct way to see these statistics is to invoke kstat(1M)
with these classes. On my laptop, the result for device_error is
$ kstat -p -c device_error
sderr:0:sd0,err:Device Not Ready 0
sderr:0:sd0,err:Hard Errors 0
sderr:0:sd0,err:Illegal Request 1
sderr:0:sd0,err:Media Error 0
sderr:0:sd0,err:No Device 0
sderr:0:sd0,err:Predictive Failure Analysis 0
sderr:0:sd0,err:Product UJ-832D Revision
sderr:0:sd0,err:Recoverable 0
sderr:0:sd0,err:Revision 1.50
sderr:0:sd0,err:Serial No
sderr:0:sd0,err:Size 0
sderr:0:sd0,err:Soft Errors 1
sderr:0:sd0,err:Transport Errors 0
sderr:0:sd0,err:Vendor MATSHITA
sderr:0:sd0,err:class device_error
sderr:0:sd0,err:crtime 76.139658104
sderr:0:sd0,err:snaptime 1857.960128997
(The laptop has no kstats of class iopath_error.)
We then look for the creation of the named kstat for each of these
strings—invocations of kstat_create() to identify the
structure member names associated with each. And then we can look for
statements that involve those member names; this leads us to the various
SD_UPDATE_ERRSTATS() invocations throughout
uts/common/io/scsi/targets/sd.c.
The discrepancy between updates and logging arises because the macro,
SD_UPDATE_ERRSTATS(),
which bumps the counters and the function which displays the error,
sd_print_sense_msg(),
are sometimes both invoked, and sometimes are not. I don't know the
details of the SCSI error categories, but the decision to make some of
these errors silent and some not appears arbitrary. So, unfortunately,
the only answer today to determine "why is this messaged" is to look at
the code. (Perhaps an 'sd' expert can offer an enlightening comment.)
If you're trying to anticipate and avoid potential failures, having
random messages emitted arbitrarily isn't very helpful: that's why
Mike and
the FMA team
developed the fault management architecture to have a framework in which
errors are processed in a predictable fashion, resulting in the proper
diagnosis of faults. Eric described
one possible scenario involving disk errors, FMA, and ZFS
a couple of weeks
ago, but there's a smaller step that seems useful involving only sd and
FMA: the error increments could be converted to error events, and the
decision to issue a notice deferred until a series of errors is
diagnosed into a fault except, I suppose, if the error can be
immediately diagnosed as a critical fault. Taking this step would
result in consistent reporting for all disk consumers, including less
sophisticated consumers than ZFS, like older filesystems and raw disk
accessors.
In a software engineering sense, the FMA approach, where error issuance and
diagnosis are separated, is much more sound: at the initial driver
software composition, the field experience with the hardware device is
typically limited. Over time, the actual impact of errors on system
practice becomes better known, and the diagnosis and the actions
associated with it can be refined. fmd(1M) can handle on-the-fly
module updates gracefully, and also deal with overlapping event flows so
that both primitive and ZFS-specific fault handling policies can be
implemented, depending on the use of a particular device.
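To make the separation concrete, here is a minimal sketch in portable C of the idea that the driver only reports error events while a separate, replaceable policy decides when those events amount to a fault. Everything in it is hypothetical: the error classes, the policy_t and diag_state_t types, and diagnose() are illustrative names, not part of sd, fmd(1M), or any FMA interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error classes; these are not real FMA event classes. */
typedef enum { ERR_SOFT, ERR_MEDIA, ERR_NO_DEVICE, ERR_NCLASSES } err_class_t;

typedef enum { DIAG_NONE, DIAG_FAULT } diag_t;

/*
 * The policy lives apart from the driver, so it can be refined as
 * field experience accumulates without touching the code that
 * reports errors.
 */
typedef struct policy {
	unsigned p_threshold;		/* per-class error count that means "fault" */
	int p_critical[ERR_NCLASSES];	/* nonzero: one event is already a fault */
} policy_t;

typedef struct diag_state {
	unsigned ds_count[ERR_NCLASSES];
	diag_t ds_result;
} diag_state_t;

/*
 * Consume one error event.  The caller (the "driver") only says what
 * happened; whether anyone is notified is decided here, by policy.
 * Once a fault has been diagnosed the state stays faulted.
 */
static diag_t
diagnose(diag_state_t *ds, const policy_t *p, err_class_t ec)
{
	ds->ds_count[ec]++;
	if (p->p_critical[ec] || ds->ds_count[ec] >= p->p_threshold)
		ds->ds_result = DIAG_FAULT;
	return (ds->ds_result);
}
```

With a threshold of three and only ERR_NO_DEVICE marked critical, two soft errors and a media error stay silent, the third soft error produces a fault, and a single "no device" event produces one immediately. Changing that behaviour means changing the policy_t, not the reporting code.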
Now, the community that's discussing technical issues and directions for fault management is aptly called the Fault Management community; if you are interested in how this work is going to proceed, and ways to contribute, I suggest joining it.
[ T: Solaris OpenSolaris LISA05 FMA zfs iostat kstat ]
(2005-12-07 11:44:15.0)
smf(5): Stepping through an rc.d conversion
Over on mediacast.sun.com, I see Bob Netherton has posted a nice tutorial from Solaris Boot Camp, entitled "Migrating a legacy RC service". The presentation covers the hiccoughs you might run into during your first conversion, and its step-by-step approach is very soothing.
[ T: OpenSolaris Solaris smf ]
(2005-10-10 11:45:23.0)
libuutil and designing for debuggability
Going into Solaris 10, I knew we were planning to develop a
troupe of new daemons; we ultimately ended up with
svc.startd(1M), svc.configd(1M), and a new
implementation of inetd(1M). I wanted to make sure we made
some progress on daemon implementation practice, and bounced some ideas
around with the afternoon coffee group and
also with Mike, and probably some others—I wander
around a bit.
We anticipated that most of the daemons would be multithreaded, and it became apparent that they would all present large, complicated images for postmortem debugging [1]. To reduce the time to acquire familiarity with each of these daemons, we worked out three common requirements:
- include Compact C Type Format (CTF) information with each daemon,
- use libumem(3LIB) for memory allocation, and
- use standard, debuggable, MT-safe implementations of data structures.
The problem was, of course, that there wasn't a library with such data
structures in Solaris at the time [2, 3]. So we began to design
libuutil, which combines a number of established utility
functions used in authoring Solaris commands with these new
"good" implementations of useful data structures.
The library in question was named in sympathy with
libumem(3LIB)—libuutil for "userland
utility functions". libuutil provides both a doubly linked
list implementation and an AVL tree implementation. The list
implementation is mostly located in lib/libuutil/common/uu_list.c;
we'll use that to explore the debugging assistance we designed in.
The model used is that each program is likely to have multiple lists of
common structures, and that there would be multiple such structures.
This led us to create an interface that is expressed in terms of pools
of lists. So, for each structure, you create a list pool using uu_list_pool_create(). Then, for each
list of that structure, you create a list in the respective pool using
uu_list_create().
That sounds complicated, but it's for a good reason: at each call to
uu_list_pool_create(), we register the newly created pool
on a global list, headed by the "null pool", uu_null_lpool:
uu_list_pool_t *
uu_list_pool_create(const char *name, size_t objsize,
    size_t nodeoffset, uu_compare_fn_t *compare_func, uint32_t flags)
{
        uu_list_pool_t *pp, *next, *prev;
        /* validate name, allocate storage, initialize members */
        (void) pthread_mutex_init(&pp->ulp_lock, NULL);
        pp->ulp_null_list.ul_next = &pp->ulp_null_list;
        pp->ulp_null_list.ul_prev = &pp->ulp_null_list;
        (void) pthread_mutex_lock(&uu_lpool_list_lock);
        pp->ulp_next = next = &uu_null_lpool;
        pp->ulp_prev = prev = next->ulp_prev;
        next->ulp_prev = pp;
        prev->ulp_next = pp;
        (void) pthread_mutex_unlock(&uu_lpool_list_lock);
        return (pp);
}
with similar code being used to connect each list to its pool on calls
to uu_list_create().
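If the splice in the middle of that function looks opaque, the following stand-alone sketch isolates it. It is not libuutil code: pool_t, null_pool, pool_register(), and pool_count() are hypothetical, simplified stand-ins, assumed here only to show how a sentinel-headed circular list lets every new pool be found from one well-known symbol.

```c
#include <assert.h>
#include <stddef.h>
#include <pthread.h>

/* Hypothetical, much-reduced stand-in for uu_list_pool_t. */
typedef struct pool {
	struct pool *p_next;
	struct pool *p_prev;
	const char *p_name;
} pool_t;

/* The sentinel ("null pool"): an empty registry points at itself. */
static pool_t null_pool = { &null_pool, &null_pool, "(null pool)" };
static pthread_mutex_t pool_list_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Splice pp in immediately before the sentinel, which in a circular
 * list is the tail.  The four pointer assignments follow the same
 * pattern as the registration code shown above.
 */
static void
pool_register(pool_t *pp, const char *name)
{
	pool_t *next, *prev;

	pp->p_name = name;
	(void) pthread_mutex_lock(&pool_list_lock);
	pp->p_next = next = &null_pool;
	pp->p_prev = prev = next->p_prev;
	next->p_prev = pp;
	prev->p_next = pp;
	(void) pthread_mutex_unlock(&pool_list_lock);
}

/* Count registered pools by walking until we arrive back at the sentinel. */
static size_t
pool_count(void)
{
	size_t n = 0;
	pool_t *pp;

	(void) pthread_mutex_lock(&pool_list_lock);
	for (pp = null_pool.p_next; pp != &null_pool; pp = pp->p_next)
		n++;
	(void) pthread_mutex_unlock(&pool_list_lock);
	return (n);
}
```

Because the sentinel is never removed, insertion needs no special case for the empty list, and anything that can locate the sentinel's address can enumerate every pool in registration order.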
So now we have an address space where each list pool is linked in a
list, and each list in a pool is linked to a list headed at that pool.
This leads us to the second part, which is to use the encoded
information in a debugger. The typical debugger for kernel work in
Solaris is mdb(1), the modular debugger. It's been
shipping with Solaris since 5.8, and has a rich set of extensions for
kernel debugging. For userland, the modules are rarer:
libumem is probably the best known [4].
The source code for the libuutil module (or "dmod") is
located at
cmd/mdb/common/modules/libuutil/libuutil.c;
the function that provides the dcmd itself, uutil_listpool,
is just a wrapper around the walker for uu_list_pool_t
structures. The pertinent portion is the initialization function,
uutil_listpool_walk_init() [5]:
int
uutil_listpool_walk_init(mdb_walk_state_t *wsp)
{
        uu_list_pool_t null_lpool;
        uutil_listpool_walk_t *ulpw;
        GElf_Sym sym;
        bzero(&null_lpool, sizeof (uu_list_pool_t));
        if (mdb_lookup_by_obj("libuutil.so.1", "uu_null_lpool", &sym) ==
            -1) {
                mdb_warn("failed to find 'uu_null_lpool'\n");
                return (WALK_ERR);
        }
        if (mdb_vread(&null_lpool, sym.st_size, (uintptr_t)sym.st_value) ==
            -1) {
                mdb_warn("failed to read data from 'uu_null_lpool' address\n");
                return (WALK_ERR);
        }
        ulpw = mdb_alloc(sizeof (uutil_listpool_walk_t), UM_SLEEP);
        ulpw->ulpw_final = (uintptr_t)null_lpool.ulp_prev;
        ulpw->ulpw_current = (uintptr_t)null_lpool.ulp_next;
        wsp->walk_data = ulpw;
        return (WALK_NEXT);
}
which safely pulls out the value of the uu_null_lpool head
element, and the relevant pieces we'll need to walk the list.
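The word "safely" matters: a debugger module never dereferences a pointer from the target, it copies bytes out of the target's address space and checks that the copy worked. The sketch below imitates that discipline in portable C, under stated assumptions. The "target" is just a byte buffer with addresses as offsets into it; tnode_t, walk_t, target_read(), walk_init(), and walk_step() are hypothetical names, and target_read() merely stands in for the role mdb_vread() plays above. The step function here stops when it arrives back at the sentinel's address, which is my own simplification rather than the termination test the real walker uses.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical node layout in the target; links are target addresses. */
typedef struct tnode {
	uintptr_t tn_next;
	uintptr_t tn_prev;
	int tn_id;
} tnode_t;

typedef struct walk {
	const unsigned char *w_mem;	/* captured image of target memory */
	size_t w_size;			/* size of that image in bytes */
	uintptr_t w_head;		/* target address of the sentinel */
	uintptr_t w_current;		/* target address of the next node */
} walk_t;

/* Copy len bytes from target address addr; fail rather than overrun. */
static int
target_read(const walk_t *w, void *buf, size_t len, uintptr_t addr)
{
	if (addr > w->w_size || len > w->w_size - addr)
		return (-1);
	(void) memcpy(buf, w->w_mem + addr, len);
	return (0);
}

/* Read the sentinel once and remember where the walk starts. */
static int
walk_init(walk_t *w, const void *mem, size_t size, uintptr_t head)
{
	tnode_t sentinel;

	w->w_mem = mem;
	w->w_size = size;
	w->w_head = head;
	if (target_read(w, &sentinel, sizeof (sentinel), head) == -1)
		return (-1);
	w->w_current = sentinel.tn_next;
	return (0);
}

/* Return 1 and fill *out per node, 0 when finished, -1 on a bad read. */
static int
walk_step(walk_t *w, tnode_t *out)
{
	if (w->w_current == w->w_head)
		return (0);
	if (target_read(w, out, sizeof (*out), w->w_current) == -1)
		return (-1);
	w->w_current = out->tn_next;
	return (1);
}
```

A corrupted link in the image produces an error return from walk_step() instead of a crash in the debugger, which is the property you want when the thing being examined is, by definition, already broken.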
This means that, for any program linked with libuutil,
we can attach with mdb(1) and display its list pools:
# mdb -p `pgrep -z global startd`
Loading modules: [ svc.startd ld.so.1 libumem.so.1 libnvpair.so.1 libsysevent.so.1 libuutil.so.1 libc.so.1 ]
> ::uu_list_pool
ADDR     NAME                            COMPARE  FLAGS
080dcf08 wait_info                       00000000 D
080dce08 SUNW,libscf_datael              00000000 D
080dcd08 SUNW,libscf_iter                00000000 D
080dcc08 SUNW,libscf_transaction_entity  c2b0476c D
080dc808 dict                            0805749c D
080dc908 timeouts                        0806ffab D
080dca08 restarter_protocol_events       00000000 D
080dcb08 restarter_instances             0806ccd7 D
080dc708 restarter_instance_queue        00000000 D
080dc608 contract_list                   00000000 D
080dc508 graph_protocol_events           00000000 D
080dc408 graph_edges                     00000000 D
080dc308 graph_vertices                  08059844 D
>
and then drill down into constituent lists of interest.
Additional walkers are also provided, such that the lists and list nodes
can be visited from the command line or programmatically. As an example,
the ::vertex dcmd from the svc.startd module
uses the walkers to display the various service graph nodes in a
quasi-readable format [5].
So, by providing extra structured information in the library and support to consume that information in the debugger, we end up with a set of data structures that, if used, leads to more debuggable programs. More work up front for less later: welcome to OpenSolaris.
Footnotes
1. By postmortem debugging, I'm referring to the operation of debugging a failed application after its failure, from a core file or other memory image captured as soon after that failure as possible. Suitability for postmortem debugging is a standard expectation for software design in Solaris, as it reduces the time to diagnose and fix software failures. In particular, multiple engineers can debug a core file in parallel; this can be contrasted with the cost of setting up a duplicate installation and trying to reproduce the failure, let alone expecting the customer to risk further downtime experimenting with "try this" scenarios.
2. Please remember that we were making these decisions three years ago, and that this choice had to fit the then-applicable constraints on the product.
3. In contrast, the kernel has had a generic, modular hash table since 5.8/2000 (uts/common/os/modhash.c), a generic AVL tree since 5.9/2002 (common/avl/avl.c), and a generic list implementation early in 5.10/2005 (uts/common/os/list.c). Of course, the kernel has used the slab allocator (uts/common/os/kmem.c) since 5.4/1994.
4. A quick listing in
/usr/lib/mdb/proc/ will display the other modules valid in the
process target: beyond libumem and libuutil,
there's support for the linker, libc, name-value pairs,
system event, and the two main smf(5)
daemons.
5. As an example, here's the output
of ::vertex on my current system, for those services
related to my VNC server (and the service itself):
> ::vertex ! grep vnc
0x85d3380 212 I 1 svc:/application/vncserver:sch
0x85d3320 213 s - svc:/application/vncserver
0x85d3200 214 R - svc:/application/vncserver:sch>milestone
0x85d3260 215 R - svc:/application/vncserver:sch>autofs
0x85d32c0 216 R - svc:/application/vncserver:sch>nis
>
[ T: OpenSolaris Solaris mdb ]
(2005-06-14 09:51:44.0)
Bespoke services: application/vncserver
In honour of the "Mugs for Manifests" contest, I thought I would spin out another custom service description I wrote some months ago.
My setup for working from home (key during the last six months of Solaris 10) is to tunnel into Sun's network via one implementation or another of a virtual private network (VPN). In all cases, the VPN solution runs on Solaris. Although the VPN lets your system participate more or less like a regular host, I find it's easier to use VNC to remotely present an X11 display from my main workstation, muskoka. But, of course, machines running pre-production bits can fail or be rebooted or be reinstalled regularly, so I wanted the VNC server on my system to always be up: I wanted a VNC service.
What's distinct about running the VNC server is that it should run as
me, with my environment, and not as root with init(1M)'s.
svc.startd(1M), while it can run methods according to
smf_method(5), doesn't populate the environment fully in
the sense of login(1). So we will need to extract some
data from the name
service, which is cumbersome to perform in a shell script. We'll write
our method in Perl, which implies
Tip 1: Methods need not be shell scripts.
In fact, the start method and the stop method can be totally separate commands: you could write one in Python, and one can be an executable Java .jar archive, or some even more bizarre combination.
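To illustrate Tip 1, here is a minimal sketch, in Python rather than the Perl used for this service, of the name-service lookup such a method needs before it starts anything. The helper name login_environment and the choice of four variables are assumptions made for illustration; login(1) sets up more than this.

```python
import os
import pwd


def login_environment(uid=None):
    """Return a small login(1)-like environment for the given uid.

    Hypothetical helper: it looks the user up through the name service
    (getpwuid) and derives USER, LOGNAME, HOME, and SHELL from the entry.
    """
    if uid is None:
        uid = os.getuid()
    entry = pwd.getpwuid(uid)  # raises KeyError if the uid is unknown
    return {
        "USER": entry.pw_name,
        "LOGNAME": entry.pw_name,
        "HOME": entry.pw_dir,
        "SHELL": entry.pw_shell,
    }
```

A method written this way would call os.environ.update(login_environment()) first and then exec the real server, so that the server inherits the populated environment.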
The other trick is that, if VNC fails for some reason, I want to be
aggressive about cleaning up its various leftover temporary files. For
this purpose, I run the stop method with a different
credential—the default of root—than the start
method, which is done in our brief manifest by locating the
<method_context> element on only the start method.
Tip 2: Methods need not be run with identical method contexts. Credentials, privileges, and the like may all differ from method to method.
Our manifest then looks like:
<?xml version='1.0'?>
<!DOCTYPE service_bundle SYSTEM '/usr/share/lib/xml/dtd/service_bundle.dtd.1'>
<service_bundle type='manifest' name='export'>
<service name='application/vncserver' type='service' version='0'>
<single_instance/>
<instance name='sch' enabled='true'>
<dependency name='milestone' grouping='require_all' restart_on='none' type='service'>
<service_fmri value='svc:/milestone/multi-user:default'/>
</dependency>
<dependency name='autofs' grouping='require_all' restart_on='none' type='service'>
<service_fmri value='svc:/system/filesystem/autofs:default'/>
</dependency>
<dependency name='nis' grouping='require_all' restart_on='none' type='service'>
<service_fmri value='svc:/network/nis/client:default'/>
</dependency>
<exec_method name='stop' type='method' exec='/home/sch/bin/vncserver_method stop' timeout_seconds='60'/>
<exec_method name='start' type='method' exec='/home/sch/bin/vncserver_method start' timeout_seconds='300'>
<method_context>
<method_credential user='sch' group='staff' />
</method_context>
</exec_method>
</instance>
</service>
</service_bundle>
The dependencies above are needed if you use NFS for home directories and NIS for name services; they could be reduced for less networked setups.
And, for the method, we have a short Perl program. The complete list of
environment variables in login(1) would include
LOGNAME, PATH, MAIL, and
TZ (timezone), and exclude my silly setting of
LANG, but most of these will be set up by the shell that
the VNC startup script (its analogue to .xinitrc) uses. The
various print calls are just to let the service log show a
little activity, and could be removed.
#!/usr/perl5/bin/perl
require 5.8.3;
use strict;
use warnings;
use locale;
my ($name, $passwd, $uid, $gid, $quota, $comment, $gcos, $dir, $shell,
$expire) = getpwuid "$<";
$ENV{USER} = $name;
$ENV{HOME} = $dir;
$ENV{SHELL} = $shell;
$ENV{LANG} = "en_CA"; # Just to create havoc (i.e. expose bugs).
#
# The stop method is run as root so that it can cleanup.
#
if (defined($ARGV[0]) && $ARGV[0] eq "stop") {
# ksh and sh specific
print "stop method\n";
system("$ENV{SHELL}", "-c", "/opt/csw/bin/vncserver -kill :1");
if (-S "/tmp/.X11-unix/X1") {
unlink("/tmp/.X11-unix/X1");
unlink("/tmp/.X1-lock");
}
exit 0;
}
#
# The start method is run with the user's identity.
#
print "start method\n";
if (-f "/tmp/.X1-lock") {
unlink("/tmp/.X1-lock");
}
if (-S "/tmp/.X11-unix/X1") {
system("logger -p 1 application/vncserver requires " .
"/tmp/.X11-unix/X1 be removed");
exit 0;
}
# ksh and sh specific
{ exec "$ENV{SHELL}", "-c",
"/opt/csw/bin/vncserver -pn -geometry 1600x1200 -depth 24 :1" };
system("logger -p 1 application/vncserver can't exec /opt/csw/bin/vncserver");
exit 1;
And now we have always-on VNC service for the regular telecommuter:
$ svcs -p vncserver
STATE          STIME    FMRI
online         13:01:01 svc:/application/vncserver:sch
               13:01:00   100577 Xvnc
               13:01:17   100625 xwrits
               13:01:17   100626 ctrun
               13:01:17   100632 xautolock
               13:11:18   102348 xlock
$ uptime
 12:00pm  up 23 hr(s),  4 users,  load average: 0.04, 0.07, 0.07
Exercises
- Remove the hard coded display numbering (":1", "X1", etc.).
- Make the resolution, display depth, RGB encoding, and other standard options into properties.
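As a starting hint for the first exercise, the three hard-coded strings can all be derived from one display number. The sketch below is in Python rather than the method's Perl, and the function name display_paths is hypothetical; the path patterns are the ones the method above uses for display 1.

```python
def display_paths(display):
    """Return the display string, lock file, and socket for an X display.

    Accepts either a bare number (1) or the usual ":1" form.
    """
    number = int(str(display).lstrip(":"))
    return {
        "display": f":{number}",
        "lock": f"/tmp/.X{number}-lock",
        "socket": f"/tmp/.X11-unix/X{number}",
    }
```

The display number itself could then come from a service property, which leads naturally into the second exercise.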
smf(5) not-quite-free stuff
I'm in the middle of some longish, and one rather preachy, blog posts. These will need editing, so to pep things up...
Like one of these?
We had a bunch of custom mugs made up, to commemorate the completion of
smf(5)'s integration into Solaris 10. If you've been at a
customer or community presentation on S10 or smf(5), you
might have received one: for asking a good question, for answering one
from me, or for physical attendance. But these mugs—fine, solid,
large capacity, high quality mugs for coffee, tea, or even
pens—are heavy: too heavy for us to lug a box to all the
conferences we might attend.
So instead we're going to run a little contest.
Liane summarized our understanding of other service conversions circulating a few months ago. I'd like to get another batch done, and there's no incentive like a ceramic container incentive, so I'm going to suggest a few categories:
- Historical: Convert one (or more) of the unconverted services in /etc/rc*.d in Solaris 10.
- Free/Open: Convert a F/OSS daemon to be an smf(5) service.
- Commercial: Convert a commercial software package to be one or more smf(5) services.
- Artistic/Offbeat: Convert something unexpected into a particularly elegant service.
The conditions are pretty simple: there are 36 mugs in the box, so the first round can have 36 winners. One mug for each converted service; the winning entry for a specific service will be judged by completeness (dependencies in particular), correctness (methods), utility (will anyone else use this?), and date received. I'll give some no-prize honorable mentions in each category as well. This round will be quick: entries must be received by June 15th.
An entry should disclose:
- Your name,
- preferred email,
- blog URL (optional),
- mailing address,
- description of the software (plus details if obscure) and
- the service manifest and method(s) (if any), or
- an accessible URL to same.
smf(5) keeners to help me evaluate the submissions.
Services on the list Liane gave are not eligible, unless you think your conversion is substantially better by the criteria above.
If your conversion wins, I'll send you your mug via an amazing cooperative, potentially international, mechanism composed of government-granted-monopoly package delivery agencies. Winners, and their entries (or pointers) will be posted here.
(2005-05-27 16:02:08.0)
Banging on multiple heads
I spent a bit of time each of the past few weeks trying out different
graphics cards to drive two displays—a multihead configuration.
Presently, I'm using an older ATI Radeon 7000-based card to run two
displays at 1600 × 1200 each. The radeon driver
included with the X.org X server can knit these together into a single
display with an effective resolution of 3200 × 1200.
Other folks are using nVidia drivers to get similar configurations.
Once I had this setup running, I started to see familiar applications fail with a pleasant message:
$ gvim
The program 'gvim' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadWindow (invalid Window parameter)'.
  (Details: serial 13 error_code 3 request_code 128 minor_code 2)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
Dave Powell helped me to use xscope to watch the X11
protocol requests. This, and a little code browsing, allows us a
diagnosis.
It turns out that Xsun and Xorg use different implementations of the
Xinerama extension (but with similar names). As far as I can tell, the
standard behaviour changed after Sun developed support for a draft
proposal. Now, a few years later, the applications know how to deal
with both versions. With Xorg though, you now have a Sun system which
doesn't speak Sun's Xinerama variant, hence our error message. Alan knows how to fix this for
real but, after looking at libgdk startup, it's pretty easy
to work around this with a preloaded shared object.
All we do is pretend that our display can't do Sun's Xinerama. That means we need a function like
$ cat > xin_shim.c
int
XineramaGetState()
{
return (0);
}
^D
We then compile it into a shared object
$ cc -o xin_shim.so -G -Kpic xin_shim.c
or we could freely compile it into a shared object
$ which gcc
/usr/sfw/bin/gcc
$ gcc -o xin_shim.so -G -fpic xin_shim.c
Then, to use your shim, you use LD_PRELOAD (with an
absolute path)
$ LD_PRELOAD=`pwd`/xin_shim.so gvim
[happy editing...]
Because xin_shim.so isn't in /usr/lib/secure,
you may see messages from setuid processes as the linker refuses
to preload your potentially unsafe object. The message looks something
like
ld.so.1: /usr/lib/utmp_update: warning: /home/sch/src/preloads/xin_shim.so: open failed: illegal insecure pathname
(2005-04-27 15:06:25.0)
Bespoke services: application/catman
For various reasons—some reasonable, some suspect—Solaris doesn't ship with a compiled set of windex databases for its manual pages. The unfortunate result is that helpful commands like apropos(1) or man -k are unhelpful:
$ apropos sort
/usr/man/windex: No such file or directory
smf(5) provides one way to address this shortcoming, via a transient service to be run during startup. Our service description
would be roughly equivalent to the following:
<?xml version="1.0"?>
<!DOCTYPE service_bundle SYSTEM "/usr/share/lib/xml/dtd/service_bundle.dtd.1">
<service_bundle type='manifest' name='sch:catman'>
<service
name='application/catman'
type='service'
version='1'>
<create_default_instance enabled='false' />
<single_instance />
<!--
By default, application/catman will run in the background
during boot. If you want to run it periodically, execute
# /usr/sbin/svcadm restart catman
If you wish to augment the default MANPATH, use the setenv
subcommand to svccfg(1M). For instance, to add the Java
manual pages to the build:
# /usr/sbin/svccfg -s application/catman
> setenv MANPATH /usr/share/man:/usr/java/man
> exit
# /usr/sbin/svcadm refresh catman
If MANPATH is not defined, the default manual path is
/usr/share/man, as per catman(1M).
-->
<dependency
name='local-filesystems'
type='service'
grouping='require_all'
restart_on='none'>
<service_fmri value='svc:/system/filesystem/local' />
</dependency>
<dependency
name='remote-filesystems'
type='service'
grouping='optional_all'
restart_on='none'>
<service_fmri value='svc:/network/nfs/client' />
<service_fmri value='svc:/system/filesystem/autofs' />
</dependency>
<exec_method
type='method'
name='start'
exec='/usr/bin/catman -w'
timeout_seconds='0' />
<exec_method
type='method'
name='stop'
exec=':true'
timeout_seconds='0' />
<property_group name='startd' type='framework'>
<propval name='duration' type='astring' value='transient' />
</property_group>
<stability value='Unstable' />
<template>
<common_name>
<loctext xml:lang='C'>
manual page index generation
</loctext>
</common_name>
<documentation>
<manpage
title='catman'
section='1M'
manpath='/usr/share/man' />
</documentation>
</template>
</service>
</service_bundle>
Following my own instructions in the comment block, I defined a value for MANPATH and refreshed the service.
My setting can be double-checked with svcprop(1) like so:
$ svcprop -p start application/catman
start/exec astring /usr/bin/catman\ -w
start/timeout_seconds count 0
start/type astring method
start/environment astring MANPATH=/usr/share/man:/usr/openwin/man:/usr/sfw/man:/usr/dt/man:/usr/perl5/man:/usr/java/man:/usr/apache/man:/usr/X11/man:/opt/sfw/man:/opt/csw/man
Issuing "svcadm enable catman" will cause the service to be executed immediately, and upon each subsequent boot. Our earlier query
becomes fecund:
$ apropos sort
FcFontSort FcFontSort (3fontconfig) - Return list of matching fonts
aclsort aclsort (3sec) - sort an ACL
alphasort scandir (3c) - scan a directory
alphasort scandir (3ucb) - scan a directory
bsearch bsearch (3c) - binary search a sorted table
bunzip2 bzip2 (1) - a block-sorting file compressor and associated utilities
bzcat bzip2 (1) - a block-sorting file compressor and associated utilities
bzip2 bzip2 (1) - a block-sorting file compressor and associated utilities
bzip2recover bzip2 (1) - a block-sorting file compressor and associated utilities
disksort disksort (9f) - single direction elevator seek sort for buffers
ldap_sort ldap_sort (3ldap) - LDAP entry sorting functions
ldap_sort_entries ldap_sort (3ldap) - LDAP entry sorting functions
ldap_sort_strcasecmp ldap_sort (3ldap) - LDAP entry sorting functions
ldap_sort_values ldap_sort (3ldap) - LDAP entry sorting functions
libbz2 libbz2 (3) - library for block-sorting data compression
look look (1) - find words in the system dictionary or lines in a sorted list
qsort qsort (3c) - quick sort
sort sort (1) - sort, merge, or sequence check text files
sortbib sortbib (1) - sort a bibliographic database
tsort tsort (1) - topological sort
...
Exercises
- Add a configuration property that makes the service also rebuild the nroffed versions of the manual pages, if set to true.
- Make the service regenerate only in the case that components in the path have changed.
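One possible approach to the second exercise, sketched in Python: fingerprint the MANPATH components and compare against a stamp file left by the previous run. The function names, the stamp file, and the decision to look only at each component and its immediate subdirectories are all assumptions for illustration; catman(1M) offers nothing like this itself.

```python
import hashlib
import os


def manpath_fingerprint(manpath):
    """Hash the modification times of each MANPATH component and its
    immediate entries (the man1, man3c, ... section directories)."""
    digest = hashlib.sha256()
    for component in manpath.split(":"):
        if not os.path.isdir(component):
            continue  # skip components that do not exist on this system
        entries = [component] + sorted(
            os.path.join(component, name) for name in os.listdir(component)
        )
        for path in entries:
            mtime = os.lstat(path).st_mtime_ns
            digest.update(f"{path}\0{mtime}\0".encode())
    return digest.hexdigest()


def needs_rebuild(manpath, stamp_path):
    """Return (changed, fingerprint) relative to the recorded stamp."""
    current = manpath_fingerprint(manpath)
    try:
        with open(stamp_path) as stamp:
            previous = stamp.read().strip()
    except FileNotFoundError:
        previous = None  # never built before
    return current != previous, current


def record_build(stamp_path, fingerprint):
    """Remember the fingerprint that the last successful build saw."""
    with open(stamp_path, "w") as stamp:
        stamp.write(fingerprint + "\n")
```

A start method built on this would run catman -w only when needs_rebuild reports a change, and call record_build afterwards. Adding or removing a page changes the modification time of its section directory, which is why the sketch does not descend further.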
Tie knot: Knot 54 (Hanover).
(2005-03-29 13:31:58.0)
smf(5): manifest editing assistance
We had a productive wrap up meeting for Solaris 10 Platinum Beta last week, with lots of good feedback on
smf(5). One point raised is that few people like to hand-edit XML—or, maybe, many people hate to—so
tools for composing service manifests are needed. We'll need to percolate on how best to improve or extend the current
set of tools, but there are a few tricks out there already.
A bad manifest. Let's take a well-formed and valid manifest file and add the nonsensical line
<french_fry>I,m a bad element.</french_fries>
to simulate a developer making a composition error during service development. How do we determine that our manifest is now broken?
svccfg(1M) validation. As I mentioned, the basic tools aren't helpful. The logical svccfg(1M) subcommand to check a manifest for correctness is validate. Its output on our manifest is
$ svccfg validate /tmp/gdm2-login.xml
svccfg: couldn't parse document
which accurately tells us the manifest is broken but does not indicate how (at all).
xmllint(1). The XML parser implementation of svccfg(1M) is the GNOME libxml2, which includes a general validation tool in the form of xmllint(1). If we
invoke this command with its --valid long option, we get
$ xmllint --valid /tmp/gdm2-login.xml
/tmp/gdm2-login.xml:26: parser error : Opening and ending tag mismatch: french_fry line 26 and french_fries
<french_fry>I,m a bad element.</french_fries>
^
which isn't validating the document, but is telling us where and how it is not well-formed.
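The same kind of well-formedness report can be had from any XML parser. As a minimal sketch, assuming Python's standard xml.etree.ElementTree module (which, like the xmllint run above, does not validate against the service bundle DTD), with a hypothetical helper name:

```python
import xml.etree.ElementTree as ET


def check_well_formed(text):
    """Return (ok, line, message) for a string of XML.

    Only well-formedness is checked; a document can pass this and
    still be invalid with respect to its DTD.
    """
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, _column = err.position
        return False, line, str(err)
    return True, None, None
```

Fed a document containing the bad line from earlier, this reports the line number of the mismatched closing tag, but it would say nothing about a well-formed french_fry element that the DTD does not define.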
Graphically clear. An interesting option is to use the jEdit editor, with its XML plugin. With our document, the XML plugin will validate on save and highlight the incorrect line with red underlining:

Moreover, the error window shows both the non-well-formedness and the invalid <french_fry> element
(which is absent from the non-fast food-oriented service bundle DTD).

So we see both the immediate and the deeper error, plus the plugin highlights matching tags and provides completion menus for tag selection. Civilization to most, I expect.
I happily use vim for development, but it's important to note the value in jEdit just from using a different XML
implementation. Using other tools to ease your composition of service descriptions (or profiles)? Let us know—and, rest assured, we're working to make that svccfg(1M) output more useful.
Tie knot: Knot 6 (Victoria).
(2005-03-14 18:39:43.0)
smf(5) on /.
smf(5) ended up in two stories on slashdot today. In "Torvalds on Opening Solaris",
elmegil observed
I'm rather amused to see Sun be the first to implement a replacement for the old init and have it done. I can't say I know who thought it up first, but Solaris 10 SMF is the first working implementation I'm aware of that's going to get any kind of wide deployment. I saw some linux-head saying this needed to be done a year or more ago, but I can't even find their website in google now. And obviously if Solaris has it now, the implementation started a while back (probably more than a year)...
I suspect elmegil is referring to Seth Nickell's System Services work, which ended up being discussed in an article on osnews.com in October 2003. But there is other work in the parallel startup area that's been cited on slashdot, and elsewhere.
The second story is "A Diagnosis of Self-Healing Systems", which is a discussion around Mike Shapiro's recent overview in ACM Queue of the problems we're working to solve in the Predictive Self-Healing effort. The comments range across a number of topics in deployment and architecture, but I was interested in the observations that self-healing in a general purpose system is a different proposition than in a limited purpose system. (I probably would also contrast open software systems and closed ones—perhaps a distinction of the past.)
(2004-12-21 21:01:10.0)
Personal restarters and ctrun(1)
(I know I need to wrap up the cal(1) contest. I also need to finish about three smf(5) blog entries. I am also mostly keeping up with the forum/list/newsgroup traffic on smf(5). I also have a few more bugfixes to get into Solaris 10 first. But a small dispatch seems necessary.)
One of the neat things about GNOME is that it provides restarter operations for specified applications—your application
references a bad address and dies, it gets restarted. I'm pretty attached to ion, however, and don't really need all of the GNOME environment all the time. But I do want restart for a couple of applications in my session.
For instance, I wrote a little C program, osdclock, using libxosd that displays the date, the time, and those of my mail folders containing new mail. It looks like this:
Occasionally, and for reasons I lack the time to debug, new ion workspaces will obscure the on-screen display. Rather
than write code into osdclock to call exec(2) (of itself) on a received signal, it's easier to use the new
commands associated with the contracts subsystem of Solaris 10, specifically the ctrun(1) command. By adding
the invocation
/usr/bin/ctrun -r 0 -o noorphan -f signal,hwerr $HOME/bin/osdclock &
to my .xinitrc file, I get an osdclock that is restarted from any fatal external signal (or an uninterceptable hardware error of some kind), but is not restarted if a core file is generated (from a software error) or if the ctrun process is itself killed (like on session exit). So I avoid a home directory filled with core files. (You are using coreadm(1M) to have a meaningful core dump pattern, right?)
Since the cause of restart is ctrun's awareness of the process contract becoming empty, this same ctrun invocation could be used with an application that prefers to daemonize, like a personal web proxy. It could also be used to set up a restartable group of processes, like an application suite or widget collection. (ctrun has some other interesting options, and is a versatile lightweight restarter on its own—try it out.)
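To make the restart policy concrete, here is a rough user-level analogue sketched in Python. It is not how ctrun works: it watches a single direct child with fork and waitpid rather than a process contract, it cannot tell an external signal from a self-inflicted one, and the function names and the max_restarts cap are assumptions added for illustration.

```python
import os


def died_from_signal_without_core(status):
    """True if a wait status shows death by signal with no core dump."""
    return os.WIFSIGNALED(status) and not os.WCOREDUMP(status)


def run_once(argv):
    """Fork, exec argv in the child, and return the raw wait status."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if the exec itself failed
    _, status = os.waitpid(pid, 0)
    return status


def supervise(argv, max_restarts=5):
    """Rerun argv while it keeps dying from signals without dumping core.

    Returns (last_status, restarts). A normal exit or a core dump ends
    the loop, mirroring the policy described above.
    """
    restarts = 0
    while True:
        status = run_once(argv)
        if not died_from_signal_without_core(status) or restarts >= max_restarts:
            return status, restarts
        restarts += 1
```

The contract-based version is the better tool on Solaris 10 precisely because it follows every process in the contract, which is what lets it cope with applications that daemonize.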
Having applications always available leads to predictable computing, but I think a more dramatic way to express it in this medium (but imagine appropriate stormy weather sound effects) would be
<mad_scientist>Restarters, restarters everywhere! <laughter glee="maniacal" /></mad_scientist>
(2004-12-01 23:34:50.0)