Thursday Mar 31, 2005

While it is far from new news, webmin (1.190 or better) now groks smf(5), thanks to Sun's own Alan Maguire. Particularly cool is the tantalizing "Create New Service" button. It leads you through a set of questions that help you create a simple daemon-style service. Two hints: it drops the manifest in /etc/webmin/smf/manifest.xml, and you need to use "Add" on each screen to get the changes to take ("Next" isn't sufficient). In the screenshot, you can see the "foo" and "bar" services I quickly created on my laptop with webmin. John Clingan wrote more about how to get the new version up and running. Alan says he's happy to get feedback about the smf(5) functionality he's added to webmin: you can mail him at firstname.lastname@sun.com, just like the rest of us.
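If you want to reuse the manifest webmin generates, for example to load the same service on another machine, the usual svccfg(1M) steps apply. A minimal sketch, assuming a POSIX shell on Solaris 10; the function name import_webmin_manifest is hypothetical, and the default path is the one webmin uses:

```sh
# Hypothetical helper: validate the manifest webmin generated against the
# smf(5) DTD, then load it into the repository. Defaults to webmin's path.
import_webmin_manifest() {
    manifest=${1:-/etc/webmin/smf/manifest.xml}
    if [ ! -r "$manifest" ]; then
        echo "import_webmin_manifest: cannot read $manifest" >&2
        return 1
    fi
    # Stop before importing if the XML doesn't validate.
    svccfg validate "$manifest" || return 1
    svccfg import "$manifest"
}
```

After copying the file to another system, something like import_webmin_manifest /var/tmp/foo.xml followed by svcs foo shows whether the new service made it into the repository.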

Monday Mar 28, 2005

I've posted the current slide set we're using for smf(5) presentations on the mediacast.sun.com site. The presentation is currently focused on using and administering a Solaris system with smf(5). There are only a few developer slides, but I expect to post a more comprehensive presentation about smf(5) development eventually.

If you're near the San Francisco bay area and are interested in seeing me talk to some variant of these slides live and in person, plan to attend the inaugural meeting of the OpenSolaris User Group on April 26th, 2005 at the Sun Santa Clara campus. I'll post more details as I get them, so watch this space.

Wednesday Mar 23, 2005

Discovering services available on your system is really easy. A few interesting questions to ask:

  1. What services are enabled/running? svcs(1) with no options answers that easily:

    $ svcs
    ...
    online         Feb_04   svc:/network/ntp:default
    online         Feb_04   svc:/network/service:default
    online         Feb_04   svc:/application/x11/xfs:default
    online         Feb_04   svc:/application/font/stfsloader:default
    ...
    
  2. What services are available? Just ask svcs(1) to list all services, including the disabled ones:

    $ svcs -a
    disabled       Feb_04   svc:/system/metainit:default
    disabled       Feb_04   svc:/network/rpc/nisplus:default
    disabled       Feb_04   svc:/network/nis/server:default
    
  3. What do these available services do, anyway? Again, just ask svcs(1). This time, get the service description too:

    $ svcs -a -o FMRI,DESC
    svc:/milestone/name-services:default               name services milestone
    svc:/platform/i86pc/kdmconfig:default              Display configuration
    svc:/system/cron:default                           clock daemon (cron)
    
  4. And how do I find out more about the service I'm interested in? svcs gives useful information with both the -x and -l options. The manpage references in svcs -x are particularly helpful. We'll be adding those to the -l output as well.

    $ svcs -x system-log
    svc:/system/system-log:default (system log)
     State: online since Fri Feb 04 19:30:11 2005
       See: syslogd(1M)
       See: /var/svc/log/system-system-log:default.log
    Impact: None.
    
    $ svcs -l system-log
    fmri         svc:/system/system-log:default
    name         system log
    enabled      true
    state        online
    next_state   none
    state_time   Fri Feb 04 19:30:11 2005
    logfile      /var/svc/log/system-system-log:default.log
    restarter    svc:/system/svc/restarter:default
    contract_id  51 
    dependency   require_all/none svc:/milestone/sysconfig (online)
    dependency   require_all/none svc:/system/filesystem/local (online)
    dependency   optional_all/none svc:/system/filesystem/autofs (online)
    dependency   require_all/none svc:/milestone/name-services (online)
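
Because svcs(1) prints plain columns, ordinary text tools work on its output. A small sketch, assuming a POSIX shell with awk and sort; count_by_state is a hypothetical helper, while -H (omit the header line) and -o state,fmri are standard svcs options:

```sh
# Hypothetical helper: count services in each state (online, disabled,
# maintenance, ...). Reads "svcs -a -H -o state,fmri" output on stdin.
count_by_state() {
    awk '{ n[$1]++ } END { for (s in n) print s, n[s] }' | sort
}
```

For example: svcs -a -H -o state,fmri | count_by_state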
    

Wednesday Mar 16, 2005

In addition to trying to improve the service deployment and administration model in Solaris 10, smf(5) (also known as Solaris Service Manager) works hand-in-hand with the Solaris Fault Manager (also known as fmd(1M)) to isolate and recover from faults[1]. The Fault Manager handles detecting and predicting hardware faults, including retiring bad hardware when faults are predicted or detected. That's a very simplistic description of a very sophisticated suite of software, but hopefully it is enough for me to continue the smf(5) part of the discussion.

In earlier versions of Solaris, we could detect hardware faults, but not always recover from them. I'll focus on memory errors here, which can occur either in your physical memory or in the cache that's part of the CPU module. Either way, memory can go bad, and it can generate either a correctable error or an uncorrectable one. Solaris has always recovered gracefully from correctable errors: they're handled by the kernel and never seen by a user process. But an uncorrectable error means that we can't find a good copy of the data. The error can occur either in the kernel's address space or in a user process's address space. An error in kernel address space means that we need to panic the kernel immediately. An error in user space can be dealt with more gracefully: since we know which process the error affected, we can kill it before it causes any more damage. However, previous versions of Solaris didn't know the relationships between user processes. Because we couldn't tell whether the corrupted or missing memory in one process would cause corruption in another process cooperating closely with it, we had to take the entire system down, gracefully (via init 6).

In Solaris 10, fmd(1M) can take failing hardware offline either before the failure occurs or after it does. But when a failure does slip through, it is smf(5)'s job to know the relationships between processes and services on the system. There are two main types of relationships:

  1. processes that are part of the same service / fault boundary, and

  2. services which depend upon each other.

To track processes as part of the same service, the smf(5) restarters write process(4) contracts to be able to receive events on a group of related processes. Certain types of events can be classified as important:

  • empty - the last member of the contract exited

  • fork - a new process was added to the contract

  • exit - a member of the contract exited

  • core - a process dumped core

  • signal - a process received a fatal signal

  • hwerr - a process was killed due to an uncorrectable hardware error

Each of these events is detected by the kernel and then passed on to the contract owner. In the specific case of hwerr, if an uncorrectable hardware error occurs in a user process, the kernel detects it and kills the process where the error occurred, just like in Solaris 9. What differs in Solaris 10 is that we no longer need to restart the system: with smf(5) and contracts, we can restart just the associated processes.

I was planning on a separate post about contracts, but it seems like much of the back information is necessary to explain the architecture here. Bear with me. Maybe I'll write a followup post which gives only the higher level view of our fault isolation features in Solaris 10.

Contracts are written with three types of event sets: informative, critical, and fatal. Informative and critical events really differ only in the guarantees about event delivery. Fatal means we kill off all processes in the contract if a fatal event is received. smf(5) puts the hwerr event into the critical event set. There are a few things to look at here. First, I can find out about contract and process relationships using:

   $ ptree -c `pgrep sendmail`
   [process contract 1]
     1     /sbin/init
       [process contract 4]
         7     /lib/svc/bin/svc.startd
           [process contract 513]
             18676 /usr/lib/sendmail -Ac -q15m
             18678 /usr/lib/sendmail -bd -q15m

You can see that sendmail is in contract 513. Using that information, you can look at the terms of the contract:

   $ ctstat -vi 513
      CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME   
      513     0       process owned   7       0       -       -       
              cookie:                0x20
              informative event set: none
              critical event set:    hwerr empty
              fatal event set:       none
              parameter set:         inherit regent
              member processes:      18676 18678
              inherited contracts:   none

That output confirms what I described: hwerr is in the critical event set. If there's a hwerr in either of the sendmail processes, the contract owner (process 7, svc.startd, as you can see above) will receive a critical event. svc.startd responds to that event by stopping the service, and restarting it if possible. Thus, when an uncorrectable memory error occurs in a process managed as an smf(5) service, smf(5) detects the error and repairs the damage by restarting the service[2]. That handles the first relationship type I described above: processes related as part of the same service / fault boundary. So, how about service relationships?
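To check a contract's terms without reading the whole ctstat listing, you can pick out which event sets mention a particular event. A sketch, assuming the ctstat -v layout shown above (one "... event set:" line per set); event_sets_with is a hypothetical helper name:

```sh
# Hypothetical helper: print each event set (informative, critical, fatal)
# that contains the given event. Reads "ctstat -v -i CTID" output on stdin.
event_sets_with() {
    ev=${1:-hwerr}
    awk -v ev="$ev" '
        /event set:/ {
            split($0, parts, ":")
            setname = parts[1]
            sub(/^[ \t]+/, "", setname)      # drop leading indentation
            sub(/ event set$/, "", setname)  # keep just the set name
            n = split(parts[2], evs, " ")    # events listed after the colon
            for (i = 1; i <= n; i++)
                if (evs[i] == ev) print setname
        }'
}
```

For the sendmail contract above, ctstat -v -i 513 | event_sets_with hwerr prints critical; for the vold contract 81 shown further down, it prints critical and fatal.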

Service relationships are managed by smf(5) dependencies. Most dependencies are used to specify startup order, by using grouping=require_all and restart_on=none. However, you can also specify that a service is restarted if its dependency experiences any type of error (hardware error, core dump, etc.). You do this by using restart_on=error as opposed to none. Then when the dependency is restarted due to that error, your dependent service will be too. Pretty simple.
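Normally you would declare such a dependency in your service manifest, as a dependency element with restart_on='error'. The same thing can be sketched against the live repository with svccfg(1M), using the standard dependency properties (grouping, restart_on, type, entities). add_error_dependency and the service names in the usage line are hypothetical:

```sh
# Hypothetical helper: make SVC depend on DEP_FMRI so that SVC is restarted
# whenever the dependency is restarted because of an error. PGNAME names
# the new dependency property group. The final refresh takes a new
# snapshot so the running instance sees the change.
add_error_dependency() {
    svc=$1 pgname=$2 dep_fmri=$3
    svccfg -s "$svc" addpg "$pgname" dependency &&
    svccfg -s "$svc" setprop "$pgname/grouping" = astring: require_all &&
    svccfg -s "$svc" setprop "$pgname/restart_on" = astring: error &&
    svccfg -s "$svc" setprop "$pgname/type" = astring: service &&
    svccfg -s "$svc" setprop "$pgname/entities" = fmri: "$dep_fmri" &&
    svcadm refresh "$svc"
}
```

For example, with made-up names: add_error_dependency svc:/application/myapp:default mydb svc:/application/mydb:default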

Astute observers will note that I haven't described how those nasty uncorrectable errors are handled for processes that aren't explicitly part of an smf(5) service. How does Solaris know what to do if you didn't write a service manifest to describe how faults should be handled?

All processes are part of a process contract. If no software creates a new contract, the process is in the same contract as its parent. The default terms for a contract are not the same as what svc.startd uses. Instead, the default process contract is written such that hardware errors are fatal. Remember, that means all processes in the contract are killed if any process sees an uncorrectable memory error. svc.startd also helpfully puts each legacy-run service in its own contract. Thus, if any processes launched out of a legacy-run service (e.g. vold or dtlogin) fall victim to an uncorrectable memory error, all processes in the contract will be killed.

   $ ptree -c `pgrep vold`
   [process contract 81]
     481   /usr/sbin/vold
   $ ctstat -vi 81 
   CTID    ZONEID  TYPE    STATE   HOLDER  EVENTS  QTIME   NTIME   
   81      0       process orphan  -       0       -       -       
           cookie:                0
           informative event set: core signal
           critical event set:    hwerr empty
           fatal event set:       hwerr
           parameter set:         none
           member processes:      481
           inherited contracts:   none

Note that for vold's process, hwerr is in the fatal event set. But, since there's no service manifest to tell Solaris how to deal with the legacy-run service, we can't restart it. That's one of the reasons why even though we do provide compatibility for legacy services, we strongly suggest folks take the time to do a quick conversion of their service to smf(5).
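svcs -a lists legacy-run services with lrc:/ FMRIs and the state legacy_run, which makes it easy to find candidates for conversion. A sketch; legacy_services is a hypothetical helper:

```sh
# Hypothetical helper: print the FMRIs of legacy-run services, which
# svc.startd starts but cannot restart. Reads
# "svcs -a -H -o state,fmri" output on stdin.
legacy_services() {
    awk '$1 == "legacy_run" { print $2 }'
}
```

For example: svcs -a -H -o state,fmri | legacy_services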

Finally, what does this mean for hardware faults inside zones? As a zone doesn't have a kernel of its own, an uncorrectable memory error in the kernel still means that the entire system goes down. However, each zone runs its own instance of smf(5), completely separate from the other zones on the system, so faults are handled inside a local zone the same way as they are in the global zone. There's no need to isolate the fault to the zone, because we isolate it at a finer granularity: the service. smf(5) and zones are highly complementary technologies.


[1] Mike covers the topic of self-healing systems and Solaris' approach to self-healing in greater detail in his ACM Queue article. Read it if you want a more comprehensive architectural view, rather than the smf(5) implementation/day-to-day use view I provide here.

[2] If you've specified the following in your service manifest, you've told smf(5) that you don't care about what happens to the processes that your start method starts up.

   <property_group name='startd' type='framework'>
      <propval name='duration' type='astring' value='transient' />
   </property_group>

We provided this functionality for configuration services, which need to tell smf(5) that they don't have processes to restart if they fail. Basically, an empty contract isn't an error for them. But this has also (understandably) been abused to shoehorn into smf(5) legacy services which may or may not have processes running when their start method exits. Even Sun is guilty of some of these. svc:/network/initial may start up a number of daemons on your Solaris 10 system, but you don't see them under svcs -p. That's because its duration property is set to transient. You can see this with:

   $ svcprop -p startd/duration network/initial
   transient

svc.startd believes there are no important processes to worry about restarting, so it doesn't track them under svcs -p, and won't restart the service if one of the processes is killed due to an uncorrectable memory error. We're properly ashamed of the partial conversion that was done with network/initial and a few other services, and are working on fixing them. But, if you want the processes in your service to be restarted on failure, don't set startd/duration to transient.
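To audit a system for services like network/initial, you can check each online service's startd/duration. A sketch, assuming that svcprop exits non-zero when the property is absent and that absence means svc.startd's default duration, contract; service_duration and transient_services are hypothetical helpers:

```sh
# Hypothetical helper: print a service's startd/duration. Without the
# property, svc.startd uses "contract", under which the service's
# processes are tracked and restarted; "transient" means they are not.
service_duration() {
    d=$(svcprop -p startd/duration "$1" 2>/dev/null) || d=contract
    [ -n "$d" ] || d=contract
    echo "$d"
}

# Hypothetical helper: list online services whose processes svc.startd
# won't track or restart.
transient_services() {
    svcs -H -o state,fmri | awk '$1 == "online" { print $2 }' |
    while read -r fmri; do
        if [ "$(service_duration "$fmri")" = transient ]; then
            echo "$fmri"
        fi
    done
}
```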

Thursday Mar 10, 2005

Gavin's talking about Solaris and its error handling in his blog. Go, read.

His most recent entry talks about the philosophy behind error handling, but his previous post is nice too (though, I'm perhaps just narcissistic and happy to see SMF get a mention), and I'm already looking forward to future posts. If his internal communication is anything to go by, I'll often be reading while thinking "hey, that's a much clearer description of what I was trying to convey".

Tuesday Mar 08, 2005

I'm still delinquent on a number of smf(5) entries. But here's a quick one that's at least smf(5)-related. Based on a few pieces of internal mail, it seems like lots of folks are asking about the new filesystems that show up on Solaris 10 when they type mount. An excerpt from one of my systems:

   / on /dev/dsk/c1d0s0 read/write/setuid/devices/intr/largefiles/logging/xattr/onerror=panic/dev=1980040 ...
   /devices on /devices read/write/setuid/devices/dev=4380000 on Fri Feb  4 19:29:50 2005
   /system/contract on ctfs read/write/setuid/devices/dev=43c0001 on Fri Feb  4 19:29:50 2005
   /proc on proc read/write/setuid/devices/dev=4400000 on Fri Feb  4 19:29:50 2005
   /etc/mnttab on mnttab read/write/setuid/devices/dev=4440001 on Fri Feb  4 19:29:50 2005
   /etc/svc/volatile on swap read/write/setuid/devices/xattr/dev=4480001 on Fri Feb  4 19:29:50 2005
   /system/object on objfs read/write/setuid/devices/dev=44c0001 on Fri Feb  4 19:29:50 2005
   /lib/libc.so.1 on /usr/lib/libc/libc_hwcap2.so.1 read/write/setuid/devices/dev=1980040 on Fri ...
   /dev/fd on fd read/write/setuid/devices/dev=4680001 on Fri Feb  4 19:29:58 2005
   /tmp on swap read/write/setuid/devices/xattr/dev=4480002 on Fri Feb  4 19:29:59 2005
   /var/run on swap read/write/setuid/devices/xattr/dev=4480003 on Fri Feb  4 19:29:59 2005

Most of the new mounts we've added this release are dynamic filesystems which reflect kernel state. These include ctfs(7FS) (used extensively by smf(5)) and objfs(7FS). Like procfs, they're truly dynamic and are generated by the kernel on each boot. There's no need to include them in system backups.
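When building a backup list from mount output, you can skip the kernel-generated filesystems. A sketch, assuming the Solaris mount output format shown above (mount point, then "on", then the special device); backup_candidates is a hypothetical helper, and its default skip list holds just the filesystems named in the paragraph above:

```sh
# Hypothetical helper: print mount points worth considering for backup,
# skipping filesystems whose special device is in the skip list.
# Reads Solaris "mount" output on stdin.
backup_candidates() {
    skip=${1:-"ctfs objfs proc"}
    awk -v skip="$skip" '
        BEGIN { n = split(skip, s, " "); for (i = 1; i <= n; i++) dyn[s[i]] = 1 }
        $2 == "on" && !($3 in dyn) { print $1 }'
}
```

A stricter list is easy to pass in, for example mount | backup_candidates "ctfs objfs proc mnttab fd swap", which also drops the mnttab and fd pseudo-filesystems and the swap-backed tmpfs mounts.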

The libc loopback mount is pretty nifty. Once Solaris has booted far enough, it looks at the hardware capabilities of the system, including what instruction sets it supports. Then it loopback mounts a customized libc which can take advantage of all the performance of the specific chip we're using (if such a customized library is available). Right now, we use this on a set of x86 and x64 systems. See moe(1) for more info on the $HWCAP capabilities. Darren also talks about this in more detail.

devfs(7FS) makes the /devices namespace fully dynamic. If you aren't doing things like using chmod(1) to change device permissions, there's no need to back it up either. Recovery to a new system will be easier if you don't back it up. You can use /etc/minor_perm to specify device permissions without using chmod. See add_drv(1M) for more details.

This blog copyright 2009 by lianep