Tuesday September 28, 2004 | Tobin Coziahr's blog: notes from an insomniac engineer
SMF/Predictive Self Healing: svcadm(1)

Today we'll take a look at SMF's main administrative tool, svcadm.

enable/disable

Let's start with a commonly used service, ssh.

[straylight] % svcs network/ssh

From another machine, we can verify that everything is running fine:

[proxima-centauri] % ssh straylight

As you can see, ssh is enabled and running on my machine. Now let's disable it.

[straylight] % svcadm disable network/ssh

Now the service reads as disabled, and we can see:

[proxima-centauri] % ssh straylight

...that it actually is. Turning the service on is just as easy.

[straylight] % svcadm enable network/ssh

Note that it's now online. (The time changes each time we execute an administrative action.)

Enable and disable have some extra options that are quite useful. Say we enabled ssh, but noticed that it was offline:

[straylight] % svcs network/ssh

We know from my previous post that offline means that the service is enabled, but something it depends on is missing (either disabled, or offline). Let's see what is wrong:

[straylight] % svcs -d network/ssh:default

Ah hah, cryptosvc isn't enabled. You might have a service with lots of dependencies that are disabled, or you might have dependencies disabled many levels deep. Do you want to walk through all those services, find out why they're not on, and enable every dependency by hand? Of course you don't. So svcadm has a "recursive enable" option that goes through and enables everything that your service depends on.

[straylight] % svcadm enable -r network/ssh

As you can see, we recursively enabled not only ssh, but everything it depended on, allowing it to come online.

One last option of note for enable/disable is the "temporary" option. Say that you want to enable or disable a service just for this session, but have it revert to its previous state on reboot, in case there are problems. If ssh is disabled and you issue:

[straylight] % svcadm enable -t network/ssh

... the enable will only be temporary.
If you reboot the machine, the service will once again be disabled.

refresh

Refresh serves two purposes. First, if you've changed any of the properties of your service (say you've added a dependency or changed the timeout for starting), you refresh the service and the new properties become active. Second, there's an optional method, in addition to "start" and "stop", called "refresh" that you can define. If your daemon can be sent a HUP signal to re-read its configuration file, you put this in the refresh method, and when you refresh the service, this method is called. A good example of this is DHCP. If you change one of the parameters in dhcpsvc.conf, you issue:

[straylight] % svcadm refresh network/dhcp-server

... and your changes become active.

restart

Restart is pretty self evident. Restarting a service means that you stop it and start it again. Where in the past you might have issued a /etc/init.d/sendmail stop followed by /etc/init.d/sendmail start, now you would use:

[straylight] % svcadm restart network/smtp:sendmail

... which will restart sendmail.

mark (degraded | maintenance)

Mark is used to force a service into a certain state. (The states are here if you've forgotten them.) An administrator might want to force a service into the maintenance state to let other administrators know that there's something wrong with it that needs to be addressed before it's started again. You can force a service into either maintenance (which will shut the service down) or degraded (which will leave it running, but let others know that it's running in a degraded state). Keeping with our earlier example of ssh:

[straylight] % svcadm mark maintenance network/ssh
[straylight] % svcs network/ssh

clear

Clear is used to "reset" the state of a service, and have it be re-evaluated.
For example, say that syslog is in maintenance:

[straylight] % svcs system/system-log

You debug the problem, and realize that syslog failed to start because someone had accidentally deleted syslog.conf, which syslog needs to start. It attempted to start, saw that the conf file was missing, and fell into maintenance. You repair the file, and issue a clear:

[straylight] % svcadm clear system/system-log

Summary

So now you know how to perform basic maintenance on a Solaris 10 machine using SMF. I hope it's clear that this system of administration is quite easy, and incredibly powerful. No longer do you have to hunt around for daemons and init scripts: every service is given a unique FMRI and administered through a unified framework. This, combined with explicit states and dependencies, gives administrators flexibility and power that is unavailable in other Unix distributions.

My next post will be about manifests, which are the XML files used to describe each service. We'll examine a manifest in depth, and take a look at the properties and the dependencies that make it up. As always, questions and suggestions are welcome.

(2004-09-28 02:33:14.0)

SMF/Predictive Self Healing: svcs(1)

Perhaps the most often used tool in the SMF world is svcs. Used with no options, it simply lists all services that are enabled. Enabled means that the administrator wishes these services to be running. They may not be, because their dependencies are not met, they failed to start correctly, or for some other reason. But they're the services that should be running.

# svcs
STATE STIME FMRI
legacy_run Sep_17 lrc:/etc/rcS_d/S10pfil
legacy_run Sep_17 lrc:/etc/rcS_d/S29wrsmcfg
...
legacy_run Sep_17 lrc:/etc/rc2_d/S72autoinstall
legacy_run Sep_17 lrc:/etc/rc2_d/S72directory
...
legacy_run Sep_17 lrc:/etc/rc3_d/S84patchserver
legacy_run Sep_17 lrc:/etc/rc3_d/S90samba
online Sep_17 svc:/system/svc/restarter:default
online Sep_17 svc:/network/loopback:default
online Sep_17 svc:/network/physical:default
online Sep_17 svc:/system/filesystem/root:default
...
online Sep_17 svc:/network/ssh:default
online Sep_17 svc:/system/coreadm:default
...
online Sep_17 svc:/milestone/single-user:default
online Sep_17 svc:/system/system-log:default
online Sep_17 svc:/system/utmp:default
online Sep_17 svc:/system/filesystem/local:default
online Sep_17 svc:/milestone/name-services:default
online Sep_17 svc:/network/inetd:default

Now, I've edited this list quite a bit. As it stands now, a freshly installed Solaris machine will come up with 108 running services. I'm sure this number will change a bit before we ship.

What can we see above? First, we see the legacy services, which I've mentioned in a previous post. These are the scripts that still exist in the rcX directories. For example, you can see above that Below this you start to see online SMF services, such as ssh and inetd. You'll also see "milestones", which are services that are simply lists of dependencies that represent a system state, such as "single-user" or "local filesystems available".

Let's take a look at some of the command line options that make svcs so powerful and useful.

-a

The -a option means "all". It shows all the services on a machine, whether they're enabled or not.

# svcs -a
STATE STIME FMRI
legacy_run Sep_17 lrc:/etc/rcS_d/S10pfil
legacy_run Sep_17 lrc:/etc/rcS_d/S29wrsmcfg
...
disabled Sep_17 svc:/application/print/server:default
disabled Sep_17 svc:/network/nfs/server:default
disabled Sep_17 svc:/network/time:default
disabled Sep_17 svc:/network/talk:default
online Sep_17 svc:/system/svc/restarter:default
online Sep_17 svc:/network/loopback:default
online Sep_17 svc:/network/physical:default
...
This listing will show you all of the services seen in Now we'll start to take a look at how using svcs can be useful in analyzing a service.

-d

The -d option to svcs shows which services this service depends on. Let's take a real world example. Say that for some reason, inetd isn't running on your machine, and you want to look at what it depends on.

# svcs -d network/inetd
STATE STIME FMRI
online Sep_17 svc:/network/loopback:default
online Sep_17 svc:/network/physical:default
disabled Sep_17 svc:/network/rpc/bind:default
online Sep_17 svc:/milestone/single-user:default
online Sep_17 svc:/system/filesystem/local:default
online Sep_17 svc:/milestone/name-services:default

You can see above that inetd depends on networking, being in single user mode, local filesystems, and name services. It also depends on rpcbind, which is disabled, so you know that you need to enable rpcbind to get inetd to run. And since we have dynamic dependency checking with SMF, as soon as you enable rpcbind, inetd will come online.

-D

The -D option to svcs shows the reverse: which services depend on this service.

# svcs -D network/rpc/bind
STATE STIME FMRI
disabled Sep_17 svc:/network/nis/server:default
disabled Sep_17 svc:/network/nfs/client:default
disabled Sep_17 svc:/network/rpc/bootparams:default
disabled Sep_17 svc:/network/nfs/server:default
...
online Sep_17 svc:/network/rpc/keyserv:default
online Sep_17 svc:/network/inetd:default
online Sep_17 svc:/network/nis/client:default
online Sep_17 svc:/milestone/multi-user:default
online 0:42:34 svc:/network/nfs/cbd:default
online 0:42:34 svc:/network/nfs/mapid:default
online 1:26:47 svc:/network/nfs/nlockmgr:default
online 1:26:47 svc:/network/nfs/status:default
...

Again, I've removed several lines for brevity, but you can see that there are several disabled services that depend on rpcbind, and quite a few online services, including inetd, your ability to be a NIS server, the multi-user milestone, and others.
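To put a number on how many services hang off a dependency, the svcs -D listing above can be tallied by state. What follows is only a rough sketch: count_by_state and sample_dependents are hypothetical names made up for illustration, and the sample rows are a handful copied from the listing above so the script runs without SMF installed.

```shell
#!/bin/sh
# Hypothetical helper: tally services by state from svcs-style output on stdin.
# Only rows whose third column looks like an FMRI (svc:... or lrc:...) are
# counted, so the header row and any "..." filler lines are ignored.
count_by_state() {
    awk '$3 ~ /^(svc|lrc):/ { n[$1]++ }
         END { for (s in n) print s, n[s] }' | sort
}

# Hypothetical stand-in for `svcs -D network/rpc/bind`, using rows from the
# listing above.
sample_dependents() {
    cat <<'EOF'
STATE STIME FMRI
disabled Sep_17 svc:/network/nis/server:default
disabled Sep_17 svc:/network/nfs/client:default
...
online Sep_17 svc:/network/inetd:default
online Sep_17 svc:/milestone/multi-user:default
online 0:42:34 svc:/network/nfs/mapid:default
EOF
}

sample_dependents | count_by_state
```

On a real Solaris 10 machine you would presumably pipe svcs -D network/rpc/bind into the same function instead of the sample data.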
So you can see that disabling rpcbind would have a significant effect on your machine, and you're able to know *exactly* how your system will be affected.

-l

Say you want to know a general "overview" of one of your services.

# svcs -l network/inetd:default
fmri svc:/network/inetd:default
enabled true
state online
next_state none
restarter svc:/system/svc/restarter:default
contract_id 43
dependency require_all/none svc:/milestone/single-user (online)
dependency optional_all/error svc:/network/rpc/bind (online)
dependency optional_all/error svc:/network/physical (online)
dependency require_all/error svc:/system/filesystem/local (online)
dependency require_all/error svc:/network/loopback (online)

# svcs -l network/telnet:default
fmri svc:/network/telnet:default
enabled true
state online
next_state none
restarter svc:/network/inetd:default
contract_id 128

The two examples above show inetd and telnet. You can see that they're both enabled and online. You can also see that inetd is restarted by the master restarter, but telnet is restarted by inetd. inetd has several dependencies, while telnet has none (other than inetd running). You'll notice that inetd has two different types of dependencies, optional and require. While I'll go into types of dependencies in another post, it's helpful to note that "require" means that the dependency must be online, while "optional" means that it has to be online only if it's enabled.

-p

Sometimes you might want to know what processes are controlled by a service. The -p (process) option gives you the pid, start time, and name of all the processes started by a service. For example, sendmail:

# svcs -p network/smtp:sendmail
STATE STIME FMRI
online Sep_17 svc:/network/smtp:sendmail
Sep_17 452 sendmail
Sep_17 453 sendmail
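The process rows in the listing above have a regular shape (start time, pid, command), which makes them easy to pick apart in a script. Here's a minimal sketch of pulling out just the pids. The function names svc_pids and sample_output are hypothetical, and the rule "a process row is one whose second field is purely numeric" is an assumption based only on the output shown here.

```shell
#!/bin/sh
# Hypothetical helper: print the pids found in `svcs -p` style output on stdin.
# Assumption: process rows are the ones whose second field is all digits;
# the header and the service row have STIME or a date in that position.
svc_pids() {
    awk '$2 ~ /^[0-9]+$/ { print $2 }'
}

# Hypothetical stand-in for `svcs -p network/smtp:sendmail`, copied from the
# listing above so the sketch runs without SMF.
sample_output() {
    cat <<'EOF'
STATE STIME FMRI
online Sep_17 svc:/network/smtp:sendmail
Sep_17 452 sendmail
Sep_17 453 sendmail
EOF
}

sample_output | svc_pids
```

With the real tool, the same filter would presumably be fed by svcs -p network/smtp:sendmail rather than the canned sample.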
You can see above that sendmail has two processes on my machine, both started on Sept 17th. One last example I'll show is how svcs can be useful for scripting purposes.

-H, -o

The -H option means "don't show column headings", and -o is used to pick output columns. Say you wanted to write a Perl script that took services in a certain state, and performed an action upon them.

[straylight] % svcs -H -o state,fmri
legacy_run lrc:/etc/rcS_d/S10pfil
legacy_run lrc:/etc/rcS_d/S29wrsmcfg
legacy_run lrc:/etc/rcS_d/S55fdevattach
...
online svc:/system/svc/restarter:default
online svc:/network/loopback:default
online svc:/network/physical:default
online svc:/system/filesystem/root:default
...

This way, you're telling Next time, we'll take a look at

SMF/Predictive Self Healing: Graphing service dependencies

I don't have the next tutorial entry ready yet, but Stephen posted something so cool on his blog that I had to show it to you all. Now that every service on a system is an entity with dependencies, one of the side benefits is that you can actually chart what your system looks like, graphically. Below is a chart from Stephen's machine, representing not only all of his running services and their dependencies, but a right-to-left timeline of the boot sequence of his machine:
I really can't get over how cool this is. Stephen gives a nice overview of the features of this graph in his blog entry, so I won't reproduce them here. But go check it out. SMF is letting us do some amazing things.

SMF/Predictive Self Healing Overview: Part 2

Continuing on with the overview, we're going to cover how services actually get on your Solaris machine, a few more basic concepts, and give a brief outline of what system administration is like under SMF. For those of you just joining, part 1 of the overview is here.

Manifests

An SMF manifest is an XML file describing a service. All of the manifests in the system are stored in On boot, In order to create an SMF service, a user simply needs to create an XML file describing it, and import it. We've labored to make these manifests incredibly simple to use. In most cases, all you need to do is determine what your service depends on, and how to start and stop it, cut and paste that into an XML file, and you're finished. For a few minutes of work, you get all the benefits of SMF, including parallel booting of your service, dynamic dependency checking, and automatic restarting on failure. I'll be dedicating an entire post later on to the process of converting a service to SMF, since it's critical that users understand how simple it can be. This leads us to:

Compatibility

Take a deep breath, and read this a few times:

*Everything still works*

Most users of Solaris have their pet scripts and services that they've carefully honed over time, and don't want to part with. While we'd like you to take advantage of the benefits of SMF and convert your services, you by no means have to. All of the scripts in If you look at the states I mentioned in my previous post, you'll see one called On the development end, we've converted a great number of the standard Solaris services already. Once you install Solaris 10, you'll notice that the

Administrative Interfaces

Ah, the heart and soul of it.
We've put a lot of time and effort into making the administration of a Solaris machine with SMF as painless as possible. Once you start to play with our tools and see what's really possible, I think you'll be a convert as well. No longer will you have to be grepping for processes in lists, wondering if they're running or not, hunting for configuration files, et cetera. Administration of SMF services is all done through a central interface, allowing you to observe the state of services, their dependencies, their properties, and make changes to your services quite easily. The SMF CLI tools are as follows:
Each of these is so commonly used and important that I'll be dedicating a post to each of them in the coming days, including real-world examples from administering my own personal Solaris 10 desktop. There's also a set of programming interfaces in a library called I plan to move on from here to descriptions and examples of the administrative tools, and then how to convert a legacy service to SMF. As always, any requests or questions should be posted to the comments section, or emailed to me.

SMF/Predictive Self Healing Overview

Predictive Self Healing is an architectural framework made up of several pieces. The one that I've been working on is SMF, or Service Management Facility. It's an infrastructure that provides several functions:

1) Defining services for Solaris, which can be the state of a device, a running application, or a set of other services. Each service is referred to by a unique identifier.
2) A formal relationship between services, with explicit dependencies.
3) Automatic starting and restarting of services.
4) A repository to store service state and configuration properties (negating the need for dozens of configuration files scattered throughout the system).

The "thousand mile view" of SMF is that the system is managed by a master "restarter" named Let's look at the pieces of SMF a little closer.

Services

A service is the fundamental unit of SMF. Each service can have one or more instances, each of which is a specific configuration of that service. For example, Apache is a service. An Apache daemon configured to serve www.sun.com on port 80 would be an instance of that service. Apache could have several instances, all with different configurations. The service holds basic configuration properties that are inherited by each of its instances, but each instance can override configuration properties, as needed.

There are also special services called milestones.
These are services that correspond to a specific system state, such as "basic networking" or "local filesystems available". They are basically a list of other services, and they're considered to be online when each of their component parts is online.

Each service is identified with an FMRI, or Fault Management Resource Identifier. It's the unique identifier representing a service, or instance. For example, the FMRIs can be a bit of a handful to type, so you'll find that most SMF commands will accept the "shortened" versions of a service's FMRI, provided that the service has only one instance. For example, most utilities will accept You will have noticed that telnet is preceded with the word network. SMF contains several categories for services, to provide organization and uniqueness of naming. The standard categories are:
States

Each service on the machine is always in one of seven discrete states, observable by the SMF CLI tools. The possible states of each service are:
That's about enough typing for me tonight. Next time we'll start to look at how services are described, and how you administer the system using SMF. As usual, if you have any questions, please feel free to ask them in the comments section.

(2004-09-14 01:49:18.0)

The time has come, the walrus said...

Right, then. This blog will primarily be a space for me to discuss the work I'm doing at Sun, hopefully spread some information with examples of powerful but potentially confusing new technologies, and occasionally ramble on about things that may or may not be of interest.

I've been at Sun since July of 2000, when I started in a group called Internet Engineering, which has through several permutations become more generally Solaris Networking and Security Technologies. So I guess I'm a Network Engineer(tm). I worked on DHCP primarily at first, and then moved on to a short lived project titled EIC, which mutated into Greenline, which mutated into I've been working on Greenline/smf/Predictive Self Healing for the last two years. I'm a firm believer in the power and potential of it, and you'll be hearing me evangelize about it quite a bit in the days to come.

SMF, the Service Manager side of Predictive Self Healing, basically objectifies and defines every service on a Solaris machine, charting its dependencies and storing its configuration properties, and allows the administrator to observe and manage the services much more effectively. It also provides automatic restarting of services upon failure. It's going to change (for the better) the system administrator's experience in administering Solaris machines, provide a (much faster) parallelized boot, and greatly assist in error diagnosis and fault recovery. As you can see, I'm a big fan. And I plan to convert as many of you as humanly possible.
Anyway, for the boring background bits, I was an Air Force brat who's lived in 13 different states, and I got my degree in Computer Science with a minor in Engineering Studies from Carnegie Mellon, in Pittsburgh. Unlike nearly everyone else who's lived there, I loved Pittsburgh, with all its ugly little warts. I read books voraciously, I'm a libertarian (possibly the only one in California, much to my chagrin), and I really, really like pirates.

Soon I'll begin picking random bits of SMF to highlight, but if you're reading this and you have any questions about how this will affect your private cache of scripts, or