Thursday July 29, 2004
Another quiet week
There's a sizeable troupe of kernel engineers at the O'Reilly Open Source Convention this week, so once again the office is a little quieter. (Bryan's around, so the office isn't silent.) Most of their blog entries are of the "the author has left the building" variety, but I'm hoping that those of us who remain chained to our keyboards will get a chance to read about how the conference is progressing, particularly tonight's birds-of-a-feather session on opening Solaris source.
(2004-07-29 09:37:34.0)
The tantalizing aroma of svcs(1)
A side point: svcs(1) is pretty fast. Our example from last week was
$ svcs \*milestone\*
STATE STIME FMRI
online Jul_23 svc:/milestone/devices:default
online Jul_23 svc:/milestone/single-user:default
online Jul_23 svc:/milestone/name-services:default
online Jul_23 svc:/milestone/multi-user:default
online Jul_23 svc:/milestone/multi-user-server:default
Other approaches to service management in Unix-like systems ask each service for its status.
On a large system, with a complete representation of its running daemons as services, that can mean
thousands of fork(2)/exec(2) pairs. That's not how smf(5)
works, so a command like svcs(1) can be quick:
$ time svcs \*milestone\*
STATE STIME FMRI
online Jul_23 svc:/milestone/devices:default
online Jul_23 svc:/milestone/single-user:default
online Jul_23 svc:/milestone/name-services:default
online Jul_23 svc:/milestone/multi-user:default
online Jul_23 svc:/milestone/multi-user-server:default
real 0m0.027s
user 0m0.004s
sys 0m0.009s
In fact, we're not even calling fork(2) to get this report on service status:
$ truss -t fork,write,exit svcs \*milestone\*
STATE STIME FMRI
write(1, " S T A T E ".., 29) = 29
online Jul_23 svc:/milestone/devices:default
write(1, " o n l i n e ".., 55) = 55
online Jul_23 svc:/milestone/single-user:default
write(1, " o n l i n e ".., 59) = 59
online Jul_23 svc:/milestone/name-services:default
write(1, " o n l i n e ".., 61) = 61
online Jul_23 svc:/milestone/multi-user:default
write(1, " o n l i n e ".., 58) = 58
online Jul_23 svc:/milestone/multi-user-server:default
write(1, " o n l i n e ".., 65) = 65
_exit(0)
I'll start drawing a suitable architecture diagram...
(2004-07-27 09:32:17.0)
Enabling and disabling services
I thought I would show a different example of smf(5) today. Here's
the state of the network/telnet service on my desktop:
# svcs -p network/telnet:default
STATE STIME FMRI
online Jul_23 svc:/network/telnet:default
It's easy to enable and disable service instances using svcadm(1M):
# svcadm disable network/telnet
# svcs -p network/telnet:default
STATE STIME FMRI
disabled 13:08:15 svc:/network/telnet:default
# telnet localhost
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
And we can enable it just as easily:
# svcadm enable network/telnet
# svcs -p network/telnet:default
STATE STIME FMRI
online 13:08:29 svc:/network/telnet:default
Note that although something is declaring that the telnet service is available, no processes
are yet associated with the service instance. If we "telnet localhost" from
another window, we can then see the telnet daemon and the login session:
# svcs -p network/telnet:default
STATE STIME FMRI
online 13:08:52 svc:/network/telnet:default
13:08:52 116400 in.telnetd
13:08:52 116403 login
Services can also be enabled and disabled programmatically, by calling smf_enable_instance(3SCF) and smf_disable_instance(3SCF),
in addition to using the command line interface of
svcadm(1M). Since the framework relays the enable or disable request, we
don't need privileges to signal any process (or even to know which process we might have to
signal to make the update...).
A peek while bug flushing
With only a few days of exposure on the varied system configurations around here, a few bugs
have been raised against smf(5), the new service management facility.
(I suppose it's similar to a pack of hounds flushing pheasants during a hunt, although, ultimately, whom the hounds chase and whom the gun is pointed at differ from a real hunt.) What's more exciting is that, as the kinks get smoothed out, people are starting to discuss possibilities. But I thought I'd show a tiny piece of output instead. Here's the output of the new service listing command, svcs(1), looking only at the major milestones of the startup process:
$ svcs \*milestone\*
STATE STIME FMRI
online 11:03:56 svc:/milestone/devices:default
online 11:04:04 svc:/milestone/single-user:default
online 11:04:07 svc:/milestone/name-services:default
online 11:04:11 svc:/milestone/multi-user:default
online 11:04:18 svc:/milestone/multi-user-server:default
What name services am I running? Examine the name services milestone more closely:
$ svcs -d milestone/name-services:default
STATE STIME FMRI
disabled 11:03:51 svc:/network/ldap/client:default
disabled 11:03:51 svc:/network/nis/server:default
disabled 11:03:51 svc:/network/rpc/nisplus:default
online 11:04:04 svc:/network/nis/client:default
online 11:04:07 svc:/network/dns/client:default
NIS, with a bit of DNS for seasoning. What's it take to be a NIS client these days?
$ svcs -p svc:/network/nis/client:default
STATE STIME FMRI
online 11:04:04 svc:/network/nis/client:default
11:04:04 100202 ypbind
(But you knew that already.) More later.
(2004-07-22 11:47:19.0)
Checklists navigated
The project that I mentioned a little while back navigated our engineering processes (or, rather, we steered it through them) and integrated into the Solaris development release last week. Now it's getting a real shakedown as more and more people in the company get access to the bits. And it will hit the various Beta programs and Software Express shortly.
They've found a few bumps and rough edges, but nothing we can't address. We're even finding that it's straightforward to keep the hardened machines hardened. And everyone's moving fast again, as we don't have to keep our many, many changes in our heads anymore—people will tell us directly what's wrong/imperfect/improvable.
The project? It's the service management facility, which is the other major technology comprising S10's Predictive Self-Healing feature.
(2004-07-20 11:08:10.0)
How not to communicate novelty
Adam seems to have recovered from his initial embarrassment regarding the alleged lack of novelty in describing a Solaris 9 feature. There's no such shame here: I look at S9 as one of our Dangerfield releases (along with S7) that didn't get the respect they deserved. (There's no comparison to S10.)
I mentioned in a previous entry that I wasn't particularly proud of how I had talked about S9RM. In this vein, I dug up a paper I wrote for SUPerG 2001 in Amsterdam. SUPerG is a Sun conference for datacenter customers, and it focuses heavily on best practices for large Solaris systems. I was pretty giddy after S9RM wrapped up, and wrote a paper to present there on the various mechanisms we had envisioned and were in the process of implementing. It was received very quietly.
While I was writing the paper, I was trying out various text analyzers. One that I used was the Lingua::EN::Fathom module, available from CPAN. The results?
$ perl fathom.pl superg-2001-paper.ltx
[ ... vocabulary list elided ... ]
Number of characters : 19918
Number of words : 2865
Percent of complex words : 26.21
Average syllables per word : 1.9763
Number of sentences : 104
Average words per sentence : 27.5481
Number of text lines : 353
Number of blank lines : 101
Number of paragraphs : 69
READABILITY INDICES
Fog : 21.5044
Flesch : 11.6817
Flesch-Kincaid : 18.4737
The Fog index informally corresponds to the number of years of education an average reader needs to read the text once and understand it. (21.5 is somewhere in graduate school.) The Flesch scale rates text on a 100-point scale; higher is better, with 60 being a reasonable target. (It's safe to say that 11 is not in the vicinity of 60.) The Flesch-Kincaid index is meant to correlate roughly with a U.S. school grade: 18 (graduate school again) is bad. The indices agree: this is neither good nor clear writing.
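For the curious, the three indices can be recomputed from the averages in the report. This is a minimal sketch assuming the standard published formulas for each index; plugging in the reported averages lands within rounding of the module's output, but treat it as a sketch of the formulas rather than of Lingua::EN::Fathom's own code.

```python
"""Recompute readability indices from the summary statistics reported
by fathom.pl, assuming the standard published formulas for each index."""


def fog(words_per_sentence: float, percent_complex: float) -> float:
    # Gunning Fog: 0.4 * (average sentence length + percentage of complex words)
    return 0.4 * (words_per_sentence + percent_complex)


def flesch(words_per_sentence: float, syllables_per_word: float) -> float:
    # Flesch Reading Ease: higher is easier; about 60 is a reasonable target
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


def flesch_kincaid(words_per_sentence: float, syllables_per_word: float) -> float:
    # Flesch-Kincaid Grade Level: roughly a U.S. school grade
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    # Averages copied from the fathom.pl report for the SUPerG 2001 paper
    wps, pct_complex, spw = 27.5481, 26.21, 1.9763
    print(f"Fog            : {fog(wps, pct_complex):.2f}")
    print(f"Flesch         : {flesch(wps, spw):.2f}")
    print(f"Flesch-Kincaid : {flesch_kincaid(wps, spw):.2f}")
```

The small differences in the last digits come from the rounded averages in the report. The Flesch coefficients also show where the damage comes from: each extra word per sentence costs about one point on the Flesch scale.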
So this document is a pretty solid indicator that, indeed, I didn't do a good job of explaining the value of resource management. For posterity, I'm making the paper available. Now I work harder at avoiding sentences more than twenty-seven words long and prose in which 25% of the words are complex. If there's a resource management topic you would like to see examined, feel free to tell me, and I'll try to write something understandable.
And, yes, Tim and Andy are blameless.
(2004-07-12 17:49:27.0)
Endorsing publicradiofan.com
I stumbled across publicradiofan.com, amongst all of the various useless internet radio sites out there. (Radio reception inside my building isn't great.) This site is a gem. It lets you select sites by show, by category, by media streams provided, and more. Program grids, grouped listings, adjustable time windows. Very solid.
(The media stream filter is very useful if you're running Solaris x86, as your streaming options are somewhat limited. I've decided that if a local public radio station isn't willing to provide MP3 or Ogg Vorbis streams, then I won't support it. And I was pleased to see that some of the stations that I support do indeed provide "open" streams. And some don't. A decision to make?)
Recommended.
(2004-07-08 10:45:07.0)
Why projects? Why not?
One of the questions I often ask myself is "why aren't more sites using projects?" As I wander from forum to forum, I regularly
see people saying, "I want to consolidate three [application server] instances on my system," or two [database] instances, or n
applications. Many of these applications need to run with identical credentials (user id, group id, authorizations, privileges, etc.) and
are only distinguishable by their working directory, environment variables, or the like. Reading these requests is a bit frustrating, as this scenario is one of the key motivations we had when introducing the project(4) database, and I can only conclude that I have failed to really communicate its utility.
Projects let you assign a label to a specific workload. In S8 6/00 and all subsequent releases, you can explicitly launch a workload with its
appropriate project using the newtask(1) command. If extended accounting has been activated using acctadm(1M) with one
of the standard record groupings, then the accounting records for processes within that workload will include their project ID. Writing an accounting record on every
process exit can impact some workloads, so you can optionally choose to write records only when each task exits. A task is a new process collective that groups related work within a workload (so it could be a workload component, like a batch submission). When invoked without arguments, acctadm(1M)
reports the current status of the extended accounting subsystem:
$ acctadm
Task accounting: inactive
Task accounting file: none
Tracked task resources: none
Untracked task resources: extended
Process accounting: inactive
Process accounting file: none
Tracked process resources: none
Untracked process resources: extended,host,mstate
Flow accounting: inactive
Flow accounting file: none
Tracked flow resources: none
Untracked flow resources: extended
The resource lines report which accounting resource groups and resources are included in (tracked) or omitted from (untracked) each type of record. We can expand the resource groups for each type of accounting using the -r option:
$ acctadm -r
process:
extended pid,uid,gid,cpu,time,command,tty,projid,taskid,ancpid,wait-status,zone,flag
basic pid,uid,gid,cpu,time,command,tty,flag
task:
extended taskid,projid,cpu,time,host,mstate,anctaskid,zone
basic taskid,projid,cpu,time
flow:
extended saddr,daddr,sport,dport,proto,dsfield,nbytes,npkts,action,ctime,lseen,projid,uid
basic saddr,daddr,sport,dport,proto,nbytes,npkts,action
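The -r listing is easier to interrogate programmatically. Here's a small sketch that parses the listing above (pasted in as a string) and reports which record groups carry the project ID. The parsing rules, and the helper names parse_groups and groups_with, are hypothetical and based only on the layout shown here, not on a documented output format.

```python
"""Parse an acctadm -r style listing into {record type: {group: [resources]}}.
The layout rules are assumptions drawn from the single listing shown above."""

LISTING = """\
process:
extended pid,uid,gid,cpu,time,command,tty,projid,taskid,ancpid,wait-status,zone,flag
basic pid,uid,gid,cpu,time,command,tty,flag
task:
extended taskid,projid,cpu,time,host,mstate,anctaskid,zone
basic taskid,projid,cpu,time
flow:
extended saddr,daddr,sport,dport,proto,dsfield,nbytes,npkts,action,ctime,lseen,projid,uid
basic saddr,daddr,sport,dport,proto,nbytes,npkts,action
"""


def parse_groups(text: str) -> dict[str, dict[str, list[str]]]:
    groups: dict[str, dict[str, list[str]]] = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.endswith(":"):
            # A record type header such as "process:"
            current = line[:-1]
            groups[current] = {}
        elif current is not None:
            # A group line: "<group name> <comma-separated resources>"
            name, resources = line.split(None, 1)
            groups[current][name] = resources.split(",")
    return groups


def groups_with(groups: dict[str, dict[str, list[str]]], resource: str) -> list[str]:
    # Return "type/group" labels whose resource list includes `resource`
    return [
        f"{rtype}/{gname}"
        for rtype, by_name in groups.items()
        for gname, resources in by_name.items()
        if resource in resources
    ]


if __name__ == "__main__":
    print(groups_with(parse_groups(LISTING), "projid"))
```

Against this listing it reports process/extended, task/extended, task/basic, and flow/extended: both task groups record the project ID, while for process records only the extended group does.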
So we can enable the extended task resources, activate task accounting, and direct the records to a file by invoking acctadm(1M) like this:
# acctadm -e extended task
# acctadm -E task
# acctadm -f /var/adm/exacct/task
In S10, you can optionally enable accounting without having it write to a file, such that the records are retrievable using
getacct(2).
Of course, that's all about accounting, but projects are useful even if you're not interested in the long-term resource
consumption of your workloads. The project ID is useful for isolating your workload using conventional /proc-based tools
like prstat(1M) and pgrep(1), as well as with DTrace. For instance, to see only the processes in your own project,
you can use pgrep's -J option:
$ pgrep -lf -J user.sch
728069 /usr/bin/bash
728027 /usr/bin/bash
125169 /usr/bin/bash
To see workloads on the system, you can use prstat's -J option, which aggregates activity by project ID in addition to
displaying the most active processes:
$ prstat -c -J user.sch 1 1
PID USERNAME SIZE RSS STATE PRI NICE TIME CPU PROCESS/NLWP
653322 xx 19M 17M cpu2 0 3 166:34:10 12% setiathome/1
911046 xx 19M 17M cpu5 0 3 170:28:53 12% setiathome/1
668697 xx 19M 17M cpu4 0 3 138:53:14 12% setiathome/1
100378 daemon 2352K 1944K sleep 60 -20 30:18:23 0.2% nfsd/5
125214 sch 4472K 4152K cpu3 1 0 0:00:00 0.0% prstat/1
100066 root 7872K 6736K sleep 29 0 2:20:42 0.0% picld/13
125169 sch 2768K 2416K sleep 1 0 0:00:00 0.0% bash/1
100156 root 91M 36M sleep 59 0 0:46:59 0.0% poold/8
100249 root 6680K 4848K sleep 1 0 8:02:46 0.0% automountd/2
100254 root 5776K 3552K sleep 59 0 0:00:01 0.0% fmd/10
100262 root 4024K 3424K sleep 59 0 0:19:40 0.0% nscd/57
100265 root 1248K 776K sleep 59 0 0:00:00 0.0% sf880drd/1
100184 root 2288K 1384K sleep 1 0 0:00:00 0.0% ypbind/1
100172 daemon 2680K 1704K sleep 58 0 1:07:32 0.0% rpcbind/1
100158 root 2216K 1336K sleep 59 0 0:00:26 0.0% in.routed/1
PROJID NPROC SIZE RSS MEMORY TIME CPU PROJECT
130 3 56M 51M 0.3% 475:56:17 37% background
0 61 341M 168M 1.0% 43:21:28 0.3% system
36565 4 13M 11M 0.1% 0:00:01 0.1% user.sch
105403 14 39M 30M 0.2% 0:00:03 0.0% user.xxxxxxx
77194 17 74M 62M 0.4% 0:03:10 0.0% user.xxxxxx
Total: 133 processes, 279 lwps, load averages: 3.07, 3.07, 3.04
(This system's pretty idle during our U.S. shutdown, so it's doing its best to find extraterrestrial customers.)
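Because the project ID is simply a label carried by each process, filtering and summarizing on it is an exact match rather than a guess. The sketch below uses hypothetical process records (not read from a live system) and hypothetical helper names; it imitates, in spirit only, the filtering pgrep -J does and the per-project summary prstat -J prints. It is not how either tool is implemented.

```python
"""Filter and summarize process records by project, in the spirit of
pgrep -J and prstat's per-project summary. All records are hypothetical."""

from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Proc:
    pid: int
    project: str  # project name the process was launched under
    rss_kb: int   # resident set size in kilobytes
    command: str


def pgrep_project(procs: list[Proc], project: str) -> list[int]:
    # Exact match on the project label; no pattern matching on arguments
    return [p.pid for p in procs if p.project == project]


def summarize(procs: list[Proc]) -> dict[str, tuple[int, int]]:
    # Per project: (number of processes, total RSS in kilobytes)
    totals: dict[str, list[int]] = defaultdict(lambda: [0, 0])
    for p in procs:
        totals[p.project][0] += 1
        totals[p.project][1] += p.rss_kb
    return {proj: (n, rss) for proj, (n, rss) in totals.items()}


if __name__ == "__main__":
    procs = [
        Proc(101, "user.alice", 2400, "/usr/bin/bash"),
        Proc(102, "background", 17000, "setiathome"),
        Proc(103, "user.alice", 4100, "prstat"),
        Proc(104, "system", 6700, "picld"),
        Proc(105, "background", 17000, "setiathome"),
    ]
    print(pgrep_project(procs, "user.alice"))  # [101, 103]
    print(summarize(procs))
```

Note that the two setiathome processes are grouped because they share a label, not because their command names happen to match.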
To restrict a DTrace predicate to a project of interest, use the curpsinfo built-in variable to access the
pr_projid field, like
/curpsinfo->pr_projid == $projid && ..../
where I've also used the $projid scripting macro, which expands to the result of getprojid(2) for the
dtrace process running the script. You could instead explicitly enter your project ID of interest, or use one of the argument macros ($1, $2, and so on)
if writing a script you expect to reuse.
Projects also let you place resource controls on your workload, establish its resource pool bindings, and more. We'll make it easier to use them with the forthcoming service management facility. But I'll summarize: projects are a precise and efficient way to label your workloads (as opposed to pattern matching on arguments or environment variables). If you are consolidating workloads, whether because of machine eliminations, organizational mergers, or other reasons, they are definitely worth considering. If you think there's a way to make them more applicable to your work, please let me know.
(2004-07-08 10:11:42.0)
Answer to sort(1) puzzle #1
/usr/bin/sort -ur -k 1,1n -k1,2nr input.d
Well, I underspecified that problem slightly; the output I was looking for is
1305 6565
1401 8192
1408 2312
which the anonymous poster's invocation will give. Alan's invocation selects the correct
lines, but orders the first field backwards (had I specified the problem fully). You could, of
course, send Alan's output through another sort(1) stage to order the first field.
The key to this puzzle is knowing (a) that sort(1) does a final comparison of the entire
line using strcoll(3C), (b) that keys with their own modifiers ignore the global modifiers (like the -r option here), so the global -r reverses only that final whole-line comparison, and (c) that the Solaris implementation of sort(1) will output only
the first unique line it finds in the collated sequence. The first two of these points are in the manual page; the last requires some experimenting.
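To make the three rules concrete, here's a small Python model of them applied to the puzzle's data. It's a sketch of the behaviour described above, not the Solaris sort(1) code: it assumes the C locale (so strcoll(3C) acts as a plain byte comparison) and uses a simplified numeric parser, leading_number, that reads only the leading number of a key.

```python
"""Model the behaviour described above for
    /usr/bin/sort -ur -k 1,1n -k1,2nr input.d
Assumptions: C locale (strcoll acts as a byte comparison) and
simplified numeric parsing of the leading number in each key."""

import re

DATA = ["1401 8192", "1401 3487", "1401 0807", "1305 3471",
        "1305 6565", "1408 2312", "1408 1233"]


def leading_number(text: str) -> float:
    # Numeric keys: optional sign and digits, stopping at the first blank
    m = re.match(r"\s*(-?\d+(?:\.\d*)?)", text)
    return float(m.group(1)) if m else 0.0


def key1(line: str) -> float:
    # -k 1,1n: first field, numeric
    return leading_number(line.split()[0])


def key2(line: str) -> float:
    # -k1,2n: fields 1 through 2, numeric (reads only the leading number)
    return leading_number(" ".join(line.split()[:2]))


def solaris_sort_ur(lines: list[str]) -> list[str]:
    # Python's sort is stable, so apply the least significant rule first.
    # (a) final whole-line comparison, reversed by the global -r
    ordered = sorted(lines, reverse=True)
    # (b) key 2 carries its own r modifier
    ordered = sorted(ordered, key=key2, reverse=True)
    # (b) key 1 has its own n modifier, so the global -r does not apply
    ordered = sorted(ordered, key=key1)
    # (c) -u: keep only the first line in sorted order for each distinct key
    result, seen = [], set()
    for line in ordered:
        keys = (key1(line), key2(line))
        if keys not in seen:
            seen.add(keys)
            result.append(line)
    return result


if __name__ == "__main__":
    print("\n".join(solaris_sort_ur(DATA)))
```

Run as-is, it prints the three expected lines. Note that the final comparison is a string comparison, so it picks the numerically largest second field only because every second field here has four digits (0807 rather than 807).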
True story: this puzzle grew out of a service request from a customer who was moving from a platform that displayed the last unique line, and who needed to modify their script to produce the same output on Solaris.
Please comment if you want more puzzles, or if you think I should stop before getting started!
(2004-07-02 11:13:25.0)
sort(1) puzzle #1
(I'm waiting for a build to finish, so here's a small Solaris sort(1) trick.)
Question: I have the following data file
1401 8192
1401 3487
1401 0807
1305 3471
1305 6565
1408 2312
1408 1233
Using only sort(1), how do I generate a file sorted by the first field, with only the highest valued second field
for each first field value?
Note that the unique line behaviour of sort(1) isn't well specified, so versions from
other platforms may not be able to do this trick.