The View from the Moon

20050107 Friday January 07, 2005

More on Bootchart for Solaris
So Eric Let the cat out of the bag on our reworked BootChart for Solaris. Eric posted one image; here is another: the bootup of a single Solaris Zone. I'm pretty happy with this, as we boot in only about 7 seconds.

This was an interesting experience because Eric and I had not previously worked together very closely. I had a great time doing this, and because Eric immediately stuck everything into a Teamware workspace, we were able to work simultaneously. Eric and I both worked on the D scripting, and somehow a joke about "wouldn't it be nice if the script output XML?" turned into our strategy. This turned out to be a good decision, as I hate writing parsers; instead we just let SAX do the work. We were able to maintain the split of having boot-log generation in one self-contained component, and the log processing into another. Because the XML logs are in an easily parsed format (as opposed to parsing the output of top and iostat), they can be useful to anyone doing boot analysis on Solaris. We've already had some such requests. I'm sure Eric will have more to say about the implementation so I'll leave it to him, except to say that some of the visual design changes can be blamed on me, taking inspiration from Edward Tufte's work.

Something else which fell out of this experience is that it's easy to use the log gatherer on any collection of processes which start up (as we saw in the zones example, above). We hope that this will be helpful in highlighting performance wins in the startup of complex ISV programs.

Following the experience of Linux developers, we've also found a series of bugs in Solaris with this tool. Let's start with an easy one, and the first one I found. The bug is visible in this chart from xanadu, my Shuttle SN45G4. Because we don't have support (sadly) for the on-board ethernet on this box, I had inserted another network card (I won't name the vendor, as I don't want to put them on the spot). If you look carefully at this bootchart, you'll see that the ifconfig process is spending a lot of time on the CPU. What's up with that? A brief investigation with DTrace made it clear that the driver had code like the following (shown here reduced as pseudo-code) in the attach(9E) codepath (attach is the process by which a driver begins controlling a hardware device):

    for (i := 0 to auto_negotiation_timeout) {
        if auto_negotiation_complete()
            return success;

        wait_milliseconds(100);
    }
Which all looks fine except that wait_milliseconds() (a function defined by the driver) is a wrapper around drv_usecwait(9F) ("busy-wait for specified interval"). Busy is of course the problem. drv_usecwait is really more about waiting for short intervals for various information to become ready in various hardware registers. Busy-waiting 100 milliseconds at a time is practically forever, and ties up the CPU just spinning in a loop. The authors almost certainly meant to use delay(9F). I filed a bug, and hopefully we'll have it fixed soon (since this driver comes to us from a third party, they request that we let them make the code changes). Fun, eh?

Another two issues we spotted concern inetd, which has been rewritten from scratch in Solaris 10; it is now a delegated restarter, which basically means that it take some of its direction from the system's master restarter (svc.startd). The behavior we noticed sticks out on any of the boot charts, including the zone boot chart mentioned above: inetd is starting a lot of very short-lived ksh processes. Why? When I first spotted this, I used DTrace to work out the answer, as follows:

# dtrace -n 'proc:::create/execname=="inetd"/{ustack();}'
...
restart inetd
...
  0  12188                     cfork:create 
              libc.so.1`__fork1+0x7
              libc.so.1`wordexp+0x16f
              inetd`create_method_info+0x45
              inetd`create_method_infos+0x2f
              inetd`read_instance_cfg+0xc8
              inetd`process_restarter_event+0x171
              inetd`event_loop+0xfd
              inetd`start_method+0x91
              inetd`main+0xcb
              inetd`0x8054712
Ahh, so we stumble upon an (embarrassing) implementation artifact of libc-- it uses ksh to help it implement the libc routine wordexp(3c). So, every time inetd needs to wordexp() something, we wind up running a ksh. We can also see that this is not severely impacting performance, but we would like to get this fixed. Personally, I'd like to see wordexp() fixed to not rely upon ksh at all.

Another somewhat more subtle issue is something that SMF engineers like Liane are still looking at. It's also visible in the zone boot chart. It appears that some services (such as nfs/client) are delayed in coming on-line because it is taking the startd/inetd cabal a while to mark their dependent services as online, even though that shouldn't really entail much work. We can see this as follows:

$ svcs -l nfs/client
fmri         svc:/network/nfs/client:default
name         NFS client service
...
dependency   optional_all/none svc:/network/rpc/keyserv (online)
dependency optional_all/none svc:/network/rpc/gss (online)
dependency require_all/refresh svc:/milestone/name-services (online) $ svcs -l network/rpc/gss fmri svc:/network/rpc/gss:default name Generic Security Service enabled true state online next_state none state_time Fri Jan 07 18:22:32 2005
restarter svc:/network/inetd:default
dependency require_all/restart svc:/network/rpc/bind (online) dependency optional_all/none svc:/network/rpc/keyserv (online)
Since nfs/client depends upon GSS, it can't be started by startd until GSS is online. Liane and Jonathan have offered up some theories about why this is happening, but we've all been engaged on higher priority work, and so we haven't had much time yet to dig deeper. This serialization point appears to be costing us roughly 1 full second on boot, so it's something we need to look at further. Have a great weekend folks!


(2005-01-07 19:00:00.0) Permalink Comments [8]
Trackback: http://blogs.sun.com/dp/entry/more_on_bootchart_for_solaris

 

Trackback URL: http://blogs.sun.com/dp/entry/more_on_bootchart_for_solaris
Comments:

[Trackback] Zwei gute Artikel zum Booten von Solaris 10: Eric Schrock's Weblog: Boot chart result More on Bootchart for Solaris (an die beiden: Really great articles)

Posted by c0t0d0s0.org on January 07, 2005 at 10:58 PM PST #

Dan, you talk about startup performance wins in large ISV programs, but what about something more common, like mozilla? (Which gets through dozens of short lived processes every time it's invoked, which includes requests to open new pages in an existing browser.) And then what about a gnome session startup?

Posted by Peter Tribble on January 08, 2005 at 01:15 AM PST #

One might consider Mozilla to be a large app from an ISV :) But yes, we'd thought about that and GNOME startup, too. As Eric mentioned, we're hoping to get through the legal hurdles and release the code and binaries, so that you can do your own analysis...

Posted by Dan Price on January 08, 2005 at 02:20 AM PST #

bootchart looks like a great tool for startup analysis; I hope it gets in SX soon. Perhaps this tool can provoke some work on parallelizing boot-up routines. Now if the hardware would just check itself a little faster.... it takes more than a few minutes for some Sun equipment to get ready to boot.

Posted by Charles Meeks on January 12, 2005 at 01:10 PM PST #

Charles, we do have parallelized startup in S10. That's one of the many advantages of SMF. If you look, you'll see some things which are starting up at the same time-- in the bootchart above, look at <tt>svc-utmpd</tt> and <tt>fs-local</tt>. These are two independent startup actions happening at the same time. Previously we would have waited for the svc-utmpd process to exit before starting the next startup script. Now we're getting both going at the same time. For more on SMF see Stephen and Liane's blogs, and the Self-Healing Bigadmin Site.

Posted by Dan Price on January 12, 2005 at 01:28 PM PST #

(On the other hand, maybe Charles means: maybe Sun can do even better with parallelization at boot time. On that score, I agree-- and that's why we have SMF and Bootchart).

Posted by Dan Price on January 12, 2005 at 01:29 PM PST #

I am from TAIWAN I'm a collge student. My majority is computer science. I am a freshman. I want to learn unix kernel. I find a book"Lion's Commentary on UNIX 6th Edition"in our college's library. please tell me how to learn UNIX kernel. Do you have any friend who is kernel developer in SUN ? could they give me a direction? THANKS you for tolerating my poor english

Posted by tsj on January 23, 2005 at 09:44 AM PST #

tsj, My suggestion would be to first get a good grounding in operating systems basics. There are a number of well respected textbooks, including Operating System Concepts and Modern Operating Systems. I would start with reading one of those.

After you have read one of these introductory books, try <a href="http://www.amazon.com/exec/obidos/tg/detail/-/0131019082/">UNIX Internals: The New Frontiers, which is really an excellent book. <a href="http://www.amazon.com/exec/obidos/tg/detail/-/0130224960/">Solaris Internals has lots of nice details as well, and there are similar books for Linux.

These books are all quite expensive, but your library should have them.

Another option would be to contact the professor in your department who teaches the operating systems class, and ask him or her about which textbook they recommend. Good luck!

Posted by Dan Price on January 24, 2005 at 06:35 PM PST #

Post a Comment:

Name:
E-Mail:
URL:

Your Comment:

HTML Syntax: NOT allowed
Dan Price's Weblog
[about me]