Friday March 06, 2009
On Sunday I committed changes into Nevada build 110 (and hence, into OpenSolaris 2009.next) to improve the speed at which OpenSolaris systems shutdown. On a generic x86 test system, it used to take about 41 seconds to shut the system down starting from the time you typed init 6, shutdown(1m), or pressed the shutdown button in GNOME. This form of shutdown is known as gentle shutdown because SMF takes care to stop each software service in dependency order. In contrast, when you use reboot, halt, or poweroff, you're using what I'll call violent shutdown. In the latter, the boot archive is brought up to date and that's about it. It has traditionally been much, much faster than gentle shutdown.
This relatively long shutdown time has had an interesting effect: Solaris developers almost universally cheat, and use violent shutdown. Typing reboot is terribly ingrained in my fingertips. This is bad, because it means less test coverage for the shutdown method which actually matters to customers. I recently began to be bothered by this because the GNOME gui uses init 6, and so shutting down via the nice shiny "Shut Down" button is also terribly slow.

On something of a whim, I dusted off our variant of the bootchart project which we ported in-house several years ago to get some initial information on what was happening during system shutdown. Click here to see a graphical representation of a typical system shutting down (note: big image). To read the image, note that time is on the X axis, from left to right. Processes are represented as horizontal bars representing their duration. At the rightmost side of the chart, the system has stopped.
In the image I've highlighted a couple of points of interest:
- The pppd shutdown script seems to sleep 1, always, even if you aren't using ppp; since pppd isn't converted to SMF (bug 6310547), we will try to stop it on all systems on every shutdown.
- The wbem service seems to sleep 1 while shutting down, and the the webconsole service takes a while to shutdown. However, these services are present only on Nevada, and not on OpenSolaris, so I chose not to pursue trying to fix them.
- The deferred patching script, installupdates is really slow. And needlessly so-- it can run in a few milliseconds with a simple fix; I filed a bug.
- There are some long calls to /usr/bin/sleep. In the chart linked above, you can see svc-autofs, rpc-bind, and svc-syseventd each taking five seconds to stop. Five seconds is a really long time!
- There's a call to something called killall near the end of shutdown. Then, 5 seconds later, another. Then, 10 seconds later, things proceed again. I wondered what the killall was all about? Did it really need 15 seconds to do its work?
After a bit of effort (ok, a lot of effort), I've cleaned up these, and some other problems I spotted along the way. It turns out that the five second delays are from some poor quality shell code in /lib/share/smf_include.sh:
smf_kill_contract() {
...
# Kill contract.
/usr/bin/pkill -$2 -c $1
...
# If contract does not empty, keep killing the contract to catch
# any child processes missed because they were forking
/usr/bin/sleep 5
/usr/bin/pgrep -c $1 > /dev/null 2>&1
while [ $? -eq 0 ] ; do
...
Ugh. So this shell function rather bizarrely always waits five seconds, even if the contract empties out in 1/10th of a second! I fixed this to have a smarter algorithm, and to keep checking at more frequent intervals (every 0.2 seconds).
I discovered that the calls to killall(1m) were really ancient history, and probably did not need to occupy 15 seconds worth of system shutdown. I have shortened the interval substantially.
Another problem we faced was that, in the last moments before shutdown, startd runs some commands using system(3c). This can be a problem if one of those commands, for some reason, wedges up. So, I've replaced the calls to system with calls which timeout after a set number of seconds. This is some nice insurance that the system continues to make progress as it shuts down. Since I wound up with so much extra time available at shutdown, I've taken the chance to add a call to lockfs(1m) in an effort to get as much on-disk UFS consistency as possible.
So, here is the intermediate picture. I've slightly revised the shutdown messaging, as well, to include a datestamp and a measurement of how long the shutdown took:
svc.startd: The system is coming down. Please wait.
svc.startd: 83 system services are now being stopped.
Mar 5 19:43:34 soe-x4100-3 syslogd: going down on signal 15
svc.startd: Killing user processes.
Mar 5 19:43:40 The system is down. Shutdown took 17 seconds.
But wait, there's more! On OpenSolaris, we don't have the time consuming wbem or webconsole services. So, we can disable those and try again. And, we use ZFS, for which the time consuming lockfs call at the end of shutdown is a no-op (on UFS, it takes at least two seconds). This slimmed down stack results in an impressive shutdown time:
Mar 6 02:51:51 The system is down. Shutdown took 7 seconds.
And here is what it looks like. If you want to see the complete set of changes, the codereview is also available. As you can see, revising the way we kill off processes at the end of the life of the system is the big unrealized win. And doing so would likely shave about 3 more seconds off, for a gentle shutdown of 4-5 seconds. I ran out of time to do that this time around.
Some caveats:
- You mileage may vary: you might run a different mix of services on your system, and perhaps one of those has a slow shutdown method which will gum up the works. If you want to test how long a service takes to stop, try ptime svcadm disable -s <servicename>.
- Your performance improvement is likely to be less dramatic on systems with less available parallelism. Most of my test systems have two or four CPUs.
I should add a coda here: this work is greatly improved by recent bootadm performance work by Enrico. While building the boot archive is still sometimes triggered on shutdown, it takes a lot less time than it did previously.
I had a good time working on this project; I hope you'll enjoy using it.
(2009-03-06 06:15:00.0) Permalink Comments [14]
Tags: boot bootchart halt opensolaris shutdown
Trackback: http://blogs.sun.com/dp/entry/speeding_to_a_halt


Great work!!!!
Posted by Michal Pryc on March 06, 2009 at 07:28 AM PST #
Excellent to have all those scripts cleaned up a bit.
Would love to know what kind of tool you used to monitor the processes and generate your graphs, those are really neat.
Posted by Mat on March 06, 2009 at 08:09 AM PST #
Like the shutdown speed improvements, but *LOVE* the changes to ensure that the system doesn't hang on shutdown! We've run into problems with machines refusing to umount filesystems on shutdown in the past. Since we admin a number of machines that are 300km away, we're always worried about them not coming back after an init 6.
Good work!
Posted by sean walmsley on March 06, 2009 at 12:32 PM PST #
Sean, I hit the same problem-- due to a somewhat complicated bug, umountall(1m) can sometimes get stuck on shutdown when NFS is in play. The NFS guys are aware of this problem, and I'm hoping they will do something about it soon, but this is essentially what lead me to add the timeouts.
Posted by Dan Price on March 06, 2009 at 12:53 PM PST #
Thanks, Dan!
Like most, I've been using "sync; sync; halt" to shut down systems for so long that I rarely notice how painful the standard init 6/power button shutdown can be.
Posted by Bill on March 06, 2009 at 09:12 PM PST #
Oh good god thank you !
Shutdown and Boot times are killing me on Solaris !
This makes a vast improvment for a start however I believe is much more that can be done and should be done..
*Please continue*
Edward.
Posted by Edward O'Callaghan on March 07, 2009 at 07:06 AM PST #
Been caught in situations where shutdown and even init 0/6 would just hang on customer systems, so I have then resorted to uadmin 2 0 with the consent of the customer. This never fails and but is not guaranteed to umount/close the filesystems in an orderly fashion.
Posted by Bill Nicholson on March 07, 2009 at 04:15 PM PST #
@Edward, I've found boot time to be reasonable, although we're going to try to do better still. Shutdown time was definitely in the realm of "not reasonable", at least, in my opinion.
The next big wave of improvement is work by Sherry Moore (http://blogs.sun.com/sherrym/), which will enable you to turn on fast-reboot for 'init 5/6/0' shutdowns. Coupled with my work, and on platforms which support fast-reboot (some x86 boxes), this should mean dramatically shortened cycle times.
Edward-- If you have some specific issues, or know of specific parts of boot which seem to be slow, we'd love to hear about them. I know that, for example, the ZFS folks have done a lot of work to make zfs mounts faster.
Of course, if you use containers for some apps, those can reboot very fast, too...
Posted by Dan Price on March 08, 2009 at 12:55 AM PST #
Sorry, link to Sherry's blog was busted by roller. It is: http://blogs.sun.com/sherrym/
Posted by Dan Price on March 08, 2009 at 12:56 AM PST #
Wow. I had a brief discussion with Antonello Cruz last year about this exact issue.
We talked about possible SMF changes and instrumentation options for tackling the shutdown delays, which, of course, included bootchart and timestamping the shutdown messages.
I'm thrilled you've gone and done all the hard work. 7 seconds is fantastic!
I still think it would be a huge service for the community if you could try to get bootchart past legal one more time. =P
Posted by Albert Lee on March 09, 2009 at 07:49 AM PDT #
@Dan Price ,
Hi thanks,
I guess you could say these few bits add about 5sec to boot time lone to the kernel booting.
installing amd_iommu , module id 36
..
installing pipe , module id 153
..
installing lofs , module id 158
..
installing log , module id 159
..
installing cryptoadm , module id 161
..
installing fdfs, module id 270
..
installing cpr, module id 271
You can probe more deeply with:
set moddebug=0x00000000
in /etc/system
maybe if you could add a bit of DTrace in there the kernel eng could see what's taking so long.
One thing that could be improved is not mod loading every single driver at boot but just loading what needs to be loaded for the kernel to boot ! Currently Solaris hardware probing code is somewhat 'raw' that is load everything and see what sticks which as a end result is slower boot up speeds and possible for kernel bugs to pop into the mix such as 6816786.
Thanks,
Edward O'Callaghan.
.:: AuroraUX ::.
Posted by EdwardO'Callaghan on March 18, 2009 at 10:14 PM PDT #
this kind of initiative deserves to be rewarded handsomely because it makes a big difference to the product and it should have been done a long, long time ago
Posted by mc on March 22, 2009 at 02:43 AM PDT #
Shutdown times are greatly improved with 110.
Many thanks!
alan
Posted by Alan Pae on March 22, 2009 at 01:51 PM PDT #
Great work. How about opening some code from your bootchart port? For example, DTrace log parser, if you look license problems with original bootchart java source..
Posted by Alexander Eremin on March 23, 2009 at 04:22 AM PDT #