Wednesday November 21, 2007 Perl DTrace and complex data structures
Recently I had to solve the following problem. Suppose that you have a bunch of strings, each consisting of various tokens (think of sentences and words). For example, the first string may be 'a b c' and the second can be 'a b d'. Some strings can have common prefixes - sequences of tokens that start the string. I'd like to be able to see which tokens are common and which are different, so for the two strings above I'd like to print something like
a
  b
    c
    d
This tree shows all unique strings that I have in my input in a compressed way with all common prefixes joined together. If I add another string 'a e f' to it, the tree would become
a
  b
    c
    d
  e
    f
I started by writing a little prototype in Lisp, using the Emacs Lisp interpreter. First of all I needed a little helper function which translates a single string into a deep tree:
;;
;; Convert list l to a one-branch tree
;; For example, (list2tree '(a b c)) becomes
;; (a (b (c nil)))
(defun list2tree (l)
  (when l
    (list (car l)
          (list2tree (cdr l)))))
The main code is a bit tricky because it uses double recursion - one for the already constructed tree and another for the tokens in the string.
;;
;; Given the tree and a list of args construct a new tree finding the common
;; prefix in the args. For example:
;;
;; (linsert (linsert nil '(a b c)) '(a b d)) gives
;;
;; ((a
;; (b
;; (c nil)
;; (d nil))))
;;
(defun linsert (tree args)
  (if tree
      ;; Add new elements to an existing tree
      (let* ((front (car tree))   ; first branch
             (key (car front))    ; key of the branch
             (el (car args)))
        (if (eq key el)           ; if key matches the first element
            ;;
            ;; Replace the matching branch of a tree with a tree including rest
            ;; of args
            ;;
            (cons (cons key
                        (linsert (cdr front)
                                 (cdr args)))
                  (cdr tree))
          ;; No match - try next branch
          (cons front
                (linsert (cdr tree) args))))
    ;; Tree is empty, construct a tree using list2tree
    (list (list2tree args))))
A simple test shows that we got what we wanted:
(setq x (linsert nil '(a b c)))
((a (b (c nil))))
(setq y (linsert x '(d e f)))
((a (b (c nil))) (d (e (f nil))))
(setq z (linsert y '(d e h)))
((a (b (c nil))) (d (e (f nil) (h nil))))
(pp z)
((a (b (c nil))) (d (e (f nil) (h nil))))
So far so good. Now I tried to translate the above code into Perl using array references and got thoroughly confused. I just could not get the same thing working in Perl. List references are rather ugly once you try to do something non-trivial with them. So I decided to follow another route and searched CPAN for available implementations of trees. I found the Tree::Simple module, which seemed to provide the functionality needed. Here is the Perl version:
#!/usr/bin/perl
use Tree::Simple;

# Insert a list into a tree
sub tinsert
{
    my $tree = shift;
    return $tree unless scalar @_;
    my $el = shift;
    my @rest = @_;
    my @match = grep { $_->getNodeValue() eq $el } $tree->getAllChildren();
    if (scalar @match) {
        my $t = $match[0];
        tinsert($t, @rest);
    } else {
        tinsert($tree->addChild(Tree::Simple->new($el)), $el, @rest);
    }
}
As a side benefit I got everything I needed to pretty-print the result:
# Print tree node
sub print_node
{
    my $tree = shift;
    print ' ' x $tree->getDepth(), $tree->getNodeValue(), "\n";
}

# Print the whole tree
sub tprint
{
    my $tree = shift;
    $tree->traverse(\&print_node);
}

my $tree = Tree::Simple->new("root");
tinsert($tree, 'a', 'b', 'c');
tinsert($tree, 'd', 'e', 'f');
tinsert($tree, 'd', 'e', 'h');
tprint($tree);
This produces
a
 b
  c
d
 e
  f
  h
So, although direct list manipulation turned out to be pretty ugly, with Perl objects and a great library from Stevan Little the resulting code is pretty simple.
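For comparison, the same prefix-merging idea can be sketched with nested dictionaries. This is only an illustrative Python sketch, not a translation of the code above: the names tinsert and tlines are hypothetical, and it assumes Python 3.7 or later, where dictionaries keep insertion order.

```python
def tinsert(tree, tokens):
    """Insert a list of tokens into a nested-dict prefix tree, in place."""
    node = tree
    for tok in tokens:
        # setdefault reuses an existing branch or creates an empty one
        node = node.setdefault(tok, {})
    return tree


def tlines(tree, depth=0):
    """Return the tree as indented lines, one token per line."""
    lines = []
    for tok, subtree in tree.items():
        lines.append(" " * depth + tok)
        lines.extend(tlines(subtree, depth + 1))
    return lines


tree = {}
for s in ("a b c", "d e f", "d e h"):
    tinsert(tree, s.split())
print("\n".join(tlines(tree)))
```

Each token is a dictionary key and each subtree is the dictionary stored under it, so a shared prefix is simply a shared chain of keys.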
This wasn't just an exercise in recursive functions. I used this to post-process a huge file of DTrace data describing a Solaris build, collected by a little D script:
#!/usr/sbin/dtrace -Cs
/*
 * Provide information about dmake targets and directories.
 */
#include
/*
 * Use process p_mstart time instead of pid since pids roll over
 */
proc:::exec-success
/curpsinfo->pr_projid == $1 && execname == "dmake"/
{
        this->proc = curthread->t_procp;
        this->parent = curthread->t_procp->p_parent->p_parent;
        this->pcwd = this->parent->p_user.u_cdir->v_path == NULL ? " " :
            stringof(this->parent->p_user.u_cdir->v_path);
        printf("I %d\t%d\t%d\t%d\t%d\t%s\t%s\t%s [%s]\n",
            walltimestamp, pid, this->proc->p_mstart,
            this->parent->p_pid, this->parent->p_mstart, cwd, this->pcwd,
            curpsinfo->pr_psargs, this->parent->p_user.u_psargs);
        @dirs[cwd] = count();
}

syscall::rexit:entry
/curpsinfo->pr_projid == $1 && execname == "dmake"/
{
        printf("O %d\t%d\t%d\n", walltimestamp, pid,
            curthread->t_procp->p_mstart);
}

END
{
        printf("---directories--- \n");
        printa("%@d\t%s\n", @dirs);
}
Combining the DTrace and Perl magic together into a single mix I got some interesting build timelines for sparc and x86. For example:
--------------------------------------------------------------------------------
Directory Time Spent Target
--------------------------------------------------------------------------------
...
usr/src ksh usr/src/tools/scripts/nightly.sh /export/onnv-76//etc/env
usr/src 51s 1h37m30s -e install
usr/src/uts 2m15s 1h23m43s install
usr/src/uts/common/sys 2m16s all_h
usr/src/uts/common/rpc 2m16s all_h
usr/src/uts/common/rpcsvc 2m16s all_h
usr/src/uts/common/gssapi 2m16s all_h
usr/src/uts/common/idmap 2m16s all_h
usr/src/uts/sun4v 2m18s 16m16s install
usr/src/uts/sun4v/genassym 2m18s 7s install
usr/src/uts/sun4v/genassym 2m18s 6s def.targ
usr/src/uts/sun4v/unix 2m24s 5m6s install
usr/src/uts/sun4v/unix 2m25s 5m4s install.targ
usr/src/uts/sun4v/genassym 2m26s 1s all.targ
usr/src/uts/sun4v/genunix 3m20s 3m51s all.targ
usr/src/uts/sparc/ip 3m21s 1m25s ipctf.debug64
usr/src/uts/sparc/ip 3m22s 1m24s debug64/ipctf.a
usr/src/uts/sun4v/platmod 7m11s 3s all.targ
usr/src/uts/sun4v/genunix 7m30s 1m35s install
usr/src/uts/sun4v/genunix 7m31s 1m33s install.targ
usr/src/uts/sparc/ip 7m32s 11s ipctf.debug64
usr/src/uts/sparc/ip 7m33s 10s debug64/ipctf.a
usr/src/uts/sun4v/generic 10m17s 1m55s install
usr/src/uts/sun4v/generic 10m18s 1m55s install.targ
usr/src/uts/sun4v/generic 10m18s 1m54s def.targ
usr/src/uts/sun4v/unix 10m22s 1m50s symcheck
usr/src/uts/sun4v/genassym 10m23s 1s all.targ
usr/src/uts/sun4v/genunix 10m39s 1m31s all.targ
usr/src/uts/sparc/ip 10m40s 10s ipctf.debug64
usr/src/uts/sparc/ip 10m41s 10s debug64/ipctf.a
usr/src/uts/sun4v/niagara 12m13s 1m56s install
usr/src/uts/sun4v/niagara 12m13s 1m56s install.targ
usr/src/uts/sun4v/niagara 12m13s 1m55s def.targ
usr/src/uts/sun4v/unix 12m19s 1m49s symcheck
usr/src/uts/sun4v/genassym 12m20s 1s all.targ
usr/src/uts/sun4v/genunix 12m36s 1m31s all.targ
usr/src/uts/sparc/ip 12m37s 10s ipctf.debug64
usr/src/uts/sparc/ip 12m37s 10s debug64/ipctf.a
usr/src/uts/sun4v/niagara2 14m9s 1m54s install
usr/src/uts/sun4v/niagara2 14m9s 1m54s install.targ
usr/src/uts/sun4v/niagara2 14m9s 1m53s def.targ
usr/src/uts/sun4v/unix 14m15s 1m48s symcheck
usr/src/uts/sun4v/genassym 14m15s 1s all.targ
usr/src/uts/sun4v/genunix 14m31s 1m30s all.targ
usr/src/uts/sparc/ip 14m32s 10s ipctf.debug64
usr/src/uts/sparc/ip 14m32s 10s debug64/ipctf.a
usr/src/uts/sun4v/vfalls 16m3s 1m54s install
usr/src/uts/sun4v/vfalls 16m3s 1m54s install.targ
usr/src/uts/sun4v/vfalls 16m4s 1m53s def.targ
usr/src/uts/sun4v/unix 16m9s 1m48s symcheck
usr/src/uts/sun4v/genassym 16m10s 1s all.targ
usr/src/uts/sun4v/genunix 16m26s 1m30s all.targ
usr/src/uts/sparc/ip 16m27s 10s ipctf.debug64
usr/src/uts/sparc/ip 16m27s 10s debug64/ipctf.a
usr/src/uts/sun4v/ontario 18m19s 10s install
usr/src/uts/sun4v/ontario/platmod 18m19s 4s install
usr/src/uts/sun4v/ontario/platmod 18m19s 3s install.targ
usr/src/uts/sun4v/ontario/tsalarm 18m23s 6s install
usr/src/uts/sun4v/ontario/tsalarm 18m23s 5s install.targ
usr/src/uts/sun4v/montoya 18m28s 4s install
usr/src/uts/sun4v/montoya/platmod 18m28s 4s install
usr/src/uts/sun4v/montoya/platmod 18m29s 3s install.targ
usr/src/uts/sun4v/huron 18m32s install
usr/src/uts/sun4v/maramba 18m32s install
usr/src/uts/sun4u 18m32s 53m18s install
usr/src/uts/sun4u/genassym 18m33s 7s install
usr/src/uts/sun4u/genassym 18m33s 7s def.targ
usr/src/uts/sun4u/unix 18m40s 3m46s install
usr/src/uts/sun4u/unix 18m40s 3m45s install.targ
usr/src/uts/sun4u/genassym 18m41s 1s all.targ
usr/src/uts/sun4u/genunix 19m35s 2m34s all.targ
usr/src/uts/sparc/ip 19m36s 10s ipctf.debug64
usr/src/uts/sparc/ip 19m36s 10s debug64/ipctf.a
usr/src/uts/sun4u/platmod 22m8s 2s all.targ
usr/src/uts/sun4u/genunix 22m25s 1m32s install
usr/src/uts/sun4u/genunix 22m26s 1m31s install.targ
usr/src/uts/sparc/ip 22m27s 10s ipctf.debug64
usr/src/uts/sparc/ip 22m28s 10s debug64/ipctf.a
usr/src/uts/sun4u/cheetah 25m54s 2m3s install
usr/src/uts/sun4u/cheetah 25m55s 2m3s install.targ
usr/src/uts/sun4u/cheetah 25m55s 2m2s def.targ
usr/src/uts/sun4u/unix 26m7s 1m50s symcheck
usr/src/uts/sun4u/genassym 26m8s 1s all.targ
usr/src/uts/sun4u/genunix 26m24s 1m31s all.targ
usr/src/uts/sparc/ip 26m25s 10s ipctf.debug64
usr/src/uts/sparc/ip 26m25s 10s debug64/ipctf.a
usr/src/uts/sun4u/cheetahplus 27m58s 2m2s install
usr/src/uts/sun4u/cheetahplus 27m58s 2m2s install.targ
usr/src/uts/sun4u/cheetahplus 27m58s 2m1s def.targ
usr/src/uts/sun4u/unix 28m11s 1m48s symcheck
usr/src/uts/sun4u/genassym 28m12s 1s all.targ
usr/src/uts/sun4u/genunix 28m27s 1m30s all.targ
usr/src/uts/sparc/ip 28m28s 10s ipctf.debug64
usr/src/uts/sparc/ip 28m29s 10s debug64/ipctf.a
usr/src/uts/sun4u/jalapeno 30m 2m1s install
usr/src/uts/sun4u/jalapeno 30m 2m1s install.targ
usr/src/uts/sun4u/jalapeno 30m 2m def.targ
usr/src/uts/sun4u/unix 30m12s 1m48s symcheck
usr/src/uts/sun4u/genassym 30m13s 1s all.targ
usr/src/uts/sun4u/genunix 30m29s 1m30s all.targ
usr/src/uts/sparc/ip 30m30s 10s ipctf.debug64
usr/src/uts/sparc/ip 30m30s 10s debug64/ipctf.a
usr/src/uts/sun4u/serrano 32m1s 2m1s install
usr/src/uts/sun4u/serrano 32m1s 2m1s install.targ
usr/src/uts/sun4u/serrano 32m2s 2m def.targ
usr/src/uts/sun4u/unix 32m14s 1m48s symcheck
usr/src/uts/sun4u/genassym 32m15s 1s all.targ
usr/src/uts/sun4u/genunix 32m30s 1m30s all.targ
usr/src/uts/sparc/ip 32m31s 10s ipctf.debug64
usr/src/uts/sparc/ip 32m32s 10s debug64/ipctf.a
usr/src/uts/sun4u/spitfire 34m2s 1m58s install
usr/src/uts/sun4u/spitfire 34m3s 1m58s install.targ
usr/src/uts/sun4u/spitfire 34m3s 1m57s def.targ
usr/src/uts/sun4u/unix 34m12s 1m48s symcheck
usr/src/uts/sun4u/genassym 34m13s 1s all.targ
usr/src/uts/sun4u/genunix 34m28s 1m30s all.targ
usr/src/uts/sparc/ip 34m29s 10s ipctf.debug64
usr/src/uts/sparc/ip 34m30s 10s debug64/ipctf.a
usr/src/uts/sun4u/hummingbird 36m 1m59s install
usr/src/uts/sun4u/hummingbird 36m1s 1m58s install.targ
usr/src/uts/sun4u/hummingbird 36m1s 1m57s def.targ
usr/src/uts/sun4u/unix 36m11s 1m48s symcheck
usr/src/uts/sun4u/genassym 36m11s 1s all.targ
usr/src/uts/sun4u/genunix 36m27s 1m30s all.targ
usr/src/uts/sparc/ip 36m28s 10s ipctf.debug64
usr/src/uts/sparc/ip 36m28s 9s debug64/ipctf.a
usr/src/uts/sparc 1h11m49s 14m7s install
usr/src/uts/sun4v 9m5s install
usr/src/uts/sun4v/bge 9m5s 1m6s install
usr/src/uts/sun4v/bge 9m6s 1m5s install.targ
...
The table above shows when a directory was entered during the build and how much time was actually spent building it.
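The post-processing itself is not shown here. As a rough sketch of one way it could work, the Python below pairs the "I" (exec) and "O" (exit) records printed by the D script and formats the elapsed time in the style of the table. The function names, the Run tuple and the sample records are hypothetical, and it assumes the timestamps are in nanoseconds.

```python
from collections import namedtuple

Run = namedtuple("Run", "cwd args seconds")
NSEC = 1_000_000_000  # assumed: walltimestamp values are nanoseconds


def fmt(seconds):
    """Format seconds in the table's style, e.g. 5850 -> '1h37m30s'."""
    h, rest = divmod(int(seconds), 3600)
    m, s = divmod(rest, 60)
    parts = [f"{v}{u}" for v, u in ((h, "h"), (m, "m"), (s, "s")) if v]
    return "".join(parts) or "0s"


def pair_runs(lines):
    """Match 'I' and 'O' records by (pid, p_mstart)."""
    started, runs = {}, []
    for line in lines:
        kind, _, rest = line.rstrip("\n").partition(" ")
        fields = rest.split("\t")
        if kind == "I" and len(fields) >= 8:
            ts, pid, mstart = (int(x) for x in fields[:3])
            # fields[5] is cwd, fields[7] is "args [parent args]"
            started[(pid, mstart)] = (ts, fields[5], fields[7])
        elif kind == "O" and len(fields) >= 3:
            ts, pid, mstart = (int(x) for x in fields[:3])
            if (pid, mstart) in started:
                t0, cwd, args = started.pop((pid, mstart))
                runs.append(Run(cwd, args, (ts - t0) // NSEC))
    return runs


# Made-up records in the format printed by the D script
sample = [
    "I 1000000000\t42\t7\t1\t3\t/ws/usr/src\t/ws\tdmake install [ksh nightly]",
    "O 52000000000\t42\t7",
]
for run in pair_runs(sample):
    print(run.cwd, fmt(run.seconds), run.args)
```

Keying on the (pid, p_mstart) pair rather than on the pid alone follows the comment in the D script about pids rolling over.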
( Nov 21 2007, 03:06:47 PM PST ) Permalink
Just integrated the CPU Caps project into S10U5. So far it has been available only to users of OpenSolaris and SXDE; pretty soon it will also be available to regular S10 users. It is really exciting to see its applications in real life! Finally, S10 users will be able to simply say in zonecfg:
zonecfg:myzone> add capped-cpu
zonecfg:myzone:capped-cpu> set ncpus=3
zonecfg:myzone:capped-cpu> end
( Oct 11 2007, 09:03:54 PM PDT ) Permalink Comments [2]
Recently I was investigating a very interesting CPU Caps test failure. One of the tests was failing when it ran early during zone boot. It turned out to be a very interesting bug indeed. For one thing, it was a great exercise in using DTrace to solve a complicated problem, and it also exposed a generic weakness of the Solaris scheduler. The bug is 6577453 Java CPU hogs can escape CPU Caps enforcement by sleeping a lot.
In a nutshell, a program may behave in such a way that it is never seen on CPU by the clock() thread while it uses a noticeable chunk of CPU resources. CPU Caps were accurate in charging its CPU time (see the description of the CPU Caps accounting mechanism), but the policing was done by clock(). As a result, some threads enjoyed unlimited access to CPU resources while blocking everyone else. To test the fix I wrote a nasty little program that demonstrates the problem.
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <poll.h>
#include <sys/time.h>

#define NSEC_IN_MSEC (NANOSEC / MILLISEC)
#define DELTA (9 * NSEC_IN_MSEC)

int f(int x)
{
        return (x + 1);
}

int main(int argc, char *argv[])
{
        for (;;) {
                int i = 0;
                int y = 0;
                hrtime_t t1, t2;

                /* Sleep for 1ms so that we start right after a clock tick */
                (void) poll(NULL, 0, 1);
                t1 = gethrtime();
                t2 = t1 + DELTA;
                /* Burn CPU for DELTA nanoseconds, then go back to sleep */
                while (gethrtime() < t2) {
                        y += f(i);
                }
        }
}
The program consumes 50% of a CPU and is never seen by a clock() thread. It starts after the clock tick and goes to sleep right before the next one. Running it demonstrates the general problem with Solaris scheduling class implementation - while all the micro-state accounting information is available it is not actually used by the scheduler. This program always runs at priority 59, defeating priority mechanisms of the time-sharing class.
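The blind spot can be illustrated with a toy model. The Python sketch below is not a model of the Solaris scheduler: it assumes a 10 ms clock tick, a sampler that looks at the CPU only on tick boundaries, and a thread that gets on CPU a fixed offset after every tick. The function name and the numbers are made up for illustration, and it ignores how poll() timeouts round to ticks.

```python
TICK_US = 10_000  # assumed clock tick length in microseconds (100 Hz)


def simulate(offset_us, busy_us, ticks):
    """Return (ticks charged by sampling, exact busy microseconds)."""
    # One burst per tick period, as half-open intervals [start, end)
    bursts = [(n * TICK_US + offset_us, n * TICK_US + offset_us + busy_us)
              for n in range(ticks)]
    exact_us = sum(end - start for start, end in bursts)
    # The sampler sees the thread only if it is on CPU exactly at a tick
    sampled_ticks = sum(
        1
        for k in range(1, ticks + 1)
        for start, end in bursts
        if start <= k * TICK_US < end
    )
    return sampled_ticks, exact_us


sampled, exact = simulate(offset_us=500, busy_us=9_000, ticks=100)
print(f"charged {sampled} ticks, actually ran {exact / 1000:.0f} ms")
```

In this model, a burst that ends before the next tick is never charged, however long it is, while a burst that overlaps the tick by a few microseconds is charged a whole tick.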
Side note - I was unable to write the same program in Perl. It turned out that the IO::Poll module treats the timeout as seconds and always multiplies it by 1000 for poll(2).
As many of you know, once you are involved in something, related things start flying your way. Indeed, a few days after I investigated this problem, Slashdot advertised the paper by Dan Tsafrir, Yoav Etsion and Dror Feitelson describing the very same issue!
Seems like now is the time to fix this long-standing problem in Solaris. I opened bug 6582502 Threads may hide behind the clock.
The CPU Caps project is now integrated in OpenSolaris and now I am working on back-porting it to S10 update. I think this is a good time to put some notes regarding its implementation details. The Implementation guide gives a good high-level overview, so here I'd like to concentrate on the bottom-up view.
Before we can penalize CPU usage of some threads we need to know how much CPU is consumed by every project. There are two main approaches available - sampling and monitoring. The difference can be illustrated using the freeway speed control example. Imagine that local police decided to crack down on the freeway speeders (in fact, this is exactly what happened in San Jose). The common method is to hide police vehicle in the bushes and wait until some unlucky schmuck races by. The chances of getting every speeder are not very high but over some period of time the method works because enough speeders would be eventually caught. Some lucky ones, though, would miss a chance meeting with a friendly policeman.
Another approach is to tag each car when it enters and exits freeway with the location and the time. Assuming that the speed is more or less constant it is easy to calculate the average speed and penalise speeding car when it exits the freeway. This method provides for much greater accuracy.
Using sampling we can periodically check what threads are running on each CPU and interpolate their CPU usage from that. For example, once every clock tick we find all threads running on a CPU and charge them 1 clock tick worth of CPU time. Some threads may have just arrived on CPU while others may be sitting there longer, but for long-running threads we should get a good enough estimation. This is the simplest approach and it was used in the initial CPU Caps prototype. The main trouble is that one tick is quite a long time on modern super-fast CPUs and a lot of thread activity may happen in the meantime.
Thread monitoring allows us to know exactly how much CPU time was consumed by a thread. We do this by recording the time a thread boarded a CPU and the time it left. Since we are only interested in short-term CPU usage (over a tick), we also need to check the threads currently running on a CPU and get their on-CPU time as well.
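The monitoring approach can be sketched in a few lines. This is a hypothetical Python illustration of the bookkeeping, not the kernel implementation: the OnCpuMeter class and its method names are invented, and timestamps are plain integers in nanoseconds.

```python
class OnCpuMeter:
    """Accumulate exact on-CPU time from switch-in/switch-out timestamps."""

    def __init__(self):
        self.total_ns = {}  # thread id -> completed on-CPU nanoseconds
        self.since_ns = {}  # thread id -> timestamp of the current switch-in

    def switch_in(self, tid, now_ns):
        self.since_ns[tid] = now_ns

    def switch_out(self, tid, now_ns):
        started = self.since_ns.pop(tid)
        self.total_ns[tid] = self.total_ns.get(tid, 0) + now_ns - started

    def usage(self, tid, now_ns):
        """Completed time plus the still-running interval, if any."""
        used = self.total_ns.get(tid, 0)
        if tid in self.since_ns:
            used += now_ns - self.since_ns[tid]
        return used


meter = OnCpuMeter()
meter.switch_in("t1", 0)
meter.switch_out("t1", 3_000_000)
meter.switch_in("t1", 8_000_000)
print(meter.usage("t1", 10_000_000))
```

The usage() method is the part that handles threads still sitting on a CPU at the moment of the check: their open interval is added to what has already been recorded.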
Solaris kindly provides us with a convenient tool for such purposes, called micro-state accounting. It takes very accurate nanosecond-granularity timestamps whenever a thread changes its state. The CPU Caps code uses this facility to calculate the CPU usage of each thread. This is done by the mstate_thread_onproc_time() routine:
hrtime_t
mstate_thread_onproc_time(kthread_t *t)
{
        hrtime_t aggr_time;
        hrtime_t now;
        hrtime_t state_start;
        struct mstate *ms;
        klwp_t *lwp;
        int mstate;

        /* Ignore kernel threads */
        if ((lwp = ttolwp(t)) == NULL)
                return (0);

        /* Get the current thread state */
        mstate = t->t_mstate;
        ms = &lwp->lwp_mstate;

        /* time when thread entered this state */
        state_start = ms->ms_state_start;

        /* Thread's user + system + trap time */
        aggr_time = ms->ms_acct[LMS_USER] +
            ms->ms_acct[LMS_SYSTEM] + ms->ms_acct[LMS_TRAP];

        /* current time */
        now = gethrtime_unscaled();

        /*
         * NOTE: gethrtime_unscaled on X86 taken on different CPUs is
         * inconsistent, so it is possible that now < state_start.
         */
        if ((mstate == LMS_USER || mstate == LMS_SYSTEM ||
            mstate == LMS_TRAP) && (now > state_start)) {
                /* Add time spent on CPU in the current state */
                aggr_time += now - state_start;
        }

        scalehrtime(&aggr_time);
        return (aggr_time);
}
This function returns the time spent on CPU by user-land threads since their
birth. The t->t_lwp->lwp_mstate.ms_acct array contains aggregate time spent
by thread in each of the possible states:
LMS_USER - running in user mode
LMS_SYSTEM - running in system call or page fault
LMS_TRAP - running in other trap
LMS_TFAULT - asleep in user text page fault
LMS_DFAULT - asleep in user data page fault
LMS_KFAULT - asleep in kernel page fault
LMS_USER_LOCK - asleep waiting for user-mode lock
LMS_SLEEP - asleep for any other reason
LMS_WAIT_CPU - waiting for CPU (latency)
LMS_STOPPED - stopped (/proc, jobcontrol, lwp_suspend)
The function above is the foundation of the thread accounting done by CPU
caps. The CPU-caps specific monitoring is implemented by each scheduling class
which supports CPU caps. For each thread scheduling classes keep a little
caps_charge_adjust
function via
the cpucaps_charge.
The caps_charge_adjust function calculates the time spent on CPU
since a thread was last checked and updates its total on-CPU time. We will
take a closer look at it next time.
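As a rough userland model of the arithmetic in mstate_thread_onproc_time(), the Python sketch below keeps the per-state totals in a dictionary. It is an illustration only: the onproc_time name and the numbers are made up, time units are arbitrary, and the scalehrtime() conversion is left out.

```python
ON_CPU_STATES = {"LMS_USER", "LMS_SYSTEM", "LMS_TRAP"}


def onproc_time(ms_acct, mstate, state_start, now):
    """Sum the on-CPU buckets, plus the current interval if still on CPU."""
    total = sum(ms_acct.get(state, 0) for state in ON_CPU_STATES)
    # Timestamps from different CPUs may disagree, so a negative
    # interval (now <= state_start) is skipped rather than added.
    if mstate in ON_CPU_STATES and now > state_start:
        total += now - state_start
    return total


acct = {"LMS_USER": 700, "LMS_SYSTEM": 200, "LMS_TRAP": 50, "LMS_SLEEP": 9000}
print(onproc_time(acct, "LMS_USER", state_start=1000, now=1300))
print(onproc_time(acct, "LMS_SLEEP", state_start=1000, now=1300))
```

Time spent asleep never enters the sum, and the open interval counts only when the thread is currently in one of the three on-CPU states.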
( Jun 19 2007, 06:16:28 PM PDT ) Permalink
Every time I start a new document I face the same problem - what markup language to use for it? My high-level goal is pretty simple - I need a good way to create good-looking printed documents and Web pages from a single source. I guess everyone has his or her own solution. I'll just describe what I use currently.
Most of the documents I am dealing with have a simple structure. They need to
have sections and subsections, numbered and unnumbered lists, a few different
fonts for emphasizing things (like code fragments and
things to pay attention to). Occasionally I use tables and
sometimes hyperlinks to other parts of the same
document or some useful external links.
Here are some candidates I have considered so far:
There are, probably, a whole bunch of other things that I am not very familiar with:
ASCII lacks any structure, and although there are some automatic converters from text to HTML, the author still has to use some implicit markup structure to communicate intentions to the conversion tool.
Perl POD is simple enough and is quite suitable for something resembling man pages, but its support for lists is buggy and support for hyperlinks is missing. It has converters to HTML, ASCII and LaTeX bundled with Perl, so I use it once in a while.
Texinfo is pretty powerful, but I just never understood it. It might be the best choice, though.
HTML may be a good choice and there is an HTML->LaTeX converter. But why use it if we can use LaTeX directly?
This leaves me with LaTeX, which can be combined with latex2html to produce HTML. That is what I used for various documentation for the CPU Caps project. LaTeX provides very powerful markup facilities and a whole bunch of third-party packages (unfortunately most of them are unlikely to be interpreted correctly by latex2html). As a bonus, LaTeX has pretty powerful Emacs support in AUCTeX.
Here are some tricks that I use in LaTeX to deal with on-line documents. The document starts with the following preamble:
\documentclass{article}
\usepackage{url}
\usepackage{html}
Sometimes, I use \documentclass[twoside]{article} or
\documentclass[twocolumn,twoside]{article} to produce two-sided or
two column documents.
The url
package allows me to say things like
\url{http://www.opensolaris.org} to point readers to various
external resources.
The html extension package provides some nice features for the LaTeX to
HTML conversion. One that I actually use is a way to provide hypertext links
where I can control the link and its text separately:
\htmladdnormallink{link name}{link-URL}
As an example, when I want to mention various bug IDs in the document I define the following macro:
% Provide URL for a bug ID for latex2html.
\newcommand{\bug}[1]{\htmladdnormallink{#1}
{http://bugs.opensolaris.org/bugdatabase/view_bug.do?bug_id=#1}
}
Then I can use \bug{6468451} in the document: both the printed and
web versions will show the bug ID, while the web version will also show a
hyperlink to the bug description in the OpenSolaris bug database.
It would be interesting to know how others solve the same problem.
( Nov 22 2006, 02:01:15 PM PST ) Permalink Comments [2]
Caps revised
What's new with CPU caps - observability!
( Nov 21 2006, 06:25:03 PM PST ) Permalink
Using ZFS to save space on laptop
Why is ZFS useful on laptops? It saves disk space!
( Sep 14 2006, 06:41:11 PM PDT ) Permalink
Now that the NUMA observability tools are integrated, I made some progress on the CPU caps. I was mostly scribbling documentation and produced a few useful pages:
I also managed to do some actual coding and fixed the nasty locking issue which could leave threads left on the wait queues forever when caps are disabled. I still have a few more issues to attack.
In the process I filed a few new resource management bugs/RFEs:
NUMA observability tools are now in the gate!
Last Wednesday the NUMA observability tools were finally integrated into Solaris Nevada build 49 and should be available soon on OpenSolaris!
The tools include:
The tools provide a lot of observability and control to the inner workings of the NUMA subsystem.
( Sep 11 2006, 05:23:55 PM PDT ) Permalink
Exchanging hats and wearing a CPU cap.
CPU Caps update
( Aug 21 2006, 05:07:26 PM PDT ) Permalink Comments [1]
OpenSolaris - 1 year of opening Sun development
It seems just yesterday that Solaris went IPO. Yet, now it is celebrating its first anniversary. A good time to reflect on the effect of OpenSolaris on the internal Solaris developer community. What has really changed since Jun 14 2005?
I think the most important direct effect is that users and customers now ask direct questions about how something works and developers can explain to them exactly what is going on. On more than one occasion I was exchanging e-mails with customers, explaining the intricacies of the STREAMS framework implementation, pointing to specific snippets of code and getting questions about this code. This has a downside as well: the easy availability of code may create lots of implicit dependencies on the implementation details. Hopefully developers will continue sticking to the published APIs.
During the first year the OpenSolaris site became a huge repository of technical documentation for both existing and future projects. One day we discovered that someone had decided that our server hosting all the internal documentation for the NUMA project was a test machine and had completely wiped out all the content. We decided to just move all the information to the OpenSolaris site - together with prototype code and binaries.
Another major change for the internal developers is that they are not quite "internal" any more! We now routinely publish proposals, code reviews, prototypes, PSARC cases and other development by-products and are anxiously expecting useful feedback. It seems like some areas (e.g. getting the favorite shell as Solaris default) are getting much more attention than boring issues of scheduling and memory optimization, so I hope that the next year will catch up. All of us would like to see deeper community involvement in the guts of Solaris internals.
I have seen several cases when some projects didn't want to go open initially and tried to follow the traditional path. The peer pressure inevitably pushed them out - for the better. I myself usually initiated internal code reviews before opening a public discussion - all of us want to avoid embarrassment :-). I think by now we are doing much more open development. It is becoming the norm and part of the process. And part of the fun!
( Jun 15 2006, 07:13:29 PM PDT ) Permalink
I spent a few days in Arizona with my wife and daughter in the middle of April and would like to share some of the experience here. Our route was San Francisco - Phoenix - Casa Grande - Phoenix - Wilcox - Douglas - Tucson - Phoenix - San Francisco. We started on the morning of April 13. The 10 a.m. flight was pretty convenient and it took 2 hours to get from SFO to Phoenix. Luckily the sky was pretty clear, and sitting near the window I could see California's snow-covered mountains and the Nevada/Arizona deserts, and was able to take a few aerial photos of Phoenix.
Our first stop in Phoenix was the little Doll and Toy museum, which is about a 20-minute drive from the airport. It is located within a big museum complex which has its own big parking lot (free with museum validation). It has an interesting collection of toy houses, a little replica of an ice cream cafe, a big toy school with 16 students and a document claiming to be the 1872 Rules for Teachers.
Other places in the same complex are probably worth visiting, but we skipped them and headed for the Desert Botanical Garden. On the way there Liza fell asleep, so we checked in at our AmeriSuites hotel in downtown Scottsdale instead. The hotel is located right near old Scottsdale, has free wireless Internet (which we didn't use) and was only about $80/night.
After a little rest in the hotel we finally went to the Desert Botanical Garden and arrived there around 7 p.m. This turned out to be an excellent time for the garden tour, as we avoided the crowds and the heat and caught the wonderful evening light, very suitable for taking pictures. On the downside, some of the exhibitions were closed and some trails were closing as we walked. The main attraction of the park is its various cactuses. Here are some samples:
The combination of cactuses and sunset is a real photographic feast! Every tour book recommended the Desert Botanical Garden and it is really worth it. After the long day we headed to My Big Fat Greek Restaurant in downtown Scottsdale, where we had plenty of Greek food and fun. Make sure you order one of the two flaming cheeses - either Flaming Saganaki "OPA Time" or Flaming Feta "OPA Time". A couple of guys come to your table with a small dish and produce a huge fire right at your table, which quickly disappears, and you are left with a small piece of tasty cheese. Unfortunately I didn't take my camera there, thinking that a restaurant is a boring place to take pictures.
( Jun 15 2006, 05:58:07 PM PDT ) Permalink
P.S. core-aware psrinfo is in Solaris Nevada
The core-aware psrinfo(1M) command, described in my previous blog, is integrated into Solaris Nevada and is available with the latest code drops. It will show up in build 39.
( Apr 26 2006, 11:15:27 AM PDT ) Permalink
It seems that most chip vendors adopted the multi-threaded way of improving their chip performance. Dual-core CPUs are really hot these days and Sun now sells UltraSPARC-T1 based systems with 8 cores and 32 threads.
Andrei Dorofeev provided a nice introduction to CMT Scheduling Optimizations for SunFire T2000 and James Laudon has an interesting description of threading implementation of the UltraSPARC T1.
Are you one of the lucky ones who got an UltraSPARC T1? Or do you have a box with an Intel hyper-threaded chip sitting inside? You may be wondering how many cores your system has and what the relationships are between its physical and virtual processors. The Solaris operating system makes it all transparent - users do not need to know any low-level CPU details to benefit from the new chips. But there are always users who want to know everything.
If you follow The Register, you probably saw the article Sun has at least 1GHz of Niagara Viagra, published in September 2005. Here is a short snippet:
By using a couple of commands and looking at the OpenSolaris code, observers
would seem to be able to tell the Niagara chip's first name and clock speed,
along with future directions for the chip.
$ ./psrinfo -vp
The physical processor has 8 cores and 32 virtual processors
The core 0 has 4 virtual processors (0, 1, 2, 3)
The core 1 has 4 virtual processors (4, 5, 6, 7)
The core 2 has 4 virtual processors (8, 9, 10, 11)
The core 3 has 4 virtual processors (12, 13, 14, 15)
The core 4 has 4 virtual processors (16, 17, 18, 19)
The core 5 has 4 virtual processors (20, 21, 22, 23)
The core 6 has 4 virtual processors (24, 25, 26, 27)
The core 7 has 4 virtual processors (28, 29, 30, 31)
UltraSPARC-T1 (clock 1080 MHz)
Well, I saw this article too and decided to add this cute functionality to the psrinfo(1M) command, so I wrote a version of the psrinfo(1M) command that can print the relationships between physical CPUs, cores and virtual CPUs. It requires some kernel support which is available in recent Solaris Nevada builds (32 and above). It should work on both Solaris Express and OpenSolaris versions.
The program is a Perl script which provides a drop-in replacement for the
standard psrinfo(1M) command. It has the same options as the original and
produces the same output, but with a twist. On systems having multiple cores
per physical processor, psrinfo -vp output will look a bit different:
$ uname -i
SUNW,Sun-Fire-T200
$ psrinfo.pl -vp
The physical processor has 8 cores and 32 virtual processors (0-31)
The core has 4 virtual processors (0-3)
The core has 4 virtual processors (4-7)
The core has 4 virtual processors (8-11)
The core has 4 virtual processors (12-15)
The core has 4 virtual processors (16-19)
The core has 4 virtual processors (20-23)
The core has 4 virtual processors (24-27)
The core has 4 virtual processors (28-31)
UltraSPARC-T1 (clock 1000 MHz)
And on x86:
$ uname -a
SunOS bolt 5.11 snv_32 i86pc i386 i86pc
$ psrinfo.pl -vp
The physical processor has 2 cores and 4 virtual processors (0-3)
The core has 2 virtual processors (0 1)
The core has 2 virtual processors (2 3)
x86 (GenuineIntel family 15 model 4 step 4 clock 3211 MHz)
Intel(r) Pentium(r) D CPU 3.20GHz
You can get the new psrinfo command here.
While writing this blog I tried running the new psrinfo.pl command on build 34 and noticed that it doesn't show the core information. It turns out that one of the projects inadvertently removed a small piece of kernel code required to make it work on the Niagara platform. This means that on Niagara the functionality is only available on builds 32 and 33, and on build 36 and above, where the bug is fixed. On x86 platforms it works on all builds, starting from 32.
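The grouping shown in the sample output can be sketched as follows. This Python sketch is not the psrinfo.pl code: the function names are hypothetical, and the rule that only runs of three or more consecutive IDs are collapsed into "first-last" is an assumption inferred from the "(0-3)" and "(0 1)" samples above.

```python
def cpu_list(ids):
    """Format CPU ids; runs of 3+ consecutive ids become 'first-last'."""
    ids = sorted(ids)
    parts, i = [], 0
    while i < len(ids):
        j = i
        # Extend j to the end of the current run of consecutive ids
        while j + 1 < len(ids) and ids[j + 1] == ids[j] + 1:
            j += 1
        if j - i >= 2:
            parts.append(f"{ids[i]}-{ids[j]}")
        else:
            parts.extend(str(n) for n in ids[i:j + 1])
        i = j + 1
    return " ".join(parts)


def describe(cores):
    """cores: mapping of core id -> list of virtual CPU ids."""
    all_ids = [n for ids in cores.values() for n in ids]
    lines = [f"The physical processor has {len(cores)} cores and "
             f"{len(all_ids)} virtual processors ({cpu_list(all_ids)})"]
    for ids in cores.values():
        lines.append(f"  The core has {len(ids)} virtual processors "
                     f"({cpu_list(ids)})")
    return lines


print("\n".join(describe({0: [0, 1], 1: [2, 3]})))
```

The real command gets the core-to-CPU mapping from the kernel; here it is simply passed in as a dictionary.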
( Feb 28 2006, 07:25:00 PM PST ) Permalink Comments [1]