Alan Hargreaves' Weblog
The ramblings of an Australian SaND TSC* Principal Field Technologist
* Solaris and Network Domain Technology Support Centre - The group I work for
Tuesday Jul 11, 2006
A first look at Thumper
Jonathan has given a sneak preview of the new Thumper box. It's nice that this will be out shortly and we will be able to talk about it.
As you can see from what he writes, it's a pretty damned impressive unit. With up to 24 Terabytes (you read that correctly) of internal SATA disk and ZFS on top of that, ...
It was a nice touch apologising to the courier drivers for including (all 170 lbs of) it in the try & buy program.
Posted at 06:05PM Jul 11, 2006 by Alan Hargreaves in Solaris |
Friday Jun 16, 2006
Will a faster cpu make my application faster?
I was recently involved in an escalation in which a customer had moved from one sparc platform to another and also moved to a faster release of Ultrasparc-IV than they had previously looked at.
It turns out that they actually saw their application slow down.
This is not as silly, nor as unusual as it may at first seem.
The actual platform migration was from a US-III workgroup server to a starcat class machine.
Now, there are some things to watch for in this type of migration as there are some major differences in the architecture. Most specifically you are moving from a platform with a two tier memory architecture to one with a three tier architecture.
Now in Solaris 9 we have some "new" bits that help us immensely here that are simply not there and not able to be backported to Solaris 8.
These are Memory Placement Optimization (MPO) and Multiple Page Size Support (MPSS). MPO is the important one as it attempts to run programs on the same board that the memory exists on.
OK, that probably accounts for the slow down. Why did we not see any improvement?
While I don't actually have the data on the previous system I have my suspicions.
On the Starcat box I am seeing large amounts of idle time. The immediate thought here is, is the application cpu bound or is something else the bottleneck?
If we already have lots of idle time on the previous box, then it's odds on that the cpu is not our bottleneck.
If, for example, we have threads that are doing a lot of I/O, and we haven't changed the I/O subsystem, then the time we spend waiting on the I/O is not going to change. If that is the limiting speed factor, then faster cpus are not going to help us.
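To put rough numbers on that argument, here is a minimal sketch in Python. It is just the usual Amdahl-style arithmetic, not anything measured on the customer's system; the function name and the figures fed to it are made up for illustration.

```python
def estimated_speedup(cpu_fraction, cpu_speedup):
    """Overall speedup when only the on-cpu share of elapsed time gets faster.

    cpu_fraction: share of elapsed time spent on cpu, 0.0 to 1.0 (hypothetical input)
    cpu_speedup:  how much faster the new cpus are (1.5 means 50% faster)
    """
    if not 0.0 <= cpu_fraction <= 1.0:
        raise ValueError("cpu_fraction must be between 0 and 1")
    if cpu_speedup <= 0:
        raise ValueError("cpu_speedup must be positive")
    # Time spent waiting (I/O, locks, ...) is not changed by faster cpus.
    waiting_fraction = 1.0 - cpu_fraction
    new_time = waiting_fraction + cpu_fraction / cpu_speedup
    return 1.0 / new_time


if __name__ == "__main__":
    # Illustrative figures only.
    for frac in (0.1, 0.3, 0.9):
        gain = estimated_speedup(frac, 1.5)
        print(f"on-cpu share {frac:.0%}: overall speedup {gain:.2f}x")
```

If only a tenth of the elapsed time is spent on cpu, cpus that are 50% faster buy an overall improvement of only about 3%.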
The suggestion coming out of this is that before upgrading your hardware in order to speed up your applications, please have a look at the application to see exactly where the bottlenecks are. You may be pleasantly surprised to find that there are cheaper options to improving your application performance.
Some good *stat commands to start with would be vmstat, iostat and mpstat.
First off, try "vmstat 5". This will give us output like
 kthr      memory            page            disk          faults      cpu
 r b w   swap   free  re  mf pi po fr de  sr cd s0 -- --   in   sy   cs us sy id
 0 0 0 692416 239220  33 129 80  1  4  0 140 37  0  0  0  469 2255  996  5  7 88
 0 0 0 626860 175244   0   9  0  0  0  0   0 26  0  0  0  427  214  278  1  3 96
 0 0 0 626860 175244   0   0  0  0  0  0   0  0  0  0  0  371  160  193  0  3 97
 0 0 0 626860 175244   0   0  0  0  0  0   0  0  0  0  0  372  195  203  1  3 97
Have a look at the 'cpu' columns, specifically the user/system/idle split. Do we have idle time? Does the system time look excessive? These kinds of things are more complex to investigate, but can be looked into.
It's also worth looking at the 'kthr' columns. These show
- r - # threads ready to run but not yet running (a count of the threads on the dispatch queues)
- b - # threads that are blocked waiting for resources (eg I/O, paging, ...)
- w - # swapped out lightweight processes that are waiting for processing resources to finish
Consistently high numbers on any of these are cause for concern. Specifically, consistently having threads in the dispatch queue is a sign that we are probably cpu starved in this box.
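If you want to do this check over a longer capture rather than by eye, something like the following Python sketch would do. It assumes the column layout shown above (r, b and w first, the cpu us/sy/id split last) and drops the first data line, which is an average since boot; the function names are my own and the sample is the notebook output from above.

```python
SAMPLE = """\
 kthr      memory            page            disk          faults      cpu
 r b w   swap   free  re  mf pi po fr de  sr cd s0 -- --   in   sy   cs us sy id
 0 0 0 692416 239220  33 129 80  1  4  0 140 37  0  0  0  469 2255  996  5  7 88
 0 0 0 626860 175244   0   9  0  0  0  0   0 26  0  0  0  427  214  278  1  3 96
 0 0 0 626860 175244   0   0  0  0  0  0   0  0  0  0  0  371  160  193  0  3 97
 0 0 0 626860 175244   0   0  0  0  0  0   0  0  0  0  0  372  195  203  1  3 97
"""


def parse_vmstat(text):
    """Return one dict per data line of vmstat output.

    Assumes r/b/w are the first three columns and us/sy/id the last three.
    """
    samples = []
    for line in text.splitlines():
        fields = line.split()
        # Data lines are all-numeric; the two header lines are not.
        if len(fields) < 6 or not all(f.isdigit() for f in fields):
            continue
        samples.append({
            "r": int(fields[0]),
            "b": int(fields[1]),
            "w": int(fields[2]),
            "us": int(fields[-3]),
            "sy": int(fields[-2]),
            "id": int(fields[-1]),
        })
    return samples


def summarise(samples, skip_first=True):
    """Average run queue, blocked count and idle, ignoring the since-boot line."""
    if skip_first:
        samples = samples[1:]
    if not samples:
        raise ValueError("need at least one interval sample")
    n = len(samples)
    return {
        "avg_r": sum(s["r"] for s in samples) / n,
        "avg_b": sum(s["b"] for s in samples) / n,
        "avg_idle": sum(s["id"] for s in samples) / n,
    }


if __name__ == "__main__":
    print(summarise(parse_vmstat(SAMPLE)))
```

A run queue average that stays above zero across many intervals is the thing to look for; on the notebook sample it is zero and the box is mostly idle.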
On to mpstat. This shows the following columns (I'm running this on a single cpu notebook, on multi-cpu machines you'd see more cpus).
CPU minf mjf xcal intr ithr csw icsw migr smtx srw syscl usr sys wt idl
  0  115   6    0  460  356 920  249    0    0   0  2068   5   6  0  89
  0    2   0    0  412  310 250    2    0    0   0   160   0   3  0  96
  0    0   0    0  375  273 198    2    0    0   0   153   1   3  0  97
Again we have the user/system/idle split, but now it's on a per cpu basis. High numbers of icsw (involuntary context switches) on a particular cpu can be an indication that that cpu is handling a lot of interrupts. This can have incredibly detrimental effects on applications trying to use that cpu. It may be worthwhile considering either processor sets, or using psradm to disable all but interrupts on that cpu.
High numbers in 'migr' (thread migrations) can also be detrimental as we end up having to invalidate cached data on one cpu and reload it in another. Binding processes to particular cpus might help here.
iostat is a good way to see how the I/O subsystem is running. I generally use something like the following command:
$ iostat -xnz 5
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
20.6 8.9 1281.1 53.1 1.8 0.3 62.1 11.2 9 21 c0d0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 19.2 0.0 52.0 0.0 0.0 0.1 1.0 0 2 c0d0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
The -z argument tells iostat not to print lines of zeros (hence the multiple headers with no data).
Most of the columns are self explanatory. I'm generally interested in 'asvc_t' (active service time), which generally equates to 'time on the wire', or how long the device takes to service the request; '%w' shows the percentage of time that there are requests waiting in the queue for this device and '%b' gives an indication of how busy the device is (the percentage of time it has requests in progress). Note that '%b' assumes that the device can only service one request at a time. This is obviously not the case for arrays.
High '%w' numbers are an indication of a lot of I/O taking place. High 'asvc_t' numbers may indicate a problem with the storage device. You should be seeing times of the order of 1-10ms in this column in general on current hardware. The above were taken on my notebook which has a slower IDE drive in it that had had a bit of a workout before I ran the stats.
You should also be aware that in pretty much all of the *stat commands, the first output is an average since boot. While that is good for a feel of the system average, it's not really useful when trying to get a feel for a particular time period.
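As a rough illustration of both points (ignore the first sample, then look for service times above the 1-10ms range), here is a Python sketch that post-processes captured "iostat -xnz" output. It assumes the eleven-column layout shown above; the function names and the 10ms default are my own choices, not anything iostat provides.

```python
SAMPLE = """\
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
20.6 8.9 1281.1 53.1 1.8 0.3 62.1 11.2 9 21 c0d0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
0.0 19.2 0.0 52.0 0.0 0.0 0.1 1.0 0 2 c0d0
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
extended device statistics
r/s w/s kr/s kw/s wait actv wsvc_t asvc_t %w %b device
"""


def parse_iostat_xn(text):
    """Split 'iostat -xn' output into samples; each sample is a list of device dicts."""
    samples = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if line.strip().startswith("extended device statistics"):
            samples.append([])  # a new reporting interval starts here
            continue
        if fields[0] == "r/s" or len(fields) != 11 or not samples:
            continue  # column header, or a line we do not understand
        samples[-1].append({
            "device": fields[10],
            "asvc_t": float(fields[7]),
            "pct_w": int(fields[8]),
            "pct_b": int(fields[9]),
        })
    return samples


def slow_devices(samples, asvc_limit_ms=10.0):
    """(interval, device, asvc_t) for devices over the limit, skipping the since-boot sample."""
    hits = []
    for interval, devices in enumerate(samples[1:], start=2):
        for dev in devices:
            if dev["asvc_t"] > asvc_limit_ms:
                hits.append((interval, dev["device"], dev["asvc_t"]))
    return hits


if __name__ == "__main__":
    print(slow_devices(parse_iostat_xn(SAMPLE)))
```

On the notebook sample, the only reading above 10ms is in the first (since boot) sample, so nothing is flagged.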
Basically, a little bit of analysis can save you a considerable amount of money and then angst. I'd be failing in my role as a "trusted adviser" to recommend any other course. Yes, sure we'd like the money for customers buying more hardware, but for myself and many others, it's far more important for the customer to be a happy and returning customer rather than an unhappy one who is likely to look elsewhere for their next purchase due to such an experience.
Technorati Tags: Solaris, Performance, Sun
Posted at 11:13PM Jun 16, 2006 by Alan Hargreaves in Solaris | Comments[2]
Thursday Apr 27, 2006
What compels a tech writer to look at tech?
Like Jim, I too feel compelled to respond to Dwight Silverman's comments about Sun on Scott's moving to Chairman and Jonathan's becoming CEO.
He has responded to the comments that Jim left him thus:
Jim:
He's 'bit the dust' in that he's no longer CEO. Sorry if I'm a little cynical, but when I see the 'leaving-the-ceo-post-will-remain-chairman' in an executive-steps-down news release, I'm skeptical. I am, after all, from Houston, home of Enron & Compaq, which raised "the CEO is stepping down" to high art!
I'll repeat what I said in my post: I've not paid much attention to Sun recently, and the company should more be worried that folks like me haven't felt compelled to try. What I know is inconsistent growth and widening losses. "You'll like what you see?" Uh.....
I've posted the following in response.
Stating up front that I have a bias (I am an employee), but it appears that you do too.
I find it sad and incongruous that as a tech writer, you are more interested in the financials than the tech. It does not automatically follow that because the numbers have not been great that the tech cannot possibly be worth looking at.
There has been some pretty damned amazing tech coming out of Sun in the last two years.
Jim just gave you a reason to at least look (ie "... how much has changed at Sun in the past two years. Give us a chance. I think you'll like what you see"). Let me spell some of it out in a little more detail.
Solaris 10 is nothing short of revolutionary, as is the (now GPL'd) T1 chip in the T1000 and T2000 series machines. Some of the new stuff coming down the line in Open Solaris is amazing (eg BrandZ and ZFS). The Opteron based boxes that Andy is turning out are world leaders.
I can only surmise that if you have not felt compelled to look that you have either had your head in the sand or have been deliberately not looking.
As a tech writer, rather than a financials writer, what does it take for you to feel "compelled to look"?
Alan.
Dwight moderates his comments to avoid the site being abused, but as he posted Jim's comments I have no doubt that he'll post (and hopefully respond to) mine.
Update
Dwight has indeed responded and a short conversation ensued. Rather than reproduce it here, you can follow it at the link I provided above. I will say that although we appear to disagree on Sun's relevance to business as we each see things, I commend Dwight for both responding and keeping the conversation on a civil footing. All I can say now is that I hope that some time in the not too distant future, something we do does compel him to at least have a look.
Technorati Tags: Sun Microsystems Jonathan Schwartz Scott McNealy
Posted at 10:10AM Apr 27, 2006 by Alan Hargreaves in Solaris |
Monday Mar 13, 2006
Did SJVN read the article?
I've had my disagreements with Steven J. Vaughan-Nichols on many things, but I've generally respected him as a journalist as he does have some good history (including working on a Perkin Elmer with unix [I guess V7] like I did in the early 80's), and he generally researches before he writes.
However, on reading this response to an article that Sun put up about Ning, I am left wondering if he read past the hyphen in the title!
His diatribe jumped directly into a reinforcement of his opinion of us "flip-flopping on Linux" and contained nothing of the actual article that sparked him off.
Come on Steven, you can do better than that.
The actual title of the article is a quote from Marc Andreessen, who (last time I looked) is not a Sun employee. The guts of it are actually interesting, as Marc (who initially thought it would be cheaper to implement what he needed using Linux on whiteboxes) was surprised to find that he could actually do it cheaper using Sun's boxes and Solaris 10.
The article goes into Marc's numbers and how he made the decision. I believe that Jonathan also went into further detail of the calculations back on February 17.
Surely a refutation(sp?) of the actual content would have been far more useful and interesting than this diatribe Steven.
Update
This, this and this are the type of thing that I mean when I talk about Steven researching and doing a good background before publishing.
Posted at 05:43PM Mar 13, 2006 by Alan Hargreaves in Solaris |
Thursday Mar 09, 2006
Apache Software Foundation and Solaris 10
Sun gave the Apache Software Foundation a V40z with Solaris 10 on it last year. Tim Bray did an email interview with Mads Toftum having a look at how the box is being utilised. It makes a very interesting read.
Posted at 11:33AM Mar 09, 2006 by Alan Hargreaves in Solaris | Comments[1]
Tuesday Dec 20, 2005
A "Performance" problem on a T2000 [reprise]
This call is pretty much at a conclusion now, well it's out of my hands.
It turns out that the "benchmark" that was done up does indeed use a single connection across multiple threads. Hence we only get a single server thread.
We've asked the account team to discuss this with the customer and to find out just how closely this mirrors the "real" load. Given what the account team has been telling us, I suspect the answer is that it doesn't. The account team believes that we have multiple connections from multiple processes.
It's now up to the account team and perhaps MDE to work out a benchmark load that more accurately reflects the customer's production environment.
So, moving to a T2000 is actually likely to dramatically increase the throughput that this customer is going to see on their production load, with (I might add) no modification to the production code.
If I hear any more on this, I'll mention it.
Technorati Tags: Solaris, OpenSolaris, Niagara, DTrace
Posted at 10:38PM Dec 20, 2005 by Alan Hargreaves in Solaris |
Wednesday Dec 14, 2005
A "Performance" issue on a T2000
For the last week or so, I've been troubleshooting a performance issue on a T2000. Not quite there yet.
Background
The box that went out to the customer was a beta version of the T2000 that only had four cores (i.e. 16 virtual cpus). It was also running an earlier release of the kernel (Solaris 10 hw 2 build 3) than what is currently shipping on the T2000s (Solaris 10 hw 2 build 5), which had some nice little gotchas in it (ipge hang on write problem and another that would stop DTrace working).
The customer here was running an MQ Series (v5.3) test and was interested in the number of packets/second that could be processed. The baseline was that they could do 4000/second on a v440 and expected to be able to do 16000 on a T2000. The problem was that they were only seeing about 2000.
So, what happened?
First off I tried to address the DTrace issue by bringing the beta box up to KU-20. While installing the patch, I noticed that a number of packages were missing, and thus not patched. When we tried to reboot this box, it complained about missing files and refused to boot.
OK, I thought, I asked these guys to make a flash archive of the full box before I started playing with it, we'll just re-install.
It appears that the people working with this box (which is in another country to me) did not have installation media. OK, I pointed them to where they could get cd images of the version currently shipping on T2000s and they could use that to bootstrap the flash image that they had taken.
Guess what? There is a known issue with booting this version on those beta boxes, which is addressed by adding a few lines to /etc/system. Unfortunately, last time I looked, most installation media is read only.
It also turns out that the "flash archive" that was taken was ufsdumps of root and var.
By this time I had gained access to a released version of a T2000 in the Sydney lab. Fortunately it had a second disk in it that I could drop the ufsdump images onto, and after a bit of fiddling (mainly getting IP address right and fixing vfstab to point at the correct disk) I was able to get it up and running locally. *phew*
I applied a kludge to enable DTrace to work (commenting out some lines in sched.d) and ran up the server and the client.
mpstat shows a pretty idle box (95-98%). iostat shows very little disk activity. Time to crank out DTrace.
First off, who is doing read(2) system calls?
# dtrace -q -n 'syscall::read:entry { @[execname] = count();}
tick-10s { printa(@); clear(@); }'
nscd 1
fmd 3
java 12284
amqrmppa 24566
fmd 0
nscd 2
nfsmapid 2
java 11938
amqrmppa 23869
nscd 0
nfsmapid 0
fmd 0
ttymon 2
sac 2
java 12306
amqrmppa 24611
Ok, we have java and amqrmppa. The client is java so we will leave that as we're interested in the server. Let's have a look at the number of reads per second that each thread of this process is doing.
# dtrace -q -n 'syscall::read:entry
/execname == "amqrmppa"/ { @[tid] = count();}
tick-10s { normalize(@,10); printa(@); clear(@); }'
5 2563
5 2462
5 2557
There are two things of interest here.
- We are seeing around 2500 reads per second, which gels with the customer seeing about 2000 messages/second. This is probably a pretty good gauge of messages/second.
- Only one thread is doing any of the reading. The server is running single threaded!
Running single threaded might be good on a box that has a small number of very fast cpus, but is about the worst possible thing that you could do on a T2000.
Out of interest, let's see what stacks are doing the reads, just to make sure we are in the right place.
# dtrace -q -n 'syscall::read:entry
/execname == "amqrmppa"/ { @[ustack(20)] = count();}
tick-10s { normalize(@,10); printa(@); clear(@); }'
libc.so.1`_read+0x8
amqcctca`cciTcpReceive+0xc24
libmqmr.so`ccxReceive+0x1d0
libmqmr.so`rriMQIServer+0x2f4
libmqmr.so`rrxResponder+0x52c
libmqmr.so`ccxResponder+0x14c
libmqmr.so`cciResponderThread+0xac
libmqmcs.so`ThreadMain+0x890
libc.so.1`_lwp_start
2534
Which kind of looks like we have a server receiving packets.
One other thing that I noticed is that each time I killed and restarted the client, it looks like it attaches to a new single thread in the server.
I spent quite some time going through the mqm trees and google but so far I have been unable to come up with a way to increase the number of threads in the server.
For all intents and purposes we are running single threaded. If we can increase the number of server threads to match the platform, then I would expect to see an incredible increase in the #packets/second processed.
If any of my readers have any suggestions on how to increase the number of server threads, I'd love to hear from you. MQ Series is not something that I've spent a lot of time with.
An Aside
I should mention one other thing that is incredibly useful if you happen to have a machine running a relatively current nevada or open solaris.
As Bryan mentioned when he addressed SOSUG in Sydney, the output from the dtrace command when given the -l and -v options has been enhanced to also give you the types of the arguments. I used this a bit while looking at other things to get a feel for the system. It saved me having to dig up the reference manual. For example:
$ dtrace -v -l -n io:::wait-start
ID PROVIDER MODULE FUNCTION NAME
514 io genunix biowait wait-start
Probe Description Attributes
Identifier Names: Private
Data Semantics: Private
Dependency Class: Unknown
Argument Attributes
Identifier Names: Evolving
Data Semantics: Evolving
Dependency Class: ISA
Argument Types
args[0]: bufinfo_t *
args[1]: devinfo_t *
args[2]: fileinfo_t *
Update
update 1
I suspect that what we are seeing here is a client that does a connect, and then spawns all of its threads. It looks like the way that the server is working is that we get a thread per connection.
I'm currently looking at a way to verify this suspicion.
update 2
Just for kicks, I modified MQSender.properties back to 1 client thread and then proceeded to start up 8 instances of the client.
This looks much better, we are tending around just under 16000/second on the server side. What is peculiar is that about every 38 seconds we see the count drop to 0 for about 4-6 seconds. At this time we see idle jump to 100% and more interestingly iostat shows a lot of IO to /var with active service times blowing out to half a second.
Playing with the SCSI write cache and forcing the filesystem to forcedirectio does not appear to help us.
Looking through the java source to the client, it looks like the connection is shared between all threads created. It would probably be more useful (and be more likely to reflect real life) if each of the threads had their own connection.
update 3
I've got to say that I'm also a bit suspect of the relevance (to reality) of "benchmarking" an application server platform with the client and server both living on the one machine. Do people actually do this in real life where the server is likely to be pushed to its limit? I would have thought that it would be much more reasonable (and likely) to run the application server separately from its clients.
It might be interesting to try splitting the client and server onto two different boxes. Unfortunately I've only got one T2000 to play with.
Technorati Tags: Solaris, OpenSolaris, Niagara, DTrace
Posted at 12:07PM Dec 14, 2005 by Alan Hargreaves in Solaris | Comments[2]
Friday Nov 18, 2005
Avoiding confusion about Solaris 10 patches and updates
Copy of an email sent to patchpolicy-AT-sun-DOT-com
Folks, the following sentence on http://www.sun.com/service/sunconnection/solaris10patches.html serves only to confuse the issue.
As of November 29, 2005, the following options will be available for accessing Solaris 10 software updates:
Right from the start we have been pointing out that the Solaris 10 updates (eg s10u1, s10u2, ...) will remain free for download. *I* understand that this is to remain the case. Many people are likely to be confused over this now because of the phrase "software updates" in the above sentence.
Even though it *is* the case that the Solaris 10 updates will remain free, we *are* going to cop some confusion and the resulting bad feeling in the public forums through the use of these words.
Is it possible to choose a name less confusing? What was wrong with calling patches "patches"?
alan.
Technorati Tags: Solaris
Update - Nov 21
I've received a response from the patch policy folks along the lines of they'll look at incorporating my comments for their next update before the end of November. Let's wait and see.
I really don't think we should be in the business of trying to confuse our customers. If this naming goes ahead we will have:
- Solaris 10 Hardware Updates (adding new hardware support)
- Solaris 10 Software Updates (patches)
- Solaris 10 Updates (new functionality. eg zfs)
It's starting to get silly.
Posted at 07:09AM Nov 18, 2005 by Alan Hargreaves in Solaris | Comments[1]
Tuesday Nov 01, 2005
ZFS Integration - Jeff Bonwick's comments
Jeff Bonwick has blogged about what the team went through to get ZFS integrated into the Solaris Nevada source tree today (well today in the US is still Oct 31).
As the man says, watch for it to appear in a Solaris Express shortly.
Posted at 06:40PM Nov 01, 2005 by Alan Hargreaves in Solaris | Comments[1]
Friday Oct 14, 2005
DTrace/Containers Tutorial at AUUG Winter Conference
I hear that there are still places left in Bryan's tutorial at the AUUG Winter Conference.
He will be talking about DTrace and Containers on Solaris 10.
Bryan is an amazing speaker who really gets deep into his subject matter and knows what he is talking about. This tutorial will definitely be worth attending.
For those that don't know, Bryan is one of the designers and implementors of DTrace. MIT's Technology Review named him as one of their TR35 -- the top 35 innovators under the age of thirty-five.
Technorati Tags: Sun, Unix, Solaris, AUUG, OpenSolaris
Posted at 02:24PM Oct 14, 2005 by Alan Hargreaves in Solaris |
Wednesday Sep 21, 2005
So, who is using Solaris 10 in production?
In my weekly one on one with my manager this morning (my manager is in Tokyo), I was asked what my impression of Solaris 10 takeup in Australia was.
I could only answer that I've seen a few calls get up to PTS-Kernel, and that must be indicative of some kind of take-up, but we only get to see when things break.
I decided that I could take a straw poll through the blog. I'm going to ask some questions and ask for people to follow the email submission link back to me rather than respond in comments, so we can give a little privacy.
That being said, I have no interest in doing anything with people's email addresses, once I've done the collating, I'll be removing the emails and anything identifying about them.
While I'm initially interested in Australian Customers, please feel free to pass on your thoughts if you're not in Australia, simply deselect the "I am in Australia" checkbox.
There appears to be a problem with the blogs s/w we are using and it won't let me save a form in a blog entry, so I'll upload it as a separate file. You can find the link here.
I suppose we should put a sunset on this. I'll post something about the results on October 10 (Australia/NSW).
Posted at 02:41PM Sep 21, 2005 by Alan Hargreaves in Solaris |
Thursday Aug 04, 2005
DTrace and Perl
Following a link in a comment from Alan Burlison from Bryan's blog about creating a php provider, it appears that Alan is very well along the path to having a DTrace provider for perl.
To read a bit more about this, have a look at the mail thread that starts here.
The only suggestion that I'd make is that I'd find perl$target far easier to type than perl_dtrace_execute$target.
Other than that, this looks like a great step forward.
Technorati Tags: Solaris, perl, DTrace
Update
Wow, I had no idea that Alan Burlison was back at Sun. That's wonderful news. Welcome back (yet another) Alan :)
Posted at 04:33PM Aug 04, 2005 by Alan Hargreaves in Solaris | Comments[1]
Wednesday Aug 03, 2005
Technical Blogs that I have written
Would this be a meta-blog entry?
Below is a list of links to technical articles that I've put up on my blog. The list is currently sorted by reverse date. At some stage I may get around to trying to categorise things. It's going to take me some time to go back through the last year and a bit of blogging, so bear with me.
At some point I'll make this a fixed page.
- July 31, 2005
- How Solaris Calculates %user, %system and %idle
- July 17, 2005
- Cscope-fast in OpenSolaris-Land
Posted at 06:34PM Aug 03, 2005 by Alan Hargreaves in Solaris |
Sunday Jul 31, 2005
How Solaris Calculates %user, %system and %idle
A year or so back I wrote an infodoc that described how we calculated the %iowait (or %wio) number. I had always intended to create a companion document outlining the broader question of how we do the %user, %system and %idle numbers. A few misconceptions that I've seen have prompted me to do this as a 'blog first.
The first thing that must be noted is that with Solaris 10 and microstate accounting we completely changed how the numbers are arrived at. I'll go over the pre-Solaris 10 method first, then discuss the current method along with links into the Open Solaris Source Tree.
Before Solaris 10
There is an array in cpu_t called cpu_stats.sys.cpu[]. This array contains counters for:
CPU_USER
CPU_SYSTEM
CPU_WAIT
CPU_IDLE
The various array entries contain a count based on a sample (taken at fixed intervals) of what each cpu is doing at the time of the sampling. In order to determine usage, we must take two snapshots of these counters and look at the differences.
If we sum these differences, we get a count of how many samples were taken for this particular cpu. We then simply calculate a percentage for each of the figures.
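The arithmetic is simple enough to sketch in a few lines of Python. This is only an illustration of the calculation just described, not the code any of the *stat tools use; the function name and the counter values are made up.

```python
def cpu_percentages(before, after):
    """Percentage split between two snapshots of one cpu's sample counters.

    before, after: dicts with the same keys (here user/system/wait/idle),
    each holding the running count of samples seen in that state.
    """
    deltas = {state: after[state] - before[state] for state in before}
    total = sum(deltas.values())  # number of samples taken in the interval
    if total <= 0:
        raise ValueError("snapshots must be taken at different times")
    return {state: 100.0 * count / total for state, count in deltas.items()}


if __name__ == "__main__":
    # Invented counters: 500 samples apart (about 5 seconds at one sample per 10ms).
    snap1 = {"user": 5196, "system": 11557, "wait": 0, "idle": 373121}
    snap2 = {"user": 5221, "system": 11592, "wait": 0, "idle": 373561}
    for state, pct in cpu_percentages(snap1, snap2).items():
        print(f"{state:>6}: {pct:5.1f}%")
```

Because the percentages are derived from the sum of the deltas, they always add up to 100 for the interval, whatever the sampling rate.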
Okay, so how do we do the sampling?
Usually, the function clock() is called every 10ms [1]. We do the sampling in here. For each cpu, we look at what it is currently executing and increment the appropriate counter. Note that in Solaris 9 and earlier we still have a counter for IO Wait. This number is only calculated if the cpu is idle. See infodoc 75659 for more explanation of this.
The values are accessible through the cpu_stat kstat module as idle, user, kernel and wait. eg
$ kstat -m cpu_stat -s '/^(wait|user|kernel|idle)$/'
module: cpu_stat instance: 0
name: cpu_stat0 class: misc
idle 373121
kernel 11557
user 5196
wait 0
1. If we define hires_tick as non-zero in /etc/system, then clock will be called every millisecond.
Solaris 10 and Beyond
In general, the sampling method gives us a pretty good number. It would be an unusual thread that takes a significant amount of cpu time, that is not on cpu every time that clock() runs. However, implementing microstate accounting gave us the opportunity to make it even better.
The raw numbers are now kept in an array in cpu_t called cpu_acct[]. This contains entries for:
CMS_USER
CMS_SYSTEM
CMS_IDLE
There is a state called CMS_DISABLED, but it's used for something else and there is not an array element for it.
So what are the numbers? We don't sample anymore. The numbers represent delta values from the high resolution timer (nanoseconds) taken from when the cpu entered this state, to where we are about to change it. The values are calculated in new_cpu_mstate().
The current state is saved in cpu->cpu_mstate. On a state change, the high resolution time is stored in cpu->cpu_mstate_start.
new_cpu_mstate() reads the high resolution timer once at the beginning of the routine. This time is used for the end of the period being measured and the start of the new period so we do not lose small numbers of cycles.
This function is called whenever we change state. It's called directly from idle_enter() and idle_exit(). The other state changes are handled from new_mstate(), which also updates per-lwp statistics. The following functions and macros call new_mstate()
SEMA_BLOCK()
cv_block()
fp_precise()
fpu_trap()
lwp_block()
lwp_cond_wait()
lwp_mutex_timedlock()
lwp_mutex_trylock()
lwp_park()
lwp_rwlock_lock()
sched()
shuttle_resume()
shuttle_sleep()
shuttle_swtch()
stop()
term_mstate()
trap()
turnstile_block()
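The bookkeeping that new_cpu_mstate() is described as doing above can be modelled in a few lines of Python. This is a toy model and not the kernel code: the class and method names are invented, and only the idea is taken from the description (read the timer once, charge the elapsed time to the old state, and use the same reading as the start of the new state).

```python
import time


class CpuMstateModel:
    """Toy model of per-cpu microstate accounting (names are hypothetical)."""

    STATES = ("user", "system", "idle")

    def __init__(self, initial="idle", clock=time.monotonic_ns):
        self._clock = clock                       # nanosecond timer, injectable for testing
        self.acct = {s: 0 for s in self.STATES}   # accumulated nanoseconds per state
        self.state = initial                      # plays the role of cpu_mstate
        self.start = clock()                      # plays the role of cpu_mstate_start

    def new_state(self, state):
        """Charge the time since the last change to the old state, then switch."""
        if state not in self.acct:
            raise ValueError(f"unknown state: {state}")
        now = self._clock()                       # single timer read
        self.acct[self.state] += now - self.start
        self.state = state
        self.start = now                          # same reading starts the new period


if __name__ == "__main__":
    cpu = CpuMstateModel()
    cpu.new_state("user")
    sum(range(100000))                            # burn a little "user" time
    cpu.new_state("idle")
    print(cpu.acct)
```

Using one timer reading for both the end of the old period and the start of the new one means the per-state totals always add up to the elapsed time, with nothing lost between states.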
The upshot of this is that you can probably place a higher reliance in the figures now, whereas the previous figures were a little more coarsely grained.
The kstats for the previous figures still exist as distinct structure elements in cpu->cpu_stats.sys, the difference being that they are now calculated from the microstate accounting generated figures.
The new figures can be accessed through the new cpu kstat module which has a grouping called sys, containing the statistics cpu_nsec_idle, cpu_nsec_kernel and cpu_nsec_user.
$ kstat -n sys -s 'cpu_nsec*'
module: cpu instance: 0
name: sys class: misc
cpu_nsec_idle 3626708012091
cpu_nsec_kernel 113348790642
cpu_nsec_user 50875403788
Technorati Tags: OpenSolaris, Solaris
Posted at 09:14AM Jul 31, 2005 by Alan Hargreaves in Solaris | Comments[3]
Wednesday Jun 15, 2005
July Australian Personal Computer Magazine Features Solaris 10
This one has been in the works for a couple of months.
Some time back we were approached by the magazine to do a feature issue on Solaris 10, including a copy of Solaris 10 on the DVD that ships with each issue.
It's happened with the July Issue.
The DVD includes the four iso images that comprise Solaris 10 FCS for x86 and amd64.
Reviews from APC staff include
"Solaris is a flexible and powerful OS that can run everything from a home PC to massive multi-CPU servers. Now it is free, Solaris seems a viable option for small to medium businesses wanting a secure and stable OS. To cap it off, a great office suite - StarOffice 7 - is bundled with the program." Tony Sarno, Editor, APC Magazine
"Sun's Solaris is an operating system that typically powers serious computing infrastructures. It is the most successful and widespread commercial Unix OS in the world.... For all its power, Solaris installs as easily on a PC as Windows XP or SUSE Linux." Peter Sbarski, APC reviewer
Features include
- a 2 page article on Sun's strategies with Solaris titled, "Open for Business",
- a first installment of an on-going workshop feature, this month titled, "Solaris, so good", 4 pages of instructions on installation.
Also included is installation support for one month from Sun Australia. More information about support can be found by purchasing the magazine and finding the link information enclosed therein. Also at that link you will find a pointer to how to install Solaris in a dual boot configuration with Windows XP.
Technorati Tags: Solaris, x86, amd64, magazine, unix, Sun Microsystems.
Posted at 11:07AM Jun 15, 2005 by Alan Hargreaves in Solaris | Comments[2]

