Monday May 12, 2008 Well, I can truly say that I had a great time at this years JavaOne. There were several highlights for me but all of them weren't technical (Smashmouth were absolutely excellent on Thursday night). I guess what I enjoyed the most was seeing what's going off in technology areas that are so different to the one that I do for my day to day activities and I picked up some ideas for pet projects that may (or may not) see the light of day this year...
Down to DTrace and Java - it was exciting to see how much interest there is in the marriage of these two technologies. The demo booths in the pavilion seemed to be permanently rammed with people wanting to see what's going off in this area and the hands on labs were very well attended (hats off to Angelo et. al). Simon Ritter and I got the chance to take up the baton from Adam and Jarod this year to present two sessions on DTrace and Java (well, we did the same session twice anyway!). We had a fantastic time doing this and may thanks to everyone who came to see the show and also to those that asked so many interesting and well thought out questions afterwards.
DTrace tends top be a hands-on technology and demo's and talks are best when code is cranked live so that is what we (mostly) did. One of the issues though when demo'ing DTrace is having a workload to observe that isn't too contrived and simplistic but one that also isn't overly complicated. For our JavaOne talk I wanted some Java code that everyone could feel at home with and, standing on the shoulders of giants , I opted for a game from my youth, Space Invaders. I used Kevin Glass's excellent Space Invader game as it was designed as a tutorial and is therefore simple and perfect for a demo like this. As a bunch of people asked for the scripts for this session I thought I'd put down here what I did. What follows are three examples of what we showed on the day in our session. Apologies if what I write is over prescriptive at times for the DTrace long timers but hopefully a bit of detail may help any newcomers to DTrace.
The main aim of this session was to show how you can use the dvm and hotspot providers to obtain new views on your Java code. I then wanted to show how the great work that is being done by Keith McGuigan and team on Java Statically Defined Tracing can be used to enhance that view. All the Java focussed demo's of the session (there were some plain old DTrace ones as well) were done using the hotspot provider which comes with Java SE6 and the SE7 JVM for the JSDT demo's. I wanted to show the approach that I'd taken when first analysing the behaviour of this game so that the attendees could get a view of the kind of approaches you can take when looking at such an application (which may not be the most optimal way BTW!). Anyway, onto the code and analysis.
I had the code in one workspace in a netbeans project so that I could easily modify it with and rebuild it quickly. First off I just wanted to look at the methods that were being called to see if anything of interested turned up:
$ cat meth.d
#!/usr/sbin/dtrace -s
hotspot$target:::method-entry
{
this->str = strjoin(".", copyinstr(arg3, arg4));
@[strjoin(copyinstr(arg1, arg2), this->str)] = count();
}
Note that the strjoin()'s that I use are just there to make the output pretty. We grab the pid of the relevant jvm:
$ jps 806 Main 813 Game 814 Jps
...run the script and play the game for a few seconds:
$ dtrace -q -s ./meth.d -p 813 ^C <chop> java/util/concurrent/locks/AbstractQueuedSynchronizer.release 5153 java/util/concurrent/locks/ReentrantLock$NonfairSync.lock 5153 java/util/concurrent/locks/ReentrantLock$Sync.tryRelease 5153 sun/java2d/SunGraphics2D.drawImage 7466 sun/java2d/pipe/DrawImage.copyImage 7466 java/awt/Rectangle.intersects 108993 org/newdawn/spaceinvaders/Entity.collidesWith 108994 java/awt/image/BufferedImage.getHeight 221725 java/awt/image/BufferedImage.getWidth 221725 java/util/ArrayList.RangeCheck 227932
So, from the look of the method names this is collision detection code which may or may not be interesting. Let's just take a look at where these methods are being executed by getting stack traces of the RangeCheck method:
$ cat range.d
#!/usr/sbin/dtrace -s
hotspot$target:::method-entry
/copyinstr(arg3, arg4) == "RangeCheck"/
{
@[jstack(30, 16384)] = count();
}
$ dtrace -q -s ./range.d -p 813
^C
<chop>
libjvm.so`__1cNSharedRuntimeTdtrace_method_entry6FpnKJavaThread_pnNmethodOopDesc__i_+0x7b
java/util/ArrayList.RangeCheck(I)V*
org/newdawn/spaceinvaders/Game.gameLoop()V*
org/newdawn/spaceinvaders/Game.main([Ljava/lang/String;)V
StubRoutines (1)
libjvm.so`__1cJJavaCallsLcall_helper6FpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x1a3
libjvm.so`__1cCosUos_exception_wrapper6FpFpnJJavaValue_pnMmethodHandle_pnRJavaCallArguments_pnGThread__v2468_v_+0x27
libjvm.so`__1cJJavaCallsEcall6FpnJJavaValue_nMmethodHandle_pnRJavaCallArguments_pnGThread__v_+0x2f
libjvm.so`__1cRjni_invoke_static6FpnHJNIEnv__pnJJavaValue_pnI_jobject_nLJNICallType_pnK_jmethodID_pnSJNI_ArgumentPusher_pnGThread__v_+0x1df
libjvm.so`jni_CallStaticVoidMethod+0x154
java`JavaMain+0xd30
libc.so.1`_thr_setup+0x52
libc.so.1`_lwp_start
5153
We see that most of them are coming out of the Game.gameLoop() method which probably makes sense if it's collision detection. Looking at the code for this we see:
for (int p=0;p<entities.size();p++) {
for (int s=p+1;s<entities.size();s++) {
Entity me = (Entity) entities.get(p);
Entity him = (Entity) entities.get(s);
if (me.collidesWith(him)) {
me.collidedWith(him);
him.collidedWith(me);
}
}
}
The entities object is an ArrayList so the RangeCheck calls are coming from there (looks like the get() is being optimised out of the stacktrace). Now that we have the area of code in view we can easily see that there is an inefficiency here. This loop goes along the list of entities (aliens, missiles) and sees if the current entity (me) intersects with any other entities. However, note that the current entity, me, is being retreived on every iteration of the inner loop when it only needs to be fetched on the outer loop. We can hoist that out as shown below, rebuild and see what the effect is:
for (int p=0;p<entities.size();p++) {
Entity me = (Entity) entities.get(p);
for (int s=p+1;s<entities.size();s++) {
Entity him = (Entity) entities.get(s);
if (me.collidesWith(him)) {
me.collidedWith(him);
him.collidedWith(me);
}
}
}
$ dtrace -q -s ./meth.d -p 833
<chop>
java/util/concurrent/locks/AbstractQueuedSynchronizer.release 4747
java/util/concurrent/locks/ReentrantLock$NonfairSync.lock 4747
java/util/concurrent/locks/ReentrantLock$Sync.tryRelease 4747
sun/java2d/SunGraphics2D.drawImage 7144
sun/java2d/pipe/DrawImage.copyImage 7144
java/awt/Rectangle.intersects 104874
org/newdawn/spaceinvaders/Entity.collidesWith 104874
java/util/ArrayList.RangeCheck 117608
java/awt/image/BufferedImage.getHeight 213326
java/awt/image/BufferedImage.getWidth 213326
And we see that the RangeCheck method has dropped down the list accordingly.
I wanted to provide an example where we could write some simple D to drill down on a particular aspects of the games behaviour that may be of interest. In a game like this there are really only a few things that are of great interest and any interaction a user has is one of these areas. The example I took next was to look at detecting when a missile has gone off the screen (as opposed to hitting an alien) to demonstrate the kind of things we do to differentiate events in D scripts. The code for moving a missile is the ShotEntity.move method:
public void move(long delta) {
// proceed with normal move
super.move(delta);
// if we shot off the screen, remove ourselfs
if (y < -100) {
game.removeEntity(this);
}
}
So, we just need to look for the case where we calling game.removeEntity when we are in the ShotEntity.move method. The script that I created to show this is:
$ cat shotmiss.d
#!/usr/sbin/dtrace -s
hotspot$target:::method-entry
/basename(copyinstr(arg1,arg2)) == "ShotEntity" &&
copyinstr(arg3, arg4) == "move"/
{
self->in = 1;
}
hotspot$target:::method-entry
/self->in && copyinstr(arg3, arg4) == "removeEntity"/
{
printf("Missile Missed!\n");
}
hotspot$target:::method-return
/basename(copyinstr(arg1,arg2)) == "ShotEntity" &&
copyinstr(arg3, arg4) == "move" && self->in/
{
self->in = 0;
}
So, we probe method-entry and build up the predicate to show our interest in ShotEntity.move. Note that I use the D basename() subroutine to pull the class name out of the fully qualified package name. By setting the thread local variable, in, when we enter the move method we can then predicate upon that in the following clause to ensure that we only show interest in removeEntity calls that originate from ShotEntity.move. The third clause makes sure that we unset the in thread local variable when we return from the ShotEntity.move method as we're done.
Java Statically Defined Tracing is the Java equivalent of USDT and should be arriving in Java SE7. Augmenting the SpaceInvader game to use JSDT was simple. I first defined my provider by extending the Provider interface:
@ProviderName("Invader")
interface MyProvider extends Provider {
@ProbeName("boom") void death(String type);
void miss(int nothing);
}
Note that although I have named the provider, MyProvider, I have used an annotation to override this name so that it will appear as Invader. You'll see I also did the same with the probe named death. This is for no other reason than to demonstrate the annotations. As you see above, the provider has two probes defined, death which takes a String as its argument and, miss which takes an integer as its argument.
In the Game class's constructor I created the instance of the ProviderFactory from which I then create my provider (read Keith's blog referenced above for why it's done this way):
provider = ProviderFactory.getDefaultFactory().createProvider(MyProvider.class);
Note that provider is defined as public MyProvider provider in the Game classes public data. Adding probes is then trivial as we just insert calls in our code where suitable. I had two probe placements, one which is taken from the last hotspot example, to show missiles going off screen. The ShotEntity.move code now becomes:
public void move(long delta) {
// proceed with normal move
super.move(delta);
// if we shot off the screen, remove ourselfs
if (y < -100) {
game.provider.miss(1);
game.removeEntity(this);
}
}
I also added the death probe into where Entities (aliens and missiles) are destroyed in Game.removeEntity. I also added in here some code to demonstrate the JSDT mechanism for checking whether a probe is enabled or not. We can use this technique of the code taken to construct probe arguments is costly and therefore we only want to do it when a probe is enabled. The code for this looks like:
public void removeEntity(Entity entity) {
Method m = null;
String ent = null;
try {
m = MyProvider.class.getMethod("death", String.class);
}
catch (NoSuchMethodException e) {
System.out.println("Can't find probe");
}
Probe p = provider.getProbe(m);
if (p.isEnabled() ) {
// Do argument prep in here
if (entity instanceof AlienEntity)
ent = "Alien";
else if (entity instanceof ShotEntity)
ent = "Missile";
else
ent = "Unknown Object";
}
provider.death(ent);
removeList.add(entity);
}
We obtain a reference to the provider from which we can then verify whether or not a probe is currently enabled. Smart! If it is I then check what type of object is being removed and prepare a suitable string to pass as an argument to the death probe. Building the code and running it we then see the following probes available:
$ dtrace -l -P Invader* ID PROVIDER MODULE FUNCTION NAME 70483 Invader899 java_tracing unspecified miss 70484 Invader899 java_tracing unspecified boom
I then cranked the following script which shows us how many aliens were destroyed, how many missiles were fired and how many of those went off the screen (i.e. missed their targets)
$ cat alien.d
#!/usr/sbin/dtrace -s
Invader*:::miss
{
@[probename] = count();
}
Invader*:::boom
{
@[copyinstr(arg0)] = count();
}
Running this against the game and playing a level we saw:
$ dtrace -q -s ./alien.d ^C miss 18 Alien 60 Missile 78 $
18 misses out of 78 shots! Poor indeed! One thing that was immediately obvious to those watching this was the much, much reduced probe effect of the JSDT version. With the hotspot techniques we have visible probe effect owing to the mechanisms being used by that provider but the JSDT version is slick and fast.
The last thing I demo'd for fun was the integration of these newly created JSDT based probes with a visualisation technique. I took the Chime Visualisation tool and created a display so we could graph the killing in real time! I did cheat and create the display before the session but it only took a few minutes to do. Here's a screenshot of some realtime destruction:
As you can see from the screenshot, I was a real killing machine until near the end (missile fired in blue , aliens killed in green) when I couldn't hit anything at all for and my miss count (in red) rises rapidly! The point of this demo was not to prove my lack or prowess at SpaceInvaders but rather to show that we have a set of Java API's that you can hook into easily so it's not much effort to plug DTrace into a Java management framework or visualisation tools (just as two examples).
Anyway, that's about it for now. I did do some more hotspot demo's which were interesting but they'll have to save for another time. If you came along to our sessions at JavaOne then I hope you enjoyed it as much as we did.
( May 12 2008, 04:35:50 PM GMT+00:00 ) Permalink Comments [0]I had the fortune of attending the first dtrace.conf last Friday in San Francisco and I have to say that it was a fantastic event. There are many aspects that I enjoyed but what really made it for me was the chance to have great technical and social exchanges with such a diverse bunch of engineers. It didn't matter who we all worked for; DTrace was the great leveller. I really had some of the most thought provoking technical conversations I've had in a long time. By the end of the day my head was spinning with the amount and quality of ideas that I'd heard (well, it was either the ideas or Richard McDougall's driving...).
Many thanks to the DTrace Patriarchs for organising the event, Forsythe for funding it and for the many attendees who willingly shared and listened. Absolutely top day. ( Mar 18 2008, 06:40:20 AM GMT+00:00 ) Permalink Comments [1]
The London OpenSolaris User Group meets once a month at the Sun Customer Briefing Center on King William Street (the North end of London Bridge). If you ever get the chance to go along to one of the meetings then I can heartily recommend it as a there is always a nice amount of pretty good food and drink supplied by a generous, anonymous benefactor. In addition to this there is a regular group of between 30-50 people who make for good chat during the evening.
Ages ago I had agreed to present something on DTrace at the September event. However, everyone who attends has no doubt been exposed to the Haslam DTrace tour so I couldn't force that upon those poor souls. What I came up with was a quick walk through of some of the new features that have been introduced into DTrace since Solaris 10 first shipped and the slides are given below. The slides are in no way meant to be a complete set of features but just a selection of ones that I thought may be of interest. On the night I actually demo'd various pieces but I have actually included some example code in the slides so you can (hopefully) see what is going off. Features that don't have examples meant that I ran out of time by then!
( Sep 26 2007, 01:10:47 PM GMT+00:00 ) Permalink Comments [3]Grab your Irn-Bru: Accessing a programs local variables using DTrace
A question that has raised its head several times on the dtrace-discuss@opensolaris.org forum has been, "how do I access the local variables in my program"? Unfortunately, the answer to this isn't particularly pretty so I thought I'd provide a simple (and contrived...) example of how to this. What we're doing here relies on getting just a teeny bit low down and dirty so you may want to grab your Irn-Bru laddy! The example used here was done on an x86 based system but the principle is exactly the same for any other architecture. Take the extremely simple piece of code below:
void
foo(void)
{
int i;
const char *string = "Foo bar test string";
for (i = 0; i < strlen(string); i++) {
printf("%c", string[i]);
}
printf("%\n");
}
We're going to use DTrace to tell us the value of the loop counter i when we encounter a space in the string that is being iterated over.> foo::dis foo: pushl %ebp foo+1: movl %esp,%ebp foo+3: subl $0x8,%esp foo+6: leal 0x8050ac8,%eax foo+0xc: movl %eax,-0x8(%ebp) foo+0xf: movl $0x0,-0x4(%ebp) foo+0x16: movl -0x8(%ebp),%eax foo+0x19: pushl %eax foo+0x1a: call -0x163 < PLT:strlen > foo+0x1f: addl $0x4,%esp foo+0x22: movl -0x4(%ebp),%edx foo+0x25: cmpl %eax,%edx foo+0x27: jae +0x32 < foo+0x5b > foo+0x29: movl -0x8(%ebp),%eax foo+0x2c: addl -0x4(%ebp),%eax foo+0x2f: movsbl 0x0(%eax),%eax foo+0x33: pushl %eax foo+0x34: pushl $0x8050adc foo+0x39: call -0x172 < PLT:printf > foo+0x3e: addl $0x8,%esp foo+0x41: movl -0x4(%ebp),%eax foo+0x44: incl %eax foo+0x45: movl %eax,-0x4(%ebp) foo+0x48: movl -0x8(%ebp),%eax foo+0x4b: pushl %eax foo+0x4c: call -0x195 < PLT:strlen > foo+0x51: addl $0x4,%esp foo+0x54: movl -0x4(%ebp),%edx foo+0x57: cmpl %eax,%edx foo+0x59: jb -0x32 < foo+0x29 > foo+0x5b: pushl $0x8050ae0 foo+0x60: call -0x199 < PLT:printf > foo+0x65: addl $0x4,%esp foo+0x68: leave foo+0x69: retSo, from the above instructions we need to figure out where we can insert a probe with DTrace so we can then get our hands on the relevant data! Well, it's not as bad as it looks really. Our two variables i and string have space created for them on the stack early on:
foo+6: leal 0x8050ac8,%eax foo+0xc: movl %eax,-0x8(%ebp) foo+0xf: movl $0x0,-0x4(%ebp)So, i is at %ebp-0x4 and string is at %ebp-0x8. There are a couple places in the code where we could insert a probe to inspect whether or not the current index into the string is a space character. The place I'll choose here is where we load up the current character being indexed for the printf() call. You'll notice that for the call to printf() we place on the stack the character to print and the printf format modifier. Here are the instructions which do that:
foo+0x33: pushl %eax foo+0x34: pushl $0x8050adc foo+0x39: call -0x172 < PLT:printf >And we'll just verify that foo+0x34 is loading up the the printf modifier string:
> 0x8050adc/s 0x8050adc: %cSo, we're in the right place. The character to be printed out (i.e. the current value of string[i]) will be found in the %eax register at foo+0x33 so this is where we'll place a probe with the pid provider. What we can do is to use the DTrace regs[] array to access the %eax register and predicate on whether or not we are currently looking at a space character (the space character is hex value 0x20). If the register contains the value 0x20 we can then extract the current value of the loop counter i which is found on the stack at %ebp-0x4. So our script looks like this:
#cat local1.d
#pragma D option quiet
inline int SPACE = 0x20;
pid$target:a.out:foo:33
/uregs[R_EAX] == SPACE/
{
this->pos = *(int *)(copyin(uregs[R_EBP]-0x4, sizeof(int)));
printf("Space found at char %d\n", this->pos + 1);
}
You'll note that to access the memory location for the loop counter i (the programs stack in this case), we need to use the copyin() DTrace subroutine. Executing the above script on this code we get the following output:
# dtrace -s ./local1.d -c ./local Foo bar test string Space found at char 4 Space found at char 8 Space found at char 13Voila! There you have it. It's a bit grim I know and forces you to rely on knowing a bit about the assembly language of the architecture you're executing on. However, it can prove to be invaluable technique for debugging those hard to get at problems.
Last week, my old mucker Phil Harman and I were fortunate enough to present a 3 hour session at JavaU on DTrace . We had a great time doing this session and I hope that those of you who attended got something out of it. We try and do our sessions with as few slides as possible and as much live demo as possible - this makes for a pretty hectic 3 hours for whoever is doing the typing (as the younger of the dynamic duo I don't mind!). I promised to post the scripts that were produced from this session and I will do that shortly (or at least pointers to where you can get better versions of, essentially, the same scripts :-) ).
One of the things I like to cover in my talks are what I call "fun things" to do with DTrace. By "fun" I don't mean side splitting laughter fun but more geek orientated antics that can be got up to with DTrace. One of the examples I give is a bit of a cheat but amusing nonetheless. One day, no doubt tired of hearing me proclaim that DTrace can solve virtually any problem, a Sun colleague challenged me with this one: he bet me that I couldn't make Solaris generate a BSOD . At first this may seem a strange challenge but it's one that DTrace can (kind of) address...
I set about it like this; when a binary is executed the kernel reads the first few bytes of the executable to determine what type of file it is - check out the gexec() code for how we do this. I'm going to use the findexec_by_hdr() function as the buffer that contains the executables magic is passed in as the first argument . By enabling a probe on this function I can look at the first two bytes of the file that have been read and determine whether or not a windows executable has been executed (the first two bytes of a windows 9x executable are 'MZ'). If the predicate evaluates to true then we can then use the system() action to change my GNOME desktop background to that of a BSOD jpeg! See, I said it was a dubious cheat but it usually gets a laugh. The script is:
#!/usr/sbin/dtrace -ws
#pragma D option quiet
fbt::findexec_by_hdr:entry
/ *args[0] == 'M' && *(args[0]+1) == 'Z'/
{
raise(SIGKILL);
system("/usr/bin/gconftool-2 --type=string --set /desktop/gnome/background/picture_options stretched");
system("/usr/bin/gconftool-2 --type=string --set /desktop/gnome/background/picture_filename /local/JavaU/bsod/BSOD.jpg");
}
Before the script is ran my desktop looks like:
If I then run the above D script and try to execute a random Windows executable my desktop changes to look like:
Who says you can't do everything with DTrace!
( May 15 2007, 12:58:22 PM GMT+00:00 ) Permalink
DTrace at LUGRadio Live 2006 They don't let me out much nowadays but this weekend is a little different. I'll be off to sunny Wolverhampton on Sunday to speak about DTrace at LUGRadio Live 2006. Looks like I've got the death slot just before lunch so I'll try and make it as interesting as I can for those that can come along. Who knows, I may even bring some BrandZ along with me (if time allows).
Hopefully see you there! ( Jul 18 2006, 01:20:00 PM GMT+00:00 ) Permalink
Determining Opteron processor revisions on Solaris
For a while now I've been doing quite a bit of work on the Opteron processor. For some of this work it's important to know which revision of processor you currently have in the machine you've cobbled together. This should be a doddle, right? Well, not quite.
So, how do you know if you have a rev B or a rev E processor then? The processor revision is determined by a bunch of values returned by the CPUID instruction - these include the family, model and step of the processor. These values can then be used in a lookup table which is given in the Opteron revision guide which can be found here. On a Solaris system the family, model and step can be found using psrinfo(1M):
# psrinfo -pv
The physical processor has 2 virtual processors (0 1)
x86 (AuthenticAMD family 15 model 33 step 2 clock 2190 MHz)
Dual Core AMD Opteron(tm) Processor 275
The physical processor has 2 virtual processors (2 3)
x86 (AuthenticAMD family 15 model 33 step 2 clock 2190 MHz)
Dual Core AMD Opteron(tm) Processor 275
So, armed with these numbers I can go and look up the processor revision. However, if you take a look at the revision guide, you'll see it's not quite as straight forward as you may hope (to say the least).
static const mc_rev_map_t mc_rev_map[] = {
{ 0xf, 0x00, 0x0f, MC_REV_PRE_D, "B/C/CG" },
{ 0xf, 0x10, 0x1f, MC_REV_D_E, "D" },
{ 0xf, 0x20, 0x3f, MC_REV_D_E, "E" },
{ 0xf, 0x40, 0x5f, MC_REV_F, "F" },
{ 0, 0, 0, MC_REV_UNKNOWN, NULL }
};
Where a mc_rev_map_t looks like:
typedef struct mc_rev_map {
uint_t rm_family;
uint_t rm_modello;
uint_t rm_modelhi;
uint_t rm_rev;
const char *rm_name;
} mc_rev_map_t;
Using the contents of this array you can see which revision you have by simply seeing which range (using the rm_modello and rm_modelhi members) your processors model falls within. We can see from the psrinfo output given earlier, that this machine has Rev E processors. > *mc_list::list mc_t mc_next | ::print mc_t mc_revname mc_revname = 0xfffffffff01eb550 "E" mc_revname = 0xfffffffff01eb550 "E"It's always more fun with mdb!! ( Jun 23 2006, 09:00:00 AM GMT+00:00 ) Permalink Comments [3]
Small sampling interval with cpustat(1M) It never ceases to amaze me that, even with tools that I use quite frequently, I can still learn new things about those tools. I recently had one of those experiences with cpustat(1M), a tool used to read performance counters from CPU's. Like most other observational tools of this ilk, cpustat allows you to specify a sampling interval. However, unlike most other tools, it allows you to specify intervals with a sub-second granularity - this is something I never knew.
The following example is on an Opteron based system. I'm using 2 of it's 4 pic counters (a pic counter is a hardware counter used to record events of interest) to measure user mode level 1 icache misses in pic0 and system mode level 1 icache misses in pic1. The sampling interval is every half a second:
# cpustat -c pic0=IC_miss,pic1=IC_miss,sys1,nouser1 0.5 time cpu event pic0 pic1 0.507 1 tick 796 3126 0.507 0 tick 574 3785 1.007 0 tick 263 3564 1.007 1 tick 21584 4851 1.507 0 tick 1416 5542 1.507 1 tick 6116 12318 2.007 0 tick 753 3817 2.007 1 tick 869 2507 2.507 1 tick 1345 3150 2.507 0 tick 185 2791Simple stuff but something which had evaded me until now. The most excellent Sriram brought this to my attention - thanks Sriram. ( Jan 17 2006, 05:00:00 PM GMT+00:00 ) Permalink
DTrace and mdb(1) Just before Christmas, I was unfortunate enough to do a slot at the London Opensolaris User Group. I say unfortunate because my session was supposed to be about DTrace Internals. Actually, the word Internals is far too grand a word for what I aimed to do. Really I should replace Internals with "a bit more under the hood than you'd normally go...". Anyway, I digress.
I was also unfortunate because Adam Leventhal was in London at the time and came along to lend a hand! My presentation itself (the material) was pretty bad really owing to the lack of time I'd had to put into, what is essentially, a very large topic. However, I hope I wasn't too dull and I hope that I wasn't too innacurate in my description of how little bits of DTrace work.
The vast topic of how DTrace actually works is hugely interesting and I'll do my best to put some of my pitted understanding up here in the not-too-distant future. One thing that I put as an addendum to my presentation was some mdb(1) incantations that allow you to extract a bit of information about what is going on around DTrace itelf.
DTrace has several mdb(1) dcmds of interest that allow us to shine some light on the workings of DTrace. Firstly, let's take a couple of the simple ones starting with the ::dtrace_state dcmd:
# mdb -k Loading modules: [ unix krtld genunix specfs dtrace ufs ip sctp usba uhci s1394 fctl nca lofs audiosup zfs random nfs sppp ptm ipc crypto logindmux ] > ::help dtrace_state NAME dtrace_state - print active DTrace consumersThe ::dtrace_state dcmd actually has a very good description which says it all really. You can use it to display all or individual consumers that are running on the system. Used without any arguments we see:
> ::dtrace_state
ADDR MINOR PROC NAME FILE
ffffffff8155c800 2 ffffffff856be048 dtrace ffffffff857a84d0
ffffffff8155c000 3 ffffffff856af190 dtrace ffffffff857a8498
So, we have two consumers running. The intersting bit is the address given on the ADDR column. This is a pointer to a dtrace_state structure. The dtrace_state structure is of huge interst to us when figuring out what a consumer is doing as it holds kernel state for a consumer.We can then use the ::dtrace dcmd to extract any data that is being held in kernel for that consumer (data that hasn't yet been read by the consumer). Let's take the second consumers dtrace_state address and feed in into the dtrace dcmd:
> ffffffff8155c000::dtrace fsflush 1 Xorg 2 battstat-applet- 3 gnome-netstatus- 3 nautilus 3 dtrace 4 mixer_applet2 4 soffice.bin 4 mdb 5 powernowd 6 java 34 sched 100It would be very interesting to know what probes were enabled and what the script looked like that generated that output. Well, that's where things get a bit tougher and I'll do that one in another entry.
I'll leave you with a small piece of mdb that shows you what providers are loaded on your system at the moment. Now, you could do this with dtrace(1M) and a simple bit of awk but that would be dull :-) . If I remember, next time I'll try to explain what it does...
> 1::id2probe | ::print -t dtrace_probe_t dtpr_provider | ::list dtrace_provider _t dtpv_next | ::print -t dtrace_provider_t dtpv_name char *dtpv_name = 0xffffffff83959078 "dtrace" char *dtpv_name = 0xffffffff8527ed00 "pid689" char *dtpv_name = 0xffffffff83db07c0 "syscall" char *dtpv_name = 0xffffffff83db0790 "sdt" char *dtpv_name = 0xffffffff83db07d0 "mib" char *dtpv_name = 0xffffffff83db07b0 "io" char *dtpv_name = 0xffffffff83db0660 "proc" char *dtpv_name = 0xffffffff83db0690 "sched" char *dtpv_name = 0xffffffff83db0650 "fpuinfo" char *dtpv_name = 0xffffffff83db08f8 "vminfo" char *dtpv_name = 0xffffffff83db0920 "sysinfo" char *dtpv_name = 0xffffffff83db0928 "vtrace" char *dtpv_name = 0xffffffff8316cb78 "profile" char *dtpv_name = 0xffffffff83da8a28 "lockstat" char *dtpv_name = 0xffffffff83db0900 "fbt" char *dtpv_name = 0xffffffff831a5658 "fasttrap"( Jan 09 2006, 05:00:00 PM GMT+00:00 ) Permalink Comments [3]
With a spring in my step I'm off to SUPerG on monday. The "spring" here really refers to the fact that I'm really looking forward to this conference. Last year was my first SUPerG and I must say that it is probably the best conference that I have ever been to. A whole lot of really excellent Sun and external engineers come and share their insights with delegates who are genuinely interested in hearing what is being said. The people attending are not only interested but they are also interesting. I met many excellent people from all over the globe who were willing to not only listen but also share their experiences around Sun and the surrounding technologies. I learnt a whole lot and I'm looking forward to much of the same this year.
Last year Phil Harman (author of the infamous Multithreading in the Solaris Operating Environment paper) and I did a TechBrief on DTrace and we had a great time. The format 1 was one of Phil waxing lyrical about real scenarios/problems that you will encounter when dealing with systems in Data Centres whilst I create the D scripts on the fly and execute them, demonstrating how DTrace makes easy meat of all these problems. To really appreciate DTrace you've got to see it and feel it and this is what we aimed to do; slides are off the menu and smokin' keyboards and caffeine are on!
This year we aim to do the same but maybe with a few different laughs along the way. If you're coming along then I look forward to seeing you there. If you're not then try and get there next year!
A whole raft of talent will be presenting and a sample include Richard McDougall , Jim Mauro and Richard Elling .
The simple things in life are still cool
I've been messing around with DTrace for some time now and, as with most things in life, a certain level of intimacy means that I've began take the old boy a bit for granted. What was incredibly cool at first becomes normal after a while even though it is still mind boggling. A simple example that came up today reminded me of how the simple things are still incredibly cool.
A colleague of mine was wanting to see the http interaction in a web infrastructure he's developing. On a Solaris 10 system he has multiple zones with portal server, access manager and various other JES pieces. He had never touched DTrace before but being a bright chap he went and got the manual and knocked some stuff up to just look at the
#!/usr/sbin/dtrace -s
#pragma D option quiet
#pragma D option strsize=2048
syscall::write:entry
/execname == "firefox-bin" && stringof(copyin(arg1, 3)) == "GET"/
{
printf("%s\n\n", copyinstr(arg1));
}
GET http://www.sun.com/ HTTP/1.1 Host: www.sun.com User-Agent: Mozilla/5.0 (X11; U; SunOS i86pc; en-US; rv:1.7) Gecko/20041221 Accept: text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Proxy-Connection: keep-alive Cookie: ... GET http://www.sun.com/2005-0329/images/b3_staroffice8.jpg HTTP/1.1 Host: www.sun.com User-Agent: Mozilla/5.0 (X11; U; SunOS i86pc; en-US; rv:1.7) Gecko/20041221 Accept: image/png,*/*;q=0.5 Accept-Language: en-us,en;q=0.5 Accept-Encoding: gzip,deflate Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7 Keep-Alive: 300 Proxy-Connection: keep-alive Referer: http://www.sun.com/ Cookie: ...
( Apr 04 2005, 03:00:00 PM GMT+00:00 ) Permalink Comments [1]
uname lies ... Well, it's been a while to say the least but I think it's about time to put fingers to keys again and see what comes forth ...
I've mentioned before some of the great things that can be achieved with destructive actions. Indeed many good examples are popping up - check out this cool example from Chris Gerhards blog. Today we'll just take a quick look at another one (I actually have a bunch which I'll try and write-up over the next few weeks).
In the UK we have run a series of events called Make-It-Fly which I've been involved with. Last week I did a session on DTrace which I hope everyone who was present enjoyed (I did anyway!). At the events I usually do quite a bit of hands on demo with most of it being hand cranked. However, one of the scripts I use that always gets a laugh is the following one which I don't hand crank as I can never remember the offsets ... If you've ever wanted to get uname(1) to return something different to normal then this is what you need:
#!/usr/sbin/dtrace -s
#pragma D option destructive
syscall::uname:entry
{
self->addr = arg0;
}
syscall::uname:return
{
copyoutstr("SunOS", self->addr, 257);
copyoutstr("PowerPC", self->addr+257, 257);
copyoutstr("5.5.1", self->addr+(257*2), 257);
copyoutstr("gate:1996-12-01", self->addr+(257*3), 257);
copyoutstr("PPC", self->addr+(257*4), 257);
}
Before we have:
# uname -a SunOS homer 5.10 SunOS_Development sun4u sparc SUNW,Ultra-5_10and like magic we morph into something else when the above script is ran:
# uname -a SunOS PowerPC 5.5.1 gate:1996-12-01 PPC sparc SUNW,Ultra-5_10Here at Sun we often test pieces of software on versions of Solaris that return something different to that which the software is expecting. Previously I would LD_PRELOAD a library in with my own uname hack. Now I can not only do this without bothering the application but I can present different uname information to different applications/users/whatever as I can predicate accordingly!
Note, that the above script isn't quite complete as it returns the incorrect ISA information. This is me being idle and a bit of twiddling with sysinfo() is all that's needed. Maybe another day. ( Mar 21 2005, 05:20:00 PM GMT+00:00 ) Permalink Comments [1]
Who's at the Door? Doors are an IPC facility that exist within Solaris which allow one process to execute functions within another process. A process that exports a function using the door_create() interface is termed a door server and a process that invokes this function with the door_call() interface is known as a client (suprisingly enough).
I was on the train the other day just tinkering with a few DTrace scripts that I've been developing (see Alans Blog for other DTrace train tinkering) when I happened to notice a reasonable amount of door_call() calls being made by my X server (the Xsun process). Nowadays it's really easy to see what process the door_call() is being made to as the pfiles(1) utility even tells us what process is the door server for this door. Taking a quick look at my X server we see:
# pgrep -fl Xsun
497 /usr/openwin/bin/Xsun :0 -nobanner -defdepth 24 -auth /var/dt/A:0-PDaa9a
# pfiles 497 | grep -i door
4: S_IFDOOR mode:0777 dev:214,0 ino:0 uid:0 gid:0 size:0
O_RDWR door to svc.configd[9]
26: S_IFDOOR mode:0777 dev:214,0 ino:0 uid:0 gid:0 size:0
O_RDWR door to stfontserverd[695]
How spoilt we are nowadays - it's all so easy to see what is happening! However, I really want to see a systemic view a what door calls are being made so inbetween Surbiton and Waterloo I put the following simple script together:
#!/usr/sbin/dtrace -s
#pragma D option quiet
door_call:entry
{
self->in = 1;
}
door_lookup:return
/self->in/
{
self->called = stringof(args[1]->door_target->p_user.u_comm);
}
door_call:return
/self->in/
{
@[execname, self->called] = count();
self->in = 0;
self->called = 0;
}
END
{
printa("%s called door in %s %@d times\n", @);
}
The above script just takes advantage of the fact that we do a door_lookup() from within a door_call(). We can then extract the name of the door server from the proc structure that is stored in the struct door_node that door_lookup() returns (see sys/door.h).
# ./door.d ^C kcfd called door in nscd 1 times zoneadm called door in zoneadmd 1 times devfsadm called door in devfsadm 1 times zoneadmd called door in devfsadm 1 times snmpd called door in nscd 2 times sshd called door in kcfd 2 times sshd called door in nscd 2 times syslogd called door in nscd 2 times syseventd called door in devfsadm 2 times chown called door in nscd 2 times devfsadm called door in syseventd 2 times ttymon called door in nscd 3 times dtlogin called door in nscd 3 times id called door in nscd 4 times deallocate called door in nscd 4 times snmpXdmid called door in nscd 4 times lpstat called door in syslogd 4 times cron called door in nscd 5 times svc.startd called door in nscd 6 times inetd called door in nscd 16 times svcprop called door in svc.configd 26 times svccfg called door in svc.configd 54 times rpcbind called door in svc.configd 117 times lsvcrun called door in svc.configd 550 times svc.configd called door in nscd 2034 times mfstscan called door in svc.configd 2708 times Xsun called door in stfontserverd 5046 times svcs called door in svc.configd 7918 times inetd called door in svc.configd 28908 times svc.startd called door in svc.configd 37110 timesYou'll have to excuse the poor formatting as it's always a weakness of mine but isn't it cool to see how easy we can see the various interactions? That's quite a few doors being opened in the smf(5) world as I was doing a bit of svcs work at the time (see Stephen Hahn's Blog for smf(5) goodness). ( Sep 10 2004, 10:32:00 PM GMT+00:00 ) Permalink Comments [2]
DTrace and Visualisation As I talk to people about DTrace one of the most frequent requests that arises is for scripts and tools. On the scripts front people want cookbook type stuff as it is always easier to learn a new technology such as DTrace by "re-using" and building upon pre-prepared material. This is an excellent request that, hopefully, Sun and the ever growing band of DTrace addicts will fulfill in the near future.
On the tools front a request that is made is for a DTrace GUI. In the early days of DTrace (about a year or so ago...) my usual base reaction to such a request was to reply that, whilst I never say never, I doubt that this is high on the priority list but I'll feed it back ... However, age and thought has softened me tremendously and I now feel that the union of DTrace and Visualisation techniques is a source of terrific innovation and will be of immense benefit to all.
With DTrace we have the infrastructure to ask almost arbitrary questions about the behaviour of our software stacks from the top of an application through to the bottom of a bus nexus driver. However, some of the data that these complex subsystems generate can be rich in nature. This is not a size issue particularly as DTrace has great primitives for data reduction but more referring to the relationships that exist within the data. Sometimes a bit of visual stimulation is required to get our synapses firing and to force our grey cells into seeing patterns and relationships in the data. I'm not knocking DTrace here; on the contrary I'm saying that DTrace is the nearly infinitely rich architecture with which we can feed the visualisation engines. I'm convinced that in the near future we will see some real innovation in the tools around systemic and application observability as the data is so easily at hand.
For example, take something that a colleague ( Simon Ritter ) and and I have been working on in our copious spare time, the area of call graphs. With DTrace we can trivially discover what functions a given function is calling and in it's simplest form the script looks like:
#usr/sbin/dtrace -s
#pragma D option flowindent
fbt::$1:entry
{
self->in = 1;
}
fbt:::entry,
fbt:::return
/self->in && probefunc != "$1"/
{
}
fbt::$1:return
/self->in/
{
self->in = 0;
exit(1);
}
So, a trivial example would be to look at what activity the clock routine generates:
# ./fbt.d clock dtrace: script './fbt.d' matched 31751 probes CPU FUNCTION 0 -> clock 0 -> set_anoninfo 0 <- set_anoninfo 0 -> set_freemem 0 <- set_freemem 0 -> wakeup_pcgs 0 <- wakeup_pcgs 0 -> clock_tick 0 -> sys_nullsys 0 <- sys_nullsys 0 -> mutex_owned 0 <- mutex_owned 0 -> rctl_test 0 <- rctl_test 0 -> rctl_test_entity 0 -> mutex_owned 0 <- mutex_owned 0 -> rctl_set_find 0 -> mutex_owned 0 <- mutex_owned 0 <- rctl_set_find 0 <- rctl_test_entity 0 -> rctl_test 0 <- rctl_test 0 -> rctl_test_entity 0 -> mutex_owned 0 <- mutex_owned 0 -> rctl_set_find 0 -> mutex_owned 0 <- mutex_owned 0 <- rctl_set_find 0 <- rctl_test_entity 0 -> rm_asrss 0 <- rm_asrss 0 <- clock_tick 0 -> callout_schedule 0 -> callout_schedule_1 0 <- callout_schedule_1 0 -> softcall 0 -> mutex_vector_enter 0 <- mutex_vector_enter 0 -> mutex_vector_exit 0 <- mutex_vector_exit 0 <- softcall 0 -> siron 0 <- siron 0 -> setsoftint 0 <- setsoftint 0 -> callout_schedule_1 0 <- callout_schedule_1 0 <- callout_schedule 0 | clock:return 0 <- clockHowever, as great as it is to see the above, the shape of the call sequence isn't very apparent and that is a simple routine. Using something such as a ufs_write() you soon see how this gets very complex. It is much better to draw a directed graph of the calls and you can immediately see what is being called by what. An example of a single ufs_write() call can be seen in this image . The image was generated using DTrace to feed a public domain piece of software called JzGraph which is a great Java implementation of the wonderful Graphviz algorithms.