Darryl Gove's blog

Tuesday Mar 25, 2008

Conference schedule

The next two months are likely to be a bit hectic for me. I'm presenting at three different conferences, as well as a chat session in Second Life. So I figured I'd put the information up in case anyone reading this is also going to one or other of the events. So in date order:

I'll be talking about parallelisation at all of the conferences, but the talks will be different. The Multi-core Expo talk focuses on microparallelisation. The ESC talk will probably be higher level, and the CommunityOne talk will probably be wider ranging and, I hope, more interactive.

In the Second Life event I'll be talking about the book, although the whole idea of appearing is to do Q&A, so I hope that will be more of a discussion.

Friday Feb 08, 2008

Interposing on malloc

Ended up wanting to look at malloc calls: how much was requested, where the memory was located, and where in the program the request was made. This was on Solaris 9 (S9), so no dtrace, so the obvious thing to do was to write an interpose library and use that. The code is pretty simple:

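#include <stdlib.h>
#include <stdio.h>
#include <dlfcn.h>
#include <ucontext.h>

void * malloc(size_t size)
{
  /* Pointer to the real malloc, looked up on the first call */
  static void* (*func)(size_t)=0;
  void* ret;
  if (!func) {func=(void*(*)(size_t))dlsym(RTLD_NEXT,"malloc");}
  ret=func(size);
  /* Cast so that the arguments match the format specifiers */
  printf("size = %lu address=%lx\n",(unsigned long)size,(unsigned long)ret);
  /* printstack takes a file descriptor; 1 is stdout */
  printstack(1);
  return ret;
}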

The code uses a call to printstack to print out the stack at the point of the call.

The code is compiled and run with:

$ cc -O -G -Kpic -o libmallinter.so mallinter.c
$ LD_PRELOAD=./libmallinter.so ls
size = 17 address=25118
/home/libmallinter.so:malloc+0x5c
/usr/lib/libc.so.1:_strdup+0xc
/usr/lib/libc.so.1:0x73b54
/usr/lib/libc.so.1:0x72d44
/usr/lib/libc.so.1:0x720e4
/usr/lib/libc.so.1:setlocale+0x3c
/usr/bin/ls:main+0x14
/usr/bin/ls:_start+0x108
size = 17 address=25138
/home/libmallinter.so:malloc+0x5c
/usr/lib/libc.so.1:_strdup+0xc
/usr/lib/libc.so.1:0x73b54
/usr/lib/libc.so.1:0x72d44
/usr/lib/libc.so.1:0x720e4
/usr/lib/libc.so.1:setlocale+0x3c
/usr/bin/ls:main+0x14
/usr/bin/ls:_start+0x108
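
One thing to watch with this approach is that printf can itself end up allocating memory (for its output buffer, say), which would re-enter the interposed malloc. The following is a minimal sketch of one way to reduce that risk, not what the library above does: format the record into a stack buffer and emit it with write, with a flag to skip records generated while logging. The names format_alloc_record, log_alloc and in_log are made up for illustration, and the flag assumes a single-threaded program.

```c
#include <stdio.h>
#include <unistd.h>

/* Set while a record is being emitted, so that an allocation made by the
   logging code itself is not logged again. Not thread-safe: sketch only. */
static int in_log = 0;

/* Format one record into buf, in the same layout as the printf above.
   Returns the number of characters produced, or 0 if it did not fit. */
int format_alloc_record(char *buf, size_t len, size_t size, const void *addr)
{
  int n = snprintf(buf, len, "size = %lu address=%lx\n",
                   (unsigned long)size, (unsigned long)addr);
  if (n < 0 || (size_t)n >= len) { return 0; }
  return n;
}

/* Emit one record on stderr using a stack buffer and write(2).
   Returns the number of characters written, or 0 if nothing was logged. */
int log_alloc(size_t size, const void *addr)
{
  char buf[64];
  int n;
  if (in_log) { return 0; }          /* already logging: do not recurse */
  in_log = 1;
  n = format_alloc_record(buf, sizeof(buf), size, addr);
  if (n > 0 && write(STDERR_FILENO, buf, (size_t)n) < 0) { n = 0; }
  in_log = 0;
  return n;
}
```

In the interposer, the printf line would become a call to log_alloc(size, ret).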

Thursday Jan 31, 2008

Bart Smaalders' performance anti-patterns

Bart Smaalders has an excellent article on performance anti-patterns, or things not to do. It's a great (and quick) read.

Monday Jan 28, 2008

Sample chapter from Solaris Application Programming available

There's a sample chapter from my book up on sun.com/books.

It's chapter 4, which discusses the tools that come with Solaris and Sun Studio. The chapter exists because I find that there are some tools that I use every day, some that I might touch once a month, and some that I use even more rarely. The problems I hit are:

  • What was the name of the tool which ....?
  • What are the command line options to ...?
  • Is there a tool to ....?

Obviously I hit the third problem very infrequently, but I'm sometimes surprised when I discover a tool which I'd previously never heard of which just happens to do exactly what I need. Anyway I hope you find the chapter useful. It's one of my two solutions to this problem.

The other solution is spot, which attempts to collect all the data that you routinely need for performance analysis of an application. It calls the other tools, so you don't need to know their names or command lines. One thing that should be noticeable about spot is that it has few command-line options. I was hoping that we'd end up with none, but some are inevitable; most are really house-keeping options (where to put the report, what to call it). The only other one is -X, which generates an extended report. Given the time it can take to gather the data, it seemed appropriate to do the high-value stuff quickly, with an option for the tool to take longer when the user says that's ok.

Monday Jan 07, 2008

putc in a multithreaded context

Just answering a question from a colleague. The application was running significantly slower when compiled as a multithreaded app compared to the original serial app. The profile showed mutex_unlock as being hot, but going up the callstack the routine that called mutex_unlock was putc.

This is the OpenSolaris source for putc, which shows a call to FLOCKFILE, which is defined in this file for MT programs. So for MT programs, a lock needs to be acquired before the character can be output.

Fortunately it is possible to avoid the locking using putc_unlocked. This call should not be used as a drop-in replacement for putc, but used after the appropriate mutex has been acquired. The details are in the Solaris Multi-threaded programming guide.

A test program that demonstrates this problem is:

#include <stdio.h>
#include <pthread.h>
#include <sys/time.h>

static double s_time;

/* Record the start time; gethrtime() returns nanoseconds */
void starttime()
{
  s_time=1.0*gethrtime();
}

/* Print the average time per iteration since the last starttime() */
void endtime(long its)
{
  double e_time=1.0*gethrtime();
  printf("Time per iteration %5.2f ns\n", (e_time-s_time)/(1.0*its));
  s_time=1.0*gethrtime();
}

/* The timed loop, run in a second thread */
void *dowork(void *params)
{
  starttime();
  FILE* s=fopen("/tmp/dldldldldld","w");
  for (int i=0; i<100000000; i++)
  {
    putc(65,s);
  }
  fclose(s);
  endtime(100000000);
  return NULL;
}

int main()
{
  /* The same loop, timed before any other thread has been created */
  starttime();
  FILE* s=fopen("/tmp/dldldldldld","w");
  for (int i=0; i<100000000; i++)
  {
    putc(65,s);
  }
  fclose(s);
  endtime(100000000);

  pthread_t threads[1];
  pthread_create(&threads[0],NULL,dowork,NULL);
  pthread_join(threads[0],NULL);
  return 0;
}

Here's the results of running the code on Solaris 10:

$ cc -mt putc.c
$ ./a.out
Time per iteration 30.55 ns
Time per iteration 165.76 ns

The situation on Solaris 10 is better than on Solaris 9: on Solaris 9 the cost of the mutex was incurred whenever the code was compiled with the -mt flag, whether or not there were actually multiple threads active.
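
For what it's worth, here is a minimal sketch of the putc_unlocked approach mentioned earlier: take the stream lock once with flockfile, write the whole batch with putc_unlocked, then release the lock with funlockfile. The function name put_repeated is made up for illustration, and I haven't timed this variant, so treat it as an outline rather than a measured fix.

```c
#include <stdio.h>

/* Write count copies of character c to stream s, taking the stream lock
   once for the whole batch rather than once per character.
   Returns the number of characters successfully written. */
long put_repeated(FILE *s, int c, long count)
{
  long written = 0;
  flockfile(s);                         /* acquire the stream's lock */
  for (long i = 0; i < count; i++)
  {
    if (putc_unlocked(c, s) == EOF) { break; }
    written++;
  }
  funlockfile(s);                       /* release the lock */
  return written;
}
```

In the test program, each putc loop would become a single call such as put_repeated(s, 65, 100000000).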

Monday Dec 17, 2007

Open source application tuning

My group has started a page on the Sun wiki detailing the steps necessary to compile and build a number of open source applications. The page also contains links to useful destinations in the compiler documentation. Feel free to suggest ideas for applications that we should cover there - I can't guarantee that we'll manage to look at them, but I'd love to know what's important to you!

Tuesday Nov 20, 2007

Adding dtrace probes to user code

The process of adding dtrace probes to userland code is described in the dynamic tracing guide. However, there's no better way of learning how to do it than trying it out on a snippet of code.

Here's a short bit of code that calls a function twice, each time with different parameters. The plan is to insert a probe that can report the passed parameters.

#include <stdio.h>

void func(int a, int b)
{
  printf("a=%i, b=%i\n",a,b);
}

int main()
{
  func(1,2);
  func(2,3);
  return 0;
}

The first change is to add the <sys/sdt.h> header file. This file has definitions for the DTRACE_PROBE<N> macros, where N is the number of parameters that are to be reported by the probe. In this case we are going to pass two parameters (a and b) to the probe. As well as the parameters that are to be passed to the dtrace probe, the macro takes the name to be used for the application provider (in this case myapp) and the name of the probe (in this case func_call). The modified source code looks as follows:

#include <stdio.h>
#include <sys/sdt.h>

void func(int a, int b)
{
  DTRACE_PROBE2(myapp,func_call,a,b);
  printf("a=%i, b=%i\n",a,b);
}

int main()
{
  func(1,2);
  func(2,3);
  return 0;
}

The next step is to write a probe description file which dtrace will use to produce the probes. A full file would describe the stability of the probe in more detail, but a lightweight file just describes the probes defined by the provider application:

provider myapp
{
  probe func_call(int, int);
};

Having completed this, it's necessary to compile and link the application. First each source file needs to be compiled. Then, before the application is linked, dtrace needs to be invoked to modify the object files, removing the calls to the probes but leaving space for them to be reinserted. dtrace also needs to compile the probe description file into an object file. Finally the modified object files and the probe description object file can be linked to produce the executable, as follows:

$ cc -c app.c
$ dtrace -G -32 -s probes.d app.o
$ cc probes.o app.o

The resulting code in the application looks like:

func()
        113a0:  9d e3 bf a0  save       %sp, -96, %sp
        113a4:  f0 27 a0 44  st         %i0, [%fp + 68]
        113a8:  f2 27 a0 48  st         %i1, [%fp + 72]
        113ac:  d0 07 a0 44  ld         [%fp + 68], %o0
        113b0:  01 00 00 00  nop
        113b4:  d2 07 a0 48  ld         [%fp + 72], %o1
        113b8:  11 00 00 45  sethi      %hi(0x11400), %o0
        113bc:  90 12 22 60  bset       608, %o0        ! 0x11660
        113c0:  d2 07 a0 44  ld         [%fp + 68], %o1
        113c4:  40 00 42 c7  call       printf  ! 0x21ee0
        113c8:  d4 07 a0 48  ld         [%fp + 72], %o2
        113cc:  81 c7 e0 08  ret
        113d0:  81 e8 00 00  restore

The nop at 0x113b0 is there for dtrace to dynamically patch with a call instruction that will enable the dtrace probe.

Finally, the following is an example of using the new probe:

$ more script.d
myapp$target:::func_call
{
  @[arg0,arg1]=count();
}
$ dtrace -s script.d -c a.out
dtrace: script 'script.d' matched 1 probe
a=1, b=2
a=2, b=3
dtrace: pid 22355 has exited

                1                2                1
                2                3                1

The script just aggregates the parameters used in the function call. When the application terminates the aggregation is printed out - showing the expected result of two calls to the routine each call with different parameters.
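
If the same source also has to build on a system without <sys/sdt.h>, one option is to hide the probe behind a macro. This is only a sketch: USE_DTRACE_PROBES and MYAPP_FUNC_CALL are names I've made up for illustration, and func has been changed to return printf's result purely so that the example can be checked.

```c
#include <stdio.h>

#ifdef USE_DTRACE_PROBES                  /* hypothetical build-time switch */
#include <sys/sdt.h>
#define MYAPP_FUNC_CALL(a, b) DTRACE_PROBE2(myapp, func_call, a, b)
#else
#define MYAPP_FUNC_CALL(a, b) ((void)0)   /* probe compiled out */
#endif

/* Same routine as before, with the probe site wrapped in the macro.
   Returns the number of characters printed. */
int func(int a, int b)
{
  MYAPP_FUNC_CALL(a, b);
  return printf("a=%i, b=%i\n", a, b);
}
```

Compiling with -DUSE_DTRACE_PROBES would give the instrumented version, which still needs the dtrace -G step shown above before linking; without the define the probe site disappears entirely.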

Monday Oct 29, 2007

Solaris Application Programming - available as rough cut

My book, "Solaris Application Programming", is now available as a Safari Rough-Cut.

For those who are unfamiliar with the rough-cut programme, the idea is to get early access to drafts of new books. The draft of my book that is available is the one that I actually handed over about two months back. This is before the copyeditor went through smoothing out the grammar, and also before I did another review of the text. The layout of the book is also different.

From the link you can either get access to the full text (for example, if you have a subscription to safari), or you can view snippets from various sections of the book to get a feel for the content.

Part of the idea of rough-cuts is that they provide an opportunity to influence/improve the final product. So please use the mechanism they provide to comment.

Friday Sep 28, 2007

Solaris Application Programming Table of Contents

A couple of folks requested that I post the table of contents for my book. This is the draft TOC, not the finished product. I assume that there will be a good correspondence, but the final version should definitely look neater.

Tuesday Sep 25, 2007

Solaris Application Programming book

I'm thrilled to see that my book is being listed for pre-order on Amazon in the US. It seems to take about a month for it to travel the Atlantic to Amazon UK.

Friday Aug 31, 2007

Register windows and context switches

Interesting paper on register windows and context switching

Thursday Aug 02, 2007

Outline of book for Solaris developers

It's probably useful to outline the contents of the book I'm working on. The book is meant as a resource for people coding for or on the Solaris platform, for either SPARC or x86/x64 processors. It falls into four main sections:

  • Hardware. Solaris is supported on both x86/x64 and SPARC. The two processor families have different features and different assembly languages, but there's also a lot of commonality in processors (e.g. caches, TLBs). The first section of the book outlines common features of processors, and also the differences between the two families. It also covers particular implementations of the families (e.g. UltraSPARC T1). All this material is useful context and definitions for the material that follows later.
  • Software. The software is Solaris and the tools that ship with it, the Sun Studio compilers, the performance profiling tools, and the debugging tools. In fact, there are tools for most questions that a developer could think of asking, the trick is to know that they exist and have some examples that demonstrate the use of the tools.
  • Source code. Inevitably much of what the developer deals with is source code, and this section demonstrates how to use the available tools to identify, tune, and improve source code. The section has coverage of the topic of using performance counters to determine what's causing performance bottlenecks, and also of deriving metrics using performance counters. The section also covers using compiler options and source code modifications to improve performance.
  • Multi-core. Almost all systems that are available today have more than one core. The challenge going forwards is to utilise these resources effectively and efficiently. This section focuses on the various approaches that can be used to leverage these resources, and the tools that can be used to diagnose and improve the code.

Tuesday Jul 31, 2007

Snippet from book: cost of calling libraries

I've been working on a book about developing on Solaris, and I'm currently in the final stages of editing - which is a great feeling :) One of the strange things that happens at this stage is that material ends up being cut out. One of the sections that didn't make it was a discussion of the overhead of calling dynamic libraries rather than static libraries. The text is in a 'raw' format, and for some reason the document claims to have 4 pages, rather than the 3 that are there.

Wednesday Apr 11, 2007

Solaris observability tools

A comprehensive list of the observability tools shipped with Solaris. Unfortunately the links on the page go to the source of the tool rather than the man page.
