20040730 Friday July 30, 2004

Solaris Quiz #2 (Which file)


A fairly simple quiz which probably has several answers. Here we go: We have a process which is already running. We dont want to restart the process. It is writing a lot of data to a file but we dont know which one. We suspect process has a lot of files open. How do we find the name of the file that we are writing to?

Restrictions

  • The OS is Solaris 8
  • We can become root

The prize will be whatever I can scrounge. The winner will not necessarily the first person to answer, but either the most elegant answer, or failing that the answer which meets my arbitrary and as yet unspecified criteria.

If anyone from Sun is reading this and knows whom I can contact for some prizes please email me ( Jul 30 2004, 03:38:58 PM BST ) Permalink Comments [1]

20040727 Tuesday July 27, 2004

Solaris Quiz - answer

The correct answer was that ksh has 'true' as a built in function to the korn shell whereas the bourne shell execs a separate binary /usr/bin/true. So in vmstat/mpstat we see a lot of time attributed to 'sys' even though our instinct is that 'while true do' should be userland only. At least this sounds like a good theory. How to prove it though? Today Matthew our I'm going to be using truss(1). Im going to create 2 scripts
#!/bin/sh
while true
do
 :
done

and
#!/bin/ksh
while true
do
 :
done

So now we have a couple of scripts to trace. Im going to use truss with -f so that we 'follow' any forked processes and -c to 'count' the number of system calls. We use '-f' because otherwise we'll only see the forked sub-shell and not the script that we are actually trying to measure. So what do we actually see? First of all using korn shell, then bourne shell.
garyli@podtwo[/var/tmp] $truss -fc ./x.ksh              
^Csyscall               seconds   calls  errors
read                     .000       3
open                     .000      12      1
close                    .000      16      2
time                     .000       2
brk                      .000       6
stat                     .000       1
getpid                   .000       2
getuid                   .000       2
fstat                    .000      11
getgid                   .000       2
ioctl                    .000       3      2
execve                   .000       1
fcntl                    .000       6
sigprocmask              .000       1
sigaction                .000      19
mmap                     .002      18
munmap                   .001       5
memcntl                  .000       4
llseek                   .000       2
resolvepath              .001       8
stat64                   .000       3
fstat64                  .000       3
open64                   .000       2
                     --------  ------   ----
sys totals:              .012     132      5
usr time:                .011
elapsed:                5.400

So here we see only 1 execve call. Now when we run the same script but under bourne shell...

garyli@podtwo[/var/tmp] $truss -fc ./x.sh               
^Csyscall               seconds   calls  errors
_exit                    .020     119
fork                     .079     120
read                     .040     477
open                     .080     721    121
close                    .020     959    120
brk                      .160    1081
stat                     .010     121
getpid                   .030     240
getuid                   .000     241
fstat                    .000     600
access                   .000       1
getsid                   .010     120
getpgid                  .010     120
getgid                   .000     240
ioctl                    .047     716    358
execve                   .310     121
fcntl                    .010     240
sigaltstack              .020     120
sigaction                .141    6360
waitid                   .053     238
mmap                     .181    1321
munmap                   .060     360
memcntl                  .040     240
sysconfig                .010     240
resolvepath              .100     722
stat64                   .002       5      4
open64                   .010     120
                     --------  ------   ----
sys totals:             1.450   15963    603
usr time:                .461
elapsed:                5.370

We see 121 in the execve row, meaning that we execec'd 121 sub processes well 120 (less the exec of the subshell). Now the number of execs is dwarfed by say, the number of sigaction calls, so why concentrate on exec? Well the sigactions are just a part of what happens when some processes are exec'd. An individual truss -fc against /usr/bin/true shows 53 calls to sigaction (120 * 53 = 6360)

So, I think there's just one more question to be answered here. Why do we need to make 53 calls to sigaction for a binary that just returns 'true'. The answer is here...

garyli@podtwo[/var/tmp] $file /usr/bin/true
/usr/bin/true:  executable /usr/bin/sh script

So.... 'true' is actually a script...

garyli@podtwo[/var/tmp] $cat /usr/bin/true
#!/usr/bin/sh
#       Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T
#         All Rights Reserved

#       THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE OF AT&T
#       The copyright notice above does not evidence any
#       actual or intended publication of such source code.

#ident  "@(#)true.sh    1.6     93/01/11 SMI"   /* SVr4.0 1.4   */

So we are starting an entire shell just to exit with '0' and thus give us the 'true' we want. Shells need to have signal handlers to correctly handle CTRL-Z and so on. That's all a bit superfluous for 'true'. We can do something a lot simpler than that such as...

int main()
{
 return 0;
}

We can then compile that code and call it newtrue. Now when we run the script and get it to call 'newtrue' we get a truss output similar to this...

garyli@podtwo[/var/tmp] $truss -fc ./x1.sh
^Csyscall               seconds   calls  errors
_exit                    .050     146
fork                     .101     147
read                     .000       1
open                     .090     594    148
close                    .010     448      1
brk                      .120     447
stat                     .050     148
getpid                   .040     148
getuid                   .000       2
fstat                    .020     446
getsid                   .000       1
getpgid                  .000       1
getgid                   .000       2
ioctl                    .019     294      1
execve                   .530     148
fcntl                    .000       2
sigaltstack              .000       1
sigaction                .001      53
waitid                   .063     292
mmap                     .141    1187
munmap                   .030     297
memcntl                  .050     149
sysconfig                .000       2
resolvepath              .220     741
open64                   .000       1
                     --------  ------   ----
sys totals:             1.541    5698    150
usr time:                .232
elapsed:                4.760

Now we see 148 execs but only 53 calls to sigaction. It's all a bit trivial, but shows quite nicely how much information that we can get out of an old-school tool like truss. If only everything were that simple.

And Finally. Congratulations to David Morris from the UK on getting the first post. ( Jul 27 2004, 03:15:36 PM BST ) Permalink

20040726 Monday July 26, 2004

smalltownboy

So it's the weekend, and time to take the Hawk out for a spin.
Everyone should have one, well at least I should have one, but alas I was just borrowing this from BAE. One of the very few benefits of living in Aldershot, in addition to the famous gentlemens club (think Groucho club, then think again) is that the international airshow that is Farnborough is close enough to walk to. It's a great day out, albeit a bit bizarre seeing small children running around an exhibition hall dedicated to more efficient ways of killing people.

Anyhow on the tech side, our Blueprints CD winner made an interesting point which was that for all the talk about dtrace, there are many folks using Solaris who are still in the dark about truss and a lot of the tracing tools we have in production versions of Solaris. Would a short tutorial and/or hints 'n' tips on truss, p* tools and so on be of interest? If you'd like to see some part of Solaris discussed drop me a line via the comments or gjl@sun.com ( Jul 26 2004, 11:05:46 PM BST ) Permalink

20040721 Wednesday July 21, 2004

Oracle on Network Attached Storage + blue boxin'


so it's not the most thrilling subject, but I found this document on BigAdmin a while ago, and just got around to reading it. The document contains a lot of good tips on setting up Oracle on NAS (Particularly NetApps). The impressive thing I found was that the engineers use real measured data to qualify much of what they recommend. The authors are from Sun's performance group, and from NetApp themselves. Check it out for general Solaris/Oracle/NFS configuration guidelines also.

And by way of apology for such a dry and boring blog entry, please enjoy this gratuitous picture of an original Woz designed blue box I found at the Coumputer History museum. Nice.

( Jul 21 2004, 10:02:21 PM BST ) Permalink

Solaris quiz


OK,
Here's a quick quiz and it's something that stumped me for a while. On the same system (SunOS 5.10, SPARC, Ultra10 - not that it matters) I run the same script under two different shells. When I use vmstat to monitor the system I seem to get totally different results. Under a ksh I get this
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr dd f0 s1 --   in   sy   cs us sy id
 0 0 15 480384 121968 6  22  1  0  0  0  0  0  0  0  0  419  292  233  4  3 94
 0 0 29 322856 35840  0   0  0  0  0  0  0  0  0  0  0  407  573  302 76  1 23
 0 0 29 322856 35840  0   0  0  0  0  0  0  0  0  0  0  401  564  291 99  1  0
 0 0 29 322856 35840  0   0  0  0  0  0  0  0  0  0  0  402  582  286 99  1  0
 0 0 29 322856 35840  0   0  0  0  0  0  0  2  0  0  0  417  583  293 98  2  0

but under bourne shell (sh) I get this - hint look at the sys Vs usr column
 kthr      memory            page            disk          faults      cpu
 r b w   swap  free  re  mf pi po fr de sr dd f0 s1 --   in   sy   cs us sy id
 0 0 15 480384 121968 6  22  1  0  0  0  0  0  0  0  0  419  292  233  4  3 94
 0 0 29 322576 35536 251 2636 0 0  0  0  0  0  0  0  0 1072 6707 1052 19 69 12
 3 0 29 322704 35576 187 2007 13 0 0  0  0  1  0  0  0  938 5470  849 17 74  9
 0 0 29 322888 35736 238 2531 0 0  0  0  0  4  0  0  0 1067 6520 1044 17 72 11
 0 0 29 322880 35728 244 2606 0 0  0  0  0  0  0  0  0 1072 6673 1038 17 72 11
The vmstat was collected using vmstat 5. The script is very simple
garyli@arches[/home/garyli] $while true
> do
> :
> done
Why is that? What is happening here? The first correct answer will get a Sun Blueprints CD. ( Jul 21 2004, 10:12:47 AM BST ) Permalink Comments [3]

You shouldn't laugh but

Personally I found this story 'Reverse air-rage' on Russian jet hilarious! ( Jul 21 2004, 09:57:24 AM BST ) Permalink
20040719 Monday July 19, 2004

Good news for bad databases...

I just closed a long running escalation whereby old oracle connections would appear to hang, and new ones could not be established. A truss of a new sqlplus session showed that the oracle process was sleeping on a semop system call.
sqlplus/1:      read(10, 0x10024FF66, 2064)     (sleeping...)^M
oracle/1:       semop(1245185, 0xFFFFFFFF7FFFAF9C, 1) (sleeping...)^M
sqlplus/1:      556.9328            Received signal #2, SIGINT, in read() 

^^^Here we see the process is eventually killed with CTRL-C
Anyhow the fix was to apply Oracle update 9.2.0.5 (9205) which according to my Oracle contacts contains around 1,100 fixes. 9.2.0.4 contained in the region of 800 compared to 9.2.0.3 Cor blimey, what a whopper . ( Jul 19 2004, 04:12:07 PM BST ) Permalink

I was a teenage raver...

But that was a long time ago, these days things are a bit more sedate. However, I did manage to get along to a gig of sorts this weekend. Ruth won two tickets to see Pavarotti at Leeds Castle which is in Kent, England. The prize included a hotel stay, hospitality (drinks and small portions of food) excellent seats ( we sat just behind John Simpson and he works for the BBC you know) and dinner.

Anyhow regarding the concert itself, I was totally unprepared for the awsome sound of the full orchestra on stage. I now think that one of the reasons I have never really appreciated orchestral music is down to the fact that it is very hard to reproduce on a home Hi-Fi. So I now have an excuse to save up for the Linn kit I've always wanted. Of course being a commoner doesnt help either, but I cant change that.

Anyhow after the gig/concert we got to eat in the castle itself. Apparently the room we ate in was used by no less than King Henry VIII.

Heres a picture of the castle:-

And the dining room, featuring pictures of some of his adoring wives...some of which he decided to kill.

The music was great but I dont think Im about to become an opera fan anytime soon, though I'd like to check out some more orchestral/classical pieces and you cant beat being outdoors whilst your at it. If you ever get the chance, give it a try. ( Jul 19 2004, 10:26:45 AM BST ) Permalink

20040716 Friday July 16, 2004

Oi! who said you could use that CPU

I just happened to stumble across a very useful switch to prstat, which I either forgot about or never knew existed. The magic switch is -m which displays microstate accounting information. Woo hoo you say, WTF does that mean? Well it means that you can see which processes are running in USR and which are using SYS e.g.

   PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
    50 garyli   2.4  13 0.0 0.0 0.0 0.0  74 9.7  1K 169  9K   0 find/1
 26632 garyli   0.5 0.9 0.0 0.0 0.0 0.0  99 0.0   2   6 550   0 prstat/1
    47 garyli   0.4 0.6 0.0 0.5 0.0 0.0  98 0.8   4   7 544   0 prstat/1
 22201 garyli   0.5 0.4 0.0 0.0 0.0 0.0  95 4.4 251   4 745   0 acroread/1
 21140 garyli   0.0 0.0 0.0 0.1 0.0  17  83 0.1 130  25 210   0 netscape-bin/6
 24289 garyli   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   1   0   9   0 dtterm/1
    52 garyli   0.0 0.0 0.0 0.0 0.0 0.0 100 0.1   5   2  10   0 dtscreen/1
 13896 garyli   0.0 0.0 0.0 0.0 0.0  53  46 0.2 215   1 189   0 java/15
   169 daemon   0.0 0.0  -   -   -   -  100  -    0   0   0   0 statd/2
   170 daemon   0.0 0.0  -   -   -   -  100  -    0   0   0   0 lockd/2
 24299 garyli   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 ksh/1
 21221 garyli   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 dtcalc/1
 10102 garyli   0.0 0.0 0.0 0.0 0.0 0.0 100 0.0   0   0   0   0 ksh/1
   212 root     0.0 0.0  -   -   -   -  100  -    0   0   0   0 automountd/3
   141 root     0.0 0.0  -   -   -   -  100  -    0   0   0   0 ypbind/1
Total: 165 processes, 263 lwps, load averages: 0.47, 0.34, 0.18

So we see that find(1) is spending 13% of its time in sys. Hmmm tarzan say sys is bad - better log a bug. But wait, if we look at the SCL column we see it's made 9,000 system calls so no wonder it spent a lot if time in sys. Every time we make a system call, we are doing stuff in sys - so that explains that. If we really wanted to we could do a truss -c and see which syscalls are made, and we'd see stuff like chdir() lstat() fstat() and so on. Mystery solved.

So now you can drill down through your vmstat->mpstat->prstat to see who is using all that CPU. Of course it's pretty useless for short lived processes but you could always use regular accounting for that.

( Jul 16 2004, 03:28:15 PM BST ) Permalink