Solaris Quiz #2 (Which file)
A fairly simple quiz which probably has several answers. Here we go: We have a process which is already running. We dont want to restart the process. It is writing a lot of data to a file but we dont know which one. We suspect process has a lot of files open. How do we find the name of the file that we are writing to?
Restrictions
- The OS is Solaris 8
- We can become root
The prize will be whatever I can scrounge. The winner will not necessarily the first person to answer, but either the most elegant answer, or failing that the answer which meets my arbitrary and as yet unspecified criteria.
If anyone from Sun is reading this and knows whom I can contact for some prizes please email me ( Jul 30 2004, 03:38:58 PM BST ) Permalink Comments [1]
Solaris Quiz - answer
The correct answer was that ksh has 'true' as a built in function to the korn shell whereas the bourne shell execs a separate binary /usr/bin/true. So in vmstat/mpstat we see a lot of time attributed to 'sys' even though our instinct is that 'while true do' should be userland only. At least this sounds like a good theory. How to prove it though? Today Matthew our I'm going to be using truss(1). Im going to create 2 scripts#!/bin/sh while true do : done
and
#!/bin/ksh while true do : done
So now we have a couple of scripts to trace. Im going to use truss with -f so that we 'follow' any forked processes and -c to 'count' the number of system calls. We use '-f' because otherwise we'll only see the forked sub-shell and not the script that we are actually trying to measure. So what do we actually see? First of all using korn shell, then bourne shell.
garyli@podtwo[/var/tmp] $truss -fc ./x.ksh
^Csyscall seconds calls errors
read .000 3
open .000 12 1
close .000 16 2
time .000 2
brk .000 6
stat .000 1
getpid .000 2
getuid .000 2
fstat .000 11
getgid .000 2
ioctl .000 3 2
execve .000 1
fcntl .000 6
sigprocmask .000 1
sigaction .000 19
mmap .002 18
munmap .001 5
memcntl .000 4
llseek .000 2
resolvepath .001 8
stat64 .000 3
fstat64 .000 3
open64 .000 2
-------- ------ ----
sys totals: .012 132 5
usr time: .011
elapsed: 5.400
So here we see only 1 execve call. Now when we run the same script but under bourne shell...
garyli@podtwo[/var/tmp] $truss -fc ./x.sh
^Csyscall seconds calls errors
_exit .020 119
fork .079 120
read .040 477
open .080 721 121
close .020 959 120
brk .160 1081
stat .010 121
getpid .030 240
getuid .000 241
fstat .000 600
access .000 1
getsid .010 120
getpgid .010 120
getgid .000 240
ioctl .047 716 358
execve .310 121
fcntl .010 240
sigaltstack .020 120
sigaction .141 6360
waitid .053 238
mmap .181 1321
munmap .060 360
memcntl .040 240
sysconfig .010 240
resolvepath .100 722
stat64 .002 5 4
open64 .010 120
-------- ------ ----
sys totals: 1.450 15963 603
usr time: .461
elapsed: 5.370
We see 121 in the execve row, meaning that we execec'd 121 sub processes well 120 (less the exec of the subshell). Now the number of execs is dwarfed by say, the number of sigaction calls, so why concentrate on exec? Well the sigactions are just a part of what happens when some processes are exec'd. An individual truss -fc against /usr/bin/true shows 53 calls to sigaction (120 * 53 = 6360)
So, I think there's just one more question to be answered here. Why do we need to make 53 calls to sigaction for a binary that just returns 'true'. The answer is here...
garyli@podtwo[/var/tmp] $file /usr/bin/true /usr/bin/true: executable /usr/bin/sh script
So.... 'true' is actually a script...
garyli@podtwo[/var/tmp] $cat /usr/bin/true #!/usr/bin/sh # Copyright (c) 1984, 1986, 1987, 1988, 1989 AT&T # All Rights Reserved # THIS IS UNPUBLISHED PROPRIETARY SOURCE CODE OF AT&T # The copyright notice above does not evidence any # actual or intended publication of such source code. #ident "@(#)true.sh 1.6 93/01/11 SMI" /* SVr4.0 1.4 */
So we are starting an entire shell just to exit with '0' and thus give us the 'true' we want. Shells need to have signal handlers to correctly handle CTRL-Z and so on. That's all a bit superfluous for 'true'. We can do something a lot simpler than that such as...
int main()
{
return 0;
}
We can then compile that code and call it newtrue. Now when we run the script and get it to call 'newtrue' we get a truss output similar to this...
garyli@podtwo[/var/tmp] $truss -fc ./x1.sh
^Csyscall seconds calls errors
_exit .050 146
fork .101 147
read .000 1
open .090 594 148
close .010 448 1
brk .120 447
stat .050 148
getpid .040 148
getuid .000 2
fstat .020 446
getsid .000 1
getpgid .000 1
getgid .000 2
ioctl .019 294 1
execve .530 148
fcntl .000 2
sigaltstack .000 1
sigaction .001 53
waitid .063 292
mmap .141 1187
munmap .030 297
memcntl .050 149
sysconfig .000 2
resolvepath .220 741
open64 .000 1
-------- ------ ----
sys totals: 1.541 5698 150
usr time: .232
elapsed: 4.760
Now we see 148 execs but only 53 calls to sigaction. It's all a bit trivial, but shows quite nicely how much information that we can get out of an old-school tool like truss. If only everything were that simple.
And Finally. Congratulations to David Morris from the UK on getting the first post. ( Jul 27 2004, 03:15:36 PM BST ) Permalink
smalltownboy
So it's the weekend, and time to take the Hawk out for a spin.
Everyone should have one, well at least I should have one, but alas I was just borrowing this from BAE. One of the very few benefits of living in Aldershot, in addition to the famous gentlemens club (think Groucho club, then think again) is that the international airshow that is Farnborough is close enough to walk to. It's a great day out, albeit a bit bizarre seeing small children running around an exhibition hall dedicated to more efficient ways of killing people.
Anyhow on the tech side, our Blueprints CD winner made an interesting point which was that for all the talk about dtrace, there are many folks using Solaris who are still in the dark about truss and a lot of the tracing tools we have in production versions of Solaris. Would a short tutorial and/or hints 'n' tips on truss, p* tools and so on be of interest? If you'd like to see some part of Solaris discussed drop me a line via the comments or gjl@sun.com ( Jul 26 2004, 11:05:46 PM BST ) Permalink
Oracle on Network Attached Storage + blue boxin'
so it's not the most thrilling subject, but I found this document on BigAdmin a while ago, and just got around to reading it. The document contains a lot of good tips on setting up Oracle on NAS (Particularly NetApps). The impressive thing I found was that the engineers use real measured data to qualify much of what they recommend. The authors are from Sun's performance group, and from NetApp themselves. Check it out for general Solaris/Oracle/NFS configuration guidelines also.
And by way of apology for such a dry and boring blog entry, please enjoy this gratuitous picture of an original Woz designed blue box I found at the Coumputer History museum. Nice.
( Jul 21 2004, 10:02:21 PM BST )
Permalink
Solaris quiz
OK,
Here's a quick quiz and it's something that stumped me for a while. On the same system (SunOS 5.10, SPARC, Ultra10 - not that it matters) I run the same script under two different shells. When I use vmstat to monitor the system I seem to get totally different results. Under a ksh I get this
kthr memory page disk faults cpu r b w swap free re mf pi po fr de sr dd f0 s1 -- in sy cs us sy id 0 0 15 480384 121968 6 22 1 0 0 0 0 0 0 0 0 419 292 233 4 3 94 0 0 29 322856 35840 0 0 0 0 0 0 0 0 0 0 0 407 573 302 76 1 23 0 0 29 322856 35840 0 0 0 0 0 0 0 0 0 0 0 401 564 291 99 1 0 0 0 29 322856 35840 0 0 0 0 0 0 0 0 0 0 0 402 582 286 99 1 0 0 0 29 322856 35840 0 0 0 0 0 0 0 2 0 0 0 417 583 293 98 2 0
but under bourne shell (sh) I get this - hint look at the sys Vs usr column
kthr memory page disk faults cpu r b w swap free re mf pi po fr de sr dd f0 s1 -- in sy cs us sy id 0 0 15 480384 121968 6 22 1 0 0 0 0 0 0 0 0 419 292 233 4 3 94 0 0 29 322576 35536 251 2636 0 0 0 0 0 0 0 0 0 1072 6707 1052 19 69 12 3 0 29 322704 35576 187 2007 13 0 0 0 0 1 0 0 0 938 5470 849 17 74 9 0 0 29 322888 35736 238 2531 0 0 0 0 0 4 0 0 0 1067 6520 1044 17 72 11 0 0 29 322880 35728 244 2606 0 0 0 0 0 0 0 0 0 1072 6673 1038 17 72 11The vmstat was collected using vmstat 5. The script is very simple
garyli@arches[/home/garyli] $while true > do > : > doneWhy is that? What is happening here? The first correct answer will get a Sun Blueprints CD. ( Jul 21 2004, 10:12:47 AM BST ) Permalink Comments [3]
You shouldn't laugh but
Personally I found this story 'Reverse air-rage' on Russian jet hilarious! ( Jul 21 2004, 09:57:24 AM BST ) PermalinkGood news for bad databases...
I just closed a long running escalation whereby old oracle connections would appear to hang, and new ones could not be established. A truss of a new sqlplus session showed that the oracle process was sleeping on a semop system call.sqlplus/1: read(10, 0x10024FF66, 2064) (sleeping...)^M oracle/1: semop(1245185, 0xFFFFFFFF7FFFAF9C, 1) (sleeping...)^M sqlplus/1: 556.9328 Received signal #2, SIGINT, in read() ^^^Here we see the process is eventually killed with CTRL-CAnyhow the fix was to apply Oracle update 9.2.0.5 (9205) which according to my Oracle contacts contains around 1,100 fixes. 9.2.0.4 contained in the region of 800 compared to 9.2.0.3 Cor blimey, what a whopper . ( Jul 19 2004, 04:12:07 PM BST ) Permalink
I was a teenage raver...
But that was a long time ago, these days things are a bit more sedate. However, I did manage to get along to a gig of sorts this weekend. Ruth won two tickets to see Pavarotti at Leeds Castle which is in Kent, England. The prize included a hotel stay, hospitality (drinks and small portions of food) excellent seats ( we sat just behind John Simpson and he works for the BBC you know) and dinner.
Anyhow regarding the concert itself, I was totally unprepared for the awsome sound of the full orchestra on stage. I now think that one of the reasons I have never really appreciated orchestral music is down to the fact that it is very hard to reproduce on a home Hi-Fi. So I now have an excuse to save up for the Linn kit I've always wanted. Of course being a commoner doesnt help either, but I cant change that.
Anyhow after the gig/concert we got to eat in the castle itself. Apparently the room we ate in was used by no less than King Henry VIII.
Heres a picture of the castle:-
And the dining room, featuring pictures of some of his adoring wives...some of which he decided to kill.
The music was great but I dont think Im about to become an opera fan anytime soon, though I'd like to check out some more orchestral/classical pieces and you cant beat being outdoors whilst your at it. If you ever get the chance, give it a try. ( Jul 19 2004, 10:26:45 AM BST ) Permalink
Oi! who said you could use that CPU
I just happened to stumble across a very useful switch to prstat, which I either forgot about or never knew existed. The magic switch is -m which displays microstate accounting information. Woo hoo you say, WTF does that mean? Well it means that you can see which processes are running in USR and which are using SYS e.g.
PID USERNAME USR SYS TRP TFL DFL LCK SLP LAT VCX ICX SCL SIG PROCESS/NLWP
50 garyli 2.4 13 0.0 0.0 0.0 0.0 74 9.7 1K 169 9K 0 find/1
26632 garyli 0.5 0.9 0.0 0.0 0.0 0.0 99 0.0 2 6 550 0 prstat/1
47 garyli 0.4 0.6 0.0 0.5 0.0 0.0 98 0.8 4 7 544 0 prstat/1
22201 garyli 0.5 0.4 0.0 0.0 0.0 0.0 95 4.4 251 4 745 0 acroread/1
21140 garyli 0.0 0.0 0.0 0.1 0.0 17 83 0.1 130 25 210 0 netscape-bin/6
24289 garyli 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 1 0 9 0 dtterm/1
52 garyli 0.0 0.0 0.0 0.0 0.0 0.0 100 0.1 5 2 10 0 dtscreen/1
13896 garyli 0.0 0.0 0.0 0.0 0.0 53 46 0.2 215 1 189 0 java/15
169 daemon 0.0 0.0 - - - - 100 - 0 0 0 0 statd/2
170 daemon 0.0 0.0 - - - - 100 - 0 0 0 0 lockd/2
24299 garyli 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 ksh/1
21221 garyli 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 dtcalc/1
10102 garyli 0.0 0.0 0.0 0.0 0.0 0.0 100 0.0 0 0 0 0 ksh/1
212 root 0.0 0.0 - - - - 100 - 0 0 0 0 automountd/3
141 root 0.0 0.0 - - - - 100 - 0 0 0 0 ypbind/1
Total: 165 processes, 263 lwps, load averages: 0.47, 0.34, 0.18
So we see that find(1) is spending 13% of its time in sys. Hmmm tarzan say sys is bad - better log a bug. But wait, if we look at the SCL column we see it's made 9,000 system calls so no wonder it spent a lot if time in sys. Every time we make a system call, we are doing stuff in sys - so that explains that. If we really wanted to we could do a truss -c and see which syscalls are made, and we'd see stuff like chdir() lstat() fstat() and so on. Mystery solved.
So now you can drill down through your vmstat->mpstat->prstat to see who is using all that CPU. Of course it's pretty useless for short lived processes but you could always use regular accounting for that.
( Jul 16 2004, 03:28:15 PM BST ) Permalink
