Monday Mar 24, 2008
One of the joys of UNIX is that, even after 16 years of using it, there is still the opportunity to trip over new useful or interesting commands and features which have been around for a while. Even better if you find something when you are not looking for it.
NAME
quot - summarize file system ownership
SYNOPSIS
quot [-acfhnv] filesystem...
quot -a [-cfhnv]
DESCRIPTION
quot displays the number of blocks (1024 bytes) in the named
filesystem (one or more) currently owned by each user. There
is a limit of 2048 blocks. Files larger than this will be
counted as a 2048 block file, but the total block count will
be correct.
This is the output from a root filesystem on a V890 with Solaris 10. The -f option adds the second column, the number of files owned by each user. The -v option is interesting in that it adds 3 extra columns: blocks not accessed in the last 30, 60 and 90 days. I have yet to think of a use for -v, but I am sure we will find one.
# quot -f -v /
/dev/rdsk/c1t0d0s0:
3956446  173808  root      3679449  3491876  3301515
    901      73  uucp          503      502      467
    133      19  adm            21       21        4
    118     115  smmsp          57        4        0
     74       5  svctag          3        2        0
     48      41  noaccess       47        4        0
     28      28  lp             24       23        2
     11       3  bin            11       11       11
      8       8  daemon          8        4        0
      8      16  nobody          8        8        0
      8       8  postgres        8        8        0
      4       4  gdm             4        4        0
      4       4  webservd        4        4        0
      3       9  clivek          1        1        1
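One possible use for -v, offered only as a sketch: rank users by how much of their space has sat untouched for 90 days. The function name stale_by_user is made up for illustration, and the column order (blocks, files, user, then the 30, 60 and 90 day columns) is assumed from the output above.

```shell
# stale_by_user (hypothetical helper): read `quot -f -v` output on stdin
# and print "user stale90 percent" per user, largest stale90 first.
# Assumed columns: blocks files user idle30 idle60 idle90.
stale_by_user() {
    awk 'NF == 6 && $1 > 0 {
        # $6 = blocks not accessed in 90 days, $1 = total blocks
        printf "%s %d %.1f\n", $3, $6, 100 * $6 / $1
    }' | sort -k2,2nr
}
```

It would be fed with something like quot -f -v / | stale_by_user. The device header line has a single field, so the NF == 6 test skips it.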
Friday Mar 14, 2008
I use a MacBook for travelling as it is light and the battery lasts a few hours (compared to the 45 minutes for the Ferrari!). I have been using Solaris under Parallels for over a year, but the integration with OS X never quite cut it.
I got round to installing VMware Fusion yesterday and was pleased that it just worked. Once VMware Tools was installed, the network just worked (apart from ssh, which was disabled on the OS X side). I used Indiana (OpenSolaris Developers Preview 2) as I wanted to play with the new packaging framework.
I don't have any great hints or tricks beyond following the instructions and, once the guest is installed, going to the "Virtual Machine" menu and choosing "start VMWARE tools Installation" from the drop-down.
Fusion might actually be good enough for me to fork out my own money when the evaluation period comes to an end!
Thursday Dec 27, 2007
I have been a Sun OS Ambassador since 1999. Most of my focus has been to maintain the relationship Sun has with Universities in the UK, so I have taken the unofficial role of UK OS Ambassador for Education. I worked at a University as a Systems Administrator and as a researcher, and did some teaching, so it is a natural fit for me.
It's really good to see Universities at the top of Sun's corporate agenda again in a meaningful way. The most visible activity which shows Sun is *really* serious about education again is the Campus Ambassador programme.
I thought it was worth listing the range of activities I have been involved in as OS Ambassador since 1999. If you are either a UK Campus Ambassador or work in a UK University and think that I might be of use, it would be a pleasure to help or pay you a visit, or to "get the right person" to do so.
- Solaris technology demo. Has been DTrace, ZFS, Zones over the last few years, but a wider range is possible.
- Invited talks to 2nd year Operating Systems courses.
- PhD external examiner in areas related to Operating or Distributed Systems.
- Our new system is slow/does not work and we can't get this issue resolved (yes, it does happen, though not very often and working in Service really helps on this one).
- I don't know who else to talk to in Sun, I know you are not the right person, but can you find them.
- We need someone independent to help us with Strategy Facilitation (I can do quite a good impression of a Management Consultant without any dress sense). Most of this is using techniques from the Kepner Tregoe Rational Process toolbox.
- Free performance analysis via SharedShell if I can blog about it.
- Student Mock Interviews and "how to prepare for interviews in industry" lectures.
- Department Industrial Liaison boards.
- Account team has a quick technical question and does not know where else to ask it.
- Outside/Independent member of an interview panel.
UK OS Ambassador for Education is one of my hobbies, a bit like rock climbing, fell running or restoring a 1973 Muir-Hill A5000. Having a day job in Support Services means I only get to spend a day or 2 a month outside the support dungeon doing Ambassador type work.
If you work in a UK university and think any of the above might help you, drop me an email and I look forward to meeting you.
Thursday Aug 30, 2007
I have been to a couple of customer sites this year
where between 1/6 and 1/4 of a 20-40 CPU system has been consumed
by various types of monitoring. Over-monitoring is a contributor to
scalability issues, which in turn cause the customer to introduce
additional monitoring. If you run large systems by CPU/core count
(this now includes the T2000), please read on. We introduce a few
principles along the way and summarise them at the end.
One of the downsides of any monitoring is that it
has overhead. This is often cited as the Heisenberg
Principle, but should be called the Observer
Effect when discussing computer systems. We shall put to
one side a cat in a box and any associated philosophy. Let us agree
that if you try to look at a system you will change it, and the deeper
you look the more overhead you add. kstats and DTrace are a best
case, but an overhead still exists. One enabled DTrace probe has a
tiny overhead; 30,000 enabled probes have more. Few customers buy
our systems to run monitoring software; most use our systems to
support their business. So Principle 1: monitor what you care
about now to solve today's business problem, not what you might care
about in future.
Frequent use of /proc (with events spaced a small
number of seconds apart) is a big deal. The procfs
filesystem has a goal of giving correct data at the point in time
that an observation is made. When a choice needs to be made, Solaris
architects choose correct and slower over faster and misleading. This
trade-off means that, in the case of process monitoring, you can't have
both correct and performant. It's a lot of work to give a consistent
picture of a process. A ps -ef on our Sun Ray server, with in
the region of 4000 processes, causes a total of 3,226,292 kernel fbt
probes to fire. /proc is not a lightweight interface, so we
need to be selective about its use.
/proc is also very lock heavy; it needs to be
to ensure it gives a consistent picture. Every /proc operation
acquires and releases proclock and pidlock,
among other locks. Being lock heavy means it's very easy
to write applications which don't scale if /proc is used on a
regular basis. I was involved in a hot customer issue where an early
SF15K did not scale beyond 28 CPUs for a particular in-house Java
application. The highly threaded application used the pr_fname field
of the curprpsinfo structure to get the process name for every
operation. The process name never changes, and the developers had no
idea a 3rd party native library used /proc
in this way. The SMTX column in mpstat lit up and lockstat -C
showed the errant kernel call stack very clearly. It was an easy fix
to the application once it was pointed out, with a huge drop in system
time once the library cached the program name. Which leads to
principle 2: If you need to do it often, don't do it often with procfs.
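The fix in that story was to look the value up once and reuse it. A minimal shell sketch of the same idea follows. The names get_proc_name, PROC_NAME and lookups are illustrative, not anything from the customer's library, and ps merely stands in for the /proc read.

```shell
# Count how many times we really go to ps (and so to /proc).
lookups=0

# get_proc_name (illustrative): set PROC_NAME to this shell's command
# name, asking ps only on the first call and reusing the answer after.
get_proc_name() {
    if [ -z "${PROC_NAME:-}" ]; then
        PROC_NAME=$(ps -p $$ -o comm=)
        lookups=$((lookups + 1))
    fi
}
```

Call get_proc_name wherever the name is needed and read $PROC_NAME afterwards. However many calls are made, lookups stays at 1.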
The Solaris kernel does not store the size of a
process's address space at the current point in time. It's typically
only needed by tools such as ps, so the /proc interface
ps_procinfo??? calculates it as and when needed. This is a
non-trivial operation, as each segment you see in the pmap
output for a process needs to have the appropriate vnode locked,
the size taken and the vnode unlocked. It is not unusual to have
processes with in excess of 100,000 mapped segments. With 30 such
processes you can see that procfs, on behalf of ps,
needs to do a lot of work to return the size of the address space of
a process. A couple of ps instances running at the same time,
along with prstat and top, and quite a bit of your very
busy multi-million dollar database server gets consumed.
The most successful approach to system monitoring
that I have observed our customers employ is to monitor at a business
level, such as user-experienced response time. For some application
types it takes a lot of work to get beyond measuring how often the
help desk or CIO's phone rings and the word "slow" is uttered.
Getting quantitative metrics which give a useful representation of
the user experience, and which provide a clear trigger when it
degrades, is a highly non-trivial task. This may explain why many
organisations have relied on system level metrics such as user/system
time ratio or even I/O wt time (don't even joke about using wt)!
System level metrics typically only confuse the process of resolution
if used out of context with the business problem. Taking system
level metrics outside the context of the flow of data to and from the
user (be it human or silicon based) typically leads to a colourful
festival of explaining incrementing values of obscure kstats, rather
than solving business problems and establishing actual root cause.
Which leads to principle 3: Use only measures and metrics you
fully understand.
I was asked by an account manager to pay a visit to a
customer where staff investigating a SAP performance issue had
been flown from the UK to Hong Kong to conduct network tests, with a
result of "it's not the network".
A morning of understanding the business need, the system (people,
software, computers, networks)
and the interaction between components, followed by 10 minutes of
DTrace (truss would have done fine), showed the problem to be
the efficiency of coding of a SAP script. I can't spell SAP, but
following the flow of data between components, and observing with the
intent of answering the SGRT questions of "where on the
object" and "when in the lifecycle", has not let me down yet.
Which leads to principle 4: Strong and relevant business
metrics avoid wrong turns.
There is a school of thought that suggests a system
should have no system level performance monitoring enabled on the box
itself unless a baseline is being established, a problem is being
pursued or data is being collected for a specific capacity planning
exercise. For the most part I agree, based on the observation that
continual low level system monitoring, on balance, causes more
problems than it addresses. Storage monitoring products in particular
appear adept at consuming a CPU or 2.
One of the Sun support tools, GUDS, shares some of the
same concerns. It's important to get the purpose of GUDS into
perspective. GUDS is intended to capture as much potentially useful
data as possible in one shot such that we get useful data for most
types of performance problem. Thus we accept that there is a non-zero
overhead, and must allow for it in the analysis
and be judicious in its use. Like any tool, it's the context in which
you use GUDS that counts and can add great value. We (Global
Performance V-team) often get asked to have a look at GUDS output
and diagnose a situation where the problem definition is "system
going slow". GUDS is a first-pass tool for when you don't
have a decisive problem definition or you want to gather baseline
data.
GUDS adds load to the system it monitors. ::memstat
in mdb, for example, takes 1 CPU and non-trivial numbers of
cross-calls to walk each page in memory and determine what the page
is used for. TNF and lockstat also add an overhead. GUDS, when used in
the right context, with the right -X option, is highly effective. As
the Welsh rugby commentator and ex-international Jonathan
Davies notes in a different context, it's the top 2 inches that
count.
This leads us to principle 5: Use system level
metrics only in the context of understanding a business-led
performance issue.
I have mentioned the need for relevant business
metrics, itself a huge and complex subject, to replace obtrusive
system level monitoring as the trigger to investigate when a problem
that impacts the business arises. If you set out on a journey it
helps to know your objective; business level metrics assist in
knowing when you are making progress and when you are done. It is
also curious how often capacity planning gets confused with business
metrics.
Back to /proc. Some useful one-liners for finding
overhead that is just overhead.
Let's see what processes are using /proc over 60
seconds:
dtrace -n procfs::entry'{@[execname] = count()}' -n tick-60s'{exit(0)}'
For an application proc_pig, let's find the
user-land stack which causes procfs to be called:
dtrace -n procfs::entry'/execname == "proc_pig"/{@[ustack()] = count()}' -n tick-60s'{exit(0)}'
One of the DTrace demo scripts is very useful for
highlighting those monitoring processes which spawn many child
processes.
dtrace -s /usr/demo/dtrace/whoexec.d -n tick-60s'{exit(0)}'
ps (or pgrep) is often used in scripts
to determine if a child process, identified either by name or by PID,
is still running. ps is a process monitoring lump hammer and
its use in process state scripts is architecturally questionable,
more so with the advent of SMF in Solaris 10.
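Suppose you have a script that does something along
the lines of
while true
do
    # walk every process and look for an exact match on the PID column
    ps -ef | awk '{print $2}' | egrep "^${PID}\$" > /dev/null
    if [ $? != 0 ]
    then
        restart_process    # placeholder for whatever restarts the process
    fi
    sleep 10
done
to restart the process with process ID $PID if it dies.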
A step in the right direction in reducing the
overhead is to use
ps -p $PID > /dev/null
in place of the ps -ef pipeline. If the process does not
exist, then a non-zero return code is given, and when the process does
exist, the overhead of traversing every process and calculating its
address space size is avoided.
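Put together, the lighter-weight check might look like the sketch below. The function names is_running and watch_pid are made up for illustration, and whatever restarts the process is passed in by the caller. It is still polling, so treat it as a stopgap rather than a finished design.

```shell
# is_running PID (illustrative): succeed if PID exists. ps -p looks at
# one process instead of walking the whole table as ps -ef does.
is_running() {
    ps -p "$1" > /dev/null 2>&1
}

# watch_pid PID INTERVAL COMMAND... (illustrative): poll every INTERVAL
# seconds until PID has gone, then run COMMAND once, e.g. a restart.
watch_pid() {
    pid=$1
    interval=$2
    shift 2
    while is_running "$pid"; do
        sleep "$interval"
    done
    "$@"
}
```

A call such as watch_pid "$PID" 10 restart_process would stand in for the loop above, where restart_process is again a placeholder for your own restart command.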
To do the proper job, let SMF take the strain and
manage it for you. The underlying contracts framework detects if a
process dies and gets it restarted. One of the best places to learn
about writing your own services is BigAdmin.
This leads us to principle 6: Use the right
architecture and tools for service management.
Writing an SMF service to restart a process, while
not a trivial task, is easier and less error prone than writing
an efficient and correct shell script!
In summary, we have touched on a number of topics
which relate to monitoring and the resultant impact on overall system
performance. The obvious open question is how we generate
meaningful business metrics beyond how long a batch job takes to run
or how often the phone rings. It's a tough subject as most situations
are unique and I would be interested in real examples of useful
non-trivial business metrics in complex environments.
1. Monitor what you care about now to solve today's business problem, not what you might care about in future.
2. If you need to do it often, don't do it often with procfs.
3. Use only measures and metrics you fully understand.
4. Strong and relevant business metrics avoid wrong turns.
5. Use system level metrics only in the context of understanding a business-led performance issue.
6. Use the right architecture and tools for service management.
If you can think of any more from your experiences,
please drop me an email or add a comment.