The Good, the Blog & the Ugly - Tim Cook's Weblog

« Previous page | Main | Next page »

http://blogs.sun.com/timc/date/20080421 Monday April 21, 2008

Comparing the UltraSPARC T2 Plus to Other Recent SPARC Processors

Update - now the UltraSPARC T2 Plus has been released, and is available in several new several Sun servers. Allan Packer has published a new collection of blog entries that provide lots of detail.

Here is my updated table of details comparing a number of current SPARC processors. I can not guarantee 100% accuracy on this, but I did quite a bit of reading...

Name UltraSPARC IV+® SPARC64TM VI UltraSPARCTM T1 UltraSPARCTM T2 UltraSPARCTM T2 Plus
Codename Panther Olympus-C Niagara Niagara 2 Victoria Falls
Physical
process 90nm 90nm 90nm 65nm 65nm
die size 335 mm2 421 mm2 379 mm2 342 mm2
pins 1368 1933 1831
transistors 295 M 540 M 279 M 503 M
clock 1.5 – 2.1 GHz 2.15 – 2.4 GHz 1.0 – 1.4 GHz 1.0 – 1.4 GHz 1.2 – 1.4 GHz
Architecture
cores 2 2 8 8 8
threads/core 1 2 4 8 8
threads/chip 2 4 32 64 64
FPU : IU 1 : 1 1 : 1 1 : 8 1 : 1 1 : 1
integration 8 × small crypto 8 × large crypto, PCI-E, 2 × 10Gbe 8 × large crypto, PCI-E, multi-socket coherency
virtualization domains1 hypervisor
L1 i$ 64K/core 128K/core 16K/core
L1 d$ 64K/core 128K/core 8K/core
L2 cache (on-chip) 2MB, shared, 4-way, 64B lines 6MB, shared, 10-way, 256B lines 3MB, shared, 12-way, 64B lines 4MB, shared, 16-way, 64B lines
L3 cache 32MB shared, 4-way, tags on-chip, 64B lines n/a n/a
MMU on-chip
on-chip, 4 × DDR2 on-chip, 4 × FB-DIMM on-chip, 2 × FB-DIMM
Memory Models TSO TSO TSO, limited RMO
Physical Address Space 43 bits 47 bits 40 bits
i-TLB 16 FA + 512 2-way SA 64 FA
d-TLB 16 FA + 512 2-way SA 64 FA 128 FA
combined TLB 32 FA + 2048 2-way SA
Page sizes 8K, 64K, 512K, 4M, 32M, 256M 8K, 64K, 512K, 4M, 32M, 256M 8K, 64K, 4M, 256M
Memory bandwidth2 (GB/sec) 9.6 25.6 60+ 32

Footnotes

  • 1 - domains are implemented above the processor/chip level
  • 2 - theoretical peak - does not take cache coherency or other limits into account

Glossary

  • FA - fully-associative
  • FPU - Floating Point Unit
  • i-TLB - Instruction Translation Lookaside Buffer (d means Data)
  • IU - Integer (execution) Unit
  • L1 - Level 1 (similarly for L2, L3)
  • MMU - Memory Management Unit
  • RMO - Relaxed Memory Order
  • SA - set-associative
  • TSO - Total Store Order

References:

http://blogs.sun.com/timc/date/20080409 Wednesday April 09, 2008

What Drove Processor Design Toward Chip Multithreading (CMT)?

I thought of a way of explaining the benefit of CMT (or more specifically, interleaved multithreading - see this article for details) using an analogy the other day. Bear with me as I wax lyrical on computer history...

Deep back in the origins of the computer, there was only one process (as well as one processor). There was no operating system, so in turn there were no concepts like:

  • scheduling
  • I/O interrupts
  • time-sharing
  • multi-threading

What am I getting at? Well, let me pick out a few of the advances in computing, so I can explain why interleaved multithreading is simply the next logical step.

The first computer operating systems (such as GM-NAA I/O) simply replaced (automated) some of the tasks that were undertaken manually by a computer operator - load a program, load some utility routines that could be used by the program (e.g. I/O routines), record some accounting data at the completion of the job. They did nothing during the execution of the job, but they had nothing to do - no other work could be done while the processor was effectively idle, such as when waiting for an I/O to complete.

Then muti-processing operating systems were developed. Suddenly we had the opportunity to use the otherwise wasted CPU resource while one program was stalled on an I/O. In this case the O.S. would switch in another program. Generically this is known as scheduling, and operating systems developed (and still develop) more sophisticated ways of sharing out the CPU resources in order to achieve the greatest/fairest/best utilization.

At this point we had enshrined in the OS the idea that CPU resource was precious, not plentiful, and there should be features designed into the system to minimize its waste. This would reduce or delay the need for that upgrade to a faster computer as we continued to add new applications and features to existing applications. This is analogous to conserving water to offset the need for new dams & reservoirs.

With CMT, we have now taken this concept into silicon. If we think of a load or store to or from main (uncached) memory as a type of I/O, then thread switching in interleaved multithreading is just like the idea of a voluntary context switch. We are not giving up the CPU for the duration of the "I/O", but we are giving up the execution unit, knowing that if there is another thread that can use it, it will.

In a way, we are delaying the need to increase the clock rate or pipe-lining abilities of the cores by taking this step.

Now the underlying details of the implementation can be more complex than this (and they are getting more complex as we release newer CPU architectures like the UltraSPARC T2 Plus - see the T5140 Systems Architecture Whitepaper for details), but this analogy to I/O's and context switches works well for me to understand why we have chosen this direction.

To continue to throw engineering resources at faster, more complicated CPU cores seems to be akin to the idea of the mainframe (the closest descendant to early computers) - just make it do more of the same type of workload.

See here for the full collection of UltraSPARC T2 Plus blogs

http://blogs.sun.com/timc/date/20080221 Thursday February 21, 2008

Margins in Consumer Telephony

Here is a little observation on telephone margins that is dear to my heart. Below is a list of rates (in US dollars per minute, taxes and other fees not shown) for various methods of calling from the US to a land-line in Australia. The last four options use VoIP.

Source Carrier Add-on Plan Add-on $/month Rate
Land-line AT&T none – Peak - $4.00
Land-line AT&T none – Off-peak - $2.76
Mobile AT&T none - $3.49
Mobile AT&T World Connect $3.99 $0.09
Land-line AT&T Occasional Calling $1.00 $1.75
Land-line AT&T Worldwide Value Calling $5.00 $0.09
Land-line Time-Warner Cable
- $0.10
Land-line Comcast

$0.09
Land-line Vonage

$0.05
Land-line AT&T CallVantage

$0.04
Land-line Callcentric

$0.0231
Land-line CallWithUs

$0.0148

As you may see, there is a 27000% range in these numbers. Even with that one carrier there is a 100x range. Plenty of opportunity for profit.

Hopefully it is useful to be aware there can be some very steep rates for ex-pat Aussies to call home if they are away from their preferred carrier.

I have been quite satisfied with CallWithUs, if anyone is interested. They even have a call-back feature if I want to call from my mobile.

While I'm on the topic, I should also mention this helpful message I got from my wireless (mobile) provider (although they are no longer my provider):

When you're on the go and don't have the info you need, AT&T 411 is here to help. Whether you're searching for a business or residence - dial 4-1-1 to get quick access to phone numbers and addresses. Plus, with AT&T 411 you can find movie times, driving directions and more. And it's just $1.79 per call plus standard airtime charges.*

Thanks for the reminder - I will be vigilant to avoid that $1.79 charge, and stick to 1-800-FREE411...

http://blogs.sun.com/timc/date/20080213 Wednesday February 13, 2008

Utilization - Can I Have More Accuracy Please?

Just thought I would share another Ruby script - this one takes the output of mpstat, and makes it more like the output of mpstat -a, only the values are floating point. I wrote it to process mpstat -a that I got from a customer. It can also cope with the date (in Unix ctime format) being prepended to every line. Here is some sample output:

CPUs minf mjf xcal  intr ithr  csw icsw migr smtx  srw syscl  usr sys  wt idl
 4 7.0 0.0 26.0 114.25 68.0 212.75 16.75 64.75 11.75 0.0 141.25 1.0 1.0 0.0 98.5
 4 0.75 0.0 929.75 2911.5 1954.75 10438.75 929.0 4282.0 715.0 0.0 6107.25 39.25 35.75 0.0 25.25
 4 0.0 0.0 892.25 2830.25 1910.5 10251.5 901.5 4210.0 694.5 0.0 5986.0 38.5 35.0 0.0 26.75
 4 0.0 0.0 941.5 2898.25 1926.75 10378.0 911.75 4258.0 698.0 0.0 6070.5 39.0 35.5 0.0 25.25
 4 0.0 0.0 893.75 2833.75 1917.75 10215.0 873.75 4196.25 715.25 0.0 5925.25 38.0 34.75 0.0 27.25

The script is here.

Interestingly, you can use this to get greater accuracy on things like USR and SYS than you would get if you just used vmstat, sar, iostat or mpstat -a. This depends on the number of CPUs you have in your system though.

Now, if you do not have a lot of CPUs, but still want greater accuracy, I have another trick. This works especially well if you are conducting an experiment and can run a command at the beginning and end of the experiment. This trick is based around the output of vmstat -s:

# vmstat -s
[...]
 54056444 user   cpu
 42914527 system cpu
1220364345 idle   cpu
        0 wait   cpu

Those numbers are "ticks" since the system booted. A tick is usually 0.01 seconds.

NEW: I have now uploaded a script that uses these statistics to track system-wide utilization.

http://blogs.sun.com/timc/date/20071102 Friday November 02, 2007

Comparing the UltraSPARC T2 to Other Recent SPARC Processors

This is now a placeholder. You probably want to read my updated blog on SPARC processor details to get the latest.

http://blogs.sun.com/timc/date/20070831 Friday August 31, 2007

How Event-Driven Utilization Measurement is Better than Sample-Based

...and how to measure both at the same time

With the delivery of Solaris 10, Sun made two significant changes to how system utilization is measured. One change was to how CPU utilisation is measured

Solaris used to (and virtually all other POSIX-like OS'es still) measure CPU utilisation by sampling it. This happened once every "clock tick". A clock tick is a kernel administrative routine which is executed once (on one CPU) for every clock interrupt that is received, which happens once every 10 milliseconds. At this time, the state of each CPU was inspected, and a "tick" would be added to each of the "usr", "sys", "wt" or "idle" buckets for that CPU.

The problem with this method is two-fold:

  • It is statistical, which is to say it is an approximation of something, derived via sampling
  • The sampling happens just before the point when Solaris looks for threads that are waiting to be woken up to do work.

Solaris 10 now uses microstate accounting. Microstates are a set of finer-grained states of execution, including USR, SYS, TRP (servicing a trap), LCK (waiting on an intra-process lock), SLP (sleeping), LAT (on a CPU dispatch queue), although these all fall under one of the traditional USR, SYS and IDLE. These familiar three are still used to report system-wide CPU utilisation (e.g. in vmstat, mpstat, iostat), however you can see the full set of states each process is in via "prstat -m".

The key difference in system-wide CPU utilization comes in how microstate accounting is captured - it is captured at each and every transition from one microstate to another, and it is captured in nanosecond resolution (although the granularity of this is platform-dependent). To put it another way it, it is event-driven, rather than statistical sampling.

This eliminated both of the issues listed above, but it is the second issue that can cause some significant variations in observed CPU utilization.

If we have a workload that does a unit of work that takes less than one clock tick, then yields the CPU to be woken up again later, it is likely to avoid being on a CPU when the sampling is done. This is called "hiding from the clock", and is not difficult to achieve (see "hide from the clock" below).

Other types of workloads that do not explicitly behave like this, but do involve processes that are regularly on and off the CPU can look like they have different CPU utilization on Solaris releases prior to 10, because the timing of their work and the timing of the sampling end up causing an effect which is sort-of like watching the spokes of a wheel or propeller captured on video. Another factor involved in this is how busy the CPUs are - the closer a CPU is to either idle or fully utilized, the more accurate sampling is likely to be.

What This Looks Like in the Wild

I was recently involved in an investigation where a customer had changed only their operating system release (to Solaris 10), and they saw an almost 100% increase (relative) in reported CPU utilization. We suspected that the change to event-based accounting may have been a factor in this.

During our investigations, I developed a DTrace utility which can capture CPU utilization that is like that reported by Solaris 10, then also measure it the same way as Solaris 9 and 8, all at the same time.

The DTrace utility, called util-old-new, is available here. It works by enabling probes from the "sched" provider to track when threads are put on and taken off CPUs. It is event-driven, and sums up nanoseconds the same way Solaris 10 does, but it also tracks the change in a system variable, "lbolt64" while threads are on CPU, to simulate how many "clock ticks" the thread would have accumulated. This should be a close match, because lbolt64 is updated by the clock tick routine, at pretty much the same time as when the old accounting happened.

Using this utility, we were able to prove that the change in observed utilisation was pretty much in line with the way Solaris has changed how it measures utilisation. The up-side for the customer was that their understanding of how much utilisation they had left on their system was now more accurate. the down side was that they now had to re-assess whether, and by how much, this changed the amount of capacity they had left.

Here is some sample output from the utility. I start the script when I already have one CPU-bound thread on a 2-CPU system, then I start up one instance of Alexander Kolbasov's "hide-from-clock", which event-based accounting sees, but sample-based accounting does not:

mashie[bash]# util-old-new 5
NCPUs = 2
Date-time              s8-tk/1000   s9-tk/1000      ns/1000
2007 Aug 16 12:12:14          508          523          540
2007 Aug 16 12:12:19          520          523          553
2007 Aug 16 12:12:24          553          567          754
2007 Aug 16 12:12:29          549          551          798
2007 Aug 16 12:12:34          539          549          810
^C

The Other Change in Utilization Measurement

By the way, the other change was to "hard-wire" the Wait I/O ("%wio" or "wt" or "wait time") statistic to zero. The reasoning behind this is that CPU's do not wait for I/O (or any other asynchronous event) to complete - threads do. Trying to characterize how much a CPU is not doing anything in more than one statistic is like having two fuel gauges on your car - one for how much fuel remains for highway driving, and another for city driving.

References & Resources

P.S. This entry is intended to cover what I have spoken about in my previous two entries. I will soon delete the previous entries.

http://blogs.sun.com/timc/date/20070712 Thursday July 12, 2007

nicstat - Update for Solaris & Linux

I have made a minor change to nicstat on Solaris and Linux. The way it schedules its work has been improved.

Use the links from my latest entry on nicstat for the latest source and binaries.

I will write up a more detailed explanation along with a treatise on the merits of different scheduling methodologies in a post in the near future.

http://blogs.sun.com/timc/date/20070601 Friday June 01, 2007

Using DTrace to Capture Statement Execution Times in MySQL

Introduction

I have recently been engaged with a customer that is evaluating MySQL, in particular with its Memory (formerly known as Heap) engine, which stores all database data and metadata in RAM.

I was asked to assist with diagnosing if/whether/where statements were taking longer than 5 milliseconds to complete. Now, this was being observed from the viewpoint of the "client" - the client was a synthetic benchmark run as a Java program. It could be run either on a separate system or on the same system as the MySQL database, and a small number of transactions would be measured as taking longer than 5ms.

Now, there is more than one way to skin this cat, and it turns out that Jenny Chen has had a go at putting static probes into MySQL. For my (and the customer's) purposes however, we wanted to skip the step of re-compiling with our own probes, and just use what we can observe via use of the PID provider.

How Do We Do This?

Well, it is not trivial. However as it turns out, I have seen a bit of the MySQL code. I also had someone pretty senior from MySQL next to me, who helped confirm what was happening, as I used some "fishing" bits of DTrace to watch a mysqld thread call functions as we ran the "mysql" client and entered simple statements.

This allowed me to narrow down on a couple of vio_*() routines, and to build some pseudo-code to describe the call flow around reception of requests from a client, processing of the statement, then return of a result to the client.

It is not as simple as looking for the entry and return of a single function, because I wanted to capture the full time from when the mysqld thread returns from a read(2) indicating a request has arrived from a client through to the point where the same thread has completed a write(2) to send the response back. This is the broadest definition of "response time measure at the server".

The Result

The result of all of our measurements showed that there were no statements from the synthetic benchmark that took longer than 5 ms to complete. Here is an example of the output of my DTrace capture (a histogram of microseconds):

bash-3.00# ./request.d -p `pgrep mysqld`
^C

           value  ------------- Distribution ------------- count
             < 0 |                                         0
               0 |@@@@@@@@@@@@@@@@@@@@@                    10691
             500 |@@@@@@@@@@@@@@@@@                        8677
            1000 |@                                        680
            1500 |                                         31
            2000 |                                         0

The Script

Feel free to use the DTrace script for your own purposes. It should work on MySQL 5.0 and 5.1.

The Upshot - Observability in the JVM

There is a nagging question remaining - why was the Java client seeing some requests run longer than 5 ms?.

There are three possible answers I can think of:

  1. There is latency involved in transmitting the requests and responses between the client and server (e.g. network packet processing).
  2. The thread inside the JVM was being pre-empted (thrown off the CPU) between taking its time measurements.
  3. The measurements (taken using System.nanoTime()) are not reliable.

http://blogs.sun.com/timc/date/20070530 Wednesday May 30, 2007

Case Study - AT&T Customer Relationship Management

Time-line of Events

April 20th, 2006 - I apply for SBC Yahoo Internet, which includes a sweetheart deal that will last for 12 months. Included in the terms & conditions is an early termination fee of $99 is mentioned.

April 21st, 2006 - The service is connected

April 13th, 2007 (approx) - I contact AT&T (had since re-acquired SBC) to disconnect phone & internet service, due to my imminent move to a new apartment. Disconnection is booked for April 14th. No mention from AT&T call center staff of early termination fee.

April 18th, 2007 - AT&T issue me a a bill. It is for $17.22 and is clearly marked "FINAL".

May 16th, 2007 - AT&T issue me another bill. This one is marked "REVISED FINAL BILL", and contains a $99 charge for "EARLY TERMINATION-HSI BASIC".

May 25th, 2007 - I contact AT&T and ask for an explanation. They explain the early termination fee. I ask if I can get my service re-connected. I am told that no, I can not get it re-connected as it has been more than 30 days since it was disconnected. I ask if there is any way this situation can be rectified - suggest that I could reconnect if a credit was made to cover the early termination fee. There is no flexibility available to the person I was speaking to, though they did empathize with my situation.

UPDATE

June 26th, 2007 - I send a check to AT&T for the $99, but include a request for an enclosed copy of this blog entry to be forwarded to their customer service department.

July 27th, 2007 - AT&T send a check for $99 back to me, refunding the early termination fee.

Outcome

So, initially I was disappointed, but ultimately I am happy with AT&T. I could never say that they were not in the right, but I would still suggest that they could give more latitude to the people who work in their call centers.

It would also be wise to advise customers of the pending early termination fee at the earliest opportunity.

http://blogs.sun.com/timc/date/20070316 Friday March 16, 2007

Simplifying "lockstat -I" Output (or Ruby Is Better Than Perl)

There, I said it. I have been a Perl scripter for nearly 20 years (since early version 3). I think Ruby has pretty much everything Perl has, and more, like:

  • It is really object-oriented (rather than Perl's bolt-on objects). I am much more likely to end up using this functionality.
  • It has operator overloading
  • It has a "case" statement
  • There is no Obfuscated Ruby Contest (though IMHO there could be one)

Anyway, enough religous argument. I want to thank my colleague Neel for putting me onto it, and now I will provide a simple example.

Post-Processing "lockstat -I"

The Ruby script below post-processes the output of "lockstat -I". Why you ask? - well, because the output of "lockstat -I" can be tens of thousands of lines, and it does not coalesce by CPU.

The script below will coalesce by CPU, and PC within a function (useful if you forgot the "-k" flag to lockstat). A very simple change will also make it coalesce by PIL, but I will leave that as an exercise for the reader.

Ruby Post-Processor Script

One thing about this is that I would write this script almost exactly the same way if I used Perl. That is another plus of Ruby - it is easy to pick-up for a Perl programmer.

#!/bin/env ruby -w
# lockstatI.rb -	Simplify "lockstat -I" output
#
# http://blogs.sun.com/timc, 16 Mar 2007
#
# SYNOPSIS
#	lockstat -I [-i 971 -n <nnnnnn> ] sleep <nnn> | lockstatI.rb

PROG = "lockstatI"

#-- Once we have printed values that cover this proportion, then quit
CUTOFF = 95.0

DASHES = ('-' * 79) + "\n"

#========================================================================
#	MAIN
#========================================================================

print "#{PROG} - will display top #{CUTOFF}% of events\n"

events = 0
period = 0
state = 0
counts = {}

#-- The classic state machine
ARGF.each_line do |line|
    next if line == "\n"
    case state
    when 0
	if line =~ /^Profiling interrupt: (\d+) events in (\d.*\d) seconds/
	then
	    puts line
	    state = 1
	    events = $1
	    period = $2
	    next
	end
    when 1
	if line == DASHES then
	    state = 2
	end
    when 2
	if line == DASHES then
	    break
	end
	f = line.split
	count = f[0]
	cpu, pil = f[5].split('+')

	#-- Coalesce PCs within functions; i.e. do not differentiate by
	#-- offset within a function.  Useful if "-k" was not specified
	#-- on the lockstat command.
	caller = f[6].split('+')[0]

	if pil then
	    caller = caller + "[" + pil + "]"
	end
	counts[caller] = counts[caller].to_i + count.to_i
    end
end

#-- Give me an array of keys sorted by descending value
caller_keys = counts.keys.sort { |a, b| counts[b] <=> counts[a] }

#-- Dump it out
printf "%12s  %5s  %5s  %s\n", "Count", "%", "cuml%", "Caller[PIL]"
cuml = 0.0
caller_keys.each do |key|
    percent = counts[key].to_f * 100.0 / events.to_f
    cuml += percent
    printf "%12d  %5.2f  %5.2f  %s\n", counts[key], percent, cuml, key
    if cuml >= CUTOFF then
	break
    end
end

Example Output

Free beer to the first person to tell me what this is showing. It was not easy to comprehend the 90,000 line "lockstat -I" output prior to post-processing it though. You get this problem when you have 72 CPUs...

lockstatI - will display top 95.0% of events
Profiling interrupt: 4217985 events in 60.639 seconds (69559 events/sec)
       Count      %  cuml%  Caller[PIL]
     1766747  41.89  41.89  disp_getwork[11]
     1015005  24.06  65.95  (usermode)
      502560  11.91  77.86  lock_set_spl_spin
       83066   1.97  79.83  lock_spin_try[11]
       62670   1.49  81.32  mutex_enter
       53883   1.28  82.60  mutex_vector_enter
       40847   0.97  83.57  idle
       40024   0.95  84.51  lock_set_spl[11]
       27004   0.64  85.15  splx
       17432   0.41  85.57  send_mondo_set[13]
       15876   0.38  85.94  atomic_add_int
       14841   0.35  86.30  disp_getwork

http://blogs.sun.com/timc/date/20070228 Wednesday February 28, 2007

SSH Cheat Sheet

This is offered for those who want to kick their telnet habit. I also offer a simple text version, which you can keep in ~/.ssh.

To create an SSH key for an account

srchost$ ssh-keygen -t rsa

This will create id_rsa and id_rsa.pub in ~/.ssh. "-t dsa" can be used instead. You will need an SSH key if you want to log in to a system without supplying a password.

To be able to log in to desthost from srchost without a password (as below)

srchost$ ssh desthost
desthost$

Simply add the contents of srchost:~/.ssh/id_rsa.pub to desthost:~/.ssh/authorized_keys in the form "ssh-rsa AAAkeystringxxx= myusername@srchost".

To enable forwarding of an X-windows session back to your $DISPLAY on srchost

Just use "-X":

srchost$ ssh -X desthost
desthost$ xterm

If I use a different account on desthost (and I want to use a short name for desthost)

Add something like this:

Host	paedata
    Hostname	paedata.sfbay
    User	tc35445
to srchost:~/.ssh/config

Still Getting Prompted For a Password

If I find that my key is not being recognised on desthost (I still get prompted for a password), I probably have a premission problem. try this as the user on desthost:

cd
chmod g-w,o-w .
chmod g=,o= .ssh .ssh/authorized_keys

To allow root logins (but must specify password or have an authorized_key) on a host

  1. Edit /etc/ssh/sshd_config, change line to
    PermitRootLogin yes
    
  2. Solaris 9 & earlier:
    # /etc/init.d/sshd restart
    
  3. Solaris 10 & later:

    # svcadm restart ssh
    

Here is a patch (will save the originial config file in sshd_config.orig)

/usr/bin/patch -b /etc/ssh/sshd_config << 'EOT'
--- sshd_config.orig       Fri Feb  2 11:27:12 2007
+++ sshd_config Fri Feb 23 14:12:24 2007
@@ -129,7 +129,8 @@
 # Note that sshd uses pam_authenticate(3PAM) so the root (or any other) user
 # maybe denied access by a PAM module regardless of this setting.
 # Valid options are yes, without-password, no.
-PermitRootLogin no
+#PermitRootLogin no
+PermitRootLogin yes

 # sftp subsystem
 Subsystem      sftp    /usr/lib/ssh/sftp-server
EOT

Host Key Has Changed

Reconfigure of desthost - this happens when you (re-)install Solaris. You can avoid it by restoring /etc/ssh/ssh_host_*_key*. Otherwise:

bash$ ssh katie
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
Someone could be eavesdropping on you right now(man-in-the-middle attack)
It is also possible that the RSA host key has been changed.
The fingerprint for the RSA key sent by the remote host is
md5 8e:c4:53:93:64:5b:2d:b4:f8:e9:a8:9c:d9:95:4a:70.
Please contact your system administrator.
Add correct host key in /home/tc35445/.ssh/known_hosts
Offending key is entry 3 in /home/tc35445/.ssh/known_hosts
RSA host key for katie has changed and you have requested strict checking.

Solution - remove the "katie" entry in ~/.ssh/known_hosts and log-in again - ssh will put a new host key in for you.

http://blogs.sun.com/timc/date/20070222 Thursday February 22, 2007

nicstat - now for Linux, too

Just a quick one to flag that I have released a version of nicstat for Linux (see latest blog on nicstat).

I do not have a myriad of Linux systems to test it on, so if anyone finds any issues, please let me know.

http://blogs.sun.com/timc/date/20070214 Wednesday February 14, 2007

nicstat - the Solaris Network Monitoring Tool You Did Not Know You Needed

Update

This is a placeholder entry - see the latest blog on nicstat, for the current source and binaries.

http://blogs.sun.com/timc/date/20070123 Tuesday January 23, 2007

The Compiler Detective - What Compiler and Options Were Used to Build This Application?

("cc CSI")

Introduction

Performance engineers often look at improving application performance by getting the compiler to produce more efficient binaries from the same source. This is done by changing what compiler options are used. In this modern era of Open Source Software, you can often get your hands on a number of binary distributions of an application, but if you really want to roll your sleeves up, the source is there, just waiting to be compiled with the latest compiler and optimisations.

Now, it might be useful to have as a reference the compiler version and flags that were originally used on the binary distribution you tried out, or you just might be interested to know. Read on for details on the forensic tools.

What architecture was the executable compiled for?

Solaris supports 64 and 32-bit programming models on SPARC and x86. You may need to know which one an application is using - it's easy enough to find out.

$ file *.o
xxx.o:          ELF 64-bit LSB relocatable AMD64 Version 1
yyy.o:          ELF 32-bit LSB relocatable 80386 Version 1 [SSE2]

Note: yyy.o was built using "-native", informing the compiler that it can use the SSE2 instruction set extensions supported by the CPUs on the build system.

Some more examples, including a binary built on a SunOS 4.x system:

$ file gobble.*
gobble.sunos4:  Sun demand paged SPARC executable dynamically linked
gobble.sparcv9: ELF 64-bit MSB executable SPARCV9 Version 1,
dynamically linked, not stripped
gobble.i386:    ELF 32-bit LSB executable 80386 Version 1 [FPU],
dynamically linked, not stripped
gobble.amd64:   ELF 64-bit LSB executable AMD64 Version 1 [SSE FXSR FPU],
dynamically linked, not stripped

Can I detect what version of whose compiler was used to build a binary, and on what version of Solaris?

The command to use is:

$ /usr/ccs/bin/mcs -p executable | object

Here are some examples. The "ld:" entries should indicate what release of Solaris the executable was built on, which is a good indicator of the oldest version of Solaris it will run on.

  • Sun Studio 11, on Solaris 10:
        acomp: Sun C 5.8 2005/10/13
        iropt: Sun Compiler Common 11 2005/10/13
        as: Sun Compiler Common 11 2005/10/13
        ld: Software Generation Utilities - Solaris Link Editors: 5.10-1.477
    
  • An old version of GCC (this was built in 1994, looks like Solaris 2.3):
        @(#)SunOS 5.3 Generic September 1993
        GCC: (GNU) 2.5.8
        as: SC3.0 early access 01 Sep 1993
        ld: (SGU) SunOS/ELF (LK-1.3)
    
  • Sun C 5.0, Solaris???
        @(#)stdio.h     1.78    99/12/08 SMI
        acomp: WorkShop Compilers 5.0 98/12/15 C 5.0
        ld: Software Generation Utilities - Solaris-ELF (4.0)
    
  • GCC 2.8.1, Solaris 7:
        @(#)SunOS 5.7 Generic October 1998
        as: WorkShop Compilers 5.0 Alpha 03/27/98 Build
        GCC: (GNU) 2.8.1
        ld: Software Generation Utilities - Solaris/ELF (3.0)
    
  • GCC 3.4.3 (object). The GCC compiler was built on/for "sol210" [sic]
        GCC: (GNU) 3.4.3 (csl-sol210-3_4-branch+sol_rpath)
    

There is a non-linear relationship between versions of the compiler ("Sun C") and the development suite ("Sun Studio", "Forte Developer", "Sun WorkShop", etc.). You can figure it out using Sun Studio Support Matrix.

Can I figure out what compiler flags were used?

It depends on what compiler was used, and whether you have objects.

  • For Sun Studio 11, use "dwarfdump -i", and look for "DW_AT_SUN_command_line" in the output.
  • For Sun Studio 10 and earlier, use "dumpstabs -s", and look for "CMDLINE" in the output.
  • For GCC, you can only figure out what version of GCC was used (see above).

"dwarfdump" and "dumpstabs" are available with the Sun Studio suite.

Some examples:

$ dwarfdump -i gobble.sparcv9
...
                DW_AT_name                  gobble.c
                DW_AT_language              DW_LANG_C99
                DW_AT_comp_dir              /home/tc35445/tools/c/gobble
                DW_AT_SUN_command_line       /usr/dist/share/sunstudio_sparc,
v11.0/SUNWspro/prod/bin/cc -xO3 -xarch=v9 -c  gobble.c
                DW_AT_SUN_compile_options   Xa;O;R=Sun C 5.8 2005/10/13;
backend;raw;cd;

$ dumpstabs -s test
   5:  .stabs "Xa ; g ; V=3.1 ; R=Sun WorkShop 6 2000/04/07 C 5.1
 ; G=$XAB0qnBB6
   6:  .stabs "/net/vic/export/home/timc/tools/c/interposition;
 /opt/Forte6/SUNWspro/bin/../WS6/bin/cc -g -c  test.c
 -W0,-xp\$XAB0qnBB6hh5S_R.",N_CMDLINE,0x0,0x0,0x0

$ dumpstabs -s getwd
   1:  .stabs "Xa ; V=3.1 ; R=WorkShop Compilers 4.2 30 Oct 1996 C 4.2",
N_OPT,0x0,0x0,0x0
   2:  .stabs "/home/staff/timc/tools/c/getwd;
 /opt/SUNWspro/bin/../SC4.2/bin/cc -c  getwd.c -W0,-xp",N_CMDLINE,0x0,0x0,0x0

Can I figure out where the binary will look for shared objects?

If an RPATH has been specified, dumpstabs can find it:
$ dumpstabs mysqladmin | grep RPATH
13: Tag = 15 (RPATH)    /usr/sfw/lib/mysql:/usr/sfw/lib

Can I figure out what shared objects the binary needs?

Yes, just use "ldd":

$ ldd mysqladmin
        libmysqlclient.so.12 =>  /usr/sfw/lib/libmysqlclient.so.12
        libz.so.1 =>     /usr/lib/libz.so.1
        librt.so.1 =>    /usr/lib/librt.so.1
        libcrypt_i.so.1 =>       /usr/lib/libcrypt_i.so.1
        libgen.so.1 =>   /usr/lib/libgen.so.1
        libsocket.so.1 =>        /usr/lib/libsocket.so.1
        libnsl.so.1 =>   /usr/lib/libnsl.so.1
        libm.so.2 =>     /usr/lib/libm.so.2
        libthread.so.1 =>        /usr/lib/libthread.so.1
        libc.so.1 =>     /usr/lib/libc.so.1
        libaio.so.1 =>   /usr/lib/libaio.so.1
        libmd5.so.1 =>   /usr/lib/libmd5.so.1
        libmp.so.2 =>    /usr/lib/libmp.so.2
        libscf.so.1 =>   /usr/lib/libscf.so.1
        libdoor.so.1 =>  /usr/lib/libdoor.so.1
        libuutil.so.1 =>         /usr/lib/libuutil.so.1
        /platform/SUNW,Sun-Fire-15000/lib/libc_psr.so.1
        /platform/SUNW,Sun-Fire-15000/lib/libmd5_psr.so.1

References

http://blogs.sun.com/timc/date/20061216 Saturday December 16, 2006

Why CONNECT / AS SYSDBA did not work - Trap For Young Players

Background

I have been doing some installing of Oracle databases onto system where a database and oracle account are already installed. I wanted to effect changes that could be easily cleaned up. So, I created my own oracle account/uid and group/gid.

The Problem

The problem I encountered was that when I tried to connect to the database to do DBA-type things, my experience went like this:

tora ) sqlplus / as sysdba
...

ERROR:
ORA-01031: insufficient privileges

The Solution

This mode of connection, where a bare slash (/) is used instead of oracleaccount/oraclepassword, is where you ask Oracle to authorize you via your OS credentials. What I was missing is that the Unix group specified during creation of the database (or installation of the software, I forget which) is separate from the Unix group used to automatically grant a user the SYSDBA and SYSOPER privileges - which would have allowed the "sqlplus / as sysdba" to succeed.

So the Unix group that does grant these privileges (I think they are Oracle roles, just as Solaris has roles) is "dba". I am not sure whether this can be changed, so I took the shortest route and added my "tora" user to the "dba" group, and all was well.

File this one under just-hoping-anyone-in-the-same-pain-finds-this-via-Google.

I also would like to plug Installing and Configuring Oracle Database 10g on the Solaris Platform by Roger Schrag as a useful Cheat Sheet.


Valid HTML! Valid CSS!

This is a personal weblog, I do not speak for my employer.