Here's a list of interesting technical articles on application performance tuning that are currently on the Sun Developer Network portal.
- Calculating Processor Utilisation From the UltraSPARC T1 and UltraSPARC T2 Performance Counters
Use the performance counters for the UltraSPARC T1 and UltraSPARC T2
processors to estimate core load and find potential areas for
performance improvement.
- Prefetching Pragmas and Intrinsics
Explicit data prefetching pragmas and intrinsics for the x86
platform and additional pragmas and intrinsics for the SPARC platform
are now available in Sun Studio 12 compilers. Prefetch instructions can increase the speed of an application
substantially by bringing data into cache so that it is available when
the processor needs it. This benefits performance because today's
processors are so fast that it is difficult to bring data into them
quickly enough to keep them busy, even with hardware prefetching and
multiple levels of data cache.
- Using F95 Interfaces to Customize Access to the Sun Performance Library
When porting Fortran source, the Fortran 95 generic interface can be
used to allow the source code to remain virtually unchanged and yet
facilitate the use of the ILP-32, LP-64, and ILP-64 programming models.
- A case study in program optimization.
- Getting the Best AMD64 Performance With Sun Studio Compilers
Performance depends on both hardware and software. To extract the
maximum performance from the new AMD64-based systems on your critical
C/C++ and Fortran applications, choose the best compilers. Then use
compiler options to take advantage of the Opteron system features to
maximize performance. This article will show you how.
- Using VIS Instructions to Speed Up Key Routines
The VIS instruction set includes a number of instructions that can
be used to handle several items of data at the same time. These are
called SIMD (Single Instruction Multiple Data) instructions. The VIS
instructions work on data held in floating point registers. The
advantage of using VIS instructions is that an operation can be applied
to different items of data in parallel, meaning that it takes the same
time to compute eight 1-byte results as it does to calculate one 8-byte
result. In theory this means that code that uses VIS instructions can
be many times faster than code without them.
- The Sun Studio Binary Optimizer
The Binary Optimizer is a static SPARC optimizer that accepts as
input a binary and creates an optimized binary as the output. We define
a binary as either an executable or a shared object. The availability
of the original source code is not a pre-requisite for using this tool.
It can optimize binaries irrespective of the source language used (C,
C++ or FORTRAN). It can also optimize mixed source language binaries.
- Advanced Compiler Options for Performance
Users wanting the best performance from CPU-intensive codes may
wish to explore the use of additional libraries and advanced compiler
options that control individual compiler components.
- Selecting the Best Compiler Options
How do you get the best performance from an UltraSPARC or x86/AMD64
(x64) processor running on the latest Solaris systems? By compiling with
the best set of compiler options and the latest compilers. Here are
suggestions of things you should try, but before you release the final
version of your program, you should understand exactly what you have
asked the compiler to do.
- Using Inline Templates to Improve Application Performance
Inline templates are a mechanism for directly inserting assembly
code into an executable. Typically, this approach is used to obtain the
best performance for a given function, or to implement an algorithm in
a specific way.
These days everyone seems to use memcached, a high-performance, distributed
memory object caching system intended to speed up web applications.
Performance can improve greatly when data is fetched from RAM instead of
from disk. Here is an excellent article explaining memcached, taking
LiveJournal as a case study.
Did you know that memcached daemons can be set up on Solaris Zones too?
For this you need to download the memcached package from the Cool Stack site.
What do you need to get?
1. Cool Stack 1.2
2. memcached Java Client APIs (get this jar and this jar).
Create and Boot Zones
Create 3 zones (zonea, zoneb and zonec) to test memcached. For
information on creating Solaris zones, read this article.
I'm using SXDE 9/07. Don't have SXDE? Get it!
Here is the status of my zones:
# zoneadm list -vc
  ID NAME     STATUS     PATH            BRAND    IP
   0 global   running    /               native   shared
   4 zonea    running    /zones/zonea    native   shared
   5 zoneb    running    /zones/zoneb    native   shared
   6 zonec    running    /zones/zonec    native   shared
Start memcached on all Zones
Follow the Cool Stack site for installing Cool Stack on your zones. When you
do a pkgadd -d <memcached*.pkg>, memcached gets installed in all the
available zones, even those that are not in a running state.
When everything is set, these commands should work fine:
zonea# /opt/coolstack/bin/memcached -u phantom -d -m 100 -l 129.158.224.242 -p 11111
zoneb# /opt/coolstack/bin/memcached -u phantom -d -m 100 -l 129.158.224.231 -p 11112
zonec# /opt/coolstack/bin/memcached -u phantom -d -m 100 -l 129.158.224.243 -p 11113
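These commands start memcached as a non-root user (-u phantom) in daemon
mode (-d) on all zones, each with a 100 MB memory bucket (-m 100) and
listening on the given address (-l) and port (-p). This should be OK for
testing, but ideally in a production setup it should be around a terabyte.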
If you don't know already, each Solaris zone can bind to an IP and port
through the virtual interface. So you don't need 3 NICs or 3 machines - but
just 3 zones.
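Before moving to the Java client, it can help to confirm that each daemon is actually listening on its zone's address. The following is a minimal sketch that speaks memcached's text protocol directly: it sends the `version` command and checks for a `VERSION` reply. The class name `MemcachedProbe` and the 2-second timeout are my own choices for illustration, not part of Cool Stack or the client jars.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of Cool Stack or the client jars.
class MemcachedProbe {

    // A running daemon answers the "version" command with "VERSION <number>".
    static boolean isVersionReply(String line) {
        return line != null && line.startsWith("VERSION ");
    }

    // Splits "host:port" into a socket address (IPv4 address or host name).
    static InetSocketAddress parse(String hostPort) {
        int colon = hostPort.lastIndexOf(':');
        if (colon <= 0 || colon == hostPort.length() - 1) {
            throw new IllegalArgumentException("expected host:port, got " + hostPort);
        }
        String host = hostPort.substring(0, colon);
        int port = Integer.parseInt(hostPort.substring(colon + 1));
        return new InetSocketAddress(host, port);
    }

    // Sends "version" and returns the first reply line, or null if unreachable.
    static String probe(InetSocketAddress addr, int timeoutMillis) {
        try (Socket s = new Socket()) {
            s.connect(addr, timeoutMillis);
            s.setSoTimeout(timeoutMillis);
            OutputStream out = s.getOutputStream();
            out.write("version\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(s.getInputStream(), StandardCharsets.US_ASCII));
            return in.readLine();
        } catch (IOException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // Pass the addresses to check as arguments, e.g. 129.158.224.242:11111
        for (String arg : args) {
            String reply = probe(parse(arg), 2000);
            System.out.println(arg + " -> "
                    + (isVersionReply(reply) ? reply : "no memcached reply"));
        }
    }
}
```

Run it with the three zone addresses as arguments. A zone that is halted, or whose daemon did not start, should show up as "no memcached reply".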
Test memcached
Set the classpath to point to the downloaded jars. Use NetBeans for
simplicity. Store an object in memory and retrieve it from the memcached
daemon running on zonea:
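import java.io.IOException;
import java.net.InetSocketAddress;
import net.spy.memcached.MemcachedClient;
....
// Interact with zonea
MemcachedClient c;
try {
    c = new MemcachedClient(new InetSocketAddress("129.158.224.242", 11111));
    String test = "I'm going to be cached!";
    c.set("mykey", 180, test);        // store with a 180-second expiry
    Object obj = c.get("mykey");      // read it back
    System.out.println((String) obj);
    c.delete("mykey");                // clean up the cache entry
} catch (IOException ex) {
    ex.printStackTrace();
}
...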
The object is stored with an expiry of 180 seconds (3 minutes). After retrieving
it, the code deletes it from the cache. When you compile and run the program, the output will look like:
2007-10-03 10:38:49.615 INFO net.spy.memcached.MemcachedConnection: Connected to {QA sa=/129.158.224.242:11111, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} immediately
I'm going to be cached!
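To make the set, get and delete calls a little less magical, here is a rough sketch of the requests they correspond to in memcached's text protocol, along with the rule the server applies to the expiry value. The class and method names are hypothetical, and the flags value of 0 is an assumption on my part; the Java client builds and sends these requests for you and may choose different flags.

```java
import java.nio.charset.StandardCharsets;

// Illustrative only: the Java client builds and sends these requests itself.
class TextProtocolSketch {

    // memcached reads expiry values up to 30 days as seconds from now;
    // larger values are treated as an absolute Unix timestamp.
    static final int RELATIVE_LIMIT_SECONDS = 60 * 60 * 24 * 30;

    // Returns true for a relative expiry; 0 means "never expires" and is excluded.
    static boolean isRelativeExpiry(long exptime) {
        return exptime > 0 && exptime <= RELATIVE_LIMIT_SECONDS;
    }

    // Request shape: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
    static String setCommand(String key, int flags, int exptime, String value) {
        int bytes = value.getBytes(StandardCharsets.UTF_8).length;
        return "set " + key + " " + flags + " " + exptime + " " + bytes
                + "\r\n" + value + "\r\n";
    }

    // Request shape: "get <key>\r\n"
    static String getCommand(String key) {
        return "get " + key + "\r\n";
    }

    // Request shape: "delete <key>\r\n"
    static String deleteCommand(String key) {
        return "delete " + key + "\r\n";
    }

    public static void main(String[] args) {
        String value = "I'm going to be cached!";
        System.out.print(setCommand("mykey", 0, 180, value)); // flags 0 is an assumption
        System.out.print(getCommand("mykey"));
        System.out.print(deleteCommand("mykey"));
        System.out.println("180 is a relative expiry: " + isRelativeExpiry(180));
    }
}
```

The point of the expiry check is that 180 falls well under the 30-day limit, so the server counts it as 180 seconds from the moment of the set.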
From your code, you can also connect to multiple memcached servers and
store objects.
This is quite interesting. You can halt one zone and then try to
store an object with all three zones configured in the client.
# zoneadm -z zonea halt
# zoneadm list -vc
  ID NAME     STATUS     PATH            BRAND    IP
   0 global   running    /               native   shared
   5 zoneb    running    /zones/zoneb    native   shared
   6 zonec    running    /zones/zonec    native   shared
   - zonea    installed  /zones/zonea    native   shared
Now zonea is no longer running memcached because the zone is down.
Here is the modified code:
MemcachedClient c;
try {
    // zonea, zoneb and zonec (AddrUtil is net.spy.memcached.AddrUtil)
    c = new MemcachedClient(AddrUtil.getAddresses(
            "129.158.224.242:11111 129.158.224.231:11112 129.158.224.243:11113"));
    String test = "I'm going to be cached on all zones!";
    c.set("mykey2", 180, test);       // store with a 180-second expiry
    Object obj = c.get("mykey2");     // read it back
    System.out.println((String) obj);
    c.delete("mykey2");               // clean up the cache entry
} catch (IOException ex) {
    ex.printStackTrace();
}
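The client is configured with all three servers, but zonea is offline.
Note that a memcached client stores each key on one server, chosen by
hashing the key, rather than copying it to every server.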
Here is the output:
2007-10-03 11:18:27.678 INFO net.spy.memcached.MemcachedConnection: Added {QA sa=/129.158.224.242:11111, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} to connect queue
2007-10-03 11:18:27.682 INFO net.spy.memcached.MemcachedConnection: Connected to {QA sa=/129.158.224.231:11112, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} immediately
2007-10-03 11:18:27.684 INFO net.spy.memcached.MemcachedConnection: Connected to {QA sa=/129.158.224.243:11113, #Rops=0, #Wops=0, #iq=0, topRop=null, topWop=null, toWrite=0, interested=0} immediately
I'm going to be cached on all zones!
The log shows the client adding zonea to its connect queue, so it can retry
the connection when zonea comes up, while the set and get succeed against
the daemons that are running. It would be interesting to test the failover
behavior of memcached further. memcached is the mother of all hashtables,
but the daemons do not talk to each other, so any fail-safe plumbing has to
live in the client. You can also use DTrace for memcached debugging.
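To picture how a client can keep working with one zone down, here is a simplified sketch of key-to-server selection: hash the key, take it modulo the number of servers, and walk forward past any server known to be down. This is an illustration under my own assumptions, not the actual algorithm of the Java client used above; real clients may use a different hash function or a different failure policy. The class name `ServerPickerSketch` is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified illustration; real clients may use a different hash or locator.
class ServerPickerSketch {

    // Picks the server for a key by hashing it modulo the server count,
    // then walks forward past any server listed as down.
    static String pick(String key, List<String> servers, Set<String> down) {
        int start = Math.floorMod(key.hashCode(), servers.size());
        for (int i = 0; i < servers.size(); i++) {
            String candidate = servers.get((start + i) % servers.size());
            if (!down.contains(candidate)) {
                return candidate;
            }
        }
        return null; // every server is down
    }

    public static void main(String[] args) {
        List<String> servers = Arrays.asList(
                "129.158.224.242:11111",   // zonea
                "129.158.224.231:11112",   // zoneb
                "129.158.224.243:11113");  // zonec
        // Pretend zonea has been halted, as in the test above.
        Set<String> down = new HashSet<>(Arrays.asList("129.158.224.242:11111"));
        for (String key : new String[] {"mykey", "mykey2", "session:42"}) {
            System.out.println(key + " -> " + pick(key, servers, down));
        }
    }
}
```

One consequence of this kind of scheme is that a key which normally lives on zonea lands on another zone while zonea is down, and reads go back to zonea once it returns, so entries written during the outage can appear to vanish. That is acceptable for a cache but worth keeping in mind when testing failover.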