Application development on Solaris OS using Sun Studio compilers and tools
Solaris Developer
ABOUT

We are the Solaris Developer Information Products Team:
Richard Friedman, David Lindt, Kami Shahi, Jyothi Srinath, Paul Echeverri, Ann Rice, Alta Elstad, Susan Morgan, Frank Jennings

Saturday Jun 23, 2007
Technical Articles on Performance Tuning

Here's a list of the current SDN articles dealing with tuning and optimization of applications on Solaris using Sun Studio compilers: 

Performance Tuning With Sun Studio Compilers and Inline Assembly Language
Here are examples of using a compiler flag or inline assembly language with Sun Studio compilers to increase the performance of C, C++, and Fortran programs. (June 4, 2007)
 
Profiling WebSphere Application Servers with Sun Studio Performance Tools
This article describes how to profile an IBM WebSphere Application Server (WAS) runtime environment with the Sun Studio Performance Analysis Tools, Collector and Analyzer. (January 30, 2007)
 
Cool Tools: Using SHADE to Trace Program Execution by Darryl Gove
The SHADE library is an emulator for SPARC hardware. The particular advantage of using SHADE is that it is possible to write an analysis tool which gathers information from the application being emulated. The SHADE library comes with some example analysis tools which track things like the number of instructions executed or the frequency with which each type of instruction is executed. A more advanced analysis tool might look at cache misses that the application encounters for a given cache structure. (September 29, 2006)
 
Selecting Representative Training Workloads for Profile Feedback Through Coverage and Branch Analysis by Darryl Gove
Profile feedback is an optimisation technique that uses a short training run of the application to provide the compiler with more detailed information about the runtime behaviour of the program. This information enables the compiler to make better optimisation decisions, such as which routines are appropriate to inline or which path of a branch is taken most frequently. This paper presents two ways of viewing the correspondence between the behaviour of the training and reference workloads. The methods presented here are necessary conditions for the training workload to be representative of the reference workload. (September 29, 2006)
 
Profiling Java Applications with Sun Studio Performance Tools
How to use the Sun Studio Performance Analyzer to profile Java applications. (August 25, 2006)
 
Profiling WebLogic Servers with Sun Studio Performance Tools
This article describes how to use the Sun Studio Performance Tools to profile servers being run under BEA's WebLogic system. A server running under BEA's WebLogic is a Java application that you launch by running a script to invoke the JVM. To profile a server, prepend the JVM command with a collect command to invoke the Sun Studio Collector. The article details how this is done. (August 25, 2006)
 
Getting the Best AMD64 Performance With Sun Studio Compilers
Performance depends on both hardware and software. To extract the maximum performance from the new AMD64-based systems on your critical C/C++ and Fortran applications, choose the best compilers. Then use compiler options to take advantage of the Opteron system features to maximize performance. This article will show you how. (May 23, 2006)
 
Building Enterprise Applications with Sun Studio Profile Feedback
Large, CPU-intensive applications may perform better when built with profile feedback. Profile feedback optimization requires the application to be built twice, once to collect the profile data, and again to make use of the profile to generate optimal code. This requirement may keep some software vendors from building their applications with profile feedback. However, it is possible to use old profiles to minimize the overhead of profile feedback builds in a development environment. This article introduces all the stages of profile feedback with examples, and offers some tips for making profile feedback builds. (April 11, 2006)
 
Performance Analysis Made Simple Using SPOT
An application's performance depends on a combination of hardware and software factors. For example, what events must the hardware deal with, and what degree of optimisation was applied when compiling the application? There are a number of tools that can be used to extract or collect this kind of information, but knowing which tools to pick for a given application can be tricky. This paper introduces a new tool that aims to simplify the process of performance analysis. We call it the Simple Performance Optimisation Tool, or 'SPOT'. SPOT is an add-on package to Sun Studio 11, and it is only available for UltraSPARC-based systems. SPOT has been released as part of the Cool Tools project. (March 7, 2006)
 
Using VIS Instructions to Speed Up Key Routines
The VIS instruction set includes a number of instructions that can be used to handle several items of data at the same time. These are called SIMD (Single Instruction Multiple Data) instructions. The VIS instructions work on data held in floating point registers. The advantage of using VIS instructions is that an operation can be applied to different items of data in parallel, meaning that it takes the same time to compute eight 1-byte results as it does to calculate one 8-byte result. In theory this means that code that uses VIS instructions can be many times faster than code without them. (January 5, 2006)
 
The Sun Studio Binary Optimizer
The Binary Optimizer is a static SPARC optimizer that accepts as input a binary and creates an optimized binary as the output. We define a binary as either an executable or a shared object. The availability of the original source code is not a pre-requisite for using this tool. It can optimize binaries irrespective of the source language used (C, C++ or FORTRAN). It can also optimize mixed source language binaries. (December 1, 2005)
 
The Sun Studio Performance Tools
The Sun Studio performance tools are designed to help answer questions about application performance. This article discusses the kinds of performance questions that users typically ask. It describes the model for using the tools, and for building the target executable, as well as the data collection process, and the data that can be collected. The Analyzer and its displays are also described, along with a number of examples of what it can do. (November 10, 2005)
 
Advanced Compiler Options for Performance
Users wanting the best performance from CPU-intensive codes may wish to explore the use of additional libraries and advanced compiler options that control individual compiler components. (Revised March 23, 2006)
 
Use Profile Feedback To Improve Performance
Profile feedback is a useful mechanism for providing the compiler with information about how a code behaves at runtime. Having this information can lead to significant improvements in the performance of the application. As with all optimisations, it is only worth using profile feedback if it does produce a gain in performance. (September 7, 2005)
 
Improving Code Layout Can Improve Application Performance
Large applications have a particular problem: they have a lot of instructions, and the processor does not have the capacity to hold the entire application on-chip at any one time. As a consequence, larger applications spend some of their run time stalled with the processor waiting to fetch new instructions from memory. This paper discusses several techniques that help the processor to hold more useful instructions on-chip, consequently reducing the time wasted fetching instructions from memory. (July 12, 2005)
 
Selecting the Best Compiler Options
How do you get the best performance from an UltraSPARC or x86/AMD64 (x64) processor running the latest Solaris systems? Compile with the best set of compiler options and the latest compilers. Here are suggestions of things you should try, but before you release the final version of your program, you should understand exactly what you have asked the compiler to do. (June 24, 2005)
 
Using Inline Templates to Improve Application Performance
Inline templates are a mechanism for directly inserting assembly code into an executable. Typically, this approach is used to obtain the best performance for a given function, or to implement an algorithm in a specific way. (July 23, 2003)
 
Using UltraSPARC-IIICu Performance Counters to Improve Application Performance
This article introduces you to the UltraSPARC-IIICu performance counters, and demonstrates how you might use the Sun ONE Studio Performance Tools to identify where in your application these events are happening and how you can use this information to improve the performance of your application. (July 23, 2003)
 
LU Factorization Case Study Using FAST
A discussion of dataflow parallelism with the Fast Application Scalability Tool. (February 27, 2003)
 
How I Got 15x Improvement Without Really Trying
A case study in program optimization. (January 13, 2003)
 
Compiling for the UltraSPARC(R) IIICu Processor
(This article has been revised and expanded, and republished as Selecting The Best Compiler Options.)
 
C++ Compile and Run-Time Performance
Techniques for improving both compile-time and run-time performance of your C++ programs. (Revised March 14, 2006)
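
As a small illustration of the SIMD idea mentioned in the VIS entry above (eight 1-byte results computed in about the time of one 8-byte result), here is a minimal sketch in portable C. It does not use VIS instructions or floating point registers at all; it packs eight bytes into an ordinary 64-bit integer and adds all eight lanes with a handful of integer operations. The function names are hypothetical and chosen only for this example.

```c
#include <stdint.h>

/*
 * Add eight unsigned 8-bit lanes packed into two 64-bit words.
 * Each lane wraps around modulo 256 and no carry leaks into the
 * neighbouring lane. This is plain integer arithmetic standing in
 * for a real SIMD instruction.
 */
uint64_t add_bytes_packed(uint64_t a, uint64_t b)
{
    const uint64_t top_bits = 0x8080808080808080ULL; /* bit 7 of every lane */

    /* Add the low 7 bits of each lane; any carry lands in bit 7 of the same lane. */
    uint64_t low_sum = (a & ~top_bits) + (b & ~top_bits);

    /* Combine the original top bits with that carry, discarding the carry out. */
    return low_sum ^ ((a ^ b) & top_bits);
}

/* Scalar reference: the same result computed one byte at a time. */
uint64_t add_bytes_scalar(uint64_t a, uint64_t b)
{
    uint64_t result = 0;

    for (int lane = 0; lane < 8; lane++) {
        uint8_t x = (uint8_t)(a >> (8 * lane));
        uint8_t y = (uint8_t)(b >> (8 * lane));
        uint8_t sum = (uint8_t)(x + y);          /* wraps modulo 256 */
        result |= (uint64_t)sum << (8 * lane);
    }
    return result;
}
```

Both functions return the same value for any inputs; the packed version simply replaces the eight-iteration loop with three logical operations and one addition, which is the kind of saving the article attributes to SIMD instructions.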

Posted at 11:57AM Jun 23, 2007 by Richard Friedman in Tuning/Optimization  |  Comments[4]

Comments:

I have not read anywhere about the interaction between OpenMP (say with -xautopar, or explicit directives) and profile feedback. If OpenMP optimization uses profile feedback, I guess it would be good to have test runs with various values for PARALLEL (or similar variables).

Posted by Marc on June 25, 2007 at 06:59 AM PDT #

Good question!

Yes, profile feedback and OpenMP can live together. And yes, the granularity of the parallelization and number of threads could have an effect on the results collected by -xprofile=collect.

But this requires a more definitive answer, such as what kind of data -xprofile=collect actually collects when there are multiple threads running over multiple processors.

Stay tuned. I'll get that info and have a more definitive answer for you.

Posted by rchrd on June 26, 2007 at 11:07 PM PDT #

Here's the definitive answer:

-xprofile=collect doesn't collect timing data. It only counts the number of times each block of code is executed, and for each conditional or indirect branch instruction, the number of times each outcome of the branch was taken.

The information collected under -xprofile=collect should be the same whether the instrumented code is executed by a single thread or by multiple threads. If -xopenmp is specified with -xprofile=collect, the compiler instruments the code using a private array of execution counters for each thread. The counters are accumulated when the thread exits or the program is unloaded, whichever comes first.

Therefore, a single run should be sufficient. Changes in the number of threads should have little or no effect.
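
To make the counting scheme concrete, here is a minimal conceptual sketch in C. It is not the code the compiler actually generates: it uses POSIX threads rather than OpenMP, and every name in it (run_instrumented, worker, NUM_COUNTERS, and so on) is hypothetical. Each thread counts block executions and branch outcomes in a private array and adds its counts to the shared totals only when it finishes, so the totals do not depend on how many threads shared the work.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NUM_COUNTERS 3   /* hypothetical: loop body, branch taken, branch not taken */
#define MAX_THREADS  16

/* Shared totals, touched only when a thread finishes its work. */
static unsigned long total_counts[NUM_COUNTERS];
static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

struct work_range { long begin; long end; };

/* The "instrumented" loop: counts go into a private, per-thread array. */
static void *worker(void *arg)
{
    const struct work_range *range = arg;
    unsigned long private_counts[NUM_COUNTERS] = {0};

    for (long i = range->begin; i < range->end; i++) {
        private_counts[0]++;              /* loop body executed */
        if (i % 3 == 0)
            private_counts[1]++;          /* branch taken */
        else
            private_counts[2]++;          /* branch not taken */
    }

    /* Accumulate into the shared totals once, as the thread exits. */
    pthread_mutex_lock(&total_lock);
    for (size_t c = 0; c < NUM_COUNTERS; c++)
        total_counts[c] += private_counts[c];
    pthread_mutex_unlock(&total_lock);
    return NULL;
}

/* Split the iterations across num_threads and report the merged counts.
 * Returns 0 on success, -1 if a thread could not be created. */
int run_instrumented(long iterations, int num_threads,
                     unsigned long counts_out[NUM_COUNTERS])
{
    pthread_t threads[MAX_THREADS];
    struct work_range ranges[MAX_THREADS];
    int started = 0;
    int status = 0;

    assert(iterations >= 0);
    assert(num_threads >= 1 && num_threads <= MAX_THREADS);

    for (size_t c = 0; c < NUM_COUNTERS; c++)
        total_counts[c] = 0;

    for (int t = 0; t < num_threads; t++) {
        ranges[t].begin = iterations * t / num_threads;
        ranges[t].end = iterations * (t + 1) / num_threads;
        if (pthread_create(&threads[t], NULL, worker, &ranges[t]) != 0) {
            status = -1;
            break;
        }
        started++;
    }

    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);

    for (size_t c = 0; c < NUM_COUNTERS; c++)
        counts_out[c] = total_counts[c];
    return status;
}
```

Calling run_instrumented with the same iteration count and one thread or four threads yields identical counter values, which is the sense in which the number of threads does not change what is collected.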

Sounds like we need to make this clearer in the docs.

Posted by rchrd on June 27, 2007 at 09:41 AM PDT #

Thank you for the answer.
I keep forgetting that collect does not collect any timings; I guess better optimizations are only for programs run in a virtual machine. It must be hard to optimize OpenMP loops that use library functions (not available through ipo) without timing.
Anyway, if a single mono-threaded run is good enough for collect, this is very easy to use.

Posted by Marc on June 29, 2007 at 08:19 AM PDT #
