Darryl Gove's blog

Friday Apr 17, 2009

Compiling for Nehalem and other processors

First thing I'd suggest is to make sure that you're using a recent compiler. Practically that means Sun Studio 12 (which is quite old now), or the Sun Studio Express releases (aka Sun Studio 12 Update 1 Early Access). Obviously the latest features are only found in the latest compilers.

In terms of flags, the starting point should be -fast, you can later trim that if you need to remove floating point simplification, or are not happy with one or other of the flags that it includes.

The next flags to use are:

  • -xipo=2 This flag enables crossfile optimisation and tracking of allocated memory. I find crossfile optimisation useful because it limits the impact of the source code structure on the final application.
  • -xarch=sse4_2 This flag is included in -fast (assuming that the build system supports it). However, if you later plan to fine tune the compiler flags, it's best to start off with it explicitly. This allows the compiler to use the SSE4 instruction set. Probably -xarch=sse2 will be sufficient in most circumstances - it's a call depending on the system that the application will be deployed on.
  • -xvector=simd This flag tells the compiler to generate SIMD (single instruction multiple data) instructions - basically the combination of this and the architecture flag enables the compiler to generate applications that use SSE instructions. These instructions can lead to substantial performance gains in some floating point applications.
  • -m64 On x86 there's performance to be gained from using the 64-bit instruction set extensions. The code gets more registers and a better calling convention. These tend to outweigh the costs of the larger memory footprint of 64-bit applications.
  • -xpagesize=2M This tells the operating system to provide large pages to the application.
  • The other optimisation that I find very useful is profile feedback. This does complicate and lengthen the build process, but is often the most effective way of getting performance gains for codes dominated by branches and conditional code.
  • The other flags to consider are the aliasing flags -xalias_level=std for C, -xalias_level=compatible for C++, and -xrestrict. These flags do lead to performance gains, but require the developer to be comfortable that their code does conform to the requirements of the flags. (IMO, most code does.)

All this talk about flags should not be a replacement for what I consider to be the basic first step in optimising the performance of an application: to take a profile. Compiler flags tell the compiler how to do a good job of producing the code, but the compiler can't do much about the algorithms used. Profiling the application will often give a clue as to a way that the performance of the application can be improved by a change of algorithm - something that even the best compiler flags can't always do.

Comments:

Post a Comment:
Comments are closed for this entry.

Calendar

Search this blog

About

Solaris Application Programming

Book resources

The Developer's Edge

Book resources

OpenSPARC Internals

Book resources

Recent entries

Custom search

Tag cloud

book cmt communityone compiler cooltools cpu2006 dtrace gcc libraries linker openmp opensolaris opensparc optimisation optimization parallelisation parallelization performance performanceanalyzer programming solaris solarisapplicationprogramming sparc spec spot sunstudio t2 ultrasparc ultrasparct2 x86

Links

Webcasts

Articles

Presentations

Interesting docs

Navigation

Referers

Feeds