Thursday March 09, 2006
The goal of 100% compatibility of gcc4ss (GCC for SPARC Systems) with plain GCC couldn't be achieved if we didn't support all gcc flags. So we do! gcc4ss accepts all gcc flags, plus we added more to control the Sun Code Generator for SPARC Systems (scg4ss).
The maximum optimization level is still -O3 (same as GCC). At -O3 gcc4ss performs initial inlining and passes the IR (Internal Representation) to scg4ss, which does advanced optimizations and further inlining. scg4ss's heuristics are tuned for SPARC processors and can be driven by profile feedback and inter-module/inter-procedure analysis. Unfortunately I'm not in a position to talk about exact numbers, but grab your favourite app and measure -O2 vs -O3 performance with gcc4ss. And send us your results, of course!
On top of -O3 we added the -fast flag. Those familiar with Sun Studio know about this flag already. -fast is a macro for -O3 -xtarget=native -fns -fsimple=2 and other flags. -xtarget=native detects the architecture, chip and cache of the machine on which the compiler is running, so you don't have to worry about an improper -xarch or -xchip on your build server. Of course there is also -xtarget=generic in scg4ss for a 'blended' arch/chip model. -fns and -fsimple=2 allow scg4ss's optimizer to perform aggressive floating-point optimizations that do not strictly conform to IEEE 754 but make floating-point code run much faster. Once you're comfortable with -O3, try -fast instead. That's what we use to run SPEC benchmarks.
As an extra topping to your -fast shake you can add the -xipo flag to do inter-procedural optimizations. scg4ss's internal representation is stored within each object file and fetched back at link time, so the optimizer can see the IR for all modules at once. During an -xipo build each module is compiled at an -O0-like level, so all the .o files are built quickly, but linking takes quite some time, because the optimizer needs to recompile all modules at the original optimization level and call the code generator for each .o again. -xipo works best with -xprofile.
The -xprofile flag should be used in two steps: step one collects training data with -xprofile=collect, and step two uses that profile data with -xprofile=use. Normally you don't have to use -xipo during the 'collect' phase even if you want to use it during the 'use' phase, but it's recommended to keep the optimization level and other flags the same between the two phases.
There are a bunch of other performance-related flags.
Please read about them here:
http://cooltools.sunsource.net/gcc/flags.html
Alexey.
Posted by alexey (Mar 09 2006, 02:08:17 PM PST)