Wednesday Jun 27, 2007
We are the Solaris Developer Information Products Team:
Richard Friedman, David Lindt, Kami Shahi, Jyothi Srinath, Paul Echeverri, Ann Rice, Alta Elstad, Susan Morgan, Frank Jennings
Explicit data prefetching pragmas and intrinsics for the x86 platform, and additional pragmas and intrinsics for the SPARC platform, are now available in the Sun Studio 12 compilers, released June 2007.
Prefetch instructions can increase the speed of an application substantially by bringing data into cache so that it is available when the processor needs it. This benefits performance because today's processors are so fast that it is difficult to bring data into them quickly enough to keep them busy, even with hardware prefetching and multiple levels of data cache.
The compilers have several options that enable them to generate prefetch instructions automatically:
-xprefetch, -xprefetch_level, and -xprefetch_auto_type
(described below). The compilers generally do an excellent job of
inserting prefetch instructions, and this is the most portable and best
way to use prefetch. If finer control of prefetching is desired,
prefetch pragmas or intrinsics can be used. Note that the performance
benefit due to prefetch instructions is hardware-dependent and
prefetches which improve performance on one chip may not have the same
effect on a different chip. It is a good idea to study the instruction
reference manual for the target hardware before inserting prefetch
pragmas or intrinsics. Furthermore, the Sun Studio Performance Analyzer
can be used to identify the cache misses of an application.
Prefetch pragmas are available in Fortran and prefetch intrinsics are available in C and C++. Prefetch can be specified generically, or, on SPARC platforms, with SPARC-specific versions.
Generic x86 and SPARC Prefetch (New)
| PREFETCH TYPE | FORTRAN PRAGMA | C, C++ INTRINSIC |
| Prefetch data that is likely to be read more than once | c$pragma sun_prefetch_read_many(address) | sun_prefetch_read_many(address) |
| Prefetch data that is likely to be read only once | c$pragma sun_prefetch_read_once(address) | sun_prefetch_read_once(address) |
| Prefetch data that is likely to be written more than once | c$pragma sun_prefetch_write_many(address) | sun_prefetch_write_many(address) |
| Prefetch data that is likely to be written only once | c$pragma sun_prefetch_write_once(address) | sun_prefetch_write_once(address) |
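As an illustration of how the generic intrinsics might be called from C, here is a minimal sketch. It assumes the intrinsics are declared in a header named `sun_prefetch.h` (an assumption, check your compiler documentation), and it falls back to GCC's `__builtin_prefetch` so the same source builds with other compilers. The macro names, the function `scale_copy`, the 64-byte line size, and the distance `PREFETCH_AHEAD` are all hypothetical choices made for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Map two prefetch flavors onto whatever the compiler offers.
 * Assumption: Sun Studio declares the intrinsics in <sun_prefetch.h>. */
#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#include <sun_prefetch.h>
#define PREFETCH_READ_ONCE(p)  sun_prefetch_read_once((void *)(p))
#define PREFETCH_WRITE_ONCE(p) sun_prefetch_write_once((void *)(p))
#elif defined(__GNUC__)
#define PREFETCH_READ_ONCE(p)  __builtin_prefetch((p), 0, 0)
#define PREFETCH_WRITE_ONCE(p) __builtin_prefetch((p), 1, 0)
#else
#define PREFETCH_READ_ONCE(p)  ((void)(p))
#define PREFETCH_WRITE_ONCE(p) ((void)(p))
#endif

#define ELEMS_PER_LINE 8u   /* assumes 64-byte cache lines, 8-byte doubles */
#define PREFETCH_AHEAD 32u  /* hypothetical distance, in elements */

/* dst[i] = k * src[i]; every element is read once and written once. */
void scale_copy(double *dst, const double *src, size_t n, double k)
{
    size_t i;
    assert(dst != NULL && src != NULL);
    for (i = 0; i < n; i++) {
        /* One prefetch pair per assumed cache line, never past the end. */
        if (i % ELEMS_PER_LINE == 0 && i + PREFETCH_AHEAD < n) {
            PREFETCH_READ_ONCE(&src[i + PREFETCH_AHEAD]);
            PREFETCH_WRITE_ONCE(&dst[i + PREFETCH_AHEAD]);
        }
        dst[i] = k * src[i];
    }
}
```

The "once" variants are used here because each element is touched a single time. Prefetches are only hints, so the result of `scale_copy` is the same with or without them; only the timing can change.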
SPARC Platforms only:
| PREFETCH TYPE | FORTRAN PRAGMA | C, C++ INTRINSIC |
| Prefetch data that is likely to be read more than once | c$pragma sparc_prefetch_read_many(address) | sparc_prefetch_read_many(address) |
| Prefetch data that is likely to be read only once | c$pragma sparc_prefetch_read_once(address) | sparc_prefetch_read_once(address) |
| Prefetch data that is likely to be written more than once | c$pragma sparc_prefetch_write_many(address) | sparc_prefetch_write_many(address) |
| Prefetch data that is likely to be written only once | c$pragma sparc_prefetch_write_once(address) | sparc_prefetch_write_once(address) |
Strong Prefetch and Instruction Cache Prefetch:
The SPARC Ultra IV+ (ultra4plus)
processor provides "strong" data prefetch instructions. It also
provides a prefetch for instructions rather than data. Strong
prefetches are more powerful than normal prefetches and are recommended
when the data being prefetched has a very high probability of being
used. They will not be dropped on a TLB miss or prefetch queue full
event. Ultra III and Ultra IV processors treat strong prefetches as
normal prefetches.
| PREFETCH TYPE | FORTRAN PRAGMA | C, C++ INTRINSIC |
| Prefetch data that is likely to be read more than once | c$pragma sparc_strong_prefetch_read_many(address) | sparc_strong_prefetch_read_many(address) |
| Prefetch data that is likely to be read only once | c$pragma sparc_strong_prefetch_read_once(address) | sparc_strong_prefetch_read_once(address) |
| Prefetch data that is likely to be written more than once | c$pragma sparc_strong_prefetch_write_many(address) | sparc_strong_prefetch_write_many(address) |
| Prefetch data that is likely to be written only once | c$pragma sparc_strong_prefetch_write_once(address) | sparc_strong_prefetch_write_once(address) |
| Prefetch instructions or data at address of label or data | c$pragma sparc_prefetch_instruction(address) | sparc_prefetch_instruction(address) |
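A strong prefetch suits accesses that are certain to happen but hard for hardware to predict, such as loads through an array of pointers. The sketch below is one possible way to write that in C. It assumes the header name `sun_prefetch.h` and the predefined macro `__sparc` for selecting the SPARC path; on other compilers it falls back to GCC's `__builtin_prefetch` or to a no-op. The names `PREFETCH_TARGET`, `AHEAD`, and `sum_indirect` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Use the strong SPARC variant where available, otherwise fall back.
 * Assumption: the intrinsic is declared in <sun_prefetch.h>. */
#if (defined(__SUNPRO_C) || defined(__SUNPRO_CC)) && defined(__sparc)
#include <sun_prefetch.h>
#define PREFETCH_TARGET(p) sparc_strong_prefetch_read_once((void *)(p))
#elif defined(__GNUC__)
#define PREFETCH_TARGET(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_TARGET(p) ((void)(p))
#endif

#define AHEAD 4u  /* hypothetical distance, in loop iterations */

/* Sum the values that the n pointers in ptrs refer to. Every target is
 * read exactly once, so each prefetched line is sure to be used. */
long sum_indirect(const long *const *ptrs, size_t n)
{
    long total = 0;
    size_t i;
    assert(ptrs != NULL);
    for (i = 0; i < n; i++) {
        if (i + AHEAD < n)
            PREFETCH_TARGET(ptrs[i + AHEAD]);  /* target of a later pointer */
        total += *ptrs[i];
    }
    return total;
}
```

On processors that treat strong prefetches as normal ones, this code behaves like the ordinary `read_once` version.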
Command-line options related to prefetch:
| -xprefetch[=auto|no%auto|explicit|no%explicit|latx:factor] | Enable the compiler to insert prefetch instructions. The default is -xprefetch=auto,explicit. |
| -xprefetch_level[=1|2|3] | Control the degree of insertion of prefetch instructions. The default is 1 for C and C++, and 2 for Fortran. |
| -xprefetch_auto_type=[no%]indirect_array_access | Generate prefetches for indirect memory accesses |
| -xarch=architecture | Prefetch instructions are only inserted for architectures that support prefetch. See documentation. |
| -xO[12345] | The optimization level must be 2 or higher for automatic prefetch |
| -xdepend, -xrestrict, -xalias_level | These options may affect the aggressiveness of computing the prefetch candidates due to better memory disambiguation |
For best performance, loops should be unrolled so that each
iteration uses one cache line of data and issues one prefetch. Because a
cache line may take several iterations to arrive, each prefetch should
target data a few iterations ahead of where it is used. It is very
important to avoid inserting too many prefetch instructions, since this
can seriously degrade performance. Use the "read_many" variant if the
data will be read again before being evicted from all levels of data
cache, and use "read_once" otherwise.
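The following sketch shows what that advice could look like in C. It assumes 64-byte cache lines (eight doubles per line) and a prefetch distance of four iterations; both numbers are hypothetical and should be tuned for the target chip. As a rough rule, the distance is the memory latency divided by the time one iteration takes, rounded up. The header name `sun_prefetch.h` is also an assumption, and the fallback to GCC's `__builtin_prefetch` is there only so the example builds elsewhere.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SUNPRO_C) || defined(__SUNPRO_CC)
#include <sun_prefetch.h>  /* assumed header name */
#define PREFETCH_READ_ONCE(p) sun_prefetch_read_once((void *)(p))
#elif defined(__GNUC__)
#define PREFETCH_READ_ONCE(p) __builtin_prefetch((p), 0, 0)
#else
#define PREFETCH_READ_ONCE(p) ((void)(p))
#endif

#define LINE_DOUBLES 8u  /* assumes 64-byte lines and 8-byte doubles */
#define LINES_AHEAD  4u  /* hypothetical distance, in unrolled iterations */

/* Sum n doubles. The main loop is unrolled by one cache line and issues
 * exactly one prefetch per iteration, LINES_AHEAD lines in front. */
double sum_array(const double *a, size_t n)
{
    double s0 = 0.0, s1 = 0.0;
    size_t i = 0;
    assert(a != NULL);

    for (; i + LINE_DOUBLES <= n; i += LINE_DOUBLES) {
        if (i + LINES_AHEAD * LINE_DOUBLES < n)
            PREFETCH_READ_ONCE(&a[i + LINES_AHEAD * LINE_DOUBLES]);
        s0 += a[i]     + a[i + 1] + a[i + 2] + a[i + 3];
        s1 += a[i + 4] + a[i + 5] + a[i + 6] + a[i + 7];
    }
    for (; i < n; i++)  /* remainder that does not fill a whole line */
        s0 += a[i];
    return s0 + s1;
}
```

The array is read only once, so the "read_once" variant is used. If the same data were summed again while still in cache, "read_many" would be the better fit.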
C/C++ example:
original loop:
Note that using -xprefetch=auto with the original loop yields the same code as doing explicit prefetch, so explicit prefetch should only be used for cases where the compiler does not insert adequate prefetches.
Fortran example:
original loop:
Note that using -xprefetch=auto with the original loop yields the same code as doing explicit prefetch, so explicit prefetch should only be used for cases where the compiler does not insert adequate prefetches.