See Compiler Run, Run Compiler, Run...
If you've ever wondered what the compiler is doing when it optimizes your code, you can use the command-line tool, er_src, which is part of the Sun Studio Performance Analyzer, to view the "compiler commentary".
Just compile with some optimization level and -g and then pass the object code to er_src.
    >f95 -O3 -g -c fall.f95 ; er_src fall.o
    Source file: fall.f95
    Object file: fall.o
    Load Object: fall.o

         1. parameter (n=100)
            <Function: MAIN>
         2. real psi(n,n)
         3. a = 1E6
         4. tpi = 2*3.14159265
         5. di = tpi/float(n)
         6. dj = di

        Source loop below has tag L1
        Source loop below has tag L2
        L1 could not be pipelined because it contains calls
         7. forall (j=1:n, i=1:n) psi(i,j)= a*sin((float(i)-.5) * di) * sin((float(j)-.5)*dj)
         8. print*, psi(50,50)
         9. end
This is a little test example using a Fortran 95 FORALL loop, compiled at optimization level O3.
Let's try it again, but this time with -fast for full optimization:
    >f95 -fast -g -c fall.f95 ; er_src fall.o
    Source file: fall.f95
    Object file: fall.o
    Load Object: fall.o

         1. parameter (n=100)
            <Function: MAIN>
         2. real psi(n,n)
         3. a = 1E6
         4. tpi = 2*3.14159265
         5. di = tpi/float(n)
         6. dj = di

        Source loop below has tag L1
        Source loop below has tag L2
        L1 fissioned into 2 loops, generating: L3, L4
        L1 transformed to use calls to vector intrinsics: __vsinf_
        L4 scheduled with steady-state cycle count = 2
        L4 unrolled 3 times
        L4 has 1 loads, 1 stores, 0 prefetches, 0 FPadds, 1 FPmuls, and 0 FPdivs per iteration
        L4 has 0 int-loads, 0 int-stores, 4 alu-ops, 0 muls, 0 int-divs and 0 shifts per iteration
        L3 scheduled with steady-state cycle count = 4
        L3 unrolled 2 times
        L3 has 0 loads, 1 stores, 0 prefetches, 3 FPadds, 1 FPmuls, and 0 FPdivs per iteration
        L3 has 0 int-loads, 0 int-stores, 3 alu-ops, 0 muls, 0 int-divs and 0 shifts per iteration
         7. forall (j=1:n, i=1:n) psi(i,j)= a*sin((float(i)-.5) * di) * sin((float(j)-.5)*dj)
         8. print*, psi(50,50)
         9. end
A lot more going on here. Note that the compiler splits ("fissions") the FORALL into two loops, L3 and L4, and then unrolls them. It also uses a vector version of the sin() function (__vsinf_) to process a batch of arguments in a single call.
While the compiler commentary can get somewhat cryptic, you can get a feel for the kinds of optimizations the compiler is performing on your code.
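To see what changes between optimization levels, it can help to strip the source lines out of the er_src output and compare only the commentary. The following is a minimal sketch, not part of Sun Studio: `commentary_only` is a hypothetical helper name, and the filter assumes the output looks like the two listings above (numbered source lines, three header lines, and a `<Function: ...>` marker).

```shell
# Keep only the compiler commentary from er_src output read on stdin.
# Assumptions (taken from the listings above, not from er_src documentation):
#   - source lines start with optional blanks, a line number, and a period
#   - header lines start with "Source file:", "Object file:", or "Load Object:"
#   - function markers look like "<Function: MAIN>"
commentary_only() {
    grep -v -E \
        -e '^[[:space:]]*[0-9]+\.([[:space:]]|$)' \
        -e '^[[:space:]]*(Source file|Object file|Load Object):' \
        -e '^[[:space:]]*<Function:' \
        -e '^[[:space:]]*$'
}

# Usage sketch (needs f95 and er_src on PATH; fall.f95 is the example above):
#   f95 -O3   -g -c fall.f95 && er_src fall.o | commentary_only > O3.txt
#   f95 -fast -g -c fall.f95 && er_src fall.o | commentary_only > fast.txt
#   diff O3.txt fast.txt
```

If the commentary format differs on your system, the patterns would need adjusting; the point is only that the source lines are easy to tell apart from the compiler's remarks.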
It's also useful when using the auto parallelization options. We'll have more to say about that. But it's worth using er_src to get an idea about what the compiler can and cannot do. And don't forget to also compile with -g.


