Paul Hinker's Weblog

pageicon Tuesday Sep 06, 2005

The Fortran Compiler (F95) and some clever optimizations

I thought I would write about some of the really good work being done in the Fortran compiler with respect to the MATMUL F90/F95 intrinsic. The compiler, or more specifically the middle end, is doing some very clever optimizations when it comes to handling calls to the MATMUL intrinsic.

In the most simple case, a user will use the MATMUL intrinsic to multiply two matrices together.

REAL*8, DIMENSION(10,10) :: A, B, C
...
C = MATMUL(A,B)

For those programmers familiar with the BLAS routines, this call can be re-written as follows:

call dgemm('n','n',10,10,10,1.0d0,A,10,B,10,0.0d0,C,10)

The BLAS3 interface for the xGEMM routines is very flexible and can handle such things as submatrices, alpha and beta with different values, etc. From a high level perspective, the F90/F95 language allows the same sort of flexibility.

C(1:5,1:5) = MATMUL(A(1:5,1:5),B(1:5,1:5))

While this offers the programmer an easy way to handle such things as submatrices it may cause some performance issues depending on the matrix multiplication code that supports the intrinsic. If the low-level routine has the same flexibility as the BLAS3 xGEMM routine, the compiler could easily call the underlying routine as follows:

call dgemm('n','n',5,5,5,1.0d0,A,10,B,10,0.0d0,C,10)

If, on the other hand, the underlying routine is restrictive in terms of how it is called, the compiler might be forced to perform gather/scatter pairs on each of the matrices before and after the call to the computational routine. Obviously, this will have an adverse impact on program performance.

Not only is the compiler handling sub-matrices, it's also handling various other forms of the general matrix multiply.

C = MATMUL(transpose(A),B) becomes call dgemm('t','n',10,10,10,1.0d0,A,10,B,10,0.0d0,C,10)

C = MATMUL(transpose(A)*alpha,B) becomes call dgemm('t','n',10,10,10,alpha,A,10,B,10,0.0d0,C,10)

C = C + MATMUL(A,B) becomes call dgemm('n','n',10,10,10,1.0d0,A,10,B,10,1.0d0,C,10)

C = C * beta + MATMUL(A*alpha,transpose(B)) becomes call dgemm('n','t',10,10,10,alpha,A,10,B,10,beta,C,10)

The F90 compiler has been doing these transformations since Sun Studio 9 and additional optimizations are in the works where MATMUL calls which are really matrix-vector (i.e. xGEMV) calls will be converted directly to the correct matrix-vector call.

Comments:

Post a Comment:
Comments are closed for this entry.