Paul Hinker's Weblog
Tuesday Apr 28, 2009
Double complex matrix-matrix multiply (ZGEMM) on the cheap
Here's a trick that works surprisingly well if you've already got a good DGEMM (double precision matrix-matrix multiply) implementation but haven't had the time to tune the ZGEMM (double precision complex matrix-matrix multiply). Fortran 95 array notation makes this very clean and performance is surprisingly good.
subroutine pp_zgemm_nn(m,n,k,alpha,a,lda,b,ldb,c,ldc)
implicit none
integer, intent(in) :: m,n,k
integer, intent(in) :: lda,ldb,ldc
real*8, intent(in), dimension(0:2*lda-1,0:k-1) :: a
real*8, intent(in), dimension(0:2*ldb-1,0:n-1) :: b
real*8, intent(inout), dimension(0:2*ldc-1,0:n-1) :: c
real*8, intent(in) :: alpha(2)
integer :: i, j
real*8, dimension(0:m-1,0:k-1) :: Ar, Ai
real*8, dimension(0:k-1,0:n-1) :: Br, Bi
real*8, dimension(0:m-1,0:n-1) :: Tr, Ti
Ar = A(0:2*m-1:2,0:k-1)*alpha(1)-A(1:2*m-1:2,0:k-1)*alpha(2)
Ai = A(1:2*m-1:2,0:k-1)*alpha(1)+A(0:2*m-1:2,0:k-1)*alpha(2)
Br = B(0:2*k-1:2,0:n-1)
Bi = B(1:2*k-1:2,0:n-1)
call pp_dgemm('n','n',m,n,k,1.0d0,ar,m,br,k,0.0d0,Tr,m)
call pp_dgemm('n','n',m,n,k,1.0d0,ai,m,bi,k,0.0d0,Ti,m)
C(0:2*m-1:2,0:n-1) = C(0:2*m-1:2,0:n-1) + (Tr - Ti)
call pp_dgemm('n','n',m,n,k,1.0d0,ar,m,bi,k,0.0d0,Tr,m)
call pp_dgemm('n','n',m,n,k,1.0d0,ai,m,br,k,0.0d0,Ti,m)
C(1:2*m-1:2,0:n-1) = C(1:2*m-1:2,0:n-1) + (Tr + Ti)
return
end
Of course, there are a few more details that need to be handled. This only does the NN case (transa='N' and transb='N') and the scaling of C by beta is handled in an upper level driver. You need versions for all the variations (NT,TN,TT,CN, etc).
There are some pros and cons with this approach. On the upside, you get a fast ZGEMM almost for free; if your DGEMM version is parallelized, you get that for free as well. Performance is good for matrix sizes of 1000x1000 and larger. On the downside, there's lots of temporary space used. If you block this for cache, you'll need to be careful that you parallelize at this level and not expect your DGEMM to do it
Posted at 09:27AM Apr 28, 2009 by hinkthink in General |

