Paul Hinker's Weblog

Thursday Jan 19, 2006

Parallel Dual-Core AMD Performance

Last time I presented some serial performance numbers for a dual-core AMD box (Opteron 285 CPUs). Those numbers aren't especially interesting, so here are some parallel performance numbers.

Matrix  1 CPU      2 CPUs     4 CPUs      % of Peak  2-CPU     4-CPU
Size    (Mflops)   (Mflops)   (Mflops)    (1 CPU)    Scaling   Scaling
1000    4605       9076.68    17614.13    88.56%     98.55%    95.63%
1250    4661.55    9162.21    17764.15    89.65%     98.27%    95.27%
1500    4632.85    9128.44    17840.91    89.09%     98.52%    96.27%
1750    4647.37    9132.16    17990.68    89.37%     98.25%    96.78%
2000    4624.1     9149.64    18010.68    88.93%     98.93%    97.37%
2250    4642.03    9185.79    18015.37    89.27%     98.94%    97.02%
2500    4630.41    9148.34    18015.57    89.05%     98.79%    97.27%
2750    4641.52    9191.52    18006.34    89.26%     99.01%    96.99%
3000    4607.91    9120.19    17946.34    88.61%     98.96%    97.37%
3250    4646.25    9203.86    18112.48    89.35%     99.05%    97.46%
3500    4621.16    9169.04    17988.65    88.87%     99.21%    97.32%
3750    4628.75    9166.81    17995.61    89.01%     99.02%    97.19%
4000    4673.16    9277.18    18291.52    89.87%     99.26%    97.85%
4250    4628.35    9175.68    18077.82    89.01%     99.12%    97.65%
4500    4611.75    9135.38    18009.88    88.69%     99.04%    97.63%
4750    4634.89    9195.68    18090.37    89.13%     99.20%    97.58%
5000    4600.08    9103.6     18054.83    88.46%     98.95%    98.12%

Performance numbers are expressed in Mflops, and scaling is calculated as multi-core performance / (serial performance × number of CPUs used).
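
As a quick check of these definitions, the sketch below recomputes the derived columns for the 1000 x 1000 row. The peak figure is an assumption, not something stated in the post: 5200 Mflops per core, i.e. a 2.6 GHz Opteron 285 core completing 2 double-precision flops per cycle. That value reproduces the table's % of Peak column. The helper names are illustrative only.

```python
# Sketch: recompute the derived columns of the DGEMM table.
# ASSUMPTION: peak of 5200 Mflops per core (2.6 GHz x 2 DP flops/cycle);
# the post itself does not state the peak value.
PEAK_MFLOPS_PER_CORE = 2600.0 * 2


def scaling(parallel_mflops: float, serial_mflops: float, ncpus: int) -> float:
    """Parallel efficiency: multi-core rate / (serial rate * #cpus used)."""
    return parallel_mflops / (serial_mflops * ncpus)


def percent_of_peak(mflops: float, ncpus: int = 1) -> float:
    """Fraction of the (assumed) theoretical peak for a run on ncpus cores."""
    return mflops / (PEAK_MFLOPS_PER_CORE * ncpus)


# Row for matrix size 1000 from the table (Mflops).
serial, two_cpu, four_cpu = 4605.0, 9076.68, 17614.13

print(f"% of peak (1 CPU): {percent_of_peak(serial):.2%}")       # 88.56%
print(f"2-CPU scaling:     {scaling(two_cpu, serial, 2):.2%}")   # 98.55%
print(f"4-CPU scaling:     {scaling(four_cpu, serial, 4):.2%}")  # 95.63%
```

The same two functions reproduce every other row; for example, the 5000 x 5000 case gives 18054.83 / (4600.08 × 4) ≈ 98.12% scaling on 4 CPUs.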

These are nice numbers: nearly 90% of peak for the serial run and as much as 98% scaling to 4 CPUs. The table above is for the double-precision matrix multiply routine (DGEMM). As discussed previously in this blog, DGEMM is probably one of the most heavily used routines in high-performance computing, especially for solving dense linear systems. The three other flavors of matrix multiply (single precision, complex, and double complex: SGEMM, CGEMM, and ZGEMM) show similar performance and scaling.
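
For readers unfamiliar with DGEMM: it computes C = alpha*A*B + beta*C (optionally with A or B transposed), and a matrix-multiply rate in Mflops is conventionally derived from 2*m*n*k floating-point operations divided by the elapsed time. The post does not say how its timings were taken, so the pure-Python sketch below only illustrates the operation and that flop count. `naive_dgemm` and `mflops` are hypothetical helpers, nothing like a tuned BLAS, and the transpose options are omitted.

```python
import time


def naive_dgemm(alpha, a, b, beta, c):
    """Untuned reference for C = alpha*A*B + beta*C on lists of lists.

    A is m x k, B is k x n, C is m x n; C is updated in place and returned.
    (Hypothetical helper; no transpose options, unlike the real BLAS routine.)
    """
    m, k, n = len(a), len(b), len(b[0])
    for i in range(m):
        row_a, row_c = a[i], c[i]
        for j in range(n):
            acc = 0.0
            for p in range(k):
                acc += row_a[p] * b[p][j]
            row_c[j] = alpha * acc + beta * row_c[j]
    return c


def mflops(m, n, k, seconds):
    """Conventional GEMM rate: 2*m*n*k flops (one multiply + one add each)."""
    return 2.0 * m * n * k / seconds / 1e6


if __name__ == "__main__":
    n = 100
    a = [[1.0] * n for _ in range(n)]
    b = [[2.0] * n for _ in range(n)]
    c = [[0.0] * n for _ in range(n)]
    start = time.perf_counter()
    naive_dgemm(1.0, a, b, 0.0, c)
    elapsed = time.perf_counter() - start
    print(f"c[0][0] = {c[0][0]}")  # 100 terms of 1.0 * 2.0 -> 200.0
    print(f"{mflops(n, n, n, elapsed):.1f} Mflops (pure Python, far below BLAS)")
```

If the post's rates follow this convention, the 1000 x 1000 serial figure of 4605 Mflops corresponds to 2 × 10^9 flops in roughly 0.43 seconds per multiply.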
