The Sun SPARC Enterprise T5220, T5240 and T5440 servers execute the Aho-Corasick string searching algorithm (see Communications of the ACM Volume 18, Number 6, pp. 333-340, 1975) with very high performance.
The motivation for investigating the performance of this algorithm came from publications by IBM (see IEEE Computer, Volume 41, Number 4, pp. 42-50, 2008 and http://www.ddj.com/architect/206903527?pgno=1) that describe the extensive optimizations to the Aho-Corasick algorithm that were needed to achieve good scalability on the IBM Cell Broadband Engine (CBE), where the resulting performance was nonetheless mediocre. A 2-chip, 16-core, 3.2 GHz CBE searched the text of the King James Bible against a dictionary of 20,000 common English words at a rate of 0.48 GB/sec.
Several features of the CBE architecture limit performance, notably (1) a highly non-uniform memory architecture (NUMA) and (2) the absence of a cache on each of the eight cores, or Synergistic Processing Elements (SPEs), on each CBE chip.
In contrast, the Sun SPARC Enterprise T5440 architecture appears to be ideally suited to the Aho-Corasick algorithm. The T5440 has a uniform memory architecture, and each of its chips has a large L2 cache that is shared by all of the cores on that chip. Memory latency is kept low and, more importantly, is effectively hidden because multiple threads execute on each core: if one thread stalls waiting for memory, the core instantly switches to another thread.
To test the performance of the T5440, the Aho-Corasick algorithm was implemented in C without any special optimizations. The test case was chosen to reproduce the IBM test case as closely as possible: the 4.6 MB text of the King James Bible was searched against a dictionary of 25,143 common English words (the Solaris "words" file). A single copy of the dictionary was shared by all threads executing on the T5440.
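The C code used for the measurements is not reproduced here. As an illustration only, the following is a minimal sketch, under stated assumptions, of how one Aho-Corasick automaton built from a word list can be shared read-only by several POSIX threads, each scanning its own slice of the text. The names `ac_build`, `ac_count_parallel`, `AC_MAX_THREADS` and the slice-overlap scheme are hypothetical; the sketch counts matches rather than reporting them, and it uses a dense 256-entry transition table per state so that the search loop performs one table lookup per input byte.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define AC_ALPHABET 256     /* one transition per byte value */
#define AC_MAX_THREADS 64   /* hypothetical cap chosen for this sketch */

typedef struct {
    int next[AC_ALPHABET];  /* -1 = no trie edge (during construction only) */
    int fail;               /* failure link */
    long out;               /* number of dictionary words ending in this state */
} ac_state;
typedef struct {
    ac_state *s;
    int n, cap;
    size_t maxlen;          /* length of the longest dictionary word */
} ac_automaton;

static int ac_new_state(ac_automaton *a) {
    if (a->n == a->cap) {
        a->cap = a->cap ? 2 * a->cap : 64;
        if (!(a->s = realloc(a->s, (size_t)a->cap * sizeof *a->s))) abort();
    }
    for (int c = 0; c < AC_ALPHABET; c++) a->s[a->n].next[c] = -1;
    a->s[a->n].fail = 0; a->s[a->n].out = 0;
    return a->n++;
}

/* Build the trie, then (breadth-first) set failure links and fill in missing edges. */
void ac_build(ac_automaton *a, const char *const *words, size_t nwords) {
    *a = (ac_automaton){ NULL, 0, 0, 0 };
    ac_new_state(a);                              /* state 0 is the root */
    for (size_t w = 0; w < nwords; w++) {
        size_t len = strlen(words[w]);
        int cur = 0;
        if (len > a->maxlen) a->maxlen = len;
        for (size_t i = 0; i < len; i++) {
            unsigned char c = (unsigned char)words[w][i];
            int nx = a->s[cur].next[c];
            if (nx < 0) { nx = ac_new_state(a); a->s[cur].next[c] = nx; }  /* a->s may move */
            cur = nx;
        }
        a->s[cur].out++;
    }
    int *queue = malloc((size_t)a->n * sizeof *queue);
    int head = 0, tail = 0;
    if (queue == NULL) abort();
    for (int c = 0; c < AC_ALPHABET; c++) {       /* depth-1 states fail to the root */
        int t = a->s[0].next[c];
        if (t < 0) a->s[0].next[c] = 0;           /* unmatched bytes stay at the root */
        else { a->s[t].fail = 0; queue[tail++] = t; }
    }
    while (head < tail) {
        int u = queue[head++], f = a->s[u].fail;
        a->s[u].out += a->s[f].out;               /* words that are suffixes of u's string */
        for (int c = 0; c < AC_ALPHABET; c++) {
            int t = a->s[u].next[c];
            if (t < 0) a->s[u].next[c] = a->s[f].next[c];   /* complete the DFA */
            else { a->s[t].fail = a->s[f].next[c]; queue[tail++] = t; }
        }
    }
    free(queue);
}

typedef struct {
    const ac_automaton *ac; const unsigned char *text;  /* shared, read-only */
    size_t start, begin, end;  /* scan [start,end); count matches ending in [begin,end) */
    long count;
} ac_job;
static void *ac_worker(void *arg) {
    ac_job *j = arg;
    int st = 0; long count = 0;                   /* private to this thread */
    for (size_t i = j->start; i < j->end; i++) {
        st = j->ac->s[st].next[j->text[i]];       /* one table lookup per byte */
        if (i >= j->begin) count += j->ac->s[st].out;
    }
    j->count = count;
    return NULL;
}

/* Each slice starts maxlen-1 bytes early so a word straddling a boundary is counted once. */
long ac_count_parallel(const ac_automaton *a, const unsigned char *text,
                       size_t len, int nthreads) {
    assert(nthreads >= 1 && nthreads <= AC_MAX_THREADS);
    pthread_t tid[AC_MAX_THREADS]; ac_job job[AC_MAX_THREADS];
    size_t back = a->maxlen > 0 ? a->maxlen - 1 : 0;
    for (int t = 0; t < nthreads; t++) {
        size_t b = len * (size_t)t / (size_t)nthreads, e = len * (size_t)(t + 1) / (size_t)nthreads;
        job[t] = (ac_job){ a, text, b > back ? b - back : 0, b, e, 0 };
        pthread_create(&tid[t], NULL, ac_worker, &job[t]);  /* return code unchecked */
    }
    long total = 0;
    for (int t = 0; t < nthreads; t++) {
        pthread_join(tid[t], NULL);
        total += job[t].count;
    }
    return total;
}
```

Compile with `-pthread`. For example, building the automaton from the words `he`, `she`, `his` and `hers` and scanning the text `ushers` counts three matches (`she`, `he`, `hers`) regardless of the number of threads. Because the worker threads only read the automaton and the text, no locking is required, which mirrors the shared-dictionary setup described above. The dense table trades memory (roughly 1 KB per state with 4-byte `int`s) for a branch-free inner loop; a production implementation might use a more compact representation.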
The 4-chip, 32-core, 1.4 GHz T5440 searched the text against the shared dictionary at a rate of 12.7 GB/sec, which is 26.8 times the speed of the IBM CBE. The 2-chip, 16-core, 1.4 GHz T5240 searched at a rate of 6.36 GB/sec, which is 13.4 times the speed of the IBM CBE. The 1-chip, 8-core, 1.4 GHz T5220 searched at a rate of 3.16 GB/sec, which is 6.7 times the speed of the IBM CBE. The performance is described in greater detail at http://www.sun.com/servers/coolthreads/t5440/benchmarks.jsp.