2. HPC Architecture Outline
1) HPCC system architecture
HPCC system architecture construction
An HPCC system is generally built on the system architecture described below. The components are logical and do not have to be separated in hardware. A very small cluster can be built from a few computers plus one or two storage and file servers, but as the scale grows toward a large cluster of the kind called a supercomputer, parallel processing, management, the interconnect, scratch disk, and the cluster file system become more complex, and these factors affect system performance and stability.
[picture 3] HPCC system architecture configuration example
A. System architecture construction outline
A supercomputing cluster system architecture is divided into the following components.
■ Compute node / shared file service (scratch) disk storage
Large-scale compute nodes and file service nodes that supply the computing power needed to run large jobs.
The scratch disk storage is shared among the compute nodes through a parallel cluster file system with excellent performance and scalability.
■ Interconnection network
A high-bandwidth interconnect network that delivers high-performance MPI communication.
■ Cluster file service configuration
- A parallel cluster file system with excellent performance and scalability for file sharing between the compute nodes and the shared file service (scratch) disk storage.
- HOME/HSM/Backup configuration
■ Front / management node
Login and debugging nodes that give users system access and a work preparation environment.
A management node that manages and controls all system resources of the cluster and performs software provisioning.
■ Management network
- An Ethernet network for managing and controlling all system resources and for providing Ethernet access services such as provisioning.
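The component breakdown above can be written down as a small inventory model. The following is only a sketch: the node counts, the name prefixes, and the mapping of roles to networks are assumptions made for illustration and are not specified in this outline.

```python
from collections import Counter
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    COMPUTE = "compute"
    SCRATCH_FILE_SERVICE = "scratch file service"
    HOME_HSM_BACKUP = "home/HSM/backup"
    LOGIN = "login/debugging"
    MANAGEMENT = "management"


class Network(Enum):
    INTERCONNECT = "high-bandwidth interconnect"
    MANAGEMENT = "management Ethernet"


# Assumption: every node is reachable on the management Ethernet, and only
# nodes that take part in MPI or scratch I/O attach to the interconnect.
ROLE_NETWORKS = {
    Role.COMPUTE: {Network.INTERCONNECT, Network.MANAGEMENT},
    Role.SCRATCH_FILE_SERVICE: {Network.INTERCONNECT, Network.MANAGEMENT},
    Role.LOGIN: {Network.INTERCONNECT, Network.MANAGEMENT},
    Role.HOME_HSM_BACKUP: {Network.MANAGEMENT},
    Role.MANAGEMENT: {Network.MANAGEMENT},
}


@dataclass
class Node:
    name: str
    role: Role

    @property
    def networks(self):
        return ROLE_NETWORKS[self.role]


@dataclass
class Cluster:
    nodes: list = field(default_factory=list)

    def add(self, prefix, role, count):
        """Append `count` nodes named prefix001, prefix002, ..."""
        for i in range(1, count + 1):
            self.nodes.append(Node(f"{prefix}{i:03d}", role))

    def count_by_role(self):
        return Counter(node.role for node in self.nodes)

    def nodes_on(self, network):
        return [n.name for n in self.nodes if network in n.networks]


def build_example_cluster():
    """Hypothetical small cluster; the counts are illustrative only."""
    cluster = Cluster()
    cluster.add("cn", Role.COMPUTE, 8)
    cluster.add("oss", Role.SCRATCH_FILE_SERVICE, 2)
    cluster.add("home", Role.HOME_HSM_BACKUP, 1)
    cluster.add("login", Role.LOGIN, 2)
    cluster.add("mgmt", Role.MANAGEMENT, 1)
    return cluster


if __name__ == "__main__":
    example = build_example_cluster()
    for role, count in example.count_by_role().items():
        print(f"{role.value}: {count}")
    print("on interconnect:", len(example.nodes_on(Network.INTERCONNECT)))
```

Keeping the role-to-network mapping in one table makes it easy to check, for example, that every node can be reached for provisioning over the management network.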
B. Considerations for cluster construction
Constructing a cluster supercomputer involves technology trends, required performance, and the user environment; the main considerations are classified as follows.
■ Superior performance
- High performance is achieved by a system architecture that combines compute nodes of excellent system architecture and CPU design with a low-latency fabric network of superior bandwidth and latency.
- The computing architecture of a supercomputer represents future computing technology.
An appropriately balanced ratio among CPU, memory, I/O, network, storage, and backup is more important than anything else.
Supercomputer performance is expressed in FLOPS (FLoating point Operations Per Second), the number of floating-point operations performed per second. It is often measured with a benchmark such as LINPACK, which solves a dense system of linear equations.
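As a rough illustration of the FLOPS figure, theoretical peak performance is commonly estimated as the product of node count, sockets per node, cores per socket, clock frequency, and floating-point operations per cycle. The sketch below applies that arithmetic; all hardware figures and the measured LINPACK value are hypothetical and do not describe any system in this document.

```python
def peak_flops(nodes, sockets_per_node, cores_per_socket, clock_hz, flops_per_cycle):
    """Theoretical peak (often called Rpeak) in floating-point operations per second."""
    return nodes * sockets_per_node * cores_per_socket * clock_hz * flops_per_cycle


def linpack_efficiency(rmax, rpeak):
    """Ratio of a measured benchmark result (Rmax) to the theoretical peak (Rpeak)."""
    if rpeak <= 0:
        raise ValueError("rpeak must be positive")
    return rmax / rpeak


def format_flops(value):
    """Render a FLOPS value with a readable unit prefix."""
    for unit, scale in (("PFLOPS", 1e15), ("TFLOPS", 1e12), ("GFLOPS", 1e9)):
        if value >= scale:
            return f"{value / scale:.2f} {unit}"
    return f"{value:.0f} FLOPS"


if __name__ == "__main__":
    # Hypothetical cluster: 256 nodes, 2 sockets per node, 4 cores per socket,
    # 2.5 GHz clock, 4 floating-point operations per cycle per core.
    rpeak = peak_flops(256, 2, 4, 2.5e9, 4)
    rmax = 16.384e12  # hypothetical measured LINPACK result
    print("Rpeak:", format_flops(rpeak))
    print("Rmax: ", format_flops(rmax))
    print(f"Efficiency: {linpack_efficiency(rmax, rpeak):.0%}")
```

The gap between the measured result and the theoretical peak depends largely on how well memory, interconnect, and I/O are balanced against the CPUs, which is why the balance ratio matters.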
■ Stability
By selecting highly reliable systems and eliminating complexity, a stable and reliable cluster is built on an overall architecture that is as simple as possible.
■ Scalability
Because the key technologies of a supercomputer (e.g. compute and interconnect technology) develop very quickly, supercomputer performance improves rapidly.
- Accordingly, the ability to extend performance rapidly as performance requirements rise and technology develops is a primary requirement.
- A blade system is a modular architecture that can be extended simply by adding interconnect fabric, so it offers greater and easier expansion than rack servers.
- The Magnum switch, offered only by Sun, provides a non-blocking fabric of up to 3456 ports on a single switch, so the non-blocking fabric can be extended without adding IB switches, which makes expansion of a large cluster easy.
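To give a sense of what a 3456-port non-blocking fabric means, the sketch below computes how many hosts and switches a non-blocking fat tree needs when it is assembled from smaller discrete switches, using the standard fat-tree formulas. The 24-port and 36-port radixes are assumptions chosen for illustration; nothing here describes how the Magnum switch is built internally.

```python
def fat_tree_hosts(radix, levels):
    """Maximum hosts in a non-blocking fat tree of `radix`-port switches.

    One level is a single switch (radix hosts); two levels give radix^2 / 2;
    three levels give radix^3 / 4.
    """
    _check(radix, levels)
    return 2 * (radix // 2) ** levels


def fat_tree_switches(radix, levels):
    """Number of switches in a fully populated fat tree of that size."""
    _check(radix, levels)
    return (2 * levels - 1) * (radix // 2) ** (levels - 1)


def min_levels(radix, hosts):
    """Smallest number of levels whose fat tree can connect `hosts` hosts."""
    levels = 1
    while fat_tree_hosts(radix, levels) < hosts:
        levels += 1
    return levels


def _check(radix, levels):
    if radix < 2 or radix % 2:
        raise ValueError("radix must be an even number of ports, at least 2")
    if levels < 1:
        raise ValueError("levels must be at least 1")


if __name__ == "__main__":
    hosts = 3456
    for radix in (24, 36):  # assumed discrete switch sizes
        levels = min_levels(radix, hosts)
        print(
            f"{radix}-port switches: {levels} levels, "
            f"{fat_tree_switches(radix, levels)} switches, "
            f"capacity {fat_tree_hosts(radix, levels)} hosts"
        )
```

Under these assumptions, reaching 3456 non-blocking ports with discrete 24-port switches takes a three-level tree of 720 switches and the cabling between them, which illustrates why a single large switch simplifies expansion.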
■ Interoperability
Interoperability is required among the large-scale compute nodes, the scratch nodes, the various file services, the management services, and the user environment, and the ability to investigate performance bottlenecks must be secured.
■ Computing Environment S/W stack
A supercomputer's computing environment needs a development environment with compilers, debuggers, and a variety of parallel performance libraries in order to develop parallel programs and to maintain optimized performance across diverse applications.
A cluster is a standard architecture that includes a wide variety of devices, so verifying the compatibility of each device's drivers and tuning them accounts for a major part of overall performance and stability.
■ Management skill
- Cluster management, system management, and user management tools
- Management skills for performance management, fault management, configuration management, and report writing
- Provisioning of O/S and applications
- H/W state management and automatic power on/off