Thursday Jul 09, 2009

4) Interconnect Network
On a massively parallel cluster system, performance, scalability, and stability depend on the configuration of the interconnect network that carries message passing between cluster nodes.
Small-scale clusters generally use Ethernet, which is inexpensive and simple to build and manage. As a cluster grows, however, Ethernet's high latency lowers the efficiency of the whole cluster and limits its scalability, so large cluster systems connect their compute nodes with networks such as InfiniBand, Quadrics, or Myrinet.

Among the large-scale clusters listed on Top500.org, InfiniBand has widely replaced the previously common Myrinet as the cluster interconnect network.
InfiniBand is the result of merging Future I/O (Compaq, IBM, HP) and Next Generation I/O (NGIO; Sun, Intel, Microsoft), and it is a switched fabric used mainly in HPC. It includes quality of service (QoS) and failover functions and, above all, offers scalability. In HPC it is mainly used as the interface between processor nodes or between high-performance I/O nodes.
IB offers the theoretical throughput below, according to signalling rate.

Effective theoretical throughput in different configurations

        Single (SDR)   Double (DDR)   Quad (QDR)
 1X     2 Gbit/s       4 Gbit/s       8 Gbit/s
 4X     8 Gbit/s       16 Gbit/s      32 Gbit/s
 12X    24 Gbit/s      48 Gbit/s      96 Gbit/s

Because each serial connection (lane) has a signalling rate of 2.5 Gbit/s, 4X IB offers 10 Gbit/s in theory at SDR (Single Data Rate) and 20 Gbit/s at DDR (Double Data Rate).
Because IB uses 8b/10b encoding, data throughput is 4/5 of the signalling rate, which gives 8 Gbit/s and 16 Gbit/s respectively.
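
The arithmetic above can be written down as a short Python sketch. It only restates the figures in the text (2.5 Gbit/s per lane, 8b/10b encoding, single/double/quad data rates); the function and constant names are my own and are not part of any InfiniBand software.

```python
# Sketch of the InfiniBand throughput arithmetic described above.
# All names here are illustrative, not taken from any InfiniBand API.

LANE_SIGNALLING_GBPS = 2.5                        # per-lane signalling rate at SDR
RATE_MULTIPLIER = {"SDR": 1, "DDR": 2, "QDR": 4}  # single, double, quad data rate


def signalling_rate_gbps(lanes, rate):
    """Raw signalling rate of a link with `lanes` lanes (1, 4 or 12)."""
    return LANE_SIGNALLING_GBPS * lanes * RATE_MULTIPLIER[rate]


def data_throughput_gbps(lanes, rate):
    """Usable data rate: 8b/10b encoding carries 8 data bits per 10 line bits."""
    return signalling_rate_gbps(lanes, rate) * 8 / 10


if __name__ == "__main__":
    for lanes in (1, 4, 12):
        row = [data_throughput_gbps(lanes, r) for r in ("SDR", "DDR", "QDR")]
        print(f"{lanes}X", row)
```

Running it reproduces the rows of the throughput table, for example 8, 16, and 32 Gbit/s for a 4X link.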
On the Top500, IB is chosen as the cluster interconnect because of its high throughput, low latency, and relatively low cost.

                   InfiniBand   Ethernet      Myrinet          Quadrics
 Proprietary       No           No            Yes              Yes
 QoS               Yes          Yes           No               No
 Common ULP RDMA   Yes          No            No               No
 Open Source       Yes          Yes           Some             Some
 Link Rate         10-30 Gb/s   1 Gb/s        2.5 Gb/s         7.2 Gb/s
 MPI Bandwidth     2500 MB/s    85-118 MB/s   250-450 MB/s     280-840 MB/s
 Latency           3.38 us      30 us         4.5 us           2 us
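
To get a feel for how the latency and bandwidth rows interact, here is a rough Python sketch. It assumes the simplest possible cost model (transfer time = latency + message size / bandwidth), which is my own simplification and not something the vendors publish. The numbers are the upper figures from the table, and 1 MB is taken as 10^6 bytes.

```python
# First-order message cost model: time = latency + size / bandwidth.
# This model and the names below are illustrative assumptions.

# name: (latency in microseconds, MPI bandwidth in MB/s), upper table figures
INTERCONNECTS = {
    "InfiniBand": (3.38, 2500.0),
    "Ethernet": (30.0, 118.0),
    "Myrinet": (4.5, 450.0),
    "Quadrics": (2.0, 840.0),
}


def transfer_time_us(message_bytes, latency_us, bandwidth_mb_s):
    """Estimated time in microseconds to deliver one message.

    With 1 MB = 10**6 bytes, 1 MB/s equals 1 byte per microsecond,
    so bytes divided by MB/s is already in microseconds.
    """
    return latency_us + message_bytes / bandwidth_mb_s


if __name__ == "__main__":
    for size in (8, 1_000_000):
        for name, (latency, bandwidth) in INTERCONNECTS.items():
            t = transfer_time_us(size, latency, bandwidth)
            print(f"{size:>9} bytes over {name:<10}: {t:10.2f} us")
```

Under this model small messages are dominated by latency and large messages by bandwidth, which is why both rows matter when choosing an interconnect.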

The main InfiniBand switch vendors are Cisco (which acquired Topspin), Voltaire, and SilverStorm.

 

1) HPC Components outline

[picture 2] HPCC Components
The following sections outline each major component and summarize its current trends.

2) Computing node (Computer)

The computing node is the computer that performs the supercomputer's calculations, and it is the most important part of a supercomputer's configuration. Following the development of CPUs and the needs of supercomputing applications, supercomputers have evolved from scalar to vector to massively parallel processing machines.


i. Vector Processor (Array processor)
Early supercomputers simply consisted of scalar processors.
A scalar processor, the simplest form of computer processor, performs one integer or floating-point operation at a time.
A vector processor, on the other hand, can operate on many data items at the same time with a single instruction. The difference between a scalar processor and a vector processor is the same as the difference between a scalar operation and a vector operation.
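
The difference can be illustrated with a toy Python model that counts how many instructions each kind of processor would issue to add two arrays. This is only a conceptual sketch: the vector length of 4 is a made-up value, and real vector hardware is of course not programmed this way.

```python
# Toy model: count "instructions" issued to add two equal-length arrays.
# The vector_length default is a hypothetical value for illustration.

def scalar_add(a, b):
    """Scalar processor: one add instruction per pair of elements."""
    result, instructions = [], 0
    for x, y in zip(a, b):
        result.append(x + y)
        instructions += 1
    return result, instructions


def vector_add(a, b, vector_length=4):
    """Vector processor: one instruction adds up to vector_length pairs."""
    result, instructions = [], 0
    for start in range(0, len(a), vector_length):
        chunk_a = a[start:start + vector_length]
        chunk_b = b[start:start + vector_length]
        result.extend(x + y for x, y in zip(chunk_a, chunk_b))
        instructions += 1
    return result, instructions


if __name__ == "__main__":
    a, b = list(range(10)), [1] * 10
    print(scalar_add(a, b))
    print(vector_add(a, b))
```

Both functions return the same sums; only the instruction count differs, and it shrinks as the vector length grows.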
A vector processor can perform a mathematical operation on many combined data items at the same time. From the mid-1970s to the mid-1980s, the vector computer was the mainstream form of supercomputer and was widely used in scientific computing. Many computer vendors released machines built around their own special-purpose processors, typically with 4 to 16 processors.
However, through the 1980s and 1990s, large-scale parallel computers (massively parallel processing systems) built from many general-purpose CPUs drew attention as a new concept of supercomputer. These massive cluster systems are built by connecting in parallel, over a fast interconnect network, "off the shelf" (footnote 1) computers that use mass-produced CPUs such as x86-64, PowerPC, and Itanium.

ii. Massive parallel processing (Computer Cluster)
The high-performance computing (HPC) cluster, the representative form of massively parallel processing, interconnects many computer nodes with a high-speed network or switching fabric. It is called a tightly coupled system because it behaves like a single computer even though each node runs its own separate (independent) OS.
An HPC cluster improves computing performance by distributing computational tasks across the cluster's nodes, and it suits workloads that rely on communication between those nodes. In other words, the result of an operation on one node influences the results on the others. Libraries such as MPI are used to distribute many scientific applications across cluster nodes.
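
The idea of distributing a task across nodes and then combining the results can be sketched in plain Python. Note that this does not use MPI at all: it only simulates the "ranks" inside one process so the decomposition is visible. The names block_range and simulated_parallel_sum are hypothetical, and a real cluster code would use an MPI library for the communication step.

```python
# Simulated block decomposition of work across cluster nodes ("ranks").
# No real message passing happens here; this is an illustration only.

def block_range(n_items, n_ranks, rank):
    """Return the half-open [start, stop) slice of work owned by `rank`.

    Items are split as evenly as possible; the first (n_items % n_ranks)
    ranks receive one extra item each.
    """
    base, extra = divmod(n_items, n_ranks)
    start = rank * base + min(rank, extra)
    stop = start + base + (1 if rank < extra else 0)
    return start, stop


def simulated_parallel_sum(values, n_ranks):
    """Each rank sums its own block, then the partial sums are combined."""
    partial_sums = []
    for rank in range(n_ranks):
        start, stop = block_range(len(values), n_ranks, rank)
        partial_sums.append(sum(values[start:stop]))  # local work on one node
    return sum(partial_sums)  # stands in for the reduction across nodes


if __name__ == "__main__":
    data = list(range(1, 11))
    for rank in range(3):
        print("rank", rank, "owns", block_range(len(data), 3, rank))
    print("total:", simulated_parallel_sum(data, 3))
```

The final combining step is where inter-node communication would happen on a real cluster, which is why interconnect latency and bandwidth matter so much.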
The best-known HPC cluster model is the "Beowulf cluster" (footnote 2), an architecture built from low-cost PC-class hardware running Linux and various free software for parallel computing.
Today, as more and more computing power is needed for very large-scale work, HPC cluster supercomputing, which supplies large-scale computing power at relatively low cost, is growing rapidly, and it is expected that most very large supercomputers in the future will be built as clusters. The reason is not simply cost but also the performance of mass-produced general-purpose CPUs and network technology.


iii. SMP (Symmetric Multiprocessing) and NUMA (Non-Uniform Memory Access)
For scientific applications that need a large amount of memory or that cannot be distributed with MPI, large-memory SMP supercomputers are still a major part of supercomputing.
A multiprocessing computer is a single computer with more than one CPU under one OS instance. By memory access method, such machines are classed as SMP, where main memory is shared equally by all processors, or NUMA, where access depends on where the memory is attached relative to the processor. (In common usage, both are simply called SMP supercomputing.)
Libraries such as OpenMP are used to distribute the work of scientific applications across multiple processors within a single node.
When parallel programming on a cluster is mentioned, it usually means hybrid-model parallel programming that uses both OpenMP and MPI.

1. "Off the shelf" generally refers to technology or computer products that are ready-made and available for sale, lease, or license to the general public.

2. Beowulf is a design for high-performance parallel computing clusters on inexpensive personal computer hardware. Originally developed by Thomas Sterling and Donald Becker at NASA, Beowulf systems are now deployed worldwide, chiefly in support of scientific computing.

Wednesday Oct 15, 2008

 

1) HPCC system architecture

HPCC system architecture configuration

An HPCC system generally requires the system architecture described below. The logical components do not need to be separated in hardware. A very small cluster can be built from several computers and one or two storage and file servers, but as a cluster grows to supercomputer scale, parallel transactions, management, the interconnect, scratch disk, and the cluster file system become more complicated, and these factors affect system performance and stability.

[picture 3] HPCC system architecture configuration example


 

A. System architecture configuration outline

A supercomputing cluster system architecture is divided into the following components.

Compute node / shared file service (scratch) disk storage

- Large-scale compute nodes and file service nodes that supply the computing power for large jobs.

- A parallel cluster file system with excellent performance and scalability for file sharing between the compute nodes and the shared file service (scratch) disk storage.

Interconnection network

- A high-bandwidth interconnect network that delivers high-performance MPI.

Cluster file service configuration

- A parallel cluster file system with excellent performance and scalability for file sharing between the compute nodes and the shared file service (scratch) disk storage.

- HOME / HSM / backup configuration

Front-end / management node

- Login and debugging nodes that give users system access and an environment for preparing their work.

- A management node that manages and controls all system resources of the cluster and performs software provisioning.

Management network

- An Ethernet network for managing and controlling all system resources and for providing Ethernet access services such as provisioning.

 

B. Considerations for cluster configuration

Building a cluster supercomputer involves considerations such as technology trends, required performance, and the user environment. The broad categories are as follows.

Superior performance

- A high level of performance comes from a system architecture that combines compute nodes built on excellent system and CPU designs with a low-latency fabric network offering superior bandwidth and latency.

- A supercomputer's computing architecture is the computing technology of the future.

- An appropriate balance among CPU, memory, I/O, network, storage, and backup is more important than anything else.

- Supercomputer performance is expressed in FLOPS (FLoating point Operations Per Second), the number of floating-point operations performed per second. It is often measured with a benchmark such as LINPACK, which solves a dense system of linear equations.
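
As a rough illustration of how a FLOPS figure comes about, the theoretical peak of a cluster is commonly estimated as nodes x cores per node x clock rate x floating-point operations per cycle. The Python sketch below applies that estimate; the example cluster (100 nodes, 8 cores, 2.5 GHz, 4 operations per cycle) is entirely hypothetical, and a measured benchmark result such as LINPACK will always be lower than this peak.

```python
# Theoretical peak estimate for a homogeneous cluster.
# The function names and the example configuration are hypothetical.

def peak_flops(nodes, cores_per_node, clock_hz, flops_per_cycle):
    """Theoretical peak in floating-point operations per second."""
    return nodes * cores_per_node * clock_hz * flops_per_cycle


def to_tflops(flops):
    """Convert FLOPS to TFLOPS (10**12 operations per second)."""
    return flops / 1e12


if __name__ == "__main__":
    # Hypothetical cluster: 100 nodes, 8 cores each, 2.5 GHz, 4 FLOPs per cycle
    peak = peak_flops(nodes=100, cores_per_node=8, clock_hz=2.5e9, flops_per_cycle=4)
    print(f"Theoretical peak: {to_tflops(peak):.1f} TFLOPS")
```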

 

Stability

- By selecting highly reliable systems and eliminating complexity, a stable and reliable cluster is built by keeping the overall architecture as simple as possible.

Scalability

- Because the main component technologies of a supercomputer (e.g. computers and interconnects) develop very fast, supercomputer performance improves rapidly.

- Accordingly, as required performance rises and technology develops, the ability to expand performance quickly is a key requirement.

- A blade system is a modular architecture that, by simply adding interconnect fabric, offers greater and easier expansion than rack servers.

- The Magnum switch, offered only by Sun, provides a non-blocking fabric of up to 3456 ports on a single switch. It therefore extends the non-blocking fabric without additional IB switches and makes large clusters easy to expand.

Interoperability

- Interoperability is needed among the large-scale compute nodes, scratch nodes, and the diverse file services, management services, and user environments, and a means of investigating performance bottlenecks must be secured.

Computing Environment S/W stack

- A supercomputer's computing environment needs a development environment of compilers, debuggers, and diverse parallel performance libraries, so that parallel programs and diverse applications can be developed with optimized performance.

- A cluster is a standard architecture that includes very diverse devices, so verifying the compatibility of each device's drivers and tuning them accounts for a major part of overall performance and stability.

■ Management skill

- Cluster management, system management, and user management

- Management skills covering performance management, fault management, configuration management, and report writing

- Provisioning of the OS and applications

- Hardware status management and automatic power on/off

Friday Sep 05, 2008

Description

I'm writing up the fundamentals of HPC cluster architecture to help anyone who wants to learn the basics of it.

It's not finalized; I'll keep updating it.

The references are below:

www.ksc.re.kr
http://en.wikipedia.org/wiki/Supercomputer
http://www.lustre.org/ Lustre White paper (CFS)
http://www-935.ibm.com/services/kr/index.wss/offering/its/a1023615
http://en.wikipedia.org/wiki/Job_scheduler
http://www.globus.org/toolkit/
http://www.rocksclusters.org/
Titech_blueprint_820-0831 (Sun Internal)
http://www.liebert.com/
http://www.dell.com

1) IDC’s HPC Market Definitions

HPC (High Performance Computing) refers to high-performance computing systems used mainly for scientific research and industrial technology.

Therefore, it is a category defined by purpose of use rather than by a specific technology or a particular system, and a wide range of technical architecture elements can fall within it. IDC divides the market into the five categories below, and this division helps distinguish between architectures and technologies.

■ Technical Capability

- Systems configured and purchased to solve the largest, most demanding problems

■ Technical Enterprise

- Systems purchased to support technical applications in throughput environments, selling for $1 million or more

■ Technical Divisional

- Systems purchased for throughput environments, selling from $250,000 to $999,000

■ Technical Departmental

- Systems purchased for throughput environments, selling for $50,000 to $250,000

■ Technical Workgroup

- Added for systems selling for under $50,000

Supercomputers, which provide high-performance computing, are one part of HPC, and this HPC SE paper covers the supercomputer part of the HPC field. I will focus on clusters built with Sun's current technology in order to summarize the architecture.

2) Definition of a supercomputer

If you had to define a supercomputer in a word, you would say it is a computer that runs dozens or hundreds of times faster than an ordinary computer. This definition is very relative: as computers develop, a supercomputer of the past no longer counts as a supercomputer by today's standard.

When we say supercomputer, we commonly think of a computer much larger than the PCs or workstations around us, but we must first consider that a supercomputer and an ordinary computer differ greatly in their range of applications.

A supercomputer is what you normally need when a large amount of numerical calculation must be done in a given time, or a large amount of information must be processed in a short time.

What counts as "a large amount of numerical calculation" or "processing a large amount of information" is a relative concept that changes with the times; currently it means about 1 TFLOPS (1 trillion operations per second) or 1 terabyte (1 trillion bytes). Supercomputers are used in various fields (insurance companies, oil companies, and the film industry, as well as high-tech engineering and the natural sciences) and are a useful tool in every area that requires heavy numerical calculation or processing of large amounts of data.

The performance of a supercomputer is measured by how many floating-point operations it can perform per second, and the performance and ranking of the supercomputers held by countries around the world are announced every year in June and November. (http://www.top500.org)

As of December 2006 (and still in the June 2007 list), the fastest computer in the world is the BlueGene/L at Lawrence Livermore National Laboratory (LLNL), which operates in California in the United States. It is based on IBM's BlueGene solution and has 131,072 processors and 32.7 TB of memory. Its theoretical peak performance (Rpeak) reaches 367 TFLOPS, and its Rmax exceeds 280 TFLOPS.
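
The relationship between these two figures can be checked with a few lines of Python. The sketch uses only the numbers quoted above (Rpeak 367 TFLOPS, Rmax taken as its 280 TFLOPS lower bound, 131,072 processors); the ratio Rmax / Rpeak is usually called the LINPACK efficiency, and the function names are my own.

```python
# LINPACK efficiency and per-processor peak from the BlueGene/L figures above.
# Rmax is taken as 280 TFLOPS, the lower bound quoted in the text.

RPEAK_TFLOPS = 367.0
RMAX_TFLOPS = 280.0
PROCESSORS = 131072


def linpack_efficiency(rmax, rpeak):
    """Fraction of theoretical peak actually achieved on the benchmark."""
    return rmax / rpeak


def peak_per_processor_gflops(rpeak_tflops, processors):
    """Theoretical peak of one processor, in GFLOPS."""
    return rpeak_tflops * 1000.0 / processors


if __name__ == "__main__":
    eff = linpack_efficiency(RMAX_TFLOPS, RPEAK_TFLOPS)
    per_proc = peak_per_processor_gflops(RPEAK_TFLOPS, PROCESSORS)
    print(f"LINPACK efficiency: {eff:.1%}")
    print(f"Peak per processor: {per_proc:.2f} GFLOPS")
```

So roughly three quarters of the theoretical peak is reached on the benchmark, with each processor contributing about 2.8 GFLOPS of peak.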

Meanwhile, the IBM Blade Cluster BL-20P system installed at a telecommunications company in the United States ranks 500th with 4896 GFLOPS. As a result, as of December 2006 a computer must reach 4896 GFLOPS (GFLOPS: 1,000,000,000 operations per second) to be called a supercomputer.

The history of the supercomputer in its present-day sense began with the Cray-1, developed in 1976 under the leadership of Seymour Cray. Until the early 1980s its use was restricted to government agencies and universities in the United States.

From the mid-1980s, however, industry began to examine the usefulness of supercomputers. As a result, it was verified that computational science and computational engineering can reproduce real physical phenomena and make predictions, and the user base of supercomputers began to spread quickly.

Today there is no car on the road that was made without the use of a supercomputer, and the products displayed in pharmacies were also developed with the help of supercomputers.

Since the mid-1980s, the world's major nations have led the establishment of supercomputer centers and the development of supercomputers in order to take the lead in science, technology, and information and communications through the development of computational science and engineering.

The United States and Japan in particular are competing to lead supercomputer development and to take the top position in the supercomputer market. The supercomputer life cycle is gradually shortening, and machines capable of hundreds of billions of operations per second are appearing.

A. Definition of a cluster

A cluster computer is a kind of parallel-processing supercomputer built by connecting personal PCs or small servers.

The concept originated from the intention of using low-cost small equipment as an alternative to high-priced supercomputer equipment. Clusters are extending their share very fast, as shown in picture 1.

B. Features of a cluster computer

A cluster bought at a commodity price can deliver the same computational capability as a machine from a large supercomputer vendor.

For this reason it is used where supercomputer-class performance is needed, and because of its cost it has also spread to application fields such as data processing, Internet service applications, and commercial computing.

- Because a cluster computer is built from PCs or small servers connected by network devices, it is much easier to expand than a conventional supercomputer. This advantage has been accepted and applied to supercomputers. Accordingly, although there are a few restrictions, performance can be extended while the system is in use.

The most distinctive practical feature of a cluster computer is its flexibility. A cluster offers an environment optimized for the problem to be solved, with choices ranging from hardware devices to user software. In other words, it is a semi-custom computer. In the past, new hardware was built for a specific problem at very high cost; a cluster computer makes the same thing possible at lower cost.

However, it requires integration at a lower level than what is commonly understood as SI (system integration). When a cluster is purchased from a big supercomputer vendor, the work of defining the specific details of its use is needed. Accordingly, specialized knowledge of cluster systems is required.


[picture 1] Top 500 data, 2005.

We can see that the share of cluster computers is growing quickly.

It is predicted that clusters will take over the share currently held by MPP computers.

This blog copyright 2009 by Jongjun Son