1
Computing Platform
Andrew A. Chien
Mohsen Saneei
University of Tehran
2
Outline
Basic Elements
– Computing Elements
– Communication Elements
– Storage Elements
Simple Composite Elements (SCE): Local Grids
– High Throughput SCEs
– High Reliability SCEs
– Dedicated High Performance SCEs
– Shared Controllable Performance SCEs
Illinois HPVM Project and Similar Efforts
3
Basic Elements Computing Elements
Gordon Moore's law: the number of transistors on a chip, and with it chip performance, doubles roughly every two years.
Transistors ≈ 20 × 2^((year − 1965)/1.5)
Example:
– 1975: Intel 8080 with 4,500 transistors
– 1998: Pentium II with 7,500,000 transistors
Performance: increases by about 1.5× each year
– In 1975: 0.1 MIPS
– In 1998: 1,000 MIPS
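As a quick illustration, the sketch below plugs the slide's constants into these growth formulas; the outputs are trend-line estimates, so they will not match the cited chip figures exactly.

```python
# Growth formulas as stated on this slide (trend estimates, not exact history).
def transistors(year: int) -> float:
    # Transistors ≈ 20 × 2^((year − 1965) / 1.5)
    return 20 * 2 ** ((year - 1965) / 1.5)

def mips(year: int) -> float:
    # Performance growing ~1.5× per year, anchored at 0.1 MIPS in 1975.
    return 0.1 * 1.5 ** (year - 1975)

for y in (1975, 1998):
    print(y, f"~{transistors(y):,.0f} transistors,", f"~{mips(y):,.1f} MIPS")
```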
4
Basic Elements Communication Elements
Networking elements with terabit performance are available today
But deployed networks have advanced more slowly, because of:
– Cost
– Their fundamental nature
Over the past few years we have seen a rapid advance from 10 Mb/s networks to 100 Mb/s
We focus on local networks (Cluster networks)
5
Basic Elements Communication Elements (cluster networks)
Cluster networks:
– Physically localized
– High speed
– Generally low-volume products
Some cluster networks:
– Myricom's Myrinet
– Compaq/Tandem's ServerNet (ServerNet I and ServerNet II)
6
Basic Elements Communication Elements (cluster networks)
Myrinet:
– High-speed local network
– Full-duplex 1.28 Gb/s links
– Derived from multicomputer routers
– Wormhole routing
– Switch latency below 1 µs
– Myrinet opened new horizons for research and produced messaging layers such as Fast Messages and Active Messages
7
Basic Elements Communication Elements (cluster networks)
[Figure: delivered bandwidth (MB/s) versus message size (bytes) for Fast Messages on Myrinet]
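The shape of such a curve can be approximated with a simple first-order model, sketched below; the overhead and peak-bandwidth constants are assumptions chosen for illustration, not measured Fast Messages numbers.

```python
# First-order model: time(n) = overhead + n / peak_bandwidth, so delivered
# bandwidth rises toward the peak as messages get larger.
OVERHEAD_S = 10e-6      # assumed per-message overhead (10 µs)
PEAK_BW = 80e6          # assumed peak bandwidth (~80 MB/s)

def delivered_mb_per_s(n_bytes: int) -> float:
    t = OVERHEAD_S + n_bytes / PEAK_BW
    return (n_bytes / t) / 1e6

for n in (16, 256, 4096, 65536):
    print(f"{n:6d} bytes -> {delivered_mb_per_s(n):5.1f} MB/s")
```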
8
Basic Elements Communication Elements (cluster networks)
Compaq/Tandem's ServerNet I
– Full duplex 50 MB/s
– Wormhole routing
– Reliable communication
– 64-byte packets
– A few microseconds of latency
Compaq/Tandem's ServerNet II
– Full duplex 125 MB/s
– 512-byte packet size
– 64-bit network addressing
9
Basic Elements Storage Elements
Capacity and cost-performance have improved at an exponential rate
Density:
– From 1970 to 1988: improved 29% per year
– After 1988: 60% per year
Cost per byte:
– From 1970 to 1988: improved 40% per year
– After 1988: 100% per year
But seek times are improving very slowly (average seek times of 7-10 ms remain typical)
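A quick compound-growth check using the quoted density rates (purely illustrative; the years chosen for the second period are an assumption):

```python
# Compound growth at the quoted annual improvement rates (illustrative only).
def grow(start: float, rate: float, years: int) -> float:
    return start * (1 + rate) ** years

density_1988 = grow(1.0, 0.29, 18)           # 29%/yr from 1970 to 1988
density_1998 = grow(density_1988, 0.60, 10)  # 60%/yr for the next decade
print(f"Relative density: {density_1988:,.0f}x by 1988, {density_1998:,.0f}x by 1998")
```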
10
Basic Elements Future
Year  Machine         Computing  Memory  Disk       Network
2003  PC              8 GIPS     1 GB    128 GB     1 Gb/s
2003  Supercomputer   80 TIPS    10 TB   1,280 TB   10 Tb/s
2008  PC              64 GIPS    16 GB   2 TB       10 Gb/s
2008  Supercomputer   640 TIPS   160 TB  20,000 TB  100 Tb/s
11
Simple Composite Elements (SCE): Local Grids
SCEs: collections of basic elements, aggregated with software and special hardware.
They are often single administrative domains. SCEs are studied for these reasons:
– They can reduce the number of problems higher-level grids must solve.
– SCEs use resources and software to implement the external properties
– SCEs form the basis for the larger computational grid
12
Simple Composite Elements (SCE): Local Grids (cont.)
A national computational grid needs:
– Reliable CEs for management (access control and scheduling) and basic services (naming and routing)
– Other CEs for resource pools (data caching, storage, …)
13
Simple Composite Elements (SCE): Local Grids (cont.)
SCEs are defined by:
– Their external interface
– Their internal hardware requirements
– Their ability to deliver efficient and flexible use of the hardware to applications
Their external interface includes:
– Capacity
– Aggregate performance
– Reliability
– Predictability
– Sharability
14
Simple Composite Elements (SCE): Local Grids (cont.)
Hardware requirements:
– Heterogeneity
– Network requirements (special hardware, limited link length, bandwidth, …)
– Distributed resources (links of tens of meters or thousands of kilometers)
– Changes in constituent systems
– Scalability (number of nodes)
15
SCE: Local Grids High Throughput SCEs
Pooled resources are utilized to achieve high throughput on a set of sequential compute jobs
– Examples: Condor, Utopia, Symbio
External interface:
– High capacity for computation and a sharable resource
– Interfaces for some parallel computing systems, such as PVM, are available
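A toy sketch of the high-throughput pattern follows: many independent sequential jobs farmed out to a pool of workers. Real systems such as Condor add matchmaking, checkpointing, and fault tolerance; this shows only the core idea, using Python's standard process pool as a stand-in for the resource pool.

```python
from concurrent.futures import ProcessPoolExecutor

def sequential_job(job_id: int) -> int:
    # Stand-in for one independent batch job.
    return sum(i * i for i in range(100_000)) % (job_id + 7)

if __name__ == "__main__":
    # The pool plays the role of the SCE's pooled compute resources.
    with ProcessPoolExecutor() as pool:
        results = list(pool.map(sequential_job, range(32)))
    print(len(results), "jobs completed")
```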
16
SCE: Local Grids High Throughput SCEs (cont.)
Hardware requirements:
– Run on a wide range of processor and network environments
– Tolerate both processor and network heterogeneity in type and speed
– Can scale to large numbers of processors (hundreds to thousands)
17
SCE: Local Grids High Throughput SCEs (cont.)
High Throughput SCEs in grids:
– Flexible and powerful systems for achieving high throughput on large numbers of sequential jobs
– Thus, they are well-matched grid elements for such tasks
Not supported:
– Aggregate performance
– Reliability (only partial)
– Predictability
18
SCE: Local Grids High Reliability SCEs (reliable clusters)
Provide computational resources with extremely low probability of service interruption and data loss
Limited scalability is used to increase system capacity
Resources are sharable
Prefer compatible hardware to enable failover and data sharing
19
SCE: Local Grids High Reliability SCEs (reliable clusters) (cont.)
Can use lower-performance standby systems to reduce cost
Can be physically localized or distributed over a wide area network
Traditionally use special operating systems
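A toy illustration of the failover idea behind reliable clusters, assuming a heartbeat-based primary/standby pair; real systems must also handle data replication, split-brain, and fencing.

```python
import time

class Node:
    def __init__(self, name: str) -> None:
        self.name = name
        self.last_heartbeat = time.monotonic()

    def heartbeat(self) -> None:
        # Called periodically while the node is healthy.
        self.last_heartbeat = time.monotonic()

    def alive(self, timeout: float = 2.0) -> bool:
        return time.monotonic() - self.last_heartbeat < timeout

def active_node(primary: Node, standby: Node) -> Node:
    # Serve from the primary while it is healthy; otherwise fail over to the
    # (possibly lower-performance) standby.
    return primary if primary.alive() else standby

primary, standby = Node("primary"), Node("standby")
print("serving from:", active_node(primary, standby).name)
```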
20
SCE: Local Grids Dedicated High Performance SCEs
Merge basic elements into a single resource, to be applied to a single computation
Built from collections of microprocessors or entire systems (a scalable network of workstations)
Initially applied to supercomputer tasks
Scalable to hundreds or thousands of nodes within a limited physical extent (tens of meters)
21
SCE: Local Grids Dedicated High Performance SCEs (cont.)
Predominant programming model: message passing (such as MPI; see the sketch below)
Support both sequential jobs and parallel computation, but focus on highest single-job performance
Not supported:
– Reliability
– Predictability
– Sharability
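A minimal message-passing sketch using MPI through the mpi4py bindings (assuming mpi4py is installed and the script is launched under an MPI runtime): each rank computes a partial result, and the ranks combine them with an explicit reduction.

```python
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

# Each node (rank) computes a partial sum of its own slice of the work.
local = sum(range(rank * 1000, (rank + 1) * 1000))

# Combine the partial sums on rank 0 with a collective operation.
total = comm.reduce(local, op=MPI.SUM, root=0)

if rank == 0:
    print(f"{size} ranks computed total = {total}")
```

Run, for example, with `mpirun -n 4 python sum.py`.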
22
SCE: Local Grids Dedicated High Performance SCEs (cont.)
The Berkeley Network of Workstations (NOW) is one project on Dedicated High Performance SCEs
The IBM SP-2 and the Intel/Sandia machine are two examples:
– Use high-volume microprocessors as their basic computation engines
– Use custom high-performance interconnects delivering 5-100 MB/s of network bandwidth to each node
– Latencies of 20-100 µs
23
SCE: Local Grids Dedicated High Performance SCEs (cont.)
IBM SP-2
– Employed an entire workstation as the basic building block
– Standard AIX workstation operating system
– Allows only a single job on each node
Intel/Sandia
– Employed special system boards and packaging as the basic building block
– A custom operating system, PUMA
– Multitasking and virtual memory are not provided on the compute nodes
24
SCE: Local Grids Shared Controllable-Performance SCEs
Aim: deliver predictable high performance in a shared-resource, heterogeneous, distributed environment.
These SCEs combine the capabilities of all the other SCEs except reliability.
These SCEs are High-Performance Virtual Machines (HPVMs)
25
SCE: Local Grids Shared Controllable-Performance SCEs (cont.)
HPVM simplifies the programming task by allowing programmers to focus on the complexity of the application
Constructing an effective HPVM requires meeting a number of research challenges in:
– High-performance, predictable communication
– Management of heterogeneity
– Performance models
– Adaptive resource management
26
SCE: Local Grids Shared Controllable-Performance SCEs (cont.)
To achieve efficient tight coupling, the network hardware will need to support both low latency and high bandwidth.
It must be scalable to thousands of nodes, because HPVMs execute on distributed resources
Geographic distribution is physically limited
27
SCE: Local Grids Illinois HPVM Project and Similar Efforts
Aim: develop shared, controllable, high-performance SCEs
Basic parameters:
– Computing nodes: x86 and PCI computing systems
– Operating systems: Windows NT and Linux
– Networks: Myrinet, ServerNet, …
The Real World Computing Project (RWCP) and the Berkeley Network of Workstations II (NOW II) are similar efforts
28
Summary
Element type                       Scalable  Aggregatable  Reliable  Predictable  Sharable
Basic elements
  Basic compute                    No        --            No        No           Yes
  Basic storage                    No        --            No        No           Yes
  Basic network                    No        --            No        No           Yes
Local Grids (SCE)
  High Throughput                  Yes       No            Partial   No           Yes
  High Reliability                 Limited   No            Yes       No           Yes
  Dedicated High Performance       Yes       Yes           No        No           No
  Shared Controllable Performance  Yes       Yes           No        Yes          Yes