
1

Rajkumar Buyya, Monash University, Melbourne.

Email: [email protected] / [email protected]

Web: http://www.ccse.monash.edu.au/~rajkumar / www.buyya.com

High Performance Cluster Computing (Architecture, Systems, and Applications)

ISCA 2000


2

Objectives

• Learn and share recent advances in cluster computing (both in research and commercial settings):

– Architecture

– System Software

– Programming Environments and Tools

– Applications

• Cluster Computing Infoware (tutorial online):

– http://www.buyya.com/cluster/


3

Agenda

• Overview of Computing

• Motivations & Enabling Technologies

• Cluster Architecture & its Components

• Clusters Classifications

• Cluster Middleware

• Single System Image

• Representative Cluster Systems

• Resources and Conclusions


4

Computing Elements

Programming Paradigms

[Figure: a multi-processor computing system - hardware (processors P ... P), a microkernel operating system, a threads interface, and applications on top; the legend shows the symbols for processor (P), process, and thread.]


5

Two Eras of Computing

[Figure: the sequential era and the parallel era (roughly 1940 to 2030), each progressing through the same stages - architectures, system software, applications, and problem-solving environments (P.S.Es) - and each moving from R&D to commercialization to commodity.]


6

Computing Power and Computer Architectures


7

Computing Power (HPC) Drivers

Solving grand challenge applications using computer modeling, simulation and analysis

Life Sciences

CAD/CAM

Aerospace

Digital Biology

Military Applications

E-commerce/anything


8

How to Run Applications Faster?

• There are three ways to improve performance:

– 1. Work harder

– 2. Work smarter

– 3. Get help

• Computer analogy:

– 1. Use faster hardware, e.g. reduce the time per instruction (clock cycle).

– 2. Use optimized algorithms and techniques.

– 3. Use multiple computers to solve the problem, i.e. increase the aggregate number of instructions executed per clock cycle.


9

Computing Platforms Evolution: Breaking Administrative Barriers

[Figure: performance grows as platforms evolve from the desktop (single processor?), to SMPs or supercomputers, to local clusters, to enterprise clusters/grids, to global clusters/grids, and perhaps to inter-planet clusters/grids - each step crossing a wider administrative barrier: individual, group, department, campus, state, national, globe, inter-planet, universe.]


10

Application Case Study

"Web Serving and E-Commerce"


11

E-Commerce and PDC ?

�What are/will be the major problems/issues in eCommerce? How will or can PDC be applied to solve some of them?

� Other than “Compute Power”, what else can PDC contribute to e-commerce?

� How would/could the different forms of PDC (clusters, hypercluster, GRID,…) be applied to e-commerce?

� Could you describe one hot research topic for PDC applying to e-commerce?

� A killer e-commerce application for PDC ?

�…...


12

Killer Applications of Clusters

� Numerous Scientific & Engineering Apps.

� Parametric Simulations

� Business Applications

– E-commerce Applications (Amazon.com, eBay.com ….)

– Database Applications (Oracle on cluster)

– Decision Support Systems

� Internet Applications

– Web serving / searching

– Infowares (yahoo.com, AOL.com)

– ASPs (application service providers)

– eMail, eChat, ePhone, eBook, eCommerce, eBank, eSociety, eAnything!

– Computing Portals

� Mission Critical Applications

– command control systems, banks, nuclear reactor control, star-war, and handling life threatening situations.


13

Major problems/issues in E-commerce

• Social Issues

• Capacity Planning

• Multilevel Business Support (e.g., B2P2C)

• Information Storage, Retrieval, and Update

• Performance

• Heterogeneity

• System Scalability

• System Reliability

• Identification and Authentication

• System Expandability

• Security

• Cyber Attacks Detection and Control (cyberguard)

• Data Replication, Consistency, and Caching

• Manageability (administration and control)


14

Amazon.com: Online sales/trading killer E-commerce Portal

• Several thousands of items

– books, publishers, suppliers

• Millions of customers

– customer details, transaction details, support for transaction updates

• (Millions) of partners

– keep track of partner details, referral links to partners, and sales and payments

• Sales based on advertised price

• Sales through auctions/bids

– a mechanism for participating in the bid (buyers/sellers define the rules of the game)


15

Can these drive E-Commerce?

• Clusters are already in use for web serving, web hosting, and a number of other Internet applications including e-commerce:

– scalability, availability, performance, reliable high-performance massive storage, and database support.

– attempts to support online detection of cyber attacks (through data mining) and control.

• Hyperclusters and the GRID:

– support for transparency in (secure) site/data replication for high availability and quick response time (taking the site close to the user).

– compute power from hyperclusters/Grid can be used for data mining for cyber attack and fraud detection and control.

– helps to build Compute Power Market, ASPs, and Computing Portals.


16

Science Portals - e.g., PAPIA system

PAPIA PC Cluster

Pentiums, Myrinet, NetBSD/Linux, PM, SCore-D, MPC++

RWCP Japan: http://www.rwcp.or.jp/papia/


17

PDC hot topics for E-commerce

• Cluster-based web servers, search engines, portals…

� Scheduling and Single System Image.

� Heterogeneous Computing

� Reliability and High Availability and Data Recovery

� Parallel Databases and high performance-reliable-mass storage systems.

• CyberGuard! Data mining for detection of cyber attacks, frauds, etc., and online control.

� Data Mining for identifying sales pattern and automatically tuning portal to special sessions/festival sales

� eCash, eCheque, eBank, eSociety, eGovernment, eEntertainment, eTravel, eGoods, and so on.

� Data/Site Replications and Caching techniques

� Compute Power Market

� Infowares (yahoo.com, AOL.com)

� ASPs (application service providers)

� . . .


18

Sequential Architecture Limitations

• Sequential architectures are reaching physical limitations (speed of light, thermodynamics).

• Hardware improvements like pipelining, superscalar execution, etc. are not scalable and require sophisticated compiler technology.

• Vector processing works well for certain kinds of problems.


19

Computational Power Improvement

[Figure: C.P.I. vs. number of processors (1, 2, ...) - a multiprocessor keeps improving as processors are added, while a uniprocessor stays flat.]


20

Human Physical Growth Analogy: Computational Power Improvement

[Figure: growth vs. age (5 to 45+ years) - rapid vertical growth early in life, then horizontal growth.]


21

• The technology of parallel processing (PP) is mature and can be exploited commercially; there is significant R&D work on the development of tools and environments.

• Significant developments in networking technology are paving the way for heterogeneous computing.

Why Parallel Processing NOW?


22

History of Parallel Processing

• PP can be traced to a tablet dated around 100 BC.

• The tablet has 3 calculating positions.

• We can infer that the multiple positions were used for reliability and/or speed.


23

• The aggregate speed with which complex calculations are carried out by millions of neurons in the human brain is amazing, even though an individual neuron's response is slow (milliseconds); this demonstrates the feasibility of PP.

Motivating Factors


24

• Simple classification by Flynn (based on the number of instruction and data streams):

• SISD - conventional

• SIMD - data parallel, vector computing

• MISD - systolic arrays

• MIMD - very general, multiple approaches

• Current focus is on the MIMD model, using general-purpose processors or multicomputers.

Taxonomy of Architectures
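As an illustrative aside (not from the slides), the sketch below contrasts a SIMD-style data-parallel loop with MIMD-style execution using two POSIX threads running different code; the array size and task bodies are made up for the example, and it links with -lpthread.

/*
 * Illustrative sketch of two of Flynn's categories on one node:
 * SIMD flavour = one operation applied across an array;
 * MIMD flavour = independent instruction streams (two threads doing
 * different work). Array size and task bodies are arbitrary assumptions.
 */
#include <stdio.h>
#include <pthread.h>

#define N 8
static double a[N], b[N], c[N];

/* SIMD flavour: the same "instruction" applied to many data elements. */
static void vector_add(void)
{
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}

/* MIMD flavour: separate instruction streams doing different work. */
static void *sum_a(void *arg)
{
    double s = 0;
    for (int i = 0; i < N; i++) s += a[i];
    *(double *)arg = s;
    return NULL;
}

static void *scale_b(void *arg)
{
    (void)arg;
    for (int i = 0; i < N; i++) b[i] *= 2.0;
    return NULL;
}

int main(void)
{
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2 * i; }

    vector_add();                              /* SIMD-style loop */

    pthread_t t1, t2;
    double total = 0;
    pthread_create(&t1, NULL, sum_a, &total);  /* MIMD-style: two different */
    pthread_create(&t2, NULL, scale_b, NULL);  /* code paths run concurrently */
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("c[N-1]=%g, sum(a)=%g, b[0]=%g\n", c[N - 1], total, b[0]);
    return 0;
}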


25

Main HPC Architectures..1a

� SISD - mainframes, workstations, PCs.

� SIMD Shared Memory - Vector machines, Cray...

�MIMD Shared Memory - Sequent, KSR, Tera, SGI, SUN.

� SIMD Distributed Memory - DAP, TMC CM-2...

�MIMD Distributed Memory - Cray T3D, Intel, Transputers, TMC CM-5, plus recent workstation clusters (IBM SP2, DEC, Sun, HP).


26

Motivation for using Clusters

�The communications bandwidth between workstations is increasing as new networking technologies and protocols are implemented in LANs and WANs.

�Workstation clusters are easier to integrateinto existing networks than special parallel computers.


27

Main HPC Architectures..1b.

• NOTE: Modern sequential machines are not purely SISD - advanced RISC processors use many concepts from vector and parallel architectures (pipelining, parallel execution of instructions, prefetching of data, etc.) in order to achieve one or more arithmetic operations per clock cycle.


28

Parallel Processing Paradox

• Time required to develop a parallel application for solving a GCA (grand challenge application) is equal to:

– Half Life of Parallel Supercomputers.


29

The Need for Alternative Supercomputing Resources

�Vast numbers of under utilised workstations available to use.

�Huge numbers of unused processor cycles and resources that could be put to good use in a wide variety of applications areas.

• Reluctance to buy supercomputers due to their cost and short life span.

�Distributed compute resources “fit” better into today's funding model.


30

Technology Trend


31

Scalable Parallel Computers


32

Design Space of Competing Computer Architecture


33

Towards Inexpensive Supercomputing

It is:

Cluster Computing..The Commodity Supercomputing!


34

Cluster Computing -Research Projects

� Beowulf (CalTech and NASA) - USA

� CCS (Computing Centre Software) - Paderborn, Germany

• Condor - University of Wisconsin-Madison, USA

� DQS (Distributed Queuing System) - Florida State University, US.

� EASY - Argonne National Lab, USA

• HPVM (High Performance Virtual Machine) - UIUC, now UCSD, USA

� far - University of Liverpool, UK

� Gardens - Queensland University of Technology, Australia

� MOSIX - Hebrew University of Jerusalem, Israel

� MPI (MPI Forum, MPICH is one of the popular implementations)

� NOW (Network of Workstations) - Berkeley, USA

� NIMROD - Monash University, Australia

� NetSolve - University of Tennessee, USA

� PBS (Portable Batch System) - NASA Ames and LLNL, USA

� PVM - Oak Ridge National Lab./UTK/Emory, USA


35

Cluster Computing -Commercial Software

� Codine (Computing in Distributed Network Environment) -GENIAS GmbH, Germany

� LoadLeveler - IBM Corp., USA

� LSF (Load Sharing Facility) - Platform Computing, Canada

� NQE (Network Queuing Environment) - Craysoft Corp., USA

� OpenFrame - Centre for Development of Advanced Computing, India

� RWPC (Real World Computing Partnership), Japan

• Unixware (SCO - Santa Cruz Operation), USA

• Solaris-MC (Sun Microsystems), USA

• ClusterTools (a number of free HPC cluster tools from Sun)

• A number of commercial vendors worldwide are offering clustering solutions, including IBM, Compaq, Microsoft, and a number of startups like TurboLinux, HPTI, Scali, BlackStone…


36

Motivation for using Clusters

�Surveys show utilisation of CPU cycles of desktop workstations is typically <10%.

�Performance of workstations and PCs is rapidly improving

�As performance grows, percent utilisation will decrease even further!

�Organisations are reluctant to buy large supercomputers, due to the large expense and short useful life span.


37

Motivation for using Clusters

�The development tools for workstations are more mature than the contrasting proprietary solutions for parallel computers - mainly due to the non-standard nature of many parallel systems.

�Workstation clusters are a cheap and readily available alternative to specialised High Performance Computing (HPC) platforms.

�Use of clusters of workstations as a distributed compute resource is very cost effective - incremental growth of system!!!


38

Cycle Stealing

�Usually a workstation will be owned by an individual, group, department, or organisation - they are dedicated to the exclusive use by the owners.

�This brings problems when attempting to form a cluster of workstations for running distributed applications.


39

Cycle Stealing

�Typically, there are three types of owners, who use their workstations mostly for:

1. Sending and receiving email and preparing documents.

2. Software development - edit, compile, debug and test cycle.

3. Running compute-intensive applications.


40

Cycle Stealing

�Cluster computing aims to steal spare cyclesfrom (1) and (2) to provide resources for (3).

�However, this requires overcoming the ownership hurdle - people are very protective of their workstations.

�Usually requires organisational mandate that computers are to be used in this way.

• Stealing cycles outside standard work hours (e.g. overnight) is easy; stealing idle cycles during work hours without impacting interactive use (both CPU and memory) is much harder.


41

Rise & Fall of Computing Technologies

[Figure: rise and fall of computing technologies - mainframes give way to minis around 1970, minis to PCs around 1980, and PCs to network computing around 1995.]


42

Original Food Chain Picture


43

1984 Computer Food Chain

Mainframe

Vector Supercomputer

Mini Computer

Workstation

PC


44

Mainframe

Vector Supercomputer

MPP

Workstation

PC

1994 Computer Food Chain

Mini Computer(hitting wall soon)

(future is bleak)


45

Computer Food Chain (Now and Future)


46

What is a cluster?

�A cluster is a type of parallel or distributed processing system, which consists of a collection of interconnected stand-alone/complete computers cooperatively working together as a single, integrated computing resource.

�A typical cluster:

– Network: faster, closer connection than a typical network (LAN)

– Low latency communication protocols

– Looser connection than SMP


47

Why Clusters now?(Beyond Technology and Cost)

� Building block is big enough

– complete computers (HW & SW) shipped in millions: killer micro, killer RAM, killer disks, killer OS, killer networks, killer apps.

• Workstation performance is doubling every 18 months.

• Networks are faster

• Higher link bandwidth (vs. 10 Mbit Ethernet)

�Switch based networks coming (ATM)

�Interfaces simple & fast (Active Msgs)

� Striped files preferred (RAID)

� Demise of Mainframes, Supercomputers, & MPPs


48

Architectural Drivers…(cont)

� Node architecture dominates performance

– processor, cache, bus, and memory

– design and engineering $ => performance

� Greatest demand for performance is on large systems

– must track the leading edge of technology without lag

�MPP network technology => mainstream

– system area networks

� System on every node is a powerful enabler

– very high speed I/O, virtual memory, scheduling, …


49

...Architectural Drivers

� Clusters can be grown: Incremental scalability (up, down, and across)

– Individual node performance can be improved by adding additional resources (new memory blocks/disks)

– New nodes can be added or nodes can be removed

– Clusters of Clusters and Metacomputing

� Complete software tools

– Threads, PVM, MPI, DSM, C, C++, Java, Parallel C++, Compilers, Debuggers, OS, etc.

�Wide class of applications

– Sequential and grand challenging parallel applications


Clustering of Computers for Collective Computing: Trends

1960 1990 1995+ 2000

?


51

Example Clusters:Berkeley NOW

� 100 Sun UltraSparcs

– 200 disks

� Myrinet SAN

– 160 MB/s

� Fast comm.

– AM, MPI, ...

� Ether/ATM switched external net

� Global OS

� Self Config


52

Basic Components

[Figure: a NOW node - a Sun Ultra 170 workstation with processor (P), cache ($), and memory (M), and a Myricom NIC on the I/O bus connecting to Myrinet at 160 MB/s.]


53

Massive Cheap Storage Cluster

� Basic unit:

2 PCs double-ending four SCSI chains of 8 disks each

Currently serving Fine Art at http://www.thinker.org/imagebase/


54

Cluster of SMPs (CLUMPS)

� Four Sun E5000s

– 8 processors

– 4 Myricom NICs each

�Multiprocessor, Multi-NIC, Multi-Protocol

� NPACI => Sun 450s


55

Millennium PC Clumps

� Inexpensive, easy to manage Cluster

� Replicated in many departments

� Prototype for very large PC cluster


56

Adoption of the Approach


57

So What’s So Different?

� Commodity parts?

� Communications Packaging?

� Incremental Scalability?

� Independent Failure?

� Intelligent Network Interfaces?

� Complete System on every node

– virtual memory

– scheduler

– files

– ...


58

OPPORTUNITIES &

CHALLENGES


59

Opportunity of Large-scale Computing on NOW

Shared pool of computing resources: processors, memory, disks, interconnect

• Guarantee at least one workstation to many individuals (when active)

• Deliver a large percentage of the collective resources to a few individuals at any one time


60

Windows of Opportunities

• MPP/DSM:

– compute across multiple systems: parallel processing.

• Network RAM:

– use idle memory in other nodes: page across other nodes' idle memory.

• Software RAID:

– file system supporting parallel I/O and reliability, mass storage.

• Multi-path Communication:

– communicate across multiple networks: Ethernet, ATM, Myrinet.


61

Parallel Processing

• Scalable parallel applications require:

– good floating-point performance

– low-overhead communication and scalable network bandwidth

– a parallel file system


62

Network RAM

• The performance gap between processor and disk has widened.

• Thrashing to disk degrades performance significantly.

• Paging across networks can be effective with high-performance networks and an OS that recognizes idle machines.

• Typically, thrashing to network RAM can be 5 to 10 times faster than thrashing to disk.


63

Software RAID: Redundant Array of Workstation Disks

• I/O bottleneck:

– microprocessor performance is improving at more than 50% per year.

– disk access improvement is < 10% per year.

– applications often perform I/O.

• RAID cost per byte is high compared to single disks.

• RAIDs are connected to host computers, which are often a performance and availability bottleneck.

• RAID in software - writing data across an array of workstation disks - provides performance, and some degree of redundancy provides availability.
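A minimal sketch of the software-RAID idea above: striping a buffer, block by block, across files that stand in for the disks of different workstations. The file names, block size, and disk count are assumptions for illustration; a real system would add redundancy (parity or mirroring) and spread the stripes across nodes.

/*
 * Toy RAID-0 style striping: block i of the data goes to "disk" (i mod NDISKS).
 * disk0.img..disk3.img, BLOCK, and NDISKS are made-up example values.
 */
#include <stdio.h>
#include <stdlib.h>

#define NDISKS 4
#define BLOCK  4096

int stripe_write(const char *data, size_t len)
{
    FILE *disk[NDISKS];
    char path[64];

    for (int d = 0; d < NDISKS; d++) {
        snprintf(path, sizeof path, "disk%d.img", d);  /* one file per "workstation disk" */
        if ((disk[d] = fopen(path, "wb")) == NULL) return -1;
    }

    /* Round-robin the blocks across the disks. */
    for (size_t off = 0, blk = 0; off < len; off += BLOCK, blk++) {
        size_t n = (len - off < BLOCK) ? len - off : BLOCK;
        fwrite(data + off, 1, n, disk[blk % NDISKS]);
    }

    for (int d = 0; d < NDISKS; d++) fclose(disk[d]);
    return 0;
}

int main(void)
{
    size_t len = 1 << 16;
    char *buf = malloc(len);
    if (!buf) return 1;
    for (size_t i = 0; i < len; i++) buf[i] = (char)i;
    stripe_write(buf, len);
    free(buf);
    return 0;
}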


64

Software RAID, Parallel File Systems, and Parallel I/O


65

Cluster Computer and its Components


66

Clustering Today

• Clustering gained momentum when three technologies converged:

– 1. Very high-performance microprocessors

• workstation performance = yesterday's supercomputers

– 2. High-speed communication

• communication between cluster nodes >= between processors in an SMP.

– 3. Standard tools for parallel/distributed computing and their growing popularity.


67

Cluster Computer Architecture


68

Cluster Components...1a: Nodes

• Multiple High Performance Components:

– PCs

– Workstations

– SMPs (CLUMPS)

– Distributed HPC systems leading to metacomputing

• They can be based on different architectures and run different OSs.


69

Cluster Components...1b: Processors

• There are many (CISC/RISC/VLIW/Vector..):

– Intel: Pentiums, Xeon, Merced….

– Sun: SPARC, UltraSPARC

– HP PA

– IBM RS6000/PowerPC

– SGI MIPS

– Digital Alphas

• Integrate memory, processing and networking into a single chip:

– IRAM (CPU & Mem): http://iram.cs.berkeley.edu

– Alpha 21366 (CPU, Memory Controller, NI)


70

Cluster Components…2: OS

• State-of-the-art OS:

– Linux (Beowulf)

– Microsoft NT (Illinois HPVM)

– SUN Solaris (Berkeley NOW)

– IBM AIX (IBM SP2)

– HP UX (Illinois - PANDA)

– Mach (microkernel-based OS) (CMU)

– Cluster operating systems (Solaris MC, SCO Unixware, MOSIX (academic project))

– OS gluing layers (Berkeley Glunix)


71

Cluster Components…3: High Performance Networks

�Ethernet (10Mbps),

�Fast Ethernet (100Mbps),

�Gigabit Ethernet (1Gbps)

• SCI (Dolphin - MPI - 12 microsecond latency)

�ATM

�Myrinet (1.2Gbps)

�Digital Memory Channel

�FDDI


72

Cluster Components…4: Network Interfaces

• Network Interface Card

– Myrinet has NIC

– User-level access support

– The Alpha 21364 processor integrates processing, memory controller, and network interface into a single chip.


73

Cluster Components…5 Communication Software

� Traditional OS supported facilities (heavy weight due to protocol processing)..

– Sockets (TCP/IP), Pipes, etc.

� Light weight protocols (User Level)

– Active Messages (Berkeley)

– Fast Messages (Illinois)

– U-net (Cornell)

– XTP (Virginia)

• Systems can be built on top of the above protocols.
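For contrast with the lightweight protocols, below is a minimal sketch of the traditional, heavy-weight path (TCP sockets) for sending one message to a peer node; the peer address 10.0.0.2 and port 5000 are placeholders invented for the example.

/*
 * Traditional OS-supported communication: a TCP socket client that sends one
 * message. Every byte goes through the kernel protocol stack, which is the
 * per-message overhead that user-level protocols such as Active Messages try
 * to avoid.
 */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);        /* TCP socket */
    if (fd < 0) { perror("socket"); return 1; }

    struct sockaddr_in peer;
    memset(&peer, 0, sizeof peer);
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(5000);                   /* placeholder port */
    inet_pton(AF_INET, "10.0.0.2", &peer.sin_addr);  /* placeholder node address */

    if (connect(fd, (struct sockaddr *)&peer, sizeof peer) < 0) {
        perror("connect");
        return 1;
    }

    const char msg[] = "hello from node A";
    send(fd, msg, sizeof msg, 0);                    /* kernel TCP/IP does the rest */
    close(fd);
    return 0;
}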


74

Cluster Components…6a: Cluster Middleware

• Resides between OS and applications and offers an infrastructure for supporting:

– Single System Image (SSI)

– System Availability (SA)

• SSI makes the collection appear as a single machine (globalised view of system resources), e.g. telnet cluster.myinstitute.edu

• SA - checkpointing and process migration..


75

Cluster Components…6b: Middleware Components

• Hardware

– DEC Memory Channel, DSM (Alewife, DASH), SMP techniques

• OS / Gluing Layers

– Solaris MC, Unixware, Glunix

• Applications and Subsystems

– System management and electronic forms

– Runtime systems (software DSM, PFS, etc.)

– Resource management and scheduling (RMS):

• CODINE, LSF, PBS, NQS, etc.


76

Cluster Components…7a: Programming Environments

� Threads (PCs, SMPs, NOW..)

– POSIX Threads

– Java Threads

� MPI

– Linux, NT, on many Supercomputers

� PVM

� Software DSMs (Shmem)
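A minimal MPI example in C of the message-passing style listed above: every process prints its rank, and rank 0 receives one integer from rank 1. Build and launch commands (e.g. mpicc, mpirun) depend on the MPI implementation installed on the cluster (MPICH, LAM, vendor MPI, ...).

/*
 * Minimal MPI sketch: report rank/size and do one point-to-point message.
 */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* who am I?           */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* how many processes? */

    printf("Hello from process %d of %d\n", rank, size);

    if (size > 1) {
        int value = 0;
        if (rank == 1) {
            value = 42;
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        } else if (rank == 0) {
            MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("rank 0 received %d from rank 1\n", value);
        }
    }

    MPI_Finalize();
    return 0;
}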


77

Cluster Components…7b: Development Tools

• Compilers

– C/C++/Java

– Parallel programming with C++ (MIT Press book)

• RAD (rapid application development) tools: GUI-based tools for PP modeling

�Debuggers

�Performance Analysis Tools

�Visualization Tools


78

Cluster Components…8: Applications

�Sequential

�Parallel / Distributed (Cluster-aware app.)

–Grand Challenging applications

• Weather Forecasting

• Quantum Chemistry

• Molecular Biology Modeling

• Engineering Analysis (CAD/CAM)

• ……………….

– PDBs, web servers, data mining


79

Key Operational Benefits of Clustering

• System availability (HA): clusters offer inherent high system availability due to the redundancy of hardware, operating systems, and applications.

• Hardware fault tolerance: redundancy for most system components (e.g. disk RAID), including both hardware and software.

• OS and application reliability: run multiple copies of the OS and applications, and tolerate faults through this redundancy.

• Scalability: add servers to the cluster, add more clusters to the network as the need arises, or add CPUs to an SMP.

• High performance: run cluster-enabled programs.


80

Classification of Cluster Computers


81

Clusters Classification..1

�Based on Focus (in Market)

–High Performance (HP) Clusters

• Grand Challenging Applications

–High Availability (HA) Clusters

• Mission Critical applications


82

HA Cluster: Server Cluster with "Heartbeat" Connection


83

Clusters Classification..2

�Based on Workstation/PC Ownership

–Dedicated Clusters

–Non-dedicated clusters

• Adaptive parallel computing

• Also called Communal multiprocessing


84

Clusters Classification..3

�Based on Node Architecture..

–Clusters of PCs (CoPs)

–Clusters of Workstations (COWs)

–Clusters of SMPs (CLUMPs)


85

Building Scalable Systems: Cluster of SMPs (Clumps)

Performance of SMP systems vs. four-processor servers in a cluster


86

Clusters Classification..4

�Based on Node OS Type..

–Linux Clusters (Beowulf)

–Solaris Clusters (Berkeley NOW)

–NT Clusters (HPVM)

–AIX Clusters (IBM SP2)

– SCO/Compaq Clusters (Unixware)

– ……. Digital VMS Clusters, HP Clusters, ………


87

Clusters Classification..5

�Based on node components architecture & configuration (Processor Arch, Node Type: PC/Workstation.. & OS: Linux/NT..):

– Homogeneous Clusters

• All nodes have a similar configuration

– Heterogeneous Clusters

• Nodes based on different processors and running different OSes.


88

Clusters Classification..6a: Dimensions of Scalability & Levels of Clustering

[Figure: three dimensions of scalability - (1) platform (uniprocessor, SMP, cluster, MPP), (2) network technology, and (3) level of clustering (workgroup, department, campus, enterprise, public metacomputing (GRID)).]


89

Clusters Classification..6b: Levels of Clustering

• Group Clusters (#nodes: 2-99)

– a set of dedicated/non-dedicated computers, mainly connected by a SAN like Myrinet

• Departmental Clusters (#nodes: 99-999)

• Organizational Clusters (#nodes: many 100s), using an ATM network

• Internet-wide Clusters = Global Clusters (#nodes: 1000s to many millions)

– Metacomputing

– Web-based Computing

– Agent-based Computing

• Java plays a major role in web- and agent-based computing


90

Major issues in cluster design

• Size scalability (physical & application)

• Enhanced availability (failure management)

• Single system image (look-and-feel of one system)

• Fast communication (networks & protocols)

• Load balancing (CPU, net, memory, disk)

• Security and encryption (clusters of clusters)

• Distributed environment (social issues)

• Manageability (administration and control)

• Programmability (simple API if required)

• Applicability (cluster-aware and non-aware applications)


91

Cluster Middleware

and

Single System Image


92

A typical Cluster Computing Environment

[Figure: a layered stack - Application on top, PVM / MPI / RSH below it, Hardware/OS at the bottom, with "???" marking the missing layer in between.]


93

CC should support

• Multi-user, time-sharing environments

• Nodes with different CPU speeds and memory sizes (heterogeneous configuration)

• Many processes, with unpredictable requirements

• Unlike SMP: insufficient "bonds" between nodes

– each computer operates independently


94

The missing link is provided by cluster middleware/underware

[Figure: the same stack with the gap filled - Application, PVM / MPI / RSH, Middleware or Underware, Hardware/OS.]


95

SSI Clusters - SMP services on a CC

"Pool together" the cluster-wide resources

• Adaptive resource usage for better performance

• Ease of use - almost like an SMP

• Scalable configurations - by decentralized control

Result: HPC/HAC at PC/workstation prices


96

What is Cluster Middleware ?

• An interface between user applications and the cluster hardware and OS platform.

• Middleware packages support each other at the management, programming, and implementation levels.

• Middleware layers:

– SSI layer

– Availability layer: it enables the cluster services of

• checkpointing, automatic failover, recovery from failure,

• fault-tolerant operation among all cluster nodes.


97

Middleware Design Goals

• Complete transparency (manageability)

– lets the user see a single cluster system

• single entry point, ftp, telnet, software loading...

• Scalable performance

– easy growth of the cluster

• no change of API, automatic load distribution

• Enhanced availability

– automatic recovery from failures

• employ checkpointing and fault-tolerance technologies

– handle consistency of data when replicated


98

What is Single System Image (SSI) ?

�A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource.

�SSI makes the cluster appear like a single machine to the user, to applications, and to the network.

• A cluster without an SSI is not a cluster


99

Benefits of Single System Image

� Usage of system resources transparently

� Transparent process migration and load balancing across nodes.

� Improved reliability and higher availability

� Improved system response time and performance

� Simplified system management

� Reduction in the risk of operator errors

� User need not be aware of the underlying system architecture to use these machines effectively


100

Desired SSI Services

• Single Entry Point

– telnet cluster.my_institute.edu

– (rather than) telnet node1.cluster.institute.edu

• Single File Hierarchy: xFS, AFS, Solaris MC Proxy

• Single Control Point: management from a single GUI

• Single Virtual Networking

• Single Memory Space - Network RAM / DSM

• Single Job Management: Glunix, Codine, LSF

• Single User Interface: like a workstation/PC windowing environment (CDE in Solaris/NT); maybe it can use Web technology


101

Availability Support Functions

• Single I/O Space (SIO):

– any node can access any peripheral or disk device without knowledge of its physical location.

• Single Process Space (SPS):

– any process on any node can create processes with cluster-wide process IDs, and they communicate through signals, pipes, etc., as if they were on a single node.

• Checkpointing and Process Migration:

– checkpointing saves the process state and intermediate results in memory or to disk to support rollback recovery when a node fails; process migration supports load balancing.
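A toy sketch of the checkpointing idea just described: a long loop periodically saves its state to disk so that a restarted run can roll back to the last checkpoint instead of starting over. The file name, state layout, and checkpoint interval are assumptions for the example; real cluster checkpointing and process migration are done transparently by the middleware.

/*
 * Application-level checkpointing sketch: save/restore loop state.
 */
#include <stdio.h>

struct state { long iter; double partial_sum; };

static int save_checkpoint(const struct state *s)
{
    FILE *f = fopen("app.ckpt", "wb");
    if (!f) return -1;
    fwrite(s, sizeof *s, 1, f);
    fclose(f);
    return 0;
}

static int load_checkpoint(struct state *s)
{
    FILE *f = fopen("app.ckpt", "rb");
    if (!f) return -1;                        /* no checkpoint: start fresh */
    if (fread(s, sizeof *s, 1, f) != 1) { fclose(f); return -1; }
    fclose(f);
    return 0;
}

int main(void)
{
    struct state s = { 0, 0.0 };
    load_checkpoint(&s);                      /* roll back if a checkpoint exists */

    for (; s.iter < 1000000; s.iter++) {
        s.partial_sum += 1.0 / (s.iter + 1);  /* the "real" work */
        if (s.iter % 100000 == 0)
            save_checkpoint(&s);              /* periodic checkpoint */
    }

    printf("sum = %f\n", s.partial_sum);
    return 0;
}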


102

Scalability Vs. Single System Image



103

SSI Levels/How do we implement SSI ?

� It is a computer science notion of levels of abstractions (house is at a higher level of abstraction than walls, ceilings, and floors).

Application and Subsystem Level

Operating System Kernel Level

Hardware Level


104

SSI at Application and Subsystem Level

Level | Examples | Boundary | Importance

application | cluster batch system, system management | an application | what a user wants

subsystem | distributed DB, OSF DME, Lotus Notes, MPI, PVM | a subsystem | SSI for all applications of the subsystem

file system | Sun NFS, OSF DFS, NetWare, and so on | shared portion of the file system | implicitly supports many applications and subsystems

toolkit | OSF DCE, Sun ONC+, Apollo Domain | explicit toolkit facilities: user, service name, time | best level of support for heterogeneous system

(c) In Search of Clusters


105

SSI at Operating System Kernel Level

Level | Examples | Boundary | Importance

Kernel/OS layer | Solaris MC, Unixware, MOSIX, Sprite, Amoeba/GLUnix | each name space: files, processes, pipes, devices, etc. | kernel support for applications, adm subsystems

kernel interfaces | UNIX (Sun) vnode, Locus (IBM) vproc | type of kernel objects: files, processes, etc. | modularizes SSI code within kernel

virtual memory | none supporting operating system kernel | each distributed virtual memory space | may simplify implementation of kernel objects

microkernel | Mach, PARAS, Chorus, OSF/1 AD, Amoeba | each service outside the microkernel | implicit SSI for all system services

(c) In Search of Clusters


106

SSI at Hardware Level

Level | Examples | Boundary | Importance

memory | SCI, DASH | memory space | better communication and synchronization

memory and I/O | SCI, SMP techniques | memory and I/O device space | lower overhead cluster I/O

(c) In Search of Clusters


107

SSI Characteristics

�1. Every SSI has a boundary

• 2. Single-system support can exist at different levels within a system, one able to be built on another


108

SSI Boundaries - an application's SSI boundary

[Figure: a batch system's SSI boundary drawn around the cluster nodes it manages.]

(c) In Search of Clusters


109

Relationship Among Middleware Modules


110

SSI via OS path!

• 1. Build as a layer on top of the existing OS

– Benefits: makes the system quickly portable, tracks vendor software upgrades, and reduces development time.

– i.e. new systems can be built quickly by mapping new services onto the functionality provided by the layer beneath. E.g. Glunix.

• 2. Build SSI at kernel level - a true cluster OS

– Good, but can't leverage OS improvements by the vendor.

– E.g. Unixware, Solaris-MC, and MOSIX.


111

SSI Representative Systems

�OS level SSI

– SCO NSC UnixWare

– Solaris-MC

– MOSIX, ….

�Middleware level SSI

– PVM, TreadMarks (DSM), Glunix,

Condor, Codine, Nimrod, ….

�Application level SSI

– PARMON, Parallel Oracle, ...


112

SCO NonStop® Cluster for UnixWare

[Figure: two UP or SMP nodes, each running standard SCO UnixWare with clustering hooks - users, applications, and systems management on top, standard OS kernel calls below, and modular kernel extensions underneath; each node has its own devices, and the nodes are joined to each other and to other nodes by a ServerNet interconnect.]

http://www.sco.com/products/clustering/


113

How does NonStop Clusters Work?

� Modular Extensions and Hooks to Provide:

– Single Clusterwide Filesystem view

– Transparent Clusterwide device access

– Transparent swap space sharing

– Transparent Clusterwide IPC

– High Performance Internode Communications

– Transparent Clusterwide Processes, migration,etc.

– Node down cleanup and resource failover

– Transparent Clusterwide parallel TCP/IP networking

– Application Availability

– Clusterwide Membership and Cluster timesync

– Cluster System Administration

– Load Leveling


114

Solaris-MC: Solaris for MultiComputers

� global file system

� globalized process management

� globalized networking and I/O

[Figure: Solaris MC architecture - applications sit on the system call interface; the Solaris MC layer (file system, network, processes, C++ object framework) is built on top of the existing Solaris 2.5 kernel, with object invocations going to other nodes.]

http://www.sun.com/research/solaris-mc/


115

Solaris MC components

� Object and communication support

� High availability support

� PXFS global distributed file system

• Process management

• Networking

[Figure: the same Solaris MC architecture diagram as on the previous slide.]


116

Multicomputer OS for UNIX (MOSIX)

� An OS module (layer) that provides the applications with the illusion of working on a single system

� Remote operations are performed like local operations

� Transparent to the application - user interface unchanged

[Figure: the application and PVM / MPI / RSH run unchanged on top of the MOSIX layer and the hardware/OS.]

http://www.mosix.cs.huji.ac.il/


117

Main tool

Preemptive process migration that can migrate ---> any process, anywhere, anytime

• Supervised by distributed algorithms that respond on-line to global resource availability - transparently

• Load balancing - migrate processes from over-loaded to under-loaded nodes

• Memory ushering - migrate processes from a node that has exhausted its memory, to prevent paging/swapping


118

MOSIX for Linux at HUJI

� A scalable cluster configuration:

– 50 Pentium-II 300 MHz

– 38 Pentium-Pro 200 MHz (some are SMPs)

– 16 Pentium-II 400 MHz (some are SMPs)

� Over 12 GB cluster-wide RAM

• Connected by a 2.56 Gb/s Myrinet LAN

• Runs Red Hat 6.0, based on kernel 2.2.7

� Upgrade: HW with Intel, SW with Linux

� Download MOSIX:

– http://www.mosix.cs.huji.ac.il/


119

NOW @ Berkeley

� Design & Implementation of higher-level system

�Global OS (Glunix)

�Parallel File Systems (xFS)

�Fast Communication (HW for Active Messages)

�Application Support

� Overcoming technology shortcomings

�Fault tolerance

�System Management

� NOW Goal: Faster for Parallel AND Sequential

http://now.cs.berkeley.edu/


120

NOW Software Components

(Diagram: large sequential apps and parallel apps run over Sockets, Split-C, MPI, HPF, and vSM on the Global Layer Unix with a name server and Active Messages; each Unix (Solaris) workstation runs the AM L.C.P. and a VN segment driver over the Myrinet scalable interconnect.)


121

3 Paths for Applications on NOW?

� Revolutionary (MPP Style): write new programs from scratch using MPP languages, compilers, libraries,…

� Porting: port programs from mainframes, supercomputers, MPPs, …

� Evolutionary: take sequential program & use

1) Network RAM: first use memory of many computers to reduce disk accesses; if not fast enough, then:

2) Parallel I/O: use many disks in parallel for accesses not in file cache; if not fast enough, then:

3) Parallel program: change the program until, with enough processors, it is fast enough => large speedup without a fine-grain parallel program


122

Comparison of 4 Cluster Systems


123

Cluster Programming Environments

� Shared Memory Based

– DSM

– Threads/OpenMP (enabled for clusters)

– Java threads (HKU JESSICA, IBM cJVM)

� Message Passing Based

– PVM

– MPI

� Parametric Computations

– Nimrod/Clustor

� Automatic Parallelising Compilers

� Parallel Libraries & Computational Kernels (NetSolve)
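As a minimal sketch of the shared-memory style listed above (plain single-node OpenMP in C; cluster-enabled OpenMP and DSM systems extend the same model across nodes, and this snippet is an illustration rather than part of the original tutorial):

#include <stdio.h>
#include <omp.h>

#define N 1000000

int main(void)
{
    static double a[N];
    double sum = 0.0;
    int i;

    /* Fine-grain, data-level parallelism: loop iterations are divided
       among the available threads, with a reduction on 'sum'. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
        a[i] = i * 0.5;
        sum += a[i];
    }

    printf("sum = %f (max threads = %d)\n", sum, omp_get_max_threads());
    return 0;
}

(Compile with an OpenMP-capable compiler, e.g. cc -fopenmp.)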


124

Levels of Parallelism

� Large grain (task level): code item - program; exploited with PVM/MPI (e.g., Task i-1, Task i, Task i+1)

� Medium grain (control level): code item - function/thread; exploited with threads (e.g., func1(), func2(), func3())

� Fine grain (data level): code item - loop; exploited by parallelising compilers (e.g., a(0)=.., a(1)=.., a(2)=..)

� Very fine grain (multiple issue): individual operations (e.g., +, x, load); exploited in hardware by the CPU
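A compact, illustrative C program mapping onto these levels (an explanatory sketch, not taken from the slides): func1() and func2() run concurrently as threads (medium grain), each containing a loop whose independent iterations a parallelising compiler or multiple-issue hardware can further overlap (fine / very fine grain); running several such programs on different nodes under PVM/MPI would be the large-grain, task-level case.

#include <stdio.h>
#include <pthread.h>

#define N 1000

static double a[N], b[N];

/* Medium grain: each function becomes one thread of control. */
static void *func1(void *arg)
{
    (void) arg;
    /* Fine grain: independent iterations a compiler may parallelise or vectorise. */
    for (int i = 0; i < N; i++)
        a[i] = i * 2.0;
    return NULL;
}

static void *func2(void *arg)
{
    (void) arg;
    for (int i = 0; i < N; i++)
        b[i] = i + 1.0;
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;

    pthread_create(&t1, NULL, func1, NULL);
    pthread_create(&t2, NULL, func2, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);

    printf("a[10] = %f, b[10] = %f\n", a[10], b[10]);
    return 0;
}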


125

MPI (Message Passing Interface)

� A standard message passing interface.

– MPI 1.0 - May 1994 (started in 1992)

– C and Fortran bindings (now Java)

� Portable (once coded, it can run on virtually all HPC platforms, including clusters!)

� Performance (by exploiting native hardware features)

� Functionality (over 115 functions in MPI 1.0)

– environment management, point-to-point & collective communications, process group, communication world, derived data types, and virtual topology routines.

� Availability - a variety of implementations available, both vendor and public domain.

http://www.mpi-forum.org/


126

A Sample MPI Program...

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    int my_rank;          /* process rank */
    int p;                /* no. of processes */
    int source;           /* rank of sender */
    int dest;             /* rank of receiver */
    int tag = 0;          /* message tag, like "email subject" */
    char message[100];    /* buffer */
    MPI_Status status;    /* status returned by the receive */

    /* Start up MPI */
    MPI_Init( &argc, &argv );

    /* Find our process rank/id */
    MPI_Comm_rank( MPI_COMM_WORLD, &my_rank );

    /* Find out how many processes/tasks are part of this run */
    MPI_Comm_size( MPI_COMM_WORLD, &p );

(Diagram: the worker processes send "Hello, ..." messages to the master process.)


127

A Sample MPI Program

    if( my_rank == 0 ) /* Master Process */
    {
        for( source = 1; source < p; source++ )
        {
            MPI_Recv( message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status );
            printf( "%s \n", message );
        }
    }
    else /* Worker Process */
    {
        sprintf( message, "Hello, I am your worker process %d!", my_rank );
        dest = 0;
        MPI_Send( message, strlen(message)+1, MPI_CHAR, dest, tag, MPI_COMM_WORLD );
    }

    /* Shut down the MPI environment */
    MPI_Finalize();
    return 0;
}


128

Execution

% cc -o hello hello.c -lmpi

% mpirun -p2 hello

Hello, I am your worker process 1!

% mpirun -p4 hello

Hello, I am your worker process 1!

Hello, I am your worker process 2!

Hello, I am your worker process 3!

% mpirun hello

(no output - with a single process there are no workers, so no greetings)
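Beyond the point-to-point MPI_Send/MPI_Recv used in the sample above, MPI's collective operations often replace a whole master/worker loop with a single call. The fragment below is an additional, hedged illustration (standard MPI-1 calls, not part of the original slides):

#include <stdio.h>
#include "mpi.h"

int main( int argc, char *argv[] )
{
    int my_rank, p, n = 100;
    int local_sum = 0, global_sum = 0;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &my_rank );
    MPI_Comm_size( MPI_COMM_WORLD, &p );

    /* The master broadcasts the problem size to every process. */
    MPI_Bcast( &n, 1, MPI_INT, 0, MPI_COMM_WORLD );

    /* Each process sums its own share of 1..n (cyclic distribution). */
    for (int i = my_rank + 1; i <= n; i += p)
        local_sum += i;

    /* Combine the partial sums on the master. */
    MPI_Reduce( &local_sum, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD );

    if (my_rank == 0)
        printf("sum of 1..%d = %d (using %d processes)\n", n, global_sum, p);

    MPI_Finalize();
    return 0;
}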


129

PARMON: A Cluster Monitoring Tool

(Diagram: the PARMON server, parmond, runs on each cluster node; the PARMON client, parmon, runs on a JVM and talks to the servers across a high-speed switch.)

http://www.buyya.com/parmon/
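parmond gathers per-node resource data (CPU, memory, processes) and serves it to the Java-based parmon client. As a purely illustrative sketch of the kind of data such an agent collects on Linux (not PARMON's actual code or protocol), the fragment below reads the node's load averages from /proc:

#include <stdio.h>

/* Illustrative only: read the 1/5/15-minute load averages that a
   monitoring agent such as parmond might report for this node. */
int main(void)
{
    double l1, l5, l15;
    FILE *fp = fopen("/proc/loadavg", "r");

    if (fp == NULL) {
        perror("/proc/loadavg");
        return 1;
    }
    if (fscanf(fp, "%lf %lf %lf", &l1, &l5, &l15) != 3) {
        fprintf(stderr, "unexpected /proc/loadavg format\n");
        fclose(fp);
        return 1;
    }
    fclose(fp);

    printf("load: %.2f (1 min)  %.2f (5 min)  %.2f (15 min)\n", l1, l5, l15);
    return 0;
}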


130

Resource Utilization at a Glance


131

Single I/O Space and Design Issues

Globalised Cluster Storage

Reference: "Designing SSI Clusters with Hierarchical Checkpointing and Single I/O Space", IEEE Concurrency, March 1999, by K. Hwang, H. Jin et al.


132

Clusters with & without Single I/O Space

(Diagram: without a single I/O space, users see each node's storage separately; with single I/O space services, users see one cluster-wide storage image.)


133

Benefits of Single I/O Space

� Eliminate the gap between accessing local disk(s) and remote disks

� Support persistent programming paradigm

� Allow striping on remote disks, accelerate parallel I/O operations

� Facilitate the implementation of distributed checkpointing and recovery schemes


134

Single I/O Space Design Issues

� Integrated I/O Space

� Addressing and Mapping Mechanisms

� Data movement procedures


135

Integrated I/O Space

(Diagram: sequential addresses run across the local disks LD1..LDn (the RADD space, with blocks D11..Dnt), the shared RAIDs SD1..SDm (the NASD space, with blocks B11..Bmk), and the peripherals P1..Ph (the NAP space).)
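As a hedged illustration of the addressing idea (an invented layout, not the scheme from the cited paper): a single sequential block address can be split into a device index and a local block number, so consecutive global blocks fall on different disks and parallel I/O is accelerated. NUM_DISKS, BLOCKS_PER_DISK, and map_block() are illustrative names only.

#include <stdio.h>

/* Hypothetical single-I/O-space layout: NUM_DISKS local disks (RADD
   space), each exporting BLOCKS_PER_DISK blocks into one sequential range. */
#define NUM_DISKS        4
#define BLOCKS_PER_DISK  1024

struct block_addr {
    int disk;         /* which disk in the integrated space */
    int local_block;  /* block number on that disk */
};

/* Round-robin (striped) mapping of a global block address. */
static struct block_addr map_block(int global_block)
{
    struct block_addr ba = { -1, -1 };

    if (global_block < 0 || global_block >= NUM_DISKS * BLOCKS_PER_DISK)
        return ba;                      /* outside the integrated space */
    ba.disk        = global_block % NUM_DISKS;
    ba.local_block = global_block / NUM_DISKS;
    return ba;
}

int main(void)
{
    for (int b = 0; b < 8; b++) {
        struct block_addr ba = map_block(b);
        printf("global block %d -> disk LD%d, local block %d\n",
               b, ba.disk + 1, ba.local_block);
    }
    return 0;
}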


136

Addressing and Mapping

(Diagram: user applications issue system calls into user-level middleware plus some modified OS; a name agent and a Disk/RAID/NAP mapper resolve addresses, and a block mover works with I/O agents for the RADD, NASD, and NAP spaces.)


137

Data Movement Procedures

(Diagram: two procedures for moving data block A between Node 1 and Node 2; the user application's I/O agent issues the request and the block mover transfers block A between the local disk LD1 and LD2, or a shared disk SDi of the NASD.)


138

What Next ??

Clusters of Clusters (HyperClusters)

Global Grid

Interplanetary Grid

Universal Grid??


139

Clusters of Clusters (HyperClusters)

(Diagram: three clusters, each with its own scheduler, master daemon, execution daemons, and submit/graphical-control clients, are federated over a LAN/WAN into a hypercluster.)


140

Towards Grid Computing….

For illustration, resources are placed arbitrarily on the GUSTO test-bed!


141

What is Grid ?

� An infrastructure that couples

– Computers (PCs, workstations, clusters, traditional supercomputers, and even laptops, notebooks, mobile computers, PDAs, and so on)

– Software ? (e.g., renting expensive special-purpose applications on demand)

– Databases (e.g., transparent access to the human genome database)

– Special Instruments (e.g., radio telescopes - SETI@Home searching for life in the galaxy, Astrophysics@Swinburne for pulsars)

– People (maybe even animals, who knows?)

� across local/wide-area networks (enterprise, organisations, or the Internet) and presents them as a unified integrated (single) resource.


142

Conceptual view of the Grid

Leading to Portal (Super)Computing

http://www.sun.com/hpc/


143

Grid Application-Drivers

� Old and new applications are being enabled by the coupling of computers, databases, instruments, people, etc.:

– (distributed) Supercomputing

– Collaborative engineering

– high-throughput computing

• large scale simulation & parameter studies

– Remote software access / Renting Software

– Data-intensive computing

– On-demand computing


144

Grid Components

(Diagram, layered bottom to top:)

� Grid Fabric: networked resources across organisations - computers, clusters, data sources, scientific instruments, storage systems - managed by local resource managers (operating systems, queuing systems, libraries & app kernels, TCP/IP & UDP)

� Grid Middleware: distributed resource coupling services - communication, sign-on & security, information, process, data access, QoS

� Grid Tools: development environments and tools - languages, libraries, debuggers, monitoring, resource brokers, web tools

� Grid Applications and Portals: problem solving environments, scientific and engineering collaboration, web-enabled apps


145

Many GRID Projects and Initiatives

� PUBLIC FORUMS

– Computing Portals

– Grid Forum

– European Grid Forum

– IEEE TFCC!

– GRID’2000 and more.

� Australia

– Nimrod/G

– EcoGrid and GRACE

– DISCWorld

� Europe

– UNICORE

– MOL

– METODIS

– Globe

– Poznan Metacomputing

– CERN Data Grid

– MetaMPI

– DAS

– JaWS

– and many more...

� Public Grid Initiatives

– Distributed.net

– SETI@Home

– Compute Power Grid

� USA

– Globus

– Legion

– JAVELIN

– AppLes

– NASA IPG

– Condor

– Harness

– NetSolve

– NCSA Workbench

– WebFlow

– EveryWhere

– and many more...

� Japan

– Ninf

– Bricks

– and many more...

http://www.gridcomputing.com/


146

NetSolve

Client/Server/Agent-Based Computing

• Client-Server design

• Network-enabled solvers

• Seamless access to resources

• Non-hierarchical system

• Load Balancing

• Fault Tolerance

• Interfaces to Fortran, C, Java, Matlab, more

Easy-to-use tool to provide efficient and uniform access to a variety of scientific packages on UNIX platforms

(Diagram: a NetSolve client sends a request to the NetSolve agent and receives a choice/reply; computations run on network resources, with software available from a software repository.)

www.cs.utk.edu/netsolve/


147

HARNESS Virtual Machine
Scalable Distributed Control and CCA-based Daemon

(Diagram: hosts A, B, C, and D form a virtual machine; operation within the VM uses distributed control for process control and user features; each host runs a component-based HARNESS daemon with discovery and registration, customised and extended by dynamically adding plug-ins; another VM can coexist.)

http://www.epm.ornl.gov/harness/


148

HARNESS Core Research
Parallel Plug-ins for Heterogeneous Distributed Virtual Machines

One research goal is to understand and implement a dynamic parallel plug-in environment. It provides a method for many users to extend Harness in much the same way that third-party serial plug-ins extend Netscape, Photoshop, and Linux.

Research issues with parallel plug-ins include: heterogeneity, synchronization, interoperation, partial success.

Three typical cases:

• load plug-in into a single host of the VM w/o communication

• load plug-in into a single host, broadcast to the rest of the VM

• load plug-in into every host of the VM w/ synchronization
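For reference, the serial plug-in mechanism that a parallel plug-in layer has to coordinate across the VM is ordinary dynamic loading. A minimal, hedged C sketch using POSIX dlopen()/dlsym() (the plugin.so file name and the plugin_init symbol are hypothetical, and this is not HARNESS code):

#include <stdio.h>
#include <dlfcn.h>

int main(void)
{
    /* Load a (hypothetical) plug-in shared object into this host only. */
    void *handle = dlopen("./plugin.so", RTLD_NOW);
    if (handle == NULL) {
        fprintf(stderr, "dlopen: %s\n", dlerror());
        return 1;
    }

    /* Look up and call its (hypothetical) initialisation entry point. */
    int (*plugin_init)(void) = (int (*)(void)) dlsym(handle, "plugin_init");
    if (plugin_init != NULL)
        printf("plugin_init returned %d\n", plugin_init());

    dlclose(handle);
    return 0;
}

(Link with -ldl on most UNIX systems; the parallel cases above additionally broadcast and synchronize this step across the hosts of the VM.)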


149

Nimrod - A Job Management System

http://www.dgs.monash.edu.au/~davida/nimrod.html


150

Job processing with Nimrod


151

Nimrod/G Architecture

(Diagram: Nimrod/G clients drive the Nimrod engine, which uses a schedule advisor, a trading manager, a grid explorer, a dispatcher, and a persistent store; through middleware services and grid information services, work is dispatched to GUSTO test-bed resources, each running its RM & TS.)

RM: Local Resource Manager, TS: Trade Server


152

Compute Power Market

(Diagram: a user application passes jobs to a resource broker, whose grid explorer, schedule advisor, and trade manager consult the grid information server and trade with a resource domain; the domain's trade server, charging algorithm, accounting, and resource allocation/reservation components control resources R1, R2, ..., Rn, while a job control agent and a deployment agent run the jobs; other services take part in the trading.)


153

Pointers to Literature on Cluster Computing


154

Reading Resources..1a Internet & WWW

–Computer Architecture:

• http://www.cs.wisc.edu/~arch/www/

–PFS & Parallel I/O

• http://www.cs.dartmouth.edu/pario/

–Linux Parallel Processing

• http://yara.ecn.purdue.edu/~pplinux/Sites/

–DSMs

• http://www.cs.umd.edu/~keleher/dsm.html


155

Reading Resources..1b Internet & WWW

–Solaris-MC

• http://www.sunlabs.com/research/solaris-mc

–Microprocessors: Recent Advances

• http://www.microprocessor.sscc.ru

–Beowulf:

• http://www.beowulf.org

–Metacomputing

• http://www.sis.port.ac.uk/~mab/Metacomputing/


156

Reading Resources..2 Books

– In Search of Clusters

• by G. Pfister, Prentice Hall (2ed), 98

–High Performance Cluster Computing

• Volume1: Architectures and Systems

• Volume2: Programming and Applications

– Edited by Rajkumar Buyya, Prentice Hall, NJ, USA.

–Scalable Parallel Computing

• by K. Hwang & Z. Xu, McGraw Hill, 98


157

Reading Resources..3Journals

– A Case for NOW (Networks of Workstations), IEEE Micro, Feb'95

• by Anderson, Culler, Patterson

– Fault Tolerant COW with SSI, IEEE Concurrency, (to appear)

• by Kai Hwang, Chow, Wang, Jin, Xu

– Cluster Computing: The Commodity Supercomputing, Journal of Software Practice and Experience (get it from my web page)

• by Mark Baker & Rajkumar Buyya


158

Cluster Computing Infoware

http://www.csse.monash.edu.au/~rajkumar/cluster/


159

Cluster Computing Forum

IEEE Task Force on Cluster Computing

(TFCC)

http://www.ieeetfcc.org


160

TFCC Activities...

� Network Technologies

� OS Technologies

� Parallel I/O

� Programming Environments

� Java Technologies

� Algorithms and Applications

� Analysis and Profiling

� Storage Technologies

� High Throughput Computing


161

TFCC Activities...

� High Availability

� Single System Image

� Performance Evaluation

� Software Engineering

� Education

� Newsletter

� Industrial Wing

� TFCC Regional Activities

– All the above have their own pages; see pointers from:

– http://www.ieeetfcc.org


162

TFCC Activities...

�Mailing list, Workshops, Conferences, Tutorials, Web-resources etc.

� Resources for introducing subject in senior undergraduate and graduate levels.

� Tutorials/Workshops at IEEE Chapters..

�….. and so on.

� FREE MEMBERSHIP, please join!

� Visit TFCC Page for more details:

– http://www.ieeetfcc.org (updated daily!).


163

Clusters Revisited


164

Summary

�We have discussed Clusters

�Enabling Technologies

�Architecture & its Components

�Classifications

�Middleware

�Single System Image

�Representative Systems


165

Conclusions

�Clusters are promising..

�Solve the parallel processing paradox

�Offer incremental growth and match funding patterns

�New trends in hardware and software technologies are likely to make clusters more promising..so that

�Cluster-based supercomputers can be seen everywhere!


166

Computing Platforms Evolution
Breaking Administrative Barriers

(Diagram: performance grows as platforms evolve from the desktop (single processor?) to SMPs or supercomputers, local clusters, enterprise clusters/grids, global clusters/grids, and perhaps inter-planet clusters/grids, crossing administrative barriers from individual, group, department, campus, state, national, and globe to inter-planet and universe.)


167

Thank You ...

?


168

Backup Slides...


169

SISD : A Conventional Computer

Speed is limited by the rate at which the computer can transfer information internally.

(Diagram: a single processor takes one data input stream and one instruction stream and produces one data output stream.)

Ex: PC, Macintosh, Workstations


170

The MISD Architecture

More of an intellectual exercise than a practical configuration. A few have been built, but none are commercially available.

(Diagram: processors A, B, and C each execute their own instruction stream (A, B, C) on a single data input stream, producing a single data output stream.)


171

SIMD Architecture

Ex: CRAY vector processing machines, Thinking Machines CM*

Ci <= Ai * Bi

(Diagram: a single instruction stream drives processors A, B, and C, each reading its own data input stream and writing its own data output stream.)
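The Ci <= Ai * Bi operation is exactly the kind of element-wise loop that SIMD hardware (and today's vectorising compilers) executes in lock-step across processing elements. A plain C rendering, added here only for reference:

#include <stdio.h>

#define N 8

int main(void)
{
    double a[N], b[N], c[N];

    /* Set up the data input streams. */
    for (int i = 0; i < N; i++) {
        a[i] = i + 1.0;
        b[i] = 2.0;
    }

    /* One instruction stream, many data streams: every iteration is
       independent, so the same multiply can be applied to all elements
       at once by SIMD/vector hardware. */
    for (int i = 0; i < N; i++)
        c[i] = a[i] * b[i];

    for (int i = 0; i < N; i++)
        printf("c[%d] = %.1f\n", i, c[i]);
    return 0;
}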


172

Unlike SISD and MISD machines, an MIMD computer works asynchronously.

MIMD Architecture

Shared memory (tightly coupled) MIMD

Distributed memory (loosely coupled) MIMD

(Diagram: processors A, B, and C each execute their own instruction stream (A, B, C) on their own data input stream, producing separate data output streams.)


173


Shared Memory MIMD machine

Comm: Source PE writes data to GM & destination retrieves it

Easy to build; conventional SISD operating systems can easily be ported

Limitation: reliability & expandability. A memory component or any processor failure affects the whole system.

Increasing the number of processors leads to memory contention.

Ex.: Silicon Graphics supercomputers....

(Diagram: processors A, B, and C are connected by memory buses to a global memory system.)


174


Distributed Memory MIMD

Communication: IPC over a High Speed Network.

The network can be configured as a tree, mesh, cube, etc.

Unlike Shared MIMD

easily/readily expandable

Highly reliable (any CPU failure does not affect the whole system)

(Diagram: processors A, B, and C each have their own memory bus and local memory system (A, B, C) and communicate over IPC channels.)