High Performance Cluster Computing
CSI668 Xinyang (Joy) Zhang
Outline
Overview of Parallel Computing
Cluster Architecture & its Components
Several Technical Areas
Representative Cluster Systems
Resources and Conclusions
Overview of Parallel Computing
Computing Power (HPC) Drivers
Life Science
Digital Biology
Military Applications
E-commerce/anything
How to Run Applications Faster?
• Use faster hardware, e.g. reduce the time per instruction (clock cycle).
• Use optimized algorithms and techniques.
• Use multiple computers to solve the problem; that is, increase the number of instructions executed per clock cycle.
Parallel Processing
Limitations of traditional sequential supercomputers:
– physical limits on speed
– production cost
Rapid increase in the performance of commodity processors:
– Intel x86 architecture chips
– RISC
Parallel Architecture
Processors
– number of processors
– processor type
• MIPS, HP PA 8000, Digital Alpha, IBM RIOS, Intel Pentium
Memories
– Distributed Memory, Shared Memory, Distributed Shared Memory (DSM)
Processor/Memory Interaction
– SIMD, MIMD
Interconnection Network
– Bus, Ring, Hybrid, etc.
HPC Examples

System                  Processors     Memory       Control  Network
SGI/Cray Power Series   MIPS           Shared       MIMD     Bus
SGI/Cray Origin 2000    MIPS           DSM          MIMD     Hybrid
HP/Convex SPP-2000      HP PA 8000     DSM          MIMD     Hybrid
SGI/Cray T3E            Digital Alpha  Distributed  MIMD     Torus
IBM SP/2                IBM RIOS       Distributed  MIMD     IBM Ring
The Need for Alternative Supercomputing Resources
Vast numbers of underutilized workstations are available for use.
Huge numbers of unused processor cycles and resources could be put to good use in a wide variety of application areas.
There is reluctance to buy supercomputers due to their cost.
Distributed compute resources "fit" better into today's funding model.
What is a cluster?
A cluster is a type of parallel or distributed processing system consisting of a collection of interconnected stand-alone computers that work together as a single, integrated computing resource.
Motivation for Using Clusters
Recent advances in high-speed networks.
Performance of workstations and PCs is rapidly improving.
Workstation clusters are a cheap and readily available alternative to specialized High Performance Computing (HPC) platforms.
Standard tools for parallel/distributed computing exist, and their popularity is growing.
Towards Inexpensive Supercomputing
A Linux cluster of 17 IBM Netfinity servers (36 Pentium II chips) versus a Cray T3E-900-AC64.
Costs:
– IBM: $1.5 million
– Cray: $5.5 million
Cluster Computer and its Components
Cluster Computer Architecture
Cluster Components…1a: Nodes
Multiple high performance components:
– PCs
– Workstations
– SMPs (CLUMPS)
They can be based on different architectures and run different operating systems.
Cluster Components…1b: Processors
There are many options (CISC/RISC/VLIW/Vector...):
– Intel: Pentiums
– Sun: SPARC, UltraSPARC
– HP: PA
– IBM: RS6000/PowerPC
– SGI: MIPS
– Digital: Alphas
Cluster Components…2: OS
State-of-the-art operating systems:
– Linux (Beowulf)
– Microsoft NT (Illinois HPVM)
– Sun Solaris (Berkeley NOW)
– IBM AIX (IBM SP2)
– cluster operating systems: Solaris MC, MOSIX (academic project)
– OS gluing layers (Berkeley Glunix)
Cluster Components…3: High Performance Networks
– Ethernet (10 Mbps)
– Fast Ethernet (100 Mbps)
– Gigabit Ethernet (1 Gbps)
– SCI (Dolphin; about 12 microsecond MPI latency)
– Myrinet (1.2 Gbps)
– Digital Memory Channel
– FDDI
Cluster Components…4: Communication Software
Traditional OS-supported facilities (heavyweight due to protocol processing):
– Sockets (TCP/IP), Pipes, etc.
Lightweight protocols (user level):
– Active Messages (Berkeley)
– Fast Messages (Illinois)
– U-Net (Cornell)
– XTP (Virginia)
Systems can be built on top of the above protocols.
Cluster Components…5: Cluster Middleware
Resides between the OS and applications, and offers an infrastructure for supporting:
– Single System Image (SSI)
– System Availability (SA)
SSI makes the cluster appear as a single machine (it globalizes the view of system resources).
SA covers checkpointing and process migration.
Cluster Components…6a: Programming Environments
Shared memory based:
– DSM
– OpenMP (enabled for clusters; see the sketch below)
Message passing based:
– PVM
– MPI (portable to shared-memory systems as well)
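For contrast with the MPI example later in these slides, here is a minimal sketch of the shared-memory style in OpenMP. This is a generic illustration, not from the original slides; the array size and loop body are invented for the example:

#include <stdio.h>
#include <omp.h>

#define N 1000

int main(void)
{
    double a[N];
    double sum = 0.0;
    int i;

    /* Iterations are divided among threads; the array 'a' is shared,
       and each thread's partial 'sum' is combined by the reduction. */
    #pragma omp parallel for reduction(+:sum)
    for (i = 0; i < N; i++) {
        a[i] = i * 0.5;
        sum += a[i];
    }

    printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}

Unlike message passing, no explicit sends or receives appear: all threads read and write the same address space, and the directive expresses where parallelism is safe.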
Cluster Components…6b: Development Tools
Compilers
– C/C++/Java
– Parallel programming with C++ (MIT Press book)
Debuggers
Performance analysis tools
Visualization tools
Several Topics in Cluster Computing
Several Topics in CC
MPI (Message Passing Interface)
SSI (Single System Image)
Parallel I/O & Parallel File System
Message-Passing Model
A process is a program counter and an address space.
Interprocess communication consists of:
– synchronization
– movement of data from one process's address space to another's
What is MPI?
A message-passing library specification:
– extends the message-passing model
– not a language or a product
For parallel computers, clusters, and heterogeneous networks.
Designed to provide access to advanced parallel hardware for:
– end users, library writers, tool developers
Some Basic Concepts
Processes can be collected into groups.
Each message is sent in a context and must be received in the same context.
A group and a context together form a communicator (see the sketch below).
A process is identified by its rank in the group associated with a communicator.
There is a default communicator, called MPI_COMM_WORLD, whose group contains all initial processes.
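The sample program two slides below uses only MPI_COMM_WORLD. Purely to illustrate how groups and communicators relate, here is a hedged sketch that derives new communicators from it with MPI_Comm_split; the even/odd split is an arbitrary choice invented for the example:

#include "mpi.h"

int main(int argc, char *argv[])
{
    int world_rank, sub_rank;
    MPI_Comm sub_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Processes passing the same "color" end up in the same new group;
       here even and odd ranks form two separate communicators. */
    MPI_Comm_split(MPI_COMM_WORLD, world_rank % 2, world_rank, &sub_comm);
    MPI_Comm_rank(sub_comm, &sub_rank);

    /* world_rank identifies the process in MPI_COMM_WORLD;
       sub_rank identifies it within the new, smaller group. */

    MPI_Comm_free(&sub_comm);
    MPI_Finalize();
    return 0;
}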
Basic Set of Functions
• MPI_INIT
• MPI_FINALIZE
• MPI_COMM_SIZE
• MPI_COMM_RANK
• MPI_SEND
• MPI_RECV
• MPI_BCAST
• MPI_REDUCE
A Sample MPI Program...

#include <stdio.h>
#include <string.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int my_rank;          /* process rank */
    int p;                /* number of processes */
    int source;           /* rank of sender */
    int dest;             /* rank of receiver */
    int tag = 0;          /* message tag, like an "email subject" */
    char message[100];    /* buffer */
    MPI_Status status;    /* status of receive */

    /* Start up MPI */
    MPI_Init(&argc, &argv);
    /* Find our process rank/id */
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);
    /* Find out how many processes are part of this run */
    MPI_Comm_size(MPI_COMM_WORLD, &p);
...A Sample MPI Program

    if (my_rank == 0)     /* Master process */
    {
        for (source = 1; source < p; source++)
        {
            MPI_Recv(message, 100, MPI_CHAR, source, tag, MPI_COMM_WORLD, &status);
            printf("%s\n", message);
        }
    }
    else                  /* Worker process */
    {
        sprintf(message, "Hello, I am your worker process %d!", my_rank);
        dest = 0;
        MPI_Send(message, strlen(message) + 1, MPI_CHAR, dest, tag, MPI_COMM_WORLD);
    }

    /* Shut down the MPI environment */
    MPI_Finalize();
    return 0;
}
Execution
% cc -o hello hello.c -lmpi
% mpirun -p2 hello
Hello, I am your worker process 1!
% mpirun -p4 hello
Hello, I am your worker process 1!
Hello, I am your worker process 2!
Hello, I am your worker process 3!
% mpirun hello
(no output: with only one process there are no workers, hence no greetings)
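The sample program exercises only the point-to-point calls. The two collective calls from the basic set above can be sketched the same way; the broadcast value and the sum-of-ranks computation are invented for the example:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, n = 0, total = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* The root chooses a value; MPI_Bcast copies it to every process. */
    if (rank == 0) n = 100;
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);

    /* Every process contributes its rank; MPI_Reduce sums them at the root. */
    MPI_Reduce(&rank, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("n = %d, sum of ranks = %d\n", n, total);

    MPI_Finalize();
    return 0;
}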
Single System Image
Problem:
– each node has a certain amount of resources that can only be used from that node
– this restriction limits the power of a cluster
Solution:
– implement a middleware layer that glues together the operating systems on all nodes
– offer unified access to system resources
What is a Single System Image (SSI)?
A single system image is the illusion, created by software or hardware, that presents a collection of resources as one, more powerful resource.
SSI makes the cluster appear like a single machine to the user, to applications, and to the network.
A cluster without an SSI is not a cluster.
Key SSI Services
Single entry point:
– telnet cluster.my_institute.edu
– telnet node1.cluster.institute.edu
Single file hierarchy: Solaris MC Proxy
Single control point: management from a single GUI
Single virtual networking
Single memory space: Network RAM / DSM
Single job management: Glunix
Single user interface: like a workstation/PC windowing environment (CDE in Solaris/NT)
Implementing Layers
Hardware layer:
– hardware DSM
Gluing layer (operating system):
– single file system, software DSM
– e.g. Sun Solaris MC
Application and subsystem layer:
– single-window GUI-based tools
Parallel I/O
Needed for I/O-intensive applications.
Multiple processes participate, and the application is aware of the parallelism.
Preferably the "file" is itself stored on a parallel file system with multiple disks.
That is, I/O is parallel at both ends:
– the application program
– the I/O hardware
Parallel File System
A typical PFS consists of compute nodes, I/O nodes, and an interconnect, with data physically distributed across multiple disks in multiple cluster nodes.
Sample PFSs:
– Galley Parallel File System (Dartmouth)
– PVFS (Clemson)
PVFS: Parallel Virtual File System
File System:
– allows users to store and retrieve data using common file access methods (open, close, read, write, ...)
Parallel:
– stores data on multiple independent machines with separate network connections
Virtual:
– exists as a set of user-space daemons storing data on local file systems
PVFS Components...
Two servers:
• mgr: the file manager, which handles metadata for files
• iods: I/O servers, which store and retrieve file data
libpvfs:
– links clients to the PVFS servers
– hides the details of PVFS access from application tasks
– provides multiple interfaces
…PVFS Components
PVFS Linux kernel support:
– the PVFS kernel module registers the PVFS file system type
– PVFS file systems can then be mounted
– VFS operations are converted to PVFS operations
– requests pass through a device file
Access PVFS Files Through VFS
I/O operations pass through the VFS.
PVFS code in the kernel passes each operation through a device.
The pvfsd daemon reads requests from /dev/pvfsd.
Requests are converted to PVFS operations by libpvfs and sent to the servers.
Data is passed back through the device.
Advantages of PVFS
Provides high bandwidth for concurrent read/write operations from multiple processes or threads to a common file.
Supports multiple APIs:
– native PVFS API
– UNIX/POSIX I/O API
– MPI-IO (via ROMIO; see the sketch below)
Common Unix shell commands work with PVFS files:
– ls, cp, rm...
Robust and scalable.
Easy to install and use.
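Since MPI-IO is among the supported APIs, here is a hedged sketch of what concurrent access to a common PVFS file can look like at the application level. The mount point /pvfs and the file name are hypothetical:

#include "mpi.h"

#define COUNT 1024

int main(int argc, char *argv[])
{
    int rank, i, buf[COUNT];
    MPI_File fh;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < COUNT; i++)   /* each process prepares its own data */
        buf[i] = rank;

    /* All processes open the same file... */
    MPI_File_open(MPI_COMM_WORLD, "/pvfs/datafile",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

    /* ...and each writes a distinct, non-overlapping block, so the
       writes can proceed concurrently to the I/O nodes. */
    MPI_File_write_at(fh, (MPI_Offset)rank * COUNT * sizeof(int),
                      buf, COUNT, MPI_INT, MPI_STATUS_IGNORE);

    MPI_File_close(&fh);
    MPI_Finalize();
    return 0;
}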
A Lot More...
Algorithms and Applications
Java Technologies
Software Engineering
Storage Technology
Etc..
Representative Cluster Systems
Berkeley NOW
– 100 Sun UltraSPARCs
– 200 disks
– Myrinet SAN, 160 MB/s
– Fast communication: AM, MPI, ...
– Global OS
Clusters of SMPs (CLUMPS)
– 4 Sun E5000s, each with 8 processors and 4 Myricom NICs
– Multiprocessor, multi-NIC, multi-protocol
Beowulf Cluster at SUNY Albany
The particle physics group runs a Beowulf cluster with:
– 8 nodes with dual Pentium III processors
– Red Hat Linux
– MPI
– a Monte Carlo package
It is used for data analysis.
Resources and Conclusions
Resources
IEEE Task Force on Cluster Computing
– http://www.ieeetfcc.org
Beowulf
– http://www.beowulf.org
PFS & Parallel I/O
– http://www.cs.dartmouth.edu/pario/
PVFS
– http://parlweb.parl.clemson.edu/pvfs/
Conclusions
Clusters are promising.
They offer incremental growth and match today's funding patterns.
New trends in hardware and software technologies are likely to make clusters even more promising, so that cluster-based supercomputers can be seen everywhere!
Thank You ...
Questions ??