29
EGEE-III INFSO-RI-222667 Enabling Grids for E-sciencE Feb. 06, 2009 www.eu- egee.org A c a d e m i c a n d E d u c a t i o n a l G r i d I n i t i a t i v e o f S e r b i a A E G I S Introduction to High Performance and Grid Computing Faculty of Sciences, University of Novi Sad Introduction to High Performance and Grid Computing Antun Balaz, [email protected] Scientific Computing Laboratory Institute of Physics Belgrade Serbia High Performance Cluster and Grid Computing Introduction to High Performance and Grid Computing

High Performance Cluster and Grid Computing

Embed Size (px)

DESCRIPTION

High Performance Cluster and Grid Computing. Antun Balaz, [email protected] Scientific Computing Laboratory Institute of Physics Belgrade Serbia. Introduction to High Performance and Grid Computing. Overview. Introduction to clusters High performance computing Grid computing paradigm - PowerPoint PPT Presentation

Citation preview

EGEE-III INFSO-RI-222667

Enabling Grids for E-sciencE

Feb. 06, 2009www.eu-egee.org

Acad

em

ic a

nd E

ducat ional Gr id Init iat ive o

f Serbia

A E G I S

Introduction to High Performance and Grid ComputingFaculty of Sciences, University of Novi Sad

Introduction to High Performance and Grid Computing

Antun Balaz, [email protected] Computing LaboratoryInstitute of Physics BelgradeSerbia

High Performance Cluster and Grid Computing

Introduction to High Performance and Grid Computing

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Overview

• Introduction to clusters• High performance computing• Grid computing paradigm• Ingredients for Grid development• Introduction to Grid middleware

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Parallel computing

• Splitting problem in smaller tasks that are executed concurrently

• Why?– Absolute physical limits of hardware components (speed

of light, electron speed, …)– Economical reasons –more complex = more expensive– Performance limits –double frequency <> double

performance– Large applications –demand too much memory & time

• Advantages: Increasing speed & optimizing resources utilization

• Disadvantages: Complex programming models –difficult development

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Parallelism levels

• CPU– Multiple CPUs– Multiple CPU cores– Threads –time sharing

• Memory– Shared– Distributed– Hybrid (virtual shared memory)

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Parallel architectures (1)

• Vector machines– CPU processes multiple data sets– shared memory– advantages: performance, programming difficulties– issues: scalability, price– examples: Cray SV, NEC SX, Athlon3/d, Pentium-

IV/SSE/SSE2

• Massively parallel processors (MPP)– large number of CPUs– distributed memory– advantages: scalability, price– issues: performance, programming difficulties– examples: ConnectionSystemsCM1 i CM2, GAAP

(GeometricArrayParallel Processor)

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Parallel architectures (2)

• Symmetric Multiple Processing (SMP)– two or more processors– shared memory– advantages: price, performance, programming

difficulties– issues: scalability– examples: UltraSparcII, Alpha ES, Generic Itanium,

Opteron, Xeon, …

• Non Uniform Memory Access (NUMA)– Solving SMP’sscalability issue– hybrid memory model– advantages: scalability– issues: price, performance, programming difficulties– examples: SGI Origin/Altix, Alpha GS, HP Superdome

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Clusters

• Poor’s man supercomputer “…Collection of interconnected stand-alone computers working together as a single, integrated computing resource”–R. Buyya

• Cluster consists of:– Nodes– Network– OS– Cluster middleware

• Standard components– Avoiding expensive proprietary components

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Cluster classification

• High performance clusters (HPC)– Parallel, tightly coupled applications

• High throughput clusters (HTC)– Large number of independent tasks

• High availability clusters (HA)– Mission critical applications

• Load balancing clusters– Web servers, mail servers, …

• Hybrid clusters– Example: HPC+HA

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Beowulf clusters

• 1994– T. Sterling & M. Baker– NASA Ames Centre

• Frontend– Access machine– JMS & Monitoring server– Shared storage –NFS (directory /home)

• Nodes– Multiple private networks– Local storage (/scratch)

• Private networks– High speed / low latency

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

From clusters to Grids

• Many distributed computing resources (clusters) exist, even in Serbia

• Problem 1: they cannot be used by end users transparently

• Problem 2: even when access is granted to users to several clusters, they tend to neglect smaller clusters

• Problem 3: distribution of input/output data, sharing of data between clusters

• To overcome such problems, Grid paradigm was introduced

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Unifying concept: Grid

Resource sharing and coordinated problem solving in dynamic, multi-institutional virtual organizations.

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Effective policy governing access within a collaboration

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

• Too hard to keep track of authentication data (ID/password) across institutions

• Too hard to monitor system and application status across institutions

• Too many ways to submit jobs

• Too many ways to store & access files/data

• Too many ways to keep track of data

• Too easy to leave “dangling” resources lying around (robustness)

What problems Grid addresses

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Requirements

• Security• Monitoring/Discovery• Computing/Processing Power• Moving and Managing Data• Managing Systems• System Packaging/Distribution• Secure, reliable, on-demand access to data,

software, people, and other resources (ideally all via a Web Browser!)

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

• Resources being used may be valuable & the problems being solved sensitive– Both users and resources need to be careful

• Dynamic formation and management of user groups– Large, dynamic, unpredictable…

• Resources and users are often located in distinct administrative domains- Cannot assume cross-organizational trust agreements– Different mechanisms & credentials

Why Grid security is hard (1)

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Why Grid security is hard (2)

• Interactions are not just client/server, but service-to-service on behalf of user– Requires delegation of rights user service– Services may be dynamically instantiated

• Standardization of interfaces to allow for discovery, negotiation and use

• Implementation must be broadly available & applicable– Standard, well-tested, well-understood protocols; integrated with wide

variety of tools

• Policy from sites, user communities and users need to be combined– Varying formats

• Want to hide as much as possible from applications!

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Grids and VOs (1)

• Virtual organizations (VOs) are groups of Grid users (authenticated through digital certificates)

• VO Management Service (VOMS) serves as a central repository for user authorization information, providing support for sorting users into a general group hierarchy, keeping track of their roles,etc.

• VO Manager, according to VO policies and rules, authorizes authenticated users to become VO members

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Grids and VOs (2)

• Resource centers (RCs) may support one or more VOs, and this is how users are authorized to use computing, storage and other Grid resources

• VOMS allows flexible approach to A&A on the Grid

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

User view of the Grid

•User Interface

•Grid services

•User Interface

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Ingredients for GRID development

• Right balance of push and pull factors is needed

• Supply side Technology – inexpensive HPC resources (linux clusters) Technology – network infrastructure Financing – domestic, regional, EU, donations from industry

• Demand side Need for novel eScience applications Hunger for number crunching power and storage capacity

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Supply side - clusters

• The cheapest supercomputers – massively parallel PC clusters

• This is possible due to: Increase in PC processor speed (> Gflop/s) Increase in networking performance (1 Gbs) Availability of stable OS (e.g. Linux) Availability of standard parallel libraries (e.g. MPI)

• Advantages: Widespread choice of components/vendors, low price (by factor ~5-

10) Long warranty periods, easy servicing Simple upgrade path

• Disadvantages: Good knowledge of parallel programming is required Hardware needs to be adjusted to the specific application

(network topology) More complex administration

• Tradeoff: brain power purchasing power• The next step is GRID:

Distributed computing, computing on demand Should “do for computing the same as the Internet did for

information” (UK PM, 2002)

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Supply side - network

• Needed at all scales: World-wide Pan-European (GEANT2) Regional (SEEREN2, …) National (NREN) Campus-wide (WAN) Building-wide (LAN)

• Remember – it is end user to end user connection that matters

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

GÉANT2 Pan-European IP R&E network

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

GÉANT2 Global Connectivity

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Future development: regional network

Ioannina

Mytilini

Chios

Samos

Syros

Komotini

Athens

Rhodos

Agrinio

Preveza

Florina Thessaloniki

Patra

Iraklio

Larissa

Drama

Beroia

Lamia

Livadia

Bucharest

Skopje

Tirana

Belgrade

Banja Luka

Xanthi

Plovdiv

Sofia

Ruse

VelikoTarnovo

Bitola

Serres

Gjirokastra

Vranje

Nis

Kragujevac

Pirot

Novi-Sad

Subotica

Craiova

Timisoara

Turnu Severin

Resita

Chania

Kardzali

Edessa

PrilepTitov Veles

Sevlievo

Elbasan

Ohrid

TepeleneKorce

Arad

Cluj-Napoca

Targo-Mures

Brasov

Ploiesti

Oradea

Szeged

Slatina Pitesti

Zvornik

Bjeljina

Brcko

Doboj

Derventa

Sabac

Vlasenica

Sarajevo

Budapest

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Supply side - financing

• National funding (Ministries responsible for research) Lobby gvnmt. to commit to Lisbon targets Level of financing should be following an increasing trend (as a

% of GDP) Seek financing for clusters and network costs

• Bilateral projects and donations• Regional initiatives

Networking (HIPERB) Action Plan for R&D in SEE

• EU funding FP6 – IST priority, eInfrastructures & GRIDs FP7 CARDS

• Other international sources (NATO, …)• Donations from industry (HP, SUN, …)

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Demand side - eScience

• Usage of computers in science: Trivial:

text editing, elementary visualization, elementary quadrature, special functions, ...

Nontrivial: differential eq., large linear systems, searching combinatorial spaces, symbolic algebraic manipulations, statistical data analysis, visualization, ...

Advanced: stochastic simulations, risk assessment in complex systems, dynamics of the systems with many degrees of freedom, PDE solving, calculation of partition functions/functional integrals, ...

• Why is the use of computation in science growing?

Computational resources are more and more powerful and available (Moore’s law)

Standard approaches are having problemsExperiments are more costly, theory more difficult

Emergence of new fields/consumers – finance, economy, biology, sociology

• Emergence of new problems with unprecedented storage and/or processor requirements

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Demand side - consumers

• Those who study: Complex discrete time phenomena Nontrivial combinatorial spaces Classical many-body systems Stress/strain analysis, crack propagation Schrodinger eq; diffusion eq. Navier-Stokes eq. and its derivates functional integrals Decision making processes w. incomplete information …

• Who can deliver? Those with: Adequate training in mathematics/informatics Stamina needed for complex problems solving

• Answer: rocket scientists (natural sciences and engineering)

Enabling Grids for E-sciencE

EGEE-III INFSO-RI-222667 Introduction to High Performance and Grid Computing

Scenario

ComputingComputing

ElementElement

Submit

Input “sandbox”

Input “sandbox”

Output “sandbox”

Get output

Output “sandbox”

““User User interface”interface”

A worker node is allocated by the local jobmanager

Job S

ub

mit

Even

t

Statu

s / log q

uery

Job status u

pdate

stderr.txt

stdout.txt

Logging and bookkeeping

stderr.txt

stdout.txt

/bin/hostname

stderr.txt

stdout.txt

• STD input stream is read from file• STD out and err. streams are redirected into files

Job status update

publishstate