
Page 1:

Introduction to Grid Computing

CI-Days workshop Clemson University, Clemson, SC May 19, 2008

Page 2:

Scaling up Science: Citation Network Analysis in Sociology

[Figure: citation network graphs for 1975, 1980, 1985, 1990, 1995, 2000, and 2002]

Work of James Evans, University of Chicago, Department of Sociology

Page 3:

Scaling up the analysis

- Query and analysis of 25+ million citations
- Work started on desktop workstations
- Queries grew to month-long duration
- With the data distributed across the U of Chicago TeraPort cluster, 50 (faster) CPUs gave a 100x speedup
- Many more methods and hypotheses can be tested!

Higher throughput and capacity enable deeper analysis and broader community access.

Page 4:

Computing clusters have commoditized supercomputing

[Diagram: the parts of a typical computing cluster]
- Cluster management "frontend"
- A few head nodes, gatekeepers, and other service nodes
- Lots of worker nodes
- I/O servers, typically RAID fileservers, with disk arrays
- Tape backup robots

Page 5:

Initial Grid driver: High Energy Physics

[Diagram: the LHC tiered data-distribution model; image courtesy Harvey Newman, Caltech]
- Tier 0: the Online System feeds the CERN Computer Centre, with ~PBytes/sec off the detector and ~100 MBytes/sec into the Offline Processor Farm (~20 TIPS) and its physics data cache
- Tier 1: regional centres (FermiLab ~4 TIPS; France, Italy, and Germany Regional Centres), linked to CERN at ~622 Mbits/sec (or air freight, deprecated)
- Tier 2: centres of ~1 TIPS each (e.g., Caltech), linked at ~622 Mbits/sec
- Tier 4: institutes (~0.25 TIPS) and physicist workstations, attached at ~1 MBytes/sec

- There is a "bunch crossing" every 25 nsecs.
- There are 100 "triggers" per second.
- Each triggered event is ~1 MByte in size.
- Physicists work on analysis "channels". Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server.
- 1 TIPS is approximately 25,000 SpecInt95 equivalents.

Page 6:

Grids can process vast datasets. Many HEP and Astronomy experiments consist of:
- Large datasets as inputs (find datasets)
- "Transformations" which work on the input datasets (process)
- The output datasets (store and publish)

The emphasis is on the sharing of the large datasets. Programs, when independent, can be parallelized.

[Figure: Montage workflow of ~1200 jobs in 7 levels (NVO, NASA, ISI/Pegasus; Deelman et al.), showing data transfers and compute jobs; the result is a mosaic of M42 created on TeraGrid]

Page 7:

PUMA: Analysis of Metabolism

PUMA Knowledge Base: information about proteins analyzed against ~2 million gene sequences

Analysis on Grid: involves millions of BLAST, BLOCKS, and other processes

Natalia Maltsev et al., http://compbio.mcs.anl.gov/puma2

Page 8:

Mining seismic data for hazard analysis (Southern California Earthquake Center)

InSAR Image of the Hector Mine Earthquake
- A satellite-generated Interferometric Synthetic Aperture Radar (InSAR) image of the 1999 Hector Mine earthquake
- Shows the displacement field in the direction of radar imaging
- Each fringe (e.g., from red to red) corresponds to a few centimeters of displacement

[Diagram: inputs to a Seismic Hazard Model include seismicity, paleoseismology, local site effects, geologic structure, faults, stress transfer, crustal motion, crustal deformation, seismic velocity structure, and rupture dynamics]

Page 9:

Grids Provide Global Resources To Enable e-Science

Page 10:

Ian Foster’s Grid Checklist

A Grid is a system that:
- Coordinates resources that are not subject to centralized control
- Uses standard, open, general-purpose protocols and interfaces
- Delivers non-trivial qualities of service

Page 11:

Virtual Organizations (VOs)
- Groups of organizations that use the Grid to share resources for specific purposes
- Support a single community
- Deploy compatible technology and agree on working policies
  - Security policies in particular are difficult
- Deploy different network-accessible services:
  - Grid Information
  - Grid Resource Brokering
  - Grid Monitoring
  - Grid Accounting

Page 12:

What is a grid made of? Middleware.

[Diagram: a Grid Client (application user interface, grid middleware, and resource, workflow, and data catalogs) speaks grid protocols to three kinds of sites, each running grid middleware in front of a computing cluster and grid storage:
- Grid resources dedicated by UC, IU, Boston
- Grid resource time purchased from a commercial provider
- Grid resources shared by OSG, LCG, NorduGRID]

The middleware provides:
- Security to control access and protect communication (GSI)
- Directory to locate grid sites and services (VORS, MDS)
- Uniform interface to computing sites (GRAM)
- Facility to maintain and schedule queues of work (Condor-G)
- Fast and secure data set mover (GridFTP, RFT)
- Directory to track where datasets live (RLS)

Page 13:

We will focus on Globus and Condor.

Condor provides both client & server scheduling:
- In grids, Condor provides an agent to queue, schedule, and manage work submission (see the sketch after this list)

Globus Toolkit (GT) provides the lower-level middleware:
- Client tools which you can use from a command line
- APIs (scripting languages, C, C++, Java, ...) to build your own tools, or use directly from applications
- Web service interfaces
- Higher-level tools built from these basic components, e.g., Reliable File Transfer (RFT)
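To make this concrete, here is a minimal sketch of a Condor-G submit file that routes a job through pre-WS GRAM to a remote site's PBS queue. The hostname, executable, and file names are hypothetical placeholders, not values from this workshop:

  # sketch of a Condor-G submit file; host and file names are hypothetical
  universe      = grid
  grid_resource = gt2 grid.example.edu/jobmanager-pbs
  executable    = my_analysis
  arguments     = input.dat
  output        = job.out
  error         = job.err
  log           = job.log
  should_transfer_files   = YES
  when_to_transfer_output = ON_EXIT
  queue

You would submit this with condor_submit and track it with condor_q, exactly as for a local Condor job; Condor-G speaks the GRAM protocol underneath.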

Page 14:

Grid software is like a "stack" (top to bottom):
- Grid Application (often includes a Portal)
- Workflow system (explicit or ad-hoc)
- Condor Grid Client
- Core Globus Services: Job Management, Data Management, and Information (config & status) Services
- Grid Security Infrastructure
- Standard Network Protocols

Page 15:

Grid architecture is evolving to a Service-Oriented approach.

- Service-oriented Grid infrastructure: provisions physical resources to support application workloads
- Service-oriented applications: wrap applications as services, and compose applications into workflows
- Users invoke application services and compose them into workflows

"The Many Faces of IT as Service", Foster, Tuecke, 2005

...but this is beyond our workshop's scope. See "Service-Oriented Science" by Ian Foster.

Page 16:

Job and resource management

Page 17:

Local Resource Manager (LRM): a batch scheduler for running jobs on a computing cluster.

Popular LRMs include:
- PBS - Portable Batch System
- LSF - Load Sharing Facility
- SGE - Sun Grid Engine
- Condor - originally for cycle scavenging, Condor has evolved into a comprehensive system for managing computing

LRMs execute on the cluster's head node. The simplest LRM allows you to "fork" jobs quickly:
- Runs on the head node (gatekeeper) for fast utility functions
- No queuing (but this is emerging, to "throttle" heavy loads)

In GRAM, each LRM is handled with a "job manager". A sketch of a PBS batch script follows.
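For comparison, here is a minimal sketch of what a job script handed to one such LRM (PBS) looks like; the job name, resource requests, and program are illustrative placeholders:

  #!/bin/sh
  # sketch of a PBS batch script; names and limits are hypothetical
  #PBS -N my_analysis           # job name
  #PBS -l nodes=1:ppn=1         # one processor on one node
  #PBS -l walltime=01:00:00     # one-hour run-time limit
  #PBS -o job.out               # stdout file
  #PBS -e job.err               # stderr file
  cd $PBS_O_WORKDIR             # run from the submission directory
  ./my_analysis input.dat

qsub submits the script and qstat shows its place in the queue; LSF, SGE, and Condor have direct analogues of each line.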

Page 18:

GRAM provides a uniform interface to diverse cluster schedulers.

[Diagram: users in several VOs submit through GRAM, which dispatches to a different LRM at each site: Condor at Site A, PBS at Site B, LSF at Site C, and UNIX fork() at Site D]
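From a client machine with the Globus Toolkit installed, that uniform interface looks like the commands below; the site name is a hypothetical placeholder, the suffix after the slash picks the jobmanager, and omitting it selects fork:

  # run a command under the site's PBS jobmanager via GRAM
  globus-job-run grid.example.edu/jobmanager-pbs /bin/hostname

  # quick utility command under the default fork jobmanager
  globus-job-run grid.example.edu /bin/date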

Page 19:

Condor-G: Grid Job Submission Manager

[Diagram: a user's Condor-G queue in Organization A (myjob1, myjob2, myjob3, myjob4, myjob5, ...) speaks the Globus GRAM protocol to the gatekeeper & job manager in Organization B, which submits the jobs to the local LRM]
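Driving a queue like the one above uses the ordinary Condor commands; a typical session might look like this sketch (the job ID 42.0 is illustrative):

  condor_submit myjob1.sub   # place a job in the Condor-G queue
  condor_q                   # list queued jobs and their states
  condor_q -l 42.0           # full ClassAd detail for one job
  condor_rm 42.0             # remove a job from the queue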

Page 20:

Data Management

Page 21:

Data management services provide the mechanisms to find, move, and share data.
- GridFTP: fast, flexible, secure, ubiquitous data transport; often embedded in higher-level services
- RFT: reliable file transfer service using GridFTP
- Replica Location Service (RLS): tracks multiple copies of data for speed and reliability
- Emerging services integrate replication and transport
- Metadata management services are evolving

Page 22:

GridFTP examples

  globus-url-copy file:///home/YOURLOGIN/dataex/myfile gsiftp://osg-edu.cs.wisc.edu/nfs/osgedu/YOURLOGIN/ex1

  globus-url-copy gsiftp://osg-edu.cs.wisc.edu/nfs/osgedu/YOURLOGIN/ex2 gsiftp://tp-osg.ci.uchicago.edu/YOURLOGIN/ex3
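The first command pushes a local file (file:// URL) up to a GridFTP server; the second is a third-party transfer, in which the client merely brokers a direct server-to-server copy between the Wisconsin and Chicago hosts, so the data never passes through the client machine.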

Page 23:

Grids replicate data files for faster access:
- Effective use of the grid resources - more parallelism
- Each logical file can have multiple physical copies
- Avoids single points of failure
- Manual or automatic replication
  - Automatic replication considers the demand for a file, transfer bandwidth, etc.

Page 24:

File Catalog Services: Replica Location Service (RLS), PhEDEx, RefDB / PupDB

Requirements:
- Abstract the logical file name (LFN) for a physical file
- Maintain the mappings between the LFNs and the PFNs (physical file names)
- Maintain the location information of a file

File catalogues tell you where the data is.

Page 25:

RLS - Replica Location Service
- A component of the data grid architecture (a Globus component)
- Provides access to mapping information from logical names to physical names of items
- Its goal is to reduce access latency and to improve data locality, robustness, scalability, and performance for distributed applications
- RLS maintains replica catalogs (LRCs), which hold mappings between logical and physical files scattered across the storage system
- For better performance, the LRCs can be indexed

Page 26:

RLS - Replica Location Service

RLS maps logical filenames to physical filenames.
- Logical Filenames (LFN): name a file with interesting data in it; don't refer to location (which host, or where in a host)
- Physical Filenames (PFN): refer to a file on some filesystem somewhere; often specified as gsiftp:// URLs

Two RLS catalogs: Local Replica Catalog (LRC) and Replica Location Index (RLI)
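As a rough sketch of how the two catalogs are used from the command line, the Globus RLS client can register and look up mappings; the server URLs and filenames below are hypothetical:

  # register an LFN-to-PFN mapping in a Local Replica Catalog
  globus-rls-cli create experiment_result_1 gsiftp://serverA.example.edu/data/r.txt rls://serverA.example.edu

  # ask the LRC for the physical copies of the file
  globus-rls-cli query lrc lfn experiment_result_1 rls://serverA.example.edu

  # ask an RLI which LRCs know about the file
  globus-rls-cli query rli lfn experiment_result_1 rls://serverB.example.edu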

Page 27:

Local Replica Catalog (LRC): stores mappings from LFNs to PFNs.

Interaction:
- Q: Where can I get filename 'experiment_result_1'?
- A: You can get it from gsiftp://gridlab1.ci.uchicago.edu/home/benc/r.txt

It is undesirable to have a single LRC for the whole grid:
- Lots of data
- Single point of failure

Page 28:

Replica Location Index (RLI): stores mappings from LFNs to LRCs.

Interaction:
- Q: Who can tell me about filename 'experiment_result_1'?
- A: You can get more info from the LRC at gridlab1 (then go ask that LRC for more info)

- Failure of one RLI or LRC doesn't break everything
- The RLI stores a reduced set of information, so it can cope with many more mappings

Page 29:

Globus RLS

[Diagram: two sites, each running an LRC and an RLI, at rls://serverA:39281 and rls://serverB:39281]

Site A holds file1 and file2:
- LRC: file1 → gsiftp://serverA/file1; file2 → gsiftp://serverA/file2
- RLI: file3 → rls://serverB/file3; file4 → rls://serverB/file4

Site B holds file3 and file4:
- LRC: file3 → gsiftp://serverB/file3; file4 → gsiftp://serverB/file4
- RLI: file1 → rls://serverA/file1; file2 → rls://serverA/file2

Page 30:

What is SRM?

Storage Resource Managers (SRMs) are middleware components whose function is to provide:
- dynamic space allocation
- file management

on shared storage resources on the Grid.

Different implementations for underlying storage systems are based on the same SRM specification.

Page 31:

SRM's role in the data grid architecture:
- Shared storage space allocation & reservation - important for data-intensive applications
- Get/put files from/into spaces; archive files on mass storage systems
- File transfers from/to remote sites, file replication
- Negotiate transfer protocols
- File and space management with lifetime
- Support non-blocking (asynchronous) requests
- Directory management
- Interoperate with other SRMs

Page 32:

SRM: Main concepts
- Space reservations
- Dynamic space management
- Pinning files in spaces
- Support for an abstract file name: the Site URL
- Temporary assignment of file names for transfer: the Transfer URL
- Directory management and authorization
- Transfer protocol negotiation
- Support for peer-to-peer requests
- Support for asynchronous multi-file requests
- Support for abort, suspend, and resume operations
- Non-interference with local policies
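As one concrete illustration, SRM client tools such as dCache's srmcp drive these operations from the command line; the endpoint and paths here are made up for the example, and client versions differ in URL details:

  # copy a file out of an SRM-managed store to local disk (hypothetical endpoint)
  srmcp srm://srm.example.edu:8443/data/myfile file:///tmp/myfile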

Page 33:

The Globus-Based LIGO Data Grid

LIGO Gravitational Wave Observatory:
- Replicating >1 Terabyte/day to 8 sites
- >40 million replicas so far
- MTBF = 1 month

[Map: replication sites include Birmingham, Cardiff, and AEI/Golm]

Page 34:

Grid Security

Page 35:

Grid security is a crucial component:
- Problems being solved might be sensitive
- Resources are typically valuable
- Resources are located in distinct administrative domains
  - Each resource has its own policies, procedures, security mechanisms, etc.
- Implementation must be broadly available & applicable
  - Standard, well-tested, well-understood protocols, integrated with a wide variety of tools

Page 36:

Grid Security Infrastructure (GSI) provides secure communications for all the higher-level grid services.

Secure authentication and authorization:
- Authentication ensures you are whom you claim to be (ID card, fingerprint, passport, username/password)
- Authorization controls what you are permitted to do (run a job, read or write a file)

GSI provides:
- Uniform credentials
- Single sign-on: the user authenticates once, then can perform many tasks
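Single sign-on in GSI rests on short-lived proxy credentials; a typical session with the Globus proxy tools looks like this sketch:

  grid-proxy-init       # prompts for your private-key passphrase, creates a proxy (12 hours by default)
  grid-proxy-info       # shows the proxy's subject and remaining lifetime
  # ...submit jobs, move data; each tool authenticates with the proxy...
  grid-proxy-destroy    # discard the proxy when finished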

Page 37:

National Grid Cyberinfrastructure

Page 38:

Open Science Grid (OSG) provides shared computing resources, benefiting a broad set of disciplines:
- A consortium of universities and national laboratories, building a sustainable grid infrastructure for science
- Incorporates advanced networking and focuses on general services, operations, and end-to-end performance
- Composed of a large number (>50 and growing) of shared computing facilities, or "sites"

http://www.opensciencegrid.org/

Page 39:

Open Science Grid (www.opensciencegrid.org):
- 70 sites (20,000 CPUs) & growing
- 400 to >1000 concurrent jobs
- A diverse job mix: many applications + CS experiments, including long-running production operations

Page 40:

TeraGrid provides vast resources via a number of huge computing facilities.

Page 41:

To efficiently use a Grid, you must locate and monitor its resources:
- Check the availability of different grid sites
- Discover different grid services
- Check the status of "jobs"
- Make better scheduling decisions with information maintained on the "health" of sites
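Some of this monitoring is scriptable from the command line; for example, for resources reachable through Condor (a sketch, assuming a working Condor installation):

  condor_status            # machines/slots in the pool and their states
  condor_status -avail     # only machines currently willing to run jobs
  condor_status -schedd    # submit points and their queued-job totals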

Page 42:

OSG Resource Selection Service: VORS

Page 43:

Grid Workflow

Page 44:

A typical workflow pattern in image analysis runs many filtering apps.

[Workflow diagram, courtesy James Dobson, Dartmouth Brain Imaging Center: four anatomy image pairs (3a, 4a, 5a, 6a, each with .h and .i files) are aligned against a reference pair (ref.h, ref.i) by align_warp (jobs 1, 3, 5, 7), producing warp files (.w); reslice (jobs 2, 4, 6, 8) applies each warp to produce resliced pairs (.s.h, .s.i); softmean (job 9) averages them into an atlas (atlas.h, atlas.i); slicer (jobs 10, 12, 14) extracts x, y, and z slices as .ppm images; and convert (jobs 11, 13, 15) renders them as atlas_x.jpg, atlas_y.jpg, and atlas_z.jpg]
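On Condor-based grids, a pattern like this is often written as a DAGMan input file: each node names a Condor submit file, and PARENT/CHILD lines give the dependencies. Below is a trimmed sketch covering one path through the figure above, with hypothetical submit-file names:

  # fmri.dag - one path through the image-analysis workflow
  JOB align1   align_warp_3a.sub
  JOB reslice1 reslice_3a.sub
  JOB mean     softmean.sub
  JOB slice_x  slicer_x.sub
  JOB conv_x   convert_x.sub
  PARENT align1   CHILD reslice1
  PARENT reslice1 CHILD mean
  PARENT mean     CHILD slice_x
  PARENT slice_x  CHILD conv_x

condor_submit_dag fmri.dag runs it; DAGMan releases each node's job only when its parents have finished.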

Page 45:

Conclusion: Why Grids?

New approaches to inquiry based on:
- Deep analysis of huge quantities of data
- Interdisciplinary collaboration
- Large-scale simulation and analysis
- Smart instrumentation
- Dynamically assembling the resources to tackle a new scale of problem
- Enabled by access to resources & services without regard for location & other barriers

Page 46:

Grids: Because Science needs community ...
- Teams organized around common goals: people, resources, software, data, instruments, ...
- With diverse membership & capabilities: expertise in multiple areas required
- And geographic and political distribution: no location/organization possesses all required skills and resources
- Must adapt as a function of the situation: adjust membership, reallocate responsibilities, renegotiate resources

Page 47:

What's next?

- Next steps: https://twiki.grid.iu.edu/twiki/bin/view/Education/NextSteps
- Take the self-paced course: opensciencegrid.org/OnlineGridCourse
- Contact us: [email protected]
- Periodically check the OSG site: opensciencegrid.org

Page 48:

Acknowledgements (OSG, Globus, Condor teams and associates)

Gabrielle Allen, LSU

Rachana Ananthakrishnan, ANL

Ben Clifford, UChicago

Alex Sim, LBNL

Alain Roy, UW-Madison

Mike Wilde, ANL
