
From Clusters to Grids


Page 1: From Clusters to Grids

October 2003 – Linköping, Sweden

Andrew Grimshaw

Department of Computer Science, University of Virginia

CTO & Founder Avaki Corporation

From Clusters to Grids

Page 2: From Clusters to Grids

2

Agenda

• Grid Computing Background

• Legion

• Existing Systems & Standards

• Summary

Page 3: From Clusters to Grids

3

Grid Computing

Page 4: From Clusters to Grids

4

First: What is a Grid System?

A Grid system is a collection of distributed resources connected by a network.

Examples of distributed resources:
• Desktop and handheld hosts
• Devices with embedded processing resources, such as digital cameras and phones
• Tera-scale supercomputers

Page 5: From Clusters to Grids

5

A grid enables users to collaborate securely by sharing processing, applications, and data across heterogeneous systems and administrative domains, for faster application execution and easier access to data.

• Compute Grids
• Data Grids

What is a Grid?

A grid is all about gathering together resources and making them accessible to users and applications.

Page 6: From Clusters to Grids

6

What are the characteristics of a Grid system?

• Numerous resources
• Ownership by mutually distrustful organizations and individuals
• Potentially faulty resources
• Different security requirements and policies required
• Resources are heterogeneous
• Geographically separated
• Different resource management policies
• Connected by heterogeneous, multi-level networks


Page 8: From Clusters to Grids

8

Technical Requirements of a Successful Grid Architecture

• Simple
• Secure
• Scalable
• Extensible
• Site autonomy
• Persistence & I/O
• Multi-language and legacy support
• Single namespace
• Transparency
• Heterogeneity
• Fault tolerance & exception management

Success requires an integrated solution AND flexible policy.

Manage Complexity!!

Page 9: From Clusters to Grids

9

Implication: Complexity is THE Critical Challenge

How should complexity be addressed?

Page 10: From Clusters to Grids

10

High-level versus low-level solutions

• A low-level or "sockets & shells" approach is low in robustness and high in time and cost to develop.
• An integrated approach is high in robustness and low in time and cost to develop.
• As application complexity increases, the differences between the two approaches increase dramatically.

Page 11: From Clusters to Grids

11

The Importance of Integration in a Grid Architecture

If separate pieces are used, then the programmer must integrate the solutions.

If all the pieces are not present, then the programmer must develop enough of the missing pieces to support the application.

Bottom line: both approaches push the cognitive burden onto the programmer.

Page 12: From Clusters to Grids

12

Misconceptions about Grids

• Grids are just simple cycle aggregation
• The state of the art today is essentially scheduling and queuing for CPU cluster management
• These definitions sell short the promise of grid technology
• Avaki believes grids are not just about aggregating and scheduling CPU cycles, but also about:
  • Virtualizing many types of resources, internally and across domains
  • Empowering anyone to have secure access to any and all resources through easy administration

Page 13: From Clusters to Grids

13

Compute Grid Categories

• Sons of SETI@home
  • United Devices, Entropia, Data Synapse
  • Low-end, desktop cycle aggregation
  • A hard sell in corporate America
• Cluster load management
  • LSF, PBS, SGE
  • High-end, great for managing local clusters, but not well proven in multi-cluster environments
• As soon as you go outside the local cluster to cross-domain, multi-cluster operation, the game changes dramatically with the introduction of three major issues:
  • Data
  • Security
  • Administration

To address these issues, you need a fully integrated solution, or a toolkit to build one.

Page 14: From Clusters to Grids

14

Typical Grid Scenarios

Desktop cycle aggregation
• Desktop only
• United Devices, Entropia, Data Synapse

Cluster & departmental grids
• Single owner, platform, domain, file system, and location
• Sun SGE, Platform LSF, PBS

Enterprise grids
• Single enterprise; multiple owners, platforms, domains, file systems, locations, and security policies
• Sun SGE EE, Platform Multi-Cluster

Global grids
• Multiple enterprises, owners, platforms, domains, file systems, locations, and security policies
• Legion, Avaki, Globus

Page 15: From Clusters to Grids

15

What are grids being used for today?

• Multiple sites with multiple data sources (public and private)

• Need secure access to data and applications for sharing

• Have partnership relationships with other organizations: internal, partners, or customers

• Computationally challenging applications

• Distributed R&D groups across company, networks and geographies

• Staging large files

• Want to utilize and leverage heterogeneous compute resources

• Need for accounting of resources

• Need to handle multiple queuing systems

• Considering purchasing compute cycles for spikes in demand

Page 16: From Clusters to Grids

16

Legion

Page 17: From Clusters to Grids

17

Legion Grid Software

Wide-area access to data, processing, and application resources in a single, uniform operating environment that is secure and easy to administer.

Legion Grid capabilities:
• Wide-area data access
• Distributed processing
• Global naming
• Policy-based administration
• Resource accounting
• Fine-grained security
• Automatic failure detection and recovery

[Diagram: users and applications reach desktops, servers, data servers, and clusters at Department A, Department B, a partner, and a vendor through the Legion Grid, with load management and queuing at each site.]

Page 18: From Clusters to Grids

18

Legion Combines Data and Compute Grid

[Diagram: users and applications access desktops, servers, data servers, and clusters at Department A, Department B, a partner, and a vendor through the Legion Grid, combining data access with load management and queuing.]

Page 19: From Clusters to Grids

19

The Legion Data Grid

Page 20: From Clusters to Grids

20

Data Grid

Wide-area access to data at its source location based on business policies, eliminating manual copying and errors caused by accessing out-of-date copies.

Data Grid capabilities:
• Federates multiple data sources
• Provides global naming
• Works with local and virtual file systems: NFS, XFS, CIFS
• Accesses data in DAS, NAS, SAN
• Uses standard interfaces
• Caches data locally

[Diagram: users and applications across Department A, Department B, a partner, and a vendor access desktops, servers, data servers, and clusters through the Legion Grid.]

Page 21: From Clusters to Grids

21

Data Grid Share

• Data is mapped into the Grid namespace via Legion ExportDir.
• The Legion Data Grid transparently handles client and application requests, maps them to the global namespace, and returns the data.

[Diagram: Linux, NT, and Solaris hosts at headquarters, a research center, an informatics partner, and a tools vendor export data into a single grid namespace for users and applications.]
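The slide does not show the mechanism itself; as a rough, hypothetical sketch of the idea (in Python, not the Legion ExportDir implementation), exporting a directory amounts to recording, for each local file, a global name that clients can later resolve back to the host and local path that serve it. All class, host, and path names here are illustrative.

```python
# Conceptual sketch (not the Legion ExportDir implementation) of mapping a
# local directory into a global grid namespace. All names are illustrative.
import tempfile
from pathlib import Path

class GridNamespace:
    """Maps global names like /grid/hq/sequences/sequence_a to (host, local path)."""
    def __init__(self):
        self._entries: dict[str, tuple[str, Path]] = {}

    def export_dir(self, host: str, local_dir: Path, grid_prefix: str) -> None:
        # Every file under local_dir becomes visible under grid_prefix.
        for p in local_dir.rglob("*"):
            if p.is_file():
                name = f"{grid_prefix}/{p.relative_to(local_dir).as_posix()}"
                self._entries[name] = (host, p)

    def resolve(self, global_name: str) -> tuple[str, Path]:
        # A client resolves a global name to the host and local path that serve it.
        return self._entries[global_name]

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        local = Path(d)
        (local / "sequence_a").write_text(">seq_a\nACGT\n")    # stand-in data file
        ns = GridNamespace()
        ns.export_dir("hq-1", local, "/grid/hq/sequences")     # hypothetical host and prefix
        print(ns.resolve("/grid/hq/sequences/sequence_a"))
```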

Page 22: From Clusters to Grids

22

Data Grid Access

• Access files using the standard NFS protocol or Legion commands
  - NFS security issues eliminated
  - Caches exploit semantics
• Access files using a global name
• Access is based on specified privileges, enforced with fine-grained security

[Diagram: users and applications go through a single access point to reach files such as sequence_a, sequence_b, and sequence_c, and applications such as BLAST and App_A, on servers and clusters at headquarters (HQ-1), a research center (RD-2), an informatics partner (PM-1), and a tools vendor.]

Page 23: From Clusters to Grids

23

Data Grid Access using virtual NFS

• Complexity = servers + clients
• Clients mount the grid
• Servers share files into the grid
• Clients access data using the NFS protocol
• Wide-area access to data outside the administrative domain, with fine-grained security

[Diagram: clients in Department A, Department B, and at a partner site mount the grid via Legion-NFS and access files such as sequence_a and sequence_c.]
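Once a client has mounted the grid via Legion-NFS, grid-resident files look like ordinary files, so unmodified applications can read them with plain POSIX I/O. A minimal sketch, assuming a hypothetical local mount point and global file name:

```python
# Minimal sketch of client-side access through a Legion-NFS mount.
# The mount point and global path below are hypothetical.
from pathlib import Path

GRID_MOUNT = Path("/mnt/legion")                          # hypothetical mount of the grid namespace
SEQUENCE = GRID_MOUNT / "grid/hq/sequences/sequence_a"    # hypothetical global name

def read_sequence(path: Path) -> str:
    """Read a grid-resident file exactly as if it were local; no Legion API calls."""
    with path.open("r") as f:
        return f.read()

if __name__ == "__main__":
    if SEQUENCE.exists():                                 # only present when the grid is mounted
        print(f"read {len(read_sequence(SEQUENCE))} characters from the grid namespace")
    else:
        print(f"{SEQUENCE} not found: is the grid mounted at {GRID_MOUNT}?")
```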

Page 24: From Clusters to Grids

24

Keeping Data in the Grid

• Data is copied into Legion storage servers that execute on a set of hosts.
• The particular set of hosts used is a configuration option; here, five hosts are used.
• Access to the different files is completely independent and asynchronous.
• Very high sustained read/write bandwidth is possible using commodity resources.

[Diagram: a single grid namespace rooted at "/", with files a through h spread across Legion storage servers on five hosts, each backed by its own local disk.]
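Because each file is served by its own storage server, reads of different files proceed independently, and aggregate bandwidth grows with the number of servers. A rough client-side sketch of that idea (hypothetical mount point and file names, not the Legion I/O libraries):

```python
# Conceptual sketch (not the Legion I/O libraries): files a..h each live on a
# different storage server, so concurrent reads aggregate bandwidth across
# commodity hosts. The mount point and file names below are hypothetical.
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

GRID_MOUNT = Path("/mnt/legion")                          # hypothetical mount of the grid namespace
FILES = [GRID_MOUNT / name for name in "abcdefgh"]        # files spread across the storage hosts

def read_file(path: Path) -> int:
    """Independent, asynchronous read of one grid file; returns bytes read."""
    return len(path.read_bytes()) if path.exists() else 0

if __name__ == "__main__":
    # Each read goes to whichever storage server holds the file, so transfers
    # proceed in parallel instead of funnelling through a single server.
    with ThreadPoolExecutor(max_workers=len(FILES)) as pool:
        total = sum(pool.map(read_file, FILES))
    print(f"read {total} bytes across {len(FILES)} independently served files")
```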

Page 25: From Clusters to Grids

25

I/O Performance

[Figure: "Large Read Aggregate Bandwidth" — aggregate read bandwidth (MB/sec, 0–200) versus number of readers (1, 10, 20, 30, 40, 50) for NFS, Legion-NFS (lnfsd), and LegionFS.]

Read performance in NFS, Legion-NFS, and the Legion I/O libraries. The x-axis indicates the number of clients that simultaneously perform 1 MB reads on 10 MB files; the y-axis indicates total read bandwidth. All results are the average of multiple runs. All clients run on 400 MHz Intel machines; the NFS server runs on an 800 MHz Intel server.

Page 26: From Clusters to Grids

26

Data Grid Benefits

• Easy, convenient, wide-area access to data, regardless of location, administrative domain, or platform
• Eliminates time-consuming copying and obtaining accounts on machines where data resides
• Provides access to the most recent data available
• Eliminates confusion and errors caused by inconsistent naming of data
• Caches remote data for improved performance
• Requires no changes to legacy or commercial applications
• Protects data with fine-grained security and limits access privileges to those required
• Eases data administration and management
• Eases migration to new storage technologies

Page 27: From Clusters to Grids

27

The Legion Compute Grid

Page 28: From Clusters to Grids

28

Compute Grid

Wide-area access to processing resources based on business policies, managing utilization of processing resources for fast, efficient job completion.

Compute Grid capabilities:
• Job scheduling and priority-based queuing
• Easy integration with third-party load management and queuing software
• Automatic staging of data and applications
• Efficient processing of both sequential and parallel applications
• Failure detection and recovery
• Usage accounting

[Diagram: users and applications across Department A, Department B, a partner, and a vendor access desktops, servers, data servers, and clusters through the Legion Grid.]

Page 29: From Clusters to Grids

29

Compute Grid Access

On login/submission, the grid:
• Locates resources
• Authenticates and grants access privileges
• Stages applications and data
• Detects failures and recovers
• Writes output to the specified location
• Accounts for usage

Scheduling, queuing, usage management, accounting, and recovery are handled by the grid, with fine-grained security throughout.

[Diagram: users and applications submit jobs such as BLAST and App_A, which the grid runs on a Solaris server (RD-2), an NT server (PM-1), a data cluster (HQ-1), and a Linux cluster at headquarters, a research center, an informatics partner, and a tools vendor.]
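As a rough illustration of that lifecycle (not Legion code; every function and host name below is an illustrative stand-in), a submission locates a host, authenticates, stages inputs, reruns on failure, writes output, and records usage:

```python
# Conceptual sketch of the job lifecycle described above -- not the Legion API.
# Every function, host name, and job name here is an illustrative stand-in.
import random
import time

class HostFailure(Exception):
    """Raised when the chosen host dies mid-run (simulated)."""

def locate_resource(job: str) -> str:
    return random.choice(["hq-1", "rd-2", "pm-1"])        # hypothetical hosts

def authenticate(user: str, host: str) -> str:
    return f"cred:{user}@{host}"                          # stand-in credential

def stage(files: list[str], host: str) -> None:
    print(f"staging {files} to {host}")                   # copy application and data

def execute(job: str, host: str, cred: str) -> str:
    if random.random() < 0.2:                             # simulate an occasional host failure
        raise HostFailure(host)
    return f"{job} finished on {host}"

def run_grid_job(job: str, inputs: list[str], user: str) -> None:
    start = time.time()
    while True:
        host = locate_resource(job)                       # locate resources
        cred = authenticate(user, host)                   # authenticate, grant privileges
        stage(inputs + [job], host)                       # stage application and data
        try:
            result = execute(job, host, cred)             # run the job
            break
        except HostFailure:                               # detect failure and recover
            continue                                      # reschedule on another host
    print(result)                                         # write output to specified location
    print(f"accounting: {user} used {time.time() - start:.1f}s on {host}")

run_grid_job("BLAST", ["sequence_a"], user="grimshaw")
```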

Page 30: From Clusters to Grids

30

Tools (all are cross-platform)

• MPI
• P-space studies (multi-run)
• Parallel C++
• Parallel object-based Fortran
• CORBA binding
• Object migration
• Accounting
• legion_make (remote builds)
• Fault-tolerant MPI libraries
• Post-mortem debugger
• "Console" objects
• Parallel 2D file objects
• Collections
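The slides show no code; as a small illustration of the MPI programming model these tools support, here is a minimal example using the mpi4py Python bindings (an assumption for illustration only; Legion's MPI support targeted C and Fortran codes, and the fault-tolerance features are not shown):

```python
# Minimal MPI sketch using the mpi4py bindings (illustrative only).
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()        # this process's id within the job
size = comm.Get_size()        # total number of processes in the job

# Each rank contributes a partial result; rank 0 gathers the sum.
partial = rank * rank
total = comm.reduce(partial, op=MPI.SUM, root=0)

if rank == 0:
    print(f"sum of squares over {size} ranks: {total}")
```

Run with, for example, `mpiexec -n 4 python example.py`.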

Page 31: From Clusters to Grids

31

One Favorite

Page 32: From Clusters to Grids

32

Related Work

Page 33: From Clusters to Grids

33

Related Work

• Avaki• All distributed systems literature• Globus• AFS/DFS• LSF, PBS, ….• Global Grid Forum - OGSA

Page 34: From Clusters to Grids

34

Avaki Company Background

• Grid pioneers: a Legion spin-off

• Over $20M capitalization

• The only commercial grid software provider with a solution that addresses data access, security, and compute power challenges

• Standards efforts leader

[Logos: partners, standards organizations, and customers]

Page 35: From Clusters to Grids

35

AFS/DFS comparison with Legion Data Grid

• AFS presumes that all files are kept in AFS; there is no federation with other file systems. Legion allows data to be kept in Legion, or in an NFS, XFS, PFS, or Samba file system.

• AFS presumes that all sites use Kerberos and that realms "trust" each other. Legion assumes nothing about the local authentication mechanism, and there is no need for cross-realm trust.

• AFS semantics are fixed (copy on open). Legion can support multiple semantics; the default is Unix semantics.

• AFS is volume-oriented (sub-trees). Legion can be volume-oriented or file-oriented.

• AFS caching semantics are not extensible. Legion caching semantics are extensible.

Page 36: From Clusters to Grids

36

Legion & Globus GT2

• Projects with many common goals:
  • Metacomputing (or the "Grid")
  • Middleware for wide-area systems
  • Heterogeneous resource sets
  • Disjoint administrative domains
  • High-performance, large-scale applications

Page 37: From Clusters to Grids

37

Legion Specific Goals

• Shared collaborative environment including shared file system

• Fault-tolerance and high-availability

• Both HPC applications and distributed applications

• Complete security model including access control

• Extensible

• Integrated - create a meta-operating system

Page 38: From Clusters to Grids

38

Many “Similar” Features

• Resource management support

• Message-passing libraries (e.g., MPI)

• Distributed I/O facilities
  • Globus GASS/remote I/O vs. the Avaki Data Grid

• Security Infrastructure

Page 39: From Clusters to Grids

39

Globus

• The "toolkit" approach: provide services as separate libraries (e.g., Nexus, GASS, LDAP)
• Pros:
  • Decoupled architecture: easy to add new services into the mix
  • Low buy-in: use only what you like (though in practice all the pieces use each other)
• Cons:
  • No unifying abstractions: a very complex environment to learn in full, and composition of services becomes difficult as the number of services grows
  • Interfaces keep changing due to an ever-evolving design
  • Does not cover the full space of problems

Page 40: From Clusters to Grids

40

Standards: GGF

Background:

• Grid standards are now being developed at the Global Grid Forum (GGF)
• The in-development standard, the Open Grid Services Infrastructure (OGSI), will extend Web Services (SOAP/XML, WSDL, etc.) with:
  • Names and a two-level naming scheme
  • Factories and lifetime management
  • A mandatory set of interfaces, e.g., discovery interfaces
• OGSA (Open Grid Services Architecture) is the over-arching architecture, still in development
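As a conceptual sketch only (illustrative class and method names, not the OGSI specification), the ideas of a factory, a two-level naming scheme, and lifetime management might look like this:

```python
# Conceptual sketch (not the OGSI spec): a factory creates service instances,
# each instance gets a persistent handle that a resolver maps to a current
# (transient) reference, and lifetime is bounded by a termination time.
import time
import uuid

class GridService:
    def __init__(self, handle: str, endpoint: str, lifetime_s: float):
        self.handle = handle                        # persistent, location-independent name
        self.endpoint = endpoint                    # transient reference (e.g., a service endpoint)
        self.termination_time = time.time() + lifetime_s

    def alive(self) -> bool:
        return time.time() < self.termination_time  # lifetime management

class HandleResolver:
    """Second level of the two-level naming scheme: handle -> current reference."""
    def __init__(self):
        self._services: dict[str, GridService] = {}

    def register(self, svc: GridService) -> None:
        self._services[svc.handle] = svc

    def resolve(self, handle: str) -> str:
        svc = self._services[handle]
        if not svc.alive():
            raise LookupError(f"{handle} has passed its termination time")
        return svc.endpoint

class Factory:
    """Creates service instances and registers their handles with a resolver."""
    def __init__(self, resolver: HandleResolver):
        self.resolver = resolver

    def create_service(self, lifetime_s: float) -> str:
        handle = f"handle:{uuid.uuid4()}"
        endpoint = f"http://example.org/svc/{uuid.uuid4()}"   # hypothetical endpoint
        self.resolver.register(GridService(handle, endpoint, lifetime_s))
        return handle                                # clients keep the stable handle

resolver = HandleResolver()
factory = Factory(resolver)
h = factory.create_service(lifetime_s=60.0)
print(h, "->", resolver.resolve(h))
```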

Page 41: From Clusters to Grids

41

Summary

• Grids are about resource federation and sharing
• Grids are here today. They are being used in production computing in industry to solve real problems and provide real value.
  • Compute Grids
  • Data Grids
• We believe that users want high-level abstractions and don't want to think about the grid
  • Need low activation energy and legacy support
• There are a number of challenges to be solved, and different applications and organizations want to solve them differently
  • Policy heterogeneity
  • Strong separation of policy and mechanism
• Several areas where really good policies are still lacking:
  • Scheduling
  • Security and security policy interactions
  • Failure recovery (and the interaction of different policies)