View
218
Download
3
Embed Size (px)
Citation preview
Tiziana Ferrari An Introduction to Grid Computing 1
An Introduction to Grid Computing
[email protected] – CNAF
Corso di Laurea specialistica in InformaticaAnno Acc. 2005/2006
Sources: 1 .Introduction to Grid Computing, Globus Project, Argonne National Laboratory, http://www.globug.org/2. The Anatomy and Physiology of the Grid, B.Ramamurthy, 10/4/2004
Tiziana Ferrari An Introduction to Grid Computing 2
Outline
• PART I: Introduction to Grid Computing
• PART II: Some Definitions
• PART III: Grid Architecture
Tiziana Ferrari An Introduction to Grid Computing 4
The Grid Problem
• Purpose: – flexible, secure, coordinated resource sharing among dynamic
collections of individuals, institutions, and resources From “The Anatomy of the Grid: Enabling Scalable Virtual Organizations”
– enable “groups of users (virtual organizations)” to share geographically distributed resources as they pursue common goals – assuming the absence of…
• central location,
• central control,
• omniscience,
• existing trust relationships.
Tiziana Ferrari An Introduction to Grid Computing 5
Virtual Organizations• Virtual organization (VO): a set of individuals and/or institutions
identified by the same set of rules, which define:– the resources shared (what);
– the individual users allowed to share (who);
– conditions under which sharing occurs (how).
• VOs represent “community overlays” on classic organization structures. They can be large or small, static or dynamic.
• VO membership:– Single users and users of the same institution can be members of different
VOs
u1
u2
u3
u5
u9
VO3VO2VO1
u8
u4
u7
u6
Tiziana Ferrari An Introduction to Grid Computing 6
Virtual Organizations (cont)
• Examples:– An industrial consortium – Students from different university departments
using computing power for their simulation projects
– physicists from different research institutions involved in a the same experiment implementation, using the Grid for analysis of the data generated by the experiment
– Astronomers from various research institutes analyzing data gathered by multiple telescopes all over the world
Tiziana Ferrari An Introduction to Grid Computing 7
Authentication and authorization
• Resource sharing requires owners to make resources available, subject to contraints on when, where and what can be done
• This requires:
– Policies and mechanisms to express them in Policy Decision Points
– Authentication: the establishment of the identity of a consumer
– Authorization: determining whether an operation is consistent with resource sharing rules applicable to the consumer at Policy Enforcement Points
Tiziana Ferrari An Introduction to Grid Computing 8
Elements of the Problem
• Resource sharing: sharing issues today are not adequately addressed by existing technologies.– Complex requirements: “run program X at site Y subject to
community policy P, providing access to data at Z according to policy Q”
– High performance: unique demands of advanced & high-performance systems
– Heterogeneous resources: computers, storage, sensors, networks, …
– Sharing always conditional: issues of trust, policy, negotiation, payment, …
• Coordinated problem solving– It extends the simple client-server: distributed data analysis and
computation
Tiziana Ferrari An Introduction to Grid Computing 9Image courtesy Harvey Newman, Caltech
Example: usage of Data Grids in High Energy Physics
Tier2 Centre ~1 TIPS
Online System
Offline Processor Farm
~20 TIPS
CERN Computer Centre
FermiLab ~4 TIPSFrance Regional Centre
Italy Regional Centre
Germany Regional Centre
InstituteInstituteInstituteInstitute ~0.25TIPS
Physicist workstations
~100 MBytes/sec
~100 MBytes/sec
~622 Mbits/sec
~1 MBytes/sec
There is a “bunch crossing” every 25 nsecs.
There are 100 “triggers” per second
Each triggered event is ~1 MByte in size
Physicists work on analysis “channels”.
Each institute will have ~10 physicists working on one or more channels; data for these channels should be cached by the institute server
Physics data cache
~PBytes/sec
~622 Mbits/sec or Air Freight (deprecated)
Tier2 Centre ~1 TIPS
Tier2 Centre ~1 TIPS
Tier2 Centre ~1 TIPS
Caltech ~1 TIPS
~622 Mbits/sec
Tier 0Tier 0
Tier 1Tier 1
Tier 2Tier 2
Tier 4Tier 4
1 TIPS is approximately 25,000
SpecInt95 equivalents
Tiziana Ferrari An Introduction to Grid Computing 10
Why now? The network
• The Internet: universal connectivity
• Network vs. computer performance– Computer speed doubles every
18 months
– Network speed doubles every 9 months
– Difference = order of magnitude per 5 years
Moore’s Law vs. storage improvements vs. optical improvements. Graph from Scientific American (Jan-2001) by Cleo Vilett, source Vined Khoslan, Kleiner, Caufield and Perkins.
Tiziana Ferrari An Introduction to Grid Computing 11
GEANT2: the European research backbone – topology (Oct 2005)
Dark fiber
10 Gb/s
2.5 Gb/s
Tiziana Ferrari An Introduction to Grid Computing 13
Some Important Definitions
1. Grid Computing
2. Resource
3. Protocol
4. Network enabled service
5. Application Programmer Interface (API)
6. Software Development Kit (SDK)
7. Syntax
Tiziana Ferrari An Introduction to Grid Computing 14
1. Grid Computing (1/4)
• Early definition: – “We will probably see the spread of computer utilities, which, like
present electric and telephone utilities, will service individual homes and offices across the country” (Len Kleinrock, 1969)
• The Grid:– “A computational Grid is a hardware and software infrastructure that
provides dependable, consistent, pervasive and inexpensive access to high-end comptational capabilities” (I.Foster, C.Kesselman: The Grid: Blueprint for a New Computing Infrastructure”, 1998)
– and: “because of the focus on dynamic cross-organization sharing, Grid technologies complement rather than compete with esisting distributed computing technologies” (I.Foster et al., The Anatomy of the Grid, 2001)
Tiziana Ferrari An Introduction to Grid Computing 15
1. Grid computing: characteristics (2/4)
• The three fundamental properties of Grid computing:
1. Large-scale coordinated management of resources belonging to different administrative domains (multi-domain vs single domain) Grid computing involves multiple management systems
2. Standard, open, multi-purpose protocols and interfaces that provide a range of services (standard vs proprietary) Grid computing supports heterogeneous user applications
3. Delivery of complex Quality of Service (QoS): Grid computing allows its contituent resources to be used in a coodinated fashion to deliver various types of QoS, such as resposed time, throughput, avaiability, reliability, security, etc.
Tiziana Ferrari An Introduction to Grid Computing 16
1. Grid Computing (3/4)
• Examples of non-Grid systems:– Cluster management systems on a parallel computer or on a Local Area
Network• Sun Grid Engine
• Load Sharing Facility (LSF, by Platform)
• Portable Bach System (PBS, by Veridian)
Complete kwowledge of system state and user’s requests centralized control
– The World Wide Web:• Open
• Based on general-purpose protocols accessing distributed resources
• No coordinated use of independent resources (no protocols for negotiation and sharing, yet)
Tiziana Ferrari An Introduction to Grid Computing 17
1. Grid Computing (4/4)
• The importance of standardization:
– The Grid:
• open, general-purpose and using standard protocols
– A Grid:
• no standardization and interoperability between services – current situation
– Similarly: an Internet (based on proprietary protocols, as in the early ages of networking) vs the Internet (based on the IP protocol)
Tiziana Ferrari An Introduction to Grid Computing 18
2. Resource
• An entity that is to be shared– E.g., computers, storage, data, software
• Does not have to be a physical entity– E.g., Condor pool, distributed file system, …
• Defined in terms of interfaces, not devices– E.g. scheduler such as LSF and PBS define a compute
resource
– Open/close/read/write define access to a distributed file system, e.g. NFS, AFS, DFS
Tiziana Ferrari An Introduction to Grid Computing 19
3. Grid Protocol• A formal description of message formats and a set of rules for message
exchange, which defines one of the basic mechanisms of Grid Computing– Rules may define sequence of message exchanges– Protocol may define state-change in end-points triggered by a given sequence
of exchanged messages, e.g., file system state change
• Protocols, some examples:– Management of credentials and policies in case of multi-domain resources– Secure remote access– Co-allocation of multiple resources– Information query protocols– Data management protocols
• Good protocols are designed to do one thing; for this reason, the Grid architecture relies on layering of protocols. i.e. through the composition of multiple, simple protocols.
• Examples of protocols– IP, TCP, Transport Layer Security (was Secure Socket Layer), HTTP,
Kerberos
Tiziana Ferrari An Introduction to Grid Computing 20
4. Network Enabled Services• Services are defined by:
– the protocol spoken, as protocols allow interaction between different services
– the behaviour implemented.
• Examples:– Resource access service, Resource discovery, Co-scheduling, Data
replication, etc.
• Services hide the complexity of resource implementations
• Examples: FTP and Web servers
Web Server
IP Protocol
TCP Protocol
Transport Layer Security Protocol
HTTP Protocol
FTP Server
IP Protocol
TCP Protocol
FTP Protocol
Telnet Protocol
Tiziana Ferrari An Introduction to Grid Computing 21
5. Application Programming Interface (API)
• A specification for a set of routines to facilitate application development– Refers to definition, not implementation (several implementations of the
same API are possible)
• APIs are a complement of protocols, as without protocols interoperability between APIs would could be solved only case by case with specific implementations
• Specification of an API is often language-specific– Routine name, number, order and type of arguments; mapping to
language constructs
– Behavior or function of routine
Tiziana Ferrari An Introduction to Grid Computing 22
6. Software Development Kit
• A particular instantiation of an API
• Software Development Kits consist of libraries and tools– Provides implementation of API specification
• One API can have multiple SDKs
Tiziana Ferrari An Introduction to Grid Computing 23
7. Syntax
• Rules for encoding information, e.g.– XML, Condor ClassAds, Globus Resource Specification Language
– X.509 certificate format (RFC 2459)
• Distinct from protocols– One syntax may be used by many protocols (e.g., XML) and be
useful for several purposes
• Syntaxes may be layered– E.g., HTML XML ASCII
– Important to understand layerings when comparing or evaluating syntaxes
Tiziana Ferrari An Introduction to Grid Computing 24
APIs and Protocols are Both Important• Standard APIs/SDKs are important
– They enable application portability
– But without standard protocols, interoperability is hard (every SDK speaks every protocol? infeasible)
• Standard protocols are important– Enable cross-site interoperability
– Enable shared infrastructure
– But without standard APIs/SDKs, application portability is hard (different platforms access protocols in different ways)
Tiziana Ferrari An Introduction to Grid Computing 25
A Protocol can have Multiple APIs
• TCP/IP APIs: BSD sockets, Winsock, …
• The protocol provides interoperability: programs using different APIs can exchange information
• I don’t need to know remote user’s API
TCP/IP Protocol: Reliable byte streams
WinSock API Berkeley Sockets API
Application Application
Tiziana Ferrari An Introduction to Grid Computing 27
Grid Architecture: Definition (1/2)
• The Grid Architecture:
– identifies the fundamental system components (services and VOs);
– specifies purpose and function of these components;
– indicates how these components interact with each other.
– Functions:
• It identifies key areas in which services are required;
• It defines standard protocols and APIs to facilitate creation of interoperable Grid systems and portable applications.
• Grid architecture protocols define the basic communication mechanisms by which:– Virtual Organization users and resources negotiate, establish, manage and exploit
sharing relationships;
– different services interact.
• Grid architecture services are the components whose standardization facilitates extensibility, interoperability, portability and code sharing.
Tiziana Ferrari An Introduction to Grid Computing 28
Grid Archticture: Definition (2/2)
S1
S2
S4
S3
U3U4
U5
U8U7
U6
U1
U2
VO1
VO2COMPONENTS
1. VOs and their users2. Services3. Protocols: - between services - between services and VOs
protocol
protocolprotocol
protocol
protocolprotocol
Tiziana Ferrari An Introduction to Grid Computing 29
Resource Sharing
• Address security and policy concerns of resource owners and users
• Are flexible enough to deal with many resource types and sharing modalities
• Scale: – to large number of resources, many participants, many program
components
– to large data management tasks and computing tasks
Tiziana Ferrari An Introduction to Grid Computing 30
Service Sharing
1) Need for interoperability when different groups want to share resources– Diverse components, policies, mechanisms– E.g., standard notions of identity, means of communication,
resource descriptions
2) Need for shared infrastructure services to avoid repeated development, installation– E.g., one port/service/protocol for remote access to computing, not
one per tool and/or application– E.g., use of shared Certificate Authorities as they are expensive to
run
A common need for protocols & services
Tiziana Ferrari An Introduction to Grid Computing 31
Layered Grid Architecture(by Analogy to the Internet Architecture)
Application
Fabric“Controlling elements locally”: Access to, & control of, resources
Connectivity“Talking to Grid elements”: communication (Internet protocols) & security
Resource“Sharing single resources”: negotiating access, controlling use
Collective“Coordinating multiple resources”: ubiquitous infrastructure services, app-specific distributed services
InternetTransport
Application
Link
Inte
rnet P
roto
col
Arch
itectu
re
Grid Architecture
Internet Architecture
Tiziana Ferrari An Introduction to Grid Computing 32
Layered Grid Architecture
• Layers are ment to group services of similar nature in the same class
• They do not intend to prescribe how services at different layers interact. Communication between services at the same and/or different layer is possible, depending on the scenario.– Example 1: The workload management service (Collective layer) uses
the Resoure discovery service (still Collective layer) to find resources that match the user’s rquirements
– Example 2: an application may submit a job:• By invoking the Workload manager service (re-submission, logging and
bookkeping, input/output management, Collective)
• By invoking the resource layer through a standardized API (Resource)
• By submitting directly to a specific local resource instance (Fabric)
Tiziana Ferrari An Introduction to Grid Computing 33
Protocols, Services,and APIs Occur at Each Level
Languages/Frameworks
Fabric Layer
Applications
Local Access APIs and Protocols
Collective Service APIs and SDKs
Collective ServicesCollective Service Protocols
Resource APIs and SDKs
Resource ServicesResource Service Protocols
Connectivity APIs
Connectivity Protocols
Tiziana Ferrari An Introduction to Grid Computing 34
Fabric Layer (1/2)
• Examples:– Computing farm– Mass storage device – file catalogs– network link– file systems– sensors, etc.
• It provides access to the resources to which shared access is mediated by Grid protocols.
• Fabric components implement local, resource-specific operations (through internal protocols), which are transparent to the upper layers thanks to connectivity and resource level protocols, that define interfaces, not the physical internal characteristics.
• Operations at the fabric layer are triggered by resource sharing operations at a higher level
Tiziana Ferrari An Introduction to Grid Computing 35
Fabric Layer: Capabilities (2/2)
• Examples of capabilities– Computational resource:
• Start programs
• Monitor execution and corresponding processes
• Management of resources (e.g. advance reservation)
• Enquiry of hw/sw capabilities, current load, queue waiting time, etc.
– Storage resource:• Put/get files
• Third party and high-performance data transfer
• Read/write subset of file
• Management of disk space, disk bandwidth, network bandwidth, CPU, etc.
• Enquiry of hw/sw characteristics, available space, bandwidth utilziation, etc.
Tiziana Ferrari An Introduction to Grid Computing 36
Connectivity Layer: Protocols & Services• It supports secure communication between Fabric-layer resources by defining the core
communication and authentication protocols required for grid-specific network functions.
• Communication:– transport (e.g. IP), naming (e.g. DNS), routing, etc. (the TCP/IP protocol stack)
• Security: the Grid Security Infrastructure (GSI)– Uniform authentication, authorization, and message protection mechanisms in
multi-institutional setting– Single sign-on: users must be able to authenticate just once and then have access
to multiple resources, without further user intervention;– Delegation: a user’s program can access the resources on which the program
owner is authorized.– Integration with local security solutions: Grid security needs to interoperate
with local security solutions adopted by the resource managers– User-based trust relationship: a user can access to resources in different
domains without requiring security administrators in each domain to interact with others in different domains. In other words, authorization is only based on the user’s certificate, not on the authentication/authorization performed by other servers in different domains.
– Public key cryptography: Secure Socket Layer (SSL)– Supporting infrastructure: Certificate Authorities, certificate & key management
Tiziana Ferrari An Introduction to Grid Computing 37
Resources Layer: Protocols & Services• Resource layer defines protocols, APIs, and SDKs for:
– secure negotiations,
– initiation,
– monitoring control,
– accounting,
– and payment of sharing operations on individual resources.
• Resource layer implementation relies on the Fabric layer functions.
• Two main components define this layer:– information protocols: used to obtain the information about the structure and state of the resource, e.g.:
configuration, current load and usage policy.
– management protocols: used to negotiate access to the shared resource, specifying for example QoS, advanced reservation, etc.
• Examples: – Access to compute cluster, storage, network information
– Invocation of resource service providers
– Access to local scheduler
Tiziana Ferrari An Introduction to Grid Computing 38
Collective Layer: Protocols & Services• The Collective Layer contains protocols and services that coordinate multiple
resources concurrently, i.e. they capture interactions among a collection of resources.• It supports a variety of sharing behaviors without placing new requirements on the
resources being shared. Examples: – Directory services: they allow VO participants to discover the existence and/or
properties of VO resources. – Co-allocation, scheduling: they allow VO participants to request the allocation of
one or more resource for a specific purpose and the scheduling of tasks on the appropriate resources.
– Monitoring: support the monitoring of VO resources (intrusion, failure, load, ..)– Data replication: supports the management of VO storage resources to maximize
data access performance with respect to metrics such as response time, reliability, etc.
– Workload management: description, use and management of complex workflows (e.g. submission of jobs with mutual dependencies)
– Authorization servers, accounting, collaboratory services (synchronous/asynchronous exchange of messages)
Tiziana Ferrari An Introduction to Grid Computing 39
Application Layer• It includes user applications that operate within a Virtual
Organization environment.• Applications are constructed such that invocation of services
and use of protocols defined at any layer are possible. For example, a given user can submit jobs by invoking services at different layers:– Collective Layer: by submitting to a Workload Manager (it can
provide: internal queue in: case of submission failure, periodic re-submission, access to logging and bookkeeping information, input/output file management)
– Resource Layer: by submitting directly to a compute resource scheduler (e.g. a local cluster)
– Connectivity Layer: by submitting to a remote cluster – Fabric Layer: by submitting to a local cluster
Tiziana Ferrari An Introduction to Grid Computing 40
Open Grid Services Architecture (OGSA)
• OGSA defines the semantics of the Grid Service instances: how they are created, how lifetime is determined, how they communicate, etc.; where:– the service is a network-enabled entity providing a set of capabilities.– compositions of services are possible to form “virtual resources”
• OGSA builds on concepts and technologies from the Grid and Web services communities.
• Web services technology: it is used to define and publish (in a uniform manner) the service semantics. It defines:– defines standard mechanisms for creating, naming, and discovering
transient grid service instance; – provides location transparency and multiple protocol bindings;– supports integration with underlying native platform facilities;– lifetime management, change management, and notification,
authentication, authorization, and delegation.
Tiziana Ferrari An Introduction to Grid Computing 41
ComputeResource
SDK
API
AccessProtocol
CheckpointRepository
SDK
API
C-pointProtocol
Example 1:High-Throughput Computing System
High Throughput Computing System
Dynamic checkpoint, job management, failover, staging
Brokering, certificate authorities
Access to data, access to computers, access to network performance data
Communication, service discovery (DNS), authentication, authorization, delegation
Storage systems, schedulers
Collective(App)
App
Collective(Generic)
Resource
Connect
Fabric
Tiziana Ferrari An Introduction to Grid Computing 42
Example 2:Data Grid Architecture
Data Grid Application (access to distributed data)
Coherency control, replica selection, task management, virtual data catalog, virtual data code catalog, …
Replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs,
Access to data, access to computers, access to network performance data, …
Communication, service discovery (DNS), authentication, authorization, delegation
Storage systems, clusters, networks, network caches, …
Collective(App)
App
Collective(Generic)
Resource
Connect
Fabric
Tiziana Ferrari An Introduction to Grid Computing 44
Globus Toolkit™
• A software toolkit addressing key technical problems in the development of Grid enabled tools, services, and applications– Offer a modular “set of technologies”
– Enable incremental development of grid-enabled tools and applications
– Implement standard Grid protocols and APIs
– Make available under open source license
Tiziana Ferrari An Introduction to Grid Computing 45
General Approach
• Define Grid protocols & APIs– Protocol-mediated access to remote resources
– Integrate and extend existing standards
– “On the Grid” = speak “Intergrid” protocols
• Develop a reference implementation– Open source Globus Toolkit
– Client and server SDKs, services, tools, etc.
• Grid-enable wide variety of tools– Globus Toolkit, FTP, SSH, Condor, SRB, MPI, …
• Learn through deployment and applications
Tiziana Ferrari An Introduction to Grid Computing 46
Fundamental Protocols
• Connectivity Layer protocols– Security: Grid Security Infrastructure (GSI)
• Resource layer:– Resource Management: Grid Resource Allocation Management
(GRAM)
– Information Services: Grid Resource Information Protocol (GRIP)
– Data Transfer: Grid File Transfer Protocol (GridFTP)
• Collective layer protocols– Information Services,
– File Replica Management,
– etc.
Tiziana Ferrari An Introduction to Grid Computing 47
Collective Layer:Grid Security Infrastructure (GSI)
• Globus Toolkit implements GSI protocols and APIs, to address Grid security needs
• GSI protocols extends standard public key protocols– Standards: X.509 & Secure Socket Layer (SSL)/Transport Layer
Security (TLS)
– Extensions: X.509 Proxy Certificates & Delegation
• GSI extends standard Generic Security Service-API
Tiziana Ferrari An Introduction to Grid Computing 48
Site A(Kerberos)
Site B (Unix)
Site C(Kerberos)
Computer
User
Single sign-on via “grid-id”& generation of proxy cred.
Or: retrieval of proxy cred.from online repository
User ProxyProxy
credential
Computer
Storagesystem
Communication*
GSI-enabledFTP server
AuthorizeMap to local idAccess file
Remote fileaccess request*
GSI-enabledGRAM server
GSI-enabledGRAM server
Remote processcreation requests*
* With mutual authentication
Process
Kerberosticket
Restrictedproxy
Process
Restrictedproxy
Local id Local id
AuthorizeMap to local idCreate processGenerate credentials
Ditto
GSI in Action“Create Processes at A and B
that Communicate & Access Files at C”
Tiziana Ferrari An Introduction to Grid Computing 49
Resource Layer: Resource Management
• The Grid Resource Allocation Management (GRAM) protocol and client API allows programs to be started and managed on remote resources, despite local heterogeneity
• Resource Specification Language (RSL) is used to communicate requirements
• A layered architecture allows application-specific resource brokers and co-allocators to be defined in terms of GRAM services– Integrated with Condor, PBS, MPICH-G2, …
Tiziana Ferrari An Introduction to Grid Computing 50
GRAM GRAM GRAM
LSF Condor NQE
Application
RSL
Simple ground RSL
Information Service
Localresourcemanagers
RSLspecialization
Broker
Ground RSL
Co-allocator
Queries& Info
Resource Management Architecture
Tiziana Ferrari An Introduction to Grid Computing 51
Resource Layer: Data Access & Transfer
• GridFTP: extended version of popular FTP protocol for Grid data access and transfer
• Secure, efficient, reliable, flexible, extensible, parallel, concurrent, e.g.:– Third-party data transfers, partial file transfers
– Parallelism, striping (e.g., on PVFS)
– Reliable, recoverable data transfers
• Reference implementations– Existing clients and servers: wuftpd, ncftp
– Flexible, extensible libraries in Globus Toolkit
Tiziana Ferrari An Introduction to Grid Computing 52
Collective Layer: the Grid Information Service
Large numbers of distributed “sensors” with different properties Need for different “views” of this information, depending on community
membership, security constraints, intended purpose, sensor type
Tiziana Ferrari An Introduction to Grid Computing 53
The Globus Toolkit Solution (MDS-2)
Registration & enquiry protocols, information models, query languages– Provides standard interfaces to sensors
– Supports different “directory” structures supporting various discovery/access strategies
Tiziana Ferrari An Introduction to Grid Computing 54
References
1. What is the Grid? A Three Point Checklist; Ian Foster, GridToday, Jul 2002.
2. The Anatomy of the Grid: Enabling Scalable Virtual Organizations, I. Foster, Carl Kesselman, S.Tuecke
3. The Physiology of the Grid, An Open Services Archtiecture for Distributed Systems Integration; I. Foster, C.Kesselman, J.M.Nick, S.Tuecke; Jun 2002.