An Introduction to Grid Computing

Prashanth Chengi

NPSF, C-DAC Pune

July 09, 2012

Prashanth Chengi (NPSF, C-DAC Pune) An Introduction to Grid Computing July 09, 2012 1 / 35


Plan of Talk

1 What is ‘Grid Computing’?

2 History of Grid Computing

3 Milestones in Grid Computing

4 Grid Checklist

5 Need for Grid Computing

6 Advantages of Grid Computing

7 Applications which can run on the grid

8 Architecture of the Grid

9 Virtual Organizations

10 Grid software components

11 Grid Computing: A user’s perspective

12 Grid Computing: An administrator’s perspective

13 Challenges of Grid Computing

14 Questions?

15 References


What is Grid Computing?

Grid computing is the enabling, sharing, selection, and aggregation of distributed resources, and their presentation as a single, unified resource.

Grand vision: Analogous to power grids.

  • Users can use resources without needing to know their source.
  • Implementation specifics are abstracted away from users.
  • ‘The whole is bigger than the part’.
  • Users can use more resources than they independently own.

Related terms: Collaborative computing, cooperative computing, shared computing.


History of Grid Computing

Born at a workshop called “Building a Computational Grid” held at Argonne National Laboratory in September 1997.

Ian Foster, Carl Kesselman, and Steve Tuecke are regarded as the ‘Fathers of grid computing’.

Immediate ancestor: Metacomputing, circa 1990.

FAFNER (Factoring via Network-Enabled Recursion) and I-WAY (Information Wide Area Year) were early adopters of metacomputing.

SETI@Home, Folding@Home, BOINC, GIMPS


Milestones in Grid Computing [4]

Open Grid Services Architecture (OGSA)

  • Specifies security policies.
  • Uses the Grid Security Infrastructure (GSI) protocol.
  • Uses Web Services Description Language (WSDL) and Simple Object Access Protocol (SOAP) for grid services.

Open Grid Services Infrastructure (OGSI)

  • Specifies communication protocols.

Web Services Resource Framework (WSRF)

  • Refactoring of OGSI to exploit web services.

Globus Toolkit

  • Toolkit for developing grid software.
  • Includes an authentication framework and message-level and transport-level security.
  • Provides Java classes and libraries for certificate-based authentication support, access controls and credential management.


Foster’s Three point grid checklist [1]

Coordinates resources that are not subject to centralized control...

...using standard, open, general-purpose protocols and interfaces...

...to deliver nontrivial qualities of service.


Need for Grid Computing

Millions of processor cycles go to waste while machines sit idle.

Users’ programs are constrained by the limited amount of resources available to them.

If idle CPU cycles could be scavenged and shared, resources could be better utilized.

Mutual resource sharing would mean users are no longer constrained to use only the resources they own or operate themselves.

Enter grid computing.


Advantages of Grid Computing

Exploitation of underutilized resources

  • In most organizations, desktop machines are busy less than 5% of the time.
  • Often, even servers are idle.
  • Resources such as storage may also be underutilized.
  • These resources can be shared over the grid.

Parallel CPU capacity

  • In addition to pure scientific needs, industries such as biomedicine, finance, oil exploration and motion picture animation require massive parallel CPU capacity.
  • These applications can easily tap into resources available over the grid.


Data grids

  • Files and databases can span many systems and thus have larger capacities than on any single system.
  • Such spanning can improve data transfer rates through the use of striping techniques.
  • Data can be duplicated throughout the grid to serve as backup.
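The striping idea can be sketched in a few lines. This is only an illustration under stated assumptions, not how any particular data grid implements it: the function names `stripe` and `reassemble`, the chunk size, and the in-memory lists standing in for storage nodes are all hypothetical.

```python
from typing import List


def stripe(data: bytes, num_nodes: int, chunk_size: int = 4) -> List[List[bytes]]:
    """Distribute fixed-size chunks of data across nodes in round-robin order."""
    # Each inner list stands in for one storage node (an assumption of this sketch).
    nodes: List[List[bytes]] = [[] for _ in range(num_nodes)]
    for index, offset in enumerate(range(0, len(data), chunk_size)):
        nodes[index % num_nodes].append(data[offset:offset + chunk_size])
    return nodes


def reassemble(nodes: List[List[bytes]]) -> bytes:
    """Rebuild the original data by reading one chunk from each node in turn."""
    chunks = []
    longest = max((len(node) for node in nodes), default=0)
    for row in range(longest):
        for node in nodes:
            if row < len(node):
                chunks.append(node[row])
    return b"".join(chunks)
```

Because consecutive chunks live on different nodes, a real system could fetch them concurrently, which is where the improved transfer rate would come from.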

Resource balancing

  • For grid-enabled applications, jobs can be scheduled on machines with low utilization, thereby achieving a resource-balancing effect.
  • An unexpected peak can be routed to relatively idle machines on the grid.
  • If the grid is already fully utilized, the lowest-priority tasks can be suspended, or even cancelled, and taken up later.
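A minimal sketch of this balancing behaviour follows. The `Machine` and `Task` classes, the slot-based notion of capacity (assumed to be at least 1), and the rule that a larger number means higher priority are assumptions made for the example, not features of a specific grid scheduler.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Task:
    name: str
    priority: int                      # larger number = more important (assumption)


@dataclass
class Machine:
    name: str
    capacity: int                      # number of task slots, assumed >= 1
    running: List[Task] = field(default_factory=list)

    @property
    def utilization(self) -> float:
        return len(self.running) / self.capacity


def place(task: Task, machines: List[Machine], suspended: List[Task]) -> Optional[str]:
    """Place a task on the least-utilized machine, displacing low-priority work if full."""
    target = min(machines, key=lambda m: m.utilization)
    if target.utilization < 1.0:
        target.running.append(task)
        return target.name
    # The grid is fully utilized: find the lowest-priority running task anywhere.
    victim_machine, victim = min(
        ((m, t) for m in machines for t in m.running),
        key=lambda pair: pair[1].priority,
    )
    if victim.priority >= task.priority:
        suspended.append(task)         # nothing less important to displace
        return None
    victim_machine.running.remove(victim)
    suspended.append(victim)           # to be taken up again later
    victim_machine.running.append(task)
    return victim_machine.name
```

The `suspended` list is where a fuller scheduler would look for work to resume once utilization drops.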


Figure: Representation of a data grid. Diagram courtesy of [3] (IBM Redbook).


Reliability

  • High-end conventional computing systems use expensive hardware to increase reliability.
  • A grid allows for machine redundancy and instant failover to other resources.
  • Resources can be taken down for maintenance or upgrades without crippling the projects involved.

Communication

  • When machines on a grid are connected to the internet and don’t share the same communication paths, they add to the total available bandwidth.
  • Redundant communication paths become possible, since traffic can quickly be rerouted through other paths.


Applications which can run on the grid

The nature of the grid restricts how it can be used.

The grid cannot be used for all applications, but it is extremely practical for certain types of applications.

High-throughput problems

  • These problems consist of many independent tasks; computing grids can be used to schedule the tasks across resources.
  • As soon as a processor finishes one task, the next task arrives. In this way, hundreds of tasks can be performed in a very short time.

Embarrassingly parallel problems

  • These are problems which can be broken down into parts that are completely independent of each other.
  • Example: Fingerprint matching in an extremely large database. The images are unique and do not depend on each other.
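The fingerprint example can be sketched as follows. Everything here is a stand-in chosen for illustration: fingerprints are plain strings, `similarity` is a toy comparison rather than a real matching algorithm, and a local thread pool plays the role of the grid's worker machines.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def similarity(a: str, b: str) -> float:
    """Toy stand-in for a fingerprint comparison: fraction of equal positions."""
    if not a or not b:
        return 0.0
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / max(len(a), len(b))


def best_in_chunk(probe: str, chunk: List[Tuple[str, str]]) -> Tuple[float, str]:
    """Score one chunk of the database; it needs no data from any other chunk."""
    return max(((similarity(probe, fp), person) for person, fp in chunk),
               default=(0.0, ""))


def find_match(probe: str, database: Dict[str, str], workers: int = 4) -> Tuple[float, str]:
    """Split the database into independent chunks and search them concurrently."""
    items = list(database.items())
    chunks = [items[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial = list(pool.map(lambda chunk: best_in_chunk(probe, chunk), chunks))
    return max(partial)                # combine the per-chunk winners
```

The only coordination is the final `max` over per-chunk results, which is what makes the problem embarrassingly parallel.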


Coarse-grained calculations

  • These are often embarrassingly parallel “Monte Carlo simulations”, where parameters are varied and results observed.
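As a small illustration of a coarse-grained Monte Carlo run, the sketch below estimates pi by random sampling and varies one parameter, the sample count, across independent runs. The function names and the choice of problem are assumptions made for the example; on a grid, each call to `estimate_pi` could be a separate job.

```python
import random
from typing import Dict, List


def estimate_pi(samples: int, seed: int) -> float:
    """Estimate pi from the fraction of random points that fall in the unit quarter circle."""
    rng = random.Random(seed)          # private generator, so runs stay independent
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / samples


def run_sweep(sample_counts: List[int], seed: int = 0) -> Dict[int, float]:
    """Vary the sample count and record the result of each independent run."""
    return {n: estimate_pi(n, seed + i) for i, n in enumerate(sample_counts)}
```

Giving each run its own seed keeps the runs reproducible and independent of the order in which they execute.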

High-performance problems

  • These are problems which require supercomputing resources.
  • Supercomputers generally deal with compute-centric problems; the secret to solving these problems is “teraflops”: as many as possible.
  • HPC grids require extremely low-latency, high-throughput networks.
  • TeraGrid in the US and DEISA in Europe are examples of supercomputing grids.

In general, HPC applications are not suitable for running on grids where network connectivity is not excellent or bandwidth is a constraint.


Architecture of the Grid

Network layer: The lowest layer, connecting the grid resources.

Resource layer: Resources may be computers, storage systems, electronic catalogues, sensors etc. connected to the network.

Middleware layer: Tools that enable the various elements of the grid to participate in it.

Application layer: The highest layer; it includes applications in science, engineering, business, finance and more, as well as portals and development toolkits to support applications.


Virtual Organizations

Virtual Organizations (VOs) are groups of people who share a common goal.

To achieve their mutual goal, VO members share access to each other’s computers, programs, files, data and networks in a controlled, secure and flexible manner.

For example, the Earth sciences VO unites scientists and researchers working in the domain of Earth sciences.


Grid software components

Management components

  • All grids have some management components to keep track of resource availability, membership information etc.
  • Grid software also needs to track the capacities and current utilization of nodes in real time.
  • It is also responsible for monitoring node health, usage patterns and statistics.
  • Some grid systems provide their own login to the grid, while others depend on the native operating systems for user authentication.


Distributed grid management

  • Often, grid software is hierarchical, thereby allowing for decentralized management.
  • This is a “clusters of clusters” approach.
  • For example, a top-level scheduler only submits tasks to the cluster-level scheduler, instead of trying to schedule the actual run of the job.
  • Lower-level schedulers handle the assignment of tasks to individual machines and the gathering of output to be passed to the higher-level job manager.
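The two-level arrangement described above can be sketched as below. The class names `TopLevelScheduler` and `ClusterScheduler`, the representation of tasks as Python callables, and the simple slicing used to split work between clusters are all assumptions made for illustration.

```python
from typing import Any, Callable, List, Tuple


class ClusterScheduler:
    """Lower level: assigns tasks to individual machines and gathers their output."""

    def __init__(self, name: str, machines: List[str]):
        self.name = name
        self.machines = list(machines)
        self._next = 0

    def run(self, tasks: List[Callable[[], Any]]) -> List[Tuple[str, str, Any]]:
        outputs = []
        for task in tasks:
            machine = self.machines[self._next % len(self.machines)]
            self._next += 1
            # The task is simply called here; a real cluster would run it on `machine`.
            outputs.append((self.name, machine, task()))
        return outputs


class TopLevelScheduler:
    """Top level: hands tasks to clusters and never picks individual machines."""

    def __init__(self, clusters: List[ClusterScheduler]):
        self.clusters = list(clusters)

    def submit(self, tasks: List[Callable[[], Any]]) -> List[Tuple[str, str, Any]]:
        results = []
        for i, cluster in enumerate(self.clusters):
            share = tasks[i::len(self.clusters)]   # this cluster's slice of the work
            results.extend(cluster.run(share))     # output is passed back up
        return results
```

The top level only knows cluster names; which machine ran a task is decided, and reported, by the cluster scheduler.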


Donor software

  • Each machine on the grid needs to have installed the software required by other members of the VO.
  • This software may include scientific libraries, compilers and other software packages.
  • The machines need to have the necessary binaries to execute the users’ jobs.

Submission software

  • Usually, any member machine of a grid can be used to submit jobs to the grid and initiate grid queries.
  • However, on some grid systems, this function is implemented as a separate component installed on submission nodes or submission clients.


Schedulers

  • Most grid systems include some sort of job scheduling software.
  • The scheduler locates a machine on which to run a grid job that has been submitted by a user.
  • The simplest schedulers are round-robin schedulers, which cyclically assign jobs to machines matching the job requirements.
  • Other schedulers have complex scheduling logic and manage multiple queues of jobs.
  • Schedulers measure the current utilization of machines, or depend on cluster management software to provide them with the relevant figures.
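A round-robin scheduler of the kind described above might look like the sketch below. Representing a job's requirements as a single core count, and machines as `(name, cores)` pairs, is an assumption made to keep the example short.

```python
from typing import Dict, List, Optional, Tuple


def round_robin_assign(
    jobs: List[Tuple[str, int]],       # (job name, cores required)
    machines: List[Tuple[str, int]],   # (machine name, cores available)
) -> Dict[str, Optional[str]]:
    """Cycle through the machines, giving each job the next machine that fits it."""
    assignments: Dict[str, Optional[str]] = {}
    cursor = 0                         # position in the cycle after the last assignment
    for job_name, needed in jobs:
        for step in range(len(machines)):
            name, cores = machines[(cursor + step) % len(machines)]
            if cores >= needed:
                assignments[job_name] = name
                cursor = (cursor + step + 1) % len(machines)
                break
        else:
            assignments[job_name] = None   # no machine matches the requirements
    return assignments
```

Note that this scheduler ignores current load entirely; it is the cursor, not utilization, that spreads jobs across machines.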


Schedulers (contd)

  • Schedulers may also be hierarchical, i.e. a top-level scheduler submitting jobs to cluster schedulers.
  • Schedulers generally maintain job state information and are responsible for resubmitting jobs in the event of failures.
  • Schedulers also offer resource reservation, thereby eliminating the need for users to manually monitor resource availability.
  • Schedulers also often offer opportunistic job migration.
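The state tracking and resubmission behaviour can be sketched as a simple retry loop. The state names (`SUBMITTED`, `FAILED`, `DONE`), the `max_attempts` limit, and the idea of a job as a callable that raises on failure are assumptions of this example rather than the behaviour of a particular scheduler.

```python
from typing import Any, Callable, List, Tuple


def run_with_resubmission(
    job: Callable[[int], Any], max_attempts: int = 3
) -> Tuple[Any, List[Tuple[str, int]]]:
    """Run a job, recording its state and resubmitting it when an attempt fails."""
    history: List[Tuple[str, int]] = []
    for attempt in range(1, max_attempts + 1):
        history.append(("SUBMITTED", attempt))
        try:
            result = job(attempt)
        except Exception:
            history.append(("FAILED", attempt))   # resubmit on the next iteration
            continue
        history.append(("DONE", attempt))
        return result, history
    return None, history                          # gave up after max_attempts
```

The returned history is the kind of job state information a scheduler would keep so that users can query what happened to their job.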


Communications

  • Jobs submitted on the grid may need to communicate with each other.
  • For example, a job may split itself into a large number of subjobs which need to exchange information amongst themselves.
  • The subjobs would need to be able to locate other subjobs and send them the appropriate data.
  • As a result, the open standard Message Passing Interface (MPI) and its variations are often included as part of the grid system.


Monitoring and measurement

  • Donor software often includes tools that measure the current load and activity on a given machine, either through OS tools or by direct measurement.
  • Some grid systems provide means for implementing custom load sensors for resources other than CPU or storage.
  • Schedulers often depend on these tools to make scheduling decisions.
  • These statistics are also useful for discovering usage patterns in the grid.
  • Usage pattern analysis is used to better predict the resource requirements of a job for its next run.
  • The measurement information can be saved for accounting purposes or as the basis for grid resource brokering.


Grid Computing: A user’s perspective

In this section, we will see the grid from a user’s perspective.

Enrolling and installing grid software

  • While there may be testbed grid setups with free and unrestricted access for all, production grids require users to first sign up for VO membership.
  • To obtain VO membership, the user must obtain a digital certificate vouching for their identity.
  • The identity certificate has to be obtained from a certification authority (CA) trusted by the VO which the user wishes to join.
  • Upon verifying the identity of the user, the CA issues a digital certificate to the user, which the user has to safeguard and take responsibility for.
  • After installing the identity credentials, the user then has to install client software for accessing the grid.


Logging onto the grid

  • Many grid systems require the user to log on to a system using an ID enrolled in the grid.
  • Often, the digital certificate itself forms the user’s ID for logging onto the grid.
  • In the former case, the user’s login information must be replicated identically across the grid.
  • In the latter case, the user’s credentials may be mapped to any local account, and the mapping is completely opaque to the user.
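The certificate-to-account mapping can be pictured as a simple lookup table maintained by each site. The file format below (a quoted certificate subject followed by a local account name), the sample subjects, and the function names are hypothetical and chosen only to illustrate the idea.

```python
import shlex
from typing import Dict, Optional


def parse_mapping(text: str) -> Dict[str, str]:
    """Parse lines of the form: "certificate subject" local_account."""
    mapping: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue                        # skip blank lines and comments
        subject, account = shlex.split(line)
        mapping[subject] = account
    return mapping


def local_account(subject: str, mapping: Dict[str, str]) -> Optional[str]:
    """Return the local account for a certificate subject, or None if unmapped."""
    return mapping.get(subject)
```

The user only ever presents the certificate; which local account the job runs under is a site-side decision they never see.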


Querying and submitting jobs

  • The user usually performs queries to check resource availability on the grid.
  • The user may specify custom requirements in the submit script.
  • Grid systems usually provide command-line tools, if not graphical ones, to check the status of jobs already submitted by the user and to query the status of the grid.
  • This allows users to write custom scripts that check the status of the grid and automatically fire jobs if conditions are favorable.
  • Scripts can also be used to submit pipeline jobs: a series of jobs in which each job depends on the output of its predecessor.
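A user-side script combining both ideas, checking grid status before firing and chaining pipeline stages, might be sketched as below. The `grid_status` dictionary with its `free_slots` key is a hypothetical stand-in for whatever a real grid's query tools report, and each stage is an ordinary function standing in for a submitted job.

```python
from typing import Any, Callable, Dict, List, Optional


def should_submit(grid_status: Dict[str, int], slots_needed: int) -> bool:
    """Decide whether conditions are favorable, based on a hypothetical status report."""
    return grid_status.get("free_slots", 0) >= slots_needed


def run_pipeline(
    stages: List[Callable[[Any], Any]],
    first_input: Any,
    grid_status: Dict[str, int],
    slots_needed: int = 1,
) -> Optional[Any]:
    """Run stages in order, feeding each stage the output of its predecessor."""
    if not should_submit(grid_status, slots_needed):
        return None                    # conditions unfavorable: submit nothing
    data = first_input
    for stage in stages:
        data = stage(data)             # each job consumes its predecessor's output
    return data
```

For example, `run_pipeline([str.strip, str.upper], "  done ", {"free_slots": 4})` runs both stages in order, while the same call with `{"free_slots": 0}` submits nothing.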


The job submit process

  • First, the job input data, and possibly the executable program or script, are staged in. Alternatively, the data and/or executable may already be on the grid machine.
  • The job is then executed on the grid machine, using either a common user credential or the user’s own grid identity.
  • The results of the job are sent back to the submitter in a process called staging out.
  • In some cases, intermediate output is made available to the user through a console or GUI.
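The three steps can be imitated on a single machine, as in the sketch below. A temporary directory stands in for the grid machine's scratch space, plain file copies stand in for the transfers, and the names `run_job` and `result.txt` are assumptions made for the example.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_job(input_file: Path, output_dir: Path, program: Callable[[str], str]) -> Path:
    """Stage in the input, execute the program, and stage out the result."""
    with tempfile.TemporaryDirectory() as scratch:     # stands in for the grid machine
        work = Path(scratch)

        staged_input = work / input_file.name
        shutil.copy(input_file, staged_input)           # stage in

        result_file = work / "result.txt"
        result_file.write_text(program(staged_input.read_text()))   # execute

        output_dir.mkdir(parents=True, exist_ok=True)
        staged_output = output_dir / result_file.name
        shutil.copy(result_file, staged_output)         # stage out
    return staged_output
```

Once the scratch directory is cleaned up, only the staged-out copy survives, which mirrors why results must be sent back to the submitter.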


Data configuration

  • The data accessed by grid jobs may simply be staged in and out by the grid system.
  • However, for pipeline jobs and other subjobs, repeated staging in can be avoided by using a networked file system instead.
  • Many grid sites also offer storage resource manager services, which can be used to store input data for repeated retrieval.
  • A user should always respect the grid site’s file storage policies.


Resource reservation

  • Many grid sites offer advance job reservations.
  • A user wanting to execute a job may apply for a slot in advance, in which case the jobs they submit wait for either unreserved resources to become available or the reservation window to begin, whichever comes first.
  • A reservation thus fixes the latest time at which a job starts executing.
  • Users have to be careful in estimating resource requirements, as inaccurate estimates may adversely affect the time a job spends in the queued state.
  • Sites offer reservations not only for compute resources but also for other resources such as scanners, sensors and storage.
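The "whichever comes first" rule reduces to taking the earlier of two times. In this sketch the function name `expected_start` and the use of `None` for "no unreserved resources in sight" are assumptions made for illustration.

```python
from datetime import datetime
from typing import Optional


def expected_start(
    reservation_start: datetime, unreserved_free_at: Optional[datetime] = None
) -> datetime:
    """Return when a reserved job starts: the earlier of the two possible times."""
    if unreserved_free_at is None:
        return reservation_start       # nothing frees up earlier: wait for the window
    return min(reservation_start, unreserved_free_at)
```

Whatever happens to the unreserved resources, the result is never later than `reservation_start`, which is the sense in which the reservation fixes the latest start time.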


Grid Computing: An administrator’s perspective

Planning

  • The admin should understand the organization’s requirements and deploy resources accordingly.
  • It is advisable to deploy a testbed grid to gain an understanding of the system and to experiment with settings before deploying them in the production environment.

Security

  • Admins must take care to prevent unauthorized access to data in a multi-user grid environment.
  • The machines must be constantly monitored and updated to fix vulnerabilities as and when they are discovered.
  • Public keys must be backed up and private keys must be carefully secured.


User and quota management

  • Admins are required to ensure that VO members possess valid accounts or credential mappings on all grid resources.
  • Admins must stay up to date on user credential revocations and cancellations, and remove the access privileges of such users.
  • They must plan and enforce restrictions on resources such as processors, storage etc. to ensure a fair usage opportunity for all users.
  • They must actively monitor machines to ensure that all necessary services are up and running.


Certificate Authority (CA)

  • It is critical to maintain the highest levels of security in a grid, because it allows multiple users not only to access data but also to execute code.
  • The CA is responsible for positively identifying entities requesting VO membership or credentials and establishing their bona fides.
  • The CA issues certificates to users whose bona fides have been verified.
  • The CA should take all measures to protect the CA server.
  • The CA should ensure that members who have quit the VO are promptly removed and that revocation lists are regularly published.


Challenges of Grid Computing

Security

  • Access policy: What is shared? Who is allowed to share? When can sharing occur?
  • Authentication: How do you identify a user or resource?
  • Authorization: How do you determine whether a given operation adheres to the rules?
  • These questions led to the development of a security infrastructure for the grid.

User requirements

  • Compute resources are often not general purpose. They are tuned for performance on certain classes of applications.
  • Users often require the installation of custom software to run their applications. This is problematic in shared-access scenarios.
  • A need was therefore felt for setting up ‘Virtual Organizations’ (VOs), in which people working on similar technologies or domains could share resources with each other.


Networking performance

  • Grids, by definition, have to allow geographic distribution.
  • Networking becomes a major problem when resources are spread across a WAN, across cities or even countries.
  • Grid middleware needs a high degree of fault tolerance to allow for intermittent and transient network failures.

Gridifying applications

  • Not all applications can be transformed to run in parallel or on a grid.
  • There are no practical tools for transforming arbitrary applications to exploit the parallel capabilities of a grid: applications often need to be rewritten.
  • Parallelizing a non-parallel application requires mathematical and programming expertise.
  • The scalability of the actual problem may itself be a limiting factor.


Questions?


References

[1] Ian Foster. What is the Grid? A Three Point Checklist. url: http://dlib.cs.odu.edu/WhatIsTheGrid.pdf.

[2] url: http://www.gridcafe.org/.

[3] B. Jacob et al. Introduction to Grid Computing. 2005. url: http://www.redbooks.ibm.com/redbooks/pdfs/sg246778.pdf.

[4] Ken North. Milestones in Grid Computing. url: http://www.gridsummit.com/Articles/Milestones.htm.
