
CS6703 GRID AND CLOUD COMPUTING
Unit 1
Dr Gnanasekaran Thangavel
Professor and Head, Faculty of Information Technology
R M K College of Engineering and Technology

UNIT I: INTRODUCTION
Evolution of distributed computing: scalable computing over the Internet; technologies for network-based systems; clusters of cooperative computers; grid computing infrastructures; cloud computing; service-oriented architecture; introduction to Grid architecture and standards; elements of Grid; overview of Grid architecture.


Distributed Computing
Definition: A distributed system consists of multiple autonomous computers that communicate through a computer network. Distributed computing utilizes a network of many computers, each accomplishing a portion of an overall task, to achieve a computational result much more quickly than with a single computer. Distributed computing is any computing that involves multiple computers, remote from each other, that each have a role in a computation problem or information processing.

Introduction
A distributed system is one in which hardware or software components located at networked computers communicate and coordinate their actions only by message passing. In the term distributed computing, the word distributed means spread out across space; thus, distributed computing is an activity performed on a spatially distributed system. These networked computers may be in the same room, the same campus, the same country, or on different continents.

Introduction
[Figure: a large-scale application issues job requests; resource management distributes them by subscription to agents that cooperate over the Internet.]

A distributed system consists of a collection of autonomous computers, connected through a network and distributed operating system software, which enables the computers to coordinate their activities and to share the resources of the system (hardware, software and data), so that users perceive the system as a single, integrated computing facility.

A distributed system is a collection of independent computers that appears to its users as a single coherent system.

This definition has several important aspects. The first one is that a distributed system consists of components (i.e., computers) that are autonomous. A second aspect is that users (be they people or programs) think they are dealing with a single system. This means that one way or the other the autonomous components need to collaborate. How to establish this collaboration lies at the heart of developing distributed systems.Note that no assumptions are made concerning the type of computers. In principle, even within a single system, they could range from high-performance mainframe computers to small nodes in sensor networks. Likewise, no assumptions are made on the way that computers are interconnected.

Motivation
- Inherently distributed applications
- Performance/cost
- Resource sharing
- Flexibility and extensibility
- Availability and fault tolerance
- Scalability
Network connectivity is increasing. A combination of cheap processors is often more cost-effective than one expensive fast system, with a potential increase in reliability.

The main motivations for moving to a distributed system are the following:
- Inherently distributed applications. Distributed systems have come into existence in some very natural ways; e.g., in our society people are distributed, and information should also be distributed. In a distributed database system, information is generated at different branch offices (sub-databases), so that local access can be done quickly; the system also provides a global view to support various global operations.
- Performance/cost. The parallelism of distributed systems reduces processing bottlenecks and provides improved all-around performance; i.e., distributed systems offer a better price/performance ratio.
- Resource sharing. Distributed systems can efficiently support information and resource (hardware and software) sharing for users at different locations.
- Flexibility and extensibility. Distributed systems are capable of incremental growth and have the added advantage of facilitating modification or extension of a system to adapt to a changing environment without disrupting its operations.
- Availability and fault tolerance. With the multiplicity of storage units and processing elements, distributed systems have the potential to continue operation in the presence of failures in the system.
- Scalability. Distributed systems can be easily scaled to include additional resources (both hardware and software).

History: 1975-1985
Parallel computing was favored in the early years, primarily vector-based at first; gradually more thread-based parallelism was introduced. The first distributed computing programs were a pair of programs called Creeper and Reaper, invented in the 1970s. Ethernet was also invented in the 1970s. ARPANET e-mail was invented in the early 1970s and is probably the earliest example of a large-scale distributed application.

The use of concurrent processes that communicate by message-passing has its roots in operating system architectures studied in the 1960s.[19] The first widespread distributed systems were local-area networks such as Ethernet, which was invented in the 1970s.[20] ARPANET, the predecessor of the Internet, was introduced in the late 1960s, and ARPANET e-mail was invented in the early 1970s. E-mail became the most successful application of ARPANET,[21] and it is probably the earliest example of a large-scale distributed application. In addition to ARPANET and its successor the Internet, other early worldwide computer networks included Usenet and FidoNet from the 1980s, both of which were used to support distributed discussion systems. The study of distributed computing became its own branch of computer science in the late 1970s and early 1980s. The first conference in the field, the Symposium on Principles of Distributed Computing (PODC), dates back to 1982, and its European counterpart, the International Symposium on Distributed Computing (DISC), was first held in 1985.

The first distributed computing programs were a pair of programs called Creeper and Reaper which made their way through the nodes of the ARPANET in the 1970s, the predecessor of the Internet. The Creeper came first and was a worm program, using the idle CPU cycles of processors in the ARPANET to copy itself onto the next system and then delete itself from the previous one. It was modified to remain on all previous computers and the Reaper was created which traveled through the same network and deleted all copies of the Creeper. In this way Creeper and Reaper were the first infectious computer programs and are actually often thought of as the first network viruses. They did no damage, however, to the computers they passed through and were instrumental in exploring the possibility of making use of idle computational power.

History: 1985-1995
Massively parallel architectures started rising, and the message passing interface and other libraries were developed. Bandwidth was a big problem. The first Internet-based distributed computing project was started in 1988 by the DEC System Research Center. Distributed.net, a project founded in 1997, is considered the first to use the Internet to distribute data for calculation and collect the results.


The first Internet-based distributed computing project was started in 1988 by the DEC System Research Center. The project sent tasks to volunteers through email, who would run these programs during idle time and then send the results back to DEC and get a new task. The project worked to factor large numbers and by 1990 had about 100 users.

The most prominent group, considered the first to actually use the internet to distribute data for calculation and collect the results, was a project founded in 1997 called distributed.net. They used independently owned computers as DEC had, but allowed the users to download the program that would utilize their idle CPU time instead of emailing it to them. Distributed.net completed several cryptology challenges by RSA Labs as well as other research facilities with the help of thousands of users.

History: 1995-Today
Cluster/grid architecture is increasingly dominant, and special node machines are eschewed in favor of COTS technologies. Web-wide cluster software emerged; Google takes this to the extreme (thousands of nodes per cluster). SETI@Home, started in May 1999, analyzes the radio signals collected by the Arecibo Radio Telescope in Puerto Rico.

Commercial, off-the-shelf (COTS) is a term for software or hardware, generally technology or computer products, that are ready-made and available for sale, lease, or license to the general public. They are often used as alternatives to in-house developments or one-off government-funded developments. The use of COTS is being mandated across many government and business programs, as they may offer significant savings in procurement and maintenance. However, since COTS software specifications are written by external sources, government agencies are sometimes wary of these products because they fear that future changes to the product will not be under their control.

The project that truly popularized distributed computing and showed that it could work was SETI@Home, an effort by the Search for Extraterrestrial Intelligence (SETI) at the University of California at Berkeley. The project was started in May 1999 to analyze the radio signals that were being collected by the Arecibo Radio Telescope in Puerto Rico. It has gained over three million independent users who volunteer their idle computers to search for signals that may not have originated from Earth. This project has really brought the field to light, and other groups and companies are quickly following its lead.

Goal
- Making resources accessible: data sharing and device sharing.
- Distribution transparency: access, location, migration, relocation, replication, concurrency, failure.
- Communication: make human-to-human communication easier, e.g. electronic mail.
- Flexibility: spread the workload over the available machines in the most cost-effective way, to coordinate the use of shared resources and to solve large computational problems.

One purpose of a distributed system may be solving a large computational problem. Alternatively, each computer may have its own user with individual needs, and the purpose of the distributed system is to coordinate the use of shared resources or provide communication services to the users.

From Tanenbaum book:

Making Resources Accessible
The main goal of a distributed system is to make it easy for users (and applications) to access remote resources, and to share them in a controlled and efficient way. Resources can be just about anything; typical examples include printers, computers, storage facilities, data, files, Web pages, and networks, to name just a few. There are many reasons for wanting to share resources. One obvious reason is economics: for example, it is cheaper to let a printer be shared by several users in a small office than to buy and maintain a separate printer for each user. Likewise, it makes economic sense to share costly resources such as supercomputers, high-performance storage systems, imagesetters, and other expensive peripherals.

Distribution Transparency
An important goal of a distributed system is to hide the fact that its processes and resources are physically distributed across multiple computers. A distributed system that is able to present itself to users and applications as if it were only a single computer system is said to be transparent.

Openness
Another important goal of distributed systems is openness. An open distributed system is a system that offers services according to standard rules that describe the syntax and semantics of those services. For example, in computer networks, standard rules govern the format, contents, and meaning of messages sent and received. Such rules are formalized in protocols. In distributed systems, services are generally specified through interfaces, which are often described in an Interface Definition Language (IDL). Interface definitions written in an IDL nearly always capture only the syntax of services. In other words, they specify precisely the names of the functions that are available together with the types of the parameters, return values, possible exceptions that can be raised, and so on. The hard part is specifying precisely what those services do, that is, the semantics of interfaces. In practice, such specifications are always given in an informal way by means of natural language.

Scalability
Worldwide connectivity through the Internet is rapidly becoming as common as being able to send a postcard to anyone anywhere around the world. With this in mind, scalability is one of the most important design goals for developers of distributed systems. Scalability of a system can be measured along at least three different dimensions (Neuman, 1994). First, a system can be scalable with respect to its size, meaning that we can easily add more users and resources to the system. Second, a geographically scalable system is one in which the users and resources may lie far apart. Third, a system can be administratively scalable, meaning that it can still be easy to manage even if it spans many independent administrative organizations. Unfortunately, a system that is scalable in one or more of these dimensions often exhibits some loss of performance as the system scales up.

Characteristics
- Resource sharing
- Openness
- Concurrency
- Scalability
- Fault tolerance
- Transparency

Resource sharing: resource sharing is the ability to use any hardware, software or data anywhere in the system. Resources in a distributed system, unlike in a centralized one, are physically encapsulated within one of the computers and can only be accessed from the others by communication. The resource manager offers a communication interface enabling the resource to be accessed, manipulated and updated reliably and consistently. There are mainly two models of resource manager: the client/server model and the object-based model. The Object Management Group uses the latter in CORBA, in which any resource is treated as an object that encapsulates the resource by means of operations that users can invoke.
Openness: openness is concerned with extensions and improvements of distributed systems. New components have to be integrated with existing components so that the added functionality becomes accessible from the distributed system as a whole. Hence, the static and dynamic properties of services provided by components have to be published in detailed interfaces.
Concurrency: concurrency arises naturally in distributed systems from the separate activities of users, the independence of resources and the location of server processes in separate computers. Components in distributed systems are executed in concurrent processes. These processes may access the same resource concurrently; thus the server process must coordinate their actions to ensure system integrity and data integrity.
Scalability: scalability concerns the ease of increasing the scale of the system (e.g. the number of processors) so as to accommodate more users and/or to improve the corresponding responsiveness of the system. Ideally, components should not need to be changed when the scale of the system increases.
Fault tolerance: fault tolerance concerns the reliability of the system, so that in case of failure of hardware, software or network the system continues to operate properly, without significantly degrading its performance. It may be achieved by recovery (software) and redundancy (both software and hardware).
Transparency: transparency hides the complexity of distributed systems from users and application programmers, so that they perceive the system as a whole rather than as a collection of cooperating components, reducing the difficulties in design and operation. This characteristic is orthogonal to the others. There are many aspects of transparency, including access, location, concurrency, replication, failure, migration, performance and scaling transparency.

Architecture
- Client-server
- 3-tier architecture
- N-tier architecture
- Loose coupling, or tight coupling
- Peer-to-peer
- Space based

Client-server: smart client code contacts the server for data, then formats and displays it to the user. Input at the client is committed back to the server when it represents a permanent change.
3-tier architecture: three-tier systems move the client intelligence to a middle tier so that stateless clients can be used. This simplifies application deployment. Most web applications are 3-tier.
N-tier architecture: N-tier refers typically to web applications which further forward their requests to other enterprise services. This type of application is the one most responsible for the success of application servers.
Tightly coupled (clustered): tightly coupled architecture refers typically to a cluster of machines that closely work together, running a shared process in parallel. The task is subdivided into parts that are executed individually by each machine and then put back together to produce the final result.
Peer-to-peer: peer-to-peer is an architecture in which there is no special machine or machines that provide a service or manage the network resources. Instead, all responsibilities are uniformly divided among all machines, known as peers. Peers can serve both as clients and as servers.
Space based: space based refers to an infrastructure that creates the illusion (virtualization) of one single address space. Data are transparently replicated according to application needs. Decoupling in time, space and reference is achieved.

Another basic aspect of distributed computing architecture is the method of communicating and coordinating work among concurrent processes. Through various message passing protocols, processes may communicate directly with one another, typically in a master/slave relationship. Alternatively, a "database-centric" architecture can enable distributed computing to be done without any form of direct inter-process communication, by utilizing a shared database. A minimal message-passing sketch follows.
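To make "coordinate only by message passing" concrete, here is a minimal, hypothetical Java sketch (the class names, port number and task format are illustrative, not from the slides): a coordinator splits a computation into tasks and exchanges messages with a worker over sockets; the two processes share no memory.

```java
import java.io.*;
import java.net.*;

// File: Worker.java
// Receives a task message ("lo hi"), computes a partial result, replies.
public class Worker {
    public static void main(String[] args) throws IOException {
        try (ServerSocket server = new ServerSocket(5000)) {   // illustrative port
            while (true) {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(
                             new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    String[] range = in.readLine().split(" ");
                    long lo = Long.parseLong(range[0]), hi = Long.parseLong(range[1]);
                    long sum = 0;
                    for (long i = lo; i <= hi; i++) sum += i;  // the "portion of work"
                    out.println(sum);                          // result message back
                }
            }
        }
    }
}
```

```java
import java.io.*;
import java.net.*;

// File: Coordinator.java
// Splits the overall task into ranges and sends each to a worker as a message.
public class Coordinator {
    public static void main(String[] args) throws IOException {
        long total = 0;
        long[][] tasks = { {1, 500_000}, {500_001, 1_000_000} };
        for (long[] t : tasks) {           // one local worker here; could be many hosts
            try (Socket s = new Socket("localhost", 5000);
                 PrintWriter out = new PrintWriter(s.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(
                         new InputStreamReader(s.getInputStream()))) {
                out.println(t[0] + " " + t[1]);         // task message
                total += Long.parseLong(in.readLine()); // partial result message
            }
        }
        System.out.println("Sum = " + total);
    }
}
```

The same master/slave pattern scales out by pointing each connection at a different host instead of localhost.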

Examples of commercial applications:
- Database Management System
- Distributed computing using mobile agents
- Local intranet
- Internet (World Wide Web)
- JAVA Remote Method Invocation (RMI)

Examples of commercial applications of distributed systems include the Database Management System, distributed computing using mobile agents, the local intranet, the Internet (World Wide Web), and JAVA RMI.

Distributed Computing Using Mobile Agents
Mobile agents can wander around a network, using free resources for their own computations.


Local Intranet
A portion of the Internet that is separately administered and supports internal sharing of resources (file/storage systems and printers) is called a local intranet.


Internet
The Internet is a global system of interconnected computer networks that use the standardized Internet Protocol Suite (TCP/IP).


JAVA RMI
Embedded in the Java language:
- Object variant of remote procedure call
- Adds naming, compared with RPC (Remote Procedure Call)
- Restricted to Java environments

RMI Architecture

Java Remote Method Invocation (RMI), which is a simple and powerful network object transport mechanism, provides a way for a Java program on one machine to communicate with objects residing in different address spaces. Some Java parallel computing environments, such as JavaParty (discussed in the next section), use RMI for communication. It is also the foundation of Jini technology, discussed in section 2.1.5. RMI is an implementation of the distributed object programming model, comparable with CORBA, but simpler and specialized to the Java language. An overview of the RMI architecture is shown in Figure 2.1.

Goals
A primary goal for the RMI designers was to allow programmers to develop distributed Java programs with the same syntax and semantics used for non-distributed programs. To do this, they had to carefully map how Java classes and objects work in a single Java Virtual Machine (JVM) to a new model of how classes and objects would work in a distributed (multiple-JVM) computing environment.

Java RMI Architecture
The design goal for the RMI architecture was to create a Java distributed object model that integrates naturally into the Java programming language and the local object model. RMI's architects succeeded in creating a system that extends the safety and robustness of the Java architecture to the distributed computing world.

Important parts of the RMI architecture are the stub class, object serialization, and the server-side run-time system. The stub class implements the remote interface and is responsible for marshaling and unmarshaling the data and managing the network connection to a server. An instance of the stub class is needed on each client; local method invocations on the stub class are made whenever a client invokes a method on a remote object.
Java has a general mechanism for converting objects into streams of bytes that can later be read back into an arbitrary JVM. This mechanism, called object serialization, is an essential functionality needed by Java's RMI implementation. It provides a standardized way to encode all the information into a byte stream suitable for streaming to some type of network or to a file system. In order to provide this functionality, an object must implement the Serializable interface.
The server-side run-time system is responsible for listening for invocation requests on a suitable IP port and dispatching them to the proper remote object on the server. Since RMI is designed for Web-based client-server applications over slow networks, it is not clear that it is suitable for high-performance distributed computing environments with low latency and high bandwidth. A better serialization would be needed, since Java's current object serialization often takes at least 25% and up to 50% of the time [50] needed for a remote invocation.

Stubs:
A stub is a piece of code emulating a called function; it acts as a temporary stand-in for the real routine, functioning like a sub-module when called by the main module. It simulates the activity of a missing component: a simple routine that takes the place of the real one.

Skeletons:
The skeleton is the server-side counterpart of the stub: it receives the marshaled request, unmarshals the arguments, invokes the actual remote object, and marshals the result back to the client. (The stub performs the matching marshaling and unmarshaling on the client side.) A minimal RMI sketch follows.
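To tie the pieces together, here is a minimal, hedged RMI sketch (the Compute interface, names and port are illustrative): the remote interface is the contract, the run-time-generated stub marshals the call on the client, and the exported server object plays the skeleton's role.

```java
import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// Remote interface: every remote method can fail with RemoteException.
interface Compute extends Remote {
    double add(double a, double b) throws RemoteException;
}

// Server: exports the object (the run-time supplies the stub) and names it.
class ComputeServer implements Compute {
    public double add(double a, double b) { return a + b; }

    public static void main(String[] args) throws Exception {
        Compute stub = (Compute) UnicastRemoteObject.exportObject(new ComputeServer(), 0);
        Registry registry = LocateRegistry.createRegistry(1099); // default RMI port
        registry.rebind("Compute", stub);   // naming step, which plain RPC lacks
        System.out.println("Compute service bound");
    }
}

// Client: looks the stub up by name and calls it like a local object.
class ComputeClient {
    public static void main(String[] args) throws Exception {
        Registry registry = LocateRegistry.getRegistry("localhost", 1099);
        Compute c = (Compute) registry.lookup("Compute");
        System.out.println("2 + 3 = " + c.add(2, 3)); // marshaled, sent, unmarshaled
    }
}
```

The double arguments are primitives, so serialization is trivial here; passing your own argument types requires them to implement Serializable, as the notes above explain.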

Categories of Applications in Distributed Computing
- Science
- Life Sciences
- Cryptography
- Internet
- Financial
- Mathematics
- Language
- Art
- Puzzles/Games
- Miscellaneous
- Distributed Human Projects
- Collaborative Knowledge Bases
- Charity


Advantages
- Economics: computers harnessed together give a better price/performance ratio than mainframes.
- Speed: a distributed system may have more total computing power than a mainframe.
- Inherent distribution of applications: some applications are inherently distributed, e.g. an ATM banking application.
- Reliability: if one machine crashes, the system as a whole can still survive if you have multiple server machines and multiple storage devices (redundancy).
- Extensibility and incremental growth: it is possible to gradually scale up (in terms of processing power and functionality) by adding more resources (both hardware and software), without disruption to the rest of the system.

In addition:
- Distributed custodianship: the National Spatial Data Infrastructure (NSDI) calls for a system of partnerships to produce a future national framework for data as a patchwork quilt of information collected at different scales and produced and maintained by different governments and agencies. NSDI will require novel arrangements for framework management, area integration, and data distribution. This research will examine the basic feasibility and likely effects of such distributed custodianship in the context of distributed computing architectures, and will determine the institutional structures that must evolve to support such custodianship.
- Data integration: this research will contribute to the integration of geographic information and GISs into the mainstream of future libraries, which are likely to have full digital capacity. The digital libraries of the future will offer services for manipulating and processing data as well as for simple searches and retrieval.
- Missed opportunities: by anticipating the impact that a rapidly advancing technology will have on GISs, this research will allow the GIS community to take better advantage of the opportunities that the technology offers.

Disadvantages
- Complexity: lack of experience in designing and implementing a distributed system, e.g. which platform (hardware and OS) to use, which language to use, etc.
- Network problems: if the network underlying a distributed system saturates or goes down, the distributed system will be effectively disabled, negating most of its advantages.
- Security: security is a major hazard, since easy access to data means easy access to secret data as well.

Issues and Challenges
- Heterogeneity of components: variety or differences that apply to computer hardware, networks, OS, programming languages and implementations by different developers. All differences in representation must be dealt with in order to exchange messages; for example, the call for exchanging messages in UNIX differs from that in Windows.
- Openness: the system can be extended and re-implemented in various ways, which cannot be achieved unless the specification and documentation are made available to software developers. The greatest challenge to designers is tackling the complexity of distributed systems designed by different people.

Heterogeneity
Distributed systems must be constructed from a variety of different networks, operating systems, computer hardware and programming languages. The Internet communication protocols mask the differences in networks, and middleware can deal with the other differences.

Openness
Distributed systems should be extensible. The first step is to publish the interfaces of the components, but the integration of components written by different programmers is a real challenge.

Issues and Challenges (cont.)
Transparency:
The aim is to make certain aspects of distribution invisible to the application programmer, so that they can focus on the design of their particular application. Programmers need not be concerned with the locations of resources or the details of how they operate, whether replicated or migrated. Failures can be presented to application programmers in the form of exceptions that must be handled.


Issues and Challenges (cont.)
Transparency: this concept can be summarized as follows.


Location transparency: refers to the fact that in a true distributed system, users cannot tell where hardware and software resources such as CPUs, printers, files and databases are located.

Migration transparency: resources must be free to move from one location to another without having their names change.

Replication transparency: the OS is free to make additional copies of files and other resources on its own, without the users noticing. For example, servers can decide by themselves to replicate any file on any or all servers, without the users having to know about it.

Concurrency transparency: the users will not notice the existence of other users.

Parallelism transparency: can be regarded as the holy grail for distributed systems designers.

Issues and Challenges (cont.)
Security:
Security for information resources in distributed systems has three components:
a. Confidentiality: protection against disclosure to unauthorized individuals.
b. Integrity: protection against alteration/corruption.
c. Availability: protection against interference with the means to access the resources.
The challenge is to send sensitive information over the Internet in a secure manner and to identify a remote user or other agent correctly.


Encryption can be used to provide adequate protection of shared resources and to keep sensitive information secret when it is transmitted in messages over a network. Denial of service attacks are still a problem. (A small encryption sketch follows.)
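As a small, hedged illustration of the confidentiality (and integrity) components, the sketch below uses the JDK's standard javax.crypto API to seal a message with AES-GCM before it would be sent over a network; key distribution, the hard part in practice, is assumed solved out of band, and the message content is invented.

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import java.security.SecureRandom;

public class ConfidentialMessage {
    public static void main(String[] args) throws Exception {
        SecretKey key = KeyGenerator.getInstance("AES").generateKey(); // shared secret

        byte[] iv = new byte[12];              // fresh nonce for every message
        new SecureRandom().nextBytes(iv);

        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ciphertext = cipher.doFinal("account=42;amount=100".getBytes());
        // GCM also gives integrity: any tampering makes decryption fail.

        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, iv));
        System.out.println(new String(cipher.doFinal(ciphertext)));
    }
}
```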

Issues and Challenges (cont.)
Scalability:
Distributed computing operates at many different scales, ranging from a small intranet to the Internet. A system is scalable if it remains effective when there is a significant increase in the number of resources and users. The challenges are:
a. controlling the cost of physical resources,
b. controlling the performance loss,
c. preventing software resources from running out,
d. avoiding performance bottlenecks.

Distributed computing is scalable if the cost of adding a user is a constant amount in terms of the resources that must be added. The algorithms used to access shared data should avoid performance bottlenecks, and data should be structured hierarchically to get the best access times.

Issues and Challenges (cont.)
Failure handling:
Failures in a distributed system are partial: some components fail while others continue to function. That is why handling failures is difficult.
a. Detecting failures: some failures cannot be detected but may be suspected.
b. Masking failures: hiding failures is not guaranteed in the worst case.
Concurrency:
Where applications/services execute concurrently, their operations may conflict with one another and produce inconsistent results. Each resource must be designed to be safe in a concurrent environment.

Failure Handling
Any process, computer or network may fail independently of the others. Therefore each component needs to be aware of the possible ways in which the components it depends on may fail, and be designed to deal with each of those failures appropriately.

Conclusion
The concept of distributed computing is the most efficient way to achieve optimization. Distributed computing is everywhere: intranet, Internet or mobile ubiquitous computing (laptops, PDAs, pagers, smart watches, hi-fi systems). It deals with hardware and software systems that contain more than one processing or storage element and run concurrently. The main motivating factor is resource sharing, such as files, printers, web pages or database records. Grid computing and cloud computing are forms of distributed computing.

In this age of optimization everybody is trying to get optimized output from their limited resources. The concept of distributed computing is the most efficient way to achieve this optimization. In distributed computing the actual task is modularized and distributed among various computer systems. This not only increases the efficiency of the task but also reduces the total time required to complete it. The advanced form of distributed computing, distributed computing through mobile agents, is setting a new landmark in this technology. A mobile agent is a process that can transport its state from one environment to another, with its data intact, and is capable of performing appropriately in the new environment.

Grid Computing
Grid computing is a form of distributed computing whereby a "super and virtual computer" is composed of a cluster of networked, loosely coupled computers acting in concert to perform very large tasks.

Grid computing (Foster and Kesselman, 1999) is a growing technology that facilitates the execution of large-scale, resource-intensive applications on geographically distributed computing resources.

Facilitates flexible, secure, coordinated large-scale resource sharing among dynamic collections of individuals, institutions, and resources.

Enables communities (virtual organizations) to share geographically distributed resources as they pursue common goals.

Criteria for a Grid:
- Coordinates resources that are not subject to centralized control.
- Uses standard, open, general-purpose protocols and interfaces.
- Delivers nontrivial qualities of service.

Benefits
- Exploit underutilized resources
- Resource load balancing
- Virtualize resources across an enterprise
- Data grids, compute grids
- Enable collaboration for virtual organizations


Grid Applications
Data- and computation-intensive applications: this technology has been applied to computationally intensive scientific, mathematical, and academic problems like drug discovery, economic forecasting, seismic analysis, and back-office data processing in support of e-commerce.
- A chemist may utilize hundreds of processors to screen thousands of compounds per hour.
- Teams of engineers worldwide pool resources to analyze terabytes of structural data.
- Meteorologists seek to visualize and analyze petabytes of climate data with enormous computational demands.
Resource sharing: computers, storage, sensors, networks. Sharing is always conditional: issues of trust, policy, negotiation, payment.
Coordinated problem solving: distributed data analysis, computation, collaboration.


Grid Topologies
- Intragrid: local grid within an organization; trust based on personal contracts.
- Extragrid: resources of a consortium of organizations connected through a (Virtual) Private Network; trust based on business-to-business contracts.
- Intergrid: global sharing of resources through the Internet; trust based on certification.

Computational Grid

A computational grid is a hardware and software infrastructure that provides dependable, consistent, pervasive, and inexpensive access to high-end computational capabilities.

(The Grid: Blueprint for a New Computing Infrastructure, Kesselman & Foster.) Example: Science Grid (US Department of Energy).

Data Grid
A data grid is a grid computing system that deals with data: the controlled sharing and management of large amounts of distributed data.

The data grid is the storage component of a grid environment. Scientific and engineering applications require access to large amounts of data, and often this data is widely distributed. A data grid provides seamless access to the local or remote data required to complete compute-intensive calculations. Examples: the Biomedical Informatics Research Network (BIRN) and the Southern California Earthquake Center (SCEC). (A small replica-catalog sketch follows.)
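The replica idea behind a data grid can be pictured with a tiny, hypothetical Java sketch (not the BIRN or SCEC software; all names and URLs are invented): a logical file name maps to several physical replicas, and a selection policy hides the physical location from the application.

```java
import java.util.List;
import java.util.Map;

// Hypothetical replica catalog: logical name -> physical replica locations.
public class ReplicaCatalog {
    private static final Map<String, List<String>> catalog = Map.of(
            "lfn://climate/2016/temps.dat",
            List.of("gsiftp://siteA.example.org/data/temps.dat",
                    "gsiftp://siteB.example.org/mirror/temps.dat"));

    // Selection policy is a stub; a real data grid would rank replicas
    // by network distance, load, or transfer cost.
    static String selectReplica(String logicalName) {
        List<String> replicas = catalog.get(logicalName);
        if (replicas == null || replicas.isEmpty())
            throw new IllegalArgumentException("no replica for " + logicalName);
        return replicas.get(0);
    }

    public static void main(String[] args) {
        // The application only ever names the logical file.
        System.out.println(selectReplica("lfn://climate/2016/temps.dat"));
    }
}
```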


Methods of Grid Computing
- Distributed supercomputing
- High-throughput computing
- On-demand computing
- Data-intensive computing
- Collaborative computing
- Logistical networking


Distributed Supercomputing
Combining multiple high-capacity resources on a computational grid into a single, virtual distributed supercomputer.

Tackle problems that cannot be solved on a single system.


High-Throughput Computing
Uses the grid to schedule large numbers of loosely coupled or independent tasks, with the goal of putting unused processor cycles to work (see the sketch below).
On-Demand Computing
Uses grid capabilities to meet short-term requirements for resources that are not locally accessible. Models real-time computing demands.
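A hedged, single-machine analogue of high-throughput computing: a fixed pool of workers (standing in for grid nodes with idle cycles) drains a queue of independent tasks. The task choice and numbers are illustrative only.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class HighThroughput {
    public static void main(String[] args) throws Exception {
        // Each pool thread stands in for a grid node donating idle cycles.
        ExecutorService pool = Executors.newFixedThreadPool(4);

        List<Callable<Long>> tasks = new ArrayList<>();
        for (long n = 2_000_000; n < 2_000_100; n++) {
            final long v = n;
            tasks.add(() -> smallestFactor(v)); // loosely coupled: no shared state
        }

        long start = System.nanoTime();
        for (Future<Long> f : pool.invokeAll(tasks)) f.get(); // collect all results
        System.out.printf("100 tasks in %.1f ms%n", (System.nanoTime() - start) / 1e6);
        pool.shutdown();
    }

    // A deliberately CPU-bound task: trial division for the smallest factor.
    static long smallestFactor(long n) {
        for (long d = 2; d * d <= n; d++) if (n % d == 0) return d;
        return n;
    }
}
```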


Collaborative Computing
Concerned primarily with enabling and enhancing human-to-human interactions. Applications are often structured in terms of a virtual shared space.
Data-Intensive Computing
The focus is on synthesizing new information from data that is maintained in geographically distributed repositories, digital libraries, and databases. Particularly useful for distributed data mining.


Logistical Networking
Logistical networks focus on exposing storage resources inside networks by optimizing the global scheduling of data transport and data storage. This contrasts with traditional networking, which does not explicitly model storage resources in the network. It provides high-level services for Grid applications. It is called "logistical" because of the analogy it bears with systems of warehouses, depots, and distribution channels.


P2P Computing vs Grid Computing
They differ in their target communities. A grid system deals with a more complex, more powerful, more diverse and more highly interconnected set of resources than P2P, organized as virtual organizations (VOs).


A typical view of a Grid environment
[Figure: a user, a resource broker, grid resources and a Grid Information Service interacting as described in steps 1-4 below.]

1. A user sends a computation- or data-intensive application to the global grid in order to speed up its execution.

2. A resource broker distributes the jobs in the application to the grid resources, based on the user's QoS requirements and the details of available grid resources.

3. Grid resources (clusters, PCs, supercomputers, databases, instruments, etc.) in the global grid execute the user's jobs.

4. The Grid Information Service collects the details of the available grid resources and passes the information to the resource broker. (A plain-Java sketch of this flow follows.)

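The four-step flow can be sketched in plain Java (all class names are hypothetical, and real brokers apply far richer QoS policies): an information service publishes the available resources, and the broker dispatches each job to the least-loaded one.

```java
import java.util.*;

// Hypothetical sketch of the broker flow: information service -> broker -> resource.
public class BrokerDemo {
    record Resource(String name, int mips, List<Integer> queue) {}

    static class GridInformationService {
        List<Resource> resources = List.of(
                new Resource("ClusterA", 4000, new ArrayList<>()),
                new Resource("SupercomputerB", 9000, new ArrayList<>()));
        List<Resource> available() { return resources; } // step 4: publish details
    }

    static class ResourceBroker {
        // Naive policy standing in for QoS matching: shortest queue wins.
        void submit(int jobLength, GridInformationService gis) {
            Resource best = gis.available().stream()
                    .min(Comparator.comparingInt((Resource r) -> r.queue().size()))
                    .orElseThrow();
            best.queue().add(jobLength);                  // step 2: dispatch the job
            System.out.println("job(" + jobLength + " MI) -> "
                    + best.name() + " (" + best.mips() + " MIPS)");
        }
    }

    public static void main(String[] args) {
        GridInformationService gis = new GridInformationService();
        ResourceBroker broker = new ResourceBroker();
        for (int job : new int[]{500, 800, 300, 1200}) broker.submit(job, gis); // step 1
    }
}
```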

Grid Middleware
Grids are typically managed by gridware: a special type of middleware that enables sharing and manages grid components based on user requirements and resource attributes (e.g., capacity, performance). It is software that connects other software components or applications to provide the following functions:
- Run applications on suitable available resources (brokering, scheduling).
- Provide uniform, high-level access to resources (semantic interfaces; Web Services, Service Oriented Architectures).
- Address inter-domain issues of security, policy, etc. (federated identities).
- Provide application-level status monitoring and control.

Middleware
- Globus (University of Chicago)
- Condor (University of Wisconsin): high-throughput computing
- Legion (University of Virginia): virtual workspaces, collaborative computing
- IBP (Internet Backplane Protocol, University of Tennessee): logistical networking
- NetSolve: solving scientific problems in heterogeneous environments; high-throughput and data-intensive


Two Key Grid Computing Groups
The Globus Alliance (www.globus.org)
- Composed of people from Argonne National Labs, University of Chicago, University of Southern California Information Sciences Institute, University of Edinburgh and others.
- OGSA/I standards initially proposed by the Globus Group.

The Global Grid Forum (www.ggf.org)
- Heavy involvement of academic groups and industry (e.g. IBM Grid Computing, HP, United Devices, Oracle, UK e-Science Programme, US DOE, US NSF, Indiana University, and many others).
- Process: meets three times annually; solicits involvement from industry, research groups, and academics.

Some of the Major Grid Projects
- EuroGrid, Grid Interoperability (GRIP) (eurogrid.org; European Union): create technology for remote access to supercomputing resources and simulation codes; in GRIP, integrate with the Globus Toolkit.
- Fusion Collaboratory (fusiongrid.org; DOE Office of Science): create a national computational collaboratory for fusion research.
- Globus Project (globus.org; DARPA, DOE, NSF, NASA, Microsoft): research on Grid technologies; development and support of the Globus Toolkit; application and deployment.
- GridLab (gridlab.org; European Union): Grid technologies and applications.
- GridPP (gridpp.ac.uk; U.K. eScience): create and apply an operational grid within the U.K. for particle physics research.
- Grid Research Integration Development & Support Center (grids-center.org; NSF): integration, deployment and support of the NSF Middleware Infrastructure for research and education.


Grid Architecture

The Hourglass Model
- Focus on architecture issues: propose a set of core services as basic infrastructure, used to construct high-level, domain-specific solutions (diverse).
- Design principles: keep participation cost low; enable local control; support for adaptation.
- Analogous to the IP hourglass model: diverse global services and applications at the top, a narrow waist of core services in the middle, the local OS at the bottom.

Layered Grid Architecture (by analogy to the Internet architecture)
- Application
- Collective: coordinating multiple resources; ubiquitous infrastructure services, app-specific distributed services.
- Resource: sharing single resources; negotiating access, controlling use.
- Connectivity: talking to things; communication (Internet protocols) and security.
- Fabric: controlling things locally; access to, and control of, resources.
(Internet protocol analogy: application, transport, internet, link.)

We define Grid architecture in terms of a layered collection of protocols.
The fabric layer includes the protocols and interfaces that provide access to the resources that are being shared, including computers, storage systems, datasets, programs, and networks. This layer is a logical view rather than a physical view. For example, the view of a cluster with a local resource manager is defined by the local resource manager, and not the cluster hardware. Likewise, the fabric provided by a storage system is defined by the file system that is available on that system, not the raw disk or tapes.
The connectivity layer defines core protocols required for Grid-specific network transactions. This layer includes the IP protocol stack (system-level application protocols [e.g. DNS, RSVP, routing], transport and internet layers), as well as core Grid security protocols for authentication and authorization.
The resource layer defines protocols to initiate and control sharing of (local) resources. Services defined at this level are the gatekeeper and GRIS, along with some user-oriented application protocols from the Internet protocol suite, such as file transfer.
The collective layer defines protocols that provide system-oriented capabilities that are expected to be wide-scale in deployment and generic in function. This includes GIIS, bandwidth brokers and resource brokers.
The application layer defines protocols and services that are parochial in nature, targeted towards a specific application domain or class of applications.

Example: Data Grid Architecture
- Application: discipline-specific data grid application.
- Collective (application-specific): coherency control, replica selection, task management, virtual data catalog, virtual data code catalog.
- Collective (generic): replica catalog, replica management, co-allocation, certificate authorities, metadata catalogs.
- Resource: access to data, access to computers, access to network performance data.
- Connectivity: communication, service discovery (DNS), authentication, authorization, delegation.
- Fabric: storage systems, clusters, networks, network caches.

Simulation tools
- GridSim: job scheduling
- SimGrid: single-client multi-server scheduling
- Bricks: scheduling
- GangSim: Ganglia VO
- OptorSim: data grid simulations
- G3S (Grid Security Services Simulator): security services


Simulation tool: GridSim

GridSim is a Java-based toolkit for modeling and simulation of distributed resource management and scheduling for conventional Grid environments.

GridSim is based on SimJava, a general purpose discrete-event simulation package implemented in Java.

All components in GridSim communicate with each other through message passing operations defined by SimJava.

Salient features of GridSim
- It allows modeling of heterogeneous types of resources.
- Resources can be modeled operating under space-shared or time-shared mode.
- Resource capability can be defined in the form of MIPS (Million Instructions Per Second) benchmarks.
- Resources can be located in any time zone.
- Weekends and holidays can be mapped, depending on a resource's local time, to model non-Grid (local) workload.
- Resources can be booked for advance reservation.
- Applications with different parallel application models can be simulated.

Salient features of GridSim (cont.)
- Application tasks can be heterogeneous and CPU- or I/O-intensive.
- There is no limit on the number of application jobs that can be submitted to a resource.
- Multiple user entities can submit tasks for execution simultaneously in the same resource, which may be time-shared or space-shared. This feature helps in building schedulers that can use different market-driven economic models for selecting services competitively.
- Network speed between resources can be specified.
- It supports simulation of both static and dynamic schedulers.
- Statistics of all or selected operations can be recorded, and they can be analyzed using GridSim statistics analysis methods.
(A plain-Java illustration of the space-shared vs time-shared modes follows.)
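GridSim's own API is not reproduced here; instead, a hedged plain-Java sketch contrasts the space-shared and time-shared resource modes listed above, using idealized arithmetic (job lengths in MI, resource speed in MIPS; all numbers invented).

```java
// Hypothetical illustration of the two resource modes, not the GridSim API.
public class SharingModes {
    // Space-shared: each job gets a whole PE; jobs queue when all PEs are busy.
    static double spaceShared(double[] jobsMI, int numPE, double mipsPerPE) {
        double[] peFree = new double[numPE];           // time each PE becomes free
        double makespan = 0;
        for (double mi : jobsMI) {
            int pe = 0;                                // pick the earliest-free PE
            for (int i = 1; i < numPE; i++) if (peFree[i] < peFree[pe]) pe = i;
            peFree[pe] += mi / mipsPerPE;
            makespan = Math.max(makespan, peFree[pe]);
        }
        return makespan;
    }

    // Time-shared: all jobs run at once, splitting the total MIPS equally
    // (an idealized round-robin; real time-sharing adds context-switch cost).
    static double timeShared(double[] jobsMI, int numPE, double mipsPerPE) {
        double totalMI = 0;
        for (double mi : jobsMI) totalMI += mi;
        return totalMI / (numPE * mipsPerPE);
    }

    public static void main(String[] args) {
        double[] jobs = {4000, 4000, 8000};            // job lengths in MI
        System.out.println("space-shared: " + spaceShared(jobs, 2, 1000) + " s");
        System.out.println("time-shared:  " + timeShared(jobs, 2, 1000) + " s");
    }
}
```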

A Modular Architecture for the GridSim Platform and Components
[Figure: layered GridSim architecture. Top: application, user and grid-scenario inputs (application configuration, resource configuration, user requirements, grid scenario) and output of results. Below that: grid resource brokers or schedulers. The GridSim toolkit itself provides application modeling, resource entities, information services, job management, resource allocation and statistics; resource modeling and simulation covers single CPUs, SMPs, clusters, load, network and reservation. Everything runs on SimJava (and distributed SimJava), a basic discrete-event simulation infrastructure, over a Java virtual machine on distributed resources such as PCs, workstations, clusters and SMPs.]

Web 2.0, Clouds, and Internet of Things
HPC: High-Performance Computing; HTC: High-Throughput Computing; P2P: Peer to Peer; MPP: Massively Parallel Processors.



What is a Service Oriented Architecture (SOA)?
A method of design, deployment, and management of both applications and the software infrastructure where:
- All software is organized into business services that are network accessible and executable.
- Service interfaces are based on public standards for interoperability.

Key Characteristics of SOA
- Quality of service, security and performance are specified.
- The software infrastructure is responsible for managing them.
- Services are cataloged and discoverable.
- Data are cataloged and discoverable.
- Protocols use only industry standards.

What is a Service?
- A service is a reusable component.
- A service changes business data from one state to another.
- A service is the only way data is accessed.
- If you can describe a component in WSDL, it is a service.
(A small sketch follows.)
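A hedged JAX-WS sketch of such a service (the service name and URL are invented; this runs as-is on Java 8, while newer JDKs need the JAX-WS runtime added as a dependency): the runtime generates the WSDL contract from the annotated class, so the component is network accessible and described in WSDL.

```java
import javax.jws.WebService;
import javax.xml.ws.Endpoint;

// The WSDL for this service is generated by the runtime and served at
// http://localhost:8080/orders?wsdl once the endpoint is published.
@WebService
public class OrderService {
    public String placeOrder(String item, int quantity) {
        // The state change happens behind the service interface:
        // callers never touch the data directly.
        return "order accepted: " + quantity + " x " + item;
    }

    public static void main(String[] args) {
        Endpoint.publish("http://localhost:8080/orders", new OrderService());
        System.out.println("service up; WSDL at http://localhost:8080/orders?wsdl");
    }
}
```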

Information Technology is Not SOA
[Figure: a hierarchy from Business Mission through Information Management, Information Systems, Systems Design, and Computing & Communications to Information Technology, contrasted with SOA.]

Why Getting SOA Will Be Difficult
Managing for projects:
- Software: 1-4 years
- Hardware: 3-5 years
- Communications: 1-3 years
- Project managers: 2-4 years
- Reliable funding: 1-4 years
- User turnover: 30%/year
- Security risks: 1 minute or less
Managing for SOA:
- Data: forever
- Infrastructure: 10+ years

Why Managing Business Systems is Difficult
- 40 million lines of code in Windows XP is unknowable.
- Testing an application (3 million lines) requires >10^15 tests.
- The probability of correct data entry for a supply item is low; more than 100 formats identify a person in DoD.
- Output per office worker: >30 e-messages/day.

How to View Organizing for SOA
[Figure: levels running from stability at the top to variety at the bottom, separated by security barriers. GLOBAL LEVEL: industry standards, commercial off-the-shelf products and services. ENTERPRISE LEVEL: corporate policy, corporate standards, reference models, data management and tools, integrated systems, configuration data base, shared computing and telecommunications. PROCESS and BUSINESS LEVELS: functional processes A-D (e.g. OSD services A and B), applications development & maintenance. APPLICATION and LOCAL LEVELS: graphic InfoWindow, personal tools, inquiry languages, customized applications, prototyping tools, local applications and files. PERSONAL LEVEL: private applications and files, with privacy protection for the individual.]

SOA Must Reflect Timing
[Figure: the same levels (GLOBAL, ENTERPRISE, PROCESS, BUSINESS, APPLICATION, LOCAL, PERSONAL), with the enterprise infrastructure (corporate policy and standards, reference models, data management and tools, integrated systems, configuration data base, shared computing and telecommunications, security and survivability) supporting businesses A and B and functional processes A-D. The enterprise end provides long-term stability and technology complexity; the local and personal end provides short-term adaptability and technology simplicity.]

SOA Must Reflect Conflicting Interests
Enterprise, missions, organizations, local, personal.

Organization of Infrastructure Services
Infrastructure services (enterprise information) comprise data services, security services, computing services, communication services and application services.

Organization of Data Services
Data services comprise discovery services, management services, collaboration services, interoperability services and semantic services.

Data Interoperability Policies
- Data are an enterprise resource.
- Single-point entry of unique data.
- Enterprise certification of all data definitions.
- Data stewardship defines data custodians.
- Zero defects at point of entry.
- De-conflict data at the source, not at higher levels.
- Data aggregations come from source data, not from reports.


Data Concepts
Data Element Definition: text associated with a unique data element within a data dictionary that describes the data element, gives it a specific meaning and differentiates it from other data elements. A definition is precise, concise, non-circular, and unambiguous. (ISO/IEC 11179 Metadata Registry specification)
Data Element Registry: a label kept by a registration authority that describes a unique meaning and representation of data elements, including registration identifiers, definitions, names, value domains, syntax, ontology and metadata attributes. (ISO 11179-1)
(An illustrative sketch follows.)
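A hypothetical sketch of a registry entry carrying the attributes the ISO/IEC 11179 definition above names (identifier, name, definition, value domain); the class and field names are invented for illustration.

```java
import java.util.HashMap;
import java.util.Map;

public class MetadataRegistry {
    // One entry per unique data element meaning (cf. ISO/IEC 11179).
    record DataElement(String registrationId, String name,
                       String definition, String valueDomain) {}

    private final Map<String, DataElement> entries = new HashMap<>();

    // Single-point entry: a registration identifier may be defined only once.
    void register(DataElement e) {
        if (entries.putIfAbsent(e.registrationId(), e) != null)
            throw new IllegalStateException("duplicate id: " + e.registrationId());
    }

    public static void main(String[] args) {
        MetadataRegistry reg = new MetadataRegistry();
        reg.register(new DataElement(
                "DE-0001", "PersonBirthDate",
                "The date on which a person was born.",
                "ISO 8601 calendar date (YYYY-MM-DD)"));
        System.out.println(reg.entries);
    }
}
```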

Data and Services Deployment Principles
- Data, services and applications belong to the enterprise.
- Information is a strategic asset.
- Data and applications cannot be coupled to each other.
- Interfaces must be independent of implementation.
- Data must be visible outside of the applications.
- Semantics and syntax are defined by a community of interest.
- Data must be understandable and trusted.

Organization of Security Services
Security services comprise transfer services, protection services, certification services, systems assurance and authentication services.

Security Services = Information Assurance
- Conduct attack/event response: ensure timely detection of and appropriate response to attacks; manage the measures required to minimize the network's vulnerability.
- Secure information exchanges: secure the information exchanges that occur on the network with a level of protection matched to the risk of compromise.
- Provide authorization and non-repudiation services: identify and confirm a user's authorization to access the network.

Organization of Computing Services
Computing services comprise computing facilities, resource planning, control & quality, configuration services and financial management.

Computing Services
- Provide adaptable hosting environments: global facilities for hosting to the edge; virtual environments for data centers.
- Distributed computing infrastructure: data storage and shared spaces for information sharing.
- Shared computing infrastructure resources: access shared resources regardless of access device.

Organization of Communication Services
Communication services comprise interoperability services, spectrum management, connectivity arrangements, continuity of services and resource management.

Network Services Implementation
- From point-to-point communications (push) to network-centric processes (pull).
- Data are posted to a shared space for retrieval.
- Network controls assure data synchronization and access security.

Communication Services
Provide information transport: transport information, data and services anywhere; ensure transport between end-user devices and servers; expand the infrastructure for on-demand capacity.

Organization of Application Services
Application services comprise a component repository, code binding services, maintenance management, portals and experimental services.

Application Services and Tools
- Provide common end-user interface tools: application generators, test suites, error identification, application components and standard utilities.
- Common end-user interface tools include e-mail, collaboration tools, information dashboards, intranet portals, etc.

Example of Development Tools
Business Process Execution Language (BPEL) is an executable modeling language; through XML it enables code generation.

Traditional approach vs. BPEL approach:
- Hard-coded decision logic vs. externalized decision logic
- Developed by IT vs. modeled by business analysts
- Maintained by IT vs. maintained by policy managers
- Managed by IT vs. managed by IT
- Dependent upon custom logs vs. automatic logs and process capture
- Hard to modify and reuse vs. easy to modify and reuse

A Few Key SOA Protocols
- Universal Description, Discovery, and Integration (UDDI): defines the publication and discovery of web service implementations.
- Web Services Description Language (WSDL): an XML-based language that defines Web Services.
- SOAP (originally Simple Object Access Protocol): a key SOA messaging protocol in which a network node (the client) sends a request to another node (the server).
- Lightweight Directory Access Protocol (LDAP): a protocol for querying and modifying directory services (see the sketch below).
- Extract, Transform, and Load (ETL): a process of moving data from a legacy system and loading it into a SOA application.
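For one of these protocols, here is a hedged sketch of an LDAP search using the JNDI API shipped with the JDK (the server URL and directory names are hypothetical):

```java
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;
import java.util.Hashtable;

public class LdapQuery {
    public static void main(String[] args) throws Exception {
        // Connection settings for a hypothetical directory server.
        Hashtable<String, String> env = new Hashtable<>();
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://directory.example.org:389");

        DirContext ctx = new InitialDirContext(env);
        SearchControls sc = new SearchControls();
        sc.setSearchScope(SearchControls.SUBTREE_SCOPE);

        // Query the directory service for one user entry by uid.
        NamingEnumeration<SearchResult> results =
                ctx.search("ou=people,dc=example,dc=org", "(uid=jdoe)", sc);
        while (results.hasMore())
            System.out.println(results.next().getNameInNamespace());
        ctx.close();
    }
}
```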

References
1. Kai Hwang, Geoffrey C. Fox and Jack J. Dongarra, Distributed and Cloud Computing: Clusters, Grids, Clouds and the Future of Internet, First Edition, Morgan Kaufmann, an imprint of Elsevier, 2012.
2. Distributed Computing, http://distributedcomputing.info/index.html
3. Jie Wu, Distributed System Design, CRC Press, 1999.
4. Distributed Computing, Wikipedia, http://en.wikipedia.org/wiki/Distributed_computing
5. www.psgtech.edu/yrgcc/attach/GridComputing-an%20introduction.ppt
6. www.cse.unr.edu/~mgunes/cpe401/cpe401sp12/lect15_cloud.ppt
7. csnotes.upm.edu.my/kelasmaya/web.nsf/.../$FILE/Distributed%20Computing.ppt
8. www.strassmann.com/pubs/gmu/2007-11-slides.ppt


Other presentations http://www.slideshare.net/drgst/presentations


Thank You
Questions and Comments?