An Inquiry Report on Grid Computing


  • 8/2/2019 An Inquiry Report on Grid Computing

    1/43


Jeevan Kumar Vishwakarman. All rights reserved. This is a collection of information drawn

from various sources, including the internet, textbooks, and newspaper publications. This collection

is the sole work of Mr. Jeevan Kumar Vishwakarman, and any reuse of this work in any

form is prohibited. All rights in this collection are reserved to him, and violation of this

prohibition is punishable under whichever laws are applicable.

1st M.C.A. | M9 MCA AA 0010, Sree Saraswathy Thyagaraja College, Thippampatti, Pollachi.

    For Personal Contact Use These.

    Karampotta, Kozhinjampara,

    Palakkad. 678555

    [email protected] | vishnudhoodhan@yahoo.


    Jeevan Kumar Vishwakarman iii

    CONTENTS

1. A Gentle Introduction To Grid Computing And Technologies
2. What Is Grid Computing?
3. Grid Computing's Ancestors
4. The Architecture
5. Five Big Ideas
6. Grids Versus Conventional Supercomputers
7. Virtual Organizations
   From passenger jets to chemical spills
8. The Hardware
9. Design Considerations And Variations
10. CPU Scavenging
11. History
12. Father of the Grid
13. Current Projects And Applications
    Fastest virtual supercomputers
14. Definitions
    But what does "high performance" mean?
15. The Death Of Distance
    Faster! Faster!
16. Secure Access
    Security and trust
17. Resource Use
    Middleware to the rescue
18. Resource Sharing
19. But Would You Trust Your Computer To A Complete Stranger?
20. Open Standards
21. Who Is In Charge Of Grid Standards?
22. The Middleware
    Agents, brokers and striking deals
    Delving inside middleware
23. Globus Toolkit
    Globus includes programs such as:
24. National Grids
25. International Grids
26. High-Throughput Problems
27. High-Performance Problems
28. Grid Computing In 30 Seconds
29. The Dream
30. "Gridifying" Your Application
31. Computational Problems
    Parallel calculations
    Embarrassingly parallel calculations
    Coarse-grained calculations
    Fine-grained calculations
    High-performance vs. high-throughput
    And grid computing..?
32. Breaking Moore's Law?
    Nice Idea, But...
33. More On Moore's Law
34. Works Cited

Index


    A Gentle Introduction To Grid Computing And Technologies

Grid is an infrastructure that involves the integrated and collaborative use of

computers, networks, databases and scientific instruments owned and managed by

multiple organizations. Grid applications often involve large amounts of data and/or

computing resources that require secure resource sharing across organizational

boundaries. This makes Grid application management and deployment a complex

undertaking. Grid middleware provides users with seamless computing ability and

uniform access to resources in the heterogeneous Grid environment. Several software

toolkits and systems, most of them the results of academic research projects, have been

developed all over the world. This paper presents an introduction to Grid

computing and discusses two complementary Grid technologies: Globus, developed by

researchers from Argonne National Laboratory and the University of Southern California,

USA; and Gridbus, by researchers from the University of Melbourne, Australia. Globus

primarily focuses on providing core Grid services, whereas Gridbus focuses on providing

user-level Grid services, in addition to a utility computing model for management of grid

resources.

    What Is Grid Computing?

Grid computing allows the virtualization of distributed computing and data resources,

such as processing, network bandwidth and storage capacity, to provide a single system image,

granting users and applications access to vast IT capabilities.

Although "the grid" is still just a dream, grid computing is already a reality.

    Imagine several million computers from all over the world, and owned by

    thousands of different people. Imagine they include desktops, laptops, supercomputers,

    data vaults, and instruments like mobile phones, meteorological sensors and telescopes...

    Now imagine that all of these computers can be connected to form a single, huge

    and super-powerful computer! This huge, sprawling, global computer is what many

    people dream "the grid" will be.

"The grid" takes its name from an analogy with the electrical "power grid". The

idea was that accessing computer power from a computer grid would be as simple as

accessing electrical power from an electrical grid.


    An Inquiry Report On Grid Computing a Report Work

    Jeevan Kumar Vishwakarman 2

Grid Computing's Ancestors

    Grid computing didn't just come out of nowhere. It grew from previous efforts

    and ideas, such as those listed below:

Grid computing's immediate ancestor is "metacomputing", which dates back to around 1990. Metacomputing was used to describe efforts to

connect US supercomputing centers. Larry Smarr, a former director of the

National Center for Supercomputing Applications in the US, is generally

credited with popularizing the term.

FAFNER and I-WAY were cutting-edge metacomputing projects in the US, both conceived in 1995. Each influenced the evolution of key grid

technologies.

FAFNER (Factoring via Network-Enabled Recursion) aimed to factorize very large numbers, a challenge very relevant to digital security. Since this

challenge could be broken into small parts, even fairly modest computers

could contribute useful power. Many FAFNER techniques for dividing and

distributing computational problems were forerunners of the technology

used for SETI@home and other "cycle scavenging" software.

I-WAY (Information Wide Area Year) aimed to link supercomputers using existing networks. One of I-WAY's innovations was a computational

resource broker, conceptually similar to those being developed for grid

computing today. I-WAY strongly influenced the development of the

Globus project, which is at the core of many grid activities, as well as the

Legion project, an alternative approach to distributed supercomputing.

Grid computing was born at a workshop called "Building a Computational Grid", held at Argonne National Laboratory in September 1997. Following

this, in 1998, Ian Foster of Argonne National Laboratory and Carl Kesselman

of the University of Southern California published "The Grid: Blueprint for a

New Computing Infrastructure", often called "the grid bible". Ian Foster had

previously been involved in the I-WAY project, and the Foster-Kesselman

duo had published a paper in 1997, called "Globus: A Metacomputing

Infrastructure Toolkit", clearly linking the Globus toolkit with its

predecessor, metacomputing.


    Grid computing is a phrase in distributed computing which can have several

    meanings:

A local computer cluster, which is like a "grid" because it is composed of multiple nodes.

Offering online computation or storage as a metered commercial service, known as utility computing, "computing on demand", or "cloud computing".

The creation of a "virtual supercomputer" by using spare computing resources within an organization.

The creation of a "virtual supercomputer" by using a network of geographically dispersed computers. Volunteer computing, which

generally focuses on scientific, mathematical, and academic problems, is

the most common application of this technology.

    These varying definitions cover the spectrum of "distributed computing", and

    sometimes the two terms are used as synonyms. This article focuses on distributed

    computing technologies which are not in the traditional dedicated clusters; otherwise,

    see computer cluster.

    Functionally, one can also speak of several types of grids:

Computational grids (including CPU-scavenging grids), which focus primarily on computationally intensive operations.

Data grids, for the controlled sharing and management of large amounts of distributed data.

Equipment grids, which have a primary piece of equipment, e.g. a telescope, and where the surrounding grid is used to control the

equipment remotely and to analyze the data produced.

    The Architecture

    Grid architecture is the way in which a grid has been designed.


    A grid's architecture is often described in terms of "layers", where each layer has

    a specific function. The higher layers are generally user-centric, whereas lower layers are

    more hardware-centric, focused on computers and networks.

The lowest layer is the network, which connects grid resources.

Above the network layer lies the resource layer: actual grid resources, such as computers, storage systems, electronic data catalogues,

sensors and telescopes, that are connected to the network.

The middleware layer provides the tools that enable the various elements (servers, storage, networks, etc.) to participate in a grid. The

middleware layer is sometimes the "brains" behind a computing grid!

The highest layer of the structure is the application layer, which includes applications in science, engineering, business, finance and

more, as well as portals and development toolkits to support the

applications. This is the layer that grid users "see" and interact with.

The application layer often includes the so-called serviceware, which

performs general management functions like tracking who is

providing grid resources and who is using them.
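The four-layer stack above can be traced with a toy sketch. Only the layer names come from the text; the job name and the messages are purely illustrative, not part of any real grid middleware.

```python
# A toy trace of the four layers described above, following one job
# from what the user "sees" down to the wires and back.
def submit_job(job_name):
    """Return the path a job takes through the layered grid model."""
    return [
        ("application", f"portal accepts job '{job_name}' from the user"),
        ("middleware", "broker locates a suitable resource and strikes a deal"),
        ("resource", "a compute node (or telescope, or data store) does the work"),
        ("network", "results travel back over the connecting network"),
    ]

for layer, action in submit_job("simulate-airflow"):
    print(f"{layer:>12}: {action}")
```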

    Five Big Ideas

Grid computing is driven by five big ideas:

1. Resource sharing: global sharing is the very essence of grid computing.

2. Secure access: trust between resource providers and users is essential, especially when they don't know each other. Sharing resources conflicts

with security policies in many individual computer centers, and on

individual PCs, so getting grid security right is crucial.

3. Resource use: efficient, balanced use of computing resources is essential.

4. The death of distance: distance should make no difference; you should be able to access computer resources from wherever you are.

5. Open standards: interoperability between different grids is a big goal, and is driven forward by the adoption of open standards for grid

development, making it possible for everyone to contribute

constructively to grid development. Standardization also encourages


    industry to invest in developing commercial grid services and

    infrastructure.

    Grids Versus Conventional Supercomputers

"Distributed" or "grid" computing in general is a special type of parallel

computing which relies on complete computers (with onboard CPU, storage, power

supply, network interface, etc.) connected by a conventional network interface, such as

Ethernet or the internet. This is in contrast to the traditional notion of a supercomputer,

which has many CPUs connected by a local high-speed computer bus.

    The primary advantage of distributed computing is that each node can be

    purchased as commodity hardware, which when combined can produce similar

    computing resources to a many-CPU supercomputer, but at lower cost. This is due to the

    economies of scale of producing commodity hardware, compared to the lower efficiency

    of designing and constructing a small number of custom supercomputers. The primary

    performance disadvantage is that the various CPUs and local storage areas do not have

    high-speed connections. This arrangement is thus well-suited to applications where

    multiple parallel computations can take place independently, without the need to

    communicate intermediate results between CPUs.

    The high-end scalability of geographically dispersed grids is generally favorable,

    due to the low need for connectivity between nodes relative to the capacity of the public

    internet. Conventional supercomputers also create physical challenges in supplying

    sufficient electricity and cooling capacity in a single location. Both supercomputers and

    grids can be used to run multiple parallel computations at the same time, which might

    be different simulations for the same project, or computations for completely different

    applications. The infrastructure and programming considerations needed to do this on

    each type of platform are different, however.

    There are also differences in programming and deployment. It can be costly and

    difficult to write programs so that they can be run in the environment of a

    supercomputer, which may have a custom operating system, or require the program to

    address concurrency issues. If a problem can be adequately parallelized, a "thin" layer of

    "grid" infrastructure can cause conventional, standalone programs to run on multiple

    machines (but each given a different part of the same problem). This makes it possible to

    write and debug programs on a single conventional machine, and eliminates


    complications due to multiple instances of the same program running in the same shared

    memory and storage space at the same time.
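That "thin layer" idea can be sketched in miniature on one machine, with worker processes standing in for grid nodes. The worker function below is a conventional, standalone piece of code that simply receives its own slice of the problem; the prime-counting task itself is an arbitrary example, not something from the report.

```python
from multiprocessing import Pool

def count_primes(bounds):
    """A conventional, standalone computation: count primes in [lo, hi).
    It knows nothing about the grid; it just gets its part of the problem."""
    lo, hi = bounds
    total = 0
    for n in range(max(lo, 2), hi):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            total += 1
    return total

if __name__ == "__main__":
    # The "thin grid layer": split one range into independent chunks and
    # farm each chunk out to a different worker, with no communication of
    # intermediate results between them.
    chunks = [(i, i + 25_000) for i in range(0, 100_000, 25_000)]
    with Pool() as pool:
        partial = pool.map(count_primes, chunks)
    print(sum(partial))  # prints 9592, the same answer one machine gets alone
```

Because no chunk depends on another's intermediate results, the same unmodified function could run on four laptops as easily as four processes.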

    Virtual Organizations

Virtual organizations (VOs) are groups of people who share a data-intensive goal.

To achieve their mutual goal, people within a VO choose to share their resources,

creating a computer grid. This grid can give VO members direct access to each other's

computers, programs, files, data, sensors and networks. This sharing must be controlled,

secure, flexible, and usually time-limited.

    From Passenger Jets To Chemical Spills

Many scientists form VOs to pursue their research. VOs exist for astronomy

research, alternative energy research, biology research and more.

The needs of each VO are different. For example, a VO formed to develop a next-

generation passenger jet will need to run complex computer simulations, testing various

combinations of components from different manufacturers, while keeping the

proprietary know-how associated with each component hidden from the other

consortium members.

Another example is an environmental science VO, tasked with managing a

chemical spill. This VO will need to analyze local weather and soil models to estimate the

spread of the spill and determine its impact. They will need to create a short-term

mitigation plan and help emergency response personnel to plan and coordinate the

evacuation.

    The Hardware

    Grids must be built "on top of" hardware, which forms the physical infrastructure

    of a grid - things like computers and networks. This infrastructure is often called the grid

    "fabric".


    Networks are an essential piece of the grid "fabric". Networks link the different

    computers that form part of a grid, allowing them to be handled as one huge computer.

    Networks are characterized by their size (local, national and international) and

    throughput (the amount of data transferred in a specific time). Throughput is measured

in kbps (kilobits per second, where kilo means a thousand), Mbps (M for mega, a

million) or Gbps (G for giga, a billion).

    One of the big ideas of grid computing is to take advantage of ultra-fast

    networks. This idea allows us to access globally distributed resources in an integrated

    and data-intensive way. Ultra-fast networks also help to minimize latency: the delays

    that build up as data are transmitted over the internet.

Grids are built "on top of" high-performance networks, such as the intra-

European GÉANT network, which has 10 Gbps performance on the network "backbone".

This backbone links the major "nodes" on the grid (like national computing centres).

One level down from the "backbone" are the network links, which join individual

institutions to nodes on the backbone. Performance of these is typically 1 Gbps.

A further level down are the 10 to 100 Mbps desktop-to-institution network links.
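A little arithmetic shows what these tiers mean in practice. The 10-gigabyte dataset below is an arbitrary example, and real links rarely sustain their nominal rates, so treat these as best-case figures.

```python
def transfer_seconds(gigabytes, link_mbps):
    """Idealized transfer time: size in gigabytes over a link rated in Mbps.
    Decimal units, as in network ratings: 1 gigabyte = 8,000 megabits."""
    return gigabytes * 8_000 / link_mbps

# Moving a 10 GB dataset across each tier described above:
for link, mbps in [("10 Gbps backbone", 10_000),
                   ("1 Gbps institution link", 1_000),
                   ("100 Mbps desktop link", 100)]:
    print(f"{link}: {transfer_seconds(10, mbps):.0f} s")
# 8 s, 80 s, 800 s: the desktop link, not the backbone, dominates the
# end-to-end time, which is why latency and last-mile capacity matter.
```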

    Design Considerations And Variations

    One feature of distributed grids is that they can be formed from computing

    resources belonging to multiple individuals or organizations (known as multiple

    administrative domains). This can facilitate commercial transactions, as in utility

    computing, or make it easier to assemble volunteer computing networks.

    One disadvantage of this feature is that the computers which are actually

    performing the calculations might not be entirely trustworthy. The designers of the

    system must thus introduce measures to prevent malfunctions or malicious participants

from producing false, misleading, or erroneous results, and from using the system as an attack vector. This often involves assigning work randomly to different nodes

    (presumably with different owners) and checking that at least two different nodes report

    the same answer for a given work unit. Discrepancies would identify malfunctioning

    and malicious nodes.
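A minimal sketch of that redundancy check, assuming a hypothetical pool of worker nodes, one of which misbehaves: each work unit goes to two distinct, randomly chosen nodes, and any disagreement flags the unit for re-running instead of accepting a bad answer.

```python
import random

def honest(unit):
    return unit * unit            # the correct computation

def faulty(unit):
    return unit * unit + 1        # a malfunctioning or malicious node

def run_redundantly(units, nodes, rng):
    """Assign each unit to two different nodes; accept only matching answers."""
    accepted, suspect = {}, []
    for unit in units:
        node_a, node_b = rng.sample(nodes, 2)   # different owners, presumably
        result_a, result_b = node_a(unit), node_b(unit)
        if result_a == result_b:
            accepted[unit] = result_a
        else:
            suspect.append(unit)                # discrepancy: redo elsewhere
    return accepted, suspect

rng = random.Random(1)
accepted, suspect = run_redundantly(range(20), [honest, honest, honest, faulty], rng)
# With only one faulty node in the pool, any pair that agrees must both be
# honest, so every accepted answer is correct; suspect units get re-run.
print(len(accepted), len(suspect))
```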


    Due to the lack of central control over the hardware, there is no way to guarantee

    that nodes will not drop out of the network at random times. Some nodes (like laptops

    or dialup internet customers) may also be available for computation but not network

    communications for unpredictable periods. These variations can be accommodated by

    assigning large work units (thus reducing the need for continuous network connectivity)

    and reassigning work units when a given node fails to report its results as expected.
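The bookkeeping that paragraph implies can be sketched as a deadline table: each outstanding work unit carries a due time, and units whose node misses the deadline go back on the queue for reassignment. The unit names and the one-hour deadline are illustrative choices, not from any real scheduler.

```python
def assign(units, now, deadline_s=3600):
    """Record a due time for every unit handed out; a real scheduler
    would also record which node received each unit."""
    return {unit: now + deadline_s for unit in units}

def reclaim_overdue(outstanding, now):
    """Split outstanding units into those to reassign (deadline missed,
    e.g. a laptop went offline) and those still within their window."""
    overdue = [u for u, due in outstanding.items() if due <= now]
    waiting = {u: due for u, due in outstanding.items() if due > now}
    return overdue, waiting

outstanding = assign(["wu-1", "wu-2", "wu-3"], now=0)
outstanding.pop("wu-2")  # wu-2 reported its result in time and is retired
overdue, outstanding = reclaim_overdue(outstanding, now=4000)
print(overdue)  # ['wu-1', 'wu-3'] go back on the queue for other nodes
```

Large work units make this table churn less, which is exactly the trade-off the text describes.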

    The impacts of trust and availability on performance and development difficulty

    can influence the choice of whether to deploy onto a dedicated computer cluster, to idle

    machines internal to the developing organization, or to an open external network of

    volunteers or contractors.

    In many cases, the participating nodes must trust the central system not to abuse

    the access that is being granted, by interfering with the operation of other programs,

    mangling stored information, transmitting private data, or creating new security holes.

    Other systems employ measures to reduce the amount of trust "client" nodes must place

in the central system. For example, Parabon Computation produces grid computing

software that operates in a Java sandbox.

    Public systems or those crossing administrative domains (including different

    departments in the same organization) often result in the need to run on heterogeneous

    systems, using different operating systems and hardware architectures. With many

    languages, there is a tradeoff between investment in software development and the

    number of platforms that can be supported (and thus the size of the resulting network).

    Cross-platform languages can reduce the need to make this tradeoff, though potentially

    at the expense of high performance on any given node (due to run-time interpretation or

    lack of optimization for the particular platform).

    Various middleware projects have created generic infrastructure, to allow

    various scientific and commercial projects to harness a particular associated grid, or for

the purpose of setting up new grids. BOINC is a common one for academic projects

seeking public volunteers; more are listed at the end of the article.

CPU Scavenging

    CPU-scavenging, cycle-scavenging, cycle stealing, or shared computing creates a

    "grid" from the unused resources in a network of participants (whether worldwide or

    internal to an organization). Usually this technique is used to make use of instruction

    cycles on desktop computers that would otherwise be wasted at night, during lunch, or


    even in the scattered seconds throughout the day when the computer is waiting for user

    input or slow devices.

    Volunteer computing projects use the CPU scavenging model almost exclusively.

In practice, participating computers also donate some supporting amount of disk storage space, RAM, and network bandwidth, in addition to raw CPU power. Nodes in

    this model are also more vulnerable to going "offline" in one way or another from time to

    time, as their owners use their resources for their primary purpose.
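In spirit, a scavenger only computes when the host looks idle. The sketch below uses the Unix one-minute load average as a crude idleness test; the 0.5 threshold and the work unit are arbitrary choices, and real systems such as BOINC use far richer signals, like keyboard and mouse activity.

```python
import os

IDLE_LOAD = 0.5  # arbitrary threshold: "the owner probably isn't busy"

def machine_is_idle():
    """Crude idleness test: Unix one-minute load average below a threshold."""
    one_minute, _, _ = os.getloadavg()
    return one_minute < IDLE_LOAD

def scavenge(work_unit):
    """Run a donated work unit only when the host seems idle; otherwise
    defer, so the owner's own tasks keep priority."""
    if machine_is_idle():
        return work_unit()
    return None  # try again later, e.g. from a periodic timer

result = scavenge(lambda: sum(range(1_000)))
print("ran" if result is not None else "deferred")
```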

    History

The term grid computing originated in the early 1990s as a metaphor for making

computer power as easy to access as an electric power grid, in Ian Foster and Carl

Kesselman's seminal work, "The Grid: Blueprint for a New Computing Infrastructure".

CPU scavenging and volunteer computing were popularized beginning in 1997

by distributed.net and later in 1999 by SETI@home to harness the power of networked

PCs worldwide, in order to solve CPU-intensive research problems.

The ideas of the grid (including those from distributed computing, object-

oriented programming, cluster computing, web services and others) were brought

together by Ian Foster, Carl Kesselman and Steve Tuecke, widely regarded as the "fathers

of the grid". They led the effort to create the Globus Toolkit, incorporating not just CPU

management (examples: cluster management and cycle scavenging) but also storage

management, security provisioning, data movement, monitoring and a toolkit for

developing additional services based on the same infrastructure, including agreement

negotiation, notification mechanisms, trigger services and information aggregation.

While the Globus Toolkit remains the de facto standard for building grid solutions, a

number of other tools have been built that answer some subset of the services needed to

create an enterprise grid.

    Father Of The Grid

An exact extract from an article by Amy M. Braverman in the University of Chicago

Magazine (April 2004, Volume 96, Number 4).


Computer scientist Ian Foster has developed the software to take shared

computing to a global level.

In a bare Research Institutes building room with white, cinder-block walls, Ian

Foster sits at a red table holding his laptop, blinds shut to block the window's glare, eyes

glazed behind wire-rimmed glasses. "I might not be too articulate today," the Arthur

Holly Compton distinguished service professor of computer science warns. "I'm on two

hours' sleep." The previous night a West Coast student's paper was due at midnight,

Pacific Time, and then, awake anyway, he worked online with some European

colleagues. And because the father of grid computing is also (with wife Angela

Smyth, MD'00, a hospital psychiatry fellow) the father of a five- and a six-year-old, he

rarely gets to sleep in.

So when asked to predict how grid computing will change everyday life in five,

ten, 15 years, he thinks for a moment but comes up short. "I'm not feeling very creative

right now," he says in the quick cadence of a native New Zealander. But Foster, 45, who

heads the distributed systems lab at Argonne National Laboratory, clearly has had more

inspired moments, persuading the federal government to invest in several multimillion-

dollar grid-technology projects and convincing companies such as IBM, Hewlett-

Packard, Oracle, and Sun Microsystems that grids are the answer to complex

computational problems: the next major evolution of the internet.

Just as the internet is a tool for mass communication, grids are a tool for

amplifying computer power and storage space. By linking far-flung supercomputers,

servers, storage systems, and databases across existing internet lines, grids allow more

numbers to be crunched faster than ever before. Several grid projects exist today, but

eventually, Foster says, a huge global grid ("the grid", akin to "the internet") will

perform complex tasks such as designing semiconductors or screening thousands of

potential pharmaceutical drugs in an hour rather than a year.

Though corporations recently have begun to show interest in grids, research

institutions have long been a ripe testing ground, in the same way that the internet

sprouted in academia before blossoming in the commercial world. Large projects are

already using the technology. The Sloan Digital Sky Survey, an effort at Chicago,

Fermilab, and 11 other institutions to map a quarter of the night sky, determining more

than 100 million celestial objects' positions and absolute brightness, harnesses computer

power from labs nationwide to perform in minutes scans that previously took a week.

The National Digital Mammography Archive (NDMA) in the United States and eDiaMoND in

the United Kingdom are creating digital-image libraries to hold their respective countries'


scans. With an expected 35 million U.S. mammograms a year, at 160 megabytes per

exam, the NDMA web site explains, the annual volume could exceed 5.6 petabytes [a

petabyte is 1 million gigabytes] a year, and the minimal daily traffic is expected to

be 28 terabytes [a terabyte is 1,024 gigabytes], traffic that wouldn't be possible without

a grid. By combining computer power and storage space from multiple locations,

doctors can view a patient's progress over time, compare her with other patient

populations, or access diagnostic tools. A similar venture, the Biomedical Informatics

Research Network, compiles brain images from different databases so researchers can

compare the brains of Alzheimer's patients, for example, to those of healthy people.

Still another project is a grid for the Network for Earthquake Engineering

Simulation (NEES). An $82 million program funded by the National Science Foundation,

NEES seeks to advance earthquake-engineering research and reduce the physical havoc

earthquakes create. The grid, to be completed in October, links civil engineers around the

country with 15 sites containing equipment such as 4-meter-by-4-meter shake tables or

tsunami simulators. Through the grid, engineers building San Francisco's new Bay Bridge

tested their design structures remotely to make sure they met the latest earthquake-

resistance standards. At Argonne, a NEESgrid partner, an 18-square-inch mini shake table,

used for software development and demonstration, sits in material scientist Nestor

Zaluzec's office. A researcher in, say, San Diego can activate the mini shake table, moving

it quickly back and forth to agitate the 2-foot-tall plastic model sitting on it. Likewise,

from his desktop Zaluzec can maneuver video cameras in places like Boulder, Colorado,

or Champaign, Illinois, to watch or participate in experiments.

    At Argonne even some meetings about grids are held using grids. With the Access
    Grid, developed by Argonne's Futures Lab for remote group collaboration, scientists
    nationwide convene in a virtual conference room, from large groups such as a 2002
    National Science Foundation meeting, where 28 sites popped in, Star Trek-like, on a white
    Argonne wall, to smaller Thursday test cruises held to keep the system bug-free. At these
    sessions Access Grid programmers Susanne Lefvert and Eric Olson sit at personal
    computers, talking with wall-projected images of scientists from other Energy
    Department labs, including the Princeton Plasma Physics Lab and Lawrence Berkeley
    National Lab.

    By now the Access Grid, first used in 1999, has more than 250 research nodes
    (rooms equipped to connect) on five continents. A major automobile company and some
    oil and gas companies have developed their own Access Grids, notes Futures Lab research
    manager and computer-science doctoral student Mike Papka, SM'02, and Chicago
    researchers also are experimenting with the technology. Last fall Jonathan Silverstein,


    assistant professor of surgery and senior fellow in the joint Argonne/Chicago
    Computation Institute, along with Chicago anesthesiologist Stephen Small and
    Argonne/Chicago computer scientist Rick Stevens, won a National Institutes of Health
    contract to install Access Grid nodes at the U of C Hospitals. Connecting operating rooms,
    the emergency room, radiology, ambulances, and residents' hand-held tablet PCs, the
    three-year prototype project could change the way hospitals process information.
    Students will watch not only real-time operating-room video feeds but also feeds from
    laparoscopic devices and robotic surgeons. Radiologists will beam three-dimensional
    X-ray scans to surgeons, minus middlemen and waiting time. "We are in all these complex
    environments," Silverstein says. The grid allows medical workers literally to share
    environments, eliminate hand-offs, and avoid phone tag; instead of passing messages
    between multiple physicians or waiting before taking the next step, "we could all meet
    for one moment and relay necessary information."

    Then there's the TeraGrid. Launched in 2001 by the National Science Foundation
    with $53 million, the TeraGrid aims to be "the world's largest, most comprehensive,
    distributed infrastructure for open scientific research," its web site declares. Beginning
    with five sites (Argonne; the University of Illinois at Urbana-Champaign; the University of
    California, San Diego; the California Institute of Technology; and the Pittsburgh
    Supercomputing Center), the project has since picked up four more partners. To be
    finished by late September, TeraGrid executive director Charlie Catlett says, it will have 20
    teraflops (a teraflop equals a trillion operations per second) of computing power and a
    petabyte of storage space. Many of its sites, the web page says, already boast a cross-
    country network backbone four times faster than the fastest research networks currently
    in existence.

    The TeraGrid aims to revolutionize the speed at which science operates. The
    multi-institutional MIMD Lattice Computation collaboration, for instance, which tests
    quantum chromodynamic theory and helps interpret high-energy accelerator
    experiments, uses more than 2 million processor-hours of computer time per year, and
    needs more. Another project, NAMD, a parallel molecular dynamics code designed to
    simulate large biomolecular systems, has maxed out the fastest system available. On the
    TeraGrid, already used by some projects, such research can move forward.

    Sharing resources, a practice known as distributed computing, goes back to
    computers' early days. In the late 1950s and early 1960s researchers realized that the
    machines, then costing tens or even hundreds of thousands of dollars, needed to be used
    more efficiently. Because the computers spent much time idly waiting for human input,
    the researchers reasoned, multiple users could share them by doling out that
    unemployed power. Today


    computers are cheaper, but they're still underutilized (five percent usage is normal,
    Foster says), which is one reason many companies connect their computers to form
    unified networks. In a sense grids are simply another variety of distributed computing,
    now used in many forms. Cluster computing, for example, links multiple PCs to replace
    unwieldy mainframes or supercomputers. In peer-to-peer computing, such as Napster,
    users who have downloaded specific software can connect to each other and share files.
    And there's internet computing, most notably SETI@home, a virtual supercomputer based
    at the University of California, Berkeley, that analyzes data from Puerto Rico's Arecibo
    radio telescope to find signs of extraterrestrial intelligence. PC users download
    SETI@home's screen-saver program, and when their computers are otherwise idle they
    retrieve data from the internet, process it, and send the results to a central processing
    system.

    But a lot had to happen between the grid's earliest inklings and its current test
    beds. Foster, who switched from studying math and chemistry to computer science at
    New Zealand's University of Canterbury before earning a doctorate in the field at
    London's Imperial College, came to Argonne in 1989. Programming specialized languages
    for computing chemistry codes, he used parallel networks, similar to clusters. "High-speed
    networks were starting to appear," he writes in the April 2003 Scientific American, "and it
    became clear that if we could integrate digital resources and activities across networks, it
    could transform the process of scientific work." Indeed, research was occurring more and
    more on an international scale, with scientists from different institutions trying to share
    data that was growing exponentially. In 1994 Foster refocused his research on distributed
    computing. With Steven Tuecke, today the lead software architect in Argonne's
    Distributed Systems Laboratory, and Carl Kesselman, now director of the Center for Grid
    Technologies at the University of Southern California's Information Sciences Institute, he
    began the Globus Project, a software system for international scientific collaboration. In
    the same way that internet protocols became standard for the web, creating a common
    language and tools, they envisioned Globus software that would link sites into a virtual
    organization, with standardized methods to authenticate identities, authorize specific
    activities, and control data movement.

    The concept was quickly put to use. At a 1995 supercomputing conference Rick
    Stevens, who also directs Argonne's math and computer-science division, and Thomas A.
    DeFanti, director of the University of Illinois at Chicago's Electronic Visualization Lab,
    headed a prototype project, called I-WAY (Information Wide Area Year), that linked 17
    high-speed research networks for two weeks. Foster's team developed the software that,
    he writes in Scientific American, knitted the sites into a single virtual system, so users
    could log on once, locate suitable computers, reserve time, load application codes, and
    then monitor their execution. Scientists performed computationally complicated


    simulations such as colliding neutron stars and moving cloud patterns around the
    planet. "It was the Woodstock of the grid," Larry Smarr, the conference's program chair
    and now director of the California Institute for Telecommunications and Information
    Technology, told the New York Times last July, "everyone not sleeping for three days,
    running around and engaged in a kind of scientific performance art."

    The experience inspired much enthusiasm, and funding. The U.S. Defense
    Advanced Research Projects Agency gave the Globus Project $800,000 a year for three
    years. In 1997 Foster's team unveiled the first version of the Globus Toolkit, the software
    that does the knitting. The National Science Foundation, NASA, and the Energy
    Department began grid projects, with Globus underlying them all. And while Foster and
    his crew have used an open-source approach to develop the technology, making the
    software freely available and its code open for outside programmers to read and modify,
    in 1998 he and his colleagues also began the Global Grid Forum, a group that meets three
    times a year to adopt basic language and infrastructure standards. Such standards, Foster
    writes in "What Is the Grid?" (July 2002), allow users to collaborate with any interested
    party and thus to create something more than "a plethora of balkanized, incompatible,
    non-interoperable distributed systems."

    The Globus Toolkit, named the most promising new technology by R&D
    Magazine in 2002, a top-ten emerging technology by Technology Review in 2003, and
    given a Chicago Innovation Award last year by the Sun-Times, still needs work to perfect
    security and other measures. But the open-source model, much like that used to develop
    the internet, has proved useful in ferreting out bugs and making improvements. When
    physicists overloaded one grid system by submitting tens of thousands of tasks at once,
    for example, the University of Wisconsin helped design applications to manage a grid's
    many users. As the technology moves from research institutions, whose data is stored
    mostly in electronic files, to corporations, which favor databases, the UK's e-Science
    program is developing ways to handle the different systems.

    Without the open-source approach, Foster says, the software might not have
    become the de facto standard for most grid projects, and IBM, the Globus Toolkit's sole
    corporate funder for the past three years, wouldn't have taken such an active role.
    Success of the grid depends on everyone adopting it, he says, so it's counterproductive
    to work in private. Brokerage firm Charles Schwab uses a grid developed by IBM to give
    its clients real-time investment advice. The computer company also has projects under
    way with Morgan Stanley and Hewitt Associates. For Foster (the British Computer
    Society's 2002 Lovelace Medal winner and a 2003 American Association for the
    Advancement of Science fellow) such corporate ventures are a critical step in


    making grids, already a powerful scientific tool, important in everyday life, when the
    grid will be as common as the internet, and as seamless. In the 1960s MIT's Fernando
    Corbató, whom Foster calls the father of time-sharing operating systems, described
    shared computing as a utility, meaning computer access would operate like water, gas,
    and electricity, where a client would connect and pay by usage amount. Today the grid
    is envisioned similarly, and "utility computing" is used synonymously.

    But when grids will become so ubiquitous remains a big question. Even on a full
    night's sleep Foster, today's father figure, hesitates to guess beyond "that's some way
    out," happily encouraging his virtual child but not wanting to impose unrealistic
    expectations. "It's a process," he says. Although large grids are running in both the
    United States and Europe, and Foster skipped the March Global Grid Forum meeting in
    Berlin to talk up grids in his homeland New Zealand, "we haven't nailed down all the
    standards. There's more to be done." It's a global, multi-industry path he's forging, and
    if he can't predict where the next generation will head, he's prepared the grid to lead the
    way.

    Current Projects And Applications

    Grids offer a way to solve grand challenge problems like protein folding,
    financial modeling, earthquake simulation, and climate/weather modeling. Grids offer a
    way of using information technology resources optimally inside an organization.
    They also provide a means of offering information technology as a utility to
    commercial and non-commercial clients, with those clients paying only for what they
    use, as with electricity or water.

    Grid computing is presently being applied successfully by the National Science
    Foundation's National Technology Grid, NASA's Information Power Grid, Pratt & Whitney,
    Bristol-Myers Squibb Co., and American Express.

    One of the most famous cycle-scavenging networks is SETI@home, which was
    using more than 3 million computers to achieve 23.37 sustained teraflops (979 lifetime
    teraflops) as of September 2001.

    As of May 2005, Folding@home had achieved peaks of 186 teraflops on over
    160,000 machines.

    As of August 2009, Folding@home achieves more than 4 petaflops on over 350,000
    machines.


    The European Union has been a major proponent of grid computing. Many
    projects have been funded through the Framework Programmes of the European
    Commission. Many of the projects are highlighted below, but two deserve special
    mention: BEinGRID and Enabling Grids for E-sciencE.

    BEinGRID (Business Experiments in Grid) is a research project partly funded by the
    European Commission as an integrated project under the Sixth Framework Programme
    (FP6) sponsorship program. Started on June 1, 2006, the project will run 42 months, until
    November 2009. The project is coordinated by Atos Origin. According to the project fact
    sheet, their mission is "to establish effective routes to foster the adoption of grid
    computing across the EU and to stimulate research into innovative business models
    using grid technologies." To extract best practice and common themes from the
    experimental implementations, two groups of consultants are analyzing a series of
    pilots, one technical, one business. The results of these cross-analyses are provided by
    the website it-tude.com. The project is significant not only for its long duration but also
    for its budget, which, at 24.8 million euros, is the largest of any FP6 integrated project. Of
    this, 15.7 million euros is provided by the European Commission and the remainder by its
    98 contributing partner companies.

    Another well-known project is distributed.net, which was started in 1997 and has

    run a number of successful projects in its history.

    The NASA Advanced Supercomputing (NAS) facility has run genetic algorithms
    using the Condor cycle scavenger on about 350 Sun and SGI workstations.

    Until April 27, 2007, United Devices operated the United Devices Cancer Research
    Project based on its Grid MP product, which cycle-scavenges on volunteer PCs connected
    to the internet. As of June 2005, the Grid MP ran on about 3,100,000 machines.

    The Enabling Grids for E-sciencE (EGEE) project, which is based in the European
    Union and includes sites in Asia and the United States, is a follow-up project to the
    European DataGrid (EDG) and is arguably the largest computing grid on the planet. This,
    along with the LHC Computing Grid (LCG), has been developed to support the
    experiments using the CERN Large Hadron Collider. The LCG project is driven by CERN's
    need to handle huge amounts of data, where storage rates of several gigabytes per
    second (10 petabytes per year) are required. A list of active sites participating within LCG
    can be found online, as can real-time monitoring of the EGEE infrastructure. The relevant
    software and documentation is also publicly accessible.


    Fastest Virtual Supercomputers

    BOINC - 525 teraflops (as of 4 June 2007)

    Definitions

    Today there are many definitions of grid computing:

    The definitive definition of a grid is provided by Ian Foster in his
    article "What Is the Grid? A Three Point Checklist". The three points of
    this checklist are:

    Computing resources are not administered centrally.

    Open standards are used.

    Non-trivial quality of service is achieved.

    Plaszczak and Wellner define grid technology as "the technology that
    enables resource virtualization, on-demand provisioning, and service
    (resource) sharing between organizations."

    IBM defines grid computing as "the ability, using a set of open
    standards and protocols, to gain access to applications and data,
    processing power, storage capacity and a vast array of other
    computing resources over the internet. A grid is a type of parallel and
    distributed system that enables the sharing, selection, and
    aggregation of resources distributed across 'multiple' administrative
    domains based on their (resources') availability, capacity,
    performance, cost and users' quality-of-service requirements."

    An earlier example of the notion of computing as utility came in 1965
    from MIT's Fernando Corbató. Corbató and the other designers of the
    Multics operating system envisioned a computer facility operating
    "like a power company or water company".

    Buyya (Dr. Rajkumar Buyya is a Senior Lecturer and the StorageTek Fellow of Grid Computing in the Department of Computer Science

    and Software Engineering at the University of Melbourne, Australia)

    defines a grid as "a type of parallel and distributed system that


    enables the sharing, selection, and aggregation of geographically

    distributed autonomous resources dynamically at runtime depending

    on their availability, capability, performance, cost, and users' quality-

    of-service requirements".

    CERN, one of the largest users of grid technology, talks of the grid as "a
    service for sharing computer power and data storage capacity over
    the internet."

    Pragmatically, grid computing is attractive to geographically
    distributed non-profit collaborative research efforts, like the NCSA
    bioinformatics grids such as BIRN: external grids.

    Grid computing is also attractive to large commercial enterprises with
    complex computation problems who aim to fully exploit their internal
    computing power: internal grids.

    ServePath.com defines grid computing as "the simultaneous application of
    multiple computers to a problem that typically requires access to
    significant amounts of data or a large number of computer processing
    cycles. Grid computing is quickly gaining popularity due to its ability to
    maximize the efficiency of computing sources as well as its ability to
    solve large problems with considerably less computing power."

    Grids can be categorized with a three-stage model of departmental grids,
    enterprise grids and global grids. These correspond to a firm initially utilising resources
    within a single group, i.e., an engineering department connecting desktop machines,
    clusters and equipment. This progresses to enterprise grids, where non-technical staff's
    computing resources can be used for cycle-stealing and storage. A global grid is a
    connection of enterprise and departmental grids that can be used in a commercial or
    collaborative manner.

    But What Does "High Performance" Mean?

    Performance is measured in flops (floating-point operations per second). A flop is
    one basic computational operation, like adding two numbers together; a gigaflop is a
    billion such operations per second.
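    A flop rate can be illustrated by timing a fixed number of operations and dividing by the elapsed time. The sketch below is illustrative only: interpreted Python is orders of magnitude slower than optimized numerical code, so it measures megaflops rather than gigaflops.

    ```python
    import time

    # Rough illustration of measuring a flop rate: time a fixed number of
    # floating-point multiply-adds and divide by the elapsed time.
    def measure_flops(n=1_000_000):
        a, b, acc = 1.0000001, 0.9999999, 0.0
        start = time.perf_counter()
        for _ in range(n):
            acc += a * b          # one multiply + one add = 2 flops
        elapsed = time.perf_counter() - start
        return (2 * n) / elapsed  # floating-point operations per second

    rate = measure_flops()
    print(f"{rate / 1e6:.1f} megaflops")
    ```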

    The Death Of Distance


    Computing grids use international networks to link computing resources from all
    over the world. This means you can sit in France and use computers in the U.S., or work
    from Australia using computers in Taiwan.

    Such international grids are possible today because of the impressive
    development of networking technology. Ten years ago, it would have been impractical
    to send large amounts of data across the globe for processing on other computer
    resources, because of the time taken to transfer the data. Today, all this is possible and
    more.

    Pushed by the internet economy and the widespread penetration of optical fibers
    in telecommunications systems, the performance of wide area networks has been
    doubling every nine months or so over the last few years. Compounded, that rate yields
    roughly a 3,000-fold improvement in under nine years. Imagine if cars had made the
    same improvement in speed since 1985: you could easily go into orbit by pressing down
    hard on the accelerator!
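    The compounding behind that claim is easy to verify: doubling every nine months means growth by a factor of 2^(m/9) after m months.

    ```python
    import math

    # Compound growth implied by "doubling every nine months":
    # after m months, performance has grown by a factor of 2**(m/9).
    def improvement(months, doubling_period=9):
        return 2 ** (months / doubling_period)

    # Months needed to reach a 3,000-fold improvement:
    months_to_3000x = math.log2(3000) * 9
    print(f"{months_to_3000x / 12:.1f} years to 3000x")  # ~8.7 years
    print(f"{improvement(9 * 12):.0f}x after 9 years")   # 2**12 = 4096x
    ```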

    Faster! Faster!

    Some researchers have computing needs that make even the fastest connections
    seem slow: some scientists need even higher-speed connectivity, up to tens of gigabits
    per second (Gbps); others need ultra-low "latency", meaning minimal delay when
    sending data to remote colleagues in "real time".

    Other researchers want "just-in-time" delivery of data across a grid, so that

    complicated calculations requiring constant communication between processors can be

    performed. To avoid communication bottlenecks, grid developers also have to determine

    ways to compensate for failures, like transmission errors or PC crashes.

    To meet such critical requirements, several high-performance networking issues
    have to be solved, including the optimization of transport protocols and the
    development of technical solutions such as high-performance Ethernet switching.

    Secure Access

    Secure access to shared resources is one of the most challenging areas of grid

    development.

    To ensure secure access, grid developers and users need to manage three

    important things:


    Access policy - what is shared? Who is allowed to share? When can
    sharing occur?

    Authentication - how do you identify a user or resource?

    Authorization - how do you determine whether a certain operation is
    consistent with the rules?
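    The division of labour between these three checks can be sketched in a few lines. Everything here (user names, credential strings, the policy table) is invented for illustration and is not any real grid middleware API; real systems use certificates and distributed policy services rather than in-memory dictionaries.

    ```python
    # Toy sketch of the three checks above: policy, authentication, authorization.
    # All names and rules are hypothetical, invented for illustration.
    USERS = {"alice": "cern-cert-42"}                 # authentication records
    POLICY = {("alice", "cpu"): {"submit_job"},       # (user, resource) -> allowed ops
              ("alice", "storage"): {"read"}}

    def authenticate(user, credential):
        """Identify the user: does the presented credential match?"""
        return USERS.get(user) == credential

    def authorize(user, resource, operation):
        """Check the access policy: is this operation allowed here?"""
        return operation in POLICY.get((user, resource), set())

    assert authenticate("alice", "cern-cert-42")
    assert authorize("alice", "cpu", "submit_job")
    assert not authorize("alice", "storage", "submit_job")
    ```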

    Grids need to keep track of all this information efficiently, and it may change from
    day to day. This means that grids need to be extremely flexible, and to have a reliable
    accounting mechanism. Ultimately, such accounting will be used to decide pricing
    policies for using a grid.

    These accounting challenges are not new: the same questions arise whenever
    you use your credit card in a café. But grid users must share resources, and so grids
    require new solutions. Imagine if the owner of a café were to lend some tables to another
    café... how would you securely track customers, orders and payments?

    Security And Trust

    The issue of security is linked to trust: you may trust the other users, but do you

    trust that your data and applications are securely protected on their shared machines?

    Without adequate security, someone could read or modify your data - hence the

    warnings about security when you use your credit card on the internet.

    The issue of security concerns all information technologies and is taken very

    seriously. New security solutions are constantly being developed, including

    sophisticated data encryption techniques. But it is a never-ending race to stay ahead of

    malicious hackers.

    Resource Use

    Grids allow you to efficiently and automatically spread your work across many
    computer resources. The result? Your jobs are finished much faster.

    Imagine if you had to do 1000 difficult maths questions. You could do them
    yourself, or you could use a computing grid. If you used a grid of 100 computers, you
    would give one question or "job" to each computer. When a computer finished one "job",
    it would be given the next one, until all the questions were answered.
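    The hand-out-jobs-as-workers-free-up pattern above can be sketched on a single machine with a worker pool. This is only an analogy: a real grid adds networking, authentication and scheduling middleware, but the work-distribution idea is the same.

    ```python
    # Single-machine sketch of the scatter/gather idea above: 1000 independent
    # "questions" handed out to a pool of 100 workers as each one becomes free.
    from concurrent.futures import ThreadPoolExecutor

    def solve(question):
        # Stand-in for one "difficult maths question".
        return question * question

    with ThreadPoolExecutor(max_workers=100) as pool:
        answers = list(pool.map(solve, range(1000)))

    print(len(answers), answers[:3])  # 1000 [0, 1, 4]
    ```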


    The Open Grid Forum is a standards body for the grid community. With more
    than 5,000 volunteer members, this body is a significant force for setting standards and
    community developments.

    The Middleware

    "Middleware" is the software that organizes and integrates the resources in a
    grid.

    Middleware is made up of many software programs, containing hundreds of
    thousands of lines of computer code. Together, this code automates all the "machine to
    machine" (M2M) interactions that create a single, seamless computational grid.

    Agents, Brokers And Striking Deals

    Middleware automatically negotiates deals in which resources are exchanged,
    passing from a grid resource provider to a grid user. In these deals, some middleware
    programs act as "agents" and others as "brokers".

    Agent programs present "metadata" (data about data) that describes users, data
    and resources. Broker programs undertake the M2M negotiations required for user
    authentication and authorization, and then strike the "deals" for access to, and payment
    for, specific data and resources.

    Once a deal is set, the broker schedules the necessary computational activities

    and oversees the data transfers. At the same time, special "housekeeping" agents

    optimize network routings and monitor quality of service.

    And all this occurs automatically, in a fraction of the time that it would take

    humans at their computers to do manually.
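    The matchmaking step at the heart of brokering can be caricatured in a few lines: resource "agents" publish metadata, and a broker matches a user's request against it. This is entirely illustrative (the resource records and selection rule are invented); real middleware adds protocols, certificates, scheduling and accounting.

    ```python
    # Minimal caricature of broker-style matchmaking. The resource records
    # and the cheapest-match rule are hypothetical, for illustration only.
    resources = [
        {"name": "cluster-a", "cpus": 64,  "price": 0.10},
        {"name": "cluster-b", "cpus": 256, "price": 0.25},
    ]

    def broker(request):
        """Pick the cheapest resource whose metadata satisfies the request."""
        candidates = [r for r in resources if r["cpus"] >= request["cpus"]]
        return min(candidates, key=lambda r: r["price"], default=None)

    deal = broker({"cpus": 128})
    print(deal["name"])  # cluster-b: the only resource with enough CPUs
    ```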

    Delving Inside Middleware

    There are several layers within middleware itself. For example,
    middleware includes a layer of "resource and connectivity protocols", and a higher layer
    of "collective services".


    Resource and connectivity protocols handle all grid-specific network transactions

    between different computers and grid resources. For example, computers contributing to

    a particular grid must recognize grid-relevant messages and ignore the rest. This is done

    with communication protocols, which allow the resources to communicate with each

    other, enabling exchange of data, and authentication protocols, which provide secure

    mechanisms for verifying the identity of both users and resources.

    The collective services are also based on protocols: information protocols, which

    obtain information about the structure and state of the resources on a grid, and

    management protocols, which negotiate uniform access to the resources. Collective

    services include:

    Updating directories of available resources

    Brokering resources (which, like stockbroking, is about negotiating
    between those who want to "buy" resources and those who want to "sell")

    Monitoring and diagnosing problems

    Replicating data so that multiple copies are available at different locations
    for ease of use

    Providing membership/policy services for tracking who is allowed to do
    what and when.
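    Of the collective services listed above, data replication is the easiest to sketch: a replica catalogue maps a logical file name to the physical sites holding copies. The snippet is a toy invented for illustration, not a real grid catalogue API.

    ```python
    # Toy replica catalogue: maps a logical file name to the set of sites
    # that hold a physical copy. Hypothetical, for illustration only.
    replicas = {}

    def register(logical_name, location):
        """Record that a copy of the file exists at the given site."""
        replicas.setdefault(logical_name, set()).add(location)

    def locate(logical_name):
        """Return every known location of the file, in a stable order."""
        return sorted(replicas.get(logical_name, set()))

    register("sky-survey/scan-001", "fermilab")
    register("sky-survey/scan-001", "argonne")
    print(locate("sky-survey/scan-001"))  # ['argonne', 'fermilab']
    ```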

    Globus Toolkit

    The Globus Toolkit is a popular example of grid middleware. It is a set of tools for
    constructing a grid, covering security measures, resource location, resource
    management, communications and so on.

    Many major grid projects use the Globus Toolkit, which is being developed by the
    Globus Alliance, a team primarily involving Ian Foster's group at Argonne National
    Laboratory and Carl Kesselman's group at the University of Southern California in Los
    Angeles.

    Many of the protocols and functions defined by the Globus Toolkit are similar to
    those in networking and storage today, but have been optimized for grid-specific
    deployments.


    National Grids

    National grids like those listed below combine national computing resources to
    create powerful grid computing resources.

    D-Grid
    DutchGrid
    ENEAGRID
    Fermilab Computing Division
    Grid-Ireland
    HunGrid
    National Grid Service
    NERSC
    NorGrid
    SweGrid
    TeraGrid
    Thai National Grid
    TWGrid

    Project details: D-Grid (Germany)

    Synopsis: The first D-Grid projects started in September 2005 with the goal
    of developing a distributed, integrated resource platform for
    high-performance computing and related services to enable
    the processing of large amounts of scientific data and
    information.

    Project details: DutchGrid (the Netherlands)

    Synopsis: DutchGrid is the platform for grid computing and technology
    in the Netherlands. Open to all institutions for research and
    test-bed activities, DutchGrid aims to coordinate various grid
    deployment efforts and to offer a forum for the exchange of
    experiences on grid technologies.

    Project details: ENEAGRID (Italy)

    Synopsis: ENEAGRID makes use of grid technologies to provide an
    integrated production environment including all the high-performance
    and high-throughput computational resources available in ENEA, the
    Italian National Agency for New Technologies, Energy and the
    Environment. Interoperability with other grid infrastructures is
    currently in operation.

    Project details: Fermilab Computing Division (Fermilab, in the U.S.)


    Synopsis: FermiGrid united Fermilab's computing resources into a single
    grid infrastructure, changing the way that computing was
    done at the lab by improving efficiency and making better use
    of resources. It is now involved in developing and supporting
    innovative computing solutions and services for Fermilab.

    Project details: Grid-Ireland (Ireland)
    Synopsis: Grid-Ireland fosters and promotes grid activities in Ireland, involving partners across the country.

    Project details: HunGrid (Hungary)
    Synopsis: HunGrid is the first official Hungarian virtual organization of EGEE. Its goal is to allow grid users at Hungarian academic and educational institutes to perform the computing activities relevant to their research; the VO thus functions as a catch-all VO for all Hungarian participants that do not (yet) have an established VO in their respective field of research. It is also an EGEE testing environment for Hungarian research communities interested in starting their own virtual organizations.

    Project details: National Grid Service (UK)
    Synopsis: The NGS aims to provide coherent electronic access for UK researchers to all computational and data-based resources and facilities.

    Project details: NERSC (National Energy Research Scientific Computing Center, U.S.)
    Synopsis: Users can access several NERSC resources via Globus grid interfaces using X.509 grid certificates. NERSC is part of the Open Science Grid (OSG) and is available to select OSG virtual organizations for compute and storage resources.

    Project details: NorGrid (Norway)


    International Grids

    AP Grid
    D4Science
    DEISA
    EELA-2
    EGEE
    EGI_DS
    EUAsiaGrid
    EU-IndiaGrid
    GÉANT
    GridPP
    LCG
    NextGRID
    NorduGrid
    Open Grid Forum
    OGF-Europe
    Open Science Grid
    PRAGMA
    WINDS

    Project details: AP Grid (Asia-Pacific Grid)
    Synopsis: AP Grid is a partnership for grid computing in the Asia-Pacific region, aiming to share technologies, resources and knowledge in order to build, nurture and promote grid technologies and applications. Partners come from 15 countries in the Asia-Pacific and beyond.

    Project details: D4Science (Distributed Collaboratories Infrastructure on Grid-Enabled Technology 4 Science)
    Synopsis: D4Science aims to create grid-based and data-centric e-infrastructures to support scientific research. It is co-funded by the European Commission until 2010 and involves partners across Europe.

    Project details: DEISA (Distributed European Infrastructure for Supercomputing Applications)
    Synopsis: DEISA combines the power of supercomputing centres across Europe to accelerate scientific research.

    Project details: EELA (E-science for Europe and Latin America)
    Synopsis: EELA aims to provide grid facilities that promote scientific collaboration between Europe and Latin America, while ensuring the long-term sustainability of the e-infrastructure.

    Project details: EGEE (Enabling Grids for E-science)


    Synopsis: EGEE is the largest multi-disciplinary grid infrastructure in the world, bringing together more than 120 organisations to provide scientific computing resources to the European and global research community. EGEE comprises 250 sites in 48 countries and more than 68,000 CPUs, available to some 8,000 users 24 hours a day, 7 days a week.

    Project details: EGI_DS (European Grid Initiative Design Study)
    Synopsis: The European Grid Initiative Design Study aims to establish a sustainable grid infrastructure in Europe. Driven by the needs and requirements of the research community, it is expected to enable the next leap in research infrastructures, thereby supporting collaborative scientific discoveries in the European Research Area. EGI_DS includes partners across Europe.

    Project details: EUAsiaGrid (collaboration between Europe and Asia)
    Synopsis: EUAsiaGrid aims to pave the way towards an Asian e-science grid infrastructure, in synergy with the other grid initiatives in Europe and Asia.

    Project details: EU-IndiaGrid (collaboration between Europe and India)
    Synopsis: EU-IndiaGrid will bring together over 500 multidisciplinary organisations to build a grid-enabled e-science community, aiming to boost R&D innovation across Europe and India.

    Project details: GÉANT (pan-European gigabit research network)
    Synopsis: GÉANT provides networking infrastructure to support researchers, as well as an infrastructure for network research. GÉANT aims for high-speed connectivity, geographical expansion, global connectivity and guaranteed quality of service. It comprises 27 European national research and education networks.

    Project details: GridPP (Grid for UK Particle Physics)
    Synopsis: GridPP is a collaboration of particle physicists and computing scientists from the UK and CERN who are building a grid for particle physics. The main objective is to develop and deploy a large-scale science grid in the UK for use by the worldwide particle physics community.

    Project details: LCG (Worldwide LHC Computing Grid)
    Synopsis: The mission of the LHC Computing Grid project (LCG) is to build and maintain a data storage and analysis infrastructure for the entire high-energy physics community that will use the Large Hadron Collider.

    Project details: NextGRID (supporting mainstream use of grids)
    Synopsis: NextGRID aims to enable the widespread use of grids by research, industry and the ordinary citizen, thus creating a dynamic marketplace for new services and products. It is an EU-funded project with multiple partners.

    Project details: NorduGrid (grids in the Nordic region)
    Synopsis: NorduGrid is a grid research and development collaboration aiming at the development, maintenance and support of a free grid middleware known as the Advanced Resource Connector (ARC). The collaboration was established by five Nordic academic institutes and is based upon a memorandum of understanding.

    Project details: Open Grid Forum (international grid standards)
    Synopsis: The Open Grid Forum is a community-initiated forum of more than 5,000 people interested in distributed computing and grid technologies. OGF aims to promote and support grid technologies through the creation and documentation of "best practices": technical specifications, user experiences, and implementation guidelines. It involves more than 400 organizations from 50 countries.

    Project details: OGF-Europe (European and international grid standards)
    Synopsis: OGF-Europe works closely with the Open Grid Forum and plays a key role in driving global standardisation efforts and in bringing best practices into the European computing environment.


    Project details: Open Science Grid (open grid infrastructure for collaborative science)
    Synopsis: The Open Science Grid consortium provides an open grid infrastructure for science in the U.S. and beyond. OSG combines resources at many U.S. labs and universities and provides access to shared resources for the benefit of scientific applications.

    Project details: PRAGMA (Pacific Rim Applications and Grid Middleware Assembly)
    Synopsis: PRAGMA is an open organization in which Pacific Rim institutions collaborate to develop grid-enabled applications and to deploy the infrastructure throughout the Pacific region. PRAGMA aims to enhance current collaborations and connections, build new collaborations, and formalize resource-sharing agreements.

    Project details: WINDS (grid collaboration in Europe, Latin America and the Caribbean)
    Synopsis: The www.winds-lac.eu platform, maintained by the WINDS-LA and WINDS-CARIBE projects, aims to further develop and support ICT research and development collaboration between Europe, Latin America and the Caribbean by identifying common needs, research issues and opportunities for cooperation, promoting research excellence from the regions in Europe, and proposing a long-term cooperation strategy in the field of ICT research.

    High-Throughput Problems

    High-throughput applications are problems that can be divided into many independent tasks. Computing grids can be used to schedule these tasks, dealing them out to the different processors in the grid. As soon as a processor finishes one task, the next one arrives. In this way, hundreds of tasks can be performed in a very short time.
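    The scheduling pattern described above, in which each processor is handed a new task the moment it finishes the previous one, can be sketched with an ordinary worker pool. This is an illustrative sketch only, not grid middleware; the analyse function and the task list are hypothetical stand-ins for real grid jobs.

    ```python
    import concurrent.futures

    def analyse(task_id):
        # Hypothetical stand-in for one independent unit of work,
        # e.g. analysing a single particle-collision event.
        return task_id * task_id

    # The pool plays the role of the grid's processors: each worker is dealt
    # a new task as soon as it finishes its current one, until none remain.
    tasks = list(range(100))
    with concurrent.futures.ThreadPoolExecutor(max_workers=4) as pool:
        results = list(pool.map(analyse, tasks))

    print(len(results))  # 100 tasks completed
    ```

    Because every task is independent, the workers never need to coordinate with each other, which is exactly what makes high-throughput problems such a good fit for grids.
    
    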

    Examples of high-throughput applications include:

    - The analysis of thousands of particle collisions in a bid to understand more about our universe, as in the Large Hadron Collider Computing Grid.
    - The analysis of thousands of molecules in a bid to discover a drug candidate against a specific malaria protein, as part of the grid-enabled WISDOM project.
    - The analysis of thousands of protein-folding configurations in a bid to discover more efficient ways of packaging drug proteins, using Rosetta software on the Open Science Grid.
    - The use of volunteer computing to power applications including SETI@home, which aids the search for extraterrestrial intelligence; FightAIDS@Home, which models the evolution of drug resistance and helps to design new anti-HIV drugs; and BRaTS@Home, which works on gravitational ray tracing. The tasks in these "@home" projects are totally independent, so it does not matter if some take a long time: after a "time-out" period, unfinished tasks are simply sent elsewhere to be processed.
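    The "time-out and resend" behaviour can be sketched in a few lines. This is a toy model, assuming a hypothetical unreliable worker that sometimes returns nothing (standing in for a volunteer machine that never reports back); none of the names come from any real volunteer-computing client.

    ```python
    import random

    def unreliable_worker(task):
        # Hypothetical volunteer machine: about 30% of the time it never
        # returns a result, which we treat as a timed-out task.
        if random.random() < 0.3:
            return None
        return task * 2

    def run_with_resend(tasks):
        """Keep resending tasks until every one has a result. Because the
        tasks are independent, resending one never invalidates other work."""
        results = {}
        outstanding = list(tasks)
        while outstanding:
            task = outstanding.pop(0)
            result = unreliable_worker(task)
            if result is None:
                outstanding.append(task)  # "time-out": send it elsewhere
            else:
                results[task] = result
        return results

    completed = run_with_resend(range(10))
    print(sorted(completed))
    ```

    The loop terminates because each resend is a fresh, independent attempt; no bookkeeping about the failed worker is needed.
    
    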

    High-Performance Problems

    When people talk about "high-performance computing", or HPC, they are generally talking about supercomputing. Supercomputers are different from computing grids: where grids link computers distributed around an institution, a country or the world, a supercomputer is one giant machine in a single room.

    Supercomputers generally deal with computation-centric problems; the secret to solving them is teraflops: as many as possible.

    Grid computing allows large computational resources to be combined, helping

    scientists to tackle problems that cannot be solved on a single system, or to solve

    problems much more quickly.
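    A back-of-the-envelope calculation shows why combining resources matters for such problems. The numbers below are hypothetical, and the ideal speed-up ignores scheduling and data-transfer overhead.

    ```python
    # Hypothetical workload: many independent analysis jobs.
    tasks = 10_000          # number of independent jobs
    hours_per_task = 1.0    # time each job needs on a single processor
    processors = 1_000      # processors contributed by the grid

    serial_hours = tasks * hours_per_task
    grid_hours = serial_hours / processors  # ideal case, ignoring overhead

    print(serial_hours)  # 10000.0 hours on one machine
    print(grid_hours)    # 10.0 hours across the grid
    ```

    Even with substantial overhead, turning a year-long serial computation into days of grid time is what makes many of the projects listed above feasible at all.
    
    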


    Works Cited

    GridCafe. (2010, January). Retrieved from http://www.gridcafe.org
    Wikipedia. (2010, January). Retrieved from http://www.wikipedia.com
    Braverman, A. M. (2004, April). Father of Grid Computing. University of Chicago Magazine. Retrieved from http://uchicago.edu
    Buyya, R., & Venugopal, S. (2005, July). A Gentle Introduction to Grid Computing and Technologies.
    Educause Learning Initiative. (2006, January). 7 Things You Should Know About... Grid Computing. Retrieved from http://www.educause.edu/eli
    Jacob, B., Brown, M., Fukui, K., & Trivedi, N. (2005, December). Introduction to Grid Computing. IBM Redbooks. Retrieved from http://www.ibm.com/redbooks
    Berstis, V. (2001). Fundamentals of Grid Computing. IBM Redbooks. Retrieved from http://www.ibm.com/redbooks