Next Generation Cyber-Infrastructure: Integrating Peer-based and Grid Systems

Xiaodong Zhang, College of William and Mary

National Science Foundation

This talk does not necessarily reflect NSF's official opinions.

Hardware Cost and Implications

Storage is large and cheap.

Information and computing are available everywhere.

Major challenges: distributed resource management, security and privacy, availability, and reliability.

Cost of computing power:
1980: $400,000/MIPS (Cray-1)
1990: $250/MIPS (i860)
2002: $1/MIPS or less
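The cost figures above imply a steep, compounding decline. A minimal sketch of that arithmetic, assuming the three data points belong to 1980, 1990, and 2002 as read from the original chart and that the rate is constant between points (the function name is illustrative):

```python
def annual_decline_factor(cost_start: float, cost_end: float, years: int) -> float:
    """Average factor by which cost per MIPS fell each year, assuming a constant rate."""
    return (cost_start / cost_end) ** (1.0 / years)


# (year, dollars per MIPS) as read from the slide; 2002 is an upper bound ("or less").
points = [(1980, 400_000.0), (1990, 250.0), (2002, 1.0)]

for (y0, c0), (y1, c1) in zip(points, points[1:]):
    factor = annual_decline_factor(c0, c1, y1 - y0)
    print(f"{y0}-{y1}: cost per MIPS fell about {factor:.2f}x per year")
```

This works out to roughly 2.1x per year from 1980 to 1990 and at least 1.6x per year from 1990 to 2002.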

Impact on US Computer Exports

Speed limits on computer exports to Russia, China, India, and Middle East countries, in Millions of Theoretical Operations Per Second (MTOPS):

Before 2001: 28,000 MTOPS, less powerful than a cluster of ten 1.5 GHz 2-way PCs.

2001: 85,000 MTOPS, less powerful than a cluster of ten 2.2 GHz 4-way PCs.

2002: 195,000 MTOPS, less powerful than a cluster of ten 3 GHz 8-way PCs.
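The comparisons above are consistent with a deliberately naive estimate that counts one operation per cycle per processor. The sketch below assumes exactly that; the official MTOPS rating formula weights processors and interconnects differently, so treat this only as a back-of-the-envelope check (the function name is hypothetical):

```python
def naive_peak_mtops(nodes: int, cpus_per_node: int, ghz: float) -> float:
    """Peak millions of operations per second, assuming 1 operation per cycle per CPU."""
    return nodes * cpus_per_node * ghz * 1000.0  # 1 GHz = 1000 million cycles/s


# (export limit in MTOPS, (nodes, CPUs per node, GHz)) as listed on the slide
limits = [
    (28_000, (10, 2, 1.5)),   # before 2001
    (85_000, (10, 4, 2.2)),   # 2001
    (195_000, (10, 8, 3.0)),  # 2002
]

for limit, (nodes, cpus, ghz) in limits:
    estimate = naive_peak_mtops(nodes, cpus, ghz)
    print(f"limit {limit:>7,} MTOPS vs. cluster estimate {estimate:>9,.0f}")
```

In each case the naive cluster estimate (30,000, 88,000, and 240,000) exceeds the export limit, matching the slide's point.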

MTOPS Hardly Reflects Reality

MTOPS views a computer as a high-performance calculator:
- it ignores the deep memory hierarchy,
- it ignores fast internal interconnections,
- it ignores the power of clusters, and
- it ignores resource sharing over the Internet.

The Senate passed a bill to remove MTOPS on 9/6/01. Computing power is mainly determined by the effective utilization of aggregated networked resources.

Commodity-Processor-Based Clusters

Cluster technology has matured, providing sufficient computing resources for 90% of applications. Dawning-4000A is ranked number 10 in the Top 500.

Who takes care of the remaining 10%, the ultra-scale applications? High-end systems, which address the problems of:
- Scalability: scaling the system to tens of thousands of nodes.
- Reliability: keeping the system running for thousands of hours.
- Managing the deep memory hierarchy: fast data delivery.

High-end computing != grid and cluster computing!

Client/Server based IT Infrastructure

Services provided by data/computing centers.

Grid and Web search engines are server-based.

Each server can be built by a distributed cluster.

Inter- and intra-resource coordination.

Services are guaranteed and trusted.

Security is enforced within each server.

Client/Server-based Grid System

Original vision and state-of-the-art grid: a global networking infrastructure connecting multiple high-performance computational resources.

Targeted applications:
- supercomputing across the globe,
- collaborative computing,
- global data repository and data-intensive computing.

Core technology:
- centralized administration (e.g. resource registrations),
- centralized management (e.g. job scheduling).
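A minimal sketch of the two centralized functions named above, resource registration and job scheduling, assuming a single hypothetical `GridScheduler` that places each job on the registered site with the most free CPUs. Real grid middleware is far richer; this only illustrates why one central component sees, and controls, every resource:

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Site:
    name: str
    free_cpus: int


@dataclass
class GridScheduler:
    """Hypothetical central administrator and job scheduler for a client/server grid."""
    sites: Dict[str, Site] = field(default_factory=dict)
    placements: Dict[str, str] = field(default_factory=dict)  # job -> site name

    def register(self, name: str, cpus: int) -> None:
        # Centralized administration: a resource is usable only after it registers here.
        self.sites[name] = Site(name, cpus)

    def submit(self, job: str, cpus_needed: int) -> Optional[str]:
        # Centralized management: place the job on the registered site with the most free CPUs.
        candidates = [s for s in self.sites.values() if s.free_cpus >= cpus_needed]
        if not candidates:
            return None  # no registered site can run this job
        best = max(candidates, key=lambda s: s.free_cpus)
        best.free_cpus -= cpus_needed
        self.placements[job] = best.name
        return best.name


grid = GridScheduler()
grid.register("site-a", 64)
grid.register("site-b", 128)
print(grid.submit("job-1", 100))  # site-b is the only site with 100 free CPUs
print(grid.submit("job-2", 100))  # None: site-b has 28 left, site-a has 64
```

The site names and the "most free CPUs" policy are illustrative assumptions; any central policy would show the same dependency on one scheduler.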

NSF-Sponsored Grid Efforts, 1997 to 2002

Two Partnerships for Advanced Computational Infrastructure (PACI), NCSA at Illinois and NPACI at San Diego, leading 60+ institutions from 27 states.

Missions:
- providing grid computing and data resources,
- developing grid software tools,
- applications on grids,
- education, outreach, and training.

Building National Grid Infrastructure, 2001 to 2004

Distributed Terascale Facility (DTF): 4 DTF sites (NCSA, NPACI, Argonne, and Caltech) providing an aggregate of 14+ teraflops and 450+ terabytes.

Tasks:
- NCSA: 6+ TFs & 240+ TBs, Linux cluster of Itaniums
- NPACI: 4+ TFs & 225+ TBs
- Argonne: 1+ TF IBM cluster, grid & viz. software
- Caltech: 86 TB on-line storage.

Large NSF-Sponsored Grid Projects

- GIOD (Globally Interconnected Object Databases): global data storage and access for particle collider experiments.
- GriPhyN (Grid Physics Network): building global grids for experimental physics studies.
- iVDgL (international Virtual-Data grid Lab): grids for physics/astronomy experiments; data-intensive science; US & EU collaboration.
- NEES (Network for Earthquake Engineering Simulation): shifting from physical tests to simulation (20 grid sites).

Additional NSF Grid Efforts, 2003 to 2005

Enhanced Distributed Terascale Facility: the 4 original DTF sites plus Pittsburgh SC.

Tasks:
- enhancing the existing DTFs’ software and hardware,
- testing large-scale applications,
- widely connecting to users.

Limits of Current Grid Systems

Application scope is narrow, and killer apps are limited.

Increasingly, local clusters will satisfy applications; special ones will be served by custom-designed HEC systems (Earth Simulator, Blue Gene). Global supercomputing is neither cost- nor performance-effective: storing data is much cheaper than transferring it.

Deployment of grids is still not easy.

High cost, handled case by case (e.g. NSF grid projects).

Centralized administration and management limit scalability and create single points of failure.

Beyond Client/Server World: Internet

The rapidly growing Internet services are provided by an increasing number of peers.

Variety of devices: from cell phones to supercomputer centers.

Pervasive computing: access information and services anytime and anywhere.

Client/Server Model is Being Challenged

No single server or search engine can sufficiently cover increasing Web contents.

$2 \times 10^{18}$ bytes/year are generated on the Internet.

But only $3 \times 10^{12}$ bytes/year are available to the public (0.00015%).

Google searches only $1.3 \times 10^{8}$ Web pages.

(Source: IEEE Internet Computing, 2001)
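The percentage on the slide follows directly from the two yearly volumes; a quick check of that ratio:

```python
generated_per_year = 2e18  # bytes/year generated on the Internet (slide figure)
public_per_year = 3e12     # bytes/year available to the public (slide figure)

# Fraction of generated bytes that are public, expressed as a percentage.
public_percent = public_per_year / generated_per_year * 100
print(f"{public_percent:.5f}% of generated bytes are publicly available")
```

The ratio is $1.5 \times 10^{-6}$, i.e. 0.00015%, which confirms the slide's figure.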

Client/Server (continued)

Client/server model seriously limits utilization of available bandwidth and service.

Popular servers and search engines become traffic bottlenecks.

But high-speed networks connecting many clients sit idle.

Computing cycles and information in clients are ignored.

Content Delivery Networks (CDN): A Transition Model

Servers are decentralized (duplicated) throughout the Internet.

The distributed servers are controlled by a centralized authority (headquarters).

Examples: Internet content distributions by Akamai, Overcast, and FFnet.

Both client/server and CDN models have single points of failure.
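A sketch of the CDN model, assuming a hypothetical central controller (the "headquarters") that knows every duplicated server and redirects each client to the replica with the lowest measured latency. The class name and the latency table are made-up illustration data, not taken from the slide:

```python
from typing import Dict


class CdnHeadquarters:
    """Hypothetical central authority that controls a set of duplicated servers."""

    def __init__(self) -> None:
        # replica name -> {client region: measured latency in ms}
        self.replicas: Dict[str, Dict[str, float]] = {}

    def add_replica(self, name: str, latency_by_region: Dict[str, float]) -> None:
        self.replicas[name] = latency_by_region

    def redirect(self, region: str) -> str:
        """Send a client to the replica with the lowest latency for its region.

        Every request depends on this one lookup, so if headquarters fails,
        clients cannot be redirected: the CDN model's single point of failure.
        """
        if not self.replicas:
            raise LookupError("no replicas registered")
        return min(self.replicas,
                   key=lambda r: self.replicas[r].get(region, float("inf")))


hq = CdnHeadquarters()
hq.add_replica("east", {"nyc": 10.0, "sf": 80.0})
hq.add_replica("west", {"nyc": 75.0, "sf": 12.0})
print(hq.redirect("sf"))   # west
print(hq.redirect("nyc"))  # east
```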

A New Paradigm: Peer-oriented Systems

Each peer is both a client (consumer) and a server (producer).

Each peer has the freedom to join and leave at any time.

Huge peer diversity: service ability, storage space, networking speed, and service demand.

A widely decentralized system that opens up both opportunities and new concerns.

Peer-oriented Systems

- Client/server: a single server, e.g. a search engine/grid.
- Content Delivery Networks: duplicated servers, e.g. Akamai.
- Hybrid P2P: peers plus a directory, e.g. Napster.
- Pure P2P: e.g. Freenet & Gnutella.

Objectives and Benefits of P2P

• Adding and removing nodes will not affect P2P performance (system scalability).

• As long as there is no physical break in the network, the target file will always be found.

• Adding more content to P2P will not affect its performance (information scalability).

Peer-oriented Applications

File Sharing: document sharing among peers with little or no central control.

Instant Messaging (IM): immediate voice and file exchanges among peers.

Distributed Processing: a peer can utilize resources available on other, remote peers.

P2P Network Infrastructure

Overlay networks: peers communicate with each other at the application layer.

A peer can make friends with any IP address globally, without considering distance, message types, or the low-level protocols used.

Peers are not required to understand physical networks, creating a new domain of development opportunities.

More on Overlay Networks

Overlay graph: each edge is a TCP connection or a pointer to an IP address.

Overlay maintenance: (1) periodically ping to verify the liveness of peers; (2) delete the edge to a dead peer; (3) a new peer needs to bootstrap.

Overlay problems: (1) topology-unaware; (2) duplicated messages; (3) inefficient network usage.
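The three maintenance steps can be sketched as follows. This is a simplification under stated assumptions: a hypothetical `is_alive` callback stands in for the periodic ping, the whole overlay is held in one adjacency map (in a real overlay each peer pings only its own neighbors), and the bootstrap list of known peers is given:

```python
from typing import Callable, Dict, Iterable, Set

Overlay = Dict[str, Set[str]]  # peer address -> neighbor addresses (overlay edges)


def maintain(overlay: Overlay, is_alive: Callable[[str], bool]) -> None:
    """Steps (1) and (2): ping every peer and delete the edges to dead ones."""
    dead = {p for p in overlay if not is_alive(p)}
    for p in dead:
        overlay.pop(p)
    for neighbors in overlay.values():
        neighbors -= dead  # in-place set difference drops edges to dead peers


def bootstrap(overlay: Overlay, new_peer: str, known_peers: Iterable[str], k: int = 2) -> None:
    """Step (3): a new peer links to up to k live peers from a bootstrap list."""
    overlay.setdefault(new_peer, set())
    added = 0
    for p in known_peers:
        if added == k:
            break
        if p in overlay and p != new_peer:
            overlay[new_peer].add(p)
            overlay[p].add(new_peer)  # overlay edges are treated as undirected
            added += 1
```

For example, `maintain(ov, lambda p: p != "c")` removes peer `c` and every edge pointing to it, after which `bootstrap(ov, "d", ["a", "b"])` connects `d` to the surviving peers.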

P2P Types and Operations

Directory-based P2P: a centralized index server directly maps a requesting peer to a serving peer, e.g. Napster.

Unstructured P2P: peers are randomly connected in the overlay graph and use flooding for queries/retrievals, e.g. Gnutella and KaZaA.

Structured P2P: peers are deterministically connected in the overlay graph by a Distributed Hash Table (DHT), which is used for registrations and queries/retrievals, e.g. Chord and CAN.

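Directory-based P2P for Sharing Music: Napster

[Figure: peers join by registering with a central index; a peer sends a query to the index, receives an answer, and then gets the file directly from another peer.]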

Brief History and Implications of Napster

1999/1: Shawn Fanning (a freshman at Northeastern) dropped out and started it.

1999/6: Napster began operations for swapping music among peers.

1999/12: RIAA filed a lawsuit for copyright violation, asking for $100K for each song.

2000/3: universities banned it due to heavy traffic, e.g. 25% of the traffic at UWisc.

2000/5: VC firm Hummer Winblad invested $15 million in Napster.

2000/7/26: a US District judge ordered Napster to stop operations within 2 days.

2000/7/28: the 9th US Circuit Court of Appeals ruled that it could continue.

2001/2: a Federal Appeals Court ruled that it must stop trading copyrighted music.

2001/9: Napster reached a settlement with music writers/publishers: it pays $26M for past damages, plus a percentage to them once it starts as a paying service in 2002.

How Does Napster Work? (Very Simple!)

Application level: (1) a client/server protocol over point-to-point TCP/IP; (2) a central directory server.

User operation steps:
- connect to the Napster server (www.napster.com);
- upload a request list and the user's IP address to the server;
- the index server searches the list and returns the results to that IP address;
- the user pings the music hosts, looking for the best transfer rate;
- the user chooses a music provider for the data transfer.

The central index server limits the scalability of this P2P system.
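The steps above can be sketched with a hypothetical in-memory index, assuming peers join by registering the titles they share together with their IP address, a query returns the matching hosts, and the client picks the host with the best measured transfer rate. The `measure_rate` callback stands in for the ping step and is an assumption, as are all names here:

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set


class CentralIndex:
    """Hypothetical Napster-style directory server: song title -> set of peer IPs."""

    def __init__(self) -> None:
        self.index: Dict[str, Set[str]] = defaultdict(set)

    def join(self, peer_ip: str, shared_titles: List[str]) -> None:
        # Every peer registers with the one server: the scalability bottleneck.
        for title in shared_titles:
            self.index[title].add(peer_ip)

    def query(self, title: str) -> Set[str]:
        # .get() avoids creating empty entries for titles nobody shares.
        return set(self.index.get(title, set()))


def choose_provider(hosts: Set[str], measure_rate: Callable[[str], float]) -> Optional[str]:
    """Client side: ping each host and pick the best transfer rate.

    The file itself is then fetched peer-to-peer, bypassing the index server.
    """
    return max(hosts, key=measure_rate) if hosts else None
```

Only the lookup goes through the server; the data transfer is direct between peers, which is why the index, not the bandwidth, is what limits scale.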

Unstructured P2P: Gnutella

[Figure: a flooding query spreading from peer to peer across the overlay.]
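Flooding can be sketched as a breadth-first forward with a time-to-live (TTL) hop limit and duplicate suppression. The TTL value, graph, and function name are assumptions for illustration, and for simplicity a peer forwards to every neighbor, including the one it heard from. The message counter makes visible the duplicated messages listed among the overlay problems:

```python
from collections import deque
from typing import Dict, List, Set, Tuple


def flood_query(graph: Dict[str, List[str]], source: str, has_file: Set[str],
                ttl: int) -> Tuple[Set[str], int]:
    """Flood a query from source; return (peers holding the file, messages sent)."""
    seen = {source}
    hits: Set[str] = set()
    messages = 0
    frontier = deque([(source, ttl)])
    while frontier:
        peer, hops_left = frontier.popleft()
        if peer in has_file:
            hits.add(peer)
        if hops_left == 0:
            continue  # TTL exhausted: do not forward further
        for neighbor in graph.get(peer, []):
            messages += 1  # every forward costs a message, even to a peer that saw it
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, hops_left - 1))
    return hits, messages


square = {"a": ["b", "c"], "b": ["a", "d"], "c": ["a", "d"], "d": ["b", "c"]}
print(flood_query(square, "a", {"d"}, ttl=2))
```

On this four-peer square, finding `d` two hops away costs 6 messages, only 3 of which reach a peer for the first time.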

Super Node based P2P: KaZaA (Morpheus)

[Figure, two frames: ordinary peers connect to super peers. In the first frame, a query goes to a super peer, the answer returns, and the requesting peer gets the file from the peer holding it; in the second, the query is flooded among the super peers.]
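A sketch of the super-node idea, assuming each ordinary peer uploads its file list to one hypothetical super peer and queries are flooded only among super peers. The class, names, and topology are illustrative, not KaZaA's actual protocol:

```python
from typing import Dict, List, Optional, Set


class SuperPeer:
    """Hypothetical KaZaA-style super node that indexes the files of its leaf peers."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.leaf_files: Dict[str, Set[str]] = {}  # file name -> leaf peers holding it
        self.neighbors: List["SuperPeer"] = []     # other super peers

    def attach_leaf(self, leaf: str, files: List[str]) -> None:
        # An ordinary peer uploads its file list to its super peer.
        for f in files:
            self.leaf_files.setdefault(f, set()).add(leaf)

    def query(self, f: str, ttl: int, seen: Optional[Set[str]] = None) -> Set[str]:
        """Answer from the local index, then flood the query to other super peers.

        Simplification: the flood is depth-first with a shared visited set,
        so a super peer is queried at most once per search.
        """
        seen = set() if seen is None else seen
        if self.name in seen:
            return set()
        seen.add(self.name)
        hits = set(self.leaf_files.get(f, set()))
        if ttl > 0:
            for sp in self.neighbors:
                hits |= sp.query(f, ttl - 1, seen)
        return hits


s1, s2, s3 = SuperPeer("s1"), SuperPeer("s2"), SuperPeer("s3")
s1.neighbors, s2.neighbors, s3.neighbors = [s2], [s1, s3], [s2]
s1.attach_leaf("peer-1", ["song.mp3"])
s3.attach_leaf("peer-9", ["song.mp3", "other.mp3"])
print(sorted(s1.query("song.mp3", ttl=2)))  # ['peer-1', 'peer-9']
```

Only super peers take part in the flood, so ordinary peers never forward queries.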

Distributed Hash Table (DHT)

[Figure, five frames: nodes each store key/value (K V) pairs; insert(K1,V1) is routed to the node responsible for K1, which then stores (K1,V1); retrieve(K1) is later routed to that same node.]
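The insert/retrieve frames can be sketched with consistent hashing on an identifier ring, the idea behind structured systems such as Chord. This is a simplified sketch, assuming every node knows the full ring (real Chord routes through finger tables in O(log N) hops); the node names, ring size, and class name are hypothetical:

```python
import bisect
import hashlib
from typing import Dict, List, Optional


def ring_id(name: str, bits: int = 16) -> int:
    """Hash a node name or key onto a 2**bits identifier ring using SHA-1."""
    return int(hashlib.sha1(name.encode()).hexdigest(), 16) % (2 ** bits)


class Dht:
    """Consistent-hashing sketch: each key lives on its successor node on the ring."""

    def __init__(self, nodes: List[str]) -> None:
        self.ring = sorted((ring_id(n), n) for n in nodes)  # (id, node), ascending
        self.ids = [node_id for node_id, _ in self.ring]
        self.store: Dict[str, Dict[str, str]] = {n: {} for n in nodes}

    def successor(self, key: str) -> str:
        # First node whose id >= hash(key), wrapping around past the largest id.
        pos = bisect.bisect_left(self.ids, ring_id(key)) % len(self.ring)
        return self.ring[pos][1]

    def insert(self, key: str, value: str) -> None:
        self.store[self.successor(key)][key] = value

    def retrieve(self, key: str) -> Optional[str]:
        return self.store[self.successor(key)].get(key)


dht = Dht(["node-a", "node-b", "node-c", "node-d"])
dht.insert("K1", "V1")
print(dht.successor("K1"), dht.retrieve("K1"))
```

Because insert and retrieve compute the same successor from the key alone, no central index is needed to find (K1,V1).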

Problem 1: Losing Security and Privacy

Providing a conduit for malicious code and viruses.

Providing loopholes for information leakage.

Relaxing the privacy protection by exposing peer identities.

Problem 2: Weak Resource Coordination

Limited or no central control; the system relies mainly on self-organization.

Lack of communication monitoring and scheduling: causes unnecessary traffic jams.

Lack of access and service coordination: unbalanced loads among peers.

Demanded Solution (1): Fast Peer Services

Dynamically identifying and collecting trusted and guaranteed peers as the backbone.

Establishing adaptive self-organization and monitoring for resource coordination.

Fast data and service searching in low-diameter regions.

(2): Allowing Distrustful Peers to Exist

Ensure that peer interactions:
- do not become intrusive (monitoring/scheduling),
- do protect privacy (communication anonymity), and
- are not used for denial-of-service attacks (security).

(3): Measurable Security Metrics

Benchmarks for security measurement.

Stochastic models for security analysis.

Validating systems and quantifying security degrees.

(4): Understanding the Trade-offs

Analyzing the impact of centralized controls on performance and security.

Quantifying the security loss and the performance gain/loss from decentralization.

Optimizing peer-oriented systems for individual and combined objectives: high performance, high security, a balance of both, for a given performance objective, finding...

(5): Utilizing Existing Infrastructure

New standards and protocols should be easy to implement in the existing Internet.

Avoid modifying commonly used and general purpose software.

Peer-oriented processing should be automatic with little user involvement.

Factors determining P2P or Not P2P

Budget: applications demanding cost-effectiveness.

Resource relevance to peers: common interests.

Security: mutual trust among peers.

Rate of peer changes: relatively stable applications.

Non-critical solutions: QoS is not guaranteed.

NSF’s Efforts on Cyberinfrastructure

Grids: provide a global problem-solving environment for large and critical scientific applications and professional collaborations, where each grid is a server. Funding sources: H&S infrastructure (continuous support) and large ITRs on applications (00, 01, 02, 03).

P2P: provides a globally decentralized system in which anyone can participate. Funding source: a large ITR for DHT (02).

Application Differences: Grid & P2P

Grid: providing (1) a global problem-solving environment for large scientific applications, (2) commercial/public services, and (3) professional collaborations, where each grid is a server.

P2P: providing self-organized information sharing/searching services, where each peer can be both a server and a client.

Operation Differences: Grid & P2P

Grid: targeted access to computing, software, and data resources at specific remote sites (server-based).

P2P: random access to available computing, software, and data resources without a specific target (client-based).

Different Participants: Grid & P2P

Grid: pre-determined and registered clients and servers.

P2P: clients and servers are neither distinguished nor registered (for identity purposes), and they can come and go as they choose.

Different QoS: Grid & P2P

Grid: guaranteed and reliable services are required for each grid server.

P2P: only partially reliable, because services from some peers are neither guaranteed nor trusted.

Security Differences: Grid & P2P

Grid: authentication, authorization, and firewall protection for each grid.

P2P: privacy, anonymity, authentication, authorization, and firewall protection for each peer are not guaranteed.

Different Controls: Grid & P2P

Grid: centralized control plays an important role in resource monitoring/allocations and job scheduling.

P2P: limited or no central control; relies mainly on self-organization.

Edge and Utility Computing

Objectives: adaptively, promptly, and (temporarily) move content and computing resources from centralized centers to sites close to the end users (the edge).

Benefits:
- QoS improvement (e.g. low response time),
- high utilization of resources,
- easy manageability and high availability of services,
- high cost-effectiveness.

Core Technology and Challenges

Dynamic resource provisioning: deployment of Internet applications on demand.

What we do not have but need:

Automation of resource provisioning.

Optimization of resource provisioning.

Effective service distribution for resource provisioning.

Merging P2P and Grid

Objective: building a scalable and reliable cyber resource sharing system.

Keys:
- resource administration & management;
- keeping the merits of grids in security and reliable services;
- keeping the merits of P2P in scalability and avoiding single points of failure;
- balancing the trade-offs between Grid and P2P.

Heterogeneous Internet Members Co-exist

- Billions of clients in the form of cell phones, PDAs, laptops, and home PCs (Internet/wireless).
- Millions of clients acting as super-peer nodes.
- Millions of powerful clusters for local services.
- Millions of trusted, independent grid nodes providing service.
- Millions of trusted, collaborative grid nodes providing service.
- Dozens of supercomputers for scientific advancement.

Future of Distributed Computing

- Grid infrastructure will provide reliable service (and some computing) resources.
- Within a grid region, P2P techniques will be integrated for resource administration and management.
- The P2P paradigm will play a major role in information retrieval.
- The demand for data access/transfer will be higher than the demand for cycles.