Distributed Systems: Principles and Paradigms

Maarten van Steen

VU Amsterdam, Dept. Computer Science
Room R4.20, [email protected]

Chapter 12: Distributed Web-Based Systems

Version: December 10, 2012

Distributed Web-Based Systems 12.1 Architecture

Distributed Web-based systems

Essence: The WWW is a huge client-server system with millions of servers, each server hosting thousands of hyperlinked documents.

  • Documents are often represented in text (plain text, HTML, XML)
  • Alternative types: images, audio, video, applications (PDF, PS)
  • Documents may contain scripts, executed by client-side software

[Figure: A browser running on the OS of the client machine sends (1) a get-document request (HTTP) to the Web server on the server machine; (2) the server fetches the document from a local file; (3) the server returns the response.]

Multi-tiered architectures

Observation: Web sites were soon organized into three tiers.

[Figure: Three tiers: Web server (with HTTP request handler), CGI process running a CGI program, and database server. Labeled steps: 1. Get request; 3. Start process to fetch document; 4. Database interaction; 5. HTML document created; 6. Return result.]
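The three-tier flow (get request, start a process, database interaction, HTML document created, return result) can be sketched in a few lines. This is only an illustration, assuming an in-memory SQLite database as the third tier; the function names `make_database` and `handle_request`, the table `docs`, and its columns are hypothetical and do not come from the slides.

```python
import html
import sqlite3
from typing import Tuple


def make_database() -> sqlite3.Connection:
    """Third tier: a tiny in-memory database (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE docs (name TEXT PRIMARY KEY, body TEXT)")
    conn.execute("INSERT INTO docs VALUES (?, ?)", ("index", "Hello, Web"))
    conn.commit()
    return conn


def handle_request(conn: sqlite3.Connection, name: str) -> Tuple[int, str]:
    """Second tier: roughly what a CGI program does for one request."""
    # Step 4: database interaction (parameterized to avoid SQL injection).
    row = conn.execute(
        "SELECT body FROM docs WHERE name = ?", (name,)
    ).fetchone()
    if row is None:
        return 404, "<html><body>Not found</body></html>"
    # Step 5: create the HTML document from the fetched data.
    body = html.escape(row[0])
    # Step 6: hand the result back to the Web server.
    return 200, f"<html><body>{body}</body></html>"
```

Calling `handle_request(make_database(), "index")` yields status 200 and a small HTML page; a real CGI program would run as a separate process started by the Web server rather than as a function call.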

Web services

Observation: At a certain point, people recognized that the Web was about more than just user ↔ site interaction: sites could also offer services to other sites ⇒ standardization is then badly needed.

[Figure: Web service organization. A client application on the client machine and a server application on the server machine each use a stub generated from the WSDL service description, and communicate through their communication subsystems using SOAP. The service is published in a directory service (UDDI), where a client can look it up.]

Distributed Web-Based Systems 12.2 Processes

Apache Web server

Observation: More than 52% of all 185 million Web sites run Apache.

The server is internally organized more or less according to the steps needed to process an HTTP request.

[Figure: The Apache core contains a series of hooks between request and response. Modules provide functions; each function is linked to a hook, and the core calls the functions registered per hook.]
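The hook organization can be sketched as a small dispatcher. This is a toy model, not Apache's API: the hook names in `HOOK_ORDER`, the `Core` class, and the three module functions are all invented for illustration.

```python
from collections import defaultdict
from typing import Callable, Dict, List

# Illustrative hook names only; these are not Apache's actual hooks.
HOOK_ORDER = ["translate", "check_access", "handle", "log"]

Handler = Callable[[dict], None]


class Core:
    """Toy core that calls the registered functions hook by hook."""

    def __init__(self) -> None:
        self._hooks: Dict[str, List[Handler]] = defaultdict(list)

    def register(self, hook: str, func: Handler) -> None:
        """Link a module's function to a hook."""
        if hook not in HOOK_ORDER:
            raise ValueError(f"unknown hook: {hook}")
        self._hooks[hook].append(func)

    def process(self, request: dict) -> dict:
        """Run all functions in fixed hook order, then return the request."""
        for hook in HOOK_ORDER:
            for func in self._hooks[hook]:
                func(request)
        return request


def translate_module(request: dict) -> None:
    # Map the URL to a (hypothetical) local file name.
    request["file"] = "/var/www" + request["url"]


def handler_module(request: dict) -> None:
    # Produce a placeholder response for the translated file.
    request["response"] = f"contents of {request['file']}"


def logging_module(request: dict) -> None:
    # Record the requested URL.
    request.setdefault("log", []).append(request["url"])
```

Modules may register in any order; the core still processes the request in hook order, which mirrors the idea that the hooks follow the steps of handling an HTTP request.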

Server clusters

Essence: To improve performance and availability, WWW servers are often clustered in a way that is transparent to clients.

[Figure: A front end connected over a LAN to several Web servers. The front end handles all incoming requests and outgoing responses.]

Server clusters

Problem: The front end may easily get overloaded, so that special measures need to be taken.

  • Transport-layer switching: Front end simply passes the TCP request to one of the servers, taking some performance metric into account.
  • Content-aware distribution: Front end reads the content of the HTTP request and then selects the best server.
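The difference between the two strategies can be sketched as two selection functions. This is a simplified illustration under stated assumptions: the performance metric is taken to be a per-server load count, and content-aware selection is reduced to hashing the request path. Both function names and the server names are hypothetical.

```python
import zlib
from typing import Dict, List


def transport_layer_pick(load: Dict[str, int]) -> str:
    """Pick a server without inspecting the request.

    Assumed metric: current load per server; ties go to the
    alphabetically first server name.
    """
    return min(sorted(load), key=lambda server: load[server])


def content_aware_pick(path: str, servers: List[str]) -> str:
    """Pick a server based on the content of the HTTP request.

    Hashing the path sends every request for the same document
    to the same server.
    """
    index = zlib.crc32(path.encode("utf-8")) % len(servers)
    return servers[index]
```

Because `content_aware_pick` always maps a given document to the same server, that server is likely to have the document in its memory cache already, which is one commonly cited benefit of looking at the request content.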

Server Clusters

Question: Why can content-aware distribution be so much better?

[Figure: Content-aware distribution with a client, a switch, distributors, a dispatcher, and Web servers. Labeled steps: 1. Pass setup request to a distributor; 2. Dispatcher selects server; 3. Hand off TCP connection; 4. Inform switch; 5. Forward other messages; 6. Server responses.]

Distributed Web-Based Systems 12.6 Consistency and Replication

Web proxy caching

Basic idea: Sites install a separate proxy server that handles all outgoing requests. Proxies subsequently cache incoming documents. Cache-consistency protocols:

  • Always verify validity by contacting the server
  • Age-based consistency:

$$T_{\text{expire}} = \alpha \cdot (T_{\text{cached}} - T_{\text{last modified}}) + T_{\text{cached}}$$
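As a worked example of the age-based rule, the expiration time can be computed directly from the formula. The function names and the sample numbers below are illustrative assumptions; times are plain numbers in seconds.

```python
def expiry_time(t_cached: float, t_last_modified: float, alpha: float) -> float:
    """T_expire = alpha * (T_cached - T_last_modified) + T_cached."""
    return alpha * (t_cached - t_last_modified) + t_cached


def is_fresh(now: float, t_cached: float, t_last_modified: float,
             alpha: float) -> bool:
    """A cached document is served without validation until it expires."""
    return now < expiry_time(t_cached, t_last_modified, alpha)
```

For a document last modified at time 0 and cached at time 1000, `alpha = 0.25` gives an expiration time of 1250: the longer a document had gone unmodified when it was cached, the longer the proxy assumes it stays valid.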

Web proxy caching

Basic idea (cnt'd): Cooperative caching, by which you first check your neighbors on a cache miss.

[Figure: Cooperative caching. Clients send HTTP Get requests to their Web proxy, each of which has a cache. A proxy will (1) look in its local cache, (2) ask neighboring proxy caches, and (3) forward the request to the Web server.]
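The three lookup steps of cooperative caching can be sketched as follows. The `Proxy` class, its attributes, and the choice to keep a local copy of a document obtained from a neighbor are assumptions made for this illustration.

```python
from typing import Callable, Dict, List, Optional, Tuple


class Proxy:
    """Toy Web proxy with a local cache and a list of neighboring proxies."""

    def __init__(self, name: str, fetch_from_origin: Callable[[str], str]) -> None:
        self.name = name
        self.cache: Dict[str, str] = {}
        self.neighbors: List["Proxy"] = []
        self._fetch = fetch_from_origin

    def lookup_local(self, url: str) -> Optional[str]:
        """Return the cached document, or None on a miss."""
        return self.cache.get(url)

    def get(self, url: str) -> Tuple[str, str]:
        """Return (document, source); source is 'local', 'neighbor', or 'origin'."""
        # Step 1: look in the local cache.
        doc = self.lookup_local(url)
        if doc is not None:
            return doc, "local"
        # Step 2: ask neighboring proxy caches.
        for peer in self.neighbors:
            doc = peer.lookup_local(url)
            if doc is not None:
                self.cache[url] = doc  # assumption: keep a local copy
                return doc, "neighbor"
        # Step 3: forward the request to the Web server.
        doc = self._fetch(url)
        self.cache[url] = doc
        return doc, "origin"
```

With two proxies that list each other as neighbors, the first request for a document reaches the origin server, while a later request at the other proxy is answered from the neighbor's cache.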

Replication in Web hosting systems

Observation: By and large, Web hosting systems are adopting replication to increase performance. Much research is being done to improve their organization; this work follows the lines of self-managing systems.

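[Figure: Feedback control loop for a Web hosting system. Inputs: reference input, initial configuration, and uncontrollable parameters (disturbance / noise). The observed output of the Web hosting system goes through metric estimation; the measured output is analyzed, producing adjustment triggers. The resulting corrections (+/-) act on replica placement, consistency enforcement, and request routing.]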

Handling flash crowds

Observation: We need dynamic adjustment to balance resource usage. Flash crowds introduce a serious problem.

[Figure: Four panels, (a) to (d), with time spans labeled 2 days, 2 days, 6 days, and 2.5 days.]

Server replication

Content Delivery Network: CDNs act as Web hosting services that replicate documents across the Internet, providing their customers guarantees on high availability and performance (example: Akamai).

[Figure: CDN request flow between a client, the origin server, a CDN server with a cache, the CDN DNS server, and the regular DNS system. Labeled steps: 1. Get base document; 2. Document with refs to embedded documents; 3 and 4. DNS lookups, returning the IP address of the client-best server; 5. Get embedded documents; 6. Get embedded documents (if not already cached); 7. Embedded documents.]
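The DNS step that returns the "client-best server" can be sketched as a lookup over estimated costs. The slides do not say which metric defines "best"; the sketch below assumes an estimated latency per client region, and the function name, region names, and addresses (taken from the documentation address ranges) are hypothetical.

```python
from typing import Dict


def resolve_best_server(client_region: str,
                        latency_ms: Dict[str, Dict[str, float]]) -> str:
    """Return the CDN server address with the lowest estimated latency.

    latency_ms maps a client region to {server address: estimated latency}.
    Ties go to the lexicographically smallest address string.
    """
    estimates = latency_ms[client_region]
    return min(sorted(estimates), key=lambda addr: estimates[addr])
```

A CDN DNS server would answer the client's lookup for an embedded document's host name with this address, so that the client fetches the embedded documents from a nearby replica instead of the origin server.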

Replication of Web applications

Observation: Replication becomes more difficult when dealing with databases and such. No single best solution.

Assumption: Updates are carried out at the origin server, and propagated to edge servers.

Replication of Web applications: normal

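[Figure: Replication alternatives for Web applications. Both the edge-server side and the origin-server side run a Web server with application logic and a schema. The origin-server side holds the authoritative database; the edge-server side may hold a content-blind cache, a content-aware cache, or a database copy. The two sides are connected by query and response traffic, full/partial data replication, and full schema replication/query templates. The client sits on the edge-server side.]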

Replication of Web applications

Alternative solutions

  • Full replication: high read/write ratio, often in combination with complex queries.
  • Partial replication: high read/write ratio, but in combination with simple queries.
  • Content-aware caching: Check for queries at the local database, and subscribe for invalidations at the server. Works well with range queries and complex queries.
  • Content-blind caching: Simply cache the result of previous queries. Works great with simple queries that address unique results (e.g., no range queries).
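The contrast between the two caching alternatives can be sketched as two small classes. This is a simplified illustration: rows are `(key, value)` pairs, the content-aware cache remembers only one key range, and invalidation is a method the origin would have to call. All class and method names are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Tuple

Row = Tuple[int, str]  # (key, value); hypothetical row format


class ContentBlindCache:
    """Caches query results keyed by the exact query text."""

    def __init__(self, run_at_origin: Callable[[str], List[Row]]) -> None:
        self._run = run_at_origin
        self._results: Dict[str, List[Row]] = {}

    def query(self, sql: str) -> List[Row]:
        # Only a textually identical query is a hit.
        if sql not in self._results:
            self._results[sql] = self._run(sql)
        return self._results[sql]


class ContentAwareCache:
    """Keeps rows for one key range and answers contained range queries locally."""

    def __init__(self, fetch_range: Callable[[int, int], List[Row]]) -> None:
        self._fetch = fetch_range
        self._range: Optional[Tuple[int, int]] = None
        self._rows: List[Row] = []

    def query_range(self, lo: int, hi: int) -> List[Row]:
        covered = (self._range is not None
                   and self._range[0] <= lo and hi <= self._range[1])
        if not covered:
            # Miss: fetch the requested range from the origin and remember it.
            self._rows = self._fetch(lo, hi)
            self._range = (lo, hi)
        return [row for row in self._rows if lo <= row[0] <= hi]

    def invalidate(self) -> None:
        """Drop cached rows, e.g. when the origin reports an update."""
        self._range = None
        self._rows = []
```

A content-aware cache that holds keys 2 to 8 can answer a query for keys 3 to 5 without contacting the origin, whereas the content-blind cache would treat it as a new query because its text differs.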

Question: What can be said about replication vs. performance?

Replication Web apps.: full/partial replication

[Figure: The edge-server/origin-server organization (Web server, application logic, schema, authoritative database, content-blind cache, content-aware cache, database copy), shown for full/partial replication.]

Replication Web apps.: content-aware caching

[Figure: The edge-server/origin-server organization (Web server, application logic, schema, authoritative database, content-blind cache, content-aware cache, database copy), shown for content-aware caching.]

Replication Web apps.: content-blind caching

[Figure: The edge-server/origin-server organization (Web server, application logic, schema, authoritative database, content-blind cache, content-aware cache, database copy), shown for content-blind caching.]
