Week 4: Building Scalable and Reliable e-commerce Site


1: Introduction
2: Multi-tier Architectures
3: Application Taxonomy
4: Requirements of Web Applications
5: Techniques for Scaling
6: Caching and Replication
7: Load Balancing
8: Failure Detection

Scalability: SUN Server Farm

Akamai

Idea: distribute content over many servers spaced around the world

SOURCE: AKAMAI

9,700 servers, 650 networks, 56 countries

Akamai

SOURCE: AKAMAI

Distributing servers allows better handling of peak loads.
Goal: guarantee uptime all the time, with a more reliable infrastructure.

1. Introduction

Behind The Browser – Summary

In the browser's address box: Address http://www.amazon.com

1. Request the IP address for the domain name; receive IP address 208.216.181.15
2. Send the full URL plus extra info to that IP address (amazon.com's WWW server)
3. Server processing
   Optionally (advanced interaction):
   3.1 Server requests to other servers
   3.2 Receives info from other servers
4. Returns an HTML page (plus supporting files)

[Figure: what the user sees in the browser]

Complex heterogeneous infrastructures are a reality!

[Figure: a heterogeneous infrastructure with a DNS server, web server, web application server, data server, storage area network, director and security services, existing applications and data, business data, and BPs and external services]

• Dozens of systems and applications
• Hundreds of components
• Thousands of tuning parameters

One of the Data Centers (500 servers)

[Figure: Canyon Park Data Center, Microsoft.com Network Diagram. Cisco 7000 routers and Catalyst 5000 and 2926 switches connect groups of IIS web servers (www, search, register, support, windows, windowsmedia, premium, msdn, download, ftp, officeupdate, and other microsoft.com sites), live, backup, consolidator, and miscellaneous SQL servers, SMTP servers, stagers, build servers, PPTP/terminal servers, and monitoring servers.]

Microsoft.com Server Count
  Build Servers          32
  IIS                   210
  Application             2
  Exchange               24
  Network/Monitoring     12
  SQL                   120
  Search                  2
  NetShow                 3
  NNTP                   16
  SMTP                    6
  Stagers                26
  Total                 459

Drawn by: Matt Groshong. Last updated: April 12, 2000.
IP addresses removed by Jim Gray to protect security.

2. Multi-Tier Architectures

Where it All Takes Place

Client/Server Model: Fundamental to the Internet
• Packet switching decouples computers
• Functional modules with well-defined interfaces
• Client requests a service; server provides it
• Data exchanged only through real-time messages: no global variables, no common databases
• A server may become a client to a different server

SOURCE: NETWORK COMPUTING

2-tier Vs. n-tier Architecture

[Figure: 2-tier architecture (client, database) vs. N-tier architecture (client browser, tier-2 logic, tier-3 logic, data)]

Two-Tier Architecture

[Figure: Tier 1: client; Tier 2: server. The server performs processing directly.]

SOURCE: FOURNIER

Two-Tiered Architectures: “Gartner Group Configurations”

SOURCE: NETWORK COMPUTING

Why 2-tier?

• Often called “client-server”, which is a bad name because it’s too general
• Simple
• Better for dynamic queries
• Potentially more efficient (probably not in reality)
• Perhaps more processing off-loaded to the client (for better or worse)
• Global data modeling is not practical

Examples of Two-, Three-, and Four-Tiered Infrastructures

Three-Tier Architecture

[Figure: Tier 1: client; Tier 2: server; Tier 3: backend. The application server offloads processing to tier 3.]

SOURCE: FOURNIER

N-Tier Architecture

SOURCE: FOURNIER

Data Warehousing Architecture

SOURCE: FOURNIER

Why n-tier?

• Modularity via objects, not an enterprise-wide data model
• “Thin” clients, since “fat” clients are infeasible
• Security
• Easier replication of business logic
• Flexibility
• Performance (due to flexibility)
• Manageability
• All data not in one data model
• All data not in one database brand
• Etc.

Even with n-tier, Databases Are Crucial

Databases need to have all the functions required in 2-tier, and more:
• Data model support
• Concurrency control
• Security
• Integrity
• Performance
• Manageability
• Support for heterogeneity

Databases in a Heterogeneous World

There needs to be semantic consistency while using multiple databases: atomicity, consistency, isolation, durability (transactions will be covered later).

It is desirable that applications interoperate with multiple databases:
• The same API to access multiple databases
• The ability to access multiple databases
Hence the motivation for JDBC and ODBC, which can be considered middleware.

3. Application Taxonomy

Characterizing Web Applications

Applications

Applications are typically made up of many interactions with a client. How the application must be built depends on the type of interactions that comprise it. This seems trivial, but it is where all architecture starts.

All interactions are, to varying degrees, asynchronous or synchronous. Influencing all interactions are requirements for concurrency, throughput, latency, ...

Interactions are sometimes called “transactions,” though no specific semantic properties are applied to the word transaction when used in this way.

Workload Characteristics

Application Functionality
• Types of interaction: inquiry (static and dynamic) vs. transactions
• Volume of transactions
• Volume of user-specific responses (personalization)
• Amount of cross-session info
• Transaction complexity
• Data volatility
• Integration with legacy systems

Usage Patterns
• Number of unique items
• Number of page views
• Volume of dynamic searches
• Transaction volume swings

Infrastructure Constraints
• % secure pages (privacy)
• Security: authentication, integrity, non-repudiation, regulations

Types of Web Applications

• Publish and Subscribe: web portals such as yahoo.com, excite.com; media sites such as www.nfc.co.il, zdnet.com; and events such as www.usopen.org, www.wimbledon.org
• Shopping: exact inventory sites (Victoriassecret.com, Abercrombie.com); inexact inventory sites (buy.com, dvdexpress.com)
• Customer Self Service: home banking (bankone.com, wingspanbank.com); travel sites (Travelocity); insurance (amica.com)
• Trading: online brokerages (schwab.com, fidelity.com, etrade.com); auction sites (ebay.com, priceline.com); games (interactive group game servers)

Workload Characteristics of Web Applications

[Table: workload characteristics rated Low / Medium / High for Publish & Subscribe, Shopping, Customer Self Service, and Trading. Characteristics rated: transaction volumes, dynamic content, dynamic searches, user-specific responses (personalization), cross-session information, legacy integration, data volatility, transaction volume swings, number of content publishers/sources, number of unique items per page, page content volatility, number of page views, security/authentication, percentage of secure pages, and transaction complexity.]

Application Taxonomy: Read Transactions

Read-only transactions:
• Highly static: X-ray, corporate information, entertainment video, 1990 Census
• Nearly static: train schedule, catalog without quantities
• Dynamic: weather forecast, catalog with quantities
• Dynamic with high consistency requirements: account balance, catalog with quantities
• Dynamic data with high consistency and rapid update: rock concert sales with assigned seating

Application Taxonomy: Update Transactions

• Update with modest integrity: Amazon book comment
• Update with high integrity: billing record
• Update with asynchronous processing: stock trade
• Update with loosely coupled processing: buying a physical product over the net, or ordering/provisioning a new ISDN line

Issues

It is the type of application, along the read-only and update dimensions, that greatly impacts:
• How applications are architected
• What system support is needed

For each of the previous examples, it is worth considering the implications.

4. Requirements of Web Applications

Requirements - Summary

• Availability
• Scalability
• Security
• Performance
• Integrity
• Manageability
• Malleability/Longevity
• Integration
• Cost

Availability

Defined as a measurement of uptime as perceived by a user.
• There are 86,400 seconds in a day (~100,000) and 31,536,000 seconds in a year (~30 million)
• 99% uptime means 1% downtime: 864 seconds/day, or 14.4 minutes/day; 315,360 seconds/year, or 5,256 minutes/year, or about 88 hours/year

Percentage Uptime      Downtime
99%                    88 hours/year (14.4 minutes/day)
99.99%                 53 minutes/year (0.14 minutes/day)
99.999%                5 minutes/year
99.9999%               30 seconds/year
99.99999% (7 nines)    3 seconds/year
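The downtime figures above follow from one multiplication: downtime = (1 - uptime) × period. A minimal sketch that reproduces the table, assuming a 365-day year as the text does (the function name is illustrative):

```python
# Seconds per day and per 365-day year, as used in the figures above.
SECONDS_PER_DAY = 24 * 60 * 60            # 86,400
SECONDS_PER_YEAR = 365 * SECONDS_PER_DAY  # 31,536,000


def downtime_seconds(uptime_percent: float, period_seconds: int = SECONDS_PER_YEAR) -> float:
    """Downtime, in seconds, implied by an uptime percentage over a period."""
    return (1 - uptime_percent / 100) * period_seconds


for pct in (99, 99.99, 99.999, 99.9999, 99.99999):
    per_year = downtime_seconds(pct)
    per_day = downtime_seconds(pct, SECONDS_PER_DAY)
    print(f"{pct}% uptime: {per_year / 3600:.2f} h/year, "
          f"{per_year / 60:.2f} min/year, {per_day / 60:.3f} min/day")
```

The table rounds these values (for example, 52.56 minutes/year becomes 53).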

Availability - Discussion

What do you see on the web? Why? What will be required in the future?

In the News

Source: Gartner Group

Downtime Costs (per Hour)
• Brokerage operations: $6,450,000
• Credit card authorization: $2,600,000
• eBay (1 outage, 22 hours): $225,000
• Amazon.com: $180,000
• Package shipping services: $150,000
• Home shopping channel: $113,000
• Catalog sales center: $90,000
• Airline reservation center: $89,000
• Cellular service activation: $41,000
• On-line network fees: $25,000
• ATM service fees: $14,000

Sources: InternetWeek 4/3/2000; Fibre Channel: A Comprehensive Introduction, R. Kembel 2000, p. 8: “...based on a survey done by Contingency Planning Research.”

September 11, 2001

Only 15% of the companies in the World Trade Center had a working business continuity plan

One law firm did not have a backup outside of the building; it went out of business.

One trading firm was able to transition immediately to a backup site across the river, with no interruption to its customers.

An investment bank had only a tape backup. It took them four days to recover

Scalability

The capability of a system to adapt readily to a greater or lesser intensity of use, volume, or demand while still meeting its business objectives (acceptable levels of performance, availability, manageability etc.)

[Figure: resource utilization vs. load, four curves]
• Ideal: gracefully degrade as load increases (seldom happens)
• Good situation: utilization increases linearly with load
• Typical: utilization increases faster than the load
• Bad situation: think it's OK until load increases (poor design)

Security

• Privacy
• Authentication
• Authorization
• Audit
• Non-repudiation

Performance

Top-level metrics:
• Latency: how long does it take to get a response to a request from the system?
• Throughput: how many transactions can be completed in a unit of time (capacity)?

Subsidiary metrics:
• CPU
• Network bandwidth
• I/O of various types
• ...

Integrity

• Data correctness
• Data permanence
• Disaster recovery
• Data currency

Manageability

Consider the number of elements in a web application:
• Consistency
• Security
• Modifications
• Performance
• Configuration
• Training level required of operators

Malleability/Longevity

• Continuous availability (despite updates and failures)
• Time period of use of the program

Integration

Note: millions of person-years are spent every year on applications. This represents a total multi-trillion-dollar investment; hence, integration is a necessity.

Integration approaches:
• Application to application
• Data sharing by multiple applications
• Process (complex application integration)

For some applications, integration costs 7x the cost of the system, yet this is less than recreating existing applications or losing the benefits of integrated systems.

Cost: initial implementation, modification, installation, and management (management costs more than development, usually at least double).

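Total Cost of Ownership

[Figure: pie chart of total cost of ownership with segments for purchase (20%), downtime (20%), environmental (14%), HW management, administration, and backup/restore; one further segment is labeled 30%.]
• Administration: all people time
• Backup/restore: devices, media, and people time
• Environmental: floor space, power, air conditioning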

Cause of System Crashes

[Figure: causes of system crashes in 1985, 1993, and 2001 (est.), as shares of all crashes: hardware failure, operating system failure, system management (actions + N/problem), and other (app, power, network failure)]

Current State of the Art
• Failures due to people are up, and hard to measure
• VAX crashes ‘85, ‘93 [Murp95]; extrapolated to ‘01
• HW/OS share fell from 70% in ‘85 to 28% in ‘93. In ‘01, 10%?
• How do you get an administrator to admit a mistake? (Heisenberg?)

(based on the lecture “Recovery Oriented Computing” by Dave Patterson, Berkeley)

5. Techniques for Scaling

Techniques for achieving the requirements

Motivation

Defined: Data is stored without overlap across multiple sites and each site processes its data the same way

This is the architecture of the web (Order of magnitude circa 10^12 hits/day)

Back-of-the-envelope exercise: assume a server can handle an average of 10^1 to 10^4 hits/sec. Then there must be 10^3 to 10^6 web sites to meet the load…
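The estimate above can be checked directly: 10^12 hits/day is about 1.16 × 10^7 hits/sec, and dividing by the assumed per-server capacities gives roughly 10^6 and 10^3 servers. A minimal sketch, assuming the load is spread evenly over the day (real loads peak well above the average):

```python
import math

HITS_PER_DAY = 10**12      # order-of-magnitude figure from the text
SECONDS_PER_DAY = 86_400


def servers_needed(hits_per_server_per_sec: float) -> int:
    """Servers required to absorb the *average* load (no peak headroom)."""
    hits_per_sec = HITS_PER_DAY / SECONDS_PER_DAY   # about 1.16e7 hits/sec
    return math.ceil(hits_per_sec / hits_per_server_per_sec)


# Per-server capacity range assumed above: 10^1 to 10^4 hits/sec.
for capacity in (10**1, 10**4):
    print(f"{capacity:>6} hits/sec per server -> {servers_needed(capacity):,} servers")
```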

Examples (data partitioning – segmented workload): 1999 data on one site, 1998 on another… a’s on one site, b’s on another…
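Both partitioning examples give every data element exactly one home site, computed from the data itself. A minimal sketch, assuming hypothetical site names; letters are split into contiguous ranges so that with two sites, a–m go to the first and n–z to the second:

```python
def site_for_year(year: int) -> str:
    """Partition by year: 1999 data on one site, 1998 on another..."""
    return f"site-{year}"          # hypothetical naming scheme


def site_for_name(name: str, sites: list[str]) -> str:
    """Partition by first letter: a's on one site, b's on another...
    The 26 letters are divided into contiguous ranges, one per site."""
    index = ord(name[0].lower()) - ord("a")
    if not 0 <= index < 26:
        raise ValueError(f"no partition for {name!r}")
    return sites[index * len(sites) // 26]


print(site_for_year(1999))                          # site-1999
print(site_for_name("apple", ["east", "west"]))     # east
print(site_for_name("zebra", ["east", "west"]))     # west
```

Because the partitions do not overlap, each request goes to one site only; the price is that a hot partition cannot borrow capacity from the others.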

Some typical Web site loads over a 24-hour period

Example Response Time Budget

  Client request                 5%
  Request network latency        5%
  Server time                   55%
  Response network latency      20%
  Client response processing    15%
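A budget like this is used by fixing an end-to-end target and deriving how much time each component may take. A minimal sketch, assuming a hypothetical 2-second target:

```python
# Percentages from the example budget above.
BUDGET_PERCENT = {
    "client request": 5,
    "request network latency": 5,
    "server time": 55,
    "response network latency": 20,
    "client response processing": 15,
}


def budget_ms(total_ms: float) -> dict[str, float]:
    """Split an end-to-end response-time target across the components."""
    assert sum(BUDGET_PERCENT.values()) == 100
    return {part: total_ms * pct / 100 for part, pct in BUDGET_PERCENT.items()}


for part, ms in budget_ms(2000).items():   # 2000 ms target is hypothetical
    print(f"{part:28s} {ms:7.1f} ms")
```

With a 2-second target, the server side has 1,100 ms for everything it does, including calls to back-end tiers.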

How Latency Varies Based on Workload Pattern and Tier

Achieving the Requirements

• Faster machines (vertical growth)
• Replicated machines (horizontal growth)
• Specialized machines
• Segmented workloads
• Request batching
• User data aggregation
• Connection management and caching

It is important to note that a detailed understanding of the application is key to successful implementation.

Faster Machines - Vertical Growth

Scalability can be achieved through the use of faster machines. This technique can include:

• Moving to hardware that is bigger than the current environment; for example, moving a web server from a PC-based server running NT to a UNIX-based server
• Using machines with more CPUs to leverage the operating system's multitasking and multiprocessing capabilities
• Using machines that leverage other computing paradigms, such as parallel computing
• Using better software that is optimized for the CPU
• Using faster hardware components such as memory, cache, disk, and I/O devices

Replicated Machines - Clusters

Adding more machines of the same type and load balancing requests across these machines. In order to implement this technique we have to implement additional components in the architecture such as:

• A dispatcher node that can monitor and load balance processing requests across the replicated machines
• A synchronization node that synchronizes the content and data across the machines
• A mechanism for managing sessions across replicated machines
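One common way to meet the session-management requirement is session affinity ("sticky sessions"): the dispatcher assigns each new session to a replica and sends that session's later requests to the same replica. A minimal sketch, assuming hypothetical replica names and session IDs; the alternative of synchronizing session state across machines is not shown:

```python
import itertools


class Dispatcher:
    """New sessions are assigned round-robin; later requests in the same
    session go back to the replica that holds the session state."""

    def __init__(self, replicas):
        self._next = itertools.cycle(replicas)
        self._session_to_replica = {}

    def route(self, session_id: str) -> str:
        if session_id not in self._session_to_replica:
            self._session_to_replica[session_id] = next(self._next)
        return self._session_to_replica[session_id]


d = Dispatcher(["web1", "web2"])
print(d.route("s1"), d.route("s2"), d.route("s1"))   # web1 web2 web1
```

The trade-off: affinity avoids copying session state, but if a replica fails, its sessions are lost unless the state is also stored elsewhere.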

Specialized Machines

Individual components of the architecture can be scaled by using specialized machines that perform a certain function much faster. This technique is typically used to facilitate:
• Intelligent routing of traffic and data across replicated machines
• Dynamic caching, used extensively by event sites and other media sites to speed up access to frequently accessed content
• Security and encryption, used by high-volume sites to speed up SSL encryption and decryption

Segmented Workload

This technique is typically used in conjunction with replicated machines. It involves partitioning the workload of an application to achieve optimum performance. Ways of implementing it range from:
• URL references: the simplest form of segmenting the workload, which analyzes the URL and directs requests to the appropriate servers
• Functional partitioning: looks at the application and builds the partitioning of the workload through custom programming
• Data partitioning: places segments of the data on different machines

[Figure: a workload partitioned across servers for Function 1, Function 2, and Function 3]
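The simplest form, segmenting by URL reference, only needs a table from URL prefix to server pool. A minimal sketch, assuming hypothetical prefixes and pool names; the longest matching prefix wins and unmatched paths fall through to a default pool:

```python
ROUTES = [                      # hypothetical URL prefix -> server pool
    ("/search/", "search-pool"),
    ("/images/", "static-pool"),
    ("/checkout/", "secure-pool"),
]
DEFAULT_POOL = "web-pool"


def pool_for(path: str) -> str:
    """Pick a server pool by the longest matching URL prefix."""
    matches = [(prefix, pool) for prefix, pool in ROUTES if path.startswith(prefix)]
    if not matches:
        return DEFAULT_POOL
    return max(matches, key=lambda m: len(m[0]))[1]


print(pool_for("/search/q?term=shoes"))   # search-pool
print(pool_for("/about"))                 # web-pool
```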

Request Batching

Multi-tier communication places a large computational load on both the client tier (requester) and the server tier. It also introduces considerable latency. Furthermore, the fixed overhead of virtually every cross-tier request is about the same, so it is much better to make fewer, larger requests.

The goal of this technique is to reduce the number of requests that are sent between requesters and responders (such as between tiers or processes) by allowing the requester to define new requests that combine multiple requests.

[Figure: client/server exchanges with many small requests vs. a single combined Command]
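The argument for batching is arithmetic: if every round trip pays the same fixed overhead, N separate requests pay it N times while one combined request pays it once. A minimal sketch, assuming a hypothetical channel with 20 ms overhead per round trip and 0.5 ms per item (illustrative numbers, not measurements):

```python
from dataclasses import dataclass


@dataclass
class Channel:
    """Hypothetical cross-tier link: each round trip costs a fixed overhead
    plus a small per-item cost (milliseconds, illustrative only)."""
    overhead_ms: float = 20.0
    per_item_ms: float = 0.5
    elapsed_ms: float = 0.0
    round_trips: int = 0

    def call(self, items):
        self.round_trips += 1
        self.elapsed_ms += self.overhead_ms + self.per_item_ms * len(items)
        return [f"result:{item}" for item in items]


ids = list(range(50))

one_at_a_time = Channel()
for i in ids:
    one_at_a_time.call([i])        # 50 round trips

batched = Channel()
batched.call(ids)                  # 1 round trip carrying all 50 items

print(one_at_a_time.round_trips, one_at_a_time.elapsed_ms)   # 50 1025.0
print(batched.round_trips, batched.elapsed_ms)               # 1 45.0
```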

User Data Aggregation

This technique aggregates the most commonly accessed data from multiple backend systems to speed up the overall performance of the architecture. It is typically implemented using:
• Custom programming
• Intelligent middleware
• Data replication

[Figure: client/server exchanges with and without user data aggregation]

Connection Management

This technique aims to achieve scalability by reducing the most expensive operations within an application's workflows. This includes connections to legacy systems, databases and other servers

[Figure: a Web Application Server (WAS) passes an incoming client request to a Servlet/App, which obtains a connection from the Connection Manager's pool to reach the resource]

1. The WAS passes a user request to a Servlet/App.
2. The Servlet requests a connection from the Connection Manager.
3. The Manager gets a connection from the pool and gives it to the Servlet/App.
4. The Servlet uses the connection to access the resource.
5. The resource returns data.
6. The Servlet returns the connection to the Manager, and the connection is returned to the pool.
7. The Servlet/App sends the response back.

If a connection is not available:
A. The Connection Manager requests a new connection.
B. It adds the connection to the pool.
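The steps above can be sketched as a small pool. This is a minimal single-process sketch, assuming a hypothetical `factory` callable that opens a real connection; a new connection is opened when the pool is empty (step A) and joins the pool when it is released:

```python
import queue


class ConnectionManager:
    """Hands out pooled connections and takes them back for reuse."""

    def __init__(self, factory, initial_size=2):
        self._factory = factory            # hypothetical: opens one connection
        self._pool = queue.Queue()
        for _ in range(initial_size):
            self._pool.put(factory())

    def get(self):
        # Step 3: hand out a pooled connection if one is free...
        try:
            return self._pool.get_nowait()
        except queue.Empty:
            # Step A: ...otherwise open a new one (it joins the pool on release).
            return self._factory()

    def release(self, conn):
        # Step 6: the servlet returns the connection to the pool.
        self._pool.put(conn)

    def idle_count(self):
        return self._pool.qsize()
```

The expensive operation (opening a connection) happens once per pooled connection rather than once per request; a production pool would also cap the pool size and check connections for staleness.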

Caching

Defined: storage of and reference to data in a location that can be accessed faster and/or with higher aggregate bandwidth.
• Done at every level of a system: processor/memory, computer/disk, browser, web
• Simplest when there is only one, infrequent writer of the data
• Issues: write-through caches, cache invalidation
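The two issues named above can be seen in a few lines. A minimal sketch, assuming a plain dict stands in for the slower backing store: writes through the cache update both copies, but a second writer that bypasses the cache leaves a stale entry until it is invalidated:

```python
class WriteThroughCache:
    """Cache in front of a backing store (a dict stands in for it here)."""

    def __init__(self, store):
        self.store = store
        self.cache = {}
        self.misses = 0

    def read(self, key):
        if key not in self.cache:
            self.misses += 1
            self.cache[key] = self.store[key]   # fetch from the backing data
        return self.cache[key]

    def write(self, key, value):
        self.store[key] = value     # write through to the backing data...
        self.cache[key] = value     # ...and keep the cached copy in step

    def invalidate(self, key):
        self.cache.pop(key, None)   # next read refetches from the store


store = {"price": 10}
c = WriteThroughCache(store)
c.write("price", 12)      # single writer going through the cache: consistent
store["price"] = 15       # a second writer bypasses the cache
print(c.read("price"))    # 12 (stale)
c.invalidate("price")
print(c.read("price"))    # 15
```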

Caching (continued)

• More complex when there are multiple writers and/or higher-frequency updates
• There is the distributed cache consistency problem. This happens in: computer architecture, multi-computer architectures, and distributed systems of all types, including the web
• Examples: browser cache, DNS, mirror sites, etc.

Techniques Applied to Web Tiers

Dimensions of the Scaling Techniques

Scaling Technique Increase Power

Improve Efficiency

Shift / Reduce Load

Faster Machine X

Replicate Machines X

Specialized Machines X X

Segmented Workload X X

Request Batching X

User Data Aggregation X

Connection Management X

Caching X X

6. Caching and Replication

The Technology Behind the Techniques

Cache Consistency Techniques

• Fuzzy: use a time-out and hope for the best. Setting the time-out is very tricky and error-prone.
• Consistent caching: use distributed cache consistency algorithms. There are trade-offs between availability and consistency. The algorithms are very tricky but can be gotten right. The typical approach is the concept of token management: a read token and a write token; usually more tokens are required to make things really work.
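Token management can be sketched as a server that, per data item, grants either many read tokens or one write token; granting a conflicting token revokes the old ones, which tells those clients to drop their cached copies. A minimal single-server sketch with hypothetical client and key names; real systems also wait for revoked clients to flush dirty data and use more token types, as noted above:

```python
class TokenManager:
    """Per item: many clients hold read tokens, or one client holds the write token."""

    def __init__(self):
        self.readers = {}    # key -> set of clients holding a read token
        self.writer = {}     # key -> client holding the write token
        self.revoked = []    # (client, key) revocations sent, in order

    def _revoke(self, client, key):
        # A real system would wait for the client to flush/drop its copy.
        self.revoked.append((client, key))

    def acquire_read(self, client, key):
        holder = self.writer.get(key)
        if holder is not None and holder != client:
            self._revoke(holder, key)
            del self.writer[key]
        self.readers.setdefault(key, set()).add(client)

    def acquire_write(self, client, key):
        for reader in sorted(self.readers.pop(key, set())):
            if reader != client:
                self._revoke(reader, key)
        holder = self.writer.get(key)
        if holder is not None and holder != client:
            self._revoke(holder, key)
        self.writer[key] = client


tm = TokenManager()
tm.acquire_read("c1", "x")
tm.acquire_read("c2", "x")
tm.acquire_write("c3", "x")
print(tm.revoked)    # [('c1', 'x'), ('c2', 'x')]
```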

Replication

Definition: explicit creation, maintenance, and access of multiple copies of some resource: processors, bandwidth, data, etc.

Why replicate? Throughput, bandwidth, availability, integrity.

Replication vs. Data Partitioning

• Replication: same or overlapping data stored at multiple locations
• Partitioning: data non-overlapping; typically, only one “home” for any data element

Replication vs. Caching

• Caching: there is a fundamental difference between a cached copy and the real “backing” data. Loss of the cache is not a failure except from the perspective of performance.
• Replication: all replicas are of the same type, albeit not necessarily identical. Loss of a replica is a failure and could result in a higher likelihood of lost data.

Semantics of Replication

Consistency/fuzzy replication: the same issue as in caching, above.

What does consistency mean?
• Ticket sales (OK to not show all the seats)
• Latest score in a basketball game (can lag by up to n seconds)
• Weather forecast (variable lag, depending on severity of change)
• Prices for certain goods (perhaps they need to be exact, as differentials would cause customer dissatisfaction)

Replication Algorithms Abound

Unanimous Update: always update all copies; read from any copy
• Excellent read throughput
• Excellent read availability
• Very poor write throughput
• Very poor write availability

Unanimous Read: always read all copies; update any copy
• Excellent write throughput and availability
• Very poor read throughput and availability

Additional Replication Algorithms

Primary Copy: must update the primary copy; the primary copy ensures all other copies get updated; read from any copy
• Excellent read throughput and availability
• Poor write availability
• Significant complexity in ensuring the primary copy updates all other replicas

Voting: assume n copies; read from any r; write to n-r+1
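Voting works because r + (n-r+1) = n+1 > n: any read set and any write set must share at least one replica, so a read always reaches some copy that saw the latest write (versions then pick the newest). With r = 1 the rule becomes unanimous update; with r = n it becomes unanimous read. A brute-force check of the overlap property for small n (the function name is illustrative):

```python
from itertools import combinations


def quorums_overlap(n, r, w=None):
    """True if every read set of r replicas intersects every write set of
    w replicas. By default w = n - r + 1, the voting rule above."""
    if w is None:
        w = n - r + 1
    replicas = range(n)
    return all(set(read_set) & set(write_set)
               for read_set in combinations(replicas, r)
               for write_set in combinations(replicas, w))


for r in range(1, 6):
    print(f"n=5, r={r}, w={5 - r + 1}: overlap={quorums_overlap(5, r)}")

# One replica fewer in the write set breaks the guarantee:
print(quorums_overlap(5, 2, w=3))   # False: {0, 1} and {2, 3, 4} are disjoint
```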

Replication Conclusions

All algorithms are quite difficult to implement, but replication has compelling benefits:
• Best long-term approach for high data availability
• Software update or data reorganization
• Disaster recovery

Obvious performance benefits as well, at least for data which is either read or written infrequently. (Often, one of these is true.)

Systems support for replication required if implementation is to be feasible

Systems Support – Atomic Transactions in particular

7. Load Balancing

Load Balancing

Definition: Load Balancing refers to a technique that uses a load balancing algorithm (LBA) to choose a replica

Definition: An LBA is an algorithm (typically distributed) that permits a client to select a replica that meets performance & availability goals

Participants in the algorithm include clients and, commonly, replicas and other intermediaries.

May want priority for certain requests

Load Balancing In Use - Examples

Direct a data read or write to:
• An unloaded replica
• A nearby replica
• A replica that will not charge much for its service
• …

Direct a processing request to:
• A replica that will complete the request with minimum latency
• A node that has been used for similar processing, so its cache is primed
• …

Many Approaches to Load Balancing

Maintain a replicated directory service; a client can consult an instance of it to get the address of a replica.

Approaches:
• The directory returns a set of replicas, and the client uses an algorithm to determine the proper replica
• Or, the directory service applies the algorithm and returns the proper replica
• Or, use a replicated intermediary that is a forwarding service

Algorithms for Directing Load

• Randomization
• Round-robin
• Dynamic: based on recent replica performance
• Locality-based (recent usage)
• Content-based
• Geography- or topology-based
• Negotiation-based (request for proposal: direction to the lowest bidder)

Randomization

Simple. Excellent if:
• Locality effects are not important
• Requests are reasonably distributed in timing and duration
• There is no need for priority-based execution
• You are willing to accept stochastically good performance

Round-Robin

What is meant by round-robin? Intra-client round robin? Inter-client round robin?

Simple. Excellent if:
• Locality effects are unimportant (or non-existent)
• Requests have similar duration
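The intra- vs. inter-client question matters in practice: if every client keeps its own round-robin cycle, all clients start on the same replica. A minimal sketch of both selectors, assuming hypothetical replica names A, B, C:

```python
import itertools
import random

REPLICAS = ["A", "B", "C"]   # hypothetical replica names


def round_robin(replicas):
    """Endless round-robin iterator over the replicas."""
    return itertools.cycle(replicas)


def random_pick(replicas, rng=random):
    """Uniform random selection; pass a seeded random.Random for repeatability."""
    return rng.choice(replicas)


# Intra-client round robin: each client keeps its own cycle,
# so every client's first request lands on the same replica.
c1, c2 = round_robin(REPLICAS), round_robin(REPLICAS)
intra = [next(c1), next(c2)]          # ["A", "A"]

# Inter-client round robin: one shared cycle (e.g. in a dispatcher)
# spreads consecutive clients across replicas.
shared = round_robin(REPLICAS)
inter = [next(shared), next(shared)]  # ["A", "B"]

rng = random.Random(42)
picks = [random_pick(REPLICAS, rng) for _ in range(6)]
```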

Add’l Topics for Randomization & RR

Algorithms should take into account:
• Differential capacity of replicas
• Differential capacity of networks
• Ownership of resources
• Security issues
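Differential replica capacity can be folded into round-robin by giving each replica a weight. The sketch below uses smooth weighted round-robin (the variant used by nginx), swapped in here as one concrete technique: picks follow the weights while interleaving replicas instead of sending runs to the largest one. Replica names and weights are hypothetical:

```python
def smooth_weighted_rr(weights: dict[str, int]):
    """Yield replica names in proportion to their capacity weights."""
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        for name, weight in weights.items():
            current[name] += weight
        chosen = max(current, key=current.get)   # first maximum wins ties
        current[chosen] -= total
        yield chosen


picker = smooth_weighted_rr({"new-server": 3, "old-server": 1})
print([next(picker) for _ in range(4)])
# ['new-server', 'new-server', 'old-server', 'new-server']
```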

Dynamic Load Balancing

Can track, in one or more places:
• Actual performance by replica
• Metrics of replica loading
• Results of probes

That information can be used to determine the best replica.
• Complex
• Advantages: can provide excellent results in situations where randomized or round-robin load balancing does not; can be customized to provide priority, etc.
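One simple form of dynamic load balancing tracks actual performance by replica as an exponentially weighted moving average (EWMA) of observed response times and picks the currently fastest replica. A minimal sketch, assuming hypothetical replica names and an illustrative smoothing factor `alpha`; untried replicas are tried first. Note that picking the single best replica deterministically can herd all clients onto it, which is the problem examined in the strawman below:

```python
class DynamicBalancer:
    """Choose the replica with the lowest smoothed response time."""

    def __init__(self, replicas, alpha=0.3):
        self.alpha = alpha                        # weight of the newest sample
        self.avg_ms = {r: None for r in replicas}

    def record(self, replica, response_ms):
        old = self.avg_ms[replica]
        self.avg_ms[replica] = response_ms if old is None else (
            (1 - self.alpha) * old + self.alpha * response_ms)

    def choose(self):
        # Replicas with no samples yet are tried first.
        untried = [r for r, avg in self.avg_ms.items() if avg is None]
        if untried:
            return untried[0]
        return min(self.avg_ms, key=self.avg_ms.get)


b = DynamicBalancer(["A", "B"])
b.record("A", 100)
b.record("B", 50)
print(b.choose())        # B
b.record("B", 300)       # B's average rises to 125 ms
print(b.choose())        # A
```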

A Strawman LBA

Assumptions: clients 1..n, a DataGatherer, and replicas A and B.
• DataGatherer: probes the replicas every 60 seconds (time = 0, 60, …); chooses the least loaded replica and reports it for the next 60 seconds
• Clients: issue requests to replicas based on consulting the DataGatherer; service time for requests is ~10 seconds with low variance

What’s the Result?

A meta-stable system: all load oscillates between Replica A and Replica B.

Problem: reported load is not tracking actual load.

Solutions:
• More frequent probes: probes should happen more frequently than 1/average(service time), i.e., the probe interval should be shorter than the average service time
• The LBA should be less definitive in nature; e.g., somewhat stochastic

In any case, designing good load balancing algorithms is hard without knowing lots of information about the load.
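The oscillation can be reproduced with a tiny discrete-time model of the strawman. A minimal sketch, assuming one request arrives per second and every request goes to whichever replica the DataGatherer reported at its latest probe: during each 60-second window one replica receives all the traffic while the other sits idle, and the report flips at every probe:

```python
def simulate(duration=300, probe_interval=60, service_time=10):
    """Return the replica reported at each probe of the strawman LBA."""
    finishes = {"A": [], "B": []}  # completion times of requests sent to each replica
    reported = None
    reports = []
    for t in range(duration):
        if t % probe_interval == 0:
            # Probe: load = number of requests still in service at time t.
            load = {r: sum(1 for end in ends if end > t) for r, ends in finishes.items()}
            reported = min(load, key=lambda r: (load[r], r))  # ties go to "A"
            reports.append(reported)
        finishes[reported].append(t + service_time)
    return reports


print(simulate())   # ['A', 'B', 'A', 'B', 'A']
```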

Locality-based

Premise: a replica that has serviced a certain type of request recently should do so again.

Why?
• Efficiency due to already available resources, e.g., open files or databases
• Efficiency due to security, e.g., secure communication sessions

Complexity: how to combine it with other techniques, as locality may not be enough.

Content-based

As in data partitioning, assume certain types of data can best be handled by certain sites:
• Site A stores “aa…az” in random access memory
• Site B does the same for “ba…bz”
• Therefore, “a” requests should generally go to Site A

This is actually an approach for achieving locality.

Geography or Topology-based

Based on co-location of client and replica. May be an indicator of:
• Higher bandwidth
• Shorter latency
• Increased reliability
• Better security

Domain names are now registered with geographical coordinates

Negotiation-based

Virtual capitalism in action: issue an RFP, evaluate the responses, ship work as appropriate.
• The cost of load-balancing overhead must be less than the benefit
• This approach can get very interesting quickly: contractual commitments and compensation if unmet; a way to do Pareto-optimal scheduling
• Useful to implement for real load balancing in business-to-business e-commerce

Role of Caching

Cache the results of the LBA for performance and availability. The usual problem of cache correctness applies:
• How long until a cache refresh? Time-outs too short: running the load balancing algorithm itself places too much load. Time-outs too long: data is insufficiently fresh.
• What happens when the cache sends you to a failed site? If the cached data is faulty, go back and refetch.

This leads to the definition of a hint: a cached entry which is right with high probability, but can be and always is checked for validity prior to use.

The issue of time-out appears again
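A minimal sketch of a hint cache, assuming two hypothetical callbacks: `lookup`, the expensive call into the load balancing algorithm, and `is_alive`, a cheap validity check on a cached site. Entries expire after a time-out and are always validated before use:

```python
import time


class HintCache:
    """Cache of load-balancing decisions treated as hints."""

    def __init__(self, lookup, is_alive, ttl_s, clock=time.monotonic):
        self.lookup = lookup      # expensive: runs the load balancing algorithm
        self.is_alive = is_alive  # cheap check that a cached site still works
        self.ttl_s = ttl_s        # the time-out that must be tuned
        self.clock = clock
        self.entries = {}         # key -> (site, time cached)

    def get(self, key):
        entry = self.entries.get(key)
        if entry is not None:
            site, cached_at = entry
            fresh = self.clock() - cached_at < self.ttl_s
            if fresh and self.is_alive(site):
                return site  # the hint was right
        # Miss, expired entry, or the hint pointed at a failed site: refetch.
        site = self.lookup(key)
        self.entries[key] = (site, self.clock())
        return site
```

Because every hit is validated, a wrong entry costs one extra lookup rather than a misrouted request; the `ttl_s` value still trades freshness against load on the LBA.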

Example: Load Balancing to HTTP Server

The user specifies http://www.xxx.com, but the request should actually be handled by one of many HTTP servers to provide higher throughput. One approach is request redirection (a type of forwarder); see the HTTP protocol definition in the assigned reading. The forwarder is a potential bottleneck.
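A minimal redirecting forwarder can be built on Python's standard `http.server` module: it answers every GET with `302 Found` and a `Location` header pointing at a back end chosen round robin. The back-end host names are hypothetical:

```python
import itertools
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical back-end pool behind www.xxx.com.
BACKENDS = ["http://www1.xxx.com", "http://www2.xxx.com", "http://www3.xxx.com"]
_next_backend = itertools.cycle(BACKENDS)


def redirect_target(path):
    """Choose a back end (round robin) and build the absolute redirect URL."""
    return next(_next_backend) + path


class Redirector(BaseHTTPRequestHandler):
    def do_GET(self):
        # 302 Found: the browser re-issues the request to the Location URL,
        # so the forwarder handles only this first, small exchange.
        self.send_response(302)
        self.send_header("Location", redirect_target(self.path))
        self.send_header("Content-Length", "0")
        self.end_headers()


def serve(port=8080):
    """Run the forwarder (blocks forever)."""
    HTTPServer(("", port), Redirector).serve_forever()
```

Every new request still passes through this one process and costs the client an extra round trip, which is why the forwarder is a potential bottleneck.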

Approach 1 – Round Robin DNS

DNS entries allow 32 server addresses per record. DNS (name) servers cycle through the entries, thereby providing round-robin load balancing. Advantages:

Cheap, easy
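The cycling can be sketched as a name server that rotates its address list on every query. This is a simplification assuming clients use the first address returned; the addresses are illustrative:

```python
from collections import deque


class RoundRobinZone:
    """Sketch of a name server that rotates the addresses it returns."""

    def __init__(self, addresses):
        self.records = deque(addresses)

    def resolve(self):
        answer = list(self.records)  # full address set, in the current order
        self.records.rotate(-1)      # the next query sees the next address first
        return answer
```

Real resolvers cache answers, so each rotation affects a whole population of clients at once, and the zone has no idea whether an address it hands out belongs to a live server.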

Round Robin DNS - Problems

Addresses of unavailable servers will remain until an administrator removes the entries

It takes hours or days for the DNS database to replicate. So, the system hands out addresses of down servers for a long time, and addresses of recently added servers take a while to become visible.

All servers are treated equally. New servers will likely be faster than the old ones and could handle more load; some servers may carry multiple workloads and should get fewer requests.
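The "all servers treated equally" problem is usually addressed with weights. A minimal sketch of "smooth weighted round robin" (the scheme used, for example, by nginx), which is a swapped-in technique and not part of round-robin DNS itself: faster servers get larger weights, and their turns are interleaved rather than bunched together:

```python
def smooth_weighted_rr(weights):
    """Yield server names in proportion to their integer weights, interleaved.

    weights: dict name -> positive int (e.g., higher for newer, faster servers).
    """
    current = {name: 0 for name in weights}
    total = sum(weights.values())
    while True:
        for name, w in weights.items():
            current[name] += w
        chosen = max(current, key=current.get)  # first maximum wins ties
        current[chosen] -= total
        yield chosen
```

With weights `{"a": 5, "b": 1, "c": 1}`, every block of seven picks contains `a` five times and `b` and `c` once each, and `a` never gets more than two picks in a row.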

Cisco Local and Distributed Director

See: http://www.cisco.com/warp/public/cc/pd/cxsr/400/tech/scale_wp.htm

Session redirection is accomplished by rewriting the IP header using a mapping table.

Intelligent load balancing to servers within a cluster: takes into account the status of servers and uses only a single DNS entry for the entire server complex.

Simplifies administration; hot standby is feasible.

Fancier load balancing of this type routes requests based on topological distance. Routing decisions can be based on hop counts, network usage, and round-trip latency.
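Session redirection through a mapping table can be sketched as follows. This is a simplified model, not Cisco's implementation: a packet is a small record, new client sessions go to the healthy server with the fewest sessions, and both directions are rewritten so the client only sees the single published address (the addresses come from documentation ranges and are illustrative):

```python
from dataclasses import dataclass, replace

VIRTUAL_IP = "192.0.2.10"  # illustrative: the single address published in DNS


@dataclass(frozen=True)
class Packet:
    src: str  # source address (client "ip:port" on the way in)
    dst: str  # destination address
    payload: bytes = b""


class Director:
    """Sketch of NAT-style session redirection with a mapping table."""

    def __init__(self, servers):
        self.healthy = set(servers)              # kept current by status checks
        self.assigned = {s: 0 for s in servers}  # sessions handed to each server
        self.table = {}                          # client ip:port -> real server

    def inbound(self, pkt):
        server = self.table.get(pkt.src)
        if server not in self.healthy:  # new session, or its server went down
            server = min(sorted(self.healthy), key=lambda s: self.assigned[s])
            self.table[pkt.src] = server
            self.assigned[server] += 1
        return replace(pkt, dst=server)  # rewrite the destination address

    def outbound(self, pkt):
        # Responses also pass through the director, which rewrites the source
        # back to the virtual IP so the client only ever sees one address.
        return replace(pkt, src=VIRTUAL_IP)
```

Because every response must also come back through `outbound`, the director sees and modifies traffic in both directions.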

IBM Secureway Network Dispatcher

http://www-4.ibm.com/software/network/dispatcher/about/features/keyfeatures.html

Network Dispatcher:

Doesn't modify packets (vs. LocalDirector, which does). Only inspects inbound requests (LocalDirector looks at both directions).

So, responses go back directly to the requester (greater efficiency). Background processes check servers to ensure that they are up.

"Advisors" support HTTP, SSL, FTP, NNTP, POP3, SMTP, and Telnet; this way requests don't go to down servers.

Balances load across servers of different sizes: servers send CPU, disk, and I/O metrics to the dispatcher.

Supports hot standby for high availability of the dispatcher itself. Uses a "sticky" port option to route a client's requests to the same server to ensure state is preserved across requests: recall the locality topic.
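The dispatcher's inbound decision can be sketched as below. The scoring rule (spare capacity = machine size × (1 − CPU utilization)) is an assumption for illustration, not IBM's formula; the `up` flag stands in for the result of an advisor probe:

```python
from dataclasses import dataclass


@dataclass
class ServerStatus:
    capacity: float        # relative machine size (e.g., 1.0 small, 4.0 large)
    cpu_util: float = 0.0  # latest reported utilization, 0.0-1.0
    up: bool = True        # set by a background "advisor" probe


class Dispatcher:
    def __init__(self, servers):
        self.servers = servers  # name -> ServerStatus
        self.sticky = {}        # client address -> server, for sticky ports

    def spare(self, name):
        s = self.servers[name]
        return s.capacity * (1.0 - s.cpu_util)

    def choose(self, client, sticky_port=False):
        if sticky_port:
            prev = self.sticky.get(client)
            if prev is not None and self.servers[prev].up:
                return prev  # keep session state on the same server
        candidates = [n for n, s in self.servers.items() if s.up]
        if not candidates:
            raise RuntimeError("no server is up")
        best = max(candidates, key=self.spare)
        if sticky_port:
            self.sticky[client] = best
        return best
```

Only the inbound choice is modeled; the chosen server replies to the client directly, so the dispatcher never handles response traffic.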

8. Failure Detection

Failure Detection

Explicit: a clear indication that a failure has occurred. Timely. Semantics clean, … as far as they go. Voting

Implicit: timeout. The requester does not receive a response after waiting a while. Unclean: does not necessarily mean the remote system failed.

Timeouts are used in very many places/levels: communication, naming, … and, ultimately, end-to-end.

Some have argued that only end-to-end timeouts are valuable, but this is incorrect.
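Implicit detection is often built on heartbeats. A minimal sketch, assuming a hypothetical `HeartbeatDetector` that is told the current time explicitly; it reports sites as *suspected*, because silence alone does not prove the remote system failed:

```python
class HeartbeatDetector:
    """Implicit failure detection: suspect a site if it has been silent too long."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s
        self.last_seen = {}  # site -> time of its last heartbeat

    def heartbeat(self, site, now):
        self.last_seen[site] = now

    def suspected(self, now):
        # A suspected site may be slow or partitioned rather than crashed.
        return sorted(site for site, t in self.last_seen.items()
                      if now - t > self.timeout_s)
```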

Timeout In More Depth

Problems with timeouts: semantics; specification of timeout length.

This is particularly difficult when requests take variable amounts of time, and the requester cannot dynamically set the timeout interval. Long intervals lead to poor customer satisfaction: imagine an ATM that made you wait 10 minutes before failing and giving you your card back.

Therefore, timeouts are used at multiple system levels. Lower levels have more predictable performance, so they can trigger timely failures better; higher levels are required for ultimate correctness.
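The layering can be simulated without a network. This sketch assumes a lower-level per-try timeout with retries, nested inside a higher-level end-to-end deadline; `attempt_latencies` is a hypothetical list of how long each attempt would take to answer:

```python
def call_with_layered_timeouts(attempt_latencies, per_try_s, deadline_s):
    """Retry under a short per-try timeout, bounded by an end-to-end deadline.

    attempt_latencies: time each successive attempt would take to answer
    (float("inf") for an attempt whose reply never arrives).
    Returns (succeeded, elapsed_seconds, attempts_made).
    """
    elapsed = 0.0
    attempts = 0
    for latency in attempt_latencies:
        remaining = deadline_s - elapsed
        if remaining <= 0:
            break  # the higher-level deadline has expired
        attempts += 1
        budget = min(per_try_s, remaining)  # lower-level timeout, capped
        if latency <= budget:
            return True, elapsed + latency, attempts
        elapsed += budget  # this try timed out; retry if time is left
    return False, elapsed, attempts
```

The short per-try timeout detects a lost message quickly, while the end-to-end deadline bounds how long the customer at the ATM can be kept waiting.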

The Role of the Sequence Number

Sequence numbers in communication protocols: failure detection, duplicate detection, flow control

Sequence numbers in replication algorithms: as discussed previously

Sequence numbers in site crash detection: sites increment a number after each failure, so it is possible to tell whether a site has crashed. This is important so that work sent to a site before a crash is not missed.
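Two of these uses can be sketched together. `Receiver` shows duplicate and gap detection with a simple in-order receiver (flow control is not modeled); `Site` and `Coordinator` are hypothetical classes showing crash detection through an incarnation number that a site increments on every restart:

```python
class Receiver:
    """Per-sender sequence numbers: deliver in order, drop duplicates, spot gaps."""

    def __init__(self):
        self.expected = 0  # next sequence number to deliver
        self.delivered = []

    def receive(self, seq, msg):
        if seq < self.expected:
            return "duplicate"  # e.g., a retransmission already delivered
        if seq > self.expected:
            # Something was lost; this simple receiver discards the message
            # and lets the sender retransmit from self.expected.
            return f"gap: missing {self.expected}..{seq - 1}"
        self.delivered.append(msg)
        self.expected += 1
        return "delivered"


class Site:
    """A site that bumps its incarnation number every time it restarts."""

    def __init__(self):
        self.incarnation = 0  # a real site would keep this on stable storage

    def restart(self):
        self.incarnation += 1


class Coordinator:
    """Remembers which incarnation of each site it handed work to."""

    def __init__(self):
        self.sent_to = {}  # site name -> incarnation when work was sent

    def send_work(self, name, site):
        self.sent_to[name] = site.incarnation

    def crashed_since_send(self, name, site):
        # A changed number means the site failed and restarted in between,
        # so work it was holding may need to be resubmitted.
        return site.incarnation != self.sent_to[name]
```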

Voting

Discussed previously with respect to the Weighted Voting Algorithm, where it is used to determine the most up-to-date copies.

What if voting is used to detect incorrect data? N-way computation.

Structure: N inputs; vote on them and determine the most typical input. N computations on the most typical input. Vote on the result: N outputs, which go into the next stage of computation or go to some device which itself votes.
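One stage of N-way computation can be sketched as below. It assumes "most typical" means a strict majority value; other voters, such as taking the median of numeric values, are equally possible:

```python
from collections import Counter


def vote(values):
    """Return the strict-majority value among N replicas, or raise if none."""
    value, count = Counter(values).most_common(1)[0]
    if count * 2 <= len(values):
        raise ValueError(f"no majority among {values!r}")
    return value


def n_way_stage(inputs, compute):
    """One stage of N-way redundant computation.

    1. Vote on the N inputs to pick the most typical one.
    2. Run the N replicas of the computation on that input.
    3. Vote on the N outputs, which feed the next stage.
    """
    agreed_input = vote(inputs)
    outputs = [f(agreed_input) for f in compute]
    return vote(outputs)
```

With three replicas, one faulty input and one faulty computation are both masked, because each vote still has a two-out-of-three majority.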

Yahoo Denial of Service Attack

Mostly unavailable 10:20AM – 12:00PM PST 2/7/00 Reported cause (NYT, 2/8/00)

50 computers “flooded” the Yahoo site with 1 gigabyte/second, or 20 MB/computer/second, “clogging” Yahoo’s site and routers. Difficult to trace due to the use of hijacked computers.

Solutions: audit, filter, legal system

Typical Yahoo availability: 99.3%, according to Keynote Systems. This corresponds to being down 61 hours/year, and Yahoo is a good site.
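The figures above can be checked with two lines of arithmetic: the per-machine attack rate is the total rate divided by the number of machines, and yearly downtime is the unavailable fraction times the 8,760 hours in a (non-leap) year:

```python
HOURS_PER_YEAR = 365 * 24  # 8760


def downtime_hours_per_year(availability):
    """Expected hours of downtime per year for an availability in [0, 1]."""
    return (1.0 - availability) * HOURS_PER_YEAR


def per_machine_rate(total_bytes_per_s, machines):
    """Share of an attack's traffic that each machine must generate."""
    return total_bytes_per_s / machines
```

For 99.3% availability, 0.007 × 8760 ≈ 61.3 hours per year; 1 GB/s spread over 50 machines is 20 MB/s each.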

Technique (2)

Part 3: Now, one by one: stop a CtrReplicaGrp, start the new version. Do this for all CtrReplicaGrps.

Now, there is a new function available. Finally, do Part 1: test what we have so carefully installed, so we haven’t just (methodically) inserted a bug into the entire, supposedly fault-tolerant, system.
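The one-by-one procedure can be automated as a loop. In this sketch `stop`, `start_new_version`, and `healthy` are hypothetical callbacks that an operations console would supply. It checks each group before moving on, a slightly stricter variant of the slide's single test at the end, so a bad version halts after one group instead of reaching every replica:

```python
def rolling_upgrade(groups, stop, start_new_version, healthy):
    """Upgrade replica groups (CtrReplicaGrps) one at a time.

    Returns the list of upgraded groups; raises RuntimeError at the first
    group that fails its health check, leaving later groups untouched.
    """
    upgraded = []
    for g in groups:
        stop(g)               # the other groups keep serving meanwhile
        start_new_version(g)
        if not healthy(g):    # test before touching the next group
            raise RuntimeError(f"upgrade of {g} failed its health check; halting")
        upgraded.append(g)
    return upgraded
```

While one group is stopped the system runs with less redundancy, so a simultaneous failure during the upgrade is exactly the case that calls for extra replicas.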

Issues

Too many steps for a human being to get right. So, automation via a console is needed.

May not handle a simultaneous failure during upgrading. So, more replicas may be needed.

Cost of availability: the shape of this curve is right, though the calibration is unknown and undoubtedly flattens as experience grows.

[Figure: cost of availability curve]

Window of Vulnerability

If transactions are used, there is a potential availability problem during the “Window of Vulnerability”.

The only solution is that transaction coordinators must be quite reliable and be guaranteed to recover quickly after a crash.

Availability

So, considerable thought is required to achieve high availability in malleable systems.

It is better avoided when not needed. However, when high availability is required, every level of the system needs to be studied and addressed.

The Architecture As We’re Studying It

Client

Servlet/JSP

Integrated Dev’t Environment

Java Runtime Environment

Security/Directory (X509, LDAP, Kerberos)

Linux, NT, AIX, Solaris, Sys/390

Reusable Components

Modeling and Other Software Eng. Tools

Systems M

Reliable Messaging

Workflow Management
