Scaling To 1 Billion Hits A Day Chander Dhall Twitter @csdhall [email protected]

Scaling To 1 Billion Hits A Day - SDD Conferencesddconf.com/brands/sdd/library/Scaling_Web_Apps.pdf · 2016-05-16 · SCALING TO 1 BILLION HITS A DAY Step 10 – HTTP Accelerator


Scaling To 1 Billion Hits A Day

Chander Dhall

Twitter @csdhall

SCALING TO 1 BILLION HITS A DAY

About Me
• Microsoft MVP
• Tech Ed Speaker
• ASP.NET Insider
• Web API Advisor
• Pluralsight Author
• Dev Chair - Dev Connections

About Me
• Conference Organizer – jsSaturday
• Leader – NodeLA user group
• Leader – .NET user group at UT Dallas
• Owner – Chander Dhall, Inc.
• Conference Organizer – MVPMIX.com
• Chander Tech Podcast

Why?
• Amazon claims that just an extra 1/10th of a second on their response times will cost them 1% in sales.
• Google – a ½ second increase in latency caused traffic to drop by a fifth.

Theory of Scaling

#devconnections

Practice of Scalability

#devconnections

Agenda
• Why is it important to scale?
• Creating a scalable solution (in incremental steps)
  – Propose an Architecture
  – Identify Failures and Bottlenecks
  – Identify Downtime
  – Apply a better solution
  – Repeat until we solve (in 10 steps)
  – Then some bonus stuff (a better solution)

Unfortunate Solution

[Diagram: Load Balancer in front of five Services (S)]

Unfortunate Solution

[Diagram: Services S1, S2, S3, S4, S5 and Databases]

Gilbert and Lynch white paper

[Diagram: nodes A and B each hold {"name": "Chander", "gender": "m"}. Labels: Write Algorithm, Read Algorithm, Network 1, Network 2.]

Happy path scenario

[Diagram: node A now holds {"name": "Dhall", "gender": "m"}; node B still holds {"name": "Chander", "gender": "m"} while an Update Message is in flight. Labels: Write Algorithm, Read Algorithm, Network 1, Network 2.]

Happy path scenario

[Diagram: after the update, nodes A and B both hold {"name": "Dhall", "gender": "m"}. Labels: Write Algorithm, Read Algorithm, Network 1, Network 2.]

Network partitions

[Diagram: node A holds {"name": "Dhall", "gender": "m"}; the Update Message does not reach node B, which still holds {"name": "Chander", "gender": "m"}. Labels: Write Algorithm, Read Algorithm, Network 1, Network 2.]

CAP Theorem

Consistency Availability

Partitioning

Brewer’s CAP Theorem
• Consistency (or more appropriately Atomic)
• Availability
• Partition Tolerance

– “No set of failures less than total network failure is allowed to cause the system to respond incorrectly” – Gilbert & Lynch
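The happy path and network partition slides can be sketched as a tiny simulation. This is only an illustration, assuming a two-node setup; the `Replica` class, the `write` function and the `prefer_consistency` flag are hypothetical names, not something taken from the talk or from the Gilbert and Lynch paper.

```python
class Replica:
    """One node holding a copy of the record."""

    def __init__(self, name, record):
        self.name = name
        self.record = dict(record)


def write(primary, peer, update, link_up, prefer_consistency):
    """Apply `update` on `primary` and try to replicate it to `peer`.

    Returns True if the write was accepted, False if it was rejected.
    """
    if not link_up and prefer_consistency:
        # Consistency first: refuse the write rather than let the copies diverge.
        return False
    primary.record.update(update)
    if link_up:
        # Happy path: the update message reaches the peer.
        peer.record.update(update)
    # Availability first during a partition: the write is accepted,
    # but the two copies now disagree.
    return True


initial = {"name": "Chander", "gender": "m"}

# Happy path: the network is up, both nodes converge.
a, b = Replica("A", initial), Replica("B", initial)
write(a, b, {"name": "Dhall"}, link_up=True, prefer_consistency=True)
happy_consistent = a.record == b.record

# Partition, favouring availability: A accepts the write, B serves stale data.
a, b = Replica("A", initial), Replica("B", initial)
ap_accepted = write(a, b, {"name": "Dhall"}, link_up=False, prefer_consistency=False)
ap_consistent = a.record == b.record

# Partition, favouring consistency: the write is rejected, the copies stay equal.
a, b = Replica("A", initial), Replica("B", initial)
cp_accepted = write(a, b, {"name": "Dhall"}, link_up=False, prefer_consistency=True)
cp_consistent = a.record == b.record

print(happy_consistent, ap_accepted, ap_consistent, cp_accepted, cp_consistent)
```

During a partition the sketch can either accept the write and diverge, or stay consistent and reject it, which is the trade-off the theorem describes.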

Just FYI
• Consistency (in CAP theorem)
• Atomicity (in ACID)
• Consistency (in ACID)
  – Means any transaction will bring the database from one valid state to another.

Fallacies of Distributed Computing
• Network is reliable.
• Latency is zero.
• Bandwidth is infinite.
• Network is secure.
• Topology doesn’t change.
• There is one administrator.

Fallacies of Distributed Computing
• Transport cost is zero.
• Network is homogeneous.

Why is Scalability Important
• Instant success – Thanks to Social networking
  – Twitter: 200 billion tweets per year
  – Facebook: 1.23 billion monthly active users
• Billions of devices (desktops, tablets, mobile)
• Need: Millions of hits with Zero downtime

Why is Scalability Important

The website was working great UNTIL we launched :)
  – Instagram was down on the launch day

The Variables
• Scalability - Number of users / sessions / transactions / operations the entire system can handle
• Performance – Optimal utilization of resources
• Responsiveness – Time taken per operation
• Availability – Probability of the application being available at any given point in time
• Downtime Impact - The impact of a downtime of a server/service/resource - number of users, type of impact etc.
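Because availability is described above as a probability, a short calculation can show how it combines across nodes. The sketch assumes that failures are independent and uses a hypothetical figure of 99% per node; neither the assumption nor the numbers come from the slides.

```python
from math import prod


def serial_availability(parts):
    """All parts must be up at once (e.g. app server AND database)."""
    return prod(parts)


def redundant_availability(node_availability, copies):
    """At least one of `copies` identical nodes must be up."""
    return 1 - (1 - node_availability) ** copies


# One app server and one database, each assumed to be 99% available.
single = serial_availability([0.99, 0.99])

# Two app servers and two databases, same per-node figure.
redundant = serial_availability(
    [redundant_availability(0.99, 2), redundant_availability(0.99, 2)]
)

print(round(single, 4), round(redundant, 4))
```

Under these assumptions, chaining components lowers availability, while duplicating each component raises it again.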

Major Factors
• Platform selection
• Hardware
• Application Design
• Database/Datastore Structure and Architecture
• Caching strategy
• Asynchronous processing
• Deployment Process and Architecture
• Monitoring mechanisms
• … and more

Step 1

[Diagram: Appserver & DBServer (App Server, Database Server)]

Step 2 – Vertical Scaling

[Diagram: Appserver & DBServer (App Server, Database Server)]

Throw more RAM and CPU :)

Step 2 - Vertical Scaling
• … or Scale up
  – Increasing the hardware resources without changing the number of nodes
• Disadvantages
  – Law of diminishing returns
  – Downtime
  – Increases Downtime Impact
  – Incremental costs increase exponentially

Step 3 – Vertical Partitioning (Services)
• Introduction
  – Deploying each service on a separate node
• Advantages
  – Increases Availability (per app)
  – Easy to tune and optimize
  – Reduces context switching
  – Simple to implement

[Diagram: App Server, Db Server]

Step 3 – Vertical Partitioning (Services)
• Disadvantages
  – Sub-optimal resource utilization
  – May not increase overall availability
  – Finite Scalability

[Diagram: App Server, Db Server]

Vertical Partitioning
• Distribute the responsibilities. Increased number of nodes.
• Each node (or cluster) performs separate tasks
• Each node (or cluster) is different from the other

Step 4 – Horizontal Scaling

[Diagram: Load Balancer, DB Server]

Horizontal Scaling
• Replication of nodes
• Nodes perform the same tasks
• Nodes are identical
• Scale out
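Because the replicated nodes are identical, a load balancer can hand any request to any of them. The sketch below assumes a simple round-robin policy; the `RoundRobinBalancer` class and the node names are hypothetical, and the slides do not say which balancing policy is used.

```python
from collections import Counter
from itertools import cycle


class RoundRobinBalancer:
    """Hands each incoming request to the next node in turn."""

    def __init__(self, nodes):
        self._nodes = cycle(nodes)

    def route(self, request):
        # The request content is ignored: every node can serve every request.
        return next(self._nodes)


balancer = RoundRobinBalancer(["app-1", "app-2", "app-3"])
hits = Counter(balancer.route(f"GET /page/{i}") for i in range(9))
print(hits)
```

Nine requests over three nodes land three on each node, which is the even spread that sticky sessions tend to lose.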

Sticky Sessions
• Subsequent requests from a user are sent to the original server
• Asymmetrical load distribution
• Downtime Impact – Loss of session data

[Diagram: User 1, Load Balancer]

Central Session Store
• Session store is a single point of failure
• Session reads and writes generate Disk + Network I/O

[Diagram: Load Balancer, App Server, Session Store]

Clustered Session Management
– No single point of failure
– Session reads are instantaneous
– Session writes generate Network I/O
– Increase in number of nodes increases Network I/O exponentially
– What happens when?
  • User request arrives before inter-node communication has finished
  • Inter-node communication fails

[Diagram: Clustered Session Management, Load Balancer]

Recommendations
– Use a scaled version of a Central Session Store (Recommended)
– Use Clustered Session Management ONLY if you have:
  • Smaller number of App Servers
  • Fewer session writes
– Don’t use sticky sessions if you want to scale
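A minimal sketch of the recommended central session store follows. The in-memory `SessionStore` stands in for a real, scaled store, and the class names, the 30-minute TTL and the cart example are all hypothetical.

```python
import time


class SessionStore:
    """In-memory stand-in for a central (ideally replicated) session store."""

    def __init__(self, ttl_seconds=1800):
        self._data = {}
        self._ttl = ttl_seconds

    def save(self, session_id, values):
        # Store an expiry time next to a copy of the session values.
        self._data[session_id] = (time.monotonic() + self._ttl, dict(values))

    def load(self, session_id):
        entry = self._data.get(session_id)
        if entry is None or entry[0] < time.monotonic():
            # Unknown or expired session: start from an empty one.
            self._data.pop(session_id, None)
            return {}
        return dict(entry[1])


class AppServer:
    """Stateless app server: all session state lives in the shared store."""

    def __init__(self, name, store):
        self.name = name
        self.store = store

    def add_to_cart(self, session_id, item):
        session = self.store.load(session_id)
        session.setdefault("cart", []).append(item)
        self.store.save(session_id, session)
        return session["cart"]


store = SessionStore()
server_a = AppServer("app-1", store)
server_b = AppServer("app-2", store)

# The same user is served by two different app servers in a row.
server_a.add_to_cart("s-42", "book")
cart = server_b.add_to_cart("s-42", "pen")
print(cart)
```

Since neither app server keeps session state of its own, the load balancer is free to send each request anywhere, and losing an app server does not lose the session.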

Load Balanced App Server Cluster

Active-Active assumes that each LB is independently able to take up the load of the other.

[Diagram: Users, Load Balancer, Load Balancer]

Step 5 – Vertical Partitioning (Hardware)

[Diagram: Load Balancer, Load Balancer, DB Server, SAN]

Step 5 – Vertical Partitioning (Hardware)
• Advantages
  § Allows “Scaling Up” the DB Server
  § Boosts Performance of DB Server
• Disadvantages
  § Increases Cost

Step 6 – Horizontal Scaling (DB)
• Introduction
  § Increasing the number of DB nodes
  § Referred to as “Scaling out” the DB Server
• Options
  § Shared nothing Cluster
  § Real Application Cluster (or Shared Storage Cluster)

Step 6 – Horizontal Scaling (DB)

[Diagram: Load Balancer, Load Balancer, DB Server, SAN]

Step 6 – Horizontal Scaling (DB)

[Diagram: Load Balancer, Load Balancer, three DB Servers, SAN, DB Replica]

Step 7 – Vertical / Horizontal Partitioning (DB)
• Introduction
  § Increasing the number of DB Clusters by dividing the data
• Options
  § Vertical Partitioning - Dividing tables / columns
  § Horizontal Partitioning - Dividing by rows (value)
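The two options can be sketched as two routing functions. The cluster names, the table assignment and the one-million-users-per-cluster split are assumptions chosen for illustration.

```python
# Vertical partitioning: each table lives on exactly one cluster.
TABLE_TO_CLUSTER = {
    "users": "db-cluster-1",
    "products": "db-cluster-2",
}

# Horizontal partitioning: rows of the same table are split by user id range.
USERS_PER_CLUSTER = 1_000_000
USER_CLUSTERS = ["db-cluster-1", "db-cluster-2"]


def cluster_for_table(table):
    """Vertical: route by which table the query touches."""
    return TABLE_TO_CLUSTER[table]


def cluster_for_user(user_id):
    """Horizontal: route by which row (user id, starting at 1) the query touches."""
    index = (user_id - 1) // USERS_PER_CLUSTER
    if not 0 <= index < len(USER_CLUSTERS):
        raise ValueError(f"no cluster configured for user {user_id}")
    return USER_CLUSTERS[index]


print(cluster_for_table("products"))
print(cluster_for_user(1_000_000), cluster_for_user(1_000_001))
```

A vertical split needs the app to know where each table lives, while a horizontal split needs a rule (here a range) that maps each row to a cluster.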

Step 7 – Vertical / Horizontal Partitioning (DB)

[Diagram: Load Balancer, Load Balancer, three DB Servers, SAN, DB Cluster]

Step 7 – Vertical / Horizontal Partitioning (DB)

[Diagram: Vertical Partitioning. An App Cluster in front of Db Cluster 1 and Db Cluster 2; the Twitter, Facebook, Users and Products tables are divided between the two clusters.]

Step 7 – Vertical / Horizontal Partitioning (DB)

[Diagram: Horizontal Partitioning. An App Cluster in front of Db Cluster 1 and Db Cluster 2; each cluster holds a Twitter Table and a Facebook Table, one cluster for the 1st Million Users and the other for the 2nd Million Users.]

Step 7 – Vertical / Horizontal Partitioning (DB)

[Diagram: Load Balancer, Load Balancer, Hash Map, two DB Clusters of DB nodes, SAN]

Step 8 – Separating Sets

[Diagram: a Global Redirector with a Global Look up Hash Map in front of two sets, Set 1-10 Million Users and Set 11-20 Million Users. Each set has two Load Balancers, a Hash Map and two DB Clusters of three DB nodes.]
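The global look up can be sketched as a range search over the sets. The boundaries follow the two sets named on the slide; the endpoint URLs and the `redirect` function are hypothetical.

```python
from bisect import bisect_left

# Upper user-id bound of each set, in ascending order.
SET_UPPER_BOUNDS = [10_000_000, 20_000_000]
# Hypothetical entry points, one per set.
SET_ENDPOINTS = ["https://set1.example.com", "https://set2.example.com"]


def redirect(user_id):
    """Return the entry point of the set that owns `user_id`."""
    if user_id < 1:
        raise ValueError("user ids start at 1")
    # Index of the first set whose upper bound is >= user_id.
    index = bisect_left(SET_UPPER_BOUNDS, user_id)
    if index == len(SET_UPPER_BOUNDS):
        raise LookupError(f"no set configured for user {user_id}")
    return SET_ENDPOINTS[index]


print(redirect(42), redirect(10_000_001))
```

Adding a third set in this sketch means appending one bound and one endpoint; the existing sets are untouched.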

Step 9 – Caching
• Add caches within App Server
  § Object Cache
  § Session Cache
  § API cache
  § Page cache
• Software
  § Memcached
  § Redis
  § Azure Cache (App Fabric)
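One common way to use an object cache is the cache-aside pattern, sketched below. A plain dict stands in for Memcached or Redis, and the `CacheAside` class and the product data are hypothetical; the slides do not prescribe a particular pattern.

```python
class CacheAside:
    """Check the cache first; on a miss, read the database and remember the result."""

    def __init__(self, load_from_db):
        self._cache = {}  # stand-in for Memcached or Redis
        self._load_from_db = load_from_db
        self.db_reads = 0

    def get(self, key):
        if key in self._cache:
            return self._cache[key]
        # Cache miss: fall through to the database.
        self.db_reads += 1
        value = self._load_from_db(key)
        self._cache[key] = value
        return value

    def invalidate(self, key):
        # Call this after the underlying row changes.
        self._cache.pop(key, None)


products = {"p1": {"name": "Surface"}}  # pretend database table
cache = CacheAside(products.__getitem__)

cache.get("p1")
cache.get("p1")
reads_before = cache.db_reads  # second read was served from the cache

cache.invalidate("p1")
cache.get("p1")
reads_after = cache.db_reads  # invalidation forced one more database read
print(reads_before, reads_after)
```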

Step 10 – HTTP Accelerator
• A good HTTP Accelerator / Reverse proxy performs the following:
  § Redirect static content requests to a lighter HTTP server (lighttpd)
  § Cache content based on rules
  § Use Async Non blocking IO
  § Maintain a limited pool of Keep-alive connections to the App Server
  § Intelligent load balancing
• Solutions
  § Nginx (HTTP / IMAP)
  § Perlbal
  § Hardware accelerators plus LBs

More Important Stuff
• CDNs
• IP Anycasting
• Async Nonblocking IO (for all Network Servers)
• If possible - Async Nonblocking IO for disk
• Incorporate multi-layer caching strategy where required
  § L1 cache – in-process with App Server
  § L2 cache – across network boundary
  § L3 cache – on disk
• Grid computing
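The multi-layer caching bullet above can be sketched as a lookup that walks the layers in order. Three dicts stand in for the in-process, networked and on-disk caches; the `LayeredCache` class and the promote-on-hit behaviour are assumptions, not something the slides specify.

```python
class LayeredCache:
    """Looks in L1, then L2, then L3, and only then calls the loader."""

    def __init__(self):
        # Dicts stand in for in-process (L1), networked (L2) and on-disk (L3) caches.
        self.layers = [("L1", {}), ("L2", {}), ("L3", {})]

    def get(self, key, load):
        for depth, (name, store) in enumerate(self.layers):
            if key in store:
                value = store[key]
                # Promote the value into the faster layers above the hit.
                for _, upper in self.layers[:depth]:
                    upper[key] = value
                return value, name
        # Missed every layer: load from the origin and fill all layers.
        value = load(key)
        for _, store in self.layers:
            store[key] = value
        return value, "origin"


def load_user(key):
    return {"name": "Chander"}


cache = LayeredCache()
first = cache.get("user:1", load_user)
second = cache.get("user:1", load_user)

# Simulate an app server restart: the in-process L1 cache is lost.
cache.layers[0][1].clear()
third = cache.get("user:1", load_user)
fourth = cache.get("user:1", load_user)
print(first[1], second[1], third[1], fourth[1])
```

After the simulated restart the value comes back from L2 rather than from the origin, which is the point of keeping a layer across the network boundary.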

NoSql Vs Relational

[Chart: Scalability and Performance (vertical axis) against Depth of Functionality (horizontal axis), positioning Key Value, Memcached, Document Databases and Relational Databases]

NoSql vs Relational
• No Joins
  – Do you need them though?
• Transactions
  – RDBMS great for concurrency, integrity or data type validity.

Relational -> NoSql
• Ever increasing users. Scalability needs.
• Highly structured data to structured, semi-structured and unstructured data.
• Advent of high speed data networking.
• Distributed computing.
• Cheap and plentiful memory.

Relational -> NoSql

http://www.couchbase.com/why-nosql/nosql-database

Scaling RDBMS
• RDBMS sharding
  – Highly disruptive to re-shard.
  – Lose benefits of relational model.
  – Create and maintain schema on every server.

Scaling RDBMS
• Denormalizing
  – Why use an RDBMS? :)

Scaling RDBMS
• Distributed caching for RDBMS (e.g. memcached)
  – Speeds up reads only.
  – Cold cache thrash.
  – Management costs.

Relational -> NoSql
• Schemaless.
• Auto-sharding.
• Distributed querying.
• Integrated caching.

No SQL Types
• Key Value
• Ordered key value
• Wide Column Store
• Document Store/Full Text Search
• Graph DBs
• Object DBs

Key-Value Store
• Pros
  – Simple. Programmer friendly.
  – Powerful. Fast.
• Cons
  – Key range support not good.
  – Aggregation support lacking.

[Diagram: a list of Key / Value pairs]

Ordered Key-Value Store
• Pros
  – Processes key ranges.
  – More powerful.
• Cons
  – No framework for value modeling.

[Diagram: a list of Key / Value pairs]

Big Table
• Pros
  – Model values as maps of maps of maps.
• Cons
  – Not appropriate for schemes of arbitrary complexity.

[Diagram: a list of Key / Value pairs]


Big table

[Diagram: each Key maps to a Column family]

Document/Full-Text
• Pros
  – Collection of documents which contain key-value collections.
  – Natural data modeling.
  – Programmer friendly.
  – Web based. Mostly REST/JSON friendly.

Document/Full text search databases

[Diagram: a document made of Key / Val pairs]

"Person": {
  "name": "Chander Dhall",
  "address": {
    "city": "los angeles",
    "state": "CA",
    "zip": "90069"
  }
}

Graph databases

[Diagram: four Key nodes]

Step 12 – Finalizing

[Diagram: final architecture with two Load Balancers, a Hash Map, two DB Clusters of three DB nodes each, two Master / Slave / Slave groups, SAN, No Sql, Search Db, Caching and Offline Processing]

NoSql Paradigm - Denormalization
• Data duplication and denormalization
  – First class citizens.
• Increases total data volume.
• Simplifies query processing, especially in a distributed environment.

NoSql Paradigm - Atomic Aggregates

[Diagram: relational tables Account (Id, Account No.), Checking (Id, Min bal) and Savings (Id, Interest rate), next to the Account documents below]

Account {"Type": "Checking", "Id": "chk123", "Min Bal": "10000"}

Account {"Type": "Savings", "Id": "sav123", "Interest Rate": "5%"}

No Sql Paradigm – No joins
• SQL joins are evaluated at query time, hence a performance penalty.
• Handled in the application instead.

No Sql Paradigm – Enumerable keys
• Sequential Ids for composite keys, e.g. DeptId_employeeId.
• Group into buckets sorted by timestamp, day, week etc.

No sql paradigm – Index table

Employee Id   Details
1234          Email: [email protected]; State: CA; Dept: IT
8235          Email: [email protected]; State: TX; Dept: Sales
2234          Email: [email protected]; State: AL; Dept: IT
1671          Email: [email protected]; State: WA; Dept: Sales

State   Employee Id
CA      1234, 1235, 1236, 1244
TX      8000, 8100, 8235, 8266
AL      2212, 2221, 2234, 2256

Dept    Employee Id
IT      1234, 1235, 1236, 1244
Sales   8000, 8100, 8235, 8266
Acc     2212, 2221, 2234, 2256
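An index table of this kind can be derived from the main records by grouping ids under each field value. The sketch uses only the four employee rows listed in the details table; the `build_index` helper is a hypothetical name.

```python
from collections import defaultdict

# Main table: employee id -> details (only the fields being indexed).
employees = {
    1234: {"state": "CA", "dept": "IT"},
    8235: {"state": "TX", "dept": "Sales"},
    2234: {"state": "AL", "dept": "IT"},
    1671: {"state": "WA", "dept": "Sales"},
}


def build_index(records, field):
    """Return {field value: [employee ids]}, with ids in ascending order."""
    index = defaultdict(list)
    for emp_id in sorted(records):
        index[records[emp_id][field]].append(emp_id)
    return dict(index)


by_state = build_index(employees, "state")
by_dept = build_index(employees, "dept")

# "Who works in IT?" becomes one key lookup instead of a scan of all records.
print(by_dept["IT"], by_state["TX"])
```

The cost of this approach is that every write to the main table must also update each index table, which the application has to handle itself.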

No sql paradigm – Tree Index

[Diagram: tree Country - USA > State - CA > City - LA > Properties > Facilities]

{"property": [{"facilityName": "abc", "facilityId": "111"}, {"facilityName": "xyz", "facilityId": "222"}]}

No sql paradigm – Composite Key

Dept = IT:* or Dept = Sales:Online:*

EMPLOYEES

ITEmployees
IT:Software:1123    EmpName: John; Address: Los Angeles
IT:Software:2323    EmpName: Kevin; Address: Dallas, TX
IT:Hardware:6767    EmpName: Matt; Address: San Francisco

SalesEmployees
Sales:Online:832    EmpName: Katie; Address: Austin, TX
Sales:Online:423    EmpName: Karen; Address: Irvine, CA
Sales:Store:556     EmpName: Richard; Address: San Diego
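With composite keys kept in sorted order, a wildcard such as IT:* turns into a range scan over a key prefix. The sketch below assumes an ordered key-value store, emulated here with a sorted list and `bisect`; the `prefix_scan` helper is hypothetical.

```python
from bisect import bisect_left

# Composite key (dept:team:id) -> employee name, kept sorted by key.
rows = sorted([
    ("IT:Software:1123", "John"),
    ("IT:Software:2323", "Kevin"),
    ("IT:Hardware:6767", "Matt"),
    ("Sales:Online:832", "Katie"),
    ("Sales:Online:423", "Karen"),
    ("Sales:Store:556", "Richard"),
])
keys = [key for key, _ in rows]


def prefix_scan(prefix):
    """Return the names whose key starts with `prefix`, in key order."""
    start = bisect_left(keys, prefix)
    # "\uffff" sorts after any character used in these keys.
    end = bisect_left(keys, prefix + "\uffff")
    return [name for _, name in rows[start:end]]


print(prefix_scan("IT:"))
print(prefix_scan("Sales:Online:"))
```

Both lookups touch only the contiguous slice of keys that share the prefix, which is why the order of the key parts matters.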

No sql paradigm - Grouping

U123:O111   Product Ids: ["Surface", "xbox"]
U124:O123   Product Ids: ["Win 8", "xbox"]
U124:O234   Product Ids: ["Win phone", "surface"]
U124:O999   Product Ids: ["office", "azure sub"]
U125:O789   Product Ids: ["msdn", "office"]
U125:O945   Product Ids: ["surface", "xbox"]

Colocation of a user’s data.

Inverted search & direct aggregation

Records (EmpId, dept, city, …):
111: Dept-Sales, City: LA …
222: Dept-IT, City: Dallas …

Inverted lists:
Dept-IT: [111, 123, 234 …]
Dept-Sales: [673, 343, 434 …]
City: Dallas
City: LA

No sql paradigm – Materialized paths

[Diagram: category tree rooted at Electronics with children TV, Phones, Computers and Cameras; lower levels show Samsung, Apple, LG and then LCD, LED]

No sql paradigm – Materialized paths

[Diagram: subtree with TV at the top, then Samsung, Apple, LG, then LCD, LED]

{"entity": "TV", "category": "Electronics"}

{"entity": "Samsung", "category": "Electronics, TV"}

{"entity": "Samsung", "category": "Electronics, TV, LCD"}
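Storing the full path in each document lets a whole subtree be found by matching a path prefix. The sketch reuses the slide's "category" field and its comma-separated form; the documents and the `in_subtree` helper are made up for illustration.

```python
# Each document stores the full path from the root in its "category" field.
docs = [
    {"entity": "TV", "category": "Electronics"},
    {"entity": "Phones", "category": "Electronics"},
    {"entity": "Samsung", "category": "Electronics, TV"},
    {"entity": "LG", "category": "Electronics, TV"},
    {"entity": "Apple", "category": "Electronics, Phones"},
]


def in_subtree(documents, path):
    """Return entities filed directly or indirectly under `path`."""
    return [
        doc["entity"]
        for doc in documents
        if doc["category"] == path or doc["category"].startswith(path + ", ")
    ]


print(in_subtree(docs, "Electronics, TV"))
print(in_subtree(docs, "Electronics"))
```

No recursive traversal is needed: one filter on the stored path answers the subtree query, at the price of rewriting paths when a category moves.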

No sql paradigm – Nested sets

[Diagram: tree with Electronics at the root, TV and Phones below it, and leaves Samsung, Sony, Cell and Landline; positions 1 to 14 are numbered along the bottom]

No sql paradigm – Nested sets

[Diagram: the same tree drawn as nested intervals over positions 1 to 14: Sony and Samsung inside TV, Cell and Landline inside Phone, TV and Phone inside Electronics]
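The 1 to 14 numbering can be produced by a depth-first walk that stamps each node on the way in and on the way out. The sketch below gives one numbering consistent with the labels on the slide; the child order and the helper names are assumptions.

```python
# Category tree from the slide: parent -> children.
tree = {
    "Electronics": ["TV", "Phones"],
    "TV": ["Samsung", "Sony"],
    "Phones": ["Cell", "Landline"],
}


def number_nodes(tree, root):
    """Assign (left, right) bounds to every node with a depth-first walk."""
    bounds = {}
    counter = 0

    def visit(node):
        nonlocal counter
        counter += 1
        left = counter  # stamped when entering the node
        for child in tree.get(node, []):
            visit(child)
        counter += 1  # stamped when leaving the node
        bounds[node] = (left, counter)

    visit(root)
    return bounds


def descendants(bounds, node):
    """A node is a descendant if its interval lies strictly inside `node`'s."""
    left, right = bounds[node]
    inside = [n for n, (lo, hi) in bounds.items() if left < lo and hi < right]
    return sorted(inside, key=lambda n: bounds[n][0])


bounds = number_nodes(tree, "Electronics")
print(bounds["Electronics"], bounds["TV"], bounds["Phones"])
print(descendants(bounds, "TV"))
```

A subtree query becomes a simple interval comparison, while inserting a node means renumbering everything to its right.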

Flattening nested documents

Name: Chander
Hadoop: Expert
Nodejs: Expert
Spanish: Novice

{"name": "chander", "skills": "hadoop, nodejs, Spanish", "level": "expert, expert, novice"}

Query: Skills:hadoop AND level:expert

Flattening nested documents

Name: Chander
Hadoop: Expert
Nodejs: Expert
Spanish: Novice

{"name": "chander", "skills_1": "hadoop", "skills_2": "nodejs", "skills_3": "spanish", "level_1": "expert", "level_2": "expert", "level_3": "novice"}
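One plausible reason for the numbered fields is that two independent multi-valued fields lose the pairing between a skill and its level. The slides do not state this explicitly, so the sketch below is an interpretation; all function names are hypothetical.

```python
person = {
    "name": "chander",
    "skills": [("hadoop", "expert"), ("nodejs", "expert"), ("spanish", "novice")],
}


def flatten_multivalued(doc):
    """First form: two independent multi-valued fields."""
    return {
        "name": doc["name"],
        "skills": [skill for skill, _ in doc["skills"]],
        "level": [level for _, level in doc["skills"]],
    }


def flatten_numbered(doc):
    """Second form: skills_N and level_N keep each pair together."""
    flat = {"name": doc["name"]}
    for i, (skill, level) in enumerate(doc["skills"], start=1):
        flat[f"skills_{i}"] = skill
        flat[f"level_{i}"] = level
    return flat


def match_multivalued(flat, skill, level):
    # Equivalent to the query "skills:<skill> AND level:<level>".
    return skill in flat["skills"] and level in flat["level"]


def match_numbered(flat, skill, level):
    i = 1
    while f"skills_{i}" in flat:
        if flat[f"skills_{i}"] == skill and flat[f"level_{i}"] == level:
            return True
        i += 1
    return False


multi = flatten_multivalued(person)
numbered = flatten_numbered(person)

# "spanish AND expert" should not match: the person is a novice in Spanish.
false_positive = match_multivalued(multi, "spanish", "expert")
correct_reject = match_numbered(numbered, "spanish", "expert")
print(false_positive, correct_reject)
```

The multi-valued form matches because "spanish" and "expert" both appear somewhere in the document, while the numbered form checks them position by position.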

References
• http://www.couchbase.com/why-nosql/nosql-database
• Highly scalable blog.
• www.10gen.com
• http://couchdb.apache.org/
• www.ravendb.net

References
• http://redis.io/
• http://neo4j.org/
• http://Cassandra.apache.org
• http://elasticsearch.org
• http://memcached.org/
• Building Scalable Architecture