Scalability & Performance: Principles & Techniques

Copyright © 2014 HashedIn Technologies Pvt Ltd. All rights reserved. – CONFIDENTIAL

What is a Performance Problem?

The system is slow even for a single user.

What is a Scalability Problem?

The system is fast for a single user, but slow under heavy load.

How do you measure Performance?

Response time for a single user, i.e. how long the user waits.
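A minimal Python sketch of measuring the response time of one request, assuming the work can be called as a function. The handler argument is a hypothetical stand-in for a real request handler, and time.sleep simulates a 50 ms request:

```python
import time

def response_time(handler, *args):
    """Return how long one call to handler takes, in seconds (wall clock)."""
    start = time.perf_counter()  # monotonic, high-resolution timer
    handler(*args)
    return time.perf_counter() - start

# Usage: time a simulated 50 ms request
elapsed = response_time(time.sleep, 0.05)
print(f"response time: {elapsed * 1000:.0f} ms")
```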

How do you measure Scalability?

The number of users that can work simultaneously with acceptable performance.

95% of the time is spent in the frontend.

But, we won’t talk about frontend performance improvements today.

Let’s talk about the backend.

Relationship between Performance & Scalability

Performance & Scalability Mantra

Strive for maximum throughput with acceptable response times

To Improve Scalability...

Improve Performance OR Add Capacity

Poor Performance & Scalability

Response Time: 1s, Workers: 4, Machines: 1

4 requests/second, slowest response: 1s

Better Performance & Scalability

Response Time: 500ms, Workers: 4, Machines: 1

8 requests/second, slowest response: 1s

Even Better Performance ...

Response Time: 100ms, Workers: 4, Machines: 1

40 requests/second, slowest response: 1s

Great Performance & Scalability

Response Time: 10ms, Workers: 4, Machines: 1

400 requests/second, slowest response: 1s

You cannot make the response time 0ms.

To support more than 400 requests/second, increase the number of workers.

With double the workers...

Response Time: 10ms, Workers: 8, Machines: 2

800 requests/second, slowest response: 1s

and with even more workers...

Response Time: 10ms, Workers: 16, Machines: 4

1600 requests/second, slowest response: 1s
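The arithmetic behind these slides, as a Python sketch. It assumes each worker handles exactly one request at a time and that nothing else (locks, the database, the network) limits throughput, so it computes an ideal upper bound rather than a measurement:

```python
def max_throughput(workers, response_time_ms):
    """Ideal requests/second: every worker busy, no shared bottleneck."""
    return workers * 1000 / response_time_ms

# (workers, machines, response time in ms) from the slides above
scenarios = [(4, 1, 1000), (4, 1, 500), (4, 1, 100), (4, 1, 10), (8, 2, 10), (16, 4, 10)]
for workers, machines, rt_ms in scenarios:
    rps = max_throughput(workers, rt_ms)
    print(f"{workers} workers, {machines} machine(s), {rt_ms} ms -> {rps:.0f} requests/second")
```

Doubling the workers doubles this bound only under that assumption; real systems fall short of it for the reasons that follow.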

Coming Back To Reality

Increasing capacity should increase throughput.

It does not.

Not until you design your application correctly.

Why?

You have to increase capacity at each layer: web & app servers, cache servers, the database, plus CPU, network and disk.

And there are locks: database locks, synchronized code blocks, mutexes and file system locks.

What should our goal be?

1. Reduce response times
2. Make it possible to add more capacity

Scaling Data Storage

Figure: caching layers along the request path User → Browser → Edge Server → Load Balancer → Web Server → App Server → REST API → Database

➔ Browser cache: all browsers, all mobiles
➔ CDN: Akamai, AWS CloudFront
➔ Web accelerator: Varnish
➔ Nginx / Apache: mod_proxy
➔ Object cache: static variables; Redis, Memcached, Ehcache

Caches closer to the user are less granular but more effective; caches closer to the database are more granular but less effective.

HTTP Caching Strategy

Do Not Cache (Cache-Control: no-store; Expires: Thu, 01 Jan 1970 00:00:00 GMT)
➔ Use for HTML pages with dynamic content
➔ Avoid for static resources

Cache Temporarily (Cache-Control: max-age=3600; Expires: <now plus 1 hour>)
➔ Use for high-traffic public HTML pages, e.g. the homepage
➔ Send an ETag or Last-Modified header so the browser can revalidate with a conditional GET
➔ Use JavaScript to load user-specific data
➔ Avoid for static resources

Cache Forever (Cache-Control: max-age=<a long period, e.g. 31536000 for one year>; Expires: <a far-future date>)
➔ Use for static resources: CSS, images, JS
➔ Change the URL in the HTML when the resource is modified
➔ Use a pre-processor to simplify management
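A sketch of the three strategies as Python code that builds the response headers. The strategy names and the one-year stand-in for "forever" are assumptions; for "do not cache" it sends no-store, the standard directive for responses that must not be stored, plus an Expires date in the past as a fallback for caches that only understand Expires:

```python
import time
from email.utils import formatdate

ONE_YEAR = 365 * 24 * 3600  # assumption: "forever" approximated as one year

def cache_headers(strategy, ttl_seconds=3600, now=None):
    """Return HTTP caching headers for one of three hypothetical strategy names."""
    now = time.time() if now is None else now
    if strategy == "do_not_cache":
        # Dynamic HTML: never store; the 1970 Expires date is already in the past
        return {"Cache-Control": "no-store", "Expires": formatdate(0, usegmt=True)}
    if strategy == "temporary":
        # High-traffic public pages such as the homepage
        return {"Cache-Control": f"max-age={ttl_seconds}",
                "Expires": formatdate(now + ttl_seconds, usegmt=True)}
    if strategy == "forever":
        # Static resources whose URL changes whenever their content changes
        return {"Cache-Control": f"max-age={ONE_YEAR}",
                "Expires": formatdate(now + ONE_YEAR, usegmt=True)}
    raise ValueError(f"unknown strategy: {strategy}")

print(cache_headers("temporary"))
```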

Cache Temporarily: render the user-specific parts using JS after page load, and cache the page for 2h.

Don’t overdo it. You cannot change the URL of your homepage when its content has to change.

Cache Static Files Permanently

Cache forever: static files are then loaded from the browser cache.

If base.css changes, serve it from base.css?v=2. New URL, fresh download.
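A sketch of the "new URL on change" idea. Instead of a hand-maintained counter like ?v=2, it derives the version from a hash of the file's contents, which is one thing a pre-processor could do; versioned_url is a hypothetical helper, not part of any particular framework:

```python
import hashlib

def versioned_url(path, content: bytes, length=8):
    """Append a short content hash so the URL changes whenever the file does."""
    digest = hashlib.sha256(content).hexdigest()[:length]
    return f"{path}?v={digest}"

old = versioned_url("/static/base.css", b"body { color: black; }")
new = versioned_url("/static/base.css", b"body { color: navy; }")
print(old, new)  # different URLs, so browsers download the new file
```

Unchanged files keep their URL, so they stay in the browser cache across deployments.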

Cache-Control: private

For logged-in, user-specific pages or APIs.

Only cached by the browser. CDNs, web accelerators, proxies and web servers will not cache it.

Ask These Questions

1. How often does the data change?
2. Can you tolerate stale data? For how long?
3. How critical is the data? Can you lose some of it?

Caching Strategy : Split Objects

Split objects based on how often they change. Freshers create their profile once, but apply to jobs very often.

Before: one Fresher object holding Personal Details, Education & Work Ex, and Job Application Status.

After: a Fresher Profile object (Personal Details, Education & Work Ex) and a Job Applications object (Job Application Status).

Create two objects with different TTLs.
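A sketch of the split, using a tiny in-process TTL cache written for this example (not a particular library). The key names and both TTL values are hypothetical, and a fake clock stands in for two minutes passing:

```python
import time

class TTLCache:
    """Tiny in-process cache where each key has its own time-to-live."""
    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expires_at)

    def set(self, key, value, ttl_seconds):
        self._entries[key] = (value, self._clock() + ttl_seconds)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:  # expired: drop it and report a miss
            del self._entries[key]
            return None
        return value

# Hypothetical TTLs: the profile rarely changes, application status changes often
PROFILE_TTL = 24 * 3600
APPLICATIONS_TTL = 60

def cache_fresher(cache, fresher_id, profile, applications):
    cache.set(f"fresher:{fresher_id}:profile", profile, PROFILE_TTL)
    cache.set(f"fresher:{fresher_id}:applications", applications, APPLICATIONS_TTL)

now = [0]  # fake clock, in seconds
cache = TTLCache(clock=lambda: now[0])
cache_fresher(cache, 7, {"name": "Asha"}, [{"job": 42, "status": "applied"}])
now[0] = 120  # two minutes later: applications expired, profile still cached
```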

Cache In-Memory

➔ Frequently accessed data
➔ Infrequently changing data
➔ Configuration & settings

Put them in the settings file, and make it easy to deploy just the settings.

Manually Clear Cache

Cache forever, delete as needed:

➔ Dynamic settings
➔ Throttles & blacklists

Delete on Modification, Rebuild on Read

When data changes, delete it from the cache. The next read will automatically fill the cache again.
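A minimal sketch of this pattern, with plain dicts standing in for the database and the cache (both hypothetical placeholders, not a real client API):

```python
class CachedRepository:
    """Delete on modification, rebuild on read."""
    def __init__(self):
        self.db = {}        # hypothetical stand-in for the database
        self.cache = {}     # hypothetical stand-in for Redis / Memcached
        self.db_reads = 0   # counts cache misses, for illustration

    def get(self, key):
        if key in self.cache:
            return self.cache[key]      # cache hit
        self.db_reads += 1
        value = self.db.get(key)        # cache miss: read from the DB...
        if value is not None:
            self.cache[key] = value     # ...and rebuild the cache entry
        return value

    def update(self, key, value):
        self.db[key] = value            # write the source of truth first
        self.cache.pop(key, None)       # then delete the now-stale cache entry

repo = CachedRepository()
repo.update("user:1", {"name": "Ravi"})
repo.get("user:1")   # miss: loads from the DB
repo.get("user:1")   # hit: served from the cache
repo.update("user:1", {"name": "Ravi K"})
print(repo.get("user:1"), repo.db_reads)  # {'name': 'Ravi K'} 2
```

Deleting rather than updating the cache entry keeps the write path simple. Under heavy concurrency a slow reader can still write a stale value back, so a TTL on the entry is a useful safety net.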

DB Design For Scalability

First, Know your Data

1. Sizing: How many records in 6 months / 1 year?
2. Query volumes: How many reads / writes?
3. Hot tables: Which tables are accessed most frequently?
4. Criticality of data: How important is it to not lose data?
5. Availability vs. consistency: Is it more important for this data to always be available, or to always be consistent?

What is Availability?

Ensuring your system can be used at any time.

What is Consistency?

Data is in the same state across all copies.

You can’t get both...

When the copies cannot reach each other, you can choose only one: Consistency or Availability.

Two Salesmen Selling Apartments

➔ Each has a diary of sold flats
➔ They call each other and confirm before selling

What happens when one salesman is offline, and a customer calls?

1. He takes the order. But there is a chance the other salesman also sold the same flat… Not consistent.
2. He does not take the order… Not available.

Different Data, Different Guarantees

Sales Transactions must be Consistent

Product Catalog must be Available

Start with a Normalized Schema...

…which essentially means no redundant data.

Optimize Heavy Read Operations

Selectively denormalize to eliminate joins:

1. Counts of objects
2. Summary statistics
3. Events / activities

Denormalize - Number of people watching this Issue
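A sketch of this denormalization with Python's built-in sqlite3 module; the table and column names are hypothetical. The counter is updated in the same transaction as the watchers row, so reads never need a join or count(*):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table issues (id integer primary key, title text,
                         watcher_count integer not null default 0);  -- denormalized
    create table watchers (issue_id integer, user_id integer,
                           primary key (issue_id, user_id));
    insert into issues (id, title) values (1, 'Login page times out');
""")

def watch(issue_id, user_id):
    # Write path: add the watcher row and bump the counter in one transaction
    with conn:
        conn.execute("insert into watchers (issue_id, user_id) values (?, ?)", (issue_id, user_id))
        conn.execute("update issues set watcher_count = watcher_count + 1 where id = ?", (issue_id,))

def watcher_count(issue_id):
    # Read path: a single-row lookup instead of count(*) over watchers
    return conn.execute("select watcher_count from issues where id = ?", (issue_id,)).fetchone()[0]

watch(1, 10)
watch(1, 11)
print(watcher_count(1))  # 2
```

If the insert fails (for example, the same user watching twice), the transaction rolls back and the counter is not bumped.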

Not everything needs to be Accurate

Choose wisely between accuracy and performance.

Create a Separate Reporting Schema

1. Use a star schema
2. Aggregate data
   a. by time: hour, day, week, month, quarter, year
   b. by region: north, south, east, west, central
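A sketch of step 2 (aggregating by day and region) with sqlite3. The star schema itself is not shown, and the tables and sample rows are made up for illustration. A periodic job rebuilds the summary table so reports read a few pre-aggregated rows instead of scanning raw sales:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table sales (sold_at text, region text, amount integer);
    create table sales_by_day_region (day text, region text, total integer,
                                      primary key (day, region));
""")
conn.executemany("insert into sales values (?, ?, ?)", [
    ("2014-03-01 10:15", "north", 100),
    ("2014-03-01 17:40", "north", 50),
    ("2014-03-01 12:00", "south", 70),
    ("2014-03-02 09:30", "north", 20),
])

# Periodic job: rebuild the day/region aggregate in one transaction
with conn:
    conn.execute("delete from sales_by_day_region")
    conn.execute("""
        insert into sales_by_day_region (day, region, total)
        select date(sold_at), region, sum(amount)
        from sales group by date(sold_at), region
    """)

for row in conn.execute("select day, region, total from sales_by_day_region order by day, region"):
    print(row)
```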

Use a Search Engine

1. Relational DB as the source of truth
2. Solr or Elasticsearch as the index
3. A cron job to update the search engine

Things to Avoid in a RDBMS

1. Don’t store files in the DB
2. Don’t create task queues
3. Don’t maintain counters

Use Read Replicas

Use master-slave replication, and use the slaves for reads.

Replicas can lag behind the master, so only use them for non-transactional reads.
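A sketch of routing reads, assuming the application can tell whether a read is part of a transaction. The connection names are hypothetical placeholders, and replicas are picked round-robin:

```python
import itertools

class ReplicaRouter:
    """Send writes and transactional reads to the master, other reads to replicas."""
    def __init__(self, master, replicas):
        self.master = master
        self._replicas = itertools.cycle(replicas) if replicas else None

    def for_write(self):
        return self.master

    def for_read(self, in_transaction=False):
        # Transactional reads must see their own writes, which a lagging replica may not have
        if in_transaction or self._replicas is None:
            return self.master
        return next(self._replicas)

router = ReplicaRouter("master-db", ["replica-1", "replica-2"])
print(router.for_read(), router.for_read(), router.for_read(in_transaction=True))
# replica-1 replica-2 master-db
```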

Shard your Data

1. Choose the shard key wisely: location is usually a poor choice.
2. Sharding later is painful: if you think you may need it, shard upfront.
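A sketch of picking a shard from an evenly distributed key such as a user id (the key format is hypothetical). It uses a stable hash, because Python's built-in hash() of a string is randomized per process:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key to a shard number that is stable across processes."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards

for user_id in (1, 2, 3):
    print(user_id, shard_for(f"user:{user_id}", 4))
```

With modulo placement, changing num_shards moves most keys to a different shard, which is one reason sharding later is painful.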

Async & Availability - Best Friends

Offload work to a job queue:

➔ Pre-computation
➔ Slow jobs
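An in-process sketch using Python's queue.Queue and one worker thread. In production the queue would be an external job queue; the report job here is a hypothetical placeholder for slow work:

```python
import queue
import threading

jobs = queue.Queue()
results = []

def worker():
    while True:
        job = jobs.get()
        try:
            if job is None:       # sentinel: shut the worker down
                return
            results.append(job())
        finally:
            jobs.task_done()      # always mark the item as processed

threading.Thread(target=worker, daemon=True).start()

def handle_request(user_id):
    # The web request only enqueues the slow job and returns immediately
    jobs.put(lambda: f"report for user {user_id}")
    return "accepted"

status = handle_request(7)
jobs.put(None)  # stop the worker once the queue drains
jobs.join()     # wait until every queued item has been processed
print(status, results)
```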

Database Optimization

Quick Recap

For New Systems - See Database Design

For Existing Systems - See Database Optimization

Asking Deepak for a Notepad

How does he get you a notepad? (The stationery shop is the database.)

1. Waits for the elevator
2. Walks down the street
3. Waits for the pedestrian traffic light
4. Reaches the store
5. Waits for the previous customer
6. Requests a notepad
7. Waits for the attendant to search
8. Bonus: the attendant misplaces notepads

Three General Techniques

1. Minimize Queries

2. Do More Work in One Trip

3. Make the Query Efficient

The Fastest Query...

is Never Executed

Cache Aggressively to Minimize Queries

Don’t use an ORM for reports

DB Optimization : More in One Query

N + 1 Problem

    doc_ids = db.query("select id from documents where user = ?", user)
    docs = []
    for docid in doc_ids:
        doc = db.query("select … from documents where id = ?", docid)
        docs.append(doc)

Query in a loop = disaster: one query for the ids, then N more queries, one per document.
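A runnable version of the example above with sqlite3, contrasting the N + 1 loop with a single query. The table contents are made up, and the user column is named user_name here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table documents (id integer primary key, user_name text, title text)")
conn.executemany("insert into documents (user_name, title) values (?, ?)",
                 [("alice", "a.txt"), ("alice", "b.txt"), ("bob", "c.txt")])

def docs_n_plus_1(user):
    # 1 query for the ids, then 1 query per document: N + 1 round trips
    ids = [row[0] for row in conn.execute(
        "select id from documents where user_name = ? order by id", (user,))]
    return [conn.execute("select id, title from documents where id = ?", (i,)).fetchone()
            for i in ids]

def docs_one_query(user):
    # Same rows in a single round trip
    return conn.execute(
        "select id, title from documents where user_name = ? order by id", (user,)).fetchall()

print(docs_one_query("alice"))  # [(1, 'a.txt'), (2, 'b.txt')]
```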

Homogeneous Queries - UNION ALL

Tables: monthly_income, monthly_expenditure

    Select 'income' as heading, month, income from monthly_income
    UNION ALL
    Select 'expenditure' as heading, month, expenses from monthly_expenditure

Anti-Pattern : Fetch and Update
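Assuming this anti-pattern refers to reading a value into the application, changing it, and writing it back, here is a sketch with sqlite3 (the products table is hypothetical). The single UPDATE lets the database do the arithmetic and the check in one statement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table products (id integer primary key, stock integer not null)")
conn.execute("insert into products (id, stock) values (1, 10)")
conn.commit()

def reserve_fetch_and_update(product_id, qty):
    # Anti-pattern: two round trips, and another request can change stock in between
    (stock,) = conn.execute("select stock from products where id = ?", (product_id,)).fetchone()
    conn.execute("update products set stock = ? where id = ?", (stock - qty, product_id))
    conn.commit()

def reserve_in_place(product_id, qty):
    # One statement: subtract only if enough stock is left
    with conn:
        cur = conn.execute(
            "update products set stock = stock - ? where id = ? and stock >= ?",
            (qty, product_id, qty))
    return cur.rowcount == 1

print(reserve_in_place(1, 3), reserve_in_place(1, 100))  # True False
print(conn.execute("select stock from products where id = 1").fetchone()[0])  # 7
```

Besides saving a round trip, the in-place version avoids a lost update: two concurrent fetch-and-update calls can both read the same stock and overwrite each other's result.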

Bulk Operations & Batch Inserts
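A sketch of batching with sqlite3's executemany, which runs one prepared statement for every row inside a single transaction instead of one insert and one commit per row; the events table and data are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table events (user_id integer, action text)")

rows = [(user_id, "login") for user_id in range(1000)]  # illustrative data

# Anti-pattern (not run here): one insert and one commit per row
# for row in rows:
#     conn.execute("insert into events values (?, ?)", row)
#     conn.commit()

# Batch: one statement executed for every row, committed once
with conn:
    conn.executemany("insert into events (user_id, action) values (?, ?)", rows)

print(conn.execute("select count(*) from events").fetchone()[0])  # 1000
```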

DB Optimization : Efficient Queries

Find matching lines from a book

Finding a word in a novel

Full Table Scan

Finding a Topic in a Tech Book

Index Seek

Lookup Meaning of a Word

Find by primary key: clustered index, data is sorted

Find Words like ab* and ac*

Clustered Indexes are Great for Range Queries
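A way to see these access paths in practice, sketched with SQLite's EXPLAIN QUERY PLAN (the words table is hypothetical). The exact plan text varies between SQLite versions, but it says SCAN for a full table scan and SEARCH when an index or the primary key is used. In SQLite an integer primary key is the rowid the table is stored by, which is the closest analogue to a clustered index:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table words (id integer primary key, word text, meaning text)")

def plan(sql, params=()):
    # The last column of each EXPLAIN QUERY PLAN row is a human-readable detail string
    return " | ".join(row[-1] for row in conn.execute("explain query plan " + sql, params))

query = "select meaning from words where word = ?"
before = plan(query, ("abacus",))   # no index on word: full table scan
conn.execute("create index idx_words_word on words (word)")
after = plan(query, ("abacus",))    # now an index seek
by_pk = plan("select meaning from words where id = ?", (1,))  # primary-key lookup
ranged = plan("select meaning from words where word >= ? and word < ?", ("ab", "ad"))  # ab*, ac*

for label, detail in [("before", before), ("after", after), ("by_pk", by_pk), ("ranged", ranged)]:
    print(label, "->", detail)
```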

Query Planner Algorithm

    for each table in the query:
        find the constraining columns (where, join)
        for each index on the table:
            find if the index can be used
        if multiple indexes can be used:
            find the best index

Find if Index can be used

1. The index column must be in the where clause.
2. For multi-column indexes, the leading columns must be in the where clause.

Best Index?

Index Cardinality

Table Statistics
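A toy version of the planner steps above in Python. It is not how any specific database implements planning: indexes are lists of column names, and distinct_values is a hypothetical stand-in for index cardinality from the table statistics. An index is usable only if its leading column appears in the where clause, and among usable indexes the most selective one wins:

```python
import math

def usable_prefix(index_columns, where_columns):
    """Number of leading index columns constrained by the where clause."""
    n = 0
    for col in index_columns:
        if col not in where_columns:
            break
        n += 1
    return n

def choose_index(indexes, where_columns, distinct_values):
    """Pick the usable index whose constrained prefix has the highest cardinality."""
    best, best_score = None, 0
    for name, columns in indexes.items():
        prefix = usable_prefix(columns, where_columns)
        if prefix == 0:
            continue  # leading column not in the where clause: index cannot be used
        # More distinct values in the prefix means fewer matching rows per lookup
        score = math.prod(distinct_values[c] for c in columns[:prefix])
        if score > best_score:
            best, best_score = name, score
    return best  # None means a full table scan

indexes = {"idx_status": ["status"],
           "idx_user_created": ["user_id", "created_at"],
           "idx_created": ["created_at"]}
distinct_values = {"status": 3, "user_id": 10_000, "created_at": 50_000}
print(choose_index(indexes, {"status", "user_id"}, distinct_values))  # idx_user_created
```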

When is an Index NOT Used?

Slow Queries / Profiler

MySQL Slow Query Log
MS SQL Profiler

Anti Pattern : in clause with subquery

    Select … from table1 where id in (select id from table2 where …)
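A sketch of the usual rewrite, as a join, with sqlite3. The flagged column is a hypothetical filter standing in for the elided where clause. Whether the rewrite is faster depends on the optimizer (some, notably older MySQL versions, handled IN with a subquery poorly), so compare the query plans; DISTINCT keeps the results identical when table2 repeats an id:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table table1 (id integer primary key, name text);
    create table table2 (id integer, flagged integer);
    insert into table1 values (1, 'a'), (2, 'b'), (3, 'c');
    insert into table2 values (1, 1), (1, 1), (3, 1), (2, 0);
""")

in_subquery = conn.execute(
    "select t1.id, t1.name from table1 t1 "
    "where t1.id in (select id from table2 where flagged = 1) order by t1.id").fetchall()

# Same rows via a join; distinct removes duplicates caused by repeated ids in table2
as_join = conn.execute(
    "select distinct t1.id, t1.name from table1 t1 "
    "join table2 t2 on t2.id = t1.id where t2.flagged = 1 order by t1.id").fetchall()

print(in_subquery, as_join)  # [(1, 'a'), (3, 'c')] [(1, 'a'), (3, 'c')]
```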

Database Locks & Isolation Levels

What is a Lock?

Mechanism to prevent data corruption when multiple people access the database concurrently.