(PFC402) Bigger, Faster: Performance Tips for High Speed and High Volume Applications | AWS...


DESCRIPTION

This expert-level session covers best practices and tips for reducing latency to the absolute minimum when working with high-volume, high-speed datasets in Amazon DynamoDB. We take a deep dive into design patterns and access patterns geared toward low latency at very high throughput, cover approaches customers have used to achieve low latencies, and have a customer speak about their experience running DynamoDB at scale.


November 13, 2014 | Las Vegas, NV

Ben Clay, Amazon DynamoDB

Brett McCleary, Precision Exams

• Independent scaling of throughput and storage

• Supports both document and key-value data models

Example Schema: Webstore Orders

Hash Key (string)    Customer ID
Range Key (string)   Timestamp
Attribute (map)      Item ID : quantity
Attribute (number)   Order ID
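To make the schema concrete, here is a minimal sketch in Python with boto3 (not shown in the talk) of writing one order item. The table name "WebstoreOrders" and the attribute names are assumptions for illustration.

import time
import boto3

# Assumed table with hash key CustomerId (string) and range key Timestamp (string).
orders = boto3.resource("dynamodb").Table("WebstoreOrders")

orders.put_item(
    Item={
        "CustomerId": "alice",                      # hash key (string)
        "Timestamp": str(int(time.time() * 1000)),  # range key (string)
        "Items": {"item-123": 2, "item-456": 1},    # map: item ID -> quantity
        "OrderId": 98765,                           # number attribute
    }
)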

[Diagram: items #1 through #N placed across Partitions #1, #2, #3]

• One item, one partition

• Placement based on key

[Diagram: provisioned throughput split evenly across Partitions #1, #2, #3 at 1000 WPS each]
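The 1000 WPS per partition in the diagram comes from dividing the table's provisioned throughput evenly across its partitions; a trivial sketch of that arithmetic (the provisioned figure is an assumed example matching the diagram):

# Each partition receives an even share of the table's provisioned throughput,
# so a single hash key can never draw more than its partition's share.
provisioned_wps = 3000          # writes/sec provisioned on the table (assumed)
partitions = 3                  # partitions in the diagram above
per_partition_wps = provisioned_wps / partitions
print(per_partition_wps)        # 1000.0 WPS available per partition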

[Diagram: clients spreading requests across table Partitions 1 through 6]

[Diagram: request path on each client machine: Application, OS, SDK, then the network to DynamoDB]
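Every request passes through the application, the OS, and the SDK before it reaches DynamoDB, so client-side configuration is part of the latency budget. A hedged sketch of tuning the boto3/botocore client follows; the specific timeout, retry, and pool values are illustrative assumptions, not recommendations from the talk.

import boto3
from botocore.config import Config

# Shorter timeouts fail fast, and a larger connection pool lets concurrent
# requests reuse established HTTPS connections instead of re-handshaking.
config = Config(
    connect_timeout=1,
    read_timeout=1,
    retries={"max_attempts": 3},
    max_pool_connections=50,
)
dynamodb = boto3.client("dynamodb", region_name="us-west-2", config=config)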

[Chart: consumed vs. provisioned throughput over time]

[Diagram: clients concentrating traffic on a subset of the table's partitions]

Table grows over time, skew becomes noticeable

Webstore Orders

Hash Customer ID

Range Timestamp

Attrib Items ordered

Attrib Order ID

[Diagram: clients accessing the Orders table across Partitions 1 through 6]

Webstore Orders

Hash Customer ID

Range Timestamp

Attrib Items ordered

Attrib Order ID

Attrib Household ID

Household Index

Hash Household ID

Range Timestamp

Attrib Items ordered

Attrib Order ID

Attrib Customer ID
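A minimal sketch (boto3) of declaring the Household index as a global secondary index when the Webstore Orders table is created; the table name, index name, and throughput figures are assumptions for illustration.

import boto3

client = boto3.client("dynamodb")
client.create_table(
    TableName="WebstoreOrders",
    AttributeDefinitions=[
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "Timestamp", "AttributeType": "S"},
        {"AttributeName": "HouseholdId", "AttributeType": "S"},
    ],
    KeySchema=[
        {"AttributeName": "CustomerId", "KeyType": "HASH"},
        {"AttributeName": "Timestamp", "KeyType": "RANGE"},
    ],
    GlobalSecondaryIndexes=[
        {
            "IndexName": "HouseholdIndex",
            "KeySchema": [
                {"AttributeName": "HouseholdId", "KeyType": "HASH"},
                {"AttributeName": "Timestamp", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
            "ProvisionedThroughput": {"ReadCapacityUnits": 100, "WriteCapacityUnits": 100},
        }
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 1000, "WriteCapacityUnits": 1000},
)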

Indexing

[Diagram: items written to the table are propagated to the index; read clients query the index while read + write clients use the table]

Day of Order Index

Hash Day of order

Range Order ID

Attrib Items ordered

Attrib Customer ID

Attrib Timestamp

Webstore Orders

Hash Customer ID

Range Timestamp

Attrib Items ordered

Attrib Order ID

Attrib Day of order
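A sketch (boto3) of reading all of one day's orders through the Day of Order index; the index name, attribute names, and date format are assumptions.

import boto3
from boto3.dynamodb.conditions import Key

orders = boto3.resource("dynamodb").Table("WebstoreOrders")

# Query the index by its hash key (the day) to get that day's orders across all customers.
resp = orders.query(
    IndexName="DayOfOrderIndex",
    KeyConditionExpression=Key("DayOfOrder").eq("2014-11-13"),
)
for item in resp["Items"]:
    print(item["OrderId"], item["CustomerId"])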

Indexing

[Diagram: clients writing new orders to the table; each write propagates from the table partitions to the Day-of-order index partitions]

Alternate Approach: Scanning

[Diagram: Alice's and Bob's October through December items spread across Partitions #1 and #2; a Scan must read every partition P1 through P9]

• Delete old items from the client side

[Diagram: the same Alice and Bob items across Partitions #1 and #2, showing client-side deletion of the old items]

• Takeaway: Table growth can impact throughput per key

• Important when: Accumulating infrequently-read data

• Controlling table growth with deletes works but…

• Deleting items from the client = 2x write cost: each item costs one write when stored and another when deleted (see the sketch below)

• Can we achieve cheaper deletes AND scans?
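Before the cheaper alternative, this is roughly what client-side cleanup looks like: scan for old items, then delete each one, paying write capacity for every delete. A hedged sketch; the table name, key names, and cutoff format are assumptions, and pagination of the Scan is omitted.

import boto3
from boto3.dynamodb.conditions import Attr

orders = boto3.resource("dynamodb").Table("WebstoreOrders")
cutoff = "2014-11-01"  # assumed range-key/date format

# Scan for old items (reads the whole table), then delete them one by one.
# Each delete consumes write capacity, so every item is effectively paid for
# twice: once when it is written and once when it is removed.
resp = orders.scan(FilterExpression=Attr("Timestamp").lt(cutoff))
with orders.batch_writer() as batch:
    for item in resp["Items"]:
        batch.delete_item(
            Key={"CustomerId": item["CustomerId"], "Timestamp": item["Timestamp"]}
        )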

[Diagram: the same orders split into monthly tables (Oct Table, Nov Table, Dec Table); "Scan for last month" touches only the most recent table]

• Takeaway: Time series data chunks very well

• Important when: Big, growing time-series tables (see the sketch below)

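A minimal sketch of the chunking idea: route each order to the table for its month, so that old data can be dropped by deleting an entire table (no per-item write cost) and "scan last month" touches only one small table. The table-naming convention and attribute names are assumptions.

import datetime
import boto3

dynamodb = boto3.resource("dynamodb")

def orders_table(day):
    # One table per month, e.g. "Orders_2014_11" (assumed naming convention).
    return dynamodb.Table("Orders_%04d_%02d" % (day.year, day.month))

today = datetime.date.today()
orders_table(today).put_item(
    Item={"CustomerId": "alice", "Timestamp": today.isoformat(), "OrderId": 1}
)

# Scanning "last month" only reads that month's table, not the full history;
# retiring October's data is a single DeleteTable call on Orders_2014_10.
last_month = today.replace(day=1) - datetime.timedelta(days=1)
recent_items = orders_table(last_month).scan()["Items"]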

[Diagram: analytics path: 1. Scan the Orders table, 2. Export the items to an EMR fleet]

[Diagram: 1. Update records from the Orders table flow into a stream, 2. a stream processor fleet exports them]

Use this approach when you need near-real-time data.
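A hedged sketch of the stream-processor side, polling a DynamoDB stream with boto3; the stream ARN is a placeholder, export_record() is a hypothetical processing step, and a real consumer would track shard iterators and checkpoints rather than re-reading from TRIM_HORIZON.

import boto3

streams = boto3.client("dynamodbstreams")
stream_arn = "arn:aws:dynamodb:us-west-2:123456789012:table/Orders/stream/..."  # placeholder

def export_record(record):
    # Hypothetical export step (e.g. write to the warehouse).
    print(record["eventName"], record["dynamodb"].get("Keys"))

# Walk each shard and read its update records in order.
description = streams.describe_stream(StreamArn=stream_arn)["StreamDescription"]
for shard in description["Shards"]:
    iterator = streams.get_shard_iterator(
        StreamArn=stream_arn,
        ShardId=shard["ShardId"],
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]
    for record in streams.get_records(ShardIterator=iterator)["Records"]:
        export_record(record)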

“It is not the strongest of the species that survives, nor the most intelligent, but the one most adaptable to change.”

-- Charles Darwin

[Diagram: original architecture: web tier (web01, web02, ... web n), application tier (app01, app02, ... app n), and reporting server rpt01]

[Diagram: the same web and application tiers with a data warehouse process fanning out to satellite servers (sat01, sat02, ... sat n) and reporting server rpt01]

[Diagram: the same web and application tiers with a single warehouse server dw01 backed by Amazon DynamoDB]

Test Packet Answer Record

Hash Key (string) Test Packet ID

Attribute (string) Answer JSON

Test Packet Response Record

Hash Key (string) Test Packet ID

Range Key (string) Test Packet Response ID

Attribute (string) Create Timestamp

Attribute (string) Post Date

Attribute (string) Response JSON

{
  "testPacketId": 11193654,
  "answerJson": {
    "SQ22545": {"responses": {"010": "Y"}, "awardedPts": 1},
    "SQ22546": {"responses": {"040": "Y"}, "awardedPts": 1},
    "21137":   {"responses": {"030": "Y"}, "awardedPts": 0}
    ...
  }
}

{
  "testPacketId": 11193654,
  "testPacketResponseId": "SQ22545",
  "createdTimeStamp": "1412609315419",
  "postDate": "1412609315419",
  "responseJson": {"i": "26492", "f": "N", "t": 0, "r": {"010": "Y"}}
}
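A minimal sketch (boto3) of writing a Test Packet Response Record shaped like the JSON above; the table name and the exact attribute casing are assumptions.

import json
import time
import boto3

responses = boto3.resource("dynamodb").Table("TestPacketResponseRecord")
now_ms = str(int(time.time() * 1000))

responses.put_item(
    Item={
        "TestPacketId": "11193654",         # hash key (string)
        "TestPacketResponseId": "SQ22545",  # range key (string)
        "CreateTimestamp": now_ms,
        "PostDate": now_ms,
        "ResponseJson": json.dumps({"i": "26492", "f": "N", "t": 0, "r": {"010": "Y"}}),
    }
)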

[Diagram: load test setup: test hosts tst01 and tst02 against Amazon DynamoDB, and a fleet of Grinder clients driving load against Amazon DynamoDB]

[Chart: latency (ms) vs. provisioned capacity (300, 600, 1200, 2200, 5000 units): P90 client latency, average client latency, and average server latency]

http://bit.ly/awsevals
