This expert-level session covers best practices and tips for reducing latency to the absolute minimum when dealing with high-volume, high-speed datasets in Amazon DynamoDB. We take a deep dive into the design patterns and access patterns geared toward low latency at very high throughput, cover some of the ways customers have achieved low latencies, and hear a customer speak about their experience using DynamoDB at scale.
November 13, 2014 | Las Vegas, NV
Ben Clay, Amazon DynamoDB
Brett McCleary, Precision Exams
• Throughput and storage scale independently
• Supports both document and key-value data models
Example Schema: Webstore Orders
  Hash Key  (string)  Customer ID
  Range Key (string)  Timestamp
  Attribute (map)     Item ID : quantity map
  Attribute (number)  Customer ID
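The schema above can be written out as boto3-style `create_table` parameters. This is a sketch: the table name, attribute names, and capacity figures are illustrative assumptions, not taken from the deck.

```python
# Boto3-style table definition for the Webstore Orders schema above.
# Table name, attribute names, and capacity numbers are assumptions.
orders_table_definition = {
    "TableName": "WebstoreOrders",
    "KeySchema": [
        {"AttributeName": "CustomerId", "KeyType": "HASH"},   # hash (partition) key
        {"AttributeName": "Timestamp", "KeyType": "RANGE"},   # range (sort) key
    ],
    "AttributeDefinitions": [
        {"AttributeName": "CustomerId", "AttributeType": "S"},
        {"AttributeName": "Timestamp", "AttributeType": "S"},
    ],
    "ProvisionedThroughput": {"ReadCapacityUnits": 1000, "WriteCapacityUnits": 1000},
}
# Non-key attributes (the item-quantity map, etc.) are schemaless in DynamoDB
# and need no AttributeDefinitions entry.
```

Only key attributes appear in `AttributeDefinitions`; everything else on an item is free-form.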
[Diagram: items #1…#N hashed across Partitions #1–#3]
• One item, one partition
• Placement based on key
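DynamoDB's real placement logic is internal to the service, but "one item, one partition, placement based on key" can be modeled with a toy hash-mod sketch. The hash function and fixed partition count here are assumptions for illustration only.

```python
import hashlib

def partition_for(hash_key: str, num_partitions: int) -> int:
    """Toy model of key-based placement: hash the key, take it modulo the
    partition count. Every item with the same hash key lands on the same
    partition, which is why one hot key cannot use the whole table's capacity."""
    digest = hashlib.md5(hash_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions
```

The practical consequence: all of a single customer's orders live on one partition, so their throughput ceiling is that partition's share.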
[Diagram: provisioned throughput split evenly across three partitions — 1000 WPS each]
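A table's provisioned throughput is divided across its partitions, so the ceiling for any single key is a fraction of the table total. A minimal sketch of that arithmetic:

```python
def per_partition_throughput(table_wps: float, num_partitions: int) -> float:
    """Each partition receives an even share of the table's provisioned
    writes per second; a single hash key cannot exceed its partition's share."""
    return table_wps / num_partitions

# 3,000 WPS provisioned over three partitions leaves 1,000 WPS per
# partition, as in the diagram above.
```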
[Diagram: clients reading and writing a table spread across Partitions 1–6]
[Diagram: request path on each client machine — Application → OS → SDK → DynamoDB, repeated across three client machines]
[Chart: Throughput (0–1500) over Time (intervals 1–20), comparing Provisioned vs. Consumed capacity]
[Diagram: client traffic concentrating on a subset of the table's partitions]
Table grows over time, skew becomes noticeable
Webstore Orders
Hash Customer ID
Range Timestamp
Attrib Items ordered
Attrib Order ID
[Diagram: clients reading and writing the Orders table across Partitions 1–6]
Webstore Orders
Hash Customer ID
Range Timestamp
Attrib Items ordered
Attrib Order ID
Attrib Household ID
Household Index
Hash Household ID
Range Timestamp
Attrib Items ordered
Attrib Order ID
Attrib Customer ID
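Reading by household then goes through the index rather than the base table. A boto3-style Query request against the Household Index might look like the following; the table/index names and the key values are assumptions for illustration.

```python
# Boto3-style Query against the Household Index GSI.
# Names and values here are illustrative assumptions.
query_params = {
    "TableName": "WebstoreOrders",
    "IndexName": "HouseholdIndex",
    "KeyConditionExpression": "HouseholdId = :hh AND #ts >= :since",
    # "Timestamp" is aliased to avoid clashing with reserved words.
    "ExpressionAttributeNames": {"#ts": "Timestamp"},
    "ExpressionAttributeValues": {
        ":hh": {"S": "household-123"},
        ":since": {"S": "2014-11-01T00:00:00Z"},
    },
}
```

A Query on a GSI is served by the index's own partitions, with its own provisioned throughput, separate from the base table's.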
Indexing
[Diagram: read+write clients use the table; writes propagate from table items to index items, which read-only clients query]
Day of Order Index
Hash Day of order
Range Order ID
Attrib Items ordered
Attrib Customer ID
Attrib Timestamp
Webstore Orders
Hash Customer ID
Range Timestamp
Attrib Items ordered
Attrib Order ID
Attrib Day of order
Indexing
[Diagram: new orders flow into the table's partitions and propagate to the Day-of-Order index's partitions, which clients query]
Alternate Approach: Scanning
[Diagram: orders for Alice (Oct 2, Nov 11, Dec 25) and Bob (Oct 20, Nov 12, Dec 23) spread across partitions; a Scan must touch every partition P1–P9]
• Delete old items from the client side
• Takeaway: Table growth can impact throughput per key
• Important when: Accumulating infrequently-read data
• Controlling table growth with deletes works but…
• Deleting items from client = 2x write cost!
• Can we achieve cheaper deletes AND scans?
[Diagram: the same orders split into monthly tables — Oct Table, Nov Table, Dec Table; "scan for last month" touches only the latest table, and an old month can be dropped by deleting that whole table]
• Takeaway: Time series data chunks very well
• Important when: Big, growing time series tables
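Chunking by month means routing each read or write to the table for its timestamp. A minimal sketch of that routing; the table-naming convention is an assumption.

```python
from datetime import datetime, timezone

def table_for(ts: datetime, base: str = "Orders") -> str:
    """Route an item to its monthly table, e.g. Orders-2014-11.
    Dropping an old month is then a single DeleteTable call instead of
    millions of per-item deletes, and 'scan last month' only touches
    one small table."""
    return f"{base}-{ts.year:04d}-{ts.month:02d}"
```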
[Diagram: batch analytics — 1. Scan the Orders table, 2. Export the items to an EMR fleet]
[Diagram: streaming analytics — 1. Update records in the Orders table, 2. Export the changes to a stream-processor fleet]
Use the streaming approach when you need near-realtime data.
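A stream processor consumes change records instead of rescanning the whole table. A sketch of the per-record handling, assuming the DynamoDB Streams record shape (an `eventName` plus a `NewImage` attribute-value map under `dynamodb`); the flattening helper is illustrative, not part of any SDK.

```python
def export_rows(stream_records):
    """Turn INSERT/MODIFY stream records into plain export rows,
    skipping REMOVEs. Assumed record shape (DynamoDB Streams):
    {"eventName": ..., "dynamodb": {"NewImage": {...}}}."""
    rows = []
    for rec in stream_records:
        if rec["eventName"] not in ("INSERT", "MODIFY"):
            continue
        image = rec["dynamodb"]["NewImage"]
        # Flatten the attribute-value map ({"S": "Alice"} etc.) to plain values.
        rows.append({k: next(iter(v.values())) for k, v in image.items()})
    return rows

records = [
    {"eventName": "INSERT",
     "dynamodb": {"NewImage": {"CustomerId": {"S": "Alice"},
                               "OrderId": {"N": "42"}}}},
    {"eventName": "REMOVE", "dynamodb": {}},
]
print(export_rows(records))  # [{'CustomerId': 'Alice', 'OrderId': '42'}]
```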
“It is not the strongest of the species that survives, nor the most intelligent, but the one most adaptable to change.”
-- Charles Darwin
[Diagram: original stack — Web Tier (web01, web02, … web n), Application Tier (app01, app02, … app n), reporting server rpt01]
[Diagram: with data warehouse process — satellite servers (sat01, sat02, … sat n) feeding rpt01 alongside the Web and Application Tiers]
[Diagram: final stack — Web Tier and Application Tier backed by Amazon DynamoDB, with dw01 for warehousing]
Test Packet Answer Record
Hash Key (string) Test Packet ID
Attribute (string) Answer JSON
Test Packet Response Record
Hash Key (string) Test Packet ID
Range Key (string) Test Packet Response ID
Attribute (string) Create Timestamp
Attribute (string) Post Date
Attribute (string) Response JSON
{
"testPacketId":11193654,
"answerJson": {
"SQ22545":{"responses":{"010":"Y"},"awardedPts":1},
"SQ22546":{"responses":{"040":"Y"},"awardedPts":1},
"21137":{"responses":{"030":"Y"},"awardedPts":0}
...
}
}
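Given an answer record like the one above, the total awarded points can be computed client-side in one pass. A minimal sketch using only the entries shown (the elided "..." questions are omitted):

```python
# The answer record from the slide, restricted to the entries shown.
answer_record = {
    "testPacketId": 11193654,
    "answerJson": {
        "SQ22545": {"responses": {"010": "Y"}, "awardedPts": 1},
        "SQ22546": {"responses": {"040": "Y"}, "awardedPts": 1},
        "21137":   {"responses": {"030": "Y"}, "awardedPts": 0},
    },
}

def total_points(record):
    """Sum awardedPts across every question entry in the answer JSON."""
    return sum(q["awardedPts"] for q in record["answerJson"].values())

print(total_points(answer_record))  # 1 + 1 + 0 = 2
```

Storing the whole answer map in one item lets the score be recomputed from a single GetItem, rather than one read per question.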
{
"testPacketId":11193654,
"testPacketResponseId":"SQ22545",
"createdTimeStamp":"1412609315419",
"postDate":"1412609315419",
"responseJson":{"i":"26492","f":"N","t":0,"r":{"010":"Y"}}
}
[Diagram: load testing — a Grinder client fleet (test hosts tst01, tst02) driving traffic against Amazon DynamoDB]
[Chart: Latency (ms, 0–12) vs. Provisioned Capacity (units: 300, 600, 1200, 2200, 5000) — series: P90 Client Latency (ms), Avg. Server Latency (ms), Avg. Client Latency (ms)]
http://bit.ly/awsevals