© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved. Andreas Chatzakis, AWS Solutions Architecture, 7th July 2016. Deep Dive on Amazon DynamoDB

Deep Dive on Amazon DynamoDB


Page 1: Deep Dive on Amazon DynamoDB

© 2016, Amazon Web Services, Inc. or its Affiliates. All rights reserved.

Andreas Chatzakis, AWS Solutions Architecture

7th July 2016

Deep Dive on Amazon DynamoDB

Page 2: Deep Dive on Amazon DynamoDB

Objectives

• Prepare for success

• Large tables & demanding use-cases

• High Performance

• Cost optimized

• New functionality

Page 3: Deep Dive on Amazon DynamoDB

Technology adoption and the hype curve

Page 4: Deep Dive on Amazon DynamoDB

Why NoSQL?

SQL | NoSQL
Optimized for storage | Optimized for scalability
Normalized/relational | Denormalized/hierarchical
Ad hoc queries | Instantiated views
Scale vertically | Scale horizontally

Page 5: Deep Dive on Amazon DynamoDB

Scaling efficiently

Page 6: Deep Dive on Amazon DynamoDB

[Chart: Scaling; axes: Size (Gigabytes) and Throughput (Requests per second)]

Page 7: Deep Dive on Amazon DynamoDB

Partitioning

Page 8: Deep Dive on Amazon DynamoDB

Partition count: Size

$$\#\,\text{of partitions (for size)} = \frac{\text{Table size in bytes}}{10\ \text{GB}}$$

In the future, these details might change…

Page 9: Deep Dive on Amazon DynamoDB

Throughput

• Write capacity units (WCUs): 1 KB

• Read capacity units (RCUs): 4 KB

• 1 RCU => 1 strongly consistent read

• 1 RCU => 2 eventually consistent reads
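The unit sizes above translate into a small calculation. The sketch below assumes that item sizes are rounded up to the next unit boundary; the function names are hypothetical and exist only for illustration.

```python
import math

READ_UNIT_BYTES = 4 * 1024   # one RCU covers a read of up to 4 KB
WRITE_UNIT_BYTES = 1 * 1024  # one WCU covers a write of up to 1 KB


def rcus_for_read(item_size_bytes, strongly_consistent=True):
    """Capacity units consumed by reading one item of the given size."""
    units = math.ceil(item_size_bytes / READ_UNIT_BYTES)
    # 1 RCU buys two eventually consistent reads, so each costs half a unit.
    return units if strongly_consistent else units / 2


def wcus_for_write(item_size_bytes):
    """Capacity units consumed by writing one item of the given size."""
    return math.ceil(item_size_bytes / WRITE_UNIT_BYTES)
```

For example, a 6 KB item would cost 2 RCUs for a strongly consistent read, 1 RCU for an eventually consistent read, and 6 WCUs to write.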

Page 10: Deep Dive on Amazon DynamoDB

Partition count: Throughput

$$\#\,\text{of partitions (for throughput)} = \frac{RCU_{\text{for reads}}}{3000\ RCU} + \frac{WCU_{\text{for writes}}}{1000\ WCU}$$

In the future, these details might change…
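The two partition-count formulas can be combined into a rough estimator. This is a sketch under stated assumptions: taking the larger of the two results and rounding up is not stated on these slides, 10 GB is treated as 10 × 2^30 bytes, and the function names are hypothetical. As the slides warn, the constants might change.

```python
import math

GB = 1024 ** 3  # assumption: 1 GB = 2**30 bytes


def partitions_for_size(table_size_bytes):
    return table_size_bytes / (10 * GB)


def partitions_for_throughput(rcu, wcu):
    return rcu / 3000 + wcu / 1000


def estimated_partitions(table_size_bytes, rcu, wcu):
    # Assumption: the table needs enough partitions to satisfy both
    # constraints, so take the larger value and round up.
    needed = max(partitions_for_size(table_size_bytes),
                 partitions_for_throughput(rcu, wcu))
    return max(1, math.ceil(needed))
```

For an 8 GB table provisioned with 5000 RCUs and 500 WCUs, throughput dominates (about 2.17) and the estimate is 3 partitions.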

Page 11: Deep Dive on Amazon DynamoDB

ProvisionedThroughputExceededException

Page 12: Deep Dive on Amazon DynamoDB

Built-in flexibility for small spikes

[Chart: capacity units (0 to 1600) over time; series: Provisioned, Consumed]

“save up” unused capacity

consume saved up capacity

Page 13: Deep Dive on Amazon DynamoDB

Burst capacity

[Chart: capacity units (0 to 1600) over time; series: Provisioned, Consumed, Attempted]

Burst capacity: 300 seconds

(1200 × 300 = 360,000 CU)

Throttled requests

Don’t completely depend on burst capacity… provision sufficient throughput
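One way to picture burst capacity is as a bucket that saves unused capacity for up to 300 seconds. The sketch below is a simplified model for building intuition, not DynamoDB's actual implementation; `simulate_burst` and its parameters are hypothetical names.

```python
def simulate_burst(provisioned, demand_per_second, burst_window_seconds=300):
    """Return per-second (consumed, throttled) lists for a demand trace."""
    cap = provisioned * burst_window_seconds  # most capacity that can be saved
    saved = 0
    consumed, throttled = [], []
    for demand in demand_per_second:
        available = provisioned + saved       # this second plus saved capacity
        used = min(demand, available)
        consumed.append(used)
        throttled.append(demand - used)       # requests beyond what is available
        saved = min(cap, available - used)    # carry over what was not used
    return consumed, throttled
```

With 100 units provisioned, ten idle seconds followed by three seconds of 600 units of demand would serve the first two seconds from saved capacity and throttle most of the third, which is why the slide advises provisioning enough throughput rather than relying on bursts.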

Page 14: Deep Dive on Amazon DynamoDB

Throughput per partition

$$\frac{100{,}000\ RCU}{50\ \text{partitions}} \approx \mathbf{2000}\ \text{read capacity units per partition}$$

Partition 1

2000 RCU

Partition K

2000 RCU

Partition M

2000 RCU

Partition 50

2000 RCU

ProductCatalog Table

Page 15: Deep Dive on Amazon DynamoDB

Space (which partition keys)

Time (consumed capacity per second)

Aim for Uniformity

Page 16: Deep Dive on Amazon DynamoDB

Examine your traffic pattern: Space

[Heat map: Partition vs. Time, shaded by Heat]

Page 17: Deep Dive on Amazon DynamoDB

Hot key issues manifest after you scale

[Diagram: a few clients accessing a table with a single partition, and more clients accessing a table with several partitions]

Page 18: Deep Dive on Amazon DynamoDB

A bad choice for a partition key

f(x)

Partition 1 Partition 2 Partition 3 Partition 4

Partition key: “07-07-2016”

Range key: “Session Attendee X”

Partition key: “07-07-2016”

Range key: “Session Attendee Y”

Table: SummitSessionAttendance

Page 19: Deep Dive on Amazon DynamoDB

But I have random partition keys!

The number of keys per partition matters, but so do other outliers:

- Frequency (Hot keys)

- Size (Large objects or collections)

- Table history (partitions are not merged)?

Page 20: Deep Dive on Amazon DynamoDB

Partition key value | Uniformity
User ID, where the application has many users and each user has similar activity levels. | Good
Status code, where there are only a few possible status codes. | Bad
Device ID, where each device accesses data at relatively similar intervals | Good
Device ID, where one device generates a lot more traffic than any other device | Bad

Page 21: Deep Dive on Amazon DynamoDB

What a hot partition problem looks like

[Charts: Read Capacity (provisioned vs. consumed) and Throttled read requests]

Page 22: Deep Dive on Amazon DynamoDB

Troubleshooting hot partitions

- CloudWatch

- AWS Support

- Access logs

- ReturnConsumedCapacity

- Sampling works well

- GSIs

- must also have enough write capacity

- uniformity requirement also applies

Page 23: Deep Dive on Amazon DynamoDB

Examine your traffic pattern: Time

[Heat map: Partition vs. Time, shaded by Heat]

Page 24: Deep Dive on Amazon DynamoDB

Avoid Sudden Bursts of Read Activity

throttling

Page 25: Deep Dive on Amazon DynamoDB

Query rather than scan

Query
- Specify partition key name
- Condition on sort key
- Cheap with high cardinality keys

Scan
- Reads all data
- Conditions available through filters
- Expensive for large tables

Partition Sort Attribute1 … Attribute N

Page 26: Deep Dive on Amazon DynamoDB

When you have to scan a table

• Scans constrained by single partition throughput

• Use parallel Scans if table>20GB

• Avoid sudden bursts vs provisioned capacity

• Offload to S3, HDFS, Redshift, ElasticSearch or second table
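A parallel scan splits the table into segments that separate workers read independently. The sketch below only builds the request parameters; `Segment` and `TotalSegments` are the Scan API's parameters, while the helper name, the table name, and the segment count of 4 are assumptions for illustration.

```python
def parallel_scan_requests(table_name, total_segments):
    """Build one Scan request per segment; each worker takes one of them."""
    return [
        {
            "TableName": table_name,
            "Segment": segment,              # zero-based segment number
            "TotalSegments": total_segments,
        }
        for segment in range(total_segments)
    ]


# Hypothetical usage: four workers, each scanning one segment.
requests = parallel_scan_requests("ProductCatalog", 4)
```

Each dictionary could be passed to a worker that issues the Scan call and follows the pagination key until its segment is exhausted. More segments mean more concurrent reads, so the worker count should be weighed against the provisioned capacity to avoid the sudden bursts mentioned above.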

Page 27: Deep Dive on Amazon DynamoDB

Design patterns & best practices

Page 28: Deep Dive on Amazon DynamoDB

Product catalog

Popular items (read)

Page 29: Deep Dive on Amazon DynamoDB

Partition 1

2000 RCUs

Partition K

2000 RCUs

Partition M

2000 RCUs

Partition 50

2000 RCU

Scaling bottlenecks

Product A Product B

Shoppers

ProductCatalog Table

SELECT Id, Description, ...

FROM ProductCatalog

WHERE Id="POPULAR_PRODUCT"

Page 30: Deep Dive on Amazon DynamoDB

Partition 1 Partition 2

ProductCatalog Table

User

DynamoDB

User

Cache

popular items

SELECT Id, Description, ...

FROM ProductCatalog

WHERE Id="POPULAR_PRODUCT"
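Caching popular items can be sketched as a read-through cache in front of the table. This is a minimal in-process illustration, assuming a time-to-live is acceptable for product data; the class name, the `load` callable (standing in for a read against DynamoDB), and the 60-second default are all hypothetical.

```python
import time


class ReadThroughCache:
    """Serve repeated reads of hot keys from memory for a limited time."""

    def __init__(self, load, ttl_seconds=60.0, clock=time.monotonic):
        self._load = load          # hypothetical: fetches the item from the table
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}         # key -> (expiry time, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # cache hit: no read capacity consumed
        value = self._load(key)    # cache miss: one read against the table
        self._entries[key] = (now + self._ttl, value)
        return value
```

With this in place, thousands of shoppers requesting "POPULAR_PRODUCT" within the time-to-live would result in a single read against the partition that holds it.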

Page 31: Deep Dive on Amazon DynamoDB

Real-time voting

Write-heavy items

Page 32: Deep Dive on Amazon DynamoDB

Partition 1

1000 WCUs

Partition K

1000 WCUs

Partition M

1000 WCUs

Partition N

1000 WCUs

Votes Table

Candidate A Candidate B

Scaling bottlenecks

Voters

Provision 200,000 WCUs

Page 33: Deep Dive on Amazon DynamoDB

Write sharding

[Diagram: a Voter writes to items sharded as Candidate A_1 to Candidate A_8 and Candidate B_1 to Candidate B_8]

Votes Table

Page 34: Deep Dive on Amazon DynamoDB

Write sharding

[Diagram: a Voter writes to items sharded as Candidate A_1 to Candidate A_8 and Candidate B_1 to Candidate B_8]

UpdateItem: “CandidateA_” + rand(0, 10)

ADD 1 to Votes

Votes Table
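The sharded write can be sketched as follows. The request uses the UpdateItem parameter names, but the table name `Votes`, the key attribute `CandidateId`, and the choice to number shards from 1 to `shard_count` (as in the diagram, rather than `rand(0, 10)`) are assumptions for illustration.

```python
import random


def sharded_key(candidate, shard_count, rng=random):
    """Pick one of the candidate's shards at random, e.g. 'CandidateA_3'."""
    return f"{candidate}_{rng.randint(1, shard_count)}"


def vote_update_request(candidate, shard_count, rng=random):
    """Build UpdateItem parameters that add one vote to a random shard."""
    return {
        "TableName": "Votes",  # hypothetical table name
        "Key": {"CandidateId": {"S": sharded_key(candidate, shard_count, rng)}},
        "UpdateExpression": "ADD #v :one",
        "ExpressionAttributeNames": {"#v": "Votes"},
        "ExpressionAttributeValues": {":one": {"N": "1"}},
    }
```

Because each vote lands on a randomly chosen shard, writes for a single popular candidate are spread over `shard_count` partition keys instead of one.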

Page 35: Deep Dive on Amazon DynamoDB

Votes Table

Shard aggregation

[Diagram: sharded items Candidate A_1 to Candidate A_8 and Candidate B_1 to Candidate B_8, read by a periodic process]

Candidate A

Total: 2.5M

1. Sum

2. Store Voter
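The periodic "sum and store" step might look like the sketch below. The `read_votes` callable is hypothetical and stands in for a read of one shard item; the shard numbering from 1 to `shard_count` and the attribute names are assumptions.

```python
def shard_keys(candidate, shard_count):
    """All shard keys for one candidate, e.g. CandidateA_1 .. CandidateA_8."""
    return [f"{candidate}_{i}" for i in range(1, shard_count + 1)]


def aggregate_votes(candidate, shard_count, read_votes):
    """Step 1 (sum): add up the counters of every shard."""
    return sum(read_votes(key) for key in shard_keys(candidate, shard_count))


def total_item(candidate, shard_count, read_votes):
    """Step 2 (store): the item a periodic process would write back."""
    return {"CandidateId": candidate,
            "Total": aggregate_votes(candidate, shard_count, read_votes)}
```

Readers then fetch the single total item instead of every shard, which is the read cost traded for write scalability.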

Page 36: Deep Dive on Amazon DynamoDB

Trade off read cost for write scalability

Consider throughput per partition key

Shard write-heavy partition keys

Your write workload is not horizontally scalable

Page 37: Deep Dive on Amazon DynamoDB

Cost Optimization tips

Page 38: Deep Dive on Amazon DynamoDB

Auto Scaling

• Cost saving technique

• Open Source solutions

• Set minimums and maximums

• Scale up proactively, scale down conservatively

• Scale up time can be from minutes to hours

• Implement a circuit-breaker

Page 39: Deep Dive on Amazon DynamoDB

Event logging

Storing time series data

Page 40: Deep Dive on Amazon DynamoDB

A mix of hot and cold data

Events_table (current table): RCUs = 10000, WCUs = 10000

Event_id (Partition)  Timestamp (Sort)  Attribute1 … Attribute N

Antipattern:

• Mix of hot and cold data

• Old data rarely accessed

• Unbounded data (partition) growth

• Partition dilution

• Scan costs increase with table size

• Deletes of old data not trivial or cheap

Page 41: Deep Dive on Amazon DynamoDB

Time series tables

Table | Role | RCUs | WCUs
Events_table_2015_April | Current table (hot data) | 10000 | 10000
Events_table_2015_March | Older table | 1000 | 1
Events_table_2015_February | Older table | 100 | 1
Events_table_2015_January | Older table (cold data) | 10 | 1

Each table has the same schema: Event_id (Partition), Timestamp (Sort), Attribute1 … Attribute N

Don’t mix hot and cold data; archive cold data to Amazon S3

Page 42: Deep Dive on Amazon DynamoDB

Use a table per time period

Precreate daily, weekly, monthly tables

Provision required throughput for current table

Writes go to the current table

Turn off (or reduce) throughput for older tables

Cheaper scans – free deletes

Dealing with time series data
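Routing writes to the current table and precreating upcoming ones comes down to deriving table names from dates. The sketch below assumes monthly tables named in the `Events_table_<year>_<month>` style used on the previous slide; the function names are hypothetical.

```python
from datetime import datetime

MONTH_NAMES = ["January", "February", "March", "April", "May", "June",
               "July", "August", "September", "October", "November", "December"]


def monthly_table_name(year, month, prefix="Events_table"):
    """Name of the table that holds events for the given month."""
    return f"{prefix}_{year}_{MONTH_NAMES[month - 1]}"


def tables_to_precreate(start, count, prefix="Events_table"):
    """Names for `count` consecutive monthly tables, starting at `start`."""
    names = []
    for offset in range(count):
        months = start.year * 12 + (start.month - 1) + offset
        names.append(monthly_table_name(months // 12, months % 12 + 1, prefix))
    return names
```

A writer would call `monthly_table_name` with the event's timestamp to find the current table, while a scheduled job could use `tables_to_precreate` to create tables ahead of time and lower the throughput of older ones.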

Page 43: Deep Dive on Amazon DynamoDB

Multiplayer online gaming

Query filters vs.

composite key indexes

Page 44: Deep Dive on Amazon DynamoDB

GameId Date Host Opponent Status

d9bl3 2014-10-02 David Alice DONE

72f49 2014-09-30 Alice Bob PENDING

o2pnb 2014-10-08 Bob Carol IN_PROGRESS

b932s 2014-10-03 Carol Bob PENDING

ef9ca 2014-10-03 David Bob IN_PROGRESS

Games table

Hierarchical data structures

Page 45: Deep Dive on Amazon DynamoDB

Query for incoming game requests

DynamoDB indexes provide partition and sort

What about queries for two equalities and a sort?

SELECT * FROM Game
WHERE Opponent='Bob'      (hash)
AND Status='PENDING'      (?)
ORDER BY Date DESC        (range)

Page 46: Deep Dive on Amazon DynamoDB

Secondary index

Opponent Date GameId Status Host

Alice 2014-10-02 d9bl3 DONE David

Carol 2014-10-08 o2pnb IN_PROGRESS Bob

Bob 2014-09-30 72f49 PENDING Alice

Bob 2014-10-03 b932s PENDING Carol

Bob 2014-10-03 ef9ca IN_PROGRESS David

Approach 1: Query filter

Bob

Partition key Sort key

Page 47: Deep Dive on Amazon DynamoDB

Secondary Index

Approach 1: Query filter

Bob

Opponent Date GameId Status Host

Alice 2014-10-02 d9bl3 DONE David

Carol 2014-10-08 o2pnb IN_PROGRESS Bob

Bob 2014-09-30 72f49 PENDING Alice

Bob 2014-10-03 b932s PENDING Carol

Bob 2014-10-03 ef9ca IN_PROGRESS David

SELECT * FROM Game

WHERE Opponent='Bob'

ORDER BY Date DESC

FILTER ON Status='PENDING'

(filtered out)

Page 48: Deep Dive on Amazon DynamoDB

Needle in a haystack

Bob

Page 49: Deep Dive on Amazon DynamoDB

Send back less data “on the wire”

Simplify application code

Simple SQL-like expressions

• AND, OR, NOT, ()

Use query filter

Your index isn’t entirely selective

Page 50: Deep Dive on Amazon DynamoDB

Approach 2: Composite key

Status + Date = StatusDate
DONE + 2014-10-02 = DONE_2014-10-02
IN_PROGRESS + 2014-10-08 = IN_PROGRESS_2014-10-08
IN_PROGRESS + 2014-10-03 = IN_PROGRESS_2014-10-03
PENDING + 2014-09-30 = PENDING_2014-09-30
PENDING + 2014-10-03 = PENDING_2014-10-03

Page 51: Deep Dive on Amazon DynamoDB

Secondary Index

Approach 2: Composite key

Opponent StatusDate GameId Host

Alice DONE_2014-10-02 d9bl3 David

Carol IN_PROGRESS_2014-10-08 o2pnb Bob

Bob IN_PROGRESS_2014-10-03 ef9ca David

Bob PENDING_2014-09-30 72f49 Alice

Bob PENDING_2014-10-03 b932s Carol

Partition key Sort key

Page 52: Deep Dive on Amazon DynamoDB

Opponent StatusDate GameId Host

Alice DONE_2014-10-02 d9bl3 David

Carol IN_PROGRESS_2014-10-08 o2pnb Bob

Bob IN_PROGRESS_2014-10-03 ef9ca David

Bob PENDING_2014-09-30 72f49 Alice

Bob PENDING_2014-10-03 b932s Carol

Secondary index

Approach 2: Composite key

Bob

SELECT * FROM Game

WHERE Opponent='Bob'

AND StatusDate BEGINS_WITH 'PENDING'
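The composite-key query can be modeled in memory to show why it avoids a filter. This sketch is not DynamoDB code: it sorts the index rows by (Opponent, StatusDate) and uses a binary search plus a prefix check, which mimics what a `begins_with` key condition does on the sort key. The function names are hypothetical; the rows are copied from the Games table.

```python
from bisect import bisect_left

GAMES = [
    # (GameId, Date, Host, Opponent, Status)
    ("d9bl3", "2014-10-02", "David", "Alice", "DONE"),
    ("72f49", "2014-09-30", "Alice", "Bob", "PENDING"),
    ("o2pnb", "2014-10-08", "Bob", "Carol", "IN_PROGRESS"),
    ("b932s", "2014-10-03", "Carol", "Bob", "PENDING"),
    ("ef9ca", "2014-10-03", "David", "Bob", "IN_PROGRESS"),
]


def build_index(games):
    """Rows of (Opponent, StatusDate, GameId, Host), sorted by key."""
    return sorted((opponent, f"{status}_{date}", game_id, host)
                  for game_id, date, host, opponent, status in games)


def query_begins_with(index, opponent, prefix):
    """Return rows for one partition key whose sort key starts with prefix."""
    start = bisect_left(index, (opponent, prefix))  # jump to the first match
    matches = []
    for row in index[start:]:
        if row[0] != opponent or not row[1].startswith(prefix):
            break                                   # past the matching range
        matches.append(row)
    return matches
```

Only the matching rows are touched, so nothing is read and then discarded. Using the prefix `"PENDING_"` with the trailing underscore avoids also matching a longer status that merely starts with "PENDING".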

Page 53: Deep Dive on Amazon DynamoDB

Needle in a sorted haystack

Bob

Page 54: Deep Dive on Amazon DynamoDB

Sparse indexes

CustomerId (Partition)  OrderId (Sort)  Total  Date  Open

1 234234 $100 2016-07-01

1 526346 $10 2016-07-02

2 746346 $200 2016-07-02

1 23462 $300 2016-07-05 X

3 635245 $150 2016-07-05

4 245362 $80 2016-07-07

Customer Orders

CustomerId (Partition)  Open (Sort)  Total  OrderId  Date

1 X $300 23462 2016-07-05

OpenOrders-GSI
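A sparse index contains only the items that carry the index key attributes. The sketch below models that behavior in memory with the orders from the table above; it is an illustration, not DynamoDB code, and `project_into_index` is a hypothetical name.

```python
ORDERS = [
    {"CustomerId": 1, "OrderId": 234234, "Total": 100, "Date": "2016-07-01"},
    {"CustomerId": 1, "OrderId": 526346, "Total": 10, "Date": "2016-07-02"},
    {"CustomerId": 2, "OrderId": 746346, "Total": 200, "Date": "2016-07-02"},
    {"CustomerId": 1, "OrderId": 23462, "Total": 300, "Date": "2016-07-05", "Open": "X"},
    {"CustomerId": 3, "OrderId": 635245, "Total": 150, "Date": "2016-07-05"},
    {"CustomerId": 4, "OrderId": 245362, "Total": 80, "Date": "2016-07-07"},
]


def project_into_index(items, partition_key, sort_key):
    """Keep only items that have both index key attributes, sorted by key."""
    indexed = [item for item in items
               if partition_key in item and sort_key in item]
    return sorted(indexed, key=lambda item: (item[partition_key], item[sort_key]))


open_orders = project_into_index(ORDERS, "CustomerId", "Open")
```

Only the single open order ends up in the index, so a query or scan of the index stays cheap no matter how many closed orders the table accumulates. Closing an order would mean removing its `Open` attribute, which drops it from the index.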

Page 55: Deep Dive on Amazon DynamoDB

Concatenate attributes to form useful

secondary index keys

Take advantage of sparse indexes

Replace filter with indexes

You want to optimize a query as much

as possible

Status + Date

Page 56: Deep Dive on Amazon DynamoDB

Messaging app

Large items, Varied Access Patterns

Filters vs. Indexes

M:N Modeling—inbox and outbox

Page 57: Deep Dive on Amazon DynamoDB

Messages

table

Messages app

David

SELECT *

FROM Messages

WHERE Recipient='David'

LIMIT 50

ORDER BY Date DESC

Inbox

SELECT *

FROM Messages

WHERE Sender ='David'

LIMIT 50

ORDER BY Date DESC

Outbox

Page 58: Deep Dive on Amazon DynamoDB

Recipient Date Sender Message

David 2014-10-02 Bob …

… 48 more messages for David …

David 2014-10-03 Alice …

Alice 2014-09-28 Bob …

Alice 2014-10-01 Carol …

Large and small attributes mixed

(Many more messages)

David

Messages table

50 items × 256 KB each

Partition key Sort key

Large message bodies

Attachments

SELECT *

FROM Messages

WHERE Recipient='David'

LIMIT 50

ORDER BY Date DESC

Inbox

Page 59: Deep Dive on Amazon DynamoDB

Computing inbox query cost

Items evaluated by query × average item size × conversion ratio × eventually consistent reads:

50 × 256 KB × (1 RCU / 4 KB) × (1 / 2) = 1600 RCU

All those RCUs against one partition key

Page 60: Deep Dive on Amazon DynamoDB

Recipient Date Sender Subject MsgId

David 2014-10-02 Bob Hi!… afed

David 2014-10-03 Alice RE: The… 3kf8

Alice 2014-09-28 Bob FW: Ok… 9d2b

Alice 2014-10-01 Carol Hi!... ct7r

Separate the bulk data

Inbox-GSI Messages table

MsgId Body

9d2b …

3kf8 …

ct7r …

afed …

David

1. Query Inbox-GSI: 1 RCU (50 sequential items at 128 bytes)

2. BatchGetItem Messages: 1600 RCU (50 separate items at 256 KB)
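The two costs above follow from how read capacity is rounded: a Query adds up the sizes of the items it reads and rounds the total up to the next 4 KB, while BatchGetItem rounds each item up separately. The sketch below reproduces the slide's numbers for eventually consistent reads; the function names are hypothetical.

```python
import math

KB = 1024
READ_UNIT_BYTES = 4 * KB


def query_rcus(item_sizes, strongly_consistent=False):
    """Query: total size of all items read, rounded up once."""
    units = math.ceil(sum(item_sizes) / READ_UNIT_BYTES)
    return units if strongly_consistent else units / 2


def batch_get_rcus(item_sizes, strongly_consistent=False):
    """BatchGetItem: each item is rounded up on its own."""
    units = sum(math.ceil(size / READ_UNIT_BYTES) for size in item_sizes)
    return units if strongly_consistent else units / 2


inbox_query = query_rcus([128] * 50)          # 50 small index entries
body_fetch = batch_get_rcus([256 * KB] * 50)  # 50 full message bodies
```

Listing the inbox from the index touches 6,400 bytes in total, so the bulk of the cost is paid only when the message bodies are actually fetched, and those reads are spread across many partition keys.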

Page 61: Deep Dive on Amazon DynamoDB

Inbox GSI

Define which attributes to copy into the index

Page 62: Deep Dive on Amazon DynamoDB

Outbox Sender

Outbox GSI

SELECT *

FROM Messages

WHERE Sender ='David'

LIMIT 50

ORDER BY Date DESC

Page 63: Deep Dive on Amazon DynamoDB

Messaging app

Messages

Table

David

Inbox

global secondary

index

Inbox

Outbox

global secondary

index

Outbox

Page 64: Deep Dive on Amazon DynamoDB

Reduce one-to-many item sizes

Configure secondary index projections

Use GSIs to model M:N relationship

between sender and recipient

Distribute large items

Querying many large items at once

Inbox Messages Outbox

Page 65: Deep Dive on Amazon DynamoDB

Event driven applications and

DynamoDB Streams

Page 66: Deep Dive on Amazon DynamoDB

• Stream of updates

• Asynchronous

• Exactly once

• Strictly ordered (per item)

• Highly durable

• Scale with table

• 24-hour lifetime

• Sub-second latency

DynamoDB Streams

Page 67: Deep Dive on Amazon DynamoDB

Stream

Table

Partition 1

Partition 2

Partition 3

Partition 4

Partition 5

Table

Shard 1

Shard 2

Shard 3

Shard 4

KCL

Worker

KCL

Worker

KCL

Worker

KCL

Worker

Amazon Kinesis Client

Library application

DynamoDB

client application

Updates

DynamoDB Streams and

Amazon Kinesis Client Library

Page 68: Deep Dive on Amazon DynamoDB

DynamoDB Streams

Open Source Cross-Region Replication Library

Asia Pacific (Sydney) EU (Ireland) Replica

US East (N. Virginia)

Cross-region replication

Page 69: Deep Dive on Amazon DynamoDB

DynamoDB Streams and AWS Lambda

Page 70: Deep Dive on Amazon DynamoDB

Triggers

Lambda function

Notify change

Derivative tables

Amazon CloudSearch

Amazon ElastiCache
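A trigger is a Lambda function that receives batches of stream records. The sketch below only extracts the change type, keys, and new image from each record; what happens next (notifying, updating a derivative table, refreshing a search index or cache) is left out. The handler name is an assumption, and `NewImage` is only present when the stream is configured to include it.

```python
def handler(event, context):
    """Collect the changes carried by a batch of DynamoDB stream records."""
    changes = []
    for record in event.get("Records", []):
        change = record["dynamodb"]
        changes.append({
            "event": record["eventName"],         # INSERT, MODIFY or REMOVE
            "keys": change["Keys"],               # primary key of the item
            "new_image": change.get("NewImage"),  # absent for REMOVE events
        })
    return changes
```

A real function would replace the returned list with calls to the downstream system, processing the records in the order they arrive.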

Page 71: Deep Dive on Amazon DynamoDB

Search your DynamoDB tables

Page 72: Deep Dive on Amazon DynamoDB

A polyglot data layer

Page 73: Deep Dive on Amazon DynamoDB

Please remember to rate this

session under My Agenda on

awssummit.london

Page 74: Deep Dive on Amazon DynamoDB