DynamoDB In-depth & Developer Drill Down

DynamoDB In-Depth: In this technical discussion, learn how to use DynamoDB for your mobile and web apps, and how to pick the right database for your app. We will cover the fundamental concepts and how to go about architecting your app on DynamoDB. Plus, gain key insights to help you make the most of DynamoDB.

Developer Drill Down: Come learn with live examples of a DynamoDB application integrating with other data services on AWS to enrich your app. This is a developer-driven, interactive session focused on building real-life applications.

Peter-Mark Verwoerd

Solutions Architect

Ace Hotel, New York. May 22nd, 2014

Overview

• Local Secondary Indexes
• Global Secondary Indexes
• Design Patterns
  – User Data
• Demo
• Break
• Design Patterns (continued…)
  – Game State
  – Save Games
  – Global Leaderboard
  – High throughput voting
• Data design patterns

Local Secondary Indexes

Local Secondary Indexes

• Alternate Range Key for your table
• More flexible Query patterns
• Local to the Hash Key

Local secondary indexes (LSI): index and table data are co-located (same partition)

Use case for Local Secondary Indexes

• Find the recent DynamoDB forum posts
• Table sorted by range key only

Forum Subject LastReplyTime Views Replies Answered

S3 How to set permissions? 2013-04-02 100 20 1
DynamoDB Creating secondary indexes? 2013-02-12 100 20 0
DynamoDB I get an error 2012-11-05 98 3 1
DynamoDB Setting row permissions 2012-06-17 100 8 0
DynamoDB Signature not working 2012-03-28 12 1 1
DynamoDB Transaction support 2013-04-01 5 10 0

Use case for Local Secondary Indexes

• Create a local secondary index on LastReplyTime

Forum LastReplyTime Subject Views Replies Answered

S3 2013-04-02 How to set permissions? 100 20 1
DynamoDB 2012-03-28 Signature not working 12 1 1
DynamoDB 2012-06-17 Setting row permissions 100 8 0
DynamoDB 2012-11-05 I get an error 98 3 1
DynamoDB 2013-02-12 Creating secondary indexes? 100 20 0
DynamoDB 2013-04-01 Transaction support 5 10 0
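An index like this is declared when the table is created. The following is a minimal sketch of CreateTable parameters in the low-level JavaScript request style used by the demo code in this deck. The table name `ForumPost`, the index name `ReplyTimeIndex` (taken from the diagram labels), the KEYS_ONLY projection, and the capacity values are assumptions for illustration.

```javascript
// Builds CreateTable parameters with a local secondary index on LastReplyTime.
// Table name, index name, projection, and capacity values are illustrative assumptions.
function forumPostTableParams() {
  return {
    TableName: 'ForumPost',
    AttributeDefinitions: [
      { AttributeName: 'Forum', AttributeType: 'S' },
      { AttributeName: 'Subject', AttributeType: 'S' },
      { AttributeName: 'LastReplyTime', AttributeType: 'S' }
    ],
    KeySchema: [
      { AttributeName: 'Forum', KeyType: 'HASH' },
      { AttributeName: 'Subject', KeyType: 'RANGE' }
    ],
    LocalSecondaryIndexes: [{
      IndexName: 'ReplyTimeIndex',
      // Same hash key as the table, alternate range key.
      KeySchema: [
        { AttributeName: 'Forum', KeyType: 'HASH' },
        { AttributeName: 'LastReplyTime', KeyType: 'RANGE' }
      ],
      Projection: { ProjectionType: 'KEYS_ONLY' }
    }],
    ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 }
  };
}
```

The resulting object would be passed to `dynamodb.createTable(params, callback)`.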

Write example (behind the scenes)

• Updating the LastReplyTime for a Post
  – from “2013-03-17” to “2013-04-02”

[Diagram: an UpdateItem request is sent to DynamoDB; ForumPost Partition 1 and ForumPost Partition 2 each hold Table data together with the ReplyTimeIndex]

Write example (behind the scenes)

• Update the attribute(s) in the item in the table
• Update the attribute(s) in the index if necessary

Write example (behind the scenes)

• Update the attribute(s) in the item in the table
  – Update “How..” date from 2 to 5

Forum Q’n Date

S3 Ask… 1

DDB Ask… 5

DDB Help.. 1

DDB How… 2

DDB Using… 3

Forum Date Q’n

S3 1 Ask...

DDB 1 Help…

DDB 2 How…

DDB 3 Using…

DDB 5 Ask…

Table Index

Write example (behind the scenes)

• Update the attribute(s) in the index
  – Update “How..” date from 2 to 5

Forum Q’n Date

S3 Ask… 1

DDB Ask… 5

DDB Help.. 1

DDB How… 2

DDB Using… 3

Forum Date Q’n

S3 1 Ask...

DDB 1 Help…

DDB 2 How…

DDB 3 Using…

DDB 5 Ask…

DDB 5 How…

Table Index

5

Local Secondary Index Projections

Table: User (hash), File (range); attributes: Date, Type, Size, S3Key

Date-index (KEYS_ONLY): User (hash), Date (range), File (key)

Type-index (INCLUDE Date): User (hash), Type (range), File (key), Date (projected)

Size-index (ALL): User (hash), Size (range), File (key), Date (projected), Type (projected), S3Key (projected)

Projections

• Pick which attributes are “copied” into the index
• Pros:
  – Improves Query performance when querying projected attributes
• Cons:
  – Increases write cost when:
    • Projected attributes are frequently updated
    • Projected attributes are > 1KB

Provisioned throughput cost (reads)

• If querying only for projected attributes:
  – Query costs the same as a Query on a table
• If querying for non-projected attributes:
  – Query costs the same as a Query on a table
  – Plus, the cost of retrieving each item from the table independently
    • (similar to Query + BatchGetItem)

Queries that Fetch

• Index: Project KEYS_ONLY
• Query: (“DDB”, “Date >= 3”, “ALL_ATTRIBUTES”)

Forum Q’n Date Answered

S3 Ask… 1 1

DDB Ask… 5 0

DDB Help.. 1 1

DDB How… 2 1

DDB Using… 3 0

Forum Date Q’n

S3 1 Ask...

DDB 1 Help…

DDB 2 How…

DDB 3 Using…

DDB 5 Ask…

Table Index

1. Query Index
2. Fetch items

Queries that Fetch

DynamoDB

ForumPostPartition 1

1. Query

ReplyTimeIndex

Table

ForumPostPartition 2

2. DynamoDB Queries Index

3. DynamoDB fetches each item from the table

Sparse indexes

• “Unanswered” entries are very interesting

Forum Subject LastReplyTime Views Replies Answered

S3 How to set permissions? 2013-04-02 100 20 1

DynamoDB Creating secondary indexes? 2013-02-12 100 20 0

DynamoDB I get an error 2013-04-01 98 3 1

DynamoDB Setting row permissions 2013-04-01 100 8 0

DynamoDB Signature not working 2013-04-01 12 1 0

DynamoDB Using the SDK 2013-04-01 5 10 1

Sparse indexes

• The “Unanswered” index contains only unanswered posts

Forum Unanswered Subject LastReplyTime Views Replies

DynamoDB 1 Setting row permissions 2013-04-01 100 8

DynamoDB 1 Signature not working 2013-04-01 12 1

DynamoDB 1 Creating secondary indexes? 2013-02-12 100 20

Sparse indexes

• Tip: To get a useful sort order, populate Unanswered with LastReplyTime

Forum Unanswered Subject LastReplyTime Views Replies

DynamoDB 2013-02-12 Creating secondary indexes? 2013-02-12 100 20

DynamoDB 2013-04-01 Setting row permissions 2013-04-01 100 8

DynamoDB 2013-04-01 Signature not working 2013-04-01 12 1
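The sparse-index tip can be sketched as a small helper that decides whether to write the Unanswered attribute at all. This is an illustration only: the function name and the plain `post` object shape are hypothetical. Items without the index range key attribute are simply absent from the index.

```javascript
// Returns the item to store for a forum post. The Unanswered attribute
// (the sparse index's range key) is set only while the post is unanswered,
// so answered posts never appear in the index.
function toPostItem(post) {
  const item = {
    Forum: { S: post.forum },
    Subject: { S: post.subject },
    LastReplyTime: { S: post.lastReplyTime }
  };
  if (!post.answered) {
    // Use the reply time as the value so the index sorts by recency.
    item.Unanswered = { S: post.lastReplyTime };
  }
  return item;
}
```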

Global Secondary Indexes

Global Secondary Indexes

• Alternate Hash and/or Range Key for your table

• Even more flexible Query patterns

Global Secondary Index Projections

Urgent(hash)

Id(key)

GSIs

Table

INCLUDE

To(hash)

Date(range)

Id(key)

Message(projected)

From(projected)

ALL

To(hash)

From(range)

Id(key)

Id(hash)

Date From To Message Urgent

From(hash)

To(range)

Id(key) KEYS_ONLY

From(hash)

Date(range)

Id(key)

To

GSI Query Pattern

• Query covered by GSI
  – Query GSI & get the attributes
• Query not covered by GSI
  – Query GSI to get the table key(s)
  – BatchGetItem/GetItem from table
  – 2 or more round trips to DynamoDB

Tip: If you need very low latency, project all required attributes into the GSI
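For the uncovered case, the second round trip can be sketched as follows. This helper only builds the BatchGetItem request from the keys returned by the index Query. `Messages` is a hypothetical table name, and `Id` is assumed to be the table's hash key, as in the projection diagram above.

```javascript
// Step 2 of an uncovered GSI query: turn the keys returned by the index
// Query into a BatchGetItem request against the base table.
// 'Messages' is a hypothetical table name; 'Id' is its hash key.
function batchGetParamsFromIndexItems(indexItems) {
  return {
    RequestItems: {
      Messages: {
        Keys: indexItems.map(function (item) {
          return { Id: item.Id }; // keep only the table key
        })
      }
    }
  };
}
```

BatchGetItem accepts a limited number of keys per call (100), so a larger result set would need to be split into several requests.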

How do GSI updates work

[Diagram: 1. The client sends an update request to the table. 2. The table sends the update response to the client while the Global Secondary Index is updated asynchronously (in progress).]

1 Table update = 0, 1 or 2 GSI updates

Table operation | No. of GSI index updates
Item not in index before or after update | 0
Update introduces a new indexed attribute, or update deletes the indexed attribute | 1
Update changes the value of an indexed attribute from A to B | 2
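The 0/1/2 rule can be written down as a small function, which is handy when estimating how much write capacity an index needs. This is a sketch under stated assumptions: items are plain objects, only the index key attribute is examined, and an unchanged key is counted as 0 (writes caused by changes to other projected attributes are not modeled).

```javascript
// Counts GSI writes for one table update, following the 0/1/2 rule above.
// Only the index key attribute is considered; writes caused by changes to
// other projected attributes are outside this sketch.
function gsiUpdateCount(before, after, indexKey) {
  const had = before != null && before[indexKey] !== undefined;
  const has = after != null && after[indexKey] !== undefined;
  if (!had && !has) return 0; // item not in index before or after
  if (had !== has) return 1;  // indexed attribute introduced or deleted
  // Value changed from A to B: delete the old entry, put the new one.
  return before[indexKey] === after[indexKey] ? 0 : 2;
}
```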

# | Local Secondary Index | Global Secondary Index
1 | Key = hash key and a range key | Key = hash or hash-and-range
2 | Hash key is the same attribute as that of the table. Range key can be any scalar table attribute | The index hash key and range key (if present) can be any scalar table attributes
3 | For each hash key, the total size of all indexed items must be 10 GB or less | No size restrictions for global secondary indexes
4 | Query over a single partition, as specified by the hash key value in the query | Query over the entire table, across all partitions
5 | Eventual consistency or strong consistency | Eventual consistency only
6 | Read and write capacity units consumed from the table | Every global secondary index has its own provisioned read and write capacity units
7 | Query will automatically fetch non-projected attributes from the table | Query can only request projected attributes. It will not fetch any attributes from the table

LSI or GSI?

• LSI can be modeled as a GSI
• If data size in an item collection > 10 GB, use GSI
• If GSI will work for your scenario, use GSI! Things to keep in mind:
  – 2 round trips (unless you project the attributes you need)
  – Eventual consistency

Best Practices

• Provision enough throughput for GSI
  – one update to the table may result in two writes to an index
• If GSIs do not have enough write capacity, table writes will eventually be throttled down to what the "slowest" index can consume

User Data

Fine-grained access control

User Data

[Diagram: Users, Your App, Amazon DynamoDB. With the app tier between users and the database, the concerns noted are cost, ops, and latency.]

User Data

[Diagram: Users and Amazon DynamoDB without the app tier. Open question noted: access control?]

Web Identity Federation

[Diagram: Users, AWS IAM (web identity federation), Amazon DynamoDB, with fine-grained access control]

Fine-Grained Access Control

• Limit access to particular hash key values
• Limit access to specific attributes
• Use policy substitution variables to write the policy once

Fine-Grained Access Control

Images Table

User Image Date Link

Bob aed4c 2013-10-01 s3://…

Bob 5f2e2 2013-09-05 s3://…

Bob f93bae 2013-10-08 s3://…

Alice ca61a 2013-09-12 s3://…

“Allow all authenticated Facebook users to Query the Images table, but only on items where their Facebook ID is the hash key”
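A policy of this kind can be sketched as a JavaScript object, using the `dynamodb:LeadingKeys` condition key and the `${graph.facebook.com:id}` substitution variable. The helper name is hypothetical, and the region and account ID passed in are placeholders, not values from the talk.

```javascript
// Builds an IAM policy document that lets a federated Facebook user Query
// only items whose hash key equals their own Facebook ID.
// The region and account ID in the ARN are placeholders.
function imagesTablePolicy(region, accountId) {
  return {
    Version: '2012-10-17',
    Statement: [{
      Effect: 'Allow',
      Action: ['dynamodb:Query'],
      Resource: ['arn:aws:dynamodb:' + region + ':' + accountId + ':table/Images'],
      Condition: {
        'ForAllValues:StringEquals': {
          // Substituted by IAM at request time with the caller's Facebook ID.
          'dynamodb:LeadingKeys': ['${graph.facebook.com:id}']
        }
      }
    }]
  };
}
```

Because the variable is substituted per request, the same policy serves every user, which is the "write the policy once" point above.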

Bob “logs in” using web identity federation

Bob can Query for Images where User=“Bob”

Bob cannot Query for Images where User=“Alice”

Two-tier Architecture Tradeoffs

• Pros:
  – Lower latency
  – Lower cost
  – Lower operational complexity
• Cons:
  – Less visibility into application behavior
  – More difficult to make changes to persistence layer
  – Requires “scoping” items to a given user

Amazon DynamoDB

Users

Demo

Tagging App Query Patterns

• Image Table:
  – How many votes does this image (URL) have?
  – Does an item already exist for this URL?
• Tag Table:
  – How many images are tagged with this tag?
• ImageTag Table:
  – All images with a given tag
  – All tags for a given image
  – How many votes does this tag have?

Image Table

Id DateAdded VoteCount

"http://tag-pics.s3.amazonaws.com/aws-icons/cloudsearch.png" "2014-05-06T05:50:06.371Z" 0

"http://tag-pics.s3.amazonaws.com/aws-icons/dynamodb.png" "2014-05-06T05:03:16.582Z" 3

Attribute Type Value

Id (Hash Key) String "http://tag-pics.s3.amazonaws.com/aws-icons/cloudsearch.png"

DateAdded String "2014-05-06T05:50:06.371Z"

VoteCount Number 0

Tag Table

Tag ImageCount

"new" 2

"database" 1

"nosql" 1

"cloudsearch" 1

"dynamodb" 1

Attribute Type Value

Tag (Hash Key) String "database"

ImageCount Number 1

ImageTag Table

Tag ImageId LastUpdateTime VoteCount

"new" "http://tag-pics.s3.amazonaws.com/aws-icons/cloudsearch.png" "2014-05-06T05:50:06.371Z" 0

"new" "http://tag-pics.s3.amazonaws.com/aws-icons/dynamodb.png" "2014-05-06T05:50:36.452Z" 3

"database" "http://tag-pics.s3.amazonaws.com/aws-icons/dynamodb.png" "2014-05-06T05:03:51.964Z" 3

"nosql" "http://tag-pics.s3.amazonaws.com/aws-icons/dynamodb.png" "2014-05-06T05:03:45.489Z" 3

"cloudsearch" "http://sivar-pics.s3.amazonaws.com/aws-icons/cloudsearch.png" "2014-05-06T05:50:19.364Z" 0

"dynamodb" "http://sivar-pics.s3.amazonaws.com/aws-icons/dynamodb.png" "2014-05-06T05:03:35.655Z" 3

Attribute Type Value

Tag (Hash Key) String "database"

ImageId (Range Key) String "http://tag-pics.s3.amazonaws.com/aws-icons/dynamodb.png"

LastUpdateTime String "2014-05-06T05:03:51.964Z"

VoteCount Number 3

ImageTag Table Indexes

Local Secondary Index

Index Name | Hash Key | Range Key | Projected Attributes | Index Size (Bytes)* | Item Count*
VoteCount-index | Tag (String) | VoteCount (Number) | All | 369 | 3

Global Secondary Index

Index Name | Hash Key | Range Key | Projected Attributes | Status | Read Capacity Units | Write Capacity Units | Last Decrease Time | Last Increase Time | Index Size (Bytes)* | Item Count*
ImageId-index | ImageId (String) | Tag (String) | Tag, ImageId | Active | 1 | 1 | | | 222 | 3

Conditional Update

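// Create the Image item only if no item with this Id exists yet.
// `callback` is assumed to be supplied by the caller, as in the functions below.
dynamodb.putItem({
  'TableName': 'Image',
  'Item': {
    'Id': {'S': imageId},
    'VoteCount': {'N': "0"},
    'DateAdded': {'S': dateStr}
  },
  'Expected': { 'Id': { 'Exists': false } }
}, callback);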

UpdateItem Increment

// Upserts a Tag into the Tag table, incrementing its ImageCount by incrementBy.
function insertTag(tag, incrementBy, callback) {
  console.log("Insert / increment Tag. Tag: " + tag);
  dynamodb.updateItem({
    'TableName': 'Tag',
    'Key': { 'Tag': {'S': tag} },
    'AttributeUpdates': {
      'ImageCount': {
        'Action': 'ADD',
        // Number values are sent as strings; use the incrementBy argument.
        'Value': {'N': String(incrementBy)}
      }
    }
  }, callback);
}

// Queries images for a tag, highest VoteCount first, with optional paging.
function queryImagesByTag(tag, limit, votes, imageId, callback) {
  var params = {
    'TableName': 'ImageTag',
    'IndexName': 'VoteCount-index',
    'KeyConditions': {
      'Tag': {
        'ComparisonOperator': 'EQ',
        'AttributeValueList': [{'S': tag}]
      }
    },
    'ScanIndexForward': false // descending VoteCount
  };
  if (limit) {
    params['Limit'] = parseInt(limit, 10);
  }
  // Resume after the last item of the previous page, if one was given.
  if (votes != null && imageId != null) {
    params['ExclusiveStartKey'] = {
      'Tag': { 'S': tag },
      'VoteCount': { 'N': String(votes) }, // numbers are sent as strings
      'ImageId': { 'S': imageId }
    };
  }
  dynamodb.query(params, callback);
}

Summary: Image Tagging Demo

• Modeling applications on DynamoDB is similar to modeling with other databases
• Need to plan your schema and indexes around how you are going to query your data

Basic Game State

Conditional Writes

Tic Tac Toe

Tic Tac Toe

Alice Bob

DynamoDB

Your App

Tic Tac Toe Table

Game Table

Id Players O State IsTie Winner Data

abecd [ Alice, Bob ] Alice DONE 1 …

fbdcc [ Alice, Bob ] Alice DONE Alice …

dbace [ Alice, Bob ] Alice STARTED …

Tic Tac Toe Table

{ "Data" : [ [ "X", null, "O" ], [ null, "O", null], [ "O", null, "X" ] ] }

Id Players O State IsTie Winner Data

abecd [ Alice, Bob ] Alice DONE 1 …

fbdcc [ Alice, Bob ] Alice DONE Alice …

dbace [ Alice, Bob ] Alice STARTED …

State Transitions with Conditional Writes

DynamoDB

Alice Bob

State Transitions with Conditional Writes

DynamoDB

UpdateItem: Top-Right = O, Turn = Bob

Alice Bob

State Transitions with Conditional Writes

DynamoDB

UpdateItem: Top-Left = X, Turn = Alice

Alice Bob

State Transitions with Conditional Writes

Alice Bob (1)

DynamoDB

Bob (2) Bob (3)

State Transitions with Conditional Writes

Bob (1)

DynamoDB

Bob (2)Bob (3)

State : STARTED,Turn : Bob,Top-Right : O

State Transitions with Conditional Writes

Bob (1)

DynamoDB

Bob (2)Bob (3)

Update: Turn : Alice Top-Left : X

Update: Turn : Alice Low-Right : X

Update: Turn : Alice Mid : X

State : STARTED,Turn : Bob,Top-Right : O

State Transitions with Conditional Writes

Bob (1)

DynamoDB

Bob (2)Bob (3)

Update: Turn : Alice Top-Left : X

Update: Turn : Alice Low-Right : X

Update: Turn : Alice Mid : X

State : STARTED,Turn : Alice,Top-Right : O,Top-Left : X,Mid: X,Low-Right: X

Conditional Writes

• Apply an update only if values are as expected
• Otherwise reject the write

Conditional Writes

UpdateItem Id=abecd

Game Item: { Id : abecd, Players : [ Alice, Bob ], State : STARTED, Turn : Bob, Top-Right : O }

Updates: { Turn : Alice, Top-Left : X }

Expected: { Turn : Bob, Top-Left : null, State : STARTED }
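As a sketch, the request above could be built in the same legacy `AttributeUpdates`/`Expected` style that the demo code in this deck uses (newer SDK versions favor condition expressions instead). The helper name is hypothetical, `Game` is assumed to be the table name from the slides, and the square names are assumed to be stored as top-level attributes.

```javascript
// Builds UpdateItem parameters for placing a mark, using the legacy
// AttributeUpdates/Expected request style. 'Game' is assumed to be the
// table name; square names such as 'Top-Left' are attributes on the item.
function moveParams(gameId, player, nextPlayer, square, mark) {
  const params = {
    TableName: 'Game',
    Key: { Id: { S: gameId } },
    AttributeUpdates: {
      Turn: { Action: 'PUT', Value: { S: nextPlayer } }
    },
    Expected: {
      Turn: { Value: { S: player } },    // it must be this player's turn
      State: { Value: { S: 'STARTED' } } // the game must still be running
    }
  };
  params.AttributeUpdates[square] = { Action: 'PUT', Value: { S: mark } };
  params.Expected[square] = { Exists: false }; // the square must be empty
  return params;
}
```

For Bob's move this would be `moveParams('abecd', 'Bob', 'Alice', 'Top-Left', 'X')`, passed to `dynamodb.updateItem`.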

State Transitions with Conditional Writes

Bob (1)

DynamoDB

Bob (2)Bob (3)

Update: Turn : Alice, Top-Left : X. Expect: Turn : Bob, Top-Left : null

State : STARTED, Turn : Bob, Top-Right : O

Update: Turn : Alice, Low-Right : X. Expect: Turn : Bob, Low-Right : null

Update: Turn : Alice, Mid : X. Expect: Turn : Bob, Mid : null

State Transitions with Conditional Writes

Bob (1)

DynamoDB

Bob (2)Bob (3)

State : STARTED, Turn : Alice, Top-Right : O, Top-Left : X

Update: Turn : Alice, Top-Left : X. Expect: Turn : Bob, Top-Left : null

Update: Turn : Alice, Low-Right : X. Expect: Turn : Bob, Low-Right : null

Update: Turn : Alice, Mid : X. Expect: Turn : Bob, Mid : null
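The outcome of the three competing updates can be reproduced with a small in-memory model. This is not a DynamoDB call, only an illustration of the "apply if expected, otherwise reject" rule; the function name and the use of `null` for "attribute must be absent" are assumptions of the sketch.

```javascript
// In-memory model of a conditional write (not a DynamoDB call): apply the
// update only if every expected attribute matches; null means "must be absent".
function conditionalUpdate(item, updates, expected) {
  for (const name of Object.keys(expected)) {
    const current = item[name] === undefined ? null : item[name];
    if (current !== expected[name]) return false; // reject the write
  }
  Object.assign(item, updates);
  return true;
}

const game = { State: 'STARTED', Turn: 'Bob', 'Top-Right': 'O' };
// Bob's three sessions each try to place an X in a different square.
const results = ['Top-Left', 'Low-Right', 'Mid'].map(function (square) {
  const updates = { Turn: 'Alice' };
  updates[square] = 'X';
  const expected = { Turn: 'Bob' };
  expected[square] = null;
  return conditionalUpdate(game, updates, expected);
});
console.log(results); // [ true, false, false ]
```

Only the first write is accepted. After it, Turn is Alice, so the other two fail their `Turn : Bob` expectation and the board holds a single new X.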

Save Games

Hash + Range

Save Games

Primary Key Schemas

Id Players O State IsTie Winner Data

abecd [ Alice, Bob ] Alice DONE 1 …

fbdcc [ Alice, Bob ] Alice DONE Alice …

dbace [ Alice, Bob ] Alice STARTED …

Primary Key

Hash Key Schema

Primary Key Schemas

Id Turn Players Turn State IsTie Winner Data

abecd 0 [ Alice, Bob ] Alice STARTED …

abecd 1 [ Alice, Bob ] Bob STARTED …

abecd 2 [ Alice, Bob ] Alice STARTED …

abecd 3 [ Alice, Bob ] Bob STARTED …

abecd 4 [ Alice, Bob ] Alice DONE Alice …

dbace 0 [ Alice, Bob ] Bob STARTED

dbace 1 [ Alice, Bob ] Alice STARTED …

Primary Key

Hash and Range Key Schema

Primary Key Schemas

• Hash-only
  – Key/value lookups only
• Hash and Range
  – Given a hash key value, query for items by range key
  – Items are sorted by range key within each hash key

Primary Key Schemas

Query WHERE Id=abecd, ORDER BY Turn DESC, LIMIT 2
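The pseudo-query above maps onto Query parameters roughly as follows, in the same request style as the demo code in this deck. The helper name and the table name `GameTurns` are hypothetical.

```javascript
// Query parameters for "WHERE Id=abecd ORDER BY Turn DESC LIMIT 2".
// 'GameTurns' is a hypothetical table name.
function latestTurnsParams(gameId, limit) {
  return {
    TableName: 'GameTurns',
    KeyConditions: {
      Id: { ComparisonOperator: 'EQ', AttributeValueList: [{ S: gameId }] }
    },
    ScanIndexForward: false, // descending range key order
    Limit: limit
  };
}
```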

Global Leaderboard

Scatter-gather

Game-Wide Leaderboard

• Find the top 10 scores game-wide

HighScore User

1000 Alice

850 Dave

580 Erin

470 Bob

30 Chuck

Table Schemas must begin with a Hash Key

Game-Wide Leaderboard

• Find the top 10 scores game-wide

Cannot be Queried the way we want

User HighScore

Chuck 30

Alice 1000

Bob 470

Dave 850

Erin 580

Game-Wide Leaderboard

• Use a constant Hash key?

Constant HighScore-User

1 0001000-Alice

1 0000850-Dave

1 0000580-Erin

1 0000470-Bob

1 0000030-Chuck

Zero-pad strings for sort stability
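A minimal sketch of the zero-padding: string range keys sort lexicographically, so scores need a fixed width to sort numerically. The helper name is hypothetical, and the 7-digit width (taken from the slide) assumes non-negative integer scores below ten million.

```javascript
// Builds a "HighScore-User" range key. Scores are left-padded with zeros to a
// fixed width (7 digits, matching the slide) so string order equals numeric order.
function scoreKey(score, user) {
  let digits = String(score);
  while (digits.length < 7) digits = '0' + digits;
  return digits + '-' + user;
}

const keys = [scoreKey(470, 'Bob'), scoreKey(1000, 'Alice'), scoreKey(30, 'Chuck')];
keys.sort().reverse(); // descending, like a Query with ScanIndexForward: false
console.log(keys); // [ '0001000-Alice', '0000470-Bob', '0000030-Chuck' ]
```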

Extremely non-uniform workload

Scatter-Gather Leading Range Key

HighScores Table

Shard HighScore-User

1 0001000-Alice

1 0000850-Dave

1 0000580-Erin

Shard HighScore-User

3 0000900-Dan

3 0000850-Wendy

3 0000080-Trent

Shard HighScore-User

2 0000980-Eve

2 0000600-Frank

2 0000581-Trent

Shard HighScore-User

4 0000500-Merlin

4 0000350-Carole

4 0000280-Paul

Shard HighScore-User

5 0000999-Oscar

5 0000700-Craig

5 0000030-Chuck

1. Periodically Query each Shard DESC, LIMIT N

2. Keep only the top N, Store somewhere

HighScore User

1000 Alice

999 Oscar
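The gather step can be sketched as a merge of the per-shard results. This assumes each shard Query has already returned its top entries as zero-padded "HighScore-User" strings; the function name and the output shape are hypothetical.

```javascript
// Gather step: merge the per-shard top-N lists and keep the overall top N.
// Each entry is a "HighScore-User" key as stored in the HighScores table.
function mergeTopN(shardResults, n) {
  const all = [].concat.apply([], shardResults);
  all.sort().reverse(); // zero-padded keys sort correctly as strings
  return all.slice(0, n).map(function (key) {
    const dash = key.indexOf('-');
    return { highScore: parseInt(key.slice(0, dash), 10), user: key.slice(dash + 1) };
  });
}

const top2 = mergeTopN([
  ['0001000-Alice', '0000850-Dave'],
  ['0000980-Eve', '0000600-Frank'],
  ['0000900-Dan', '0000850-Wendy'],
  ['0000500-Merlin', '0000350-Carole'],
  ['0000999-Oscar', '0000700-Craig']
], 2);
console.log(top2); // Alice (1000), then Oscar (999)
```

Each shard only needs to contribute its own top N entries, since no entry outside a shard's top N can be in the global top N.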

Store the Shard id by User for high score updates

User Shard

Alice 1

Oscar 5

Carole 4

High-Throughput Voting

Write sharding

Voting

Votes Table

Voter

Candidate A Votes: 20

Candidate B Votes: 30

Voting

Votes Table

Voter

Candidate A Votes: 21

Candidate B Votes: 30

UpdateItem: ADD 1 to “Candidate A” (aka Atomic Increment)

Scaling on DynamoDB

Votes Table

You: Need to scale for the election

Scaling on DynamoDB

Votes Table

You

Provision 1200 Write Capacity Units

Scaling on DynamoDB

Votes Table

Partition 1 Partition 2

You

600 Write Capacity Units (each)

Provision 1200 Write Capacity Units

Scaling on DynamoDB

Votes Table

Partition 1 Partition 2

You

(no sharing)

Provision 1200 Write Capacity Units

Scaling on DynamoDB

Votes Table

You

Provision 200,000 Write Capacity Units

Partition 1(600 WCU)

Partition K(600 WCU)

Partition M(600 WCU)

Partition N(600 WCU)

Scaling bottlenecks

Votes Table

Partition 1(600 WCU)

Candidate A

Partition K(600 WCU)

Partition M(600 WCU)

Partition N(600 WCU)

Candidate B

Voters

50,000 / sec

70,000 / sec

Best Practice: Uniform Workloads

“To achieve the full amount of request throughput you have provisioned for a table, keep your workload spread evenly across the hash key values.”

– DynamoDB Developer Guide

Scaling on DynamoDB

Votes Table

[Diagram: Voter writing to sharded items Candidate A_1 … Candidate A_8 and Candidate B_1 … Candidate B_8]

Scaling on DynamoDB

UpdateItem: “CandidateA_” + rand(0, 10), ADD 1 to Votes

Scaling on DynamoDB

Periodic Process:
1. Sum
2. Store

Candidate A Total: 2.5M
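Both halves of write sharding can be sketched in a few lines: the write side picks a random shard suffix, and the periodic process sums the per-shard counters. The function names are hypothetical, the shard count is a parameter, and the random source is injectable only so the example is deterministic.

```javascript
// Write side: pick a random shard so writes spread across hash key values.
// The "_<n>" suffix format follows the slide's naming; the random source
// is injectable so the function can be tested.
function shardedKey(candidate, shardCount, random) {
  const shard = 1 + Math.floor((random || Math.random)() * shardCount);
  return candidate + '_' + shard;
}

// Read side (the periodic process): sum the per-shard counters.
function totalVotes(shardCounts) {
  return shardCounts.reduce(function (sum, votes) { return sum + votes; }, 0);
}

console.log(shardedKey('Candidate A', 8, function () { return 0.5; })); // Candidate A_5
console.log(totalVotes([10, 20, 30])); // 60
```

Each vote would then be an UpdateItem with ADD 1 on the sharded key, and the summed total stored wherever readers expect it.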

Data Design Patterns

Reference Architecture …for a classic 3-tier application

Application

Reference Architecture

Data Bus

Amazon RDS

Amazon CloudSearch

Amazon DynamoDB

Amazon ElastiCache

Amazon EMR

Amazon S3

Amazon Glacier

AWS Data Pipeline

Amazon Redshift

Use Case: A Video Streaming App – Upload

[Diagram: Application, AWS Data Pipeline, Amazon DynamoDB, Amazon RDS, Amazon CloudSearch, Amazon S3]

A Video Streaming App – Discovery

[Diagram: Application, AWS Data Pipeline, Amazon Glacier, Amazon ElastiCache, CloudFront, Amazon DynamoDB, Amazon RDS, Amazon CloudSearch, Amazon S3]

Use Case: A Video Streaming App – Recs

[Diagram: Application, AWS Data Pipeline, Amazon S3, Amazon Glacier, Amazon DynamoDB, Amazon EMR]

How do I choose the right data store?

Data Structure & Query Pattern

Service characteristics

Cost

Data Structure & Query Pattern

Structured – Complex Query
• SQL
  – Amazon RDS (MySQL, Oracle, SQL Server, Postgres)
• Data Warehouse
  – Amazon Redshift
• Search
  – Amazon CloudSearch

Unstructured – Custom Query
• Hadoop
  – Amazon Elastic MapReduce (EMR)

Structured – Simple Query
• NoSQL
  – Amazon DynamoDB
• Cache
  – Amazon ElastiCache (Memcached, Redis)

Unstructured – No Query
• Cloud Storage
  – Amazon S3
  – Amazon Glacier

Data Characteristics: Hot, Warm, Cold

Hot Warm Cold

Volume MB–GB GB–TB PB

Item size B–KB KB–MB KB–TB

Latency ms ms, sec min, hrs

Durability Low–High High Very High

Request rate Very High High Low

Cost/GB $$-$ $-¢¢ ¢

[Figure: Amazon ElastiCache, Amazon DynamoDB, Amazon RDS, Amazon CloudSearch, Amazon Redshift, Amazon EMR, Amazon S3, and Amazon Glacier placed along these axes: request rate (high to low), cost/GB (high to low), latency (low to high), data volume (low to high), and structure (low to high)]

What data store should I use?

| ElastiCache | Amazon DynamoDB | Amazon RDS | CloudSearch | Amazon Redshift | Amazon EMR (Hive) | Amazon S3 | Amazon Glacier
Average latency | ms | ms | ms, sec | ms, sec | sec, min | sec, min, hrs | ms, sec, min (~ size) | hrs
Data volume | GB | GB–TBs (no limit) | GB–TB (3 TB max) | GB–TB | TB–PB (1.6 PB max) | GB–PB (~nodes) | GB–PB (no limit) | GB–PB (no limit)
Item size | B–KB | KB (64 KB max) | KB (~row size) | KB (1 MB max) | KB (64 K max) | KB–MB | KB–GB (5 TB max) | GB (40 TB max)
Request rate | Very High | Very High | High | High | Low | Low | Low–Very High (no limit) | Very Low (no limit)
Storage cost $/GB/month | $$ | ¢¢ | ¢¢ | $ | ¢ | ¢ | ¢ | ¢
Durability | Low–Moderate | Very High | High | High | High | High | Very High | Very High

(Columns run from hot data on the left, through warm data, to cold data on the right.)

Use the right tool for the job!

App/Web Tier

Client Tier

Data Tier

Amazon RDS

Amazon CloudSearch

Amazon DynamoDB

Amazon ElastiCache

Amazon Elastic MapReduce Amazon S3

Amazon Glacier

Amazon Redshift AWS Data Pipeline

Amazon DynamoDB: Managed NoSQL Service

When to use
• Fast and predictable performance
• Seamless/massive scale
• Autosharding
• Consistent/low latency
• No size or throughput limits
• Very high durability
• Key-value or simple queries

When not to use
• Need multi-item/row or cross table transactions
• Need complex queries, joins
• Need real-time analytics on historic data
• Storing cold data

Questions?

Peter-Mark Verwoerd
verwoerd@amazon.com
@petermark

Derek Chiles
derekch@amazon.com
@derekchiles

Social Gaming

Local secondary indexes

Social Gaming

• Host games
• Invite friends to play
• Find friends’ games to play
• See history of games

Social Gaming

HostedGame Table

Hash: UserId
Range: GameId
Attributes: OpponentId, Date, (rest of game state)

UserId GameId Date OpponentId …

Carol e23f5a 2013-10-08 Charlie …

Alice d4e2dc 2013-10-01 Bob …

Alice e9cba3 2013-09-27 Bob …

Alice f6a3bd 2013-10-08

Social Gaming: find recent games

UserId GameId Date OpponentId …

Carol e23f5a 2013-10-08 Charlie …

Alice d4e2dc 2013-10-01 Bob …

Alice e9cba3 2013-09-27 Bob …

Alice f6a3bd 2013-10-08

Query UserId=Alice

Query cost

• Provisioned Throughput: Work / sec allowed on your table
• Capacity Units: Amount of provisioned throughput consumed by an operation

Query cost

UserId GameId Date OpponentId …

Carol e23f5a 2013-10-08 Charlie …

Alice d4e2dc 2013-10-01 Bob …

Alice e9cba3 2013-09-27 Bob …

Alice f6a3bd 2013-10-08

(1 item = 600 bytes)

(397 more games for Alice)

400 × 600 / 1024 / 4 ≈ 58.6, or about 60 Read Capacity Units

(Items evaluated by Query) × (bytes per item) / (bytes per KB) / (KB per Read Capacity Unit)
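The arithmetic above can be captured in a small estimator. This is a sketch, not an official formula: it assumes the total bytes read by the Query are rounded up to the next 4 KB, that the slide's numbers describe strongly consistent reads, and that eventually consistent reads cost half as much. The function name is hypothetical.

```javascript
// Estimates Read Capacity Units for a Query: total bytes read, rounded up to
// the next 4 KB. Eventually consistent reads are assumed to cost half as much.
function queryReadCapacityUnits(itemCount, bytesPerItem, stronglyConsistent) {
  const units = Math.ceil((itemCount * bytesPerItem) / 4096);
  return stronglyConsistent ? units : units / 2;
}

console.log(queryReadCapacityUnits(400, 600, true)); // 59
console.log(queryReadCapacityUnits(10, 600, true));  // 2
```

Reading all 400 of Alice's games costs roughly 60 units, while reading only 10 items costs 2, which is the motivation for an index that returns the most recent games first.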

Local Secondary Indexes

• An alternate range key on a table

UserId GameId Date

Carol e23f5a 2013-10-08

Alice d4e2dc 2013-10-01

Alice e9cba3 2013-09-27

Alice f6a3bd 2013-10-01

UserId Date GameId

Carol 2013-10-08 e23f5a

Alice 2013-09-27 e9cba3

Alice 2013-10-01 d4e2dc

Alice 2013-10-01 f6a3bd

(First table: HostedGame Table. Second table: Local Secondary Index on Date.)

Query cost on Local Secondary Indexes

UserId Date GameId …

Carol 2013-10-08 e23f5a …

Alice (397 older games)

Alice 2013-09-27 e9cba3 …

Alice 2013-10-01 d4e2dc …

Alice 2013-10-01 f6a3bd …

Query for the 10 most recent games

10 × 600 / 1024 / 4 ≈ 1.5, rounded up to 2 Read Capacity Units

(Items evaluated by Query) × (bytes per item) / (bytes per KB) / (KB per Read Capacity Unit)

Example Local Secondary Indexes

• Find 10 recent matches between Alice and Bob
  – Hash: UserId
  – Range: OpponentId + Date

Query WHERE UserId=Alice AND OpponentAndDate STARTS_WITH “Bob-” LIMIT 10 DESC
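In request form, the pseudo-query's STARTS_WITH corresponds to the `BEGINS_WITH` comparison operator. The sketch below builds the Query parameters in the same style as the demo code in this deck. The helper name and the index name `OpponentAndDate-index` are hypothetical, and the composite range key is assumed to be stored as "<OpponentId>-<Date>".

```javascript
// Query parameters for a user's most recent games against one opponent.
// The index name 'OpponentAndDate-index' is hypothetical; OpponentAndDate is
// the composite "<OpponentId>-<Date>" range key named in the query above.
function recentMatchesParams(userId, opponentId, limit) {
  return {
    TableName: 'HostedGame',
    IndexName: 'OpponentAndDate-index',
    KeyConditions: {
      UserId: { ComparisonOperator: 'EQ', AttributeValueList: [{ S: userId }] },
      OpponentAndDate: {
        ComparisonOperator: 'BEGINS_WITH',
        AttributeValueList: [{ S: opponentId + '-' }]
      }
    },
    ScanIndexForward: false, // newest first
    Limit: limit
  };
}
```

Including the trailing "-" in the prefix keeps an opponent named "Bob" from also matching "Bobby".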

More example Local Secondary Indexes

• Find a host’s matches without an opponent
  – Hash: UserId
  – Range: UnmatchedDate (sparse index)

Query WHERE UserId=Alice LIMIT 10 DESC

Local Secondary Index Projections

• Choose what attributes are copied into the index
  – ALL, INCLUDE (specific attributes), KEYS_ONLY
• Substantially cheaper to Query only the projection
• Project the attributes that your use case requires
• Can make writes cheaper too

Write cost for Local Secondary Index

• Insert new item
  – 1 additional write
• Setting index range key to / from null
  – 1 additional write
• Updating a projected attribute
  – 1 additional write
• Updating a non-projected attribute
  – 0 additional writes
• Updating the index range key
  – 2 additional writes

Read cost for Query of non-projected attributes

• Regular Query cost, plus
• Single-item Get cost for each evaluated item

Example Local Secondary Index Projections

• Query Alice’s 10 most recent Games
  – Opponent, Winner, (UserId, GameId, Date)
  – Projected item size from 600 bytes to 40 bytes
• Write cost:
  – 1 Write Capacity Unit for insert, opponent joining, and completion
  – 0 Write Capacity Units for other state transitions

Social Gaming

Transactions

Social Gaming: Friends

• Query who you are friends with
• Ask to be friends with someone
• Acknowledge (or decline) friend request

Social Gaming: Friends

FriendsTable

Hash: UserId
Range: FriendId
Attributes: Status, Date, etc

UserId FriendId Status Date …

Alice Bob FRIENDS 2013-08-20 …

Bob Alice FRIENDS 2013-08-20 …

Bob Chuck INCOMING 2013-10-08 …

Chuck Bob SENT 2013-10-08 …

Becoming Friends: Multi-item Atomic Writes

UserId FriendId Status

Alice Bob FRIENDS

Bob Alice FRIENDS

Bob Chuck INCOMING

Chuck Bob SENT

Bob

A friend request!

1. Update Bob/Chuck record
2. Update Chuck/Bob record

Becoming Friends: Multi-item Atomic Writes

UserId FriendId Status

Alice Bob FRIENDS

Bob Alice FRIENDS

Bob Chuck FRIENDS

Chuck Bob SENT

Bob

UpdateItem: Status=FRIENDS

1. Update Bob/Chuck record
2. Update Chuck/Bob record

Becoming Friends: Multi-item Atomic Writes

UserId FriendId Status

Alice Bob FRIENDS

Bob Alice FRIENDS

Bob Chuck FRIENDS

Chuck Bob FRIENDS

Bob

UpdateItem Status=FRIENDS

1. Update Bob/Chuck record
2. Update Chuck/Bob record

When things go wrong

Becoming Friends: When things go wrong

UserId FriendId Status

Alice Bob FRIENDS

Bob Alice FRIENDS

Bob Chuck INCOMING

Chuck Bob SENT

Bob

A friend request!

1. Update Bob/Chuck record
2. Update Chuck/Bob record

Becoming Friends: When things go wrong

UserId FriendId Status

Alice Bob FRIENDS

Bob Alice FRIENDS

Bob Chuck FRIENDS

Chuck Bob SENT

Bob

UpdateItem Status=FRIENDS

1. Update Bob/Chuck record
2. Update Chuck/Bob record
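
The failure shown here can be reproduced with a small simulation. The crash is injected deliberately; `ClientCrash` and `accept_request` are hypothetical names used only for this sketch.

```python
friends = {
    ("Bob", "Chuck"): "INCOMING",
    ("Chuck", "Bob"): "SENT",
}


class ClientCrash(Exception):
    """Hypothetical failure: the client dies between the two writes."""


def accept_request(table, me, other, crash_after_first=False):
    table[(me, other)] = "FRIENDS"      # 1. Update Bob/Chuck record
    if crash_after_first:
        raise ClientCrash("lost connection before the second write")
    table[(other, me)] = "FRIENDS"      # 2. Update Chuck/Bob record


try:
    accept_request(friends, "Bob", "Chuck", crash_after_first=True)
except ClientCrash:
    pass

# The two records now disagree: Bob sees a friend, Chuck still sees a sent request.
inconsistent = friends[("Bob", "Chuck")] != friends[("Chuck", "Bob")]
```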

Multi-item transaction in DynamoDB

• Scan for “stuck” transactions
• Use the Client Transactions Library on the AWS SDK for Java
• Roll your own scheme

Replayable state machines

INCOMING ACCEPTING FRIENDS

SENT FRIENDS SENDING

Bob/Chuck

Chuck/Bob
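
A minimal sketch of a replayable state machine for the accept side of the flow, assuming the Bob/Chuck record moves INCOMING, ACCEPTING, FRIENDS and the Chuck/Bob record ends at FRIENDS. The sending side and its SENDING state are not modelled, and the function names are assumptions.

```python
def step(table, me, other):
    """Advance the accept flow by one idempotent step; return False when done."""
    mine, theirs = table[(me, other)], table[(other, me)]
    if mine == "INCOMING":
        table[(me, other)] = "ACCEPTING"   # record the intent before touching the peer
        return True
    if mine == "ACCEPTING" and theirs != "FRIENDS":
        table[(other, me)] = "FRIENDS"     # safe to repeat after a crash
        return True
    if mine == "ACCEPTING":
        table[(me, other)] = "FRIENDS"     # peer is done, finish our own record
        return True
    return False                            # already FRIENDS: nothing to replay


def replay(table, me, other):
    """Drive the flow to completion from whatever state it was left in."""
    while step(table, me, other):
        pass


# A flow that crashed right after recording the intent.
stuck = {("Bob", "Chuck"): "ACCEPTING", ("Chuck", "Bob"): "SENT"}
replay(stuck, "Bob", "Chuck")
```

Because the intermediate state is written first, any client or sweeper that later finds ACCEPTING knows exactly which steps remain.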

Client Transactions Library

Friends Table

Transactions Table

Transaction Images Table

Transaction Client

Bob

Client Transactions Usage

• Low contention only
• Don’t mix Tx Client writes with normal writes
• No Query support
• Expensive, slower
• But, easy to use

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck INCOMING 2

Chuck Bob SENT 2

Id Status V1 V2

Bob

1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table

Transactions Table

A friend request!

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck INCOMING 2

Chuck Bob SENT 2

Id Status V1 V2

Bob: BatchGetItem

1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table

Transactions Table

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck INCOMING 2

Chuck Bob SENT 2

Id Status V1 V2

Bob-Chuck Bob: FRIENDS, Chuck: FRIENDS 2 2

Bob: PutItem, Expect not exists

1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table

Transactions Table

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck FRIENDS 3

Chuck Bob FRIENDS 3

Id Status V1 V2

Bob-Chuck Bob: FRIENDS, Chuck: FRIENDS 2 2

Bob: UpdateItem, Expect V=Vprev

1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table

Transactions Table

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck FRIENDS 3

Chuck Bob FRIENDS 3

Id Status V1 V2

Bob: DeleteItem, Expect V1=V1prev, V2=V2prev

1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table

Transactions Table
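
The four steps can be sketched end to end against in-memory tables. The conditional-write helpers below imitate the "Expect" conditions on the slides; they are stand-ins, not SDK calls, and the shape of the Tx record is an assumption.

```python
class ConditionFailed(Exception):
    """Stand-in for a failed conditional write."""


friends = {  # (UserId, FriendId) -> item
    ("Bob", "Chuck"): {"Status": "INCOMING", "V": 2},
    ("Chuck", "Bob"): {"Status": "SENT", "V": 2},
}
tx_table = {}  # Id -> pending transaction record


def put_if_absent(table, key, value):
    if key in table:
        raise ConditionFailed(key)      # PutItem, Expect not exists
    table[key] = value


def update_if_version(table, key, status, expected_v):
    item = table[key]
    if item["V"] != expected_v:
        raise ConditionFailed(key)      # UpdateItem, Expect V=Vprev
    item.update(Status=status, V=expected_v + 1)


def delete_if_versions(table, key, v1, v2):
    tx = table.get(key)
    if tx is None or (tx["V1"], tx["V2"]) != (v1, v2):
        raise ConditionFailed(key)      # DeleteItem, Expect V1=V1prev, V2=V2prev
    del table[key]


def become_friends(a, b):
    # 1. Read items
    v1, v2 = friends[(a, b)]["V"], friends[(b, a)]["V"]
    # 2. Write to Tx table
    tx_id = f"{a}-{b}"
    put_if_absent(tx_table, tx_id, {"Status": "FRIENDS", "V1": v1, "V2": v2})
    # 3. Apply writes
    update_if_version(friends, (a, b), "FRIENDS", v1)
    update_if_version(friends, (b, a), "FRIENDS", v2)
    # 4. Delete from Tx table
    delete_if_versions(tx_table, tx_id, v1, v2)


become_friends("Bob", "Chuck")
```

The Tx record acts as both a lock (a second PutItem for the same Id fails) and a log of the versions the writes expect, which is what lets an interrupted transaction be completed later.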

When things go wrong

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck INCOMING 2

Chuck Bob SENT 2

Id Status V1 V2

Bob: BatchGetItem

1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table

Transactions Table

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck INCOMING 2

Chuck Bob SENT 2

Id Status V1 V2

Bob-Chuck Bob: FRIENDS, Chuck: FRIENDS 2 2

Bob: PutItem, Expect not exists

1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table

Transactions Table

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck FRIENDS 3

Chuck Bob SENT 2

Id Status V1 V2

Bob-Chuck Bob: FRIENDS, Chuck: FRIENDS 2 2

Bob: UpdateItem, Expect V=Vprev

1. Read items
2. Write to Tx table
3. Apply writes
4. Delete from Tx table

Transactions Table

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck FRIENDS 3

Chuck Bob SENT 2

Id Status V1 V2

Bob-Chuck Bob: FRIENDS, Chuck: FRIENDS 2 2

Sweeper: Scan

1. Scan for stuck Tx
2. Apply writes
3. Delete from Tx table

Transactions Table

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck FRIENDS 3

Chuck Bob FRIENDS 3

Id Status V1 V2

Bob-Chuck Bob: FRIENDS, Chuck: FRIENDS 2 2

Transactions Table

Sweeper: UpdateItem, Expect V=Vprev

1. Scan for stuck Tx
2. Apply writes
3. Delete from Tx table

Specialized Transactions

UserId FriendId Status V

Alice Bob FRIENDS 3

Bob Alice FRIENDS 3

Bob Chuck FRIENDS 3

Chuck Bob FRIENDS 3

Id Status V1 V2

Transactions Table

Sweeper: DeleteItem, Expect V1=V1prev, V2=V2prev

1. Scan for stuck Tx
2. Apply writes
3. Delete from Tx table
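
A sketch of the sweeper's three steps, starting from the stuck state shown earlier (Bob/Chuck already updated, Chuck/Bob not). The Tx record fields `a` and `b` and the helper names are assumptions. A real sweeper would also wait until a transaction is old enough to be considered stuck, which is omitted here.

```python
friends = {
    ("Bob", "Chuck"): {"Status": "FRIENDS", "V": 3},   # write already applied
    ("Chuck", "Bob"): {"Status": "SENT", "V": 2},      # write never happened
}
tx_table = {
    "Bob-Chuck": {"a": "Bob", "b": "Chuck", "Status": "FRIENDS", "V1": 2, "V2": 2},
}


def apply_if_version(key, status, expected_v):
    """UpdateItem, Expect V=Vprev. A version mismatch is treated as 'already applied'."""
    item = friends[key]
    if item["V"] == expected_v:
        item.update(Status=status, V=expected_v + 1)


def sweep():
    """Finish every transaction still present in the Tx table."""
    finished = []
    for tx_id, tx in list(tx_table.items()):                           # 1. Scan for stuck Tx
        apply_if_version((tx["a"], tx["b"]), tx["Status"], tx["V1"])   # 2. Apply writes
        apply_if_version((tx["b"], tx["a"]), tx["Status"], tx["V2"])
        del tx_table[tx_id]                                            # 3. Delete from Tx table
        finished.append(tx_id)
    return finished


swept = sweep()
```

The version check is what makes the replay safe: the Bob/Chuck record is already at version 3, so the sweeper does not apply that write a second time.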

Transaction advice

• Lock items before modifying
  – Including items that don’t exist yet

• Don’t stomp on future writes (use versions)
• Sweep for stuck transactions
• Avoid deadlock
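
One common way to follow the locking and deadlock advice is to take locks in a fixed global order and back off on conflict. The slides do not say which technique was intended, so this is a sketch under that assumption. The `locks` dictionary stands in for a lock attribute written with a conditional put.

```python
locks = {}  # item key -> owner; keys may refer to items that do not exist yet


def acquire_all(owner, keys):
    """Lock keys in sorted order; on conflict, release what was taken and fail."""
    taken = []
    for key in sorted(keys):                 # every client uses the same order
        if locks.get(key, owner) != owner:   # held by someone else
            for k in taken:
                del locks[k]
            return False
        if key not in locks:
            locks[key] = owner
            taken.append(key)
    return True


def release_all(owner, keys):
    for key in keys:
        if locks.get(key) == owner:
            del locks[key]


first = acquire_all("tx1", [("Chuck", "Bob"), ("Bob", "Chuck")])
second = acquire_all("tx2", [("Bob", "Chuck"), ("Alice", "Bob")])   # conflicts with tx1
release_all("tx1", [("Chuck", "Bob"), ("Bob", "Chuck")])
retry = acquire_all("tx2", [("Bob", "Chuck"), ("Alice", "Bob")])
```

Because a failed attempt releases everything it took, no client ever waits while holding a partial set of locks.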