Data Modeling on NoSQL
Bryce Cottam
Principal Architect, Think Big, a Teradata Company


Page 1: Data Modeling on NoSQL


Page 2: Data Modeling on NoSQL

Agenda

• Where we came from (RDBMS Modeling)
• Migrate Existing Data Model to NoSQL
• Questions

Page 3: Data Modeling on NoSQL

Anti-Agenda

What we are NOT going to cover:

• Migrate a SQL based solution to NoSQL
• NoSQL Smack-Down (Battle of the NoSQL Bands)

Page 4: Data Modeling on NoSQL

Where We Came From (RDBMS Modeling)

Page 5: Data Modeling on NoSQL

SQL Backdrop

Raw Data:

123  Tony         Soprano     true   1963-04-15
124  Carmella     Soprano     false  1968-12-02
125  Johnny       Sacrimoni   true   1959-01-11
158  Paulie       Gualtieri   false  1960-08-04
159  Silvio       Dante       false  1965-10-11
162  Ralph        Cifaretto   false  1969-03-28
164  Christopher  Moltisanti  false  1974-01-11
165  Adriana      La Cerva    false  1976-11-02

Metadata:

• Column Order
• Column Names
• Column Width
• Data Types

• Save space
• Consistent format
• Familiar syntax (ANSI SQL Standard)

Page 6: Data Modeling on NoSQL

Issues at Scale

Page 7: Data Modeling on NoSQL

UI Presentation

Page 8: Data Modeling on NoSQL

UI Presentation

Page 9: Data Modeling on NoSQL

UI Presentation

Page 10: Data Modeling on NoSQL

Where We Came From

[Entity diagram: User, Bid, Auction, Payment]

User: id, email, name, profile_image_url, access_level, created_date
Bid: id, user_id, auction_id, amount, timestamp
Auction: id, title, image_url, current_price, high_bidder, end_time
Payment: id, auction_id, timestamp, card_type, confirmation_number

Page 11: Data Modeling on NoSQL

Data Models

public class User {
    private long id;
    private String email;
    private String name;
    private String profileImageUrl;
    // AccessLevel is an enum
    private AccessLevel accessLevel;
    private Date createdDate;
    private List<Auction> auctions;
    private List<Bid> bids;
    ...
}

public class Auction {
    private long id;
    private String title;
    private String imageUrl;
    private BigDecimal currentPrice;
    private User highBidder;
    private Date endTime;
    private List<Bid> bids;
    private Payment payment;
    ...
}

public class Bid {
    private long id;
    private User user;
    private Auction auction;
    private BigDecimal amount;
    private Date timestamp;
    ...
}

public class Payment {
    private long id;
    private Auction auction;
    private Date timestamp;
    // Visa, MasterCard, AmEx etc.
    private String cardType;
    private String confirmationNumber;
    ...
}

Page 12: Data Modeling on NoSQL

Support Queries

Get all Bids for a given Auction:

select a.*, b.*
from auction a
join bid b
on a.id = b.auction_id
where a.id = 12345
order by b.timestamp desc

• Either manual SQL or ORM-generated SQL will wind up joining a few tables to get the desired results
• Joins are not supported by most NoSQL solutions

Page 13: Data Modeling on NoSQL

Support Queries

Count all Bids for a User:

select count(*) from bid where user_id = 554422

Get average final price of all Auctions:

select avg(current_price) from auction

Get the User with the most Bids:

select u.name, s.bid_count as bids
from (select user_id, count(*) as bid_count from bid group by user_id) as s
join user u on u.id = s.user_id
order by s.bid_count desc
limit 1

• Aggregates in NoSQL are usually not supported
• If they are supported, they often have performance or memory issues

Page 14: Data Modeling on NoSQL

Adapt to your Data Store

• Most web app developers think in terms of tables, columns, queries
• Many times the schema is simply mirrored in the application layer model objects
• (Not a bad thing, but hard to change)
• The most successful/scalable applications embrace the features and limitations of their chosen datastore

[Diagram: Schema, DAO, and Application layers, each with a Model; labels: Access Pattern, Storage Details]

Patterns defined here affect application behavior for data interaction

Page 15: Data Modeling on NoSQL

Encouraging Scalable Access Patterns

Common:

public class BidDao {
    // Common API structure, loads all in memory
    // Also requires that the full User object is available
    public List<Bid> getBids(User user) {…}
    ...
}

Alternative:

public class BidDao {
    // Paging is a good option to avoid memory issues
    public List<Bid> getBids(String userId, int offset, int limit) {…}

    // Streaming APIs encourage streaming processing
    public Iterator<Bid> getBids(String userId) {…}
    ...
}
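One way to get the streaming signature on top of the paging one is an Iterator that fetches a page at a time, so only a small buffer is ever held in memory. This is a minimal sketch: PageSource, PagingIterator, and the in-memory rangeSource are hypothetical names for illustration, not part of any datastore API.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

// Hypothetical page source: returns at most `limit` items starting at `offset`.
interface PageSource<T> {
    List<T> fetch(int offset, int limit);
}

// Iterator that keeps only one page in memory at a time.
final class PagingIterator<T> implements Iterator<T> {
    private final PageSource<T> source;
    private final int pageSize;
    private List<T> page = new ArrayList<>();
    private int indexInPage = 0;
    private int nextOffset = 0;
    private boolean exhausted = false;

    PagingIterator(PageSource<T> source, int pageSize) {
        if (pageSize <= 0) throw new IllegalArgumentException("pageSize must be positive");
        this.source = source;
        this.pageSize = pageSize;
    }

    @Override
    public boolean hasNext() {
        if (indexInPage < page.size()) return true;
        if (exhausted) return false;
        // The previous page is dropped here and becomes garbage-collectable.
        page = source.fetch(nextOffset, pageSize);
        indexInPage = 0;
        nextOffset += page.size();
        if (page.size() < pageSize) exhausted = true; // a short page is the last page
        return !page.isEmpty();
    }

    @Override
    public T next() {
        if (!hasNext()) throw new NoSuchElementException();
        return page.get(indexInPage++);
    }
}

final class PagingDemo {
    // In-memory stand-in for a datastore query over `total` rows.
    static PageSource<Integer> rangeSource(int total) {
        return (offset, limit) -> {
            List<Integer> out = new ArrayList<>();
            for (int i = offset; i < Math.min(total, offset + limit); i++) out.add(i);
            return out;
        };
    }

    static long sum(Iterator<Integer> it) {
        long total = 0;
        while (it.hasNext()) total += it.next();
        return total;
    }

    public static void main(String[] args) {
        System.out.println(sum(new PagingIterator<>(rangeSource(10), 3)));
    }
}
```

The caller processes rows one at a time and never sees the page boundaries, which is what nudges application code toward streaming-style processing.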

Page 16: Data Modeling on NoSQL

Encouraging Scalable Access Patterns

[Diagram: memory used by each DAO access pattern]

Common: Memory Required
Paging: Memory Required, Garbage Collected
Streaming: Small buffer, Memory Required

Page 17: Data Modeling on NoSQL

Adapt to your Data Store

[Diagram: Application, DAOs, SQL-NoSQL Adapter, NoSQL Store]

Danger!! If you mask your true datastore semantics, you risk your scalability

• DataNucleus is a good option if used with discipline
• Provides JDO/JPA support

Page 18: Data Modeling on NoSQL

Top level concepts to embrace

• Denormalization
• Intelligent Key Design
• Counters
• Sharding

Page 19: Data Modeling on NoSQL

Denormalization

Page 20: Data Modeling on NoSQL

Identify Conceptually Immutable Fields

public class User {
    private long id;
    private String email;
    private String name;
    private String profileImageUrl;
    // AccessLevel is an enum
    private AccessLevel accessLevel;
    private Date createdDate;
    private List<Auction> auctions;
    private List<Bid> bids;
    ...
}

public class Auction {
    private long id;
    private String title;
    private String imageUrl;
    private BigDecimal currentPrice;
    private User highBidder;
    private Date endTime;
    private List<Bid> bids;
    private Payment payment;
    ...
}

public class UserReference {
    private long id;
    private String name;
    private String profileImageUrl;
    ...
}

public class AuctionReference {
    private long id;
    private String title;
    private String imageUrl;
    ...
}

Page 21: Data Modeling on NoSQL

Modified Data Structures

public class User {
    // Changed ids to Strings
    // (more on that soon)
    private String id;
    private String email;
    private String name;
    private String profileImageUrl;
    private AccessLevel accessLevel;
    private Date createdDate;
    private List<Auction> auctions;
    private List<Bid> bids;
    ...
}

public class Auction {
    private String id;
    private String title;
    private String imageUrl;
    private BigDecimal currentPrice;
    private UserReference highBidder;
    private Date endTime;
    private List<Bid> bids;
    private Payment payment;
    ...
}

public class Bid {
    private String id;
    private UserReference user;
    private AuctionReference auction;
    private BigDecimal amount;
    private Date timestamp;
    ...
}

public class Payment {
    private String id;
    private AuctionReference auction;
    private Date timestamp;
    // Visa, MasterCard, AmEx etc.
    private String cardType;
    private String confirmationNumber;
    ...
}

Page 22: Data Modeling on NoSQL

Modified Data Models

public class Bid {
    // the @Embedded annotation (both JDO and JPA)
    // indicates that this is not an FK relationship:
    @Embedded
    private UserReference user;
    @Embedded
    private AuctionReference auction;
    ...
}

Under the hood in the data store:

Bid columns: id, user_id, user_name, user_profile_image, amount, timestamp, auction_title, …

…/d288-4af3-8821-27a37269ec0c {amount:"14.00", user_id:"abc123", user_name:"Ralph Cifaretto", user_profile_image:"http://…", …}

…/d288-4af3-8821-27a37283af10 {amount:"240.00", user_id:"abc123", user_name:"Ralph Cifaretto", user_profile_image:"http://…", …}

• JDO/JPA configuration is certainly not required
• We’re making a copy of the conceptually immutable properties of the user
• When we read a Bid record now, we don’t need to go fetch the User record
• Nor do we need a join

Page 23: Data Modeling on NoSQL

Manual Marshaling

public class BidDao {
    public Bid read(String id) {
        // This is an HBase-like API, but the idea is the same for almost all
        // NoSQL datastore native APIs:
        Result result = openConnection().get("bid", id);
        Bid bid = new Bid();
        bid.setId(result.getValue("id"));
        ...
        String userId = result.getValue("user_id");
        String userName = result.getValue("user_name");
        String profileUrl = result.getValue("user_profile_image");
        UserReference user = new UserReference(userId, userName, profileUrl);
        bid.setUser(user);
        ...
        return bid;
    }
    ...
}

// To access user information:
UserReference user = bid.getUser();
String userName = user.getName();
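The write side of the same idea can be sketched as flattening the embedded reference into the bid's own columns. This is only an illustration: UserRef and BidMarshaller are hypothetical stand-ins, a plain Map plays the role of a datastore row, and the example URL is a placeholder.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal stand-in for the UserReference class from the slides.
final class UserRef {
    final String id;
    final String name;
    final String profileImageUrl;

    UserRef(String id, String name, String profileImageUrl) {
        this.id = id;
        this.name = name;
        this.profileImageUrl = profileImageUrl;
    }
}

final class BidMarshaller {
    // Flattens the embedded user fields into the bid's own columns.
    static Map<String, String> toColumns(String bidId, String amount, UserRef user) {
        Map<String, String> columns = new LinkedHashMap<>();
        columns.put("id", bidId);
        columns.put("amount", amount);
        columns.put("user_id", user.id);
        columns.put("user_name", user.name);
        columns.put("user_profile_image", user.profileImageUrl);
        return columns;
    }

    // Rebuilds the embedded reference from a single bid row; no second read is needed.
    static UserRef userFrom(Map<String, String> columns) {
        return new UserRef(
                columns.get("user_id"),
                columns.get("user_name"),
                columns.get("user_profile_image"));
    }

    public static void main(String[] args) {
        UserRef ralph = new UserRef("abc123", "Ralph Cifaretto", "http://example.invalid/ralph.png");
        Map<String, String> row = toColumns("bid-1", "14.00", ralph);
        System.out.println(row);
        System.out.println(userFrom(row).name);
    }
}
```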

Page 24: Data Modeling on NoSQL

We support access pattern without joins

Bid columns: id, user_id, user_name, user_profile_image, amount, timestamp, auction_id, auction_title, auction_image_url

[UI screenshot: each bid row shows auction_title and auction_image]

Click on Auction image or name and go to details for Auction

Page 25: Data Modeling on NoSQL

Data is duplicated many (many) times

Bid
id   amount  user_id  user_name            user_profile_image    auction_id  auction_title       . . .
124  14.00   5432     Gustavo ‘Gus’ Fring  http://nj.boss.com…   555111222   Barrel Methylamine  . . .
125  13.00   1234     Walter White         http://dead.users…    555111222   Barrel Methylamine  . . .
126  12.00   2223     Hank Schrader        http://dea.bro.com…   555111222   Barrel Methylamine  . . .
127  11.00   1234     Walter White         http://dead.users…    555111222   Barrel Methylamine  . . .
128  10.00   1112     Jesse Pinkman        http://facebook.com…  555111222   Barrel Methylamine  . . .
129  9.00    2223     Hank Schrader        http://dea.bro.com…   555111222   Barrel Methylamine  . . .
130  8.00    1234     Walter White         http://dead.users…    555111222   Barrel Methylamine  . . .
131  7.00    1112     Jesse Pinkman        http://facebook.com…  555111222   Barrel Methylamine  . . .
132  6.00    1234     Walter White         http://dead.users…    555111222   Barrel Methylamine  . . .

User
id    name                 profile_image         email              created_date  . . .
5432  Gustavo ‘Gus’ Fring  http://nj.boss.com…   [email protected]  2008-01-01    . . .
1234  Walter White         http://chem.users…    [email protected]  2008-02-02    . . .
2223  Hank Schrader        http://dea.bro.com…   [email protected]  2009-01-12    . . .
1112  Jesse Pinkman        http://facebook.com…  [email protected]  2008-11-16    . . .

Page 26: Data Modeling on NoSQL

What about updates?

[Timeline diagram: Edge Node, NoSQL, Backend Node(s)]

• Name Change Request arrives at the Edge Node
• Edge Node writes to NoSQL; Response sent to user
• Async Request to Backend Node(s) to change all Bid records related to this user
• Use workers to modify affected records (Possibly minutes)
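The update flow above can be sketched as: write the authoritative User record synchronously, answer the caller, and let background workers rewrite the duplicated copies. This is an assumption-laden illustration: in-memory maps stand in for the User and Bid tables, a small thread pool stands in for the backend nodes, and NameChangeDemo and its members are hypothetical names.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class NameChangeDemo {
    // userId -> name (the authoritative record)
    final Map<String, String> userNames = new ConcurrentHashMap<>();
    // bidId -> {userId, userName} (the denormalized copy)
    final Map<String, String[]> bids = new ConcurrentHashMap<>();
    // Stand-in for the backend worker nodes.
    final ExecutorService workers = Executors.newFixedThreadPool(2);

    // Updates the User record right away, then fans out the Bid rewrites asynchronously.
    Future<?> changeName(String userId, String newName) {
        userNames.put(userId, newName);
        return workers.submit(() -> {
            for (Map.Entry<String, String[]> bid : bids.entrySet()) {
                if (bid.getValue()[0].equals(userId)) {
                    bids.put(bid.getKey(), new String[] {userId, newName});
                }
            }
        });
    }

    static String runDemo() {
        NameChangeDemo demo = new NameChangeDemo();
        demo.userNames.put("1234", "Walter White");
        demo.bids.put("125", new String[] {"1234", "Walter White"});
        demo.bids.put("126", new String[] {"2223", "Hank Schrader"});
        try {
            // A real edge node would respond to the user without waiting on this Future.
            demo.changeName("1234", "Walter H. White").get();
            return demo.bids.get("125")[1] + " / " + demo.bids.get("126")[1];
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            demo.workers.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(runDemo());
    }
}
```

Between the synchronous write and the completion of the worker, readers of Bid records still see the old name; that window is the change latency the slides accept in exchange for join-free reads.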

Page 27: Data Modeling on NoSQL

Denormalization Observations

• We don’t always need ACID compliance
• Strict FK enforcement not always required
• MySQL’s MyISAM storage works fine for many situations
• Users are getting used to change latency
• There is a trade-off between horizontal scalability in your app and patterns we’ve been trained to rely on

Page 28: Data Modeling on NoSQL

Intelligent Key Design

Page 29: Data Modeling on NoSQL

Sample NoSQL Storage Layout

Server 1: key001 ...data... through key010 ...data...
Server 2: key011 ...data... through key020 ...data...
Server 3: key021 ...data... through key030 ...data...
Server n: key091 ...data... through key100 ...data...

• This scan is “get everything from key016 through key022”
• A key-range scan returns N rows in linear time O(N) regardless of the number of rows in the table
• This is not true for relational databases
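The scan described above can be imitated with a sorted in-memory map. This is a rough sketch in which java.util.TreeMap stands in for one ordered key space (a real store would also route the request across servers); RangeScanDemo and its methods are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableMap;
import java.util.TreeMap;

final class RangeScanDemo {
    // Builds key001..key100, mirroring the layout on the slide.
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        for (int i = 1; i <= 100; i++) {
            table.put(String.format(Locale.ROOT, "key%03d", i), "data" + i);
        }
        return table;
    }

    // Returns every key from start through stop, both inclusive, in key order.
    static List<String> scanKeys(NavigableMap<String, String> table, String start, String stop) {
        return new ArrayList<>(table.subMap(start, true, stop, true).keySet());
    }

    public static void main(String[] args) {
        System.out.println(scanKeys(sampleTable(), "key016", "key022"));
    }
}
```

The cost of walking the sub-range depends on how many rows fall inside it, not on how many rows the table holds overall.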

Page 30: Data Modeling on NoSQL

Intelligent Key Design

abc123 {…}

abc124 {name:"Tony Soprano", createdDate:"2011-01-12", email:"[email protected]", role:"BOSS"}

abc125 {name:"Salvator Bonpensiero", createdDate:"2014-10-02", email:"[email protected]", role:"CAPO"}

abc126 {name:"Christopher Moltisanti", createdDate:"2012-10-02", email:"[email protected]", role:"SOLDIER"}

abc2 {name:"Carmella Soprano", createdDate:"2011-10-02", email:"[email protected]", favoriteCar:"BMW"}

abc20 {name:"Meadow Soprano", createdDate:"2012-01-02", email:"[email protected]", favoriteCar:12.25}

abc21 {someField:"some value", averageScore:5.75, someOtherDate:"2011-10-02"}

abc22 {…}

bcd1 {…}

bcd12 {…}

Key ordering is lexical

Records can be different schemas
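Lexical ordering is why abc2 sorts after abc126 in the listing above. A small sketch of that behavior, plus zero-padding as one common way to make numeric suffixes sort numerically; LexicalOrderDemo and its helper names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class LexicalOrderDemo {
    // Sorts keys the way a lexically ordered store would (plain String comparison).
    static List<String> sorted(String... keys) {
        List<String> list = new ArrayList<>(Arrays.asList(keys));
        Collections.sort(list);
        return list;
    }

    // Zero-pads the numeric part so lexical order matches numeric order.
    static String padded(String prefix, long n, int width) {
        return prefix + String.format(Locale.ROOT, "%0" + width + "d", n);
    }

    public static void main(String[] args) {
        // Unpadded: "abc2" lands after "abc126" because '2' > '1'.
        System.out.println(sorted("abc2", "abc126", "abc20", "abc124"));
        // Padded to three digits: numeric order is restored.
        System.out.println(sorted(padded("abc", 2, 3), padded("abc", 126, 3),
                padded("abc", 20, 3), padded("abc", 124, 3)));
    }
}
```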

Page 31: Data Modeling on NoSQL

Ascending Timestamp

Bid/2014-10-26T09:00:00.000 {…}

Bid/2014-10-26T09:00:12.975 {…}

Bid/2014-10-26T09:00:14.221 {…}

Bid/2014-10-26T09:00:18.005 {…}

Bid/2014-10-26T09:00:35.572 {…}

Bid/2014-10-26T09:00:40.003 {…}

Bid/2014-10-26T09:00:41.123 {…}

Bid/2014-10-26T09:00:41.124 {…}

Bid/2014-10-26T09:00:41.150 {…}

Bid/2014-10-26T09:00:41.218 {…}

yyyy-MM-ddTHH:mm:ss.SSS is a pretty standard timestamp and lexically orders chronologically

• Great for time-series data
• Timeline tracking (viewing data in the order it was processed etc.)

Key order: Older to Newer
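A key in this ascending format could be built with java.time, as sketched below. The "Bid/" prefix follows the slide; formatting in UTC and the AscendingKeyDemo name are assumptions made for the example.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

final class AscendingKeyDemo {
    // Fixed-width, most-significant-field-first pattern, so string order equals time order.
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss.SSS").withZone(ZoneOffset.UTC);

    static String bidKey(Instant when) {
        return "Bid/" + FORMAT.format(when);
    }

    public static void main(String[] args) {
        String older = bidKey(Instant.parse("2014-10-26T09:00:00.000Z"));
        String newer = bidKey(Instant.parse("2014-10-26T09:00:12.975Z"));
        System.out.println(older);
        System.out.println(newer);
        // A negative result means the older key sorts first.
        System.out.println(older.compareTo(newer) < 0);
    }
}
```

Using a single fixed time zone for every writer matters here: keys formatted in different local zones would no longer sort chronologically.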

Page 32: Data Modeling on NoSQL

UI Presentation

Descending Order

Page 33: Data Modeling on NoSQL

UI Presentation

Descending Order

Page 34: Data Modeling on NoSQL

Descending Timestamp

Bid/9223370622642200431 {…}

Bid/9223370622642200478 {…}

Bid/9223370622642200512 {…}

Bid/9223370622642203021 {…}

Bid/9223370622642203897 {…}

Bid/9223370622642204112 {…}

Bid/9223370622642204559 {…}

Bid/9223370622642207054 {…}

Bid/9223370622642215431 {…}

Bid/9223370622642235500 {…}

public class User {
    // This will yield some ridiculous value like: 9223370622642200431
    // Number of milliseconds in a (365-day) year: 31536000000
    // This computation will reach 0 in roughly 292 million years
    long descendingTimestamp = Long.MAX_VALUE - System.currentTimeMillis();
}

Key order: Newer to Older
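The descending key can be sketched as below. Zero-padding to 19 digits (the width of Long.MAX_VALUE) is an added precaution so every key has the same length; DescendingKeyDemo is a hypothetical name.

```java
import java.util.Locale;

final class DescendingKeyDemo {
    // Larger timestamps produce smaller numbers, so the newest bid sorts first.
    static String bidKey(long epochMillis) {
        long descending = Long.MAX_VALUE - epochMillis;
        return "Bid/" + String.format(Locale.ROOT, "%019d", descending);
    }

    public static void main(String[] args) {
        // 1414212575376 ms after the epoch falls in late October 2014.
        System.out.println(bidKey(1414212575376L));
        // The later timestamp yields the lexically smaller key.
        System.out.println(bidKey(2000L).compareTo(bidKey(1000L)) < 0);
    }
}
```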

Page 35: Data Modeling on NoSQL

Descending Timestamp

Bid/9223370622642200431 {… auction_id:"12345" …}
Bid/9223370622642200478 {… auction_id:"54321" …}
Bid/9223370622642200512 {… auction_id:"12345" …}
Bid/9223370622642203021 {… auction_id:"22222" …}
Bid/9223370622642203897 {… auction_id:"22233" …}
Bid/9223370622642204112 {… auction_id:"12345" …}
Bid/9223370622642204559 {… auction_id:"22233" …}
Bid/9223370622642207054 {… auction_id:"54321" …}
Bid/9223370622642215431 {… auction_id:"54321" …}
Bid/9223370622642235500 {… auction_id:"12345" …}

Start with "Bid/", stop after 5 rows: the 5 most recent bids (rows 1 through 5 above)

• Known as a “range scan”
• Very easy to start with some prefix and read for N records
• Complexity stays constant for top 5 bids no matter how many bids are in the system

Page 36: Data Modeling on NoSQL

Descending Timestamp

Key structure: “Auction/12345” followed by “Bid/9223370622642200431”

Auction/11222/Bid/9223370622642203021 {… auction_id:"11222" …}
Auction/12233/Bid/9223370622642203897 {… auction_id:"12233" …}
Auction/12233/Bid/9223370622642204559 {… auction_id:"12233" …}
Auction/12345/Bid/9223370622642200431 {… auction_id:"12345" …}
Auction/12345/Bid/9223370622642200512 {… auction_id:"12345" …}
Auction/12345/Bid/9223370622642204112 {… auction_id:"12345" …}
Auction/12345/Bid/9223370622642235500 {… auction_id:"12345" …}
Auction/54321/Bid/9223370622642200478 {… auction_id:"54321" …}
Auction/54321/Bid/9223370622642207054 {… auction_id:"54321" …}
Auction/54321/Bid/9223370622642215431 {… auction_id:"54321" …}

Start with "Auction/12345", stop after 4 rows (or until row “Auction/12346”): the 4 most recent bids

• Now, all Bids for each Auction are located right next to each other
• This matches our most used access pattern
• We now have information about related data just from the key
• Key-only queries can be used to help speed up apps
• Why 4 Bids instead of 5? My example only had 4 records
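A prefix-limited range scan over keys like these can be sketched with a sorted map. TreeMap again stands in for the ordered key space; rangeScan here is a hypothetical helper written for this example, not a datastore API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableMap;
import java.util.TreeMap;

final class PrefixScanDemo {
    // Reads keys in order starting at `prefix`, stopping after `limit` rows
    // or at the first key that no longer starts with the prefix.
    static List<String> rangeScan(NavigableMap<String, String> table, String prefix, int limit) {
        List<String> keys = new ArrayList<>();
        for (String key : table.tailMap(prefix, true).keySet()) {
            if (keys.size() >= limit || !key.startsWith(prefix)) break;
            keys.add(key);
        }
        return keys;
    }

    // The ten bid keys from the slide, with placeholder values.
    static NavigableMap<String, String> sampleTable() {
        String[] keys = {
            "Auction/11222/Bid/9223370622642203021",
            "Auction/12233/Bid/9223370622642203897",
            "Auction/12233/Bid/9223370622642204559",
            "Auction/12345/Bid/9223370622642200431",
            "Auction/12345/Bid/9223370622642200512",
            "Auction/12345/Bid/9223370622642204112",
            "Auction/12345/Bid/9223370622642235500",
            "Auction/54321/Bid/9223370622642200478",
            "Auction/54321/Bid/9223370622642207054",
            "Auction/54321/Bid/9223370622642215431"
        };
        NavigableMap<String, String> table = new TreeMap<>();
        for (String key : keys) table.put(key, "{...}");
        return table;
    }

    public static void main(String[] args) {
        // Asks for 5 rows but only 4 bids exist for auction 12345.
        for (String key : rangeScan(sampleTable(), "Auction/12345/", 5)) {
            System.out.println(key);
        }
    }
}
```

The trailing slash in the prefix keeps a scan for auction 12345 from also matching a longer id such as 123456.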

Page 37: Data Modeling on NoSQL

Linking Related Data With Intelligent Keys

http://myapp.com/api/auctions/12345

Auction table:
11222 {…}
12233 {…}
12345 {…}
54321 {…}

Bid table:
Auction/11222/... {…}
Auction/12233/... {…}
Auction/12233/... {…}
Auction/12345/... {…}
Auction/12345/... {…}
Auction/12345/... {…}
Auction/12345/... {…}
Auction/54321/... {…}
Auction/54321/... {…}
Auction/54321/... {…}

datastore.get("12345");
datastore.rangeScan("Auction/12345/", 5);

Both reads can be done in parallel

Page 38: Data Modeling on NoSQL

Linking Related Data With Intelligent Keys

http://myapp.com/api/auctions/12345

AuctionData table:
Auction/11222/Bid/987321... {…}
Auction/12233/Bid/987534... {…}
Auction/12233/Bid/987635... {…}
Auction/12345 {…, ..., ...}
Auction/12345/Bid/977534... {…}
Auction/12345/Bid/987501... {…}
Auction/12345/Bid/987687... {…}
Auction/12345/Bid/988012... {…}
Auction/54321 {…, ..., ...}
Auction/54321/... {…}
Auction/54321/... {…}

datastore.rangeScan("Auction/12345", 6);

Data of completely different schemas / types can be written to the same table co-located on disk

Page 39: Data Modeling on NoSQL

Counters

Page 40: Data Modeling on NoSQL

Counters

public void placeBid(String userId, String auctionId) {
    // Many NoSQL stores support a native counter via some increment-and-get
    // After the counter has been incremented, we don’t need to worry about contention
    long bidCount = datastore.incrementAndGet(auctionId + "_counter");
    // Assumes BID_INCREMENT is a BigDecimal constant
    BigDecimal amount = BID_INCREMENT.multiply(BigDecimal.valueOf(bidCount));
    long descendingTimestamp = Long.MAX_VALUE - System.currentTimeMillis();

    String bidId = "Auction/" + auctionId + "/Bid/" + descendingTimestamp + "/" + amount;

    // Increment some helper counters...
    datastore.incrementAndGet("global_bidCounter");
    datastore.incrementAndGet(auctionId + "_bidCounter");
    datastore.incrementAndGet(userId + "_bidCounter");

    // ... other logic like creating the Bid object ...

    bidDao.write(bidId, bid);
}

// Some datastores may have a first-class Counter object:
Counter bidCounter = datastore.getCounter(auctionId + "_counter");
long bidCount = bidCounter.incrementAndGet();

Page 41: Data Modeling on NoSQL

UI Presentation

datastore.incrementAndGet(userId + "_bidCounter");

Page 42: Data Modeling on NoSQL

UI Presentation

datastore.incrementAndGet("global_bidCounter");

• Global counters are a major bottleneck

Page 43: Data Modeling on NoSQL

Sharding

Page 44: Data Modeling on NoSQL

Data Model Sharding

public class Auction {
    private String id;
    private String title;
    private String imageUrl;
    private String description;

    private BigDecimal currentPrice;
    private User highBidder;
    private Date endTime;

    ...
}

public class AuctionState {
    private String id;
    private BigDecimal currentPrice;
    private User highBidder;
    private Date endTime;

    ...
}

• Separate frequently changing data from static data
• Allows caching of static data
• Makes reads/writes of changing data faster
• Separate values that are expensive to serialize but infrequently read

Page 45: Data Modeling on NoSQL

More Parallel Reads

http://myapp.com/api/auctions/12345

Auction table:
11222 {…}
12233 {…}
12345 {…}
54321 {…}

AuctionState table:
11222 {…}
12233 {…}
12345 {…}
54321 {…}

datastore.get("12345");
datastore.get("12345");

Both records can share the same key

Memcache: Check Cache

Both reads can be done in parallel
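Issuing the two reads in parallel can be sketched with CompletableFuture. Everything here is a stand-in: in-memory maps play the Auction table, the AuctionState table, and the cache, and checking the cache only for the static Auction record is an assumption drawn from the sharding bullets.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;

final class ParallelReadDemo {
    // Both tables use the same key for the same auction.
    static final Map<String, String> auctionTable = new ConcurrentHashMap<>();
    static final Map<String, String> auctionStateTable = new ConcurrentHashMap<>();
    // Stand-in for Memcache; holds only the static Auction record.
    static final Map<String, String> cache = new ConcurrentHashMap<>();

    // Checks the cache first, falling back to the table and populating the cache.
    static String readAuction(String id) {
        return cache.computeIfAbsent(id, auctionTable::get);
    }

    // Starts both reads at once and combines the results when both finish.
    static String loadPage(String id) {
        CompletableFuture<String> staticPart =
                CompletableFuture.supplyAsync(() -> readAuction(id));
        CompletableFuture<String> statePart =
                CompletableFuture.supplyAsync(() -> auctionStateTable.get(id));
        return staticPart.thenCombine(statePart, (a, s) -> a + " | " + s).join();
    }

    public static void main(String[] args) {
        auctionTable.put("12345", "Barrel Methylamine");
        auctionStateTable.put("12345", "current_price=14.00");
        System.out.println(loadPage("12345"));
    }
}
```

Because the frequently changing state never enters the cache, the cached Auction record rarely needs invalidating.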

Page 46: Data Modeling on NoSQL

More Parallel Reads

http://myapp.com/api/auctions/12345

AuctionData table:
Auction/11222/Bid/987321... {…}
Auction/12233/Bid/987534... {…}
Auction/12233/Bid/987635... {…}
Auction/12345 {…, ..., ...}
Auction/12345/AuctionState {…}
Auction/12345/Bid/977534... {…}
Auction/12345/Bid/987501... {…}
Auction/54321 {…, ..., ...}
Auction/54321/... {…}

datastore.get("Auction/12345/AuctionState");
datastore.get("Auction/12345");

Again, records can be in the same table

Memcache: Check Cache

Page 47: Data Modeling on NoSQL

Sharding a 64 bit Integer

long count = datastore.incrementAndGet("global_bidCounter");

global_bidCounter: 176

52 + 84 + 40 = 176

53 + 84 + 40 = 177
52 + 85 + 40 = 177
52 + 84 + 41 = 177

• Decompose the counter
• Pick any part of the count and increment it

Page 48: Data Modeling on NoSQL

Implementing a Sharded Counter

import java.util.concurrent.ThreadLocalRandom;

public class ShardedCounter {
    private String name;
    private int shards;

    public void increment() {
        // Pick a random shard so concurrent writers rarely hit the same record
        int index = ThreadLocalRandom.current().nextInt(shards);
        datastore.incrementAndGet(name + "-" + index);
    }

    public long get() {
        long count = 0;

        // All the shards of the counter are located next to each other:
        Result scan = datastore.rangeScan(name + "-", shards);
        while (scan.hasNext()) {
            Counter next = scan.next();
            count += next.get();
        }

        return count;
    }
}
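For experimenting with the idea without a datastore, the sharded counter can be approximated in memory. In this sketch a ConcurrentSkipListMap plays the sorted key space and AtomicLong plays the per-shard native counter; InMemoryShardedCounter and its demo helper are hypothetical names, and it assumes no other counter name begins with this counter's name plus "-".

```java
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;

final class InMemoryShardedCounter {
    private final ConcurrentSkipListMap<String, AtomicLong> store;
    private final String name;
    private final int shards;

    InMemoryShardedCounter(ConcurrentSkipListMap<String, AtomicLong> store, String name, int shards) {
        this.store = store;
        this.name = name;
        this.shards = shards;
    }

    // Increments one randomly chosen shard.
    void increment() {
        int index = ThreadLocalRandom.current().nextInt(shards);
        store.computeIfAbsent(name + "-" + index, k -> new AtomicLong()).incrementAndGet();
    }

    // Sums all shards with a prefix scan; shard keys sit next to each other in key order.
    long get() {
        String prefix = name + "-";
        long count = 0;
        for (Map.Entry<String, AtomicLong> shard : store.tailMap(prefix, true).entrySet()) {
            if (!shard.getKey().startsWith(prefix)) break;
            count += shard.getValue().get();
        }
        return count;
    }

    // Runs `threads` writers, each incrementing `perThread` times, and returns the total.
    static long demo(int threads, int perThread) {
        InMemoryShardedCounter counter =
                new InMemoryShardedCounter(new ConcurrentSkipListMap<>(), "global_bidCounter", 4);
        Thread[] pool = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            pool[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) counter.increment();
            });
            pool[i].start();
        }
        for (Thread t : pool) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println(demo(4, 1000));
    }
}
```

Reads become slightly more expensive (one small scan instead of one get), which is the price paid for spreading write contention across shards.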

Page 49: Data Modeling on NoSQL

We Love Feedback

Questions/Comments
Email: [email protected]


Follow Teradata 2015 PARTNERS
www.teradata-partners.com/social