
SCHEMA ON READ · RANK DBMS MODEL SCORE GROWTH (20 MO) 1. Oracle Relational DBMS 1,442 -5% 2. MySQL Relational DBMS 1,294 2% 3. Microsoft SQL Server Relational DBMS 1,131 -10% 4


SCHEMA ON READ


Index everything: one query type, low latency, high concurrency

Index nothing: queries as programs, high latency, low concurrency



IT’S POPULAR, BUT WHY?


Diverse operational workloads are common

| | Top 5 Marketing Firm | Government Agency | Top 5 Investment Bank |
|---|---|---|---|
| Data | Key/value | 10+ fields, arrays, nested documents | 20+ fields, arrays, nested documents |
| Queries | Key-based; 1-100 docs/query; 80/20 read/write | Compound queries; range queries; MapReduce; 20/80 read/write | Compound queries; range queries; 50/50 read/write |
| Servers | ~250 | ~50 | 4 |
| Ops/sec | 1,200,000 | 500,000 | 30,000 |


Some deployments are large

| | Cluster scale | Performance scale | Data scale |
|---|---|---|---|
| Entertainment company | 1,400 servers | 250 million ticks/sec | Petabytes |
| Asian internet company | 1,000+ servers | 300k ops/sec | 10s of billions of objects |
| Federal agency | 250+ servers | 500k ops/sec | 13 billion documents |


Multiple indicators suggest adoption is strong

| Rank | DBMS | Model | Score | Growth (20 mo) |
|---|---|---|---|---|
| 1 | Oracle | Relational DBMS | 1,442 | -5% |
| 2 | MySQL | Relational DBMS | 1,294 | 2% |
| 3 | Microsoft SQL Server | Relational DBMS | 1,131 | -10% |
| 4 | MongoDB | Document store | 277 | 172% |
| 5 | PostgreSQL | Relational DBMS | 273 | 40% |
| 6 | DB2 | Relational DBMS | 201 | 11% |
| 7 | Microsoft Access | Relational DBMS | 146 | -26% |
| 8 | Cassandra | Wide column | 107 | 87% |
| 9 | SQLite | Relational DBMS | 105 | 19% |

Source: DB-Engines database popularity rankings, May 2015


Source: Stack Overflow via Stackoverkill.com



TO ME, THREE THINGS DRIVE THIS ADOPTION


We asked users why; here’s what they told us

[Diagram: { CODE } in the application, XML config for the object-relational mapping, DB schema in the relational database]



#1 The data model

| RDBMS | MongoDB |
|---|---|
| Database | Database |
| Table | Collection |
| Index | Index |
| Row | Document |
| Join | Embedding & linking |
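The last row of the mapping is the one that changes how data is modeled. A minimal sketch in Python, using plain lists and dicts rather than a database (the customer and order data are hypothetical), contrasting a relational-style join performed at read time with the same data embedded in a single document:

```python
# Relational style: two "tables" linked by a foreign key (hypothetical data).
customers = [{"id": 1, "name": "Paul"}]
orders = [
    {"order_id": 10, "customer_id": 1, "total": 25.0},
    {"order_id": 11, "customer_id": 1, "total": 40.0},
]


def join_orders(customers, orders):
    """Nested-loop join: attach each customer's orders at read time."""
    result = []
    for c in customers:
        matched = [o for o in orders if o["customer_id"] == c["id"]]
        result.append({**c, "orders": matched})
    return result


# Document style: the orders are embedded in the customer document itself,
# so one lookup returns the whole aggregate with no join.
customer_doc = {
    "_id": 1,
    "name": "Paul",
    "orders": [
        {"order_id": 10, "total": 25.0},
        {"order_id": 11, "total": 40.0},
    ],
}

joined = join_orders(customers, orders)
print(len(joined[0]["orders"]))                          # 2
print(sum(o["total"] for o in customer_doc["orders"]))   # 65.0
```

Linking would instead keep a `customer_id` reference in each order document, much like the relational version, and resolve it in application code or with the aggregation `$lookup` stage.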


Documents are rich data structures

{
  first_name: 'Paul',
  surname: 'Miller',
  cell: 447557505611,
  city: 'London',
  location: [45.123, 47.232],
  Profession: ['banking', 'finance', 'trader'],
  cars: [
    { model: 'Bentley', year: 1973, value: 100000 },
    { model: 'Rolls Royce', year: 1965, value: 330000 }
  ]
}

Annotated features: fields with typed values (string, number, geo-location); fields that contain arrays; fields that contain an array of sub-documents.
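Because a document is just nested fields, arrays and sub-documents, it maps directly onto a language's native structures. A minimal Python sketch, assuming the slide's example document has been loaded as a plain dict (no driver involved), showing that typed values and the array of sub-documents can be used without any mapping layer:

```python
# The slide's example document as a Python dict (field names as on the slide).
person = {
    "first_name": "Paul",
    "surname": "Miller",
    "cell": 447557505611,
    "city": "London",
    "location": [45.123, 47.232],
    "Profession": ["banking", "finance", "trader"],
    "cars": [
        {"model": "Bentley", "year": 1973, "value": 100000},
        {"model": "Rolls Royce", "year": 1965, "value": 330000},
    ],
}

# Typed values stay native types: int, float, str, list, dict.
print(type(person["cell"]).__name__)         # int
print(type(person["location"][0]).__name__)  # float

# The array of sub-documents is traversed directly, with no join.
total_car_value = sum(car["value"] for car in person["cars"])
oldest = min(person["cars"], key=lambda car: car["year"])["model"]
print(total_car_value)  # 430000
print(oldest)           # Rolls Royce
```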


Documents are self-describing

{ product_name: 'Acme Paint', color: ['Red', 'Green'],
  size_oz: [8, 32], finish: ['satin', 'eggshell'] }

{ product_name: 'T-shirt', size: ['S', 'M', 'L', 'XL'], color: ['Heather Gray', … ],
  material: '100% cotton', wash: 'cold', dry: 'tumble dry low' }

{ product_name: 'Mountain Bike', brake_style: 'mechanical disc', color: 'grey',
  frame_material: 'aluminum', no_speeds: 21, package_height: '7.5x32.9x55',
  weight_lbs: 44.05, suspension_type: 'dual', wheel_size_in: 26 }

Three documents stored in the same product catalog collection in MongoDB.
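Since every document names its own fields, a reader can recover a collection's effective schema by sampling documents. A minimal Python sketch of that idea; `discover_schema` is a hypothetical helper, not a MongoDB API, and the T-shirt's elided colors are left out (only "Heather Gray" appears on the slide):

```python
from collections import Counter

# The three product documents from the slide.
catalog = [
    {"product_name": "Acme Paint", "color": ["Red", "Green"],
     "size_oz": [8, 32], "finish": ["satin", "eggshell"]},
    {"product_name": "T-shirt", "size": ["S", "M", "L", "XL"],
     "color": ["Heather Gray"], "material": "100% cotton",
     "wash": "cold", "dry": "tumble dry low"},
    {"product_name": "Mountain Bike", "brake_style": "mechanical disc",
     "color": "grey", "frame_material": "aluminum", "no_speeds": 21,
     "package_height": "7.5x32.9x55", "weight_lbs": 44.05,
     "suspension_type": "dual", "wheel_size_in": 26},
]


def discover_schema(docs):
    """For each field, count the documents that contain it and collect the
    Python type names its values take across the sample."""
    presence = Counter()
    types = {}
    for doc in docs:
        for field, value in doc.items():
            presence[field] += 1
            types.setdefault(field, set()).add(type(value).__name__)
    return {field: (presence[field], sorted(types[field])) for field in presence}


schema = discover_schema(catalog)
print(schema["product_name"])  # (3, ['str'])
print(schema["color"])         # (3, ['list', 'str'])
print(schema["no_speeds"])     # (1, ['int'])
```

The `color` entry shows why this matters: the same field is an array in two documents and a plain string in the third, which a reader of the data has to handle.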


#2 Idiomatic drivers & frameworks

Examples: Morphia, MEAN Stack


Documents map to language constructs

// Java: maps (DBObject)
DBObject query = new BasicDBObject("publisher.founded", 1980);
DBObject m = collection.findOne(query);
Date pubDate = (Date) m.get("published_date");

// JavaScript: objects
m = collection.findOne({"publisher.founded": 1980});
pubDate = m.published_date;  // ISODate
year = pubDate.getUTCFullYear();

# Python: dictionaries
m = coll.find_one({"publisher.founded": 1980})
pubDate = m["published_date"]  # datetime.datetime
year = pubDate.year


#3 It’s easy…and fun

• Easy to acquire – AGPL license
• Easy to install and configure – up and running in <5 min
• Easy to get high performance – no black magic for millisecond latency, scale-out architecture
• Easy to deliver “always on” – replication and automatic failover built in
• Easy to add and query data – no complex modeling, no DDL


BUT WHAT ABOUT
• Data governance?
• Referential integrity?
• Analytics?


DOCUMENT VALIDATION


Data governance: document validation

Implement data governance without sacrificing the agility that comes from schema on read


Document validation gives you flexible control

• Uses the familiar MongoDB Query Language
• Automatically tests each insert and update; delivers a warning or an error if a rule is broken
• You choose which keys to validate and how

db.runCommand({
  collMod: "contacts",
  validator: {
    $and: [
      { year_of_birth: { $lte: 1994 } },
      { $or: [
        { phone: { $type: "string" } },
        { email: { $type: "string" } }
      ] }
    ]
  }
})
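To make the rule concrete, here is a hand translation of the contacts validator into a Python predicate. It is a sketch only: `is_valid_contact` is a hypothetical helper, the server evaluates the real validator itself on every insert and update, and the sample documents are made up for illustration.

```python
def is_valid_contact(doc):
    """Hand translation of the contacts validator (not MongoDB's evaluator):
        $and: [ year_of_birth <= 1994,
                $or: [ phone is a string, email is a string ] ]
    """
    yob = doc.get("year_of_birth")
    # $lte never matches a missing field; bool is excluded because in Python
    # it is a subclass of int, whereas BSON booleans do not compare to numbers.
    born_ok = (isinstance(yob, (int, float)) and not isinstance(yob, bool)
               and yob <= 1994)
    # $type: "string" on either contact field satisfies the $or branch.
    has_contact = (isinstance(doc.get("phone"), str)
                   or isinstance(doc.get("email"), str))
    return born_ok and has_contact


# Hypothetical sample documents.
fred = {"name": "Fred", "email": "fred@example.com", "year_of_birth": 2012}
ann = {"name": "Ann", "phone": "555-0100", "year_of_birth": 1980}
bob = {"name": "Bob", "year_of_birth": 1980}

print(is_valid_contact(fred))  # False: born after 1994
print(is_valid_contact(ann))   # True
print(is_valid_contact(bob))   # False: neither phone nor email
```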


Example validation failure

db.contacts.insert({ name: "Fred", email: "[email protected]", year_of_birth: 2012 })

Document failed validation
WriteResult({
  "nInserted": 0,
  "writeError": {
    "code": 121,
    "errmsg": "Document failed validation"
  }
})

The insert is rejected because year_of_birth (2012) is later than the validator's limit of 1994.


Many ways to validate, no foreign keys yet

• Can check most things that work with a find expression:
  – Existence
  – Non-existence
  – Data type of values
  – <, <=, >, >=, ==, !=
  – AND, OR
  – Regular expressions
  – Some geospatial operators (e.g. $geoWithin and $geoIntersects)
• Find existing documents that break the rules by wrapping the validator expression in $nor (MongoDB does not accept $not as a top-level query operator)
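The operator list above is small enough to model. Below is a hypothetical mini-matcher in Python covering existence, type, comparison, regular-expression, AND/OR and $nor checks. It deliberately ignores many real-server details (dot notation, array fan-out, full BSON type bracketing) and is only a sketch of how a validator doubles as an audit query: the documents that violate a validator are exactly those matched by `{"$nor": [validator]}`.

```python
import operator
import re

# Simplified mapping from $type names to Python types (assumption).
TYPE_NAMES = {"string": str, "int": int, "double": float, "bool": bool,
              "object": dict, "array": list}
COMPARE = {"$lt": operator.lt, "$lte": operator.le,
           "$gt": operator.gt, "$gte": operator.ge}


def _comparable(a, b):
    """Only compare numbers with numbers and strings with strings."""
    if isinstance(a, bool) or isinstance(b, bool):
        return False
    num = (int, float)
    return ((isinstance(a, num) and isinstance(b, num))
            or (isinstance(a, str) and isinstance(b, str)))


def _match_field(value, present, cond):
    """Evaluate one field condition; `present` says whether the key exists."""
    if not isinstance(cond, dict):            # {field: literal}: equality
        return present and value == cond
    for op, arg in cond.items():
        if op == "$exists":
            ok = present == bool(arg)
        elif op == "$type":
            expected = TYPE_NAMES[arg]
            ok = (present and isinstance(value, expected)
                  and not (expected is int and isinstance(value, bool)))
        elif op == "$regex":
            ok = present and isinstance(value, str) and re.search(arg, value) is not None
        elif op == "$eq":
            ok = present and value == arg
        elif op == "$ne":
            ok = not present or value != arg  # $ne also matches missing fields
        elif op in COMPARE:
            ok = present and _comparable(value, arg) and COMPARE[op](value, arg)
        else:
            raise ValueError(f"unsupported operator: {op}")
        if not ok:
            return False
    return True


def matches(doc, query):
    """Top-level query: every clause must hold."""
    for key, cond in query.items():
        if key == "$and":
            ok = all(matches(doc, q) for q in cond)
        elif key == "$or":
            ok = any(matches(doc, q) for q in cond)
        elif key == "$nor":
            ok = not any(matches(doc, q) for q in cond)
        else:
            ok = _match_field(doc.get(key), key in doc, cond)
        if not ok:
            return False
    return True


validator = {"$and": [{"year_of_birth": {"$lte": 1994}},
                      {"$or": [{"phone": {"$type": "string"}},
                               {"email": {"$type": "string"}}]}]}
existing = [  # hypothetical documents already in the collection
    {"name": "Ann", "phone": "555-0100", "year_of_birth": 1980},
    {"name": "Fred", "email": "fred@example.com", "year_of_birth": 2012},
    {"name": "Bob", "year_of_birth": 1975},
]
invalid = [d["name"] for d in existing if matches(d, {"$nor": [validator]})]
print(invalid)  # ['Fred', 'Bob']
```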


Where MongoDB validation excels (vs. RDBMS)

• Simple
  – Use familiar search expressions (MQL)
  – No need for stored procedures
• Flexible
  – Enforced only on the parts of the schema you make mandatory
  – You can start adding new data at any point and add validation later if needed
• Practical to deploy
  – Simple to roll out new rules across thousands of production servers
• Lightweight
  – Negligible impact on performance


Controlling validation

| validationAction \ validationLevel | off | moderate | strict |
|---|---|---|---|
| warn | No checks | Warn on validation failure for inserts and for updates to existing valid documents; updates to existing invalid documents are allowed | Warn on any validation failure for any insert or update |
| error | No checks | Reject invalid inserts and invalid updates to existing valid documents; updates to existing invalid documents are allowed | Reject any violation of validation rules for any insert or update (DEFAULT) |
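The validationLevel/validationAction combinations above reduce to a small decision rule. A sketch in Python; `validation_outcome` is a hypothetical helper that models the table, not MongoDB server code. With `warn`, the write still succeeds and the violation is only logged.

```python
def validation_outcome(level, action, new_doc_valid, is_update=False,
                       existing_doc_valid=True):
    """Return 'allow', 'warn' or 'reject' for a single write.

    level:  'off' | 'moderate' | 'strict'   (validationLevel)
    action: 'warn' | 'error'                (validationAction)
    """
    if level not in ("off", "moderate", "strict"):
        raise ValueError(f"bad validationLevel: {level}")
    if action not in ("warn", "error"):
        raise ValueError(f"bad validationAction: {action}")
    if level == "off":
        return "allow"                 # no checks at all
    if level == "moderate" and is_update and not existing_doc_valid:
        return "allow"                 # updates to already-invalid docs are exempt
    if new_doc_valid:
        return "allow"
    return "warn" if action == "warn" else "reject"


print(validation_outcome("strict", "error", new_doc_valid=False))  # reject
print(validation_outcome("moderate", "error", False,
                         is_update=True, existing_doc_valid=False))  # allow
print(validation_outcome("strict", "warn", False))                  # warn
```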


Versioning of validators (optional)

• The application can lazily update documents that carry an older version, or no version at all; each version gets its own rule

db.runCommand({
  collMod: "contacts",
  validator: {
    $or: [
      { version: { $exists: false } },
      { version: 1, Name: { $exists: true } },
      { version: 2, Name: { $type: "string" } }
    ]
  }
})
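A Python reading of the versioned validator, plus a lazy migration step that upgrades a document the next time the application writes it. Both `is_valid_versioned` and `migrate_on_write` are hypothetical helpers; in particular, coercing `Name` to a string is just one assumed upgrade policy.

```python
def is_valid_versioned(doc):
    """Hand translation of the versioned validator:
    no version field -> legacy document, accepted as-is
    version 1        -> must have a Name field
    version 2        -> Name must be a string
    """
    if "version" not in doc:
        return True
    v = doc["version"]
    if isinstance(v, bool):  # True == 1 in Python, but not in BSON
        return False
    if v == 1:
        return "Name" in doc
    if v == 2:
        return isinstance(doc.get("Name"), str)
    return False  # any other version matches none of the $or branches


def migrate_on_write(doc):
    """Hypothetical lazy migration: upgrade to version 2 on the next write."""
    upgraded = dict(doc)
    upgraded["Name"] = str(doc.get("Name", ""))
    upgraded["version"] = 2
    return upgraded


legacy = {"Name": 42}  # hypothetical document written before versioning
print(is_valid_versioned(legacy))                      # True
print(is_valid_versioned({"version": 2, "Name": 42}))  # False
print(is_valid_versioned(migrate_on_write(legacy)))    # True
```

Old documents stay valid under their own rule until they are touched, so no bulk rewrite is needed when the rule changes.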

Page 30: SCHEMA ON READ · RANK DBMS MODEL SCORE GROWTH (20 MO) 1. Oracle Relational DBMS 1,442 -5% 2. MySQL Relational DBMS 1,294 2% 3. Microsoft SQL Server Relational DBMS 1,131 -10% 4

SCHEMA DISCOVERY


FUTURE DECISIONS


Still lots of hard problems to solve

• Schema evolution
• Specialized storage engines
  – WORM
  – Blockchain
  – Proprietary hardware
  – Integrated data warehouse
• Complex transactions


One surface fits all

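[Architecture diagram: applications (Content Repo, IoT Sensor Backend, Ad Service, Customer Analytics, Archive) sit on MongoDB Query Language (MQL) + Native Drivers, over the MongoDB Document Data Model, over storage engines (BTree, LSM, In-memory, WORM, Archive); Management and Security run alongside the whole stack]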
