Intro to NoSQL and MongoDB

Page 1: Intro to NoSQL and MongoDB

1

NoSQL: Introduction

Asya Kamsky

Page 4: Intro to NoSQL and MongoDB

4

• 1970s: Relational databases invented
– Storage is expensive
– Data is normalized
– Data storage is abstracted away from the app
• 1980s: RDBMS commercialized
– Client/server model
– SQL becomes the standard
• 1990s: Things begin to change
– Client/server => 3-tier architecture
– Rise of the Internet and the Web

Page 6: Intro to NoSQL and MongoDB

6

• 2000s: Web 2.0
– Rise of "social media"
– Acceptance of e-commerce
– Constant decrease in hardware prices
– Massive increase in collected data
• Result
– Constant need to scale dramatically
– How can we scale?

Page 10: Intro to NoSQL and MongoDB

10

OLTP / operational
+ complex transactions
+ tabular data
+ ad hoc queries
- O<->R mapping hard
- speed/scale problems
- not super agile

BI / reporting
+ ad hoc queries
+ SQL standard protocol between clients and servers
+ scales horizontally better than operational DBs
- some scale limits at massive scale
- schemas are rigid
- no real time; great at bulk nightly data loads

[Diagram callouts: "fewer issues here", "a lot more issues here"]

Page 11: Intro to NoSQL and MongoDB

11

OLTP / operational
+ complex transactions
+ tabular data
+ ad hoc queries
- O<->R mapping hard
- speed/scale problems
- not super agile

BI / reporting
+ ad hoc queries
+ SQL standard protocol between clients and servers
+ scales horizontally better than operational DBs
- some scale limits at massive scale
- schemas are rigid
- no real time; great at bulk nightly data loads

[Diagram labels around both: caching, flat files, map/reduce, app layer partitioning]

Page 13: Intro to NoSQL and MongoDB

13

• Agile development methodology
– Shorter development cycles
– Constant evolution of requirements
– Flexibility at design time
• Relational schema
– Hard to evolve
– Long, painful migrations
– Must stay in sync with the application
– Few developers interact with it directly

Page 14: Intro to NoSQL and MongoDB

14

Page 15: Intro to NoSQL and MongoDB

15

Page 16: Intro to NoSQL and MongoDB

16

• Horizontal scaling

• More real time requirements

• Faster development time

• Flexible data model

• Low upfront cost

• Low cost of ownership

Page 17: Intro to NoSQL and MongoDB

17

Relational vs. Non-Relational

What is NoSQL?

Page 18: Intro to NoSQL and MongoDB

18

scalable nonrelational (“nosql”)

OLTP / operational

BI / reporting

+ speed and scale

- ad hoc query limited

- not very transactional

- no sql/no standard

+ fits OO well

+ agile

Page 19: Intro to NoSQL and MongoDB

19

Non-relational, next-generation operational data stores and databases

A collection of very different products

• Different data models (Not relational)

• Most are not using SQL for queries

• No predefined schema

• Some allow flexible data structures

Page 23: Intro to NoSQL and MongoDB

23

• Relational
– ACID
– Two-phase commit
– Joins
• Key-Value, Document, XML, Graph, Column
– BASE
– Atomic transactions at the document level
– No joins

Page 24: Intro to NoSQL and MongoDB

24

Page 25: Intro to NoSQL and MongoDB

25

• Transaction rate

• Reliability

• Maintainability

• Ease of Use

• Scalability

• Cost

Page 26: Intro to NoSQL and MongoDB

26

MongoDB: Introduction

Page 27: Intro to NoSQL and MongoDB

27

• Designed and developed by founders of DoubleClick, ShopWiki, Gilt Groupe, etc.

• Coding started fall 2007

• First production site March 2008 - businessinsider.com

• Open Source – AGPL, written in C++

• Version 0.8 – first official release February 2009

• Version 1.0 – August 2009

• Version 2.0 – September 2011

Page 28: Intro to NoSQL and MongoDB

28

MongoDB

Design Goals

Page 29: Intro to NoSQL and MongoDB

29

Page 30: Intro to NoSQL and MongoDB

30

• Document-oriented storage
– Based on JSON documents
– Flexible schema
• Scalable architecture
– Auto-sharding
– Replication & high availability
• Key features include:
– Full-featured indexes
– Query language
– Map/Reduce & aggregation

Page 31: Intro to NoSQL and MongoDB

31

• Rich data models

• Seamlessly map to native programming language types

• Flexible for dynamic data

• Better data locality

Page 32: Intro to NoSQL and MongoDB

32

Page 34: Intro to NoSQL and MongoDB

34

{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  title : "Too Big to Fail",
  when : new Date("2011-07-26"),
  author : "joe",
  text : "blah",
  tags : ["business", "news", "north america"]
}

// Matches every post whose tags array contains "news"
> db.posts.find( { tags : "news" } )
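The query on this slide matches a document when its tags array contains "news". A minimal sketch of that matching rule in plain JavaScript, with no database involved. The function name `matchesField` and the sample `posts` array are hypothetical, chosen only for illustration; the real query engine does considerably more.

```javascript
// Equality match against a field that may hold a scalar or an array.
// An array field matches when any one of its elements equals the value.
function matchesField(doc, field, value) {
  const actual = doc[field];
  if (Array.isArray(actual)) {
    return actual.includes(value);
  }
  return actual === value;
}

// Hypothetical in-memory stand-in for the posts collection.
const posts = [
  { title: "Too Big to Fail", author: "joe", tags: ["business", "news", "north america"] },
  { title: "Weekend Recipes", author: "ann", tags: ["food"] },
];

const newsPosts = posts.filter((post) => matchesField(post, "tags", "news"));
console.log(newsPosts.map((post) => post.title));
```

The same function handles scalar fields such as author, which is why one query syntax can serve both cases.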

Page 37: Intro to NoSQL and MongoDB

37

{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  title : "Too Big to Fail",
  when : new Date("2011-07-26"),
  author : "joe",
  text : "blah",
  tags : ["business", "news", "north america"],
  votes : 3,
  voters : ["dmerr", "sj", "jane"],
  comments : [
    { by : "tim157", text : "great story" },
    { by : "gora", text : "i don't think so" },
    { by : "dmerr", text : "also check out..." }
  ]
}

// Dot notation reaches into the embedded comments array
> db.posts.find( { "comments.by" : "gora" } )
// Index the same path so the lookup avoids a full scan
// (ensureIndex is named createIndex in MongoDB 3.0 and later)
> db.posts.ensureIndex( { "comments.by" : 1 } )
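To illustrate what a dotted path such as "comments.by" selects, and why indexing it helps, here is a rough sketch in plain JavaScript. `valuesAtPath` and `buildIndex` are hypothetical helpers, and the `Map` is only a stand-in for a real index structure; nothing here reflects how the server implements indexes internally.

```javascript
// Collect every value reachable through a dotted path such as "comments.by",
// fanning out across arrays met along the way.
function valuesAtPath(value, path) {
  const parts = Array.isArray(path) ? path : path.split(".");
  if (Array.isArray(value)) {
    return value.flatMap((item) => valuesAtPath(item, parts));
  }
  if (parts.length === 0) {
    return [value];
  }
  if (value === null || typeof value !== "object") {
    return [];
  }
  const [head, ...rest] = parts;
  return head in value ? valuesAtPath(value[head], rest) : [];
}

// Build a lookup table: indexed value -> _id of each document containing it.
function buildIndex(docs, path) {
  const index = new Map();
  for (const doc of docs) {
    for (const key of new Set(valuesAtPath(doc, path))) {
      if (!index.has(key)) index.set(key, []);
      index.get(key).push(doc._id);
    }
  }
  return index;
}

// Hypothetical sample data shaped like the post on this slide.
const posts = [
  { _id: 1, comments: [{ by: "tim157", text: "great story" }, { by: "gora", text: "i don't think so" }] },
  { _id: 2, comments: [{ by: "dmerr", text: "also check out..." }] },
  { _id: 3, comments: [] },
];

const byCommenter = buildIndex(posts, "comments.by");
console.log(byCommenter.get("gora"));
```

With the table built, finding posts commented on by "gora" is a single lookup instead of a scan over every comment in every post.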

Page 38: Intro to NoSQL and MongoDB

38

Seek = 5+ ms
Read = really really fast

[Diagram: Post, Author, Comment]

Page 39: Intro to NoSQL and MongoDB

39

Disk seeks and data locality

[Diagram: Post, Author, and five Comment boxes]
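A back-of-the-envelope sketch of the seek argument, under loudly simplified assumptions: every separately stored row costs one seek, nothing is cached, and a seek costs exactly 5 ms (the slide says "5+ ms"). The function names are hypothetical.

```javascript
const SEEK_MS = 5; // assumed per-seek cost, taken from the "5+ ms" figure

// Normalized layout: one seek for the post, one for the author,
// and (pessimistically) one per comment row.
function normalizedSeekMs(commentCount) {
  return (1 + 1 + commentCount) * SEEK_MS;
}

// Embedded layout: post, author name, and comments are stored together,
// so a single seek reaches all of them.
function embeddedSeekMs() {
  return 1 * SEEK_MS;
}

console.log(normalizedSeekMs(5)); // 35
console.log(embeddedSeekMs()); // 5
```

Under these assumptions a post with five comments costs 35 ms of seeking when normalized and 5 ms when embedded; real numbers depend on caching and storage hardware.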

Page 40: Intro to NoSQL and MongoDB

40

• Sophisticated secondary indexes

• Dynamic queries

• Sorting

• Rich updates, upserts

• Easy aggregation

• Viable primary data store
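As a rough illustration of "rich updates, upserts", the sketch below models three update operators ($set, $inc, $push) and an upsert against an in-memory array. `applyUpdate` and `upsert` are hypothetical helpers written for this example; only the operator names come from MongoDB, and the real semantics cover many more cases.

```javascript
// Apply a small subset of update operators to a document in place.
// Only $set, $inc and $push are modeled here.
function applyUpdate(doc, update) {
  for (const [field, value] of Object.entries(update.$set || {})) {
    doc[field] = value;
  }
  for (const [field, amount] of Object.entries(update.$inc || {})) {
    doc[field] = (doc[field] || 0) + amount;
  }
  for (const [field, item] of Object.entries(update.$push || {})) {
    if (!Array.isArray(doc[field])) doc[field] = [];
    doc[field].push(item);
  }
  return doc;
}

// Update the first document matching the equality query, or insert a new
// one seeded from the query fields when nothing matches (the upsert case).
function upsert(collection, query, update) {
  const matches = (doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value);
  let target = collection.find(matches);
  if (!target) {
    target = { ...query };
    collection.push(target);
  }
  return applyUpdate(target, update);
}

// Hypothetical in-memory collection.
const posts = [{ title: "Too Big to Fail", votes: 3, voters: ["dmerr", "sj", "jane"] }];

// Existing post: count one more vote and record the voter in one step.
upsert(posts, { title: "Too Big to Fail" }, { $inc: { votes: 1 }, $push: { voters: "gora" } });
// Missing post: the same call creates it.
upsert(posts, { title: "New Post" }, { $inc: { votes: 1 }, $push: { voters: "joe" } });

console.log(posts);
```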

Page 41: Intro to NoSQL and MongoDB

41

• Scale linearly

• High Availability

• Increase capacity with no downtime

• Transparent to the application

Page 42: Intro to NoSQL and MongoDB

42

Replica Sets

• High Availability/Automatic Failover

• Data Redundancy

• Disaster Recovery

• Transparent to the application

• Perform maintenance with no downtime

Page 43: Intro to NoSQL and MongoDB

43

Asynchronous Replication

Page 46: Intro to NoSQL and MongoDB

46

Page 47: Intro to NoSQL and MongoDB

47

Automatic Election
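A deliberately simplified sketch of automatic election, not the actual protocol: it assumes a new primary can be chosen only when a strict majority of all configured members is reachable, and that the reachable member with the most recent applied operation wins. `electPrimary`, `lastOpTime` and the member names are hypothetical.

```javascript
// Pick a new primary among the reachable members of a replica set.
// Returns the winner's name, or null when no majority is reachable.
function electPrimary(members) {
  const reachable = members.filter((member) => member.reachable);
  const majority = Math.floor(members.length / 2) + 1;
  if (reachable.length < majority) {
    return null; // no majority: the set stays without a primary
  }
  const freshest = reachable.reduce((best, member) =>
    member.lastOpTime > best.lastOpTime ? member : best
  );
  return freshest.name;
}

// The old primary "a" has failed; "c" has replicated more than "b".
const afterFailure = [
  { name: "a", reachable: false, lastOpTime: 42 },
  { name: "b", reachable: true, lastOpTime: 41 },
  { name: "c", reachable: true, lastOpTime: 42 },
];
console.log(electPrimary(afterFailure)); // c

// Only one of three members left: no majority, so no primary.
const isolated = [
  { name: "a", reachable: false, lastOpTime: 42 },
  { name: "b", reachable: false, lastOpTime: 41 },
  { name: "c", reachable: true, lastOpTime: 42 },
];
console.log(electPrimary(isolated)); // null
```

The majority requirement is what keeps two halves of a partitioned set from each electing their own primary.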

Page 48: Intro to NoSQL and MongoDB

48

Page 50: Intro to NoSQL and MongoDB

50

• Increase capacity with no downtime

• Transparent to the application

• Range based partitioning

• Partitioning and balancing are automatic

Page 51: Intro to NoSQL and MongoDB

51

Write Scalability

mongod: key range 0..100

Page 52: Intro to NoSQL and MongoDB

52

Write Scalability

mongod: key range 0..50
mongod: key range 51..100

Page 53: Intro to NoSQL and MongoDB

53

Write Scalability

mongod: key range 0..25
mongod: key range 26..50
mongod: key range 51..75
mongod: key range 76..100
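The progression from one key range to two and then four can be reproduced with a small sketch. The midpoint rule in `splitRange` is an assumption chosen only because it yields the numbers on these slides; a real deployment picks split points from the data, not from arithmetic. Both function names are hypothetical.

```javascript
// Split one inclusive integer key range into two halves at its midpoint.
function splitRange(range) {
  const mid = Math.floor((range.min + range.max) / 2);
  return [
    { min: range.min, max: mid },
    { min: mid + 1, max: range.max },
  ];
}

// Split every range once, doubling the number of ranges.
function splitAll(ranges) {
  return ranges.flatMap(splitRange);
}

const oneRange = [{ min: 0, max: 100 }];
const twoRanges = splitAll(oneRange);
const fourRanges = splitAll(twoRanges);

const show = (ranges) => ranges.map((r) => `${r.min}..${r.max}`).join(", ");
console.log(show(twoRanges)); // 0..50, 51..100
console.log(show(fourRanges)); // 0..25, 26..50, 51..75, 76..100
```

Each split leaves every key owned by exactly one range, so writes spread across more mongod processes without overlapping.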

Page 57: Intro to NoSQL and MongoDB

57

Application

MongoS, MongoS, MongoS

Config, Config, Config

Key range 0..25: Primary, Secondary, Secondary
Key range 26..50: Primary, Secondary, Secondary
Key range 51..75: Primary, Secondary, Secondary
Key range 76..100: Primary, Secondary, Secondary
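One way to picture the routing layer in this diagram: the application talks to a router, which consults a table of key ranges to decide which replica set to contact. The sketch below is an illustration under assumptions, not the actual implementation; `routingTable`, the shard names, `routeKey` and `routeRange` are all hypothetical.

```javascript
// Hypothetical routing table of the kind the config servers would hold:
// each inclusive key range maps to the replica set that owns it.
const routingTable = [
  { min: 0, max: 25, shard: "shard0" },
  { min: 26, max: 50, shard: "shard1" },
  { min: 51, max: 75, shard: "shard2" },
  { min: 76, max: 100, shard: "shard3" },
];

// Targeted operation: a single key value goes to exactly one shard.
function routeKey(key) {
  const entry = routingTable.find((r) => key >= r.min && key <= r.max);
  if (!entry) throw new RangeError(`no shard owns key ${key}`);
  return entry.shard;
}

// Range query: contact every shard whose range overlaps [low, high].
function routeRange(low, high) {
  return routingTable
    .filter((r) => r.min <= high && r.max >= low)
    .map((r) => r.shard);
}

console.log(routeKey(30)); // shard1
console.log(routeRange(20, 60)); // [ 'shard0', 'shard1', 'shard2' ]
```

Because the lookup happens in the router, the application issues the same queries whether there is one shard or four.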

Page 58: Intro to NoSQL and MongoDB

58

• Few configuration options

• Does the right thing out of the box

• Easy to deploy and manage

Page 60: Intro to NoSQL and MongoDB

60

MySQL

START TRANSACTION;
INSERT INTO contacts VALUES
  (NULL, 'joeblow');
INSERT INTO contact_emails VALUES
  (NULL, "[email protected]", LAST_INSERT_ID()),
  (NULL, "[email protected]", LAST_INSERT_ID());
COMMIT;

MongoDB

db.contacts.save( {
  userName: "joeblow",
  emailAddresses: [
    "[email protected]",
    "[email protected]" ] } );

• Native drivers for dozens of languages
• Data maps naturally to OO data structures
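The contrast between the two inserts can be sketched from the application side. Below, one contact object is either flattened into parent and child rows (the relational shape) or kept whole (the document shape). `toRows`, `toDocument` and the example addresses are hypothetical; the table names follow the SQL on this slide.

```javascript
// Application object for one contact (addresses are made-up placeholders).
const contact = {
  userName: "joeblow",
  emailAddresses: ["joe@example.com", "joe.blow@example.org"],
};

// Relational shape: one parent row plus one child row per email, linked by
// a generated id (the role LAST_INSERT_ID() plays in the SQL version).
function toRows(source, contactId) {
  return {
    contacts: [{ id: contactId, userName: source.userName }],
    contact_emails: source.emailAddresses.map((email) => ({ email, contactId })),
  };
}

// Document shape: the object is stored as-is; a JSON round trip preserves it.
function toDocument(source) {
  return JSON.parse(JSON.stringify(source));
}

const rows = toRows(contact, 1);
const doc = toDocument(contact);
console.log(rows.contacts.length + rows.contact_emails.length); // 3
console.log(doc.emailAddresses.length); // 2
```

Three rows across two tables on one side, a single nested document on the other: that difference is the mapping work the slide refers to.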

Page 61: Intro to NoSQL and MongoDB

61

MongoDB Usage Examples

Page 62: Intro to NoSQL and MongoDB

62

• User Data Management
• High Volume Data Feeds
• Content Management
• Operational Intelligence
• E-Commerce

Page 63: Intro to NoSQL and MongoDB

63

Wordnik uses MongoDB as the foundation for its “live” dictionary that stores its entire text corpus – 3.5 TB of data in 20 billion records

Problem
• Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources
• Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts
• Initially launched entirely on MySQL but quickly hit performance roadblocks

Why MongoDB
• Migrated 5 billion records in a single day with zero downtime
• MongoDB powers every website request: 20 million API calls per day
• Ability to eliminate the memcached layer, creating a simplified system that required fewer resources and was less prone to error

Impact
• Reduced code by 75% compared to MySQL
• Fetch time cut from 400 ms to 60 ms
• Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second
• Significant cost savings and 15% reduction in servers

“Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don’t spend time worrying about the database, we can spend more time writing code for our application.” -Tony Tam, Vice President of Engineering and Technical Co-founder

Page 64: Intro to NoSQL and MongoDB

64

Intuit relies on a MongoDB-powered real-time analytics tool for small businesses to derive interesting and actionable patterns from their customers’ website traffic

Problem
• Intuit hosts more than 500,000 websites and wanted to collect and analyze data to recommend conversion and lead generation improvements to customers
• With 10 years' worth of user data, it took several days to process the information using a relational database

Why MongoDB
• MongoDB's querying and Map/Reduce functionality could serve as a simpler, higher-performance solution than a complex Hadoop implementation
• The strength of the MongoDB community

Impact
• In one week Intuit was able to become proficient in MongoDB development
• Developed application features more quickly for MongoDB than for relational databases
• MongoDB was 2.5 times faster than MySQL

“We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, ‘Let’s go with this.’” -Nirmala Ranganathan, Intuit

Page 65: Intro to NoSQL and MongoDB

65

Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and turn everyday pictures into keepsakes

Problem
• Managing 20 TB of data (six billion images for millions of customers), partitioned by function
• Home-grown key-value store on top of their Oracle database offered sub-par performance
• Codebase for this hybrid store became hard to manage
• High licensing and hardware costs

Why MongoDB
• JSON-based data structure
• Provided Shutterfly with an agile, high-performance, scalable solution at a low cost
• Works seamlessly with Shutterfly’s services-based architecture

Impact
• 500% cost reduction and 900% performance improvement compared to previous Oracle implementation
• Accelerated time-to-market for nearly a dozen projects on MongoDB
• Improved performance by reducing average latency for inserts from 400 ms to 2 ms

“The ‘really killer reason’ for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to develop software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features.” -Kenny Gorman, Director of Data Services

Page 66: Intro to NoSQL and MongoDB

66

Page 67: Intro to NoSQL and MongoDB

67

Open source, high performance database