Intro to NoSQL and MongoDB


NoSQL: Introduction

Asya Kamsky


• 1970's: Relational databases invented

– Storage is expensive

– Data is normalized

– Data storage is abstracted away from the app

• 1980's: RDBMS commercialized

– Client/server model

– SQL becomes the standard

• 1990's: Things begin to change

– Client/server => 3-tier architecture

– Rise of the Internet and the Web

• 2000's: Web 2.0

– Rise of "social media"

– Acceptance of e-commerce

– Constant decrease of hardware prices

– Massive increase of collected data

• Result

– Constant need to scale dramatically

– How can we scale?

OLTP / operational (a lot more issues here):

  + complex transactions
  + tabular data
  + ad hoc queries
  - O<->R mapping is hard
  - speed/scale problems
  - not super agile

  Typical workarounds: caching, flat files, map/reduce, app-layer partitioning

BI / reporting (fewer issues here):

  + ad hoc queries
  + SQL is a standard protocol between clients and servers
  + scales horizontally better than operational DBs
  - some limits at massive scale
  - schemas are rigid
  - no real time; great at bulk nightly data loads

• Agile development methodology

– Shorter development cycles

– Constant evolution of requirements

– Flexibility at design time

• Relational schema

– Hard to evolve: long, painful migrations

– Must stay in sync with the application

– Few developers interact with it directly

What modern applications need:

• Horizontal scaling

• More real-time requirements

• Faster development time

• Flexible data model

• Low upfront cost

• Low cost of ownership

Relational vs Non-Relational: What is NoSQL?

Scalable non-relational ("NoSQL") stores sit alongside the OLTP/operational and BI/reporting systems above:

  + speed and scale
  + fits OO well
  + agile
  - ad hoc queries limited
  - not very transactional
  - no SQL / no standard

Non-relational, next-generation operational data stores and databases: a collection of very different products

• Different data models (not relational)

• Most do not use SQL for queries

• No predefined schema

• Some allow flexible data structures
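
For example, in a document store (MongoDB shell syntax, used throughout the rest of this deck), two documents in the same collection can have entirely different shapes; a minimal sketch with a hypothetical things collection:

  db.things.insert( { name : "a", size : 10 } );                          // flat document
  db.things.insert( { name : "b", tags : ["x", "y"], nested : { level : 1 } } );  // arrays and sub-documents, no schema change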

Data models and their transaction semantics:

• Relational

– ACID

– Two-phase commit

– Joins

• Non-relational (Key-Value, Document, XML, Graph, Column)

– BASE

– Atomic transactions at the document level

– No joins

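To make "atomic transactions at the document level" concrete, here is a hedged MongoDB shell sketch (collection and field names are hypothetical): several fields of one document change together or not at all, with no multi-statement transaction:

  db.accounts.update(
    { _id : 1 },
    { $inc : { balance : -10, pending : 10 } }   // both increments apply atomically to this one document
  );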

Trade-offs to weigh when choosing a database:

• Transaction rate

• Reliability

• Maintainability

• Ease of use

• Scalability

• Cost

MongoDB: Introduction


• Designed and developed by founders of DoubleClick, ShopWiki, Gilt Groupe, etc.

• Coding started fall 2007

• First production site March 2008 - businessinsider.com

• Open source - AGPL, written in C++

• Version 0.8 - first official release, February 2009

• Version 1.0 - August 2009

• Version 2.0 - September 2011

MongoDB Design Goals

• Document-oriented storage

– Based on JSON documents

– Flexible schema

• Scalable architecture

– Auto-sharding

– Replication & high availability

• Key features include:

– Full-featured indexes

– Query language

– Map/Reduce & aggregation
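
For instance, secondary indexes and ad hoc queries work much as in an RDBMS; a minimal shell sketch (the posts collection anticipates the examples that follow):

  > db.posts.ensureIndex( { author : 1 } )                      // secondary index on a field
  > db.posts.find( { author : "joe" } ).sort( { when : -1 } )   // ad hoc query plus sort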

• Rich data models

• Seamlessly map to native programming-language types

• Flexible for dynamic data

• Better data locality


{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  title : "Too Big to Fail",
  when : ISODate("2011-07-26"),
  author : "joe",
  text : "blah"
}

Adding a tags array requires no schema change:

{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  title : "Too Big to Fail",
  when : ISODate("2011-07-26"),
  author : "joe",
  text : "blah",
  tags : ["business", "news", "north america"]
}

Queries match directly against the array's elements:

> db.posts.find( { tags : "news" } )
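
An index on an array field is "multikey": every element is indexed. A minimal sketch using the shell's ensureIndex helper:

> db.posts.ensureIndex( { tags : 1 } )   // one index entry per array element
> db.posts.find( { tags : "news" } )     // now served by the index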

The schema keeps evolving in place: a vote count, a voters array, and embedded comments are simply new fields:

{
  _id : ObjectId("4e2e3f92268cdda473b628f6"),
  title : "Too Big to Fail",
  when : ISODate("2011-07-26"),
  author : "joe",
  text : "blah",
  tags : ["business", "news", "north america"],
  votes : 3,
  voters : ["dmerr", "sj", "jane"],
  comments : [
    { by : "tim157", text : "great story" },
    { by : "gora", text : "i don't think so" },
    { by : "dmerr", text : "also check out..." }
  ]
}

Embedded documents are queried and indexed with dot notation:

> db.posts.find( { "comments.by" : "gora" } )
> db.posts.ensureIndex( { "comments.by" : 1 } )
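
Updates modify such a document in place and atomically; $inc and $push are standard update operators (the voter name below is hypothetical):

> db.posts.update(
    { _id : ObjectId("4e2e3f92268cdda473b628f6") },
    { $inc : { votes : 1 }, $push : { voters : "newvoter" } }
  )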

Disk seeks and data locality: a seek costs 5+ ms, while sequential reads are really fast. Keeping a post, its author info, and all of its comments in one document (rather than in separate Post, Author, and Comment tables) means a single seek retrieves everything together.

• Sophisticated secondary indexes

• Dynamic queries

• Sorting

• Rich updates, upserts

• Easy aggregation (see the sketch below)

• Viable as a primary data store
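For aggregation in this era of MongoDB, mapReduce is the workhorse; a hedged sketch that counts posts per tag (the output collection name is arbitrary):

> db.posts.mapReduce(
    function() { this.tags.forEach(function(t) { emit(t, 1); }); },   // map: emit one pair per tag
    function(key, values) { return Array.sum(values); },              // reduce: sum the counts
    { out : "tag_counts" }
  )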

• Scale linearly

• High Availability

• Increase capacity with no downtime

• Transparent to the application


Replica Sets

• High Availability/Automatic Failover

• Data Redundancy

• Disaster Recovery

• Transparent to the application

• Perform maintenance with no down time
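
A replica set might be brought up from the shell like this (a minimal sketch; the set name and hostnames are hypothetical):

> rs.initiate( {
    _id : "rs0",
    members : [
      { _id : 0, host : "db1.example.net:27017" },
      { _id : 1, host : "db2.example.net:27017" },
      { _id : 2, host : "db3.example.net:27017" }
    ]
  } )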


Replication is asynchronous: writes go to the primary and are replicated to the secondaries. If the primary becomes unavailable, the remaining members hold an automatic election and promote a secondary to primary, transparently to the application.

Sharding:

• Increase capacity with no downtime

• Transparent to the application

• Range-based partitioning

• Partitioning and balancing are automatic (see the sketch below)
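Turning sharding on is a pair of admin commands issued through mongos (a hedged sketch; the database, collection, and shard key are hypothetical):

> db.adminCommand( { enableSharding : "blog" } )
> db.adminCommand( { shardCollection : "blog.posts", key : { author : 1 } } )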

Write scalability through key ranges: a single mongod owns the entire key range (0..100). Splitting across two mongods gives ranges 0..50 and 51..100; with four mongods each owns a quarter (0..25, 26..50, 51..75, 76..100), so writes spread across all of them.

In a full deployment each shard is itself a replica set (a primary plus two secondaries) owning one key range (0..25, 26..50, 51..75, 76..100). The application connects to one or more MongoS routers, which direct every operation to the right shard, while three config servers store the cluster metadata.
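
The processes involved, as a minimal sketch (hostnames, ports, and names are hypothetical; flags are from the 2.x era):

  mongod --shardsvr --replSet shard1 --port 27018       # one member of a shard's replica set
  mongod --configsvr --port 27019                       # one of three config servers
  mongos --configdb cfg1:27019,cfg2:27019,cfg3:27019    # query router the application talks to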

• Few configuration options

• Does the right thing out of the box

• Easy to deploy and manage


MySQL:

START TRANSACTION;
INSERT INTO contacts VALUES (NULL, 'joeblow');
INSERT INTO contact_emails VALUES
  ( NULL, 'joe@blow.com',    LAST_INSERT_ID() ),
  ( NULL, 'joseph@blow.com', LAST_INSERT_ID() );
COMMIT;

MongoDB:

db.contacts.save( {
  userName : "joeblow",
  emailAddresses : [
    "joe@blow.com",
    "joseph@blow.com" ]
} );

• Native drivers for dozens of languages

• Data maps naturally to OO data structures (see the sketch below)
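
Reading the document back returns a native object in the shell (or in any driver); a minimal sketch:

> var contact = db.contacts.findOne( { userName : "joeblow" } );
> print( contact.emailAddresses[0] );   // prints joe@blow.com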

MongoDB Usage Examples


• User data management

• High-volume data feeds

• Content management

• Operational intelligence

• E-commerce

Wordnik uses MongoDB as the foundation for its "live" dictionary, which stores its entire text corpus: 3.5TB of data in 20 billion records.

Problem:

• Analyze a staggering amount of data for a system built on a continuous stream of high-quality text pulled from online sources

• Initially launched entirely on MySQL but quickly hit performance roadblocks

• Adding too much data too quickly resulted in outages; tables locked for tens of seconds during inserts

Why MongoDB:

• Migrated 5 billion records in a single day with zero downtime

• MongoDB powers every website request: 20M API calls per day

• Eliminated the memcached layer, creating a simplified system that required fewer resources and was less prone to error

Impact:

• Reduced code by 75% compared to MySQL

• Fetch time cut from 400ms to 60ms

• Sustained insert speed of 8k words per second, with frequent bursts of up to 50k per second

• Significant cost savings and a 15% reduction in servers

"Life with MongoDB has been good for Wordnik. Our code is faster, more flexible and dramatically smaller. Since we don't spend time worrying about the database, we can spend more time writing code for our application." - Tony Tam, Vice President of Engineering and Technical Co-founder

Intuit relies on a MongoDB-powered real-time analytics tool that helps small businesses derive interesting and actionable patterns from their customers' website traffic.

Problem:

• Intuit hosts more than 500,000 websites and wanted to collect and analyze data to recommend conversion and lead-generation improvements to customers

• With 10 years' worth of user data, it took several days to process the information using a relational database

Why MongoDB:

• MongoDB's querying and Map/Reduce functionality could serve as a simpler, higher-performance solution than a complex Hadoop implementation

• The strength of the MongoDB community

Impact:

• In one week, Intuit was able to become proficient in MongoDB development

• Developed application features more quickly for MongoDB than for relational databases

• MongoDB was 2.5 times faster than MySQL

"We did a prototype for one week, and within one week we had made big progress. Very big progress. It was so amazing that we decided, 'Let's go with this.'" - Nirmala Ranganathan, Intuit

Shutterfly uses MongoDB to safeguard more than six billion images for millions of customers in the form of photos and videos, and to turn everyday pictures into keepsakes.

Problem:

• Managing 20TB of data (six billion images for millions of customers), partitioned by function

• A home-grown key-value store on top of their Oracle database offered sub-par performance

• The codebase for this hybrid store became hard to manage

• High licensing and hardware costs

Why MongoDB:

• JSON-based data structure

• Provided Shutterfly with an agile, high-performance, scalable solution at a low cost

• Works seamlessly with Shutterfly's services-based architecture

Impact:

• 500% cost reduction and 900% performance improvement compared to the previous Oracle implementation

• Accelerated time-to-market for nearly a dozen projects on MongoDB

• Improved performance by reducing average latency for inserts from 400ms to 2ms

"The 'really killer reason' for using MongoDB is its rich JSON-based data structure, which offers Shutterfly an agile approach to develop software. With MongoDB, the Shutterfly team can quickly develop and deploy new applications, especially Web 2.0 and social features." - Kenny Gorman, Director of Data Services

MongoDB: an open source, high-performance database.