MongoDB Introduction and Data Modelling

© 2011 Xpanxion all rights reserved

GLOBAL SOFTWARE ENGINEERING EXCELLENCE

MongoDB

Version 5.1

17 April 2013

Internal

Sachin Bhosale

The Evolution of Databases

(Chart: database technologies by decade, for operational data and the data warehouse)
• 1990: RDBMS
• 2000: RDBMS, OLAP/BI
• 2010: RDBMS, NoSQL, OLAP/BI, Hadoop

Big Data

"Big Data" describes data sets so large and complex that they are impractical to manage with traditional software tools. Big Data relates to data creation, storage, retrieval and analysis that is remarkable in terms of volume, velocity, and variety.

• Volume: a typical PC might have had 10 gigabytes of storage in 2000. Today, Facebook ingests 500 terabytes of new data every day.
• Velocity: clickstreams and ad impressions capture user behavior at millions of events per second; high-frequency stock trading algorithms reflect market changes within microseconds.
• Variety: Big Data isn't just numbers, dates, and strings. It is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.

Big Data Technologies

                  Operational          Analytical
Latency           10 ms - 100 ms       1 min - 100 min
Concurrency       1,000 - 100,000      1 - 10
Access pattern    Writes and reads     Reads
Queries           Selective            Unselective
Data scope        Operational          Retrospective
End user          Customer             Data scientist
Technology        NoSQL                MapReduce, MPP database

Relational Database Challenges

Data types
• Unstructured data
• Semi-structured data
• Polymorphic data

Volume of data
• Petabytes of data
• Trillions of records
• Tens of millions of queries per second

Agile development
• Iterative
• Short development cycles
• New workloads

New architectures
• Horizontal scaling
• Commodity servers
• Cloud computing

NoSQL Categories

• Key-value: Redis
• Wide-column: Cassandra
• Document: MongoDB
• Graph: Neo4j

Which one is the best?

What is MongoDB?

MongoDB is a ___________ database:
• Document
• Open source
• High performance
• Horizontally scalable
• Full featured

Document Database

Not for .PDF and .DOC files.

A document is essentially an associative array:
• Document == JSON object
• Document == PHP array
• Document == Python dictionary
• Document == Ruby hash
• etc.
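In Python the associative-array analogy is literal: a document maps onto a dict whose values can be scalars, lists (arrays) or nested dicts (embedded documents). The minimal sketch below uses only the standard library; the sample fields are illustrative, and `get_path` is a hypothetical helper that imitates MongoDB's dot notation (as in `'contribs.1'`), not a driver API. JSON is used only to show the same structure as text; MongoDB itself stores documents as BSON.

```python
import json
from typing import Any

# A document: field names map to values, which may be arrays or embedded documents.
student = {
    "_id": 1,
    "name": "Sachin",
    "score": 110,
    "subjects": ["math", "physics"],               # array field
    "contact": {"email": "sachin@example.com"},    # embedded document
}


def get_path(document: dict, dotted: str) -> Any:
    """Follow a dotted path such as "contact.email" or "subjects.1" (hypothetical helper)."""
    value: Any = document
    for part in dotted.split("."):
        # Numeric segments index into arrays; other segments are field names.
        value = value[int(part)] if isinstance(value, list) else value[part]
    return value


# The same structure round-trips through JSON text unchanged.
as_json = json.dumps(student, sort_keys=True)
assert json.loads(as_json) == student

print(get_path(student, "subjects.1"))     # physics
print(get_path(student, "contact.email"))  # sachin@example.com
```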

Open Source

MongoDB is an open source project:
• On GitHub
• Licensed under the AGPL
• Commercial licenses available
• Started and sponsored by 10gen

High Performance

• Written in C++
• Extensive use of memory-mapped files, i.e. read-through, write-through memory caching
• Runs nearly everywhere
• Data serialized as BSON (fast parsing)
• Full support for primary and secondary indexes
• Document model = less work
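To see why BSON is cheap to parse: every document (and every string) starts with its length in bytes, so a reader can jump over data without scanning it. The minimal encoder below follows the BSON specification for only three element types (0x02 UTF-8 string, 0x03 embedded document, 0x10 32-bit integer); it is an illustrative sketch, and real applications should rely on the BSON library that ships with their driver. The `{"hello": "world"}` byte string is the worked example from the specification.

```python
import struct


def _cstring(name: str) -> bytes:
    # Element names are NUL-terminated UTF-8 strings ("cstring" in the spec).
    return name.encode("utf-8") + b"\x00"


def encode_document(doc: dict) -> bytes:
    """Encode str, 32-bit int and nested dict values as a BSON document."""
    body = b""
    for key, value in doc.items():
        if isinstance(value, bool):
            # bool subclasses int in Python, but BSON has its own boolean type (0x08).
            raise TypeError("booleans are not handled in this sketch")
        if isinstance(value, str):
            data = value.encode("utf-8") + b"\x00"
            # 0x02 string: name, int32 byte length (including the NUL), bytes.
            body += b"\x02" + _cstring(key) + struct.pack("<i", len(data)) + data
        elif isinstance(value, int):
            # 0x10 int32, little-endian; struct.error is raised if it does not fit.
            body += b"\x10" + _cstring(key) + struct.pack("<i", value)
        elif isinstance(value, dict):
            # 0x03 embedded document: encoded recursively, carrying its own length.
            body += b"\x03" + _cstring(key) + encode_document(value)
        else:
            raise TypeError(f"unsupported type for {key!r}: {type(value).__name__}")
    # Total size = 4-byte length prefix + elements + trailing NUL terminator.
    return struct.pack("<i", len(body) + 5) + body + b"\x00"


def document_length(buf: bytes, offset: int = 0) -> int:
    """Read a document's size from its prefix without parsing its contents."""
    return struct.unpack_from("<i", buf, offset)[0]


# Worked example from the BSON specification.
assert encode_document({"hello": "world"}) == (
    b"\x16\x00\x00\x00\x02hello\x00\x06\x00\x00\x00world\x00\x00"
)

# Two documents back to back: the length prefix lets a reader skip the first.
stream = encode_document({"a": 1}) + encode_document({"b": "x"})
first = document_length(stream)
print(first, document_length(stream, first))  # 12 14
```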

Horizontally Scalable

Full Featured

• Ad hoc queries
• Real-time aggregation
• Rich query capabilities
• Traditionally consistent
• Geospatial features
• Support for most programming languages: JavaScript, Python, Ruby, PHP, Perl, Java, Scala, C#, C, C++
• Flexible schema

MongoDB Installation

Get the MongoDB distribution for your platform and version from http://www.mongodb.org/downloads

MongoDB requires a data folder to store its files. The default location for the MongoDB data directory is C:\data\db (Windows) or /data/db (Linux); the directory must exist before mongod is started. To use a different folder, pass it with --dbpath.

Running MongoDB
Windows:
    C:\mongodb\bin\mongod.exe --dbpath d:\test\data
Linux:
    ./bin/mongod --dbpath /data/mongodb

MongoDB Package Components - 1

Core processes
• mongod: the database server
• mongos: the query router for sharded clusters
• mongo: the interactive shell

Binary import and export tools
• mongodump, mongorestore, bsondump, mongooplog

MongoDB Package Components - 2

Data import and export tools
• mongoimport, mongoexport

Diagnostic tools
• mongostat, mongotop, mongosniff, mongoperf

GridFS
• mongofiles

Mongo Shell

• Embedded JavaScript interpreter: vars, functions, data structures and types (SpiderMonkey / V8)
• Global functions and objects: ObjectId("..."), new Date(), Object.bsonsize()
• MongoDB driver exposed: db["collection"].find/count/update, with short-hand for collections (db.collection)
• JSON-like syntax: keys do not need to be quoted; don't copy and paste too much

Terminology

Core MongoDB Operations (CRUD) - 1

CREATE
• insert() is the primary method to insert a document or documents into a MongoDB collection:
    db.studs.insert({_id : 1, name : "Sachin", score : 110})
• save() performs an insert if the document to save does not contain the _id field:
    db.studs.save({name : "Sachin", score : 110})

READ
• find() returns a cursor over the documents that match the query:
    db.collection.find( <query>, <projection> )
• findOne() selects a single document from a collection and returns that document.
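To make the create/read semantics concrete without a running server, the sketch below models a collection as an in-memory Python list. `ToyCollection` and its methods are hypothetical stand-ins named after the shell methods, not the PyMongo API; query matching covers only exact equality on top-level fields, projection covers only inclusion, and a random UUID string stands in for the ObjectId MongoDB would generate.

```python
import uuid
from typing import Any, Dict, Iterator, List, Optional

Document = Dict[str, Any]


class ToyCollection:
    """Hypothetical in-memory stand-in for a MongoDB collection (not the PyMongo API)."""

    def __init__(self) -> None:
        self._docs: List[Document] = []

    def insert(self, doc: Document) -> Any:
        doc = dict(doc)
        # MongoDB would generate an ObjectId; a random UUID string stands in for it here.
        doc.setdefault("_id", uuid.uuid4().hex)
        if any(d["_id"] == doc["_id"] for d in self._docs):
            raise ValueError(f"duplicate _id: {doc['_id']!r}")
        self._docs.append(doc)
        return doc["_id"]

    def save(self, doc: Document) -> Any:
        # Without _id: a plain insert. With _id: replace the matching document, or insert it.
        if "_id" not in doc:
            return self.insert(doc)
        for i, existing in enumerate(self._docs):
            if existing["_id"] == doc["_id"]:
                self._docs[i] = dict(doc)
                return doc["_id"]
        return self.insert(doc)

    def find(self, query: Optional[Document] = None,
             projection: Optional[Dict[str, int]] = None) -> Iterator[Document]:
        # Only exact-equality matching on top-level fields is modeled here.
        for doc in self._docs:
            if all(doc.get(k) == v for k, v in (query or {}).items()):
                yield self._project(doc, projection)

    def find_one(self, query: Optional[Document] = None,
                 projection: Optional[Dict[str, int]] = None) -> Optional[Document]:
        return next(self.find(query, projection), None)

    @staticmethod
    def _project(doc: Document, projection: Optional[Dict[str, int]]) -> Document:
        if not projection:
            return dict(doc)
        # Inclusion-style projection; _id is returned unless explicitly set to 0.
        out = {k: doc[k] for k, keep in projection.items() if keep and k in doc}
        if projection.get("_id", 1) and "_id" in doc:
            out["_id"] = doc["_id"]
        return out


studs = ToyCollection()
studs.insert({"_id": 1, "name": "Sachin", "score": 110})
new_id = studs.save({"name": "Sachin", "score": 110})  # no _id, so this inserts
print(len(list(studs.find({"name": "Sachin"}))))        # 2
print(studs.find_one({"_id": 1}, {"score": 1}))          # {'score': 110, '_id': 1}
```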

Core MongoDB Operations (CRUD) - 2

UPDATE
• update() modifies a single document by default; with the multi option, update() modifies all documents in the collection that match the query criteria.
• Update operators
    Fields: $inc, $rename, $set, $unset
    Arrays: $addToSet, $pop, $pullAll, $pull, $push
• save() performs a special type of update() that depends on the _id field of the specified document: if the document contains an _id, save() replaces the document with that _id, or inserts it if no such document exists.

Examples
    // remove the birth field from every document matching { _id: 3 }
    db.bios.update( { _id: 3 }, { $unset: { birth: 1 } }, { multi: true } )
    // set the second element (index 1) of the contribs array
    db.bios.update( { _id: 1 }, { $set: { 'contribs.1': 'ALGOL 58' } } )
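The update operators can be modeled as edits on a plain dict. The sketch below is hypothetical (the `apply_update` and `update` helpers are not a driver API) and covers only $set, $unset, $inc, $push and $addToSet, plus the multi flag and dot notation with array indexes as in 'contribs.1'. It does not pad arrays for out-of-range indexes, query matching is exact equality on top-level fields, and the sample documents are invented for illustration.

```python
import copy
from typing import Any, Dict, List, Optional, Tuple

Document = Dict[str, Any]


def _parent(doc: Document, path: str, create: bool) -> Tuple[Optional[Any], str]:
    """Walk a dotted path to the container that holds its last segment."""
    parts = path.split(".")
    node: Any = doc
    for part in parts[:-1]:
        if isinstance(node, list):
            node = node[int(part)]             # numeric segment indexes an array
        elif part in node or create:
            node = node.setdefault(part, {})   # create missing embedded documents on write
        else:
            return None, parts[-1]
    return node, parts[-1]


def _set(node: Any, key: str, value: Any) -> None:
    if isinstance(node, list):
        node[int(key)] = value
    else:
        node[key] = value


def apply_update(doc: Document, change: Dict[str, Document]) -> Document:
    """Return a copy of doc with a subset of MongoDB update operators applied."""
    result = copy.deepcopy(doc)
    for op, fields in change.items():
        for path, arg in fields.items():
            node, key = _parent(result, path, create=(op != "$unset"))
            if op == "$set":
                _set(node, key, arg)
            elif op == "$unset":
                if isinstance(node, dict):
                    node.pop(key, None)
            elif op == "$inc":
                current = node[int(key)] if isinstance(node, list) else node.get(key, 0)
                _set(node, key, current + arg)
            elif op in ("$push", "$addToSet"):
                array = node.setdefault(key, [])  # assumes the parent is a document
                if op == "$push" or arg not in array:
                    array.append(arg)
            else:
                raise NotImplementedError(f"{op} is not modeled in this sketch")
    return result


def update(docs: List[Document], query: Document, change: Dict[str, Document],
           multi: bool = False) -> int:
    """Update the first matching document, or every match when multi=True."""
    modified = 0
    for i, doc in enumerate(docs):
        if all(doc.get(k) == v for k, v in query.items()):
            docs[i] = apply_update(doc, change)
            modified += 1
            if not multi:
                break
    return modified


# Hypothetical sample documents shaped like the bios examples above.
bios = [
    {"_id": 1, "contribs": ["Fortran", "ALGOL", "FP"]},
    {"_id": 3, "birth": "1900-01-01", "views": 0},
]
update(bios, {"_id": 3}, {"$unset": {"birth": 1}}, multi=True)
update(bios, {"_id": 1}, {"$set": {"contribs.1": "ALGOL 58"}})
print(bios[0]["contribs"])  # ['Fortran', 'ALGOL 58', 'FP']
print(bios[1])              # {'_id': 3, 'views': 0}
```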

Core MongoDB Operations (CRUD) - 3

DELETE
• remove() deletes documents from a collection:
    db.collection.remove( <query>, <justOne> )
• Remove all documents:
    db.bios.remove()
• Remove a single document that matches a condition:
    db.bios.remove( { turing: true }, 1 )
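The justOne flag changes only how many matches are deleted. A minimal in-memory sketch, assuming a list of dicts as the collection and exact-equality matching on top-level fields; the `remove` function here is hypothetical, not a driver call.

```python
from typing import Any, Dict, List, Optional

Document = Dict[str, Any]


def remove(docs: List[Document], query: Optional[Document] = None,
           just_one: bool = False) -> int:
    """Delete matching documents in place and return how many were removed.

    An empty or missing query matches every document, mirroring db.bios.remove().
    """
    query = query or {}
    kept: List[Document] = []
    removed = 0
    for doc in docs:
        matches = all(doc.get(k) == v for k, v in query.items())
        # With just_one, only the first match is deleted; later matches are kept.
        if matches and not (just_one and removed):
            removed += 1
        else:
            kept.append(doc)
    docs[:] = kept
    return removed


bios = [{"_id": 1, "turing": True}, {"_id": 2, "turing": True}, {"_id": 3, "turing": False}]
print(remove(bios, {"turing": True}, just_one=True))  # 1
print([d["_id"] for d in bios])                        # [2, 3]
print(remove(bios))                                    # 2
```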

Data Modeling

Data in MongoDB has a flexible schema. Collections do not enforce document structure: documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.

MongoDB does not support:
• Joins across multiple collections
• Transactions across multiple documents

Data Modeling Considerations

Data models must account for the inherent properties and requirements of the application objects and the relationships between them. MongoDB data models must also reflect how data will grow and change over time, and the kinds of queries your application will perform.

These considerations and requirements force a number of multi-factored decisions:
• Normalization and de-normalization
• Indexing strategy
• Representation of data in arrays in BSON

Data Modeling Decisions

Data modeling decisions involve determining how to structure the documents to model the data effectively.

• Embedding: to de-normalize data, store two related pieces of data in a single document.
• Referencing: to normalize data, store references between two documents to indicate a relationship between the data represented in each document.
• Atomicity: MongoDB provides atomic operations only at the level of a single document.
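The trade-off between embedding and referencing can be sketched with plain Python dicts standing in for documents. The blog-style field names (`comments`, `author_id`) and the `with_author`/`add_comment` helpers are hypothetical illustrations, not MongoDB or PyMongo APIs: the referenced form needs a second lookup because MongoDB has no joins, while the embedded form keeps related data in one document, which is the level at which MongoDB guarantees atomicity.

```python
from typing import Any, Dict

Document = Dict[str, Any]

# Embedding (de-normalized): the comments live inside the post document itself.
post_embedded: Document = {
    "_id": "p1",
    "title": "First post",
    "comments": [{"by": "u2", "text": "Nice"}],
}

# Referencing (normalized): the post stores only the author's _id,
# and users live in a separate (hypothetical) collection.
users: Dict[str, Document] = {"u1": {"_id": "u1", "name": "Sachin"}}
post_referenced: Document = {"_id": "p2", "title": "Second post", "author_id": "u1"}


def with_author(post: Document, users_by_id: Dict[str, Document]) -> Document:
    """Application-side lookup: with no server-side joins, the client fetches the
    referenced user in a second query and merges it in (None if it is missing)."""
    return {**post, "author": users_by_id.get(post["author_id"])}


def add_comment(post: Document, comment: Document) -> Document:
    """With embedding, adding a comment changes a single document, the unit MongoDB
    updates atomically (in MongoDB this would be one $push on "comments")."""
    return {**post, "comments": post.get("comments", []) + [comment]}


print(with_author(post_referenced, users)["author"]["name"])                       # Sachin
print(len(add_comment(post_embedded, {"by": "u1", "text": "Thanks"})["comments"]))  # 2
```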

Aggregation

MongoDB introduced the aggregation framework, which provides a powerful and flexible set of tools for many data aggregation tasks without having to use map-reduce. While map-reduce is powerful, it is often more difficult than necessary for simple aggregation tasks, such as totaling or averaging field values.

• Map-reduce: db.collection.mapReduce()
• Pipeline operators: $match, $sort, $limit, $skip, $project, $unwind, $group
• Pipeline operators and indexes: $match and $sort can use an index when they appear at the beginning of the pipeline

Example: for each tag, list the distinct authors who used it
    db.articles.aggregate(
        { $project : { author : 1, tags : 1 } },
        { $unwind : "$tags" },
        { $group : { _id : { tags : "$tags" }, authors : { $addToSet : "$author" } } }
    )
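To show what each stage of the articles pipeline computes, here is a hypothetical pure-Python emulation over a list of dicts. The function names and sample articles are invented; only inclusion projections, array-valued $unwind fields and a single $addToSet accumulator are modeled. MongoDB does not guarantee the order of $addToSet results, so compare the author lists as sets rather than by order.

```python
from typing import Any, Dict, Iterable, Iterator, List

Document = Dict[str, Any]


def project(docs: Iterable[Document], fields: Dict[str, int]) -> Iterator[Document]:
    """$project with inclusion flags; _id is kept unless explicitly set to 0."""
    for doc in docs:
        out = {k: doc[k] for k, keep in fields.items() if keep and k in doc}
        if fields.get("_id", 1) and "_id" in doc:
            out["_id"] = doc["_id"]
        yield out


def unwind(docs: Iterable[Document], field: str) -> Iterator[Document]:
    """$unwind: emit one copy of each document per element of its array field."""
    for doc in docs:
        values = doc.get(field)
        # Missing or non-array values are skipped in this sketch.
        if isinstance(values, list):
            for value in values:
                yield {**doc, field: value}


def group_add_to_set(docs: Iterable[Document], key: str, value: str,
                     out: str) -> List[Document]:
    """$group by `key`, collecting the distinct values of `value` ($addToSet)."""
    groups: Dict[Any, List[Any]] = {}
    for doc in docs:
        bucket = groups.setdefault(doc[key], [])
        if doc[value] not in bucket:
            bucket.append(doc[value])
    return [{"_id": {key: k}, out: v} for k, v in groups.items()]


# Hypothetical sample articles.
articles = [
    {"_id": 1, "author": "bob", "title": "A", "tags": ["fun", "good"]},
    {"_id": 2, "author": "alice", "title": "B", "tags": ["good"]},
    {"_id": 3, "author": "bob", "title": "C", "tags": ["fun"]},
]

# Same stage order as the shell example: $project -> $unwind -> $group.
result = group_add_to_set(
    unwind(project(articles, {"author": 1, "tags": 1}), "tags"),
    key="tags", value="author", out="authors",
)
for row in result:
    print(row)
```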

Blog Project with MongoDB

A blogging application with the following functionality:
• Sign up
• New post
• Login
• Logout

It uses Python, the PyMongo driver, and MongoDB.

Questions?

Thank You