© 2011 Xpanxion all rights reserved
GLOBAL SOFTWARE ENGINEERING EXCELLENCE
MongoDB
<Version 5.1>
17 April 2013
Internal
<Internal Restricted/Confidential(when filled) >
- Sachin Bhosale
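The Evolution of Databases
1990: RDBMS for both operational data and the data warehouse
2000: RDBMS for operational data; OLAP/BI for the data warehouse
2010: RDBMS and NoSQL for operational data; OLAP/BI and Hadoop for the data warehouse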
Big Data
"Big Data" describes data sets so large and complex that they are impractical to manage with traditional software tools. Big Data relates to data creation, storage, retrieval and analysis that is remarkable in terms of volume, velocity and variety.
Volume - A typical PC might have had 10 gigabytes of storage in 2000. Today, Facebook ingests 500 terabytes of new data every day.
Velocity - Clickstreams and ad impressions capture user behavior at millions of events per second; high-frequency stock trading algorithms reflect market changes within microseconds.
Variety - Big Data isn't just numbers, dates and strings. It is also geospatial data, 3D data, audio and video, and unstructured text, including log files and social media.
Big Data Technologies
| | Operational | Analytical |
|---|---|---|
| Latency | 10 ms - 100 ms | 1 min - 100 min |
| Concurrency | 1,000 - 100,000 | 1 - 10 |
| Access pattern | Writes and reads | Reads |
| Queries | Selective | Unselective |
| Data scope | Operational | Retrospective |
| End user | Customer | Data scientist |
| Technology | NoSQL | MapReduce, MPP database |
Relational Database Challenges
Data Types
• Unstructured data
• Semi-structured data
• Polymorphic data
Volume of Data
• Petabytes of data
• Trillions of records
• Tens of millions of queries per second
Agile Development
• Iterative
• Short development cycles
• New workloads
New Architectures
• Horizontal scaling
• Commodity servers
• Cloud computing
NoSQL Categories
Key-value: Redis
Wide-column (column family): Cassandra
Document: MongoDB
Graph: Neo4j
Which one is the best?
What is MongoDB?
MongoDB is a ___________ database
Document
Open source
High performance
Horizontally scalable
Full featured
Document Database
Not for .PDF & .DOC files
A document is essentially an associative array:
Document == JSON object
Document == PHP Array
Document == Python Dictionary
Document == Ruby Hash
etc.
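To make the "document == associative array" idea concrete, here is a plain JavaScript sketch (runnable in Node.js, no MongoDB server needed) of a document like the `studs` records used later in the CRUD examples. The extra fields (`address`, `tags`, `grade`) and their values are hypothetical, chosen only to show that nested documents and arrays are ordinary values.

```javascript
// A MongoDB-style document modeled as a plain JavaScript object.
// Field names and values beyond _id/name/score are hypothetical examples.
const stud = {
  _id: 1,
  name: "Sachin",
  score: 110,
  address: { city: "Pune", country: "India" }, // embedded (nested) document
  tags: ["mongodb", "nosql"]                    // array value
};

// Values are looked up by key, as in a PHP array, Python dict or Ruby hash.
const studName = stud["name"];
const city = stud.address.city;

// Fields can be added at run time; no schema is declared up front.
stud.grade = "A";

// The structure round-trips through JSON text. MongoDB itself stores
// documents as BSON, a binary JSON-like format, not as JSON text.
const copy = JSON.parse(JSON.stringify(stud));

console.log(studName, city, Object.keys(copy));
```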
Open Source
MongoDB is an open source project
On GitHub
Licensed under the AGPL
Commercial licenses available
Started & sponsored by 10gen
High Performance
Written in C++
Extensive use of memory-mapped files, i.e. read-through, write-through memory caching
Runs nearly everywhere
Data serialized as BSON (fast parsing)
Full support for primary & secondary indexes
Document model = less work
Horizontally Scalable
Full Featured
Ad hoc queries
Real time aggregation
Rich query capabilities
Traditionally consistent
Geospatial features
Support for most programming languages
JavaScript, Python, Ruby, PHP, Perl, Java, Scala, C#, C, C++
Flexible schema
MongoDB Installation
Get the MongoDB distributions by platform and version from
http://www.mongodb.org/downloads
MongoDB requires a data folder to store its files. The default location for the MongoDB data directory is C:\data\db (Windows) or /data/db (Linux). The directory must exist before mongod starts; use --dbpath to point mongod at a different one.
Running MongoDB
Windows: C:\mongodb\bin\mongod.exe --dbpath d:\test\data
Linux: ./bin/mongod --dbpath /data/mongodb
MongoDB Package Components - 1
Core Processes: mongod, mongos, mongo
Binary Import and Export Tools: mongodump, mongorestore, bsondump, mongooplog
MongoDB Package Components - 2
Data Import and Export Tools: mongoimport, mongoexport
Diagnostic Tools: mongostat, mongotop, mongosniff, mongoperf
GridFS: mongofiles
Mongo Shell
Embedded JavaScript Interpreter: vars / functions / data structs + types; SpiderMonkey / V8
Global Functions and Objects: ObjectId("..."), new Date(), Object.bsonsize()
MongoDB driver exposed: db["collection"].find/count/update; short-hand for collections
JSON-like stuff: doesn't require quoted keys; don't copy and paste too much
Terminology
Core MongoDB Operations (CRUD) - 1
CREATE
insert() - the primary method to insert a document or documents into a MongoDB collection
db.studs.insert({_id : 1, name : "Sachin", score : 110})
save() - performs an insert if the document to save does not contain the _id field
db.studs.save({name : "Sachin", score : 110})
READ
find() - returns a cursor over the documents that match the query
db.collection.find( <query>, <projection> )
findOne() - selects a single document from a collection and returns that document
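The following Node.js sketch mimics, on an in-memory array, the insert(), save(), find() and findOne() behavior described above. It is not how mongod works: the `MiniCollection` class and its generated `_id` strings are hypothetical stand-ins (a real server generates ObjectId values, and find() returns a cursor rather than an array), and matching is limited to equality on top-level fields.

```javascript
// Hypothetical in-memory stand-in for a collection; it only mimics the
// shell semantics described above.
class MiniCollection {
  constructor() {
    this.docs = [];
    this.nextId = 1; // stand-in for ObjectId generation
  }

  // insert(): add a new document; a duplicate _id is rejected,
  // as MongoDB's unique index on _id would do.
  insert(doc) {
    const d = { ...doc };
    if (d._id === undefined) d._id = `gen-${this.nextId++}`;
    if (this.docs.some(x => x._id === d._id)) {
      throw new Error(`duplicate key: _id ${d._id}`);
    }
    this.docs.push(d);
    return d._id;
  }

  // save(): insert if there is no _id; otherwise replace the document
  // with the same _id, or insert it if no such document exists.
  save(doc) {
    if (doc._id === undefined) return this.insert(doc);
    const i = this.docs.findIndex(x => x._id === doc._id);
    if (i === -1) return this.insert(doc);
    this.docs[i] = { ...doc };
    return doc._id;
  }

  // find(query, projection): equality match on top-level fields;
  // projection is inclusion-only and keeps _id unless it is set to 0.
  find(query = {}, projection) {
    const hits = this.docs.filter(d =>
      Object.entries(query).every(([k, v]) => d[k] === v));
    if (!projection) return hits.map(d => ({ ...d }));
    return hits.map(d => {
      const out = {};
      if (projection._id !== 0) out._id = d._id;
      for (const [k, on] of Object.entries(projection)) {
        if (k !== "_id" && on && k in d) out[k] = d[k];
      }
      return out;
    });
  }

  // findOne(): the first matching document, or null.
  findOne(query = {}, projection) {
    const hits = this.find(query, projection);
    return hits.length ? hits[0] : null;
  }
}

const studs = new MiniCollection();
studs.insert({ _id: 1, name: "Sachin", score: 110 });
studs.save({ name: "Rahul", score: 95 });           // no _id: inserted
studs.save({ _id: 1, name: "Sachin", score: 120 }); // existing _id: replaced
console.log(studs.find({}, { name: 1 }));
console.log(studs.findOne({ name: "Sachin" }));
```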
Core MongoDB Operations (CRUD) - 2
UPDATE
update() - updates a single document by default; with the multi option, update() updates all documents that match the query criteria
Update Operators
Fields - $inc, $rename, $set, $unset
Array - $addToSet, $pop, $pullAll, $pull, $push
save() - performs a special type of update(): if a document with the same _id exists, it is replaced; otherwise the document is inserted
Examples
db.bios.update( { _id: 3 }, { $unset: { birth: 1 } }, { multi: true } )
db.bios.update( { _id: 1 }, { $set: { 'contribs.1': 'ALGOL 58' } } )
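To show what the update operators do to a single document, here is a plain JavaScript sketch (Node.js, no server) of $set, $unset, $inc, $push and $addToSet, including a dotted path like 'contribs.1' from the example above. The `applyUpdate` and `resolve` helpers and the sample `bio` values are hypothetical; the sketch ignores the multi option, $rename, $pop, $pull and $pullAll, and does not model $unset on array elements.

```javascript
// Walk a dotted path such as "contribs.1" and return [parent, lastKey].
// When create is true, missing intermediate objects are created (as $set
// does); otherwise a missing path returns [undefined, undefined].
function resolve(doc, path, create) {
  const parts = path.split(".");
  let cur = doc;
  for (const p of parts.slice(0, -1)) {
    if (cur[p] === undefined) {
      if (!create) return [undefined, undefined];
      cur[p] = {};
    }
    cur = cur[p];
  }
  return [cur, parts[parts.length - 1]];
}

// Apply an update document like { $set: {...}, $inc: {...} } in place.
function applyUpdate(doc, update) {
  for (const [op, fields] of Object.entries(update)) {
    for (const [path, value] of Object.entries(fields)) {
      const [parent, key] = resolve(doc, path, op !== "$unset");
      if (parent === undefined) continue; // $unset of a missing path: no-op
      switch (op) {
        case "$set":
          parent[key] = value;
          break;
        case "$unset":
          delete parent[key];
          break;
        case "$inc": // a missing field is treated as 0
          parent[key] = (parent[key] === undefined ? 0 : parent[key]) + value;
          break;
        case "$push":
          if (parent[key] === undefined) parent[key] = [];
          parent[key].push(value);
          break;
        case "$addToSet": // scalar values only in this sketch
          if (parent[key] === undefined) parent[key] = [];
          if (!parent[key].includes(value)) parent[key].push(value);
          break;
        default:
          throw new Error(`operator not modeled in this sketch: ${op}`);
      }
    }
  }
  return doc;
}

// Hypothetical sample document.
const bio = { _id: 1, birth: "1950-01-01", contribs: ["Fortran", "ALGOL", "FP"], views: 10 };
applyUpdate(bio, { $set: { "contribs.1": "ALGOL 58" } }); // array element by index
applyUpdate(bio, { $unset: { birth: 1 } });
applyUpdate(bio, { $inc: { views: 1 }, $push: { contribs: "Backus-Naur Form" } });
applyUpdate(bio, { $addToSet: { contribs: "FP" } });      // already present: unchanged
console.log(bio);
```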
Core MongoDB Operations (CRUD) - 3
DELETE
remove() - deletes documents from a collection
db.collection.remove( <query>, <justOne> )
Remove all documents
db.bios.remove( {} )
Remove a single document that matches a condition
db.bios.remove( { turing: true }, 1 )
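The effect of the justOne argument can be sketched on an in-memory array in plain JavaScript (Node.js). The `remove` function and the sample `bios` array are hypothetical; matching is equality on top-level fields only, and returning a count is this sketch's own choice, not the shell's return value.

```javascript
// Hypothetical sketch of remove(<query>, <justOne>) on an in-memory array.
function remove(docs, query = {}, justOne = false) {
  const matches = d => Object.entries(query).every(([k, v]) => d[k] === v);
  if (justOne) {
    // justOne: delete only the first matching document
    const i = docs.findIndex(matches);
    if (i === -1) return 0;
    docs.splice(i, 1);
    return 1;
  }
  // Otherwise delete every match; iterate backwards so splice() does not
  // skip the element that slides into the removed slot.
  let removed = 0;
  for (let i = docs.length - 1; i >= 0; i--) {
    if (matches(docs[i])) {
      docs.splice(i, 1);
      removed++;
    }
  }
  return removed;
}

const bios = [
  { _id: 1, turing: true },
  { _id: 2, turing: false },
  { _id: 3, turing: true }
];

remove(bios, { turing: true }, 1); // like db.bios.remove( { turing: true }, 1 )
console.log(bios.map(b => b._id)); // [ 2, 3 ]
remove(bios, {});                  // like db.bios.remove( {} ): empty query matches all
console.log(bios.length);          // 0
```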
Data Modeling
Data in MongoDB has a flexible schema.
Collections do not enforce document structure: documents in the same collection do not need to have the same set of fields or structure, and common fields in a collection's documents may hold different types of data.
MongoDB does not support:
• Joins across multiple collections
• Transactions across multiple documents
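Because nothing enforces a shape, one collection can hold documents whose common field differs in type or is missing entirely. The plain JavaScript sketch below (Node.js, hypothetical `people` data and helper names) shows two consequences: an equality match in MongoDB also matches an array field that contains the value, which `eqMatch` imitates for top-level scalars only, and application code such as `phonesOf` has to handle every variant itself.

```javascript
// Hypothetical documents that could all live in one "people" collection.
const people = [
  { _id: 1, name: "Asha", phone: "555-0100" },              // phone: string
  { _id: 2, name: "Ben", phone: ["555-0101", "555-0102"] },  // phone: array
  { _id: 3, name: "Chen", email: "chen@example.com" }        // no phone field
];

// Imitates how an equality query like { phone: "555-0101" } treats arrays:
// it matches if the field equals the value or is an array containing it.
// Simplified: top-level fields and scalar values only.
function eqMatch(doc, field, value) {
  const v = doc[field];
  return Array.isArray(v) ? v.includes(value) : v === value;
}

// Application code must cope with every shape the collection may hold.
function phonesOf(doc) {
  if (doc.phone === undefined) return [];
  return Array.isArray(doc.phone) ? doc.phone : [doc.phone];
}

const matched = people.filter(d => eqMatch(d, "phone", "555-0101")).map(d => d.name);
console.log(matched);
console.log(people.map(phonesOf));
```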
Data Modeling Considerations
• The inherent properties and requirements of the application objects and the relationships between them
• MongoDB data models must also reflect how data will grow and change over time, and the kinds of queries your application will perform
These considerations and requirements force a number of multi-factored decisions:
• normalization and de-normalization
• indexing strategy
• representation of data in arrays in BSON
Data Modeling Decisions
Data modeling decisions involve determining how to structure the documents to model the data effectively.
Embedding - To de-normalize data, store two related pieces of data in a single document.
Referencing - To normalize data, store references between two documents to indicate a relationship between the data represented in each document.
Atomicity - MongoDB only provides atomic operations at the level of a single document.
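Embedding and referencing can be compared with plain JavaScript objects. This is a sketch under assumptions: the blog-style `embeddedPost`, `posts` and `users` shapes and the `withAuthors` helper are hypothetical, and the in-memory Map lookup only stands in for the second query an application would issue, since the server does not join collections.

```javascript
// Embedding (de-normalized): comments live inside the post document, so a
// single read returns everything and a single-document update is atomic.
const embeddedPost = {
  _id: "p1",
  title: "Hello MongoDB",
  author: "sachin",
  comments: [
    { by: "rahul", text: "Nice post" },
    { by: "asha", text: "Thanks!" }
  ]
};

// Referencing (normalized): the post stores only the author's _id and the
// user lives in a separate collection, so user data is not duplicated.
const users = [{ _id: "u1", name: "Sachin", email: "sachin@example.com" }];
const posts = [{ _id: "p1", title: "Hello MongoDB", authorId: "u1" }];

// With no server-side join, the application resolves references itself.
// Unresolvable references become null.
function withAuthors(posts, users) {
  const byId = new Map(users.map(u => [u._id, u]));
  return posts.map(p => ({ ...p, author: byId.get(p.authorId) || null }));
}

console.log(withAuthors(posts, users));
```

The trade-off matches the slide: embedding duplicates data but keeps related data under single-document atomicity, while referencing avoids duplication at the cost of an extra lookup and no atomicity across the two documents.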
Aggregation
MongoDB introduced the aggregation framework, which provides a powerful and flexible set of tools for many data aggregation tasks without having to use map-reduce.
While map-reduce (db.collection.mapReduce()) is powerful, it is often more difficult than necessary for simple aggregation tasks such as totaling or averaging field values.
Pipeline Operators and Indexes
$match, $sort, $limit, $skip, $project, $unwind, $group
db.articles.aggregate(
  { $project : { author : 1, tags : 1 } },
  { $unwind : "$tags" },
  { $group : { _id : { tags : "$tags" }, authors : { $addToSet : "$author" } } }
)
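To show what the pipeline above computes, here is a pure JavaScript sketch (Node.js, no server) that applies the same three stages by hand to a few hypothetical articles. It mirrors the semantics of $project, $unwind and $group with $addToSet; it is not how the aggregation framework is implemented, and MongoDB does not guarantee the order of $group results or of $addToSet arrays.

```javascript
// Hypothetical sample articles.
const articles = [
  { _id: 1, author: "bob", title: "A", tags: ["fun", "good"] },
  { _id: 2, author: "alice", title: "B", tags: ["good", "nasty"] },
  { _id: 3, author: "bob", title: "C", tags: ["fun"] }
];

// $project: keep author and tags (_id is kept by default).
const projected = articles.map(({ _id, author, tags }) => ({ _id, author, tags }));

// $unwind: emit one document per element of the tags array.
const unwound = projected.flatMap(doc => doc.tags.map(tag => ({ ...doc, tags: tag })));

// $group by tag, with $addToSet collecting each distinct author once.
const groups = new Map();
for (const doc of unwound) {
  if (!groups.has(doc.tags)) groups.set(doc.tags, new Set());
  groups.get(doc.tags).add(doc.author);
}
const result = [...groups].map(([tag, authors]) => ({
  _id: { tags: tag },
  authors: [...authors]
}));

console.log(JSON.stringify(result));
```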
Blog Project with MongoDB
A blogger application with the following functionality: Signup, New Post, Login, Logout
It uses Python, the PyMongo driver and MongoDB
Questions?
Thank You