MongoDB Berlin Aggregation

Aggregation

New framework in MongoDB

Alvin Richards

Technical Director, EMEA

alvin@10gen.com

@jonnyeight

What problem are we solving?

• Map/Reduce can be used for aggregation…
  • Currently being used for totaling, averaging, etc.
• Map/Reduce is a big hammer
  • Simpler tasks should be easier
• Shouldn’t need to write JavaScript
  • Avoid the overhead of the JavaScript engine
• We’re seeing requests for help in handling complex documents
  • Select only matching subdocuments or arrays

How will we solve the problem?

• New aggregation framework
  • Declarative framework (no JavaScript)
  • Describe a chain of operations to apply
  • Expression evaluation
    • Return computed values
  • Framework: new operations added easily
  • C++ implementation

Aggregation - Pipelines

• Aggregation requests specify a pipeline
• A pipeline is a series of operations
• Documents in a collection are passed through the pipeline to produce a result
  • Analogous to a Unix shell pipeline, e.g. ps -ef | grep -i mongod
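The pipe analogy can be modelled in plain JavaScript: each stage is a function from an array of documents to a new array, and the pipeline feeds the output of one stage into the next. This is a minimal sketch for intuition only; `runPipeline` and the two toy stages are hypothetical helpers, not part of MongoDB or any driver (the real framework runs in C++ inside the server).

```javascript
// Hypothetical helper: apply each stage (docs -> docs) in order,
// passing the output of one stage to the next, like a Unix pipe.
function runPipeline(docs, stages) {
  return stages.reduce((current, stage) => stage(current), docs);
}

// Two toy stages (hypothetical, for illustration only).
const onlyMongod = (docs) => docs.filter((d) => /mongod/i.test(d.cmd)); // like grep -i mongod
const namesOnly = (docs) => docs.map((d) => ({ cmd: d.cmd }));           // keep one field

// Made-up process list standing in for the output of ps -ef.
const processes = [
  { pid: 1, cmd: "init" },
  { pid: 42, cmd: "mongod --dbpath /data/db" },
  { pid: 43, cmd: "bash" },
];

console.log(runPipeline(processes, [onlyMongod, namesOnly]));
// [ { cmd: 'mongod --dbpath /data/db' } ]
```

An empty stage list returns the input unchanged, just as a pipeline with no operations would pass every document through.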

Example - twitter

    {
      "_id" : ObjectId("4f47b268fb1c80e141e9888c"),
      "user" : {
        "friends_count" : 73,
        "location" : "Brazil",
        "screen_name" : "Bia_cunha1",
        "name" : "Beatriz Helena Cunha",
        "followers_count" : 102,
        …
      },
      …
    }

• Find the number of followers and friends by location

Example - twitter

    db.tweets.aggregate(
      // Predicate
      { $match: { "user.friends_count": { $gt: 0 },
                  "user.followers_count": { $gt: 0 } } },
      // Parts of the document you want to project
      { $project: { location: "$user.location",
                    friends: "$user.friends_count",
                    followers: "$user.followers_count" } },
      // Function to apply to the result set
      { $group: { _id: "$location",
                  friends: { $sum: "$friends" },
                  followers: { $sum: "$followers" } } }
    );

Example - twitter

    {
      "result" : [
        { "_id" : "Far Far Away", "friends" : 344, "followers" : 789 },
        ...
      ],
      "ok" : 1
    }
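To make the three stages of the twitter example concrete, here is a minimal in-memory model of them in plain JavaScript. It is a sketch under stated assumptions: `getPath`, `match`, `project` and `group` are hypothetical helpers that support only what this example uses ($gt in $match, "$a.b" field paths in $project, $sum in $group), and the sample tweets are made up rather than real data.

```javascript
// Resolve a dotted path such as "user.location" against a document.
function getPath(doc, path) {
  return path.split(".").reduce((v, key) => (v == null ? undefined : v[key]), doc);
}

// $match (subset): keep documents where every { path: { $gt: n } } condition holds.
function match(query) {
  return (docs) =>
    docs.filter((doc) =>
      Object.entries(query).every(([path, cond]) => getPath(doc, path) > cond.$gt)
    );
}

// $project (subset): build a new document whose values are "$path" references.
function project(spec) {
  return (docs) =>
    docs.map((doc) => {
      const out = {};
      for (const [field, ref] of Object.entries(spec)) {
        out[field] = getPath(doc, ref.slice(1)); // drop the leading "$"
      }
      return out;
    });
}

// $group (subset): bucket by the _id field path and $sum the listed fields.
function group(spec) {
  return (docs) => {
    const buckets = new Map();
    for (const doc of docs) {
      const key = getPath(doc, spec._id.slice(1));
      if (!buckets.has(key)) {
        const init = { _id: key };
        for (const field of Object.keys(spec)) if (field !== "_id") init[field] = 0;
        buckets.set(key, init);
      }
      const acc = buckets.get(key);
      for (const [field, op] of Object.entries(spec)) {
        if (field !== "_id") acc[field] += getPath(doc, op.$sum.slice(1));
      }
    }
    return [...buckets.values()];
  };
}

// Made-up sample tweets (not real data).
const tweets = [
  { user: { location: "Brazil", friends_count: 73, followers_count: 102 } },
  { user: { location: "Brazil", friends_count: 10, followers_count: 5 } },
  { user: { location: "Berlin", friends_count: 0, followers_count: 50 } }, // filtered out by $match
];

const stages = [
  match({ "user.friends_count": { $gt: 0 }, "user.followers_count": { $gt: 0 } }),
  project({ location: "$user.location", friends: "$user.friends_count", followers: "$user.followers_count" }),
  group({ _id: "$location", friends: { $sum: "$friends" }, followers: { $sum: "$followers" } }),
];
const result = stages.reduce((docs, stage) => stage(docs), tweets);
console.log(result); // [ { _id: 'Brazil', friends: 83, followers: 107 } ]
```

The stage specifications are written exactly as in the shell example; only the engine underneath is a toy.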

Pipeline Operations

• $match
  • Uses a query predicate (like .find({…})) as a filter
• $project
  • Uses a sample document to determine the shape of the result (similar to .find()’s optional projection argument)
  • This can include computed values
• $unwind
  • Hands out array elements one at a time
• $group
  • Aggregates items into buckets defined by a key

Pipeline Operations (continued)

• $sort
  • Sort documents
• $limit
  • Only allow the specified number of documents to pass
• $skip
  • Skip over the specified number of documents
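$skip and $limit behave like slices of the document stream. The sketch below models them as stage factories over an array; `skip` and `limit` are hypothetical helper names, not a driver API.

```javascript
// Hypothetical stage factories modelling $skip and $limit on an array of documents.
const skip = (n) => (docs) => docs.slice(n);      // drop the first n documents
const limit = (n) => (docs) => docs.slice(0, n);  // pass at most n documents

const docs = [{ n: 1 }, { n: 2 }, { n: 3 }, { n: 4 }, { n: 5 }];

// Like { $skip: 1 } followed by { $limit: 2 }: drop one document, pass the next two.
const page = [skip(1), limit(2)].reduce((acc, stage) => stage(acc), docs);
console.log(page); // [ { n: 2 }, { n: 3 } ]
```

Because stages run in order, swapping them changes the result: $limit 2 followed by $skip 1 yields only `{ n: 2 }`.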

Computed Expressions

• Available in $project operations
• Prefix expression language
  • $add: ["$field1", "$field2"]
  • $ifNull: ["$field1", "$field2"]
  • Nesting: $add: ["$field1", { $ifNull: ["$field2", "$field3"] }]
  • Other functions:
    • $divide, $mod, $multiply
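The prefix expression language can be read as a small recursive evaluator: a "$field" string is a field reference, an object `{ $op: [args] }` applies an operator to its evaluated arguments, and anything else is a literal. The sketch below is hypothetical (`evaluate` is not a MongoDB function) and, for brevity, resolves only top-level fields.

```javascript
// Hypothetical evaluator for a subset of the prefix expression language.
function evaluate(expr, doc) {
  if (typeof expr === "string" && expr.startsWith("$")) {
    return doc[expr.slice(1)]; // field reference (top-level fields only)
  }
  if (expr === null || typeof expr !== "object" || Array.isArray(expr)) {
    return expr; // literal value
  }
  const [op] = Object.keys(expr);
  const args = expr[op];
  switch (op) {
    case "$ifNull": {
      // First argument unless it is null or missing, otherwise the second.
      const value = evaluate(args[0], doc);
      return value == null ? evaluate(args[1], doc) : value;
    }
    case "$add":
      return args.map((a) => evaluate(a, doc)).reduce((x, y) => x + y, 0);
    case "$multiply":
      return args.map((a) => evaluate(a, doc)).reduce((x, y) => x * y, 1);
    case "$divide":
      return evaluate(args[0], doc) / evaluate(args[1], doc);
    case "$mod":
      return evaluate(args[0], doc) % evaluate(args[1], doc);
    default:
      throw new Error(`unsupported operator ${op}`);
  }
}

// Made-up document: field2 is null, so $ifNull falls back to field3.
const doc = { field1: 10, field2: null, field3: 5 };
console.log(evaluate({ $add: ["$field1", { $ifNull: ["$field2", "$field3"] }] }, doc)); // 15
```

Nesting works because each argument is itself evaluated with the same function before the operator is applied.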

Computed Expressions (continued)

• String functions
  • $toUpper, $toLower, $substr
• Date field extraction
  • $year, $month, $dayOfMonth, $hour, …
• Date arithmetic
• $ifNull
• Ternary conditional ($cond)
  • Return one of two values based on a predicate
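The string, date-extraction and ternary operators can be sketched the same way. This is a hypothetical evaluator (`evalExpr` and `resolve` are illustrative names, and the sample document is made up); it treats dates as UTC, and its $substr counts JavaScript string characters, whereas the server's $substr works on bytes.

```javascript
// "$field" strings are field references; other non-object values are literals.
const resolve = (arg, doc) =>
  typeof arg === "string" && arg.startsWith("$") ? doc[arg.slice(1)] : arg;

// Hypothetical evaluator for a few string, date and conditional operators.
function evalExpr(expr, doc) {
  const [op] = Object.keys(expr);
  // Normalise to an argument list; nested operator objects are evaluated first.
  const args = [].concat(expr[op]).map((a) =>
    a !== null && typeof a === "object" && !Array.isArray(a) && !(a instanceof Date)
      ? evalExpr(a, doc)
      : resolve(a, doc)
  );
  switch (op) {
    case "$toUpper": return String(args[0]).toUpperCase();
    case "$toLower": return String(args[0]).toLowerCase();
    case "$substr": return String(args[0]).substring(args[1], args[1] + args[2]); // [string, start, length]
    case "$year": return args[0].getUTCFullYear();
    case "$month": return args[0].getUTCMonth() + 1; // 1 = January
    case "$dayOfMonth": return args[0].getUTCDate();
    case "$hour": return args[0].getUTCHours();
    case "$gt": return args[0] > args[1];
    // Ternary conditional: [predicate, valueIfTrue, valueIfFalse].
    // Both branches are evaluated eagerly in this sketch.
    case "$cond": return args[0] ? args[1] : args[2];
    default: throw new Error(`unsupported operator ${op}`);
  }
}

// Made-up sample document.
const sample = {
  name: "Ada Lovelace",
  created: new Date(Date.UTC(2012, 1, 24, 15, 30)), // 24 Feb 2012, 15:30 UTC
  followers: 102,
};

console.log(evalExpr({ $toUpper: "$name" }, sample));        // ADA LOVELACE
console.log(evalExpr({ $substr: ["$name", 0, 3] }, sample)); // Ada
console.log(evalExpr({ $month: "$created" }, sample));       // 2
console.log(evalExpr({ $cond: [{ $gt: ["$followers", 100] }, "popular", "regular"] }, sample)); // popular
```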

Projections

• $project can reshape results
  • Include or exclude fields
  • Computed fields
    • Arithmetic expressions
  • Pull fields from nested documents to the top
  • Push fields from the top down into new virtual documents
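As an illustration of these reshaping options, the sketch below writes a $project specification for the twitter document as a plain JavaScript object, alongside a hypothetical `reshape` function that computes the same output by hand. It assumes, as the slide describes, that nested field specifications in $project build a new sub-document; the sample `_id` is made up.

```javascript
// A $project specification (mongo shell syntax, held here as a plain object).
const projectStage = {
  $project: {
    screen_name: "$user.screen_name", // pull a nested field to the top
    ratio: { $divide: ["$user.followers_count", "$user.friends_count"] }, // computed field
    stats: {                          // push fields down into a new sub-document
      friends: "$user.friends_count",
      followers: "$user.followers_count",
    },
  },
};

// Hypothetical plain-JavaScript equivalent of what that stage produces for one document.
function reshape(doc) {
  return {
    _id: doc._id, // $project keeps _id unless it is explicitly excluded
    screen_name: doc.user.screen_name,
    ratio: doc.user.followers_count / doc.user.friends_count,
    stats: { friends: doc.user.friends_count, followers: doc.user.followers_count },
  };
}

const tweet = { _id: 1, user: { screen_name: "Bia_cunha1", friends_count: 73, followers_count: 102 } };
console.log(reshape(tweet).stats); // { friends: 73, followers: 102 }
```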

Unwinding

• $unwind can “stream” arrays
  • Array values are doled out one at a time in the context of their surrounding documents
  • Makes it possible to filter out elements before returning
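A minimal model of $unwind: each input document is copied once per element of the named array, with the array field replaced by that single element, so later stages can filter individual elements. `unwind` is a hypothetical helper, handles top-level fields only, and (as a simplification) emits nothing for documents whose field is not an array.

```javascript
// Hypothetical model of $unwind on a top-level array field.
function unwind(path) {
  return (docs) =>
    docs.flatMap((doc) =>
      (Array.isArray(doc[path]) ? doc[path] : []) // non-array or missing: no output here
        .map((value) => ({ ...doc, [path]: value })) // one copy per element, in context
    );
}

// Made-up sample documents.
const posts = [
  { _id: 1, tags: ["mongodb", "aggregation"] },
  { _id: 2, tags: ["berlin"] },
];

// Unwind, then filter out individual elements before returning them.
const unwound = unwind("tags")(posts).filter((d) => d.tags !== "berlin");
console.log(unwound);
```

The original documents are not modified; each emitted document is a shallow copy carrying one array element.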

Grouping

• $group aggregation expressions
  • Define a grouping key as the _id of the result
  • Total grouped column values: $sum
  • Average grouped column values: $avg
  • Collect grouped column values in an array or set: $push, $addToSet
  • Other functions
    • $min, $max, $first, $last
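The grouping expressions can be sketched as "collect each bucket's values, then apply an accumulator". Below, `group`, `field` and `accumulators` are hypothetical names, keys and values are top-level "$field" references only, and the sales data is made up. Note that $first and $last depend on the order in which documents arrive, so they are most meaningful after a $sort.

```javascript
// Hypothetical accumulators, each reducing a bucket's values to one result.
const accumulators = {
  $sum: (values) => values.reduce((a, b) => a + b, 0),
  $avg: (values) => values.reduce((a, b) => a + b, 0) / values.length,
  $push: (values) => values,                   // every value, duplicates kept
  $addToSet: (values) => [...new Set(values)], // distinct values
  $min: (values) => Math.min(...values),
  $max: (values) => Math.max(...values),
  $first: (values) => values[0],               // input order matters
  $last: (values) => values[values.length - 1],
};

const field = (ref) => ref.slice(1); // "$name" -> "name"

// Hypothetical model of $group: spec = { _id: "$key", out: { $acc: "$value" }, ... }
function group(spec) {
  return (docs) => {
    // Collect the documents of each bucket, keyed by the _id field value.
    const buckets = new Map();
    for (const doc of docs) {
      const key = doc[field(spec._id)];
      if (!buckets.has(key)) buckets.set(key, []);
      buckets.get(key).push(doc);
    }
    // Apply each output field's accumulator to the bucket's values.
    return [...buckets].map(([key, members]) => {
      const out = { _id: key };
      for (const [name, expr] of Object.entries(spec)) {
        if (name === "_id") continue;
        const [op] = Object.keys(expr);
        out[name] = accumulators[op](members.map((d) => d[field(expr[op])]));
      }
      return out;
    });
  };
}

// Made-up sample data.
const sales = [
  { city: "Berlin", amount: 10 },
  { city: "Berlin", amount: 30 },
  { city: "Munich", amount: 5 },
  { city: "Berlin", amount: 10 },
];

console.log(group({
  _id: "$city",
  total: { $sum: "$amount" },
  average: { $avg: "$amount" },
  amounts: { $addToSet: "$amount" },
})(sales));
```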

Sorting

• $sort can sort documents
  • Sort specifications are the same as for .sort() on queries today, e.g., $sort:{ key1: 1, key2: -1, …}
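A sort specification such as `{ key1: 1, key2: -1 }` means "ascending on key1, and among equal key1 values, descending on key2". The sketch turns such a specification into a JavaScript comparator; `comparatorFor` is a hypothetical helper, and it compares with plain `<`/`>`, which ignores BSON's cross-type ordering rules.

```javascript
// Hypothetical comparator built from a sort specification.
// Keys are compared in specification order; 1 = ascending, -1 = descending.
function comparatorFor(spec) {
  const keys = Object.entries(spec);
  return (a, b) => {
    for (const [key, direction] of keys) {
      if (a[key] < b[key]) return -direction;
      if (a[key] > b[key]) return direction;
    }
    return 0; // equal on every key
  };
}

// Made-up sample rows.
const rows = [
  { key1: "b", key2: 1 },
  { key1: "a", key2: 1 },
  { key1: "a", key2: 2 },
];

// Copy before sorting so the input array is left untouched.
const sorted = [...rows].sort(comparatorFor({ key1: 1, key2: -1 }));
console.log(sorted);
```

The result orders the two "a" rows first (key2 2 before 1), then the "b" row.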

Demo

Demo files are at https://gist.github.com/2036709

Usage Tips

• Use $match in a pipeline as early as possible
  • The query optimizer can then be used to choose an index and avoid scanning the entire collection
• Use $sort in a pipeline as early as possible
  • The query optimizer can sometimes be used to choose an index to scan instead of sorting the result

Driver Support

• Initial version is a command
  • For any language, build the command as a database object (a document) and execute it
  • { aggregate : <collection>, pipeline : […] }
• Beware of the 16MB result size limit (the whole result is returned as a single document)
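Because the initial version is a plain command, any language only needs to build the command document. The sketch below does that in JavaScript; `buildAggregateCommand` and `probablyTooLarge` are hypothetical helpers, "tweets" is the collection from the earlier example, and the size check measures JSON length, which only approximates the BSON size the server actually enforces.

```javascript
// Hypothetical helper: build the aggregate command document.
// The command name comes first; the pipeline is an array of stage documents.
function buildAggregateCommand(collection, pipeline) {
  if (!Array.isArray(pipeline)) throw new TypeError("pipeline must be an array of stages");
  return { aggregate: collection, pipeline };
}

const command = buildAggregateCommand("tweets", [
  { $match: { "user.friends_count": { $gt: 0 } } },
  { $group: { _id: "$user.location", friends: { $sum: "$user.friends_count" } } },
]);

// In the mongo shell, a document like this can be passed to db.runCommand(command).
console.log(JSON.stringify(command));

// Rough guard for the 16MB limit on the single result document.
// JSON byte length is only an estimate of the BSON size.
const MAX_RESULT_BYTES = 16 * 1024 * 1024;
const probablyTooLarge = (resultDoc) =>
  Buffer.byteLength(JSON.stringify(resultDoc), "utf8") > MAX_RESULT_BYTES;
```

JavaScript objects keep insertion order for string keys, so `aggregate` stays the first field, which command documents require.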

When is this being released?

• Now!
  • 2.1.0 - unstable
  • 2.2.0 - stable (soon)

Sharding support

• Initial release supports sharding
• Mongos analyzes the pipeline
  • Forwards operations up to the first $group or $sort to the shards
  • Combines the shard results and returns them
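The split that mongos makes can be illustrated by cutting the pipeline at the first $group or $sort. This hypothetical `splitPipeline` only shows where the cut falls, as the slide describes; it does not model how mongos actually merges the shard results.

```javascript
// Hypothetical sketch: stages before the first $group or $sort go to every shard;
// that stage and everything after it run on mongos over the combined results.
function splitPipeline(pipeline) {
  const cut = pipeline.findIndex((stage) => "$group" in stage || "$sort" in stage);
  if (cut === -1) {
    // No $group or $sort: the shards run everything and mongos combines the output.
    return { shardPart: pipeline, mergePart: [] };
  }
  return { shardPart: pipeline.slice(0, cut), mergePart: pipeline.slice(cut) };
}

const { shardPart, mergePart } = splitPipeline([
  { $match: { "user.friends_count": { $gt: 0 } } },
  { $project: { location: "$user.location", friends: "$user.friends_count" } },
  { $group: { _id: "$location", friends: { $sum: "$friends" } } },
]);
console.log(shardPart.map((s) => Object.keys(s)[0])); // [ '$match', '$project' ]
console.log(mergePart.map((s) => Object.keys(s)[0])); // [ '$group' ]
```

Putting $match first, as the usage tips suggest, also means the filtering happens on the shards before any data reaches mongos.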

Pipeline Operations – Future

• $out
  • Saves the document stream to a collection
  • Similar to M/R $out, but with sharded output
  • Functions like a tee, so that intermediate results can be saved
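The "tee" behaviour proposed for $out can be pictured as a stage that copies the documents passing through it into a target and hands them on unchanged. In this hypothetical sketch `tee` is an illustrative helper and the target "collection" is just an in-memory array; it says nothing about how the server would implement $out.

```javascript
// Hypothetical model of a tee stage: save the current stream, pass it on unchanged.
function tee(target) {
  return (docs) => {
    target.push(...docs); // stands in for writing to a collection
    return docs;
  };
}

const saved = [];
const stages = [
  (docs) => docs.filter((d) => d.n % 2 === 0), // keep even values
  tee(saved),                                  // save the intermediate result
  (docs) => docs.map((d) => ({ n: d.n * 10 })), // later stages keep going
];

const finalResult = stages.reduce((acc, stage) => stage(acc), [{ n: 1 }, { n: 2 }, { n: 4 }]);
console.log(saved);       // [ { n: 2 }, { n: 4 } ]
console.log(finalResult); // [ { n: 20 }, { n: 40 } ]
```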

Documentation, Bug Reports

• http://www.mongodb.org/display/DOCS/Aggregation+Framework
• https://jira.mongodb.org/browse/SERVER/component/10840

@mongodb

conferences, appearances, and meetups

http://www.10gen.com/events

http://bit.ly/mongoE Facebook | Twitter | LinkedIn

http://linkd.in/joinmongo

download at mongodb.org

alvin@10gen.com
