Operational Intelligence with MongoDB. Edouard Servan-Schreiber, Ph.D., Director for Solution Architecture. October 11th, 2012


Page 1: Operational Intelligence with MongoDB Webinar

Operational Intelligence with MongoDB

Edouard Servan-Schreiber, Ph.D.
Director for Solution Architecture

October 11th, 2012

Page 2: Operational Intelligence with MongoDB Webinar

The goal

[Diagram: three boxes labeled "Data Source" and one box labeled "Real Time Analytics Engine"]

Page 3: Operational Intelligence with MongoDB Webinar


Sample Customers

Page 4: Operational Intelligence with MongoDB Webinar

Solution goals

High write volume
• Lots of data sources
• Lots of data from each source

Dynamic queries
• Users can drill down into data

Fast queries
• Lots of clients
• High request rate

Minimize delay between collection & query
• How long before an event appears in a report?

Page 5: Operational Intelligence with MongoDB Webinar

Systems Architecture

[Diagram: several boxes labeled "Data Sources", annotated as follows]

• Asynchronous writes
• Upserts avoid unnecessary reads
• Writes buffered in RAM and flushed to disk in bulk
• Spread writes over multiple shards

Page 6: Operational Intelligence with MongoDB Webinar

Sample data

Original Event Data

127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326 "http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"

As BSON

doc = {
    _id: ObjectId('4f442120eb03305789000000'),
    host: "127.0.0.1",
    time: ISODate("2000-10-10T20:55:36Z"),
    path: "/apache_pb.gif",
    referer: "http://www.example.com/start.html",
    user_agent: "Mozilla/4.08 [en] (Win98; I ;Nav)"
}

Insert to MongoDB

db.logs.insert( doc )
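The step from raw log line to document can be sketched in Python. This is a minimal sketch, not the presenter's code: the regular expression, `LOG_PATTERN`, `SAMPLE_LINE`, and `parse_log_line` are hypothetical names, and the pattern assumes the Apache "combined" log format shown above. The `_id` field is left out on the assumption that the driver generates it on insert.

```python
import re
from datetime import datetime, timezone

# Assumed layout: host, identity, user, [time], "request", status, size,
# "referer", "user agent" (Apache combined log format).
LOG_PATTERN = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"\S+ (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

SAMPLE_LINE = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" '
    '"Mozilla/4.08 [en] (Win98; I ;Nav)"'
)


def parse_log_line(line):
    """Return a dict shaped like the slide's document, or None if no match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    fields = match.groupdict()
    # Parse the local timestamp with its offset, then normalise it to UTC.
    local_time = datetime.strptime(fields["time"], "%d/%b/%Y:%H:%M:%S %z")
    fields["time"] = local_time.astimezone(timezone.utc)
    return fields


doc = parse_log_line(SAMPLE_LINE)
print(doc["time"].isoformat())  # 2000-10-10T20:55:36+00:00
```

Storing the timestamp in UTC keeps range queries independent of the web server's local offset.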

Page 7: Operational Intelligence with MongoDB Webinar

Dynamic Queries

Find all logs for a URL

db.logs.find( { 'path' : '/index.html' } )

Find all logs for a time range

db.logs.find( { 'time' : { '$gte' : new Date(2012,0), '$lt' : new Date(2012,1) } } );

Find all logs for a host over a range of dates

db.logs.find( { 'host' : '127.0.0.1', 'time' : { '$gte' : new Date(2012,0), '$lt' : new Date(2012,1) } } );
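The same filters can be built as plain Python dictionaries, which is the shape a Python driver would typically accept. This is a sketch under assumptions: `logs_filter` is a hypothetical helper, not part of any library, and it only constructs the filter without contacting a database.

```python
from datetime import datetime


def logs_filter(start, end, host=None, path=None):
    """Build a filter for start <= time < end, optionally narrowed further."""
    query = {"time": {"$gte": start, "$lt": end}}
    if host is not None:
        query["host"] = host
    if path is not None:
        query["path"] = path
    return query


# January 2012. Python months are 1-based, unlike JavaScript's
# new Date(2012, 0), where month 0 is January.
january = logs_filter(datetime(2012, 1, 1), datetime(2012, 2, 1),
                      host="127.0.0.1")
print(january)
```

The half-open range (`$gte` start, `$lt` end) lets adjacent months be queried without counting a boundary event twice.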

Page 8: Operational Intelligence with MongoDB Webinar


Three Approaches

• Aggregation Framework for on-demand rollups

• Map/Reduce Framework for background rollups

• Pre-Aggregation for real-time reporting

Page 9: Operational Intelligence with MongoDB Webinar

Aggregation Framework (New in version 2.2!)

Requests per day by URL

db.logs.aggregate( [
    { '$match': {
        'time': { '$gte': new Date(2012,0), '$lt': new Date(2012,1) } } },
    { '$project': {
        'path': 1,
        'date': {
            'y': { '$year': '$time' },
            'm': { '$month': '$time' },
            'd': { '$dayOfMonth': '$time' } } } },
    { '$group': {
        '_id': { 'p': '$path', 'y': '$date.y', 'm': '$date.m', 'd': '$date.d' },
        'hits': { '$sum': 1 } } }
] )

Pipeline operators: $project, $match, $limit, $skip, $unwind, $group, $sort
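To make the three pipeline stages concrete, here is a pure-Python emulation over an in-memory list. It is a sketch of what the pipeline computes, not how the server executes it; `requests_per_day` and the sample `logs` are hypothetical, and the output is sorted only to make the result deterministic.

```python
from collections import Counter
from datetime import datetime


def requests_per_day(logs, start, end):
    """Count hits per (path, year, month, day) for start <= time < end."""
    hits = Counter()
    for log in logs:
        t = log["time"]
        if start <= t < end:                              # $match
            key = (log["path"], t.year, t.month, t.day)   # $project + $group _id
            hits[key] += 1                                # $sum: 1
    return [
        {"_id": {"p": p, "y": y, "m": m, "d": d}, "hits": n}
        for (p, y, m, d), n in sorted(hits.items())
    ]


logs = [
    {"path": "/index.html", "time": datetime(2012, 1, 1, 9)},
    {"path": "/index.html", "time": datetime(2012, 1, 1, 17)},
    {"path": "/index.html", "time": datetime(2012, 1, 2, 8)},
    {"path": "/about.html", "time": datetime(2012, 1, 1, 8)},
    {"path": "/index.html", "time": datetime(2012, 2, 1, 0)},  # outside range
]
result = requests_per_day(logs, datetime(2012, 1, 1), datetime(2012, 2, 1))
for row in result:
    print(row)
```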

Page 10: Operational Intelligence with MongoDB Webinar

Aggregation Framework

{
    'ok': 1,
    'result': [
        { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 1 }, 'hits': 124 },
        { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 2 }, 'hits': 245 },
        { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 3 }, 'hits': 322 },
        { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 4 }, 'hits': 175 },
        { '_id': { 'p': '/index.html', 'y': 2012, 'm': 1, 'd': 5 }, 'hits': 94 }
    ]
}

Page 11: Operational Intelligence with MongoDB Webinar

Map Reduce – Map Phase

Generate hourly rollups from log data

var map = function() {
    // Truncate the timestamp to the hour and key on (path, hour)
    var key = {
        p: this.path,
        d: new Date(
            this.ts.getFullYear(),
            this.ts.getMonth(),
            this.ts.getDate(),
            this.ts.getHours(),
            0, 0, 0)
    };
    emit( key, { hits: 1 } );
};

Page 12: Operational Intelligence with MongoDB Webinar

Map Reduce – Reduce Phase

Generate hourly rollups from log data

var reduce = function(key, values) {
    // Sum the partial hit counts emitted for one (path, hour) key
    var r = { hits: 0 };
    values.forEach(function(v) {
        r.hits += v.hits;
    });
    return r;
};

Page 13: Operational Intelligence with MongoDB Webinar

Map Reduce

Generate hourly rollups from log data

cutoff = new Date(2012,0,1)

query = { 'ts': { '$gte': last_run, '$lt': cutoff } }

db.logs.mapReduce( map, reduce, { 'query': query, 'out': { 'reduce' : 'stats.hourly' } } )

last_run = cutoff
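The incremental pattern above (map, reduce, then fold new results into the existing output collection) can be emulated in plain Python. This is a sketch, not the server's implementation: `map_event`, `reduce_values`, and `map_reduce` are hypothetical names, a dict stands in for the `stats.hourly` collection, and the window is assumed to be half-open so that consecutive runs neither skip nor double count an event.

```python
from collections import defaultdict
from datetime import datetime


def map_event(log):
    """Emit one (key, value) pair per log entry, keyed by path and hour."""
    hour = log["ts"].replace(minute=0, second=0, microsecond=0)
    yield (log["path"], hour), {"hits": 1}


def reduce_values(key, values):
    """Sum partial hit counts; safe to apply to already reduced values."""
    return {"hits": sum(v["hits"] for v in values)}


def map_reduce(logs, last_run, cutoff, out):
    """Process last_run <= ts < cutoff and merge into `out` ('reduce' mode)."""
    emitted = defaultdict(list)
    for log in logs:
        if last_run <= log["ts"] < cutoff:
            for key, value in map_event(log):
                emitted[key].append(value)
    for key, values in emitted.items():
        if key in out:
            # Re-reduce the stored value together with the new ones.
            values = [out[key]] + values
        out[key] = reduce_values(key, values)
    return out


logs = [
    {"path": "/index.html", "ts": datetime(2012, 1, 1, 0, 5)},
    {"path": "/index.html", "ts": datetime(2012, 1, 1, 0, 50)},
    {"path": "/index.html", "ts": datetime(2012, 1, 1, 1, 10)},
]
stats_hourly = {}
map_reduce(logs, datetime(2012, 1, 1, 0, 0), datetime(2012, 1, 1, 0, 30), stats_hourly)
map_reduce(logs, datetime(2012, 1, 1, 0, 30), datetime(2012, 1, 1, 2, 0), stats_hourly)
print(stats_hourly)
```

The second run adds to the hour-0 bucket created by the first run, which is why the reduce function must return the same shape it consumes.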

Page 14: Operational Intelligence with MongoDB Webinar

Map Reduce Output

> db.stats.hourly.find()
{ '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T00:00:00Z") }, 'value': { 'hits': 124 } },
{ '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T01:00:00Z") }, 'value': { 'hits': 245 } },
{ '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T02:00:00Z") }, 'value': { 'hits': 322 } },
{ '_id': { 'p': '/index.html', 'd': ISODate("2012-01-01T03:00:00Z") }, 'value': { 'hits': 175 } },
... More ...

Page 15: Operational Intelligence with MongoDB Webinar

Chained Map Reduce

Collection 1: Raw Logs
    → Map Reduce (runs every hour) →
Collection 2: Hourly Stats
    → Map Reduce (runs every day) →
Collection 3: Daily Stats
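The second stage of the chain reads the hourly collection rather than the raw logs. A minimal Python sketch of that daily rollup follows; `rollup_daily` is a hypothetical name, the dict keyed by `(path, hour)` stands in for the hourly stats collection, and the hit counts are made up for illustration.

```python
from collections import defaultdict
from datetime import datetime


def rollup_daily(hourly_stats):
    """Sum hourly buckets keyed by (path, hour) into (path, day) buckets."""
    daily = defaultdict(lambda: {"hits": 0})
    for (path, hour), value in hourly_stats.items():
        day = hour.replace(hour=0)  # truncate the hour bucket to its day
        daily[(path, day)]["hits"] += value["hits"]
    return dict(daily)


hourly_stats = {
    ("/index.html", datetime(2012, 1, 1, 0)): {"hits": 3},
    ("/index.html", datetime(2012, 1, 1, 1)): {"hits": 4},
    ("/index.html", datetime(2012, 1, 2, 0)): {"hits": 5},
}
daily_stats = rollup_daily(hourly_stats)
print(daily_stats)
```

Because the daily job scans at most 24 hourly documents per URL instead of every raw event, it stays cheap even when the raw log volume is high.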

Page 16: Operational Intelligence with MongoDB Webinar

Pre-Aggregation

Data for URL / Date

{
    _id: "20001010/site-1/apache_pb.gif",
    metadata: {
        date: ISODate("2000-10-10T00:00:00Z"),
        site: "site-1",
        page: "/apache_pb.gif" },
    daily: 5468426,
    hourly: {
        "0": 227850,
        "1": 210231,
        ...
        "23": 20457 },
    minute: {
        "0": 3612,
        "1": 3241,
        ...
        "1439": 2819 }
}

WARNING: arrays are not random-access in MongoDB; reaching a late entry means scanning past the earlier ones.

Page 17: Operational Intelligence with MongoDB Webinar

Pre-Aggregation

Data for URL / Date

{
    _id: "20001010/site-1/apache_pb.gif",
    metadata: {
        date: ISODate("2000-10-10T00:00:00Z"),
        site: "site-1",
        page: "/apache_pb.gif" },
    daily: 5468426,
    hourly: {
        "0": { "0": 3612, "1": 3241, … "59": 2130 },
        "1": { … },
        …
        "23": { … } }
}

Page 18: Operational Intelligence with MongoDB Webinar

Pre-Aggregation

Data for URL / Date

from datetime import datetime, time

id_daily = dt_utc.strftime('%Y%m%d/') + site + page
hour = dt_utc.hour
minute = dt_utc.minute

# Get a datetime that only includes date info
d = datetime.combine(dt_utc.date(), time.min)
query = {
    '_id': id_daily,
    'metadata': { 'date': d, 'site': site, 'page': page } }
update = { '$inc': {
    'hourly.%d' % (hour,): 1,
    'minute.%d.%d' % (hour,minute): 1 } }

# upsert=True creates the day's document on its first hit.
# PyMongo 3+ names this call update_one.
db.stats.daily.update(query, update, upsert=True)
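The effect of the upsert with `$inc` can be traced without a database. The sketch below builds the same query and update documents and applies them to an in-memory dict; `build_update` and `apply_upsert` are hypothetical helpers, and `apply_upsert` only imitates the subset of upsert behaviour used here (insert from the query when missing, then increment dotted paths).

```python
from datetime import datetime, time


def build_update(dt_utc, site, page):
    """Return the (query, update) pair for one hit at dt_utc."""
    id_daily = dt_utc.strftime("%Y%m%d/") + site + page
    day = datetime.combine(dt_utc.date(), time.min)
    query = {"_id": id_daily,
             "metadata": {"date": day, "site": site, "page": page}}
    update = {"$inc": {
        "hourly.%d" % dt_utc.hour: 1,
        "minute.%d.%d" % (dt_utc.hour, dt_utc.minute): 1,
    }}
    return query, update


def apply_upsert(collection, query, update):
    """In-memory stand-in for an upsert that supports $inc only."""
    doc = collection.setdefault(query["_id"], dict(query))  # insert if missing
    for dotted_key, amount in update["$inc"].items():
        *parents, leaf = dotted_key.split(".")
        node = doc
        for name in parents:
            node = node.setdefault(name, {})  # create nested levels on demand
        node[leaf] = node.get(leaf, 0) + amount
    return doc


stats_daily = {}
for ts in (datetime(2000, 10, 10, 20, 55, 36),
           datetime(2000, 10, 10, 20, 55, 59),
           datetime(2000, 10, 10, 21, 1, 0)):
    query, update = build_update(ts, "site-1", "/apache_pb.gif")
    apply_upsert(stats_daily, query, update)

print(stats_daily["20001010/site-1/apache_pb.gif"]["hourly"])
```

Each event costs one write and no prior read, and a report for a given URL and day is a single document lookup by `_id`.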

Page 19: Operational Intelligence with MongoDB Webinar

Reporting

JavaScript Charting

Page 20: Operational Intelligence with MongoDB Webinar

Apache Hadoop

Log Aggregation with MongoDB as sink

More complex aggregations or integration with tools like Mahout

Page 21: Operational Intelligence with MongoDB Webinar


Q&A