©Yottaa Confidential. Do Not Distribute. a better internet experience Scaling Rails @ Yottaa September 20th 2010 Jared Rosoff @forjared [email protected]

Realtime Analytics with MongoDB

DESCRIPTION

My talk from Mongo Boston (9/20/2010) about how we use MongoDB to scale Rails at Yottaa.

Page 1: Realtime Analytics with MongoDB

©Yottaa Confidential. Do Not Distribute.

a better internet experience

Scaling Rails @ Yottaa

September 20th 2010

Jared Rosoff @forjared

[email protected]

Page 2: Realtime Analytics with MongoDB

From zero to humongous

• About our application
• How we chose MongoDB
• How we use MongoDB

Page 3: Realtime Analytics with MongoDB

About our application

• We collect lots of data
  – 6000+ URLs
  – 300 samples per URL per day
  – Some samples are >1MB (Firebug)
  – Missing a sample isn’t a big deal

• We visualize data in real time
  – No delay when showing data
  – “On-Demand” samples
  – The “check now” button
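
As a rough back-of-the-envelope check on the collection figures above, the sketch below turns them into a sample rate. It assumes samples are spread evenly over a 24-hour day, which the slide does not state, and it uses 6000 as a lower bound for "6000+".

```javascript
// Rough ingest-rate estimate from the slide's figures.
// Assumption: samples arrive evenly across a 24-hour day.
const urls = 6000;               // "6000+ URLs", treated as a lower bound
const samplesPerUrlPerDay = 300; // "300 samples per URL per day"

const samplesPerDay = urls * samplesPerUrlPerDay;
const samplesPerSecond = samplesPerDay / (24 * 60 * 60);

console.log(samplesPerDay);               // 1800000
console.log(samplesPerSecond.toFixed(1)); // 20.8
```

Each sample may trigger more than one write, so the write rate the database sees can be higher than this sample rate.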

Page 4: Realtime Analytics with MongoDB

The Yottaa Network

Page 5: Realtime Analytics with MongoDB

How we chose mongo

Page 6: Realtime Analytics with MongoDB

Requirements

• Our data set is going to grow very quickly
  – Scalable by default

• We have a very small team
  – Focus on application, not infrastructure

• We are a startup
  – Requirements change hourly

• Operations
  – We’re 100% in the cloud

Page 7: Realtime Analytics with MongoDB

Rails default architecture

[Diagram labels: Data Source, Collection Server, User, Reporting Server, MySQL]

• “Just” a Rails App
• Performance Bottleneck: Too much load

Page 8: Realtime Analytics with MongoDB

Let’s add replication!

[Diagram labels: Data Source, Collection Server, User, Reporting Server, MySQL Master (×4), Replication]

• Off the shelf!
• Scalable Reads!
• Performance Bottleneck: Still can’t scale writes

Page 9: Realtime Analytics with MongoDB

What about sharding?

[Diagram labels: Data Source, Collection Server, User, Reporting Server, MySQL Master (×3), Sharding (×2)]

• Scalable Writes!
• Development Bottleneck: Need to write custom code

Page 10: Realtime Analytics with MongoDB

Key Value stores to the rescue?

[Diagram labels: Data Source, Collection Server, User, Reporting Server, MySQL Master (×2), Cassandra or Voldemort]

• Scalable Writes!
• Development Bottleneck: Reporting is limited / hard

Page 11: Realtime Analytics with MongoDB

Can I Hadoop my way out of this?

[Diagram labels: Data Source, Collection Server, User, Reporting Server, Cassandra or Voldemort, Hadoop, MySQL Master (×5), MySQL Slave]

• Scalable Writes!
• Flexible Reports!
• “Just” a Rails App
• Development Bottleneck: Too many systems!

Page 12: Realtime Analytics with MongoDB

MongoDB!

[Diagram labels: Data Source, Collection Server, User, Reporting Server, MySQL Master (×2), MongoDB]

• Scalable Writes!
• “Just” a Rails app
• Flexible Reporting!

Page 13: Realtime Analytics with MongoDB

[Diagram labels: Data Source, User, Load Balancer, App Server, Collection, Reporting, Nginx, Passenger, Mongos, MongoD (×3)]

• Sharding!
• High Concurrency
• Scale-Out

Page 14: Realtime Analytics with MongoDB

Sharding is critical

• Distribute write load across servers
• Decentralize data storage

Scale out!
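
To make "distribute write load across servers" concrete, here is a toy sketch in plain JavaScript that routes each write to one of N shards by hashing a key. The `hashString` and `shardFor` helpers, the made-up URLs, and the choice of the URL as the key are all assumptions for illustration. MongoDB itself splits a sharded collection into chunks by shard key and routes requests through `mongos`, which this sketch does not model.

```javascript
// Toy illustration only: hash-based routing of writes across shards.
// hashString() and shardFor() are hypothetical helpers, not MongoDB APIs.
function hashString(s) {
  let h = 0;
  for (let i = 0; i < s.length; i++) {
    h = (h * 31 + s.charCodeAt(i)) >>> 0; // keep it an unsigned 32-bit int
  }
  return h;
}

function shardFor(shardKey, shardCount) {
  return hashString(shardKey) % shardCount;
}

// Count how many writes land on each of 3 shards for 6000 made-up URLs.
const shardCount = 3;
const writesPerShard = new Array(shardCount).fill(0);
for (let i = 0; i < 6000; i++) {
  writesPerShard[shardFor(`www.example${i}.com`, shardCount)] += 1;
}
console.log(writesPerShard); // three counts that sum to 6000
```

The point of the sketch is only that no single server has to absorb every write; adding a shard changes `shardCount`, not the application logic.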

Page 15: Realtime Analytics with MongoDB

Before Sharding

[Diagram labels: App Server (×3)]

• Need higher write volume: buy a bigger database
• Need more storage volume: buy a bigger database

Page 16: Realtime Analytics with MongoDB

After Sharding

[Diagram labels: App Server (×3)]

• Need higher write volume: add more servers
• Need more storage volume: add more servers

Page 17: Realtime Analytics with MongoDB

Scale out is the new scale up

[Diagram labels: App Server (×3)]

Page 18: Realtime Analytics with MongoDB

How we’re using MongoDB

Page 19: Realtime Analytics with MongoDB

Our Data Model

• Document per URL we track
  – Meta-data
  – Summary data
  – Most recent measurements

• Document per URL per day
  – Detailed metrics
  – Pre-aggregated data

Page 20: Realtime Analytics with MongoDB

Thinking in rows

URL | Location | Connect | First Byte | Last Byte | Timestamp

{ url: 'www.google.com', location: 'SFO', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }

{ url: 'www.google.com', location: 'NYC', connect: 23, first_byte: 123, last_byte: 245, timestamp: 2345 }

Page 21: Realtime Analytics with MongoDB

Thinking in rows

URL | Location | Connect | First Byte | Last Byte | Timestamp

What was the average connect time for Google on Friday?
• From SFO?
• From NYC?
• Between 1AM-2AM?
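
In the row-per-sample model, each of these questions means scanning and averaging the raw rows. A minimal sketch in plain JavaScript, using an in-memory array in place of a database table; the `averageConnect` helper and the sample values are made up for illustration:

```javascript
// One object per sample, mirroring the row layout on the slide.
// The values here are invented for the example.
const rows = [
  { url: 'www.google.com', location: 'SFO', connect: 23, timestamp: 1234 },
  { url: 'www.google.com', location: 'NYC', connect: 27, timestamp: 2345 },
  { url: 'www.google.com', location: 'SFO', connect: 31, timestamp: 3456 },
];

// Average connect time over every row that passes the filter.
// Returns null when no row matches.
function averageConnect(rows, filter) {
  const matching = rows.filter(filter);
  if (matching.length === 0) return null;
  const sum = matching.reduce((acc, r) => acc + r.connect, 0);
  return sum / matching.length;
}

console.log(averageConnect(rows, r => r.url === 'www.google.com')); // 27
console.log(averageConnect(rows, r => r.location === 'SFO'));       // 27
console.log(averageConnect(rows, r => r.timestamp >= 2000 && r.timestamp < 4000)); // 29
```

Every query touches every matching sample, so the work grows with the number of rows in the requested range.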

Page 22: Realtime Analytics with MongoDB

Thinking in rows

URL | Location | Connect | First Byte | Last Byte | Timestamp

[Diagram: rows for Day 1, Day 2, and Day 3 are each averaged (AVG) into a Result]

• Up to 100’s of samples per URL per day!!
• 30 days average query range
• An “average” chart had to hit 600 rows

Page 23: Realtime Analytics with MongoDB

Thinking in Documents

URL: www.google.com
Day: 9/20/2010
Last Byte
  Sum: 2312
  Count: 12
  Locations
    Location: SFO
      Sum: 1200
      Count: 5
    Location: NYC
      Sum: 1112
      Count: 7

• This document contains all data for www.google.com collected during 9/20/2010
• Top-level Sum and Count: this tells us the average value for this metric for this URL / time period
• SFO Sum and Count: average value from SFO
• NYC Sum and Count: average value from NYC
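
The averages are never stored; they fall out of the sum and count at whichever level you read. A small sketch in plain JavaScript using the numbers from the slide. The field names (`last_byte`, `sum`, `count`, `locations`) are assumed for illustration and may not match the real schema.

```javascript
// The daily document from the slide, written as a plain object.
// Field names are assumptions; the numbers come from the slide.
const daily = {
  url: 'www.google.com',
  day: '9/20/2010',
  last_byte: {
    sum: 2312,
    count: 12,
    locations: {
      SFO: { sum: 1200, count: 5 },
      NYC: { sum: 1112, count: 7 },
    },
  },
};

// An average is just sum / count at whichever level you read it.
const average = ({ sum, count }) => sum / count;

console.log(average(daily.last_byte).toFixed(1));               // 192.7
console.log(average(daily.last_byte.locations.SFO));            // 240
console.log(average(daily.last_byte.locations.NYC).toFixed(1)); // 158.9
```

Note that the per-location sums and counts add up to the top-level ones (1200 + 1112 = 2312, 5 + 7 = 12), which is what keeping both levels in step on every write should produce.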

Page 24: Realtime Analytics with MongoDB

Storing a sample

db.metrics.dailies.update(
  // Which document we’re updating
  { url: 'www.google.com', day: '9/20/2010' },
  // Atomically update the document
  { '$inc': {
      // Update the aggregate value
      'connect.sum': 1234,
      'connect.count': 1,
      // Update the location specific value
      'connect.sfo.sum': 1234,
      'connect.sfo.count': 1
  } },
  // Create the document if it doesn’t already exist
  { upsert: true }
);
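
The update document can be built mechanically from a raw sample. The sketch below does that in plain JavaScript and applies it to an in-memory object so it runs without a database. `buildIncUpdate` and `applyInc` are hypothetical helpers; `applyInc` only imitates the effect of `$inc` with an upsert on a single document and is not MongoDB's implementation.

```javascript
// Build the {$inc: ...} update for one metric of one sample.
// Helper names and the lowercase location key are assumptions.
function buildIncUpdate(metric, sample) {
  const value = sample[metric];
  const loc = sample.location.toLowerCase();
  return {
    $inc: {
      [`${metric}.sum`]: value,
      [`${metric}.count`]: 1,
      [`${metric}.${loc}.sum`]: value,
      [`${metric}.${loc}.count`]: 1,
    },
  };
}

// Imitate $inc on a plain object: walk each dotted path, creating
// missing sub-objects and treating a missing number as 0.
function applyInc(doc, update) {
  for (const [path, amount] of Object.entries(update.$inc)) {
    const keys = path.split('.');
    let node = doc;
    for (const key of keys.slice(0, -1)) {
      if (node[key] === undefined) node[key] = {};
      node = node[key];
    }
    const last = keys[keys.length - 1];
    node[last] = (node[last] || 0) + amount;
  }
  return doc;
}

const doc = {}; // stands in for a freshly upserted document
applyInc(doc, buildIncUpdate('connect', { location: 'SFO', connect: 1234 }));
applyInc(doc, buildIncUpdate('connect', { location: 'NYC', connect: 766 }));
console.log(doc.connect.sum, doc.connect.count); // 2000 2
console.log(doc.connect.sfo);                    // { sum: 1234, count: 1 }
```

Because every field is only ever incremented, two samples arriving in either order leave the document in the same state.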

Page 25: Realtime Analytics with MongoDB

Putting it together

{ url: 'www.google.com', location: 'SFO', connect: 23, first_byte: 123, last_byte: 245, timestamp: 1234 }

1. Atomically update the daily data
2. Atomically update the weekly data
3. Atomically update the monthly data
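
One way to drive the three updates is to derive a daily, weekly, and monthly bucket key from each sample's timestamp and issue one update per key. The sketch below assumes Unix timestamps in seconds, UTC, ISO-style keys, and weeks that start on Monday; none of these choices come from the slides, and `bucketKeys` is a hypothetical helper.

```javascript
// Derive bucket keys for the daily, weekly, and monthly documents.
// Assumptions: timestamp is Unix seconds, UTC, weeks start on Monday.
function bucketKeys(timestampSeconds) {
  const d = new Date(timestampSeconds * 1000);
  const day = d.toISOString().slice(0, 10); // e.g. "2010-09-22"
  const month = day.slice(0, 7);            // e.g. "2010-09"

  // Step back to the Monday of this week (getUTCDay: 0 = Sunday).
  const daysSinceMonday = (d.getUTCDay() + 6) % 7;
  const monday = new Date(d.getTime() - daysSinceMonday * 24 * 60 * 60 * 1000);
  const week = monday.toISOString().slice(0, 10);

  return { day, week, month };
}

// 2010-09-22 12:00:00 UTC, a Wednesday.
const keys = bucketKeys(Date.UTC(2010, 8, 22, 12) / 1000);
console.log(keys); // { day: '2010-09-22', week: '2010-09-20', month: '2010-09' }
```

Each key would then select the document for its own atomic increment, one per granularity.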

Page 26: Realtime Analytics with MongoDB

Drawing connect time graph

// Compound index to make this query fast
db.metrics.dailies.ensureIndex({ url: 1, day: -1 })

db.metrics.dailies.find(
  // Data for google
  { url: 'www.google.com',
    // The range of dates for the chart
    day: { '$gte': '9/1/2010', '$lte': '9/20/2010' } },
  // We just want connect time data
  { 'connect': true }
);
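
One caveat worth checking: if `day` is stored as a string in the `M/D/YYYY` form shown on the slide, `$gte` and `$lte` compare strings character by character rather than in calendar order. The plain-JavaScript sketch below reproduces that comparison with a hypothetical `inRange` helper. The zero-padded `YYYY-MM-DD` key is a suggested alternative, not something the slides specify, and the slide's date strings may simply be shorthand.

```javascript
// The same range test the query expresses, using string comparison.
const inRange = (day, from, to) => day >= from && day <= to;

// M/D/YYYY strings: '9/3/2010' sorts after '9/20/2010', so it is missed.
console.log(inRange('9/3/2010', '9/1/2010', '9/20/2010'));  // false
console.log(inRange('9/15/2010', '9/1/2010', '9/20/2010')); // true

// Zero-padded YYYY-MM-DD strings sort in calendar order.
console.log(inRange('2010-09-03', '2010-09-01', '2010-09-20')); // true
console.log(inRange('2010-10-01', '2010-09-01', '2010-09-20')); // false
```

Storing the day as a date value or as a zero-padded string keeps range queries and the compound index ordering aligned with the calendar.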

Page 27: Realtime Analytics with MongoDB

More efficient charts

URL | Day | <data>

[Diagram: documents for Day 1, Day 2, and Day 3 are each averaged (AVG) into a Result]

• 1 document per URL per day
• 30 days == 30 documents
• Average chart hits 30 documents. 20x fewer

Page 28: Realtime Analytics with MongoDB

Real Time Updates

URL | Most Recent Data

• Single query to fetch all metric data for a URL
• Fast enough that the browser can poll constantly for updated data without impacting the server

Page 29: Realtime Analytics with MongoDB

Final thoughts

• Mongo has been a great choice
• 80 GB of data and counting
  – Majorly compressed after moving from a table-oriented to a document-oriented data model
• 100s of updates per second, 24x7
• Not using sharding in production yet, but planning on it soon
• You are using replication, right?