35
ToroDB: All your MongoDB data are belong to SQL Álvaro Hernández <[email protected]>

ToroDB: All your MongoDB data are belong to SQL

Embed Size (px)

Citation preview

Page 1: ToroDB: All your MongoDB data are belong to SQL

ToroDB: All your MongoDB data are belong to SQL

Álvaro Hernández <[email protected]>

Page 2: ToroDB: All your MongoDB data are belong to SQL

ToroDB

About $self and 8Kdata

Page 3: ToroDB: All your MongoDB data are belong to SQL

ToroDB

What do Stripe and Buffer have in common?

Use MongoDB for high load OLTP

Successful companies

They are Web Scale

Page 4: ToroDB: All your MongoDB data are belong to SQL

ToroDB

What else do they have in common?

Both ETL to a RDBMS

for Analytics

Page 5: ToroDB: All your MongoDB data are belong to SQL

ToroDB

MongoDB analytics performance: Github dataset

{  "id": "2489368070", "type": "PushEvent", "public": true, "created_at": "2015-01-01T00:00:00Z",  "actor": {    "id": 9152315,    "login": "davidjhulse",    "gravatar_id": "",    "url": "https://api.github.com/users/davidjhulse",    "avatar_url": "https://avatars.githubusercontent.com/u/9152315?"  },  "repo": {    "id": 28635890,    "name": "davidjhulse/davesbingrewardsbot",    "url": "https://api.github.com/repos/davidjhulse/davesbingrewardsbot"  },

Page 6: ToroDB: All your MongoDB data are belong to SQL

ToroDB

MongoDB analytics performance: Github dataset

"payload": {    "push_id": 536740396, "size": 1, "distinct_size": 1, "ref": "refs/heads/master",    "head": "a9b22a6d80c1e0bb49c1cf75a3c075b642c28f81",    "before": "86ffa724b4d70fce46e760f8cc080f5ec3d7d85f",    "commits": [      {        "sha": "a9b22a6d80c1e0bb49c1cf75a3c075b642c28f81",        "author": {          "email": "[email protected]",          "name": "davidjhulse"        },        "message": "Altered BingBot.jar\n\nFixed issue with multiple account",        "distinct": true, "url": "https://api.github.com/repos/..."      }    ]  } }

Page 7: ToroDB: All your MongoDB data are belong to SQL

ToroDB

MongoDB analytics performance: Github dataset

Mill

iseco

nds

0

300.000

600.000

900.000

1.200.000

1.500.000

1.800.000

2.100.000

2.400.000

2.700.000

Github dataset size (GB)

1 GB 10 GB 100 GBMongoDB

db.github.find({   type: 'PushEvent',   'actor.login': 'davidjhulse' });

Page 8: ToroDB: All your MongoDB data are belong to SQL

ToroDB

Why is it so slow?

• For most aggregated queries, the whole collection is scanned for every query.

• For every document, many keys and offsets are parsed and computed to find the possible keys. Worst case: all keys are scanned.

• Different documents with different in the same collection. Mixing apples and oranges!

Page 9: ToroDB: All your MongoDB data are belong to SQL

ToroDB

How MongoDB performs an aggregated query

Page 10: ToroDB: All your MongoDB data are belong to SQL

What if we had all our data in a RDBMS?

Page 11: ToroDB: All your MongoDB data are belong to SQL

ToroDB

How a RDBMS performs an aggregate query

Page 12: ToroDB: All your MongoDB data are belong to SQL

ToroDB

How a RDBMS performs an aggregate query

Page 13: ToroDB: All your MongoDB data are belong to SQL

ToroDB

Read I/O required to answer the query db.githubarchive.aggregate([ { $group: { _id: '$actor.login', events: { $sum: 1 } } }, { $sort: { events: -1 }}, { $limit: 10 } ])

SELECT count(*), login FROM actor GROUP BY login ORDER BY 1 DESC LIMIT 10;

Page 14: ToroDB: All your MongoDB data are belong to SQL

ToroDB

Read I/O required to answer the query (“iotop -o -a”)

Github Archive: top 10 actors (1,4GB dataset)

Disk

Rea

d (M

B)

0

125

250

375

500

MongoDB PostgreSQL

87,93

536,36MongoDB storageSize: 536.37 MB

MongoDB size: 1410.35 MB

Exactly 100% of the storageSize!

Page 15: ToroDB: All your MongoDB data are belong to SQL

ToroDB

What if we use a columnar store?

Page 16: ToroDB: All your MongoDB data are belong to SQL

ToroDB

And if we use a columnar store? (“iotop -o -a”)

Github Archive: top 10 actors (1,4GB dataset)

Disk

Rea

d (M

B)

0

125

250

375

500

MongoDB PostgreSQL PostgreSQL + cstore

6,587,93

536,36compressed, only 1 column read

Page 17: ToroDB: All your MongoDB data are belong to SQL

ToroDB

What about query performance?

Mill

iseco

nds

0

43.750

87.500

131.250

175.000

Github dataset size (GB)

1 GB 10 GB 100 GB

Mill

iseco

nds

0

300.000

600.000

900.000

1.200.000

1.500.000

1.800.000

2.100.000

2.400.000

2.700.000

Github dataset size (GB)

1 GB 10 GB 100 GB

PostgreSQLMongoDB

Page 18: ToroDB: All your MongoDB data are belong to SQL

ToroDB

What if we take this to the extreme

• What if we do an aggregate query for a 1:N relationship which is empty 99% of the time?

• MongoDB will still scan the whole collection

• A RDBMS will only scan an almost empty table

Page 19: ToroDB: All your MongoDB data are belong to SQL

ToroDB

Show me another benchmark! YASP dataset

 db.getCollection('player_matches').aggregate([   { $match: { 'item_uses.key': 'quelling_blade' } },   { $group: { '_id': null, avg: { $avg: '$level' } } } ]);

SELECT AVG(matches.level) FROM uses  INNER JOIN matches    ON uses.did = matches.did WHERE uses.key = 'quelling_blade';

Page 20: ToroDB: All your MongoDB data are belong to SQL

ToroDB

Show me another benchmark! YASP dataset

YASP dataset. Avg level of users using 'quelling_blade' object

exec

utio

n tim

e (m

s)

0K

14K

28K

42K

56K

71K

85K

99K

113K

127K

MongoDB PostgreSQL

63,36

131.678,49

That's more than 2000x faster!

Page 21: ToroDB: All your MongoDB data are belong to SQL

ToroDB

So what about indexes?

• For most aggregated queries, indexes will not be used!

• You cannot plan in advance all the queries that will require an index.

•Worst, but common use case: its the users who craft the queries they want to make.

Page 22: ToroDB: All your MongoDB data are belong to SQL

ToroDB

Show me another benchmark! YASP dataset

YASP dataset. Avg level of users using 'quelling_blade' object

exec

utio

n tim

e (m

s)

0

8

16

24

32

MongoDB (Indexed) PostgreSQL (Indexed)

19,65

30,85

Page 23: ToroDB: All your MongoDB data are belong to SQL

ToroDB

I’m sold! Gimme SQL

• Significant performance improvement!

• No connectors required: native SQL!

• Design DDL, implement ETL

• Not real-time, not HA

Maybe not that fast…

Page 24: ToroDB: All your MongoDB data are belong to SQL
Page 25: ToroDB: All your MongoDB data are belong to SQL

ToroDB

ToroDB: MongoDB to relational

Page 26: ToroDB: All your MongoDB data are belong to SQL

ToroDB

ToroDB for MongoDB Analytics

• No need to design the SQL schema:

• ToroDB automagically does it for you!

• Real-time: insert in MongoDB, automatically shows up in ToroDB. Even if tables need created.

• Native SQL: your data ends in a RDBMS like PostgreSQL

Page 27: ToroDB: All your MongoDB data are belong to SQL

ToroDB

ToroDB works as a secondary on a replica set

ToroD

this is your SQL replica!

Page 28: ToroDB: All your MongoDB data are belong to SQL

ToroDB

ToroDB performance: yet another example :)

Github Archive 10GB dataset

Seco

nds

06

1218

2430

3642

4854

60

MongoDB ToroDB on PostgreSQLToroDB on Greenplum

2,467,901

58,519

Page 29: ToroDB: All your MongoDB data are belong to SQL

ToroDB

ToroDB main characteristics

• Works as a MongoDB secondary node

•No need to run drdl or define schema. Accepts any input document, even with type conflicts

•Query w/ native SQL (PostgreSQL as of today)

•Open source!

Page 30: ToroDB: All your MongoDB data are belong to SQL

ToroDB

ToroDB vs MongoDB BI Connector

ToroDB Mongo BI CLicense Open Source Proprietary

Query Language PostgreSQLPostgreSQL (v1) or reduced SQL set (v2)

Performance 100x faster than MongoDB

100x slower plus big (v1) or smaller (v2) connector overhead

NoSQL to SQL transformation Once (insert time) Many (per query)

Distributed analyticsWith Greenplum or CitusDB No

Columnar store and compression

Yes No

Page 31: ToroDB: All your MongoDB data are belong to SQL

ToroDB

ToroDB main use cases

Native SQL BI Connector

Native SQL BI

Connector

Data Integration Platform: SQL and NoSQL apps in the

same RDBMS

Live MongoDB

to RDBMS migration

Apps: Write data with

Mongo API, query with

SQL!

Page 32: ToroDB: All your MongoDB data are belong to SQL

ToroDB

Last but definitely not least

BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ READ ONLY;

Consistent reads!

Page 33: ToroDB: All your MongoDB data are belong to SQL

ToroDB

ToroDB: coming to Github soon!

Page 34: ToroDB: All your MongoDB data are belong to SQL

ToroDB

Rate My Session!

Page 35: ToroDB: All your MongoDB data are belong to SQL

Let’s Talk!

Edificio 4B - Loft 33 Avda. Fuencarral, 44

Campus Empresarial Tribeca 28108 Alcobendas, Madrid (SPAIN)

(+34) 91 867 55 54