The Weather of the Century Part 2: High Performance

Page 1

Consulting Engineer, MongoDB

André Spiegel

#MongoDBWorld

The Weather of the Century Part II: High Performance

Page 2

What was the weather when you were born?

Page 3

Page 4

Data Format: Raw and in MongoDB

0303725053947282013060322517+40779-073969FM-15+0048KNYC V0309999C00005030485MN0080475N5+02115+02005100975ADDAA101000095AU100001015AW1105GA1025+016765999GA2045+024385999GA3075+030485999GD11991+0167659GD22991+0243859GD33991+0304859...

{ "st" : "u725053", "ts" : ISODate("2013-06-03T22:51:00Z"), "airTemperature" : { "value" : 21.1, "quality" : "5" }, "atmosphericPressure" : { "value" : 1009.7, "quality" : "5" }}

Station identifier (»NYC Central Park«)
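The raw line is a fixed-width NOAA ISD record. A minimal sketch of converting its mandatory section into the document shape above, assuming the standard ISD column layout (0-based offsets in the code). Only the fields in the example document are extracted; `parseIsdRecord` is a hypothetical name, and the "u" prefix on the station ID is copied from the example rather than from any documented rule.

```javascript
// Sketch: turn the mandatory section of one raw ISD line into a document.
// Offsets follow the fixed-width ISD layout; most fields are skipped.
function parseIsdRecord(line) {
  // Substring at a 0-based offset with a fixed width.
  const field = (start, width) => line.slice(start, start + width);

  const usaf = field(4, 6);   // USAF station ID, e.g. "725053"
  const date = field(15, 8);  // YYYYMMDD
  const time = field(23, 4);  // HHMM, UTC
  const ts = new Date(Date.UTC(
    Number(date.slice(0, 4)),
    Number(date.slice(4, 6)) - 1, // JS months are 0-based
    Number(date.slice(6, 8)),
    Number(time.slice(0, 2)),
    Number(time.slice(2, 4))
  ));

  const rawTemp = field(87, 5);     // signed, tenths of °C; "+9999" = missing
  const rawPressure = field(99, 5); // tenths of hPa; "99999" = missing

  return {
    st: "u" + usaf, // prefix mirrors the example document (assumption)
    ts,
    airTemperature: {
      value: rawTemp === "+9999" ? null : Number(rawTemp) / 10,
      quality: field(92, 1),
    },
    atmosphericPressure: {
      value: rawPressure === "99999" ? null : Number(rawPressure) / 10,
      quality: field(104, 1),
    },
  };
}
```

Applied to the raw line above (up to its first additional-data marker `ADD`), this yields station `u725053`, 2013-06-03T22:51:00Z, 21.1 °C and 1009.7 hPa, both with quality "5", matching the MongoDB document.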

Page 5

How Big Is It?

• 2.5 billion data points

• 4 terabytes (about 1.6 KB per document)

• “moderately big”
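The size figure is plain arithmetic: 2.5 billion documents at roughly 1.6 KB each. A quick check, assuming decimal units (1 KB = 1,000 bytes, 1 TB = 10^12 bytes):

```javascript
// Back-of-the-envelope data size, in decimal units (an assumption).
const documents = 2.5e9;        // data points, one document each
const bytesPerDocument = 1.6e3; // about 1.6 KB per document
const totalTB = (documents * bytesPerDocument) / 1e12;
console.log(`${totalTB} TB`); // 4 TB
```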

Page 6

How to do this with MongoDB?

Page 7

First Deployment

• A single server with a really big disk

Application: c3.8xlarge

mongod: i2.8xlarge, 251 GB RAM, 6 TB SSD

Page 8

Second Deployment

• A really big cluster where everything is in RAM

Application / mongos: c3.8xlarge

mongod: 100 x r3.2xlarge, each with 61 GB RAM and 100 GB disk
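Spreading the roughly 4 TB of data evenly over 100 shards leaves each mongod about 40 GB, which fits within its 61 GB of RAM, while the single server holds far more data than memory. A sketch of that check, assuming an even spread (which a hashed shard key approximates) and ignoring indexes, OS overhead and replication; `fitsInRam` is a hypothetical helper.

```javascript
// Per-shard data volume vs. RAM, assuming an even spread across shards
// and ignoring indexes, OS overhead and replication.
function fitsInRam({ totalGB, shards, ramPerShardGB }) {
  const perShardGB = totalGB / shards;
  return { perShardGB, fits: perShardGB < ramPerShardGB };
}

const cluster = fitsInRam({ totalGB: 4000, shards: 100, ramPerShardGB: 61 });
const single = fitsInRam({ totalGB: 4000, shards: 1, ramPerShardGB: 251 });
console.log(cluster); // { perShardGB: 40, fits: true }
console.log(single);  // { perShardGB: 4000, fits: false }
```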

Page 9

Page 10

Now... how much would you pay?

• Single server: $60,000 / yr

• Cluster: $700,000 / yr
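Dividing the yearly figures by the 8,760 hours of a non-leap year gives the implied hourly cost of each deployment and the price ratio between them. A small sketch using only the numbers on the slide:

```javascript
// Implied hourly cost and cost ratio from the yearly figures on the slide.
const HOURS_PER_YEAR = 365 * 24; // 8,760; ignores leap years

function hourly(costPerYear) {
  return costPerYear / HOURS_PER_YEAR;
}

const singleServerPerYear = 60000;
const clusterPerYear = 700000;

console.log(hourly(singleServerPerYear).toFixed(2));            // 6.85
console.log(hourly(clusterPerYear).toFixed(2));                 // 79.91
console.log((clusterPerYear / singleServerPerYear).toFixed(1)); // 11.7
```

So the cluster costs close to twelve times as much to run as the single server.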

Page 11

Use Cases

• Bulk loading
  – getting all data into the system

• Latency and throughput for queries
  – point in space-time
  – one station, one year
  – the whole world, once upon a time

• Aggregation and exploration
  – warmest and coldest day ever, etc.

Page 12

Bulk Loading: Principles

• On the application side:
  – batch size
  – number of client threads
  – use unordered bulk writes

• On the server side:
  – journaling off (temporarily!)
  – index later
  – in a cluster: pre-split, no balancing
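The application-side knobs can be sketched in a few lines: split the input into batches of a fixed size and keep a fixed number of batches in flight at once. This is a minimal sketch, not the loader used in the talk. `bulkLoad` and its `insertBatch` callback are hypothetical names, and "threads" are modelled as concurrent async workers. With the official MongoDB Node.js driver, `insertBatch` could wrap `collection.insertMany(batch, { ordered: false })`, which requests an unordered bulk insert.

```javascript
// Split documents into fixed-size batches and push them through `threads`
// concurrent workers. `insertBatch` is a hypothetical hook that writes one
// batch and returns a promise.
async function bulkLoad(docs, { batchSize, threads, insertBatch }) {
  // Pre-compute the batches so workers can claim them by index.
  const batches = [];
  for (let i = 0; i < docs.length; i += batchSize) {
    batches.push(docs.slice(i, i + batchSize));
  }

  let next = 0;     // index of the next unclaimed batch, shared by workers
  let inserted = 0; // documents written so far

  // Each worker claims batches until none are left. JavaScript runs this
  // code on one thread, so `next++` between awaits cannot race.
  async function worker() {
    while (next < batches.length) {
      const batch = batches[next++];
      await insertBatch(batch);
      inserted += batch.length;
    }
  }

  const workerCount = Math.min(threads, batches.length);
  await Promise.all(Array.from({ length: workerCount }, worker));
  return { batches: batches.length, inserted };
}
```

With the single-server settings from the talk, this would be called with `batchSize: 100, threads: 8`; the journaling, indexing and balancing settings are server-side and outside the loader.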

Page 13

Bulk Loading: Single Server

Chart: throughput vs. batch size and number of threads

8 threads, batch size 100 → 85,000 doc/s

Page 14

Bulk Loading: Single Server

• Settings: 8 threads, batch size 100

• Total loading time: 10 h 20 min

• Documents per second: 70,000

• Index build time: 7 h 40 min (ts_1_st_1)
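The loading time and the quoted rate can be cross-checked against the 2.5 billion documents. A small sketch; `averageRate` is a hypothetical helper, and since the document count is itself a rounded figure, only rough agreement should be expected.

```javascript
// Average documents per second from a document count and a wall-clock time.
function averageRate(documents, hours, minutes) {
  const seconds = hours * 3600 + minutes * 60;
  return documents / seconds;
}

// Single server: 2.5 billion documents in 10 h 20 min.
console.log(Math.round(averageRate(2.5e9, 10, 20))); // 67204
```

About 67,000 doc/s over the whole run, in line with the quoted 70,000 and below the 85,000 doc/s peak from the tuning chart.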

Page 15

Bulk Loading: Cluster

144 threads, batch size 200 → 220,000 doc/s

Page 16

Bulk Loading: Cluster

• Shard Key: Station ID, hashed

• Settings: 10 mongos @ 144 threads, batch size 200

• Total loading time: 3 h 10 min

• Documents per second: 228,000

• Index build time: 5 min (ts_1_st_1)
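With a hashed shard key, documents are placed by a hash of the station ID rather than by the ID itself. That makes pre-splitting straightforward: the hash values occupy a fixed range that can be cut into equal pieces before loading. The sketch assumes the hash values are signed 64-bit integers (as MongoDB's hashed index values are) and computes evenly spaced split points. `hashedSplitPoints` is a hypothetical helper; the boundaries MongoDB itself picks when pre-splitting may differ.

```javascript
// Evenly spaced split points over the signed 64-bit range [-2^63, 2^63).
// Cutting the range at these points yields `numChunks` equal chunks.
function hashedSplitPoints(numChunks) {
  const n = BigInt(numChunks);
  const min = -(2n ** 63n); // smallest signed 64-bit value
  const span = 2n ** 64n;   // size of the whole range
  const points = [];
  for (let i = 1n; i < n; i++) {
    // BigInt division truncates; multiplying first keeps the spacing exact.
    points.push(min + (span * i) / n);
  }
  return points;
}

console.log(hashedSplitPoints(4)); // [ -4611686018427387904n, 0n, 4611686018427387904n ]
```

For the 100-shard cluster, `hashedSplitPoints(100)` gives 99 boundaries. Keeping the balancer off during the load (for example with `sh.stopBalancer()` in the mongo shell) leaves the chunks where they were placed.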

Page 17

Queries: Point in Space-Time

db.data.find({"st" : "u747940", "ts" : ISODate("1969-07-16T12:00:00Z")})
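The point query matches both fields of the compound index `ts_1_st_1` built during loading, so it can be answered with a single index seek. To illustrate why, the sketch models the index as an array of `[ts, st]` keys sorted the same way (by `ts`, then `st`) and finds the match by binary search. This is a toy in-memory model, not MongoDB's B-tree; `compareKeys` and `findKey` are hypothetical names, and the sample keys are made up apart from reusing station IDs from the slides.

```javascript
// Order index keys [ts, st] as an index on { ts: 1, st: 1 } would:
// first by timestamp, then by station ID.
function compareKeys([tsA, stA], [tsB, stB]) {
  if (tsA !== tsB) return tsA < tsB ? -1 : 1;
  if (stA !== stB) return stA < stB ? -1 : 1;
  return 0;
}

// Binary search over keys sorted with compareKeys; returns the position
// of an exact match, or -1 if the key is absent.
function findKey(sortedKeys, key) {
  let lo = 0;
  let hi = sortedKeys.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const c = compareKeys(sortedKeys[mid], key);
    if (c === 0) return mid;
    if (c < 0) lo = mid + 1;
    else hi = mid - 1;
  }
  return -1;
}

// Timestamps as epoch milliseconds so they compare as plain numbers.
const t = (iso) => Date.parse(iso);
const index = [
  [t("1989-01-01T00:00:00Z"), "u103840"],
  [t("1969-07-16T12:00:00Z"), "u747940"],
  [t("1969-07-16T13:00:00Z"), "u747940"],
  [t("1969-07-16T12:00:00Z"), "u725053"],
].sort(compareKeys);

console.log(findKey(index, [t("1969-07-16T12:00:00Z"), "u747940"])); // 1
```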

Page 18

Queries: Point in Space-Time

Chart: latency in ms (avg, 95th and 99th percentile), single server vs. cluster; scale 0 to 1.6 ms

Max. throughput: single server 40,000/s, cluster 610,000/s (10 mongos)

db.data.find({"st" : "u747940", "ts" : ISODate("1969-07-16T12:00:00Z")})

Page 19

Queries: One Station, One Year

db.data.find({"st" : "u103840", "ts" : {"$gte": ISODate("1989-01-01"), "$lt" : ISODate("1990-01-01")}})

Page 20

Queries: One Station, One Year

Chart: latency in ms (avg, 95th and 99th percentile), single server vs. cluster; scale 0 to 4,000 ms

Max. throughput: single server 20/s, cluster 430/s (10 mongos, targeted query)

db.data.find({"st" : "u103840", "ts" : {"$gte": ISODate("1989-01-01"), "$lt" : ISODate("1990-01-01")}})

Page 21

Queries: The Whole World, Once Upon...

db.data.find({"ts" : ISODate("2000-01-01T00:00:00Z")})

Page 22

Queries: The Whole World, Once Upon...

Chart: latency in ms (avg, 95th and 99th percentile), single server vs. cluster; scale 0 to 8,000 ms

Max. throughput: single server 8/s, cluster 310/s (10 mongos, scatter/gather query)

db.data.find({"ts" : ISODate("2000-01-01T00:00:00Z")})
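The difference between the two cluster queries comes from the shard key. The one-station query has an equality match on `st`, so mongos can send it to the one shard that owns that station (a targeted query); the whole-world query filters only on `ts`, so every shard must be asked and the results merged (scatter/gather). The sketch models that routing decision. The hash is a stand-in (32-bit FNV-1a, not the hash MongoDB uses), shards are picked by modulo rather than by chunk ranges, and `routeQuery` and `shardFor` are hypothetical names.

```javascript
// Stand-in hash: 32-bit FNV-1a over the string's UTF-16 code units.
// MongoDB uses a different, 64-bit hash for hashed shard keys.
function fnv1a(str) {
  let h = 0x811c9dc5;
  for (let i = 0; i < str.length; i++) {
    h ^= str.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h >>> 0;
}

// Owning shard for a station ID; modulo replaces real chunk ranges here.
function shardFor(stationId, shardCount) {
  return fnv1a(stationId) % shardCount;
}

// Decide which shards a find() filter has to be sent to.
function routeQuery(filter, shardCount) {
  if (typeof filter.st === "string") {
    // Equality on the shard key: only one shard can hold matches.
    return { kind: "targeted", shards: [shardFor(filter.st, shardCount)] };
  }
  // No shard key in the filter: ask every shard and merge the results.
  return {
    kind: "scatter/gather",
    shards: Array.from({ length: shardCount }, (_, i) => i),
  };
}

const oneStation = routeQuery(
  { st: "u103840", ts: { $gte: new Date("1989-01-01"), $lt: new Date("1990-01-01") } },
  100
);
const wholeWorld = routeQuery({ ts: new Date("2000-01-01T00:00:00Z") }, 100);
console.log(oneStation.kind, oneStation.shards.length); // targeted 1
console.log(wholeWorld.kind, wholeWorld.shards.length); // scatter/gather 100
```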

Page 23

Analytics and Exploration

• Analytics means ad-hoc queries for which we do not have an index
  – Find all tornados
  – Maximum reported temperature

• We cannot just index everything
  – memory
  – write performance

Page 24

Analytics: Find all Tornados

db.data.find ({ "presentWeatherObservation.condition" : "99"})

• Cluster: 47 s

• Single server: 1 h 28 min

Page 25

Analytics: Maximum Temperature

db.data.aggregate ([ { "$match" : { "airTemperature.quality" : { "$in" : [ "1", "5" ] } } }, { "$group" : { "_id" : null, "maxTemp" : { "$max" : "$airTemperature.value" } } }])

61.8 °C = 143 °F

• Cluster: 2 min

• Single server: 4 h 45 min
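The pipeline keeps only readings whose `airTemperature.quality` is "1" or "5" and takes the maximum of `airTemperature.value`. In a cluster, each shard can compute the maximum over its own part of the data in parallel, and only those partial results need combining, which is one reason the scan takes minutes instead of hours. The sketch shows that split in plain JavaScript over in-memory arrays; `maxTemperature` and `mergeShardMaxima` are hypothetical names and the sample readings are made up.

```javascript
// Shard-local part: filter on quality like $match, then keep the maximum.
function maxTemperature(docs) {
  let max = null;
  for (const d of docs) {
    const t = d.airTemperature;
    if (t && (t.quality === "1" || t.quality === "5")) {
      if (max === null || t.value > max) max = t.value;
    }
  }
  return max; // null when no document passed the filter
}

// Merge step: combine the per-shard maxima into the global maximum.
function mergeShardMaxima(maxima) {
  const present = maxima.filter((m) => m !== null);
  return present.length ? Math.max(...present) : null;
}

const celsiusToFahrenheit = (c) => (c * 9) / 5 + 32;

// Made-up readings split across two "shards".
const shardA = [
  { airTemperature: { value: 21.1, quality: "5" } },
  { airTemperature: { value: 99.9, quality: "3" } }, // rejected by quality
];
const shardB = [
  { airTemperature: { value: 61.8, quality: "1" } },
  { atmosphericPressure: { value: 1009.7, quality: "5" } }, // no temperature
];

const globalMax = mergeShardMaxima([shardA, shardB].map(maxTemperature));
console.log(globalMax, celsiusToFahrenheit(globalMax).toFixed(0)); // 61.8 143
```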

Page 26

Summary: Single Server

Pro

• Cost-effective

• Very good latency for single queries

Con

• Some operations are prohibitive:
  – Indexing
  – Table scans

Page 27

Summary: Cluster

Pro

• High throughput

• Very good latency for single queries

• Scatter-gather yields significant speed-up

• Analytics are possible

Con

• High cost

Page 28

Thank you.