MongoUK - Approaching 1 billion documents with MongoDB

Approaching 1 Billion Documents in MongoDB

David Mytton
david@boxedice.com / @davidmytton

Server Density Monitoring

Processing Database UI

www.serverdensity.com

Cache / Data Store

Postback

Latest checks / Historical checks

db.stats()

Documents 937,393,315

Collections 27,566

Indexes 45,277

Stored data 638GB

Inserts 5000-8000/s

As of 17th Jun 2010.

13 months ago

Why we moved: http://bit.ly/mysqltomongo

Initial Setup

Master (DC1)

8GB RAM

Slave (DC2)

8GB RAM

Replication

Vertical Scaling

Master (DC1)

72GB RAM

Slave (DC2)

8GB RAM

Replication

Tip #1

Keep your indexes in memory at all times.

db.stats()
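One rough way to act on this tip is to compare the indexSize value reported by db.stats() with the RAM on the server. The sketch below is illustrative only: the function name indexesFitInMemory, the 20% headroom default and the sample sizes are assumptions, not figures from the talk.

```javascript
// Hypothetical helper: does the total index size fit in RAM with some headroom?
// `stats` is an object shaped like the result of db.stats(); only indexSize (bytes) is read.
function indexesFitInMemory(stats, ramBytes, headroom) {
  // headroom = fraction of RAM left for everything else (assumed default: 20%)
  const reserve = headroom === undefined ? 0.2 : headroom;
  const usableBytes = ramBytes * (1 - reserve);
  return {
    indexBytes: stats.indexSize,
    usableBytes: usableBytes,
    fits: stats.indexSize <= usableBytes
  };
}

const GB = 1024 * 1024 * 1024;

// Made-up sample value, standing in for a real db.stats() result
const sampleStats = { indexSize: 10 * GB };

const onSmallBox = indexesFitInMemory(sampleStats, 8 * GB);
const onLargeBox = indexesFitInMemory(sampleStats, 72 * GB);
console.log(onSmallBox.fits, onLargeBox.fits);
```

In the mongo shell the same function could be called as indexesFitInMemory(db.stats(), 72 * GB). db.stats() reports one database at a time, so a setup with many databases would need the index sizes summed first.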

i/o not an issue

Data is flushed to disk every 60s.

db.runCommand({fsync:1});

--syncdelay [60]
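As a worked example, the insert rates quoted earlier (5000-8000/s) can be combined with the 60s flush interval to estimate how many inserts may not yet be on disk just before a flush. This is a back-of-envelope sketch that assumes a constant insert rate on a single node; the function name is hypothetical.

```javascript
// Upper bound on inserts accepted since the last flush, assuming a steady insert rate.
function maxUnflushedInserts(insertsPerSecond, syncDelaySeconds) {
  return insertsPerSecond * syncDelaySeconds;
}

// Rates from the db.stats() slide, with the default syncdelay of 60 seconds
const lowEstimate = maxUnflushedInserts(5000, 60);
const highEstimate = maxUnflushedInserts(8000, 60);
console.log(lowEstimate, highEstimate);
```

Lowering --syncdelay shrinks this window at the cost of more frequent flushes.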

Tip #2

Sharding solves everything

Manual Partitioning

Master A (DC1)

16GB RAM

Slave A (DC2)

16GB RAM

Replication

Master B (DC1)

16GB RAM

Slave B (DC2)

16GB RAM

Replication

Sustained Traffic

Avg out: 2.4Mbit/s

Avg in: 3.8Mbit/s

Master

Avg out: 4.0Mbit/s

Avg in: 111.2Kbit/s

Database vs collections

• Many databases = many data files (small but quickly get large).

• Many collections = watch namespace limit.

Namespaces = Number of collections + number of indexes

Tip #3

Monitor the 24,000 namespace limit.

Using Server Density

Console

db.system.namespaces.count()
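The count from the console can be turned into a simple headroom check against the 24,000 limit. This is a minimal sketch: the function name, the 90% warning threshold and the sample counts are assumptions for illustration.

```javascript
// Hypothetical check of namespace usage against the limit from Tip #3.
// Per the slides, namespaces = number of collections + number of indexes.
function namespaceHeadroom(collections, indexes, limit, warnFraction) {
  const max = limit === undefined ? 24000 : limit;
  const warnAt = warnFraction === undefined ? 0.9 : warnFraction; // assumed threshold
  const used = collections + indexes;
  return {
    used: used,
    remaining: max - used,
    nearLimit: used >= max * warnAt
  };
}

// Made-up counts for a single database
const usage = namespaceHeadroom(9000, 13000);
console.log(usage.used, usage.remaining, usage.nearLimit);
```

In the mongo shell, the value of db.system.namespaces.count() could be passed as the first argument with 0 for the second, since it already covers both collections and indexes.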

Replica Pairs = Failover

Master A (DC1)

16GB RAM

Slave A (DC2)

16GB RAM

Replica Pair

Master B (DC1)

16GB RAM

Slave B (DC2)

16GB RAM

Replica Pair

Tip #4

Pre-provision your oplog files.

for i in {0..40}
do
    echo $i
    head -c 2146435072 /dev/zero > local.$i
done

A shell script to generate 75GB oplog files

Tip #5

Expect slower performance during initial replica sync.

Tip #6

You can rotate your log files from the console.

Rotating your log files

db.runCommand("logRotate")

Tip #7

Index creation blocks by default. Use background indexing if necessary.

MongoDB Manual: http://bit.ly/mongobgindex
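A background build is requested by passing the background option when creating the index. The sketch below wraps that call; the wrapper name, the field names accountId and timestamp, and the stand-in collection object are assumptions so that the example runs outside the mongo shell.

```javascript
// Works with any object exposing ensureIndex(keys, options),
// such as a collection in the mongo shell.
function ensureIndexInBackground(collection, keys) {
  return collection.ensureIndex(keys, { background: true });
}

// Stand-in collection that only records the call it receives
const recordedCalls = [];
const fakeCollection = {
  ensureIndex: function (keys, options) {
    recordedCalls.push({ keys: keys, options: options });
  }
};

// Hypothetical field names
ensureIndexInBackground(fakeCollection, { accountId: 1, timestamp: -1 });
console.log(JSON.stringify(recordedCalls[0]));
```

In the shell, fakeCollection would be replaced by a real collection such as db.checks (a hypothetical name). Later MongoDB releases renamed the helper to createIndex.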

Tip #8

Increase your OS file descriptor limit + use persistent connections.

Too many open files!

mongo hard nofile 10000
mongo soft nofile 10000

/etc/security/limits.conf

UsePAM yes

/etc/ssh/sshd_config

user type limit

Space is not reused

Data + indexes 551GB

Actual disk usage 638GB
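The gap between the two figures can be worked out directly. A small sketch of the arithmetic, using the numbers on this slide; the function name is hypothetical.

```javascript
// Difference between disk usage and live data + indexes, in GB and as a share of disk usage.
function unusedSpace(dataPlusIndexesGB, diskUsageGB) {
  const unusedGB = diskUsageGB - dataPlusIndexesGB;
  return { unusedGB: unusedGB, fractionOfDisk: unusedGB / diskUsageGB };
}

const gap = unusedSpace(551, 638);
console.log(gap.unusedGB + "GB", (gap.fractionOfDisk * 100).toFixed(1) + "%");
```

With these figures the gap is 87GB, roughly 13.6% of the disk space in use.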

Fixed in

1.1.4 1.3.x 1.5.0 1.5.1 1.5.2 1.5.3 1.5.4?

JIRA: SERVER-366

Summary

1. Keep indexes in memory.

2. Data is flushed to disk every 60s.

3. Monitor the 24k namespace limit.

4. Pre-provision oplog files.

5. Expect slower performance on replica sync.

6. Rotate logs from the console.

7. Index creation blocks by default.

8. OS file descriptor limit + persistent connections.

David Mytton
david@boxedice.com / @davidmytton

Slides

blog.boxedice.com/mongodb
