www.autoscout24.com | Munich | 10.10.2011 | Sebastian Geib, Jean-Charles Thomas

Introducing MongoDB in a multi-site HA environment

DESCRIPTION

We gave this presentation at Mongo Munich on 10 October 2011. It covers the introduction of MongoDB at AutoScout24 and, for the most part, the durability and robustness testing carried out before launching a new site.

Jean-Charles Thomas, Team Lead Unix Systems and Applications

Sebastian Geib, Database Administrator

| Autoscout24 MongoDB HA environment| Sebastian Geib, J.C. Thomas

Index

1 About AutoScout24

2 Why MongoDB?

3 MongoDB architecture at AutoScout24

4 Testing MongoDB

5 MongoDB backup/restore

6 Monitoring MongoDB

7 Conclusion

AutoScout24: Who are we?

14 countries, 1.9 million cars, 5.4 million users

AutoScout24 KPI

Numbers, data and facts for the overall AutoScout24 IT

Two separate data centers in Germany

>1000 servers, 10 load balancers, 25 firewalls, 60 development servers

16 storage systems with a raw capacity of 800 TB

11.2 billion total requests (page impressions plus grabbers and bots) per month

58 million image files for 1.9 million cars

2.1 Gbit/s peak traffic

180 TB data volume per month

Four broadband providers with 13 Gbit/s in total

[Network diagram: Internet (AS44355) and Global Traffic Mgmt. in front of two data centers, DC 1 and DC 2, each containing a backbone router, load balancers, a firewall, application servers, and a database server.]

Why MongoDB at AutoScout24?

New product 2011: portal for car inspection and services

http://werkstatt.autoscout24.de/

Completely new application development from scratch for the front and back ends

Let's use what we dreamed of!

Initial database requirements:

• Scale for large quantities of data

• Highly available across data centers

• Flexible database changes (avoid the DBAs as much as possible!)

• MapReduce functions

• Easy management

MongoDB was chosen as the best product

Product launch was September 2011

Mongo Architecture

Replica set across two data centers: primary and secondary.

All four nodes are actively used by the application.

The primary data center is split into two fire zones.

In the primary data center, both nodes can assume the role of primary automatically.

In the secondary data center, both nodes are secondaries that can only be promoted to primary manually, to avoid split-brain situations.

Currently running MongoDB 1.8.1

All servers are virtualized using VMware ESX 4.1

2 Cores, 4 GB RAM, 100 GB HDD per server
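
The election rules described on this slide can be expressed in a replica set configuration document. The following is a minimal sketch, not the configuration used at AutoScout24: the host names, the set name `rs0`, and the exact member layout are hypothetical. It assumes that the nodes in the secondary data center are passive members (priority 0), which is the usual MongoDB mechanism for members that hold data but are never elected primary on their own.

```python
# Hypothetical host names; the real topology is not given in the slides.
PRIMARY_DC = ["mongo-dc1-a.example:27017", "mongo-dc1-b.example:27017"]
SECONDARY_DC = ["mongo-dc2-a.example:27017", "mongo-dc2-b.example:27017"]


def build_replset_config(set_name, electable_hosts, passive_hosts):
    """Build a replica set config document as a plain dict.

    Members in electable_hosts get priority 1 and may be elected primary.
    Members in passive_hosts get priority 0, so they replicate data and
    can serve reads but are never elected primary automatically.
    """
    members = []
    for host in electable_hosts:
        members.append({"_id": len(members), "host": host, "priority": 1})
    for host in passive_hosts:
        members.append({"_id": len(members), "host": host, "priority": 0})
    return {"_id": set_name, "members": members}


config = build_replset_config("rs0", PRIMARY_DC, SECONDARY_DC)
electable = [m["host"] for m in config["members"] if m["priority"] > 0]
print(electable)
```

A document of this shape is what `rs.initiate()` and `rs.reconfig()` accept in the mongo shell. Promoting a passive member by hand then amounts to raising its priority in a reconfiguration.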

MongoDB Robustness Testing

On-server tests

Running out of disk space on data volume while writing

• On primary node only: crashed the whole replica set.

• On all nodes: led to error messages in the log but no feedback to the client.

Running out of disk space on data volume while reading

• No significant impact.

Removing volume while writing

• The primary role switched to another host; the insert and the replica set were broken.

Removing volume while reading

• The open query didn't return. Further queries were handled by other set members.

MongoDB HA Testing 1

Replica set tests

Primary node failing while writing

• Without safe mode enabled, failover takes 9 seconds until a new primary is elected.

• With safe mode enabled, failover takes 13 seconds.

• After a reboot, the former primary becomes a working secondary.
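
Failover times like these can be measured by retrying a write until it succeeds again. The sketch below is one way to do that and is not the test tool used for this talk. `try_write` is a hypothetical callable that returns True when a write is acknowledged and returns False or raises while no primary is available. The clock and sleep functions are injectable so the logic can be checked without a database.

```python
import time


def measure_outage(try_write, clock=time.monotonic, sleep=time.sleep,
                   interval=0.5, timeout=60.0):
    """Return the seconds from the first failed write until writes work again.

    Returns 0.0 if the first write succeeds and None if writes do not
    recover within `timeout` seconds.
    """
    def attempt():
        try:
            return bool(try_write())
        except Exception:
            # A driver would typically raise while no primary is reachable.
            return False

    if attempt():
        return 0.0
    started = clock()
    while clock() - started < timeout:
        sleep(interval)
        if attempt():
            return clock() - started
    return None
```

The resolution of the measurement is bounded by `interval`, so a short interval is needed to tell a 9 second failover from a 13 second one.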

Secondary node failing while reading

• It takes 9 seconds for the remaining replica set to realize the node is gone.

Arbiter failing

• No impact whatsoever.

• Majority remains intact and the replica set is working properly.

MongoDB HA Testing 2

Testing Replica Set in both data centers

Primary and Secondary nodes failing in main data center while writing

• Test tool crashes and cannot write anymore.

• Cluster remains without primary.

• Reads are handled properly.

Arbiter failing (in both data centers)

• No impact.

• Replica set still working fine due to majority being in place.
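
Both results follow from the majority rule: a primary can only be elected while more than half of the voting members can reach each other. A small sketch of that arithmetic follows. The five-vote layout in the usage lines is a hypothetical example, not the documented AutoScout24 setup.

```python
def has_majority(total_votes, reachable_votes):
    """True if the reachable members hold a strict majority of all votes."""
    if not 0 <= reachable_votes <= total_votes:
        raise ValueError("reachable_votes must be between 0 and total_votes")
    return reachable_votes > total_votes // 2


# Hypothetical layout: 5 voting members, 3 of them in the main data center.
print(has_majority(5, 4))  # one arbiter lost: prints True
print(has_majority(5, 2))  # main data center lost: prints False
```

With an even number of votes, exactly half is not enough: `has_majority(4, 2)` is False.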

MongoDB Backup

Backing up data and getting into trouble

Testing and preparing backup and restore was the most boring task.

Long waits with large sets of test data.

Different attempts:

• LVM snapshot: works fine, but the restore is a bit more complicated.

• Dump: easier to restore and to extract specific data from a restore.

For our current data volume, mongodump is the best choice.

Locking is an issue: verify that your locks are released after the backup, or you'll be in trouble.
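
The locking pitfall can be guarded against by making the unlock unconditional and then verifying it. The sketch below shows the pattern only. `lock`, `run_dump`, `unlock`, and `is_locked` are hypothetical callables standing in for whatever mechanism takes the lock and runs the dump; they are not MongoDB driver functions.

```python
def backup_with_lock(lock, run_dump, unlock, is_locked):
    """Run a dump under a write lock and make sure the lock is released.

    The unlock runs in a finally block so that a failed dump cannot leave
    the server locked, and the lock state is verified afterwards.
    """
    lock()
    try:
        run_dump()
    finally:
        unlock()
        if is_locked():
            raise RuntimeError("database is still locked after backup")
```

Raising on a lock that is still held turns a silent problem into a failed backup job that monitoring can pick up.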

MongoDB Restore

Testing Restores on full replica set

Restore of full database (test size 70 GB)

• Could not be restored in a single run because the secondaries became stale, although the oplog size had already been increased considerably.

• A restore was possible when split into three chunks of 40 minutes each.

• A stale secondary could be restored within 30 minutes by removing its data.

Restore of one Secondary/Passive after failure or becoming stale

• No surprises.

• It took 30 minutes to get it back up and running.
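
Splitting a restore into chunks gives the secondaries time to replay the oplog before the next batch of writes arrives. A minimal sketch of such a loop follows, assuming two hypothetical helpers: `restore_chunk` loads one chunk into the primary, and `max_lag_seconds` reports the largest replication lag among the secondaries. Neither is a MongoDB API, and the threshold is illustrative rather than a value from our tests.

```python
import time


def restore_in_chunks(chunks, restore_chunk, max_lag_seconds,
                      lag_limit=60.0, poll_interval=5.0, sleep=time.sleep):
    """Restore chunks one by one, pausing until secondaries have caught up.

    After each chunk the function waits until the reported replication lag
    drops to lag_limit or below, so the oplog is not overrun by the next
    chunk. Returns the number of chunks restored.
    """
    restored = 0
    for chunk in chunks:
        restore_chunk(chunk)
        restored += 1
        while max_lag_seconds() > lag_limit:
            sleep(poll_interval)
    return restored
```

This only helps if a single chunk fits into the oplog window; a chunk that is too large makes the secondaries stale in the same way as a full restore.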

MongoDB Monitoring

Monitoring 1.0 (during testing) and some pitfalls

How?

• Centreon (Nagios-based)

• Combining Nagios and Munin plugins to have nice charting and alerting in the same place.

What?

• Basically everything that could be relevant:

• System: CPU, Load, Memory, Network, I/O, Disks

• MongoDB-specific: Commands, Connections, Replica Set State, Flushing, Locking, Memory Consumption, Data file size
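
A replica set state check for a Nagios-based system can be sketched as a function that maps the output of the `replSetGetStatus` command to a plugin exit code. This is an illustration under assumptions, not the plugin we ran: the severity mapping is our choice for the example, and the sample document is made up.

```python
OK, WARNING, CRITICAL = 0, 1, 2  # standard Nagios plugin exit codes


def check_replica_set(status):
    """Map a replSetGetStatus-style document to (exit_code, message)."""
    members = status.get("members", [])
    down = [m["name"] for m in members if m.get("health") != 1]
    has_primary = any(m.get("stateStr") == "PRIMARY" for m in members)
    if not has_primary:
        return CRITICAL, "CRITICAL: no primary in replica set"
    if down:
        return WARNING, "WARNING: members down: " + ", ".join(down)
    return OK, "OK: %d members healthy" % len(members)


# Illustrative sample, not real output from the AutoScout24 cluster.
sample = {"set": "rs0", "members": [
    {"name": "a:27017", "health": 1, "stateStr": "PRIMARY"},
    {"name": "b:27017", "health": 0, "stateStr": "(not reachable/healthy)"},
]}
code, message = check_replica_set(sample)
print(code, message)
```

A real plugin would print the message and exit with the code so that the monitoring system can alert on it.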

MongoDB Monitoring

Monitoring 1.1 (after going live)

How?

• PRTG

• XML-driven

• Windows-based, so it needs to make heavy use of Cygwin to watch Linux servers.

• Integrates with AutoScout24 platform monitoring.

• Cluster monitoring with checks for overall availability and the like.

What?

• System monitoring: CPU, Load, Memory, Network, I/O, Swap, Disks

• MongoDB-specific: Availability, Commands, Connections, Replica Set State, Flushing, Locking, Memory Consumption
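
Most of the MongoDB-specific values in this list can be read from the output of the `serverStatus` command. The sketch below picks them out of such a document. The field paths are assumptions based on the `serverStatus` layout of MongoDB releases of this era and may be absent in other versions, which is why missing fields yield None instead of an error. The metric names on the left are our own labels.

```python
def extract_metrics(server_status):
    """Pick the monitored values out of a serverStatus-style document.

    Missing fields are reported as None instead of raising, because the
    available fields differ between MongoDB versions.
    """
    def dig(*path):
        node = server_status
        for key in path:
            if not isinstance(node, dict) or key not in node:
                return None
            node = node[key]
        return node

    return {
        "commands": dig("opcounters", "command"),
        "connections": dig("connections", "current"),
        "last_flush_ms": dig("backgroundFlushing", "last_ms"),
        "lock_ratio": dig("globalLock", "ratio"),
        "resident_mb": dig("mem", "resident"),
    }
```

The resulting flat dictionary is easy to serialize into whatever format the monitoring system expects.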

MongoDB Conclusion

What's been important for us

Conclusion:

As long as replica sets are working fine, they are great. Watch their health and you won't get into trouble.

Overall robustness could be further improved with better error handling and reporting from the MongoDB server.

The C# driver needs some further tweaking to avoid accesses to arbiters. The current release fixes this but hasn't been introduced in production yet.

When our primary data center is down, no primary can be elected in the secondary data center because the majority is missing. This was our design choice, to have better control over primary election.

Permissions need to be configurable in a more fine-grained fashion. Most of our team members come from an Oracle background, so they expect a lot.

MongoDB Outlook

What we are looking forward to

Outlook:

MongoDB 2.0 looks really promising for us. We are currently waiting for the first bugfix release and will then start our testing.

Improved data center awareness is a big win for us.

Reconfiguring a replica set with only a minority of members available is really useful for our failover scenario.

Questions?

Looking for a great job as a DBA in one of the largest internet companies in Europe?

Great! We are looking to hire DBAs. Have a look at our homepage or contact us directly.

[email protected]@autoscout24.de