Google confidential │ Do not distribute
MongoDB on Google Cloud Platform
Tom Grey, EMEA Head of Cloud Platform Solutions Engineering at Google
Jorge Salamero, Chief Evangelist at Server Density
A Technical Overview
Everybody is talking about cloud...
Google Confidential & Proprietary
Making Google look easy is hard...
Powering google.com...
Google is the fourth largest server manufacturer in the world after Dell, HP, and IBM, according to Martin Reynolds of Gartner Group
If Google were an ISP it would be the second largest ISP by traffic on the planet according to Arbor Networks
● Customised hardware built from cheap commodity parts
● Software resilience and easy repair, not hardware resilience
● Horizontal layers, not vertical stacks
● Vast numbers of homogeneous servers managed at scale
Externalising Google
Application Runtimes & Services
● Iterate & deploy fast
● Scale to global demand
● Standards compliant
● Technologies: Omega
Data Services
● Data Intelligence
● Designed for Big Data
● High Performance
● Technologies: MapReduce, Dremel, Pregel, Percolator
Data Storage and Distribution
● Global Resilient Architecture
● Global Edge Distribution
● Huge Secure Capacity
● Technologies: GFS, Colossus, Spanner, BigTable, F1
Global Data Centre & Networks
● Highly Resilient, Efficient & Performant
● 3rd Largest Server Manufacturer
● 2nd Largest Global Data Network
Google Research Publications referenced are available here: http://research.google.com/pubs/papers.html
Google Products etc...
App Engine, Cloud Endpoints
BigQuery
Cloud Storage, Cloud SQL, Cloud Datastore
Compute Engine
Your Applications
The Google Cloud Platform
● Storage: Google Cloud Storage, Google Cloud SQL, Google Cloud Datastore, Google Cloud Bigtable
● Compute: Google Compute Engine, Google App Engine, Google Container Engine
● Big Data: Google BigQuery, Google Cloud Dataflow, Google Cloud Datalab, Google Cloud Pub/Sub, Google Cloud Dataproc
● Network: Google Cloud Networking
● Ops: Google Cloud Monitoring, Google Cloud Logging
Anatomy of a Compute Engine project
● Project
● API, reached through the CLI, the UI, or code
● VMs
● Persistent Disk and Cloud Storage
● Private Network
● Internet
Region
Regions:
● Geographic location of resources
● Inter-Region Latency > Zone Group Latency
Zones:
● Independent of other Zones
● Isolated within a Region
● Distribute instances across Zones to protect against single zone system failure.
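As a minimal sketch of the zone advice above, the following Python assigns the members of a replica set to zones in round-robin order, so that losing one zone does not take out every member. The zone and member names are illustrative assumptions, not values from the talk.

```python
from itertools import cycle


def spread_across_zones(members, zones):
    """Assign each member to a zone in round-robin order."""
    if not zones:
        raise ValueError("at least one zone is required")
    zone_cycle = cycle(zones)
    return {member: next(zone_cycle) for member in members}


# Hypothetical zone and member names, for illustration only.
ZONES = ["europe-west1-b", "europe-west1-c", "europe-west1-d"]
MEMBERS = ["mongo-1", "mongo-2", "mongo-3"]

placement = spread_across_zones(MEMBERS, ZONES)
for member, zone in placement.items():
    print(f"{member} -> {zone}")
```

With more members than zones, the assignment wraps around, so some zones hold more than one member.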
Machine Types
MongoDB on Google Compute Engine
+ our experience @ Server Density
cloud infrastructure monitoring
Monitoring as-a-Service
servers & services: 30+ integrations
custom plugins + API
dashboard + alerts
How do we use MongoDB?
● Time series database
● Since 2009
● 250+ TB/month
● 6000 writes/sec
● 500,000,000 new documents per day
● Ubuntu 12.04 LTS
● Bare metal servers
● SSD disks
● Puppet Forge MongoDB module
MongoDB on Cloud Servers
Traditional issues with MongoDB on VMs:
● CPU steal from other guests (MongoDB itself has no high CPU requirements)
● Disk IO
Google Compute:
● intelligent throttling
● no more noisy neighbours
● provisioned IOPS
● predefined instance types
Google Compute Engine Disks
● Standard Persistent Disk (storage backed by hard disk drives)
● SSD PD (storage backed by solid state drives)
● Local SSD (not persistent, obviously)
● Standard: sustained performance increases with size + burst for peaks
  ● 100 GB: 30 random read IOPS or 150 random write IOPS (12 MB/s for reads and 9 MB/s for writes)
  ● 10 TB: 3000 random read IOPS or 15000 random write IOPS (180 MB/s for reads and 120 MB/s for writes)
● SSD: IOPS increase faster, throughput the same
  ● max 10000 random read IOPS at 333 GB
  ● max 15000 random write IOPS at 500 GB
● Network egress cap (2 Gbps per CPU): disk writes count 3.3x against it because of redundancy
Google Compute Engine Disks: example
Matching a SATA2 7200 RPM disk (~75 IOPS / 120 MB/s):
IO pattern          Standard PD size required
75 random reads     250 GB
75 random writes    50 GB
120 MB/s reads      1000 GB
120 MB/s writes     1333 GB
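The sizes in this example match the standard persistent disk figures quoted earlier for a 100 GB volume (30 random read IOPS, 150 random write IOPS, 12 MB/s reads, 9 MB/s writes). A small Python sketch of that arithmetic, assuming performance scales linearly with size below the per-volume caps; the function and key names are made up for illustration:

```python
# Sustained performance of a 100 GB standard persistent disk, as quoted in the slides.
STANDARD_PD_PER_100_GB = {
    "read_iops": 30,
    "write_iops": 150,
    "read_mb_s": 12,
    "write_mb_s": 9,
}


def standard_pd_size_gb(metric, target):
    """Return the standard PD size (in GB) whose sustained rate reaches `target`."""
    rate_per_100_gb = STANDARD_PD_PER_100_GB[metric]
    return target * 100 / rate_per_100_gb


# The four IO patterns of a ~75 IOPS / 120 MB/s SATA disk.
for metric, target in [("read_iops", 75), ("write_iops", 75),
                       ("read_mb_s", 120), ("write_mb_s", 120)]:
    print(f"{target} {metric}: {standard_pd_size_gb(metric, target):.0f} GB")
```

When provisioning for real, it is probably safer to round the result up rather than to the nearest GB.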
Google Compute Engine Disks: local SSD
● 375 GB, up to 4 per VM
● many limitations: no redundancy, no snapshots, create-time only
● NVMe or SCSI

                          Standard PD   SSD PD   Local SSD (NVMe)
Read IOPS per GB          0.3           30       453.3
Write IOPS per GB         1.5           30       240
Read IOPS per instance    3,000         10,000   680,000
Write IOPS per instance   15,000        15,000   360,000
So, remember
In Google Compute, IOPS scale linearly with volume size
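To make the linear scaling concrete, here is a rough Python sketch that estimates the IOPS of a persistent disk from its size, using the per-GB rates and per-instance caps from the comparison table. The figures are the ones quoted in this talk and may not match current limits; the function name is an assumption for illustration.

```python
# Per-GB rates and per-instance caps, taken from the comparison table in the slides.
DISK_TYPES = {
    "pd-standard": {"read_per_gb": 0.3, "write_per_gb": 1.5,
                    "read_cap": 3_000, "write_cap": 15_000},
    "pd-ssd": {"read_per_gb": 30, "write_per_gb": 30,
               "read_cap": 10_000, "write_cap": 15_000},
}


def estimated_iops(disk_type, size_gb):
    """Return (read_iops, write_iops) for a volume: linear in size, then capped."""
    spec = DISK_TYPES[disk_type]
    read = min(round(size_gb * spec["read_per_gb"]), spec["read_cap"])
    write = min(round(size_gb * spec["write_per_gb"]), spec["write_cap"])
    return read, write


for disk_type, size_gb in [("pd-standard", 100), ("pd-ssd", 200), ("pd-ssd", 1000)]:
    read, write = estimated_iops(disk_type, size_gb)
    print(f"{disk_type} {size_gb} GB: {read} read IOPS, {write} write IOPS")
```

Beyond the size where the cap is reached, a bigger volume buys capacity but no additional IOPS.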
Dimensioning and tuning
● FS: DISCARD/TRIM, lazy init (Google takes care)
● Low readahead (blockdev)
● IO queue depth:
  ● 1 for each 400-800 IOPS
  ● max 64
  ● ↑ depth, ↑ IOPS but ↑ latency
● 1 CPU for each 2000 read IOPS / 2500 write IOPS
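The rules of thumb above can be turned into a quick sizing helper. This is only a sketch of the arithmetic on this slide (one unit of queue depth per 400-800 IOPS with a maximum of 64, one CPU per 2000 read or 2500 write IOPS); the function names are hypothetical.

```python
import math

MAX_QUEUE_DEPTH = 64
IOPS_PER_CPU = {"read": 2000, "write": 2500}


def queue_depth_range(target_iops):
    """Return (low, high) IO queue depth for a target, at 800 and 400 IOPS per unit."""
    low = math.ceil(target_iops / 800)
    high = math.ceil(target_iops / 400)
    # Clamp both ends to the 1..64 range given on the slide.
    return (max(1, min(low, MAX_QUEUE_DEPTH)),
            max(1, min(high, MAX_QUEUE_DEPTH)))


def cpus_for(target_iops, kind):
    """Return the number of CPUs suggested for `target_iops` of kind 'read' or 'write'."""
    return max(1, math.ceil(target_iops / IOPS_PER_CPU[kind]))


low, high = queue_depth_range(6000)
print(f"6000 IOPS: queue depth {low}-{high}, {cpus_for(6000, 'write')} CPUs for writes")
```

Since a deeper queue also raises latency, starting near the low end of the range and measuring seems the more cautious choice.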
Recommended configuration
● dbpath on a separate volume
● journal on a different volume
  ● likely to be a big volume for the IOPS, with low usage
  ● at least ~200 GB for ~6000 write IOPS
  ● cannot use snapshots for backups (fsync lock or shutdown required)
● directoryperdb for each db (locking is managed at database level)
  ● flexibility, independent performance
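One way to apply this layout is sketched below in Python. It assumes the journal is moved by symlinking the `journal` directory inside the dbpath to a directory on the second volume, which is a common approach but is not described in the talk. `--dbpath` and `--directoryperdb` are mongod options; the helper names and all paths are hypothetical.

```python
import os
import tempfile


def mongod_args(dbpath):
    """Command line for mongod with a dedicated dbpath and one directory per database."""
    return ["mongod", "--dbpath", dbpath, "--directoryperdb"]


def link_journal(dbpath, journal_volume):
    """Point <dbpath>/journal at a directory on the journal volume via a symlink."""
    journal_link = os.path.join(dbpath, "journal")
    if os.path.lexists(journal_link):
        raise FileExistsError(f"{journal_link} already exists")
    os.symlink(journal_volume, journal_link, target_is_directory=True)
    return journal_link


if __name__ == "__main__":
    # Temporary directories stand in for the two mounted volumes.
    with tempfile.TemporaryDirectory() as data_vol, \
            tempfile.TemporaryDirectory() as journal_vol:
        print(link_journal(data_vol, journal_vol))
        print(" ".join(mongod_args(data_vol)))
```

The link has to be created while mongod is stopped and before any journal files exist in the dbpath.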
Our tests
3 different performance scenarios:
1. no extra disks (default 10GB volume)
2. dedicated dbpath (200GB volume)
3. dedicated dbpath (200GB volume) + dedicated journal (200GB volume)
2 different nodes:
1. n1-standard-2 (2 vCPUs and 7.5GB RAM)
2. n1-highmem-8 (8 vCPUs and 52GB RAM)
The results on n1-standard-2
The results on n1-highmem-8
Conclusions
● Not much difference until you start to acknowledge writes
● Difference between instances is small; real-life tests needed
● Validated the recommendations, again:
  ● separate your dbpath
  ● separate your journal
Full details: https://blog.serverdensity.com/mongodb-on-google-compute-engine-tips-and-benchmarks/
How do we use MongoDB on Google Compute?
MongoDB Cloud Manager with backup verification in GC
● real-time offsite backups (only a few seconds behind)
● replica node for each replica set
● copy of every write operation
● sustained traffic 42 Mbps
● point-in-time restores
How do we use MongoDB on Google Compute?
Off-site backups into GC
● API to trigger restore job: get a tarball
● Downloaded to Google Cloud Storage
● versioning included
● regional buckets for redundancy: USA + EU
How do we use MongoDB on Google Compute?
Restore on MongoDB GC
● Launch instance with SSD PD
● gsutil to download the tarball
● untar the backup
● install MongoDB
● COMPARE with PRODUCTION
● whole process: ~10 min, twice a day
● Python, Buildbot, notifications on HipChat
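The download and unpack steps of this restore check could look roughly like the Python sketch below. It only builds the commands rather than running them. The bucket, tarball, and directory names are hypothetical, a gzip-compressed tarball is assumed, and comparing per-collection document counts is just one possible way to "compare with production"; the talk does not say how the comparison is done.

```python
import shlex


def restore_plan(bucket, tarball, workdir):
    """Return the commands for fetching and unpacking one backup tarball."""
    source = f"gs://{bucket}/{tarball}"
    local = f"{workdir}/{tarball}"
    return [
        ["gsutil", "cp", source, local],        # download from Cloud Storage
        ["tar", "-xzf", local, "-C", workdir],  # unpack (assumes gzip compression)
    ]


def differing_collections(production_counts, restored_counts):
    """List collections whose document counts differ between the two deployments."""
    names = set(production_counts) | set(restored_counts)
    return sorted(name for name in names
                  if production_counts.get(name) != restored_counts.get(name))


# Hypothetical bucket, tarball, and working directory.
for command in restore_plan("example-backups-eu", "rs0-backup.tar.gz", "/mnt/restore"):
    print(shlex.join(command))
```

Each command list can be passed to `subprocess.run(command, check=True)` on the restore instance, and a non-empty result from `differing_collections` would be the trigger for a notification.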
Thanks!
Q&A