Andrey Zaychikov, Solutions Architect, EMEA 21.02.2017 Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS

Best Practices for NoSQL Workloads on Amazon EC2 and Amazon EBS - February 2017 Online Tech Talks

A typical algorithm for choosing the right options for NoSQL DB deployments

What will we cover today?

How do these databases differ?

• By deployment: cloud-based (DynamoDB) or self-managed (EC2)

• By data model: key-value, document-oriented, or graph

Cassandra

What is it?

• Dynamo model database + CQL

• Horizontally scalable

• No single point of failure

• Data is immutable and stored in collections

• JVM based

• A lot of management work is done in the background

• Relies on the gossip protocol

Main concerns of the customers

Schema & usage pattern

Geo distribution

Background routines & specific optimizations

How does it work?

Choosing instance & storage capacity: 80% Writes

• For most workloads (especially with a 50/50 read/write ratio), M4 instances with EBS are the best option

• For write-heavy workloads with high RPS requirements, consider C4 instances with EBS

• When the performance requirements are high and the dataset is relatively small, you can use I2 instances with ephemeral storage

Choosing instance & storage capacity: 80% Reads

• For most workloads, M4 instances with EBS are a good choice

• When the performance requirements are high and the dataset is relatively small, you can use I2 instances with ephemeral storage

• When the performance requirements are high and the dataset is large, the best option is R4 instances with different EBS flavors
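
The selection rules from the two Cassandra sizing slides can be sketched as a small function. This is only an illustration: the function name, its parameters, and the 0.8 / 0.2 cut-offs (taken from the "80% Writes" and "80% Reads" slide titles) are assumptions, and the talk does not quantify "high performance" or "small dataset".

```python
def suggest_cassandra_setup(write_fraction, high_performance, small_dataset):
    """Return an (instance family, storage) pair following the slide bullets.

    write_fraction: share of writes in the workload, 0.0 to 1.0 (hypothetical input).
    high_performance, small_dataset: judgment calls the talk leaves unquantified.
    """
    if not 0.0 <= write_fraction <= 1.0:
        raise ValueError("write_fraction must be between 0.0 and 1.0")
    if high_performance and small_dataset:
        # Both slides: high performance requirements and a small dataset.
        return ("I2", "ephemeral storage")
    if high_performance and write_fraction >= 0.8:
        # Write-heavy slide: high RPS requirements.
        return ("C4", "EBS")
    if high_performance and write_fraction <= 0.2:
        # Read-heavy slide: high performance requirements and a large dataset.
        return ("R4", "EBS")
    # Default for most workloads, especially around a 50/50 ratio.
    return ("M4", "EBS")


print(suggest_cassandra_setup(0.5, False, False))
print(suggest_cassandra_setup(0.9, True, False))
```

The result is a starting point for a PoC, not a sizing guarantee.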

FAQ: 2AZ cluster architecture

Hint: RetryPolicy for Cassandra Driver
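
The slide gives only the hint, so the following is a library-free sketch of the general retry idea, not the Cassandra driver's RetryPolicy API. `TransientError`, `run_with_retries`, and the backoff values are hypothetical names and numbers chosen for illustration.

```python
import time


class TransientError(Exception):
    """Hypothetical stand-in for a timeout or an unavailable replica."""


def run_with_retries(operation, max_attempts=3, base_delay=0.1):
    """Call operation() and retry on TransientError with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError as err:
            last_error = err
            if attempt < max_attempts:
                # Wait longer after each failed attempt: base, 2x base, 4x base...
                time.sleep(base_delay * 2 ** (attempt - 1))
    # Every attempt failed: surface the last error to the caller.
    raise last_error
```

In a real deployment the driver's own retry policy mechanism would normally take this role; check the documentation of the driver version in use.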

FAQ

Cassandra backup / restore

- Restore procedure for the whole cluster can be complicated

- Restore for single node can be done with EBS Snapshots

Auto Scaling of Cassandra clusters

- Auto-scaling puts unpredictable pressure on the cluster

- Scaling up is simple, but scaling down is extremely complicated

Cassandra in Containers

- Makes sense only for test / dev environments

FAQ: Troubleshooting

JVM

Caching

Compaction

Disk I/O

CPU

Memory

MongoDB

What is it?

• Document-oriented database

• Horizontally scalable

• HA is based on master / slave replication

• Geo-distributed

• Lots of management work is done in the background

Main concerns of the customers

Schema & usage pattern

Geo distribution and performance

Data consistency & partition tolerance

How does it work?

Choosing instance & storage

• MongoDB needs a lot of memory and really fast disks, so unless your dataset is quite big, the best option is either R3 or I2 (depending on the size of the dataset)

• If the dataset is big, consider using R4 with different EBS flavors

• For hidden nodes, use M4 with EBS, as EBS snapshots make it easy to back up data

FAQ: 2AZ cluster architecture

Best option: Replica Set in one AZ and Hidden member in another one.
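
A replica set laid out this way could be described by a configuration document like the one built below. This is a minimal sketch: the helper name, the replica set name, and the host names are hypothetical, and only the `hidden` and `priority` member fields are shown. A hidden member needs priority 0 so it can never be elected primary.

```python
def build_replica_set_config(name, main_az_hosts, hidden_host):
    """Build a replica set config: regular members in one AZ, one hidden member in another."""
    members = [
        {"_id": index, "host": host}
        for index, host in enumerate(main_az_hosts)
    ]
    members.append({
        "_id": len(main_az_hosts),
        "host": hidden_host,
        "priority": 0,   # never eligible to become primary
        "hidden": True,  # invisible to client applications
    })
    return {"_id": name, "members": members}


# Hypothetical host names: three members in the first AZ, a hidden one in the second.
config = build_replica_set_config(
    "rs0",
    [
        "mongo-a1.example.internal:27017",
        "mongo-a2.example.internal:27017",
        "mongo-a3.example.internal:27017",
    ],
    "mongo-b1.example.internal:27017",
)
print(config["members"][-1])
```

Whether the hidden member should also vote in elections is not covered by the slide and is left out of the sketch.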

FAQ

MongoDB backup / restore

- Hidden nodes with EBS and EBS snapshot backups

Querying large amounts of data

- Design schema properly

- Avoid using MapReduce on Master

MongoDB consistency

- Lots of improvements were done but there are some edge cases

FAQ: Troubleshooting

Mongos performance

Long running queries

Fragmentation

Disk I/O

CPU

Memory

CouchDB

What is it?

• Document-oriented database built on Dynamo model

• Supports RESTful API

• Eventual consistency

• Lockless and optimistic, with conflict resolution

• Horizontally scalable (with constraints)

• Offline-first database

• MapReduce to prepare views

How does it work?

Choosing instance & storage

FAQ: 2AZ cluster architecture

• You have to plan the replication schema on your own, so it is your responsibility to check how it will behave in case of a DR event

FAQ

Proper replication schema

Indexed views & its performance

Proxy for requests

Aerospike

What is it?

• In-memory key-value database

• High and constant performance

• Shared-nothing architecture

• Geo-distributed (hash partitions)

• Master-slave replication

How does it work?

Choosing instance & storage

• Aerospike is used when the performance requirements are extreme. It needs a lot of memory and very fast disks. That is why EC2 with ephemeral storage is the first choice for Aerospike deployments.

FAQ: 2AZ cluster architecture

• If one AZ goes down, depending on your replication factor, you will still have a copy of the data

• Aerospike will be able to add more nodes and replicate data to them without putting much pressure on the existing nodes

• It takes time to replicate data

FAQ

Aerospike backup / restore

- Restore procedure for the whole cluster can be complicated

- Restore for single node can be done with EBS Snapshots

Auto Scaling of Aerospike clusters

- Auto-scaling puts unpredictable pressure on the cluster

- Scaling up is simple, but scaling down is complicated

Aerospike in Containers

- Does not make any sense

FAQ: Troubleshooting

Disk I/O

CPU

Memory

Neo4j

What is it?

• Graph database

• JVM based

• Provides REST API

• Two clustering modes: HA cluster & Causal cluster

• Two types of nodes: Core nodes & Read replicas (Raft protocol)

• Uses Cypher language for querying

Neo4j Causal Clustering

How does it work?

Choosing instance & storage

FAQ: 2AZ cluster architecture

• If an AZ fails and the master node was in it, a new master election procedure is initiated

• Core nodes in Causal cluster mode vote by simple majority

• If a majority is unavailable, the cluster becomes read-only
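
The simple-majority rule makes the effect of losing an AZ easy to check in advance. A minimal sketch, assuming a hypothetical mapping of AZ names to core node counts (the function names are illustrative and not part of any Neo4j API):

```python
def has_majority(total_core_nodes, surviving_core_nodes):
    """Simple majority: strictly more than half of all core nodes are available."""
    return surviving_core_nodes > total_core_nodes // 2


def writable_after_az_failure(cores_per_az, failed_az):
    """Return True if the cluster keeps a majority after losing one AZ."""
    total = sum(cores_per_az.values())
    surviving = total - cores_per_az[failed_az]
    return has_majority(total, surviving)


# Hypothetical layout: three core nodes split 2 + 1 across two AZs.
layout = {"az-a": 2, "az-b": 1}
for az in layout:
    state = "writable" if writable_after_az_failure(layout, az) else "read-only"
    print(f"{az} fails -> {state}")
```

With only two AZs, one of them always holds at least half of the core nodes, so losing that AZ leaves the cluster read-only under this rule.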

FAQ: Troubleshooting

JVM

Page Caching

Disk I/O

CPU

Memory

NoSQL on EC2: Cost considerations

General cost considerations

Usage pattern (R/W)

RPS

Size of the dataset

Traffic costs

Object size

Number of nodes
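
Some of these factors can be combined into a rough monthly estimate. The sketch below is an assumption-heavy illustration: the function, its parameters, and the 730-hour month are hypothetical, and every price is a placeholder supplied by the caller, not an AWS price. Usage pattern, RPS, and object size are not modeled directly; they mainly drive the number and size of nodes.

```python
def estimate_monthly_cost(nodes, instance_hourly, storage_gb_per_node,
                          storage_gb_month, traffic_gb, traffic_per_gb,
                          hours_per_month=730):
    """Return a cost breakdown from caller-supplied unit prices."""
    compute = nodes * instance_hourly * hours_per_month
    storage = nodes * storage_gb_per_node * storage_gb_month
    traffic = traffic_gb * traffic_per_gb
    return {
        "compute": compute,
        "storage": storage,
        "traffic": traffic,
        "total": compute + storage + traffic,
    }


# Placeholder numbers only, chosen to keep the arithmetic easy to follow.
estimate = estimate_monthly_cost(
    nodes=3, instance_hourly=0.5,
    storage_gb_per_node=100, storage_gb_month=0.125,
    traffic_gb=200, traffic_per_gb=0.25,
)
print(estimate["total"])  # 1182.5
```

Rerunning the estimate for each candidate layout (for example, fewer large nodes versus more small ones) makes the trade-offs visible before a PoC.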

Cost: Performance / Size

• If you want to always be cost-effective and efficient, then deployment is a journey for you

• Consider EBS as the main option for most workloads

• If your performance requirements are really high and the dataset is relatively small, consider EC2 with ephemeral storage; otherwise, go for EC2 with EBS

Sum up

• There is no general solution for all cases

• Context matters, and the solution should follow the changing context

• Apps and code should be adapted to the way NoSQL DBs work

• The initial choice of deployment options can be changed

• The best way to make the initial deployment choice is a PoC

Thank you!