Cassandra on EPAM Cloud

Database deployed in multiple locations

AGENDA

• Typical issues with RDBMS

• Solutions with Cassandra

• Cassandra on EPAM Cloud

ABOUT ME

Oresztész Margaritisz

• Java CC member since 2015

• Distributed / Cloud Computing

• NoSQL

• Agile

• DevOps

@gitaroktato gitaroktato https://www.linkedin.com/in/oreszteszgitaroktato

TYPICAL ISSUES WITH RDBMS

• EPAM needs global delivery of services

• 25 countries

• 4 continents

• 19,600 employees

• Data storage with traditional RDBMS can be cumbersome

• Configuration issues

• Migrating data between locations can be hard

• Master-slave configuration at a local site comes with a performance tradeoff

ARE WE SOLVING THE SAME PROBLEM?

LATENCY AROUND THE GLOBE

LATENCY FOR 100 REQUESTS

[Chart: scale from 0% to 100%; 1 request (10 ms) vs. 100 requests (1 second)]
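The arithmetic behind the "1 request / 10 ms" and "100 requests / 1 second" labels can be sketched as follows. This is a minimal illustration, assuming each request costs one 10 ms round trip and that requests are issued strictly one after another; the function name is hypothetical and the model ignores bandwidth and server processing time.

```python
def sequential_latency_ms(requests: int, round_trip_ms: float) -> float:
    """Total wait when every request must finish before the next one starts."""
    return requests * round_trip_ms


if __name__ == "__main__":
    # One request costs a single round trip.
    print(sequential_latency_ms(1, 10))
    # One hundred sequential requests cost one hundred round trips.
    print(sequential_latency_ms(100, 10))
```

Under these assumptions the total grows linearly with the number of round trips, which is why a chatty protocol across regions is dominated by latency rather than bandwidth.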

BANDWIDTH VS. LATENCY

TYPICAL MULTI-MASTER MYSQL DEPLOYMENT

[Diagram: Master 1 with Slaves 1.1, 1.2, 1.3; Master 2 with Slaves 2.1, 2.2, 2.3; Master-Master Replication between Master 1 and Master 2]

WE NEED NOSQL?

Dude, you need NoSQL!

ACID vs. BASE

WHY CASSANDRA?

CLIENT CONNECTIVITY

[Diagram: two clients, each with read/write (R/W) access]

MULTI-REGION DEPLOYMENT

[Diagram: Tokyo and Minsk regions with a client]

TUNABLE CONSISTENCY

RAPID READ PROTECTION

CASSANDRA ON EPAM CLOUD

INITIATIVE BY RND TEAM @ JAVACC

WE NEED YOU!

CONFIGURATION GUIDE

AWS-AP-NORTHEAST and EPAM-BY1

cassandra-rackdc.properties

dc=AWS-AP-NORTHEAST
rack=rack1
prefer_local=true

Public IP

cassandra.yaml

endpoint_snitch: GossipingPropertyFileSnitch

cassandra-rackdc.properties

dc=EPAM-BY1
rack=rack1
prefer_local=true

cassandra.yaml

broadcast_address: <PUBLIC_IP>

cassandra.yaml

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: <AWS_SEED_PUBLIC_IP>

cassandra.yaml

...
          - seeds: <BY1_SEED_PUBLIC_IP>

BOOT SEQUENCE

[Diagram: boot sequence across AWS-AP-NORTHEAST and EPAM-BY1]

CREATE KEYSPACE replicated WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'EPAM-BY1' : 2, 'AWS-AP-NORTHEAST' : 2 };
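To see what the replication factors in this keyspace mean for tunable consistency, the sketch below computes how many replica acknowledgements a few consistency levels would need. It assumes Cassandra's usual quorum rule (half of the replicas, rounded down, plus one); the helper names `quorum` and `required_acks` are hypothetical and only three consistency levels are modelled.

```python
# Replication factors copied from the CREATE KEYSPACE statement above.
REPLICATION = {"EPAM-BY1": 2, "AWS-AP-NORTHEAST": 2}


def quorum(replicas: int) -> int:
    """Quorum size: more than half of the given number of replicas."""
    return replicas // 2 + 1


def required_acks(level: str, local_dc: str) -> int:
    """Replica acknowledgements needed for a request at the given level."""
    if level == "LOCAL_ONE":
        return 1
    if level == "LOCAL_QUORUM":
        # Only replicas in the coordinator's own data center count.
        return quorum(REPLICATION[local_dc])
    if level == "QUORUM":
        # Replicas in every data center count, so remote nodes must answer.
        return quorum(sum(REPLICATION.values()))
    raise ValueError(f"consistency level not covered by this sketch: {level}")


if __name__ == "__main__":
    for level in ("LOCAL_ONE", "LOCAL_QUORUM", "QUORUM"):
        print(level, required_acks(level, "EPAM-BY1"))
```

With two replicas per data center, LOCAL_QUORUM needs both local replicas, so a single unavailable replica in that data center blocks LOCAL_QUORUM requests for the affected data, while QUORUM needs three of the four replicas and therefore always waits on the other region.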

CAPACITY PLANNING

• Replication latency between regions

• Transactions per second for the whole cluster

• 3 MEDIUM instances in EPAM-BY1

• 3 MEDIUM instances in AWS-AP-NORTHEAST

REPLICATION LATENCY

[Diagram: clients issuing WRITE and READ operations, with NTP for clock synchronization]

[Chart: replication latency on a scale from 0 to 400; rows: TCP Ping, DC1 -> DC2, Single Client; series: Average, 99%, Max]
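The Average, 99% and Max figures for a latency measurement like this one can be derived from raw samples roughly as sketched below. This is an illustration only: the `summarize` helper is hypothetical, the nearest-rank percentile method is an assumption (the talk does not say how its 99% value was computed), and the sample values are made up rather than taken from the measurements.

```python
def summarize(samples_ms):
    """Return average, 99th percentile (nearest-rank) and maximum of samples."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_ms)
    count = len(ordered)
    # Nearest-rank percentile: smallest rank covering 99% of the samples,
    # computed with integer ceiling division to avoid float rounding.
    rank = -(-99 * count // 100)
    return {
        "average": sum(ordered) / count,
        "p99": ordered[rank - 1],
        "max": ordered[-1],
    }


if __name__ == "__main__":
    # Hypothetical samples in milliseconds, not data from the talk.
    made_up_samples = list(range(1, 101))
    print(summarize(made_up_samples))
```

Reporting the 99th percentile next to the average matters for cross-region replication because a small number of slow transfers can go unnoticed in the mean.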

CLUSTER THROUGHPUT

[Chart: cluster throughput on a scale from 0 to 30000; rows: LOCAL_QUORUM with Replication: 2, LOCAL_ONE with Replication: 2, LOCAL_ONE with Replication: 1; series: node #1, node #2, node #3, SUM]

SUMMARY

• Configuration is easy

• Migrating data between locations is built-in

• Load is spread evenly

• Network failures are handled by default

UP NEXT

• Real migration use-case

• Performance tuning

LOOKING FOR A REAL MIGRATION USE-CASE

KB Page

https://kb.epam.com/display/EJAVACC/Multi+datacenter+setup+with+NoSQL

Dzmitry Skaredau - Dzmitry_Skaredau@epam.com

Oresztesz Margaritisz - Oresztesz_Margaritisz@epam.com

References

EPAM Project Space

https://kb.epam.com/display/EJAVACC/Multi-Region+Cassandra+set-up+in+EPAM+Cloud

Latency: The New Web Performance Bottleneck

https://www.igvita.com/2012/07/19/latency-the-new-web-performance-bottleneck/

More Bandwidth Doesn’t Matter

https://docs.google.com/a/chromium.org/viewer?a=v&pid=sites&srcid=Y2hyb21pdW0ub3JnfGRldnxneDoxMzcyOWI1N2I4YzI3NzE2

References pt. 2

Cassandra’s Rapid Read Protection

http://www.datastax.com/dev/blog/rapid-read-protection-in-cassandra-2-0-2
