How to size up an Apache Cassandra cluster (Training)


Learn how to size up an Apache Cassandra cluster, presented by Joe Chu.


How To Size Up A Cassandra Cluster

Joe Chu, Technical Trainer (jchu@datastax.com)

April 2014

©2014 DataStax Confidential. Do not distribute without consent.


What is Apache Cassandra?

• Distributed NoSQL database

• Linearly scalable

• Highly available with no single point of failure

• Fast writes and reads

• Tunable data consistency

• Rack and Datacenter awareness


Peer-to-peer architecture

• All nodes are the same

• No master / slave architecture

• Less operational overhead for better scalability.

• Eliminates single point of failure, increasing availability.

[Diagram: a master/slave topology contrasted with Cassandra's ring of identical peers]

Linear Scalability

• Operation throughput increases linearly with the number of nodes added.


Data Replication

• Cassandra can write copies of data on different nodes.


• Replication factor setting determines the number of copies.

• Replication strategy can replicate data to different racks and different datacenters.


INSERT INTO user_table (id, first_name, last_name) VALUES (1, 'John', 'Smith');

[Diagram: with RF = 3, the inserted row is written to three replicas R1, R2, and R3]

Node

• Instance of a running Cassandra process.

• Usually represents a single machine or server.


Rack

• Logical grouping of nodes.

• Allows data to be replicated across different racks.


Datacenter

• Grouping of nodes and racks.

• Each data center can have separate replication settings.

• May be in different geographical locations, but not always.


Cluster

• Grouping of datacenters, racks, and nodes that communicate with each other and replicate data.

• Clusters are not aware of other clusters.


Consistency Models

• Immediate consistency

When a write is successful, subsequent reads are guaranteed to return the latest value.

• Eventual consistency

When a write is successful, stale data may still be read but will eventually return the latest value.


Tunable Consistency

• Cassandra offers the ability to choose between immediate and eventual consistency by setting a consistency level.

• Consistency level is set per read or write operation.

• Common consistency levels are ONE, QUORUM, and ALL.

• For multi-datacenter deployments, additional levels such as LOCAL_QUORUM and EACH_QUORUM are available to control cross-datacenter traffic.
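A sketch of how these levels interact (the helper names are illustrative, not Cassandra's API; the rule that a read is immediately consistent when written replicas plus read replicas exceed RF is standard Cassandra guidance):

```python
# Illustrative sketch: map consistency levels to replica counts and check
# when a read is guaranteed to see the latest write (W + R > RF).

def replicas_required(level: str, rf: int) -> int:
    """Replicas that must respond for a given consistency level."""
    if level == "ONE":
        return 1
    if level == "QUORUM":
        return rf // 2 + 1  # majority
    if level == "ALL":
        return rf
    raise ValueError(f"unknown level: {level}")

def immediately_consistent(write_cl: str, read_cl: str, rf: int) -> bool:
    """True when written replicas + read replicas exceed RF."""
    return replicas_required(write_cl, rf) + replicas_required(read_cl, rf) > rf

# QUORUM writes + QUORUM reads at RF=3: 2 + 2 > 3, immediate consistency.
# ONE writes + ONE reads at RF=3: 1 + 1 <= 3, only eventual consistency.
```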


CL ONE

• Write: Success when at least one replica node has acknowledged the write.

• Read: Only one replica node is given the read request.

[Diagram: the client's request goes to a coordinator node; with RF = 3, a single replica among R1, R2, R3 satisfies CL ONE]

CL QUORUM

• Write: Success when a majority of the replica nodes have acknowledged the write.

• Read: A majority of the nodes are given the read request.

• Majority = ( RF / 2 ) + 1
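The majority formula uses integer division; a quick sketch:

```python
# The slide's majority formula: majority = (RF / 2) + 1, with integer division.

def quorum(rf: int) -> int:
    return rf // 2 + 1

# RF=3 -> 2 replicas, RF=5 -> 3 replicas.
# Note RF=2 -> 2 replicas, i.e. QUORUM is the same as ALL.
```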

[Diagram: the client's request goes to a coordinator node; with RF = 3, two replicas among R1, R2, R3 satisfy CL QUORUM]

CL ALL

• Write: Success when all of the replica nodes have acknowledged the write.

• Read: All replica nodes are given the read request.

[Diagram: the client's request goes to a coordinator node; with RF = 3, all replicas R1, R2, R3 must respond for CL ALL]

Log-Structured Storage Engine

• Cassandra storage engine inspired by Google BigTable

• Key to fast write performance on Cassandra

[Diagram: writes land in the commit log and memtable; memtables are flushed to immutable SSTables on disk]

Updates and Deletes

• SSTable files are immutable and cannot be changed.

• Updates are written as new data.

• Deletes write a tombstone, which marks a row or column(s) as deleted.

• Updates and deletes are just as fast as inserts.
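The version-resolution rule these bullets imply (last write wins by timestamp, with tombstones hiding the row) can be sketched like this; the row values mirror the slide, and the helper is illustrative, not Cassandra's API:

```python
# Illustrative sketch of last-write-wins reconciliation across SSTable
# versions of row id=1 (values taken from this slide).
versions = [
    {"id": 1, "first": "John", "last": "Smith", "ts": 405},
    {"id": 1, "first": "John", "last": "Williams", "ts": 621},
    {"id": 1, "deleted": True, "ts": 999},
]

def latest(versions):
    """The version with the highest timestamp wins; a tombstone hides the row."""
    winner = max(versions, key=lambda v: v["ts"])
    return None if winner.get("deleted") else winner

# Here the tombstone (ts 999) is newest, so a read returns no row.
```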


[Diagram: three SSTables, each holding a version of row id:1: (first:John, last:Smith, timestamp …405), (first:John, last:Williams, timestamp …621), (deleted, timestamp …999)]


Compaction

• Periodically an operation is triggered that will merge the data in several SSTables into a single SSTable.

• Helps to limit the number of SSTables to read.

• Removes old data and tombstones.

• The original SSTables are deleted after compaction.


[Diagram: compaction merges three SSTables holding versions of row id:1 (timestamps 405, 621, 999) into one new SSTable containing only the newest entry: id:1, deleted, timestamp 999]


Cluster Sizing


Cluster Sizing Considerations

• Replication Factor

• Data Size

“How many nodes would I need to store my data set?”

• Data Velocity (Performance)

“How many nodes would I need to achieve my desired throughput?”
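A back-of-the-envelope sketch of the data-size question (the formula and the per-node capacity figure are illustrative assumptions, not a DataStax sizing rule):

```python
import math

# Rough storage-driven node count: raw data is multiplied by the
# replication factor, then divided by the usable capacity of one node.

def nodes_for_data(raw_tb: float, rf: int, capacity_per_node_tb: float) -> int:
    """Minimum node count to store raw_tb of data at the given RF."""
    return math.ceil(raw_tb * rf / capacity_per_node_tb)

# e.g. 10 TB of raw data, RF=3, ~1 TB usable per spinning-disk node:
# nodes_for_data(10, 3, 1.0) -> 30 nodes
```

Performance (data velocity) then sets a second, independent lower bound on the node count; the larger of the two wins.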


Choosing a Replication Factor


What Are You Using Replication For?

• Durability or Availability?

• Each node has local durability (Commit Log), but replication can be used for distributed durability.

• For availability, a recommended setting is RF=3.

• RF=3 is the minimum necessary to achieve both consistency and availability using QUORUM and LOCAL_QUORUM.


How Replication Can Affect Consistency Level

• When RF < 3, you do not have as much flexibility when choosing consistency and availability.

• With RF = 2, QUORUM = ALL.

[Diagram: with RF = 2, QUORUM needs both replicas R1 and R2 to respond]

Using A Larger Replication Factor

• When RF > 3, there is more data usage and higher latency for operations requiring immediate consistency.

• If using eventual consistency, a consistency level of ONE will have consistent performance regardless of the replication factor.

• High availability clusters may use a replication factor as high as 5.


Data Size


Disk Usage Factors

• Data Size

• Replication Setting

• Old Data

• Compaction

• Snapshots


Data Sizing

• Row and Column Data

• Row and Column Overhead

• Indices and Other Structures


Replication Overhead

• A replication factor > 1 will effectively multiply your data size by that amount.
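As arithmetic, illustratively:

```python
# The slide's point as a formula: effective on-disk size = raw size * RF.

def effective_size_tb(raw_tb: float, rf: int) -> float:
    return raw_tb * rf

# 2 TB of raw data at RF=3 occupies ~6 TB across the cluster.
```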

[Diagram: disk usage of the same data set at RF = 1, RF = 2, and RF = 3]

Old Data

• Updates and deletes do not actually overwrite or delete data.

• Older versions of data and tombstones remain in the SSTable files until they are compacted.

• This becomes more important for heavy update and delete workloads.


Compaction

• Compaction needs free disk space to write the new SSTable, before the SSTables being compacted are removed.

• Leave enough free disk space on each node to allow compactions to run.

• Worst case for the Size-Tiered Compaction Strategy is 50% of the total data capacity of the node.

• For the Leveled Compaction Strategy, it is about 10% of the total data capacity.
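These headroom figures can be turned into a rough usable-capacity estimate (a sketch built only on the worst-case percentages above, not a precise rule):

```python
# Worst-case compaction headroom from the slide: Size-Tiered (STCS) may
# temporarily need ~50% of the node's capacity free, Leveled (LCS) ~10%.
HEADROOM = {"STCS": 0.50, "LCS": 0.10}

def usable_capacity_tb(disk_tb: float, strategy: str) -> float:
    """Capacity left for live data after reserving compaction headroom."""
    return disk_tb * (1 - HEADROOM[strategy])

# A 1 TB node using STCS should keep only ~0.5 TB of live data;
# with LCS, ~0.9 TB.
```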


Snapshots

• Snapshots are hard-links or copies of SSTable data files.

• After SSTables are compacted, the disk space may not be reclaimed if a snapshot of those SSTables was created.

Snapshots are created when:

• Executing the nodetool snapshot command

• Dropping a keyspace or table

• Incremental backups

• During compaction


Recommended Disk Capacity

• For current Cassandra versions, the ideal disk capacity is approximately 1 TB per node when using spinning disks, and 3-5 TB per node when using SSDs.

• Larger disk capacities per node are possible, but the resulting performance may become the limiting factor.

• What works for you is still dependent on your data model design and desired data velocity.


Data Velocity (Performance)


How to Measure Performance

• I/O Throughput

“How many reads and writes can be completed per second?”

• Read and Write Latency

“How fast should I be able to get a response for my read and write requests?”


Sizing for Failure

• The cluster must be sized taking into account the performance impact caused by node failure.

• When a node fails, the corresponding workload must be absorbed by the other replica nodes in the cluster.

• Performance is further impacted when recovering a node. Data must be streamed or repaired using the other replica nodes.
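A rough sketch of that failure impact, assuming the failed node's workload spreads evenly over the survivors (a simplification; the actual distribution depends on topology and token assignment):

```python
# Illustrative: per-node load multiplier when nodes fail, assuming the
# lost workload is absorbed evenly by the surviving nodes.

def load_multiplier_after_failure(n_nodes: int, failures: int = 1) -> float:
    survivors = n_nodes - failures
    if survivors <= 0:
        raise ValueError("no surviving nodes")
    return n_nodes / survivors

# In a 4-node cluster, losing one node raises load on the rest by ~33%,
# so each node should run with at least that much performance headroom.
```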


Hardware Considerations for Performance

CPU

• Operations are often CPU-intensive.

• More cores are better.

Memory

• Cassandra uses JVM heap memory.

• Additional memory is used as off-heap memory by Cassandra, or as the OS page cache.

Disk

• Cassandra is optimized for spinning disks, but SSDs will perform better.

• Shared network-attached storage (SAN) is strongly discouraged.


Some Final Words…


Summary

• Cassandra allows flexibility when sizing your cluster, from a single node to thousands of nodes.

• Your use case will dictate how you want to size and configure your Cassandra cluster. Do you need availability? Immediate consistency?

• The minimum number of nodes needed will be determined by your data size, desired performance and replication factor.


Additional Resources

• DataStax Documentation: http://www.datastax.com/documentation/cassandra/2.0/cassandra/architecture/architecturePlanningAbout_c.html

• Planet Cassandra: http://planetcassandra.org/nosql-cassandra-education/

• Cassandra Users Mailing List: user-subscribe@cassandra.apache.org, http://mail-archives.apache.org/mod_mbox/cassandra-user/


Questions?



Thank You

We power the big data apps that transform business.

