©2013 DataStax Confidential. Do not distribute without consent.
@PatrickMcFadin
Patrick McFadin, Chief Evangelist, DataStax
Advanced Cassandra
Does Apache Cassandra Work?
Motivations
Cassandra is not…
A Data Ocean, Lake, or Pond
An In-Memory Database
A Key-Value Store
A magical database unicorn that farts rainbows
When to use…
• Loose data model (joins, sub-selects)
• Absolute consistency (aka gotta have ACID)
• No need to use anything else
• You'll miss the long, candle-lit dinners with your Oracle rep that always end with "what's your budget look like this year?"
Oracle, MySQL, Postgres or <RDBMS>
• Uptime is a top priority
• Unpredictable or high scaling requirements
• Workload is transactional
• Willing to put the time and effort into understanding how Cassandra works and how to use it
When to use…
Use Oracle when you want to count your money. Use Cassandra when you want to make money.
Cassandra
Copy n Paste your relational model
APACHE
CASSANDRA
1000 Node Cluster
Scaling up
Stick the landing
Going to deploy in production!
Not sure about this!
Done!
Topology considerations
Replication Strategy
CREATE KEYSPACE killrvideo
WITH REPLICATION = {
  'class' : 'SimpleStrategy',
  'replication_factor' : 3
};
Topology considerations

SimpleStrategy
• Default
• One data center

NetworkTopologyStrategy
• Use for multi-data center
• Just use this always
NetworkTopologyStrategy
CREATE KEYSPACE Product_Catalog
WITH REPLICATION = {
  'class' : 'NetworkTopologyStrategy',
  'eu1' : 3,
  'eu2' : 3,
  'us1' : 3
};
CREATE KEYSPACE EU_Customer_Data
WITH REPLICATION = {
  'class' : 'NetworkTopologyStrategy',
  'eu1' : 3,
  'eu2' : 3,
  'us1' : 0
};
Symmetric: Product_Catalog is replicated the same everywhere (RF=3 in every data center).
Asymmetric: EU_Customer_Data keeps RF=3 in eu1 and eu2 but RF=0 in us1, so no copies land in the US.

Application
• Closer to customers
• No downtime
Snitches
DC1: RF=3

Node       Primary  Replica  Replica
10.0.0.1   00-25    76-100   51-75
10.0.0.2   26-50    00-25    76-100
10.0.0.3   51-75    26-50    00-25
10.0.0.4   76-100   51-75    26-50

(Ring diagram: the four nodes arranged clockwise by primary range, each also holding replicas of the two preceding ranges. A client asks: "Where do I place this data?")
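The placement in the table above can be sketched in Python. This is a toy model, not the real partitioner: a 0-100 token space with four fixed ranges stands in for Cassandra's Murmur3 token ring. With RF=3, a partition lives on its range's owner plus the next two nodes clockwise, which is why each node in the table also replicates the two preceding ranges:

```python
# Toy SimpleStrategy replica placement (assumption: a 0-100 token
# space instead of Cassandra's real Murmur3 range).
NODES = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]  # ring order
RANGE_SIZE = 25  # each node owns a 25-token primary range

def replicas(token: int, rf: int = 3) -> list[str]:
    """Owner of the token's range plus the next rf-1 nodes clockwise."""
    owner = (token % 100) // RANGE_SIZE
    return [NODES[(owner + i) % len(NODES)] for i in range(rf)]

# A partition at token 15 falls in range 00-25: owned by 10.0.0.1,
# replicated on 10.0.0.2 and 10.0.0.3, matching the table above.
print(replicas(15))  # -> ['10.0.0.1', '10.0.0.2', '10.0.0.3']
```

Reading the table row-wise gives the mirror image of the same rule: 10.0.0.1 holds 00-25 as primary and, as second and third replica of the preceding ranges, 76-100 and 51-75.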
Dynamic Snitching
• Route based on node performance
Snitches
SimpleSnitch
GossipingPropertyFileSnitch
RackInferringSnitch
PropertyFileSnitch
EC2Snitch
GoogleCloudSnitch
CloudStackSnitch
EC2MultiRegionSnitch
Snitches
• Most typically used in production
• Absolute placement
GossipingPropertyFileSnitch
cassandra-rackdc.properties:
dc=DC1
rack=RAC1
Booting a data center

DC1: RF=3

Node       Primary  Replica  Replica
10.0.0.1   00-25    76-100   51-75
10.0.0.2   26-50    00-25    76-100
10.0.0.3   51-75    26-50    00-25
10.0.0.4   76-100   51-75    26-50

(Ring diagram: the running DC1 ring next to an empty DC2, about to be brought online.)
Pre-check
• Use NetworkTopologyStrategy
• In cassandra.yaml:
  • auto_bootstrap: false
  • add seeds from the other DC
• Set node location for the snitch:
  • GossipingPropertyFileSnitch: cassandra-rackdc.properties
  • PropertyFileSnitch: cassandra-topology.properties
Booting a data center

DC1: RF=3

Node       Primary  Replica  Replica
10.0.0.1   00-25    76-100   51-75
10.0.0.2   26-50    00-25    76-100
10.0.0.3   51-75    26-50    00-25
10.0.0.4   76-100   51-75    26-50

DC2: RF=3

Node       Primary  Replica  Replica
10.1.0.1   00-25    76-100   51-75
10.1.0.2   26-50    00-25    76-100
10.1.0.3   51-75    26-50    00-25
10.1.0.4   76-100   51-75    26-50

(Ring diagram: both data centers now have a full four-node ring with the same token layout.)
ALTER KEYSPACE to add the new data center, e.g.:
ALTER KEYSPACE killrvideo WITH REPLICATION = { 'class' : 'NetworkTopologyStrategy', 'DC1' : 3, 'DC2' : 3 };
Booting a data center

(Same two-ring diagram: DC1 and DC2, each RF=3, same token layout.) After altering the keyspace, stream the existing data to the new nodes:
nodetool rebuild -- DC1 (run on each new DC2 node, naming the existing data center as the stream source)
Security
NoSQL == No Security
User Auth
Step 1 Turn it on
cassandra.yaml

authenticator: AllowAllAuthenticator → PasswordAuthenticator
authorizer: AllowAllAuthorizer → CassandraAuthorizer
User Auth
cqlsh -u cassandra -p cassandra
Step 2 Create users
cqlsh> create user dude with password 'manager' superuser;
cqlsh> create user worker with password 'newhire';
cqlsh> list users;

 name      | super
-----------+-------
 cassandra |  True
 worker    | False
 dude      |  True
User Auth
cqlsh -u cassandra -p cassandra
Step 3 Grant permissions
cqlsh> create user ro_user with password '1234567';
cqlsh> grant all on killrvideo.user to dude;
cqlsh> grant select on killrvideo.user to ro_user;
SSL
http://thelastpickle.com/blog/2015/09/30/hardening-cassandra-step-by-step-part-1-server-to-server.html
(Diagram: four-node cluster, 10.0.0.1 through 10.0.0.4, with node-to-node connections encrypted.)

• Create SSL certificates
• Copy to each server
• Start each node
Prepared Statements
• Built for speed and efficiency
How they work: Prepare

SELECT * FROM user WHERE id = ?

(Diagram: the client sends the query to a node to prepare; the node parses it, then hashes and caches the parsed form, handing a prepared statement back to the client.)
How they work: Bind

id = 1 + PreparedStatement hash

(Diagram: the client sends bind & execute with the variable plus the statement hash; the node combines the pre-parsed query with the variable, executes it, and returns the result.)
How to Prepare(Statements)

Java:

PreparedStatement userSelect =
    session.prepare("SELECT * FROM user WHERE id = ?");
BoundStatement userSelectStatement = new BoundStatement(userSelect);
session.execute(userSelectStatement.bind(1));

Python:

prepared_stmt = session.prepare("SELECT * FROM user WHERE id = ?")
bound_stmt = prepared_stmt.bind([1])
session.execute(bound_stmt)
Don't do this

for (int i = 1; i < 100; i++) {
    PreparedStatement userSelect =
        session.prepare("SELECT * FROM user WHERE id = ?");
    BoundStatement userSelectStatement = new BoundStatement(userSelect);
    session.execute(userSelectStatement.bind(1));
}
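The fix is to prepare once, outside the loop, and bind per iteration. A toy Python model of why it matters (StubSession is invented for illustration, not the DataStax driver; the point it models is that every prepare() call costs a round trip to the cluster, even when the node already has the statement cached):

```python
import hashlib

class StubSession:
    """Toy stand-in for a driver session: every prepare() call is
    counted as a network round trip to the cluster."""
    def __init__(self):
        self.prepare_round_trips = 0

    def prepare(self, query: str) -> str:
        self.prepare_round_trips += 1  # extra hop to a node
        return hashlib.md5(query.encode()).hexdigest()  # statement hash

    def execute(self, stmt_hash: str, params):
        # The node combines its cached pre-parsed query with the params.
        return (stmt_hash, params)

# Anti-pattern from the slide: prepare inside the loop.
bad = StubSession()
for user_id in range(100):
    stmt = bad.prepare("SELECT * FROM user WHERE id = ?")
    bad.execute(stmt, [user_id])

# Prepare once, bind many times.
good = StubSession()
stmt = good.prepare("SELECT * FROM user WHERE id = ?")
for user_id in range(100):
    good.execute(stmt, [user_id])

print(bad.prepare_round_trips, good.prepare_round_trips)  # -> 100 1
```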
Execute vs Execute Async
• Very subtle difference
• Blocking vs non-blocking call
VS
Async
• Request pipelining
• One connection for requests
• Responses return whenever
Async

for (…) {
    future = executeAsync(statement)
}

(Diagram: the client fires requests at the cluster without waiting for responses.)

Do something

for (…) {
    result = future.get()   // blocks until the response arrives
}
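The fire-then-collect shape above can be sketched with Python's stdlib. ThreadPoolExecutor stands in for the driver's async machinery (the real executeAsync returns a driver ResponseFuture, not a stdlib Future); the shape is the same: submit everything without blocking, do other work, then block once per future when the results are needed:

```python
from concurrent.futures import ThreadPoolExecutor
import time

def execute(statement):
    """Pretend network call: the 'node' answers after a short delay."""
    time.sleep(0.01)
    return f"result for {statement}"

statements = [f"stmt-{i}" for i in range(20)]

with ThreadPoolExecutor(max_workers=20) as pool:
    # Fire all requests without waiting (request pipelining).
    futures = [pool.submit(execute, s) for s in statements]
    # ... do something else here ...
    # Block only when the responses are actually needed.
    results = [f.result() for f in futures]

print(len(results))  # -> 20
```

Because the requests overlap, the 20 calls take roughly one delay in total rather than 20 sequential ones.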
Batch vs Execute Async
VS
(Potentially)
Load Balancing Policies
cluster = Cluster .builder() .addContactPoint("192.168.0.30") .withQueryOptions(new QueryOptions().setConsistencyLevel(ConsistencyLevel.ONE) .withRetryPolicy(DefaultRetryPolicy.INSTANCE) .withLoadBalancingPolicy(new TokenAwarePolicy(new DCAwareRoundRobinPolicy())) .build(); session = cluster.connect("demo");
Data LocalityDC1
DC1: RF=3Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51-75 26-50 00-25
10.0.0.4 76-100 51-75 26-50
10.0.0.1 00-25
10.0.0.4 76-100
10.0.0.2 26-50
10.0.0.3 51-75
76-100 51-75
00-25 76-100
26-50 00-25
51-75 26-50
Client
Read partition 15
DC2
10.1.0.1 00-25
10.1.0.4 76-100
10.1.0.2 26-50
10.1.0.3 51-75
76-100 51-75
00-25 76-100
26-50 00-25
51-75 26-50
Node Primary Replica Replica
10.0.0.1 00-25 76-100 51-75
10.0.0.2 26-50 00-25 76-100
10.0.0.3 51-75 26-50 00-25
10.0.0.4 76-100 51-75 26-50
DC2: RF=3
Client
Read partition 15
Batch (Logged)
• All statements collected on the client
• Sent in one shot
• All done on 1 node
Batch is accepted
All actions are logged on two replicas
Statements executed in sequence
Results are collected and returned
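The four steps above can be sketched as a toy Python model. This is purely illustrative: Cassandra's real batchlog is a system table written to two other nodes and replayed on a timer, whereas this sketch just shows the guarantee the log buys, namely that a batch interrupted mid-flight still takes effect after replay:

```python
class Coordinator:
    """Toy logged-batch flow: log on two replicas, apply in sequence,
    then clear the log. Not Cassandra's real batchlog implementation."""
    def __init__(self):
        self.batchlog = {"replica-1": [], "replica-2": []}
        self.applied = []

    def execute_batch(self, batch_id, statements, fail_midway=False):
        for replica in self.batchlog:            # log on two replicas
            self.batchlog[replica].append((batch_id, statements))
        for i, stmt in enumerate(statements):    # execute in sequence
            if fail_midway and i == 1:
                return False                     # coordinator dies here
            self.applied.append(stmt)
        for replica in self.batchlog:            # success: clear the log
            self.batchlog[replica] = [
                b for b in self.batchlog[replica] if b[0] != batch_id]
        return True

    def replay(self):
        """A surviving replica replays any batch still in the log."""
        for batch_id, statements in self.batchlog["replica-1"]:
            for stmt in statements:
                if stmt not in self.applied:
                    self.applied.append(stmt)
        self.batchlog = {"replica-1": [], "replica-2": []}

c = Coordinator()
c.execute_batch("b1", ["insert A", "insert B"], fail_midway=True)
c.replay()  # the batch log makes the whole batch take effect
print(c.applied)  # -> ['insert A', 'insert B']
```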
Batches: The good
• Great for denormalized inserts/updates
// Looking from the video side to many users
CREATE TABLE comments_by_video (
    videoid uuid,
    commentid timeuuid,
    userid uuid,
    comment text,
    PRIMARY KEY (videoid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);

// Looking from the user side to many videos
CREATE TABLE comments_by_user (
    userid uuid,
    commentid timeuuid,
    videoid uuid,
    comment text,
    PRIMARY KEY (userid, commentid)
) WITH CLUSTERING ORDER BY (commentid DESC);
Batches: The good
• Both inserts are run
• On failure, the batch log will replay
BEGIN BATCH
  INSERT INTO comments_by_video (videoid, userid, commentid, comment)
  VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, d0f60aa8-54a9-4840-b70c-fe562b68842b, now(), 'Worst. Video. Ever.');
  INSERT INTO comments_by_user (userid, videoid, commentid, comment)
  VALUES (d0f60aa8-54a9-4840-b70c-fe562b68842b, 99051fe9-6a9c-46c2-b949-38ef78858dd0, now(), 'Worst. Video. Ever.');
APPLY BATCH;
Batches: The bad“I was doing a load test and nodes started blinking offline”
“Were you using a batch by any chance?”
“Why yes I was! How did you know?”
“How big was each batch?”
“1000 inserts each”
Batches: The bad

BEGIN BATCH
  1000 inserts
APPLY BATCH;

(Diagram: the client hands the entire batch to one coordinator node, which must fan the inserts out to every other node.)
Batches: The rules
• Keep them small, and use them for atomicity

CASSANDRA-6487 - Warn on large batches (5 KB default)
CASSANDRA-8011 - Fail on large batches (50 KB default)
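One way to respect those thresholds on the client is to split statements into size-bounded groups. A hedged sketch (chunk_statements is an invented helper; sizing by UTF-8 statement length only approximates how the server measures batch size, and the 5120-byte limit mirrors the 5 KB warn threshold):

```python
WARN_THRESHOLD_BYTES = 5 * 1024  # mirrors the 5 KB warn default

def chunk_statements(statements, limit=WARN_THRESHOLD_BYTES):
    """Group statements so each batch stays under the size limit."""
    batches, current, current_size = [], [], 0
    for stmt in statements:
        size = len(stmt.encode("utf-8"))
        if current and current_size + size > limit:
            batches.append(current)   # flush the full batch
            current, current_size = [], 0
        current.append(stmt)
        current_size += size
    if current:
        batches.append(current)       # flush the remainder
    return batches

inserts = ["INSERT INTO t (id, val) VALUES (%d, 'x')" % i
           for i in range(1000)]
batches = chunk_statements(inserts)
print(len(batches), max(len(b) for b in batches))
```

Remember the slide's point, though: batches buy atomicity, not speed. If the 1000 inserts don't need to be atomic, the async loop on the next slide is the better tool.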
The alternative

Instead of:

BEGIN BATCH
  1000 inserts
APPLY BATCH;

Do this:

while () {
    future = session.executeAsync(statement)
}
Old Row cache: The problem
• Reads an entire storage row of data

ID = 1 (partition key = storage row key)

2014-09-08 12:00:00 : name = SFO, temp = 63.4
2014-09-08 12:01:00 : name = SFO, temp = 63.9
2014-09-08 12:02:00 : name = SFO, temp = 64.0

Need this: one row. Caches this: the entire partition.
New Row Cache: The solution
• Stores just a few CQL rows

(Same partition as above.)

Need this: one row. Caches this: only the first rows_per_partition CQL rows of the partition.
Using row cache
CREATE TABLE user_search_history_with_cache (
    id int,
    search_time timestamp,
    search_text text,
    search_results int,
    PRIMARY KEY (id, search_time)
) WITH CLUSTERING ORDER BY (search_time DESC)
  AND caching = { 'keys' : 'ALL', 'rows_per_partition' : '20' };
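With rows_per_partition = '20' and search_time DESC clustering, only the 20 newest rows of each partition are eligible for the row cache. A toy model of that head-of-partition rule (cacheable_rows is invented for illustration, not Cassandra's cache code):

```python
ROWS_PER_PARTITION = 20  # from the table's caching clause

def cacheable_rows(partition_rows, limit=ROWS_PER_PARTITION):
    """Rows ordered newest-first (CLUSTERING ORDER BY search_time DESC);
    only the head of the partition is eligible for the row cache."""
    ordered = sorted(partition_rows,
                     key=lambda r: r["search_time"], reverse=True)
    return ordered[:limit]

# 100 searches for one user; only the 20 most recent get cached.
history = [{"search_time": t, "search_text": f"query {t}"}
           for t in range(100)]
cached = cacheable_rows(history)
print(len(cached), cached[0]["search_time"])  # -> 20 99
```

This is why the cache fits the access pattern here: a "recent searches" screen reads exactly the head of the partition that the cache keeps.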
Perf increase

(Chart: 95th-percentile latency in ms against request rate, showing lower latency with the row cache enabled.)
Go make something awesome
Thank you!
Bring the questions
Follow me on twitter @PatrickMcFadin