Hey Relational Developer, Let's Go Crazy (Patrick McFadin, DataStax) | Cassandra Summit 2016

@PatrickMcFadin

Patrick McFadin
Chief Evangelist for Apache Cassandra, DataStax

Hey relational developer, let's go crazy

Why do you develop?

value = Business.add(you)

KillrVideo

https://killrvideo.github.io/

Major areas to cover

• Connecting to the database
• Inserting Data
• Selecting Data
• Indexing
• Data Locality

WARNING

Connecting to the database

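import com.datastax.driver.core.Cluster;
import com.datastax.driver.core.Session;

Cluster cluster;
Session session;

// Connect to the cluster and keyspace "killrvideo"
// (addContactPoints takes one address per argument, not a comma-separated string)
cluster = Cluster.builder().addContactPoints("192.168.0.1", "192.168.0.2").build();
session = cluster.connect("killrvideo");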

Cluster cluster;
Session session;

// Connect to the cluster and keyspace "killrvideo"
cluster = Cluster.builder().addContactPoints("NODE1", "NODE2").build();
session = cluster.connect("killrvideo");

WARNING

import com.datastax.driver.core.policies.DCAwareRoundRobinPolicy;

Cluster cluster = Cluster.builder()
        .addContactPoints("192.168.0.1", "192.168.0.2")
        .withLoadBalancingPolicy(
                DCAwareRoundRobinPolicy.builder()
                        .withLocalDc("myLocalDC")
                        .build())
        .build();

Multi-DC: East and West

Within a DC: < 1ms
East to West: > 70ms

I wonder why I have random slow queries?
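To make the DC-aware idea concrete, here is a toy model in plain Java (no driver dependency) of what a DC-aware round-robin policy does: coordinators are picked in rotation from the local DC only, so the > 70ms East-to-West hop is never put on the request path by accident. `ToyDcAwarePolicy` and its `Host` record are hypothetical names; the real `DCAwareRoundRobinPolicy` does more, for example it can be configured to use a limited number of remote-DC hosts as a fallback.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

// Toy model (hypothetical, not the driver's implementation) of DC-aware
// round robin: only hosts in the local DC are ever handed out as coordinators.
class ToyDcAwarePolicy {
    record Host(String address, String dc) {}

    private final List<Host> localHosts = new ArrayList<>();
    private final AtomicInteger counter = new AtomicInteger();

    ToyDcAwarePolicy(List<Host> allHosts, String localDc) {
        for (Host h : allHosts) {
            if (h.dc().equals(localDc)) {
                localHosts.add(h); // remote-DC hosts are simply never considered
            }
        }
        if (localHosts.isEmpty()) {
            throw new IllegalArgumentException("no hosts in local DC " + localDc);
        }
    }

    // Returns the next coordinator, cycling through local-DC hosts only.
    Host nextCoordinator() {
        int i = Math.floorMod(counter.getAndIncrement(), localHosts.size());
        return localHosts.get(i);
    }
}
```

Without the local-DC restriction, a plain round robin over all six addresses in a two-DC cluster would send some requests across the continent, which is one way to end up with "random" slow queries.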

Inserting Data

Inserting data

CREATE TABLE video_ratings_by_user (
    videoid uuid,
    userid uuid,
    rating int,
    PRIMARY KEY (videoid, userid)
);

INSERT INTO video_ratings_by_user (videoid, userid) VALUES (?, ?);

Inserting data

• Batch in the same partition is great
• Pay attention to the partition key

BEGIN BATCH
  INSERT INTO comments_by_video (videoid, userid, commentid, comment)
  VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, d0f60aa8-54a9-4840-b70c-fe562b68842b, now(), 'Worst. Video. Ever.');

  -- …100 inserts later…

  INSERT INTO comments_by_video (videoid, userid, commentid, comment)
  VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, d0f60aa8-54a9-4840-b70c-fe562b68842b, now(), 'Worst. Video. Ever.');
APPLY BATCH;
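A minimal sketch of the "batch in the same partition" rule, assuming you already know each write's partition key: group the statements by partition key and emit one `BEGIN BATCH … APPLY BATCH;` per partition. `SinglePartitionBatcher` and `Write` are hypothetical names, and real code would build driver `BatchStatement` objects rather than CQL strings.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: turn (partitionKey, cql) pairs into one BATCH per
// partition key, so no batch ever spans more than one partition.
class SinglePartitionBatcher {
    record Write(String partitionKey, String cql) {}

    static List<String> buildBatches(List<Write> writes) {
        // LinkedHashMap keeps partitions in first-seen order.
        Map<String, List<String>> byPartition = new LinkedHashMap<>();
        for (Write w : writes) {
            byPartition.computeIfAbsent(w.partitionKey(), k -> new ArrayList<>()).add(w.cql());
        }
        List<String> batches = new ArrayList<>();
        for (List<String> statements : byPartition.values()) {
            StringBuilder sb = new StringBuilder("BEGIN BATCH\n");
            for (String s : statements) {
                sb.append("  ").append(s).append('\n');
            }
            sb.append("APPLY BATCH;");
            batches.add(sb.toString());
        }
        return batches;
    }
}
```

In the comments example every insert shares the same `videoid`, so this grouping would produce exactly one batch.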

Batches: The bad

BEGIN BATCH … 1000 inserts … APPLY BATCH;

[Diagram: Client and a four-node ring: 10.0.0.1 owns tokens 00-25, 10.0.0.2 owns 26-50, 10.0.0.3 owns 51-75, 10.0.0.4 owns 76-100]
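The diagram's ring can be modeled in a few lines to show why the 1000-insert batch is bad: whichever node coordinates it must forward mutations to every node that owns one of the batch's partitions. This sketch assumes the diagram's 0-100 token space and ignores replication; real Cassandra hashes partition keys with the Murmur3 partitioner over a far larger range. `ToyRing` is a hypothetical name.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Toy ring matching the diagram: tokens 0-100 split across four nodes.
class ToyRing {
    static String ownerOf(int token) {
        if (token < 0 || token > 100) {
            throw new IllegalArgumentException("token out of range: " + token);
        }
        if (token <= 25) return "10.0.0.1";
        if (token <= 50) return "10.0.0.2";
        if (token <= 75) return "10.0.0.3";
        return "10.0.0.4";
    }

    // Each distinct owner is a node the coordinator has to contact for the batch.
    static Set<String> nodesTouched(List<Integer> partitionTokens) {
        Set<String> nodes = new TreeSet<>();
        for (int t : partitionTokens) {
            nodes.add(ownerOf(t));
        }
        return nodes;
    }
}
```

A batch whose partitions all hash near each other stays on one node; a batch of 1000 unrelated partitions will almost certainly touch all four, with one coordinator holding the whole payload while it fans out.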

WARNING

Prepared Statements

• Built for speed and efficiency

How they work: Prepare

SELECT * FROM user WHERE id = ?

[Diagram: Client sends Prepare to a node in the four-node ring (10.0.0.1 to 10.0.0.4); the node parses, hashes, and caches the query, and returns a Prepared Statement]

How they work: Bind

id = 1 + PreparedStatement Hash

[Diagram: Client sends Bind & Execute to a node in the ring; the node combines the pre-parsed query with the variable and executes it]
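The prepare/bind flow can be modeled as a server-side cache keyed by a hash of the query text. This is a toy sketch: `ToyPreparedCache` and its methods are hypothetical, MD5 is used as the hash for illustration, and `execute` splices values into the text only to make the effect visible, whereas a real node binds typed values to the already-parsed statement without re-parsing anything.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model of prepare (parse, hash, cache) and bind & execute (id + values).
class ToyPreparedCache {
    private final Map<String, String> cache = new ConcurrentHashMap<>();

    // Prepare: "parse" once (here just trim), hash, cache, return the id.
    String prepare(String query) {
        String parsed = query.trim();
        String id = md5Hex(parsed);
        cache.putIfAbsent(id, parsed);
        return id;
    }

    // Bind & execute: the client sends only the id and the values.
    String execute(String id, List<String> values) {
        String parsed = cache.get(id);
        if (parsed == null) {
            throw new IllegalStateException("unprepared statement " + id);
        }
        StringBuilder out = new StringBuilder();
        int v = 0;
        for (char c : parsed.toCharArray()) {
            if (c == '?') out.append(values.get(v++)); // fill each bind marker in order
            else out.append(c);
        }
        if (v != values.size()) {
            throw new IllegalArgumentException("wrong number of bound values");
        }
        return out.toString();
    }

    private static String md5Hex(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) hex.append(String.format("%02x", b));
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Preparing the same query twice yields the same id, which is why a prepared statement should be prepared once and then bound many times.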

Selecting Data

Getting data

• Always use a partition key
• Need JSON? Just ask
• The order of clustering columns matters

SELECT * FROM user_videos WHERE userid = ?;

SELECT * FROM user_videos WHERE userid = ? AND added_date = ?;

CREATE TABLE IF NOT EXISTS user_videos (
    userid uuid,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);

-- Rejected: videoid cannot be restricted while the preceding clustering column, added_date, is not
SELECT * FROM user_videos WHERE userid = ? AND videoid = ?;

SELECT JSON * FROM user_videos WHERE userid = ?;
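The clustering-order rule behind the `user_videos` queries can be stated as a prefix check: the restricted clustering columns must come first in `(added_date, videoid)` order, with no gaps. A simplified sketch with a hypothetical helper; it ignores range restrictions, `ALLOW FILTERING`, and secondary indexes.

```java
import java.util.List;
import java.util.Set;

// Hypothetical check: do the restricted clustering columns form a prefix
// of the table's clustering order?
class ClusteringPrefixCheck {
    static boolean isValidPrefix(List<String> clusteringColumns, Set<String> restricted) {
        boolean gapSeen = false;
        for (String col : clusteringColumns) {
            if (restricted.contains(col)) {
                if (gapSeen) {
                    return false; // restricted column after an unrestricted one
                }
            } else {
                gapSeen = true;
            }
        }
        return true;
    }
}
```

With clustering columns `added_date, videoid`: restricting nothing, `added_date`, or both is fine; restricting only `videoid` is not.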

Getting data

• The cqlsh trace facility (TRACING ON) is your friend
• Watch the logs. Filter for warnings

SELECT * FROM videos;

SELECT * FROM videos ALLOW FILTERING;

WARNING

SELECT * FROM videos WHERE key IN (<10s, 100s or 1000s of keys>);
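One common alternative to a very large `IN` list (an assumption here, not something the slide spells out) is to issue one single-partition query per key asynchronously and join the results, so no single coordinator has to gather thousands of partitions. `InSplitter` is hypothetical, and `fetchOne` stands in for an async driver call such as `session.executeAsync` on a bound statement.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;

// Sketch: replace one big IN query with one async lookup per key.
class InSplitter {
    static <K, V> List<V> fetchAll(List<K> keys, Function<K, CompletableFuture<V>> fetchOne) {
        // Fire every single-partition lookup first so they run concurrently.
        List<CompletableFuture<V>> futures = new ArrayList<>();
        for (K key : keys) {
            futures.add(fetchOne.apply(key));
        }
        // Then wait for each one; join() rethrows a failure as CompletionException.
        List<V> results = new ArrayList<>(futures.size());
        for (CompletableFuture<V> f : futures) {
            results.add(f.join());
        }
        return results;
    }
}
```

Results come back in the same order as `keys`. For very large key sets, combine this with a cap on requests in flight.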

Indexing

Check out what I built. This query is really slow.

Duh. Add an index to this field.

Oh yeah. That is faster.

Indexing data

• Secondary indexes are not for speed
• Index clustering columns
• Index collections

CREATE INDEX videoid_idx ON user_videos (videoid);

CREATE TABLE IF NOT EXISTS user_videos (
    userid uuid,
    added_date timestamp,
    videoid uuid,
    name text,
    preview_image_location text,
    PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);

CREATE INDEX tags_idx ON videos (tags);

name (PK)   location
Jonathan    TX
Aleksey     UK
Patrick     CA
Stefania    HK

CREATE INDEX location_idx ON users (location);

[Diagram: the USERS table and its secondary index, Index:user(location)]
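Why secondary indexes are "not for speed": in this toy model each node indexes only the rows it stores, so a query on `location` with no partition key has to ask every node. The classes are hypothetical, and `insert` uses `hashCode` as a stand-in for the token ring; real coordinators are smarter about how they walk the ring, but the query is still not confined to one partition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy cluster where every node keeps a local index of only its own rows.
class LocalIndexCluster {
    static class Node {
        final Map<String, List<String>> locationIndex = new HashMap<>();

        void insert(String name, String location) {
            locationIndex.computeIfAbsent(location, k -> new ArrayList<>()).add(name);
        }
    }

    final List<Node> nodes = new ArrayList<>();
    int nodesQueried = 0;

    LocalIndexCluster(int nodeCount) {
        for (int i = 0; i < nodeCount; i++) nodes.add(new Node());
    }

    // Placement is driven by the partition key (name), not by location.
    void insert(String name, String location) {
        nodes.get(Math.floorMod(name.hashCode(), nodes.size())).insert(name, location);
    }

    // No partition key in the query: scatter to every node, gather matches.
    List<String> findByLocation(String location) {
        List<String> result = new ArrayList<>();
        for (Node n : nodes) {
            nodesQueried++;
            result.addAll(n.locationIndex.getOrDefault(location, List.of()));
        }
        return result;
    }
}
```

Finding the single row for `CA` costs a round trip to all four nodes; the index saves a scan on each node, not the fan-out.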

CREATE CUSTOM INDEX location_idx ON users (location) USING 'org.apache.cassandra.index.sasi.SASIIndex';

[Diagram: SASI Index attached to the Memtable and each SSTable]

SASI Queries

SELECT * FROM users WHERE firstname LIKE 'pat%';

SELECT * FROM users WHERE lastname LIKE '%Fad%';

SELECT * FROM users WHERE email LIKE '%data%';

SELECT * FROM users WHERE created_date > '2011-06-15' AND created_date < '2011-06-30';
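The SASI queries above use three `LIKE` shapes: prefix (`'pat%'`) and contains (`'%Fad%'`, `'%data%'`), plus suffix (`'%abc'`) as the remaining case, alongside a range on `created_date`. Below is a toy matcher for the `LIKE` shapes; it is not SASI's implementation, and `ToyLike` is hypothetical. Note that SASI's default `PREFIX` mode only serves `'abc%'` patterns; contains-style queries need an index created `WITH OPTIONS = {'mode': 'CONTAINS'}`.

```java
import java.util.Locale;

// Toy matcher for LIKE patterns of the form 'abc%', '%abc%', '%abc', or 'abc'.
class ToyLike {
    static boolean matches(String value, String pattern, boolean caseSensitive) {
        if (!caseSensitive) {
            value = value.toLowerCase(Locale.ROOT);
            pattern = pattern.toLowerCase(Locale.ROOT);
        }
        boolean leading = pattern.startsWith("%");
        boolean trailing = pattern.length() > 1 && pattern.endsWith("%");
        String term = pattern.substring(leading ? 1 : 0, pattern.length() - (trailing ? 1 : 0));
        if (leading && trailing) return value.contains(term);   // '%abc%'
        if (trailing) return value.startsWith(term);            // 'abc%'
        if (leading) return value.endsWith(term);               // '%abc'
        return value.equals(term);                              // 'abc'
    }
}
```

Whether `'pat%'` matches `Patrick` depends on how the index handles case, which is why the matcher takes `caseSensitive` as an explicit parameter.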

Data Locality

8 Fallacies of Distributed Computing

1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous

Insert Alternative

Instead of:

BEGIN BATCH … 1000 inserts … APPLY BATCH;

Do this:

for (Statement statement : statements) {
    futures.add(session.executeAsync(statement));
}

WARNING

Collect and deal with your futures!
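A minimal sketch of "collect and deal with your futures": cap how many writes are in flight with a `Semaphore`, keep every future, then check each one for failure. `BoundedAsyncWriter` is hypothetical, `executeAsync` here is a `Function` standing in for the driver's `session.executeAsync`, and `maxInFlight` is a tuning knob you would choose yourself.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.Semaphore;
import java.util.function.Function;

// Sketch: bounded async writes whose futures are all collected and checked.
class BoundedAsyncWriter {
    static <S> int writeAll(List<S> statements,
                            Function<S, CompletableFuture<?>> executeAsync,
                            int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        List<CompletableFuture<?>> futures = new ArrayList<>();
        for (S statement : statements) {
            permits.acquireUninterruptibly(); // wait while maxInFlight writes are outstanding
            CompletableFuture<?> f = executeAsync.apply(statement);
            f.whenComplete((result, error) -> permits.release()); // free the slot either way
            futures.add(f);
        }
        int failures = 0;
        for (CompletableFuture<?> f : futures) {
            try {
                f.join();
            } catch (CompletionException e) {
                failures++; // a real application would log, retry, or report this write
            }
        }
        return failures;
    }
}
```

Fire-and-forget (never looking at the futures) both overloads the cluster and silently drops failed writes; counting failures here is the simplest way to surface them.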

Thank you! Questions?

Follow me @PatrickMcFadin
