@PatrickMcFadin
Patrick McFadin, Chief Evangelist for Apache Cassandra, DataStax
Hey relational developer, let's go crazy
Why do you develop?
value = Business.add(you)
KillrVideo
https://killrvideo.github.io/
Major areas to cover
• Connecting to the database
• Inserting Data
• Selecting Data
• Indexing
• Data Locality
WARNING
Connecting to the database
Cluster cluster;
Session session;

// Connect to the cluster and keyspace "killrvideo"
cluster = Cluster.builder()
    .addContactPoints("192.168.0.1", "192.168.0.2")
    .build();
session = cluster.connect("killrvideo");
Cluster cluster;
Session session;

// Connect to the cluster and keyspace "killrvideo"
cluster = Cluster.builder()
    .addContactPoints("NODE1", "NODE2")
    .build();
session = cluster.connect("killrvideo");
WARNING
Cluster cluster = Cluster.builder()
    .addContactPoints("192.168.0.1", "192.168.0.2")
    .withLoadBalancingPolicy(
        DCAwareRoundRobinPolicy.builder()
            .withLocalDc("myLocalDC")
            .build()
    ).build();
Multi-DC: East and West
Within a data center: < 1ms. Between data centers: > 70ms
I wonder why I have random slow queries?
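The random slow queries can be illustrated with a toy model. The sketch below is not driver code: `Node`, `remoteCalls`, `localOnly`, and `twoDcCluster` are hypothetical names, and it assumes a plain round-robin over every known node in a two-DC cluster. It only counts how many requests leave the client's data center, which is where the > 70ms hops come from.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class DcAwareSketch {
    /** Hypothetical node: a name plus the data center it lives in. */
    static final class Node {
        final String name;
        final String dc;
        Node(String name, String dc) { this.name = name; this.dc = dc; }
    }

    /** Round-robins `requests` calls over `candidates` and counts calls that leave `clientDc`. */
    static int remoteCalls(List<Node> candidates, String clientDc, int requests) {
        int remote = 0;
        for (int i = 0; i < requests; i++) {
            Node coordinator = candidates.get(i % candidates.size());
            if (!coordinator.dc.equals(clientDc)) {
                remote++;
            }
        }
        return remote;
    }

    /** Keeps only the nodes in the local DC (roughly what a DC-aware policy aims for). */
    static List<Node> localOnly(List<Node> nodes, String localDc) {
        List<Node> local = new ArrayList<>();
        for (Node n : nodes) {
            if (n.dc.equals(localDc)) {
                local.add(n);
            }
        }
        return local;
    }

    /** Assumed topology: two nodes in East, two in West. */
    static List<Node> twoDcCluster() {
        return Arrays.asList(
            new Node("e1", "East"), new Node("e2", "East"),
            new Node("w1", "West"), new Node("w2", "West"));
    }

    public static void main(String[] args) {
        List<Node> all = twoDcCluster();
        System.out.println("remote calls, all nodes:     " + remoteCalls(all, "East", 100));
        System.out.println("remote calls, local DC only: " + remoteCalls(localOnly(all, "East"), "East", 100));
    }
}
```

In this model half of the requests from an East client cross to West unless the candidate list is restricted to the local DC.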
Inserting Data
Inserting data
CREATE TABLE video_ratings_by_user (
  videoid uuid,
  userid uuid,
  rating int,
  PRIMARY KEY (videoid, userid)
);

INSERT INTO video_ratings_by_user (videoid, userid)
VALUES (?, ?);
Inserting data
• Batch in the same partition is great
• Pay attention to the partition key

BEGIN BATCH
  INSERT INTO comments_by_video (videoid, userid, commentid, comment)
  VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, d0f60aa8-54a9-4840-b70c-fe562b68842b, now(), 'Worst. Video. Ever.');
  …100 Inserts later…
  INSERT INTO comments_by_video (videoid, userid, commentid, comment)
  VALUES (99051fe9-6a9c-46c2-b949-38ef78858dd0, d0f60aa8-54a9-4840-b70c-fe562b68842b, now(), 'Worst. Video. Ever.');
APPLY BATCH;
Batches: The bad
BEGIN BATCH
  1000 inserts
APPLY BATCH;
[Diagram: Client and a four-node ring: 10.0.0.1 (tokens 00-25), 10.0.0.2 (26-50), 10.0.0.3 (51-75), 10.0.0.4 (76-100)]
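To see why a large multi-partition batch is "the bad", here is a toy model of the four-node ring from the slide. The token ranges are the slide's; `tokenFor` is a made-up stand-in for the real partitioner, and all class and method names are hypothetical. The sketch only counts how many nodes a batch has to reach.

```java
import java.util.Set;
import java.util.TreeSet;

class BatchFanOutSketch {
    /** Token ranges copied from the slide's four-node ring. */
    static String ownerOf(int token) {
        if (token <= 25) return "10.0.0.1";
        if (token <= 50) return "10.0.0.2";
        if (token <= 75) return "10.0.0.3";
        return "10.0.0.4";
    }

    /** Toy hash mapping a partition key to a token in 0..100 (not the real partitioner). */
    static int tokenFor(int partitionKey) {
        return Math.floorMod(partitionKey * 31, 101);
    }

    /** Returns the set of nodes that own at least one partition key in the batch. */
    static Set<String> nodesTouched(int[] partitionKeys) {
        Set<String> nodes = new TreeSet<>();
        for (int key : partitionKeys) {
            nodes.add(ownerOf(tokenFor(key)));
        }
        return nodes;
    }

    /** A batch of n inserts, each with a different partition key. */
    static int[] distinctKeys(int n) {
        int[] keys = new int[n];
        for (int i = 0; i < n; i++) keys[i] = i;
        return keys;
    }

    /** A batch of n inserts that all share one partition key. */
    static int[] sameKey(int n, int key) {
        int[] keys = new int[n];
        java.util.Arrays.fill(keys, key);
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("1000 distinct partitions touch: " + nodesTouched(distinctKeys(1000)));
        System.out.println("1000 rows, one partition touch: " + nodesTouched(sameKey(1000, 42)));
    }
}
```

A single-partition batch lands on one owner, while the 1000-key batch makes the coordinator talk to every node in the ring.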
WARNING
Prepared Statements
• Built for speed and efficiency
How they work: Prepare
SELECT * FROM user WHERE id = ?
[Diagram: Client and a four-node ring: 10.0.0.1 (tokens 00-25), 10.0.0.2 (26-50), 10.0.0.3 (51-75), 10.0.0.4 (76-100)]
Steps: Prepare, Parsed, Hashed, Cached, Prepared Statement
How they work: Bind
id = 1 + PreparedStatement Hash
[Diagram: Client and a four-node ring: 10.0.0.1 (tokens 00-25), 10.0.0.2 (26-50), 10.0.0.3 (51-75), 10.0.0.4 (76-100)]
Steps: Bind & Execute, Combine Pre-parsed Query and Variable, Execute
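The prepare and bind steps can be sketched as a small in-memory model. This is not the driver or server API: `PreparedStatementSketch`, `prepare`, `execute`, and `parseCount` are hypothetical, `String.hashCode` stands in for whatever digest a real server uses, and "parsing" is reduced to storing the query text.

```java
import java.util.HashMap;
import java.util.Map;

class PreparedStatementSketch {
    // Server-side cache: statement id -> pre-parsed query (here just the text).
    private final Map<Integer, String> cache = new HashMap<>();
    // Counts how many times a query was actually parsed.
    int parseCount = 0;

    /** Prepare: parse once, hash the query text, cache it, and return the id. */
    int prepare(String cql) {
        int id = cql.hashCode(); // toy hash, assumed for illustration only
        if (!cache.containsKey(id)) {
            parseCount++;        // the expensive step happens only on a cache miss
            cache.put(id, cql);
        }
        return id;
    }

    /** Bind and execute: look up the pre-parsed query by id and combine it with the value. */
    String execute(int id, Object value) {
        String parsed = cache.get(id);
        if (parsed == null) {
            throw new IllegalStateException("statement " + id + " was never prepared");
        }
        return parsed.replace("?", String.valueOf(value));
    }

    public static void main(String[] args) {
        PreparedStatementSketch server = new PreparedStatementSketch();
        int id = server.prepare("SELECT * FROM user WHERE id = ?");
        System.out.println(server.execute(id, 1));
        System.out.println(server.execute(id, 2));
        System.out.println("parsed " + server.parseCount + " time(s)");
    }
}
```

The point of the model is that the client ships only the id and the bound value on each execution, so the query text is parsed once no matter how many times it runs.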
Selecting Data
Getting data
• Use a partition key always
• Need JSON? Just ask
• Order of clustering columns matters

SELECT * FROM user_videos
WHERE userid = ?;

SELECT * FROM user_videos
WHERE userid = ?
AND added_date = ?;

CREATE TABLE IF NOT EXISTS user_videos (
  userid uuid,
  added_date timestamp,
  videoid uuid,
  name text,
  preview_image_location text,
  PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);

SELECT * FROM user_videos
WHERE userid = ?
AND videoid = ?;

SELECT JSON * FROM user_videos
WHERE userid = ?;
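Why the order of clustering columns matters can be sketched with a deliberately simplified rule: the partition key must be restricted, and restricted clustering columns must form a prefix of the declared order. The checker below is hypothetical (it is not how Cassandra validates queries, and it ignores range restrictions, indexes, and ALLOW FILTERING), but it separates the queries on the slide that restrict added_date from the one that skips it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ClusteringOrderSketch {
    // Key layout of user_videos: PRIMARY KEY (userid, added_date, videoid)
    static final String PARTITION_KEY = "userid";
    static final List<String> CLUSTERING = Arrays.asList("added_date", "videoid");

    /**
     * Simplified rule: the partition key is required, and a clustering column
     * may be restricted only if every clustering column before it is restricted.
     */
    static boolean isDirectlyQueryable(String... restrictedColumns) {
        Set<String> restricted = new HashSet<>(Arrays.asList(restrictedColumns));
        if (!restricted.contains(PARTITION_KEY)) {
            return false;
        }
        boolean gapSeen = false;
        for (String column : CLUSTERING) {
            if (restricted.contains(column)) {
                if (gapSeen) {
                    return false; // an earlier clustering column was skipped
                }
            } else {
                gapSeen = true;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("userid:              " + isDirectlyQueryable("userid"));
        System.out.println("userid + added_date: " + isDirectlyQueryable("userid", "added_date"));
        System.out.println("userid + videoid:    " + isDirectlyQueryable("userid", "videoid"));
    }
}
```

Under this rule, restricting videoid without added_date is rejected because added_date comes first in the primary key.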
Getting data
• CQLSH trace facility is your friend
• Watch the logs. Filter for warnings
SELECT * FROM videos;
SELECT * FROM videos ALLOW FILTERING;
WARNING
SELECT * FROM videos
WHERE key IN <10s, 100s or 1000s of keys>;
Indexing
Check out what I built
This query is really slow
Duh. Add an index to this field.
Oh yeah. That is faster.
Indexing data
• Secondary Indexes are not for speed
• Index clustering columns
• Index collections

CREATE INDEX videoid_idx
ON user_videos(videoid);

CREATE TABLE IF NOT EXISTS user_videos (
  userid uuid,
  added_date timestamp,
  videoid uuid,
  name text,
  preview_image_location text,
  PRIMARY KEY (userid, added_date, videoid)
) WITH CLUSTERING ORDER BY (added_date DESC, videoid ASC);

CREATE INDEX tags_idx
ON videos(tags);
 name (PK) | location
-----------+----------
 Jonathan  | TX
 Aleksey   | UK
 Patrick   | CA
 Stefania  | HK

CREATE INDEX location_idx ON users(location);
[Diagram: USERS table with Index:user(location), repeated four times]
 name (PK) | location
-----------+----------
 Jonathan  | TX
 Aleksey   | UK
 Patrick   | CA
 Stefania  | HK

CREATE CUSTOM INDEX location_idx ON users(location)
USING 'org.apache.cassandra.index.sasi.SASIIndex';
USERS
[Diagram: Memtable (Users) and SSTable (Users), each with a SASI Index]
SASI Queries
SELECT * FROM users WHERE firstname LIKE 'pat%';
SELECT * FROM users WHERE lastname LIKE '%Fad%';
SELECT * FROM users WHERE email LIKE '%data%';
SELECT * FROM users WHERE created_date > '2011-06-15' AND created_date < '2011-06-30';
 userid                               | created_date                    | email                | firstname | lastname
--------------------------------------+---------------------------------+----------------------+-----------+----------
 9761d3d7-7fbd-4269-9988-6cfd4e188678 | 2011-06-20 20:50:00.000000+0000 | [email protected]    | Patrick   | McFadin
Data Locality
8 Fallacies of Distributed Computing
1. The network is reliable
2. Latency is zero
3. Bandwidth is infinite
4. The network is secure
5. Topology doesn't change
6. There is one administrator
7. Transport cost is zero
8. The network is homogeneous
Insert Alternative
Instead of:
BEGIN BATCH
  1000 inserts
APPLY BATCH;
Do this:
while (moreRows) {  // moreRows: placeholder loop condition
    future = session.executeAsync(statement);
}
WARNING
Collect and deal with your futures!
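One way to collect and deal with the futures is sketched below using only the JDK. `executeAsync` here is a hypothetical stand-in for `session.executeAsync(statement)`, built on `CompletableFuture` rather than the driver's own future type, and it fails every hundredth row on purpose so the error path is visible.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;

class AsyncInsertSketch {
    /** Hypothetical stand-in for session.executeAsync(statement). */
    static CompletableFuture<Boolean> executeAsync(int row) {
        return CompletableFuture.supplyAsync(() -> {
            if (row % 100 == 99) {
                // Simulated failure so there is something to deal with.
                throw new IllegalStateException("simulated write timeout for row " + row);
            }
            return true;
        });
    }

    /** Fires `rows` async inserts, then waits on every future and returns the failure count. */
    static int insertAll(int rows) {
        List<CompletableFuture<Boolean>> futures = new ArrayList<>();
        for (int row = 0; row < rows; row++) {
            futures.add(executeAsync(row)); // keep every future, never drop it
        }
        int failures = 0;
        for (CompletableFuture<Boolean> future : futures) {
            try {
                future.join(); // blocks until this insert finishes
            } catch (CompletionException e) {
                failures++;    // a real application would retry or log here
            }
        }
        return failures;
    }

    public static void main(String[] args) {
        System.out.println("failed inserts: " + insertAll(1000));
    }
}
```

Dropping the futures would hide those failures entirely, which is the situation the warning is about. A production loop would usually also cap the number of in-flight requests, which this sketch leaves out.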
Thank you!
Questions?
Follow me @PatrickMcFadin