Cassandra - lesson learned


Andrzej Ludwikowski

About me?
- http://aludwikowski.blogspot.com/
- https://github.com/aludwiko
- @aludwikowski
- SoftwareMill

Why Cassandra?
- Big Data!!!

- Volume (petabytes of data, trillions of entities)
- Velocity (real-time, streams, millions of transactions per second)
- Variety (unstructured, semi-structured, structured)

- Near-linear horizontal scaling (in proper use cases)
- Fully distributed, with no single point of failure

- Data replication by default

Where is Cassandra in CAP?
- CAP theorem - pick two
- Cassandra: AP, with tunable consistency



Origins?

2010

Name?


Write path

[Diagram: four nodes (Node 1 to Node 4) and a client (driver)]


- Any node can coordinate any request (NSPOF: no single point of failure)
- Replication Factor
- Consistency Level

[Diagram: Node 1 to Node 4 and a client; RF=3, CL=2]
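The interplay of RF and CL can be sketched as a toy coordinator. This is a minimal sketch under stated assumptions, not Cassandra's code: Replica and coordinate_write are hypothetical names, and a downed node is modeled as a flag. The write is sent to every replica and succeeds once CL of them acknowledge.

```python
# Toy coordinator (hypothetical, not Cassandra's implementation): the write is
# sent to all RF replicas and succeeds once CL of them acknowledge.
from dataclasses import dataclass, field


@dataclass
class Replica:
    name: str
    up: bool = True
    data: dict = field(default_factory=dict)

    def write(self, key, value):
        """Apply the write and acknowledge it; a down node cannot acknowledge."""
        if not self.up:
            return False
        self.data[key] = value
        return True


def coordinate_write(replicas, key, value, consistency_level):
    """Send the write to every replica; succeed if at least CL replicas acked."""
    acks = sum(1 for r in replicas if r.write(key, value))
    return acks >= consistency_level


# RF=3: the partition lives on three nodes, one of which is down.
replicas = [Replica("node1"), Replica("node2"), Replica("node3", up=False)]
print(coordinate_write(replicas, "77", "hello", consistency_level=2))  # True: 2 of 3 acked
print(coordinate_write(replicas, "77", "hello", consistency_level=3))  # False: only 2 acks
```

A real coordinator also refuses a write up front (an Unavailable error) when fewer than CL replicas are known to be alive, and a failed write is not rolled back on the replicas that did apply it.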


Write path - consistent hashing

[Diagram: Node 1 to Node 4 placed on a token ring labeled 0 to 100]

- Token ring from -2^63 to 2^63 - 1 (Murmur3Partitioner)
- Partitioner: partition key -> token
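A partitioner turns a partition key into a token on the ring. The sketch below assumes a stand-in hash: Cassandra's default Murmur3Partitioner uses MurmurHash3, which is not in Python's standard library, so MD5 is used here only to show the key -> signed 64-bit token mapping. token_for is a hypothetical name, and its values will not match a real cluster.

```python
import hashlib

MIN_TOKEN = -2**63
MAX_TOKEN = 2**63 - 1


def token_for(partition_key: str) -> int:
    """Map a partition key to a signed 64-bit token.

    Stand-in for Murmur3Partitioner: MD5 (stdlib) replaces MurmurHash3,
    so only the shape of the mapping matches, not the actual token values.
    """
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a signed big-endian 64-bit integer,
    # which always falls inside [-2^63, 2^63 - 1].
    return int.from_bytes(digest[:8], byteorder="big", signed=True)


for key in ["godzilla", "psy", "dzien swira"]:
    t = token_for(key)
    assert MIN_TOKEN <= t <= MAX_TOKEN
    print(key, t)
```

The same key always yields the same token, which is what lets any node (or a token-aware driver) find the replicas without a central lookup.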


[Diagram: Node 1 to Node 4 and a client; a partitioner maps the partition key to token 77; the nodes own token ranges 0-25, 26-50, 51-75 and 76-100]



[Diagram: the write for token 77 goes to the node owning range 76-100 and is replicated to two more nodes, giving three copies]
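The toy ring from the diagrams can be modeled with a sorted list of range ends. Assumptions: the node-to-range assignment is hypothetical, and the extra replicas are placed on the next nodes clockwise, which is the SimpleStrategy rule (NetworkTopologyStrategy additionally considers racks and data centers).

```python
from bisect import bisect_left

# Toy ring from the diagram: tokens 0-100 split into four ranges.
# Which node owns which range is an assumption for illustration.
RING = [  # (last token of the range, owning node), sorted by token
    (25, "Node 1"),
    (50, "Node 2"),
    (75, "Node 3"),
    (100, "Node 4"),
]


def replicas_for(token: int, rf: int) -> list[str]:
    """Owner = first node whose range end is >= token (wrapping around);
    the remaining rf - 1 replicas are the next nodes clockwise."""
    ends = [end for end, _ in RING]
    start = bisect_left(ends, token) % len(RING)
    return [RING[(start + i) % len(RING)][1] for i in range(rf)]


print(replicas_for(77, rf=3))  # ['Node 4', 'Node 1', 'Node 2']
```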

DEMO

Write path - problems?


- Hinted handoff
- Retry idempotent inserts
  - built-in policies
- Lightweight transactions (Paxos)
- Batches
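Hinted handoff, the first item in the list above, can be illustrated with a toy coordinator that keeps the mutations a downed replica missed and replays them when the replica returns. Node, Coordinator and replay_hints are hypothetical names; this sketches the idea, not Cassandra's hint storage.

```python
# Toy hinted handoff (hypothetical model): the coordinator stores a hint for
# each replica that was down during a write and replays it later.
from collections import defaultdict


class Node:
    def __init__(self, name):
        self.name = name
        self.up = True
        self.data = {}


class Coordinator:
    def __init__(self):
        # hints[node_name] -> list of (key, value) mutations the node missed
        self.hints = defaultdict(list)

    def write(self, replicas, key, value):
        for node in replicas:
            if node.up:
                node.data[key] = value
            else:
                self.hints[node.name].append((key, value))  # keep a hint

    def replay_hints(self, node):
        """Called when `node` is seen as up again: deliver and drop its hints."""
        for key, value in self.hints.pop(node.name, []):
            node.data[key] = value


n1, n2, n3 = Node("n1"), Node("n2"), Node("n3")
n3.up = False
coordinator = Coordinator()
coordinator.write([n1, n2, n3], "77", "v1")
print(n3.data)  # {}
n3.up = True
coordinator.replay_hints(n3)
print(n3.data)  # {'77': 'v1'}
```

In Cassandra, hints are only collected for a limited window (max_hint_window_in_ms, 3 hours by default); a replica that stays down longer needs a repair to catch up.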

Write path - node level


Write path - why so fast?


50,000 transactions/s = 50 per ms = 5 per 100 µs = 1 per 20 µs

- Commit log - append only
- Periodic (every 10 s by default) or batch sync to disk
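The append-only commit log and its two sync modes can be sketched with a plain file. Assumptions: CommitLog is a hypothetical class; "batch" is simplified to one fsync before every acknowledgement, and "periodic" checks the 10-second timer on each append instead of using a background thread as Cassandra does.

```python
import os
import tempfile
import time


class CommitLog:
    """Append-only log sketch. mode="batch" fsyncs before acknowledging each
    write; mode="periodic" fsyncs at most once per `period_s` seconds, so a
    crash can lose the last few seconds of acknowledged writes."""

    def __init__(self, path, mode="periodic", period_s=10.0):
        self.f = open(path, "ab")  # append only: never seek back and overwrite
        self.mode = mode
        self.period_s = period_s
        self.last_sync = time.monotonic()

    def append(self, entry: str):
        self.f.write(entry.encode("utf-8") + b"\n")  # sequential write
        if self.mode == "batch":
            self._sync()
        elif time.monotonic() - self.last_sync >= self.period_s:
            self._sync()

    def _sync(self):
        self.f.flush()
        os.fsync(self.f.fileno())
        self.last_sync = time.monotonic()

    def close(self):
        self._sync()
        self.f.close()


path = os.path.join(tempfile.mkdtemp(), "commitlog.txt")
log = CommitLog(path, mode="batch")
log.append("INSERT 77 -> hello")
log.close()
with open(path) as f:
    print(f.read())
```

The speed comes from the access pattern: every write is a sequential append, with no read-modify-write of existing data on disk.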

[Diagram: Node 1 to Node 4 split between Rack 1 and Rack 2; a client writes with RF=2, CL=2]

- Network topology aware

[Diagram: replicas spread across racks (RF=2, CL=2) and across two data centers, Asia DC and Europe DC, each with its own client]
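Network topology awareness can be sketched as rack-aware replica placement within one data center. Assumptions: which node sits in which rack is hypothetical, and the two-pass walk below only mirrors the idea behind NetworkTopologyStrategy (spread replicas over distinct racks first), not its exact algorithm.

```python
# Hypothetical rack layout for the four-node ring, in ring order.
NODES = [("Node 1", "Rack 1"), ("Node 2", "Rack 1"),
         ("Node 3", "Rack 2"), ("Node 4", "Rack 2")]


def rack_aware_replicas(owner_index, rf):
    """Walk the ring clockwise from the owner; prefer nodes in racks that do
    not hold a replica yet, and reuse racks only if there are too few."""
    n = len(NODES)
    walk = [NODES[(owner_index + i) % n] for i in range(n)]
    chosen, used_racks = [], set()
    for node, rack in walk:  # first pass: at most one replica per rack
        if len(chosen) < rf and rack not in used_racks:
            chosen.append(node)
            used_racks.add(rack)
    for node, _ in walk:  # second pass: fill up if racks ran out
        if len(chosen) < rf and node not in chosen:
            chosen.append(node)
    return chosen


print(rack_aware_replicas(0, rf=2))  # ['Node 1', 'Node 3']
```

With RF=2 and two racks, losing a whole rack still leaves one copy of every partition.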

Read path
- Most recent wins
- Eager retries
- In-memory
  - MemTable
  - Row Cache
  - Bloom Filters
  - Key Caches
  - Partition Summaries
- On disk
  - Partition Indexes
  - SSTables

[Diagram: Node 1 to Node 4 and a client reading with RF=3, CL=3; the three replicas return versions with timestamps 67, 99 and 88]
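"Most recent wins" means the coordinator compares the write timestamps of the versions returned by the replicas it read. A minimal sketch, assuming each replica answers with a (value, timestamp) pair; the timestamps match the diagram, the values and the name reconcile are hypothetical.

```python
def reconcile(responses):
    """responses: list of (value, write_timestamp) tuples from replicas.
    Return the value carrying the highest timestamp (last write wins)."""
    value, _ = max(responses, key=lambda r: r[1])
    return value


print(reconcile([("a", 67), ("c", 99), ("b", 88)]))  # c
```

Cassandra applies this per cell rather than per row, and can write the winning version back to stale replicas (read repair).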

Immediate vs. Eventual Consistency
- If (writeCL + readCL) > replication_factor, then immediate consistency, e.g.:
  - writeCL=ALL, readCL=1
  - writeCL=1, readCL=ALL
  - writeCL=QUORUM, readCL=QUORUM
- If "stale" is measured in milliseconds, how much are those milliseconds worth?

[Diagram: Node 1 to Node 4 and a client; RF=3]
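The overlap rule can be checked mechanically: when writeCL + readCL > RF, every set of replicas read must share at least one node with every set written. QUORUM is floor(RF / 2) + 1 replicas; the function names are hypothetical.

```python
def quorum(rf: int) -> int:
    """Number of replicas in a QUORUM for a given replication factor."""
    return rf // 2 + 1


def read_sees_latest_write(write_cl: int, read_cl: int, rf: int) -> bool:
    """True when read and write replica sets are guaranteed to overlap."""
    return write_cl + read_cl > rf


RF = 3
cases = [(RF, 1, "ALL/ONE"), (1, RF, "ONE/ALL"),
         (quorum(RF), quorum(RF), "QUORUM/QUORUM"), (1, 1, "ONE/ONE")]
for w, r, label in cases:
    print(label, read_sees_latest_write(w, r, RF))
```

With RF=3 the first three combinations print True (QUORUM/QUORUM because 2 + 2 > 3), while ONE/ONE prints False and is only eventually consistent.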

Modeling - new mindset
- QDD: Query Driven Development
- Nesting is ok
- Duplication is ok
- Writes are cheap

QDD - Conceptual model
- Technology independent
- Chen notation

QDD - Application workflow

QDD - Logical model

- Chebotko diagram

QDD - Physical model

- Technology dependent
- Analysis and validation (finding problems)
- Physical optimization (fixing problems)
- Data types

Physical storage

- Primary key
- Partition key

CREATE TABLE videos ( id int, title text, runtime int, year int, PRIMARY KEY (id));

 id | title               | runtime | year
----+---------------------+---------+------
  1 | dzien swira         |      93 | 2002
  2 | chlopaki nie placza |      96 | 2000
  3 | psy                 |     104 | 1992
  4 | psy 2               |      96 | 1994

1: [title: dzien swira] [runtime: 93] [year: 2002]
2: [title: chlopaki...] [runtime: 96] [year: 2000]
3: [title: psy] [runtime: 104] [year: 1992]
4: [title: psy 2] [runtime: 96] [year: 1994]

SELECT * FROM videos WHERE title = 'dzien swira'; -- rejected: title is not the partition key (would need ALLOW FILTERING or an index)

Physical storage

CREATE TABLE videos_with_clustering ( title text, runtime int, year int, PRIMARY KEY ((title), year));

- Primary key (could be compound)
- Partition key
- Clustering column (order, uniqueness)

 title    | year | runtime
----------+------+---------
 godzilla | 1954 |      98
 godzilla | 1998 |     140
 godzilla | 2014 |     123
 psy      | 1992 |     104

godzilla: [1954: runtime 98] [1998: runtime 140] [2014: runtime 123]
psy: [1992: runtime 104]

SELECT * FROM videos_with_clustering WHERE title = 'godzilla';
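The storage layout of videos_with_clustering can be mimicked with a dictionary of sorted lists: the partition key picks the partition, the clustering column orders rows inside it and, together with the partition key, makes a row unique. This is an in-memory illustration with hypothetical helper names, not how SSTables are stored.

```python
from bisect import insort

partitions = {}  # title (partition key) -> sorted list of (year, runtime)


def insert(title, year, runtime):
    """Upsert: (title, year) is unique, so an existing row is replaced."""
    rows = partitions.setdefault(title, [])
    rows[:] = [row for row in rows if row[0] != year]
    insort(rows, (year, runtime))  # keep rows ordered by clustering column


def select_by_title(title):
    """Single-partition read: all rows for one title, already in year order."""
    return partitions.get(title, [])


for title, year, runtime in [("godzilla", 2014, 123), ("godzilla", 1954, 98),
                             ("psy", 1992, 104), ("godzilla", 1998, 140)]:
    insert(title, year, runtime)

print(select_by_title("godzilla"))  # [(1954, 98), (1998, 140), (2014, 123)]
```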

Physical storage

CREATE TABLE videos_with_composite_pk( title text, runtime int, year int, PRIMARY KEY ((title, year)));

- Primary key (could be compound)
- Partition key (could be composite)
- Clustering column (order, uniqueness)

 title    | year | runtime
----------+------+---------
 godzilla | 1954 |      98
 godzilla | 1998 |     140
 godzilla | 2014 |     123
 psy      | 1992 |     104

godzilla:1954: [runtime: 98]
godzilla:1998: [runtime: 140]
godzilla:2014: [runtime: 123]
psy:1992: [runtime: 104]

SELECT * FROM videos_with_composite_pk WHERE title = 'godzilla' AND year = 1954;

Modeling - clustering column(s)

Q: Retrieve videos an actor has appeared in (newest first).


CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ( )) WITH CLUSTERING ORDER BY ( );



CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date)) WITH CLUSTERING ORDER BY (added_date desc);



CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date, video_id)) WITH CLUSTERING ORDER BY (added_date desc);



CREATE TABLE videos_by_actor ( actor text, added_date timestamp, video_id timeuuid, character_name text, description text, encoding frozen<video_encoding>, tags set<text>, title text, user_id uuid, PRIMARY KEY ((actor), added_date, video_id, character_name)) WITH CLUSTERING ORDER BY (added_date desc);


Modeling - compound partition key

CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ( )) WITH CLUSTERING ORDER BY ( );

Q: Retrieve the last 1000 measurements from a given day.


CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id), date, event_time)) WITH CLUSTERING ORDER BY (date desc, event_time desc);


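At one measurement per second, a single weather_station_id partition grows by:
- 1 day = 86 400 rows
- 1 week = 604 800 rows
- 1 month (30 days) = 2 592 000 rows
- 1 year = 31 536 000 rows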


CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id, date), event_time)) WITH CLUSTERING ORDER BY (event_time desc);


Modeling - TTL

CREATE TABLE temperature_by_day ( weather_station_id text, date text, event_time timestamp, temperature text, PRIMARY KEY ((weather_station_id, date), event_time)) WITH CLUSTERING ORDER BY (event_time desc);

Retention policy - keep data only from last week.

INSERT INTO temperature_by_day … USING TTL 604800; -- 604800 s = 7 days
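The retention arithmetic behind USING TTL 604800 (7 × 24 × 3600 seconds) can be sketched as follows; is_expired is a hypothetical helper and the exact expiry boundary is simplified.

```python
from datetime import datetime, timedelta, timezone

# 7 days * 24 h * 60 min * 60 s: the TTL used in the INSERT above.
WEEK_TTL_SECONDS = 7 * 24 * 60 * 60  # 604800


def is_expired(written_at: datetime, now: datetime,
               ttl_seconds: int = WEEK_TTL_SECONDS) -> bool:
    """A cell written at `written_at` stops being returned once `ttl_seconds`
    have passed (sketch; boundary handling simplified)."""
    return now >= written_at + timedelta(seconds=ttl_seconds)


written = datetime(2024, 1, 1, tzinfo=timezone.utc)
print(is_expired(written, written + timedelta(days=6)))  # False: still visible
print(is_expired(written, written + timedelta(days=8)))  # True: expired
```

In Cassandra, expired cells are treated as tombstones and physically removed by compaction after gc_grace_seconds, so disk space is reclaimed later than the data disappears from reads.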

Modeling - bitmap index

CREATE TABLE car (
  year text,
  model text,
  color text,
  vehicle_id int,
  // other columns
  PRIMARY KEY ((year, model, color), vehicle_id)
);

Q: Find car by year and/or model and/or color.

INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('2000', '', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', 'blue', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', 'Multipla', '', 13, ...);
INSERT INTO car (year, model, color, vehicle_id, ...) VALUES ('', '', 'blue', 13, ...);

SELECT * FROM car WHERE year = '2000' AND model = '' AND color = 'blue';
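The seven INSERTs follow a pattern: every combination of the three partition key columns with some of them replaced by the empty-string wildcard, except the all-empty one (2^3 - 1 = 7). A small generator with hypothetical names produces them in the same order as the slide.

```python
from itertools import product

WILDCARD = ""  # empty string stands for "any value", as in the INSERTs


def bitmap_index_rows(year, model, color):
    """Return every (year, model, color) variant with some columns replaced by
    the wildcard, excluding the all-wildcard row: 2^3 - 1 = 7 partitions."""
    rows = []
    for keep_year, keep_model, keep_color in product([True, False], repeat=3):
        if not (keep_year or keep_model or keep_color):
            continue  # an all-wildcard partition would match every car
        rows.append((year if keep_year else WILDCARD,
                     model if keep_model else WILDCARD,
                     color if keep_color else WILDCARD))
    return rows


for row in bitmap_index_rows("2000", "Multipla", "blue"):
    print(row)
```

The trade-off is write amplification: each car is written seven times so that any year/model/color filter becomes a single-partition read.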

Modeling - wide rows

CREATE TABLE user ( email text, name text, age int, PRIMARY KEY (email));

Q: Find user by email.

Modeling - wide rows

CREATE TABLE user ( domain text, user text, name text, age int, PRIMARY KEY ((domain), user));

Q: Find user by email.
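With PRIMARY KEY ((domain), user), the application has to split the email before writing or querying. A minimal sketch; split_email is a hypothetical helper and the address is made up.

```python
def split_email(email: str) -> tuple[str, str]:
    """Split 'user@domain' into (domain, user) for the ((domain), user) key.
    rpartition keeps any extra '@' in the local part on the user side."""
    user, sep, domain = email.rpartition("@")
    if not sep or not user or not domain:
        raise ValueError(f"not an email address: {email!r}")
    return domain, user


print(split_email("jan@example.com"))  # ('example.com', 'jan')
```

The pair binds to a query such as SELECT * FROM user WHERE domain = ? AND user = ?; all users of one domain then share a single wide partition.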

Modeling - versioning with lightweight transactions

CREATE TABLE document ( id text, content text, version int, locked_by text, PRIMARY KEY ((id)));

INSERT INTO document (id, content, version) VALUES ('my doc', 'some content', 1) IF NOT EXISTS;

UPDATE document SET locked_by = 'andrzej' WHERE id = 'my doc' IF locked_by = null;

UPDATE document SET content = 'better content', version = 2, locked_by = null WHERE id = 'my doc' IF locked_by = 'andrzej';
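The three conditional statements above implement optimistic locking. An in-memory imitation of their semantics (single process, no Paxos) makes the applied / not-applied outcomes explicit. documents, insert_if_not_exists and update_if are hypothetical names, and a missing row simply fails every condition here, which is a simplification.

```python
documents = {}  # id -> row


def insert_if_not_exists(doc_id, content, version):
    """Like INSERT ... IF NOT EXISTS: applied only if the row is absent."""
    if doc_id in documents:
        return False
    documents[doc_id] = {"content": content, "version": version, "locked_by": None}
    return True


def update_if(doc_id, expected, **changes):
    """Like UPDATE ... IF col = value: apply `changes` only if every column in
    `expected` currently has the expected value."""
    row = documents.get(doc_id)
    if row is None or any(row[col] != val for col, val in expected.items()):
        return False
    row.update(changes)
    return True


insert_if_not_exists("my doc", "some content", 1)                   # True
update_if("my doc", {"locked_by": None}, locked_by="andrzej")       # True: lock taken
update_if("my doc", {"locked_by": None}, locked_by="someone else")  # False: already locked
update_if("my doc", {"locked_by": "andrzej"},
          content="better content", version=2, locked_by=None)      # True: saved, unlocked
```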

Modeling - JSON with UDT and tuples

{
  "title": "Example Schema",
  "type": "object",
  "properties": {
    "firstName": "andrzej",
    "lastName": "ludwikowski",
    "age": {
      "description": "Age in years",
      "type": "integer",
      "minimum": 0
    }
  },
  "x_dimension": "1",
  "y_dimension": "2"
}

CREATE TYPE age ( description text, type text, minimum int);

CREATE TYPE prop ( firstName text, lastName text, age frozen <age>);

CREATE TABLE json ( title text, type text, properties list<frozen <prop>>, dimensions tuple<int, int>, PRIMARY KEY (title));

Common use cases

- Sensor data (Zonar)
- Fraud detection (Barracuda)
- Playlist and collections (Spotify)
- Personalization and recommendation engines (eBay)
- Messaging (Instagram)

Common anti-use cases

- Queue
- Search engine