Cassandra data modeling
Andrey Kozlov
Cassandra features
● “Masterless” architecture
● Scalable
● Fast inserts
● SSTable for storing data
Cassandra Data Structure
● Column Family
o Row
Column
Map<RowKey, Map<ColumnKey, ColumnValue>>
Simple Data Table
CREATE TABLE employees (
name text,
age int,
role text,
PRIMARY KEY (name));
INSERT INTO employees (name, age, role) VALUES ('john', 37, 'dev');
INSERT INTO employees (name, age, role) VALUES ('eric', 38, 'ceo');
name | age | role
------+-----+------
eric | 38 | ceo
john | 37 | dev
       age   role
john   37    dev

       age   role
eric   38    ceo
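The nested-map model, Map<RowKey, Map<ColumnKey, ColumnValue>>, applied to the employees rows above can be sketched in Python. This is only an illustration of the logical layout, not Cassandra's storage code; the class name `ColumnFamily` and its methods are hypothetical.

```python
class ColumnFamily:
    """Toy model of Map<RowKey, Map<ColumnKey, ColumnValue>> (hypothetical helper)."""

    def __init__(self):
        # Outer map: row key -> inner map of column key -> column value.
        self._rows = {}

    def put(self, row_key, column_key, value):
        self._rows.setdefault(row_key, {})[column_key] = value

    def get_row(self, row_key):
        # Return the columns sorted by column key, as they are ordered inside a row.
        return dict(sorted(self._rows.get(row_key, {}).items()))


employees = ColumnFamily()
employees.put("john", "role", "dev")
employees.put("john", "age", 37)
employees.put("eric", "age", 38)
employees.put("eric", "role", "ceo")

print(employees.get_row("john"))  # {'age': 37, 'role': 'dev'}
```

Each row key (`name`) owns its own small map of columns, which is why a lookup by primary key touches a single row.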
Data Table with Composite key
CREATE TABLE employees (
company text,
name text,
age int,
role text,
PRIMARY KEY (company,name)
);
company | name | age | role
---------+------+-----+------
OSC | eric | 38 | ceo
OSC | john | 37 | dev
RKG | anya | 29 | lead
RKG | ben | 27 | dev
RKG | chan | 35 | ops
       eric:age   eric:role   john:age   john:role
OSC    38         ceo         37         dev

       anya:age   anya:role   ben:age   ben:role   chan:age   chan:role
RKG    29         lead        27        dev        35         ops
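The wide-row layout above can be reproduced with a short Python sketch: every CQL row in a partition is folded into one physical row whose cell names are prefixed with the clustering value. The function `to_wide_rows` is a hypothetical helper written for this illustration, and the `name:column` cell naming simply mirrors the slide.

```python
def to_wide_rows(rows, partition_key, clustering_key):
    """Group CQL-style rows into one wide row per partition (illustrative only)."""
    wide = {}
    for row in rows:
        partition = wide.setdefault(row[partition_key], {})
        prefix = row[clustering_key]
        for column, value in row.items():
            if column in (partition_key, clustering_key):
                continue
            # Cell name = clustering value + ":" + column name, e.g. "eric:age".
            partition[f"{prefix}:{column}"] = value
    # Sort cells by name so each partition reads like the layout above.
    return {key: dict(sorted(cells.items())) for key, cells in wide.items()}


rows = [
    {"company": "OSC", "name": "john", "age": 37, "role": "dev"},
    {"company": "OSC", "name": "eric", "age": 38, "role": "ceo"},
    {"company": "RKG", "name": "anya", "age": 29, "role": "lead"},
]

wide_rows = to_wide_rows(rows, "company", "name")
print(wide_rows["OSC"])
# {'eric:age': 38, 'eric:role': 'ceo', 'john:age': 37, 'john:role': 'dev'}
```

All employees of one company end up in the same partition, sorted by `name`, so they can be read with a single sequential scan.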
Select by Composite key
CREATE TABLE no_column_skip(
a int,
b int,
c int,
d int,
e int,
PRIMARY KEY (a, b, c, d));
Valid:
SELECT ... WHERE a=0 AND (b, c) > (1, 2)
SELECT ... WHERE a=0 AND (b) > (3)
SELECT ... WHERE a=0 AND (b, c, d) > (1, 2, 5)
Not Valid:
SELECT ... WHERE a=0 AND (b, d) > (1, 2)
SELECT ... WHERE a=0 AND (c) > (3)
SELECT ... WHERE (b, c, d) > (1, 2)
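The pattern behind the valid and invalid queries can be expressed as a small check. The sketch below is a simplification written for this deck, not Cassandra's actual validation logic: it assumes a tuple restriction is accepted only when the partition key is fixed by equality and the tuple lists the clustering columns from the first one onward without skipping any. The function name and parameters are hypothetical.

```python
def is_valid_restriction(partition_keys, clustering_keys, equal_to, tuple_columns):
    """Simplified check (an assumption, not Cassandra's full rule set)."""
    # Every partition key column must be fixed with an equality.
    if not all(column in equal_to for column in partition_keys):
        return False
    # The tuple must list clustering columns in order, starting at the first, with no gaps.
    prefix = clustering_keys[: len(tuple_columns)]
    return len(tuple_columns) > 0 and list(tuple_columns) == list(prefix)


PARTITION = ["a"]            # PRIMARY KEY (a, b, c, d): a is the partition key
CLUSTERING = ["b", "c", "d"]  # b, c, d are clustering columns

print(is_valid_restriction(PARTITION, CLUSTERING, {"a"}, ["b", "c"]))       # True
print(is_valid_restriction(PARTITION, CLUSTERING, {"a"}, ["b"]))            # True
print(is_valid_restriction(PARTITION, CLUSTERING, {"a"}, ["b", "c", "d"]))  # True
print(is_valid_restriction(PARTITION, CLUSTERING, {"a"}, ["b", "d"]))       # False: skips c
print(is_valid_restriction(PARTITION, CLUSTERING, {"a"}, ["c"]))            # False: skips b
print(is_valid_restriction(PARTITION, CLUSTERING, set(), ["b", "c", "d"]))  # False: a not fixed
```

The rule follows from the storage layout: cells inside a partition are sorted by (b, c, d), so only a leading prefix of those columns maps to a contiguous slice.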
Many to Many
Normalized with one additional table
Normalized with two tables
Partial denormalization
Partial denormalization with composite keys
Event data for time period
Secondary index template
CREATE TABLE playlists (
id uuid,
song_order int,
song_id uuid,
title text,
album text,
artist text,
PRIMARY KEY (id, song_order));
INSERT INTO playlists (id, song_order, song_id, title, artist, album)
VALUES (playlist_1, 1, song_1, 'La Grange', 'ZZ Top', 'Tres Hombres');
INSERT INTO playlists (id, song_order, song_id, title, artist, album)
VALUES (playlist_1, 2, song_2, 'Moving in Stereo', 'Fu Manchu', 'We Must Obey');
INSERT INTO playlists (id, song_order, song_id, title, artist, album)
VALUES (playlist_2, 1, song_3, 'Hang On', 'Fu Manchu', 'California Crossing');
CREATE INDEX ON playlists( artist );
Secondary index template
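id         | song_order | song_id | title            | artist    | album
-----------+------------+---------+------------------+-----------+---------------------
playlist_1 | 1          | song_1  | La Grange        | ZZ Top    | Tres Hombres
           | 2          | song_2  | Moving in Stereo | Fu Manchu | We Must Obey
...
playlist_2 | 1          | song_3  | Hang On          | Fu Manchu | California Crossing
...

artist    | playlist_id | song_order
----------+-------------+------------
ZZ Top    | playlist_1  | 1
...
Fu Manchu | playlist_1  | 2
          | playlist_2  | 1
...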
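The second table is in effect an inverted map from the indexed value to the primary keys of the matching rows. A minimal Python sketch of that idea follows; it models only the logical shape shown on the slide, not how Cassandra implements or distributes its indexes, and `build_index` is a hypothetical helper.

```python
from collections import defaultdict


def build_index(rows, indexed_column, primary_key):
    """Map each indexed value to the primary keys of rows holding it (illustrative)."""
    index = defaultdict(list)
    for row in rows:
        key = tuple(row[column] for column in primary_key)
        index[row[indexed_column]].append(key)
    return dict(index)


playlists = [
    {"id": "playlist_1", "song_order": 1, "artist": "ZZ Top", "title": "La Grange"},
    {"id": "playlist_1", "song_order": 2, "artist": "Fu Manchu", "title": "Moving in Stereo"},
    {"id": "playlist_2", "song_order": 1, "artist": "Fu Manchu", "title": "Hang On"},
]

artist_index = build_index(playlists, "artist", ["id", "song_order"])
print(artist_index["Fu Manchu"])  # [('playlist_1', 2), ('playlist_2', 1)]
```

A query by `artist` first reads the index entry and then fetches each listed row from the base table by its primary key.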
Secondary index distribution
Thank you for your attention
References
• https://ru.wikipedia.org/wiki/BigTable
• http://www.datastax.com/documentation/cassandra/2.1/cassandra/gettingStartedCassandraIntro.html
• http://www.datastax.com/documentation/cassandra/2.1/cassandra/dml/dml_write_path_c.html
• http://www.datastax.com/documentation/cassandra/2.1/cassandra/dml/architectureClientRequestsRead_c.html
• http://www.datastax.com/documentation/cassandra/2.1/cassandra/dml/dml_config_consistency_c.html#concept_ds_umf_5xx_zj__table_vs2_f2s_gk
• http://rollerweblogger.org/roller/entry/composite_keys_in_cassandra
• http://www.ebaytechblog.com/2012/07/16/cassandra-data-modeling-best-practices-part-1/#.VPNAcvmUeUT
• https://highlyscalable.wordpress.com/2012/03/01/nosql-data-modeling-techniques/
• http://habrahabr.ru/post/203200/