
Intro to Cassandra


An introduction to Apache Cassandra, covering the clustering model and the data model. Presented by Tyler Hobbs at the October 2011 Austin NoSQL meetup.


Page 1: Intro to Cassandra

Intro to Cassandra

Tyler Hobbs

Page 2: Intro to Cassandra

History

Dynamo (clustering) + BigTable (data model) = Cassandra

Page 3: Intro to Cassandra

Users

Page 4: Intro to Cassandra

Clustering

Every node plays the same role
– No masters, slaves, or special nodes
– No single point of failure

Page 5: Intro to Cassandra

Consistent Hashing

[Figure: ring with nodes at tokens 0, 10, 20, 30, 40, and 50]

Page 6: Intro to Cassandra

Consistent Hashing

Key: "www.google.com"

[Figure: ring with nodes at tokens 0, 10, 20, 30, 40, and 50]

Page 7: Intro to Cassandra

Consistent Hashing

Key: "www.google.com"

[Figure: ring with nodes at tokens 0, 10, 20, 30, 40, and 50; md5("www.google.com") places the key at 14]

Page 8: Intro to Cassandra

Consistent Hashing

Key: "www.google.com"

[Figure: ring with nodes at tokens 0, 10, 20, 30, 40, and 50; md5("www.google.com") places the key at 14]

Page 9: Intro to Cassandra

Consistent Hashing

Key: "www.google.com"

[Figure: ring with nodes at tokens 0, 10, 20, 30, 40, and 50; md5("www.google.com") places the key at 14]

Page 10: Intro to Cassandra

Consistent Hashing

Key: "www.google.com"

Replication Factor = 3

[Figure: ring with nodes at tokens 0, 10, 20, 30, 40, and 50; md5("www.google.com") places the key at 14]
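
The lookup on the ring slides can be sketched in a few lines of Python. This is a minimal illustration, not Cassandra's implementation. The ring size of 60 and the helper names `token_for` and `replicas_for_token` are assumptions chosen so the numbers stay on the slides' 0 to 50 scale; a real MD5-based token space is far larger. Replicas are assumed to be the first node at or after the key's token plus the next nodes clockwise.

```python
import bisect
import hashlib

RING_SIZE = 60                        # assumption: tiny ring matching the slide's scale
NODE_TOKENS = [0, 10, 20, 30, 40, 50]  # node positions shown on the slides


def token_for(key: str) -> int:
    """Hash a key with MD5 and fold it onto the small illustrative ring."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def replicas_for_token(token: int, node_tokens=NODE_TOKENS, rf: int = 3):
    """Return the tokens of the rf nodes that hold the given key token.

    The first replica is the node with the smallest token >= the key's
    token (wrapping past the end of the ring); the rest follow clockwise.
    """
    tokens = sorted(node_tokens)
    start = bisect.bisect_left(tokens, token) % len(tokens)
    count = min(rf, len(tokens))
    return [tokens[(start + i) % len(tokens)] for i in range(count)]


# The slides place the key at 14; with a replication factor of 3:
print(replicas_for_token(14))
```

Under these assumptions, a key at 14 lands on the nodes at 20, 30, and 40. The modulo in `token_for` will generally not give 14 for "www.google.com"; the slide's value is illustrative.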

Page 11: Intro to Cassandra

Clustering

Client can talk to any node

Page 12: Intro to Cassandra

Scaling

RF = 2

[Figure: ring with nodes at 0, 10, 20, 30, and 50; the node at 50 owns the portion of the ring highlighted in red]

Page 13: Intro to Cassandra

Scaling

RF = 2

[Figure: ring with nodes at 0, 10, 20, 30, and 50; a new node is added at 40]

Page 14: Intro to Cassandra

Scaling

RF = 2

[Figure: ring with nodes at 0, 10, 20, 30, and 50; a new node is added at 40]
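
A rough sketch of why adding a node touches so little of the ring: assuming each node is the primary owner of the range from the previous node's token (exclusive) up to its own token (inclusive), inserting a node at 40 only splits the range that the node at 50 used to own. The `primary_ranges` helper is hypothetical and ignores replication; with RF = 2, as on the slides, replica ranges shift as well.

```python
def primary_ranges(node_tokens):
    """Map each node token to the (start, end] range it primarily owns."""
    tokens = sorted(node_tokens)
    # tokens[i - 1] wraps to the last token when i == 0, closing the ring.
    return {tok: (tokens[i - 1], tok) for i, tok in enumerate(tokens)}


before = primary_ranges([0, 10, 20, 30, 50])
after = primary_ranges([0, 10, 20, 30, 40, 50])

# Nodes whose primary range is new or different once the node at 40 joins.
changed = {tok: rng for tok, rng in after.items() if before.get(tok) != rng}
print(changed)
```

Only the nodes at 40 and 50 appear in `changed`: the node at 50 shrinks from (30, 50] to (40, 50], and the new node takes (30, 40]. Every other node keeps its range.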

Page 15: Intro to Cassandra

Node Failures

RF = 2

[Figure: ring with nodes at 0, 10, 20, 30, 40, and 50; replicas highlighted]

Page 16: Intro to Cassandra

Node Failures

RF = 2

[Figure: ring with nodes at 0, 10, 20, 30, 40, and 50; replicas highlighted]

Page 17: Intro to Cassandra

Node Failures

RF = 2

[Figure: ring with nodes at 0, 10, 20, 30, 40, and 50]

Page 18: Intro to Cassandra

Consistency, Availability

Consistency
– Can I read stale data?

Availability
– Can I write/read at all?

Tunable Consistency

Page 19: Intro to Cassandra

Consistency

N = Total number of replicas
R = Number of replicas read from
– (before the response is returned)
W = Number of replicas written to
– (before the write is considered a success)

Page 20: Intro to Cassandra

Consistency

N = Total number of replicas
R = Number of replicas read from
– (before the response is returned)
W = Number of replicas written to
– (before the write is considered a success)

W + R > N gives strong consistency

Page 21: Intro to Cassandra

Consistency

W + R > N gives strong consistency

N = 3, W = 2, R = 2

2 + 2 > 3 ==> strongly consistent

Page 22: Intro to Cassandra

Consistency

W + R > N gives strong consistency

N = 3, W = 2, R = 2

2 + 2 > 3 ==> strongly consistent

Only 2 of the 3 replicas must be available.
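
The reason W + R > N gives strong consistency is that every set of R replicas read must then share at least one replica with every set of W replicas written. A small brute-force check illustrates this; the function names are hypothetical and replicas are just numbered 0 to N - 1.

```python
from itertools import combinations


def is_strongly_consistent(n: int, r: int, w: int) -> bool:
    """The rule from the slide: W + R > N."""
    return r + w > n


def always_overlaps(n: int, r: int, w: int) -> bool:
    """Check every possible read set against every possible write set."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)  # empty intersection is falsy
        for write_set in combinations(replicas, w)
        for read_set in combinations(replicas, r)
    )


print(is_strongly_consistent(3, 2, 2), always_overlaps(3, 2, 2))
print(is_strongly_consistent(3, 1, 1), always_overlaps(3, 1, 1))
```

For N = 3, W = 2, R = 2 both checks pass. For R = W = 1 a read can miss the only replica that accepted the write, so stale reads are possible.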

Page 23: Intro to Cassandra

Consistency

Tunable Consistency
– Specify N (Replication Factor) per data set
– Specify R, W per operation

Page 24: Intro to Cassandra

Consistency

Tunable Consistency
– Specify N (Replication Factor) per data set
– Specify R, W per operation
– Quorum: N/2 + 1 (integer division)
• R = W = Quorum
• Strong consistency
• Tolerate the loss of N - Quorum replicas
– R, W can also be 1 or N
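
The quorum arithmetic can be checked with a short sketch. The helper names are hypothetical; N/2 is taken as integer division, so N = 3 gives a quorum of 2.

```python
def quorum(n: int) -> int:
    """Quorum size for n replicas: N/2 + 1 using integer division."""
    return n // 2 + 1


def tolerated_losses(n: int, required: int) -> int:
    """Replicas that can be down while `required` replicas still respond."""
    return n - required


for n in (3, 5):
    q = quorum(n)
    # Reading and writing at quorum always satisfies W + R > N.
    print(f"N={n}: quorum={q}, tolerates {tolerated_losses(n, q)} lost, "
          f"strong={q + q > n}")
```

With N = 3 a quorum is 2 and one replica may be lost; with N = 5 a quorum is 3 and two may be lost.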

Page 25: Intro to Cassandra

Availability

Can tolerate the loss of:
– N - R replicas for reads
– N - W replicas for writes

Page 26: Intro to Cassandra

CAP Theorem

During node or network failure:

[Figure: chart of Availability against Consistency, each running up to 100%, with one region labeled "Possible" and another labeled "Not Possible"]

Page 27: Intro to Cassandra

CAP Theorem

During node or network failure:

[Figure: chart of Availability against Consistency, each running up to 100%, with regions labeled "Possible" and "Not Possible" and Cassandra marked on the chart]

Page 28: Intro to Cassandra

Clustering

No single point of failure
Replication that works
Scales linearly
– 2x nodes = 2x performance
• For both writes and reads
– Up to 100s of nodes
Operationally simple
Multi-Datacenter Replication

Page 29: Intro to Cassandra

Data Model

Comes from Google BigTable

Goals
– Minimize disk seeks
– High throughput
– Low latency
– Durable

Page 30: Intro to Cassandra

Data Model

Keyspace
– A collection of Column Families
– Controls replication settings

Column Family
– Kinda resembles a table

Page 31: Intro to Cassandra

Column Families

Static
– Object data
– Similar to a table in a relational database

Dynamic
– Pre-calculated query results
– Materialized views

Page 32: Intro to Cassandra

Static Column Families

Users

zznate     password: *    name: Nate
driftx     password: *    name: Brandon
thobbs     password: *    name: Tyler
jbellis    password: *    name: Jonathan    site: riptano.com

Page 33: Intro to Cassandra

Dynamic Column Families

Rows
– Each row has a unique primary key
– Sorted list of (name, value) tuples
• Like a sorted map or dictionary
– The (name, value) tuple is called a "column"

Page 34: Intro to Cassandra

Dynamic Column Families

Following

[Figure: rows keyed by zznate, driftx, thobbs, and jbellis; each row's column names are usernames (driftx, thobbs, mdennis, zznate, pcmanus, xedin) with empty values]

Page 35: Intro to Cassandra

Dynamic Column Families

Column Timestamps
– Each column (tuple) has a timestamp
– In the case of a collision, the latest timestamp wins
– Client specifies timestamp with write
– Writes are idempotent
• Infinite retries allowed
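
The timestamp rule can be illustrated with a small last-write-wins sketch. This is a simplified model rather than Cassandra's code: a row is assumed to be a plain dict from column name to (value, timestamp), the `apply_write` helper is hypothetical, and a tie on timestamp simply keeps the existing column.

```python
def apply_write(row, name, value, timestamp):
    """Apply a client-timestamped write; the latest timestamp wins."""
    current = row.get(name)
    if current is None or timestamp > current[1]:
        row[name] = (value, timestamp)
    return row


row = {}
apply_write(row, "age", 24, timestamp=100)
apply_write(row, "age", 34, timestamp=200)
apply_write(row, "age", 34, timestamp=200)  # retry of the same write: no change
apply_write(row, "age", 24, timestamp=100)  # older write arriving late: ignored
print(row)
```

Because the client supplies the timestamp, replaying a write any number of times, or delivering writes out of order, leaves the row in the same state. That is why retries are safe.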

Page 36: Intro to Cassandra

Dynamic Column Families

Other Examples:
– Timeline of tweets by a user
– Timeline of tweets by all of the people a user is following
– List of comments sorted by score
– List of friends grouped by state

Page 37: Intro to Cassandra

The Data API

Two choices
– RPC-based API
– CQL
• Cassandra Query Language

Page 38: Intro to Cassandra

Inserting Data

INSERT INTO users (KEY, 'name', 'age') VALUES ('thobbs', 'Tyler', 24);

Page 39: Intro to Cassandra

Updating Data

Updates are the same as inserts:

INSERT INTO users (KEY, 'age') VALUES ('thobbs', 34);

Or

UPDATE users SET 'age' = 34 WHERE KEY = 'thobbs';

Page 40: Intro to Cassandra

Fetching Data

Whole row select:

SELECT * FROM users WHERE KEY = 'thobbs';

Page 41: Intro to Cassandra

Fetching Data

Explicit column select:

SELECT 'name', 'age' FROM users WHERE KEY = 'thobbs';

Page 42: Intro to Cassandra

Fetching Data

Get a slice of columns

UPDATE letters SET 1='a', 2='b', 3='c', 4='d', 5='e' WHERE KEY = 'key';

SELECT 1..3 FROM letters WHERE KEY = 'key';

Returns [(1, a), (2, b), (3, c)]

Page 43: Intro to Cassandra

Fetching Data

Get a slice of columns

SELECT FIRST 2 FROM letters WHERE KEY = 'key';

Returns [(1, a), (2, b)]

SELECT FIRST 2 REVERSED FROM letters WHERE KEY = 'key';

Returns [(5, e), (4, d)]

Page 44: Intro to Cassandra

Fetching Data

Get a slice of columns

SELECT 3..'' FROM letters WHERE KEY = 'key';

Returns [(3, c), (4, d), (5, e)]

SELECT FIRST 2 REVERSED 4..'' FROM letters WHERE KEY = 'key';

Returns [(4, d), (3, c)]
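
The slice queries on these slides can be mimicked with a small Python model, which may help when reasoning about FIRST, REVERSED, and open-ended ranges. This is a sketch under stated assumptions: a row is a dict of integer column names, `None` stands in for the empty bound `''`, and `slice_columns` is a hypothetical helper rather than part of any Cassandra client.

```python
def slice_columns(row, start=None, end=None, first=None, reverse=False):
    """Return (name, value) pairs in column order, like a CQL column slice.

    start/end are inclusive bounds; None means unbounded. With
    reverse=True the scan runs from high names to low, so start is the
    upper bound of the result and end is the lower bound.
    """
    selected = []
    for name in sorted(row, reverse=reverse):
        if start is not None and (name > start if reverse else name < start):
            continue  # not yet at the start of the slice
        if end is not None and (name < end if reverse else name > end):
            continue  # past the end of the slice
        selected.append((name, row[name]))
    return selected if first is None else selected[:first]


letters = {1: "a", 2: "b", 3: "c", 4: "d", 5: "e"}

print(slice_columns(letters, start=1, end=3))                # SELECT 1..3
print(slice_columns(letters, first=2))                       # SELECT FIRST 2
print(slice_columns(letters, first=2, reverse=True))         # SELECT FIRST 2 REVERSED
print(slice_columns(letters, start=3))                       # SELECT 3..''
print(slice_columns(letters, start=4, first=2, reverse=True))  # SELECT FIRST 2 REVERSED 4..''
```

The five calls return the same column sequences as the results listed on the slides. In a real column family the ordering is set by the comparator type, not by Python's integer ordering.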

Page 45: Intro to Cassandra

Deleting Data

Delete a whole row:

DELETE FROM users WHERE KEY = 'thobbs';

Delete specific columns:

DELETE 'age' FROM users WHERE KEY = 'thobbs';

Page 46: Intro to Cassandra

Secondary Indexes

Built-in basic indexes

CREATE INDEX ageIndex ON users (age);

SELECT name FROM users WHERE age = 24 AND state = 'TX';

Page 47: Intro to Cassandra

Performance

Writes
– 10k to 30k per second per node
– Sub-millisecond latency

Reads
– 1k to 10k per second per node
– Depends on data set, caching
– Usually 0.1 to 10 ms latency

Page 48: Intro to Cassandra

Other Features

Distributed Counters
– Can support millions of high-volume counters

Excellent Multi-datacenter Support
– Disaster recovery
– Locality

Hadoop Integration
– Isolation of resources
– Hive and Pig drivers

Compression

Page 49: Intro to Cassandra

What Cassandra Can't Do

Transactions
– Unless you use a distributed lock
– Atomicity, Isolation
– These aren't needed as often as you'd think

Limited support for ad-hoc queries
– Know what you want to do with the data

Page 50: Intro to Cassandra

Not One-size-fits-all

Use alongside an RDBMS
– Use the RDBMS for highly-transactional or highly-relational data
• Usually a small set of data
– Let Cassandra scale to handle the rest

Page 51: Intro to Cassandra

Language Support

Good:
– Java
– Python
– Ruby
– PHP
– C#

Coming Soon:
– Everything else, now that we have CQL

Page 52: Intro to Cassandra

Tyler Hobbs
@tylhobbs

[email protected]

Questions?