NoSQL Matters 2012 - Real Time Big Data in practice with Cassandra


Real-Time Big Data in practice with Cassandra

Michaël Figuière, @mfiguiere

©2012 DataStax

Speaker

Michaël Figuière

@mfiguiere


Ring Architecture

[Figure: a Cassandra cluster drawn as a ring of nodes]

Ring Architecture

[Figure: the ring of nodes, three of which are marked as replicas]
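The replica placement pictured on the ring can be sketched in a few lines of Java. This is a toy model under stated assumptions, not Cassandra's implementation: the `TokenRing` class, the integer tokens, and the node names are all hypothetical, and each node owns a single token. A key lands on the first node whose token is at or after the key's token, and the remaining replicas are the next nodes clockwise. A real cluster derives tokens by hashing the row key with a partitioner.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Toy token ring (hypothetical): one token per node, replicas chosen clockwise. */
class TokenRing {
    // token -> node name, kept sorted so the ring can be walked in token order
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    void addNode(int token, String name) {
        ring.put(token, name);
    }

    /** Returns the nodes holding keyToken: the owner first, then its clockwise successors. */
    List<String> replicasFor(int keyToken, int replicationFactor) {
        List<String> replicas = new ArrayList<>();
        if (ring.isEmpty()) {
            return replicas;
        }
        int wanted = Math.min(replicationFactor, ring.size());
        // First node at or after the key's token; wrap to the start of the ring if none.
        Integer token = ring.ceilingKey(keyToken);
        if (token == null) {
            token = ring.firstKey();
        }
        while (replicas.size() < wanted) {
            replicas.add(ring.get(token));
            token = ring.higherKey(token);
            if (token == null) {
                token = ring.firstKey();
            }
        }
        return replicas;
    }

    public static void main(String[] args) {
        TokenRing ring = new TokenRing();
        ring.addNode(0, "node-a");
        ring.addNode(20, "node-b");
        ring.addNode(40, "node-c");
        ring.addNode(60, "node-d");
        ring.addNode(80, "node-e");
        // Replication factor 3, as on the slides.
        System.out.println(ring.replicasFor(45, 3)); // prints [node-d, node-e, node-a]
    }
}
```

Adding a node only changes ownership for the token range next to it, which is one reason the ring layout scales by adding machines.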

Linear Scalability

[Chart: client writes/s by node count, replication factor = 3]

Client / Server Communication

[Figure: four clients and a ring of nodes, three of which are replicas. Which node should a client contact?]

Client / Server Communication

Coordinator node: forwards all R/W requests to the corresponding replicas.

[Figure: four clients and a ring of nodes; the node contacted by a client acts as coordinator for the three replicas]

Tunable Consistency

3 replicas

[Timeline: the replicas hold A A A]

Tunable Consistency

Write 'B': write and wait for an acknowledgement from one node.

[Timeline: A A A, then B A A]

Tunable Consistency

R + W < N

Write and wait for an acknowledgement from one node.
Read waiting for one node to answer.

[Timeline: A A A, then B A A after the write; the read may return A or B]

Tunable Consistency

R + W = N

Write and wait for acknowledgements from two nodes.
Read waiting for one node to answer.

[Timeline: A A A, then B B A after the write; the read may still return A or B]

Tunable Consistency

R + W > N

Write and wait for acknowledgements from two nodes.
Read waiting for two nodes to answer.

[Timeline: A A A, then B B A after the write; the two nodes read always include at least one B]

Tunable Consistency

R = W = QUORUM

[Timeline: A A A, then B B A after the write; the read sees B from at least one of its two nodes]

QUORUM = (N / 2) + 1
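The arithmetic behind these slides fits in a short Java sketch. The class and method names below are hypothetical and do not come from any driver. `quorum` applies the formula above with integer division, and `replicaSetsOverlap` checks the R + W > N condition, under which any set of R replicas read must share at least one node with any set of W replicas that acknowledged the write.

```java
/** Hypothetical helper illustrating the R/W/N arithmetic from the slides. */
class ConsistencyMath {

    /** QUORUM = (N / 2) + 1, where the division is integer division. */
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    /** True when every read set of size r intersects every write set of size w. */
    static boolean replicaSetsOverlap(int r, int w, int n) {
        return r + w > n;
    }

    public static void main(String[] args) {
        int n = 3; // replication factor used throughout the slides
        int q = quorum(n);
        System.out.println("QUORUM for N=3: " + q);                       // prints 2
        System.out.println("R=1, W=1: " + replicaSetsOverlap(1, 1, n));   // prints false
        System.out.println("R=1, W=2: " + replicaSetsOverlap(1, 2, n));   // prints false
        System.out.println("R=2, W=2: " + replicaSetsOverlap(2, 2, n));   // prints true
        System.out.println("R=W=QUORUM: " + replicaSetsOverlap(q, q, n)); // prints true
    }
}
```

Reading and writing at QUORUM always satisfies the overlap condition, because 2 * ((N / 2) + 1) is greater than N for any N.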


Request Path

[Figure: request path. (1) A client sends a request to the coordinator node. (2) The coordinator forwards it to the three replicas. (3) The replicas answer the coordinator. (4) The coordinator answers the client.]

Column Family Data Model

Row Key    Columns
jbellis    name: Jonathan | email: jb@ds.com | address: 123 main | state: TX
dhutch     name: Daria | email: dh@ds.com | address: 45 2nd st | state: CA
egilmore   name: Eric | email: eg@ds.com

Column Family Data Model

Row Key    Columns
jbellis    dhutch | egilmore | datastax | mzcassie
dhutch     egilmore
egilmore   datastax | mzcassie

CQL3 Data Model

Timeline Table

user_id (Partition Key)   tweet_id (Remaining Key)   author        body
gmason                    1765                       phenry        Give me liberty or give me death
gmason                    1742                       gwashington   I chopped down the cherry tree
ahamilton                 1797                       jadams        A government of laws, not men
ahamilton                 1742                       gwashington   I chopped down the cherry tree

CQL:

CREATE TABLE timeline (
    user_id varchar,
    tweet_id uuid,
    author varchar,
    body varchar,
    PRIMARY KEY (user_id, tweet_id)
);

CQL3 Data Model

Timeline Physical Layout

Each partition (user_id) of the timeline table is stored as a single wide row. Each cell name combines the tweet_id with the CQL column name.

Row Key     Columns
gmason      [1742, author]: gwashington | [1742, body]: I chopped down the... | [1765, author]: phenry | [1765, body]: Give me liberty or give...
ahamilton   [1742, author]: gwashington | [1742, body]: I chopped down the... | [1797, author]: jadams | [1797, body]: A government of laws...
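The mapping from CQL rows to the physical layout can be mimicked with nested sorted maps. The Java sketch below is an in-memory illustration only: `TimelineLayout` is a hypothetical class, cell names are plain strings in the "[tweet_id, column]" form used on the slide, and string ordering stands in for the typed comparison a real storage engine would apply to each component of the cell name.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical in-memory model: one wide row of sorted cells per partition key. */
class TimelineLayout {
    // user_id -> (cell name "[tweet_id, column]" -> value), cells sorted by name
    private final Map<String, TreeMap<String, String>> rows = new LinkedHashMap<>();

    /** Stores one logical CQL row as two cells inside the user's wide row. */
    void insert(String userId, int tweetId, String author, String body) {
        TreeMap<String, String> cells = rows.get(userId);
        if (cells == null) {
            cells = new TreeMap<>();
            rows.put(userId, cells);
        }
        cells.put("[" + tweetId + ", author]", author);
        cells.put("[" + tweetId + ", body]", body);
    }

    /** Returns the sorted cells of one partition (empty if the user is unknown). */
    Map<String, String> partition(String userId) {
        TreeMap<String, String> cells = rows.get(userId);
        return cells == null ? new TreeMap<String, String>() : cells;
    }

    public static void main(String[] args) {
        TimelineLayout layout = new TimelineLayout();
        layout.insert("gmason", 1765, "phenry", "Give me liberty or give me death");
        layout.insert("gmason", 1742, "gwashington", "I chopped down the cherry tree");
        layout.insert("ahamilton", 1797, "jadams", "A government of laws, not men");
        // Cells come back ordered by tweet_id, then by column name.
        for (Map.Entry<String, String> cell : layout.partition("gmason").entrySet()) {
            System.out.println(cell.getKey() + " = " + cell.getValue());
        }
    }
}
```

Because all tweets for a user sit in one sorted row, reading a user's timeline touches a single partition.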

Real-Time Analytics

Google Analytics gives you immediate statistics about your website traffic.

Web Analytics Data Model

Analytics Table

url              time    views   from_search   direct   from_referrer
/index.html      12:00   354     300           20       34
/index.html      12:01   402     333           25       44
/contacts.html   12:00   23      3             0        20
/contacts.html   12:01   20      4             1        15

CQL:

CREATE TABLE analytics (
    url varchar,
    time timestamp,
    views counter,
    from_search counter,
    direct counter,
    from_referrer counter,
    PRIMARY KEY (url, time)
);

Web Analytics Data Model

CQL:

UPDATE analytics
SET views = views + 1,
    from_search = from_search + 1
WHERE url = '/index.html'
  AND time = '2012-10-06 12:00';
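The effect of this counter UPDATE can be illustrated with a small in-memory model in Java. This is a sketch under assumptions and does not use the Cassandra driver: `AnalyticsCounters` and its methods are hypothetical names, and a plain `HashMap` keyed by url and minute stands in for the analytics table.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for the analytics counter table. */
class AnalyticsCounters {
    // "url|minute" -> (counter name -> value)
    private final Map<String, Map<String, Long>> rows = new HashMap<>();

    /** Adds 1 to a single counter, creating the row and counter on first use. */
    void increment(String url, String minute, String counter) {
        String key = url + "|" + minute;
        Map<String, Long> row = rows.get(key);
        if (row == null) {
            row = new HashMap<>();
            rows.put(key, row);
        }
        row.merge(counter, 1L, Long::sum);
    }

    /** Mirrors the UPDATE on the slide: one page view that arrived from a search. */
    void recordSearchView(String url, String minute) {
        increment(url, minute, "views");
        increment(url, minute, "from_search");
    }

    /** Reads a counter; counters that were never incremented read as 0. */
    long get(String url, String minute, String counter) {
        Map<String, Long> row = rows.get(url + "|" + minute);
        if (row == null) {
            return 0L;
        }
        Long value = row.get(counter);
        return value == null ? 0L : value;
    }

    public static void main(String[] args) {
        AnalyticsCounters analytics = new AnalyticsCounters();
        analytics.recordSearchView("/index.html", "2012-10-06 12:00");
        analytics.recordSearchView("/index.html", "2012-10-06 12:00");
        System.out.println(analytics.get("/index.html", "2012-10-06 12:00", "views"));       // prints 2
        System.out.println(analytics.get("/index.html", "2012-10-06 12:00", "from_search")); // prints 2
        System.out.println(analytics.get("/index.html", "2012-10-06 12:00", "direct"));      // prints 0
    }
}
```

Each page view becomes an increment on the row for its url and minute, so the per-minute totals are ready to read without a separate aggregation step.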

Web Analytics Data Model

CQL:

SELECT * FROM analytics
WHERE url = '/index.html'

Connect and Write

Cluster cluster = Cluster.builder()
    .addContactPoints("127.0.0.1", "127.0.0.2")
    .build();

Session session = cluster.connect();

session.execute(
    "INSERT INTO user (user_id, name, email) VALUES (12345, 'johndoe', 'john@doe.com')");

Read

ResultSet rs = session.execute("SELECT * FROM user");

List<CQLRow> rows = rs.fetchAll();
for (CQLRow row : rows) {
    String userId = row.getString("user_id");
    String name = row.getString("name");
    String email = row.getString("email");
}

Object Mapping

public enum Gender {
    @EnumValue("m")
    MALE,

    @EnumValue("f")
    FEMALE;
}

@Table("user_and_messages")
public class User {
    @Column("user_id")
    private String userId;
    private String name;
    private String email;
    private Gender gender;
}

Aggregation

@Table("user_and_messages")
public class User {
    @Column("user_id")
    private String userId;
    private String name;
    private String email;

    @GroupBy("user_id")
    private List<Message> messages;
}

public class Message {
    private String title;
    private String body;
}

Inheritance

@Table("catalog")
@Inheritance({Phone.class, TV.class})
@InheritanceColumn("product_type")
public abstract class Product {
    @Column("product_id")
    private String productId;
    private float price;
    private String vendor;
    private String model;
}

@InheritanceValue("tv")
public class TV extends Product {
    private float size;
}

Online Business Intelligence

[Figure: Application, Cassandra and Hadoop working together]

Cassandra: storage for the application in production, and storage for results.
Hadoop: distributed batch processing.
Application: using the results in production.

@mfiguiere

blog.datastax.com

Stay Tuned!