ADVANCED DATA MODELING AND BITMAP INDEXES Matt Stump [email protected] Monday, May 6, 13

Advanced Data Modeling and Bitmap Indexes


DESCRIPTION

Matt Stump presents for the DataStax Cassandra South Bay Users group on advanced data modeling and bitmap indexes.


Page 1: Advanced Data Modeling and Bitmap Indexes

ADVANCED DATA MODELING AND BITMAP INDEXES

Matt Stump

Monday, May 6, 13

Page 2: Advanced Data Modeling and Bitmap Indexes

Who are your customers?

Page 3: Advanced Data Modeling and Bitmap Indexes

Where do they hang out?

Page 4: Advanced Data Modeling and Bitmap Indexes

How should you engage?

Page 5: Advanced Data Modeling and Bitmap Indexes

What is User Experience?


Page 6: Advanced Data Modeling and Bitmap Indexes

What is my data?

Page 7: Advanced Data Modeling and Bitmap Indexes

Form Follows Function


Page 8: Advanced Data Modeling and Bitmap Indexes

Data Follows Queries


Page 9: Advanced Data Modeling and Bitmap Indexes

Primary Key

CREATE TABLE users (
    username    text PRIMARY KEY,
    first_name  text,
    last_name   text,
    postal_code text,
    last_login  timestamp
);

INSERT INTO users (username, first_name, last_name, postal_code, last_login)
VALUES ('cstar', 'Cassandra', 'Database', '11111', '2013-04-04');

SELECT first_name, last_name
FROM users
WHERE username = 'cstar';

Page 10: Advanced Data Modeling and Bitmap Indexes

Primary Key

RowKey   username   first_name   last_name   postal_code
cstar    cstar      Cassandra    Database    11111
user2    user2      Some         Guy         22222

Page 11: Advanced Data Modeling and Bitmap Indexes

Secondary Index

CREATE INDEX user_zipcode ON users(postal_code);

postal_code   usernames
11111         cstar
22222         user2, user3, user456, ...

Page 12: Advanced Data Modeling and Bitmap Indexes

Where Secondary Indexes Break

1. High-cardinality data
2. Only one index per query
3. Indexes are distributed
4. Only some datatypes; no counters
5. Range queries are expensive

Page 13: Advanced Data Modeling and Bitmap Indexes

Roll Your Own Using Wide Rows

RowKey   05/02/2012   02/01/2013   05/02/2013   ...
user2    JSON         JSON         JSON         JSON

All events for “user2”, indexed by time
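The wide-row layout above can be pictured as one row per user whose columns stay sorted by event time, so a time-range read is a contiguous slice. The sketch below is only an in-memory illustration of that idea; the `WideRow` class, its method names, and the JSON payloads are hypothetical and not part of Cassandra.

```python
from bisect import bisect_left, bisect_right
from datetime import date


class WideRow:
    """One row whose columns are kept sorted by event time (hypothetical helper)."""

    def __init__(self):
        self._times = []   # sorted column names (event dates)
        self._values = []  # column values (JSON strings), parallel to _times

    def insert(self, when, payload):
        # Keep both lists ordered by time so slices stay contiguous.
        i = bisect_right(self._times, when)
        self._times.insert(i, when)
        self._values.insert(i, payload)

    def slice(self, start, end):
        # Return every (time, payload) pair with start <= time <= end.
        lo = bisect_left(self._times, start)
        hi = bisect_right(self._times, end)
        return list(zip(self._times[lo:hi], self._values[lo:hi]))


rows = {"user2": WideRow()}
rows["user2"].insert(date(2013, 5, 2), '{"event": "c"}')
rows["user2"].insert(date(2012, 5, 2), '{"event": "a"}')
rows["user2"].insert(date(2013, 2, 1), '{"event": "b"}')

events_2013 = rows["user2"].slice(date(2013, 1, 1), date(2013, 12, 31))
```

The slice only works inside a single row key, which is the "can't query across rows" limitation.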

Page 14: Advanced Data Modeling and Bitmap Indexes

Limitations to Rolling Your Own

1. Can’t query across rows
2. Only some datatypes; no counters
3. Requires lots of work in the application
4. No complex queries

Page 15: Advanced Data Modeling and Bitmap Indexes

What do I need?

Page 16: Advanced Data Modeling and Bitmap Indexes

A Query Engine Wishlist

1. High-cardinality data; counters
2. Complex queries, multiple clauses
3. Results in < 500 ms for billions of rows
4. Sub-field searching; regex
5. Range queries

Page 17: Advanced Data Modeling and Bitmap Indexes

First Iteration: Ginormous String Sets

11111 cstar

22222 user2 user3 user456 ...

11111 22222

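A minimal sketch of the string-set approach, assuming the index is a plain mapping from postal code to a set of usernames. The `users_in` helper and the sample data are illustrative only.

```python
# Index: postal code -> set of usernames (sample data taken from the slide).
index = {
    "11111": {"cstar"},
    "22222": {"user2", "user3", "user456"},
}


def users_in(index, postal_codes):
    """Union the username sets for every requested postal code."""
    result = set()
    for code in postal_codes:
        result |= index.get(code, set())
    return result


either = users_in(index, ["11111", "22222"])
```

Every entry stores a full username string, so the sets grow with both the number of users and the length of each key. That is the cost the bitmap representation tries to avoid.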

Page 18: Advanced Data Modeling and Bitmap Indexes

Bitmaps


Page 19: Advanced Data Modeling and Bitmap Indexes

Bitmaps


Page 20: Advanced Data Modeling and Bitmap Indexes

Bitmaps: How do they Work?

        0-7        8-15      16-23     24-31
11111   11010011   1011011   1010000   00000000
22222   00000000   0011011   00000000  00000000
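One way to read the table: each distinct field value gets its own bitmap, and bit N is set when row N holds that value. The sketch below assumes rows are numbered by their position in a list and uses Python integers as bitmaps (bit 0 is the least significant bit, so the printed order is reversed compared with the slide). The function names are hypothetical.

```python
def build_bitmaps(rows, field):
    """Return {value: bitmap}, where bit N is set if rows[N][field] == value."""
    bitmaps = {}
    for row_id, row in enumerate(rows):
        value = row[field]
        bitmaps[value] = bitmaps.get(value, 0) | (1 << row_id)
    return bitmaps


def row_ids(bitmap):
    """List the positions of the set bits, lowest first."""
    ids, position = [], 0
    while bitmap:
        if bitmap & 1:
            ids.append(position)
        bitmap >>= 1
        position += 1
    return ids


users = [
    {"username": "cstar", "postal_code": "11111"},
    {"username": "user2", "postal_code": "22222"},
    {"username": "user3", "postal_code": "22222"},
    {"username": "user456", "postal_code": "22222"},
]
postal_index = build_bitmaps(users, "postal_code")
matches = [users[i]["username"] for i in row_ids(postal_index["22222"])]
```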

Page 21: Advanced Data Modeling and Bitmap Indexes

Bitmaps: Equality

        0-7        8-15      16-23     24-31
11111   11010011   1011011   1010000   00000000
22222   00000000   0011011   00000000  00000000

SELECT * FROM users WHERE postal_code IN ('11111','22222');

                0-7        8-15      16-23     24-31
11111 & 22222   00000000   0011011   00000000  00000000
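A hedged sketch of how equality clauses might be combined. An IN list matches rows holding any of the listed values, so it corresponds to a bitwise OR of the per-value bitmaps. Clauses joined by AND on different fields correspond to a bitwise AND, which is the operation drawn in the table above. The bit patterns, the `last_name` index, and the helper names are made up for illustration.

```python
from functools import reduce
from operator import and_, or_


def match_any(value_bitmaps, values):
    """Rows whose field equals any of the given values (IN list): bitwise OR."""
    return reduce(or_, (value_bitmaps.get(v, 0) for v in values), 0)


def match_all(clause_bitmaps):
    """Rows satisfying every clause (AND of clauses): bitwise AND."""
    return reduce(and_, clause_bitmaps)


# Hypothetical bitmaps over eight rows; bit 0 is row 0.
postal_code = {"11111": 0b00000101, "22222": 0b00011010}
last_name = {"Guy": 0b00010010}

in_clause = match_any(postal_code, ["11111", "22222"])
both = match_all([postal_code["22222"], last_name["Guy"]])
```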

Page 22: Advanced Data Modeling and Bitmap Indexes

Bitmaps: Range, or How Do I Query Counters?

Field    Value   0-7        8-15      16-23     24-31
Event2   1       11010011   1011011   1010000   00000000
Event2   4       00000000   0011011   00000000  00000000

        0-7        8-15      16-23     24-31
1 & 4   00000000   0011011   00000000  00000000

SELECT * FROM users WHERE Event2 > 0 AND Event2 < 5;
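A minimal sketch of a range query over counter values, assuming one bitmap per distinct counter value. A row matches `Event2 > 0 AND Event2 < 5` if its value is any of 1, 2, 3 or 4, so this sketch ORs together the bitmaps of every value inside the open interval. The function name and sample bitmaps are hypothetical.

```python
def range_query(value_bitmaps, low, high):
    """Rows whose counter value v satisfies low < v < high."""
    result = 0
    for value, bitmap in value_bitmaps.items():
        if low < value < high:
            result |= bitmap
    return result


# Hypothetical index for the Event2 counter over four rows; bit 0 is row 0.
event2 = {1: 0b0011, 4: 0b0100, 7: 0b1000}

hits = range_query(event2, 0, 5)
```

This version scans every distinct value. Keeping the values sorted would let the scan start and stop at the interval bounds.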

Page 23: Advanced Data Modeling and Bitmap Indexes

Trigrams; AKA You Promised REGEX

Field       Value   0-7        8-15      16-23     24-31
last_name   “foo”   11010011   1011011   1010000   00000000
last_name   “bar”   00000000   0011011   00000000  00000000

                0-7        8-15      16-23     24-31
“foo” & “bar”   00000000   0011011   00000000  00000000

SELECT * FROM users WHERE last_name ~= 'f.*bar';

INSERT INTO users (username, first_name, last_name, postal_code, last_login)
VALUES ('foobar82', 'johnny', 'foobar', '94110', '2013-04-04');
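A sketch of trigram filtering under simplifying assumptions: each value is split into overlapping three-character substrings, each trigram gets a bitmap of the rows containing it, and the caller supplies the literal fragments that any regex match must contain. The AND of the trigram bitmaps yields candidates only, so each candidate is still checked against the real regex. All function names here are hypothetical.

```python
import re


def trigrams(text):
    """All overlapping 3-character substrings of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(values):
    """Return {trigram: bitmap}, where bit N is set if values[N] contains it."""
    index = {}
    for row_id, value in enumerate(values):
        for gram in trigrams(value):
            index[gram] = index.get(gram, 0) | (1 << row_id)
    return index


def regex_search(values, index, pattern, required_literals):
    """AND the trigram bitmaps to get candidates, then verify with the regex."""
    candidates = (1 << len(values)) - 1  # start with every row
    for literal in required_literals:
        for gram in trigrams(literal):
            candidates &= index.get(gram, 0)
    regex = re.compile(pattern)
    return [
        row_id
        for row_id, value in enumerate(values)
        if (candidates >> row_id) & 1 and regex.search(value)
    ]


last_names = ["foobar", "database", "guy", "barfoo"]
index = build_trigram_index(last_names)
hits = regex_search(last_names, index, r"f.*bar", ["bar"])
```

Here "barfoo" passes the trigram filter because it contains "bar", but the regex check rejects it, leaving only "foobar".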

Page 24: Advanced Data Modeling and Bitmap Indexes


Page 25: Advanced Data Modeling and Bitmap Indexes

Not Everything is Roses and Honey

1. Indexes can be huge
2. Requires a read before write
3. Requires synchronization

Page 26: Advanced Data Modeling and Bitmap Indexes

Compression


Page 27: Advanced Data Modeling and Bitmap Indexes

RLE Compression: How it Works

Header   Fill, 11 blocks of 1s   Literal, 15 bits   Fill, 18 blocks of 0s   Literal, 15 bits
1010     10000000001011          111010000100101    000000000010010         000000010000011

Example taken from PWAH: http://www.sjvs.nl/?p=72
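A simplified run-length sketch of the fill/literal idea, assuming 15-bit blocks as in the example. Runs of all-zero or all-one blocks collapse into a single fill entry with a count, and any mixed block is stored verbatim as a literal. This illustrates the principle only and does not reproduce the PWAH word layout or header encoding; the tuple format is an assumption.

```python
BLOCK = 15  # bits per block, matching the example above


def rle_encode(bits):
    """Encode a '0'/'1' string (length a multiple of BLOCK) as fill/literal words."""
    words = []
    for i in range(0, len(bits), BLOCK):
        block = bits[i:i + BLOCK]
        if block in ("0" * BLOCK, "1" * BLOCK):
            fill_bit = block[0]
            last = words[-1] if words else None
            if last and last[0] == "fill" and last[1] == fill_bit:
                # Extend the current run instead of adding a new word.
                words[-1] = ("fill", fill_bit, last[2] + 1)
            else:
                words.append(("fill", fill_bit, 1))
        else:
            words.append(("literal", block))
    return words


def rle_decode(words):
    """Expand fill/literal words back into the original bit string."""
    parts = []
    for word in words:
        if word[0] == "fill":
            parts.append(word[1] * BLOCK * word[2])
        else:
            parts.append(word[1])
    return "".join(parts)


bits = "1" * BLOCK * 11 + "111010000100101" + "0" * BLOCK * 18 + "000000010000011"
encoded = rle_encode(bits)
```

In this example 31 blocks (465 bits) shrink to four words, and decoding restores the original string.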

Page 28: Advanced Data Modeling and Bitmap Indexes

Dealing with Read Before Write

Partition Index Using a Ring

{ "product": 124, "user": 22, "event": "event2", "value": "Name=Jonathan+Doe&Age=23" }

Apply hash to user-configured field:
hash(:product) = c62fb32eadd5a0fcceb1ddf2697e2345c604f451
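A minimal sketch of routing events by hashing a configured field onto a ring. The hash function is not named in the slides. SHA-1 is assumed here only because the example digest is 40 hex characters long, and the node names and `Ring` class are hypothetical. All events with the same `product` land on the same node, so that node can update its part of the index locally.

```python
import hashlib
from bisect import bisect_left


def token(value):
    """Map a value to a position on the ring (SHA-1 is an assumption)."""
    digest = hashlib.sha1(str(value).encode("utf-8")).hexdigest()
    return int(digest, 16)


class Ring:
    """Nodes placed on a hash ring; a key belongs to the next node clockwise."""

    def __init__(self, nodes):
        self._tokens = sorted((token(node), node) for node in nodes)

    def owner(self, event, field):
        position = token(event[field])
        i = bisect_left(self._tokens, (position, ""))
        # Wrap around to the first node when past the last token.
        return self._tokens[i % len(self._tokens)][1]


nodes = ["node-a", "node-b", "node-c"]
ring = Ring(nodes)
event = {"product": 124, "user": 22, "event": "event2",
         "value": "Name=Jonathan+Doe&Age=23"}
owner = ring.owner(event, "product")
```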

Page 29: Advanced Data Modeling and Bitmap Indexes

Ring Partitioning

1. Solves read before write
2. Solves synchronization issues
3. Ensures index locality
4. Easy to isolate big customers
5. Index size is limited to the largest customer

Page 30: Advanced Data Modeling and Bitmap Indexes

Sparse Indexes

         Offset 0x00        Offset 0x01        Offset 0xA0        Offset 0xF0
Field1   0111010101101111   1001010100100101   0111010000100101   0111011100100101

Only store the set bits
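A sketch of a sparse bitmap, assuming 16-bit blocks keyed by block offset as in the table. Blocks that contain no set bits are never stored. The `SparseBitmap` class and its method names are hypothetical.

```python
BLOCK_BITS = 16  # bits per stored block, matching the table above


class SparseBitmap:
    """Bitmap that keeps only the blocks containing at least one set bit."""

    def __init__(self):
        self.blocks = {}  # block offset -> 16-bit integer

    def set(self, position):
        offset, bit = divmod(position, BLOCK_BITS)
        self.blocks[offset] = self.blocks.get(offset, 0) | (1 << bit)

    def test(self, position):
        offset, bit = divmod(position, BLOCK_BITS)
        return bool((self.blocks.get(offset, 0) >> bit) & 1)

    def intersect(self, other):
        """Bitwise AND; only offsets present in both bitmaps can survive."""
        result = SparseBitmap()
        for offset in self.blocks.keys() & other.blocks.keys():
            block = self.blocks[offset] & other.blocks[offset]
            if block:
                result.blocks[offset] = block
        return result


a = SparseBitmap()
a.set(3)
a.set(0xA0 * BLOCK_BITS + 5)

b = SparseBitmap()
b.set(0xA0 * BLOCK_BITS + 5)
b.set(0xF0 * BLOCK_BITS)

common = a.intersect(b)
```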

Page 31: Advanced Data Modeling and Bitmap Indexes

Query & Indexing Engine

The Whole Enchilada


Queries and Events


Page 32: Advanced Data Modeling and Bitmap Indexes

Goals

1. Core query and index engine, wrapped
2. Extensible events and queries via Lua
3. Equality, range and REGEX queries
4. Distributed, < 500 ms for billions of rows
5. No single point of failure

Page 34: Advanced Data Modeling and Bitmap Indexes

Got any questions?

Page 35: Advanced Data Modeling and Bitmap Indexes

Thanks

Eric Tschetter of the Druid Project and Cassandra Devs for answering my questions

Page 36: Advanced Data Modeling and Bitmap Indexes

THANK YOU!

Matt Stump
www.matthewstump.com
@mattstump