Matt Stump presents for the DataStax Cassandra South Bay Users group on advanced data modeling and bitmap indexes.
Who Are Your Customers?
Monday, May 6, 13
Where Do They Hang Out?
How Should You Engage?
What is User Experience?
What Is My Data?
Form Follows Function
Data Follows Queries
Primary Key
CREATE TABLE users ( username text PRIMARY KEY, first_name text, last_name text, postal_code text, last_login timestamp);
INSERT INTO users (username, first_name, last_name, postal_code, last_login) VALUES ('cstar', 'Cassandra', 'Database', '11111', '2013-04-04');
SELECT first_name, last_name FROM users WHERE username = 'cstar';
Primary Key
RowKey  username  first_name  last_name  postal_code
cstar   cstar     Cassandra   Database   11111
user2   user2     Some        Guy        22222
Secondary Index
CREATE INDEX user_zipcode ON users(postal_code);
11111 cstar
22222 user2 user3 user456 ...
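Conceptually, the index above is an inverted map from each postal_code value to the set of row keys that contain it. A minimal in-memory sketch of that idea, assuming a hypothetical `SecondaryIndex` class that ignores how Cassandra actually stores and distributes index rows:

```python
from collections import defaultdict


class SecondaryIndex:
    """Hypothetical in-memory inverted index: column value -> row keys."""

    def __init__(self, column):
        self.column = column
        self.entries = defaultdict(set)

    def add(self, row_key, row):
        # Record that this row key holds the given column value.
        self.entries[row[self.column]].add(row_key)

    def lookup(self, value):
        # Equality lookup; unknown values return an empty set.
        return set(self.entries.get(value, ()))


users = {
    "cstar": {"postal_code": "11111"},
    "user2": {"postal_code": "22222"},
    "user3": {"postal_code": "22222"},
}
idx = SecondaryIndex("postal_code")
for key, row in users.items():
    idx.add(key, row)
print(sorted(idx.lookup("22222")))  # ['user2', 'user3']
```

Every distinct value gets its own entry, which is why very high-cardinality columns make this structure large and sparse.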
Where Secondary Indexes Break
1. High-cardinality data
2. Only one index per query
3. Indexes are distributed (each node indexes only its local data, so a query must ask every node)
4. Only some data types; no counters
5. Range queries are expensive
Roll Your Own Using Wide Rows
RowKey  05/02/2012  02/01/2013  05/02/2013  ...
user2   JSON        JSON        JSON        JSON
All events for “user2” indexed by time
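A wide row behaves like a sorted map from column name (here, an event time) to value, which is what makes time-range slices within one row cheap. The sketch below models a single row with a sorted list and `bisect`; the `WideRow` class is hypothetical, and the slide's dates are rewritten in ISO form, reading them as month/day/year (an assumption).

```python
import bisect
import json


class WideRow:
    """Hypothetical model of one wide row: columns kept sorted by name (time)."""

    def __init__(self):
        self.names = []   # sorted column names; ISO dates sort chronologically
        self.values = {}

    def insert(self, name, value):
        if name not in self.values:
            bisect.insort(self.names, name)
        self.values[name] = value

    def slice(self, start, end):
        """Return (name, value) pairs with start <= name <= end, in order."""
        lo = bisect.bisect_left(self.names, start)
        hi = bisect.bisect_right(self.names, end)
        return [(n, self.values[n]) for n in self.names[lo:hi]]


user2 = WideRow()
for day in ["2012-05-02", "2013-02-01", "2013-05-02"]:
    user2.insert(day, json.dumps({"event": "login", "day": day}))
print([name for name, _ in user2.slice("2013-01-01", "2013-12-31")])
# ['2013-02-01', '2013-05-02']
```

The slice only works within one row key, which is the first limitation listed next.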
Limitations to Rolling Your Own
1. Can’t query across rows
2. Only some data types; no counters
3. Requires lots of work in the application
4. No complex queries
What Do I Need?
A Query Engine Wishlist
1. High-cardinality data; counters
2. Complex queries with multiple clauses
3. Results in under 500 ms for billions of rows
4. Sub-field searching; regex
5. Range queries
First Iteration: Ginormous String Sets
11111 cstar
22222 user2 user3 user456 ...
(Figure: the 11111 and 22222 sets)
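With one string set per indexed value, a multi-clause query becomes set algebra over row keys: AND is intersection and OR is union. A hypothetical sketch with two indexed fields; the `last_name` index and its contents are made up for illustration.

```python
# Hypothetical string-set indexes: one set of usernames per indexed value.
postal_code = {
    "11111": {"cstar"},
    "22222": {"user2", "user3", "user456"},
}
last_name = {
    "Database": {"cstar"},
    "Guy": {"user2", "user456"},
}


def lookup(index, value):
    return index.get(value, set())


# WHERE postal_code = '22222' AND last_name = 'Guy'  -> intersection
both = lookup(postal_code, "22222") & lookup(last_name, "Guy")
# WHERE postal_code IN ('11111', '22222')  -> union
either = lookup(postal_code, "11111") | lookup(postal_code, "22222")
print(sorted(both), len(either))  # ['user2', 'user456'] 4
```

Each set stores full username strings, so memory grows with both row count and key length; replacing the strings with one bit per row is the motivation for bitmaps.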
Bitmaps
Bitmaps: How Do They Work?
Value  0-7       8-15      16-23     24-31
11111  11010011  1011011   1010000   00000000
22222  00000000  0011011   00000000  00000000
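Each distinct value gets its own bitmap, and bit i is set when row i holds that value. A minimal sketch using a Python `int` as the bitmap; the row numbering (row id = list position) and the `build_bitmaps`/`render` helpers are hypothetical.

```python
from collections import defaultdict


def build_bitmaps(values):
    """Map each distinct value to an int whose bit i is set if row i has it."""
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(values):
        bitmaps[value] |= 1 << row_id
    return dict(bitmaps)


def render(bitmap, width):
    # Leftmost character is row 0, matching the slide's column order.
    return "".join("1" if bitmap >> i & 1 else "0" for i in range(width))


rows = ["11111", "22222", "11111", "11111", "22222"]  # postal_code per row id
bitmaps = build_bitmaps(rows)
print(render(bitmaps["11111"], 5), render(bitmaps["22222"], 5))  # 10110 01001
```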
Bitmaps: Equality
Value  0-7       8-15      16-23     24-31
11111  11010011  1011011   1010000   00000000
22222  00000000  0011011   00000000  00000000
SELECT * FROM users WHERE postal_code IN ('11111','22222');
               0-7       8-15      16-23     24-31
11111 | 22222  11010011  1011011   1010000   00000000
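An IN list over one column is the OR of that column's per-value bitmaps, while AND across different columns is an intersection. A sketch with made-up 8-row bitmaps written in the slide's left-to-right order; `from_bits`, `to_bits`, and the `state` column are hypothetical.

```python
def from_bits(s):
    """Parse '1010...' (leftmost = row 0) into an int bitmap."""
    return sum(1 << i for i, c in enumerate(s) if c == "1")


def to_bits(bitmap, width):
    return "".join("1" if bitmap >> i & 1 else "0" for i in range(width))


postal_code = {"11111": from_bits("11010000"), "22222": from_bits("00101100")}
state = {"CA": from_bits("01101001")}  # hypothetical second indexed column

# WHERE postal_code IN ('11111', '22222')  -> OR of the value bitmaps
in_query = postal_code["11111"] | postal_code["22222"]
# WHERE postal_code = '22222' AND state = 'CA'  -> AND across columns
and_query = postal_code["22222"] & state["CA"]
print(to_bits(in_query, 8), to_bits(and_query, 8))  # 11111100 00101000
```

Both operations are single machine-word instructions per 64 rows, which is where the speed of bitmap indexes comes from.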
Bitmaps: Range, or How Do I Query Counters?
Field   Value  0-7       8-15      16-23     24-31
Event2  1      11010011  1011011   1010000   00000000
Event2  4      00000000  0011011   00000000  00000000
       0-7       8-15      16-23     24-31
1 | 4  11010011  1011011   1010000   00000000
SELECT * FROM users WHERE Event2 > 0 AND Event2 < 5;
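For a counter, each distinct value gets a bitmap, and a range predicate is the OR of the bitmaps for every value that falls inside the range. A sketch with hypothetical per-row Event2 counts:

```python
from collections import defaultdict


def build_bitmaps(values):
    """Equality-encoded bitmaps: one int per distinct value, bit i = row i."""
    bitmaps = defaultdict(int)
    for row_id, value in enumerate(values):
        bitmaps[value] |= 1 << row_id
    return bitmaps


def range_query(bitmaps, low, high):
    """Rows whose value v satisfies low < v < high: OR of matching bitmaps."""
    result = 0
    for value, bitmap in bitmaps.items():
        if low < value < high:
            result |= bitmap
    return result


event2 = [1, 4, 0, 7, 1, 4]  # hypothetical Event2 counter per row id
hits = range_query(build_bitmaps(event2), 0, 5)  # Event2 > 0 AND Event2 < 5
print([i for i in range(len(event2)) if hits >> i & 1])  # [0, 1, 4, 5]
```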
Trigrams, a.k.a. “You Promised REGEX”
Field      Value  0-7       8-15      16-23     24-31
last_name  “foo”  11010011  1011011   1010000   00000000
last_name  “bar”  00000000  0011011   00000000  00000000

               0-7       8-15      16-23     24-31
“foo” & “bar”  00000000  0011011   00000000  00000000
SELECT * FROM users WHERE last_name ~= 'foo.*bar';
INSERT INTO users (username, first_name, last_name, postal_code, last_login) VALUES ('foobar82', 'johnny', 'foobar', '94110', '2013-04-04');
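The trigram approach, as described in Russ Cox's write-up on Google Code Search (linked under Resources), indexes every three-character substring, reduces the regex to trigrams that any match must contain, ANDs their bitmaps to get candidate rows, and then runs the real regex on the candidates only. A simplified sketch: the required literals are supplied by hand rather than derived from the regex, and all helper names are hypothetical.

```python
import re
from collections import defaultdict


def trigrams(text):
    text = text.lower()
    return {text[i:i + 3] for i in range(len(text) - 2)}


def build_trigram_index(values):
    """Map each trigram to an int bitmap of the rows containing it."""
    index = defaultdict(int)
    for row_id, value in enumerate(values):
        for gram in trigrams(value):
            index[gram] |= 1 << row_id
    return index


def regex_query(values, index, required_literals, pattern):
    # Start with every row, then AND in each required trigram's bitmap.
    candidates = (1 << len(values)) - 1
    for literal in required_literals:
        for gram in trigrams(literal):
            candidates &= index.get(gram, 0)
    # Bitmaps only narrow the search; the regex confirms each candidate.
    return [values[i] for i in range(len(values))
            if candidates >> i & 1 and re.fullmatch(pattern, values[i])]


last_names = ["Database", "Guy", "foobar", "barfoo", "foo-n-bar"]
index = build_trigram_index(last_names)
# last_name matches 'foo.*bar'; literals 'foo' and 'bar' are chosen by hand.
print(regex_query(last_names, index, ["foo", "bar"], "foo.*bar"))
# ['foobar', 'foo-n-bar']
```

`barfoo` survives the trigram filter but fails the regex, which is why the final verification step cannot be skipped.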
Not Everything is Roses and Honey
1. Indexes can be huge
2. Requires a read before write
3. Requires synchronization
Compression
RLE Compression: How It Works
Header  Fill, 11 blocks of 1s  Literal, 15 bits  Fill, 18 blocks of 0s  Literal, 15 bits
1010    100000000001011        111010000100101   000000000010010        000000010000011
Example taken from PWAH (Partitioned Word-Aligned Hybrid): http://www.sjvs.nl/?p=72
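A simplified decoder for the example 64-bit word (4-bit header plus four 15-bit partitions) makes the layout concrete. The bit conventions are assumptions read off the slide's labels, not taken from the PWAH paper: header bits are read left to right, one per partition; a header bit of 1 marks a fill partition whose first bit is the fill value and whose remaining 14 bits count 15-bit blocks.

```python
PARTITION_BITS = 15


def decode_word(header, partitions):
    """Expand one PWAH-style word into the uncompressed bit string it encodes.

    Assumed layout (see lead-in): header bit i == '1' marks partition i as a
    fill (first bit = fill value, next 14 bits = number of 15-bit blocks);
    otherwise the partition is 15 literal bits.
    """
    assert len(header) == len(partitions)
    out = []
    for is_fill, part in zip(header, partitions):
        assert len(part) == PARTITION_BITS
        if is_fill == "1":
            fill_bit, count = part[0], int(part[1:], 2)
            out.append(fill_bit * (PARTITION_BITS * count))
        else:
            out.append(part)  # literal: 15 uncompressed bits
    return "".join(out)


bits = decode_word("1010", ["100000000001011", "111010000100101",
                            "000000000010010", "000000010000011"])
print(len(bits), bits.count("1"))  # 465 175
```

One 64-bit word stands in for 465 uncompressed bits here, because the two fills cover 29 blocks of 15 bits each.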
Dealing with Read Before Write
Partition Index Using a Ring
{ "product": 124, "user": 22, "event": "event2", "value": "Name=Jonathan+Doe&Age=23"}
Apply a hash to a user-configured field:
hash(:product) = c62fb32eadd5a0fcceb1ddf2697e2345c604f451
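The idea is to hash a user-configured field (here `product`) onto a ring so that every event for the same product lands on the same node, which can then own that slice of the index. A minimal consistent-hashing sketch; the node names, the one-token-per-node layout, and the use of SHA-1 (picked only because the slide's digest is 40 hex characters) are all assumptions.

```python
import bisect
import hashlib
import json


def token(value):
    # SHA-1 is an assumption, chosen to match the 40-hex-digit digest shown.
    return int(hashlib.sha1(str(value).encode()).hexdigest(), 16)


class Ring:
    """Minimal hash ring: each node owns the arc ending at its own token."""

    def __init__(self, nodes):
        self.tokens = sorted((token(n), n) for n in nodes)
        self.keys = [t for t, _ in self.tokens]

    def owner(self, value):
        # First node token >= the value's token, wrapping past the end.
        i = bisect.bisect_left(self.keys, token(value)) % len(self.keys)
        return self.tokens[i][1]


ring = Ring(["node-a", "node-b", "node-c"])  # hypothetical node names
event = json.loads('{"product": 124, "user": 22, "event": "event2", '
                   '"value": "Name=Jonathan+Doe&Age=23"}')
partition_field = "product"  # the user-configured field
print(ring.owner(event[partition_field]))
```

Because one node owns all index updates for a product, it can keep that product's bitmaps locally and apply writes in order, rather than reading and merging the index from elsewhere.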
Ring Partitioning
1. Solves read before write
2. Solves synchronization issues
3. Ensures index locality
4. Easy to isolate big customers
5. Index size is limited to the largest customer
Sparse Indexes
Field   Offset 0x00       Offset 0x01       Offset 0xA0       Offset 0xF0
Field1  0111010101101111  1001010100100101  0111010000100101  0111011100100101
Only Store the Set Bits
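One way to read this slide: split the bitmap into fixed-size blocks and store only the non-empty ones, keyed by offset. A sketch with 16-bit blocks to match the slide's block width; the `SparseBitmap` class is hypothetical.

```python
BLOCK_BITS = 16


class SparseBitmap:
    """Stores only non-zero 16-bit blocks, keyed by block offset."""

    def __init__(self):
        self.blocks = {}

    def set(self, bit):
        offset, pos = divmod(bit, BLOCK_BITS)
        self.blocks[offset] = self.blocks.get(offset, 0) | (1 << pos)

    def get(self, bit):
        offset, pos = divmod(bit, BLOCK_BITS)
        return bool(self.blocks.get(offset, 0) >> pos & 1)

    def __and__(self, other):
        # Only offsets present in both operands can produce set bits.
        result = SparseBitmap()
        for offset in self.blocks.keys() & other.blocks.keys():
            block = self.blocks[offset] & other.blocks[offset]
            if block:
                result.blocks[offset] = block
        return result


a, b = SparseBitmap(), SparseBitmap()
for bit in (3, 17, 0xA0 * BLOCK_BITS + 5):
    a.set(bit)
for bit in (17, 0xF0 * BLOCK_BITS):
    b.set(bit)
both = a & b
print(sorted(both.blocks), both.get(17), len(a.blocks))  # [1] True 3
```

Empty stretches between offsets 0x01 and 0xA0 cost nothing, and an AND only touches offsets the two operands share.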
Query & Indexing Engine
The Whole Enchilada
Queries and Events
Goals
1. Core query and index engine, wrapped
2. Extensible events and queries via Lua
3. Equality, range, and REGEX queries
4. Distributed, under 500 ms for billions of rows
5. No single point of failure
Resources
Lots of Papers on Bitmap Compression: http://www-users.cs.umn.edu/~kewu/annotated.html
How Google Code Search Worked: http://swtch.com/~rsc/regexp/regexp4.html
Got Any Questions?
Thanks
Eric Tschetter of the Druid Project and the Cassandra devs for answering my questions
THANK YOU!
Matt Stump
www.matthewstump.com
@mattstump