Cassandra & CassandraObject
Michael Koziarski
Intro to Me
I Don’t Have To Scale
Intro to Cassandra
Distributed
Fault Tolerant
Elastic
The Ring
[Diagram: six nodes, A through F, arranged in a ring; someKey is placed at a position on the ring.]
RandomPartitioner: MD5(key)
OrderPreservingPartitioner: key
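The difference between the two partitioners can be sketched in a few lines of Ruby. This is a simplified illustration, not Cassandra's implementation: `random_token` and `order_preserving_token` are hypothetical helper names, and the keys are sample dates. The point is only that hashing scatters keys around the ring, while using the key itself keeps neighbouring keys next to each other.

```ruby
require "digest/md5"

# RandomPartitioner-style position: the MD5 digest of the key, read as an integer.
def random_token(key)
  Digest::MD5.hexdigest(key).to_i(16)
end

# OrderPreservingPartitioner-style position: the key itself.
def order_preserving_token(key)
  key
end

keys = ["1980-08-15", "1980-12-01", "1981-03-20"]

# Sorted by the key itself, the rows stay in date order.
by_key = keys.sort_by { |k| order_preserving_token(k) }

# Sorted by digest, the order depends on the hash values, not on the dates.
by_digest = keys.sort_by { |k| random_token(k) }

puts by_key.inspect
puts by_digest.inspect
```

Range queries over row keys rely on the first ordering, which is why they need an order-preserving partitioner.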
Replication Factor
[Diagram: the ring of nodes A through F with someKey placed on it, shown once with RF = 2 and once with RF = 3.]
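A minimal sketch of replica placement, assuming the simplest strategy: the key goes to the first node at or after its position on the ring, and the remaining copies go to the next nodes clockwise. The `RING` tokens and the `replicas_for` helper are made up for illustration; real clusters can use other placement strategies.

```ruby
# Hypothetical ring: node name => token. Tokens are arbitrary illustrative values.
RING = { "A" => 0, "B" => 100, "C" => 200, "D" => 300, "E" => 400, "F" => 500 }.freeze

# Names of the nodes that hold a key with the given token: the first node whose
# token is >= the key's token, followed by the next nodes clockwise.
def replicas_for(token, replication_factor, ring = RING)
  ordered = ring.sort_by { |_, node_token| node_token }
  nodes   = ordered.map(&:first)
  first   = ordered.index { |_, node_token| node_token >= token } || 0 # wrap past the last node
  (0...replication_factor).map { |i| nodes[(first + i) % nodes.size] }
end

replicas_for(250, 2) # => ["D", "E"]
replicas_for(250, 3) # => ["D", "E", "F"]
replicas_for(550, 3) # => ["A", "B", "C"]
```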
Consistency Level
[Diagram: GET 'someKey' is issued against the ring of nodes A through F, shown in turn at ConsistencyLevel.ONE, ConsistencyLevel.QUORUM, and ConsistencyLevel.ALL.]
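The three levels differ in how many replicas must answer before the request returns. A small sketch of that arithmetic follows; the method names are hypothetical and the symbols `:one`, `:quorum`, and `:all` stand in for the `ConsistencyLevel` constants.

```ruby
# Number of replica responses a request waits for at each consistency level.
def required_responses(level, replication_factor)
  case level
  when :one    then 1
  when :quorum then replication_factor / 2 + 1 # a majority of the replicas
  when :all    then replication_factor
  else raise ArgumentError, "unknown consistency level: #{level.inspect}"
  end
end

# A request can only succeed if enough replicas are alive to answer it.
def satisfiable?(level, replication_factor, live_replicas)
  live_replicas >= required_responses(level, replication_factor)
end

required_responses(:quorum, 3) # => 2
satisfiable?(:quorum, 3, 2)    # => true
satisfiable?(:all, 3, 2)       # => false
```

With RF = 3, a QUORUM request still succeeds when one replica is down, while an ALL request does not.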
Fault Tolerance
[Diagram: the node in B's position is marked DEAD; GET 'someKey' at ConsistencyLevel.QUORUM is issued against the ring with nodes A, C, D, E, and F remaining.]
Elastic
[Diagram: a NEW node joins the ring of nodes A through F.]
Data Model
Key Value Store
someKey: Some Value
Column Store
Column
firstName: Michael
Row (Column Family)
someKey
  firstName: Michael
  lastName: Koziarski
Row (Super Column Family)
someKey
  someSubColumn
    firstName: Michael
    lastName: Koziarski
  otherSubColumn
    firstName: Kate
    lastName: Koziarski
JSON
Column
{ 'name': 'first_name', 'value': 'Michael', 'timestamp': 1276040575 }
Column
{ 'first_name': 'Michael' }
Column Family
Users["koz"] = { 'first_name': 'Michael', 'last_name': 'Koziarski' }
Super Column Family
UserAddresses["koz"] = {
  "home":   { "suburb": "Berhampore", "city": "Wellington", "country": "New Zealand" },
  "office": { "suburb": "CBD", "city": "Wellington", "country": "New Zealand" }
}
Don’t let the name fool you
One to Many
Timeline["nzkoz"] = {
  uuid_one:   "http://twitter.com/chadfowler/status/15740739666",
  uuid_two:   "http://twitter.com/dhh/status/15740689762",
  uuid_three: "http://twitter.com/glv/status/15740546908"
}
<ColumnFamily CompareWith="TimeUUIDType" Name="Timeline"/>
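A rough sketch of why time-ordered column names make a one-to-many row useful. Everything here is an in-memory stand-in: `TimelineRow` is a hypothetical class, and a `[microseconds, sequence]` pair is used in place of a real TimeUUID, on the assumption that all that matters for the example is that column names sort by time.

```ruby
# One row of a Timeline-like column family, held in memory.
class TimelineRow
  def initialize
    @columns  = {}
    @sequence = 0
  end

  # Adds a column whose name sorts by time. The [microseconds, sequence] pair is a
  # stand-in for a TimeUUID; the sequence keeps names unique within one microsecond.
  def add(url, at: Time.now)
    name = [(at.to_f * 1_000_000).to_i, @sequence += 1]
    @columns[name] = url
    name
  end

  # The most recent `count` values, newest first, read in column-name order.
  def latest(count)
    @columns.sort_by { |name, _| name }.last(count).reverse.map { |_, url| url }
  end
end

timeline = TimelineRow.new
timeline.add("http://twitter.com/glv/status/15740546908",        at: Time.at(1))
timeline.add("http://twitter.com/dhh/status/15740689762",        at: Time.at(2))
timeline.add("http://twitter.com/chadfowler/status/15740739666", at: Time.at(3))

timeline.latest(2)
# => ["http://twitter.com/chadfowler/status/15740739666",
#     "http://twitter.com/dhh/status/15740689762"]
```

Because the store keeps columns sorted by name, "the latest N entries for this user" is a single slice of one row.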
Modeling
Schema Driven Modeling
Start with your Data
Model
User
  firstName
  lastName
  dateOfBirth
Figure out the Queries
Query
SELECT firstName, lastName
FROM `users`
WHERE dateOfBirth < '1992-06-09'
Query
SELECT dateOfBirth
FROM `users`
WHERE firstName = 'Michael' AND lastName = 'Koziarski'
Query
SELECT firstName, lastName
FROM `users`
WHERE YEAR(dateOfBirth) = 1980
Query
SELECT COUNT(DISTINCT firstName)
FROM `users`
WHERE dateOfBirth < '1992-06-09'
Cassandra Limitations
No WHERE (Kinda)
No ORDER (Kinda)
No COUNT (Kinda)
No SUM
Query Driven Modeling
Start with the Queries
Populate a data model which enables them
Modeling Example
Users
Users["koz"] = { 'first_name': "Michael", 'last_name': "Koziarski" }
connection.get(:Users, "koz")
Koziarski Family
UsersByLastName["koziarski"] = { uuid_one: "koz", uuid_two: "kate" }
connection.get(:UsersByLastName, "koziarski").values.map do |key|
  connection.get(:Users, key)
end
Share my Birthday
UsersByDOB["1980-08-15"] = { uuid_one: "koz" }
connection.get(:UsersByDOB, "1980-08-15")
Users Born in 1980
UsersByDOB["1980-08-15"] = { uuid_one: "koz" }
connection.get_range(:UsersByDOB, :start => "1980", :finish => "1981")
Only with OrderPreservingPartitioner
A Column Family per Query
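To make the idea concrete, here is a self-contained sketch that keeps one column family per query up to date on every write. It uses plain Ruby hashes in place of a Cassandra connection: `STORE`, `insert`, `get`, `get_range`, and `save_user` are hypothetical stand-ins, the half-open range is a simplification, and the date of birth for "kate" is made up for the example.

```ruby
require "securerandom"

# In-memory stand-in for a keyspace: column family => row key => columns.
STORE = Hash.new { |families, name| families[name] = {} }

def insert(column_family, key, columns)
  (STORE[column_family][key] ||= {}).merge!(columns)
end

def get(column_family, key)
  STORE[column_family].fetch(key, {})
end

# Rows with start <= key < finish, in key order. This only makes sense when rows
# are stored in key order, which is what an order-preserving partitioner provides.
def get_range(column_family, start:, finish:)
  STORE[column_family].select { |key, _| key >= start && key < finish }.sort.to_h
end

# Every write updates the main row plus one index row per query we want to answer.
def save_user(key, first_name:, last_name:, date_of_birth:)
  insert(:Users, key, "first_name" => first_name, "last_name" => last_name,
                      "date_of_birth" => date_of_birth)
  insert(:UsersByLastName, last_name.downcase, SecureRandom.uuid => key)
  insert(:UsersByDOB, date_of_birth, SecureRandom.uuid => key)
end

save_user("koz",  first_name: "Michael", last_name: "Koziarski", date_of_birth: "1980-08-15")
save_user("kate", first_name: "Kate",    last_name: "Koziarski", date_of_birth: "1982-01-01")

family    = get(:UsersByLastName, "koziarski").values.map { |k| get(:Users, k)["first_name"] }
born_1980 = get_range(:UsersByDOB, start: "1980", finish: "1981").values.flat_map(&:values)

puts family.inspect    # ["Michael", "Kate"]
puts born_1980.inspect # ["koz"]
```

The cost is visible in `save_user`: each new query means another write on every save, and another place for data to drift out of sync.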
Do you really need Cassandra?
CassandraObject
class Customer < CassandraObject::Base
  attribute :first_name,    :type => :string
  attribute :last_name,     :type => :string
  attribute :date_of_birth, :type => :date
  attribute :signed_up_at,  :type => :time_with_zone

  validate :should_be_cool

  key :uuid

  index :date_of_birth

  association :invoices, :unique => false, :inverse_of => :customer

  private

  def should_be_cool
    unless ["Michael", "Anika", "Evan", "James"].include?(first_name)
      errors.add(:first_name, "must be that of a cool person")
    end
  end
end
Motivations
Prove ActiveModel
Learn Cassandra
Have a fun Side Project
Solve my Scaling Problems
Mostly AR Compatible
Not Compatible
def index
  @people = @customer.people.order(params[:order]).
                             limit(params[:limit]).
                             where(...)
end
Compatible
def create
  @person = Customer.new params[:customer]
  if @person.save
    redirect_to @person
  else
    render :action => 'new'
  end
end
Compatible
<%= form_for(@customer) do |customer| %>
  <%= customer.error_messages %>
  <%= customer.text_field :first_name %>
  <%= customer.submit "Save" %>
<% end %>
Walkthrough
class Invoice < CassandraObject::Base
  attribute :number,     :type => :integer
  attribute :total,      :type => :float
  attribute :gst_number, :type => :string

  # indexes can have a single entry also.
  index :number, :unique => true

  # bi-directional associations with read-repair support.
  association :customer, :unique => true, :inverse_of => :invoices

  # Read migration support
  migrate 1 do |attrs|
    attrs["total"] ||= rand(2000) / 100.0
  end

  migrate 2 do |attrs|
    attrs["gst_number"] = "66-666-666"
  end

  key :natural, :attributes => :number
end
Attributes
attribute :first_name,    :type => :string
attribute :last_name,     :type => :string
attribute :date_of_birth, :type => :date
attribute :signed_up_at,  :type => :time_with_zone
attribute :number,        :type => :integer
attribute :total,         :type => :float
attribute :gst_number,    :type => :string
Attributes
@customer.first_name = "Michael"
@customer.attributes = {:first_name => "Michael"}
attribute :first_name, :type => :string
Validations
validate :should_be_cool

def should_be_cool
  unless ["Michael", "Anika", "Evan", "James"].include?(first_name)
    errors.add(:first_name, "must be that of a cool person")
  end
end

@customer.first_name = "Marcel"
@customer.valid? # => false
Validations
validates_confirmation_of :tos
validates_format_of :gst_number, :with => /.../
validates_length_of :first_name, :maximum => 123
Keys
Key Selection Matters
UsersByDOB["1980-08-15"] = { uuid_one: "koz" }
connection.get_range(:UsersByDOB, :start => "1980", :finish => "1981")
Keys
key :uuid
"bf1ba5da-735a-11df-8b47-377649cf993b"
Keys
key :natural, :attributes => :number
Custom Key Factories
key RedisKeyFactory.new(REDIS_CONNECTION, "customer_key")
Custom Key Factories
class RedisKeyFactory
  def initialize(connection, key)
    @connection, @key = connection, key
  end

  def next_key(object)
    @connection.incr(@key)
  end

  # Parse should create a new key object from the 'to_param' format
  def parse(string)
    string.to_i
  end

  # create should create a new key object from the cassandra format.
  def create(string)
    string.to_i
  end
end
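The same three-method shape (`next_key`, `parse`, `create`) can be tried out without Redis. The class below is a hypothetical in-process variant for experimenting with the interface shown on the slide; it is not part of CassandraObject, and a counter held in one process is obviously not safe across several application servers.

```ruby
# Hypothetical key factory backed by an in-process counter instead of Redis.
class CounterKeyFactory
  def initialize(start = 0)
    @value = start
    @lock  = Mutex.new
  end

  # Hands out the next integer key; the mutex keeps concurrent threads from
  # receiving the same value.
  def next_key(_object)
    @lock.synchronize { @value += 1 }
  end

  # Builds a key from the 'to_param' format. Integer() raises on non-numeric
  # input, where String#to_i would quietly return 0.
  def parse(string)
    Integer(string, 10)
  end

  # Builds a key from the stored format.
  def create(string)
    Integer(string, 10)
  end
end

factory = CounterKeyFactory.new
factory.next_key(nil) # => 1
factory.next_key(nil) # => 2
factory.parse("42")   # => 42
```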
Migrations
class AddLicenseNameToArticle < ActiveRecord::Migration
  def self.up
    add_column :articles, :license_name, :string, :default => "Exclusive"

    execute "UPDATE articles SET license_name = 'Exclusive' WHERE price_first IS NOT NULL"
    execute "UPDATE articles SET license_name = 'Syndicated' WHERE price_first IS NULL"
  end

  def self.down
    remove_column :articles, :license_name
  end
end
Migrations
{ 'price_first': 45, 'schema_version': 0 }
Migrations
class Article < CassandraObject::Base
  attribute :price_first,  :type => :float
  attribute :license_name, :type => :string

  migrate 1 do |attrs|
    if attrs[:price_first]
      attrs[:license_name] = "Exclusive"
    else
      attrs[:license_name] = "Syndicated"
    end
  end
end

@article = Article.get("some-old-story")
@article.license_name # => "Exclusive"
Migrations
{ 'price_first': 45, 'schema_version': 1, 'license_name': 'Exclusive' }
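One way to picture migrate-on-read is the plain-Ruby sketch below. It is not CassandraObject's implementation: `ArticleSchema`, `MIGRATIONS`, and `upgrade` are hypothetical names, and string keys are assumed for the stored attributes. Each block is registered under a version number, and a row is brought up to date when it is loaded by running every block newer than its `schema_version`.

```ruby
# Hypothetical stand-in for the migrate DSL.
class ArticleSchema
  MIGRATIONS = {}

  # Registers a block that upgrades stored attributes to the given version.
  def self.migrate(version, &block)
    MIGRATIONS[version] = block
  end

  migrate 1 do |attrs|
    attrs["license_name"] = attrs["price_first"] ? "Exclusive" : "Syndicated"
  end

  # Returns a copy of the stored attributes with all newer migrations applied,
  # in version order. Rows with no schema_version are treated as version 0.
  def self.upgrade(stored)
    attrs   = stored.dup
    current = attrs.fetch("schema_version", 0)
    MIGRATIONS.keys.sort.each do |version|
      next if version <= current
      MIGRATIONS[version].call(attrs)
      attrs["schema_version"] = version
    end
    attrs
  end
end

ArticleSchema.upgrade("price_first" => 45, "schema_version" => 0)
# => { "price_first" => 45, "schema_version" => 1, "license_name" => "Exclusive" }
```

Rows that are never read are never rewritten, so old and new versions can sit side by side in the same column family.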
Indexes
class Article < CassandraObject::Base
  attribute :slug, :type => :string
  key :uuid
  index :slug, :unique => true
end

connection.insert(:ArticlesBySlug, @article.slug, {UUID.new => @article.key})

@article = Article.find_by_slug("some-slug")
Indexes
class Article < CassandraObject::Base
  attribute :publication_date, :type => :date
  key :uuid
  index :publication_date, :unique => false
end

connection.insert(:ArticlesByPublicationDate, @article.publication_date,
                  {UUID.new => @article.key})

@article = Article.find_all_by_publication_date(Date.today - 1)
Read-Repair
results = []
connection.get(:ArticlesByPublicationDate, date).each do |(uuid, key)|
  # The column name is the index entry's UUID; the column value is the article's key.
  article = Article.get(key)
  if article.publication_date != date
    connection.delete(:ArticlesByPublicationDate, date, uuid)
  else
    results << article
  end
end
results
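The same repair-while-reading idea can be exercised against plain hashes. This is a sketch only: `ARTICLES`, `INDEX`, the column names, and the dates are invented sample data, and the method returns article keys rather than model objects.

```ruby
# Sample data: article key => attributes.
ARTICLES = {
  "a1" => { "publication_date" => "2010-06-08" },
  "a2" => { "publication_date" => "2010-06-09" } # date changed after it was indexed
}

# Index row per date: column name => article key. "col-2" is now stale.
INDEX = {
  "2010-06-08" => { "col-1" => "a1", "col-2" => "a2" }
}

# Returns the keys of articles published on `date`, deleting any index entry
# whose article is missing or no longer carries that date.
def find_all_by_publication_date(date)
  results = []
  INDEX.fetch(date, {}).to_a.each do |column, key| # to_a: iterate over a snapshot
    article = ARTICLES[key]
    if article.nil? || article["publication_date"] != date
      INDEX[date].delete(column) # stale entry: repair the index as we read
    else
      results << key
    end
  end
  results
end

find_all_by_publication_date("2010-06-08") # => ["a1"]
INDEX["2010-06-08"]                        # => { "col-1" => "a1" }
```

Writers never have to find and remove old index entries; the reader that trips over a stale one cleans it up.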
Associations
class Invoice < CassandraObject::Base
  association :customer, :unique => true, :inverse_of => :invoices
end

class Customer < CassandraObject::Base
  association :invoices, :unique => false, :inverse_of => :customer
end
@customer.invoices.create! params[:invoice]
@invoice.customer
Project Status
Very Beta
In Flux
Taking Patches
Thanks!
Michael Koziarski
http://github.com/NZKoz/cassandra_object
http://cassandra.apache.org/