Speaker: Patricia Gorla, Systems Engineer at OpenSource Connections
Video: http://www.youtube.com/watch?v=jnQ1atqOIZk&list=PLqcm6qE9lgKLoYaakl3YwIWP4hmGsHm5e&index=26
For any venture, storing your data is just the first step in making sense of it. How do you make your system discoverable? How do you tune your relevancy to accommodate real-time updates? In this session, we explore pairing Cassandra with Solr using DataStax Enterprise Search, and look at different search paradigms to help your users find patterns in your data.
#CASSANDRAEU CASSANDRASUMMITEU
Adventures in Discoverability with C* and Solr
Patricia Gorla, Systems Engineer
@patriciagorla @o19s
• Solr
• Cassandra
• Information retrieval
About Me
Paul Hostetler - phostetler.com
How Do I Find What I’m Looking For?

Simple
  Aristotle’s birthplace?
    CQL:  select birthPlace where name = 'Aristotle';
    Solr: q=Aristotle&fl=birthPlace
  Coordinates of Stagira?
    CQL:  select coord where name = 'Stagira';
    Solr: q=Stagira&fl=point

Complex
  All ancient Greek philosophers?
    CQL:  create index on tag;
          select * where tag = 'Greek philosophy';
    Solr: q=Greek philosophy
  All cities within 100km of Stagira?
    CQL:  ???
    Solr: q=*:*&fq={!geofilt pt=40.530,23.752 sfield=point d=100}
• Google Site Search
• MySQL ‘like’ statements
Approaches to Search
Seth Casteel - littlefriendsphoto.com
Approaches to Search
• Full-text search
• Ranking (Scoring)
• Tokenization
• Stemming
• Faceting
and many more!
[1] Pleasure in the job puts perfection in the work.
[2] Education is the best provision for the journey to old age.
[3] If some animals are good at hunting and others are suitable for hunting, then the gods must clearly smile on hunting.
[4] It is the mark of an educated mind to be able to entertain a thought without absorbing it.
Inverted Index
Term        Freq  Documents
education   2     [2] [4]
hunting     3     [3]
perfection  1     [1]
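A minimal sketch of how such an inverted index could be built, using the four example sentences from the slide. The function names `build_inverted_index` and `lookup` are hypothetical, and the tokenizer here only lowercases and splits on non-letters; it is not how Lucene or Solr implement their index.

```python
import re
from collections import defaultdict

# The four example documents from the slide, keyed by document number.
DOCS = {
    1: "Pleasure in the job puts perfection in the work.",
    2: "Education is the best provision for the journey to old age.",
    3: "If some animals are good at hunting and others are suitable for "
       "hunting, then the gods must clearly smile on hunting.",
    4: "It is the mark of an educated mind to be able to entertain a "
       "thought without absorbing it.",
}


def build_inverted_index(docs):
    """Map each lowercased term to {doc_id: occurrences in that doc}."""
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in re.findall(r"[a-z]+", text.lower()):
            index[term][doc_id] = index[term].get(doc_id, 0) + 1
    return index


def lookup(index, term):
    """Return (total frequency, sorted list of matching doc ids)."""
    postings = index.get(term.lower(), {})
    return sum(postings.values()), sorted(postings)


index = build_inverted_index(DOCS)
print(lookup(index, "hunting"))
print(lookup(index, "education"))
```

Without stemming, "education" only matches document [2] in this sketch. The slide lists [2] and [4], presumably because stemming reduces "Education" and "educated" to the same indexed term.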
<fieldType name="text_general" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldType>
Defining Field Types
Index-side Analysis
Input         Hope is a waking dream.
Punctuation   Hope is a waking dream
Stop Words    Hope waking dream
Lowercase     hope waking dream
Stemming      hope wake dream

Query-side Analysis
Input         Hope is a waking dream.
Punctuation   Hope is a waking dream
Stop Words    Hope waking dream
Lowercase     hope waking dream
Stemming      hope wake dream
Synonyms      hope   wake  dream
              desire awake wish
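The index-side steps can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Solr's analyzers: the stop-word list is a tiny made-up sample, and `toy_stem` is a hypothetical stand-in that only handles a trailing "ing", whereas the slides use the Porter stemmer.

```python
import re

# Tiny illustrative stop-word list (an assumption, not Solr's default list).
STOP_WORDS = {"a", "an", "is", "the", "of", "to"}
VOWELS = set("aeiou")


def tokenize(text):
    """Split into alphabetic tokens, which also drops punctuation."""
    return re.findall(r"[A-Za-z]+", text)


def remove_stop_words(tokens):
    """Drop stop words, ignoring case."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def lowercase(tokens):
    return [t.lower() for t in tokens]


def toy_stem(token):
    """Strip a trailing 'ing'; restore 'e' after consonant-vowel-consonant.

    A stand-in for a real stemmer; it covers only this one suffix.
    """
    if token.endswith("ing") and len(token) > 5:
        stem = token[:-3]
        c1, v, c2 = stem[-3:]
        if c1 not in VOWELS and v in VOWELS and c2 not in VOWELS | set("wxy"):
            return stem + "e"
        return stem
    return token


def analyze(text):
    """Run the steps in the order shown on the slide."""
    tokens = tokenize(text)
    tokens = remove_stop_words(tokens)
    tokens = lowercase(tokens)
    return [toy_stem(t) for t in tokens]


print(analyze("Hope is a waking dream."))  # ['hope', 'wake', 'dream']
```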
Faceting
facet_fields: {
  tags: [hunting: 1, education: 2, work: 2],
  locations: [Stagira: 5, Chalcis: 3]
}
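Conceptually, a field facet is a count of how many matching documents carry each value of a field. The sketch below shows that idea on three hypothetical documents (the records and the `facet_counts` helper are invented for illustration); Solr computes facets from its index rather than by scanning documents like this.

```python
from collections import Counter

# Hypothetical result set; the field names loosely follow the slides.
DOCS = [
    {"id": "1", "tags": ["education", "work"], "location": "Stagira"},
    {"id": "2", "tags": ["education", "hunting"], "location": "Stagira"},
    {"id": "3", "tags": ["work"], "location": "Chalcis"},
]


def facet_counts(docs, field):
    """Count documents per value of `field` (single- or multi-valued)."""
    counts = Counter()
    for doc in docs:
        value = doc.get(field, [])
        values = value if isinstance(value, list) else [value]
        counts.update(values)
    return dict(counts)


print(facet_counts(DOCS, "tags"))
print(facet_counts(DOCS, "location"))
```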
• Pay Google more $$
• MySQL ‘like’ shards
• Master/Slave replication
• SolrCloud
Approaches to Distribution
“Distributed search is hard.”
• Full-text search
• Tokenization
• Stemming
• Date ranges
• Aggregation
• High Availability
• Distributed Nature
Solr + Cassandra: Datastax Enterprise
Examining DBpedia.org Datasets
Person
<http://xmlns.com/foaf/0.1/name> "Aristotle"@en .
<http://www.w3.org/1999/02/22-rdf-syntax-ns#type> Person .
<http://purl.org/dc/elements/1.1/description> "Greek philosopher"@en .
<http://dbpedia.org/ontology/birthPlace> Stagira .
<http://dbpedia.org/ontology/deathPlace> Chalcis .
Place
<http://dbpedia.org/resource/Stagira> <http://www.opengis.net/gml/_Feature> .
<http://dbpedia.org/resource/Stagira#lat> "40.5916667" .
<http://dbpedia.org/resource/Stagira#long> "23.7947222" .
<http://dbpedia.org/resource/Stagira#point> "40.591667 23.7947222"@en .
“Love is a single soul inhabiting two bodies.”
curl 'http://localhost:8983/solr/solr.person/select?q=Aristotle'
Querying for data
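The same kind of query URL can be assembled programmatically, which takes care of URL encoding. The sketch below assumes the host, port, and core name shown on the slides; `select_url` is a hypothetical helper, and it only builds the URL without sending a request.

```python
from urllib.parse import urlencode

# Host and port as shown on the slides; adjust for a real cluster.
SOLR_BASE = "http://localhost:8983/solr"


def select_url(core, **params):
    """Return a /select URL for `core` with URL-encoded query parameters."""
    return f"{SOLR_BASE}/{core}/select?{urlencode(params)}"


# q and fl come from the earlier slide; wt=json asks Solr for JSON output.
url = select_url("solr.person", q="Aristotle", fl="birthPlace", wt="json")
print(url)
```

Multi-word queries such as `q="Greek philosophy"` are encoded as `Greek+philosophy`, so no manual quoting is needed.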
curl -G 'http://localhost:8983/solr/solr.location/select' \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'spatial=true' \
  --data-urlencode 'fq={!geofilt pt=40.53027,23.7525 sfield=point d=100}'
Filtering by location
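Roughly, the `geofilt` filter keeps documents whose `sfield` point lies within `d` kilometers of `pt`. The sketch below approximates that test with the haversine formula; the helper names are hypothetical, the Stagira coordinates are taken from the DBpedia slide, and the Athens coordinates are an outside value added only for contrast. Solr's own distance computation may differ in detail.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within(center, point, d_km):
    """True if `point` is at most `d_km` from `center`."""
    return haversine_km(*center, *point) <= d_km


QUERY_POINT = (40.53027, 23.7525)   # pt= value from the curl command
STAGIRA = (40.5916667, 23.7947222)  # lat/long from the DBpedia slide
ATHENS = (37.9838, 23.7275)         # not from the slides; for contrast

print(within(QUERY_POINT, STAGIRA, 100))  # True
print(within(QUERY_POINT, ATHENS, 100))   # False
```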
“There is no great genius without a mixture of madness.”
Unified Schema
<fields>
  <field name="id" type="string" />
  <field name="name" type="text" />
  <dynamicField name="*Date" type="date" />
  <dynamicField name="*Place" type="text" />
  <dynamicField name="*Point" type="location" />
  <dynamicField name="*_tag" type="text" />
</fields>
Schema.xml
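Dynamic fields let one schema cover both people and places: any field whose name matches a pattern such as `*Place` picks up that pattern's type. The sketch below mimics this with glob matching over the patterns from the schema above. `field_type` is a hypothetical helper, and Solr's actual resolution rules (for example, how competing patterns are ordered) are more involved.

```python
from fnmatch import fnmatchcase

# Explicit and dynamic fields copied from the schema on the slide.
EXPLICIT_FIELDS = {"id": "string", "name": "text"}
DYNAMIC_FIELDS = [
    ("*Date", "date"),
    ("*Place", "text"),
    ("*Point", "location"),
    ("*_tag", "text"),
]


def field_type(name):
    """Return the type for `name`: explicit fields first, then patterns."""
    if name in EXPLICIT_FIELDS:
        return EXPLICIT_FIELDS[name]
    for pattern, ftype in DYNAMIC_FIELDS:
        if fnmatchcase(name, pattern):
            return ftype
    return None  # no explicit or dynamic field applies


for name in ("name", "birthPlace", "deathDate", "birthPoint", "height"):
    print(name, "->", field_type(name))
```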
curl http://localhost:8983/solr/resource/solr.location/schema.xml \
  --data-binary @solr/location_schema.xml \
  -H 'Content-type:text/xml; charset=utf-8'
curl http://localhost:8983/solr/resource/solr.location/solrconfig.xml \
  --data-binary @solr/location_solrconfig.xml \
  -H 'Content-type:text/xml; charset=utf-8'
curl 'http://localhost:8983/solr/admin/cores?action=CREATE&name=solr.location'
Upload to DSE, Create Core
Location Schema
cqlsh:solr> DESC COLUMNFAMILY location;
CREATE TABLE location (
id text PRIMARY KEY,
"_docBoost" text,
"_dynFld" text,
location text,
name text,
solr_query text,
tags text
) WITH COMPACT STORAGE AND
bloom_filter_fp_chance=0.010000 AND
caching='KEYS_ONLY' AND
comment='' AND
dclocal_read_repair_chance=0.000000 AND
gc_grace_seconds=864000 AND
read_repair_chance=0.100000 AND
replicate_on_write='true' AND
populate_io_cache_on_flush='false' AND
compaction={'class': 'SizeTieredCompactionStrategy'} AND
compression={'sstable_compression': 'SnappyCompressor'};
CREATE INDEX solr_location__docBoost_index ON location ("_docBoost");
...
CREATE INDEX solr_location_solr_query_index ON location (solr_query);
Cassandra Schema
What Changes
Solr
• No multiValued fields
• No JOIN*
Cassandra
• No composite columns
• No counter columns
• Fault tolerant, available search
Bringing it All Together
“Thank you.”
THANK YOU
[email protected] @patriciagorla @o19s
All information, including slides, is available at http://github.com/pgorla/million-books