sematext-group-inc
From Gopher, WAIS, and Harvest to Lucene, Solr, SolrCloud, and Elasticsearch.
Today
The Early Days
Even Earlier Days
Foci
Timeline: 1974, 1995, now()
SEARCH
Otis Who?
SEARCH
Then & Now
1990s: WebGlimpse, Swish, Harvest, Ht://Dig, freeWAIS
2014: elasticsearch.
Still New?
elasticsearch.
… 2000
… 2004
… 2010
Dominance
[Open Source] Search Evolution
Big Cake
Big Data
Beyond Text
Memory Footprint
Distributed Model
Language Support
Indexing Speed, NRT
Relevance Algorithms
Language Support: Stemming
Language Support: Lemmatization
Language Support: Morphology
Language Support
Lucene 2004: ~20 languages
Lucene 2014: ~40 languages
most are stemmers
Relevance Models: VSM
TF-IDF: for term $i$ in document $j$
$w_{i,j} = tf_{i,j} \times \log(N / df_i)$
$tf_{i,j}$ = number of occurrences of $i$ in $j$
$df_i$ = number of documents containing $i$
$N$ = total number of documents
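The weighting above can be sketched in a few lines of Python. This is a minimal illustration of the formula as written on the slide, not Lucene's actual scoring code. The function name `tf_idf`, the whitespace tokenizer, and the sample documents are assumptions made for the example.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, with w = tf * log(N / df)."""
    n_docs = len(docs)  # N: total number of documents
    # Assumption: lowercase + whitespace split stands in for a real analyzer.
    tokenized = [doc.lower().split() for doc in docs]

    # df[i]: number of documents containing term i (each document counts once).
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)  # tf[i]: occurrences of term i in this document
        weights.append({t: tf[t] * math.log(n_docs / df[t]) for t in tf})
    return weights


# Hypothetical sample corpus, for illustration only.
docs = ["foo bar foo", "bar baz", "bar"]
weights = tf_idf(docs)
```

A term that appears in every document ("bar" here) gets weight 0, because log(N / df) = log(1) = 0; rarer terms get larger weights.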
Relevance Models: Pluggable
Lucene until 2011: 1 relevance model
Lucene 2014: 6 relevance models
got more?
Distributed Architecture
1 Master - N Slaves
good for scaling queries
not good for scaling data
Sharded index with replication
good for scaling queries
good for scaling data
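A toy sketch of the sharded model may help show why it scales both ways: each document lives on exactly one shard (so data is split), and each shard has several copies (so queries can be spread across them). The class `ShardedIndex`, the CRC32-modulo routing, and the round-robin copy selection are assumptions chosen for illustration. They are not how SolrCloud or Elasticsearch implement routing.

```python
import zlib


class ShardedIndex:
    """In-memory illustration of a sharded index with replication."""

    def __init__(self, num_shards, num_replicas):
        self.num_shards = num_shards
        # copies[s] holds the primary plus its replicas for shard s;
        # every copy is a dict mapping doc_id -> document.
        self.copies = [
            [{} for _ in range(num_replicas + 1)] for _ in range(num_shards)
        ]
        self._round = 0

    def shard_for(self, doc_id):
        # Assumption: a stable hash of the id, modulo the shard count.
        return zlib.crc32(doc_id.encode("utf-8")) % self.num_shards

    def index(self, doc_id, doc):
        # Scaling data: a document is written to one shard only,
        # then copied to every replica of that shard.
        for copy in self.copies[self.shard_for(doc_id)]:
            copy[doc_id] = doc

    def search(self, predicate):
        # Scaling queries: each request reads one copy per shard,
        # rotating between copies so the load is shared.
        self._round += 1
        hits = []
        for shard_copies in self.copies:
            copy = shard_copies[self._round % len(shard_copies)]
            hits.extend(d for d, doc in copy.items() if predicate(doc))
        return sorted(hits)


cluster = ShardedIndex(num_shards=3, num_replicas=1)
for n in range(10):
    cluster.index(f"id{n}", {"n": n})
even_ids = cluster.search(lambda doc: doc["n"] % 2 == 0)
```

In the master-slave model every node holds the full index, which corresponds to `num_shards=1` here: more copies help with queries, but no node ever holds less data.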
Indexing Speed & NRT Search
Memory Footprint
Beyond Text
Geospatial Search
Classifier
Recommendation Engine
Key Value Store
NoSQL DB
Analytical DB
Geospatial Search
Classifier
Recommender
Content Similarity
Collaborative Filtering
Key Value Store
id123 ⇒ manu:Apple desc:foo bar price:$111
id234 ⇒ manu:Sony desc:baz bam price:$222
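The two records above can be used to sketch how a search index doubles as a key value store: lookup by id is a plain dictionary access, and an inverted index over the field values makes the same records searchable. The class `SearchableStore` and its methods are hypothetical names for this example, and the whitespace tokenizer is an assumption.

```python
from collections import defaultdict


class SearchableStore:
    """Key value store whose values are also indexed for term lookup."""

    def __init__(self):
        self.docs = {}  # doc_id -> {field: value}
        self.postings = defaultdict(set)  # (field, token) -> set of doc_ids

    def put(self, doc_id, fields):
        # Simplification: overwriting an id does not remove its old postings.
        self.docs[doc_id] = fields
        for field, value in fields.items():
            for token in value.lower().split():
                self.postings[(field, token)].add(doc_id)

    def get(self, doc_id):
        # Key value access: fetch the stored fields by id.
        return self.docs.get(doc_id)

    def search(self, field, token):
        # Search access: ids of documents whose field contains the token.
        return sorted(self.postings.get((field, token.lower()), set()))


store = SearchableStore()
store.put("id123", {"manu": "Apple", "desc": "foo bar", "price": "$111"})
store.put("id234", {"manu": "Sony", "desc": "baz bam", "price": "$222"})
```

With this in place, `store.get("id123")` returns the stored fields, while `store.search("desc", "bar")` finds the same record by its content.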
NoSQL DB
Distributed
Replicated
Horizontally Scalable
Fast Retrieval
Searchable?
Slicing & Dicing
Analytical Queries
Gobble Gobble
If software is eating the world, then [open source] search is gobbling it.
And has been for years.
FIN. Questions