Couchbase Server 2.0 supports full-text search integration. In this webinar we examine how to integrate a Couchbase Server 2.0 cluster with an Elasticsearch cluster to provide enhanced querying capabilities and build large-scale applications.
Using Elasticsearch and Couchbase Together to Build Large Scale Apps
Uri Boness, Founder, Elasticsearch
Dipti Borkar, Director, Products, Couchbase
Introduction to Elasticsearch
What is Elasticsearch?
• Open source, Apache 2 license
• Multi-tenant, real-time and distributed search & analytics engine
• Backed by Elasticsearch (the company)
• Proven technology in production
  - Over 2 million downloads
What can Elasticsearch do?
• Unstructured search: find all companies in the "search" market
• Structured search: find all companies founded since 2000
• Analytics: find the average annual revenue of all companies
• Combine all: find the average annual revenue of all companies founded since 2000 within the "search" market
• (Near) real-time!
Distributed & multi-tenant
• A node is a single Elasticsearch instance
• Multiple nodes can form a cluster
• A cluster can manage multiple indices
• A cluster is agile & self-managing: it continuously ensures that the distributed characteristics of all indices are maintained and that all nodes in the cluster are efficiently & effectively utilized
The Index
What's in an index?
• An identified collection of documents
• Built & designed for small & large scales
  - Data volumes: data can be split and distributed between shards
  - Loads & HA: each shard can have zero or more replicas
Starting a node
node_1 is started on its own.

Creating our first index (on node_1)
curl -XPUT 'localhost:9200/companies' -d '{
  "settings" : {
    "index" : { "number_of_shards" : 2, "number_of_replicas" : 1 }
  }
}'

The two shards are allocated
Primary shards 0 and 1 are both allocated on node_1. Their replicas stay unassigned for now, because a replica is never allocated to the same node as its primary.

Starting a second node
node_2 joins the cluster (node_1, node_2).

Shards are relocating
One of the two shards relocates from node_1 to node_2, so each node holds one primary.

Replicas are allocated
Each node now also holds a replica of the other node's shard: node_1 holds shards 0 and 1, and so does node_2.
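The allocation sequence above follows one rule worth making explicit: a replica never shares a node with its primary, so on a single node the replicas wait. The sketch below is a hypothetical toy allocator (not Elasticsearch's allocator, which also weighs balance, disk usage and other deciders) that computes the resulting layout directly instead of replaying the relocation step.

```python
from collections import defaultdict

def allocate(nodes, number_of_shards, number_of_replicas):
    """Toy allocator returning ({node: [(shard, role), ...]}, unassigned_copies).

    Primaries are spread round-robin over the nodes. Each replica goes to the
    least-loaded node that holds no copy of the same shard yet; if no such node
    exists, the replica stays unassigned.
    """
    layout = defaultdict(list)
    unassigned = []
    for shard in range(number_of_shards):
        layout[nodes[shard % len(nodes)]].append((shard, "primary"))
        for _ in range(number_of_replicas):
            # Nodes that do not already hold a copy of this shard.
            candidates = [n for n in nodes
                          if all(s != shard for s, _role in layout[n])]
            if candidates:
                target = min(candidates, key=lambda n: len(layout[n]))
                layout[target].append((shard, "replica"))
            else:
                unassigned.append((shard, "replica"))
    return dict(layout), unassigned
```

With `allocate(["node_1"], 2, 1)` both replicas come back unassigned; with `["node_1", "node_2"]` each node ends up with one primary and one replica, matching the final state described above.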
Indexing Data
The data: documents are typically JSON formatted
curl -XPUT 'localhost:9200/companies/company/1' -d '{
  "id" : "elasticsearch",
  "name" : "elasticsearch",
  "website" : "http://www.elasticsearch.com",
  "category" : "software",
  "overview" : "distributed search & analytics engine",
  "founded_year" : 2012,
  "location" : {
    "city" : "Amsterdam",
    "country_code" : "NL",
    "geo" : { "lat" : 52.370176, "lon" : 4.895008 }
  }
}'
Indexing flow (a client and a three-node cluster: node_1, node_2, node_3)
1. Sending the request: the client sends the request to one of the nodes.
2. Resolving the target shard: that node resolves the document's shard and routes the index operation to the node holding that shard's primary.
3. Replicating: the primary replicates the operation to its replicas.
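Step 2 works because any node can compute the target shard on its own: by default the document id is the routing value and `shard = hash(routing) % number_of_shards`. The sketch below assumes CRC32 as a stand-in hash (Elasticsearch uses its own hash function) and a hypothetical, simplified cluster-state view given as two dicts.

```python
import zlib

def shard_for(routing: str, number_of_shards: int) -> int:
    # shard = hash(routing) % number_of_shards; CRC32 is only a stand-in here.
    return zlib.crc32(routing.encode("utf-8")) % number_of_shards

def route_index_request(doc_id, number_of_shards, primaries, replicas):
    """Return (shard, primary_node, replica_nodes) for indexing doc_id.

    primaries maps shard -> node and replicas maps shard -> [nodes]; both are a
    hypothetical, simplified view of the cluster state every node knows.
    """
    shard = shard_for(doc_id, number_of_shards)  # routing defaults to the id
    return shard, primaries[shard], replicas.get(shard, [])
```

Because the result depends only on the id and the shard count, the shard count of an index cannot simply change later without re-routing every document.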
Searching
Unstructured search: using an extensive & powerful Query DSL
curl -XGET 'localhost:9200/companies/_search' -d '{
  "query" : {
    "match" : { "overview" : "search" }
  }
}'
Searches for the term "search" in the "overview" field.
Structured search: narrows the "searchable" document space
curl -XGET 'localhost:9200/companies/company/_search' -d '{
  "query" : {
    "filtered" : {
      "query" : { "match" : { "overview" : "search" } },
      "filter" : { "range" : { "founded_year" : { "gte" : 2000 } } }
    }
  }
}'
Only searches companies that were founded since 2000.
Returned hits
{
  ...
    "hits": [
      {
        "_index": "companies",
        "_type": "company",
        "_id": "1",
        "_score": 0.13424811,
        "_source": {
          "id": "elasticsearch",
          "name": "elasticsearch",
          "website": "http://www.elasticsearch.com",
          "category": "software",
          "founded_year": 2012,
          "overview": "distributed search & analytics engine",
          "location": {
            "city": "Amsterdam",
            "country_code": "NL",
            "geo": { "lat": 52.370176, "lon": 4.895008 }
          }
        }
      }
    ]
  }
}
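The filtered query runs in two stages: the range filter decides which documents are eligible at all, and the match query scores only those. The sketch below mimics that order on an in-memory list. The term-count "score" is a deliberately crude stand-in for Lucene relevance scoring, and every sample document except the elasticsearch one is hypothetical.

```python
def filtered_search(docs, field, term, range_field, gte):
    """Filter first (structured), then match and score (unstructured)."""
    hits = []
    for doc in docs:
        if doc.get(range_field, float("-inf")) < gte:
            continue  # the filter narrows the searchable document space
        tokens = doc.get(field, "").lower().split()
        score = tokens.count(term.lower())  # crude stand-in for relevance
        if score > 0:
            hits.append((score, doc["id"]))
    # Highest score first, like the default sort by _score.
    return [doc_id for _score, doc_id in sorted(hits, key=lambda h: -h[0])]

docs = [
    {"id": "elasticsearch", "overview": "distributed search & analytics engine",
     "founded_year": 2012},
    {"id": "oldco", "overview": "enterprise search appliance", "founded_year": 1998},
    {"id": "dbco", "overview": "document database", "founded_year": 2010},
]
```

Here `oldco` mentions "search" but is excluded by the filter, and `dbco` passes the filter but does not match, so only `elasticsearch` is returned.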
Query DSL
• Queries (unstructured)
  - term queries
  - boolean queries
  - phrase (proximity) queries
  - fuzzy/prefix/regexp/wildcards
  - more...
• Filters (structured)
  - term (exact match)
  - range
  - boolean
  - geo_* (e.g. geo_distance)
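Queries and filters from these two lists compose freely. As a sketch, the helper below builds a request body in the same filtered-query syntax as the search examples, wrapping a match query in a bool query and combining range and geo_distance filters in a bool filter. It assumes `location.geo` is mapped as a `geo_point` (the mapping is not shown in this deck), and the 50km radius is an arbitrary example value.

```python
import json

def build_query(text, min_year, lat, lon, distance):
    """Full-text match on "overview", restricted by founding year and distance."""
    return {
        "query": {
            "filtered": {
                "query": {"bool": {"must": [{"match": {"overview": text}}]}},
                "filter": {
                    "bool": {
                        "must": [
                            {"range": {"founded_year": {"gte": min_year}}},
                            {"geo_distance": {
                                "distance": distance,
                                # assumes location.geo is mapped as geo_point
                                "location.geo": {"lat": lat, "lon": lon},
                            }},
                        ]
                    }
                },
            }
        }
    }

if __name__ == "__main__":
    print(json.dumps(build_query("search", 2000, 52.370176, 4.895008, "50km"), indent=2))
```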
Analytics (a.k.a. facets)
• Slice & dice your data
• Compute aggregations over field values
• Across any index field(s)
• All in (near) real-time
Used as a navigation aid or in analytics dashboards.
Elasticsearch is often used purely for analytics (without incorporating free-text search).
Example: find the average revenue of all companies since 2000
curl -XGET 'localhost:9200/companies/revenues/_search' -d '{
  "query" : { "match_all" : {} },
  "facets" : {
    "revenue_stats" : {
      "date_histogram" : { "key_field" : "year", "value_field" : "value", "interval" : "year" }
    }
  }
}'
Returns a yearly breakdown of statistics over company revenues.
Response
"facets": {
  "revenue_stats": {
    "_type": "date_histogram",
    "entries": [
      { "time": 956448895664, "mean": 23.0 },
      { "time": 987984922557, "mean": 267.1034482758621 },
      { "time": 1019520942098, "mean": 195.51724137931035 }
      ...
    ]
  }
}
The first entry corresponds to the year 2000; "mean" is the average revenue for that year.
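A date_histogram facet puts each document into a time bucket and computes statistics per bucket. The sketch below reproduces the yearly mean locally from (timestamp in milliseconds, value) pairs, assuming UTC calendar years as buckets; the real facet computes this across shards inside the cluster.

```python
from collections import defaultdict
from datetime import datetime, timezone

def yearly_mean(entries):
    """entries: iterable of (timestamp_ms, value); returns [(year, mean), ...] by year."""
    buckets = defaultdict(list)
    for ts_ms, value in entries:
        year = datetime.fromtimestamp(ts_ms / 1000, tz=timezone.utc).year
        buckets[year].append(value)
    return [(year, sum(vals) / len(vals)) for year, vals in sorted(buckets.items())]
```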
Types of analytics
• terms: unique value counts
• range: statistics of a specific field for a set of range groups of another field
• statistical: stats over a specific field
• terms_stats: stats over a specific field for every unique value of another field
• date_histogram / histogram: a breakdown of statistics of a specific field over a …
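terms_stats is the grouped version of statistical: one set of statistics per unique key. The sketch below shows that grouping on in-memory documents. The output field names are chosen to resemble the facet's count/min/max/total/mean values, not guaranteed to match its exact response, and the sample data is hypothetical.

```python
from collections import defaultdict

def terms_stats(docs, key_field, value_field):
    """Stats over value_field for every unique value of key_field."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc[key_field]].append(doc[value_field])
    return {
        key: {"count": len(vals), "min": min(vals), "max": max(vals),
              "total": sum(vals), "mean": sum(vals) / len(vals)}
        for key, vals in groups.items()
    }
```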
There's much more
• Fine control of how documents are treated: indexed, stored, text analysis, relations
• Additional features
  - highlighting
  - suggest API (type-ahead, auto-completion)
  - percolator (reverse search)
  - support for document relations (parent/child)
  - extensive geo-location search & analytics
  - more...
Introduction to Couchbase
Couchbase Server: NoSQL Document Database
Couchbase Open Source Project
• Leading NoSQL database project focused on distributed database technology and the surrounding ecosystem
• Supports both key-value and document-oriented use cases
• All components are available under the Apache 2.0 License
• Obtained as packaged software in both enterprise and community editions
Couchbase Server
• Easy scalability: grow the cluster without application changes and without downtime, with a single click
• Consistent high performance: consistent sub-millisecond read and write response times with consistent high throughput
• Always on 24x365: no downtime for software upgrades, hardware maintenance, etc.
• Flexible data model: JSON document model with no fixed schema
Features in Couchbase Server 2.0
• JSON support
• Indexing and querying
• Cross data center replication
• Incremental MapReduce
Additional Features
• Built-in clustering: all nodes equal
• Data replication with auto-failover
• Zero-downtime maintenance
• Built-in managed cache
• Append-only storage layer
• Online compaction
• Monitoring and admin API & UI
• SDKs for a variety of languages
Couchbase Server 2.0 Architecture
[Figure: Couchbase Server 2.0 architecture. Cluster Manager (Erlang/OTP): heartbeat, process monitor, global singleton supervisor and configuration manager (on each node); rebalance orchestrator and node health monitor (one per cluster); vBucket state and replication manager; HTTP REST management API/Web UI. Data Manager: Moxi, Memcached, Couchbase EP Engine with its storage interface, new persistence layer, Query Engine. Ports: 8091 HTTP (REST management API/Web UI), 8092 Query API, 11210 (memcapable 2.0), 11211 (Moxi, memcapable 1.0), 4369 Erlang port mapper, 21100-21199 distributed Erlang.]
Cross Datacenter Replication (XDCR)
[Figure: Cross data center replication, data flow. An app server writes Doc 1 to the managed cache of a Couchbase Server node. From the cache, the document is placed on the disk queue (to disk), on the replication queue (to another node in the cluster) and on the XDCR queue (to another cluster).]
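The data flow above fans a single write out to three independent queues. The toy model below is hypothetical and single-threaded (Couchbase drains these queues with separate asynchronous processes); it assumes, as in Couchbase's default behavior, that the write is acknowledged once the document is in the managed cache, before anything reaches disk, a replica or the remote cluster.

```python
from collections import deque

class ToyNode:
    """Hypothetical model of one node's write path in the XDCR data-flow figure."""

    def __init__(self):
        self.cache = {}                   # managed cache
        self.disk = {}                    # persisted documents
        self.disk_queue = deque()
        self.replication_queue = deque()  # to other nodes in the cluster
        self.xdcr_queue = deque()         # to other clusters

    def write(self, key, doc):
        self.cache[key] = doc
        for queue in (self.disk_queue, self.replication_queue, self.xdcr_queue):
            queue.append(key)
        return "ok"  # acknowledged once the document is in the cache

    def drain(self, replica_node, remote_cluster):
        """Flush all three queues; in reality these run independently."""
        for queue, target in ((self.disk_queue, self.disk),
                              (self.replication_queue, replica_node),
                              (self.xdcr_queue, remote_cluster)):
            while queue:
                key = queue.popleft()
                target[key] = self.cache[key]
```

Between `write` and `drain`, the document exists only in memory, which is why the persistence and replication queues matter for durability.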
Couchbase Plug-in for Elasticsearch
How does it work?
[Figure: Elasticsearch integration via XDCR. A Couchbase cluster in the West Coast data center (Servers 1-3, each with a RAM cache and disk holding documents) replicates documents over unidirectional cross data center replication to an Elasticsearch cluster (ES Servers 1-3), each running the Couchbase Transport Plugin.]
Install the Couchbase Plug-in
• Prerequisite: existing Couchbase and Elasticsearch clusters
• Install the Elasticsearch Couchbase Transport Plug-in:
  bin/plugin -install couchbaselabs/elasticsearch-transport-couchbase/1.0.0-dp
• Configure the plug-in: set a password; install the Couchbase index template
• Restart Elasticsearch
• Create an Elasticsearch index for your documents
Configure Couchbase XDCR (step 1)
Configure Couchbase XDCR (step 2)
Documents are now indexed in Elasticsearch
Document count increasing
Reference Architecture
Recommended Usage Pattern
1. Elasticsearch query
2. Elasticsearch result
3. Couchbase multi-GET
4. Couchbase result
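The four steps above split the work: Elasticsearch answers the query with matching document ids, and the full documents are then fetched from Couchbase in one multi-GET. The sketch below shows that flow against two hypothetical client interfaces (`search_ids`, `get_multi`), which are not the real Elasticsearch or Couchbase SDK APIs.

```python
from typing import Dict, Iterable, List, Protocol

class SearchClient(Protocol):
    def search_ids(self, query: dict) -> List[str]: ...

class KeyValueClient(Protocol):
    def get_multi(self, keys: Iterable[str]) -> Dict[str, dict]: ...

def search_then_fetch(es: SearchClient, cb: KeyValueClient, query: dict) -> List[dict]:
    ids = es.search_ids(query)   # steps 1-2: ranked ids from Elasticsearch
    docs = cb.get_multi(ids)     # steps 3-4: one multi-GET against Couchbase
    # Keep Elasticsearch's ranking; skip ids Couchbase no longer has.
    return [docs[i] for i in ids if i in docs]
```

Skipping missing ids matters because replication to Elasticsearch is asynchronous, so the index can briefly reference documents that were already deleted.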
Common Couchbase Use Cases
Social Gaming
• Couchbase stores player and game data
• Example customers include: Zynga, Tapjoy, Ubisoft, Tencent
Mobile Apps
• Couchbase stores user info and app content
• Example customers include: Kobo, Playtika
Ad Targeting
• Couchbase stores user information for fast access
• Example customers include: AOL, Mediamind, Convertro
Session Store
• Couchbase Server as a key-value store
• Example customers include: Concur, Sabre
User Profile Store
• Couchbase Server as a key-value store
• Example customers include: Tunewiki
High Availability Cache
• Couchbase Server used as a cache tier replacement
• Example customers include: Orbitz
Content & Metadata Store
• Couchbase document store with Elasticsearch
• Example customers include: McGraw Hill
3rd Party Data Aggregation
• Couchbase stores social media and data feeds
• Example customers include: Sambacloud
Real-world example: Couchbase + Elasticsearch
Use Case: Content and Metadata Store
Types of Data
• Content metadata
• Content: articles, text
• Landing pages for website
• Digital content: eBooks, magazines, research material
Application Requirements
• Flexibility to store any kind of content
• Fast access to content metadata (most accessed objects) and content
• Full-text search across the data set
• Horizontal scaling as more content is added to the system
Why NoSQL and Couchbase
• Fast access to metadata and content via an object-managed cache
• JSON provides schema flexibility to store all types of content and metadata
• Indexing and querying provide real-time analytics capabilities across the dataset
• Integration with Elasticsearch for full-text search
• Ease of scalability ensures that the data cluster can be grown seamlessly as the amount of data grows
McGraw Hill Education Labs Learning Portal
Use Case: Content and metadata store
Building a self-adapting, interactive learning portal with Couchbase and Elasticsearch
As learning moves online in great numbers, there is a growing need to build interactive learning environments that:
• Scale! Scale to millions of learners
• Serve MHE as well as third-party content, including open content
• Support learning apps
• Self-adapt via usage data
The Problem
The backend is an interactive content delivery cloud that must:
• Allow for elastic scaling under spike periods
• Catalog & deliver content from many sources
• Provide consistent low latency for metadata and stats access
• Support full-text search for content discovery
• Offer tunable content ranking & recommendation functions
Experimented with a combination of:
• XML databases
• SQL/MR engines
• In-memory data grids
• Enterprise search servers
The Challenge
• Document modeling
• Metadata & content storage
• View querying to support content browsing
• Elasticsearch integration
  - Content updated in near real-time
  - Search content summaries
  - Relevancy boosted based on user preferences
• Real-time content updates
• Event logging for offline analysis
Techniques Used
Couchbase 2.0 + Elasticsearch
• Store full-text articles as well as document metadata for image, video and text content in Couchbase
• Combine user preference statistics with custom relevancy scoring to provide personalized search results
• Log user behavior to calculate user preference statistics (e.g. video > text)
• Elasticsearch continuously accepts updates from Couchbase with new content & stats
Data Model
Content Metadata Bucket
• Stores content metadata for media objects and content for articles
• Includes tags, contributors, type information
• Includes a pointer to the media
User Profiles Bucket
• Stores user view details per type
• Updated every time a user views a doc, with a running count
• Used for customizing ES search results per user preference
Content Stats Bucket
• Stores content view details
• Updated every time a document is viewed
• Used for boosting ES search results based on popularity
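The two stats buckets are updated on every view: the user profile keeps a running count per content type, and the content stats keep a per-document view count. The sketch below models both buckets as dicts; the key names and document shapes are hypothetical, since the slides do not show them. A real implementation would need atomic counters or CAS-guarded updates to be safe under concurrent views, which this sketch ignores.

```python
def record_view(user_profiles, content_stats, user, doc_id, doc_type):
    """Apply one view to the User Profiles and Content Stats buckets (as dicts)."""
    profile = user_profiles.setdefault(user, {"views_by_type": {}})
    by_type = profile["views_by_type"]
    by_type[doc_type] = by_type.get(doc_type, 0) + 1  # feeds the preference boost

    stats = content_stats.setdefault(doc_id, {"views": 0})
    stats["views"] += 1  # feeds the popularity boost
```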
Calculating statistics via Couchbase
Couchbase Views
Top contributors & tags, driven by incremental MapReduce views!
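A "top tags" view pairs a map function that emits one row per tag with a counting reduce, queried grouped by key. Couchbase view functions are written in JavaScript; to keep to one language, the sketch below simulates those semantics in Python. It recomputes everything from scratch, whereas the real view index is incremental and only reprocesses changed documents. The `tags` and `contributors` field names are assumptions based on the data model.

```python
from collections import Counter

def map_tags(doc):
    # Like a view map function doing emit(tag, null) for each tag.
    for tag in doc.get("tags", []):
        yield tag, None

def map_contributors(doc):
    for name in doc.get("contributors", []):
        yield name, None

def top_keys(docs, map_fn, n):
    """Count emitted keys (like the _count reduce queried with group=true)
    and return the n most frequent as (key, count) pairs."""
    counts = Counter(key for doc in docs for key, _value in map_fn(doc))
    return counts.most_common(n)
```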
Tuning content ranking via Elasticsearch
Elasticsearch-driven, based on the settings below:
• Content popularity boost
• User preference boost
Analytics and Event Logging
• Store the full event log for offline analysis
• Stored on a separate analytics cluster
  - Limit impact on OLTP
  - Tuned differently
• Keep an upper bound on data size via TTL (24 hrs)
Example event documents:
{ "_id": "4ae5be2df3122f06ba45b70753001841",
  "_rev": "1-0013b349ffc3afc700000000068000000",
  "$flags": 0,
  "#expiration": 0,
  "type": "access",
  "user": "[email protected]",
  "resource": "379823",
  "timestamp": "2012-09-02T22:46:07Z"
}
{ "_id": "4ae5be2df3122f06ba45b70753001842",
  "_rev": "1-0013b349ffc3afc700000000068000000",
  "$flags": 0,
  "#expiration": 0,
  "type": "create",
  "user": "[email protected]",
  "resource": "948177",
  "timestamp": "2012-09-02T22:48:32Z"
}
What? (type) Who? (user) Which? (resource) When? (timestamp)
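The 24-hour TTL caps the log's size by letting Couchbase expire old events. Memcached-compatible stores, Couchbase included, read an expiry value of up to 30 days as a relative number of seconds and a larger value as an absolute Unix timestamp. The sketch below builds an event shaped like the examples (minus the metadata fields) together with its expiry value; "user@example.com" is a placeholder.

```python
from datetime import datetime, timezone

THIRTY_DAYS = 30 * 24 * 60 * 60  # larger expiry values are read as Unix timestamps
EVENT_TTL = 24 * 60 * 60         # 24 hours, as on the slide

def to_expiry(ttl_seconds, now):
    """Expiry value for a TTL: relative seconds up to 30 days, else absolute time."""
    if ttl_seconds <= THIRTY_DAYS:
        return ttl_seconds
    return int(now.timestamp()) + ttl_seconds

def make_event(event_type, user, resource, now):
    """Event document shaped like the examples above, plus its expiry value."""
    doc = {
        "type": event_type,
        "user": user,
        "resource": resource,
        "timestamp": now.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
    return doc, to_expiry(EVENT_TTL, now)
```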
User Preference Boost
• Use Elasticsearch filter boosting
{ "filter": { "term": { "type": "video" } }, "boost": USER_VIDEO_PREFERENCE * PREFERENCE_SLIDER }
Document Popularity Boost
• Use an Elasticsearch custom script to score documents
"script": "_score * (((doc['popularity'].value + 1) / AVG_POPULARITY ) * POPULARITY_SLIDER)"
Combined Algorithm in a Query
      "filters": [
        { "filter": { "term": { "type": "video" } }, "boost": USER_VIDEO_PREFERENCE * PREFERENCE_SLIDER },
        … image and text filters omitted …
      ],
      "score_mode": "total"
    }
  },
  "script": "_score * (((doc['popularity'].value + 1) / AVG_POPULARITY ) * POPULARITY_SLIDER)"
}
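To see what the combined query does to a single document's score, the sketch below applies both boosts locally. Assumptions: the per-type boosts (USER_<TYPE>_PREFERENCE * PREFERENCE_SLIDER) are precomputed by the application, score_mode "total" sums the boosts of the matching filters, that sum multiplies the relevance score (and a document matching no filter keeps its score). This is an approximation of how Elasticsearch combines them, not its exact implementation. The popularity factor is the script from the slide.

```python
def personalized_score(base_score, doc_type, popularity, type_boosts,
                       avg_popularity, popularity_slider):
    """Approximate combined score for one document (hypothetical inputs)."""
    # score_mode "total": sum the boosts of all matching term filters. A
    # document has one type, so at most one filter matches here.
    total_boost = sum(b for t, b in type_boosts.items() if t == doc_type)
    # Assumption: with no matching filter the relevance score is unchanged.
    preference_score = base_score * total_boost if total_boost else base_score
    # _score * (((doc['popularity'].value + 1) / AVG_POPULARITY) * POPULARITY_SLIDER)
    return preference_score * (((popularity + 1) / avg_popularity) * popularity_slider)
```

The `+ 1` in the popularity factor keeps never-viewed documents from being scored down to zero.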
The Learning Portal
• Designed and built as a collaboration between MHE Labs and Couchbase
• Serves as a proof of concept and testing harness for the Couchbase + Elasticsearch integration
• Available for download and further development as open source code:
  https://github.com/couchbaselabs/learningportal
Q & A
Thank you
dipti@couchbase.com
uri.boness@elasticsearch.com