Harnessing The Power of Search
André Ricardo Barreto de Oliveira ("Arbo")
Software Engineer - Team Lead - Search
Darmstadt, Germany
7 October, 2015

Harnessing The Power of Search - Liferay DEVCON 2015, Darmstadt, Germany

What's Search and why is it so cool?

The dawn of Search

Searching higher

Search and the Digital Experience

Understanding Search

Inside the Search Engine

The Index
Documents
Fields

Not that different from ye olde database?...
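
One way to picture the difference from a database table is an inverted index: instead of scanning rows, the engine looks up a term and gets back the documents that contain it. The sketch below is a minimal illustration under that assumption. The class `InvertedIndexSketch` and its methods are hypothetical names, the tokenization is a crude lowercase split, and none of this is Lucene, Elasticsearch or Liferay code.

```java
import java.util.Collections;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical illustration only; not Lucene, Elasticsearch or Liferay code.
final class InvertedIndexSketch {

    // field name -> term -> ids of the documents whose field contains that term
    private final Map<String, Map<String, Set<Integer>>> postings = new TreeMap<>();

    // Indexes one document: every field value is lowercased and split into terms.
    void index(int id, Map<String, String> fields) {
        for (Map.Entry<String, String> field : fields.entrySet()) {
            String[] terms = field.getValue().toLowerCase(Locale.ROOT).split("[^a-z0-9]+");
            for (String term : terms) {
                if (term.isEmpty()) {
                    continue; // split() can yield a leading empty string
                }
                postings
                    .computeIfAbsent(field.getKey(), k -> new TreeMap<>())
                    .computeIfAbsent(term, k -> new TreeSet<>())
                    .add(id);
            }
        }
    }

    // Returns the ids of the documents whose field contains the term.
    Set<Integer> search(String field, String term) {
        return postings
            .getOrDefault(field, Collections.emptyMap())
            .getOrDefault(term.toLowerCase(Locale.ROOT), Collections.emptySet());
    }

    // Builds an index holding the three employees from the indexing examples.
    static InvertedIndexSketch sample() {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.index(1, Map.of("last_name", "Smith", "about", "I love to go rock climbing"));
        index.index(2, Map.of("last_name", "Smith", "about", "I like to collect rock albums"));
        index.index(3, Map.of("last_name", "Fir", "about", "I like to build cabinets"));
        return index;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = sample();
        System.out.println(index.search("about", "rock"));
        System.out.println(index.search("last_name", "smith"));
    }
}
```

A lookup is a map access per term rather than a scan over every document, which is the main reason full-text queries stay fast as the index grows.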

Indexing documents

PUT /megacorp/employee/1
{
  "first_name": "John",
  "last_name": "Smith",
  "age": 25,
  "about": "I love to go rock climbing",
  "interests": [ "sports", "music" ]
}

PUT /megacorp/employee/2
{
  "first_name": "Jane",
  "last_name": "Smith",
  "age": 32,
  "about": "I like to collect rock albums",
  "interests": [ "music" ]
}

PUT /megacorp/employee/3
{
  "first_name": "Douglas",
  "last_name": "Fir",
  "age": 35,
  "about": "I like to build cabinets",
  "interests": [ "forestry" ]
}

Queries and Filters

GET /megacorp/employee/_search?q=last_name:Smith

"hits": [
  {
    "_source": {
      "first_name": "John",
      "last_name": "Smith",
      "age": 25,
      "about": "I love to go rock climbing",
      "interests": [ "sports", "music" ]
    }
  },
  {
    "_source": {
      "first_name": "Jane",
      "last_name": "Smith",
      "age": 32,
      "about": "I like to collect rock albums",
      "interests": [ "music" ]
    }
  }
]

GET /megacorp/employee/_search
{
  "query": {
    "filtered": {
      "filter": {
        "range": { "age": { "gt": 21 } }
      },
      "query": {
        "match": { "last_name": "smith" }
      }
    }
  }
}

Full-Text Search

GET /megacorp/employee/_search
{
  "query": {
    "match": { "about": "rock climbing" }
  }
}

"hits": [
  {
    "_score": 0.16273327,
    "_source": {
      "first_name": "John",
      "last_name": "Smith",
      "age": 25,
      "about": "I love to go rock climbing",
      "interests": [ "sports", "music" ]
    }
  },
  {
    "_score": 0.016878016,
    "_source": {
      "first_name": "Jane",
      "last_name": "Smith",
      "age": 32,
      "about": "I like to collect rock albums",
      "interests": [ "music" ]
    }
  }
]

Analysis and Analyzers

Set the shape to semi-transparent by calling Set_Trans(5)

Standard analyzer

set, the, shape, to, semi, transparent, by, calling, set_trans, 5

Simple analyzer

set, the, shape, to, semi, transparent, by, calling, set, trans

Whitespace analyzer

Set, the, shape, to, semi-transparent, by, calling, Set_Trans(5)

English language analyzer

set, shape, semi, transpar, call, set_tran, 5
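
Two of the analyzers above are simple enough to approximate in a few lines. The sketch below is an approximation for illustration, not the Lucene implementation: `AnalyzerSketch`, `whitespace` and `simple` are hypothetical names, the whitespace version only splits on whitespace, and the simple version splits on every non-letter and lowercases. The standard and English analyzers (word-boundary rules, stop words, stemming) are deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical approximation of two analyzers; not the Lucene implementation.
final class AnalyzerSketch {

    // Whitespace analyzer: split on whitespace, keep case and punctuation.
    static List<String> whitespace(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.trim().split("\\s+")) {
            if (!token.isEmpty()) {
                tokens.add(token);
            }
        }
        return tokens;
    }

    // Simple analyzer: split on every non-letter character, then lowercase.
    static List<String> simple(String text) {
        List<String> tokens = new ArrayList<>();
        for (String token : text.split("[^\\p{L}]+")) {
            if (!token.isEmpty()) {
                tokens.add(token.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Set the shape to semi-transparent by calling Set_Trans(5)";
        System.out.println(whitespace(text));
        System.out.println(simple(text));
    }
}
```

The same analyzer has to run at index time and at query time, otherwise a query for "set_trans" would never meet the tokens "set" and "trans" stored in the index.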

Field mappings

{
  "number_of_clicks": {
    "type": "integer"
  }
}

{
  "tag": {
    "type": "string",
    "index": "not_analyzed"
  }
}

{
  "tweet": {
    "type": "string",
    "analyzer": "english"
  }
}

Analytics and Aggregations

GET /megacorp/employee/_search
{
  "query": {
    "match": { "last_name": "smith" }
  },
  "aggs": {
    "all_interests": {
      "terms": { "field": "interests" },
      "aggs": {
        "avg_age": {
          "avg": { "field": "age" }
        }
      }
    }
  }
}

"buckets": [
  {
    "key": "music",
    "doc_count": 2,
    "avg_age": { "value": 28.5 }
  },
  {
    "key": "sports",
    "doc_count": 1,
    "avg_age": { "value": 25 }
  }
]
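
The aggregation above reads as "for everyone matching the query, group by interest, then average the age per group". A plain-Java sketch of that logic, run over the three sample employees, could look like the following. `AggregationSketch`, `Employee` and `interestBuckets` are hypothetical names, and the grouping is done in memory rather than by the search engine.

```java
import java.util.Arrays;
import java.util.DoubleSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical in-memory version of a "terms" bucket with an "avg" metric.
final class AggregationSketch {

    static final class Employee {
        final String lastName;
        final int age;
        final List<String> interests;

        Employee(String lastName, int age, String... interests) {
            this.lastName = lastName;
            this.age = age;
            this.interests = Arrays.asList(interests);
        }
    }

    // The three employees indexed earlier in the talk.
    static List<Employee> sample() {
        return Arrays.asList(
            new Employee("Smith", 25, "sports", "music"),
            new Employee("Smith", 32, "music"),
            new Employee("Fir", 35, "forestry"));
    }

    // Query: last name matches. Bucket: one per interest. Metric: age statistics.
    static Map<String, DoubleSummaryStatistics> interestBuckets(
            List<Employee> employees, String lastName) {
        Map<String, DoubleSummaryStatistics> buckets = new TreeMap<>();
        for (Employee employee : employees) {
            if (!employee.lastName.equalsIgnoreCase(lastName)) {
                continue; // filtered out by the query
            }
            for (String interest : employee.interests) {
                buckets
                    .computeIfAbsent(interest, k -> new DoubleSummaryStatistics())
                    .accept(employee.age);
            }
        }
        return buckets;
    }

    public static void main(String[] args) {
        interestBuckets(sample(), "smith").forEach((interest, stats) ->
            System.out.println(interest + ": doc_count=" + stats.getCount()
                + ", avg_age=" + stats.getAverage()));
    }
}
```

Only the two Smiths pass the query, so "forestry" never becomes a bucket, which matches the response on the slide.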

The Liferay Search Infrastructure

The Liferay Search architecture

Liferay Portal

Assets: web content, message boards, wiki pages...

Search infrastructure

(Magic happens here)

Search engine(s)

Indices, documents, analysis...

The Liferay Search Engine plugins

public interface SearchEngine {
    public IndexSearcher getIndexSearcher();
    public IndexWriter getIndexWriter();
}

public class ElasticsearchSearchEngine extends BaseSearchEngine
public class ElasticsearchIndexSearcher extends BaseIndexSearcher
public class ElasticsearchIndexWriter extends BaseIndexWriter

public class SolrSearchEngine extends BaseSearchEngine
public class SolrIndexSearcher extends BaseIndexSearcher
public class SolrIndexWriter extends BaseIndexWriter

The Liferay Document Mappings

Solr: schema.xml

<fields>
  <field indexed="true" name="articleId" stored="true" type="string_keyword_lowercase" />
  <field indexed="true" name="companyId" stored="true" type="long" />
  <field indexed="true" name="emailAddress" stored="true" type="string" />
</fields>

Elasticsearch: liferay-type-mappings.json

"LiferayDocumentType": {
  "properties": {
    "articleId": {
      "analyzer": "keyword_lowercase",
      "store": "yes",
      "type": "string"
    },
    "companyId": {
      "index": "not_analyzed",
      "store": "yes",
      "type": "string"
    },
    "emailAddress": {
      "index": "not_analyzed",
      "store": "yes",
      "type": "string"
    }
  }
}

From Portal assets to Index documents…

public interface Indexer<T> {
    public Document getDocument(T object);
}

// Excerpts: languageId, content and discussion come from code not shown on the slide.
public class JournalArticleIndexer extends BaseIndexer<JournalArticle> {
    protected Document doGetDocument(JournalArticle journalArticle) {
        Document document = getBaseModelDocument(CLASS_NAME, journalArticle);

        // Full-text field, one per language
        document.addText(
            LocalizationUtil.getLocalizedName(Field.CONTENT, languageId),
            content);
        // Exact-match field
        document.addKeyword(Field.VERSION, journalArticle.getVersion());
        document.addDate("displayDate", journalArticle.getDisplayDate());

        return document;
    }
}

public class MBMessageIndexer extends BaseIndexer<MBMessage> {
    protected Document doGetDocument(MBMessage mbMessage) {
        Document document = getBaseModelDocument(CLASS_NAME, mbMessage);

        document.addText(Field.CONTENT, processContent(mbMessage));
        document.addKeyword("discussion", discussion == null ? false : true);

        // Do not expose the author of anonymous posts
        if (mbMessage.isAnonymous()) {
            document.remove(Field.USER_NAME);
        }

        return document;
    }
}

public interface Document {
    public void addKeyword(String name, String value);
    public void addNumber(String name, long value);
}

… from Search Box to queries and filters

public class JournalArticleIndexer extends BaseIndexer<JournalArticle> {
    public void postProcessSearchQuery(
        BooleanQuery searchQuery,
        BooleanFilter fullQueryBooleanFilter,
        SearchContext searchContext) {

        addSearchTerm(searchQuery, searchContext, Field.ARTICLE_ID, false);
        addSearchLocalizedTerm(searchQuery, searchContext, Field.CONTENT, false);
        addSearchLocalizedTerm(searchQuery, searchContext, Field.TITLE, false);
        addSearchTerm(searchQuery, searchContext, Field.USER_NAME, false);
    }
}

public class MBThreadIndexer extends BaseIndexer<MBThread> {
    public void postProcessContextBooleanFilter(
        BooleanFilter contextBooleanFilter,
        SearchContext searchContext) {

        contextBooleanFilter.addRequiredTerm("discussion", discussion);

        if ((endDate > 0) && (startDate > 0)) {
            contextBooleanFilter.addRangeTerm("lastPostDate", startDate, endDate);
        }
    }
}

Classic query types (and filters)

TermQuery / TermFilter

"term" : { "locale" : "de_DE" }

TermRangeQuery / RangeTermFilter

"range" : { "age" : { "gte" : 8, "lte" : 42 } }

WildcardQuery

"wildcard" : { "company" : "L*ray" }

StringQuery

"query_string": { "query": "(content:this OR name:this) AND (content:that OR name:that)" }

BooleanQuery / BooleanFilter

"bool" : {
  "must" : { "term" : { "locale" : "de_DE" } },
  "must_not" : { "range" : { "age" : { "from" : 8, "to" : 42 } } },
  "should" : [
    { "wildcard" : { "company" : "L*ray" } },
    { "term" : { "product" : "Portal" } }
  ]
}

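The bool example combines the other clause types, so it helps to spell out its logic. The sketch below is a simplified model under stated assumptions: `BoolQuerySketch` and its helpers are hypothetical names, documents are plain string maps, term matching is exact string equality (no analysis), and "should" clauses are only counted as a crude ranking signal instead of being scored the way a real engine scores them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

// Hypothetical model of must / must_not / should; not a search engine API.
final class BoolQuerySketch {

    final List<Predicate<Map<String, String>>> must = new ArrayList<>();
    final List<Predicate<Map<String, String>>> mustNot = new ArrayList<>();
    final List<Predicate<Map<String, String>>> should = new ArrayList<>();

    // term: the field holds exactly this value
    static Predicate<Map<String, String>> term(String field, String value) {
        return doc -> value.equals(doc.get(field));
    }

    // range: the numeric field lies in [from, to]; a missing field never matches
    static Predicate<Map<String, String>> range(String field, int from, int to) {
        return doc -> {
            String raw = doc.get(field);
            if (raw == null) {
                return false;
            }
            int number = Integer.parseInt(raw);
            return number >= from && number <= to;
        };
    }

    // wildcard: only '*' is supported here, everything else is taken literally
    static Predicate<Map<String, String>> wildcard(String field, String pattern) {
        String regex = Arrays.stream(pattern.split("\\*", -1))
            .map(Pattern::quote)
            .collect(Collectors.joining(".*"));
        Pattern compiled = Pattern.compile(regex);
        return doc -> doc.get(field) != null && compiled.matcher(doc.get(field)).matches();
    }

    // A document matches when every must clause holds and no must_not clause holds.
    boolean matches(Map<String, String> doc) {
        return must.stream().allMatch(clause -> clause.test(doc))
            && mustNot.stream().noneMatch(clause -> clause.test(doc));
    }

    // Number of should clauses that hold; used here as a stand-in for relevance.
    long shouldHits(Map<String, String> doc) {
        return should.stream().filter(clause -> clause.test(doc)).count();
    }

    // The bool query from the slide.
    static BoolQuerySketch slideExample() {
        BoolQuerySketch query = new BoolQuerySketch();
        query.must.add(term("locale", "de_DE"));
        query.mustNot.add(range("age", 8, 42));
        query.should.add(wildcard("company", "L*ray"));
        query.should.add(term("product", "Portal"));
        return query;
    }

    // Small helper to build a document from key/value pairs.
    static Map<String, String> doc(String... keyValues) {
        Map<String, String> doc = new HashMap<>();
        for (int i = 0; i + 1 < keyValues.length; i += 2) {
            doc.put(keyValues[i], keyValues[i + 1]);
        }
        return doc;
    }

    public static void main(String[] args) {
        Map<String, String> candidate =
            doc("locale", "de_DE", "age", "50", "company", "Liferay", "product", "Portal");
        BoolQuerySketch query = slideExample();
        System.out.println(query.matches(candidate) + " / " + query.shouldHits(candidate));
    }
}
```
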
Speaking to the Search Engine

public interface Query {
    public BooleanFilter getPreBooleanFilter();
    public Filter getPostFilter();
}

public interface Filter {
    public Boolean isCached();
}

public class StringQueryTranslatorImpl implements StringQueryTranslator {
    public QueryBuilder translate(StringQuery stringQuery) {
        // Elasticsearch Client Java API
        return QueryBuilders.queryStringQuery(stringQuery.getQuery());
    }
}

public class ElasticsearchIndexSearcher extends BaseIndexSearcher {
    protected SearchResponse doSearch(
        SearchContext searchContext, Query query) {

        // Elasticsearch Client Java API
        Client client = _elasticsearchConnectionManager.getClient();

        SearchRequestBuilder searchRequestBuilder = client.prepareSearch(
            getSelectedIndexNames(queryConfig, searchContext));

        QueryBuilder queryBuilder = _queryTranslator.translate(
            query, searchContext);

        searchRequestBuilder.setQuery(queryBuilder);

        SearchResponse searchResponse = searchRequestBuilder.get();

        return searchResponse;
    }
}

Search in Liferay 7

What's new in Liferay 7

Liferay 6

● Embedded Lucene by default

● Remote: Solr only

● Solr 4

● Portal-centric Lucene clustering

Liferay 7

● Embedded Elasticsearch by default

● Remote: Elasticsearch and Solr

● Solr 5.x and SolrCloud

● Native, transparent Elasticsearch clustering

● Queries + Filters + Boosting + Geolocation

● Extensibility and modularization

● Enterprise extras

○ Shield for security

○ Marvel for cluster monitoring

○ Kibana for visualization

New Queries

MatchQuery

"match" : {
  "subject" : { "query" : "Liferay Portal", "type" : "phrase" }
}

MoreLikeThisQuery

"more_like_this" : {
  "fields" : ["title", "content"],
  "like_text" : "Search In Liferay 7",
  "min_term_freq" : 1,
  "max_query_terms" : 12
}

DisMaxQuery

"dis_max" : {
  "tie_breaker" : 0.7,
  "queries" : [
    { "term" : { "age" : 34 } },
    { "term" : { "age" : 35 } }
  ]
}

FuzzyQuery

"fuzzy" : {
  "user" : { "value" : "ed", "fuzziness" : 2, "max_expansions" : 100 }
}

MatchAllQuery / MatchAllFilter

"match_all" : { "boost" : 1.2 }

MultiMatchQuery

"multi_match" : {
  "query" : "Enterprise. Open Source. For Life",
  "type" : "most_fields",
  "fields" : [ "title", "title.original", "title.shingles" ]
}

New Filters

ExistsFilter

"exists" : { "field" : "emailAddress" }

MissingFilter

"missing" : { "field" : "emailAddress" }

PrefixFilter

"prefix" : { "product" : "life" }

TermsFilter

"terms" : { "locale" : ["de_DE", "pt_BR", "en_CA"] }

QueryFilter

"fquery" : {
  "query" : {
    "bool" : {
      "must" : [
        { "wildcard" : { "company" : "L*ray" } },
        { "term" : { "product" : "Portal" } }
      ]
    }
  },
  "_cache" : true
}

Geolocation filters

GeoDistanceFilter

"geo_distance" : { "distance" : "12km", "pin.location" : { "lat" : 40, "lon" : -70 }}

GeoBoundingBoxFilter

"geo_bounding_box" : {
  "pin.location" : {
    "top_left" : { "lat" : 40.73, "lon" : -74.1 },
    "bottom_right" : { "lat" : 40.01, "lon" : -71.12 }
  }
}

GeoDistanceRangeFilter

"geo_distance_range" : { "from" : "200km", "to" : "400km", "pin.location" : { "lat" : 40, "lon" : -70 }}

GeoPolygonFilter

"geo_polygon" : { "person.location" : { "points" : [ [-70, 40], [-80, 30], [-90, 20] ] }}

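To make the distance-based filters concrete, here is a minimal sketch of the check they express, using the haversine great-circle formula on a spherical Earth. This is an assumption made for illustration: `GeoDistanceSketch` and its methods are hypothetical names, the 6371 km radius is the usual mean value, and the engine's own distance calculation may use a different method and give slightly different results.

```java
// Hypothetical illustration of geo_distance and geo_distance_range checks.
final class GeoDistanceSketch {

    static final double EARTH_RADIUS_KM = 6371.0; // mean radius, spherical model

    // Great-circle distance between two points given in degrees (haversine).
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
            + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // geo_distance: is the point at most maxKm away from the center?
    static boolean within(double centerLat, double centerLon,
                          double lat, double lon, double maxKm) {
        return haversineKm(centerLat, centerLon, lat, lon) <= maxKm;
    }

    // geo_distance_range: is the point between fromKm and toKm away from the center?
    static boolean inRange(double centerLat, double centerLon,
                           double lat, double lon, double fromKm, double toKm) {
        double distance = haversineKm(centerLat, centerLon, lat, lon);
        return distance >= fromKm && distance <= toKm;
    }

    public static void main(String[] args) {
        // Center taken from the slide: lat 40, lon -70
        System.out.println(within(40, -70, 40.05, -70, 12));
        System.out.println(inRange(40, -70, 43, -70, 200, 400));
    }
}
```

One degree of latitude is roughly 111 km, so a point 0.05 degrees north of the center passes the 12 km filter, while a point 3 degrees north falls inside the 200 km to 400 km ring.
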
Query-time boosting

"should": [
  { "match": { "title": { "query": "Liferay Portal", "boost": 2 } } },
  { "match": { "content": { "query": "Liferay Portal" } } }
]

New Aggregations: Top Hits

"terms": { "field": "conference", "size": 2 },
"aggs": {
  "talks": {
    "top_hits": {
      "size": 1,
      "sort": [ { "attendees": { "order": "desc" } } ]
    }
  }
}

{
  "key": "Liferay DEVCON",
  "talks": { "hits": [ { "_source": { "title": "The Power of Search" } } ] }
},
{
  "key": "Liferay North America Symposium",
  "talks": { "hits": [ { "_source": { "title": "The ELK Stack" } } ] }
}

New Aggregations: Extended Stats

"extended_stats" : { "field" : "attendees" }

"attendees_per_talk_stats": {
  "count": 9,
  "min": 72,
  "max": 99,
  "avg": 86,
  "sum": 774,
  "sum_of_squares": 67028,
  "std_deviation": 7.180219742846005
}

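The numbers in that response are consistent with each other: the average is sum / count, and the standard deviation follows from the population variance sum_of_squares / count - avg². The sketch below simply replays that arithmetic on the reported values. `ExtendedStatsSketch` is a hypothetical name, and the individual attendee counts are not in the slides, so only the aggregates are used.

```java
// Hypothetical check of the extended_stats arithmetic from the slide.
final class ExtendedStatsSketch {

    static double average(long count, double sum) {
        return sum / count;
    }

    // Population standard deviation derived from count, sum and sum of squares.
    static double stdDeviation(long count, double sum, double sumOfSquares) {
        double avg = average(count, sum);
        double variance = sumOfSquares / count - avg * avg;
        return Math.sqrt(variance);
    }

    public static void main(String[] args) {
        // Values reported in the response: count 9, sum 774, sum_of_squares 67028
        System.out.println(average(9, 774));
        System.out.println(stdDeviation(9, 774, 67028));
    }
}
```

With count 9 and sum 774 the average is 86, the variance is 67028 / 9 - 86² ≈ 51.56, and its square root is about 7.18, in line with the reported std_deviation.
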
Modularity and Search

● OSGi

● Liferay's default Search Engine: now a plugin in itself

● Extension points in Search

○ Node Settings contributors → fine tune your cluster

○ Index Settings contributors → fine tune your shards and logs

○ Analyzers and Mappings contributors → fine tune your fields and queries

Liferay 7: Enter Elasticsearch

Why Elasticsearch?

Best of breed

Built for modern web applications

Distributed and clusterable by design

Lucene based

Multi-tenancy

Great vendor support

Great monitoring tools: Marvel, Logstash

Great for Developers

Open Source

Amazing documentation

High "just works" factor, e.g. zero-config indexing and clustering

REST for queries, health, admin - everything

Update live settings programmatically

Great Java Client API

Pretty JSON for talks ;-)

Clustering with Liferay and Elasticsearch

Production mode

Dev mode

Scaling and tuning made easy

Enterprise-level Search in Liferay 7 EE

Security: Shield

Protect your Liferay index with a username and password

SSL/TLS encryption for traffic within the Liferay Elasticsearch cluster

Elasticsearch plugin - no need for an external security solution

Restrict access to Liferay Portal instances with IP filtering

Monitoring: Marvel

Visualization: Kibana

Thanks and happy searching!

http://j.mp/[email protected]/arboliveira@arbocombr