Querying Elasticsearch Binary Studio Academy PRO 2016 binary-studio.com

Academy PRO: Querying Elasticsearch

Search Types

● Structured (field) search

● Full-text search

Search APIs

● Lite (query string) search

● Full-body search

Lite search

http localhost:9200/github/repository/_search?q=language:Javascript%20+forks_count:%3E20000&sort=forks_count:desc&size=3

Lite search

Expects all parameters to be passed in the query string, properly encoded, e.g.:

http localhost:9200/github/repository/_search?q=name:angular.js

Based on _search API:

http localhost:9200/_search

http localhost:9200/user,repository

http localhost:9200/{index}/{type}/_search?q=field:value...

Lite search

Supports pagination:

http localhost:9200/github/repository/_search?size=2&from=50

Supports required and prohibited conditions (+ / -):

http localhost:9200/github/repository/_search?q=+language:(php%20css)

Supports sorting:

http localhost:9200/github/repository/_search?q=language:Java&sort=watchers_count:desc
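Encoding lite-search URLs by hand is easy to get wrong. In particular, a literal `+` in a URL query string is normally decoded as a space, so a required-term marker generally has to be sent as `%2B`. The sketch below shows one way to build such a URL with Python's standard library. The helper name `lite_search_url` is hypothetical; the index, type and field names come from the slides.

```python
from urllib.parse import urlencode, quote


def lite_search_url(base, q, **params):
    """Build a lite-search URL with every parameter percent-encoded."""
    query = {"q": q, **params}
    # quote_via=quote encodes spaces as %20 and '+' as %2B
    return base + "?" + urlencode(query, quote_via=quote)


url = lite_search_url(
    "http://localhost:9200/github/repository/_search",
    "+language:Javascript +forks_count:>20000",
    sort="forks_count:desc",
    size=3,
)
print(url)
```

Decoding the generated URL gives back the original query string, including both `+` markers.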

Lite search

PROS

● Powerful

● Convenient for development and ad-hoc queries

● End users can run queries directly from their web browser

CONS

● Queries must be carefully encoded

● An open API lets anyone run potentially slow queries that can even bring down your cluster

● Not efficient for complex queries

FULL-BODY SEARCH

● Uses the same _search API

● Passes parameters in the request body, e.g.:

curl localhost:9200/github/repository/_search -d '{"size": 2, "from": 10}'

● RFC 7231 does not define the semantics of a body in a GET request, so handling depends on the server implementation. Elasticsearch therefore accepts both GET and POST.

● Instead of encoded URLs, queries use a convenient search query domain-specific language (DSL)
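As a rough illustration of sending parameters in the body, the sketch below prepares the same request from Python without actually sending it. `search_request` is a hypothetical helper. The `Content-Type` header is an assumption: the curl calls on the slides do not set one, but newer Elasticsearch versions require it. POST is used by default because some HTTP clients and proxies drop the body of a GET request.

```python
import json
import urllib.request


def search_request(url, body, method="POST"):
    """Prepare (but do not send) a full-body search request."""
    data = json.dumps(body).encode("utf-8")
    return urllib.request.Request(
        url,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )


req = search_request(
    "http://localhost:9200/github/repository/_search",
    {"size": 2, "from": 10},
)
print(req.get_method(), req.full_url)
print(req.data.decode("utf-8"))
# To run it against a live cluster: urllib.request.urlopen(req)
```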

SEARCH QUERY DSL

SEARCH QUERY CLAUSES

● Leaf clauses: compare a field to a query string (match, term, range)

● Compound clauses: combine other query clauses (bool, dis_max)
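Because the DSL is plain JSON, leaf and compound clauses nest like ordinary data structures. A minimal sketch, assuming hypothetical builder functions `match`, `range_` and `bool_` that only produce dictionaries and never contact a cluster:

```python
import json


def match(field, text):
    """Leaf clause: full-text comparison of one field."""
    return {"match": {field: text}}


def range_(field, **bounds):
    """Leaf clause: bounds such as gt / gte / lt / lte on one field."""
    return {"range": {field: bounds}}


def bool_(must=None, should=None, must_not=None):
    """Compound clause: combines other clauses, omitting empty groups."""
    groups = {"must": must, "should": should, "must_not": must_not}
    return {"bool": {k: v for k, v in groups.items() if v}}


body = {
    "query": bool_(
        must=[match("language", "Javascript")],
        should=[match("description", "library"), range_("forks_count", gt=10000)],
    )
}
print(json.dumps(body, indent=2))
```

The printed JSON can be passed as the request body of a `_search` call.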

SEARCH QUERY FORMAT

SEARCH QUERY DSL EXAMPLE

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "match": {
      "language": "Javascript"
    }
  }
}'

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "bool": {
      "must": {"match": {"language": "Javascript"}},
      "should": {"match": {"description": "library"}}
    }
  }
}'

SEARCH QUERY MATCHERS

FULL TEXT QUERIES

● match

● multi_match

● common (common terms)

● query_string

● simple_query_string

MATCH & MULTI_MATCH

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "match": {
      "language": "Javascript"
    }
  }
}'

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "multi_match": {
      "query": "javascript",
      "fields": ["language", "description"]
    }
  }
}'

QUERY STRING QUERY

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "query_string": {
      "query": "language:(C OR PHP) AND watchers_count:[15000 TO *]"
    }
  }
}'

Supports compact Lucene query string syntax

SIMPLE QUERY STRING QUERY

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "simple_query_string": {
      "fields": ["description"],
      "query": "(framework^2 realtime) + -(web port client)"
    }
  }
}'

Uses a simplified query syntax

COMMON TERMS QUERY

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "common": {
      "description": {
        "query": "for is and web",
        "cutoff_frequency": 0.001
      }
    }
  }
}'

Divides query terms into two groups:

● More important: low-frequency terms (applied first)

● Less important: high-frequency terms (used only to score documents matched by the first group)
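The grouping step can be sketched in a few lines of Python. This is a simplified illustration, not Lucene's actual implementation: `split_by_frequency` is a hypothetical function, and the document counts are invented for the example. A term counts as high-frequency when the fraction of documents containing it exceeds `cutoff_frequency`.

```python
def split_by_frequency(terms, doc_freq, total_docs, cutoff_frequency):
    """Split query terms into (low_frequency, high_frequency) groups."""
    low, high = [], []
    for term in terms:
        # Fraction of documents in the index that contain the term
        ratio = doc_freq.get(term, 0) / total_docs
        (high if ratio > cutoff_frequency else low).append(term)
    return low, high


# Hypothetical document counts for an index of 10,000 documents
doc_freq = {"for": 6200, "is": 5400, "and": 7100, "web": 8}
low, high = split_by_frequency("for is and web".split(), doc_freq, 10_000, 0.001)
print("low frequency (more important):", low)
print("high frequency (less important):", high)
```

With these made-up counts only `web` falls below the cutoff, so it alone decides which documents match.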

SEARCH QUERY FILTERS

● term

● terms

● range

● exists

● missing

● bool

● prefix

● wildcard

● regexp

● fuzzy

TERM AND RANGE FILTERS

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "term": {
      "language": "C++"
    }
  }
}'

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "range": {
      "watchers_count": {
        "gte": 5000,
        "lte": 15000
      }
    }
  }
}'

EXISTS AND MISSING FILTERS

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must_not": {
            "exists": {
              "field": "language"
            }
          }
        }
      }
    }
  }
}'

BOOL FILTER

● must

○ All clauses must match, like AND

● must_not

○ No clause may match, like NOT

● should

○ At least one of the clauses must match, like OR

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "filtered": {
      "query": {
        "match_all": {}
      },
      "filter": {
        "bool": {
          "must": {
            "term": {
              "language": "JavaScript"
            }
          },
          "should": {
            "range": {
              "forks_count": {
                "gt": 10000
              }
            }
          }
        }
      }
    }
  }
}'
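The AND / NOT / OR reading of the bool filter can be mimicked in memory. The sketch below follows the slide's description only: in a real cluster the effect of `should` also depends on the context and on `minimum_should_match`. The helpers `term`, `range_gt` and `bool_filter` and the sample documents are hypothetical.

```python
def term(field, value):
    """Exact-value predicate, like a term filter."""
    return lambda doc: doc.get(field) == value


def range_gt(field, bound):
    """Greater-than predicate, like a range filter with gt."""
    return lambda doc: doc.get(field) is not None and doc[field] > bound


def bool_filter(must=(), must_not=(), should=()):
    """must ~ AND, must_not ~ NOT, should ~ OR (as described on the slide)."""
    def matches(doc):
        return (
            all(p(doc) for p in must)
            and not any(p(doc) for p in must_not)
            and (not should or any(p(doc) for p in should))
        )
    return matches


# Hypothetical sample documents
docs = [
    {"name": "a", "language": "JavaScript", "forks_count": 12000},
    {"name": "b", "language": "JavaScript", "forks_count": 300},
    {"name": "c", "language": "PHP", "forks_count": 20000},
]
f = bool_filter(must=[term("language", "JavaScript")],
                should=[range_gt("forks_count", 10000)])
print([d["name"] for d in docs if f(d)])
```

Only document `a` satisfies both the `must` term and the `should` range.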

COMBINING FILTERS AND MATCHERS

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "has_issues": true
        }
      },
      "filter": {
        "term": {
          "language": "Objective-C"
        }
      }
    }
  }
}'

SORTING

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "filtered": {
      "query": {
        "match": {
          "has_issues": true
        }
      },
      "filter": {
        "term": {
          "language": "Objective-C"
        }
      }
    }
  },
  "sort": {
    "forks_count": {
      "order": "desc"
    }
  }
}'

RELEVANCE

● How well a retrieved document or set of documents meets the information need (criteria) of the user

● A positive floating-point number stored in the _score property

● Calculated by the term frequency/inverse document frequency (TF/IDF) algorithm:

○ Term frequency (tf): the more often the term appears in the field, the more relevant

○ Inverse document frequency (idf): the more often the term appears across the index, the less relevant

○ Field-length norm (fieldNorm): the shorter the field, the more relevant
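The three factors can be sketched with the formulas used by Lucene's classic TF/IDF similarity. This is a simplified illustration: the real score also involves query normalization, a coordination factor, boosts, and norms stored with reduced precision. The function names and all input numbers below are hypothetical.

```python
import math


def tf(freq):
    """Term frequency: grows with occurrences of the term in the field."""
    return math.sqrt(freq)


def idf(num_docs, doc_freq):
    """Inverse document frequency: shrinks as more documents contain the term."""
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def field_norm(num_terms):
    """Field-length norm: shorter fields weigh more."""
    return 1.0 / math.sqrt(num_terms)


def term_weight(freq, num_docs, doc_freq, num_terms):
    """Product of the three factors for one term in one field."""
    return tf(freq) * idf(num_docs, doc_freq) * field_norm(num_terms)


# Hypothetical numbers: the same term in a short field and in a long field
short = term_weight(freq=1, num_docs=1000, doc_freq=9, num_terms=4)
long_ = term_weight(freq=1, num_docs=1000, doc_freq=9, num_terms=100)
print(short, long_)
```

With everything else equal, the 4-term field scores higher than the 100-term field.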

RELEVANCE EXPLANATION

curl localhost:9200/github/repository/_search?pretty -d '{
  "query": {
    "term": {
      "language": "C++"
    }
  },
  "size": 1,
  "explain": true
}'

TO BE CONTINUED...