Elasticsearch Server - Third Edition

Elasticsearch Server Third Edition

Table of Contents

Elasticsearch Server Third Edition

Credits

About the Authors

About the Reviewer

www.PacktPub.com

eBooks, discount offers, and more

Why subscribe?

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

Downloading the example code

Downloading the color images of this book

Errata

Piracy

Questions

1. Getting Started with Elasticsearch Cluster

Full text searching

The Lucene glossary and architecture

Input data analysis

Indexing and querying

Scoring and query relevance

The basics of Elasticsearch

Key concepts of Elasticsearch

Index

Document

Document type

Mapping

Key concepts of the Elasticsearch infrastructure

Nodes and clusters

Shards

Replicas

Gateway

Indexing and searching

Installing and configuring your cluster

Installing Java

Installing Elasticsearch

Running Elasticsearch

Shutting down Elasticsearch

The directory layout

Configuring Elasticsearch

The system-specific installation and configuration

Installing Elasticsearch on Linux

Installing Elasticsearch using RPM packages

Installing Elasticsearch using the DEB package

Elasticsearch configuration file localization

Configuring Elasticsearch as a system service on Linux

Elasticsearch as a system service on Windows

Manipulating data with the REST API

Understanding the REST API

Storing data in Elasticsearch

Creating a new document

Automatic identifier creation

Retrieving documents

Updating documents

Dealing with non-existing documents

Adding partial documents

Deleting documents

Versioning

Usage example

Versioning from external systems

Searching with the URI request query

Sample data

URI search

Elasticsearch query response

Query analysis

URI query string parameters

The query

The default search field

Analyzer

The default operator property

Query explanation

The fields returned

Sorting the results

The search timeout

The results window

Limiting per-shard results

Ignoring unavailable indices

The search type

Lowercasing term expansion

Wildcard and prefix analysis

Lucene query syntax

Summary

2. Indexing Your Data

Elasticsearch indexing

Shards and replicas

Write consistency

Creating indices

Altering automatic index creation

Settings for a newly created index

Index deletion

Mappings configuration

Type determining mechanism

Disabling the type determining mechanism

Tuning the type determining mechanism for numeric types

Tuning the type determining mechanism for dates

Index structure mapping

Type and types definition

Fields

Core types

Common attributes

String

Number

Boolean

Binary

Date

Multi fields

The IP address type

Token count type

Using analyzers

Out-of-the-box analyzers

Defining your own analyzers

Default analyzers

Different similarity models

Setting per-field similarity

Available similarity models

Configuring default similarity

Configuring BM25 similarity

Configuring DFR similarity

Configuring IB similarity

Batch indexing to speed up your indexing process

Preparing data for bulk indexing

Indexing the data

The _all field

The _source field

Additional internal fields

Introduction to segment merging

Segment merging

The need for segment merging

The merge policy

The merge scheduler

Throttling

Introduction to routing

Default indexing

Default searching

Routing

The routing parameters

Routing fields

Summary

3. Searching Your Data

Querying Elasticsearch

The example data

A simple query

Paging and result size

Returning the version value

Limiting the score

Choosing the fields that we want to return

Source filtering

Using the script fields

Passing parameters to the script fields

Understanding the querying process

Query logic

Search type

Search execution preference

Search shards API

Basic queries

The term query

The terms query

The match all query

The type query

The exists query

The missing query

The common terms query

The match query

The Boolean match query

The phrase match query

The match phrase prefix query

The multi match query

The query string query

Running the query string query against multiple fields

The simple query string query

The identifiers query

The prefix query

The fuzzy query

The wildcard query

The range query

Regular expression query

The more like this query

Compound queries

The bool query

The dis_max query

The boosting query

The constant_score query

The indices query

Using span queries

A span

Span term query

Span first query

Span near query

Span or query

Span not query

Span within query

Span containing query

Span multi query

Performance considerations

Choosing the right query

The use cases

Limiting results to given tags

Searching for values in a range

Boosting some of the matched documents

Ignoring lower scoring partial queries

Using Lucene query syntax in queries

Handling user queries without errors

Autocomplete using prefixes

Finding terms similar to a given one

Matching phrases

Spans, spans everywhere

Summary

4. Extending Your Querying Knowledge

Filtering your results

The context is the key

Explicit filtering with bool query

Highlighting

Getting started with highlighting

Field configuration

Under the hood

Forcing highlighter type

Configuring HTML tags

Controlling highlighted fragments

Global and local settings

Require matching

Custom highlighting query

The Postings highlighter

Validating your queries

Using the Validate API

Sorting data

Default sorting

Selecting fields used for sorting

Sorting mode

Specifying behavior for missing fields

Dynamic criteria

Calculate scoring when sorting

Query rewrite

Prefix query as an example

Getting back to Apache Lucene

Query rewrite properties

Summary

5. Extending Your Index Structure

Indexing tree-like structures

Data structure

Analysis

Indexing data that is not flat

Data

Objects

Arrays

Mappings

Final mappings

Sending the mappings to Elasticsearch

To be or not to be dynamic

Disabling object indexing

Using nested objects

Scoring and nested queries

Using the parent-child relationship

Index structure and data indexing

Child mappings

Parent mappings

The parent document

Child documents

Querying

Querying data in the child documents

Querying data in the parent documents

Performance considerations

Modifying your index structure with the update API

The mappings

Adding a new field to the existing index

Modifying fields of an existing index

Summary

6. Make Your Search Better

Introduction to Apache Lucene scoring

When a document is matched

Default scoring formula

Relevancy matters

Scripting capabilities of Elasticsearch

Objects available during script execution

Script types

In file scripts

Inline scripts

Indexed scripts

Querying with scripts

Scripting with parameters

Script languages

Using other than embedded languages

Using native code

The factory implementation

Implementing the native script

The plugin definition

Installing the plugin

Running the script

Searching content in different languages

Handling languages differently

Handling multiple languages

Detecting the language of the document

Sample document

The mappings

Querying

Queries with an identified language

Queries with an unknown language

Combining queries

Influencing scores with query boosts

The boost

Adding the boost to queries

Modifying the score

Constant score query

Boosting query

The function score query

Structure of the function query

The weight factor function

Field value factor function

The script score function

The random score function

Decay functions

When does index-time boosting make sense?

Defining boosting in the mappings

Words with the same meaning

Synonym filter

Synonyms in the mappings

Synonyms stored on the file system

Defining synonym rules

Using Apache Solr synonyms

Explicit synonyms

Equivalent synonyms

Expanding synonyms

Using WordNet synonyms

Query or index-time synonym expansion

Understanding the explain information

Understanding field analysis

Explaining the query

Summary

7. Aggregations for Data Analysis

Aggregations

General query structure

Inside the aggregations engine

Aggregation types

Metrics aggregations

Minimum, maximum, average, and sum

Missing values

Using scripts

Field value statistics and extended statistics

Value count

Field cardinality

Percentiles

Percentile ranks

Top hits aggregation

Additional parameters

Geo bounds aggregation

Scripted metrics aggregation

Buckets aggregations

Filter aggregation

Filters aggregation

Terms aggregation

Counts are approximate

Minimum document count

Range aggregation

Keyed buckets

Date range aggregation

IPv4 range aggregation

Missing aggregation

Histogram aggregation

Date histogram aggregation

Time zones

Geo distance aggregations

Geohash grid aggregation

Global aggregation

Significant terms aggregation

Choosing significant terms

Multiple value analysis

Sampler aggregation

Children aggregation

Nested aggregation

Reverse nested aggregation

Nesting aggregations and ordering buckets

Buckets ordering

Pipeline aggregations

Available types

Referencing other aggregations

Gaps in the data

Pipeline aggregation types

Min, max, sum, and average bucket aggregations

Cumulative sum aggregation

Bucket selector aggregation

Bucket script aggregation

Serial differencing aggregation

Derivative aggregation

Moving avg aggregation

Predicting future buckets

The models

Summary

8. Beyond Full-text Searching

Percolator

The index

Percolator preparation

Getting deeper

Controlling the size of returned results

Percolator and score calculation

Combining percolators with other functionalities

Getting the number of matching queries

Indexed document percolation

Elasticsearch spatial capabilities

Mapping preparation for spatial searches

Example data

Additional geo_field properties

Sample queries

Distance-based sorting

Bounding box filtering

Limiting the distance

Arbitrary geo shapes

Point

Envelope

Polygon

Multipolygon

An example usage

Storing shapes in the index

Using suggesters

Available suggester types

Including suggestions

Suggester response

Term suggester

Term suggester configuration options

Additional term suggester options

Phrase suggester

Configuration

Completion suggester

Indexing data

Querying indexed completion suggester data

Custom weights

Context suggester

Context types

Using context

Using the geo location context

The Scroll API

Problem definition

Scrolling to the rescue

Summary

9. Elasticsearch Cluster in Detail

Understanding node discovery

Discovery types

Node roles

Master node

Data node

Client node

Configuring node roles

Setting the cluster’s name

Zen discovery

Master election configuration

Configuring unicast

Fault detection ping settings

Cluster state updates control

Dealing with master unavailability

Adjusting HTTP transport settings

Disabling HTTP

HTTP port

HTTP host

The gateway and recovery modules

The gateway

Recovery control

Additional gateway recovery options

Indices recovery API

Delayed allocation

Index recovery prioritization

Templates and dynamic templates

Templates

An example of a template

Dynamic templates

The matching pattern

Field definitions

Elasticsearch plugins

The basics

Installing plugins

Removing plugins

Elasticsearch caches

Fielddata cache

Fielddata size

Circuit breakers

Fielddata and doc values

Shard request cache

Enabling and configuring the shard request cache

Per request shard request cache disabling

Shard request cache usage monitoring

Node query cache

Indexing buffers

When caches should be avoided

The update settings API

The cluster settings API

The indices settings API

Summary

10. Administrating Your Cluster

Elasticsearch time machine

Creating a snapshot repository

Creating snapshots

Additional parameters

Restoring a snapshot

Cleaning up – deleting old snapshots

Monitoring your cluster’s state and health

Cluster health API

Controlling information details

Additional parameters

Indices stats API

Docs

Store

Indexing, get, and search

Additional information

Nodes info API

Returned information

Nodes stats API

Cluster state API

Cluster stats API

Pending tasks API

Indices recovery API

Indices shard stores API

Indices segments API

Controlling the shard and replica allocation

Explicitly controlling allocation

Specifying node parameters

Configuration

Index creation

Excluding nodes from allocation

Requiring node attributes

Using the IP address for shard allocation

Disk-based shard allocation

Configuring disk based shard allocation

Disabling disk based shard allocation

The number of shards and replicas per node

Allocation throttling

Cluster-wide allocation

Allocation awareness

Forcing allocation awareness

Filtering

What do include, exclude, and require mean

Manually moving shards and replicas

Moving shards

Canceling shard allocation

Forcing shard allocation

Multiple commands per HTTP request

Allowing operations on primary shards

Handling rolling restarts

Controlling cluster rebalancing

Understanding rebalance

Cluster being ready

The cluster rebalance settings

Controlling when rebalancing will be allowed

Controlling the number of shards being moved between nodes concurrently

Controlling which shards may be rebalanced

The Cat API

The basics

Using Cat API

Common arguments

The examples

Getting information about the master node

Getting information about the nodes

Retrieving recovery information for an index

Warming up

Defining a new warming query

Retrieving the defined warming queries

Deleting a warming query

Disabling the warming up functionality

Choosing queries for warming

Index aliasing and using it to simplify your everyday work

An alias

Creating an alias

Modifying aliases

Combining commands

Retrieving aliases

Removing aliases

Filtering aliases

Aliases and routing

Zero downtime reindexing and aliases

Summary

11. Scaling by Example

Hardware

Physical servers or a cloud

CPU

RAM memory

Mass storage

The network

How many servers

Cost cutting

Preparing a single Elasticsearch node

The general preparations

Avoiding swapping

File descriptors

Virtual memory

The memory

Field data cache and breaking the circuit

Use doc values

RAM buffer for indexing

Index refresh rate

Thread pools

Horizontal expansion

Automatically creating the replicas

Redundancy and high availability

Cost and performance flexibility

Continuous upgrades

Multiple Elasticsearch instances on a single physical machine

Preventing a shard and its replicas from being on the same node

Designated node roles for larger clusters

Query aggregator nodes

Data nodes

Master eligible nodes

Preparing the cluster for high indexing and querying throughput

Indexing related advice

Index refresh rate

Thread pools tuning

Automatic store throttling

Handling time-based data

Multiple data paths

Data distribution

Bulk indexing

RAM buffer for indexing

Advice for high query rate scenarios

Shard request cache

Think about the queries

Parallelize your queries

Field data cache and breaking the circuit

Keep size and shard size under control

Monitoring

Elasticsearch HQ

Marvel

SPM for Elasticsearch

Summary

Index

Elasticsearch Server Third Edition

Elasticsearch Server Third Edition

Copyright © 2016 Packt Publishing

All rights reserved. No part of this book may be reproduced, stored in a retrieval system, or transmitted in any form or by any means, without the prior written permission of the publisher, except in the case of brief quotations embedded in critical articles or reviews.

Every effort has been made in the preparation of this book to ensure the accuracy of the information presented. However, the information contained in this book is sold without warranty, either express or implied. Neither the authors, nor Packt Publishing, and its dealers and distributors will be held liable for any damages caused or alleged to be caused directly or indirectly by this book.

Packt Publishing has endeavored to provide trademark information about all of the companies and products mentioned in this book by the appropriate use of capitals. However, Packt Publishing cannot guarantee the accuracy of this information.

First published: October 2013

Second edition: February 2015

Third edition: February 2016

Production reference: 1230216

Published by Packt Publishing Ltd.

Livery Place

35 Livery Street

Birmingham B3 2PB, UK.

ISBN 978-1-78588-881-6

www.packtpub.com

Credits

Authors

Rafał Kuć

Marek Rogoziński

Reviewer

Paige Cook

Commissioning Editor

Nadeem Bagban

Acquisition Editor

Divya Poojari

Content Development Editor

Kirti Patil

Technical Editor

Utkarsha S. Kadam

Copy Editor

Alpha Singh

Project Coordinator

Nidhi Joshi

Proofreader

Safis Editing

Indexer

Rekha Nair

Graphics

Jason Monteiro

Production Coordinator

Manu Joseph

Cover Work

Manu Joseph

About the Authors

Rafał Kuć is a software engineer, trainer, speaker and consultant. He is working as a consultant and software engineer at Sematext Group Inc. where he concentrates on open source technologies such as Apache Lucene, Solr, and Elasticsearch. He has more than 14 years of experience in various software domains, from banking software to e-commerce products. He is mainly focused on Java; however, he is open to every tool and programming language that might help him to achieve his goals easily and quickly. Rafał is also one of the founders of the solr.pl site, where he tries to share his knowledge and help people solve their Solr and Lucene problems. He is also a speaker at various conferences around the world such as Lucene Eurocon, Berlin Buzzwords, ApacheCon, Lucene/Solr Revolution, Velocity, and DevOps Days.

Rafał began his journey with Lucene in 2002; however, it wasn’t love at first sight. When he came back to Lucene in late 2003, he revised his thoughts about the framework and saw the potential in search technologies. Then Solr came and that was it. He started working with Elasticsearch in the middle of 2010. At present, Lucene, Solr, Elasticsearch, and information retrieval are his main areas of interest.

Rafał is also the author of the Solr Cookbook series, ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.

Marek Rogoziński is a software architect and consultant with more than 10 years of experience. His specialization concerns solutions based on open source search engines, such as Solr and Elasticsearch, and the software stack for big data analytics including Hadoop, Hbase, and Twitter Storm.

He is also a cofounder of the solr.pl site, which publishes information and tutorials about Solr and Lucene libraries. He is the coauthor of ElasticSearch Server and its second edition, and the first and second editions of Mastering ElasticSearch, all published by Packt Publishing.

He is currently the chief technology officer and lead architect at ZenCard, a company that processes and analyzes large quantities of payment transactions in real time, allowing automatic and anonymous identification of retail customers on all retailer channels (m-commerce/e-commerce/brick & mortar) and giving retailers a customer retention and loyalty tool.

About the Reviewer

Paige Cook works as a software architect for Videa, part of the Cox Family of Companies, and lives near Atlanta, Georgia. He has twenty years of experience in software development, primarily with the Microsoft .NET Framework. His career has been largely focused on building enterprise solutions for the media and entertainment industry. He is especially interested in search technologies using the Apache Lucene search engine and has experience with both Elasticsearch and Apache Solr. Apart from his work, he enjoys DIY home projects and spending time with his wife and two daughters.

www.PacktPub.com

eBooks, discount offers, and more

Did you know that Packt offers eBook versions of every book published, with PDF and ePub files available? You can upgrade to the eBook version at www.PacktPub.com and as a print book customer, you are entitled to a discount on the eBook copy. Get in touch with us at <customercare@packtpub.com> for more details.

At www.PacktPub.com, you can also read a collection of free technical articles, sign up for a range of free newsletters and receive exclusive discounts and offers on Packt books and eBooks.

https://www2.packtpub.com/books/subscription/packtlib

Do you need instant solutions to your IT questions? PacktLib is Packt’s online digital book library. Here, you can search, access, and read Packt’s entire library of books.

Why subscribe?

Fully searchable across every book published by Packt
Copy and paste, print, and bookmark content
On demand and accessible via a web browser

Preface

Welcome to Elasticsearch Server, Third Edition. This is the third installment of the book, dedicated to yet another major release of Elasticsearch, this time version 2.2. In the third edition, we have decided to follow a route similar to the one we took when we wrote the second edition of the book. We not only updated the content to match the new version of Elasticsearch, but also restructured the book by removing and adding new sections and chapters. We read the suggestions we got from you, the readers of the book, and we carefully tried to incorporate the suggestions and comments received since the release of the first and second editions.

While reading this book, you will be taken on a journey to the wonderful world of full-text search provided by the Elasticsearch server. We will start with a general introduction to Elasticsearch, which covers how to start and run Elasticsearch, its basic concepts, and how to index and search your data in the most basic way. This book will also discuss the query language, the so-called Query DSL, that allows you to create complicated queries and filter returned results. In addition to all of this, you’ll see how you can use the aggregation framework to calculate aggregated data based on the results returned by your queries. We will implement the autocomplete functionality together and learn how to use Elasticsearch spatial capabilities and prospective search.

Finally, this book will show you Elasticsearch’s administration API capabilities with features such as shard placement control, cluster handling, and more, ending with a dedicated chapter that will discuss Elasticsearch’s preparation for small and large deployments, both ones that concentrate on indexing and ones that concentrate on querying.

What this book covers

Chapter 1, Getting Started with Elasticsearch Cluster, covers what full-text searching is, what Apache Lucene is, what text analysis is, how to run and configure Elasticsearch, and finally, how to index and search your data in the most basic way.

Chapter 2, Indexing Your Data, shows how indexing works, how to prepare index structure, what data types we are allowed to use, how to speed up indexing, what segments are, how merging works, and what routing is.

Chapter 3, Searching Your Data, introduces the full-text search capabilities of Elasticsearch by discussing how to query it, how the querying process works, and what types of basic and compound queries are available. In addition to this, we will show how to use position-aware queries in Elasticsearch.

Chapter 4, Extending Your Querying Knowledge, shows how to efficiently narrow down your search results by using filters, how highlighting works, how to sort your results, and how query rewrite works.

Chapter 5, Extending Your Index Structure, shows how to index more complex data structures. We learn how to index tree-like data types, how to index data with relationships between documents, and how to modify index structure.

Chapter 6, Make Your Search Better, covers Apache Lucene scoring and how to influence it in Elasticsearch, the scripting capabilities of Elasticsearch, and its language analysis capabilities.

Chapter 7, Aggregations for Data Analysis, introduces you to the great world of data analysis by showing you how to use the Elasticsearch aggregation framework. We will discuss all types of aggregations: metrics, buckets, and the new pipeline aggregations that have been introduced in Elasticsearch.

Chapter 8, Beyond Full-text Searching, discusses functionalities that are not related to full-text search, such as the percolator (reversed search) and the geo-spatial capabilities of Elasticsearch. This chapter also discusses suggesters, which allow us to build a spellchecking functionality and an efficient autocomplete mechanism, and we will show how to handle deep-paging efficiently.

Chapter 9, Elasticsearch Cluster in Detail, discusses the node discovery mechanism, the recovery and gateway modules of Elasticsearch, templates, caches, and the update settings API.

Chapter 10, Administrating Your Cluster, covers the Elasticsearch backup functionality, rebalancing, and moving shards. In addition to this, you will learn how to use the warm up functionality, use the Cat API, and work with aliases.

Chapter 11, Scaling by Example, is dedicated to scaling and tuning. We will start with hardware preparations and considerations and tuning related to a single Elasticsearch node. We will go through cluster setup and vertical scaling, ending the chapter with high querying and indexing use cases and cluster monitoring.

What you need for this book

This book was written using Elasticsearch server 2.2 and all the examples and functions should work with this version. In addition to this, you’ll need a command that allows you to send HTTP requests, such as curl, which is available for most operating systems. Please note that all the examples in this book use the previously mentioned curl tool. If you want to use another tool, please remember to format the request in an appropriate way that is understood by the tool of your choice.

In addition to this, some chapters may require additional software, such as Elasticsearch plugins, but when needed it has been explicitly mentioned.

Who this book is for

If you are a beginner to the world of full-text search and Elasticsearch, then this book is especially for you. You will be guided through the basics of Elasticsearch and you will learn how to use some of the advanced functionalities.

If you know Elasticsearch and you have worked with it, then you may find this book interesting as it provides a nice overview of all the functionalities with examples and descriptions. However, you may encounter sections that you already know.

If you know the Apache Solr search engine, this book can also be used to compare some functionalities of Apache Solr and Elasticsearch. This may give you the knowledge about which tool is more appropriate for your use case.

If you know all the details about Elasticsearch and you know how each of the configuration parameters works, then this is definitely not the book you are looking for.

Conventions

In this book, you will find a number of text styles that distinguish between different kinds of information. Here are some examples of these styles and an explanation of their meaning.

Code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles are shown as follows: “If you use the Linux or OS X command, the cURL package should already be available.”

A block of code is set as follows:

{
  "mappings": {
    "post": {
      "properties": {
        "id": { "type": "long" },
        "name": { "type": "string" },
        "published": { "type": "date" },
        "contents": { "type": "string" }
      }
    }
  }
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

{
  "mappings": {
    "post": {
      "properties": {
        "id": { "type": "long" },
        "name": { "type": "string" },
        "published": { "type": "date" },
        "contents": { "type": "string" }
      }
    }
  }
}

Any command-line input or output is written as follows:

curl -XPUT http://localhost:9200/users/?pretty -d '{
  "mappings": {
    "user": {
      "numeric_detection": true
    }
  }
}'

Note

Warnings or important notes appear in a box like this.

Reader feedback

Feedback from our readers is always welcome. Let us know what you think about this book: what you liked or disliked. Reader feedback is important for us as it helps us develop titles that you will really get the most out of.

To send us general feedback, simply e-mail <feedback@packtpub.com>, and mention the book’s title in the subject of your message.

If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, see our author guide at www.packtpub.com/authors.

Customer support

Now that you are the proud owner of a Packt book, we have a number of things to help you to get the most from your purchase.

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

1. Log in or register to our website using your e-mail address and password.
2. Hover the mouse pointer on the SUPPORT tab at the top.
3. Click on Code Downloads & Errata.
4. Enter the name of the book in the Search box.
5. Select the book for which you’re looking to download the code files.
6. Choose from the drop-down menu where you purchased this book from.
7. Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

WinRAR / 7-Zip for Windows
Zipeg / iZip / UnRarX for Mac
7-Zip / PeaZip for Linux

Downloading the color images of this book

We also provide you with a PDF file that has color images of the screenshots/diagrams used in this book. The color images will help you better understand the changes in the output. You can download this file from https://www.packtpub.com/sites/default/files/downloads/ElasticsearchServerThirdEdition_ColorImages.pdf

Errata

Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you find a mistake in one of our books (maybe a mistake in the text or the code), we would be grateful if you could report this to us. By doing so, you can save other readers from frustration and help us improve subsequent versions of this book. If you find any errata, please report them by visiting http://www.packtpub.com/submit-errata, selecting your book, clicking on the Errata Submission Form link, and entering the details of your errata. Once your errata are verified, your submission will be accepted and the errata will be uploaded to our website or added to any list of existing errata under the Errata section of that title.

To view the previously submitted errata, go to https://www.packtpub.com/books/content/support and enter the name of the book in the search field. The required information will appear under the Errata section.

Piracy

Piracy of copyrighted material on the Internet is an ongoing problem across all media. At Packt, we take the protection of our copyright and licenses very seriously. If you come across any illegal copies of our works in any form on the Internet, please provide us with the location address or website name immediately so that we can pursue a remedy.

Please contact us at <copyright@packtpub.com> with a link to the suspected pirated material.

We appreciate your help in protecting our authors and our ability to bring you valuable content.

Questions

If you have a problem with any aspect of this book, you can contact us at <questions@packtpub.com>, and we will do our best to address the problem.

Chapter 1. Getting Started with Elasticsearch Cluster

Welcome to the wonderful world of Elasticsearch, a great full text search and analytics engine. It doesn’t matter if you are new to Elasticsearch and full text searches in general, or if you already have some experience in this. We hope that, by reading this book, you’ll be able to learn and extend your knowledge of Elasticsearch. As this book is also dedicated to beginners, we decided to start with a short introduction to full text searches in general, and after that, a brief overview of Elasticsearch.

Please remember that Elasticsearch is a rapidly changing piece of software. Not only are features added, but the Elasticsearch core functionality is also constantly evolving and changing. We try to keep up with these changes, and because of this we are giving you the third edition of the book dedicated to Elasticsearch 2.x.

The first thing we need to do with Elasticsearch is install and configure it. With many applications, you start with the installation and configuration and usually forget the importance of these steps. We will try to guide you through these steps so that it becomes easier to remember. In addition to this, we will show you the simplest way to index and retrieve data without going into too much detail. The first chapter will take you on a quick ride through Elasticsearch and the full text search world. By the end of this chapter, you will have learned the following topics:

Full text searching
The basics of Apache Lucene
Performing text analysis
The basic concepts of Elasticsearch
Installing and configuring Elasticsearch
Using the Elasticsearch REST API to manipulate data
Searching using basic URI requests

Full text searching

Back in the days when full text searching was a term known to a small percentage of engineers, most of us used SQL databases to perform search operations. Using SQL databases to search for the data stored in them was okay to some extent. Such a search wasn’t fast, especially on large amounts of data. Even now, small applications are usually good with a standard LIKE %phrase% search in a SQL database. However, as we go deeper and deeper, we start to see the limits of such an approach: a lack of scalability, not enough flexibility, and a lack of language analysis. Of course, there are additional modules that extend SQL databases with full text search capabilities, but they are still limited compared to dedicated full text search libraries and search engines such as Elasticsearch. Some of those reasons led to the creation of Apache Lucene (http://lucene.apache.org/), a library written completely in Java (http://java.com/en/), which is very fast, light, and provides language analysis for a large number of languages spoken throughout the world.
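To make the limits of a LIKE-based search concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: it uses the standard library's sqlite3 module as a stand-in SQL database, and the books table and like_search helper are hypothetical names, not anything from Elasticsearch or Lucene. It shows that a substring match finds edition but knows nothing about word forms such as editions.

```python
import sqlite3

# Sample titles reused from this chapter's inverted index example.
TITLES = [
    "Elasticsearch Server",
    "Mastering Elasticsearch Second Edition",
    "Apache Solr Cookbook Third Edition",
]


def like_search(titles, phrase):
    """Return the titles that contain `phrase` as a plain substring.

    Note: `%` and `_` inside `phrase` would act as LIKE wildcards; this
    sketch does not escape them.
    """
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    try:
        conn.execute("CREATE TABLE books (title TEXT)")  # hypothetical table
        conn.executemany(
            "INSERT INTO books (title) VALUES (?)", [(t,) for t in titles]
        )
        rows = conn.execute(
            "SELECT title FROM books WHERE title LIKE ? ORDER BY rowid",
            ("%" + phrase + "%",),
        ).fetchall()
    finally:
        conn.close()
    return [row[0] for row in rows]


# SQLite's LIKE ignores ASCII case, so "edition" finds both "Edition" titles.
print(like_search(TITLES, "edition"))
# The plural form finds nothing: LIKE compares characters, not word forms.
print(like_search(TITLES, "editions"))
```

A full text search library would typically analyze both the stored text and the query, so that related word forms can match; the LIKE operator has no such step.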

The Lucene glossary and architecture

Before going into the details of the analysis process, we would like to introduce you to the glossary and overall architecture of Apache Lucene. We decided that this information is crucial for understanding how Elasticsearch works, and even though the book is not about Apache Lucene, knowing the foundation of the Elasticsearch analytics and indexing engine is vital to fully understand how this great search engine works.

The basic concepts of the mentioned library are as follows:

Document: This is the main data carrier used during indexing and searching, comprising one or more fields that contain the data we put in and get from Lucene.
Field: This is a section of the document, which is built of two parts: the name and the value.
Term: This is a unit of search representing a word from the text.
Token: This is an occurrence of a term in the text of the field. It consists of the term text, start and end offsets, and a type.

Apache Lucene writes all the information to a structure called the inverted index. It is a data structure that maps the terms in the index to the documents and not the other way around as a relational database does in its tables. You can think of an inverted index as a data structure where data is term-oriented rather than document-oriented. Let’s see how a simple inverted index will look. For example, let’s assume that we have documents with only a single field called title to be indexed, and the values of that field are as follows:

Elasticsearch Server (document 1)
Mastering Elasticsearch Second Edition (document 2)
Apache Solr Cookbook Third Edition (document 3)

A very simplified visualization of the Lucene inverted index could look as follows:

Each term points to the documents it is present in, together with their count. For example, the term edition is present in two documents: the second and the third. Such a structure allows for very efficient and fast search operations in term-based queries (but not exclusively). Because the occurrences of the term are connected to the terms themselves, Lucene can use information about the term occurrences to perform fast and precise scoring, giving each returned document a value that represents how well it matched the query.
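To make the term-oriented layout concrete, here is a minimal sketch in Python that builds a toy inverted index from the three example titles. It only illustrates the idea: the function name build_inverted_index is ours, the analysis is a plain lowercase-and-split, and the real Lucene structures are far more elaborate.

```python
from collections import defaultdict

# The three example titles from the text, keyed by document number.
titles = {
    1: "Elasticsearch Server",
    2: "Mastering Elasticsearch Second Edition",
    3: "Apache Solr Cookbook Third Edition",
}

def build_inverted_index(docs):
    """Map each lowercased term to the sorted list of documents containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

inverted_index = build_inverted_index(titles)
for term in sorted(inverted_index):
    postings = inverted_index[term]
    print(f"{term:<15} count={len(postings)} docs={postings}")
```

Looking up a term is now a single dictionary access, which is why term-based queries are cheap in such a structure.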

Of course, the actual index created by Lucene is much more complicated and advanced because of additional files that include information such as term vectors (per document inverted index), doc values (column oriented field information), stored fields (the original and not the analyzed value of the field), and so on. However, all you need to know for now is how the data is organized and not what exactly is stored.

Each index is divided into multiple write-once and read-many-time structures called segments. Each segment is a miniature Apache Lucene index on its own. When indexing, after a single segment is written to the disk it can’t be updated, or we should rather say it can’t be fully updated; documents can’t be removed from it, they can only be marked as deleted in a separate file. The reason that Lucene doesn’t allow segments to be updated is the nature of the inverted index. After the fields are analyzed and put into the inverted index, there is no easy way of building the original document structure. When deleting, Lucene would have to delete the information from the segment, which translates to updating all the information within the inverted index itself.

Because of the fact that segments are write-once structures, Lucene is able to merge segments together in a process called segment merging. During indexing, if Lucene thinks that there are too many segments falling into the same criterion, a new and bigger segment will be created, one that will have data from the other segments. During that process, Lucene will try to remove deleted data and get back the space needed to hold information about those documents. Segment merging is a demanding operation both in terms of the I/O and CPU. What we have to remember for now is that searching with one large segment is faster than searching with multiple smaller ones holding the same data. That’s because, in general, searching translates to just matching the query terms to the ones that are indexed. You can imagine how searching through multiple small segments and merging those results will be slower than having a single segment preparing the results.
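The relationship between write-once segments, deletion marks, and merging can be sketched in a few lines of Python. This is a toy model under our own assumptions (the Segment class and merge function are hypothetical names, and a dictionary stands in for the on-disk files); it is not how Lucene implements segments.

```python
class Segment:
    """Write-once set of documents plus a separate, mutable set of deletion marks."""

    def __init__(self, docs):
        self._docs = dict(docs)   # never modified after construction
        self.deleted = set()      # plays the role of the separate deletes file

    def mark_deleted(self, doc_id):
        # The document stays in the segment; we only record that it is deleted.
        if doc_id in self._docs:
            self.deleted.add(doc_id)

    def live_docs(self):
        return {d: v for d, v in self._docs.items() if d not in self.deleted}

def merge(segments):
    """Build one bigger segment from the live documents only, reclaiming deleted space."""
    merged = {}
    for segment in segments:
        merged.update(segment.live_docs())
    return Segment(merged)

first = Segment({1: "Elasticsearch Server", 2: "Mastering Elasticsearch Second Edition"})
second = Segment({3: "Apache Solr Cookbook Third Edition"})
first.mark_deleted(2)
merged = merge([first, second])
print(sorted(merged.live_docs()))
```

Note that document 2 is still physically present in the first segment after mark_deleted; it only disappears once the merge writes a new segment.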


Input data analysis

The transformation of a document that comes to Lucene and is processed and put into the inverted index format is called indexing. One of the things Lucene has to do during this is data analysis. You may want some of your fields to be processed by a language analyzer so that words such as car and cars are treated as the same by your index. On the other hand, you may want other fields to be divided only on the whitespace character or be only lowercased.

Analysis is done by the analyzer, which is built of a tokenizer and zero or more token filters, and it can also have zero or more character mappers.

A tokenizer in Lucene is used to split the text into tokens, which are basically the terms with additional information such as their position in the original text and their length. The result of the tokenizer’s work is called a token stream, where the tokens are put one by one and are ready to be processed by the filters.

Apart from the tokenizer, the Lucene analyzer is built of zero or more token filters that are used to process tokens in the token stream. Some examples of filters are as follows:

Lowercase filter: Makes all the tokens lowercased
Synonyms filter: Changes one token to another on the basis of synonym rules
Language stemming filters: Responsible for reducing tokens (actually, the text part that they provide) into their root or base forms called the stem (https://en.wikipedia.org/wiki/Word_stem)

Filters are processed one after another, so we have almost unlimited analytical possibilities with the addition of multiple filters, one after another.

Finally, the character mappers operate on non-analyzed text; they are used before the tokenizer. Therefore, we can easily remove HTML tags from whole parts of text without worrying about tokenization.
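The order of the stages (character mappers, then the tokenizer, then the token filters) can be sketched as a small Python pipeline. All function names here are our own, and the plural-stripping filter is only a toy stand-in for a real stemmer; Lucene's analyzers are Java classes with a different API.

```python
import re

def strip_html(text):
    """Character mapper: runs on the raw text before tokenization."""
    return re.sub(r"<[^>]+>", " ", text)

def whitespace_tokenizer(text):
    """Tokenizer: emit (term, start_offset, end_offset) for each token.

    Offsets refer to the text as it looks after character mapping.
    """
    return [(m.group(), m.start(), m.end()) for m in re.finditer(r"\S+", text)]

def lowercase_filter(tokens):
    return [(term.lower(), start, end) for term, start, end in tokens]

def naive_plural_filter(tokens):
    """Toy stand-in for a stemmer: drop a trailing 's' from words longer than three letters."""
    return [
        (term[:-1] if len(term) > 3 and term.endswith("s") else term, start, end)
        for term, start, end in tokens
    ]

def analyze(text, char_mappers, tokenizer, token_filters):
    for mapper in char_mappers:
        text = mapper(text)
    tokens = tokenizer(text)
    for token_filter in token_filters:   # filters run one after another, in order
        tokens = token_filter(tokens)
    return tokens

tokens = analyze(
    "<b>Fast</b> Cars",
    char_mappers=[strip_html],
    tokenizer=whitespace_tokenizer,
    token_filters=[lowercase_filter, naive_plural_filter],
)
terms = [term for term, _, _ in tokens]
print(terms)
```

Swapping the order of the filters or adding new ones only requires changing the token_filters list, which mirrors how analysis chains are composed.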


Indexing and querying

You may wonder how all the information we’ve described so far affects indexing and querying when using Lucene and all the software that is built on top of it. During indexing, Lucene will use an analyzer of your choice to process the contents of your document; of course, different analyzers can be used for different fields, so the name field of your document can be analyzed differently compared to the summary field. For example, the name field may only be tokenized on whitespaces and lowercased, so that exact matches are possible, while the summary field is stemmed in addition to that. We can also decide to not analyze the fields at all. We have full control over the analysis process.

During a query, your query text can be analyzed as well. However, you can also choose not to analyze your queries. This is crucial to remember because some Elasticsearch queries are analyzed and some are not. For example, prefix and term queries are not analyzed, and match queries are analyzed (we will get to that in Chapter 3, Searching Your Data). Having queries that are analyzed and not analyzed is very useful; sometimes, you may want to query a field that is not analyzed, while sometimes you may want to have a full text search analysis. For example, if we search for the Light Red term and the query is being analyzed by the standard analyzer, then the terms that would be searched are light and red. If we use a query type that is not analyzed, then we will explicitly search for the Light Red term. We may not want to analyze the content of the query if we are only interested in exact matches.

What you should remember about indexing and querying analysis is that the terms in the index should match the terms in the query. If they don’t match, Lucene won’t return the desired documents. For example, if you use stemming and lowercasing during indexing, you need to ensure that the terms in the query are also lowercased and stemmed, or your queries won’t return any results at all. For example, let’s get back to our Light Red term that we analyzed during indexing; we have it as two terms in the index: light and red. If we run a Light Red query against that data and don’t analyze it, we won’t get the document in the results, because the query term does not match the indexed terms. It is important to keep the token filters in the same order during indexing and query time analysis so that the terms resulting from such an analysis are the same.
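The Light Red example can be reproduced with a few lines of Python. This is a sketch under simplifying assumptions: standard_like_analyze is our rough stand-in for the standard analyzer (lowercase and split on whitespace), and the index is a plain dictionary.

```python
def standard_like_analyze(text):
    """Rough stand-in for the standard analyzer: lowercase, split on whitespace."""
    return text.lower().split()

def keyword_analyze(text):
    """No analysis: the whole input is treated as a single term."""
    return [text]

# Index one document; its field value is analyzed into the terms light and red.
index = {}
for term in standard_like_analyze("Light Red"):
    index.setdefault(term, set()).add(1)

def search(query_text, analyzer):
    """Return the documents that contain every term the analyzer produces."""
    postings = [index.get(term, set()) for term in analyzer(query_text)]
    return set.intersection(*postings) if postings else set()

analyzed_hits = search("Light Red", standard_like_analyze)
unanalyzed_hits = search("Light Red", keyword_analyze)
print(analyzed_hits, unanalyzed_hits)
```

The analyzed query finds document 1, while the unanalyzed query looks for the single term Light Red, which was never written to the index.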


Scoring and query relevance

There is one additional thing that we have only mentioned once till now: scoring. What is the score of a document? The score is a result of a scoring formula that describes how well the document matches the query. By default, Apache Lucene uses the TF/IDF (term frequency/inverse document frequency) scoring mechanism, which is an algorithm that calculates how relevant the document is in the context of our query. Of course, it is not the only algorithm available, and we will mention other algorithms in the Mappings configuration section of Chapter 2, Indexing Your Data.

Note
If you want to read more about the Apache Lucene TF/IDF scoring formula, please visit the Apache Lucene Javadocs for the TFIDFSimilarity class, available at http://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html
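To give a feel for what TF/IDF rewards, here is a deliberately simplified sketch in Python. It assumes a square-root term frequency and a logarithmic inverse document frequency, and it leaves out the normalization and boosting factors of the full Lucene formula, so the numbers it produces are not the scores Elasticsearch would return.

```python
import math

# The example titles, already analyzed into lowercased terms.
docs = {
    1: "elasticsearch server".split(),
    2: "mastering elasticsearch second edition".split(),
    3: "apache solr cookbook third edition".split(),
}

def tf(term, doc_terms):
    """Term frequency: more occurrences in the document raise the score."""
    return math.sqrt(doc_terms.count(term))

def idf(term, all_docs):
    """Inverse document frequency: terms found in fewer documents weigh more."""
    doc_freq = sum(1 for terms in all_docs.values() if term in terms)
    return 1.0 + math.log(len(all_docs) / (doc_freq + 1))

def score(query_terms, doc_terms, all_docs):
    return sum(tf(t, doc_terms) * idf(t, all_docs) for t in query_terms)

query = ["mastering", "elasticsearch"]
ranking = sorted(docs, key=lambda d: score(query, docs[d], docs), reverse=True)
print(ranking)
```

Document 2 ranks first because it matches both query terms, and the rarer term mastering contributes more than the more common term elasticsearch.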


The basics of Elasticsearch

Elasticsearch is an open source search server project started by Shay Banon and published in February 2010. Since then, the project has grown into a major player in the field of search and data analysis solutions and is widely used in many common or lesser-known search and data analysis platforms. In addition, due to its distributed nature and real-time search and analytics capabilities, many organizations use it as a document store.


Key concepts of Elasticsearch

In the next few pages, we will get you through the basic concepts of Elasticsearch. You can skip this section if you are already familiar with Elasticsearch architecture. However, if you are not familiar with Elasticsearch, we strongly advise you to read this section. We will refer to the keywords used in this section in the rest of the book, and understanding those concepts is crucial to fully utilize Elasticsearch.

Index

An index is the logical place where Elasticsearch stores the data. Each index can be spread onto multiple Elasticsearch nodes and is divided into one or more smaller pieces called shards that are physically placed on the hard drives. If you are coming from the relational database world, you can think of an index like a table. However, the index structure is prepared for fast and efficient full text searching and, in particular, does not store original values. That structure is called an inverted index (https://en.wikipedia.org/wiki/Inverted_index).

If you know MongoDB, you can think of the Elasticsearch index as a collection in MongoDB. If you are familiar with CouchDB, you can think about an index as you would about the CouchDB database. Elasticsearch can hold many indices located on one machine or spread them over multiple servers. As we have already said, every index is built of one or more shards, and each shard can have many replicas.

Document

The main entity stored in Elasticsearch is a document. A document can have multiple fields, each having its own type and treated differently. Using the analogy to relational databases, a document is a row of data in a database table. When you compare an Elasticsearch document to a MongoDB document, you will see that both can have different structures. The thing to keep in mind when it comes to Elasticsearch is that fields that are common to multiple types in the same index need to have the same type. This means that all the documents with a field called title need to have the same data type for it, for example, string.

Documents consist of fields, and each field may occur several times in a single document (such a field is called multivalued). Each field has a type (text, number, date, and so on). The field types can also be complex: a field can contain other subdocuments or arrays. The field type is important to Elasticsearch because type determines how various operations such as analysis or sorting are performed. Fortunately, this can be determined automatically (however, we still suggest using mappings; take a look at what follows).

Unlike the relational databases, documents don’t need to have a fixed structure. Every document may have a different set of fields, and in addition to this, fields don’t have to be known during application development. Of course, one can force a document structure with the use of a schema. From the client’s point of view, a document is a JSON object (see more about the JSON format at https://en.wikipedia.org/wiki/JSON). Each document is stored in one index and has a document type and its own unique identifier, which can be generated automatically by Elasticsearch. The thing to remember is that the document identifier needs to be unique within a given type in an index. This means that, in a single index, two documents can have the same identifier if they are not of the same type.

Document type

In Elasticsearch, one index can store many objects serving different purposes. For example, a blog application can store articles and comments. The document type lets us easily differentiate between the objects in a single index. Every document can have a different structure, but in real-world deployments, dividing documents into types significantly helps in data manipulation. Of course, one needs to keep the limitations in mind. That is, different document types can’t set different types for the same property. For example, a field called title must have the same type across all document types in a given index.

Mapping

In the section about the basics of full text searching (the Full text searching section), we wrote about the process of analysis: the preparation of the input text for indexing and searching done by the underlying Apache Lucene library. Every field of the document must be properly analyzed depending on its type. For example, a different analysis chain is required for the numeric fields (numbers shouldn’t be sorted alphabetically) and for the text fetched from web pages (for example, the first step would require you to omit the HTML tags as it is useless information). To be able to properly analyze at indexing and querying time, Elasticsearch stores the information about the fields of the documents in so-called mappings. Every document type has its own mapping, even if we don’t explicitly define it.


Key concepts of the Elasticsearch infrastructure

Now, we already know that Elasticsearch stores its data in one or more indices and every index can contain documents of various types. We also know that each document has many fields and how Elasticsearch treats these fields is defined by the mappings. But there is more. From the beginning, Elasticsearch was created as a distributed solution that can handle billions of documents and hundreds of search requests per second. This is due to several important key features and concepts that we are going to describe in more detail now.

Nodes and clusters

Elasticsearch can work as a standalone, single search server. Nevertheless, to be able to process large sets of data and to achieve fault tolerance and high availability, Elasticsearch can be run on many cooperating servers. Collectively, these servers connected together are called a cluster and each server forming a cluster is called a node.

Shards

When we have a large number of documents, we may come to a point where a single node may not be enough, for example, because of RAM limitations, hard disk capacity, insufficient processing power, or an inability to respond to client requests fast enough. In such cases, an index (and the data in it) can be divided into smaller parts called shards (where each shard is a separate Apache Lucene index). Each shard can be placed on a different server, and thus your data can be spread among the cluster nodes. When you query an index that is built from multiple shards, Elasticsearch sends the query to each relevant shard and merges the results in such a way that your application doesn’t know about the shards. In addition to this, having multiple shards can speed up indexing, because documents end up in different shards and thus the indexing operation is parallelized.

Replicas

In order to increase query throughput or achieve high availability, shard replicas can be used. A replica is just an exact copy of the shard, and each shard can have zero or more replicas. In other words, Elasticsearch can have many identical shards and one of them is automatically chosen as a place where the operations that change the index are directed. This special shard is called a primary shard, and the others are called replica shards. When the primary shard is lost (for example, a server holding the shard data is unavailable), the cluster will promote the replica to be the new primary shard.

Gateway

The cluster state is held by the gateway, which stores the cluster state and indexed data across full cluster restarts. By default, every node has this information stored locally; it is synchronized among nodes. We will discuss the gateway module in The gateway and recovery modules section of Chapter 9, Elasticsearch Cluster in Detail.


Indexing and searching

You may wonder how you can tie all the indices, shards, and replicas together in a single environment. Theoretically, it would be very difficult to fetch data from the cluster when you have to know where your document is: on which server, and in which shard. Even more difficult would be searching when one query can return documents from different shards placed on different nodes in the whole cluster. In fact, this is a complicated problem; fortunately, we don’t have to care about this at all, because it is handled automatically by Elasticsearch. Let’s look at the following diagram:

When you send a new document to the cluster, you specify a target index and send it to any of the nodes. The node knows how many shards the target index has and is able to determine which shard should be used to store your document. This behavior can be altered; we will talk about this in the Introduction to routing section in Chapter 2, Indexing Your Data. The important information that you have to remember for now is that Elasticsearch calculates the shard in which the document should be placed using the unique identifier of the document. This is one of the reasons each document needs a unique identifier. After the indexing request is sent to a node, that node forwards the document to the target node, which hosts the relevant shard.

Now, let’s look at the following diagram on searching request execution:
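The idea of deriving the shard from the document identifier can be sketched as a hash of the identifier taken modulo the number of shards. The snippet below is an illustration only: pick_shard is a hypothetical helper, and zlib.crc32 is merely a convenient stable hash, not the function Elasticsearch uses internally.

```python
import zlib

def pick_shard(routing_value, number_of_shards):
    """Deterministically map a routing value (by default the document ID) to a shard."""
    # crc32 is a stand-in hash; it is stable across runs, unlike Python's built-in hash().
    return zlib.crc32(routing_value.encode("utf-8")) % number_of_shards

number_of_shards = 5
placement = {doc_id: pick_shard(doc_id, number_of_shards) for doc_id in ["1", "2", "book-42"]}
print(placement)
```

Because the result depends only on the identifier and the number of shards, any node can compute where a document lives without asking the others, both when indexing and when fetching it later.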


When you try to fetch a document by its identifier, the node you send the query to uses the same routing algorithm to determine the shard and the node holding the document and again forwards the request, fetches the result, and sends the result to you. On the other hand, the querying process is a more complicated one. The node receiving the query forwards it to all the nodes holding the shards that belong to a given index and asks for minimum information about the documents that match the query (the identifier and score are returned by default), unless routing is used, when the query will go directly to a single shard only. This is called the scatter phase. After receiving this information, the aggregator node (the node that receives the client request) sorts the results and sends a second request to get the documents that are needed to build the results list (all the other information apart from the document identifier and score). This is called the gather phase. After this phase is executed, the results are returned to the client.

Now the question arises: what is the replica’s role in the previously described process? While indexing, replicas are only used as an additional place to store the data. When executing a query, by default, Elasticsearch will try to balance the load among the shard and its replicas so that they are evenly stressed. Also, remember that we can change this behavior; we will discuss this in the Understanding the querying process section in Chapter 3, Searching Your Data.
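The two phases can be illustrated with a small in-memory model in Python. The shards, scores, and the scatter and gather function names below are all made up for the illustration; in a real cluster each phase is a round of network requests between nodes.

```python
import heapq

# Hypothetical shards: each maps a document ID to (score for the current query, stored document).
shards = [
    {"a": (0.9, {"title": "Elasticsearch Server"}),
     "b": (0.2, {"title": "Apache Solr Cookbook Third Edition"})},
    {"c": (0.7, {"title": "Mastering Elasticsearch Second Edition"}),
     "d": (0.4, {"title": "Some other title"})},
]

def scatter(shards, size):
    """Phase 1: each shard returns only (score, doc_id, shard_no) for its own top hits."""
    hits = []
    for shard_no, shard in enumerate(shards):
        local = [(score, doc_id, shard_no) for doc_id, (score, _) in shard.items()]
        hits.extend(heapq.nlargest(size, local))
    return hits

def gather(shards, hits, size):
    """Phase 2: sort globally, then fetch full documents only for the winners."""
    top = heapq.nlargest(size, hits)
    return [(doc_id, score, shards[shard_no][doc_id][1]) for score, doc_id, shard_no in top]

results = gather(shards, scatter(shards, size=2), size=2)
print(results)
```

Only identifiers and scores travel in the first phase, so the full documents are read from the shards just for the entries that make it into the final list.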


Installing and configuring your cluster

Installing and running Elasticsearch even in production environments is very easy nowadays, compared to how it was in the days of Elasticsearch 0.20.x. Getting from a system that is not ready to one running Elasticsearch takes only a few steps. We will explore these steps in the following sections.


Installing Java

Elasticsearch is a Java application and to use it we need to make sure that the Java SE environment is installed properly. Elasticsearch requires Java Version 7 or later to run. You can download it from http://www.oracle.com/technetwork/java/javase/downloads/index.html. You can also use OpenJDK (http://openjdk.java.net/) if you wish. You can, of course, use Java Version 7, but it is not supported by Oracle anymore, at least without commercial support. For example, you can’t expect new, patched versions of Java 7 to be released. Because of this, we strongly suggest that you install Java 8, especially given that Java 9 seems to be right around the corner with the general availability planned to be released in September 2016.


Installing Elasticsearch

To install Elasticsearch you just need to go to https://www.elastic.co/downloads/elasticsearch, choose the latest stable version of Elasticsearch, download it, and unpack it. That’s it! The installation is complete.

Note
At the time of writing, we used a snapshot of Elasticsearch 2.2. This means that we’ve skipped describing some properties that were marked as deprecated and are or will be removed in the future versions of Elasticsearch.

The main interface to communicate with Elasticsearch is based on the HTTP protocol and REST. This means that you can even use a web browser for some basic queries and requests, but for anything more sophisticated you’ll need to use additional software, such as the cURL command. If you use Linux or OS X, the cURL command should already be available. If you use Windows, you can download the package from http://curl.haxx.se/download.html.


Running Elasticsearch

Let’s run our first instance that we just downloaded as the ZIP archive and unpacked. Go to the bin directory and run the following commands depending on the OS:

Linux or OS X: ./elasticsearch
Windows: elasticsearch.bat

Congratulations! Now, you have your Elasticsearch instance up-and-running. During its work, the server usually uses two port numbers: the first one for communication with the REST API using the HTTP protocol, and the second one for the transport module used for communication in a cluster and between the native Java client and the cluster. The default port used for the HTTP API is 9200, so we can check search readiness by pointing the web browser to http://127.0.0.1:9200/. The browser should show a code snippet similar to the following:

    {
      "name" : "Blob",
      "cluster_name" : "elasticsearch",
      "version" : {
        "number" : "2.2.0",
        "build_hash" : "5b1dd1cf5a1957682d84228a569e124fedf8e325",
        "build_timestamp" : "2016-01-13T18:12:26Z",
        "build_snapshot" : true,
        "lucene_version" : "5.4.0"
      },
      "tagline" : "You Know, for Search"
    }

The output is structured as a JavaScript Object Notation (JSON) object. If you are not familiar with JSON, please take a minute and read the article available at https://en.wikipedia.org/wiki/JSON.

Note
Elasticsearch is smart. If the default port is not available, the engine binds to the next free port. You can find information about this on the console during booting as follows:

    [2016-01-13 20:04:49,953][INFO][http][Blob] publish_address {127.0.0.1:9201}, bound_addresses {[fe80::1]:9200}, {[::1]:9200}, {127.0.0.1:9201}

Note the fragment with [http]. Elasticsearch uses a few ports for various tasks. The interface that we are using is handled by the HTTP module.

Now, we will use the cURL program to communicate with Elasticsearch. For example, to check the cluster health, we will use the following command:

    curl -XGET http://127.0.0.1:9200/_cluster/health?pretty

The -X parameter is a definition of the HTTP request method. The default value is GET (so in this example, we can omit this parameter). For now, do not worry about the GET value; we will describe it in more detail later in this chapter.
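If you prefer to issue the same request from a script rather than from cURL, the following sketch uses only the Python standard library. The helper names cluster_health_url and fetch_cluster_health are ours, and the fetch function naturally needs a node listening on the given address, which is why the example only prints the URL by default.

```python
import json
import urllib.request

def cluster_health_url(host="127.0.0.1", port=9200, pretty=True):
    """Build the cluster health URL used in the text."""
    url = f"http://{host}:{port}/_cluster/health"
    return url + "?pretty" if pretty else url

def fetch_cluster_health(host="127.0.0.1", port=9200, timeout=5):
    """Send a GET request and parse the JSON body (requires a running node)."""
    url = cluster_health_url(host, port, pretty=False)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))

print(cluster_health_url())
# With a node running locally, you could inspect the response, for example:
# print(fetch_cluster_health()["status"])
```

urlopen sends a GET request when no request body is given, which matches the default behavior of cURL described above.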


As a standard, the API returns information in a JSON object in which new line characters are omitted. The pretty parameter added to our requests forces Elasticsearch to add new line characters to the response, making the response more user-friendly. You can try running the preceding query with and without the ?pretty parameter to see the difference.

Elasticsearch is useful in small and medium-sized applications, but it has been built with large clusters in mind. So, now we will set up our big two-node cluster. Unpack the Elasticsearch archive in a different directory and run the second instance. If we look at the log, we will see the following:

    [2016-01-13 20:07:58,561][INFO][cluster.service][BigMan] detected_master {Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300}, added {{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300},}, reason: zen-disco-receive(from master [{Blob}{5QPh00RUQraeLHAInbR4Jw}{127.0.0.1}{127.0.0.1:9300}])

This means that our second instance (named BigMan) discovered the previously running instance (named Blob). Here, Elasticsearch automatically formed a new two-node cluster. Starting from Elasticsearch 2.0, this will only work with nodes running on the same physical machine, because Elasticsearch 2.0 no longer supports multicast. To allow your cluster to form across machines, you need to inform Elasticsearch about the nodes that should be contacted initially using the discovery.zen.ping.unicast.hosts array in elasticsearch.yml. For example, like this:

    discovery.zen.ping.unicast.hosts: ["192.168.2.1", "192.168.2.2"]


Shutting down Elasticsearch

Even though we expect our cluster (or node) to run flawlessly for a lifetime, we may need to restart it or shut it down properly (for example, for maintenance). The following are the two ways in which we can shut down Elasticsearch:

If your node is attached to the console, just press Ctrl + C
The second option is to kill the server process by sending the TERM signal (see the kill command on the Linux boxes and Task Manager on Windows)

Note
The previous versions of Elasticsearch exposed a dedicated shutdown API but, in 2.0, this option has been removed because of security reasons.


The directory layout

Now, let’s go to the newly created directory. We should see the following directory structure:

Directory   Description
bin         The scripts needed to run Elasticsearch instances and for plugin management
config      The directory where configuration files are located
lib         The libraries used by Elasticsearch
modules     The plugins bundled with Elasticsearch

After Elasticsearch starts, it will create the following directories (if they don’t exist):

Directory   Description
data        The directory used by Elasticsearch to store all the data
logs        The files with information about events and errors
plugins     The location to store the installed plugins
work        The temporary files used by Elasticsearch


Configuring Elasticsearch

One of the reasons (of course, not the only one) why Elasticsearch is gaining more and more popularity is that getting started with Elasticsearch is quite easy. Because of the reasonable default values and automatic settings for simple environments, we can skip the configuration and go straight to indexing and querying (or to the next chapter of the book). We can do all this without changing a single line in our configuration files. However, in order to truly understand Elasticsearch, it is worth understanding some of the available settings.

We will now explore the default directories and the layout of the files provided with the Elasticsearch tar.gz archive. The entire configuration is located in the config directory. We can see two files here: elasticsearch.yml (or elasticsearch.json, which will be used if present) and logging.yml. The first file is responsible for setting the default configuration values for the server. This is important because some of these values can be changed at runtime and can be kept as a part of the cluster state, so the values in this file may not be accurate. The two values that we cannot change at runtime are cluster.name and node.name.

The cluster.name property is responsible for holding the name of our cluster. The cluster name separates different clusters from each other. Nodes configured with the same cluster name will try to form a cluster.

The second value is the instance name (the node.name property). We can leave this parameter undefined. In this case, Elasticsearch automatically chooses a unique name for itself. Note that this name is chosen during each startup, so the name can be different on each restart. Defining the name can be helpful when referring to concrete instances by the API or when using monitoring tools to see what is happening to a node during long periods of time and between restarts. Think about giving descriptive names to your nodes.

Other parameters are commented well in the file, so we advise you to look through it; don’t worry if you do not understand the explanation. We hope that everything will become clearer after reading the next few chapters.

Note
Remember that most of the parameters that have been set in the elasticsearch.yml file can be overwritten with the use of the Elasticsearch REST API. We will talk about this API in The update settings API section of Chapter 9, Elasticsearch Cluster in Detail.

The second file (logging.yml) defines how much information is written to system logs, defines the log files, and creates new files periodically. Changes in this file are usually required only when you need to adapt to monitoring or backup solutions or during system debugging; however, if you want to have a more detailed logging, you need to adjust it accordingly.

Let’s leave the configuration files for now and look at the base for all the applications: the operating system. Tuning your operating system is one of the key points to ensure that your Elasticsearch instance will work well. During indexing, especially when having many shards and replicas, Elasticsearch will create many files; so, the system should not limit the open file descriptors to less than 32,000. For Linux servers, this can usually be changed in /etc/security/limits.conf and the current value can be displayed using the ulimit command. If you end up reaching the limit, Elasticsearch will not be able to create new files; so merging will fail, indexing may fail, and new indices will not be created.
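On Linux and other Unix-like systems, the limit seen by a process can also be checked programmatically. The sketch below uses the resource module from the Python standard library (not available on Windows); the helper names and the constant are ours, with the threshold taken from the text.

```python
import resource

RECOMMENDED_MIN_OPEN_FILES = 32000  # threshold mentioned in the text

def open_file_limits():
    """Return the (soft, hard) open file limits for the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def is_limit_sufficient(soft_limit, minimum=RECOMMENDED_MIN_OPEN_FILES):
    # RLIM_INFINITY means "no limit", which trivially satisfies any minimum.
    return soft_limit == resource.RLIM_INFINITY or soft_limit >= minimum

soft, hard = open_file_limits()
print(f"soft={soft} hard={hard} sufficient={is_limit_sufficient(soft)}")
```

Keep in mind that this reports the limits of the Python process itself; the Elasticsearch process may run under a different user with different limits.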

Note
On Microsoft Windows platforms, the default limit is more than 16 million handles per process, which should be more than enough. You can read more about file handles on the Microsoft Windows platform at https://blogs.technet.microsoft.com/markrussinovich/2009/09/29/pushing-the-limits-of-windows-handles/.

The next set of settings is connected to the Java Virtual Machine (JVM) heap memory limit for a single Elasticsearch instance. For small deployments, the default memory limit (1,024 MB) will be sufficient, but for large ones it will not be enough. If you spot entries that indicate OutOfMemoryError exceptions in a log file, set the ES_HEAP_SIZE variable to a value greater than 1024. When choosing the right amount of memory size to be given to the JVM, remember that, in general, no more than 50 percent of your total system memory should be given. However, as with all the rules, there are exceptions. We will discuss this in greater detail later, but you should always monitor your JVM heap usage and adjust it when needed.


The system-specific installation and configuration

Although downloading an archive with Elasticsearch and unpacking it works and is convenient for testing, there are dedicated methods for Linux operating systems that give you several advantages when you do production deployment. In production deployments, the Elasticsearch service should be run automatically with a system boot; we should have dedicated start and stop scripts, unified paths, and so on. Elasticsearch supports installation packages for various Linux distributions that we can use. Let’s see how this works.

Installing Elasticsearch on Linux

The other way to install Elasticsearch on a Linux operating system is to use packages such as RPM or DEB, depending on your Linux distribution and the supported package type. This way we can automatically adapt to system directory layout; for example, configuration and logs will go into their standard places in the /etc/ or /var/log directories. But this is not the only thing. When using packages, Elasticsearch will also install startup scripts and make our life easier. What’s more, we will be able to upgrade Elasticsearch easily by running a single command from the command line. Of course, the mentioned packages can be found at the same URL address as we mentioned previously when we talked about installing Elasticsearch from zip or tar.gz packages: https://www.elastic.co/downloads/elasticsearch. Elasticsearch can also be installed from remote repositories via standard distribution tools such as apt-get or yum.

Note
Before installing Elasticsearch, make sure that you have a proper version of Java Virtual Machine installed.

Installing Elasticsearch using RPM packages

When using a Linux distribution that supports RPM packages, such as Fedora Linux (https://getfedora.org/), Elasticsearch installation is very easy. After downloading the RPM package, we just need to run the following command as root:

    yum install elasticsearch-2.2.0.noarch.rpm

Alternatively, you can add the remote repository and install Elasticsearch from it. The first step is to import the GPG key (this command needs to be run as root as well):

    rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch

This command adds the GPG key and allows the system to verify that the fetched package really comes from Elasticsearch developers. In the second step, we need to create the repository definition in the /etc/yum.repos.d/elasticsearch.repo file. We need to add the following entries to this file:

    [elasticsearch-2.2]
    name=Elasticsearch repository for 2.2.x packages
    baseurl=http://packages.elastic.co/elasticsearch/2.x/centos
    gpgcheck=1
    gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
    enabled=1

Now it’s time to install the Elasticsearch server, which is as simple as running the following command (again, don’t forget to run it as root):

    yum install elasticsearch

Elasticsearch will be automatically downloaded, verified, and installed.

Installing Elasticsearch using the DEB package

When using a Linux distribution that supports DEB packages (such as Debian), installing Elasticsearch is again very easy. After downloading the DEB package, all you need to do is run the following command:

    sudo dpkg -i elasticsearch-2.2.0.deb

It is as simple as that. Another way, which is similar to what we did with RPM packages, is by creating a new packages source and installing Elasticsearch from the remote repository. The first step is to add the public GPG key used for package verification. We can do that using the following command:

    wget -qO - https://packages.elastic.co/GPG-KEY-elasticsearch | sudo apt-key add -

The second step is to add the DEB package location. We need to add the following line to the /etc/apt/sources.list file:

    deb http://packages.elastic.co/elasticsearch/2.2/debian stable main

This defines the source for the Elasticsearch packages. The last step is updating the list of remote packages and installing Elasticsearch using the following command:

    sudo apt-get update && sudo apt-get install elasticsearch

Elasticsearch configuration file localization

When using packages to install Elasticsearch, the configuration files are in slightly different directories than the default config directory. After the installation, the configuration files should be stored in the following locations:

/etc/sysconfig/elasticsearch or /etc/default/elasticsearch: A file with the configuration of the Elasticsearch process, such as the user to run as, the directories for logs and data, and memory settings
/etc/elasticsearch/: A directory for the Elasticsearch configuration files, such as the elasticsearch.yml file

Configuring Elasticsearch as a system service on Linux

If everything goes well, you can run Elasticsearch using the following command:

    /bin/systemctl start elasticsearch.service

If you want Elasticsearch to start automatically every time the operating system starts, you can set up Elasticsearch as a system service by running the following command:

    /bin/systemctl enable elasticsearch.service

Elasticsearch as a system service on Windows

Installing Elasticsearch as a system service on Windows is also very easy. You just need to go to your Elasticsearch installation directory, then go to the bin subdirectory, and run the following command:

    service.bat install

You’ll be asked for permission to do so. If you allow the script to run, Elasticsearch will be installed as a Windows service.

If you would like to see all the commands exposed by the service.bat script file, just run the following command in the same directory as earlier:

    service.bat

For example, to start Elasticsearch, we will just run the following command:

    service.bat start


Manipulating data with the REST API

Elasticsearch exposes a very rich REST API that can be used to search through the data, index the data, and control Elasticsearch behavior. You can imagine that using the REST API allows you to get a single document, index or update a document, get the information on the current state of Elasticsearch, create or delete indices, or force Elasticsearch to move around shards of your indices. Of course, these are only examples that show what you can expect from the Elasticsearch REST API. For now, we will concentrate on using the create, retrieve, update, delete (CRUD) part of the Elasticsearch API (https://en.wikipedia.org/wiki/Create,_read,_update_and_delete), which allows us to use Elasticsearch in a fashion similar to how we would use any other NoSQL (https://en.wikipedia.org/wiki/NoSQL) data store.


Understanding the REST API

If you’ve never used an application exposing the REST API, you may be surprised how easy it is to use such applications and remember how to use them. In REST-like architectures, every request is directed to a concrete object indicated by a path in the address. For example, let’s assume that our hypothetical application exposes the /books REST end-point as a reference to the list of books. In such case, a call to /books/1 could be a reference to a concrete book with the identifier 1. You can think of it as a data-oriented model of an API. Of course, we can nest the paths; for example, a path such as /books/1/chapters could return the list of chapters of our book with identifier 1 and a path such as /books/1/chapters/6 could be a reference to the sixth chapter in that particular book.

We talked about paths, but when using the HTTP protocol (https://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol), we have some additional verbs (such as POST, GET, PUT, and so on) that we can use to define system behavior in addition to paths. So if we would like to retrieve the book with identifier 1, we would use the GET request method with the /books/1 path. However, we would use the PUT request method with the same path to create a book record with the identifier of 1, the POST request method to alter the record, DELETE to remove that entry, and the HEAD request method to get basic information about the data referenced by the path.

Now, let’s look at example HTTP requests that are sent to real Elasticsearch REST API endpoints, so the preceding hypothetical information will be turned into something real:

GET http://localhost:9200/: This retrieves basic information about Elasticsearch, such as the version, the name of the node that the command has been sent to, the name of the cluster that node is connected to, the Apache Lucene version, and so on.

GET http://localhost:9200/_cluster/state/nodes/: This retrieves information about all the nodes in the cluster, such as their identifiers, names, transport addresses with ports, and additional node attributes for each node.

DELETE http://localhost:9200/books/book/123: This deletes a document that is indexed in the books index, with the book type and an identifier of 123.

We now know what REST means and we can start concentrating on Elasticsearch to see how we can store, retrieve, alter, and delete the data from its indices. If you would like to read more about REST, please refer to http://en.wikipedia.org/wiki/Representational_state_transfer.
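To make the verb-and-path idea concrete, here is a minimal Python sketch for the hypothetical /books endpoint discussed above. The function name rest_request and the action names are illustrative assumptions, not part of any real API; the sketch only pairs an action with an HTTP method and builds the nested path.

```python
# Map CRUD-style actions to HTTP methods, as described for the hypothetical /books endpoint.
HTTP_METHODS = {
    "retrieve": "GET",
    "create": "PUT",
    "alter": "POST",
    "remove": "DELETE",
    "info": "HEAD",
}


def rest_request(action, *path_parts):
    """Return the (HTTP method, path) pair for an action on a possibly nested resource."""
    path = "/" + "/".join(str(part) for part in path_parts)
    return HTTP_METHODS[action], path


print(rest_request("retrieve", "books", 1))
print(rest_request("remove", "books", 1, "chapters", 6))
```

The same resource path is reused for every action; only the HTTP method changes, which is the point of the data-oriented model.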


Storing data in Elasticsearch

In Elasticsearch, every document is represented by three attributes: the index, the type, and the identifier. Each document must be indexed into a single index, needs to have its type correspond to the document structure, and is described by the identifier. These three attributes allow us to identify any document in Elasticsearch and need to be provided when the document is physically written to the underlying Apache Lucene index. With this knowledge, we are now ready to create our first Elasticsearch document.

Creating a new document

We will start learning the Elasticsearch REST API by indexing one document. Let’s imagine that we are building a CMS (http://en.wikipedia.org/wiki/Content_management_system) that will provide the functionality of a blogging platform for our internal users. We will have different types of documents in our indices, but the most important ones are the articles that will be published and are readable by users.

Because we talk to Elasticsearch using JSON notation and Elasticsearch responds to us again using JSON, our example document could look as follows:

{
  "id": "1",
  "title": "New version of Elasticsearch released!",
  "content": "Version 2.2 released today!",
  "priority": 10,
  "tags": ["announce", "elasticsearch", "release"]
}

As you can see in the preceding code snippet, the JSON document is built with a set of fields, where each field can have a different format. In our example, we have a set of text fields (id, title, and content), we have a number (the priority field), and an array of text values (the tags field). We will show documents that are more complicated in the next examples.

Note

One of the changes introduced in Elasticsearch 2.0 has been that field names can’t contain the dot character. Such field names were possible in older versions of Elasticsearch, but could result in serialization errors in certain cases and thus Elasticsearch creators decided to remove that possibility.

One thing to remember is that by default Elasticsearch works as a schema-less data store. This means that it can try to guess the type of the field in a document sent to Elasticsearch. It will try to use numeric types for the values that are not enclosed in quotation marks and strings for data enclosed in quotation marks. It will also try to detect dates and index them in dedicated fields, and so on. This is possible because the JSON format is semi-typed. Internally, when the first document with a new field is sent to Elasticsearch, it will be processed and mappings will be written (we will talk more about mappings in the Mappings configuration section of Chapter 2, Indexing Your Data).


Note

A schema-less approach and dynamic mappings can be problematic when documents come with a slightly different structure; for example, the first document would contain the value of the priority field without quotation marks (like the one shown in the discussed example), while the second document would have quotation marks for the value in the priority field. This will result in an error because Elasticsearch will try to put a text value in the numeric field and this is not possible in Lucene. Because of this, it is advisable to define your own mappings, which you will learn in the Mappings configuration section of Chapter 2, Indexing Your Data.
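The type-guessing problem described in the note can be illustrated with a small Python sketch. This is a deliberately simplified model, not the logic Elasticsearch actually uses: the names guess_type and index_document and the type labels are assumptions made for the example, and real dynamic mapping covers many more cases.

```python
import json


def guess_type(value):
    """Very rough stand-in for dynamic mapping: JSON value -> field type name."""
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    return "object"


def index_document(mappings, raw_json):
    """Record a type for each new field; reject text sent to a field first seen as a number."""
    for field, value in json.loads(raw_json).items():
        # For arrays, guess from the first element.
        sample = value[0] if isinstance(value, list) and value else value
        guessed = guess_type(sample)
        known = mappings.setdefault(field, guessed)
        if known in ("long", "double") and guessed == "string":
            raise ValueError("field %r is mapped as %s but got text" % (field, known))


mappings = {}
index_document(mappings, '{"title": "First", "priority": 10}')
print(mappings)
try:
    index_document(mappings, '{"title": "Second", "priority": "high"}')
except ValueError as error:
    print("rejected:", error)
```

The first document fixes the type of the priority field; the second one is rejected because the guessed type no longer fits, which is why explicit mappings are the safer choice.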

Let’s now index our document and make it available for retrieval and searching. We will index our articles to an index called blog under a type named article. We will also give our document an identifier of 1, as this is our first document. To index our example document, we will execute the following command:

curl -XPUT 'http://localhost:9200/blog/article/1' -d '{"title": "New version of Elasticsearch released!", "content": "Version 2.2 released today!", "priority": 10, "tags": ["announce", "elasticsearch", "release"]}'

Note a new option to the curl command, the -d parameter. The value of this option is the text that will be used as a request payload, that is, a request body. This way, we can send additional information such as the document definition. Also, note that the unique identifier is placed in the URL and not in the body. If you omit this identifier (while using the HTTP PUT request), the indexing request will return the following error:

No handler found for uri [/blog/article] and method [PUT]

If everything worked correctly, Elasticsearch will return a JSON response informing us about the status of the indexing operation. This response should be similar to the following one:

{

"_index":"blog",

"_type":"article",

"_id":"1",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0},

"created":true

}

In the preceding response, Elasticsearch included information about the status of the operation, index, type, identifier, and version. We can also see information about the shards that took part in the operation: all of them, the ones that were successful, and the ones that failed.
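For readers who prefer a programming language over curl, the same indexing request can be prepared with the Python standard library. The following is only a sketch: the helper name build_index_request and the default host are assumptions, and the request is built but deliberately not sent, so no running node is needed to try it.

```python
import json
import urllib.request


def build_index_request(index, doc_type, doc_id, document, host="http://localhost:9200"):
    """Prepare (but do not send) a PUT request that indexes a document under an explicit identifier."""
    url = "%s/%s/%s/%s" % (host, index, doc_type, doc_id)
    body = json.dumps(document).encode("utf-8")
    return urllib.request.Request(url, data=body, method="PUT")


article = {
    "title": "New version of Elasticsearch released!",
    "content": "Version 2.2 released today!",
    "priority": 10,
    "tags": ["announce", "elasticsearch", "release"],
}
request = build_index_request("blog", "article", 1, article)
print(request.get_method(), request.full_url)
# Sending it would be urllib.request.urlopen(request), which requires a running node.
```

As in the curl command, the identifier is part of the URL and the document itself travels in the request body.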

Automatic identifier creation

In the previous example, we specified the document identifier manually when we were sending the document to Elasticsearch. However, there are use cases when we don’t have an identifier for our documents, for example, when handling logs as our data. In such cases, we would like some application to create the identifier for us and Elasticsearch can be such an application. Of course, generating document identifiers doesn’t make sense when your documents already have them, such as data in a relational database. In such cases, you may want to update the documents later, so automatic identifier generation is not the best idea. However, when we are in need of such functionality, instead of using the HTTP PUT method we can use POST and omit the identifier in the REST API path. So if we would like Elasticsearch to generate the identifier in the previous example, we would send a command like this:

curl -XPOST 'http://localhost:9200/blog/article/' -d '{"title": "New version of Elasticsearch released!", "content": "Version 2.2 released today!", "priority": 10, "tags": ["announce", "elasticsearch", "release"]}'

We’ve used the HTTP POST method instead of PUT and we’ve omitted the identifier. The response produced by Elasticsearch in such a case would be as follows:

{

"_index":"blog",

"_type":"article",

"_id":"AU1y-s6w2WzST_RhTvCJ",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0},

"created":true

}

As you can see, the response returned by Elasticsearch is almost the same as in the previous example, with a minor difference: the value of the _id field. Now, instead of the 1 value, we have a value of AU1y-s6w2WzST_RhTvCJ, which is the identifier Elasticsearch generated for our document.
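If an application needs to create such identifiers on its own, something similar in shape can be produced with a few lines of Python. This is an illustrative assumption only: it is not the algorithm Elasticsearch uses internally, it merely yields a 20-character, URL-safe value like the one shown above.

```python
import base64
import os


def generate_identifier():
    """Return a 20-character URL-safe identifier built from 15 random bytes."""
    # 15 bytes encode to exactly 20 Base64 characters, so no '=' padding appears.
    return base64.urlsafe_b64encode(os.urandom(15)).decode("ascii")


print(generate_identifier())
```

Because the value is random, it is safe to place in a URL path but carries no meaning, which is fine for data such as logs.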


Retrieving documents

We now have two documents indexed into our Elasticsearch instance: one using an explicit identifier and one using a generated identifier. Let’s now try to retrieve one of the documents using its unique identifier. To do this, we will need information about the index the document is indexed in, what type it has, and of course what identifier it has. For example, to get the document from the blog index with the article type and the identifier of 1, we would run the following HTTP GET request:

curl -XGET 'localhost:9200/blog/article/1?pretty'

Note

The additional URI property called pretty tells Elasticsearch to include new line characters and additional white spaces in the response to make the output easier to read for users.

Elasticsearch will return a response similar to the following:

{
  "_index": "blog",
  "_type": "article",
  "_id": "1",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "New version of Elasticsearch released!",
    "content": "Version 2.2 released today!",
    "priority": 10,
    "tags": ["announce", "elasticsearch", "release"]
  }
}

As you can see in the preceding response, Elasticsearch returned the _source field, which is the original document sent to Elasticsearch, and a few additional fields that tell us about the document, such as the index, type, identifier, document version, and of course information as to whether the document was found or not (the found property).

If we try to retrieve a document that is not present in the index, such as the one with the 12345 identifier, we get a response like this:

{

"_index":"blog",

"_type":"article",

"_id":"12345",

"found":false

}

As you can see, this time the value of the found property was set to false and there was no _source field because the document has not been retrieved.
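Client code usually has to handle both kinds of response. A minimal Python sketch, assuming the response body is available as text (the function name extract_source is hypothetical), could look as follows:

```python
import json


def extract_source(response_text):
    """Return the original document from a GET response, or None when it was not found."""
    response = json.loads(response_text)
    if not response.get("found", False):
        return None
    return response["_source"]


found = '{"_index":"blog","_type":"article","_id":"1","_version":1,"found":true,"_source":{"priority":10}}'
missing = '{"_index":"blog","_type":"article","_id":"12345","found":false}'
print(extract_source(found))
print(extract_source(missing))
```

Checking the found property first avoids a lookup of the _source field, which is absent for missing documents.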


Updating documents

Updating documents in the index is a more complicated task compared to indexing. When the document is indexed and Elasticsearch flushes the document to disk, it creates segments, immutable structures that are written once and read many times. This is done because the inverted index created by Apache Lucene is currently impossible to update (at least most of its parts). To update a document, Elasticsearch internally first fetches the document using the GET request, modifies its _source field, removes the old document, and indexes a new document using the updated content. The content update is done using scripts in Elasticsearch (we will talk more about scripting in Elasticsearch in the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better).

Note

Please note that the following document update examples require you to put the script.inline: on property into your elasticsearch.yml configuration file. This is needed because inline scripting is disabled in Elasticsearch by default for security reasons. The other way to handle updates is to store the script content in a file in the Elasticsearch configuration directory, but we will talk about that in the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better.

Let’s now try to update our document with identifier 1 by modifying its content field to contain the This is the updated document sentence. To do this, we need to run a POST HTTP request on the document path using the _update REST end-point. Our request to modify the document would look as follows:

curl -XPOST 'http://localhost:9200/blog/article/1/_update' -d '{
  "script": "ctx._source.content = new_content",
  "params": {
    "new_content": "This is the updated document"
  }
}'

As you can see, we’ve sent the request to the /blog/article/1/_update REST end-point. In the request body, we’ve provided two parameters: the update script in the script property and the parameters of the script. The script is very simple; it takes the _source field and modifies the content field by setting its value to the value of the new_content parameter. The params property contains all the script parameters.

For the preceding update command execution, Elasticsearch would return the following response:

{"_index":"blog","_type":"article","_id":"1","_version":2,"_shards":{"total":2,"successful":1,"failed":0}}

The thing to look at in the preceding response is the _version field. Right now, the version is 2, which means that the document has been updated (or re-indexed) once. Basically, each update makes Elasticsearch update the _version field.

We could also update the document using the doc section and providing the changed field, for example:

curl -XPOST 'http://localhost:9200/blog/article/1/_update' -d '{
  "doc": {
    "content": "This is the updated document"
  }
}'

We now retrieve the document using the following command:

curl -XGET 'http://localhost:9200/blog/article/1?pretty'

And we get the following response from Elasticsearch:

{
  "_index": "blog",
  "_type": "article",
  "_id": "1",
  "_version": 2,
  "found": true,
  "_source": {
    "title": "New version of Elasticsearch released!",
    "content": "This is the updated document",
    "priority": 10,
    "tags": ["announce", "elasticsearch", "release"]
  }
}

As you can see, the document has been updated properly.

Note

The thing to remember when using the update API of Elasticsearch is that the _source field needs to be present because this is the field that Elasticsearch uses to retrieve the original document content from the index. By default, that field is enabled and Elasticsearch uses it to store the original document.
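The fetch, modify, and re-index cycle described in this section can be mimicked with a toy in-memory class. This is a sketch under stated assumptions: ToyIndex and its methods are invented for illustration and say nothing about how Elasticsearch or Lucene store data; they only show why an update needs the stored _source and why it produces a new version.

```python
import copy


class ToyIndex:
    """In-memory stand-in for an index; each entry keeps a version and a _source."""

    def __init__(self):
        self.docs = {}

    def index(self, doc_id, source):
        """Store a document, replacing any older copy and bumping the version."""
        version = self.docs.get(doc_id, {}).get("_version", 0) + 1
        self.docs[doc_id] = {"_version": version, "_source": copy.deepcopy(source)}
        return version

    def update(self, doc_id, change):
        """Fetch the stored _source, apply the change, and re-index the result."""
        if doc_id not in self.docs:
            raise KeyError("document missing: %s" % doc_id)
        source = copy.deepcopy(self.docs[doc_id]["_source"])  # fetch
        change(source)                                         # modify
        return self.index(doc_id, source)                      # re-index as a new version


def set_content(source):
    source["content"] = "This is the updated document"


blog = ToyIndex()
blog.index("1", {"title": "New version of Elasticsearch released!", "content": "Version 2.2 released today!"})
print(blog.update("1", set_content))
print(blog.docs["1"]["_source"]["content"])
```

Without the stored source there would be nothing to modify, which is the reason the _source field has to stay enabled for updates to work.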

Dealing with non-existing documents

The nice thing when it comes to document updates, which we would like to mention as it can come in handy when using the Elasticsearch Update API, is that we can define what Elasticsearch should do when the document we try to update is not present.

For example, let’s try incrementing the priority field value for a non-existing document with identifier 2:

curl -XPOST 'http://localhost:9200/blog/article/2/_update' -d '{
  "script": "ctx._source.priority += 1"
}'

The response returned by Elasticsearch would look more or less as follows:

{"error":{"root_cause":[{"type":"document_missing_exception","reason":"[article][2]: document missing","shard":"2","index":"blog"}],"type":"document_missing_exception","reason":"[article][2]: document missing","shard":"2","index":"blog"},"status":404}

As you can imagine, the document has not been updated because it doesn’t exist. So now, let’s modify our request to include the upsert section in our request body that will tell Elasticsearch what to do when the document is not present. The new command would look as follows:

curl -XPOST 'http://localhost:9200/blog/article/2/_update' -d '{
  "script": "ctx._source.priority += 1",
  "upsert": {
    "title": "Empty document",
    "priority": 0,
    "tags": ["empty"]
  }
}'

With the modified request, a new document would be indexed; if we retrieve it using the GET API, it will look as follows:

{
  "_index": "blog",
  "_type": "article",
  "_id": "2",
  "_version": 1,
  "found": true,
  "_source": {
    "title": "Empty document",
    "priority": 0,
    "tags": ["empty"]
  }
}

As you can see, the fields from the upsert section of our update request were taken by Elasticsearch and used as document fields.
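The upsert behavior shown above (the priority stays at 0 for the newly created document, so the script was not applied to it) can be expressed as a short Python sketch. The function update_with_upsert and the plain dictionary used as a store are assumptions made for the example.

```python
def update_with_upsert(store, doc_id, script, upsert):
    """Run `script` on an existing document, or index `upsert` as-is when it is missing."""
    if doc_id in store:
        script(store[doc_id])
    else:
        store[doc_id] = dict(upsert)
    return store[doc_id]


def increment_priority(source):
    source["priority"] += 1


store = {}
empty = {"title": "Empty document", "priority": 0, "tags": ["empty"]}
first = update_with_upsert(store, "2", increment_priority, empty)
print(first["priority"])
second = update_with_upsert(store, "2", increment_priority, empty)
print(second["priority"])
```

The first call creates the document from the upsert section without touching it; only the second call, which finds the document, runs the script.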

Adding partial documents

In addition to what we already wrote about the update API, Elasticsearch is also capable of merging partial documents from the update request into already existing documents or indexing new documents using information from the request, similar to what we saw with the upsert section.

Let’s imagine that we would like to update our initial document and add a new field called count to it (setting it to 1 initially). We would also like to index the document under the specified identifier if the document is not present. We can do this by running the following command:

curl -XPOST 'http://localhost:9200/blog/article/1/_update' -d '{
  "doc": {
    "count": 1
  },
  "doc_as_upsert": true
}'

We specified the new field in the doc section and we said that we want the doc section to be treated as the upsert section when the document is not present (with the doc_as_upsert property set to true).

If we now retrieve that document, we see the following response:

{
  "_index": "blog",
  "_type": "article",
  "_id": "1",
  "_version": 3,
  "found": true,
  "_source": {
    "title": "New version of Elasticsearch released!",
    "content": "This is the updated document",
    "priority": 10,
    "tags": ["announce", "elasticsearch", "release"],
    "count": 1
  }
}

Note

For a full reference on document updates, please refer to the official Elasticsearch documentation on the Update API, which is available at https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-update.html.
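A partial-document update with doc_as_upsert can be pictured as a merge. The Python sketch below is an assumption-laden illustration (merge_partial and update_doc are hypothetical names, and the store is a plain dictionary): existing documents receive the new fields, and missing documents are created from the partial document when doc_as_upsert is set.

```python
def merge_partial(existing, partial):
    """Merge `partial` into `existing`; nested objects are merged key by key."""
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(existing.get(key), dict):
            merge_partial(existing[key], value)
        else:
            existing[key] = value
    return existing


def update_doc(store, doc_id, doc, doc_as_upsert=False):
    """Apply a partial document, optionally creating the document when it is missing."""
    if doc_id in store:
        return merge_partial(store[doc_id], doc)
    if doc_as_upsert:
        store[doc_id] = dict(doc)
        return store[doc_id]
    raise KeyError("document missing: %s" % doc_id)


store = {"1": {"title": "New version of Elasticsearch released!", "priority": 10}}
print(update_doc(store, "1", {"count": 1}, doc_as_upsert=True))
print(update_doc(store, "30", {"count": 1}, doc_as_upsert=True))
```

Fields that are not mentioned in the partial document are left untouched, which matches the response shown above.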


Deleting documents

Now that we know how to index documents, update them, and retrieve them, it is time to learn about how we can delete them. Deleting a document from an Elasticsearch index is very similar to retrieving it, but with one major difference: instead of using the HTTP GET method, we have to use the HTTP DELETE method.

For example, if we would like to delete the document indexed in the blog index under the article type and with an identifier of 1, we would run the following command:

curl -XDELETE 'localhost:9200/blog/article/1'

The response from Elasticsearch indicates that the document has been deleted and should look as follows:

{

"found":true,

"_index":"blog",

"_type":"article",

"_id":"1",

"_version":4,

"_shards":{

"total":2,

"successful":1,

"failed":0

}

}

Of course, this is not the only thing when it comes to deleting. We can also remove an entire index together with all of its documents. For example, if we would like to delete the entire blog index, we should just omit the identifier and the type, so the command would look like this:

curl -XDELETE 'localhost:9200/blog'

The preceding command would result in the deletion of the blog index.


Versioning

Finally, there is one last thing that we would like to talk about when it comes to data manipulation in Elasticsearch: the great feature of versioning. As you may have already noticed, Elasticsearch increments the document version when it does updates to it. We can leverage this functionality and use optimistic locking (http://en.wikipedia.org/wiki/Optimistic_concurrency_control), and avoid conflicts and overwrites when multiple processes or threads access the same document concurrently. You can assume that your indexing application may want to try to update the document, while the user would like to update the document while doing some manual work. The question that arises is which document should be the correct one: the one updated by the indexing application, the one updated by the user, or the merged document of the changes? What if the changes are conflicting? To handle such cases, we can use versioning.

Usage example

Let’s index a new document to our blog index, one with an identifier of 10, and let’s index its second version soon after we do that. The commands that do this look as follows:

curl -XPUT 'localhost:9200/blog/article/10' -d '{"title":"Test document"}'

curl -XPUT 'localhost:9200/blog/article/10' -d '{"title":"Updated test document"}'

Because we’ve indexed the document with the same identifier twice, it should have a version of 2 (you can check it using the GET request).

Now, let’s try deleting the document we’ve just indexed but let’s specify a version property equal to 1. By doing this, we tell Elasticsearch that we are interested in deleting the document with the provided version. Because the document has a different version now, Elasticsearch shouldn’t allow the delete operation with version 1. Let’s check if what we say is true. The command we will use to send the delete request looks as follows:

curl -XDELETE 'localhost:9200/blog/article/10?version=1'

The response generated by Elasticsearch should be similar to the following one:

{
  "error": {
    "root_cause": [{
      "type": "version_conflict_engine_exception",
      "reason": "[article][10]: version conflict, current [2], provided [1]",
      "shard": 1,
      "index": "blog"
    }],
    "type": "version_conflict_engine_exception",
    "reason": "[article][10]: version conflict, current [2], provided [1]",
    "shard": 1,
    "index": "blog"
  },
  "status": 409
}

As you can see, the delete operation was not successful because the versions didn’t match. If we set the version property to 2, the delete operation would be successful:

curl -XDELETE 'localhost:9200/blog/article/10?version=2&pretty'

The response this time will look as follows:

{

"found":true,

"_index":"blog",

"_type":"article",

"_id":"10",

"_version":3,

"_shards":{

"total":2,

"successful":1,

"failed":0

}

}

This time the delete operation has been successful because the provided version matched the current version of the document.
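The optimistic locking check can be reproduced with a toy store in Python. The class and exception names below are assumptions made for illustration; the sketch only mirrors the behavior seen in the two delete requests above, including the version number that is reported after a successful delete.

```python
class VersionConflict(Exception):
    """Raised when the provided version does not match the current one."""


class VersionedStore:
    def __init__(self):
        self.docs = {}  # doc_id -> (version, source)

    def index(self, doc_id, source):
        """Store a document and return its new version."""
        version = self.docs[doc_id][0] + 1 if doc_id in self.docs else 1
        self.docs[doc_id] = (version, source)
        return version

    def delete(self, doc_id, version=None):
        """Delete a document, but only when the provided version (if any) is current."""
        current = self.docs[doc_id][0]
        if version is not None and version != current:
            raise VersionConflict("current [%d], provided [%d]" % (current, version))
        del self.docs[doc_id]
        return current + 1


store = VersionedStore()
store.index("10", {"title": "Test document"})
store.index("10", {"title": "Updated test document"})
try:
    store.delete("10", version=1)
except VersionConflict as conflict:
    print("conflict:", conflict)
print(store.delete("10", version=2))
```

A client that gets a conflict would typically fetch the document again, reapply its change, and retry with the fresh version.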

Versioning from external systems

The very good thing about Elasticsearch versioning capabilities is that we can provide the version of the document that we would like Elasticsearch to use. This allows us to provide versions from external data systems that are our primary data stores. To do this, we need to provide an additional parameter during indexing, version_type=external, and, of course, the version itself. For example, if we would like our document to have the 12345 version, we could send a request like this:

curl -XPUT 'localhost:9200/blog/article/20?version=12345&version_type=external' -d '{"title":"Test document"}'

The response returned by Elasticsearch is as follows:

{

"_index":"blog",

"_type":"article",

"_id":"20",

"_version":12345,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"created":true

}

We just need to remember that, when using version_type=external, we need to provide the version in cases where we index the document. In cases where we would like to change the document and use optimistic locking, we need to provide a version parameter equal to the version present in the document when using the default internal versioning, or higher than it when using version_type=external.
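The difference between the version checks can be summarized in a few lines of Python. This is a sketch of the rules as we understand them, not Elasticsearch code: the function name version_accepted is hypothetical, and external_gte is included on the assumption that this additional version type (which also accepts an equal version) is available in your Elasticsearch version.

```python
def version_accepted(stored_version, provided_version, version_type="internal"):
    """Return True when a write carrying `provided_version` would be accepted."""
    if version_type == "internal":
        return provided_version == stored_version
    if version_type == "external":
        return provided_version > stored_version
    if version_type == "external_gte":
        return provided_version >= stored_version
    raise ValueError("unknown version type: %s" % version_type)


print(version_accepted(2, 2))
print(version_accepted(12345, 12345, "external"))
print(version_accepted(12345, 12346, "external"))
```

With external versioning, the primary data store stays the source of truth: a write is applied only when it carries a newer version than the one already indexed.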


Searching with the URI request query

Before getting into the wonderful world of the Elasticsearch query language, we would like to introduce you to the simple but pretty flexible URI request search, which allows us to use a simple Elasticsearch query combined with the Lucene query language. Of course, we will extend our search knowledge using Elasticsearch in Chapter 3, Searching Your Data, but for now we will stick to the simplest approach.


Sample data

For the purpose of this section of the book, we will create a simple index with two document types. To do this, we will run the following six commands:

curl -XPOST 'localhost:9200/books/es/1' -d '{"title":"Elasticsearch Server", "published": 2013}'

curl -XPOST 'localhost:9200/books/es/2' -d '{"title":"Elasticsearch Server Second Edition", "published": 2014}'

curl -XPOST 'localhost:9200/books/es/3' -d '{"title":"Mastering Elasticsearch", "published": 2013}'

curl -XPOST 'localhost:9200/books/es/4' -d '{"title":"Mastering Elasticsearch Second Edition", "published": 2015}'

curl -XPOST 'localhost:9200/books/solr/1' -d '{"title":"Apache Solr 4 Cookbook", "published": 2012}'

curl -XPOST 'localhost:9200/books/solr/2' -d '{"title":"Solr Cookbook Third Edition", "published": 2015}'

Running the preceding commands will create the books index with two types: es and solr. The title and published fields will be indexed and thus searchable.


URI search

All queries in Elasticsearch are sent to the _search endpoint. You can search a single index or multiple indices, and you can restrict your search to a given document type or multiple types. For example, in order to search our books index, we will run the following command:

curl -XGET 'localhost:9200/books/_search?pretty'

The results returned by Elasticsearch will include all the documents from our books index (because no query has been specified) and should look similar to the following:

{
  "took": 3,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 6,
    "max_score": 1.0,
    "hits": [{
      "_index": "books", "_type": "es", "_id": "2", "_score": 1.0,
      "_source": {"title": "Elasticsearch Server Second Edition", "published": 2014}
    }, {
      "_index": "books", "_type": "es", "_id": "4", "_score": 1.0,
      "_source": {"title": "Mastering Elasticsearch Second Edition", "published": 2015}
    }, {
      "_index": "books", "_type": "solr", "_id": "2", "_score": 1.0,
      "_source": {"title": "Solr Cookbook Third Edition", "published": 2015}
    }, {
      "_index": "books", "_type": "es", "_id": "1", "_score": 1.0,
      "_source": {"title": "Elasticsearch Server", "published": 2013}
    }, {
      "_index": "books", "_type": "solr", "_id": "1", "_score": 1.0,
      "_source": {"title": "Apache Solr 4 Cookbook", "published": 2012}
    }, {
      "_index": "books", "_type": "es", "_id": "3", "_score": 1.0,
      "_source": {"title": "Mastering Elasticsearch", "published": 2013}
    }]
  }
}

As you can see, the response has a header that tells you the total time of the query and the shards used in the query process. In addition to this, we have documents matching the query: the top 10 documents by default. Each document is described by the index, type, identifier, score, and the source of the document, which is the original document sent to Elasticsearch.

We can also run queries against many indices. For example, if we had another index called clients, we could also run a single query against these two indices as follows:

curl -XGET 'localhost:9200/books,clients/_search?pretty'

We can also run queries against all the data in Elasticsearch by omitting the index names completely or setting the index name to _all:

curl -XGET 'localhost:9200/_search?pretty'

curl -XGET 'localhost:9200/_all/_search?pretty'

In a similar manner, we can also choose the types we want to use during searching. For example, if we want to search only in the es type in the books index, we run a command as follows:

curl -XGET 'localhost:9200/books/es/_search?pretty'

Please remember that, in order to search for a given type, we need to specify the index or multiple indices. Elasticsearch allows us to have quite rich semantics when it comes to choosing index names. If you are interested, please refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-index.html; however, there is one thing we would like to point out. When running a query against multiple indices, it may happen that some of them do not exist or are closed. In such cases, the ignore_unavailable property comes in handy. When set to true, it tells Elasticsearch to ignore unavailable or closed indices.

For example, let’s try running the following query:

curl -XGET 'localhost:9200/books,non_existing/_search?pretty'

The response would be similar to the following one:

{
  "error": {
    "root_cause": [{
      "type": "index_missing_exception",
      "reason": "no such index",
      "index": "non_existing"
    }],
    "type": "index_missing_exception",
    "reason": "no such index",
    "index": "non_existing"
  },
  "status": 404
}

Now let’s check what will happen if we add the ignore_unavailable=true parameter to our request and execute the following command:

curl -XGET 'localhost:9200/books,non_existing/_search?pretty&ignore_unavailable=true'

In this case, Elasticsearch would return the results without any error.
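The rules for building the search path (comma-separated indices, optional types, and _all when types are given without an index) can be captured in a small Python helper. The function search_url and the default host are assumptions made for this sketch; it only assembles the URL and does not contact Elasticsearch.

```python
from urllib.parse import urlencode


def search_url(indices=(), types=(), host="http://localhost:9200", **params):
    """Build a _search URL for the given indices, types, and URI parameters."""
    parts = []
    if indices:
        parts.append(",".join(indices))
    elif types:
        # A type can only be addressed together with an index, so fall back to _all.
        parts.append("_all")
    if types:
        parts.append(",".join(types))
    parts.append("_search")
    url = host + "/" + "/".join(parts)
    if params:
        url += "?" + urlencode(params)
    return url


print(search_url(["books"]))
print(search_url(["books", "non_existing"], pretty="true", ignore_unavailable="true"))
print(search_url(types=["es"]))
```

Note that urlencode percent-encodes special characters in parameter values, which is what we want when the URL is built by a program rather than typed in a shell.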

Elasticsearch query response

Let’s assume that we want to find all the documents in our books index that contain the elasticsearch term in the title field. We can do this by running the following query:

curl -XGET 'localhost:9200/books/_search?pretty&q=title:elasticsearch'

The response returned by Elasticsearch for the preceding request will be as follows:

{
  "took": 37,
  "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {
    "total": 4,
    "max_score": 0.625,
    "hits": [{
      "_index": "books", "_type": "es", "_id": "1", "_score": 0.625,
      "_source": {"title": "Elasticsearch Server", "published": 2013}
    }, {
      "_index": "books", "_type": "es", "_id": "2", "_score": 0.5,
      "_source": {"title": "Elasticsearch Server Second Edition", "published": 2014}
    }, {
      "_index": "books", "_type": "es", "_id": "4", "_score": 0.5,
      "_source": {"title": "Mastering Elasticsearch Second Edition", "published": 2015}
    }, {
      "_index": "books", "_type": "es", "_id": "3", "_score": 0.19178301,
      "_source": {"title": "Mastering Elasticsearch", "published": 2013}
    }]
  }
}

The first section of the response gives us information about how much time the request took (the took property is specified in milliseconds), whether it was timed out (the timed_out property), and information about the shards that were queried during the request execution: the number of queried shards (the total property of the _shards object), the number of shards that returned the results successfully (the successful property of the _shards object), and the number of failed shards (the failed property of the _shards object). The query may also time out if it is executed for a longer period than we want. (We can specify the maximum query execution time using the timeout parameter.) A failed shard means that something went wrong with that shard or it was not available during the search execution.

Of course, the mentioned information can be useful, but usually, we are interested in the results that are returned in the hits object. We have the total number of documents matching the query (in the total property) and the maximum score calculated (in the max_score property). Finally, we have the hits array that contains the returned documents. In our case, each returned document contains its index name (the _index property), the type (the _type property), the identifier (the _id property), the score (the _score property), and the _source field (usually, this is the JSON object sent for indexing).
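A client will typically read both parts of the response: the header to decide whether the results are complete, and the hits object for the documents themselves. The following Python sketch (the function name summarize_search and the shape of its return value are assumptions) shows one way to do that with the standard json module.

```python
import json


def summarize_search(response_text):
    """Extract the total, a completeness flag, and (id, score, title) rows from a search response."""
    response = json.loads(response_text)
    complete = not response["timed_out"] and response["_shards"]["failed"] == 0
    hits = response["hits"]
    rows = [(hit["_id"], hit["_score"], hit["_source"].get("title")) for hit in hits["hits"]]
    return {"total": hits["total"], "complete": complete, "rows": rows}


sample = """{
  "took": 37, "timed_out": false,
  "_shards": {"total": 5, "successful": 5, "failed": 0},
  "hits": {"total": 1, "max_score": 0.625, "hits": [
    {"_index": "books", "_type": "es", "_id": "1", "_score": 0.625,
     "_source": {"title": "Elasticsearch Server", "published": 2013}}]}
}"""
print(summarize_search(sample))
```

Treating a response with failed shards or a timeout as incomplete is a design choice for this sketch; whether partial results are acceptable depends on the application.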


Query analysis

You may wonder why the query we’ve run in the previous section worked. We indexed the Elasticsearch term and ran a query for elasticsearch and, even though they differ (capitalization), the relevant documents were found. The reason for this is the analysis. During indexing, the underlying Lucene library analyzes the documents and indexes the data according to the Elasticsearch configuration. By default, Elasticsearch will tell Lucene to index and analyze string-based data as well as numbers. The same happens during querying because the URI request query maps to the query_string query (which will be discussed in Chapter 3, Searching Your Data), and this query is analyzed by Elasticsearch.

Let’s use the indices-analyze API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html). It allows us to see how the analysis process is done. With this, we can see what happened to one of the documents during indexing and what happened to our query phrase during querying.

In order to see what was indexed in the title field for the Elasticsearch Server phrase, we will run the following command:

curl -XGET 'localhost:9200/books/_analyze?pretty&field=title' -d 'Elasticsearch Server'

The response will be as follows:

{

"tokens":[{

"token":"elasticsearch",

"start_offset":0,

"end_offset":13,

"type":"<ALPHANUM>",

"position":0

},{

"token":"server",

"start_offset":14,

"end_offset":20,

"type":"<ALPHANUM>",

"position":1

}]

}

You can see that Elasticsearch has divided the text into two terms: the first one has a token value of elasticsearch and the second one has a token value of server.

Now let’s look at how the query text was analyzed. We can do this by running the following command:

curl -XGET 'localhost:9200/books/_analyze?pretty&field=title' -d 'elasticsearch'

The response of the request will look as follows:

{

"tokens":[{

"token":"elasticsearch",

"start_offset":0,

"end_offset":13,

"type":"<ALPHANUM>",

"position":0

}]

}

We can see that the word is the same as the original one that we passed to the query. We won’t get into the Lucene query details and how the query parser constructed the query, but in general the indexed term after the analysis was the same as the one in the query after the analysis; so, the document matched the query and the result was returned.
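The effect of analysis on matching can be imitated in a few lines of Python. This is only a rough approximation of what the default analyzer did in the examples above (split on non-word characters and lowercase); the function name analyze is an assumption, and the real analysis chain is configurable and considerably more elaborate.

```python
import re


def analyze(text):
    """Split text into lowercased tokens with offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\w+", text)):
        tokens.append({
            "token": match.group().lower(),
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


indexed_terms = {token["token"] for token in analyze("Elasticsearch Server")}
query_terms = {token["token"] for token in analyze("elasticsearch")}
print(analyze("Elasticsearch Server"))
print(query_terms <= indexed_terms)
```

Because both the indexed text and the query text go through the same steps, the capitalization difference disappears before any term comparison takes place.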


URI query string parameters

There are a few parameters that we can use to control URI query behavior, which we will discuss now. The thing to remember is that each parameter in the query should be concatenated with the & character, as shown in the following example:

curl -XGET 'localhost:9200/books/_search?pretty&q=published:2013&df=title&explain=true&default_operator=AND'

Please remember to enclose the URL of the request using the ' characters because, on Linux-based systems, the & character will be interpreted by the shell.

The query

The q parameter allows us to specify the query that we want our documents to match. It allows us to specify the query using the Lucene query syntax described in the Lucene query syntax section later in this chapter. For example, a simple query would look like this: q=title:elasticsearch.

The default search field

Using the df parameter, we can specify the default search field that should be used when no field indicator is used in the q parameter. By default, the _all field will be used. (This is the field that Elasticsearch uses to copy the content of all the other fields. We will discuss this in greater depth in Chapter 2, Indexing Your Data). An example of the df parameter value can be df=title.

Analyzer

The analyzer property allows us to define the name of the analyzer that should be used to analyze our query. By default, our query will be analyzed by the same analyzer that was used to analyze the field contents during indexing.

The default operator property

The default_operator property, which can be set to OR or AND, allows us to specify the default Boolean operator used for our query (http://en.wikipedia.org/wiki/Boolean_algebra). By default, it is set to OR, which means that a single query term match will be enough for a document to be returned. Setting this parameter to AND for a query will result in returning the documents that match all the query terms.
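The difference between the two operators is easy to see in a toy matcher. The Python sketch below is an illustration only: the function matches is hypothetical and works on already analyzed terms, ignoring scoring and everything else the real query does.

```python
def matches(document_terms, query_terms, default_operator="OR"):
    """Decide whether a document matches, given the default Boolean operator."""
    present = [term in document_terms for term in query_terms]
    if default_operator == "AND":
        return all(present)
    if default_operator == "OR":
        return any(present)
    raise ValueError("default_operator must be OR or AND")


title_terms = {"mastering", "elasticsearch"}
print(matches(title_terms, ["elasticsearch", "server"]))
print(matches(title_terms, ["elasticsearch", "server"], default_operator="AND"))
```

With OR, the Mastering Elasticsearch title matches a query for elasticsearch server because one term is enough; with AND it does not, because the server term is missing.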

Query explanation

If we set the explain parameter to true, Elasticsearch will include additional explain information with each document in the result, such as the shard from which the document was fetched and detailed information about the scoring calculation (we will talk more about it in the Understanding the explain information section in Chapter 6, Make Your Search Better). Avoid fetching the explain information during normal search queries, because it requires additional resources and slows the queries down. For example, a query that includes explain information could look as follows:

curl -XGET 'localhost:9200/books/_search?pretty&explain=true&q=title:solr'

The results returned by Elasticsearch for the preceding query would be as follows:

{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 2,
    "max_score": 0.70273256,
    "hits": [{
      "_shard": 2,
      "_node": "v5iRsht9SOWVzu-GY-YHlA",
      "_index": "books",
      "_type": "solr",
      "_id": "2",
      "_score": 0.70273256,
      "_source": {
        "title": "Solr Cookbook Third Edition",
        "published": 2015
      },
      "_explanation": {
        "value": 0.70273256,
        "description": "weight(title:solr in 0) [PerFieldSimilarity], result of:",
        "details": [{
          "value": 0.70273256,
          "description": "fieldWeight in 0, product of:",
          "details": [{
            "value": 1.0,
            "description": "tf(freq=1.0), with freq of:",
            "details": [{
              "value": 1.0,
              "description": "termFreq=1.0",
              "details": []
            }]
          }, {
            "value": 1.4054651,
            "description": "idf(docFreq=1, maxDocs=3)",
            "details": []
          }, {
            "value": 0.5,
            "description": "fieldNorm(doc=0)",
            "details": []
          }]
        }]
      }
    }, {
      "_shard": 3,
      "_node": "v5iRsht9SOWVzu-GY-YHlA",
      "_index": "books",
      "_type": "solr",
      "_id": "1",
      "_score": 0.5,
      "_source": {
        "title": "Apache Solr 4 Cookbook",
        "published": 2012
      },
      "_explanation": {
        "value": 0.5,
        "description": "weight(title:solr in 1) [PerFieldSimilarity], result of:",
        "details": [{
          "value": 0.5,
          "description": "fieldWeight in 1, product of:",
          "details": [{
            "value": 1.0,
            "description": "tf(freq=1.0), with freq of:",
            "details": [{
              "value": 1.0,
              "description": "termFreq=1.0",
              "details": []
            }]
          }, {
            "value": 1.0,
            "description": "idf(docFreq=1, maxDocs=2)",
            "details": []
          }, {
            "value": 0.5,
            "description": "fieldNorm(doc=1)",
            "details": []
          }]
        }]
      }
    }]
  }
}
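The numbers in the explanation can be reproduced by hand. The following sketch assumes that the field uses Lucene's classic TF-IDF similarity, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)). The function names are ours, and the fieldNorm value is simply taken from the response:

```python
import math


def tf(freq):
    # Classic TF-IDF term frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def idf(doc_freq, max_docs):
    # Classic TF-IDF inverse document frequency: 1 + ln(maxDocs / (docFreq + 1)).
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def field_weight(freq, doc_freq, max_docs, field_norm):
    # fieldWeight is the product of the three factors listed in "details".
    return tf(freq) * idf(doc_freq, max_docs) * field_norm


first = field_weight(1.0, doc_freq=1, max_docs=3, field_norm=0.5)
second = field_weight(1.0, doc_freq=1, max_docs=2, field_norm=0.5)
print(first, second)
```

Note that docFreq and maxDocs are per-shard statistics, which is why two documents matching the same term on different shards receive different idf values and therefore different scores.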

The fields returned

By default, for each document returned, Elasticsearch will include the index name, the type name, the document identifier, score, and the _source field. We can modify this behavior by adding the fields parameter and specifying a comma-separated list of field names. The fields will be retrieved from the stored fields (if they exist; we will discuss them in Chapter 2, Indexing Your Data) or from the internal _source field. By default, the value of the fields parameter is _source. An example is: fields=title,priority.

We can also disable the fetching of the _source field by adding the _source parameter with its value set to false.

Sorting the results

Using the sort parameter, we can specify custom sorting. The default behavior of Elasticsearch is to sort the returned documents in descending order of the value of the _score field. If we want to sort our documents differently, we need to specify the sort parameter. For example, adding sort=published:desc will sort the documents in descending order of the published field. By adding the sort=published:asc parameter, we will tell Elasticsearch to sort the documents on the basis of the published field in ascending order.

If we specify custom sorting, Elasticsearch will omit the _score field calculation for the documents. This may not be the desired behavior in your case. If you still want to keep track of the scores for each document when using a custom sort, you should add the track_scores=true property to your query. Please note that tracking the scores when doing custom sorting will make the query a little bit slower (you may not even notice the difference) due to the processing power needed to calculate the score.

The search timeout

By default, Elasticsearch doesn’t have a timeout for queries, but you may want your queries to time out after a certain amount of time (for example, 5 seconds). Elasticsearch allows you to do this by exposing the timeout parameter. When the timeout parameter is specified, the query will be executed up to the given timeout value and the results that were gathered up to that point will be returned. To specify a timeout of 5 seconds, you will have to add the timeout=5s parameter to your query.

The results window

Elasticsearch allows you to specify the results window (the range of documents in the results list that should be returned). We have two parameters that allow us to specify the results window size: size and from. The size parameter defaults to 10 and defines the maximum number of results returned. The from parameter defaults to 0 and specifies from which document the results should be returned. In order to return five documents starting from the 11th one, we will add the following parameters to the query: size=5&from=10.
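A common use of size and from is paging. The following sketch, with a hypothetical results_window helper of our own, shows how a one-based page number can be translated into the two parameters:

```python
def results_window(page, size=10):
    """Translate a 1-based page number into size/from values (hypothetical helper)."""
    if page < 1 or size < 0:
        raise ValueError("page must be >= 1 and size must be >= 0")
    # from is zero-based, so page 1 starts at offset 0.
    return {"size": size, "from": (page - 1) * size}


# Five documents starting from the 11th one is the third page of five.
window = results_window(page=3, size=5)
print("&".join(f"{key}={value}" for key, value in window.items()))
```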

Limiting per-shard results

Elasticsearch allows us to specify the maximum number of documents that should be fetched from each shard using the terminate_after property and specifying the maximum number of documents. For example, if we want to get no more than 100 documents from each shard, we can add terminate_after=100 to our URI request.

Ignoring unavailable indices

When running queries against multiple indices, it is handy to tell Elasticsearch that we don’t care about the indices that are not available. By default, Elasticsearch will throw an error if one of the indices is not available, but we can change this by simply adding the ignore_unavailable=true parameter to our URI request.

The search type

The URI query allows us to specify the search type using the search_type parameter, which defaults to query_then_fetch. Two values that we can use here are: dfs_query_then_fetch and query_then_fetch. The rest of the search types available in older Elasticsearch versions are now deprecated or removed. We’ll learn more about search types in the Understanding the querying process section of Chapter 3, Searching Your Data.

Lowercasing term expansion

Some queries, such as the prefix query, use query expansion. We will discuss this in the Query rewrite section in Chapter 4, Extending Your Querying Knowledge. We are allowed to define whether the expanded terms should be lowercased or not using the lowercase_expanded_terms property. By default, the lowercase_expanded_terms property is set to true, which means that the expanded terms will be lowercased.

Wildcard and prefix analysis

By default, wildcard queries and prefix queries are not analyzed. If we want to change this behavior, we can set the analyze_wildcard property to true.

Note

If you want to see all the parameters exposed by Elasticsearch as the URI request parameters, please refer to the official documentation available at: https://www.elastic.co/guide/en/elasticsearch/reference/current/search-uri-request.html.


Lucene query syntax

We thought that it would be good to know a bit more about what syntax can be used in the q parameter passed in the URI query. Some of the queries in Elasticsearch (such as the one currently being discussed) support the Lucene query parser syntax, the language that allows you to construct queries. Let’s take a look at it and discuss some basic features.

A query that we pass to Lucene is divided into terms and operators by the query parser. Let’s start with the terms; there are two types of them: single terms and phrases. For example, to query for the book term in the title field, we will pass the following query:

title:book

To query for the elasticsearch book phrase in the title field, we will pass the following query:

title:"elasticsearch book"

You may have noticed that the name of the field comes first, followed by a colon and then the term or the phrase.

As we already said, the Lucene query syntax supports operators. For example, the + operator tells Lucene that the given part must be matched in the document, meaning that the term we are searching for must be present in the field in the document. The - operator is the opposite, which means that such a part of the query can’t be present in the document. A part of the query without the + or - operator is optional: it can be matched, but it is not mandatory. So, if we want to find a document with the book term in the title field and without the cat term in the description field, we send the following query:

+title:book -description:cat

We can also group multiple terms with parentheses, as shown in the following query:

title:(crime punishment)

We can also boost parts of the query with the ^ operator and the boost value after it. Boosting increases the importance of a query part for the scoring algorithm: the higher the boost, the more important the query part is. This is shown in the following query:

title:book^4

These are the basics of the Lucene query language and should allow you to use Elasticsearch and construct queries without any problems. However, if you are interested in the Lucene query syntax and you would like to explore that in depth, please refer to the official documentation of the query parser available at http://lucene.apache.org/core/5_4_0/queryparser/org/apache/lucene/queryparser/classic/package-summary.html.
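To make the pieces of the syntax concrete, here is a small sketch that assembles query strings from a field, a term or phrase, an optional + or - operator, and an optional boost. The clause function is a hypothetical helper written for this illustration; it does not escape special characters, which a real application would need to handle:

```python
def clause(field, text, occur="", boost=None):
    """Build one Lucene query clause (illustrative helper, no escaping performed).

    occur: "+" for a required part, "-" for a prohibited part, "" for an optional one.
    """
    if occur not in ("", "+", "-"):
        raise ValueError("occur must be '', '+' or '-'")
    # Text containing whitespace is treated as a phrase and wrapped in double quotes.
    value = f'"{text}"' if " " in text else text
    result = f"{occur}{field}:{value}"
    if boost is not None:
        result += f"^{boost}"
    return result


query = " ".join([clause("title", "book", "+"), clause("description", "cat", "-")])
phrase = clause("title", "elasticsearch book")
boosted = clause("title", "book", boost=4)
print(query, phrase, boosted, sep="\n")
```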



Summary

In this chapter, we learned what full text search is and the contribution Apache Lucene makes to this. In addition to this, we are now familiar with the basic concepts of Elasticsearch and its top-level architecture. We used the Elasticsearch REST API not only to index data, but also to update, retrieve, and finally delete it. We’ve learned what versioning is and how we can use it for optimistic locking in Elasticsearch. Finally, we searched our data using the simple URI query.

In the next chapter, we’ll focus on indexing our data. We will see how Elasticsearch indexing works and what the role of primary shards and replicas is. We’ll see how Elasticsearch handles data that it doesn’t know and how to create our own mappings, the JSON structure that describes the structure of our index. We’ll also learn how to use batch indexing to speed up the indexing process and what additional information can be stored along with our index to help us achieve our goal. In addition, we will discuss what an index segment is, what segment merging is, and how to tune a segment. Finally, we’ll see how routing works in Elasticsearch and what options we have when it comes to both indexing and querying routing.



Chapter 2. Indexing Your Data

In the previous chapter, we learned what full text search is and how Apache Lucene fits there. We were introduced to the basic concepts of Elasticsearch and we are now familiar with its top-level architecture, so we know how it works. We used the REST API to index data, to update it, to delete it, and of course to retrieve it. We searched our data with the simple URI query and we used versioning that allowed us to use optimistic locking functionality. By the end of this chapter, you will have learned the following topics:

Basic information about Elasticsearch indexing

Adjusting Elasticsearch schema-less behavior

Creating your own mappings

Using out of the box analyzers

Configuring your own analyzers

Index data in batches

Adding additional internal information to indices

Segment merging

Routing


Elasticsearch indexing

So far we have our Elasticsearch cluster up and running. We also know how to use the Elasticsearch REST API to index our data, we know how to retrieve it, and we also know how to remove the data that we no longer need. We’ve also learned how to search in our data by using the URI request search and the Apache Lucene query language. However, until now we’ve used Elasticsearch functionality that allows us not to care about indices, shards, and data structure. This is not something that you may be used to when you are coming from the world of SQL databases, where you need the database and the tables with all the columns created upfront. In general, there you need to describe the data structure to be able to put data into the database. Elasticsearch is schema-less and by default creates indices automatically, and because of that we can just install it and index data without the need for any preparations. However, this is usually not the best situation when it comes to production environments where you want to control the analysis of your data. Because of that we will start by showing you how to manage your indices and then we will get you through the world of mappings in Elasticsearch.


Shards and replicas

In Chapter 1, Getting Started with Elasticsearch Cluster, we told you that indices in Elasticsearch are built from one or more shards. Each of those shards contains part of the document set and each shard is a separate Lucene index. In addition to that, each shard can have replicas, which are physical copies of the primary shard itself. When we create an index, we can tell Elasticsearch how many shards it should be built from.

Note

The default number of shards that Elasticsearch uses is 5 and each index will also contain a single replica. The default configuration can be changed by setting the index.number_of_shards and index.number_of_replicas properties in the elasticsearch.yml configuration file.

When defaults are used, we will end up with five Apache Lucene indices that our Elasticsearch index is built of and one replica for each of those. So, with five shards and one replica, we would actually get 10 shards. This is because each shard would get its own copy, so the total number of shards in the cluster would be 10.

Dividing indices in such a way allows us to spread the shards across the cluster. The nice thing about that is that all the shards will be automatically spread throughout the cluster. If we have a single node, Elasticsearch will put the five primary shards on that node and will leave the replicas unassigned, because Elasticsearch doesn’t assign shards and their replicas to the same node. The reason for that is simple: if that node crashed, we would lose both the primary source of the data and all the copies. So, if you have one Elasticsearch node, don’t worry about replicas not being assigned; it is something to be expected. Of course, when you have enough nodes for Elasticsearch to assign all the replicas (in addition to shards), it is not good to not have them assigned and you should look for the probable causes of that situation.

The thing to remember is that having shards and replicas is not free. First of all, each replica needs additional disk space, exactly the same amount of space that the original shard needs. So if we have 3 replicas for our index, we will actually need 4 times more space. If our primary shards weigh 100 GB in total, with 3 replicas we would need 400 GB: 100 GB for the primaries and 100 GB for each set of replicas. However, this is not the only cost. Each replica is a Lucene index on its own and Elasticsearch needs some memory to handle that. The more shards in the cluster, the more memory is being used. And finally, having replicas means that we will have to index each document on each of the replicas, in addition to indexing it on the primary shard. There is a notion of shadow replicas, which can copy the whole binary index, but, in most cases, each replica will do its own indexing. The good thing about replicas is that Elasticsearch will try to spread the query and get requests evenly between the shards and their replicas, which means that we can scale our cluster horizontally by using them.
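The arithmetic behind these numbers is simple enough to sketch. The two helper functions below are hypothetical names used only for illustration; they assume that every replica is a full copy of its primary:

```python
def total_shard_copies(primaries, replicas):
    # Each primary shard exists once, plus once more for every configured replica.
    return primaries * (1 + replicas)


def total_disk_gb(primary_data_gb, replicas):
    # Every replica set needs as much space as the primaries themselves.
    return primary_data_gb * (1 + replicas)


print(total_shard_copies(primaries=5, replicas=1))      # the default index layout
print(total_disk_gb(primary_data_gb=100, replicas=3))   # the 100 GB example
```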

So to sum up the conclusions:

Having more shards in the index allows us to spread the index between more servers and parallelize the indexing operations and thus have better indexing throughput.

Depending on your deployment, having more shards may increase query throughput and lower query latency, especially in environments that don’t have a large number of queries per second.

Having more shards may be slower compared to a single shard query, because Elasticsearch needs to retrieve the data from multiple servers and combine them together in memory, before returning the final query results.

Having more replicas results in a more resilient cluster, because when the primary shard is not available, its copy will take that role. Basically, having a single replica allows us to lose one copy of a shard and still serve the whole data. Having two replicas allows us to lose two copies of the shard and still serve the whole data.

The higher the replica count, the higher query throughput the cluster will have. That’s because each replica can serve the data it has independently from all the others.

The higher number of shards (both primary and replicas) will result in more memory needed by Elasticsearch.

Of course, these are not the only relationships between the number of shards and replicas in Elasticsearch. We will talk about most of them later in the book.

So, how many shards and replicas should we have for our indices? That depends. We believe that the defaults are quite good but nothing can replace a good test. Note that the initial number of replicas is not critical, because you can adjust it on a live cluster after index creation. You can remove and add replicas if you want and have the resources to run them. Unfortunately, this is not true when it comes to the number of shards. Once you have your index created, the only way to change the number of shards is to create another index and re-index your data.

Write consistency

Elasticsearch allows us to control the write consistency to prevent writes happening when they should not. By default, an Elasticsearch indexing operation is successful when the write is successful on a quorum of the active shards, meaning 50% of the active shards plus one. We can control this behavior by adding action.write_consistency to our elasticsearch.yml file or by adding the consistency parameter to our index request. The mentioned properties can take the following values:

quorum: The default value, requiring 50% plus 1 active shards to be successful for the index operation to succeed

one: Requires only a single active shard to be successful for the index operation to succeed

all: Requires all the active shards to be successful for the index operation to succeed
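The three levels can be sketched as a small function that returns how many shard copies must be active. This is only an illustration under stated assumptions: the function name is ours, we count copies as the primary plus its configured replicas, and we assume (based on our understanding of the 2.x behavior, which you should verify for your version) that the quorum rule only applies when there are more than two copies:

```python
def required_active_copies(consistency, copies):
    """Number of active shard copies needed for a write (illustrative sketch)."""
    if copies < 1:
        raise ValueError("an index shard always has at least one copy (the primary)")
    if consistency == "one":
        return 1
    if consistency == "all":
        return copies
    if consistency == "quorum":
        # Assumption: with one or two copies a single active copy is enough,
        # otherwise a strict majority (50% plus one) is required.
        if copies <= 2:
            return 1
        return copies // 2 + 1
    raise ValueError("consistency must be one, quorum or all")


for level in ("one", "quorum", "all"):
    print(level, required_active_copies(level, copies=3))
```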


Creating indices

When we were indexing our documents in Chapter 1, Getting Started with Elasticsearch Cluster, we didn’t care about index creation at all. We assumed that Elasticsearch would do everything for us and actually it was true; we just used the following command:

curl -XPUT 'http://localhost:9200/blog/article/1' -d '{"title": "New version of Elasticsearch released!", "content": "Version 1.0 released today!", "tags": ["announce", "elasticsearch", "release"] }'

This is just fine. If such an index does not exist, Elasticsearch automatically creates the index for us. However, there are times when we want to create indices ourselves for various reasons. Maybe we would like to have control over which indices are created to avoid errors or maybe we have some non-default settings that we would like to use when creating a particular index. The reasons may differ, but it’s good to know that we can create indices without indexing documents.

The simplest way to create an index is to run a PUT HTTP request with the name of the index we want to create. For example, to create an index called blog, we could use the following command:

curl -XPUT http://localhost:9200/blog/

We just told Elasticsearch that we want to create the index with the name blog. If everything goes right, you will see the following response from Elasticsearch:

{"acknowledged":true}

Altering automatic index creation

We already mentioned that automatic index creation is not the best idea in some cases. For example, a simple typo in the index name used for indexing can lead to creating hundreds of unused indices and make the cluster state information larger than it should be, putting more pressure on Elasticsearch and the underlying JVM. Because of that, we can turn off automatic index creation by adding a simple property to the elasticsearch.yml configuration file:

action.auto_create_index: false

Let’s stop for a while and discuss the action.auto_create_index property, because it allows us to do more complicated things than just allowing (setting it to true) and disabling (setting it to false) automatic index creation. The mentioned property allows us to use patterns that specify the index names which should be allowed to be automatically created and which should be disallowed. For example, let’s assume that we would like to allow automatic index creation for indices starting with logs and we would like to disallow all the others. To do something like this, we would set the action.auto_create_index property to something as follows:

action.auto_create_index: +logs*,-*

Now if we would like to create an index called logs_2015-10-01, we would succeed. To create such an index, we would use the following command:

curl -XPUT http://localhost:9200/logs_2015-10-01/log/1 -d '{"message": "Test log message"}'

Elasticsearch would respond with:

{

"_index":"logs_2015-10-01",

"_type":"log",

"_id":"1",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"created":true

}

However, suppose we now try to create the blog index using the following command:

curl -XPUT http://localhost:9200/blog/article/1 -d '{"title": "Test article title"}'

Elasticsearch would respond with an error similar to the following one:

{
  "error": {
    "root_cause": [{
      "type": "index_not_found_exception",
      "reason": "no such index",
      "resource.type": "index_expression",
      "resource.id": "blog",
      "index": "blog"
    }],
    "type": "index_not_found_exception",
    "reason": "no such index",
    "resource.type": "index_expression",
    "resource.id": "blog",
    "index": "blog"
  },
  "status": 404
}

One thing to remember is that the order of pattern definitions matters. Elasticsearch checks the patterns in order and stops at the first one that matches, so if we move -* to the first position, the +logs* pattern won’t be used at all.
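The first-match-wins rule can be simulated outside of Elasticsearch. The sketch below is not the Elasticsearch implementation; it uses Python's fnmatch for the wildcard matching, and the fallback of returning False when no pattern matches is our assumption:

```python
from fnmatch import fnmatchcase


def auto_create_allowed(index_name, setting):
    """Evaluate an action.auto_create_index style pattern list (simulation only)."""
    for raw in setting.split(","):
        pattern = raw.strip()
        if not pattern:
            continue
        # A leading "-" disallows matching names; "+" (or no prefix) allows them.
        allow = not pattern.startswith("-")
        if pattern[0] in "+-":
            pattern = pattern[1:]
        if fnmatchcase(index_name, pattern):
            return allow  # the first matching pattern decides
    return False  # assumption: nothing matched, so creation is not allowed


print(auto_create_allowed("logs_2015-10-01", "+logs*,-*"))
print(auto_create_allowed("blog", "+logs*,-*"))
print(auto_create_allowed("logs_2015-10-01", "-*,+logs*"))
```

The last call shows the ordering pitfall: once -* is listed first, every name matches it, so the +logs* pattern is never consulted.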

Settings for a newly created index

Manual index creation is also necessary when we want to pass non-default configuration options during index creation; for example, the initial number of shards and replicas. We can do that by including a JSON payload with settings as the PUT HTTP request body. For example, if we would like to tell Elasticsearch that our blog index should only have a single shard and two replicas initially, the following command could be used:

curl -XPUT http://localhost:9200/blog/ -d '{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  }
}'

The preceding command will result in the creation of the blog index with one shard and two replicas, making a total of three physical Lucene indices (called shards, as we already know). Of course there are a lot more settings that we can use, but what we did is enough for now and we will learn about the rest throughout the book.

Index deletion

Of course, similar to how we handled documents, Elasticsearch allows us to delete indices as well. Deleting an index is very similar to creating it, but instead of using the PUT HTTP method, we use the DELETE one. For example, if we would like to delete our previously created blog index, we would run the following command:

curl -XDELETE http://localhost:9200/blog

The response will be the same as the one we saw earlier when we created an index and should look as follows:

{"acknowledged":true}

Now that we know what an index is, how to create it, and how to delete it, we are ready to create indices with the mappings we have defined. Even though Elasticsearch is schema-less, there are a lot of situations where we would like to manually create the schema, to avoid any problems with the index structure.



Mappings configuration

If you are used to SQL databases, you may know that before you can start inserting the data in the database, you need to create a schema, which will describe what your data looks like. Although Elasticsearch is a schema-less (we would rather call it data-driven schema) search engine and can figure out the data structure on the fly, we think that controlling the structure and thus defining it ourselves is a better way. The field type determining mechanism is not going to guess the future. For example, if you first send an integer value, such as 60, and then send a float value such as 70.23 for the same field, you might expect an error, but what actually happens is that Elasticsearch just cuts off the decimal part of the float value. This is because Elasticsearch first sets the field type to integer and then tries to index the float value into the integer field, which drops the fractional part of the floating point number. In the next few pages you’ll see how to create mappings that suit your needs and match your data structure.

Note

Note that we didn’t include all the information about the available types in this chapter and some features of Elasticsearch, such as nested type, parent-child handling, storing geographical points, and search, are described in the following chapters of this book.


Type determining mechanism

Before we start describing how to create mappings manually, we want to get back to the automatic type determining algorithm used in Elasticsearch. As we already said, Elasticsearch can try guessing the schema for our documents by looking at the JSON that the document is built from. Because JSON is structured, that seems easy to do. For example, strings are surrounded by quotation marks, Booleans are defined using specific words, and numbers are just a few digits. This is a simple trick, but it usually works. For example, let’s look at the following document:

{

"field1":10,

"field2":"10"

}

The preceding document has two fields. The field1 field will be given a numeric type (to be precise, that field will be given the long type). The second field, called field2, will be given the string type, because its value is surrounded by quotation marks. Of course, for some use cases this can be the desired behavior. However, if we surrounded all the data with quotation marks (which is not the best idea anyway), our index structure would contain only string type fields.
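The idea of guessing a type from the JSON value can be illustrated with a few lines of Python. This is a simplified sketch of the principle, not the Elasticsearch algorithm: the guess_type function is our own, it ignores date detection on strings, and the double name for floating point values is an assumption:

```python
import json


def guess_type(value):
    """Guess a field type from a parsed JSON value (simplified illustration)."""
    # bool must be checked before int, because in Python bool is a subclass of int.
    if isinstance(value, bool):
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    raise TypeError(f"unsupported value: {value!r}")


document = json.loads('{"field1": 10, "field2": "10"}')
guessed = {name: guess_type(value) for name, value in document.items()}
print(guessed)
```

The same number is guessed differently depending only on whether the sender wrapped it in quotation marks.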

Note

Don’t worry if you are not yet familiar with the numeric types, the string types, and so on. We will describe them after we show you what you can do to tune the automatic type determining mechanism in Elasticsearch.

Disabling the type determining mechanism

The first solution is to completely disable the schema-less behavior in Elasticsearch. We can do that by adding the index.mapper.dynamic property to our index properties and setting it to false, for example, by running the following command to create the index:

curl -XPUT 'localhost:9200/sites' -d '{
  "index.mapper.dynamic": false
}'

By doing that we told Elasticsearch that we don’t want it to guess the type of our documents in the sites index and that we will provide the mappings ourselves. If we try indexing an example document to the sites index, we will get the following error:

{
  "error": {
    "root_cause": [{
      "type": "type_missing_exception",
      "reason": "type[[doc, trying to auto create mapping, but dynamic mapping is disabled]] missing",
      "index": "sites"
    }],
    "type": "type_missing_exception",
    "reason": "type[[doc, trying to auto create mapping, but dynamic mapping is disabled]] missing",
    "index": "sites"
  },
  "status": 404
}

This is because we didn’t create any mappings, so no schema for the documents exists. Elasticsearch couldn’t create one for us because we didn’t allow it, and the indexing command failed.

Of course this is not the only thing we can do when it comes to configuring how the type determining mechanism works. We can also tune it or disable it for a given type on the object level. We will talk about the second case in Chapter 5, Extending Your Index Structure. For now, let’s look at the possibilities of tuning the type determining mechanism in Elasticsearch.

Tuning the type determining mechanism for numeric types

One of the problems with JSON documents and type guessing is that we are not always in control of the data. The documents that we are indexing can come from multiple places and some systems in our environment may include quotation marks for all the fields in the document. This can lead to problems and bad guesses. Because of that, Elasticsearch allows us to enable more aggressive field value checking for numeric fields by setting the numeric_detection property to true in the mappings definition. For example, let’s assume that we want to create an index called users and we want it to have the user type, for which we want more aggressive numeric field parsing. To do that, we will use the following command:

curl -XPUT 'http://localhost:9200/users/?pretty' -d '{
  "mappings": {
    "user": {
      "numeric_detection": true
    }
  }
}'

Now let’s run the following command to index a single document to the users index:

curl -XPOST http://localhost:9200/users/user/1 -d '{"name": "User 1", "age": "20"}'

Earlier, with the default settings, the age field would be set to the string type. With the numeric_detection property set to true, the type of the age field will be set to long. We can check that by running the following command (it will retrieve the mappings for all the types in the users index):

curl-XGET'localhost:9200/users/_mapping?pretty'

The preceding command should result in the following response returned by Elasticsearch:

{

"users":{

"mappings":{


"user":{

"numeric_detection":true,

"properties":{

"age":{

"type":"long"

},

"name":{

"type":"string"

}

}

}

}

}

}

As we can see, the age field was indeed set to the long type.
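The effect of numeric_detection can be sketched as an extra check applied to string values. Again, this is only an illustration of the idea and not the Elasticsearch implementation; the function name, the regular expressions, and the double type name for decimal strings are all our assumptions:

```python
import re

INTEGER = re.compile(r"[+-]?\d+")
DECIMAL = re.compile(r"[+-]?\d+\.\d+")


def guess_string_type(value, numeric_detection=False):
    """Guess the type of a quoted JSON value (simplified illustration)."""
    if numeric_detection:
        # With numeric detection on, strings that look like numbers become numeric.
        if INTEGER.fullmatch(value):
            return "long"
        if DECIMAL.fullmatch(value):
            return "double"
    return "string"


print(guess_string_type("20"))
print(guess_string_type("20", numeric_detection=True))
print(guess_string_type("User 1", numeric_detection=True))
```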

Tuning the type determining mechanism for dates

Another type of data that causes trouble is fields with dates. Dates can come in different flavors, for example, 2015-10-01 11:22:33 is a proper date and so is 2015-10-01T11:22:33+00. Because of that, Elasticsearch tries to match the fields to timestamps or strings that match some given date format. If that matching operation is successful, the field is treated as a date-based one. If we know how our date fields look, we can help Elasticsearch by providing a list of recognized date formats using the dynamic_date_formats property, which allows us to specify the formats array. Let’s look at the following command for creating an index:

curl -XPUT 'http://localhost:9200/blog/' -d '{
  "mappings": {
    "article": {
      "dynamic_date_formats": ["yyyy-MM-dd hh:mm"]
    }
  }
}'

The preceding command will result in the creation of an index called blog with the single type called article. We’ve also used the dynamic_date_formats property with a single date format that will result in Elasticsearch using the date core type (refer to the Core types section in this chapter for more information about field types) for fields matching the defined format. Elasticsearch uses the joda-time library to define the date formats, so visit http://joda-time.sourceforge.net/api-release/org/joda/time/format/DateTimeFormat.html if you are interested in knowing about them.

Note

Remember that the dynamic_date_formats property accepts an array of values. That means that we can handle several date formats simultaneously.

With the preceding index, we can now try indexing a new document using the following command:


curl -XPUT localhost:9200/blog/article/1 -d '{"name": "Test", "test_field": "2015-10-01 11:22"}'

Elasticsearch will of course index that document, but let’s look at the mappings created for our index:

curl-XGET'localhost:9200/blog/_mapping?pretty'

The response for the preceding command will be as follows:

{
  "blog": {
    "mappings": {
      "article": {
        "dynamic_date_formats": ["yyyy-MM-dd hh:mm"],
        "properties": {
          "name": {
            "type": "string"
          },
          "test_field": {
            "type": "date",
            "format": "yyyy-MM-dd hh:mm"
          }
        }
      }
    }
  }
}

As we can see, the test_field field was given the date type, so our tuning works.
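Matching a string against a list of date formats can be illustrated as follows. This sketch uses Python's strptime patterns rather than joda-time ones; the pattern %Y-%m-%d %I:%M is our hand translation of yyyy-MM-dd hh:mm and the function name is hypothetical:

```python
from datetime import datetime


def detect_string_type(value, date_formats):
    """Return "date" if the value matches any of the formats, else "string" (sketch)."""
    for fmt in date_formats:
        try:
            datetime.strptime(value, fmt)
            return "date"
        except ValueError:
            continue  # try the next configured format
    return "string"


# Hand-translated equivalent of the joda-time pattern yyyy-MM-dd hh:mm.
formats = ["%Y-%m-%d %I:%M"]
print(detect_string_type("2015-10-01 11:22", formats))
print(detect_string_type("Test", formats))
```

Because the formats are tried in order, supplying several of them lets one field definition cope with several date flavors.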

Unfortunately, the problem still exists if we want the Boolean type to be guessed. There is no option to force the guessing of Boolean types from the text. In such cases, when a change of source format is impossible, we can only define the field directly in the mappings definition.


Index structure mapping

Every dataset has its own structure: some are very simple, and some include complicated object relations, children documents, and nested properties. In each case, we need to have a schema in Elasticsearch, called mappings, that defines how the data looks. Of course, we can use the schema-less nature of Elasticsearch, but we can and we usually want to prepare the mappings upfront, so we know how the data is handled.

For the purposes of this chapter, we will use a single type in the index. Of course, Elasticsearch as a multitenant system allows us to have multiple types in a single index, but we want to simplify the example, to make it easier to understand. So, for the purpose of the next few pages, we will create an index called posts that will hold data for documents in a post type. We also assume that the index will hold the following information:

Unique identifier of the blog post

Name of the blog post

Publication date

Contents: the text of the post itself

In Elasticsearch, mappings, as with almost all communication, are sent as JSON objects in the request body. So, if we want to create the simplest mappings that match our needs, they will look as follows (we stored the mappings in the posts.json file, so we can easily send them):

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"published":{"type":"date"},

"contents":{"type":"string"}

}

}

}

}

To create our posts index with the preceding mappings file, we will just run the following command:

curl -XPOST 'http://localhost:9200/posts' -d @posts.json

Note

Note that you can store your mappings in a file with any name you like. The curl command will just send the contents of that file.

And again, if everything goes well, we see the following response:

{"acknowledged":true}

Elasticsearch reported that our index has been created. If we look at the log of the Elasticsearch node (on the current master), we will see something like the following:

[2015-10-14 15:02:12,840][INFO ][cluster.metadata] [Shalla-Bal] [posts] creating index, cause [api], templates [], shards [5]/[1], mappings [post]

We can see that the posts index has been created, with 5 shards and 1 replica (shards [5]/[1]) and with mappings for a single post type (mappings [post]). Let’s now discuss the contents of the posts.json file and the possibilities when it comes to mappings.

Type and types definition

The mappings definition in Elasticsearch is just another JSON object, so it needs to be properly started and ended with curly brackets. All the mappings definitions are nested inside a single mappings object. In our example, we had a single post type, but we can have multiple of them. For example, if we would like to have more than a single type in our mappings, we just need to separate them with a comma character. Let’s assume that we would like to have an additional user type in our posts index. The mappings definition in such a case will look as follows (we stored it in the posts_with_user.json file):

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"published":{"type":"date"},

"contents":{"type":"string"}

}

},

"user":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"}

}

}

}

}

As you can see, we can name the types with the names we want. Under each type we have the properties object in which we store the actual names of the fields and their definitions.
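Because a mappings definition is plain JSON, it can also be generated programmatically instead of being written by hand. The following sketch builds the same two-type structure; type_mapping is a hypothetical helper and the resulting payload would still have to be sent to Elasticsearch in the body of the index creation request:

```python
import json


def type_mapping(fields):
    """Build the "properties" object for one type from a name -> core type dict."""
    return {"properties": {name: {"type": core_type} for name, core_type in fields.items()}}


body = {
    "mappings": {
        "post": type_mapping(
            {"id": "long", "name": "string", "published": "date", "contents": "string"}
        ),
        "user": type_mapping({"id": "long", "name": "string"}),
    }
}

# json.dumps takes care of the brackets and of the commas between types and fields.
payload = json.dumps(body, indent=2)
print(payload)
```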

Fields

Each field in the mappings definition is just a name and an object describing the properties of the field. For example, we can have a field defined as the following:

"body":{"type":"string","store":"yes","index":"analyzed"}

The preceding field definition starts with a name: body. After that we have an object with three properties: the type of the field (the type property), whether the original field value should be stored (the store property), and whether the field should be indexed and how (the index property). And, of course, multiple field definitions are separated from each other using the comma character, just like other JSON objects.


Core types

Each field type in Elasticsearch can be given one of the provided core types. The core types in Elasticsearch are as follows:

String

Number (integer, long, float, double)

Date

Boolean

Binary

In addition to the core types, Elasticsearch provides additional types that can handle more complicated data, such as nested documents, object, and so on. We will talk about them in Chapter 5, Extending Your Index Structure.

Common attributes

Before continuing with all the core type descriptions, we would like to discuss some common attributes that you can use to describe all the types (except for the binary one):

index_name: This attribute defines the name of the field that will be stored in the index. If this is not defined, the name will be set to the name of the object that the field is defined with. Usually, you don’t need to set this property, but it may be useful in some cases; for example, when you don’t have control over the name of the fields in the JSON documents that are sent to Elasticsearch.

index: This attribute can take the values analyzed and no and, for string-based fields, it can also be set to the additional not_analyzed value. If set to analyzed, the field will be indexed and thus searchable. If set to no, you won’t be able to search on such a field. The default value is analyzed. For string-based fields, setting it to not_analyzed means that the field will be indexed but not analyzed. So, the field is written in the index as it was sent to Elasticsearch and only a perfect match will be counted during a search: the query will have to include exactly the same value as the value in the index. If we compare it to the SQL databases world, setting the index property of a field to not_analyzed would work just like using where field = value. Also remember that setting the index property to no will exclude that field from the _all field (the include_in_all property is discussed as the last property in the list).

store: This attribute can take the values yes and no and specifies if the original value of the field should be written into the index. The default value is no, which means that Elasticsearch won’t store the original value of the field and will try to use the _source field (the JSON representing the original document that has been sent to Elasticsearch) when you want to retrieve the field value. Stored fields are not used for searching, however they can be used for highlighting if enabled (which may be more efficient than loading the _source field in case it is big).

doc_values: This attribute can take the values true and false. When set to true, Elasticsearch will create a special on-disk structure during indexing for not tokenized fields (like not analyzed string fields, number based fields, Boolean fields, and date fields). This structure is highly efficient and is used by Elasticsearch for operations that require un-inverted data, such as aggregations, sorting, or scripting. Starting with Elasticsearch 2.0 the default value of this is true for not tokenized fields. Setting this value to false will result in Elasticsearch using the field data cache instead of doc values, which has higher memory demand, but may be faster in some rare situations.

boost: This attribute defines how important the field is inside the document; the higher the boost, the more important the values in the field are. The default value of this attribute is 1, which means a neutral value: anything above 1 will make the field more important, anything less than 1 will make it less important.

null_value: This attribute specifies a value that should be written into the index in case that field is not a part of an indexed document. The default behavior will just omit that field.

copy_to: This attribute specifies an array of fields to which the original value will be copied. This allows for different kinds of analysis of the same data. For example, you could imagine having two fields, one called title and one called title_sort, each having the same value but processed differently. We could use copy_to to copy the title field value to title_sort.

include_in_all: This attribute specifies if the field should be included in the _all field. The _all field is a special field used by Elasticsearch to allow easy searching in the contents of the whole indexed document. Elasticsearch creates the content of the _all field by copying all the document fields there. By default, if the _all field is used, all the fields will be included in it.

String

String is the basic text type which allows us to store one or more characters inside it. A sample definition of such a field is as follows:

"body":{"type":"string","store":"yes","index":"analyzed"}

In addition to the common attributes, the following attributes can also be set for the string-based fields:

term_vector: This attribute can take the values no (the default one), yes, with_offsets, with_positions, and with_positions_offsets. It defines whether or not to calculate the Lucene term vectors for that field. If you are using highlighting (marking which terms were matched in a document during the query), you will need to calculate the term vector for the so-called fast vector highlighting, a more efficient highlighting version.

analyzer: This attribute defines the name of the analyzer used for indexing and searching. It defaults to the globally-defined analyzer name.

search_analyzer: This attribute defines the name of the analyzer used for processing the part of the query string that is sent to a particular field.

norms.enabled: This attribute specifies whether the norms should be loaded for a field. By default, it is set to true for analyzed fields (which means that the norms will be loaded for such fields) and to false for non-analyzed fields. Norms are values inside the Lucene index that are used when calculating a score for a document; they are usually not needed for not analyzed fields and are used only during query time. An example index creation command that disables norms for a single field looks as follows:

curl -XPOST 'localhost:9200/essb' -d '{

"mappings":{

"book":{

"properties":{

"name":{

"type":"string",

"norms":{

"enabled":false

}

}

}

}

}

}'

norms.loading: This attribute takes the values eager and lazy and defines how Elasticsearch will load the norms. The first value means that the norms for such fields are always loaded. The second value means that the norms will be loaded only when needed. Norms are useful for scoring, but may require a vast amount of memory for large data sets. Having norms loaded eagerly (property set to eager) means less work during query time, but will lead to more memory consumption. An example index creation command that eagerly loads norms for a single field looks as follows:

curl -XPOST 'localhost:9200/essb_eager' -d '{

"mappings":{

"book":{

"properties":{

"name":{

"type":"string",

"norms":{

"loading":"eager"

}

}

}

}

}

}'

position_offset_gap: This attribute defaults to 0 and specifies the gap in the index between instances of the given field with the same name. Setting this to a higher value may be useful if you want position-based queries (such as phrase queries) to match only inside a single instance of the field.

index_options: This attribute defines the indexing options for the postings list, the structure holding the terms (we talk more about it in the Postings format section of this chapter). The possible values are docs (only document numbers are indexed), freqs (document numbers and term frequencies are indexed), positions (document numbers, term frequencies, and their positions are indexed), and offsets (document numbers, term frequencies, their positions, and offsets are indexed). The default value for this property is positions for analyzed fields and docs for fields that are indexed but not analyzed.

ignore_above: This attribute defines the maximum size of the field in characters. A field whose size is above the specified value will be ignored by the analyzer.

Note

In one of the upcoming Elasticsearch versions, the string type may be deprecated and may be replaced by two new types, text and keyword, to better indicate what the string-based field is representing. The text type will be used for analyzed text fields and the keyword type will be used for not analyzed text fields. If you are interested in the incoming changes, refer to the following GitHub issue: https://github.com/elastic/elasticsearch/issues/12394.

Number

This is the common name for a few core types that gather all the numeric field types that are available and waiting to be used. The following types are available in Elasticsearch (we specify them by using the type property):

byte: This type defines a byte value; for example, 1. It allows for values between -128 and 127 inclusive.

short: This type defines a short value; for example, 12. It allows for values between -32768 and 32767 inclusive.

integer: This type defines an integer value; for example, 134. It allows for values between -2^31 and 2^31-1 inclusive up to Java 7 and values between 0 and 2^32-1 in Java 8.

long: This type defines a long value; for example, 123456789. It allows for values between -2^63 and 2^63-1 inclusive up to Java 7 and values between 0 and 2^64-1 in Java 8.

float: This type defines a float value; for example, 12.23. For information about the possible values, refer to https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3.

double: This type defines a double value; for example, 123.45. For information about the possible values, refer to https://docs.oracle.com/javase/specs/jls/se8/html/jls-4.html#jls-4.2.3.
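The signed ranges quoted in the list above follow directly from the bit width of each type. The following is a minimal Python sketch of that arithmetic; the names BIT_WIDTHS, signed_range, and fits are ours and are not part of any Elasticsearch API.

```python
# Bit widths of the integer-based core types (same as the Java primitives).
BIT_WIDTHS = {"byte": 8, "short": 16, "integer": 32, "long": 64}


def signed_range(bits):
    """Return (minimum, maximum) of a signed two's-complement integer."""
    return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1


def fits(type_name, value):
    """Check whether value lies inside the signed range of the given type."""
    low, high = signed_range(BIT_WIDTHS[type_name])
    return low <= value <= high


if __name__ == "__main__":
    for name, bits in BIT_WIDTHS.items():
        print(name, signed_range(bits))
```

A check such as fits("byte", 128) can be used on the client side to pick the smallest type that still holds the data.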

Note

You can learn more about the mentioned Java types at http://docs.oracle.com/javase/tutorial/java/nutsandbolts/datatypes.html.

A sample definition of a field based on one of the numeric types is as follows:

"price":{"type":"float","precision_step":"4"}

In addition to the common attributes, the following ones can also be set for the numeric fields:

precision_step: This attribute defines the number of terms generated for each value in the numeric field. The lower the value, the higher the number of terms generated. For fields with a higher number of terms per value, range queries will be faster at the cost of a slightly larger index. The default value is 16 for long and double, 8 for integer, short, and float, and 2147483647 for byte.

coerce: This attribute defaults to true and can take the value of true or false. It defines if Elasticsearch should try to convert the string values to numbers for a given field and if the decimal parts of the float value should be truncated for the integer-based fields.

ignore_malformed: This attribute can take the value true or false (which is the default). It should be set to true in order to omit the badly formatted values.
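The relation between precision_step and the number of generated terms can be sketched with a little arithmetic. The sketch below assumes the trie-style encoding used by Lucene numeric fields, in which one term is produced for every precision_step bits of the value; the helper name terms_per_value and the bit widths passed to it are our assumptions for illustration, not an Elasticsearch API.

```python
import math


def terms_per_value(value_bits, precision_step):
    """Approximate number of terms indexed for one numeric value.

    Assumes one term per precision_step bits, with at least one term.
    """
    return max(1, math.ceil(value_bits / precision_step))


if __name__ == "__main__":
    # A lower precision_step produces more terms per value.
    for step in (4, 8, 16, 2147483647):
        print(step, terms_per_value(32, step), terms_per_value(64, step))
```

Under these assumptions, a very large precision_step (such as the default quoted for byte) yields a single term per value, which keeps the index small but gives range queries no extra terms to work with.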

Boolean

The boolean core type is designed for indexing the Boolean values (true or false). A sample definition of a field based on the boolean type is as follows:

"allowed":{"type":"boolean","store":"yes"}

Binary

The binary field is a BASE64 representation of the binary data stored in the index. You can use it to store data that is normally written in binary form, such as images. Fields based on this type are by default stored and not indexed, so you can only retrieve them and not perform search operations on them. The binary type only supports the index_name, type, store, and doc_values properties. The sample field definition based on the binary field may look like the following:

"image":{"type":"binary"}

Date

The date core type is designed to be used for date indexing. The date type allows us to specify a format that will be recognized by Elasticsearch. It is worth noting that all the dates are indexed in UTC and are internally indexed as long values. In addition to that, for the date-based fields, Elasticsearch accepts long values representing UTC milliseconds since epoch regardless of the format specified for the date field.

The default date format recognized by Elasticsearch is quite universal and allows us to provide the date and optionally the time; for example, 2012-12-24T12:10:22. A sample definition of a field based on the date type is as follows (note the uppercase MM for the month; a lowercase mm would mean minutes):

"published":{"type":"date","format":"YYYY-MM-dd"}

A sample document that uses the above date field with the specified format is as follows:

{
  "name":"Sample document",
  "published":"2012-12-22"
}

In addition to the common attributes, the following ones can also be set for the fields based on the date type:

format: This attribute specifies the format of the date. The default value is dateOptionalTime. For a full list of formats, visit https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-date-format.html.

precision_step: This attribute defines the number of terms generated for each value in the numeric field. Refer to the numeric core type description for more information about this parameter.

numeric_resolution: This attribute defines the unit of time that Elasticsearch will use when a numeric value is passed to the date-based field instead of the date following a format. By default, Elasticsearch uses the milliseconds value, which means that the numeric value will be treated as milliseconds since epoch. Another value is seconds.

ignore_malformed: This attribute can take the value true or false. The default value is false. It should be set to true in order to omit badly formatted values.
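Because dates are indexed internally as long values in UTC, it can help to see what number a given date turns into. The following Python sketch converts a date string to milliseconds and seconds since epoch, the two units mentioned for numeric_resolution. The function names are ours, and the pattern uses Python's strptime syntax, not the Elasticsearch date format syntax.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_epoch_millis(date_string, pattern="%Y-%m-%d"):
    """Parse a date string as UTC and return milliseconds since epoch."""
    parsed = datetime.strptime(date_string, pattern).replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) // timedelta(milliseconds=1)


def to_epoch_seconds(date_string, pattern="%Y-%m-%d"):
    """Same value expressed in seconds, as with numeric_resolution set to seconds."""
    return to_epoch_millis(date_string, pattern) // 1000


if __name__ == "__main__":
    print(to_epoch_millis("2012-12-22"))   # 1356134400000
    print(to_epoch_seconds("2012-12-22"))  # 1356134400
```

Sending the first number to a date field with the default resolution should be equivalent to sending the formatted date from the sample document.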

Multi fields

There are situations where we need to have the same field analyzed differently. For example, one for sorting, one for searching, and one for analysis with aggregations, but all using the same field value, just indexed differently. We could of course use the previously described field value copying, but we can also use so-called multi fields. To be able to use that feature of Elasticsearch, we need to define an additional property in our field definition called fields. The fields property is an object that can contain one or more additional fields that will be present in our index and will have the value of the field that they are assigned to. For example, if we would like to have aggregations done on the name field and in addition to that search on that field, we would define it as follows:

"name":{

"type":"string",

"fields":{

"agg":{"type":"string","index":"not_analyzed"}

}

}

The preceding definition will create two fields: one called name and the second called name.agg. Of course, you don't have to specify two separate fields in the data you are sending to Elasticsearch; a single one named name is enough. Elasticsearch will do the rest, which means copying the value of the field to all the fields from the preceding definition.

The IP address type

The ip field type was added to Elasticsearch to simplify the use of IPv4 addresses in a numeric form. This field type allows us to search data that is indexed as an IP address, sort on such data, and use range queries using IP values.

A sample definition of a field based on the ip type is as follows:

"address":{"type":"ip"}

In addition to the common attributes, the precision_step attribute can also be set for the ip type based fields. Refer to the numeric type description for more information about that property.

A sample document that uses the ip based field looks as follows:

{
  "name":"Tom PC",
  "address":"192.168.2.123"
}
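The phrase "IPv4 addresses in a numeric form" can be made concrete with the Python standard library. The sketch below only illustrates the idea of treating an address as a number so that sorting and range checks become plain integer comparisons; it does not claim to reproduce how Elasticsearch encodes the value internally, and the function names are ours.

```python
import ipaddress


def ipv4_to_number(address):
    """Return the 32-bit unsigned integer behind a dotted IPv4 address."""
    return int(ipaddress.IPv4Address(address))


def in_range(address, low, high):
    """Range check on addresses, expressed as an integer comparison."""
    return ipv4_to_number(low) <= ipv4_to_number(address) <= ipv4_to_number(high)


if __name__ == "__main__":
    print(ipv4_to_number("192.168.2.123"))  # 3232236155
    print(in_range("192.168.2.123", "192.168.2.0", "192.168.2.255"))  # True
```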

Token count type

The token_count field type allows us to store and index information about how many tokens the given field has instead of storing and indexing the text provided to the field. It accepts the same configuration options as the number type, but in addition to that, we need to specify the analyzer which will be used to divide the field value into tokens. We do that by using the analyzer property.

A sample definition of a field based on the token_count field type looks as follows:

"title_count":{"type":"token_count","analyzer":"standard"}

Using analyzers

The great thing about Elasticsearch is that it leverages the analysis capabilities of Apache Lucene. This means that for fields that are based on the string type, we can specify which analyzer Elasticsearch should use. As you remember from the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster, the analyzer is a functionality that is used to analyze data or queries in the way we want. For example, when we divide words on the basis of whitespaces and lowercase the characters, we don't have to worry about the users sending words that are lowercased or uppercased. This means that Elasticsearch, elasticsearch, and ElAstIcSeaRCh will be treated as the same word. What's more, Elasticsearch allows us to use not only the analyzers provided out of the box, but also to create our own configurations. We can also use different analyzers at the time of indexing and at the time of querying, so we can choose how we want our data to be processed at each stage of the search process. Let's now have a look at the analyzers provided by Elasticsearch and at the Elasticsearch analysis functionality in general.

Out-of-the-box analyzers

Elasticsearch allows us to use one of the many analyzers defined by default. The following analyzers are available out of the box:

standard: This analyzer is convenient for most European languages (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-standard-analyzer.html for the full list of parameters).

simple: This analyzer splits the provided value on non-letter characters and converts them to lowercase.

whitespace: This analyzer splits the provided value on the basis of whitespace characters.

stop: This is similar to the simple analyzer, but in addition to the functionality of the simple analyzer, it filters the data on the basis of the provided set of stop words (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-stop-analyzer.html for the full list of parameters).

keyword: This is a very simple analyzer that just passes the provided value through unchanged. You'll achieve the same by specifying a particular field as not_analyzed.

pattern: This analyzer allows flexible text separation by the use of regular expressions (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-pattern-analyzer.html for the full list of parameters). The key point to remember when it comes to the pattern analyzer is that the provided pattern should match the separators of the words, not the words themselves.

language: This analyzer is designed to work with a specific language. The full list of languages supported by this analyzer can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html.

snowball: This is an analyzer that is similar to standard, but additionally provides the stemming algorithm (refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-snowball-analyzer.html for the full list of parameters).

Note

Stemming is the process of reducing the inflected and derived words to their stem or base form. Such a process allows related words, for example cars and car, to be reduced to a common form. For the mentioned words, the stemmer (which is an implementation of the stemming algorithm) will produce a single stem, car. After indexing, the documents containing such words will be matched while using any of them. Without stemming, the documents with the word "cars" will only be matched by a query containing the same word. You can find more information about stemming on Wikipedia at https://en.wikipedia.org/wiki/Stemming.
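To make the differences between some of the out-of-the-box analyzers more tangible, the following Python sketch imitates the behavior described above for the whitespace, simple, stop, keyword, and pattern analyzers. It is only an approximation built on regular expressions: the function names and the tiny stop word list are ours, and the real analyzers handle many more cases.

```python
import re


def whitespace_analyzer(text):
    """Split on whitespace only; case and punctuation are kept."""
    return text.split()


def simple_analyzer(text):
    """Split on non-letter characters and lowercase the result."""
    return [token.lower() for token in re.findall(r"[^\W\d_]+", text)]


def stop_analyzer(text, stop_words=("a", "an", "the")):
    """Like simple_analyzer, but drops the given (illustrative) stop words."""
    return [token for token in simple_analyzer(text) if token not in stop_words]


def keyword_analyzer(text):
    """Pass the whole value through as a single token."""
    return [text]


def pattern_analyzer(text, pattern):
    """Split on a pattern that matches the separators, not the words."""
    return [token for token in re.split(pattern, text) if token]


if __name__ == "__main__":
    sample = "The Elasticsearch-Server 2nd edition"
    for analyzer in (whitespace_analyzer, simple_analyzer, stop_analyzer, keyword_analyzer):
        print(analyzer.__name__, analyzer(sample))
    print(pattern_analyzer("robots,cars;planes", r"[,;]"))
```

Note how the pattern passed to pattern_analyzer describes the separators (a comma or a semicolon), which mirrors the key point made about the pattern analyzer.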

Defining your own analyzers

In addition to the analyzers mentioned previously, Elasticsearch allows us to define new ones without the need for writing a single line of Java code. In order to do that, we need to add an additional section to our mappings file; that is, the settings section, which holds additional information used by Elasticsearch during index creation. The following code snippet shows how we can define our custom settings section:

"settings":{

"index":{

"analysis":{

"analyzer":{

"en":{

"tokenizer":"standard",

"filter":[

"asciifolding",

"lowercase",

"ourEnglishFilter"

]

}

},

"filter":{

"ourEnglishFilter":{

"type":"kstem"

}

}

}

}

}

We specified that we want a new analyzer named en to be present. Each analyzer is built from a single tokenizer and multiple filters. A complete list of the default filters and tokenizers can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html. Our en analyzer includes the standard tokenizer and three filters: asciifolding and lowercase, which are the ones available by default, and a custom ourEnglishFilter, which is a filter we have defined.

To define a filter, we need to provide its name, its type (the type property), and any number of additional parameters required by that filter type. The full list of filter types available in Elasticsearch can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenfilters.html. Please be aware that we won't be discussing each filter, as the list of filters is constantly changing. If you are interested in the full filters list, please refer to the mentioned page in the documentation.

So, the final mappings file with our custom analyzer defined will be as follows:

{

"settings":{

"index":{

"analysis":{

"analyzer":{

"en":{

"tokenizer":"standard",

"filter":[

"asciifolding",

"lowercase",

"ourEnglishFilter"

]

}

},

"filter":{

"ourEnglishFilter":{

"type":"kstem"

}

}

}

}

},

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string","analyzer":"en"}

}

}

}

}

If we save the preceding mappings to a file called posts_mappings.json, we can run the following command to create the posts index:

curl -XPOST 'http://localhost:9200/posts' -d @posts_mappings.json

We can see how our analyzer works by using the Analyze API (https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-analyze.html). For example, let's look at the following command:

curl -XGET 'localhost:9200/posts/_analyze?pretty&field=name' -d 'robots cars'

The command asks Elasticsearch to show the result of analyzing the given phrase (robots cars) with the analyzer defined for the post type and its name field. The response that we will get from Elasticsearch is as follows:

{

"tokens":[{

"token":"robot",

"start_offset":0,

"end_offset":6,

"type":"<ALPHANUM>",

"position":0

},{

"token":"car",

"start_offset":7,

"end_offset":11,

"type":"<ALPHANUM>",

"position":1

}]

}

As you can see, the robots cars phrase was divided into two tokens. In addition to that, the robots word was changed to robot and the cars word was changed to car.
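The way a custom analyzer chains a tokenizer with a list of filters can be imitated in a few lines of Python. The sketch below is a rough model of the en analyzer: a regular expression stands in for the standard tokenizer, Unicode decomposition stands in for asciifolding, and a deliberately crude plural-stripping function stands in for the kstem filter. None of these stand-ins are the real Lucene algorithms; they are assumptions chosen so that the example is self-contained.

```python
import re
import unicodedata


def tokenize(text):
    """Yield (token, start_offset, end_offset) for each run of word characters."""
    for match in re.finditer(r"\w+", text):
        yield match.group(), match.start(), match.end()


def ascii_fold(token):
    """Strip diacritics by decomposing characters and dropping combining marks."""
    decomposed = unicodedata.normalize("NFKD", token)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def naive_stem(token):
    """Crude stand-in for kstem: drop one trailing 's' from longer words."""
    if len(token) > 3 and token.endswith("s") and not token.endswith("ss"):
        return token[:-1]
    return token


def analyze(text, filters=(ascii_fold, str.lower, naive_stem)):
    """Run the tokenizer, then apply every filter to each token in order."""
    tokens = []
    for position, (token, start, end) in enumerate(tokenize(text)):
        for token_filter in filters:
            token = token_filter(token)
        tokens.append({"token": token, "start_offset": start,
                       "end_offset": end, "position": position})
    return tokens


if __name__ == "__main__":
    for entry in analyze("robots cars"):
        print(entry)
```

For the robots cars phrase, this sketch yields the same tokens, offsets, and positions as the response shown earlier, although the type attribute is left out.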

Default analyzers

There is one more thing to say about analyzers. Elasticsearch allows us to specify the analyzer that should be used by default if no analyzer is defined. This is done in the same way as we configured a custom analyzer in the settings section of the mappings file, but instead of specifying a custom name for the analyzer, a default keyword should be used. So to make our previously defined analyzer the default, we can change the en analyzer to the following:

{

"settings":{

"index":{

"analysis":{

"analyzer":{

"default":{

"tokenizer":"standard",

"filter":[

"asciifolding",

"lowercase",

"ourEnglishFilter"

]

}

},

"filter":{

"ourEnglishFilter":{

"type":"kstem"

}

}

}

}

}

}

We can also choose a different default analyzer for searching and a different one for indexing. To do that, instead of using the default keyword for the analyzer name, we should use default_search and default_index respectively.

Different similarity models

With the release of Apache Lucene 4.0 in 2012, all the users of this great full text search library were given the opportunity to alter the default TF/IDF-based algorithm and use a different one (we've mentioned it in the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster). Because of that, we are able to choose a similarity model in Elasticsearch, which basically allows us to use different scoring formulas for our documents.

Note

Note that the similarity models topic ranges from intermediate to advanced and in most cases the TF/IDF-based algorithm will be sufficient for your use case. However, we decided to have it described in the book, so you know that you have the possibility of changing the scoring algorithm behavior if needed.

Setting per-field similarity

Since Elasticsearch 0.90, we are allowed to set a different similarity for each of the fields that we have in our mappings file. For example, let's assume that we have the following simple mappings that we use in order to index the blog posts:

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"contents":{"type":"string"}

}

}

}

}

To illustrate this, we will use the BM25 similarity model for the name field and the contents field. In order to do that, we need to extend our field definitions and add the similarity property with the value of the chosen similarity name. Our changed mappings will look like the following:

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string","similarity":"BM25"},

"contents":{"type":"string","similarity":"BM25"}

}

}

}

}

And that's all, nothing more is needed. After the above change, Apache Lucene will use the BM25 similarity to calculate the score factor for the name and the contents fields.

Available similarity models

There are at least five new similarity models available. For most of the use cases, apart from the default one, you may find the following models useful:

Okapi BM25 model: This similarity model is based on a probabilistic model that estimates the probability of finding a document for a given query. In order to use this similarity in Elasticsearch, you need to set the similarity property for a field to BM25. The Okapi BM25 similarity is said to perform best when dealing with short text documents where term repetitions are especially hurtful to the overall document score. This similarity is defined out of the box and doesn't need additional properties to be set.

Divergence from randomness model: This similarity model is based on the probabilistic model of the same name. In order to use this similarity in Elasticsearch, you need to use the DFR name. It is said that the divergence from randomness similarity model performs well on text that is similar to natural language.

Information-based model: This is the last model of the newly introduced similarity models and is very similar to the divergence from randomness model. In order to use this similarity in Elasticsearch, you need to use the IB name. Similar to the DFR similarity, it is said that the information-based model performs well on data similar to natural language text.

The two other similarity models currently available are LM Dirichlet similarity (to use it, set the type property to LMDirichlet) and LM Jelinek Mercer similarity (to use it, set the type property to LMJelinekMercer). You can find more about these similarity models in the Apache Lucene Javadocs, in Mastering Elasticsearch Second Edition, published by Packt Publishing, or in the official documentation of Elasticsearch available at https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-similarity.html.

Configuring default similarity

The default similarity allows us to provide an additional discount_overlaps property. It allows us to control if the tokens on the same positions in the token stream (with position increment of 0) are omitted during score calculation. By default, it is set to true, which means that the tokens on the same positions are omitted; if you want them to be counted, you can set that property to false. For example, the following command shows how to create an index with the discount_overlaps property changed for the default similarity:

curl -XPUT 'localhost:9200/test_similarity' -d '{

"settings":{

"similarity":{

"altered_default":{

"type":"default",

"discount_overlaps":false

}

}

},

"mappings":{

"doc":{

"properties":{

"name":{"type":"string","similarity":"altered_default"}

}

}

}

}'

Configuring BM25 similarity

Even though we don't need to configure the BM25 similarity, we can provide some additional options to tune its behavior. The BM25 similarity allows us to provide the discount_overlaps property similar to the default similarity and two additional properties: k1 and b. The k1 property specifies the term frequency normalization factor and the b property value determines to what degree the document length will normalize the term frequency values.
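The roles of k1 and b are easier to see in the scoring formula itself. The following Python sketch implements the textbook Okapi BM25 score of a single term; it is meant as an illustration of the two parameters and not as a copy of the Lucene implementation, which differs in details such as how document lengths are stored. The function names are ours, and the defaults k1=1.2 and b=0.75 are the commonly used values.

```python
import math


def bm25_idf(doc_count, doc_freq):
    """Inverse document frequency: rarer terms get a higher weight."""
    return math.log(1 + (doc_count - doc_freq + 0.5) / (doc_freq + 0.5))


def bm25_term_score(tf, doc_len, avg_doc_len, doc_count, doc_freq, k1=1.2, b=0.75):
    """Textbook BM25 score of one term in one document.

    k1 controls how quickly repeated occurrences of the term stop adding score;
    b controls how strongly the document length normalizes the term frequency.
    """
    length_norm = 1 - b + b * doc_len / avg_doc_len
    return bm25_idf(doc_count, doc_freq) * tf * (k1 + 1) / (tf + k1 * length_norm)


if __name__ == "__main__":
    # Term frequency saturation: each extra occurrence adds less and less.
    for tf in (1, 2, 5, 20):
        print(tf, round(bm25_term_score(tf, 100, 100, 1000, 10), 3))
    # With b=0 the document length no longer matters.
    print(bm25_term_score(3, 50, 100, 1000, 10, b=0.0) ==
          bm25_term_score(3, 500, 100, 1000, 10, b=0.0))
```

Running the loop shows the saturation effect that makes BM25 less sensitive to term repetitions than a purely frequency-based score.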

Configuring DFR similarity

In case of the DFR similarity, we can configure the basic_model property (which can take the value be, d, g, if, in, p, or ine), the after_effect property (with values of no, b, or l), and the normalization property (which can be no, h1, h2, h3, or z). If we choose a normalization value other than no, we need to set the normalization factor.

Depending on the chosen normalization value, we should use normalization.h1.c (the float value) for h1 normalization, normalization.h2.c (the float value) for h2 normalization, normalization.h3.c (the float value) for h3 normalization, and normalization.z.z (the float value) for z normalization. For example, the following is how the example similarity configuration will look (we put this into the settings section of our mappings file):

"similarity":{

"esserverbook_dfr_similarity":{

"type":"DFR",

"basic_model":"g",

"after_effect":"l",

"normalization":"h2",

"normalization.h2.c":"2.0"

}

}

Configuring IB similarity

In case of the IB similarity, we can configure the distribution property (which can take the value of ll or spl) and the lambda property (which can take the value of df or ttf). In addition to that, we can choose the normalization factor, which is the same as for the DFR similarity, so we'll omit describing it a second time. The following is how the example IB similarity configuration will look (we put this into the settings section of our mappings file):

"similarity":{

"esserverbook_ib_similarity":{

"type":"IB",

"distribution":"ll",

"lambda":"df",

"normalization":"z",

"normalization.z.z":"0.25"

}

}

Batch indexing to speed up your indexing process

In Chapter 1, Getting Started with Elasticsearch Cluster, we saw how to index a particular document into Elasticsearch. It required opening an HTTP connection, sending the document, and closing the connection. Of course, we were not responsible for most of that as we used the curl command, but in the background this is what happened. However, sending the documents one by one is not efficient. Because of that, it is now time to find out how to index a large number of documents in a more convenient and efficient way than doing so one by one.

Preparing data for bulk indexing

Elasticsearch allows us to merge many requests into one package. This package can be sent as a single request. What's more, we are not limited to having a single type of request in the so-called bulk; we can mix different types of operations together, which include:

Adding or replacing the existing documents in the index (index)

Removing documents from the index (delete)

Adding new documents into the index when there is no other definition of the document in the index (create)

Modifying the documents or creating new ones if the document doesn't exist (update)

The format of the request was chosen for processing efficiency. It assumes that each operation is described by a line containing a JSON object with the description of the operation, followed by a second line with the document, which is another JSON object. We can treat the first line as a kind of information line and the second as the data line. The exception to this rule is the delete operation, which contains only the information line, because the document is not needed. Let's look at the following example:

{"index":{"_index":"addr","_type":"contact","_id":1}}
{"name":"Fyodor Dostoevsky","country":"RU"}
{"create":{"_index":"addr","_type":"contact","_id":2}}
{"name":"Erich Maria Remarque","country":"DE"}
{"create":{"_index":"addr","_type":"contact","_id":2}}
{"name":"Joseph Heller","country":"US"}
{"delete":{"_index":"addr","_type":"contact","_id":4}}
{"delete":{"_index":"addr","_type":"contact","_id":1}}

It is very important that every document or action description is placed in one line (ended by a newline character). This means that the document cannot be pretty-printed. There is a default limitation on the size of the bulk indexing file, which is set to 100 megabytes and can be changed by specifying the http.max_content_length property in the Elasticsearch configuration file. This lets us avoid issues with possible request timeouts and memory problems when dealing with requests that are too large.

Note

Note that with a single batch indexing file, we can load the data into many indices and documents in the bulk request can have different types.
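When the documents come from a program rather than from a hand-written file, the one-object-per-line rule is easy to satisfy by serializing each line separately. The following Python sketch builds a bulk request body from a list of operations; the function name build_bulk_body and the tuple layout are our own convention for this example, not part of Elasticsearch.

```python
import json


def build_bulk_body(operations):
    """Build a newline-delimited bulk body.

    operations is an iterable of (action, metadata, document) tuples, where
    document is None for delete operations, which have no data line.
    """
    lines = []
    for action, metadata, document in operations:
        # Information line: json.dumps never emits newlines by default.
        lines.append(json.dumps({action: metadata}))
        if action == "delete":
            continue
        if document is None:
            raise ValueError("the %s operation needs a data line" % action)
        lines.append(json.dumps(document))
    # Every line, including the last one, must end with a newline character.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body([
        ("index", {"_index": "addr", "_type": "contact", "_id": 1},
         {"name": "Fyodor Dostoevsky", "country": "RU"}),
        ("delete", {"_index": "addr", "_type": "contact", "_id": 4}, None),
    ])
    print(body, end="")
```

The resulting string can be written to a file such as documents.json or sent directly as the request body.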

Indexing the data

In order to execute the bulk request, Elasticsearch provides the _bulk endpoint. This can be used as /_bulk, or with an index name as /index_name/_bulk, or even with a type and index name as /index_name/type_name/_bulk. The second and third forms define the default values for the index name and the type name. We can omit these properties in the information line of our request and Elasticsearch will use the default values from the URI. It is also worth knowing that the default URI values can be overwritten by the values in the information lines.

Assuming we've stored our data in the documents.json file, we can run the following command to send this data to Elasticsearch:

curl -XPOST 'localhost:9200/_bulk?pretty' --data-binary @documents.json

The ?pretty parameter is of course not necessary. We've used this parameter only for the ease of analyzing the response of the preceding command. What is important, in this case, is using curl with the --data-binary parameter instead of using -d. This is because the standard -d parameter ignores newline characters, which, as we said earlier, are important for parsing the bulk request content by Elasticsearch. Now let's look at the response returned by Elasticsearch:

{

"took":469,

"errors":true,

"items":[{

"index":{

"_index":"addr",

"_type":"contact",

"_id":"1",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"status":201

}

},{

"create":{

"_index":"addr",

"_type":"contact",

"_id":"2",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"status":201

}

},{

"create":{

"_index":"addr",

"_type":"contact",

"_id":"2",

"status":409,

"error":{

"type":"document_already_exists_exception",

"reason":"[contact][2]: document already exists",

"shard":"2",

"index":"addr"

}

}

},{

"delete":{

"_index":"addr",

"_type":"contact",

"_id":"4",

"_version":1,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"status":404,

"found":false

}

},{

"delete":{

"_index":"addr",

"_type":"contact",

"_id":"1",

"_version":2,

"_shards":{

"total":2,

"successful":1,

"failed":0

},

"status":200,

"found":true

}

}]

}

As we can see, every result is a part of the items array. Let's briefly compare these results with our input data. The first two commands, named index and create, were executed without any problems. The third operation failed because we wanted to create a record with an identifier that already existed in the index. The next two operations were deletions. Both were processed without an error. Note that the first of them tried to delete a nonexistent document; as you can see, this wasn't a problem for Elasticsearch. The thing worth noting, though, is that for the nonexistent document we saw a status of 404, which in the HTTP response code means not found (http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html). As you can see, Elasticsearch returns information about each operation, so for large bulk requests the response can be massive.
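Because the response contains one entry per operation, it is usually checked by a program rather than by eye. The following Python sketch walks through the items array and collects the operations that carry an error object, as the failed create operation does in the response above. The function name failed_items is ours, and the sample response in the example is a shortened version of the one shown earlier.

```python
def failed_items(bulk_response):
    """Return (action, document id, status) for every item that has an error."""
    failures = []
    for item in bulk_response.get("items", []):
        # Each item is an object with a single key: the name of the action.
        for action, result in item.items():
            if "error" in result:
                failures.append((action, result.get("_id"), result.get("status")))
    return failures


if __name__ == "__main__":
    response = {
        "took": 469,
        "errors": True,
        "items": [
            {"index": {"_id": "1", "status": 201}},
            {"create": {"_id": "2", "status": 201}},
            {"create": {"_id": "2", "status": 409,
                        "error": {"type": "document_already_exists_exception"}}},
            {"delete": {"_id": "4", "status": 404, "found": False}},
            {"delete": {"_id": "1", "status": 200, "found": True}},
        ],
    }
    # The top-level errors flag tells us whether scanning the items is needed.
    if response["errors"]:
        print(failed_items(response))
```

Checking the top-level errors flag first avoids scanning the whole items array for bulk requests that were fully successful.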

The _all field

The _all field is used by Elasticsearch to store data from all the other fields in a single field for ease of searching. This kind of field may be useful when we want to implement a simple search feature and we want to search all the data (or only the fields we copy to the _all field), but we don't want to think about the field names and things like that. By default, the _all field is enabled and contains all the data from all the fields from the document. However, this field makes the index a bit bigger and that is not always needed.

For example, when you input a search phrase into a search box in the library catalog site, you expect that you can search using the author's name, the ISBN number, and the words that the book title contains, but searching for the number of pages or the cover type usually does not make sense. We can either disable the _all field completely or exclude the copying of certain fields to it. In order not to include a certain field in the _all field, we use the include_in_all property, which was discussed earlier in this chapter. To completely turn off the _all field functionality, we modify our mappings file as follows:

{

"book":{

"_all":{

"enabled":false

},

"properties":{

...

}

}

}

In addition to the enabled property, the _all field supports the following ones:

store

term_vector

analyzer

For information about the preceding properties, refer to the Mappings configuration section in this chapter.

The _source field

The _source field allows us to store the original JSON document that was sent to Elasticsearch during indexing. By default, the _source field is turned on as some of the Elasticsearch functionalities depend on it (for example, the partial update feature). In addition to that, the _source field can be used as the source of data for the highlighting functionality if a field is not stored. However, if we don't need such a functionality, we can disable the _source field as it causes some storage overhead. In order to do that, we need to set the _source object's enabled property to false, as follows:

{

"book":{

"_source":{

"enabled":false

},

"properties":{

...

}

}

}

We can also tell Elasticsearch which fields we want to exclude from the _source field and which fields we want to include. We do that by adding the includes and excludes properties to the _source field definition. For example, if we want to exclude all the fields in the author path from the _source field, our mappings will look as follows:

{

"book":{

"_source":{

"excludes":["author.*"]

},

"properties":{

...

}

}

}

www.EBooksWorld.ir

Additional internal fields

There are additional fields that are internally used by Elasticsearch, but which we can't configure. Those fields are:

_id: This field is used to hold the identifier of the document inside the index and type

_uid: This field is used to hold the unique identifier of the document in the index and is built of _id and _type (this allows documents with the same identifier but different types to exist inside the same index)

_type: This field is the type name for the document

_field_names: This field is the list of fields existing in the document

Introduction to segment merging

In the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster, we mentioned segments and their immutability. We wrote that the Lucene library, and thus Elasticsearch, writes data to certain structures that are written once and never change. This allows for some simplification, but also introduces the need for additional work. One such example is deletion. Because segments cannot be altered, information about deletions must be stored alongside and dynamically applied during search. This is done by filtering deleted documents from the returned result set. The other example is the inability to modify the documents (however, some modifications are possible, such as modifying numeric doc values). Of course, one can say that Elasticsearch supports document updates (refer to the Manipulating data with the REST API section of Chapter 1, Getting Started with Elasticsearch Cluster). However, under the hood, the old document is marked as deleted and the one with the updated contents is indexed.

As time passes and you continue to index or delete your data, more and more segments are created. Depending on how often you modify the index, Lucene creates segments with various numbers of documents; thus, segments have different sizes. Because of that, the search performance may be lower and your index may be larger than it should be, as it still contains the deleted documents. The equation is simple: the more segments your index has, the slower the search speed is. This is when segment merging comes into play. We don't want to describe this process in detail; in the current Elasticsearch version, this part of the engine was simplified but it is still a rather advanced topic. We decided to mention merging because we think that it is handy to know where to look for the cause of troubles connected with too many open files, suspicious CPU usage, expanding indices, or searching and indexing speed degrading with time.

Segment merging

Segment merging is the process during which the underlying Lucene library takes several segments and creates a new segment based on the information found in them. The resulting segment has all the documents stored in the original segments except the ones that were marked for deletion. After the merge operation, the source segments are deleted from the disk. Because segment merging is rather costly in terms of CPU and I/O usage, it is crucial to appropriately control when and how often this process is invoked.
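The idea can be pictured with a small Python sketch. This is only a toy model under stated assumptions: a segment is represented as a plain list, and the Doc class and merge_segments function are hypothetical names introduced for illustration, not Lucene or Elasticsearch APIs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    """Toy stand-in for a document inside a segment (illustrative only)."""
    doc_id: str
    deleted: bool = False


def merge_segments(segments):
    """Build one new segment from all live documents of the source segments.

    Documents marked as deleted are simply not copied, which is how the
    space they occupied is finally reclaimed.
    """
    return [doc for segment in segments for doc in segment if not doc.deleted]


# An update is modeled as "mark the old copy deleted, index a new copy".
seg_a = [Doc("1"), Doc("2", deleted=True)]
seg_b = [Doc("2"), Doc("3", deleted=True), Doc("4")]

merged = merge_segments([seg_a, seg_b])
print([doc.doc_id for doc in merged])
```

In this model, two source segments holding five entries collapse into one segment with three live documents, which mirrors why merging reduces both the segment count and the index size.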


The need for segment merging

You may ask yourself why you have to bother with segment merging. First of all, the more segments the index is built from, the slower the search will be and the more memory Lucene will use. The second is the disk space and resources, such as file descriptors, used by the index. If you delete many documents from your index then, until the merge happens, those documents are only marked as deleted and not deleted physically. So, it may happen that most of the documents that use our CPU and memory don't exist! Fortunately, Elasticsearch uses reasonable defaults for segment merging and it is very probable that no changes are necessary.

The merge policy

The merge policy defines when the merging process should be performed. Elasticsearch merges segments of approximately similar sizes, taking into account the maximum number of segments allowed per tier. The algorithm of merging can find segments with the lowest cost of merge and the most impact on the resulting segment.

The basic properties of the tiered merge policy are as follows:

index.merge.policy.expunge_deletes_allowed: This property tells Elasticsearch to merge away, during an expunge deletes operation, only those segments whose percentage of deleted documents is higher than this value. It defaults to 10.

index.merge.policy.floor_segment: This property defaults to 2mb and tells Elasticsearch to treat smaller segments as ones with a size equal to the value of this property. It prevents frequent flushing of tiny segments, which avoids a high number of them.

index.merge.policy.max_merge_at_once: This property defines the maximum number of segments to be merged at once. It defaults to 10.

index.merge.policy.max_merge_at_once_explicit: This property defines the maximum number of segments merged at once during expunge deletes or optimize operations. It defaults to 30.

index.merge.policy.max_merged_segment: This property defines the maximum size of a segment that can be produced during normal merging. It defaults to 5gb.

index.merge.policy.segments_per_tier: This property defaults to 10 and roughly defines the allowed number of segments per tier. Smaller values mean more merging but fewer segments, which results in higher search speed but lower indexing speed and more I/O pressure. Higher values of the property will result in a higher segment count, thus slower search speed but higher indexing speed.

index.merge.policy.reclaim_deletes_weight: This property tells Elasticsearch how important it is to choose segments with many deleted documents. It defaults to 2.0.

For example, to update the merge policy settings of an already created index, we could run a command like this:

curl -XPUT 'localhost:9200/essb/_settings' -d '{

"index.merge.policy.max_merged_segment":"10gb"

}'

To get deeper into segment merging, refer to our book Mastering Elasticsearch Second Edition, published by Packt Publishing. You can also find more information about the tiered merge policy at https://www.elastic.co/guide/en/elasticsearch/reference/current/index-modules-merge.html.

Note

Up to the 2.0 version of Elasticsearch, we were able to choose between three merge policies: tiered, log_byte_size, and log_doc. The currently used merge policy is based on the tiered merge policy and we are forced to use it.

The merge scheduler

The merge scheduler tells Elasticsearch how the merge process should occur. The current implementation is based on a concurrent merge scheduler that is started in a separate thread and uses the defined number of threads doing merges in parallel. Elasticsearch allows you to set the number of threads that can be used for simultaneous merging by using the index.merge.scheduler.max_thread_count property.

Throttling

As we have already mentioned, merging may be expensive when it comes to server resources. The merge process usually works in parallel to other operations, so theoretically it shouldn't have too much influence. In practice, the number of disk input/output operations can be so large as to significantly affect the overall performance. In such cases, throttling is something that may help. In fact, this feature can be used for limiting the speed of the merge, but it may also be used for all the operations using the data store. Throttling can be set in the Elasticsearch configuration file (the elasticsearch.yml file) or dynamically by using the settings API (refer to the The update settings API section of Chapter 9, Elasticsearch Cluster, for details). There are two settings that adjust throttling: type and value.

To set the throttling type, set the indices.store.throttle.type property, which allows us to use the following values:

none: This value defines that no throttling is on

merge: This value defines that throttling affects only the merge process

all: This value defines that throttling is used for all the data store activities

The second property, indices.store.throttle.max_bytes_per_sec, describes how much the throttling limits the I/O operations. As its name suggests, it tells us how many bytes can be processed per second. For example, let's look at the following configuration:

indices.store.throttle.type: merge

indices.store.throttle.max_bytes_per_sec: 10mb

In this example, we limit the merge operations to 10 megabytes per second. By default, Elasticsearch uses the merge throttling type with the max_bytes_per_sec property set to 20mb. This means that all the merge operations are limited to 20 megabytes per second.
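To get a feeling for what such a limit means in practice, the following Python sketch does the arithmetic. It is a rough lower bound under stated assumptions: the min_merge_seconds function is a hypothetical helper, units are treated as binary (1mb = 1024 * 1024 bytes), and only the bytes written for the resulting segment are counted.

```python
MB = 1024 * 1024
GB = 1024 * MB


def min_merge_seconds(segment_bytes, max_bytes_per_sec):
    """Lower bound on the time needed to write a merged segment when
    I/O is capped at max_bytes_per_sec (illustrative helper only)."""
    return segment_bytes / max_bytes_per_sec


# Writing a 5gb segment (the default max_merged_segment) at both limits.
print(min_merge_seconds(5 * GB, 20 * MB))  # 256.0 seconds at the default 20mb
print(min_merge_seconds(5 * GB, 10 * MB))  # 512.0 seconds at 10mb
```

Halving the limit doubles the minimum duration of a large merge, which is the trade-off to keep in mind: gentler I/O pressure on searches in exchange for merges that stay in progress longer.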


Introduction to routing

By default, Elasticsearch will try to distribute your documents evenly among all the shards of the index. However, that's not always the desired situation. In order to retrieve the documents, Elasticsearch must query all the shards and merge the results. What if we could divide our data on some basis (for example, the client identifier) and use that information to put data with the same properties in the same place in the cluster? Elasticsearch allows us to do that by exposing a powerful document and query distribution control mechanism: routing. In short, it allows us to choose a shard to be used to index or search the data.

Default indexing

During indexing operations, when you send a document for indexing, Elasticsearch looks at its identifier to choose the shard in which the document should be indexed. By default, Elasticsearch calculates the hash value of the document's identifier and, on the basis of that, it puts the document in one of the available primary shards. Then, those documents are redistributed to the replicas. The following diagram shows a simple illustration of how indexing works by default:

Default searching

Searching is a bit different from indexing, because in most situations you need to query all the shards to get the data you are interested in (we will talk about that in Chapter 3, Searching Your Data), at least in the initial scatter phase of the query. Imagine a situation when you have the following mappings describing your index:

{

"mappings":{

"post":{

"properties":{

"id":{"type":"long"},

"name":{"type":"string"},

"contents":{"type":"string"},

"userId":{"type":"long"}

}}

}}

As you can see, our index consists of four fields: the identifier (the id field), name of the document (the name field), contents of the document (the contents field), and the identifier of the user to which the documents belong (the userId field). To get all the documents for a particular user, one with userId equal to 12, you can run the following query:

curl -XGET 'http://localhost:9200/posts/_search?q=userId:12'

Depending on the search type (we will talk more about it in Chapter 3, Searching Your Data), Elasticsearch will run your query. It usually means that it will first query all the shards for the identifiers and scores of the matching documents and then it will send an internal query again, but only to the relevant shards (the ones containing the needed documents) to get the documents needed to build the response.

A very simplified view of how the default searching works during its initial phase is shown in the following illustration:

What if we could put all the documents for a single user into a single shard and query on that shard? Wouldn't that be wise for performance? Yes, that is handy and that is what routing allows you to do.

Routing

Routing can control which shard your documents and queries will be forwarded to. By now, you will probably have guessed that we can specify the routing value both during indexing and during querying and, in fact, if you decide to specify explicit routing values, you'll probably want to do that during indexing and searching.

In our case, we will use the userId value to set routing during indexing and the same value will be used during searching. Because we will use the same routing value for all the documents for a single user, the same hash value will be calculated and thus all the documents for that particular user will be placed in the same shard. Using the same value during search will result in searching a single shard instead of the whole index.

There is one thing you should remember when using routing during searching: you should add a query part that will limit the returned documents to the ones for the given user. Routing is not enough. This is because you'll probably have more distinct routing values than the number of shards your index will be built with. For example, you can have 10 shards building your index, but at the same time have hundreds of users. It is physically impossible to dedicate a single shard to only a single user. It is usually not good from a scaling point of view as well. Because of that, a few distinct values can point to the same shard; in our case, the data of a few users will be placed in the same shard. This is why we need a query part that will limit the data to a particular user identifier, such as a term query.
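The reasoning above can be sketched in a few lines of Python. This is a simplified model, not the actual Elasticsearch implementation: shard_for is a hypothetical helper, and zlib.crc32 is used only as a stand-in for the hash function that Elasticsearch applies internally.

```python
import zlib
from collections import Counter


def shard_for(routing_value, number_of_primary_shards):
    """Map a routing value to a shard number: hash it, then take the
    remainder of dividing by the number of primary shards."""
    digest = zlib.crc32(str(routing_value).encode("utf-8"))
    return digest % number_of_primary_shards


NUM_SHARDS = 10

# The same routing value always lands on the same shard, at index and search time.
same_shard = shard_for(12, NUM_SHARDS) == shard_for("12", NUM_SHARDS)
print(same_shard)

# 200 users cannot each get their own shard out of 10, so shards are shared.
users_per_shard = Counter(shard_for(user_id, NUM_SHARDS) for user_id in range(1, 201))
print(max(users_per_shard.values()) > 1)
```

Because several users necessarily share a shard, a routed search still sees documents of other users on that shard, which is why the additional query part on userId is needed.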

The following diagram shows a very simple illustration of how searching works with a provided custom routing value:

As you can see, Elasticsearch will send our query to a single shard. Now let's look at how we can specify the routing values.

The routing parameters

The idea is very simple. The endpoint used for all the operations connected with fetching or storing documents in Elasticsearch allows us to use an additional parameter called routing. You can add it to your HTTP request or set it by using the client library of your choice.

So, in order to index a sample document to the previously shown index, we will use the following command:

curl -XPUT 'http://localhost:9200/posts/post/1?routing=12' -d '{
  "id":"1",
  "name":"Test document",
  "contents":"Test document",
  "userId":"12"
}'

If we now get back to our previous query fetching our user's data and we modify it to use routing, it would look as follows:

curl -XGET 'http://localhost:9200/posts/_search?routing=12&q=userId:12'

As you can see, the same routing value was used during indexing and querying. This is possible in most cases when routing is used. We know which user data we are indexing and we will probably know which user is searching for the data. In our case, our imaginary user was given the identifier of 12 and we used that value during indexing and searching.

Note that during searching you can specify multiple routing values separated by commas. For example, if we want the preceding query to be additionally routed by the value of the section parameter (if it existed) and we also want to filter by this parameter, our query will look like the following:

curl -XGET 'http://localhost:9200/posts/_search?routing=12,6654&q=userId:12+AND+section:6654'

Of course, the preceding command can match multiple shards now as the values given to routing can point to multiple shards. Because of that you need to provide only a single routing value during indexation (Elasticsearch needs to be pointed to a single shard or indexation will fail). You can of course query multiple shards at the same time and because of that multiple routing values can be provided during searching.

Note

Remember that routing is not the only thing that is required to get results for a given user. That's because we usually have fewer shards than unique routing values. This means that we will have data from multiple users in a single shard. So, when using routing, you should also narrow down your results to the ones for a given user. You'll learn more about how you can do that in Chapter 3, Searching Your Data.

Routing fields

Specifying the routing value with each request is critical when using an index operation. Without it, Elasticsearch uses the default way of determining where the document should be stored: it uses the hash value of the document identifier. This may lead to a situation where one document exists in many versions on different shards. A similar situation may occur when fetching the document. When a document is stored with a given routing value and we fetch it without that value, we may hit the wrong shard and the document may not be found.

In fact, Elasticsearch allows us to change the default behavior and force the routing value to be provided when working with documents of a given type. To do that, we need to add the following section to our type definition:

"_routing":{

"required":true

}

The preceding definition means that the routing value needs to be provided (the "required":true property); without it, an index request will fail.

Summary

In this chapter, we've learned a lot when it comes to indexation and data handling in Elasticsearch. We started with basic information about Elasticsearch and we proceeded to tuning the schema-less behavior in Elasticsearch. We learned how to configure our mappings, use out of the box language analysis capabilities of Elasticsearch, and create our own mappings. We looked at batch indexing to speed up indexation and we added additional internal information to the documents in our indices. Finally, we looked at segment merging and routing.

In the next chapter, we will fully concentrate on searching and the extensive query language of Elasticsearch. We will start with how to query Elasticsearch and how the Elasticsearch query process works. We will learn about all the basic queries and compound queries to be able to use them in our applications. Finally, we will see which query should be chosen for the given use case.

Chapter 3. Searching Your Data

In the previous chapter, we dived into Elasticsearch indexing. We learned a lot when it comes to data handling. We saw how to tune the Elasticsearch schema-less mechanism and we now know how to create our own mappings. We also saw the core types of Elasticsearch and we used analyzers, both the ones that come out of the box with Elasticsearch and the one we defined ourselves. We used bulk indexing and we added additional internal information to our indices. Finally, we learned what segment merging is, how we can fine tune it, and how to use routing in Elasticsearch and what it gives us. This chapter is fully dedicated to querying. By the end of this chapter, you will have learned the following topics:

How to query Elasticsearch

What happens internally when queries are run

What are the basic queries in Elasticsearch

What are the compound queries in Elasticsearch that allow us to group other queries

How to use position aware queries (span queries)

How to choose the right query for the job

Querying Elasticsearch

So far, when we have searched our data, we used the REST API and a simple query or the GET request. Similarly, when we were changing the index, we also used the REST API and sent the JSON-structured data to Elasticsearch. Regardless of the type of operation we wanted to perform, whether it was a mapping change or document indexation, we used a JSON structured request body to inform Elasticsearch about the operation details.

A similar situation happens when we want to send more than a simple query to Elasticsearch: we structure it using JSON objects and send it to Elasticsearch in the request body. This is called the query DSL. In a broader view, Elasticsearch supports two kinds of queries: basic ones and compound ones. Basic queries, such as the term query, are used for querying the actual data. We will cover these in the Basic queries section of this chapter. The second type of query is the compound query, such as the bool query, which can combine multiple queries. We will cover these in the Compound queries section of this chapter.

However, this is not the whole picture. In addition to these two types of queries, certain queries can have filters that are used to narrow down your results with certain criteria. Filter queries don't affect scoring and are usually very efficient and easily cached.

To make it even more complicated, queries can contain other queries (don't worry; we will try to explain all this!). Furthermore, some queries can contain filters and others can contain both queries and filters. Although this is not everything, we will stick with this working explanation for now. We will go over this in greater detail in the Compound queries section in this chapter and the Filtering your results section in Chapter 4, Extending Your Querying Knowledge.

The example data

If not stated otherwise, the following mappings will be used for the rest of the chapter:

{

"book":{

"properties":{

"author":{

"type":"string"

},

"characters":{

"type":"string"

},

"copies":{

"type":"long",

"ignore_malformed":false

},

"otitle":{

"type":"string"

},

"tags":{

"type":"string",

"index":"not_analyzed"

},

"title":{

"type":"string"

},

"year":{

"type":"long",

"ignore_malformed":false,

"index":"analyzed"

},

"available":{

"type":"boolean"

}

}

}

}

The preceding mappings represent a simple library and were used to create the library index. One thing to remember is that Elasticsearch will analyze the string based fields if we don't configure it differently.

The preceding mappings were stored in the mapping.json file and, in order to create the mentioned library index, we can use the following commands:

curl -XPOST 'localhost:9200/library'

curl -XPUT 'localhost:9200/library/book/_mapping' -d @mapping.json

We also used the following sample data as the example documents for this chapter:

{"index":{"_index":"library","_type":"book","_id":"1"}}
{"title":"All Quiet on the Western Front","otitle":"Im Westen nichts Neues","author":"Erich Maria Remarque","year":1929,"characters":["Paul Bäumer","Albert Kropp","Haie Westhus","Fredrich Müller","Stanislaus Katczinsky","Tjaden"],"tags":["novel"],"copies":1,"available":true,"section":3}
{"index":{"_index":"library","_type":"book","_id":"2"}}
{"title":"Catch-22","author":"Joseph Heller","year":1961,"characters":["John Yossarian","Captain Aardvark","Chaplain Tappman","Colonel Cathcart","Doctor Daneeka"],"tags":["novel"],"copies":6,"available":false,"section":1}
{"index":{"_index":"library","_type":"book","_id":"3"}}
{"title":"The Complete Sherlock Holmes","author":"Arthur Conan Doyle","year":1936,"characters":["Sherlock Holmes","Dr. Watson","G. Lestrade"],"tags":[],"copies":0,"available":false,"section":12}
{"index":{"_index":"library","_type":"book","_id":"4"}}
{"title":"Crime and Punishment","otitle":"Преступлéние и наказáние","author":"Fyodor Dostoevsky","year":1886,"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],"tags":[],"copies":0,"available":true}

We stored our sample data in the documents.json file and we use the following command to index it:

curl -s -XPOST 'localhost:9200/_bulk' --data-binary @documents.json

This command runs bulk indexing. You can learn more about it in the Batch indexing to speed up your indexing process section in Chapter 2, Indexing Your Data.

A simple query

The simplest way to query Elasticsearch is to use the URI request query. We already discussed it in the Searching with the URI request query section of Chapter 1, Getting Started with Elasticsearch Cluster. For example, to search for the word crime in the title field, you could send a query using the following command:

curl -XGET 'localhost:9200/library/book/_search?q=title:crime&pretty'

This is a very simple, but limited, way of submitting queries to Elasticsearch. If we look from the point of view of the Elasticsearch query DSL, the preceding query is a query_string query. It searches for the documents that have the term crime in the title field and can be rewritten as follows:

{

"query":{

"query_string":{"query":"title:crime"}

}

}

Sending a query using the query DSL is a bit different, but still not rocket science. We send the GET (POST is also accepted in case your tool or library doesn't allow sending a request body in HTTP GET requests) HTTP request to the _search REST endpoint as earlier and include the query in the request body. Let's take a look at the following command:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"query_string":{"query":"title:crime"}

}

}'

As you can see, we used the request body (the -d switch) to send the whole JSON-structured query to Elasticsearch. The pretty request parameter tells Elasticsearch to structure the response in such a way that we humans can read it more easily. In response to the preceding command, we get the following output:

{
  "took":4,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0
  },
  "hits":{
    "total":1,
    "max_score":0.5,
    "hits":[{
      "_index":"library",
      "_type":"book",
      "_id":"4",
      "_score":0.5,
      "_source":{
        "title":"Crime and Punishment",
        "otitle":"Преступлéние и наказáние",
        "author":"Fyodor Dostoevsky",
        "year":1886,
        "characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
        "tags":[],
        "copies":0,
        "available":true
      }
    }]
  }
}

Nice! We got our first search results with the query DSL.

Paging and result size

Elasticsearch allows us to control how many results we want to get (at most) and from which result we want to start. The following are the two additional properties that can be set in the request body:

from: This property specifies the document that we want to have our results from. Its default value is 0, which means that we want to get our results from the first document.

size: This property specifies the maximum number of documents we want as the result of a single query (which defaults to 10). For example, if we are only interested in aggregations results and don't care about the documents returned by the query, we can set this parameter to 0.

If we want our query to get documents starting from the tenth item on the list and fetch 20 documents, we send the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"from":9,

"size":20,

"query":{

"query_string":{"query":"title:crime"}

}

}'
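Because from is zero-based, off-by-one mistakes are easy to make when translating "start at the tenth item" or "show page three" into request properties. The following Python sketch shows the conversion; the function names window_from_item and page_to_from_size are hypothetical helpers introduced here, not part of any Elasticsearch client.

```python
def window_from_item(item_number, count):
    """Return from/size for a window starting at the 1-based item_number."""
    if item_number < 1 or count < 0:
        raise ValueError("item_number must be >= 1 and count must be >= 0")
    return {"from": item_number - 1, "size": count}


def page_to_from_size(page, page_size):
    """Return from/size for a 1-based page number with a fixed page size."""
    if page < 1 or page_size < 0:
        raise ValueError("page must be >= 1 and page_size must be >= 0")
    return {"from": (page - 1) * page_size, "size": page_size}


print(window_from_item(10, 20))   # {'from': 9, 'size': 20}
print(page_to_from_size(3, 20))   # {'from': 40, 'size': 20}
```

The first call reproduces the values used in the preceding query; the returned dictionary can be merged into the request body next to the query section.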


Returning the version value

In addition to all the information returned, Elasticsearch can return the version of the document (we mentioned versioning in Chapter 1, Getting Started with Elasticsearch Cluster). To do this, we need to add the version property with the value of true to the top level of our JSON object. So, the final query, which requests the version information, will look as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"version":true,

"query":{

"query_string":{"query":"title:crime"}

}

}'

After running the preceding query, we get the following results:

{
  "took":4,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0
  },
  "hits":{
    "total":1,
    "max_score":0.5,
    "hits":[{
      "_index":"library",
      "_type":"book",
      "_id":"4",
      "_version":1,
      "_score":0.5,
      "_source":{
        "title":"Crime and Punishment",
        "otitle":"Преступлéние и наказáние",
        "author":"Fyodor Dostoevsky",
        "year":1886,
        "characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],
        "tags":[],
        "copies":0,
        "available":true
      }
    }]
  }
}

As you can see, the _version section is present for the single hit we got.

Limiting the score

For nonstandard use cases, Elasticsearch provides a feature that lets us filter the results on the basis of a minimum score value that the document must have to be considered a match. In order to use this feature, we must provide the min_score value at the top level of our JSON object with the value of the minimum score. For example, if we want our query to only return documents with a score higher than 0.75, we send the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"min_score":0.75,

"query":{

"query_string":{"query":"title:crime"}

}

}'

We get the following response after running the preceding query:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":0,

"max_score":null,

"hits":[]

}

}

If you look at the previous examples, the score of our document was 0.5, which is lower than 0.75, and thus we didn't get any documents in response.

Limiting the score usually doesn't make much sense because comparing scores between the queries is quite hard. However, maybe in your case, this functionality will be needed.

Choosing the fields that we want to return

With the use of the fields array in the request body, Elasticsearch allows us to define which fields to include in the response. Remember that you can only return these fields if they are marked as stored in the mappings used to create the index, or if the _source field was used (Elasticsearch uses the _source field to provide the stored values and the _source field is turned on by default).

So, for example, to return only the title and the year fields in the results (for each document), send the following query to Elasticsearch:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"fields":["title","year"],

"query":{

"query_string":{"query":"title:crime"}

}

}'

In response, we get the following output:

{
  "took":5,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0
  },
  "hits":{
    "total":1,
    "max_score":0.5,
    "hits":[{
      "_index":"library",
      "_type":"book",
      "_id":"4",
      "_score":0.5,
      "fields":{
        "title":["Crime and Punishment"],
        "year":[1886]
      }
    }]
  }
}

As you can see, everything worked as we wanted it to. There are four things we would like to share with you at this point, which are as follows:

If we don't define the fields array, it will use the default value and return the _source field if available.

If we use the _source field and request a field that is not stored, then that field will be extracted from the _source field (however, this requires additional processing).

If we want to return all the stored fields, we just pass an asterisk (*) as the field name.

From a performance point of view, it's better to return the _source field instead of multiple stored fields. This is because getting multiple stored fields may be slower compared to retrieving a single _source field.

Source filtering

In addition to choosing which fields are returned, Elasticsearch allows us to use so-called source filtering. This functionality allows us to control which fields are returned from the _source field. Elasticsearch exposes several ways to do this. The simplest source filtering allows us to decide whether a document should be returned or not. Consider the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"_source":false,

"query":{

"query_string":{"query":"title:crime"}

}

}'

The result returned by Elasticsearch should be similar to the following one:

{

"took":12,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5

}]

}

}

Note that the response is limited to base information about a document and the _source field was not included. If you use Elasticsearch as a second source of data and the content of the document is served from an SQL database or cache, the document identifier is all you need.

The second way is similar to the fields array described in the preceding section, although we define which fields should be returned in the document source itself. Let's see that using the following example query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"_source":["title","otitle"],

"query":{

"query_string":{"query":"title:crime"}

}

}'

We wanted to get the title and the otitle document fields in the returned _source field. Elasticsearch extracted those values from the original _source value and included the _source field only with the requested fields. The whole response returned by Elasticsearch looked as follows:

{
  "took":2,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0
  },
  "hits":{
    "total":1,
    "max_score":0.5,
    "hits":[{
      "_index":"library",
      "_type":"book",
      "_id":"4",
      "_score":0.5,
      "_source":{
        "otitle":"Преступлéние и наказáние",
        "title":"Crime and Punishment"
      }
    }]
  }
}

We can also use an asterisk to select which fields should be returned in the _source field; for example, title* will return values for the title field and for title10 (if we have such a field in our data). If we have documents with nested parts, we can use notation with a dot; for example, title.* to select all the fields nested under the title object.

Finally, we can also specify explicitly which fields we want to include and which to exclude from the _source field. We can include fields using the include property and we can exclude fields using the exclude property (both of them are arrays of values). For example, if we want the returned _source field to include all the fields starting with the letter t but not the title field, we will run the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"_source":{

"include":["t*"],

"exclude":["title"]

},

"query":{

"query_string":{"query":"title:crime"}

}

}'
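To see which fields such a combination of patterns selects, the matching can be approximated in Python. This is only a sketch under stated assumptions: filter_source is a hypothetical function, it handles top-level field names only (no dotted paths), and it uses the standard fnmatch module as an approximation of the wildcard matching Elasticsearch performs.

```python
from fnmatch import fnmatchcase


def filter_source(source, include=None, exclude=None):
    """Keep fields matching any include pattern and no exclude pattern.

    Top-level field names only; a missing include list means "everything".
    """
    include = include or ["*"]
    exclude = exclude or []
    return {
        name: value
        for name, value in source.items()
        if any(fnmatchcase(name, pattern) for pattern in include)
        and not any(fnmatchcase(name, pattern) for pattern in exclude)
    }


book = {
    "title": "Crime and Punishment",
    "otitle": "Преступлéние и наказáние",
    "author": "Fyodor Dostoevsky",
    "year": 1886,
    "characters": ["Raskolnikov", "Sofia Semyonovna Marmeladova"],
    "tags": [],
    "copies": 0,
    "available": True,
}

print(filter_source(book, include=["t*"], exclude=["title"]))  # {'tags': []}
```

For our sample document, only title and tags start with the letter t, and title is excluded, so in this model tags is the only field left in the filtered source.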


Using the script fields

Elasticsearch allows us to use script-evaluated values that will be returned with the result documents (we will discuss Elasticsearch scripting capabilities in greater detail in the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better). To use the script fields functionality, we add the script_fields section to our JSON query object and an object with a name of our choice for each scripted value that we want to return. For example, to return a value named correctYear, which is calculated as the year field minus 1800, we run the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"script_fields":{

"correctYear":{

"script":"doc[\"year\"].value-1800"

}

},

"query":{

"query_string":{"query":"title:crime"}

}

}'

Note

By default, Elasticsearch doesn't allow us to use dynamic scripting. If you tried the preceding query, you probably got an error with information stating that the scripts of type [inline] with operation [search] and language [groovy] are disabled. To make this example work, you should add the script.inline: on property to the elasticsearch.yml file. However, this exposes a security threat. Make sure to read the Scripting capabilities of Elasticsearch section in Chapter 6, Make Your Search Better, to learn about the consequences.

Using the doc notation, like we did in the preceding example, allows us to cache the results returned and speed up script execution at the cost of higher memory consumption. We also get limited to single-valued and single term fields. If we care about memory usage, or if we are using more complicated field values, we can always use the _source field. The same query using the _source field looks as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"script_fields":{

"correctYear":{

"script":"_source.year-1800"

}

},

"query":{

"query_string":{"query":"title:crime"}

}

}'

The following response is returned by Elasticsearch with dynamic scripting enabled:

{

"took":76,


"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"fields":{

"correctYear":[86]

}

}]

}

}

As you can see, we got the calculated correctYear field in response.

Passing parameters to the script fields

Let's take a look at one more feature of the script fields: the passing of additional parameters. Instead of having the value 1800 in the equation, we can use a variable name and pass its value in the params section. If we do this, our query will look as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"script_fields":{

"correctYear":{

"script":"_source.year-paramYear",

"params":{

"paramYear":1800

}

}

},

"query":{

"query_string":{"query":"title:crime"}

}

}'

As you can see, we added the paramYear variable as part of the scripted equation and provided its value in the params section. This allows Elasticsearch to execute the same script with different parameter values in a slightly more efficient way.

Understanding the querying process

After reading the previous section, we now know how querying works in Elasticsearch. You know that Elasticsearch, in most cases, needs to scatter the query across multiple nodes, get the results, merge them, fetch the relevant documents from one or more shards, and return the final results to the client requesting the documents. What we didn’t talk about are two additional things that define how queries behave: search type and query execution preference. We will now concentrate on these functionalities of Elasticsearch.

Query logic

Elasticsearch is a distributed search engine, so all the functionality it provides must be distributed in nature. It is exactly the same with querying. Because we would like to discuss some more advanced topics on how to control the query process, we first need to know how it works.

Let’s now get back to how querying works. We started the theory in the first chapter and we would like to get back to it. By default, if we don’t alter anything, the query process will consist of two phases: the scatter and the gather phase. The aggregator node (the one that receives the request) will run the scatter phase first. During that phase, the query is distributed to all the shards that our index is built from (unless routing is used). For example, if it is built of 5 shards and 1 replica, then 5 physical shards will be queried (we don’t need to query a shard and its replica as they contain the same data). Each of the queried shards will only return the document identifier and the score of the document. The node that sent the scatter query will wait for all the shards to complete their task, gather the results, and sort them appropriately (in this case, from the top scoring to the lowest scoring ones).

After that, a new request will be sent to build the search results, but this time only to those shards that hold the documents needed to build the response. In most cases, Elasticsearch won’t send the request to all the shards but to a subset of them. That’s because we usually don’t get the complete result of the query but only a portion of it. This phase is called the gather phase. After all the documents are gathered, the final response is built and returned as the query result. This is the basic and default Elasticsearch behavior, but we can change it.
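
The two phases can be illustrated with a small in-memory simulation. This is only a sketch under simplifying assumptions: shards are plain Python dictionaries, scores are precomputed, and the function name query_then_fetch is borrowed from the search type for readability; it does not reflect the actual Elasticsearch implementation.

```python
import heapq


def query_then_fetch(shards, size):
    """Simulate the default two-phase search.

    shards maps a shard number to {doc_id: (score, source)}; the scores
    are assumed to be already computed for some query.
    """
    # Scatter phase: each shard returns only (score, doc_id) for its local top hits.
    candidates = []
    for shard_id, docs in shards.items():
        local = [(score, doc_id, shard_id) for doc_id, (score, _) in docs.items()]
        candidates.extend(heapq.nlargest(size, local))

    # The node that received the request merges the partial lists, best score first.
    top = heapq.nlargest(size, candidates)

    # Gather phase: document bodies are fetched only from shards owning a top hit.
    contacted = sorted({shard_id for _, _, shard_id in top})
    hits = [
        {"_id": doc_id, "_score": score, "_source": shards[shard_id][doc_id][1]}
        for score, doc_id, shard_id in top
    ]
    return hits, contacted
```

With three shards and size set to 2, the second phase touches at most two shards, which is why the gather phase is usually cheaper than the scatter phase.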

Search type

Elasticsearch allows us to choose how we want our query to be processed internally. We can do that by specifying the search type. Different search types are appropriate in different situations: sometimes we care only about performance, while sometimes query relevance is the most important factor. You should remember that each shard is a small Lucene index and, in order to return more relevant results, some information, such as frequencies, needs to be transferred between the shards. To control how the queries are executed, we can pass the search_type request parameter and set it to one of the following values:

query_then_fetch: In the first step, the query is executed to get the information needed to sort and rank the documents. This step is executed against all the shards. Then only the relevant shards are queried for the actual content of the documents. This is the search type used by default if no search type is provided with the query, and it is the query type we described previously.

dfs_query_then_fetch: This is similar to query_then_fetch. However, compared to query_then_fetch, it contains an additional query phase which calculates distributed term frequencies.

There are also two deprecated search types: count and scan. The first one is deprecated starting from Elasticsearch 2.0 and the second one starting with Elasticsearch 2.1. The count search type used to provide benefits where only aggregations or the number of documents was relevant, but now it is enough to add size equal to 0 to your queries. The scan request was used for scrolling functionality.

So if we would like to use the simplest search type, we would run the following command:

curl -XGET 'localhost:9200/library/book/_search?pretty&search_type=query_then_fetch' -d '{

"query":{

"term":{"title":"crime"}

}

}'

Search execution preference

In addition to the possibility of controlling how the query is executed, we can also control on which shards to execute the query. By default, Elasticsearch uses shards and replicas on any node in a round robin manner, so that each shard is queried a similar number of times. The default behavior is the proper method of shard execution preference for most use cases. But there may be times when we want to change the default behavior. For example, you may want the search to only be executed on the primary shards. To do that, we can set the preference request parameter to one of the following values:

_primary: The operation will be only executed on the primary shards, so the replicas won’t be used. This can be useful when we need to use the latest information from the index but our data is not replicated right away.

_primary_first: The operation will be executed on the primary shards if they are available. If not, it will be executed on the other shards.

_replica: The operation will be executed only on the replica shards.

_replica_first: This operation is similar to _primary_first, but uses replica shards. The operation will be executed on the replica shards if possible and on the primary shards if the replicas are not available.

_local: The operation will be executed on the shards available on the node which the request was sent from and, if such shards are not present, the request will be forwarded to the appropriate nodes.

_only_node:node_id: This operation will be executed on the node with the provided node identifier.

_only_nodes:nodes_spec: This operation will be executed on the nodes that are defined in nodes_spec. This can be an IP address, a name, a name or IP address using wildcards, and so on. For example, if nodes_spec is set to 192.168.1.*, the operation will be run on the nodes with IP addresses starting with 192.168.1.

_prefer_node:node_id: Elasticsearch will try to execute the operation on the node with the provided identifier. However, if the node is not available, it will be executed on the nodes that are available.

_shards:1,2: Elasticsearch will execute the operation on the shards with the given identifiers; in this case, on shards with identifiers 1 and 2. The _shards parameter can be combined with other preferences, but the shard identifiers need to be provided first. For example, _shards:1,2;_local.

Custom value: Any custom string value may be passed. Requests with the same value provided will be executed on the same shards.

For example, if we would like to execute a query only on the local shards, we would run the following command:

curl -XGET 'localhost:9200/library/_search?pretty&preference=_local' -d '{

"query":{

"term":{"title":"crime"}

}

}'
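
Because search_type and preference are ordinary request parameters, they can be attached to the search URL with standard URL encoding. The sketch below uses only the Python standard library; the helper name search_url and the default host are assumptions made for this example. Characters such as the colon, comma, and semicolon in a combined preference are percent-encoded here, on the assumption that the server decodes them like any other query string value.

```python
from urllib.parse import urlencode


def search_url(index, preference=None, search_type=None, host="localhost:9200"):
    """Build a _search URL carrying the optional request parameters."""
    params = {"pretty": "true"}
    if search_type is not None:
        params["search_type"] = search_type
    if preference is not None:
        params["preference"] = preference
    # urlencode percent-encodes reserved characters in the values.
    return "http://%s/%s/_search?%s" % (host, index, urlencode(params))


if __name__ == "__main__":
    print(search_url("library", preference="_local"))
    print(search_url("library", preference="_shards:1,2;_local"))
```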

Search shards API

When discussing the search preference, we would also like to mention the search shards API exposed by Elasticsearch. This API allows us to check which shards the query will be executed on. In order to use this API, run a request against the _search_shards REST endpoint. For example, to see how the query will be executed, we run the following command:

curl -XGET 'localhost:9200/library/_search_shards?pretty' -d '{"query":{"match_all":{}}}'

The response to the preceding command will be as follows:

{

"nodes":{

"my0DcA_MTImm4NE3cG3ZIg":{

"name":"Cloud9",

"transport_address":"127.0.0.1:9300",

"attributes":{}

}

},

"shards":[[{

"state":"STARTED",

"primary":true,

"node":"my0DcA_MTImm4NE3cG3ZIg",

"relocating_node":null,

"shard":0,

"index":"library",

"version":4,

"allocation_id":{

"id":"9ayLDbL1RVSyJRYIJkuAxg"

}

}],[{

"state":"STARTED",

"primary":true,

"node":"my0DcA_MTImm4NE3cG3ZIg",

"relocating_node":null,

"shard":1,

"index":"library",

"version":4,

"allocation_id":{

"id":"wfpvtaLER-KVyOsuD46Yqg"

}

}],[{

"state":"STARTED",

"primary":true,

"node":"my0DcA_MTImm4NE3cG3ZIg",

"relocating_node":null,

"shard":2,

"index":"library",

"version":4,

"allocation_id":{

"id":"zrLPWhCOSTmjlb8TY5rYQA"

}

}],[{

"state":"STARTED",

"primary":true,

"node":"my0DcA_MTImm4NE3cG3ZIg",

"relocating_node":null,

"shard":3,

"index":"library",

"version":4,

"allocation_id":{

"id":"efnvY7YcSz6X8X8USacA7g"

}

}],[{

"state":"STARTED",

"primary":true,

"node":"my0DcA_MTImm4NE3cG3ZIg",

"relocating_node":null,

"shard":4,

"index":"library",

"version":4,

"allocation_id":{

"id":"XJHW2J63QUKdh3bK3T2nzA"

}

}]]

}

As you can see, in the response returned by Elasticsearch, we have the information about the shards that will be used during the query process. Of course, with the search shards API, we can use additional parameters that control the querying process. These properties are routing, preference, and local. We are already familiar with the first two. The local parameter is a Boolean (values true or false) that allows us to tell Elasticsearch to use the cluster state information stored on the local node (setting local to true) instead of the one from the master node (setting local to false). This allows us to diagnose problems with cluster state synchronization.
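
A response of this shape is easy to summarize in code. The following sketch assumes the response has already been parsed from JSON into a Python dictionary with the nodes and shards keys shown above; the function name shard_routing is hypothetical.

```python
def shard_routing(response):
    """Return (index, shard, role, node name) rows from a search shards response."""
    rows = []
    for copies in response["shards"]:      # one inner list per shard
        for copy in copies:                # each candidate copy of that shard
            node = response["nodes"][copy["node"]]
            role = "primary" if copy["primary"] else "replica"
            rows.append((copy["index"], copy["shard"], role, node["name"]))
    return rows
```

Applied to the preceding response, this would yield five rows, one per shard, all pointing at the node named Cloud9.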

Basic queries

Elasticsearch has extensive search and data analysis capabilities that are exposed in the form of different queries, filters, aggregations, and so on. In this section, we will concentrate on the basic queries provided by Elasticsearch. By basic queries we mean the ones that don’t combine other queries together but run on their own.

The term query

The term query is one of the simplest queries in Elasticsearch. It just matches the documents that have a term in a given field: the exact, not analyzed term. The simplest term query is as follows:

{

"query":{

"term":{

"title":"crime"

}

}

}

It will match the documents that have the term crime in the title field. Remember that the term query is not analyzed, so you need to provide the exact term that will match the term in the indexed document. Note that in our input data, we have the title field with the value of Crime and Punishment (uppercased), but we are searching for crime, because the Crime term becomes crime after analysis during indexing.
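
The effect of analysis on term matching can be shown with a toy example. The analyze function below is a deliberately crude stand-in for the analyzer used at index time (it only splits on whitespace and lowercases); it is an assumption made for illustration and not the real Elasticsearch analysis chain.

```python
def analyze(text):
    """Toy index-time analysis: whitespace tokenization plus lowercasing."""
    return [token.lower() for token in text.split()]


def term_matches(indexed_text, term):
    """A term query compares the given term, unchanged, with the indexed terms."""
    return term in analyze(indexed_text)


if __name__ == "__main__":
    print(term_matches("Crime and Punishment", "crime"))  # lowercase term is in the index
    print(term_matches("Crime and Punishment", "Crime"))  # uppercased term is not
```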

In addition to the term we want to find, we can also include the boost attribute in our term query, which will affect the importance of the given term. We will talk more about boosts in the Introduction to Apache Lucene scoring section of Chapter 6, Make Your Search Better. For now, we just need to remember that it changes the importance of the given part of the query.

For example, to change our previous query and give our term query a boost of 10.0, send the following query:

{

"query":{

"term":{

"title":{

"value":"crime",

"boost":10.0

}

}

}

}

As you can see, the query changed a bit. Instead of a simple term value, we nested a new JSON object which contains the value property and the boost property. The value property should contain the term we are interested in and the boost property is the boost value we want to use.

The terms query

The terms query is an extension to the term query. It allows us to match documents that have certain terms in their contents instead of a single term. The term query allowed us to match a single, not analyzed term and the terms query allows us to match multiple of those. For example, let’s say that we want to get all the documents that have the terms novel or book in the tags field. To achieve this, we will run the following query:

{

"query":{

"terms":{

"tags":["novel","book"]

}

}

}

The preceding query returns all the documents that have one or both of the searched terms in the tags field. This is a key point to remember: the terms query will find documents having any of the provided terms.

The match all query

The match all query is one of the simplest queries available in Elasticsearch. It allows us to match all of the documents in the index. If we want to get all the documents from our index, we just run the following query:

{

"query":{

"match_all":{}

}

}

We can also include boost in the query, which will be given to all the documents matched by it. For example, if we want to add a boost of 2.0 to all the documents in our match all query, we will send the following query to Elasticsearch:

{

"query":{

"match_all":{

"boost":2.0

}

}

}

The type query

A very simple query that allows us to find all the documents with a certain type. For example, if we would like to search for all the documents with the book type in our library index, we will run the following query:

{

"query":{

"type":{

"value":"book"

}

}

}

The exists query

A query that allows us to find all the documents that have a value in the defined field. For example, to find the documents that have a value in the tags field, we will run the following query:

{

"query":{

"exists":{

"field":"tags"

}

}

}

The missing query

Opposite to the exists query, the missing query returns the documents that have a null value or no value at all in a given field. For example, to find all the documents that don’t have a value in the tags field, we will run the following query:

{

"query":{

"missing":{

"field":"tags"

}

}

}

The common terms query

The common terms query is a modern Elasticsearch solution for improving query relevance and precision with common words when we are not using stop words (http://en.wikipedia.org/wiki/Stop_words). For example, a crime and punishment query results in three term queries and each of them has a cost in terms of performance. However, the and term is a very common one and its impact on the document score will be very low. The solution is the common terms query, which divides the query terms into two groups. The first group is the one with important terms, which are the ones that have lower frequency. The second group is the one with less important terms, which are the ones with high frequency. The query for the first group is executed first and Elasticsearch calculates the score for all of the terms from that group. This way the low frequency terms, which are usually the ones that have more importance, are always taken into consideration. Then Elasticsearch executes the second query for the second group of terms, but calculates the score only for the documents matched by the first query. This way the score is only calculated for the relevant documents and thus higher performance can be achieved.

An example of the common terms query is as follows:

{

"query":{

"common":{

"title":{

"query":"crime and punishment",

"cutoff_frequency":0.001

}

}

}

}

The query can take the following parameters:

query: The actual query contents.

cutoff_frequency: The percentage (0.001 means 0.1%) or an absolute value (when the property is set to a value equal to or larger than 1). High and low frequency groups are constructed using this value. Setting this parameter to 0.001 means that the low frequency terms group will be constructed for terms having a frequency of 0.1% and lower.

low_freq_operator: This can be set to or or and, but defaults to or. It specifies the Boolean operator used for constructing queries in the low frequency term group. If we want all the terms to be present in a document for it to be considered a match, we should set this parameter to and.

high_freq_operator: This can be set to or or and, but defaults to or. It specifies the Boolean operator used for constructing queries in the high frequency term group. If we want all the terms to be present in a document for it to be considered a match, we should set this parameter to and.

minimum_should_match: Instead of using low_freq_operator and high_freq_operator, we can use minimum_should_match. Just like with the other queries, it allows us to specify the minimum number of terms that should be found in a document for it to be considered a match. We can also specify high_freq and low_freq inside the minimum_should_match object, which allows us to define the different number of terms that need to be matched for the high and low frequency terms.

boost: The boost given to the score of the documents.

analyzer: The name of the analyzer that will be used to analyze the query text, which defaults to the default analyzer.

disable_coord: Defaults to false and allows us to enable or disable the score factor computation that is based on the fraction of all the query terms that a document contains. Set it to true for less precise scoring, but slightly faster queries.

Note

Unlike the term and terms queries, the common terms query is analyzed by Elasticsearch.
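
The grouping driven by cutoff_frequency can be sketched as follows. This is an illustration of the idea only: the document frequencies are supplied by hand, the function name split_by_cutoff is hypothetical, and the real computation is performed per shard inside Lucene.

```python
def split_by_cutoff(terms, doc_freq, total_docs, cutoff_frequency):
    """Split query terms into (low_frequency, high_frequency) groups.

    A cutoff below 1 is treated as a fraction of all documents; a cutoff of
    1 or more is treated as an absolute number of documents.
    """
    if cutoff_frequency < 1:
        limit = cutoff_frequency * total_docs
    else:
        limit = cutoff_frequency
    low = [t for t in terms if doc_freq.get(t, 0) <= limit]
    high = [t for t in terms if doc_freq.get(t, 0) > limit]
    return low, high
```

With 10,000 documents and a cutoff of 0.001, a term must appear in at most 10 documents to land in the low frequency group, so a very common word such as and ends up in the high frequency group.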

The match query

The match query takes the value given in the query parameter, analyzes it, and constructs the appropriate query out of it. When using a match query, Elasticsearch will choose the proper analyzer for the field we choose, so you can be sure that the terms passed to the match query will be processed by the same analyzer that was used during indexing. Remember that the match query (and the multi_match query) doesn’t support Lucene query syntax; however, it fits perfectly as a query handler for your search box. The simplest match (and the default) query will look like the following:

{

"query":{

"match":{

"title":"crime and punishment"

}

}

}

The preceding query will match all the documents that have the terms crime, and, or punishment in the title field. However, the previous query is only the simplest one; there are multiple types of match query which we will discuss now.

The Boolean match query

The Boolean match query is a query which analyzes the provided text and makes a Boolean query out of it. This is also the default type for the match query. There are a few parameters which allow us to control the behavior of the Boolean match queries:

operator: This parameter can take the value of or or and, and controls which Boolean operator is used to connect the created Boolean clauses. The default value is or. If we want all the terms in our query to be matched, we should use the and Boolean operator.

analyzer: This specifies the name of the analyzer that will be used to analyze the query text and defaults to the default analyzer.

fuzziness: Providing the value of this parameter allows us to construct fuzzy queries. The value of this parameter can vary. For numeric fields, it should be set to a numeric value; for date-based fields, it can be set to a millisecond or time value, such as 2h; and for text fields, it can be set to 0, 1, or 2 (the edit distance in the Levenshtein algorithm, https://en.wikipedia.org/wiki/Levenshtein_distance) or AUTO (which allows Elasticsearch to control how fuzzy queries are constructed and which is the preferred value). Finally, for text fields, it can also be set to values from 0.0 to 1.0, which results in edit distance being calculated as term length minus 1.0 multiplied by the provided fuzziness value. In general, the higher the fuzziness, the more difference between terms will be allowed.

prefix_length: This allows control over the behavior of the fuzzy query. For more information on the value of this parameter, refer to the The fuzzy query section in this chapter.

max_expansions: This allows control over the behavior of the fuzzy query. For more information on the value of this parameter, refer to the The fuzzy query section in this chapter.

zero_terms_query: This allows us to specify the behavior of the query when all the terms are removed by the analyzer (for example, because of stop words). It can be set to none or all, with none as the default. When set to none, no documents will be returned when the analyzer removes all the query terms. If set to all, all the documents will be returned.

cutoff_frequency: It allows dividing the query into two groups: one with high frequency terms and one with low frequency terms. Refer to the description of the common terms query to see how this parameter can be used.

lenient: When set to true (by default it is false), it allows us to ignore the exceptions caused by data incompatibility, such as trying to query numeric fields using a string value.

The parameters should be wrapped in the name of the field we are running the query against. So if we want to run a sample Boolean match query against the title field, we send a query as follows:

{

"query":{

"match":{

"title":{

"query":"crime and punishment",

"operator":"and"

}

}

}

}
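
Conceptually, the operator parameter decides whether the generated term clauses are optional or required. The sketch below expresses that idea as an equivalent bool query body; it assumes a trivial whitespace and lowercase analysis and is an illustration of the semantics, not the rewrite Elasticsearch performs internally. The function name match_as_bool is hypothetical.

```python
def match_as_bool(field, text, operator="or"):
    """Express a Boolean match query as a bool query of term clauses."""
    terms = text.lower().split()                         # stand-in for real analysis
    occur = "must" if operator == "and" else "should"    # and: all terms, or: any term
    return {"bool": {occur: [{"term": {field: term}} for term in terms]}}
```

With the and operator every term lands in the must section, so a document has to contain crime, and, and punishment in the title field to match.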

The phrase match query

A phrase match query is similar to the Boolean query, but, instead of constructing the Boolean clauses from the analyzed text, it constructs the phrase query. You may wonder what a phrase is when it comes to Lucene and Elasticsearch: it is two or more terms positioned one after another in an order. The following parameters are available:

slop: An integer value that defines how many unknown words can be put between the terms in the text query for a match to be considered a phrase. The default value of this parameter is 0, which means that no additional words are allowed.

analyzer: This specifies the name of the analyzer that will be used to analyze the query text and defaults to the default analyzer.

A sample phrase match query against the title field looks like the following code:

{

"query":{

"match_phrase":{

"title":{

"query":"crime punishment",

"slop":1

}

}

}

}

Note that we removed the and term from our query, but because the slop is set to 1, it will still match our document because we allowed one term to be present between our terms.
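
The role of slop can be demonstrated with a simplified matcher that works on token positions. It is a sketch under stated assumptions: it only handles query terms appearing in their original order and always picks the nearest following occurrence, whereas Lucene’s slop also covers reordered terms. The function name phrase_matches is hypothetical.

```python
def phrase_matches(doc_tokens, query_tokens, slop=0):
    """Return True if query_tokens occur in order with at most `slop` extra words."""
    positions = {}
    for index, token in enumerate(doc_tokens):
        positions.setdefault(token, []).append(index)

    for start in positions.get(query_tokens[0], []):
        previous, extra, found = start, 0, True
        for token in query_tokens[1:]:
            later = [p for p in positions.get(token, []) if p > previous]
            if not later:
                found = False
                break
            extra += later[0] - previous - 1   # words skipped between the two terms
            previous = later[0]
        if found and extra <= slop:
            return True
    return False
```

For the tokens crime, and, punishment, the query crime punishment skips one word, so it matches with slop set to 1 and fails with the default slop of 0.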

The match phrase prefix query

The last type of the match query is the match phrase prefix query. This query is almost the same as the phrase match query, but in addition, it allows prefix matches on the last term in the query text. Also, in addition to the parameters exposed by the match phrase query, it exposes an additional one: the max_expansions parameter, which controls how many prefixes the last term will be rewritten to. Our example query changed to the match_phrase_prefix query will look as follows:

{

"query":{

"match_phrase_prefix":{

"title":{

"query":"crime punishm",

"slop":1,

"max_expansions":20

}

}

}

}

Note that we didn’t provide the full crime and punishment phrase, but only crime punishm, and still the query would match our document. This is because we used the match_phrase_prefix query combined with slop set to 1.

The multi match query

It is the same as the match query, but instead of running against a single field, it can be run against multiple fields with the use of the fields parameter. Of course, all the parameters you use with the match query can be used with the multi match query. So if we would like to modify our match query to be run against the title and otitle fields, we will run the following query:

{

"query":{

"multi_match":{

"query":"crime punishment",

"fields":["title^10","otitle"]

}

}

}

As shown in the preceding example, the nice thing about the multi match query is that the fields defined in it support boosting, so we can increase or decrease the importance of matches on certain fields.

However, this is not the only difference when compared with the match query. We can also control how the query is run internally by using the type property and setting it to one of the following values:

best_fields: This is the default behavior, which finds documents having matches in any field from the defined ones, but sets the document score to the score of the best matching field. It is the most useful type when searching for multiple words and wanting to boost documents that have those words in the same field.

most_fields: This value finds documents that match any field and sets the score of the document to the combined score from all the matched fields.

cross_fields: This value treats the query as if all the terms were in one big field, thus returning documents matching any field.

phrase: This value uses the match_phrase query on each field and sets the score of the document to the score combined from all the fields.

phrase_prefix: This value uses the match_phrase_prefix query on each field and sets the score of the document to the score combined from all the fields.

In addition to the parameters mentioned in the match query and type, the multi match query exposes some additional ones allowing more control over its behavior:

tie_breaker: This allows us to specify the balance between the minimum and the maximum scoring query items and the value can be from 0.0 to 1.0. When used, the score of the document is equal to the best scoring element plus the tie_breaker multiplied by the score of all the other matching fields in the document. So, when set to 0.0, Elasticsearch will only use the score of the highest scoring matching element. You can read more about it in The dis_max query section in this chapter.
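
The tie_breaker formula described above is simple enough to work through by hand. The sketch below takes per-field scores as plain numbers, which is an assumption for illustration; the function name combined_score is hypothetical.

```python
def combined_score(field_scores, tie_breaker=0.0):
    """Best field score plus tie_breaker times the sum of the remaining scores."""
    best = max(field_scores)
    others = sum(field_scores) - best
    return best + tie_breaker * others
```

For field scores of 2.0, 1.0, and 0.5, a tie_breaker of 0.0 gives 2.0 (only the best field counts), 0.5 gives 2.0 + 0.5 × 1.5 = 2.75, and 1.0 gives 3.5, which is the plain sum of all the field scores.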

The query string query

In comparison to the other queries available, the query string query supports full Apache Lucene query syntax, which we discussed earlier in the Lucene query syntax section of Chapter 1, Getting Started with Elasticsearch Cluster. It uses a query parser to construct an actual query using the provided text. An example query string query will look like the following code:

{

"query":{

"query_string":{

"query":"title:crime^10 +title:punishment -otitle:cat +author:(+Fyodor +dostoevsky)",

"default_field":"title"

}

}

}

Because we are familiar with the basics of the Lucene query syntax, we can discuss how the preceding query works. As you can see, we wanted to get the documents that may have the term crime in the title field and such documents should be boosted with the value of 10. Next, we wanted only the documents that have the term punishment in the title field and we didn’t want documents with the term cat in the otitle field. Finally, we told Lucene that we only wanted the documents that had the fyodor and dostoevsky terms in the author field.

Similar to most of the queries in Elasticsearch, the query string query provides quite a few parameters that allow us to control the query behavior, and the list of parameters for this query is rather extensive:

query: This specifies the query text.

default_field: This specifies the default field the query will be executed against. It defaults to the index.query.default_field property, which is by default set to _all.

default_operator: This specifies the default logical operator (or or and) used when no operator is specified. The default value of this parameter is or.

analyzer: This specifies the name of the analyzer used to analyze the query provided in the query parameter.

allow_leading_wildcard: This specifies if a wildcard character is allowed as the first character of a term. It defaults to true.

lowercase_expand_terms: This specifies if the terms that are a result of query rewrite should be lowercased. It defaults to true, which means that the rewritten terms will be lowercased.

enable_position_increments: This specifies if position increments should be turned on in the result query. It defaults to true.

fuzzy_max_expansions: This specifies the maximum number of terms into which a fuzzy query will be expanded, if a fuzzy query is used. It defaults to 50.

fuzzy_prefix_length: This specifies the prefix length for the generated fuzzy queries and defaults to 0. To learn more about it, look at the fuzzy query description.

phrase_slop: This specifies the phrase slop and defaults to 0. To learn more about it, look at the phrase match query description.

boost: This specifies the boost value which will be used and defaults to 1.0.

analyze_wildcard: This specifies if the terms generated by the wildcard query should be analyzed. It defaults to false, which means that those terms won’t be analyzed.

auto_generate_phrase_queries: This specifies if the phrase queries will be automatically generated from the query. It defaults to false, which means that the phrase queries won’t be automatically generated.

minimum_should_match: This controls how many of the generated Boolean should clauses should be matched against a document for the document to be considered a hit. The value can be provided as a percentage; for example, 50%, which would mean that at least 50 percent of the given terms should match. It can also be provided as an integer value, such as 2, which means that at least 2 terms must match.

fuzziness: This controls the behavior of the generated fuzzy query. Refer to the match query description for more information.

max_determined_states: This defaults to 10000 and sets the number of states that the automaton can have for handling regular expression queries. It is used to disallow very expensive queries using regular expressions.

locale: This sets the locale that should be used for the conversion of string values. By default, it is set to ROOT.

time_zone: This sets the time zone that should be used by range queries that are run on date-based fields.

lenient: This can take the value of true or false. If set to true, format-based failures will be ignored. By default, it is set to false.
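
To make the two forms of minimum_should_match concrete, the following sketch computes how many optional clauses have to match. It covers only positive integers and positive percentages, and it assumes that a percentage is rounded down to a whole number of clauses; the function name required_matches is hypothetical.

```python
def required_matches(spec, optional_clauses):
    """Number of should clauses that must match for a given specification.

    spec is either an integer such as 2 or a percentage string such as "50%".
    Percentages are rounded down (an assumption of this sketch).
    """
    if isinstance(spec, str) and spec.endswith("%"):
        return optional_clauses * int(spec[:-1]) // 100
    # An absolute number can never exceed the number of available clauses.
    return min(int(spec), optional_clauses)
```

Under these assumptions, 50% of four terms requires two matching terms, while 50% of three terms requires only one.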

Note that Elasticsearch can rewrite the query string query and, because of that, Elasticsearch allows us to pass additional parameters that control the rewrite method. For more details about this process, go to the Understanding the querying process section in this chapter.

Running the query string query against multiple fields

It is possible to run the query string query against multiple fields. In order to do that, one needs to provide the fields parameter in the query body, which should hold the array of the field names. There are two methods of running the query string query against multiple fields: the default method uses the Boolean query to make queries and the other method can use the dis_max query.

In order to use the dis_max query, one should add the use_dis_max property in the query body and set it to true. An example query will look like the following code:

{

"query":{

"query_string":{

"query":"crime punishment",

"fields":["title","otitle"],

"use_dis_max":true

}

}

}

The simple query string query

The simple query string query uses one of the newest query parsers in Lucene, the SimpleQueryParser (https://lucene.apache.org/core/5_4_0/queryparser/org/apache/lucene/queryparser/simple/SimpleQueryParser.html). Similar to the query string query, it accepts Lucene query syntax as the query; however, unlike it, it never throws an exception when a parsing error happens. Instead of throwing an exception, it discards the invalid parts of the query and runs the rest.

An example simple query string query will look like the following code:

{

"query":{

"simple_query_string":{

"query":"crime punishment",

"default_operator":"or"

}

}

}

The query supports parameters such as query, fields, default_operator, analyzer, lowercase_expanded_terms, locale, lenient, and minimum_should_match, and can also be run against multiple fields using the fields property.

The identifiers query

This is a simple query that filters the returned documents to only those with the provided identifiers. It works on the internal _uid field, so it doesn’t require the _id field to be enabled. The simplest version of such a query will look like the following:

{

"query":{

"ids":{

"values":["1","2","3"]

}

}

}

This query will only return those documents that have one of the identifiers present in the values array. We can complicate the identifiers query a bit and also limit the documents on the basis of their type. For example, if we want to only include documents from the book type, we will send the following query:

{

"query":{

"ids":{

"type":"book",

"values":["1","2","3"]

}

}

}

As you can see, we’ve added the type property to our query and we’ve set its value to the type we are interested in.

The prefix query

This query is similar to the term query in its configuration and to the multi term query when looking into its logic. The prefix query allows us to match documents that have a value in a certain field that starts with a given prefix. For example, if we want to find all the documents that have values starting with cri in the title field, we will run the following query:

{

"query":{

"prefix":{

"title":"cri"

}

}

}

Similar to the term query, you can also include the boost attribute in your prefix query, which will affect the importance of the given prefix. For example, if we would like to change our previous query and give our query a boost of 3.0, we will send the following query:

{

"query":{

"prefix":{

"title":{

"value":"cri",

"boost":3.0

}

}

}

}

Note

Note that the prefix query is rewritten by Elasticsearch and, because of that, Elasticsearch allows us to pass an additional parameter controlling the rewrite method. For more details about that process, refer to the Understanding the querying process section in this chapter.

The fuzzy query

The fuzzy query allows us to find documents that have values similar to the ones we’ve provided in the query. The similarity of terms is calculated on the basis of the edit distance algorithm. The edit distance is calculated between the terms we provide in the query and the terms in the searched documents. This query can be expensive when it comes to CPU resources, but can help us when we need fuzzy matching; for example, when users make spelling mistakes. In our example, let’s assume that instead of crime, our user enters the crme word into the search box and we would like to run the simplest form of fuzzy query. Such a query will look like this:

{

"query":{

"fuzzy":{

"title":"crme"

}

}

}

The response for such a query will be as follows:

{

"took":81,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

}

}]

}

}

Even though we made a typo, Elasticsearch managed to find the document we were interested in.

We can control the fuzzy query behavior by using the following parameters:

value: This specifies the actual query.

boost: This specifies the boost value for the query. It defaults to 1.0.

fuzziness: This controls the behavior of the generated fuzzy query. Refer to the match query description for more information.

prefix_length: This is the length of the common prefix of the differencing terms. It defaults to 0.

max_expansions: This specifies the maximum number of terms the query will be expanded to. The default value is unbounded.

The parameters should be wrapped in the name of the field we are running the query against. So if we would like to modify the previous query and add additional parameters, the query will look like the following code:

{

"query":{

"fuzzy":{

"title":{

"value":"crme",

"fuzziness":2

}

}

}

}
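
To see why crme is within reach of crime, it helps to compute the edit distance directly. The following is a plain Levenshtein distance (insertions, deletions, and substitutions each cost 1) written as a sketch for illustration; it is not the code Elasticsearch or Lucene runs, and the function name edit_distance is hypothetical.

```python
def edit_distance(a, b):
    """Classic Levenshtein distance computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, 1):
        current = [i]
        for j, char_b in enumerate(b, 1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute or keep
            ))
        previous = current
    return previous[-1]
```

The distance between crme and crime is 1 (a single inserted letter), so the term is found with a fuzziness of 1 as well as with the value of 2 used in the preceding query.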

The wildcard query

A query that allows us to use the * and ? wildcards in the values we search for. Apart from that, the wildcard query is very similar to the term query in terms of its body. To send a query that would match all the documents with the value of the cr?me term (? matching any single character), we would send the following query:

{

"query":{

"wildcard":{

"title":"cr?me"

}

}

}

It will match the documents that have terms matching cr?me in the title field. However, you can also include the boost attribute in your wildcard query, which will affect the importance of each term that matches the given value. For example, if we would like to change our previous query and give our wildcard query a boost of 20.0, we will send the following query:

{

"query":{

"wildcard":{

"title":{

"value":"cr?me",

"boost":20.0

}

}

}

}

Note

Note that wildcard queries are not very performance-oriented queries and should be avoided if possible; especially avoid leading wildcards (terms starting with wildcards). The wildcard query is rewritten by Elasticsearch and, because of that, Elasticsearch allows us to pass an additional parameter controlling the rewrite method. For more details about this process, refer to the Understanding the querying process section in this chapter. Also remember that the wildcard query is not analyzed.
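
The meaning of the two wildcards can be tried out locally with Python’s fnmatch module, in which * and ? behave the same way for simple patterns like the ones above. This is only an approximation for illustration: fnmatch additionally understands bracket expressions, and the list of indexed terms is made up for the example.

```python
from fnmatch import fnmatchcase


def wildcard_terms(pattern, indexed_terms):
    """Return the terms that a wildcard pattern would expand to."""
    # fnmatchcase is case sensitive, mirroring the fact that the query is not analyzed.
    return [term for term in indexed_terms if fnmatchcase(term, pattern)]
```

Given the terms crime, creme, cream, and crme, the pattern cr?me selects crime and creme, while cr* selects all four. The more terms a pattern expands to, the more work the rewritten query has to do.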

TherangequeryAquerythatallowsustofinddocumentsthathaveafieldvaluewithinacertainrangeandwhichworksfornumericalfieldsaswellasforstring-basedfieldsanddatebasedfields(justmapstoadifferentApacheLucenequery).Therangequeryshouldberunagainstasinglefieldandthequeryparametersshouldbewrappedinthefieldname.Thefollowingparametersaresupported:

gte:Thequerywillmatchdocumentswiththevaluegreaterthanorequaltotheoneprovidedwiththisparametergt:Thequerywillmatchdocumentswiththevaluegreaterthantheoneprovidedwiththisparameterlte:Thequerywillmatchdocumentswiththevaluelowerthanorequaltotheoneprovidedwiththisparameterlt:Thequerywillmatchdocumentswiththevaluelowerthantheoneprovidedwiththisparameter

So, for example, if we want to find all the books that have a value from 1700 to 1900 in the year field, we will run the following query:

{

"query":{

"range":{

"year":{

"gte":1700,

"lte":1900

}

}

}

}
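The difference between the inclusive and exclusive bounds can be sketched in a few lines of Python. The in_range function below is a hypothetical helper written for illustration only; it mirrors the meaning of the four parameters, not the Lucene implementation.

```python
def in_range(value, gte=None, gt=None, lte=None, lt=None):
    """Check a single field value against the range query bounds."""
    if gte is not None and not value >= gte:
        return False
    if gt is not None and not value > gt:
        return False
    if lte is not None and not value <= lte:
        return False
    if lt is not None and not value < lt:
        return False
    return True

# The bounds used in the query above: both ends are inclusive.
print(in_range(1700, gte=1700, lte=1900))  # boundary value is accepted
print(in_range(1700, gt=1700, lte=1900))   # exclusive lower bound rejects it
```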


Regular expression query

The regular expression query allows us to use regular expressions as the query text. Remember that the performance of such queries depends on the chosen regular expression. The general rule is that the more terms the regular expression matches, the slower the query will be.

An example regular expression query looks like this:

{

"query":{

"regexp":{

"title":{

"value":"cr.m[ae]",

"boost":10.0

}

}

}

}

The preceding query will result in Elasticsearch rewriting the query. The rewritten query will consist of a number of term queries, depending on which terms in our index match the given regular expression. The boost parameter seen in the query specifies the boost value for the generated queries.

The full regular expression syntax accepted by Elasticsearch can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-regexp-query.html#regexp-syntax.
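The rewrite step can be pictured as expanding the pattern into the list of indexed terms it matches. The following Python sketch assumes a hypothetical term list and uses Python's re module, whose syntax differs from Lucene's; the pattern from the example only uses constructs common to both. Lucene patterns are anchored, so the whole term has to match, which re.fullmatch imitates.

```python
import re

def regexp_terms(pattern, terms):
    """Return the terms that match the pattern in full (anchored matching)."""
    compiled = re.compile(pattern)
    return [t for t in terms if compiled.fullmatch(t)]

# Hypothetical terms of the title field.
terms = ["crime", "crema", "crimea", "cream"]

# Each returned term would become one generated term query.
print(regexp_terms("cr.m[ae]", terms))
```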


The more like this query

One of the queries that got a major rework in Elasticsearch 2.0, the more like this query allows us to retrieve documents that are similar (or not similar) to the provided text or to the provided documents.

Elasticsearch supports a few parameters that define how the more like this query should work:

fields: An array of fields that the query should be run against. It defaults to the _all field.
like: This parameter comes in two flavors: it allows us to provide either a text that the returned documents should be similar to, or an array of documents that the returned documents should be similar to.
unlike: This is similar to the like parameter, but it allows us to define text or documents that the returned documents should not be similar to.
min_term_freq: The minimum term frequency (for the terms in the documents) below which terms will be ignored. It defaults to 2.
max_query_terms: The maximum number of terms that will be included in any generated query. It defaults to 25. A higher value may mean higher precision, but lower performance.
stop_words: An array of words that will be ignored when comparing the documents and the query. It is empty by default.
min_doc_freq: The minimum number of documents in which the term has to be present in order not to be ignored. It defaults to 5, which means that a term needs to be present in at least five documents.
max_doc_freq: The maximum number of documents in which the term may be present in order not to be ignored. By default, it is unbounded (set to 0).
min_word_len: The minimum length of a single word below which a word will be ignored. It defaults to 0.
max_word_len: The maximum length of a single word above which it will be ignored. It defaults to unbounded (which means setting the value to 0).
boost_terms: The boost value that will be used when boosting each term. It defaults to 0.
boost: The boost value that will be used when boosting the query. It defaults to 1.
include: This specifies whether the input documents should be included in the results returned by the query. It defaults to false, which means that the input documents won't be included.
minimum_should_match: This controls the number of terms that need to be matched in the resulting documents. By default, it is set to 30%.
analyzer: The name of the analyzer that will be used to analyze the text we provided.

An example of a more like this query looks like this:


{
  "query": {
    "more_like_this": {
      "fields": ["title", "otitle"],
      "like": "crime and punishment",
      "min_term_freq": 1,
      "min_doc_freq": 1
    }
  }
}

As we said earlier, the like property can also be used to point to the documents that the results should be similar to. For example, the following query uses the like property to point to a given document (note that the following query won't return documents on our example data):

{

"query":{

"more_like_this":{

"fields":["title","otitle"],

"min_term_freq":1,

"min_doc_freq":1,

"like":[

{

"_index":"library",

"_type":"book",

"_id":"4"

}

]

}

}

}

We can also mix the documents and text together:

{
  "query": {
    "more_like_this": {
      "fields": ["title", "otitle"],
      "min_term_freq": 1,
      "min_doc_freq": 1,
      "like": [
        {
          "_index": "library",
          "_type": "book",
          "_id": "4"
        },
        "crime and punishment"
      ]
    }
  }
}


Compound queries

In the Basic queries section of this chapter, we discussed the simplest queries exposed by Elasticsearch. We also talked about the position-aware queries called span queries in the Span queries section. However, the simple queries and the span queries are not the only queries that Elasticsearch provides. The compound queries, as we call them, allow us to connect multiple queries together or alter the behavior of other queries. You may wonder if you need such functionality. Your deployment may not need it, but anything beyond a simple query will probably require compound queries. For example, combining a simple term query with a match_phrase query to get better search results may be a good candidate for compound query usage.


The bool query

The bool query allows us to wrap a virtually unbounded number of queries and connect them with a logical value using one of the following sections:

should: The query wrapped into this section may or may not match. The number of should sections that have to match is controlled by the minimum_should_match parameter.
must: The query wrapped into this section must match in order for the document to be returned.
must_not: The query wrapped into this section must not match in order for the document to be returned.

Each of the preceding sections can be present multiple times in a single bool query. This allows us to build very complex queries that have multiple levels of nesting (you can include a bool query in another bool query). Remember that the score of the resulting document will be calculated by taking the sum of the scores of all the wrapped queries that the document matched.

In addition to the preceding sections, we can add the following parameters to the query body to control its behavior:

filter: This allows us to specify the part of the query that should be used as a filter. You can read more about filters in the Filtering your results section in Chapter 4, Extending Your Querying Knowledge.
boost: This specifies the boost used in the query, defaulting to 1.0. The higher the boost, the higher the score of the matching document.
minimum_should_match: This describes the minimum number of should clauses that have to match in order for the checked document to be counted as a match. For example, it can be an integer value such as 2 or a percentage value such as 75%. For more information, refer to https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-minimum-should-match.html.
disable_coord: A Boolean parameter (defaults to false), which allows us to enable or disable the score factor computation that is based on the fraction of all the query terms that a document contains. We should set it to true for less precise scoring, but slightly faster queries.

Imagine that we want to find all the documents that have the term crime in the title field. In addition, the documents may or may not have a value in the range of 1900 to 2000 in the year field, and must not have the nothing term in the otitle field. Such a query made with the bool query will look as follows:

{

"query":{

"bool":{

"must":{

"term":{

"title":"crime"


}

},

"should":{

"range":{

"year":{

"from":1900,

"to":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}

Note

The must, should, and must_not sections can contain a single query or an array of queries.
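The way the three sections combine can be sketched in Python. This is a simplified model under stated assumptions: each clause is a hypothetical function that returns a score or None, every matching clause contributes a flat score of 1.0, and the coordination factor and real Lucene scoring are ignored.

```python
def term(field, value):
    """Hypothetical clause: matches when the field's token list contains value."""
    return lambda doc: 1.0 if value in doc.get(field, ()) else None

def year_between(low, high):
    """Hypothetical clause: matches when the year field lies within [low, high]."""
    return lambda doc: 1.0 if "year" in doc and low <= doc["year"] <= high else None

def bool_score(doc, must=(), should=(), must_not=(), minimum_should_match=0):
    """Return the summed score of matched clauses, or None if the document is rejected."""
    total = 0.0
    for clause in must:
        score = clause(doc)
        if score is None:
            return None            # every must clause has to match
        total += score
    for clause in must_not:
        if clause(doc) is not None:
            return None            # any must_not match rejects the document
    matched_should = 0
    for clause in should:
        score = clause(doc)
        if score is not None:
            matched_should += 1
            total += score         # should clauses only add to the score
    if matched_should < minimum_should_match:
        return None
    return total

# The example query: must title:crime, should year 1900..2000, must_not otitle:nothing.
query = dict(
    must=[term("title", "crime")],
    should=[year_between(1900, 2000)],
    must_not=[term("otitle", "nothing")],
)
old_book = {"title": ["crime", "and", "punishment"], "year": 1886}
newer_book = {"title": ["crime"], "year": 1950}
print(bool_score(old_book, **query), bool_score(newer_book, **query))
```

In this model, the document matching the should clause is still optional, but it ends up with a higher score than the one that matches only the must clause.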


The dis_max query

The dis_max query is very useful as it generates a union of the documents returned by all the subqueries and returns it as the result. The good thing about this query is that we can control how the lower scoring subqueries affect the final score of the documents. For the dis_max query, we specify the queries using the queries property (a query or an array of queries) and the tie breaker using the tie_breaker property. We can also include an additional boost by specifying the boost parameter.

The final document score is calculated as the score of the maximum scoring query plus the sum of the scores returned from the rest of the queries multiplied by the value of the tie_breaker parameter. So, the tie_breaker parameter allows us to control how the lower scoring queries affect the final score. If we set the tie_breaker parameter to 1.0, we get the exact sum, while setting the tie_breaker parameter to 0.1 results in only 10 percent of the scores (of all the scores apart from the maximum scoring query) being added to the final score.

An example of the dis_max query is as follows:

{

"query":{

"dis_max":{

"tie_breaker":0.99,

"boost":10.0,

"queries":[

{

"match":{

"title":"crime"

}

},

{

"match":{

"author":"fyodor"

}

}

]

}

}

}

As you can see, we included the tie_breaker and boost parameters. In addition to that, we specified the queries parameter that holds the array of queries that will be run and used to generate the union of documents for the results.
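The scoring rule described for dis_max can be written down as a small Python function. This is only a sketch of the formula from the text; the boost parameter and the rest of Lucene scoring are left out, and the sample scores are illustrative values.

```python
def dis_max_score(scores, tie_breaker=0.0):
    """Combine subquery scores: best score plus tie_breaker times the other scores."""
    if not scores:
        return 0.0
    best = max(scores)
    others = sum(scores) - best
    return best + tie_breaker * others

subquery_scores = [0.7, 0.5]  # illustrative scores of two matching subqueries
for tie_breaker in (0.0, 0.1, 1.0):
    print(tie_breaker, dis_max_score(subquery_scores, tie_breaker))
```

With a tie_breaker of 0.0 only the best subquery counts, and with 1.0 the result is the plain sum of all subquery scores.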


The boosting query

The boosting query wraps around two queries and lowers the score of the documents returned by one of the queries. There are three sections of the boosting query that need to be defined: the positive section that holds the query whose document score will be left unchanged, the negative section whose resulting documents will have their score lowered, and the negative_boost section that holds the boost value that will be used to lower the score of the documents matched by the negative query. The advantage of the boosting query is that the documents matching the positive query stay in the results even when they also match the negative query; only their score is lowered. For comparison, if we were to use the bool query with the must_not section, such documents would not be returned at all.

Let's assume that we want to have the results of a simple term query for the term crime in the title field and want the score of such documents to not be changed. However, for those documents that also have a value from 1800 to 1900 in the year field, we want the score to be multiplied by an additional boost of 0.5. Such a query will look like the following:

{

"query":{

"boosting":{

"positive":{

"term":{

"title":"crime"

}

},

"negative":{

"range":{

"year":{

"from":1800,

"to":1900

}

}

},

"negative_boost":0.5

}

}

}
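The effect of negative_boost can be illustrated with a short Python sketch. It assumes the demotion is a simple multiplication of the positive query score, and it uses made-up score values; boosting_score is a hypothetical helper, not an Elasticsearch API.

```python
def boosting_score(positive_score, negative_matches, negative_boost):
    """Score of a document under the boosting query.

    positive_score is None when the positive query does not match, in which
    case the document is not returned at all.
    """
    if positive_score is None:
        return None
    if negative_matches:
        return positive_score * negative_boost  # demoted, but still returned
    return positive_score

# A document about crime published in 1866 is demoted, one from 1950 is not.
print(boosting_score(2.0, negative_matches=True, negative_boost=0.5))
print(boosting_score(2.0, negative_matches=False, negative_boost=0.5))
```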


The constant_score query

The constant_score query wraps another query and returns a constant score for each document returned by the wrapped query. We specify the score that should be given to the documents by using the boost property, which defaults to 1.0. It allows us to strictly control the score value assigned to a document matched by a query. For example, if we want to have a score of 2.0 for all the documents that have the term crime in the title field, we send the following query to Elasticsearch:

{

"query":{

"constant_score":{

"query":{

"term":{

"title":"crime"

}

},

"boost":2.0

}

}

}


The indices query

The indices query is useful when executing a query against multiple indices. It allows us to provide an array of indices (the indices property) and two queries: one that will be executed if we query an index from the list (the query property) and a second one that will be executed on all the other indices (the no_match_query property). For example, assume we have an alias named books, holding two indices: library and users. What we want to do is use this alias. However, we want to run different queries depending on which index is used for searching. An example query following this logic will look as follows:

{

"query":{

"indices":{

"indices":["library"],

"query":{

"term":{

"title":"crime"

}

},

"no_match_query":{

"term":{

"user":"crime"

}

}

}

}

}

In the preceding query, the query described in the query property was run against the library index and the query defined in the no_match_query section was run against all the other indices being searched, which for our hypothetical alias means the users index.

The no_match_query property can also have a string value instead of a query. This string value can be either all or none, and it defaults to all. If the no_match_query property is set to all, all the documents from the indices that are not on the list will be returned. Setting the no_match_query property to none will result in no documents being returned from those indices.


Using span queries

Elasticsearch leverages Lucene span queries, which allow us to make queries when some tokens or phrases are near other tokens or phrases. Basically, we can call them position-aware queries. When using the standard non-span queries, we are not able to make queries that are position aware; the phrase queries allow that, but only to some extent. So, for Elasticsearch and the underlying Lucene, it doesn't matter if the term is at the beginning of the sentence, at the end, or near another term. When using span queries, it does matter.

The following span queries are exposed in Elasticsearch:

span term query
span first query
span near query
span or query
span not query
span within query
span containing query
span multi query

Before we continue with the description, let's index a document to a completely new index that we will use to show how span queries work. To do this, we use the following command:

curl -XPUT 'localhost:9200/spans/book/1' -d '{
  "title": "Test book",
  "author": "Test author",
  "description": "The world breaks everyone, and afterward, some are strong at the broken places"
}'


A span

A span, in our context, is a starting and ending token position in a field. For example, in our case, the world breaks everyone could be a single span, and world can be a single span too. As you may know, during analysis, Lucene, in addition to the token itself, includes some additional parameters, such as the position in the token stream. Position information combined with the terms allows us to construct spans using Elasticsearch span queries (which are mapped to Lucene span queries). In the next few pages, we will learn how to construct spans using different span queries and how to control which documents are matched.
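To see which positions the span queries in this section work with, the following Python sketch assigns a position to each token of our example description. It assumes a simple lowercase word tokenizer, which only approximates what the analyzer configured for the field produces.

```python
import re

def token_positions(text):
    """Split text into lowercase word tokens and pair each with its position."""
    tokens = re.findall(r"\w+", text.lower())
    return list(enumerate(tokens))

description = ("The world breaks everyone, and afterward, "
               "some are strong at the broken places")

for position, token in token_positions(description):
    print(position, token)
```

Under this assumption the term world sits at position 1 and everyone at position 3, which is what the following examples rely on.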


Span term query

The span_term query is a building block for the other span queries. A span_term query is similar to the already discussed term query. On its own, it works just like the term query: it matches a term. Its definition is simple and looks as follows (we omitted some parts of the query on purpose, because we will discuss them later):

{

"query":{

...

"span_term":{

"description":{

"value":"world",

"boost":5.0

}

}

}

}

As you can see, it is very similar to the standard term query. The preceding query is run against the description field, and we want to have the documents that have the world term returned. We also specified the boost, which is also allowed.

One thing to remember is that the span_term query, similar to the standard term query, is not analyzed.


Span first query

The span first query allows us to match documents that have matches only in the first positions of the field. In order to define a span first query, we need to nest any other span query inside it; for example, the span term query we already know. So, let's find the document that has the term world in the first two positions of the description field. We do that by sending the following query:

{

"query":{

"span_first":{

"match":{

"span_term":{"description":"world"}

},

"end":2

}

}

}

In the results, we will get the document that we indexed at the beginning of this section. In the match section of the span first query, we should include a span query whose match has to end no later than the position specified by the end parameter.

So, to understand everything well, if we set the end parameter to 1, we shouldn't get our document with the previous query. Let's check it by sending the following query:

{

"query":{

"span_first":{

"match":{

"span_term":{"description":"world"}

},

"end":1

}

}

}

The response to the preceding query will be as follows:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":0,

"max_score":null,

"hits":[]

}

}

So it is working as expected. This is because the first term in our index will be the term the and not the term world, which we searched for.
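The behavior of the end parameter can be reproduced with a small Python sketch. It assumes a simple lowercase word tokenizer and represents the span of a single term as a (start, end) pair with an exclusive end, so a term at position 1 yields the span (1, 2); a span passes the check when its end is not greater than the end parameter.

```python
import re

def term_spans(text, term):
    """Return (start, end) spans, with exclusive end, for each occurrence of term."""
    tokens = re.findall(r"\w+", text.lower())
    return [(i, i + 1) for i, token in enumerate(tokens) if token == term]

def span_first(spans, end):
    """Keep only the spans that finish within the first `end` positions."""
    return [span for span in spans if span[1] <= end]

description = ("The world breaks everyone, and afterward, "
               "some are strong at the broken places")

world = term_spans(description, "world")
print(span_first(world, 2))  # world is within the first two positions
print(span_first(world, 1))  # only the first position is allowed: no match
```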


Span near query

The span near query allows us to match documents that have other spans near each other, and we can call this query a compound query as it wraps other span queries. For example, if we want to find documents that have the term world near the term everyone, we will run the following query:

{

"query":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"everyone"}}

],

"slop":0,

"in_order":true

}

}

}

As you can see, we specify our queries in the clauses section of the span near query. It is an array of other span queries. The slop parameter defines the allowed number of terms between the spans. The in_order parameter can be used to limit the matches only to those documents that match our queries in the same order that they were defined in. So, in our case, we will get documents that have world everyone, but not everyone world in the description field.

So let's get back to our query; right now it would return 0 results. If you look at our example document, you will notice that between the terms world and everyone an additional term is present, and we set the slop parameter to 0 (slop was discussed during the phrase query description). If we increase it to 1, we will get our result. To test it, let's send the following query:

{

"query":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"everyone"}}

],

"slop":1,

"in_order":true

}

}

}

The results returned by Elasticsearch are as follows:

{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "hits": {
    "total": 1,
    "max_score": 0.10848885,
    "hits": [{
      "_index": "spans",
      "_type": "book",
      "_id": "1",
      "_score": 0.10848885,
      "_source": {
        "title": "Test book",
        "author": "Test author",
        "description": "The world breaks everyone, and afterward, some are strong at the broken places"
      }
    }]
  }
}

As we can see, the altered query successfully returned our indexed document.
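The role of slop for two ordered terms can be sketched in Python. The helper below is a hypothetical simplification that only handles two single-term clauses with in_order set to true: it counts the positions between the first and the second term and compares that gap with slop. The positions used are the ones a simple word tokenizer assigns to our description (world at 1, everyone at 3).

```python
def near_in_order(first_positions, second_positions, slop):
    """Return (start, end) spans where the second term follows the first
    with at most `slop` positions between them (end is exclusive)."""
    spans = []
    for a in first_positions:
        for b in second_positions:
            gap = b - a - 1          # number of positions between the two terms
            if b > a and gap <= slop:
                spans.append((a, b + 1))
    return spans

world, everyone = [1], [3]
print(near_in_order(world, everyone, slop=0))  # "breaks" sits in between: no match
print(near_in_order(world, everyone, slop=1))  # one intervening term is allowed
```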


Span or query

The span or query allows us to wrap other span queries and aggregate the matches of all those that we've wrapped. Similar to the span_near query, the span_or query uses the clauses array to specify other span queries. For example, if we want to get the documents that have the term world in the first two positions of the description field, or the ones that have the term world not further than a single position from the term everyone, we will send the following query to Elasticsearch:

{

"query":{

"span_or":{

"clauses":[

{

"span_first":{

"match":{

"span_term":{"description":"world"}

},

"end":2

}

},

{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"everyone"}}

],

"slop":1,

"in_order":true

}

}

]

}

}

}

The preceding query will return our indexed document.


Span not query

The span not query allows us to specify two sections of queries. The first is the include section, which specifies which span queries should be matched, and the second is the exclude section, which specifies the span queries that shouldn't overlap the first ones. To keep it simple, if a query from the exclude section matches the same span (or a part of it) as the query from the include section, such a document won't be returned as a match for the span not query. Each of these sections can contain multiple span queries.

So, to illustrate that query, let's make a query that will return all the documents that have a span constructed from the single term breaks in the description field. Let's also exclude the documents that have a span matching the terms world and everyone at a maximum of a single position from each other, when such a span overlaps the one defined in the first span query.

{

"query":{

"span_not":{

"include":{

"span_term":{"description":"breaks"}

},

"exclude":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"everyone"}}

],

"slop":1

}

}

}

}

}

The following is the result:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":0,

"max_score":null,

"hits":[]

}

}

As you may have noticed, the result of the query is as we expected. Our document wasn't found because the span query from the exclude section was overlapping the span from the include section.
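The overlap check behind this result can be sketched in Python. Spans are written as (start, end) pairs with an exclusive end, using the positions a simple word tokenizer would assign to our description: breaks is the span (2, 3), and world followed by everyone is the span (1, 4). Both helpers are hypothetical and only illustrate the idea.

```python
def overlaps(a, b):
    """True when two (start, end) spans with exclusive ends share a position."""
    return a[0] < b[1] and b[0] < a[1]

def span_not(include_spans, exclude_spans):
    """Keep the include spans that do not overlap any exclude span."""
    return [span for span in include_spans
            if not any(overlaps(span, excluded) for excluded in exclude_spans)]

breaks = [(2, 3)]            # include: the single term "breaks"
world_everyone = [(1, 4)]    # exclude: "world ... everyone" with slop 1

print(span_not(breaks, world_everyone))  # the include span is covered: no match
```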


Span within query

The span_within query allows us to find documents that have a span enclosed in another span. We define two sections in the span_within query: little and big. The little section defines a span query that needs to be enclosed by the span query defined using the big section.

For example, if we would like to find a document that has the term world near the term breaks, and those terms should be inside a span that is bound by the terms world and afterward not more than 10 terms from each other, the query that does that will look as follows:

{

"query":{

"span_within":{

"little":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"breaks"}}

],

"slop":0,

"in_order":false

}

},

"big":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"afterward"}}

],

"slop":10,

"in_order":false

}

}

}

}

}


Span containing query

The span_containing query can be seen as the opposite of the span_within query we just discussed. It matches the spans from the big section that enclose a match from the little section, whereas the span_within query matches the enclosed spans. Again, we use two sections with span queries: little and big. The little section defines a span query that needs to be enclosed by the span query defined using the big section.

We can use the same example. If we would like to find a document that has the term world near the term breaks, and those terms should be inside a span that is bound by the terms world and afterward not more than 10 terms from each other, the query that does that will look as follows:

{

"query":{

"span_containing":{

"little":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"breaks"}}

],

"slop":0,

"in_order":false

}

},

"big":{

"span_near":{

"clauses":[

{"span_term":{"description":"world"}},

{"span_term":{"description":"afterward"}}

],

"slop":10,

"in_order":false

}

}

}

}

}
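The difference between the two queries is which side of the enclosure is reported. The Python sketch below uses hypothetical helpers and (start, end) spans with an exclusive end, based on the positions a simple word tokenizer assigns to our description: world breaks is the span (1, 3) and world through afterward is the span (1, 6).

```python
def encloses(big, little):
    """True when the big span fully covers the little span."""
    return big[0] <= little[0] and little[1] <= big[1]

def span_within(little_spans, big_spans):
    """Report the little spans that lie inside some big span."""
    return [l for l in little_spans if any(encloses(b, l) for b in big_spans)]

def span_containing(little_spans, big_spans):
    """Report the big spans that cover some little span."""
    return [b for b in big_spans if any(encloses(b, l) for l in little_spans)]

little = [(1, 3)]  # "world breaks"
big = [(1, 6)]     # "world ... afterward"

print(span_within(little, big))      # the enclosed span
print(span_containing(little, big))  # the enclosing span
```

Both queries match the same document here; they differ in the span that is produced, which matters when the result is nested inside another span query.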


Span multi query

The last type of span query that Elasticsearch supports is the span_multi query. It allows us to wrap any multi term query that we've discussed (the range query, the wildcard query, the regexp query, the fuzzy query, or the prefix query) as a span query.

For example, if we want to find documents that have a term starting with the prefix wor in the description field, we can do that by sending the following query:

{

"query":{

"span_multi":{

"match":{

"prefix":{

"description":{"value":"wor"}

}

}

}

}

}

There is one thing to remember: the multi term query that we want to use needs to be enclosed in the match section of the span_multi query.


Performance considerations

A few words at the end of discussing span queries. Remember that they are costlier when it comes to processing power, because not only do the terms have to be matched, but positions also have to be calculated and checked. This means that Lucene, and thus Elasticsearch, will need more CPU cycles to calculate all the information needed to find matching documents. You can expect span queries to be slower than the queries that don't take positions into account.


Choosing the right query

By now we've seen what queries are available in Elasticsearch, both the simple ones and the ones that can group other queries as well. Before continuing with more complicated topics, we would like to discuss which of the queries should be used for which use case. Of course, one could dedicate a whole book to showing different query use cases, so we will only show a few of them to help you see what you can expect and which query to use.


The use cases

As you already know which queries can be used to find which data, what we would like to show you are example use cases using the data we indexed in Chapter 2, Indexing Your Data. To do this, we will start with a few guidelines on how to choose the query, and then we will show you example use cases and discuss why those queries could be used.

Limiting results to given tags

One of the simplest examples of querying Elasticsearch is the search for exact terms. By exact we mean a character-by-character comparison with a term that is indexed and written into the Lucene inverted index. To run such a query, we can use the term query provided by Elasticsearch, because its content is not analyzed by Elasticsearch. For example, let's assume that we would like to search for all the books with the value novel in the tags field, which, as we know from the mappings, is not analyzed. To do that, we would run the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"term":{

"tags":"novel"

}

}

}'

Searching for values in a range

One of the simplest queries that can be run is a query matching documents in a given range of values. Usually such queries are a part of a larger query or a filter. For example, a query that would return books with the number of copies from 1 to 3 inclusive would look as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"range":{

"copies":{

"gte":1,

"lte":3

}

}

}

}'

Boosting some of the matched documents

There are many common examples of using the bool query, for example, very simple ones like finding documents having a list of terms. What we would like to show you is how to use the bool query to boost some of the documents. For example, if we want to find all the documents that have one or more copies and give a higher score to the ones that were published after 1950, we will run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"bool":{

"must":[

{

"range":{

"copies":{

"gte":1

}

}

}

],

"should":[

{

"range":{

"year":{

"gt":1950

}

}

}

]

}

}

}'

Ignoring lower scoring partial queries

The dis_max query, as we discussed, allows us to control how influential the lower scoring partial queries are. For example, if we only want to assign the score of the highest scoring partial query to the documents matching crime punishment in the title field or raskolnikov in the characters field, we would run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "fields": ["_id", "_score"],
  "query": {
    "dis_max": {
      "tie_breaker": 0.0,
      "queries": [
        {
          "match": {
            "title": "crime punishment"
          }
        },
        {
          "match": {
            "characters": "raskolnikov"
          }
        }
      ]
    }
  }
}'

The result for the preceding query will look as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.70710677,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.70710677

}]

}

}

Now let's see the score of the partial queries alone. To do that, we will run the partial queries using the following commands:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "fields": ["_id", "_score"],
  "query": {
    "match": {
      "title": "crime punishment"
    }
  }
}'

The response for the preceding query is as follows:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.70710677,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.70710677

}]

}

}

The following is the next command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"fields":["_id","_score"],

"query":{

"match":{

"characters":"raskolnikov"

}

}

}'

The response is as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5

}]

}

}

As you can see, the score of the document returned by our dis_max query is equal to the score of the highest scoring partial query (the first partial query). That is because we set the tie_breaker property to 0.0.

Using Lucene query syntax in queries

Having a simple search syntax is very useful for users, and we already have one: the Lucene query syntax. The query_string query is an example of where we can leverage that by allowing the users to type in queries with additional control characters. For example, if we would like to find books having the terms crime and punishment in their title and the fyodor dostoevsky phrase in the author field, and not published between 2000 (exclusive) and 2015 (inclusive), we would use the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query": {
    "query_string": {
      "query": "+title:crime +title:punishment +author:\"fyodor dostoevsky\" -year:{2000 TO 2015]"
    }
  }
}'

As you can see, we used the Lucene query syntax to pass all the matching requirements and we let the query parser construct the appropriate query.

Handling user queries without errors

Using the query_string query is very handy, but it is not error tolerant. If our user provides incorrect Lucene syntax, the query will return an error. Because of that, Elasticsearch exposes a second query that supports analysis and a simplified query syntax: the simple_query_string query. Using such a query allows us to run the user queries without worrying about parsing errors. For example, let's look at the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query": {
    "query_string": {
      "query": "+crime +punishment \"",
      "default_field": "title"
    }
  }
}'

The response will contain:

{
  "error": {
    "root_cause": [{
      "type": "query_parsing_exception",
      "reason": "Failed to parse query [+crime +punishment \"]",
      "index": "library",
      "line": 6,
      "col": 3
    }],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [{
      "shard": 0,
      "index": "library",
      "node": "7jznW07BRrqjG-aJ7iKeaQ",
      "reason": {
        "type": "query_parsing_exception",
        "reason": "Failed to parse query [+crime +punishment \"]",
        "index": "library",
        "line": 6,
        "col": 3,
        "caused_by": {
          "type": "parse_exception",
          "reason": "Cannot parse '+crime +punishment \"': Lexical error at line 1, column 21. Encountered: <EOF> after : \"\"",
          "caused_by": {
            "type": "token_mgr_error",
            "reason": "Lexical error at line 1, column 21. Encountered: <EOF> after : \"\""
          }
        }
      }
    }]
  },
  "status": 400
}

This means that the query was not properly constructed and a parse error happened. That's why the simple_query_string query was introduced. It uses a query parser that tries to handle user mistakes and tries to guess how the query should look. Our query using that parser will look as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query": {
    "simple_query_string": {
      "query": "+crime +punishment \"",
      "fields": ["title"]
    }
  }
}'

If you run the preceding query, you will see that the proper document is returned by Elasticsearch even though the query is not properly constructed.

Autocomplete using prefixes

A very common use case is to provide autocomplete functionality on the indexed data. As we know, the prefix query is not analyzed and works on the basis of the terms indexed in the field. So the actual functionality depends on which tokens are produced during indexing. For example, let's assume that we would like to provide autocomplete functionality on any token in the title field and the user provided the wes prefix. A query that would match such a requirement looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"prefix":{

"title":"wes"

}

}

}'
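Because the prefix query works on indexed terms, it can be pictured as a lookup in a sorted term list. The Python sketch below assumes a hypothetical, already sorted list of title terms and uses binary search to jump to the first candidate; it illustrates the idea and is not Lucene's actual term dictionary.

```python
from bisect import bisect_left

def terms_with_prefix(sorted_terms, prefix):
    """Return all terms starting with prefix from a lexicographically sorted list."""
    start = bisect_left(sorted_terms, prefix)  # first term >= prefix
    result = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break                               # sorted order: no later term can match
        result.append(term)
    return result

# Hypothetical terms produced for the title field during indexing.
title_terms = sorted(["all", "front", "on", "quiet", "the", "west", "westen", "western"])
print(terms_with_prefix(title_terms, "wes"))
```

Since the input is not analyzed, a prefix typed as Wes would find nothing in this lowercase term list, which is why the tokens produced at index time matter.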

Finding terms similar to a given one

A very simple example is using the fuzzy query to find documents having a term similar to a given one. For example, if we want to find all the documents having a value similar to crimea, we will run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"fuzzy":{

"title":{

"value":"crimea",

"fuzziness":2,

"max_expansions":50

}

}

}

}'
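The notion of similarity used here is edit distance. The following Python sketch computes the plain Levenshtein distance (insertions, deletions, and substitutions) and filters a hypothetical term list by the fuzziness value; Elasticsearch's own implementation may differ in details such as how transpositions are counted.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution or match
        previous = current
    return previous[-1]

def fuzzy_terms(value, terms, fuzziness):
    """Return the terms within `fuzziness` edits of value."""
    return [t for t in terms if edit_distance(value, t) <= fuzziness]

# Hypothetical indexed terms of the title field.
print(fuzzy_terms("crimea", ["crime", "crimes", "punishment"], fuzziness=2))
```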

Matching phrases

The simplest position-aware query, the phrase query allows us to find documents not with a single term, but with terms positioned one after another, that is, terms that form a phrase. For example, a query that would only match documents that have the westen nichts neues phrase in the otitle field would look as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query": {
    "match_phrase": {
      "otitle": "westen nichts neues"
    }
  }
}'

Spans, spans everywhere

The last use case we would like to discuss is a more complicated example of position-aware queries called span queries. Imagine that we would like to run a query to find documents that have the western front phrase not more than three positions after the term quiet, and all that just after the all term. This can be done with span queries, and the following command shows how such a query will look:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"span_near":{

"clauses":[

{

"span_term":{


"title":"all"

}

},

{

"span_near":{

"clauses":[

{

"span_term":{

"title":"quiet"

}

},

{

"span_near":{

"clauses":[

{

"span_term":{

"title":"western"

}

},

{

"span_term":{

"title":"front"

}

}

],

"slop":0,

"in_order":true

}

}

],

"slop":3,

"in_order":true

}

}

],

"slop":0,

"in_order":true

}

}

}'

Note that the span queries are not analyzed. We can see that by looking at the response of the Explain API. To see that response, we should send the same request body (our query) to the /library/book/1/_explain REST endpoint. The interesting part of the output looks as follows:

"description": "weight(spanNear([title:all, spanNear([title:quiet, spanNear([title:western, title:front], 0, true)], 3, true)], 0, true) in 0) [PerFieldSimilarity], result of:",


Summary

This chapter has been all about the querying process. We started by looking at how to query Elasticsearch and what Elasticsearch does when it needs to handle the query. We also learned about the basic and compound queries, so we are now able to use both simple queries and the ones that group multiple small queries together. Finally, we discussed how to choose the right query for a given use case.

In the next chapter, we will extend our query knowledge. We will start with filtering our queries and move on to highlighting possibilities and a way to validate our queries using the Elasticsearch API. We will discuss sorting of search results and query rewrite, which will show us what happens to some queries in Elasticsearch internals.


Chapter4.ExtendingYourQueryingKnowledgeInthepreviouschapter,wedivedintoElasticsearchqueryingcapabilities.WediscussedhowtoqueryElasticsearchindetailandwelearnedhowElasticsearchqueryingworks.Wenowknowthebasicandcompoundqueriesofthisgreatsearchengineandwhataretheconfigurationoptionsforeachquerytype.Wealsogottoknowwhentouseourqueriesandwediscussedafewusecasesandwhichqueriescanbeusedtohandlethem.Thischapterisdedicatedtoextendingourqueryingknowledge.Bytheendofthischapter,youwillhavelearnedthefollowingtopics:

What filtering is and how to use it
What highlighting is and how to use it
What the highlighter types are and what benefits they bring
How to validate your queries
How to sort your query results
What query rewrite is and how to control it

Filtering your results

In the previous chapter, we talked about various types of queries. The common part was that we always wanted to get the best results first. This is the main difference from the standard database approach, where every document either matches the query or does not. In the database world, we do not ask how good the document is; our only interest lies in the results returned. When talking about full text search engines, this is different: we are interested not only in the results, but also in their quality. The reason is obvious: we are searching in unstructured data, using text fields that use language analysis, stemming, and so on. Because of that, the initial results of our queries are, in most cases, far from optimal. This is why when we talk about searching, we talk about precision and document recall.

On the other hand, sometimes we want to limit the whole set of documents to a chosen part. For example, in a library, we may want to search only the available books, the rest being unimportant. Sometimes the score, busily calculated for the given fields, only interferes with the overall score and has no meaning in terms of accuracy. In such cases, filters should be used to limit the results of the query, but not interfere with the calculated score.

Prior to Elasticsearch 2.0, filters were independent entities from queries. In practice, almost every query had its own counterpart in filters. There was the term query and the term filter, the bool query and the bool filter, the range query and the range filter, and so on. From the user's point of view, the most important difference between the queries and the filters was scoring. The filter didn’t calculate the score, which made it easy to cache and more efficient. But this separation was very inconvenient for users. With the release of Elasticsearch 2.0 and its usage of Lucene 5.3, filter queries were deprecated along with some types of queries that allowed us to use filters. Let’s discuss how filtering works now and what we can do in Elasticsearch 2.0 to achieve the same or better performance as before.

The context is the key

In Elasticsearch 2.0, queries can calculate the score or omit it by choosing a more efficient way of execution. In many cases, this is done automatically, based on the context in which the query is used. This applies to queries that include filter sections, which remove documents based on some criteria. These documents are unnecessary in the returned results and should be skipped as quickly as possible without affecting the overall score. Thanks to this, after discarding some documents we can focus only on the rest, calculating their scores and sorting them before returning. An example of this is the must_not clause of a Boolean query. A document that matches the must_not clause will be removed from the returned result set, so calculating the score for the documents matched by this part of the bool query would be additional, unnecessary, and inefficient work.

The best thing about all the changes is that we don’t need to decide whether we want to use filtering or not. Elasticsearch and the underlying Apache Lucene library take care of choosing the right execution method for us.

Explicit filtering with bool query

As we mentioned in the Compound queries section in Chapter 3, Searching Your Data, the bool query in Elasticsearch 2.0 allows us to add a filter explicitly by adding the filter section and including a query in that section. This is very convenient if we want to have a part of the query that needs to match, but we are not interested in the score for those documents.

Let’s look at the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"available":true

}

}

}'

We see a simple query that should return all the books in our library available for borrowing, which means the documents with the available field set to true. Now let’s compare it with the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"bool":{

"must":{

"match_all":{}

},

"filter":{

"term":{

"available":true

}

}

}

}

}'

This query returns all the books, but it also contains the filter section, which tells Elasticsearch that we are only interested in the available books. The query will return the same results as the previous query we’ve seen, of course when looking only at the number of documents and which documents are returned. The difference is the score. For our example data, both the queries return two books. The results returned for the first query look as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":1.0,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

}

},{

"_index":"library",

"_type":"book",

"_id":"1",

"_score":0.30685282,

"_source":{

"title":"All Quiet on the Western Front",

"otitle":"Im Westen nichts Neues",

"author":"Erich Maria Remarque",

"year":1929,

"characters":["Paul Bäumer","Albert Kropp","Haie Westhus",

"Fredrich Müller","Stanislaus Katczinsky","Tjaden"],

"tags":["novel"],

"copies":1,

"available":true,

"section":3

}

}]

}

}

The results for the second query look as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":1.0,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

}

},{

"_index":"library",

"_type":"book",

"_id":"1",

"_score":1.0,

"_source":{

"title":"All Quiet on the Western Front",

"otitle":"Im Westen nichts Neues",

"author":"Erich Maria Remarque",

"year":1929,

"characters":["Paul Bäumer","Albert Kropp","Haie Westhus",

"Fredrich Müller","Stanislaus Katczinsky","Tjaden"],

"tags":["novel"],

"copies":1,

"available":true,"section":3}

}]

}

}

If you look at the score for the documents in each query, you’ll notice the difference. In the simple term query, Elasticsearch (the Lucene library, in fact) returns a score of 1.0 for the first document and a score of 0.30685282 for the second one. This is not a perfect solution because the availability check is more or less binary and we don’t want it to interfere with the score. That’s why the second query is better in this case. With the bool query and filtering, the score for the filter element is not calculated and the score for both the documents is the same, that is, 1.0.
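A request body of this shape can also be assembled programmatically. The following Python sketch is only an illustration: the helper name filtered_query is hypothetical and does not belong to any Elasticsearch client library. It simply wraps a scoring query and a non-scoring filter into the bool structure used in the second query.

```python
import json


def filtered_query(scoring_query, filter_clause):
    """Build a search body whose `must` part is scored and whose
    `filter` part only limits the result set (hypothetical helper)."""
    return {
        "query": {
            "bool": {
                "must": scoring_query,
                "filter": filter_clause,
            }
        }
    }


# Same intent as the second query: all books, limited to available ones.
body = filtered_query({"match_all": {}}, {"term": {"available": True}})
print(json.dumps(body, indent=2))
```

The printed JSON could be sent as the request body to the _search endpoint in the same way as the curl examples.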

Highlighting

You have probably heard of highlighting or seen it. You may not even know that you are actually using highlighting when you are using the bigger and smaller public search engines on the World Wide Web (WWW). When we talk about highlighting in the context of full text search, we usually mean showing which words or phrases from the query were matched in the resulting documents. For example, if we use Google and search for the word lucene, we would see that word in bold in the search results.

It is even more visible on the Microsoft Bing search engine.

In this chapter, we will see how to use Elasticsearch highlighting capabilities to enhance our application with highlighted results.

Getting started with highlighting

There is no better way of showing how highlighting works than making a query and looking at the results returned by Elasticsearch. So let’s do that. We assume that we would like to highlight the terms that are matched in the title field of our documents to improve the search experience of our users. By now you know the example data from top to bottom, so let’s again reuse the same data set. We want to match the term crime in the title field and we want to get highlighting results. One of the simplest queries that can achieve this looks as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"match":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{}

}

}

}'

The response for the preceding query is as follows:

{

"took":16,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

},

"highlight":{

"title":["<em>Crime</em> and Punishment"]

}

}]

}

}

As you can see, apart from the standard information about the documents that matched the query, we got a new section called highlight. Elasticsearch used the <em> HTML tag as the beginning of the highlighting section and its closing counterpart to close the highlighted section. This is the default behavior of Elasticsearch, but we will learn how to change that.

Field configuration

In order to perform highlighting, the original content of the field needs to be present. We have to set the fields we will use for highlighting. This is done by either marking a field to be stored or using the _source field with those fields included. If the field is set to be stored in the mappings, the stored version will be used; otherwise, Elasticsearch will try to use the _source field and extract the field that needs to be highlighted.

Under the hood

Elasticsearch uses Apache Lucene under the hood and highlighting is one of the features of that library. Lucene provides three types of highlighting implementation: the standard one, which we just used; the second one called FastVectorHighlighter (https://lucene.apache.org/core/5_4_0/highlighter/org/apache/lucene/search/vectorhighlight/FastVectorHighlighter.html), which needs term vectors and positions to be able to work; and the third one called PostingsHighlighter (http://lucene.apache.org/core/5_4_0/highlighter/org/apache/lucene/search/postingshighlight/PostingsHighlighter.html).

Elasticsearch chooses the right highlighter implementation automatically. If the field is configured with the term_vector property set to with_positions_offsets, FastVectorHighlighter will be used. If the field is configured with the index_options property set to offsets, PostingsHighlighter will be used. Otherwise, the standard highlighter will be used by Elasticsearch.

Which highlighter to use depends on your data, your queries, and the needed performance. The standard highlighter is a general use case one. However, if you want to highlight fields with lots of data, FastVectorHighlighter is the recommended one. The thing to remember about it is that it requires term vectors to be present and that will make your index slightly larger. Finally, the fastest highlighter, which is also recommended for natural language highlighting, is PostingsHighlighter. However, the thing to remember is that PostingsHighlighter doesn’t support complex queries such as the match_phrase_prefix query and in such cases highlighting won’t be returned.
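The automatic selection rules described above can be summarized as a small decision function. This is a minimal Python sketch of the rules as stated in the text, not Elasticsearch's actual implementation; the function name choose_highlighter is hypothetical, and the returned names are the values accepted by the type property of the highlight section.

```python
def choose_highlighter(field_mapping):
    """Return the highlighter type implied by a field mapping,
    following the selection rules described in the text."""
    if field_mapping.get("term_vector") == "with_positions_offsets":
        return "fvh"        # FastVectorHighlighter needs term vectors
    if field_mapping.get("index_options") == "offsets":
        return "postings"   # PostingsHighlighter needs offsets in postings
    return "plain"          # the standard highlighter is the fallback


print(choose_highlighter({"type": "string"}))
print(choose_highlighter({"type": "string", "index_options": "offsets"}))
print(choose_highlighter({"type": "string",
                          "term_vector": "with_positions_offsets"}))
```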

Forcing highlighter type

While Elasticsearch chooses the highlighter type for us, we can also enforce the highlighting type if we really want to. To do that, we need to set the type property to one of the following values:

plain: When this value is set, Elasticsearch will use the standard highlighter.
fvh: When this value is set, Elasticsearch will try using FastVectorHighlighter. It will require term vectors to be turned on for the field used for highlighting.
postings: When this value is set, Elasticsearch will try using PostingsHighlighter. It will require offsets to be turned on for the field used for highlighting.

For example, to use the standard highlighter, we will run the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{"type":"plain"}

}

}

}'

Configuring HTML tags

The default behavior of the highlighting mechanism may not suit everyone: not all of us would like to have the <em> and </em> tags used for highlighting. Because of that, Elasticsearch allows us to change the tags that are used for that purpose. To do that, we should set the pre_tags and post_tags properties to the code snippets we want the highlighting to start with and end with; for example, <b> and </b>. The pre_tags and post_tags properties are arrays, so we can provide more than a single opening and closing tag and Elasticsearch will use each of the defined tags to highlight different words. For example, if we want to use <b> as the opening tag and </b> as the closing tag, our query will look like this:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"pre_tags":["<b>"],

"post_tags":["</b>"],

"fields":{

"title":{}

}

}

}'

The result returned by Elasticsearch to the preceding query will be as follows:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.5,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":0.5,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

},

"highlight":{

"title":["<b>Crime</b> and Punishment"]

}

}]

}

}

As you can see, the term Crime in the title field was surrounded by the tags of our choice.

Controlling highlighted fragments

Elasticsearch allows us to control the number of highlighted fragments returned and their sizes by exposing two properties. The first one is number_of_fragments, which defines the number of fragments returned by Elasticsearch (defaults to 5). Setting this property to 0 causes the whole field to be returned, which can be handy for short fields but expensive for longer fields. The second property is fragment_size, which lets us specify the maximum length of the highlighted fragments in characters and defaults to 100.

An example query using these properties will look as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{"fragment_size":200,"number_of_fragments":0}

}

}

}'

Global and local settings

The highlighting properties we discussed previously can be set both on a global basis and on a per-field basis. The global ones will be used for all the fields that don’t overwrite them and should be placed on the same level as the fields section of your highlighting, for example, like this:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"pre_tags":["<b>"],

"post_tags":["</b>"],

"fields":{

"title":{}

}

}

}'

You can also set the properties for each field. For example, if we would like to keep the default behavior for all the fields except our title field, we would do the following:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{"pre_tags":["<b>"],"post_tags":["</b>"]}

}

}

}'

As you can see, instead of placing the properties on the same level as the fields section, we placed them inside the JSON object that specifies the title field behavior. Of course, each field can be configured using different properties.
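The precedence between global and per-field settings can be modelled in a few lines. The following Python sketch only illustrates the rule stated above (a field-level property overrides the global one); the function name effective_settings and the author field settings are made up for the illustration, and this is not how Elasticsearch implements the merge internally.

```python
def effective_settings(highlight, field):
    """Merge the global highlight properties with the ones defined for
    a single field; field-level values win (illustrative model only)."""
    merged = {k: v for k, v in highlight.items() if k != "fields"}
    merged.update(highlight.get("fields", {}).get(field, {}))
    return merged


highlight = {
    "pre_tags": ["<b>"],
    "post_tags": ["</b>"],
    "fields": {
        "title": {},                                           # inherits the global tags
        "author": {"pre_tags": ["<i>"], "post_tags": ["</i>"]},  # overrides them
    },
}

print(effective_settings(highlight, "title"))
print(effective_settings(highlight, "author"))
```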

Require matching

Sometimes there may be a need (especially when using multiple highlighted fields) to show only the fields that matched our query. In order to have such behavior, we need to set the require_field_match property to true. Setting this property to false will cause all the terms to be highlighted even if a field didn’t match the query.

To see how that works, let’s create a new index called users and let’s index a single document there. We will do that by sending the following command:

curl -XPUT 'http://localhost:9200/users/user/1' -d '{

"name":"Test user",

"description":"Test document"

}'

So, let’s assume we want to highlight the hits in both of the preceding fields. Our command sending the query to our new index will look like this:

curl -XGET 'localhost:9200/users/_search?pretty' -d '{

"query":{

"term":{

"name":"test"

}

},

"highlight":{

"fields":{

"name":{"pre_tags":["<b>"],"post_tags":["</b>"]},

"description":{"pre_tags":["<b>"],"post_tags":["</b>"]}

}

}

}'

The result of the preceding query will be as follows:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.19178301,

"hits":[{

"_index":"users",

"_type":"user",

"_id":"1",

"_score":0.19178301,

"_source":{

"name":"Test user",

"description":"Test document"

},

"highlight":{

"name":["<b>Test</b> user"]

}

}]

}

}

Note that we only got highlighting on the name field. This is because our query matched only that field. Let’s see what will happen if we set the require_field_match property to false and use a command similar to the following one:

curl -XGET 'localhost:9200/users/_search?pretty' -d '{

"query":{

"term":{

"name":"test"

}

},

"highlight":{

"require_field_match":false,

"fields":{

"name":{"pre_tags":["<b>"],"post_tags":["</b>"]},

"description":{"pre_tags":["<b>"],"post_tags":["</b>"]}

}

}

}'

Now let’s look at the modified query results:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.19178301,

"hits":[{

"_index":"users",

"_type":"user",

"_id":"1",

"_score":0.19178301,

"_source":{

"name":"Test user",

"description":"Test document"

},

"highlight":{

"name":["<b>Test</b> user"],

"description":["<b>Test</b> document"]

}

}]

}

}

As you can see, Elasticsearch returned highlighting in both the fields now.

Custom highlighting query

There are use cases where your queries are complicated and not really suitable for highlighting, but you still want to use the highlighting functionality. In such cases, Elasticsearch allows us to highlight results on the basis of a different query provided using the highlight_query property. An example of using a different highlighting query looks as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"term":{

"title":"crime"

}

},

"highlight":{

"fields":{

"title":{

"highlight_query":{

"term":{

"title":"punishment"

}

}

}

}

}

}'

The preceding query will result in highlighting the term punishment in the title field, instead of the crime one.

The Postings highlighter

It is time to talk about the third available highlighter. It was added in Elasticsearch 0.90.6 and is slightly different from the previous ones. PostingsHighlighter is automatically used when the field definition has index_options set to offsets. To illustrate how PostingsHighlighter works, we will create a simple index with proper configuration that allows that highlighter to work. We will do that by using the following commands:

curl -XPUT 'localhost:9200/hl_test'

curl -XPOST 'localhost:9200/hl_test/doc/_mapping' -d '{

"doc":{

"properties":{

"contents":{

"type":"string",

"fields":{

"ps":{"type":"string","index_options":"offsets"}

}

}

}

}

}'

If everything goes well, we should have a new index and the mappings. The mappings have two fields defined: one named contents and the second one named contents.ps. In this second case, we turned on the offsets by using the index_options property. This means that Elasticsearch will use the standard highlighter for the contents field and the postings highlighter for the contents.ps field.

To see the difference, we will index a single document with a fragment from Wikipedia describing the history of Birmingham. We do that by running the following command:

curl -XPUT 'localhost:9200/hl_test/doc/1' -d '{

"contents":"Birmingham''s early history is that of a remote and marginal area. The main centres of population, power and wealth in the pre-industrial English Midlands lay in the fertile and accessible river valleys of the Trent, the Severn and the Avon. The area of modern Birmingham lay in between, on the upland Birmingham Plateau and within the densely wooded and sparsely populated Forest of Arden."

}'

The last step is to send a query using both the highlighters. We can do it in a single request by using the following command:

curl 'localhost:9200/hl_test/_search?pretty' -d '{

"query":{

"term":{

"contents.ps":"modern"

}

},

"highlight":{

"require_field_match":false,

"fields":{

"contents":{},

"contents.ps":{}

}

}

}'

If everything goes well, you will find the following snippet in the response returned by Elasticsearch:

"highlight":{

"contents":["valleys of the Trent, the Severn and the Avon. The area of <em>modern</em> Birmingham lay in between, on the upland"],

"contents.ps":["The area of <em>modern</em> Birmingham lay in between, on the upland Birmingham Plateau and within the densely wooded and sparsely populated Forest of Arden."]

}

As you can see, both the highlighters found the occurrence of the desired word. The difference is that the postings highlighter returns the smarter snippet: it checks for the sentence boundaries.

Let’s try one more query:

curl 'localhost:9200/hl_test/_search?pretty' -d '{

"query":{

"match_phrase":{

"contents.ps":"centres of"

}

},

"highlight":{

"require_field_match":false,

"fields":{

"contents":{},

"contents.ps":{}

}

}

}'

We searched for the phrase centres of. As you may expect, the results for the two highlighters will differ. For the standard highlighter, run on the contents field, we will find the following phrase in the response:

"Birminghams early history is that of a remote and marginal area. The main <em>centres</em> <em>of</em> population"

As you can clearly see, the standard highlighter divided the given phrase and highlighted individual terms. Also, not all occurrences of the terms centres and of were highlighted, but only the ones that formed the phrase.

On the other hand, the PostingsHighlighter returned the following highlighted fragment:

"Birminghams early history is that <em>of</em> a remote and marginal area.",
"The main <em>centres</em> <em>of</em> population, power and wealth in the pre-industrial English Midlands lay in the fertile and accessible river valleys <em>of</em> the Trent, the Severn and the Avon.",
"The area <em>of</em> modern Birmingham lay in between, on the upland Birmingham Plateau and within the densely wooded and sparsely populated Forest <em>of</em> Arden."

This is the significant difference. The PostingsHighlighter highlighted all the terms matching the query, not only those that formed the phrase, and returned whole sentences. This is a very nice feature, especially when you want to display the highlighting results for the user in the UI of your application.

Validating your queries

There are times when you are not in total control of the queries that you send to Elasticsearch. The queries can be generated from multiple criteria, making them a monster, or even worse, they can be generated by some kind of wizard, which makes it hard to troubleshoot and find the faulty part that makes the query fail. Because of such use cases, Elasticsearch exposes the Validate API, which helps us validate our queries and diagnose potential problems.

Using the Validate API

The usage of the Validate API is very simple. Instead of sending the query to the _search REST endpoint, we send it to the _validate/query one. And that’s it. Let’s look at the following command:

curl -XGET 'localhost:9200/library/_validate/query?pretty' --data-binary '{

"query":{

"bool":{

"must":{

"term":{

"title":"crime"

}

},

"should":{

"range:{

"year":{

"from":1900,

"to":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}'

A similar query was already used in this book in Chapter 3, Searching Your Data. The preceding command will tell Elasticsearch to validate it and return the information about its validity. The response of Elasticsearch to the preceding command will be similar to the following one:

{

"valid":false,

"_shards":{

"total":1,

"successful":1,

"failed":0

}

}

Look at the valid attribute. It is set to false. Something went wrong. Let’s execute the query validation once again with the explain parameter added to the request:

curl -XGET 'localhost:9200/library/_validate/query?pretty&explain' --data-binary '{

"query":{

"bool":{

"must":{

"term":{

"title":"crime"

}

},

"should":{

"range:{

"year":{

"from":1900,

"to":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}'

Now the result returned from Elasticsearch is more verbose:

{

"valid":false,

"_shards":{

"total":1,

"successful":1,

"failed":0

},

"explanations":[{

"index":"library",

"valid":false,

"error":"[library] QueryParsingException[Failed to parse]; nested: JsonParseException[Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in name\n at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@1110d090; line: 10, column: 18]];; com.fasterxml.jackson.core.JsonParseException: Illegal unquoted character ((CTRL-CHAR, code 10)): has to be escaped using backslash to be included in name\n at [Source: org.elasticsearch.transport.netty.ChannelBufferStreamInput@1110d090; line: 10, column: 18]"

}]

}

Now everything is clear. In our example, we have improperly quoted the range attribute.

Note

You may wonder why in our curl query we used the --data-binary parameter. This parameter properly preserves the newline characters when sending a query to Elasticsearch. This means that the line and the column numbers remain intact and it’s easier to find errors. In the other cases, the -d parameter is more convenient because it’s shorter.
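The effect of preserving newlines can be reproduced locally. The following Python sketch uses the standard json module as a stand-in for the JSON parser used by Elasticsearch (an assumption made only for illustration; the error wording and exact positions differ between parsers). It parses a shortened variant of the malformed query twice, once with the line breaks kept and once with them removed, and prints where the parser reports the problem.

```python
import json

# A shortened variant of the malformed query: the closing quote of
# "range" is missing, exactly as in the example above.
lines = [
    '{',
    '  "query": {',
    '    "bool": {',
    '      "should": {',
    '        "range: {',
    '          "year": {"from": 1900, "to": 2000}',
    '        }',
    '      }',
    '    }',
    '  }',
    '}',
]


def error_position(text):
    """Return (line, column) of the first JSON syntax error, or None."""
    try:
        json.loads(text)
    except json.JSONDecodeError as err:
        return err.lineno, err.colno
    return None


with_newlines = "\n".join(lines)    # what --data-binary would send
without_newlines = "".join(lines)   # newlines removed, as with -d

print(error_position(with_newlines))     # points at the line with "range
print(error_position(without_newlines))  # everything is reported on line 1
```

With the newlines kept, the reported line identifies the faulty part of the query; once they are removed, every error is reported on line 1 and only the column is left to go by.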

The Validate API can also detect other errors, for example, an incorrect format of a number or other mapping-related issues. Unfortunately, it is not easy for our application to detect what the problem is because of a lack of structure in the error messages.

The Validate API supports most of the parameters that are supported by standard Elasticsearch queries, which include: explain, ignore_unavailable, allow_no_indices, expand_wildcards, operation_threading, analyzer, analyze_wildcard, default_operator, df, lenient, lowercase_expanded_terms, and rewrite.

Sorting data

So far we’ve run our queries and got the results in the order determined by the score of each document. However, this is not enough for all use cases. It is really handy to be able to sort our results on the basis of field values. For example, when you are searching logs or time-based data in general, you probably want to have the most recent data first. In addition to that, Elasticsearch allows us to control how the documents should be sorted not only using field values, but also using more sophisticated sorting, such as sorting that uses scripts or sorting on fields that have multiple values. We will cover all that in this section.

Default sorting

Let’s look at the following query that returns all the books with at least one of the specified words:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

}

}'

Under the hood, we can imagine that Elasticsearch sees the preceding query as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

},

"sort":{"_score":"desc"}

}'

Look at the sort section in the preceding query. This is the default sorting used by Elasticsearch. For better visibility, we can change the formatting slightly and show that fragment as follows:

"sort":[

{"_score":"desc"}

]

The preceding section defines how the documents should be sorted in the results list. In this case, Elasticsearch will show the documents with the highest score on top of the results list. The simplest modification is to reverse the ordering by changing the sort section to the following one:

"sort":[

{"_score":"asc"}

]

Selecting fields used for sorting

Default sorting is boring, isn’t it? So, let’s change it to sort on the basis of the values of the fields present in the documents. Let’s choose the title field, which means that the sort section of our query will look as follows:

"sort":[

{"title":"asc"}

]

Unfortunately, this doesn’t work as expected. Although Elasticsearch sorted the documents, the ordering is somewhat strange. Look closely at the response. With every document, Elasticsearch returns information about the sorting; for example, for the Crime and Punishment book, the returned document looks like the following code:

{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":null,

"_source":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

},

"sort":["punishment"]

}

If you compare the title field and the returned sorting information, everything should be clear. Elasticsearch, during the analysis process, splits the field into several tokens. Since sorting is done using a single token, Elasticsearch chooses one of the produced tokens. It does the best that it can by sorting these tokens alphabetically and choosing the first one. This is the reason why, in the sorting value, we find only a single word instead of the whole content of the title field. If you would like to see how Elasticsearch behaves when using different fields for sorting, you can try fields such as copies:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

},

"sort":[

{"copies":"asc"}

]

}'

In general, it is a good idea to have a not analyzed field for sorting. We can use fields with multiple values for sorting, but, in most cases, it doesn’t make much sense and has limited usage.

As an example of using two different fields, one for sorting and another for searching, let’s change our title field. The changed title field definition will look as follows:

"title":{

"type":"string",

"fields":{

"sort":{"type":"string","index":"not_analyzed"}

}

}

After changing the title field in the mappings (we’ve used the same mappings as in Chapter 3, Searching Your Data) and re-indexing the data, we can try sorting on the title.sort field and see whether it works. To do this, we will need to send the following query:

{

"query":{

"match_all":{}

},

"sort":[

{"title.sort":"asc"}

]

}

Now, it works properly. As you can see, we used the new field, the title.sort one. We set it as not to be analyzed, so there is a single token for that field in the index of Elasticsearch.

Sorting mode

In the response from Elasticsearch, every document contains information about the value used for sorting. For example, let’s look at one of the documents returned by the query in which we used the title field for sorting:

{

"_index":"library",

"_type":"book",

"_id":"1",

"_score":null,

"_source":{

"title":"All Quiet on the Western Front",

"otitle":"Im Westen nichts Neues",

"author":"Erich Maria Remarque",

"year":1929,

"characters":["Paul Bäumer","Albert Kropp","Haie Westhus",

"Fredrich Müller","Stanislaus Katczinsky","Tjaden"],

"tags":["novel"],

"copies":1,

"available":true,

"section":3

},

"sort":["all"]

}

The sorting used in the query to get the preceding document was as follows:

"sort":[

{"title":"asc"}

]

However, because we are sorting on an analyzed field, which contains more than a single value, the sorting definition is in fact equivalent to the longer form, which looks as follows:

"sort":[

{"title":{"order":"asc","mode":"min"}}

]

mode defines which token should be used for comparison when sorting on a field which has more than one value. The available values we can choose from are:

min: Sorting will use the lowest value (or the first alphabetical value on the text based fields)
max: Sorting will use the highest value (or the last alphabetical value on the text based fields)
avg: Sorting will use the average value
median: Sorting will use the median value
sum: Sorting will use the sum of all the values in the field

Note

The modes such as median, avg, and sum are useful for numerical multivalued fields, but don’t make much sense when it comes to text based fields.
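To make the modes concrete, the following Python sketch shows how a single sort value could be derived from a multivalued field for each mode. It is a simplified model written for illustration only: the function name sort_value and the sample values are hypothetical, and details such as rounding inside Elasticsearch are not reproduced.

```python
from statistics import median


def sort_value(values, mode="min"):
    """Reduce the values of a multivalued field to the single value
    used for comparison (simplified model of the sort modes)."""
    if mode == "min":
        return min(values)
    if mode == "max":
        return max(values)
    if mode == "sum":
        return sum(values)
    if mode == "avg":
        return sum(values) / len(values)
    if mode == "median":
        return median(values)
    raise ValueError("unknown sort mode: %s" % mode)


prices = [3, 1, 8]  # hypothetical numeric multivalued field
for mode in ("min", "max", "avg", "median", "sum"):
    print(mode, sort_value(prices, mode))

# On text tokens, only min and max are meaningful.
print(sort_value(["quiet", "all", "front"], "min"))
```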

Note that sort, in request and response, is given as an array. This suggests that we can use several different orderings. Elasticsearch will use the next element in the sorting definition list to determine ordering between the documents that have the same value of the previous sorting clause. So, if two documents have the same value in the first sort field, they will be sorted by the next field that we specify. For example, if we would like to get the documents that have the most copies and then sort by the title, we will run the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"terms":{

"title":["crime","front","punishment"]

}

},

"sort":[

{"copies":"desc"},{"title":"asc"}

]

}'
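The tie-breaking behavior of a multi-element sort can be imitated with a composite sort key. This is a minimal Python sketch over made-up documents (the titles and copy counts are hypothetical, and whole titles are compared, as a not analyzed field would be); it only mirrors the ordering rule and is not how Elasticsearch executes the sort.

```python
# Hypothetical documents; only the two sort fields are shown.
books = [
    {"title": "Book B", "copies": 1},
    {"title": "Book C", "copies": 0},
    {"title": "Book A", "copies": 1},
]

# copies descending first; title ascending breaks ties between equal copies.
ordered = sorted(books, key=lambda book: (-book["copies"], book["title"]))

for book in ordered:
    print(book["copies"], book["title"])
```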

Specifying behavior for missing fields

What about when some of the documents that match the query don’t have the field we want to sort on? By default, documents without the given field are returned first in the case of ascending order and last in the case of descending order. However, sometimes this is not exactly what we want to achieve.

When we use sorting on numeric fields, we can change the default Elasticsearch behavior for documents with missing fields. For example, let’s take a look at the following query:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":[

{

"section":{

"order":"asc",

"missing":"_last"

}

}

]

}'

Note the extended form of the sort section of our query. We’ve added the missing parameter to it. By setting the missing parameter to _last, we tell Elasticsearch to place the documents without the given field at the bottom of the results list. Setting the missing parameter to _first will result in Elasticsearch placing documents without the given field at the top of the results list. It is worth mentioning that besides the _last and _first values, Elasticsearch also allows us to use any number. In such a case, a document without a defined field will be treated as the document with this given value.
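The three variants of the missing parameter can be modelled as follows. This Python sketch is an illustration under stated assumptions: the function name sort_with_missing and the sample documents are hypothetical, and only the explicitly configured behaviors (_first, _last, or a substitute number) are covered.

```python
def sort_with_missing(docs, field, order="asc", missing="_last"):
    """Sort documents on a numeric field, handling documents that lack
    the field according to the `missing` setting (illustrative model)."""
    reverse = order == "desc"
    if missing in ("_first", "_last"):
        present = sorted((d for d in docs if field in d),
                         key=lambda d: d[field], reverse=reverse)
        absent = [d for d in docs if field not in d]
        return absent + present if missing == "_first" else present + absent
    # Any other value is treated as the field value of such documents.
    return sorted(docs, key=lambda d: d.get(field, missing), reverse=reverse)


docs = [{"id": 1, "section": 3}, {"id": 2}, {"id": 3, "section": 1}]

print([d["id"] for d in sort_with_missing(docs, "section", missing="_last")])
print([d["id"] for d in sort_with_missing(docs, "section", missing="_first")])
print([d["id"] for d in sort_with_missing(docs, "section", missing=2)])
```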

DynamiccriteriaAswementionedintheprevioussection,Elasticsearchallowsustosortusingfieldsthathavemultiplevalues.Wecancontrolhowthecomparisonismadeusingscripts.WedothatbyshowingElasticsearchhowtocalculatethevaluethatshouldbeusedforsorting.Let’sassumethatwewanttosortbythefirstvalueindexedinthetagsfield.Let’stakealookatthefollowingexamplequery(notethatrunningthefollowingqueryrequiresthescript.inlinepropertysettoonintheelasticsearch.ymlfile):

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":"doc[\"tags\"].values.size() > 0 ? doc[\"tags\"].values[0] : \"\u19999\"",

"type":"string",

"order":"asc"

}

}

}'

In the preceding example, we replaced every nonexistent value with a Unicode character that should sort low enough in the list. The main idea of this script is to check whether our array contains at least a single element. If it does, the first value from the array is returned. If the array is empty, we return the Unicode character that should be placed at the bottom of the results list. Besides the script parameter, this kind of sorting requires us to specify the order (ascending, in our case) and the type parameter that will be used for the comparison (we return a string from our script).
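The logic of the sort script can be restated as a short Python sketch. The sentinel character below is an assumption chosen for illustration (it is not the character used in the query above), and the sample documents are made up.

```python
# Hypothetical "sorts last" character; not the one used in the query above.
SENTINEL = "\uffff"


def first_tag_or_sentinel(doc):
    """Return the first tag, or a high-sorting sentinel when there are no tags."""
    tags = doc.get("tags", [])
    return tags[0] if len(tags) > 0 else SENTINEL


docs = [
    {"title": "A", "tags": ["novel", "classic"]},
    {"title": "B", "tags": []},
    {"title": "C", "tags": ["fiction"]},
]

# Ascending string comparison, as requested by "type": "string", "order": "asc".
ordered = sorted(docs, key=first_tag_or_sentinel)
print([d["title"] for d in ordered])
```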

Calculate scoring when sorting

By default, Elasticsearch assumes that when you use sorting, the score is completely unimportant. Usually this is a good assumption: why do additional computations when the importance of the documents is given by the sorting formula? Sometimes, however, you want to know how good a document is in relation to the current query, even if the documents are presented in a different order. This is when the track_scores parameter should be used and set to true. An example query using it looks as follows:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"match_all":{}

},

"track_scores":true,

"sort":[

{"title":{"order":"asc"}}

]

}'

The preceding query calculates the score for every document. In our example the score is not very interesting: it is always equal to 1.0, because the match_all query treats all the documents as equal.

Query rewrite

When debugging your queries, it is very valuable to know how they are actually executed. Because of that, we decided to include this section on how query rewrite works in Elasticsearch, why it is used, and how to control it. If you have ever used queries such as the prefix query or the wildcard query, or basically any query that is said to be multiterm (a query that is built of multiple terms), you've used query rewriting even though you may not have known about it. Elasticsearch rewrites such queries for performance reasons. The rewrite process changes the original, expensive query into a set of queries that are far less expensive from the Apache Lucene point of view, thus speeding up query execution.

Prefix query as an example

The best way to illustrate how the rewrite process is done internally is to look at an example and see which terms are used instead of the original query term. We will index three documents to our library_it index by using the following commands:

curl -XPOST 'localhost:9200/library_it/book/1' -d '{"title": "Solr 4 Cookbook"}'

curl -XPOST 'localhost:9200/library_it/book/2' -d '{"title": "Solr 3.1 Cookbook"}'

curl -XPOST 'localhost:9200/library_it/book/3' -d '{"title": "Mastering Elasticsearch"}'

What we would like is to find all the documents whose title contains a term starting with the letter s. To do that, we run the following query against our library_it index:

curl -XGET 'localhost:9200/library_it/_search?pretty' -d '{

"query":{

"prefix":{

"title":{

"prefix":"s",

"rewrite":"constant_score_boolean"

}

}

}

}'

We've used a simple prefix query; we've said that we would like to find all the documents with terms starting with the letter s in the title field. We've also used the rewrite property to specify the query rewrite method, but let's skip it for now, as we will discuss the possible values of this parameter later in this section.

As the response to the previous query, we get the following:

{

"took":13,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"library_it",

"_type":"book",

"_id":"2",

"_score":1.0,

"_source":{

"title":"Solr3.1Cookbook"

}

},{

"_index":"library_it",

"_type":"book",

"_id":"1",

"_score":1.0,

"_source":{

"title":"Solr4Cookbook"

}

}]

}

}

As you can see, in response we got the two documents whose title field contains a term starting with the desired character. We didn't specify the mappings explicitly, so we relied on Elasticsearch's ability to choose the mapping type for us. As we already know, for a text field, Elasticsearch uses the default analyzer. This means that the terms in our documents are lowercased and, because of that, we used a lowercase letter in our prefix query (remember that the prefix query is not analyzed).

Getting back to Apache Lucene

Now let's take a step back and look at Apache Lucene again. If you recall what the Lucene inverted index is built from, you can tell that it contains a term, a count, and a document pointer (if you don't recall, refer to the Full text searching section in Chapter 1, Getting Started with Elasticsearch Cluster). So, let's see how a simplified view of the index may look for the data we've put into the library_it index:

What you see in the column with the Term text is quite important. If you look at the Elasticsearch and Apache Lucene internals, you can see that our prefix query was rewritten to the following Lucene query:

ConstantScore(title:solr)

We can check the portions of the rewrite by using the Elasticsearch API. First of all, we can use the Explain API by running the following command:

curl -XGET 'localhost:9200/library_it/book/1/_explain?pretty' -d '{

"query":{

"prefix":{

"title":{

"prefix":"s",

"rewrite":"constant_score_boolean"

}

}

}

}'

The result will be as follows:

{

"_index":"library_it",

"_type":"book",

"_id":"1",

"matched":true,

"explanation":{

"value":1.0,

"description":"sumof:",

"details":[{

"value":1.0,

"description":"ConstantScore(title:solr),productof:",

"details":[{

"value":1.0,

"description":"boost",

"details":[]

},{

"value":1.0,

"description":"queryNorm",

"details":[]

}]

},{

"value":0.0,

"description":"matchonrequiredclause,productof:",

"details":[{

"value":0.0,

"description":"#clause",

"details":[]

},{

"value":1.0,

"description":"_type:book,productof:",

"details":[{

"value":1.0,

"description":"boost",

"details":[]

},{

"value":1.0,

"description":"queryNorm",

"details":[]

}]

}]

}]

}

}

We can see that Elasticsearch used a constant score query with the term solr against the title field.
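To make the rewrite step concrete, here is a minimal Python sketch of the idea: the prefix is expanded against the terms present in a toy inverted index, and every matching document receives a constant score. It is a simplified model, not the Lucene implementation. The analyze function (lowercasing plus whitespace splitting) is an assumption standing in for the default analyzer, and the helper names are hypothetical.

```python
from collections import defaultdict

# The three titles indexed earlier in this section.
titles = {
    1: "Solr 4 Cookbook",
    2: "Solr 3.1 Cookbook",
    3: "Mastering Elasticsearch",
}


def analyze(text):
    # Assumption: lowercase + whitespace split approximates the default analyzer.
    return text.lower().split()


# Toy inverted index: term -> set of document identifiers.
inverted = defaultdict(set)
for doc_id, title in titles.items():
    for term in analyze(title):
        inverted[term].add(doc_id)


def expand_prefix(prefix):
    """Return the indexed terms that the prefix query is rewritten to."""
    return sorted(term for term in inverted if term.startswith(prefix))


def run_prefix_query(prefix, boost=1.0):
    """Constant-score style: every matching document gets the boost as its score."""
    hits = set()
    for term in expand_prefix(prefix):
        hits |= inverted[term]
    return {doc_id: boost for doc_id in sorted(hits)}


print(expand_prefix("s"))
print(run_prefix_query("s"))
```

With this data the only indexed term starting with s is solr, which is consistent with the rewritten query shown in the explain output.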

Query rewrite properties

We can control how the queries are rewritten internally. To do that, we place the rewrite parameter inside the JSON object responsible for the actual query. For example:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"prefix":{

"title":{

"prefix":"s",

"rewrite":"constant_score_boolean"

}

}

}

}'

The rewrite property can take the following values:

scoring_boolean: This rewrite method translates each generated term into a Boolean should clause in a Boolean query. It causes the score to be calculated for each document and, because of that, it may be CPU demanding. Please also note that, for queries that have many terms, it may exceed the Boolean query limit, which is set to 1024. The default Boolean query limit can be changed by setting the index.query.bool.max_clause_count property in the elasticsearch.yml file. However, remember that the more Boolean clauses are produced, the lower the query performance may be.

constant_score: This rewrite method chooses constant_score_boolean or constant_score_filter depending on the query and taking performance into consideration. This is also the default behavior when the rewrite property is not set at all.

constant_score_boolean: This rewrite method is similar to the scoring_boolean rewrite method described previously, but it is less CPU demanding because the scoring is not computed. Instead, each term receives a score equal to the query boost (one by default, which can be changed by using the boost property). Because this rewrite method also results in Boolean should clauses being created, it can also hit the maximum Boolean clauses limit.

top_terms_N: A rewrite method that translates each generated term into a Boolean should clause in a Boolean query and keeps the scores as computed by the query. However, unlike the scoring_boolean rewrite method, it only keeps the N top scoring terms, to avoid hitting the maximum Boolean clauses limit and to increase the final query performance.

top_terms_blended_freqs_N: A rewrite method that translates each term into a Boolean query and treats the terms as if they had the same term frequency.

top_terms_boost_N: A rewrite method similar to the top_terms_N one, but the scores are not computed. Instead, the documents are given a score equal to the value of the boost property (one by default).

For example, if we would like our example query to use top_terms_N with N equal to 2, our query would look like this:

curl -XGET 'localhost:9200/library/book/_search?pretty' -d '{

"query":{

"prefix":{

"title":{

"prefix":"s",

"rewrite":"top_terms_2"

}

}

}

}'

If you look at the results returned by Elasticsearch, you'll notice that, unlike in our initial query, the documents were given a score different from the default 1.0:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.15342641,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"3",

"_score":0.15342641,

"_source":{

"title":"TheCompleteSherlockHolmes",

"author":"ArthurConanDoyle",

"year":1936,

"characters":["SherlockHolmes","Dr.Watson","G.Lestrade"],

"tags":[],

"copies":0,

"available":false,

"section":12

}

}]

}

}

The score is different from the default 1.0 because we've used the top_terms_N rewrite type, and this type of query rewrite keeps the score for the N top scoring terms.

Before we finish the Query rewrite section of this chapter, we should ask ourselves one last question: when to use which rewrite type? The answer to this question greatly depends on your use case but, to summarize, if you can live with lower precision and relevancy (but higher performance), you can go for the top N rewrite method. If you need high precision and thus more relevant queries (but lower performance), choose the Boolean approach.

Summary

The chapter you just finished was again focused on querying. We used filters and saw what highlighting is and how to use it. We learned what the highlighter types are and how they can help us. We validated our queries and we learned how Elasticsearch can help us when it comes to sorting our results. Finally, we discussed query rewriting, what it brings us, and how we can control it.

In the next chapter, we will get back to the topic of indexing. We will discuss indexing complex JSON objects such as tree-like structures and indexing data that is not flat. We will prepare Elasticsearch to handle relationships between documents and we will use the Elasticsearch API to update the structure of our indices.

Chapter 5. Extending Your Index Structure

We started the previous chapter by learning how to deal with revised filtering in Elasticsearch 2.x and what to expect from it now. We also explored highlighting and how it can help us in improving the users' search experience. We discovered query validation in Elasticsearch and learned the ways of sorting data in Elasticsearch. Finally, we discussed query rewriting and how it affects our queries. By the end of this chapter, you will have learned the following topics:

Indexing tree-like structures
Indexing data that is not flat
Handling document relationships by using nested object and parent-child features
Modifying index structure by using Elasticsearch API

Indexing tree-like structures

Trees are everywhere. If you develop an e-commerce shop application, your products will probably be described with the use of categories. The thing about categories is that in most cases they are hierarchical. There are top categories, such as electronics, music, books, and so on. Each of the top level categories can have numerous child categories, such as fiction and science, and those can go even deeper into science fiction, romance, and so on. If you look at the filesystem, the files and directories are arranged in tree-like structures as well. This book can also be represented as a tree: chapters contain topics and topics are divided into subtopics. So the data around us is arranged into tree-like structures and, as you can imagine, Elasticsearch is capable of indexing tree-like structures so that we can represent the data in an easier manner. Let's check how we can navigate through this type of data using path_analyzer.

Data structure

To begin with, let's create a simple index structure by using the following command:

curl -XPUT 'localhost:9200/path?pretty' -d '{

"settings":{

"index":{

"analysis":{

"analyzer":{

"path_analyzer":{"tokenizer":"path_hierarchy"}

}

}

}

},

"mappings":{

"category":{

"properties":{

"category":{

"type":"string",

"fields":{

"name":{"type":"string","index":"not_analyzed"},

"path":{"type":"string","analyzer":"path_analyzer",

"store":true}

}

}

}

}

}

}'

As you can see, we have created a single type: the category type. We will use it to store and index the information about the location of our document in the tree structure. The idea is simple: we can show the location of the document as a path, in exactly the same manner as files and directories are presented on your hard disk drive. For example, in an automotive shop, we can have /cars/passenger/sport, /cars/passenger/camper, or /cars/delivery_truck/. However, to achieve that, we need to index this path in two different ways. First of all, we will use a not analyzed field called name to store and index the path in its original form. We will also use a field called path, which uses the path_analyzer analyzer that we've defined to process the path so that it is easier to search.

Analysis

Now, let's see what Elasticsearch will do with the category path during the analysis process. To see this, we will use the following command line, which uses the analysis API discussed in the Understanding the explain information section of Chapter 6, Make Your Search Better:

curl -XGET 'localhost:9200/path/_analyze?field=category.path&pretty' -d '/cars/passenger/sport'

The following results will be returned by Elasticsearch:

{

"tokens":[{

"token":"/cars",

"start_offset":0,

"end_offset":5,

"type":"word",

"position":0

},{

"token":"/cars/passenger",

"start_offset":0,

"end_offset":15,

"type":"word",

"position":0

},{

"token":"/cars/passenger/sport",

"start_offset":0,

"end_offset":21,

"type":"word",

"position":0

}]

}

As we can see, our category path /cars/passenger/sport was processed by Elasticsearch and divided into three tokens. Thanks to this, we can simply find every document that belongs to a given category or its subcategories by using the term filter. For the example to be complete, let's index a simple document by using the following command:

curl -XPUT 'localhost:9200/path/category/1' -d '{"category": "/cars/passenger/sport"}'

An example of using filters is as follows:

curl -XGET 'localhost:9200/path/_search?pretty' -d '{

"query":{

"bool":{

"filter":{

"term":{

"category.path":"/cars"

}

}

}

}

}'

Note that we also have the original value indexed in the category.name field. This is handy when we want to find documents from a particular path, ignoring the documents that are deeper in the hierarchy.
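The difference between the two fields can be sketched in Python. The tokenizer below is a simplified stand-in for path_hierarchy that reproduces the three tokens shown in the analysis response for a path without a trailing delimiter; it is not the Lucene implementation, and the sample documents and helper names are hypothetical.

```python
def path_hierarchy_tokens(path, delimiter="/"):
    """Emit every ancestor prefix of the path, followed by the full path."""
    tokens = []
    pos = path.find(delimiter, 1)  # skip the leading delimiter
    while pos != -1:
        tokens.append(path[:pos])
        pos = path.find(delimiter, pos + 1)
    tokens.append(path)
    return tokens


# Hypothetical documents: identifier -> category path.
docs = {
    1: "/cars/passenger/sport",
    2: "/cars/delivery_truck",
    3: "/bikes/road",
}


def term_on_path(term):
    """Model of a term filter on category.path (matches the category and its subtree)."""
    return sorted(i for i, p in docs.items() if term in path_hierarchy_tokens(p))


def term_on_name(term):
    """Model of a term filter on category.name (exact, not analyzed value)."""
    return sorted(i for i, p in docs.items() if p == term)


print(path_hierarchy_tokens("/cars/passenger/sport"))
print(term_on_path("/cars"))
print(term_on_name("/cars"))
```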

Indexing data that is not flat

Not all data is flat like the examples we have used in the book until now. Most of the data you will encounter will have some structure and nested objects inside the root JSON object. Of course, if we are building the system that Elasticsearch will be a part of and we are in control of all of its pieces, we can create a structure that is convenient for Elasticsearch. But even in such cases, flat data is not always an option. Thankfully, Elasticsearch allows us to index data that is not flat, and this section shows how to do that.

Data

Let's assume that we have the following data (we store it in a file called structured_data.json):

{

"author":{

"name":{

"firstName":"Fyodor",

"lastName":"Dostoevsky"

}

},

"isbn":"123456789",

"englishTitle":"CrimeandPunishment",

"year":1886,

"characters":[

{

"name":"Raskolnikov"

},

{

"name":"Sofia"

}

],

"copies":0

}

As you can see, the data is not flat: it contains arrays and nested objects. If we wanted to create mappings using only the knowledge that we've got so far, we would have to flatten the data. However, as we already said, Elasticsearch allows some degree of structure and we should be able to create mappings that will work for the preceding example.

Objects

The preceding example data shows a structured JSON file. As you can see in the example, our root object has some additional, simple properties, such as englishTitle, isbn, year, and copies. These will be indexed as normal fields in the index and we already know how to deal with them (we discussed that in the Mappings configuration section of Chapter 2, Indexing Your Data). In addition to that, it has the characters array and the author object. The author object has another object nested within it: the name object, which has two properties, firstName and lastName. So, as you can see, we can have multiple objects nested inside each other.

Arrays

We have already used array type data, but we didn't talk about it. By default, all the fields in Lucene, and thus in Elasticsearch, are multivalued, which means that they can store multiple values. In order to send such fields to Elasticsearch for indexing, we use the JSON array type, which is enclosed in opening and closing square brackets []. As you can see in the preceding example, we used the array type for the characters of our book.

Mappings

Let's now look at what our mappings would look like for the book object we showed earlier. We already said that to index arrays we don't need anything special. So, in our case, to index the characters data we will need to add a field definition similar to the following one:

"characters":{

"properties":{

"name":{"type":"string"}

}

}

Nothing strange! We just nest the properties section inside the array's name (which is characters in our case) and we define the fields there. As the result of the preceding mappings, we will get the characters.name multivalued field in the index.

We do a similar thing for our author object. We call the section by the same name as the one present in the data. We have the author object, but it also has the name object nested in it, so we do the same: we just nest another object inside it. So, our mappings for the author field would look as follows:

"author":{

"properties":{

"name":{

"properties":{

"firstName":{"type":"string"},

"lastName":{"type":"string"}

}

}

}

}

The firstName and lastName fields appear in the index as author.name.firstName and author.name.lastName.
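How objects and arrays end up as dotted, multivalued field names can be sketched with a short recursive function. This is a simplified model of the naming described above, not Elasticsearch code; the flatten helper is hypothetical and the input is the example document from the Data section.

```python
def flatten(value, prefix="", out=None):
    """Collect leaf values under dotted field names; every field is a list of values."""
    if out is None:
        out = {}
    if isinstance(value, dict):
        for key, child in value.items():
            flatten(child, f"{prefix}.{key}" if prefix else key, out)
    elif isinstance(value, list):
        # Arrays add no path element: each item contributes to the same field.
        for item in value:
            flatten(item, prefix, out)
    else:
        out.setdefault(prefix, []).append(value)
    return out


book = {
    "author": {"name": {"firstName": "Fyodor", "lastName": "Dostoevsky"}},
    "isbn": "123456789",
    "englishTitle": "Crime and Punishment",
    "year": 1886,
    "characters": [{"name": "Raskolnikov"}, {"name": "Sofia"}],
    "copies": 0,
}

fields = flatten(book)
for name in sorted(fields):
    print(name, fields[name])
```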

The rest of the fields are simple core types, so we'll skip discussing them, as they were already discussed in the Mappings configuration section of Chapter 2, Indexing Your Data.

Final mappings

So our final mappings file, which we've called structured_mapping.json, looks like the following:

{

"book":{

"properties":{

"author":{

"type":"object",

"properties":{

"name":{

"type":"object",

"properties":{

"firstName":{"type":"string"},

"lastName":{"type":"string"}

}

}

}

},

"isbn":{"type":"string"},

"englishTitle":{"type":"string"},

"year":{"type":"integer"},

"characters":{

"properties":{

"name":{"type":"string"}

}

},

"copies":{"type":"integer"}

}

}

}

Sending the mappings to Elasticsearch

Now that we have our mappings done, we would like to test whether all the work we did actually works. This time we will use a slightly different technique of creating an index and putting the mappings. First, let's create the library index with the following command (you need to delete the library index if you already have it):

curl -XPUT 'localhost:9200/library'

Now, let's send our mappings for the book type:

curl -XPUT 'localhost:9200/library/book/_mapping' -d @structured_mapping.json

Now we can index our example data:

curl -XPOST 'localhost:9200/library/book/1' -d @structured_data.json

To be or not to be dynamic

As we already know, Elasticsearch is schema-less, which means that it can index data without the need to create the mappings upfront. What Elasticsearch does in the background when a new field is encountered in the data is a mapping update: it tries to guess the field type and adds the field to the mappings. This dynamic behavior is turned on by default, but there may be situations where you want to turn it off for some parts of your index. In order to do that, add the dynamic property to the given object and set it to false. This should be done at the same level of nesting as the type property of the object that shouldn't be dynamic. For example, if we want our author and name objects not to be dynamic, we should modify the relevant part of the mappings file so that it looks as follows:

"author":{

"type":"object",

"dynamic":false,

"properties":{

"name":{

"type":"object",

"dynamic":false,

"properties":{

"firstName":{"type":"string","index":"analyzed"},

"lastName":{"type":"string","index":"analyzed"}

}

}

}

}

However, remember that in order to add new fields to such objects, we would have to update the mappings.

Note

You can also turn off the dynamic mappings functionality by adding the index.mapper.dynamic property to your elasticsearch.yml configuration file and setting it to false.

Disabling object indexing

There is one additional thing that we would like to mention when it comes to handling objects: we can disable indexing of a particular object by using the enabled property and setting it to false. There may be various reasons for that, such as not wanting a field to be indexed or not wanting a whole JSON object to be indexed. For example, if we want to omit an object called information, which in this example sits inside the name object of author, the author object definition looks as follows:

"author":{

"type":"object",

"properties":{

"name":{

"type":"object",

"dynamic":false,

"properties":{

"firstName":{"type":"string","index":"analyzed"},

"lastName":{"type":"string","index":"analyzed"},

"information":{"type":"object","enabled":false}

}

}

}

}

The dynamic parameter can also be set to strict. In that case, new fields are not added to the mappings when they appear, and the indexing of a document that contains such a field fails.
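The three values of the dynamic property can be compared with a small Python model. It is a sketch of the behavior described in this section for a single flat object, not Elasticsearch code; the index_fields helper and the nickname field are hypothetical.

```python
def index_fields(known_fields, doc, dynamic=True):
    """Return (fields that get indexed, resulting set of mapped fields)."""
    known = set(known_fields)
    indexed = {}
    for field, value in doc.items():
        if field in known:
            indexed[field] = value
        elif dynamic is True:
            # Mapping update: the new field is added and indexed.
            known.add(field)
            indexed[field] = value
        elif dynamic == "strict":
            raise ValueError(f"strict mapping: unknown field {field!r}")
        # dynamic is False: the unknown field is silently left out of the index.
    return indexed, known


mapped = {"firstName", "lastName"}
doc = {"firstName": "Fyodor", "nickname": "F"}  # "nickname" is not mapped

print(index_fields(mapped, doc, dynamic=True))
print(index_fields(mapped, doc, dynamic=False))
try:
    index_fields(mapped, doc, dynamic="strict")
except ValueError as error:
    print("indexing failed:", error)
```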

Using nested objects

Nested objects can come in handy in certain situations. Basically, with nested objects Elasticsearch allows us to connect multiple documents together: one main document and multiple dependent ones. The main document and the nested ones are indexed together and they are placed in the same segment of the index (actually, in the same block inside the segment, near each other), which guarantees the best performance we can get for such a data structure. The same goes for changing the document: unless you are using the update API, you need to index the parent document and all the nested ones at the same time.

Note

If you would like to read more about how nested objects work on the Apache Lucene level, there is a very good blog post written by Mike McCandless at http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html.

Now let's get on with our example use case. Imagine that we have a shop with clothes and we store the size and color of each t-shirt. Our standard, non-nested mappings will look like this (stored in cloth.json):

{

"cloth":{

"properties":{

"name":{"type":"string"},

"size":{"type":"string","index":"not_analyzed"},

"color":{"type":"string","index":"not_analyzed"}

}

}

}

To create the shop index with our cloth mapping, we run the following commands:

curl -XPOST 'localhost:9200/shop'

curl -XPUT 'localhost:9200/shop/cloth/_mapping' -d @cloth.json

Now imagine that we have a t-shirt in our shop that we only have in XXL size in red and in XL size in black. So our example document indexing command will look as follows:

curl -XPOST 'localhost:9200/shop/cloth/1' -d '{

"name":"Test shirt",

"size":["XXL","XL"],

"color":["red","black"]

}'

However, there is a problem with such a data structure. What if one of our clients searches our shop in order to find the XXL t-shirt in black? Let's check that by running the following query (we assume that we've used our mappings to create the index and we've indexed our example document):

curl -XGET 'localhost:9200/shop/cloth/_search?pretty=true' -d '{

"query":{

"bool":{

"must":[

{

"term":{"size":"XXL"}

},

{

"term":{"color":"black"}

}

]

}

}

}'

We should get no results, right? But in fact Elasticsearch returned the following document:

{

(…)

"hits":{

"total":1,

"max_score":0.4339554,

"hits":[{

"_index":"shop",

"_type":"cloth",

"_id":"1",

"_score":0.4339554,

"_source":{

"name":"Testshirt",

"size":["XXL","XL"],

"color":["red","black"]

}

}]

}

}

This is because the document was matched: we have the values we are searching for in the size field and in the color field. Of course, this is not what we would like to get.

So, let's modify our mappings to use nested objects to separate color and size into different nested documents. The final mapping looks as follows (we store these mappings in the cloth_nested.json file):

{

"cloth":{

"properties":{

"name":{"type":"string","index":"analyzed"},

"variation":{

"type":"nested",

"properties":{

"size":{"type":"string","index":"not_analyzed"},

"color":{"type":"string","index":"not_analyzed"}

}

}

}

}

}

Now, we will create a second index called shop_nested using our modified mappings by running the following commands:

curl -XPOST 'localhost:9200/shop_nested'

curl -XPUT 'localhost:9200/shop_nested/cloth/_mapping' -d @cloth_nested.json

As you can see, we've introduced a new object inside our cloth type, called variation, which is a nested one (the type property set to nested). It basically says that we want to index nested documents. Now, let's modify our document. We will add the variation object to it, and that object will store objects with two properties: size and color. So the index command for our modified example product will look like the following:

curl -XPOST 'localhost:9200/shop_nested/cloth/1' -d '{

"name":"Test shirt",

"variation":[

{"size":"XXL","color":"red"},

{"size":"XL","color":"black"}

]

}'

We've structured the document so that each size and its matching color form a separate document. However, if you run our previous query, it won't return any documents. This is because, in order to query nested documents, we need to use a specialized query. So now our query looks as follows:

curl -XGET 'localhost:9200/shop_nested/cloth/_search?pretty=true' -d '{

"query":{

"nested":{

"path":"variation",

"query":{

"bool":{

"must":[

{"term":{"variation.size":"XXL"}},

{"term":{"variation.color":"black"}}

]

}

}

}

}

}'

And now the preceding query will not return the indexed document, because we don't have a nested document that has the size equal to XXL and the color black.
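The difference between the two data structures can be reproduced with a few lines of Python. This is only a model of the matching logic, under the assumption that a flat document behaves like two independent bags of values while each nested object is matched as a unit; the helper names are hypothetical.

```python
# The same t-shirt, modeled both ways.
flat_doc = {"name": "Test shirt", "size": ["XXL", "XL"], "color": ["red", "black"]}
nested_doc = {
    "name": "Test shirt",
    "variation": [
        {"size": "XXL", "color": "red"},
        {"size": "XL", "color": "black"},
    ],
}


def flat_match(doc, size, color):
    # Each field is an independent set of values, so pairs are lost.
    return size in doc["size"] and color in doc["color"]


def nested_match(doc, size, color):
    # Both conditions must hold inside the same nested object.
    return any(v["size"] == size and v["color"] == color for v in doc["variation"])


print(flat_match(flat_doc, "XXL", "black"))      # the unwanted cross-match
print(nested_match(nested_doc, "XXL", "black"))  # no single variation matches
print(nested_match(nested_doc, "XXL", "red"))    # a real variation
```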

Let's get back to the query for a second to discuss it briefly. As you can see, we used the nested query in order to search in the nested documents. The path property specifies the name of the nested object (yes, we can have multiple of them). We just included a standard query section under the nested type. Also note that we specified the full path for the field names in the nested objects, which is handy when you have multilevel nesting, which is also possible.

Scoring and nested queries

There is one additional property when it comes to handling nested documents during a query. In addition to the path property, there is the score_mode property, which allows us to define how the score is calculated from the nested queries. Elasticsearch allows us to set the score_mode property to one of the following values:

avg: This is the default value. Using it for the score_mode property results in Elasticsearch taking the average value calculated from the scores of the defined nested queries. The calculated average is included in the score of the main query.

sum: Using this value for the score_mode property results in Elasticsearch taking the sum of the scores of each nested query and including it in the score of the main query.

min: Using this value for the score_mode property results in Elasticsearch taking the score of the minimum scoring nested query and including it in the score of the main query.

max: Using this value for the score_mode property results in Elasticsearch taking the score of the maximum scoring nested query and including it in the score of the main query.

none: Using this value for the score_mode property results in no score being taken from the nested query.

Using the parent-child relationship

In the previous section, we discussed using Elasticsearch to index nested documents along with the parent one. However, even though the nested documents are indexed as separate documents in the index, we can't change a single nested document (unless we use the update API). Elasticsearch also allows us to have a real parent-child relationship, and we will look at it in the following section.

Index structure and data indexing

Let's use the same example that we used when discussing nested documents: the hypothetical cloth store. What we would like to have is the ability to update the sizes and colors without the need to index the whole parent document after each change. We will see how to achieve that by using the Elasticsearch parent-child functionality.

Child mappings

First we have to create the child mapping definition. To create child mappings, we need to add the _parent property with the name of the parent type, which will be cloth in our case. In the child documents, we want to have the size and the color of the cloth. So, the commands that create the shop index and the variation type look as follows (if the shop index from the previous section still exists, delete it first):

curl -XPOST 'localhost:9200/shop'

curl -XPUT 'localhost:9200/shop/variation/_mapping' -d '{

"variation":{

"_parent":{"type":"cloth"},

"properties":{

"size":{"type":"string","index":"not_analyzed"},

"color":{"type":"string","index":"not_analyzed"}

}

}

}'

And that's all. You don't need to specify which field will be used to connect the child documents to the parent ones. By default, Elasticsearch uses the documents' unique identifier for that. If you remember from the previous chapters, the information about the unique identifier is present in the index by default.

Parent mappings

The only field we need to have in our parent document is name. We don't need anything more than that. So, in order to create our cloth type in the shop index, we run the following command:

curl -XPUT 'localhost:9200/shop/cloth/_mapping' -d '{

"cloth":{

"properties":{

"name":{"type":"string"}

}

}

}'

The parent document

Now we are going to index our parent document. As we want to store the information about the size and the color in the child documents, the only thing we need to have in the parent document is the name. Of course, there is one thing to remember: our parent documents need to be of the cloth type, because of the _parent property value in the child mappings. The indexing command for our parent document is very simple and looks as follows:

curl -XPOST 'localhost:9200/shop/cloth/1' -d '{

"name":"Test shirt"

}'

If you look at the preceding command, you'll notice that our document will be given the identifier 1.

Child documents

To index the child documents, we need to provide information about the parent document by using the parent request parameter. The value of the parent parameter should point to the identifier of the parent document. So, to index two child documents for our parent document, we need to run the following command lines:

curl -XPOST 'localhost:9200/shop/variation/1000?parent=1' -d '{

"color":"red",

"size":"XXL"

}'

curl -XPOST 'localhost:9200/shop/variation/1001?parent=1' -d '{

"color":"black",

"size":"XL"

}'

And that's all. We've indexed two additional documents, which are of our variation type, and we've specified that they have a parent: the document with the identifier 1.

Querying

We've indexed our data and now we need to use appropriate queries to match the documents with the data stored in their children. This is because, by default, Elasticsearch searches the documents without looking at the parent-child relations. For example, the following query will match all three documents that we've indexed (two children and one parent):

curl -XGET 'localhost:9200/shop/_search?q=*&pretty'

This is not what we would like to achieve, at least in most cases. Usually, we are interested in parent documents that have children matching the query. Of course, Elasticsearch provides such functionality with specialized types of queries.

Note

The thing to remember, though, is that when running queries against parents, the child documents won't be returned, and vice versa.

Querying data in the child documents

Imagine that we want to get clothes that are of the XXL size and are red. As you recall, the size and the color of the cloth are indexed in the child documents, so we need the specialized has_child query to check which parent documents have children with the desired size and color. So an example query that matches our requirement looks as follows:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{

"query":{

"has_child":{

"type":"variation",

"query":{

"bool":{

"must":[

{"term":{"size":"XXL"}},

{"term":{"color":"red"}}

]

}

}

}

}

}'

The query is quite simple; it is of the has_child type, which tells Elasticsearch that we want to search in the child documents. In order to specify which type of children we are interested in, we set the type property to the name of the child type. The query itself is provided by using the query property. We've used a standard bool query, which we've already discussed. The result of the query will contain only those parent documents that have children matching our bool query. In our case, the single document returned looks as follows:

{

"took":16,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"shop",

"_type":"cloth",

"_id":"1",

"_score":1.0,

"_source":{

"name":"Testshirt"

}

}]

}

}
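The way has_child maps matching children back to their parents can be modeled in a few lines of Python. This is a local sketch of the semantics, not the Elasticsearch implementation; the in-memory dictionaries mirror the three documents indexed earlier, and the has_child helper is hypothetical.

```python
# In-memory stand-ins for the documents indexed in this section.
parents = {"1": {"name": "Test shirt"}}
children = {
    "1000": {"parent": "1", "color": "red", "size": "XXL"},
    "1001": {"parent": "1", "color": "black", "size": "XL"},
}


def has_child(predicate):
    """Return the parent documents that have at least one child matching predicate."""
    parent_ids = {c["parent"] for c in children.values() if predicate(c)}
    return {pid: parents[pid] for pid in sorted(parent_ids)}


# Both conditions are evaluated against the same child document.
print(has_child(lambda c: c["size"] == "XXL" and c["color"] == "red"))
print(has_child(lambda c: c["size"] == "XXL" and c["color"] == "black"))
```

Note that only parent documents are returned, even though the conditions were evaluated on the children.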

The has_child query allows us to provide additional parameters to control its behavior. Every parent document found may be connected with one or more child documents, which means that every child document can influence the resulting score. By default, the query doesn't care about the child documents, how many of them matched, or what their content is; it only matters whether they match the query or not. This can be changed by using the score_mode parameter, which controls the score calculation of the has_child query. The values this parameter can take are:

none: The default one; the score generated by the relation is 1.0
min: The score is taken from the lowest scored child
max: The score is taken from the highest scored child
sum: The score is calculated as the sum of the child scores
avg: The score is taken as the average of the child scores

Let's see an example:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{

"query":{

"has_child":{

"type":"variation",

"score_mode":"sum",

"query":{

"bool":{

"must":[

{"term":{"size":"XXL"}},

{"term":{"color":"red"}}

]

}

}

}

}

}'

We used sum as score_mode, which results in children contributing to the final score of the parent document: the contribution is the sum of the scores of every child document matching the query.
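The effect of each score_mode value can be illustrated with a small Python function that combines the scores of the matching children of one parent. The child scores used here are made-up numbers, and the parent_score helper is hypothetical; only the combination rules follow the list of values given earlier.

```python
def parent_score(child_scores, score_mode="none"):
    """Combine the scores of the matching children of a single parent."""
    if not child_scores:
        raise ValueError("a parent matches only if at least one child matched")
    if score_mode == "none":
        return 1.0  # the relation itself yields a constant score
    if score_mode == "min":
        return min(child_scores)
    if score_mode == "max":
        return max(child_scores)
    if score_mode == "sum":
        return sum(child_scores)
    if score_mode == "avg":
        return sum(child_scores) / len(child_scores)
    raise ValueError(f"unknown score_mode: {score_mode!r}")


# Hypothetical scores of two matching child documents.
scores = [0.5, 1.5]
for mode in ("none", "min", "max", "sum", "avg"):
    print(mode, parent_score(scores, mode))
```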

And finally, we can limit the number of children documents that need to be matched; we can specify both the maximum number of the children documents allowed to be matched (the max_children property) and the minimum number of children documents (the min_children property) that need to be matched. The query illustrating the usage of these parameters is as follows:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{

"query":{

"has_child":{

"type":"variation",

"min_children":1,

"max_children":3,

"query":{

"bool":{

"must":[

{"term":{"size":"XXL"}},

{"term":{"color":"red"}}

]

}

}

}

}

}'

Querying data in the parent documents

Sometimes, we are not interested in the parent documents but in the children documents. If you would like to return the child documents that match given data in the parent document, Elasticsearch has a query for us: the has_parent query. It is similar to the has_child query; however, instead of the type property, we specify the parent_type property with the value of the parent document type. For example, the following query will return both the child documents that we’ve indexed, but not the parent document:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{

"query":{

"has_parent":{

"parent_type":"cloth",

"query":{

"term":{"name":"test"}

}

}

}

}'

The response from Elasticsearch will be similar to the following one:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":2,

"max_score":1.0,

"hits":[{

"_index":"shop",

"_type":"variation",

"_id":"1000",

"_score":1.0,

"_routing":"1",

"_parent":"1",

"_source":{

"color":"red",

"size":"XXL"

}

},{

"_index":"shop",

"_type":"variation",

"_id":"1001",

"_score":1.0,

"_routing":"1",

"_parent":"1",

"_source":{

"color":"black",

"size":"XL"

}

}]

}

}

Similar to the has_child query, the has_parent query also gives us the possibility of tuning the score calculation of the query. In this case, score_mode has only two options: none, the default one where the score calculated by the query is equal to 1.0, and score, which calculates the score of the document on the basis of the parent document contents. An example that uses score_mode in the has_parent query looks as follows:

curl -XGET 'localhost:9200/shop/_search?pretty' -d '{

"query":{

"has_parent":{

"parent_type":"cloth",

"score_mode":"score",

"query":{

"term":{"name":"test"}

}

}

}

}'

The one difference from the previous example is score_mode. If you check the results of these queries, you’ll notice only a single difference. The score of all the documents from the first example is 1.0, while the score for the results returned by the preceding query is equal to 0.8784157. In this case, all the documents found have the same score, because they have a common parent document.

Performance considerations

When using the Elasticsearch parent-child functionality, you have to be aware of the performance impact that it has. The first thing you need to remember is that the parent and the child documents need to be stored in the same shard in order for the queries to work. If you happen to have a high number of children for a single parent, you may end up with shards not having a similar number of documents. Because of that, your query performance can be lower on one of the nodes, resulting in the whole query being slower. Also, remember that parent-child queries will be slower than ones that run against documents that don’t have a relationship between them. There is a way of speeding up joins for the parent-child queries at the cost of memory by eagerly loading the so-called global ordinals; however, we will discuss that method in the Elasticsearch caches section of Chapter 9, Elasticsearch Cluster in Detail.

Finally, the first query will preload and cache the document identifiers using the doc values. This takes time. In order to improve the performance of initial queries that use the parent-child relationship, the Warmer API can be used. You can find more information about how to add warming queries to Elasticsearch in the Warming up section of Chapter 10, Administrating Your Cluster.
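The shard imbalance mentioned above can be illustrated with a small Python sketch. It assumes that every child is routed by its parent identifier (as the _routing and _parent values in the earlier response suggest) and uses crc32 purely as a stand-in hash; the real hash function used by Elasticsearch is different, and the document counts are made up.

```python
import zlib
from collections import Counter

def shard_for(routing, num_shards=5):
    # crc32 is only a stand-in for illustration, not the hash Elasticsearch uses
    return zlib.crc32(routing.encode("utf-8")) % num_shards

# Hypothetical data: parent "1" has 1000 children, parents "2".."6" have 10 each
children = [("1", "c1-%d" % i) for i in range(1000)]
for parent_id in ("2", "3", "4", "5", "6"):
    children += [(parent_id, "c%s-%d" % (parent_id, i)) for i in range(10)]

# Children are placed by the parent identifier, so siblings share a shard
docs_per_shard = Counter(shard_for(parent_id) for parent_id, _ in children)
print(sorted(docs_per_shard.items()))
```

Whichever shard receives parent 1 ends up holding at least 1000 of the 1050 child documents, which is the kind of uneven distribution that can slow down a single node.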

Modifying your index structure with the update API

In the previous chapters, we discussed how to create index mappings and index the data. But what if you already have the mappings created and data indexed, but you want to modify the structure of the index? Of course, one could say that we could just create a new index with new mappings, but that is not always a possibility, especially in a production environment. Modifying an existing index is possible to some extent. For example, by default, if we index a document with a new field, Elasticsearch will add that field to the index structure. Let’s now look at how to modify the index structure manually.

Note

For situations where mapping changes are needed and they are not possible because of conflicts with the current index structure, it is very good to use aliases, both read and write ones. We will discuss aliasing in the Index aliasing section of Chapter 10, Administrating Your Cluster.

The mappings

Let’s assume that we have the following mappings for our users index stored in the user.json file:

{

"user":{

"properties":{

"name":{"type":"string"}

}

}

}

As you can see, it is very simple. It just has a single property that will hold the user name. Now let’s create an index called users and let’s use the preceding mappings to create our type. To do that, we will run the following commands:

curl -XPOST 'localhost:9200/users'

curl -XPUT 'localhost:9200/users/user/_mapping' -d @user.json

If everything goes well, we will have our index (called users) and type (called user) created. So now let’s try to add a new field to the mappings.

Adding a new field to the existing index

In order to illustrate how to add a new field to our mappings, we assume that we want to add a phone number to the data stored for each user. In order to do that, we need to send an HTTP PUT command to the /index_name/type_name/_mapping REST endpoint with the proper body that will include our new field. For example, to add the mentioned phone field, we will run the following command:

curl -XPUT 'http://localhost:9200/users/user/_mapping' -d '{
  "user":{
    "properties":{
      "phone":{"type":"string","index":"not_analyzed"}
    }
  }
}'

Similar to the previous command we ran, if everything goes well, we should have a new field added to our index structure.

Note

Of course, Elasticsearch won’t reindex our data or populate the newly added field automatically. It will just alter the mappings held by the master node and propagate the mappings to all the other nodes in the cluster, and that’s all. Data reindexation must be done by us or by the application that indexes the data in our environment. Until then, the old documents won’t have the newly added field. This is crucial to remember. If you don’t have the original documents, you can use the _source field to get the original data from Elasticsearch and index them once again.
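As a rough illustration of reindexing from the _source field, the following Python sketch turns search hits into the newline-delimited body accepted by the _bulk endpoint. The helper name hits_to_bulk and the sample hit are assumptions made for this example; fetching the hits, adding values for the new field, and sending the body to Elasticsearch are left out.

```python
import json

def hits_to_bulk(hits):
    """Turn search hits into a newline-delimited body for the _bulk endpoint."""
    lines = []
    for hit in hits:
        action = {"index": {"_index": hit["_index"],
                            "_type": hit["_type"],
                            "_id": hit["_id"]}}
        lines.append(json.dumps(action))          # action and metadata line
        lines.append(json.dumps(hit["_source"]))  # the document itself
    return "\n".join(lines) + "\n"  # a bulk body has to end with a newline

# Hypothetical hit, shaped like an entry of the "hits" array in a search response
sample_hits = [
    {"_index": "users", "_type": "user", "_id": "1",
     "_source": {"name": "Test user"}},
]
body = hits_to_bulk(sample_hits)
print(body)
```

Because each document is indexed again under its original identifier, the old version is replaced and the new mapping is applied to it.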

To ensure everything is okay, we can run the GET HTTP request to the _mapping REST endpoint and Elasticsearch will return the appropriate mappings. An example command to get the mappings for our user type in the users index will look as follows:

curl -XGET 'localhost:9200/users/user/_mapping?pretty'

Modifying fields of an existing index

Our users index structure contains two fields: name and phone. Let’s imagine that we indexed some data but after a while we decided that we want to search on the phone field and we would like to change its index property from not_analyzed to analyzed. Because we already know how to alter the index structure, we will run the following command:

curl -XPUT 'http://localhost:9200/users/user/_mapping?pretty' -d '{

"user":{

"properties":{

"phone":{"type":"string","store":"yes","index":"analyzed"}

}

}

}'

What Elasticsearch will return is a response indicating an error, which looks as follows:

{
  "error":{
    "root_cause":[{
      "type":"illegal_argument_exception",
      "reason":"Mapper for [phone] conflicts with existing mapping in other types:\n[mapper [phone] has different [index] values, mapper [phone] has different [store] values, mapper [phone] has different [omit_norms] values, cannot change from disable to enabled, mapper [phone] has different [analyzer]]"
    }],
    "type":"illegal_argument_exception",
    "reason":"Mapper for [phone] conflicts with existing mapping in other types:\n[mapper [phone] has different [index] values, mapper [phone] has different [store] values, mapper [phone] has different [omit_norms] values, cannot change from disable to enabled, mapper [phone] has different [analyzer]]"
  },
  "status":400
}

This is because we can’t change a field that was set to be not_analyzed to one that is analyzed. And not only that; in most cases you won’t be able to update the field’s mapping. This is a good thing, because if we were allowed to change such settings, we would confuse Elasticsearch and Lucene. Imagine that we already have many documents with the phone field set to not_analyzed and we are allowed to change the mappings to analyzed. Elasticsearch wouldn’t change the data that was already indexed, but the queries that are analyzed would be processed with a different logic and thus you wouldn’t be able to properly find your data.
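The mismatch described above can be sketched in a few lines of Python. The analyze function below is a crude stand-in for a real analyzer (it lowercases and splits on non-alphanumeric characters), and the phone value is made up; the point is only that terms produced at query time no longer line up with what was written to the index earlier.

```python
import re

def analyze(text):
    """Crude stand-in for an analyzer: lowercase, split on non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def index_not_analyzed(value):
    """A not_analyzed field is indexed as a single, unmodified term."""
    return [value]

stored_value = "555-1234 Office"                  # hypothetical phone value
indexed_terms = index_not_analyzed(stored_value)  # already sitting in the index
query_terms = analyze("555-1234 Office")          # what an analyzed query looks for

matches = any(term in indexed_terms for term in query_terms)
print(indexed_terms, query_terms, matches)
```

Even though the query text is identical to the stored value, none of the analyzed query terms equals the single indexed term, so the old documents would not be found.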

However, to give you some examples of what is prohibited and what is not, we decided to mention some of the operations for both cases. For example, the following modifications can be safely made:

Adding a new type definition
Adding a new field
Adding a new analyzer

The following modifications are prohibited or will not work:

Enabling norms for a field
Changing a stored field to not stored and vice versa
Changing the type of the field (for example, from text to numeric)
Changing the value of the indexed property
Changing the analyzer of an already indexed document

Remember that the preceding examples of allowed and not allowed updates do not cover all the possibilities of the update API usage, and you have to try for yourself whether the update you are trying to do will work.

Summary

The chapter you just finished reading concentrated on indexing operations and handling data that is not flat or has relationships between the documents. We started with indexing tree-like structures and objects in Elasticsearch. We also used nested objects and learned when they can be used. We also used the parent-child functionality and we learned how this approach is different compared to nested documents. Finally, we modified our index structure with an API call and learned when this is possible.

In the next chapter, we will get back to query-related topics. We will learn how Lucene scoring works, how to use scripts in Elasticsearch, and how to handle multilingual data. We will affect scoring using boosts and we will use synonyms to improve users’ search results. Finally, we will look at what we can do to see how our documents were scored.

Chapter 6. Make Your Search Better

In the previous chapter, we were focused on indexing operations; we learned how to handle structured data. We started with indexing tree-like structures and JSON objects. We used nested objects and indexed documents using the parent-child functionality. Finally, at the end of the chapter, we used the Elasticsearch API to modify our index structures. By the end of this chapter, you will have learned the following topics:

Understanding how Apache Lucene scoring works
Using scripting
Handling multilingual data
Using boosting to affect document scoring
Using synonyms
Understanding how your documents were scored

Introduction to Apache Lucene scoring

When talking about queries and their relevance, we can’t omit the information about the scoring and where it comes from. But what is a score? The score is a property that describes the relevance of a document in the context of a query. In the following section, we will talk about the default Apache Lucene scoring mechanism, the TF/IDF algorithm, and how it affects the returned documents.

Note

TF/IDF is not the only available algorithm exposed by Elasticsearch. For more information about the available models, refer to the Available similarity models section in Chapter 2, Indexing Your Data. You can also refer to the books Mastering Elasticsearch and Mastering Elasticsearch Second Edition published by Packt Publishing.

When a document is matched

When a document is returned by Lucene, it means that it matched the query we sent to it. In most cases, each of the resulting documents in the response is given a score. The higher the score, the more relevant the document is from the search engine’s point of view, of course, in the context of a given query. This means that the score calculated for the same document on two different queries will be different. Because of that, comparing scores between queries usually doesn’t make much sense. However, let’s get back to the scoring. To calculate the score property for a document, multiple factors are taken into account:

document boost: The boost value given for a document during indexing.
field boost: The boost value given for a field during querying and indexing.
coord: The coordination factor that is based on the number of query terms the document contains. It is responsible for giving more value to the documents that contain more search terms compared to the other documents.
inverse document frequency: The term-based factor that tells the scoring formula how rare the given term is. The higher the inverse document frequency, the less common the term is.
length norm: The field-based factor for normalization based on the number of terms the given field contains. The longer the field, the smaller the boost this factor will give. It basically means that shorter documents will be favored.
term frequency: The term-based factor describing how many times the given term occurs in a document. The higher the term frequency, the higher the score of the document will be.
query norm: The query-based normalization factor that is calculated from the sum of the squared weights of each of the query terms. Query norm is used to allow score comparison between queries, which we said is not always easy or possible.

Default scoring formula

The practical formula for the TF/IDF algorithm looks as follows:
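For reference, the practical scoring function as it is given in the Lucene TFIDFSimilarity documentation can be written as follows. The notation boost(t) for the term boost is a shorthand chosen here; the remaining symbols follow the factor names used in this section.

```latex
\mathrm{score}(q,d) = \mathrm{coord}(q,d) \cdot \mathrm{queryNorm}(q) \cdot
  \sum_{t \in q} \Big( \mathrm{tf}(t \in d) \cdot \mathrm{idf}(t)^{2} \cdot
  \mathrm{boost}(t) \cdot \mathrm{norm}(t,d) \Big)
```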

To adjust your query relevance, you don’t need to remember the details of the equation, but it is very important to know how it works, or to at least be aware that there is an equation you can analyze. We can see that the score for the document is a function of query q and document d. There are also two factors that are not dependent directly on query terms: coord and queryNorm. These two elements of the formula are multiplied by the sum calculated for each term in the query. The sum, on the other hand, is calculated by multiplying the term frequency for the given term, its inverse document frequency, the term boost, and the norm, which is the length norm we discussed previously.

Note

Note that the preceding formula is a practical one. You can find more information about the conceptual formula in the Lucene Javadocs at http://lucene.apache.org/core/5_4_0/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

The good thing about the preceding rules is that you don’t need to remember all of that. What you should be aware of is what matters when it comes to the document score. Basically, there are a few rules which come from the preceding equation:

The rarer the matched term is, the higher the score the document will have
The shorter the document fields are (the fewer terms they have), the higher the score the document will have
The higher the boost for the fields is, the higher the score the document will have

As we can see, Lucene gives a higher score to the documents that have many query terms matched and have shorter fields (fewer terms indexed) that were used for matching, and it also favors rarer terms over the common ones (of course, the ones that matched).
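The rules above can be checked with a small Python sketch of the scoring arithmetic. It uses the formulas of Lucene’s classic TF/IDF similarity (square root of the term frequency, 1 + ln(numDocs / (docFreq + 1)) for the inverse document frequency, and 1 / sqrt(number of terms) for the length norm), but it is a simplification: a single field, no index-time boosts, and no lossy norm encoding. The document frequencies and sample documents are made up.

```python
import math

def tf(freq):
    return math.sqrt(freq)

def idf(doc_freq, num_docs):
    return 1.0 + math.log(num_docs / (doc_freq + 1.0))

def length_norm(num_terms):
    return 1.0 / math.sqrt(num_terms)

def score(query_terms, doc_terms, doc_freqs, num_docs, boosts=None):
    """Simplified practical TF/IDF score of one document for one query."""
    boosts = boosts or {}
    matched = [t for t in query_terms if t in doc_terms]
    coord = len(matched) / len(query_terms)
    sum_sq = sum((idf(doc_freqs[t], num_docs) * boosts.get(t, 1.0)) ** 2
                 for t in query_terms)
    query_norm = 1.0 / math.sqrt(sum_sq)
    total = 0.0
    for t in matched:
        total += (tf(doc_terms.count(t)) * idf(doc_freqs[t], num_docs) ** 2
                  * boosts.get(t, 1.0) * length_norm(len(doc_terms)))
    return coord * query_norm * total

# Hypothetical index statistics: 100 documents, "and" is common, "punishment" rare
doc_freqs = {"crime": 2, "punishment": 1, "and": 90}
short_doc = ["crime", "and", "punishment"]
long_doc = short_doc + ["filler"] * 13

rare = score(["punishment"], short_doc, doc_freqs, 100)
common = score(["and"], short_doc, doc_freqs, 100)
in_short = score(["crime"], short_doc, doc_freqs, 100)
in_long = score(["crime"], long_doc, doc_freqs, 100)
print(rare > common, in_short > in_long)
```

Adding a boost for one term of a multi-term query, for example boosts={"crime": 2.0}, raises the score of documents matching that term relative to documents that match only the other terms.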

Relevancy matters

In most cases, we want to get the best matching documents. However, the most relevant documents don’t always mean the same as the best matches. Some use cases define very strict rules on why a given document should be higher on the results list. For example, one could say that, in addition to the document being a perfect match in terms of TF/IDF similarity, we have paying customers to consider. Depending on the customer plan, we want to give more importance to such documents. In such cases, we could want the documents for the customers that pay the most to be on top of the search results. Of course, such a rule has nothing to do with TF/IDF.

The other example is yellow pages, where customers pay for more information describing the document. Such large documents may not be the most relevant ones according to TF/IDF, so you may want to adjust the scoring if you are working with such data.

These are very simple examples and Elasticsearch queries can become really complicated. We will talk about such queries in the Influencing scores with query boosts section in this chapter.

When working on search relevance, you should always remember that it is not a one-time process. Your data will change with time and your queries will need to be adjusted. In most cases, tuning the query relevancy will be constant work. You will need to react to your business rules and needs, to how the users behave, and so on.

Scripting capabilities of Elasticsearch

Elasticsearch has a few functionalities where scripts can be used. You’ve already seen examples such as updating documents and searching. We will also use the scripting capabilities of Elasticsearch when we discuss aggregations. Even though scripts seem to be a rather advanced topic, we will look at the possibilities offered by Elasticsearch. That’s because scripts are priceless in certain situations.

Elasticsearch can use several languages for scripting. When not explicitly declared, it assumes that Groovy (www.groovy-lang.org/) is used. Other languages available out of the box are the Lucene expression language and Mustache (https://mustache.github.io/). Of course, we can use plugins, which will make Elasticsearch understand additional scripting languages, such as JavaScript, MVEL, and Python. The thing worth mentioning is that, independent of the scripting language that we choose, Elasticsearch exposes objects that we can use in our scripts. Let’s start by briefly looking at what type of information we are allowed to use in our scripts.

Objects available during script execution

During different operations, Elasticsearch allows us to use different objects in our scripts. To develop a script that fits our use case, we should be familiar with these objects.

For example, during a search operation, the following objects are available:

_doc (also available as doc): This is an instance of the org.elasticsearch.search.lookup.LeafDocLookup object. It gives us access to the current document found with the calculated score and field values.
_source: This is an instance of the org.elasticsearch.search.lookup.SourceLookup object. It provides access to the source of the current document and the values defined in the source.
_fields: This is an instance of the org.elasticsearch.search.lookup.LeafFieldsLookup object. It can be used to access the values of the document fields.

On the other hand, during a document update operation, the preceding variables are not accessible. Elasticsearch exposes only the ctx object with the _source property, which provides access to the document currently processed in the update request.

As we have previously seen, several methods are mentioned in the context of document fields and their values. Let’s now look at examples of how to get the value for a particular field using the previously mentioned objects available during the search operation. In the brackets after the script piece, you can see what Elasticsearch will return for one of our example documents from the library index (we will use the document with identifier 4):

_doc.title.value (and)
_source.title (crime and punishment)
_fields.title.value (null)

A bit confusing, isn’t it? During indexing, the original document is by default stored in the _source field. Of course, by default, all the fields are present in that _source field. In addition to that, the document is parsed and every field may be stored in an index if it is marked as stored (that is, if the store property is set to true; otherwise, by default, the fields are not stored). Finally, the field value may be configured as indexed. This means that the field value is analyzed and placed in the index. To sum up, one field may land in the Elasticsearch index in the following ways:

As a part of the _source document
As a stored and unparsed original value
As an indexed value that is processed by an analyzer

In scripts, we have access to all these field representations. The only exception is the update operation, which, as we’ve mentioned before, gives us only access to the document _source as part of the ctx variable. You may wonder which version you should use. Well, if you want access to the processed form, the answer will be simple: use the _doc object. What about _source and _fields? In most cases, _source is a good choice. It is usually fast and needs fewer disk operations than reading the original field values from the index. This is especially true when you need to read the values of multiple fields in your scripts; fetching a single _source field is faster than fetching multiple independent fields from the index.

Script types

Elasticsearch allows us to use scripts in three different ways:

Inline scripts: The source of the script is directly defined in the query
In-file scripts: The source is defined in an external file placed in the Elasticsearch config/scripts directory
As a document in the dedicated index: The source of the script is defined as a document in a special index available by using the /_scripts API endpoint

Choosing the way to define scripts depends on several factors. If you have scripts which you will use in many different queries, the file or the dedicated index seem to be the best solutions. Keeping scripts in files is probably less convenient, but it is preferred from the security point of view; such scripts can’t be overwritten or injected into your query, causing a security breach.

In-file scripts

This is the only way to use scripts if we don’t want to enable dynamic scripting in Elasticsearch. The idea is that every script used by the queries is defined in its own file placed in the config/scripts directory. We will now look at this method of using scripts. Let’s create an example file called tag_sort.groovy and let’s place it in the config/scripts directory of our Elasticsearch instance (or instances if we run a cluster). The content of the mentioned file should look like this:

_doc.tags.values.size() > 0 ? _doc.tags.values[0] : '\u19999'

After a few seconds, Elasticsearch will automatically load the new file. You should see something like the following in the Elasticsearch logs:

[2015-08-30 13:14:33,005][INFO ][script] [Alex Wilder] compiling script file [/Users/negativ/Developer/ES/es-current/config/scripts/tag_sort.groovy]

Note

If you have a multi-node cluster, you have to make sure that the script is available on every node.

Now we are ready to use this script in our queries. You may remember that we used exactly the same script in the Sorting data section in Chapter 4, Extending Your Querying Knowledge. Now the modified query that uses our script stored in the file looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"file":"tag_sort"

},

"type":"string",

"order":"asc"

}

}

}'

We will return to this, but first, let’s look at the next possible way of defining scripts: inline scripts.

Inline scripts

Inline scripts are a more convenient way of using scripts, especially for constantly changing queries and for ad-hoc queries. The main drawback of such an approach is security. If we allow users to run any kind of query, including scripts, we can expose our Elasticsearch instance to attackers. Such attacks can execute arbitrary code on the server running Elasticsearch with rights equal to the ones given to the user running Elasticsearch. In the worst-case scenario, the attacker could use security holes to gain superuser rights. This is the reason why inline scripts are disabled by default. After careful consideration, you can enable them by adding the following line to the elasticsearch.yml file:

script.inline: on

After allowing the inline script to be executed, we can run a query that looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query":{
    "match_all":{}
  },
  "sort":{
    "_script":{
      "script":{
        "inline":"_doc.tags.values.size() > 0 ? _doc.tags.values[0] : \"\u19999\""
      },
      "type":"string",
      "order":"asc"
    }
  }
}'

Indexed scripts

The last option for defining scripts is storing them in the dedicated Elasticsearch index. For the same security reasons, dynamic execution of the indexed scripts is by default disabled. To enable the indexed scripts, we have to add a similar configuration option to the one we added to be able to use the inline scripts. We need to add the following line to the elasticsearch.yml file:

script.indexed: on

After adding the preceding property to all the nodes and restarting the cluster, we will be ready to start using the indexed scripts. Elasticsearch provides an additional, dedicated endpoint for this purpose. Let’s store our script:

curl -XPOST 'localhost:9200/_scripts/groovy/tag_sort' -d '{
  "script":"_doc.tags.values.size() > 0 ? _doc.tags.values[0] : \"\u19999\""
}'

The script is ready, but let’s discuss what we just did. We sent an HTTP POST request to the special _scripts REST endpoint. We also specified the language of the script (groovy in our case) and the name of the script (tag_sort). The body of the request is the script itself.

We can now move on to the query, which looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"id":"tag_sort"

},

"type":"string",

"order":"asc"

}

}

}'

As we see, the query is practically identical to the query used with the script defined in a file. The only difference is that we provided the identifier of the script using the id parameter instead of providing the file name.

Querying with scripts

If we look at any request made to Elasticsearch that uses scripts, we will notice some similar properties, which are as follows:

script: This property wraps the script definition.
inline: This property holds the code of the script itself.
id: This property defines the identifier of the indexed script.
file: The filename of the script without the extension.
lang: This property defines the language of the script. If it is omitted, Elasticsearch assumes groovy.
params: This object contains the parameters and their values. Every defined parameter can be used inside the script by specifying that parameter’s name. The parameters allow us to write cleaner code which will be executed in a more efficient manner. Scripts using the parameters are executed faster than code with embedded constants because of caching.
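The way these properties fit together can be sketched with a small Python helper that builds the body of a script-based sort request. The helper name script_sort is an assumption made for this example, and the request is only printed rather than sent to a cluster.

```python
import json

def script_sort(source_key, value, params=None, lang=None, order="asc"):
    """Build a search body that sorts by a script.

    source_key is one of "inline", "id" or "file", as described above.
    """
    if source_key not in ("inline", "id", "file"):
        raise ValueError("unknown script source: %s" % source_key)
    script = {source_key: value}
    if lang is not None:
        script["lang"] = lang      # omitted means the default language
    if params:
        script["params"] = params  # values referenced by name in the script
    return {
        "query": {"match_all": {}},
        "sort": {"_script": {"script": script, "type": "string", "order": order}},
    }

by_file = script_sort("file", "tag_sort")
by_id = script_sort("id", "tag_sort", lang="groovy")
print(json.dumps(by_file, indent=2))
```

Only the key inside the script object changes between the three script types; the rest of the request stays the same.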

Scripting with parameters

As our scripts become more and more complicated, the need for creating multiple, almost identical scripts can appear. These scripts usually differ in the values used, with the logic behind them being exactly the same. In our simple example, we used a hardcoded value to mark documents with an empty tags list. Let’s change this so that the value can be passed in instead of being hardcoded. Let’s use the in-file script definition and create a tag_sort_with_param.groovy file with the following contents:

_doc.tags.values.size() > 0 ? _doc.tags.values[0] : tvalue

The only change we’ve made is the introduction of the parameter named tvalue, which can be set in the query in the following way:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"file":"tag_sort_with_param",

"params":{

"tvalue":"000"

}

},

"type":"string",

"order":"asc"

}

}

}'

The params section defines all the script parameters. In our simple example, we’ve only used a single parameter, but of course we can have multiple parameters in a single query.
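The effect of the tvalue parameter on the ordering can be previewed with a short Python mirror of the script logic. This is only an approximation under stated assumptions: the documents are made up, and the first element of the tags list stands in for _doc.tags.values[0].

```python
def tag_sort_key(doc, tvalue):
    """Mirror of: _doc.tags.values.size() > 0 ? _doc.tags.values[0] : tvalue"""
    tags = doc.get("tags", [])
    return tags[0] if len(tags) > 0 else tvalue

docs = [  # hypothetical documents
    {"title": "A", "tags": ["novel"]},
    {"title": "B", "tags": []},
    {"title": "C", "tags": ["classics"]},
]

# A low fallback value puts documents without tags first ...
untagged_first = sorted(docs, key=lambda d: tag_sort_key(d, "000"))
# ... while a high one moves them to the end of the list
untagged_last = sorted(docs, key=lambda d: tag_sort_key(d, "\uffff"))
print([d["title"] for d in untagged_first], [d["title"] for d in untagged_last])
```

The same script therefore serves both orderings; only the parameter value sent with the query differs.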

Script languages

As we already said, the default language for scripting is Groovy. However, we are not limited to only a single scripting language when using Elasticsearch. In fact, if you would like to, you can even use Java to write your scripts. In addition to that, the community behind Elasticsearch provides support for additional languages as plugins. So if you are willing to install plugins, you can extend the list of scripting languages that Elasticsearch supports even further. You may wonder why you would even consider using a scripting language other than the default Groovy. The first reason is your own preferences. If you are a Python enthusiast, you are probably now thinking about how to use Python for your Elasticsearch scripts. The other reason could be security. When we talked about the inline scripts, we told you that they are turned off by default. This is not exactly true for all the scripting languages available out of the box. The inline scripts are disabled by default when using Groovy, but you can use Lucene expressions and Mustache without any issues. This is because those languages are sandboxed, which means that the security-sensitive functions are turned off. And of course, the last factor when choosing a language is performance. Theoretically, the native scripts (in Java) should have better performance than others, but you should remember that the difference can be insignificant. You should always consider the cost of development and measure performance.

Using other than embedded languages

Using Groovy for scripting is a simple and sufficient solution for most use cases. However, you may have a different preference and you may like to use something different, such as JavaScript, Python, or MVEL. Before using other languages, we must install an appropriate plugin. You can read more details about plugins in the Elasticsearch plugins section of Chapter 9, Elasticsearch Cluster in Detail. For now, we’ll just run the following command from the Elasticsearch directory:

bin/plugin install lang-javascript

The preceding command will install a plugin that will allow the usage of JavaScript as the scripting language. The only change we should make in the request is to add the additional information about the language we are using for scripting and, of course, modify the script itself to correctly use the new language. Look at the following example:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query":{
    "match_all":{}
  },
  "sort":{
    "_script":{
      "script":{
        "inline":"_doc.tags.values.length > 0 ? _doc.tags.values[0] : \"\u19999\";",
        "lang":"javascript"
      },
      "type":"string",
      "order":"asc"
    }
  }
}'

As you can see, we’ve used JavaScript for scripting instead of the default Groovy. The lang parameter informs Elasticsearch about the language being used.

Using native code

In case the scripts are too slow or you don’t like scripting languages, Elasticsearch allows you to write Java classes and use them instead of scripts. There are two possible ways of adding native scripts: adding classes defining scripts to the Elasticsearch classpath or adding a script as a functionality provided by a plugin. We will describe this second solution as it is more elegant.

The factory implementation

We need to implement at least two classes to create a new native script. The first one is a factory for our script. For now, let’s focus on it. The following sample code illustrates the factory for our script:

package pl.solr.elasticsearch.examples.scripts;

import java.util.Map;
import org.elasticsearch.common.Nullable;
import org.elasticsearch.script.ExecutableScript;
import org.elasticsearch.script.NativeScriptFactory;

public class HashCodeSortNativeScriptFactory implements NativeScriptFactory {
  // Creates a script instance for the parameters passed in the API call
  @Override
  public ExecutableScript newScript(@Nullable Map<String, Object> params) {
    return new HashCodeSortScript(params);
  }

  // The script does not use the document score
  @Override
  public boolean needsScores() {
    return false;
  }
}

This class should implement the org.elasticsearch.script.NativeScriptFactory interface. The interface forces us to implement two methods. The newScript() method takes the parameters defined in the API call and returns an instance of our script. Finally, needsScores() informs Elasticsearch if we want to use scoring and whether it should be calculated.

Implementing the native script

Now let’s look at the implementation of our script. The idea is simple: our script will be used for sorting. Documents will be ordered by the hashCode() value of the chosen field. The documents without a value in the defined field will be first on the results list. We know the logic doesn’t make too much sense, but it is good for presentation as it is simple. The source code for our native script looks as follows:

package pl.solr.elasticsearch.examples.scripts;

import java.util.Map;
import org.elasticsearch.script.AbstractSearchScript;

public class HashCodeSortScript extends AbstractSearchScript {
  // Field used for sorting; "name" is the default
  private String field = "name";

  public HashCodeSortScript(Map<String, Object> params) {
    if (params != null && params.containsKey("field")) {
      this.field = params.get("field").toString();
    }
  }

  @Override
  public Object run() {
    // Read the field value from the _source of the current document
    Object value = source().get(field);
    if (value != null) {
      return value.hashCode();
    }
    // Documents without the field get the sort value 0
    return 0;
  }
}

First of all, our class inherits from the org.elasticsearch.script.AbstractSearchScript class and implements the run() method. This is where we get the appropriate values from the current document, process them according to our strange logic, and return the result. You may notice the source() call. It is exactly the same _source parameter that we used when dealing with non-native scripts. The doc() and fields() methods are also available and they follow the same logic we described earlier.

The thing worth looking at is how we’ve used the parameters. We assume that a user can put the field parameter, telling us which document field will be used for manipulation. We also provide a default value for this parameter.

The plugin definition

We said that we will install our script as a part of a plugin. This is why we need additional files. The first file is the plugin initialization class where we tell Elasticsearch about our new script:

package pl.solr.elasticsearch.examples.scripts;

import org.elasticsearch.plugins.Plugin;
import org.elasticsearch.script.ScriptModule;

public class ScriptPlugin extends Plugin {
  @Override
  public String description() {
    return "The example of native sort script";
  }

  @Override
  public String name() {
    return "naive-sort-plugin";
  }

  // Registers the factory under the script name used in queries
  public void onModule(final ScriptModule module) {
    module.registerScript("native_sort",
        HashCodeSortNativeScriptFactory.class);
  }
}

The implementation is easy. The description() and name() methods are only for information, so let’s focus on the onModule() method. In our case, we need access to the script module, the Elasticsearch service with scripts and scripting languages. This is why we define onModule() with one ScriptModule argument. Thanks to Elasticsearch magic, we can use this module and register our script so it can be found by the engine. We have used the registerScript() method, which takes the script name and the previously defined factory class.

The second needed file is a plugin descriptor file: plugin-descriptor.properties. It defines the constants used by the Elasticsearch plugin subsystem. Without further ado, let’s look at the contents of this file:

jvm=true

classname=pl.solr.elasticsearch.examples.scripts.ScriptPlugin

elasticsearch.version=2.2.0

version=0.0.1-SNAPSHOT

name=native_script

description=ExampleNativeScripts

java.version=1.7

The appropriate lines have the following meaning:

jvm: Tells Elasticsearch that our file contains Java code
classname: Describes the main class with the plugin definition
elasticsearch.version and java.version: Tell us about the Elasticsearch version that is supported by the plugin and the Java version that is needed
name and description: Informative name and short description of our plugin

And that’s it. We have all the files needed to run our script. Please note that you can have more than a single script packed as a single plugin.

Installing the plugin

Now it’s time to install our native script embedded in the plugin. After packing the compiled classes as a JAR archive, we should put it in the Elasticsearch plugins/native-script directory. The native-script part is a root directory for our plugin and you may name it as you wish. In this directory you also need the prepared plugin-descriptor.properties file. This makes our plugin visible to Elasticsearch.


Running the script

After restarting Elasticsearch (or the whole cluster if you run more than a single node), we can start sending the queries that use our native script. For example, we will send a query that uses our previously indexed data from the library index. This example query looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":{

"_script":{

"script":{

"script":"native_sort",

"lang":"native",

"params":{

"field":"otitle"

}

},

"type":"string",

"order":"asc"

}

}

}'

Note the params part of the query. In this call, we want to sort on the otitle field. We provide the script name native_sort and the script language native. This is required. If everything goes well, we should see our results sorted by our custom sort logic. If we look at the response from Elasticsearch, we will see that the documents without the otitle field are at the first few positions of the results list and their sort value is 0.


Searching content in different languages

Until now, when discussing language analysis, we’ve talked mostly about theory. We didn’t see an example regarding language analysis, handling multiple languages that our data can consist of, and so on. Now this will change, as this section is dedicated to information about how we can handle data in multiple languages.


Handling languages differently

As you already know, Elasticsearch allows us to choose different analyzers for our data. We can have our data divided on the basis of whitespaces, or have them lowercased, and so on. This can usually be done regardless of the language: the same tokenization on the basis of whitespaces will work for English, German, and Polish, although it won’t work for Chinese. However, what if you want to find documents that contain words such as cat and cats by only sending the word cat to Elasticsearch? This is where language analysis comes into play with stemming algorithms for different languages, which allow the analyzed words to be reduced to their root forms. And now the worst part: we can’t use one general stemming algorithm for all the languages in the world; we have to choose one appropriate for each language. The following sections in the chapter will help you with some parts of the language analysis process.


Handling multiple languages

There are a few ways of handling multiple languages in Elasticsearch and all of them have some pros and cons. We won’t be discussing everything, but just for the purpose of giving you an idea, a few of those methods are as follows:

Storing documents in different languages as different types
Storing documents in different languages in separate indices
Storing language data in different fields of a single document

For the purpose of the book, we will focus on a single method: the one that allows storing documents in different languages in a single index. We will focus on a problem where we have a single type of document, but each document may come from anywhere in the world and thus can be written in multiple languages. Also, we would like to enable our users to use all the analysis capabilities, such as stemming and stop words for different languages, not only for English.

Note

Note that the stemming algorithms perform differently for different languages, both in terms of analysis performance and the resulting terms. For example, English stemmers are very good, but you can run into issues with European languages, such as German.


Detecting the language of the document

Before we continue with showing you how to solve our problem with handling multiple languages in Elasticsearch, we would like to tell you about one additional thing, that is language detection. There are situations where you just don’t know what language your document or query is in. In such cases, language detection libraries may be a good choice, especially when using Java as your programming language of choice. Some of the libraries are as follows:

Apache Tika (http://tika.apache.org/)
Language detection (https://github.com/shuyo/language-detection)

The language detection library claims to have over 99 percent precision for 53 languages; that’s a lot if you ask us.

You should remember, though, that data language detection will be more precise for longer text. Because the text of queries is usually short, you can expect to have some degree of error during query language identification.


Sample document

Let’s start with introducing a sample document, which is as follows:

{
  "title":"First test document",
  "content":"This is a test document"
}

As you can see, the document is pretty simple; it contains the following two fields:

title: This field holds the title of the document
content: This field holds the actual content of the document

This document is quite simple, but, from the search point of view, the information about document language is missing. What we should do is enrich the document by adding the needed information. We can do that by using one of the previously mentioned libraries, which will try to detect the language.

After we have the language detected, we inform Elasticsearch which analyzer should be used and modify the document to directly show the language of each field. Each of the fields would have to be analyzed by a language analyzer dedicated to the detected language.

Note

A full list of these language analyzers can be found at https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-lang-analyzer.html.

If a document is written in a language that we are not supporting, we will just fall back to some default field with the default analyzer. For example, our document, processed and prepared for indexing, could look like this:

{
  "title_english":"First test document",
  "content_english":"This is a test document"
}

The thing is that all this processing we’ve mentioned would have to be done outside of Elasticsearch or in some kind of custom plugin that would implement the mentioned logic.
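To make the preprocessing step more concrete, here is a minimal Python sketch of how a document could be enriched outside of Elasticsearch before indexing. The detect_language function is a hypothetical stand-in for a real detection library (it is not the API of Apache Tika or of the language-detection project), and the list of supported languages is an assumption made for the example:

```python
# Hypothetical sketch: detect_language stands in for a real detection library.
SUPPORTED = {"english", "russian", "german"}  # assumed set of supported languages


def detect_language(text):
    """Placeholder detector: always answers 'english' in this sketch."""
    return "english"


def enrich(document, detector=detect_language):
    """Return a copy of the document whose field names carry the language."""
    enriched = {}
    for field, value in document.items():
        language = detector(value)
        if language in SUPPORTED:
            enriched[field + "_" + language] = value
        else:
            # Unsupported language: fall back to the default field and analyzer.
            enriched[field] = value
    return enriched


doc = {"title": "First test document", "content": "This is a test document"}
print(enrich(doc))
```

With a real detector plugged in, each field would end up under the name that matches its detected language, and anything unrecognized would stay in the default field.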

Note

In the previous versions of Elasticsearch, there was a possibility of choosing an analyzer based on the value of an additional field, which contained the analyzer name. This was a more convenient and elegant way but introduced some uncertainty about the field contents. You always had to deliver a proper analyzer when using the given field or strange things happened. The Elasticsearch team made the difficult decision and removed this feature.

There is also a simpler way: we can take our first document and index it in several ways independently from input language. Let’s focus on this solution.


The mappings

To handle our solution, which will process the document using several defined languages, we need new mappings. Let’s look at the mappings we’ve created to index our documents (we’ve stored them in the mappings.json file):

{

"mappings":{

"doc":{

"properties":{

"title":{

"type":"string",

"index":"analyzed",

"fields":{

"english":{

"type":"string",

"index":"analyzed",

"analyzer":"english"

},

"russian":{

"type":"string",

"index":"analyzed",

"analyzer":"russian"

},

"german":{

"type":"string",

"index":"analyzed",

"analyzer":"german"

}

}

},

"content":{

"type":"string",

"index":"analyzed",

"fields":{

"english":{

"type":"string",

"index":"analyzed",

"analyzer":"english"

},

"russian":{

"type":"string",

"index":"analyzed",

"analyzer":"russian"

},

"german":{

"type":"string",

"index":"analyzed",

"analyzer":"german"

}

}

}

}

}

}


}

In the preceding mappings, we’ve shown the definition for the title and content fields (if you are not familiar with any aspect of mappings definition, refer to the Mappings configuration section of Chapter 2, Indexing Your Data). We have used the multifield feature of Elasticsearch: each field can be indexed in several ways using various language analyzers (in our example, those analyzers are: English, Russian, and German).

In addition, the base field uses the default analyzer, which we may use at query time when the language is unknown. So, each field will actually have four fields: the default one and three language oriented fields.

In order to create a sample index called docs that uses our mappings, we will use the following command:

curl -XPUT 'localhost:9200/docs' -d @mappings.json
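Because the language sub-fields repeat the same structure for every field, the mappings file can also be generated instead of written by hand. The following Python sketch shows one way to do that; the function names are our own and only illustrate the idea:

```python
import json


def multi_language_field(languages):
    """Build one string field with a sub-field per language analyzer."""
    return {
        "type": "string",
        "index": "analyzed",
        "fields": {
            lang: {"type": "string", "index": "analyzed", "analyzer": lang}
            for lang in languages
        },
    }


def build_mappings(field_names, languages):
    """Build the mappings body for the doc type used in this section."""
    return {
        "mappings": {
            "doc": {
                "properties": {
                    name: multi_language_field(languages) for name in field_names
                }
            }
        }
    }


mappings = build_mappings(["title", "content"], ["english", "russian", "german"])
print(json.dumps(mappings, indent=2))
```

The printed JSON has the same shape as the mappings.json file shown earlier, so adding another language only requires extending the list passed to build_mappings.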


Querying

Now let’s see how we can query our data to use the newly created language fields. We can divide the querying situation into two different cases. Of course, to start querying we need documents. Let’s index our example document by running the following command:

curl -XPOST 'localhost:9200/docs/doc/1' -d '{"title":"First test document","content":"This is a test document"}'

Queries with an identified language

The first case is when we have our query language identified. Let’s assume that the identified language is English. In such cases, our query is as follows:

curl 'localhost:9200/docs/_search?pretty' -d '{

"query":{

"match":{

"content.english":"documents"

}

}

}'

The thing to put emphasis on in the preceding query is the field used for querying and the query type. The field used is content.english, which also indicates which analyzer we want to use. We used that field because we had identified our language before running the query. Thanks to this, the English analyzer can find our document even if we have the singular form of the word in the document. The response returned by Elasticsearch will be as follows:

{
  "took":2,
  "timed_out":false,
  "_shards":{
    "total":5,
    "successful":5,
    "failed":0
  },
  "hits":{
    "total":1,
    "max_score":0.19178301,
    "hits":[{
      "_index":"docs",
      "_type":"doc",
      "_id":"1",
      "_score":0.19178301,
      "_source":{
        "title":"First test document",
        "content":"This is a test document"
      }
    }]
  }
}

The thing to note is also the query type: the match query. We used the match query because it analyzes its body with the analyzer used by the field that it is run against. We need that to properly match the data in the query and the data in the index.

Queries with an unknown language

Now let’s look at the second situation: handling queries when we couldn’t identify the language of the query. In such cases, we can’t use the field name pointing to one of the languages, such as content.german. Instead, we send the query to the content field, which uses the default analyzer. The query will look as follows:

curl 'localhost:9200/docs/_search?pretty' -d '{

"query":{

"match":{

"content":"documents"

}

}

}'

However, we didn’t get any results this time because the default analyzer can’t deal with a singular form of a word when we are searching with a plural form.


Combining queries

To additionally boost the documents that perfectly match with our default analyzer, we can combine the two preceding queries with the bool query. Such a combined query will look as follows:

curl -XGET 'localhost:9200/docs/_search?pretty=true' -d '{

"query":{

"bool":{

"minimum_should_match":1,

"should":[

{

"match":{

"content.english":"documents"

}

},

{

"match":{

"content":"documents"

}

}

]

}

}

}'

For the document to be returned, at least one of the defined queries must match. If they both match, the document will have a higher score value and will be placed higher in the results.

There is one additional advantage of the preceding combined query. If our language analyzer doesn’t find a document (for example, when the analysis is different from the one used during indexing), the second query has a chance to find the terms that are tokenized only by whitespace characters and lowercased.
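The three querying cases can be wrapped in a single helper on the application side. The Python sketch below is only an illustration; build_query and the SUPPORTED set are names we made up, and the helper only builds the request body that would be sent to the _search endpoint:

```python
# Assumed set of languages for which a sub-field exists in the mappings.
SUPPORTED = {"english", "russian", "german"}


def build_query(text, language=None, field="content"):
    """Build a search body for a known or unknown query language."""
    default_clause = {"match": {field: text}}
    if language not in SUPPORTED:
        # Unknown language: query only the base field with the default analyzer.
        return {"query": default_clause}
    language_clause = {"match": {field + "." + language: text}}
    # Known language: combine both clauses so exact matches score higher.
    return {
        "query": {
            "bool": {
                "minimum_should_match": 1,
                "should": [language_clause, default_clause],
            }
        }
    }


print(build_query("documents", "english"))
print(build_query("documents"))
```

When the language is identified, the helper produces the combined bool query shown above; otherwise it falls back to the plain match query on the content field.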


Influencing scores with query boosts

In the beginning of this chapter, we learned what scoring is and how Elasticsearch uses the scoring formula. When an application grows, the need for improving the quality of search, which we call the search experience, also increases. We need to gain knowledge about what is more important to the user and see how the users use the search functionality. This leads to various conclusions; for example, we see that some parts of the documents are more important than others or that particular queries emphasize one field at the cost of others. We need to include such information in our data and queries so that both sides of the scoring equation are closer to our business needs. This is where boosting can be used.


The boost

Boost is an additional value used in the process of scoring. We already know it can be applied to:

Query: When used, we inform the search engine that the given query is a part of a complex query and is more significant than the other parts.
Document: When used during indexing, we tell Elasticsearch that a document is more important than the others in the index. For example, when indexing blog posts, we are probably more interested in the posts themselves than pingbacks or comments.

Values assigned by us to a query or a document are not the only factors used when we calculate the resulting score and we know that. We will now look at a few examples of query boosting.


Adding the boost to queries

Let’s imagine that our index has two documents and we’ve used the following commands to index them:

curl -XPOST 'localhost:9200/messages/email/1' -d '{
  "id":1,
  "to":"John Smith",
  "from":"David Jones",
  "subject":"Top secret!"
}'

curl -XPOST 'localhost:9200/messages/email/2' -d '{
  "id":2,
  "to":"David Jones",
  "from":"John Smith",
  "subject":"John, read this document"
}'

This data is trivial, but it should describe our problem very well. Now let’s assume we have the following query:

curl -XGET 'localhost:9200/messages/_search?pretty' -d '{

"query":{

"query_string":{

"query":"john",

"use_dis_max":false

}

}

}'

In this case, Elasticsearch will create a query to the _all field and will find documents that contain the desired words. We also said that we don’t want the disjunction query to be used by setting the use_dis_max parameter to false (if you don’t remember what a disjunction query is, refer to the The dis_max query section in Chapter 3, Searching Your Data). As we can easily guess, both our records will be returned. The record with identifier equal to 2 will be first because the word John occurs two times: once in the from field and once in the subject field. Let’s check this out in the following result:

"hits":{
  "total":2,
  "max_score":0.13561106,
  "hits":[{
    "_index":"messages",
    "_type":"email",
    "_id":"2",
    "_score":0.13561106,
    "_source":{
      "id":2,
      "to":"David Jones",
      "from":"John Smith",
      "subject":"John, read this document"
    }
  },{
    "_index":"messages",
    "_type":"email",
    "_id":"1",
    "_score":0.11506981,
    "_source":{
      "id":1,
      "to":"John Smith",
      "from":"David Jones",
      "subject":"Top secret!"
    }
  }]
}

Is everything all right? Technically, yes. But we think that the second document (the one with identifier 1) should be positioned as the first one in the result list, because when searching for something, the most important factor (in many cases) is matching people rather than the subject of the message. You can disagree, but this is exactly why full-text searching relevance is a difficult topic; sometimes it is hard to tell which ordering is better for a particular case. What can we do? First, let’s rewrite our query to explicitly inform Elasticsearch what fields should be used for searching:

curl -XGET 'localhost:9200/messages/_search?pretty' -d '{

"query":{

"query_string":{

"fields":["from","to","subject"],

"query":"john",

"use_dis_max":false

}

}

}'

This is not exactly the same query as the previous one. If we run it, we will get the same results (in our case). However, if you look carefully, you will notice differences in scoring. In the previous example, Elasticsearch only used one field, that is the default _all field. The query that we are using now is using three fields for matching. This means that several factors, such as field lengths, are changed. Anyway, this is not so important in our case. Elasticsearch under the hood generates a complex query made of three queries, one to each field. Of course, the score contributed by each query depends on the number of terms found in this field and the length of this field.

Let’s introduce some differences between the fields and their importance. Compare the following query to the last one:

curl -XGET 'localhost:9200/messages/_search?pretty' -d '{

"query":{

"query_string":{

"fields":["from^5","to^10","subject"],

"query":"john",

"use_dis_max":false

}

}

}'


Look at the ^5 and ^10 parts. By using that notation (the ^ character followed by a number), we can inform Elasticsearch how important a given field is. We see that the most important field is the to field (because of the highest boost value). Next we have the from field, which is less important. The subject field has the default value for boost, which is 1.0 and is the least important field when it comes to score calculation. Always remember that this value is only one of the various factors. You may be wondering why we chose 5 and not 1000 or 1.23. Well, this value depends on the effect we want to achieve, what query we have, and, most importantly, what data we have in our index. Typically, when data changes in the meaningful parts, we should probably check and tune our relevance once again.
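Since boost values usually need tuning, it can be handy to keep them as data and generate the fields array from them. The following Python sketch assumes nothing beyond the ^ notation described above; the helper names are ours:

```python
def boosted_fields(boosts):
    """Format (field, boost) pairs using the field^boost notation."""
    fields = []
    for name, boost in boosts:
        # A boost of 1 is the default, so the suffix can be left out.
        fields.append(name if boost == 1 else "%s^%s" % (name, boost))
    return fields


def query_string(text, boosts):
    """Build the query_string body used in this section."""
    return {
        "query": {
            "query_string": {
                "fields": boosted_fields(boosts),
                "query": text,
                "use_dis_max": False,
            }
        }
    }


print(query_string("john", [("from", 5), ("to", 10), ("subject", 1)]))
```

Keeping the boosts in one list makes it easier to rerun relevance checks after each change of the values.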

In the end, let’s look at a similar example, but using the bool query:

curl -XGET 'localhost:9200/messages/_search?pretty' -d '{

"query":{

"bool":{

"should":[

{"term":{"from":{"value":"john","boost":5}}},

{"term":{"to":{"value":"john","boost":10}}},

{"term":{"subject":{"value":"john"}}}

]

}

}

}'

The preceding query will yield the same results, which means that the first document on the results list will be the one with the identifier 1, but the scores will be slightly different. This is because the Lucene queries made from the last two examples are slightly different and thus the scores are different.


Modifying the score

The preceding example shows how to affect the result list by boosting particular query components: the fields. Another technique is to run a query and affect the score of the matched documents. In the following sections, we will summarize the possibilities offered by Elasticsearch. In the examples, we will use our library data that we have already used in the previous chapters.

Constant score query

A constant_score query allows us to take any query and explicitly set the value that should be used as the score that will be given for each matching document by using the boost parameter.

At first, this query doesn’t seem to be practical. But when we think about building complex queries, this query allows us to control how much the documents matching this query can affect the total score. Look at the following example:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query":{
    "constant_score":{
      "query":{
        "query_string":{
          "query":"available:false author:heller"
        }
      }
    }
  }
}'

In our data, we have two documents with the available field set to false. One of these documents has an additional value in the author field. If we use a different query, the document with an additional value in the author field (a book with identifier 2) would be given a higher score, but, thanks to the constant score query, Elasticsearch will ignore that information during scoring. Both documents will be given a score equal to 1.0.

Boosting query

The next type of query that can be used with boosting is the boosting query. The idea is to allow us to define a part of query which will cause matched documents to have their scores lowered. The following example returns all the available books (available field set to true), but the books written by E. M. Remarque will have a negative boost of 0.1 (which means about ten times lower score):

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"boosting":{

"positive":{

"term":{

"available":true

}

},


"negative":{

"match":{

"author":"remarque"

}

},

"negative_boost":0.1

}

}

}'
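The arithmetic behind the boosting query is small enough to sketch in a few lines of Python. This is a simplified model of the behavior described above, not Elasticsearch code; the function name and the example scores are made up:

```python
def boosting_score(positive_score, matches_negative, negative_boost):
    """Lower the score only for documents that also match the negative part."""
    if matches_negative:
        return positive_score * negative_boost
    return positive_score


# A book by Remarque keeps only a tenth of its score, other books keep theirs.
print(boosting_score(2.0, True, 0.1))
print(boosting_score(2.0, False, 0.1))
```

Note that documents matching the negative part are still returned; they are only pushed down the results list.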

The function score query

Till now we’ve seen two examples of queries that allowed us to alter the score of the returned documents. The third example we wanted to talk about, the function_score query, is way more complicated than the previously discussed queries. The function_score query is very useful when the score calculation is more complicated than giving a single boost to all the documents; boosting more recent documents is an example of a perfect use case for the function_score query.

Structure of the function query

The structure of the function query is quite simple and looks as follows:

{

"query":{

"function_score":{

"query":{...},

"functions":[

{

"filter":{...},

"FUNCTION":{...}

}

],

"boost_mode":"...",

"score_mode":"...",

"max_boost":"...",

"min_score":"...",

"boost":"..."

}

}

}

In general, the function score query can use a query, one or several functions, and additional parameters. Each function can have a filter defined to filter the results on which it will be applied. If no filter is given for a function, it will be applied to all the documents.

The logic behind the function score query is quite simple. First of all, the functions are matched against the documents and their scores are combined on the basis of score_mode. After that, the query score for the document is combined with the score calculated for the functions on the basis of boost_mode.

Let’s now discuss the parameters:

Boost mode: The boost_mode parameter allows us to define how the score computed by the function queries will be combined with the score of the query. The following values are allowed:

multiply: The default behavior, which results in the query score being multiplied by the score computed from the functions
replace: The query score will be totally ignored and the document score will be equal to the score calculated by the functions
sum: The document score will be calculated as the sum of the query and the function scores
avg: The score of the document will be an average of the query score and the function score
max: The document will be given a maximum of query score and function score
min: The document will be given a minimum of query score and function score

Score mode: The score_mode parameter defines how the scores computed by the functions are combined together. The following score_mode parameter values are defined:

multiply: The default behavior, which results in the scores returned by the functions being multiplied
sum: The scores returned by the defined functions are summed
avg: The score returned by the functions is an average of all the scores of the matching functions
first: The score of the first function with a filter matching the document is returned
max: The maximum score of the functions is returned
min: The minimum score of the functions is returned

There is one thing to remember: we can limit the maximum calculated score value by using the max_boost parameter in the function score query. By default, that parameter is set to Float.MAX_VALUE, which means the maximum float value.

The boost parameter allows us to set a query wide boost for the documents.

Of course, there is one thing we should remember: the score calculated doesn’t affect which documents matched the query. Because of that, the min_score property has been introduced. It allows us to define the minimum score of the documents. Documents that have a score lower than the min_score property will be excluded from the results.
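To see how the parameters interact, the following Python sketch models the combination logic described above. It is a simplified illustration and not the Elasticsearch implementation: the avg mode is a plain average (function weights are not modeled), the boost parameter is left out, and we assume that a document matched by no function gets a neutral function score of 1:

```python
from functools import reduce

# How the scores of the matching functions are combined (score_mode).
SCORE_MODES = {
    "multiply": lambda scores: reduce(lambda a, b: a * b, scores, 1.0),
    "sum": sum,
    "avg": lambda scores: sum(scores) / len(scores),
    "first": lambda scores: scores[0],
    "max": max,
    "min": min,
}

# How the query score and the function score are combined (boost_mode).
BOOST_MODES = {
    "multiply": lambda q, f: q * f,
    "replace": lambda q, f: f,
    "sum": lambda q, f: q + f,
    "avg": lambda q, f: (q + f) / 2.0,
    "max": max,
    "min": min,
}


def function_score(query_score, function_scores, score_mode="multiply",
                   boost_mode="multiply", max_boost=float("inf"),
                   min_score=None):
    """Return the final score, or None when min_score excludes the document."""
    if function_scores:
        combined = SCORE_MODES[score_mode](function_scores)
    else:
        combined = 1.0  # assumption: no matching function leaves the score as is
    combined = min(combined, max_boost)  # max_boost caps the function score
    final = BOOST_MODES[boost_mode](query_score, combined)
    if min_score is not None and final < min_score:
        return None
    return final


print(function_score(2.0, [20.0]))
print(function_score(2.0, [3.0, 4.0], score_mode="sum", boost_mode="replace"))
```

The first call multiplies a query score of 2.0 by a single function score of 20.0, while the second one ignores the query score and returns the sum of the two function scores.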

What we haven’t talked about yet are the function scores that we can include in the functions section of our query. The currently available functions are:

weight factor
field value factor
script score
random
decay

The weight factor function

The weight factor function allows us to multiply the score of the document by a given value. The value of the weight parameter is not normalized and is taken as is. An example using the weight function, where we multiply the score of the document by 20, looks as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{"weight":20}

]

}

}

}'

Field value factor function

The field_value_factor function allows us to influence the score of the document by using a value of the field in that document. For example, to multiply the score of the document by the value of the year field, we run the following query:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"field_value_factor":{

"field":"year",

"missing":1

}

}

]

}

}

}'

In addition to choosing the field whose value should be used, we can also control the behavior of the field value factor function by using the following properties:

factor: The multiplication factor that will be used along with the field value. It defaults to 1.
modifier: The modifier that will be applied to the field value. It defaults to none. It can take the value of log, log1p, log2p, ln, ln1p, ln2p, square, sqrt, and reciprocal.
missing: The value that should be used when a document doesn’t have any value in the field specified in the field property.
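The following Python sketch shows how we understand these three properties to work together: the field value (or the missing value) is multiplied by factor and the modifier is applied to the result. It is an illustration only; in particular, treating log as the base 10 logarithm and ln as the natural logarithm follows our reading of the official documentation:

```python
import math

# Modifiers applied to factor * field value (log is assumed to be base 10).
MODIFIERS = {
    "none": lambda v: v,
    "log": math.log10,
    "log1p": lambda v: math.log10(v + 1),
    "log2p": lambda v: math.log10(v + 2),
    "ln": math.log,
    "ln1p": lambda v: math.log(v + 1),
    "ln2p": lambda v: math.log(v + 2),
    "square": lambda v: v * v,
    "sqrt": math.sqrt,
    "reciprocal": lambda v: 1.0 / v,
}


def field_value_factor(doc, field, factor=1.0, modifier="none", missing=None):
    """Compute the function score for one document given as a dictionary."""
    value = doc.get(field, missing)
    if value is None:
        raise ValueError("no value in field %r and no missing value given" % field)
    return MODIFIERS[modifier](factor * value)


print(field_value_factor({"year": 1961}, "year"))
print(field_value_factor({}, "year", missing=1))
print(field_value_factor({"year": 99}, "year", modifier="log1p"))
```

A modifier such as log1p is useful for fields like year, because multiplying the score directly by values close to 2000 would completely dominate the query score.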

The script score function

The script_score function allows us to use a script to calculate the score that will be used as the score returned by a function (and thus will fall into behavior defined by the boost_mode parameter). An example of script_score usage is as follows (for the following example to work, inline scripting needs to be allowed, which means adding the script.inline property and setting it to on in elasticsearch.yml):

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"script_score":{

"script":{

"inline":"_score*_source.copies*parameter1",

"params":{

"parameter1":12

}

}

}

}

]

}

}

}'

The random score function

By using the random_score function, we can generate a pseudo random score, by specifying a seed. In order to simulate randomness, we should specify a new seed every time. The random number will be generated by using the _uid field and the provided seed. If a seed is not provided, the current timestamp will be used. An example of using this is as follows:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"random_score":{

"seed":12345

}


}

]

}

}

}'

Decay functions

In addition to the earlier mentioned scoring functions, Elasticsearch exposes additional ones, called the decay functions. The difference from the previously described functions is that the score given by those functions lowers with distance. The distance is calculated on the basis of a single valued numeric field (such as a date, a geographical point, or a standard numeric field). The simplest example that comes to mind is boosting documents on the basis of distance from a given point or boosting on the basis of document date.

For example, let’s assume that we have a point field that stores the location and we want our document’s score to be affected by the distance from a point where the user stands (for example, our user sends a query from a mobile device). Assuming the user is at 52,21, we could send the following query:

{

"query":{

"function_score":{

"query":{

"term":{

"available":true

}

},

"functions":[

{

"linear":{

"point":{

"origin":"52,21",

"scale":"1km",

"offset":0,

"decay":0.2

}

}

}

]

}

}

}

In the preceding example, linear is the name of the decay function. The value will decay linearly when using it. The other possible values are gauss and exp. We’ve chosen the linear decay function because, unlike the other two, it sets the score to 0 once the field value is far enough from the origin (at twice the scale when the default decay of 0.5 is used). This is useful when you want to lower the value of the documents that are too far away.

Note

Note that the geographical searching capabilities of Elasticsearch will be discussed in the Geo section of Chapter 8, Beyond Full-text Searching.


Now let’s discuss the rest of the query structure. The point is the name of the field we want to use for score calculation. If the document doesn’t have a value in the defined field, the function will return a value of 1 for that document.

In addition to that, we’ve provided additional parameters. The origin and scale are required. The origin parameter is the central point from which the calculation will be done and the scale is the rate of decay. By default, the offset is set to 0. If defined, the decay will only be applied to the documents whose distance from the origin is greater than the value of this parameter. The decay parameter tells Elasticsearch what the score should be lowered to at the distance given by scale and is set to 0.5 by default. In our case, we’ve said that, at the distance of 1 kilometer, the score should be reduced to 20% of its original value (0.2).
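The relation between origin, scale, offset, and decay is easier to see in code. The Python sketch below follows the decay formulas as we understand them from the official documentation and works on plain numbers; for a geographical field, the distance between two points would have to be computed first, which we leave out:

```python
import math


def _distance(value, origin, offset):
    """Distance from the origin, ignoring everything within the offset."""
    return max(0.0, abs(value - origin) - offset)


def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Score falls linearly and reaches decay at a distance equal to scale."""
    s = scale / (1.0 - decay)
    return max((s - _distance(value, origin, offset)) / s, 0.0)


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Score falls exponentially and reaches decay at a distance of scale."""
    lam = math.log(decay) / scale
    return math.exp(lam * _distance(value, origin, offset))


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    """Score follows a bell curve and reaches decay at a distance of scale."""
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-_distance(value, origin, offset) ** 2 / (2.0 * sigma_sq))


# With scale=1 and decay=0.2, a value 1 unit away scores about 0.2.
for fn in (linear_decay, exp_decay, gauss_decay):
    print(fn.__name__, fn(1.0, 0.0, 1.0, decay=0.2))
```

All three functions return 1 at the origin and the value of decay at a distance of scale; only the linear one ever reaches exactly 0.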

Note

We expect the function_score query to be modified and extended with the next versions of Elasticsearch (just as it was with Elasticsearch version 1.x). We suggest following the official documentation and the page dedicated to the function_score query at https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-function-score-query.html.


When does index-time boosting make sense?

In the previous section, we discussed boosting queries. This kind of approach to handling differences in the weight of documents is very handy, powerful, and easy to use. It is also sufficient in most situations. However, there are cases when a more convenient way of document boosting is index-time boosting. One such use case is the situation when we know which documents are important during the indexing phase. In such a case, we can prepare the document boost and include it as part of the document. We gain a boost that is independent from a query at the cost of reindexing the documents when the boost value is changed (because we need to apply the changed boost). In addition to that, the performance gets slightly better because some parts needed in the boosting process are already calculated at index time, which can matter when your indices have a large number of documents. Information about the boost is stored as a part of the normalization factor and because of that it is important to keep the norms turned on. This means that we can’t set norms.enabled to false because we won’t be able to use index time boosting.


Defining boosting in the mappings

It is also possible to directly define the field’s boost in our mappings. This will result in Elasticsearch giving a boost for all the documents having a value in such a field. Of course, that will also happen during indexing time. The following example illustrates that:

{

"mappings":{

"book":{

"properties":{

"title":{"type":"string"},

"author":{"type":"string","boost":10.0}

}

}

}

}

Thanks to the preceding boost, all queries will favor values found in the field named author. This also applies to queries using the _all field, because Elasticsearch will apply the boost to values copied between the fields.


Words with the same meaning

You may have heard about synonyms, words that have the same or similar meaning. Sometimes you would want to have some words matched when one of those words is entered into the search box. Let’s recall our sample data from Chapter 3, Searching Your Data. There was a book called Crime and Punishment. What if we want that book to not only be matched when the words crime or punishment are used, but also when using words such as criminality and abuse? At first glance, this may not sound like good behavior, but sometimes this is really needed, especially in use cases where there are multiple words meaning the same (like in medicine). To handle such use cases, we will use synonyms.


Synonym filter

Synonyms in Elasticsearch are handled on the analysis level, at both index and query time, by a dedicated synonyms filter. To use the synonym filter, we need to define our own analyzer. For example, let’s define an analyzer that will be called synonym and will use the whitespace tokenizer and a single filter called synonym. Our filter’s type property needs to be set to synonym, which tells Elasticsearch that this filter is a synonym filter.

In addition to that, we want to ignore case, so that the uppercased and lowercased synonyms are treated equally (set the ignore_case property to true). To define our custom synonym analyzer that uses a synonym filter when creating a new index, we would use the following command:

curl -XPOST 'localhost:9200/test' -d '{

"index":{

"analysis":{

"analyzer":{

"synonym":{

"tokenizer":"whitespace",

"filter":[

"synonym"

]

}

},

"filter":{

"synonym":{

"type":"synonym",

"ignore_case":true,

"synonyms":[

"crime=>criminality"

]

}

}

}

}

}'

Synonyms in the mappings

In the definition you’ve just seen, we’ve specified the synonym rule in the mappings we send to Elasticsearch. To do that, we needed to add the synonyms property, which is an array of synonym rules. For example, the following part of the mappings definition defines a single synonym rule:

"synonyms":[

"crime=>criminality"

]

The preceding rule tells Elasticsearch to change the crime term to the criminality term when the crime term is encountered during analysis.

Synonyms stored on the file system


Apart from storing the synonyms rules in the mappings, Elasticsearch allows us to use a file-based synonyms rule set. To use a file, we need to specify the synonyms_path property instead of the synonyms one. The synonyms_path property should be set to the name of the file that holds the synonym definitions and the specified file path is relative to the Elasticsearch config directory. So, if we store our synonyms in the synonyms.txt file and we save that file in the config directory, then, in order to use it, we should set synonyms_path to the value of synonyms.txt.

For example, this is how our synonym filter would look if we want to use the synonyms stored in a file:

"filter":{

"synonym":{

"type":"synonym",

"synonyms_path":"synonyms.txt"

}

}


Defining synonym rules

So far we have discussed what we have to do in order to use synonym expansions in Elasticsearch. Now let’s see what formats of synonyms are allowed.

Using Apache Solr synonyms

The most common synonym structure in the Apache Lucene world is probably the one used by Apache Solr (http://lucene.apache.org/solr/), the search engine built on top of Lucene, just like Elasticsearch is. This is the default way of handling synonyms in Elasticsearch and the possibilities of defining a new synonym are discussed in the following sections.

Explicit synonyms

A simple mapping allows us to map a list of words onto other words. So, in our case, if we want the word criminality to be mapped to crime and the word abuse to be mapped to punishment, we need to define the following entries:

criminality=>crime

abuse=>punishment

Of course, a single word can be mapped into multiple ones and multiple ones can be mapped into a single one. For example:

star wars, wars => starwars

The preceding example means that star wars and wars will be changed to starwars by the synonym filter.

Equivalent synonyms

In addition to the explicit mapping, Elasticsearch allows us to use equivalent synonyms. For example, the following definition will make all the words exchangeable so that you can use any of them to match a document that has one of them in its contents:

star, wars, star wars, starwars

Expanding synonyms

A synonym filter allows us to use one additional property when it comes to Apache Solr format synonyms: the expand property. When the expand property is set to true (which is the default), all synonyms will be expanded by Elasticsearch to all equivalent forms. For example, let’s say we have the following filter configuration:

"filter":{

"synonym":{

"type":"synonym",

"expand":false,

"synonyms":[

"one,two,three"

]

}

}

Elasticsearch will map the preceding synonym definition to the following:

one, two, three => one

This means that the words one, two, and three will be changed to one. However, if we set the expand property to true, the same synonym definition will be interpreted in the following way:

one, two, three => one, two, three

This basically means that each of the words from the left side of the definition will be expanded to all the words on the right side.
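The two interpretations can be sketched in a few lines of Python. This is only an illustration of the rule semantics described above, not the code Elasticsearch or Lucene runs; the interpret_rule function and its return shape are hypothetical names chosen for this example.

```python
def interpret_rule(rule, expand):
    """Return a dict mapping each input term to the list of terms it becomes.

    Hypothetical helper: it mimics how a Solr-format rule is read,
    it is not part of any Elasticsearch API.
    """
    if "=>" in rule:
        # Explicit mapping: left-side terms are replaced by right-side terms.
        left, right = rule.split("=>", 1)
        sources = [term.strip() for term in left.split(",")]
        targets = [term.strip() for term in right.split(",")]
    else:
        # Equivalent synonyms: the expand flag decides the right side.
        terms = [term.strip() for term in rule.split(",")]
        sources = terms
        targets = terms if expand else terms[:1]
    return {source: targets for source in sources}


collapsed = interpret_rule("one, two, three", expand=False)
expanded = interpret_rule("one, two, three", expand=True)
explicit = interpret_rule("criminality => crime", expand=False)

print(collapsed["two"])
print(expanded["two"])
```

With expand set to false every word collapses to one, while with expand set to true every word maps to the full list, which matches the two rewritten rules shown above.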

Using WordNet synonyms

If we want to use WordNet-structured synonyms (to learn more about WordNet, visit http://wordnet.princeton.edu/), we need to provide an additional property for our synonym filter. The property name is format and we should set its value to wordnet in order for Elasticsearch to understand that format.

Query or index-time synonym expansion

As with all the analyzers, one can wonder when to use the synonym filter: during indexing, during querying, or maybe during both. Of course, it depends on your needs. However, remember that using index-time synonyms requires data reindexing after each synonym change, because the synonyms need to be reapplied to all the documents. If we use only query-time synonyms, we can update the synonym lists and have them applied without reindexing the data.

Understanding the explain information

Compared to databases, using systems capable of performing full-text search can often be anything other than obvious. We can search in many fields simultaneously and the data in the index can vary from the values provided in the document fields (because of the analysis process, synonyms, abbreviations, and others). It's even worse! By default, search engines sort data by relevance, which means that each document is given a number indicating how similar the document is to the query. The key point here is understanding the phrase how similar. As we discussed at the beginning of the chapter, scoring takes many factors into account: how many searched words were found in the document, how frequent the word is, how many terms are in the field, and so on. This seems complicated, and finding out why a document was found and why another document is better is not easy. Fortunately, Elasticsearch provides us with tools that can answer these questions and we will look at them in this section.

Understanding field analysis

One of the common questions asked when analyzing the returned documents is why a given document was not found. In many cases, the problem lies in the mappings definition and the analysis process configuration. For debugging the analysis process, Elasticsearch provides a dedicated REST API endpoint: the _analyze one.

Using it is very simple. Let's see how it is used by running a request to Elasticsearch to give us information on how the Crime and Punishment phrase is analyzed. To do that, we will run a command using HTTP GET to the _analyze REST endpoint and we will provide the phrase as the request body. The following command does that:

curl -XGET 'localhost:9200/_analyze?pretty' -d 'Crime and Punishment'

In response, we get the following data:

{

"tokens":[{

"token":"crime",

"start_offset":0,

"end_offset":5,

"type":"<ALPHANUM>",

"position":0

},{

"token":"and",

"start_offset":6,

"end_offset":9,

"type":"<ALPHANUM>",

"position":1

},{

"token":"punishment",

"start_offset":10,

"end_offset":20,

"type":"<ALPHANUM>",

"position":2

}]

}

As we can see, Elasticsearch divided the input phrase into three tokens. During processing, the phrase was divided into tokens on the basis of whitespace characters and was lowercased. This shows us exactly what would happen during the analysis process. We can also provide the name of the analyzer. For example, we can change the preceding command to something like this:

curl -XGET 'localhost:9200/_analyze?analyzer=standard&pretty' -d 'Crime and Punishment'

The preceding command will allow us to check how the standard analyzer analyzes the data.

It is worth noting that there is another form of the analysis API available, one which allows us to provide tokenizers and filters. It is very handy when we want to experiment with configuration before creating the target mappings. Instead of specifying the analyzer parameter in the request, we provide the tokenizer and the filters parameters. We can provide a single tokenizer and a list of filters (separated by the comma character). For example, to illustrate how tokenization using the whitespace tokenizer works with the lowercase and kstem filters, we would run the following request:

curl -XGET 'localhost:9200/library/_analyze?tokenizer=whitespace&filters=lowercase,kstem&pretty' -d 'John Smith'

As we can see, the analysis API can be very useful for tracking down bugs in the mapping configuration. It is also priceless when we want to solve problems with queries and matching. It can show us how our analyzers work, what terms they produce, and what the attributes of those terms are. With such information, query problems will be easier to track down.
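To get a feeling for what the token attributes in an _analyze response mean, the following Python sketch imitates a whitespace tokenizer followed by a lowercase filter. It is a deliberately simplified stand-in written for this explanation (the analyze function is a hypothetical name), and it is not how the Elasticsearch analyzers are implemented; real analyzers handle punctuation, Unicode, and stemming in ways this sketch ignores.

```python
import re


def analyze(text):
    """Split on whitespace, lowercase, and record offsets and positions."""
    tokens = []
    for position, match in enumerate(re.finditer(r"\S+", text)):
        tokens.append({
            "token": match.group().lower(),   # term as it would be indexed
            "start_offset": match.start(),    # first character in the input
            "end_offset": match.end(),        # one past the last character
            "position": position,             # order of the token in the stream
        })
    return tokens


tokens = analyze("Crime and Punishment")
for token in tokens:
    print(token)
```

For this input the sketch yields the same terms, offsets, and positions as the response shown earlier, which makes it easier to see that the offsets refer to the original text while the token holds the processed term.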

Explaining the query

In addition to looking at what happened during analysis, Elasticsearch allows us to explain how the score was calculated for a particular query and document. Let's look at the following example:

curl -XGET 'localhost:9200/library/book/1/_explain?pretty&q=quiet'

The preceding request specifies a document and a query to run. The document is specified in the URI and the query is passed using the q parameter. Using the _explain endpoint, we ask Elasticsearch for an explanation of how the document was matched (or not matched). The response returned by Elasticsearch for the preceding request looks as follows:

{

"_index":"library",

"_type":"book",

"_id":"1",

"matched":true,

"explanation":{

"value":0.057534903,

"description":"sum of:",

"details":[{

"value":0.057534903,

"description":"weight(_all:quiet in 0) [PerFieldSimilarity], result of:",

"details":[{

"value":0.057534903,

"description":"fieldWeight in 0, product of:",

"details":[{

"value":1.0,

"description":"tf(freq=1.0), with freq of:",

"details":[{

"value":1.0,

"description":"termFreq=1.0",

"details":[]

}]

},{

"value":0.30685282,

"description":"idf(docFreq=1,maxDocs=1)",

"details":[]

},{

"value":0.1875,

"description":"fieldNorm(doc=0)",

"details":[]

}]

}]

},{

"value":0.0,

"description":"match on required clause, product of:",

"details":[{

"value":0.0,

"description":"#clause",

"details":[]

},{

"value":3.2588913,

"description":"_type:book, product of:",

"details":[{

"value":1.0,

"description":"boost",

"details":[]

},{

"value":3.2588913,

"description":"queryNorm",

"details":[]

}]

}]

}]

}

}

It can look slightly complicated and, well, it is complicated. It is even worse if we realize that this is only a simple query! Elasticsearch, and more specifically the Lucene library, shows the internal information about the scoring process. We will only scratch the surface and explain the most important things about the preceding response.

The first thing you can notice is that, for the particular query, Elasticsearch reports whether the document was a match or not. If the matched property is set to true, it means that the document was a match for the provided query.

The next important thing is the explanation object. It contains three properties: value, description, and details. The value is the score calculated for the given part of the query. The description is a simplified text representation of the internal score calculation, and the details array contains detailed information about the score calculation. The nice thing is that each element of details again contains the same three properties, and this is how Elasticsearch provides us with information on how the score is calculated.

For example, let's analyze the following part of the response:

"value":0.057534903,

"description":"sum of:",

"details":[{

"value":0.057534903,

"description":"weight(_all:quiet in 0) [PerFieldSimilarity], result of:",

"details":[{

"value":0.057534903,

"description":"fieldWeight in 0, product of:",

"details":[{

"value":1.0,

"description":"tf(freq=1.0), with freq of:",

"details":[{

"value":1.0,

"description":"termFreq=1.0",

"details":[]

}]

},{

"value":0.30685282,

"description":"idf(docFreq=1,maxDocs=1)",

"details":[]

},{

"value":0.1875,

"description":"fieldNorm(doc=0)",

"details":[]

}]

}]

The score of the element is 0.057534903 (the value property) and it is the sum of all the inner elements (we see that in the description property). In the description on the first level of nesting of the preceding fragment, we can see that PerFieldSimilarity has been used and that the score of that element is the result of the inner elements on the second level of nesting.

On the second level of details nesting, we can see the fieldWeight element, whose score is the product of the three elements nested below it. Those elements expose internal statistics retrieved from the index: the term frequency, which tells us how many times the term occurs in the field of this document (termFreq=1.0); the inverse document frequency, which reflects how rare the term is across the documents in the index (idf(docFreq=1, maxDocs=1)); and the field normalization factor (fieldNorm(doc=0)).
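The numbers in this fragment can be reproduced by hand. The following Python sketch assumes the classic Lucene TF/IDF formulas, where tf is the square root of the term frequency and idf is 1 + ln(maxDocs / (docFreq + 1)); the field_weight function name is hypothetical, and the fieldNorm value is simply copied from the response rather than derived. Lucene works with single-precision floats, so the last digits may differ slightly.

```python
import math


def field_weight(term_freq, doc_freq, max_docs, field_norm):
    """Recompute fieldWeight = tf * idf * fieldNorm (classic TF/IDF)."""
    tf = math.sqrt(term_freq)                          # tf(freq=1.0) -> 1.0
    idf = 1.0 + math.log(max_docs / (doc_freq + 1.0))  # idf(docFreq=1, maxDocs=1)
    return tf * idf * field_norm, tf, idf


# Values taken from the explain response above.
score, tf, idf = field_weight(term_freq=1.0, doc_freq=1, max_docs=1,
                              field_norm=0.1875)
print(tf, idf, score)
```

The result agrees with the 0.30685282 reported for idf and the 0.057534903 reported for the whole element, up to floating-point rounding.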

The Explain API supports the following parameters: analyze_wildcard, analyzer, default_operator, df, fields, lenient, lowercase_expanded_terms, parent, preference, routing, _source, _source_exclude, and _source_include. To learn more about all these parameters, refer to the official Elasticsearch documentation regarding the Explain API, which is available at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-explain.html.

Summary

The chapter we just finished was focused on querying; not on the matching part of it but mostly on scoring. We learned how Apache Lucene TF/IDF scoring works. We saw the scripting capabilities of Elasticsearch and we handled multilingual data. We used boosting to influence how the scores of the returned documents were calculated and we used synonyms. Finally, we used explain information to see how the document scores were calculated by the query.

In the next chapter, we will fully focus on Elasticsearch data analysis capabilities: the aggregations, their types, and how they can be used.

Chapter 7. Aggregations for Data Analysis

In the previous chapter, we discussed the querying side of Elasticsearch again. We learned how the Lucene TF/IDF algorithm works and how to use Elasticsearch scripting capabilities. We handled multilingual data and influenced document scores with boosts. We used synonyms to match words that have the same meaning and we used the Elasticsearch Explain API to see how document scores were calculated. By the end of this chapter, you will have learned the following topics:

What aggregations are
How the Elasticsearch aggregation engine works
How to use metrics aggregations
How to use buckets aggregations
How to use pipeline aggregations

Aggregations

Introduced in Elasticsearch 1.0, aggregations are the heart of data analytics in Elasticsearch. Highly flexible and performant, aggregations brought Elasticsearch 1.0 to a new position as a full-featured analysis engine. Extended through the life of Elasticsearch 1.x, in 2.x they are yet more powerful, less memory demanding, and faster. With this framework, you can use Elasticsearch as the analysis engine for data extraction and visualization. Let's see how that functionality works and what we can achieve by using it.

General query structure

To use aggregations, we need to add an additional section in our query. In general, our queries with aggregations look like this:

{

"query":{…},

"aggs":{

"aggregation_name":{

"aggregation_type":{

...

}

}

}

}

In the aggs property (you can use aggregations if you want; aggs is just an abbreviation), you can define any number of aggregations. Each aggregation is defined by its name and one of the types of aggregations that are provided by Elasticsearch. One thing to remember though is that the key defines the name of the aggregation (you will need it to distinguish particular aggregations in the server response). Let's take our library index and create the first query using aggregations. A command sending such a query looks like this:

curl 'localhost:9200/library/_search?search_type=query_then_fetch&size=0&pretty' -d '{

"aggs":{

"years":{

"stats":{

"field":"year"

}

},

"words":{

"terms":{

"field":"copies"

}

}

}

}'

This query defines two aggregations. The aggregation named years shows statistics for the year field. The words aggregation contains information about the terms used in a given field (here, the copies field).

Note

In our examples we assumed that we perform aggregation in addition to searching. If we don't need the found documents, a better idea is to use the size parameter and set it to 0. This omits some unnecessary work and is more efficient. In such a case, the endpoint should be /library/_search?size=0. You can read more about search types in Chapter 3, Understanding the Querying Process.

Let's now look at the response returned by Elasticsearch for the preceding query:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"words":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":0,

"doc_count":2

},{

"key":1,

"doc_count":1

},{

"key":6,

"doc_count":1

}]

},

"years":{

"count":4,

"min":1886.0,

"max":1961.0,

"avg":1928.0,

"sum":7712.0

}

}

}

As you see, both aggregations (years and words) were returned. The first aggregation we defined in our query (years) returned general statistics for the given field gathered across all the documents that matched our query. The second of the defined aggregations (words) was a bit different. It created several sets called buckets that were calculated on the returned documents, and each of the aggregated values fell within one of these sets. As you can see, there are multiple aggregation types available and they return different results. We will see the differences in the later part of this section.
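The difference between the two result shapes can be illustrated with plain Python. The sketch below is not Elasticsearch code; it only mimics what the stats and terms aggregations report. The four documents are an assumption: their year and copies values were chosen to be consistent with the response above, and the pairing of a year with a copies value is arbitrary.

```python
from collections import Counter

# Assumed sample data, consistent with the counts and sums in the response.
books = [
    {"year": 1929, "copies": 1},
    {"year": 1961, "copies": 6},
    {"year": 1936, "copies": 0},
    {"year": 1886, "copies": 0},
]

# A metric-style result: one set of numbers for the whole document set.
years = [book["year"] for book in books]
years_stats = {
    "count": len(years),
    "min": float(min(years)),
    "max": float(max(years)),
    "avg": sum(years) / len(years),
    "sum": float(sum(years)),
}

# A bucket-style result: one entry per distinct value, with a document count.
copies_counts = Counter(book["copies"] for book in books)
words_buckets = [
    {"key": key, "doc_count": count}
    for key, count in sorted(copies_counts.items(),
                             key=lambda item: (-item[1], item[0]))
]

print(years_stats)
print(words_buckets)
```

The years result is a single set of values, while the words result is a list whose length depends on how many distinct values the field holds.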

The great thing about the aggregation engine is that it allows you to have multiple aggregations and that aggregations can be nested. This means that you can have any number of nesting levels and any number of aggregations in general. The extended structure of the query is shown next:

{

"query":{…},

"aggs":{

"first_aggregation_name":{

"aggregation_type":{

...

},

"aggregations":{

"first_nested_aggregation":{

...

},

.

.

.

"nth_nested_aggregation":{

...

}

}

},

.

.

.

"nth_aggregation_name":{

...

}

}

}

Inside the aggregations engine

Aggregations work on the basis of results returned by the query. This is very handy as we get the information that we are interested in, both from the query as well as the data analysis perspective. So what does Elasticsearch do when we include the aggregation part of the query in the request that we send to Elasticsearch? First of all, the aggregation is executed on each relevant shard and the results are returned to the node that is responsible for running that query. That node waits for the partial results to be calculated; after it gets all the results, it merges them, producing the final results.

This approach is nothing new when it comes to distributed systems and how they work and communicate, but it can cause issues when it comes to the precision of the results. In most cases this is not a problem, but you should be aware of what to expect. Let's imagine the following example:

The preceding image shows a simplified view of three shards, each containing documents having only the Elasticsearch and Solr terms in them. Now imagine that we are interested in the single top term for our index. The terms aggregation, when run using size=1, would return a single term, the one that is the most frequent (of course limited to the query we've run). Each shard reports only its own top term, so our aggregator node would see partial results telling us that Elasticsearch is present in 19 documents in Shard 1 and that the Solr term is present in 10 documents in each of Shard 2 and Shard 3. Merging these gives 20 documents for Solr against 19 for Elasticsearch, which means that the reported top term is Solr. This is not true, because the documents containing Elasticsearch in Shard 2 and Shard 3 were never reported. This is an extreme case, but there are use cases (such as accounting) where precision is key and you should be aware of such situations.
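The effect can be simulated with a short Python sketch. It is a simplified model of the per-shard collection and merge described above, not the actual Elasticsearch implementation. The counts of 19 and 10 come from the text; the count of 9 Elasticsearch documents on Shard 2 and Shard 3 is an assumption made up for this illustration, and merged_top_terms is a hypothetical helper name.

```python
from collections import Counter

# Per-shard term counts. The 9s are assumed values, not taken from the book.
shards = [
    Counter({"elasticsearch": 19}),             # Shard 1
    Counter({"solr": 10, "elasticsearch": 9}),  # Shard 2
    Counter({"solr": 10, "elasticsearch": 9}),  # Shard 3
]


def merged_top_terms(shards, size):
    """Merge only the top `size` terms that each shard reports."""
    merged = Counter()
    for shard in shards:
        for term, count in shard.most_common(size):
            merged[term] += count
    return merged.most_common(size)


# What the coordinating node sees when every shard returns one term.
approximate = merged_top_terms(shards, size=1)

# What the answer would be if every count were available.
exact = sum(shards, Counter()).most_common(1)

print(approximate)
print(exact)
```

In this model, asking each shard for more terms than are finally needed (for example, size=2 here) lets the merge see the missing counts, at the cost of transferring more data between nodes.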

Note

Compared to queries, aggregations are heavier for Elasticsearch in terms of both CPU cycles and memory consumption. We will discuss this in more detail in the Caching Aggregations section of this chapter.

Aggregation types

Elasticsearch 2.x allows us to use three types of aggregation: metrics, buckets, and pipeline. The metrics aggregations return a metric, just like the stats aggregation we used for the year field. The bucket aggregations return buckets, each holding a key and the number of documents sharing the same values, ranges, and so on, just like the terms aggregation we used for the copies field. Finally, the pipeline aggregations introduced in Elasticsearch 2.0 aggregate the output of the other aggregations and their metrics, which allows us to do even more sophisticated data analysis. Knowing all that, let's now look at all the aggregations we can use in Elasticsearch 2.x.

Metrics aggregations

We will start with the metrics aggregations, which can aggregate values from documents into a single metric. This is always the case with metrics aggregations: you can expect them to produce a single metric on the basis of the data. Let's now take a look at the metrics aggregations available in Elasticsearch 2.x.

Minimum, maximum, average, and sum

The first group of metrics aggregations that we want to show you is the one that calculates a basic value from the given documents. These aggregations are:

min: This calculates the minimum value from the given numeric field in the returned documents
max: This calculates the maximum value from the given numeric field in the returned documents
avg: This calculates an average from the given numeric field in the returned documents
sum: This calculates the sum from the given numeric field in the returned documents

As you can see, the aggregations just mentioned are pretty self-explanatory. So, let's try to calculate the average value on our data. For example, let's assume that we want to calculate the average number of copies for our books. The query to do that will look as follows:

{

"aggs":{

"avg_copies":{

"avg":{

"field":"copies"

}

}

}

}

The results returned by Elasticsearch after running the preceding query will be as follows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"avg_copies":{

"value":1.75

}

}

}

So, we have an average of 1.75 copies per book. It is very easy to calculate: (6 + 0 + 1 + 0) / 4 is equal to 1.75. It seems that we got it right.

Missing values

The nice thing about the previously mentioned aggregations is that we can control what value Elasticsearch should use if the fields we've specified don't have any. For example, if we wanted Elasticsearch to use 0 as the value for the copies field in our previous example, we would add the missing property to our query and set it to 0. For example:

{

"aggs":{

"avg_copies":{

"avg":{

"field":"copies",

"missing":0

}

}

}

}

Using scripts

The input values can also be generated by a script. For example, if we want to find the minimum value from all the values in the year field, but we want to subtract 1000 from those values, we will send an aggregation like the following one:

{

"aggs":{

"min_year":{

"min":{

"script":"doc['year'].value-1000"

}

}

}

}

Note

Note that the preceding query requires inline scripts to be allowed. This means that the query requires the script.inline property set to on in the elasticsearch.yml file.

In this case, the value the aggregation will use will be the original year field value reduced by 1000.

We can also use the value script capabilities of Elasticsearch. For example, to achieve the same as the previous script, we can use the following query:

{

"aggs":{

"min_year":{

"min":{

"field":"year",

"script":{

"inline":"_value-factor",

"params":{

"factor":1000

}

}

}

}

}

}

If you are not familiar with Elasticsearch scripting capabilities, you can read more about them in the Scripting capabilities of Elasticsearch section of Chapter 6, Make Your Search Better.

One thing worth remembering is that using the command line may require proper escaping of the values in the doc array. For example, the command that executes the first scripted query would look as follows:

curl -XGET 'localhost:9200/library/_search?size=0&pretty' -d '{

"aggs":{

"min_year":{

"min":{

"script":"doc[\"year\"].value-1000"

}

}

}

}'

Field value statistics and extended statistics

The next aggregations we will discuss are the ones that provide us with statistical information about the numeric field we are running the aggregation on: the stats and extended_stats aggregations.

For example, the following query provides extended statistics for the year field:

{

"aggs":{

"extended_statistics":{

"extended_stats":{

"field":"year"

}

}

}

}

The response to the preceding query will be as follows:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"extended_statistics":{

"count":4,

"min":1886.0,

"max":1961.0,

"avg":1928.0,

"sum":7712.0,

"sum_of_squares":1.4871654E7,

"variance":729.5,

"std_deviation":27.00925767213901,

"std_deviation_bounds":{

"upper":1982.018515344278,

"lower":1873.981484655722

}

}

}

}

As you can see, in the response we got information about the number of documents with a value in the year field, the minimum value, the maximum value, the average, and the sum. These are the values that we will get if we run the stats aggregation instead of extended_stats. The extended_stats aggregation provides additional information, such as the sum of squares, variance, and standard deviation. Elasticsearch provides two types of aggregations because extended_stats is slightly more expensive when it comes to processing power.
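The additional values are ordinary population statistics, so they can be checked by hand. The sketch below assumes the four year values 1929, 1961, 1936, and 1886 (chosen because they are consistent with the count, minimum, maximum, and sum in the response) and assumes that the deviation bounds are the average plus and minus two standard deviations, which is what the numbers in the response suggest.

```python
import math

# Assumed field values, consistent with the response above.
years = [1929, 1961, 1936, 1886]

count = len(years)
total = sum(years)
avg = total / count
sum_of_squares = sum(year * year for year in years)

# Population variance: mean of the squares minus the square of the mean.
variance = sum_of_squares / count - avg ** 2
std_deviation = math.sqrt(variance)

# Bounds two standard deviations away from the average (assumed sigma of 2).
upper = avg + 2 * std_deviation
lower = avg - 2 * std_deviation

print(sum_of_squares, variance, std_deviation, upper, lower)
```

The extra passes over the squared values are a small illustration of why the extended variant costs a little more than the basic stats aggregation.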

Note

The stats and extended_stats aggregations, similar to the min, max, avg, and sum aggregations, support scripting and allow us to specify which value should be used for the documents that don't have a value in the specified field.

Value count

The value_count aggregation is a simple aggregation which allows counting values in aggregated documents. This is quite useful when used with nested aggregations. We are not focusing on that topic right now, but it is something to keep in mind. For example, to use the value_count aggregation on the copies field, we will run the following query:

{

"aggs":{

"count":{

"value_count":{

"field":"copies"

}

}

}

}

Note

The value_count aggregation allows us to use scripts, discussed earlier in this chapter when we described the min, max, avg, and sum aggregations. Please refer to the beginning of the Metrics aggregations section earlier in the current chapter for further reference.

Field cardinality

The cardinality aggregation is one of the aggregations that allow us to control how resource hungry the aggregation will be by controlling its precision. It calculates the count of distinct values in a given field. However, one thing needs to be remembered: the calculated count is an approximation, not the exact value. Elasticsearch uses the HyperLogLog++ algorithm (http://static.googleusercontent.com/media/research.google.com/fr//pubs/archive/40671.pdf) to calculate the value.

This aggregation has a wide variety of use cases, such as showing the number of distinct values in a field that is responsible for holding the status code for your indexed Apache access logs. One query, and you know the approximate count of the distinct values in that field.

For example, we can request the cardinality for our title field:

{

"aggs":{

"card_title":{

"cardinality":{

"field":"title"

}

}

}

}

To control the precision of the cardinality calculation, we can specify the precision_threshold property: the higher the value, the more precise the aggregation will be and the more resources it will need. The current maximum precision_threshold value is 40000 and the default depends on the parent aggregation. An example query using the precision_threshold property looks as follows:

{

"aggs":{

"card_title":{

"cardinality":{

"field":"title",

"precision_threshold":1000

}

}

}

}

Percentiles

The percentiles aggregation is another example of an aggregation in Elasticsearch that uses an algorithmic approximation approach to provide us with results. It uses the T-Digest algorithm (https://github.com/tdunning/t-digest/blob/master/docs/t-digest-paper/histo.pdf) from Ted Dunning and Otmar Ertl and allows us to calculate percentiles: metrics that show us the value below which a given percentage of the results fall. For example, the 99th percentile shows us the value that is greater than 99 percent of the other values.

Let's go into an example and look at a query that will calculate percentiles for the year field in our data:

{

"aggs":{

"copies_percentiles":{

"percentiles":{

"field":"year"

}

}

}

}

The results returned by Elasticsearch for the preceding request will look as follows:

{

"took":26,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"copies_percentiles":{

"values":{

"1.0":1887.2899999999997,

"5.0":1892.4499999999998,

"25.0":1918.25,

"50.0":1932.5,

"75.0":1942.25,

"95.0":1957.25,

"99.0":1960.25

}

}

}

}

As you can see, the value that is higher than 99 percent of the values is 1960.25.
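To see where such fractional years come from, here is a small Python sketch that computes percentiles by linear interpolation between the sorted values. This is a swapped-in exact technique used only for illustration; Elasticsearch uses the approximate T-Digest algorithm, which behaves like this only on very small data sets. The year values are an assumption consistent with the earlier statistics, and percentile is a hypothetical helper name.

```python
def percentile(sorted_values, percent):
    """Linear interpolation between the two closest ranks."""
    rank = (percent / 100.0) * (len(sorted_values) - 1)
    lower = int(rank)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = rank - lower
    low_value = sorted_values[lower]
    high_value = sorted_values[upper]
    return low_value + fraction * (high_value - low_value)


# Assumed field values, consistent with the earlier stats response.
years = sorted([1929, 1961, 1936, 1886])

results = {p: percentile(years, p) for p in (1, 5, 25, 50, 75, 95, 99)}
for percent, value in results.items():
    print(percent, value)
```

With only four documents each percentile falls between two neighboring values, which is why the response contains years such as 1932.5 that no book actually has.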

You may wonder why such an aggregation is important. It is very useful for performance metrics, for example, where we usually look at averages for some period of time. Imagine that the average response time of our queries for the last hour is 50 milliseconds, which is not bad. However, if the 95th percentile showed 2 seconds, that would mean that about 5 percent of the users had to wait two or more seconds for the search results, which is not that good.

By default, the percentiles aggregation calculates seven percentiles: 1, 5, 25, 50, 75, 95, and 99. We can control this by using the percents property and specifying which percentiles we are interested in. For example, if we want to get only the 95th and the 99th percentile, we change our query to the following one:

{

"aggs":{

"copies_percentiles":{

"percentiles":{

"field":"year",

"percents":["95","99"]

}

}

}

}

Note

Similar to the min, max, avg, and sum aggregations, the percentiles aggregation supports scripting and allows us to specify which value should be used for the documents that don't have a value in the specified field.

We've mentioned earlier that the percentiles aggregation uses an algorithmic approach and is an approximation. As with all approximations, we can control the precision and memory usage of the algorithm. We do that by using the compression property, which defaults to 100. It is an internal property of Elasticsearch and its implementation details may change between versions. It is worth knowing that setting the compression value to one higher than 100 can increase the algorithm precision at the cost of memory usage.

Percentile ranks

The percentile_ranks aggregation is similar to the percentiles one that we just discussed. It allows us to show which percentile a given value has. For example, to show us which percentile year 1932 and year 1960 are, we run the following query:

{

"aggs":{

"copies_percentile_ranks":{

"percentile_ranks":{

"field":"year",

"values":["1932","1960"]

}

}

}

}

The response returned by Elasticsearch will be as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"copies_percentile_ranks":{

"values":{

"1932.0":49.5,

"1960.0":61.5

}

}

}

}

Top hits aggregation

The top_hits aggregation keeps track of the most relevant document being aggregated. This doesn't sound very appealing, but it allows us to implement one of the most desired functionalities in Elasticsearch, called document grouping, field collapsing, or document folding. Such functionality is very useful in some use cases, for example, when we want to show a book catalog but only one book from a single publisher. To do that without the top_hits aggregation, we would have to run multiple queries. With the top_hits aggregation, we need only a single query.

The top_hits aggregation was introduced in Elasticsearch 1.3. In fact, the mentioned document folding is more or less a side effect and only one of the possible usage examples of the top_hits aggregation.

The idea behind the top_hits aggregation is simple. Every document that is assigned to a particular bucket can also be remembered. By default, only three documents per bucket are remembered.

Note

Note that, in order to show the full potential of the top_hits aggregation, we decided to use one of the bucketing aggregations as well and nest them to show the document grouping functionality implementation. The bucketing aggregations are described in detail later in this chapter.

To show you a potential use case that leverages the top_hits aggregation, we have decided to use a simple example. We would like to get the most relevant book published every 100 years. To do that we use the following query:

{

"aggs":{

"when":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"book":{

"top_hits":{

"_source":{

"include":["title","available"]

},

"size":1

}

}

}

}

}

}

In the preceding example, we ran the histogram aggregation on year ranges. A bucket was created for every one hundred years. The nested top_hits aggregation remembers a single document with the greatest score from each bucket (because of the size property being set to 1). We added the include option only for simpler results, so that we only return the title and available fields for every aggregated document. The response returned by Elasticsearch will be as follows:

{

"took":8,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"when":{

"buckets":[{

"key":1800,

"doc_count":1,

"book":{

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"4",

"_score":1.0,

"_source":{

"available":true,

"title":"Crime and Punishment"

}

}]

}

}

},{

"key":1900,

"doc_count":3,

"book":{

"hits":{

"total":3,

"max_score":1.0,

"hits":[{

"_index":"library",

"_type":"book",

"_id":"2",

"_score":1.0,

"_source":{

"available":false,

"title":"Catch-22"

}

}]

}

}

}]

}

}

}

We can see that, because of the top_hits aggregation, we have the highest-scoring document (from each bucket) included in the response. In our particular case, the query was the match_all one and all the documents had the same score, so the top-scoring document for every bucket was more or less random. However, you need to remember that this is the default behavior. If we want to have custom sorting, this is not a problem for Elasticsearch. We just need to add the sort property to our top_hits aggregation. For example, we can return the first book from a given century:

{

"aggs":{

"when":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"book":{

"top_hits":{

"sort":{

"year":"asc"

},

"_source":{

"include":["title","available"]

},

"size":1

}

}

}

}

}

}

We added sorting to the top_hits aggregation, so the results are sorted on the basis of the year field. This means that the first document will be the one with the lowest value in that field and this is the document that is going to be returned for each bucket.
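The grouping logic behind this combination of histogram and sorted top_hits can be sketched in Python. This is an illustration of the idea only, not the Elasticsearch implementation. The titles Crime and Punishment and Catch-22 appear in the earlier response; the other two titles and all four years are assumptions made for this example, and first_books_per_interval is a hypothetical helper name.

```python
# Assumed sample documents; only two of the titles appear in the response above.
books = [
    {"title": "All Quiet on the Western Front", "year": 1929},
    {"title": "Catch-22", "year": 1961},
    {"title": "The Complete Sherlock Holmes", "year": 1936},
    {"title": "Crime and Punishment", "year": 1886},
]


def first_books_per_interval(books, interval=100, size=1):
    """Bucket by year interval, then keep the `size` earliest books per bucket."""
    buckets = {}
    for book in books:
        # Same idea as the histogram key: round the year down to the interval.
        key = (book["year"] // interval) * interval
        buckets.setdefault(key, []).append(book)
    return {
        key: sorted(docs, key=lambda doc: doc["year"])[:size]
        for key, docs in sorted(buckets.items())
    }


grouped = first_books_per_interval(books)
for key, docs in grouped.items():
    print(key, [doc["title"] for doc in docs])
```

Doing the same thing without nesting would require one query to find the centuries and then one more query per century, which is exactly the work the single aggregation request avoids.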

Additional parameters

Sorting and field inclusion is not everything that we can do inside the top_hits aggregation. Because this aggregation returns documents, we can also use functionalities such as:

highlighting
explain
scripting
fielddata field (uninverted representation of the fields)
version

We just need to include an appropriate section in the top_hits aggregation body, similar to what we do when we construct a query. For example:

{

"aggs":{

"when":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"book":{

"top_hits":{

"highlight":{

"fields":{

"title":{}

}

},

"explain":true,

"version":true,

"_source":{

"include":["title","available"]

},

"fielddata_fields":["title"],

"script_fields":{

"century":{

"script":"(doc[\"year\"].value/100).intValue()"

}

},

"size":1

}

}

}

}

}

}

Note

Note that the preceding query requires inline scripts to be allowed. This means that the query requires the script.inline property set to on in the elasticsearch.yml file.

Geo bounds aggregation

The geo_bounds aggregation is a simple aggregation that allows us to compute the bounding box that includes all the geo_point type field values from the aggregated documents.

Note

If you are interested in spatial searches, the section dedicated to it is called Geo and is included in Chapter 8, Beyond Full-text Searching.

We only need to provide the field (by using the field property; it needs to be of the geo_point type). We can also provide wrap_longitude (values true or false; it defaults to true) if the bounding box is allowed to overlap the international date line. In response, we get the latitude and longitude of the top-left and bottom-right corners of the bounding box. An example query using this aggregation looks as follows (using the hypothetical location field):

{

"aggs":{

"box":{

"geo_bounds":{

"field":"location"

}

}

}

}

Scripted metrics aggregation

The last metric aggregation we want to discuss is the scripted_metric aggregation, which allows us to define our own aggregation calculation using scripts. For this aggregation, we can provide the following scripts (map_script is the only required one, the rest are optional):

init_script: This script is run during initialization and allows us to set up an initial state of the calculation.
map_script: This is the only required script. It is executed once for every document and needs to store the calculation in an object called _agg.
combine_script: This script is executed once on each shard after Elasticsearch finishes document collection on that shard.
reduce_script: This script is executed once on the node that is coordinating a particular query execution. This script has access to the _aggs variable, which is an array of the values returned by combine_script.

For example, we can use the scripted_metric aggregation to calculate all the copies of all the books we have in our library by running the following request (we show the whole request to show how the names are escaped):

curl -XGET 'localhost:9200/library/_search?size=0&pretty' -d '{
  "aggs": {
    "all_copies": {
      "scripted_metric": {
        "init_script": "_agg[\"all_copies\"] = 0",
        "map_script": "_agg.all_copies += doc.copies.value",
        "combine_script": "return _agg.all_copies",
        "reduce_script": "sum = 0; for (number in _aggs) { sum += number }; return sum"
      }
    }
  }
}'

Ofcourse,theprecedingscriptisjustasimplesumandwecouldusesumaggregation,butwejustwantedtoshowyouasimpleexampleofwhatyoucandowiththescripted_metricaggregation.

NoteNotethattheprecedingqueryrequiresinlinescriptstobeallowed.Thismeansthatthequeryrequiresthescript.inlinepropertysettoonintheelasticsearch.ymlfile.

As you can see, the init_script part of the aggregation is used to initialize the all_copies variable. Next, we have map_script, which is executed once for every document; in it, we just add the value of the copies field to the earlier initialized variable. The combine_script part, executed once on each shard, tells Elasticsearch to return the calculated variable. Finally, the reduce_script part, executed once for the whole query on the coordinating node, runs a for loop that goes through all the returned values stored in the _aggs array and returns their sum. The final result returned by Elasticsearch for the preceding query looks as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"all_copies":{

"value":7

}

}

}
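The four script phases can be mimicked outside Elasticsearch to make the data flow easier to follow. The following Python sketch is only an illustration of the init, map, combine, and reduce steps; the shard layout and the copies values are hypothetical and were chosen so that the total matches the response above.

```python
# Hypothetical documents, grouped by the shard that holds them.
shards = [
    [{"copies": 1}, {"copies": 0}],
    [{"copies": 6}],
    [],  # a shard without any matching documents
]

def init_script():
    # Set up the per-shard state object (the counterpart of _agg).
    return {"all_copies": 0}

def map_script(agg, doc):
    # Called once per document: add its copies to the shard state.
    agg["all_copies"] += doc["copies"]

def combine_script(agg):
    # Called once per shard: hand the shard result to the coordinator.
    return agg["all_copies"]

def reduce_script(aggs):
    # Called once on the coordinating node: merge all shard results.
    total = 0
    for number in aggs:
        total += number
    return total

def run_scripted_metric(shards):
    per_shard = []
    for docs in shards:
        agg = init_script()
        for doc in docs:
            map_script(agg, doc)
        per_shard.append(combine_script(agg))
    return reduce_script(per_shard)

print(run_scripted_metric(shards))
```

Note that the empty shard still contributes a value (zero) to the list that the reduce step receives.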

Buckets aggregations

The second type of aggregations that we will discuss are the buckets aggregations. In comparison to metrics aggregations, a bucket aggregation returns data not as a single metric but as a list of key-value pairs called buckets. For example, the terms aggregation returns the number of documents associated with each term in a given field. A very powerful feature of buckets aggregations is that they can have sub-aggregations, which means that we can nest other aggregations inside the aggregations that return buckets (we will discuss this at the end of the buckets aggregation discussion). Let's look at the bucket aggregations that are provided by Elasticsearch now.

Filter aggregation

The filter aggregation is a simple bucketing aggregation that allows us to filter the results to a single bucket. For example, let's assume that we want to get a count and the average number of copies of all the books that are novels, which means they have the term novel in the tags field. The query that will return such results looks as follows:

{

"aggs":{

"novels_count":{

"filter":{

"term":{

"tags":"novel"

}

},

"aggs":{

"avg_copies":{

"avg":{

"field":"copies"

}

}

}

}

}

}

As you can see, we defined the filter in the filter section of the aggregation definition and we defined a second, nested aggregation. The nested aggregation is the one that will be run on the filtered documents.

The response returned by Elasticsearch looks as follows:

{

"took":13,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"novels_count":{

"doc_count":2,

"avg_copies":{

"value":3.5

}

}

}

}

In the returned bucket, we have information about the number of documents (represented by the doc_count property) and the average number of copies, which is all we wanted.

Filters aggregation

The second bucket aggregation we want to show you is the filters aggregation. While the previously discussed filter aggregation resulted in a single bucket, the filters aggregation returns multiple buckets, one for each of the defined filters. Let's extend our previous example and assume that, in addition to the average number of copies for the novels, we also want to know the average number of copies for the books that are available. The query that will get us this information will use the filters aggregation and will look as follows:

{

"aggs":{

"count":{

"filters":{

"filters":{

"novels":{

"term":{

"tags":"novel"

}

},

"available":{

"term":{

"available":true

}

}

}

},

"aggs":{

"avg_copies":{

"avg":{

"field":"copies"

}

}

}

}

}

}

Let's stop here and look at the definition of the aggregation. As you can see, we defined two filters using the filters section of the filters aggregation. Each filter has a name and the actual Elasticsearch filter; the first is called novels and the second is called available. Elasticsearch will use these names in the returned response. The thing to remember is that Elasticsearch will create a bucket for each defined filter and will calculate the nested aggregation that we defined for each of them; in our case, the one that calculates the average number of copies.

Note

The filters aggregation allows us to return one more bucket in addition to the defined ones: a bucket with all the documents that didn't match the filters. In order to calculate such a bucket, we need to add the other_bucket property to the body of the aggregation and set it to true.

The results returned by Elasticsearch are as follows:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"count":{

"buckets":{

"novels":{

"doc_count":2,

"avg_copies":{

"value":3.5

}

},

"available":{

"doc_count":2,

"avg_copies":{

"value":0.5

}

}

}

}

}

}

As you can see, we got two buckets, which is what we expected.
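To see why a single document can end up in more than one bucket, and what the optional bucket for unmatched documents contains, the behavior can be sketched in plain Python. This is only a simulation: the four documents are hypothetical (chosen so that the averages match the response above), and the _other_ key is simply the label this sketch uses for the unmatched bucket.

```python
# Hypothetical documents standing in for the library index.
books = [
    {"tags": ["novel", "classics"], "available": True, "copies": 1},
    {"tags": ["novel"], "available": False, "copies": 6},
    {"tags": ["crime"], "available": True, "copies": 0},
    {"tags": ["philosophy"], "available": False, "copies": 2},
]

# Named filters, mirroring the novels and available filters of the query.
filters = {
    "novels": lambda doc: "novel" in doc["tags"],
    "available": lambda doc: doc["available"] is True,
}

def filters_agg(docs, filters, other_bucket=False):
    buckets = {name: [] for name in filters}
    other = []
    for doc in docs:
        matched = False
        for name, predicate in filters.items():
            if predicate(doc):
                buckets[name].append(doc)  # a document may match several filters
                matched = True
        if not matched:
            other.append(doc)
    if other_bucket:
        buckets["_other_"] = other
    result = {}
    for name, members in buckets.items():
        copies = [doc["copies"] for doc in members]
        result[name] = {
            "doc_count": len(members),
            # Nested avg aggregation, computed per bucket.
            "avg_copies": sum(copies) / len(copies) if copies else None,
        }
    return result

print(filters_agg(books, filters))
print(filters_agg(books, filters, other_bucket=True))
```

The first book is both a novel and available, so it is counted in both buckets; the bucket counts therefore do not have to add up to the number of documents.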

Terms aggregation

One of the most commonly used bucket aggregations is the terms aggregation. It allows us to get information about the terms and the count of documents having those terms. For example, one of the simplest uses is getting the count of the books that are available and not available. We can do that by running the following query:

{

"aggs":{

"counts":{

"terms":{

"field":"available"

}

}

}

}

In the response, we will get two buckets (because a Boolean field can only have two values, true and false). Here, this will look as follows:

{

"took":7,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"counts":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":0,

"key_as_string":"false",

"doc_count":2

},{

"key":1,

"key_as_string":"true",

"doc_count":2

}]

}

}

}

By default, the data is sorted on the basis of document count, which means that the most common terms will be placed at the top of the aggregation results. Of course, we can control this behavior by specifying the order property and providing the order just like we usually do when sorting by arbitrary field values. Elasticsearch allows us to sort by the document count (using the _count static value) and by the term (using the _term static value). For example, if we want to sort our preceding aggregation results by descending term, we can run the following query:

{

"aggs":{

"counts":{

"terms":{

"field":"available",

"order":{"_term":"desc"}}

}

}

}

However, that's not all when it comes to sorting. We can also sort by the results of the nested aggregations that were included in the query.

Note

The terms aggregation, similar to the min, max, avg, and sum aggregations discussed in the metrics aggregation section of this chapter, supports scripting and allows us to specify which value should be used for the documents that don't have a value in the specified field.

Counts are approximate

The thing to remember when discussing the terms aggregation is that the counts are approximate. This is because each shard provides its own counts and returns that aggregated information to the coordinating node. The coordinating node aggregates the information it received and returns the final information to the client. Because of that, depending on the data and how it is distributed between the shards, some information about the counts may be lost and the counts will not be exact. Of course, when dealing with low-cardinality fields, the approximation will be closer to the exact numbers, but this is still something that should be considered when using the terms aggregation.

We can control how much information is returned from each of the shards to the coordinating node. We can do this by specifying the size and the shard_size properties. The size property specifies how many buckets will be returned at most. The higher the size property, the more accurate the calculation will be. However, that will cost us additional memory and CPU cycles, which means that the calculation will be more expensive and will put more pressure on the hardware. This is because the results returned to the coordinating node from each shard will be larger and the result merging process will be harder.

The shard_size property can be used to minimize the work that needs to be done by the coordinating node. When set, the coordinating node will fetch (from each shard) the number of buckets determined by the shard_size property. This allows us to increase the precision of the aggregation while avoiding the additional overhead on the coordinating node. Remember that the shard_size property cannot be smaller than the size property.

Finally, the size property can be set to 0, which will tell Elasticsearch not to limit the number of returned buckets. It is usually not wise to set the size property to 0 as it can result in high resource consumption. Also, avoid setting the size property to 0 for high-cardinality fields, as this will likely make your Elasticsearch cluster explode.
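The effect of shard_size on accuracy can be illustrated with a small simulation. The following Python sketch is not how Elasticsearch is implemented internally; it only models the idea that each shard reports its own top terms and the coordinating node merges them. The term counts per shard are hypothetical.

```python
from collections import Counter

# Hypothetical per-shard term counts. The term "y" is the most common
# term overall (9 documents), but it is not the top term on any shard.
shards = [
    {"x": 6, "y": 5},
    {"z": 5, "y": 4},
]

def shard_top_terms(counts, shard_size):
    # Each shard only reports its shard_size most frequent terms.
    return Counter(counts).most_common(shard_size)

def terms_agg(shards, size, shard_size):
    merged = Counter()
    for counts in shards:
        for term, count in shard_top_terms(counts, shard_size):
            merged[term] += count
    # The coordinating node returns at most size buckets.
    return merged.most_common(size)

print(terms_agg(shards, size=1, shard_size=1))  # "y" is lost on both shards
print(terms_agg(shards, size=1, shard_size=2))  # "y" is counted correctly
```

With shard_size equal to size, the globally most common term never reaches the coordinating node; asking each shard for one more bucket is enough to recover it in this example.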

Minimum document count

Elasticsearch provides us with two additional properties, which can be useful in certain situations: min_doc_count and shard_min_doc_count. The min_doc_count property defaults to 1 and specifies how many documents must match a term for it to be included in the aggregation results. One thing to remember is that setting the min_doc_count property to 0 will result in returning all the terms, no matter whether they have a matching document or not. This can result in a very large result set for aggregation results. For example, if we want to return terms matched by 5 or more documents, we will run the following query:

{

"aggs":{

"counts":{

"terms":{

"field":"available",

"min_doc_count":5}

}

}

}

The shard_min_doc_count property is very similar and defines how many documents must match a term for it to be included in the aggregation's results, but on the shard level.

Range aggregation

The range aggregation allows us to define one or more ranges, and Elasticsearch calculates buckets for them. For example, if we want to check how many books were published in a given period of time, we create the following query:

{

"aggs":{

"years":{

"range":{

"field":"year",

"ranges":[

{"to":1850},

{"from":1851,"to":1900},

{"from":1901,"to":1950},

{"from":1951,"to":2000},

{"from":2001}

]

}

}

}

}

We specify the field we want the aggregation to be calculated on and the array of ranges. Each range is defined by one or two properties, to and from, similar to the range queries which we already discussed.

The result returned by Elasticsearch for our data looks as follows:

{

"took":23,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":[{

"key":"*-1850.0",

"to":1850.0,

"to_as_string":"1850.0",

"doc_count":0

},{

"key":"1851.0-1900.0",

"from":1851.0,

"from_as_string":"1851.0",

"to":1900.0,

"to_as_string":"1900.0",

"doc_count":1

},{

"key":"1901.0-1950.0",

"from":1901.0,

"from_as_string":"1901.0",

"to":1950.0,

"to_as_string":"1950.0",

"doc_count":2

},{

"key":"1951.0-2000.0",

"from":1951.0,

"from_as_string":"1951.0",

"to":2000.0,

"to_as_string":"2000.0",

"doc_count":1

},{

"key":"2001.0-*",

"from":2001.0,

"from_as_string":"2001.0",

"doc_count":0

}]

}

}

}

For example, between 1901 and 1950 we had two books released.

Note

The range aggregation, similar to the min, max, avg, and sum aggregations discussed in the metrics aggregations section of this chapter, supports scripting and allows us to specify which value should be used for the documents that don't have a value in the specified field.

Keyed buckets

One thing that we should mention when it comes to the range aggregation is that we can give the defined ranges names. For example, let's assume that we want to use the name Before 18th century for the books released before 1799, 18th century for the books released between 1800 and 1899, 19th century for the books released between 1900 and 1999, and After 19th century for the books released from 2000 onwards. We can do this by adding the key property to each defined range, giving it the name, and adding the keyed property set to true. Setting the keyed property to true will associate a unique string value with each bucket, and the key property defines the name for the bucket that will be used as the unique name. A query that does that will look as follows:

{

"aggs":{

"years":{

"range":{

"field":"year",

"keyed":true,

"ranges":[

{"key":"Before 18th century","to":1799},

{"key":"18th century","from":1800,"to":1899},

{"key":"19th century","from":1900,"to":1999},

{"key":"After 19th century","from":2000}

]

}

}

}

}

The response returned by Elasticsearch in such a case will look as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":{

"Before 18th century":{

"to":1799.0,

"to_as_string":"1799.0",

"doc_count":0

},

"18th century":{

"from":1800.0,

"from_as_string":"1800.0",

"to":1899.0,

"to_as_string":"1899.0",

"doc_count":1

},

"19th century":{

"from":1900.0,

"from_as_string":"1900.0",

"to":1999.0,

"to_as_string":"1999.0",

"doc_count":3

},

"After 19th century":{

"from":2000.0,

"from_as_string":"2000.0",

"doc_count":0

}

}

}

}

}

Note

An important and quite useful point about the range aggregation is that the defined ranges need not be disjoint. In such cases, Elasticsearch will properly count the document for multiple buckets.
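The way overlapping ranges are counted can be sketched in a few lines of Python. This is a plain simulation rather than Elasticsearch code; the publication years and range names are hypothetical, and the sketch assumes the usual range semantics in which from is inclusive and to is exclusive.

```python
def range_agg(values, ranges):
    buckets = []
    for r in ranges:
        lower = r.get("from", float("-inf"))  # open-ended when from is missing
        upper = r.get("to", float("inf"))     # open-ended when to is missing
        # from is inclusive, to is exclusive.
        count = sum(1 for value in values if lower <= value < upper)
        buckets.append({"key": r["key"], "doc_count": count})
    return buckets

# Hypothetical publication years.
years = [1886, 1929, 1936, 1961]

# Two ranges that overlap between 1900 and 1950.
ranges = [
    {"key": "before 1950", "to": 1950},
    {"key": "1900 to 2000", "from": 1900, "to": 2000},
]

for bucket in range_agg(years, ranges):
    print(bucket)
```

The books from 1929 and 1936 fall into both buckets, so the bucket counts add up to six even though there are only four documents.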

Date range aggregation

The date_range aggregation is similar to the previously discussed range aggregation, but it is designed for fields that use date-based types. However, in the library index, the documents have years, but the field is a number, not a date. For the purpose of showing how this aggregation works, let's imagine that we want to extend our library index to support newspapers. To do this we will create a new index (called library2) by using the following command:

curl -XPOST localhost:9200/_bulk --data-binary '{"index":{"_index":"library2","_type":"book","_id":"1"}}
{"title":"Fishing news","published":"2010/12/03 10:00:00","copies":3,"available":true}
{"index":{"_index":"library2","_type":"book","_id":"2"}}
{"title":"Knitting magazine","published":"2010/11/07 11:32:00","copies":1,"available":true}
{"index":{"_index":"library2","_type":"book","_id":"3"}}
{"title":"The guardian","published":"2009/07/13 04:33:00","copies":0,"available":false}
{"index":{"_index":"library2","_type":"book","_id":"4"}}
{"title":"Hadoop World","published":"2012/01/01 04:00:00","copies":6,"available":true}
'

For the purpose of this example, we will leave the mappings definition to Elasticsearch; this is sufficient in this case. Let's start with the first query using the date_range aggregation:

{

"aggs":{

"years":{

"date_range":{

"field":"published",

"ranges":[

{"to":"2009/12/31"},

{"from":"2010/01/01","to":"2010/12/31"},

{"from":"2011/01/01"}

]

}

}

}

}

Compared with the ordinary range aggregation, the only thing that changed is the aggregation type, which is now date_range. The dates can be passed as a string in a form recognized by Elasticsearch or as a number value (the number of milliseconds since 1970-01-01). The response returned by Elasticsearch for the preceding query looks as follows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":[{

"key":"*-2009/12/31 00:00:00",

"to":1.2622176E12,

"to_as_string":"2009/12/31 00:00:00",

"doc_count":1

},{

"key":"2010/01/01 00:00:00-2010/12/31 00:00:00",

"from":1.262304E12,

"from_as_string":"2010/01/01 00:00:00",

"to":1.2937536E12,

"to_as_string":"2010/12/31 00:00:00",

"doc_count":2

},{

"key":"2011/01/01 00:00:00-*",

"from":1.29384E12,

"from_as_string":"2011/01/01 00:00:00",

"doc_count":1

}]

}

}

}

As you can see, the response is no different when compared to the response returned by the range aggregation. We have two attributes for each bucket, named from and to, which represent the number of milliseconds since 1970-01-01. The properties from_as_string and to_as_string present the same information as from and to, but in a human-readable form. Of course, the keyed parameter and the key property in the definition of a date range work in the already described way.

Elasticsearch also allows us to define the format of presented dates using the format attribute. In our example, we presented the dates with year resolution, so the day and time parts were unnecessary. If we want to show the month names, we can send a query such as the following one:

{

"aggs":{

"years":{

"date_range":{

"field":"published",

"format":"MMMM YYYY",

"ranges":[

{"to":"December 2009"},

{"from":"January 2010","to":"December 2010"},

{"from":"January 2011"}

]

}

}

}

}

Note that the dates in the to and from parameters also need to be provided in the specified format. One of the returned ranges looks as follows:

{

"key":"January 2010-December 2010",

"from":1.262304E12,

"from_as_string":"January 2010",

"to":1.2911616E12,

"to_as_string":"December 2010",

"doc_count":1

}

Note

The available formats we can use in format are defined in the Joda Time library. The full list is available at http://joda-time.sourceforge.net/apidocs/org/joda/time/format/DateTimeFormat.html.

There is one more thing about the date_range aggregation that we want to mention. Imagine that we may sometimes want to build an aggregation that can change with time. For example, we may want to see how many newspapers were published in the last 3, 6, 9, and 12 months. This is possible without the need to adjust the query every time, as we can use constants such as now-9M. The following example shows this:

{

"aggs":{

"years":{

"date_range":{

"field":"published",

"format":"dd-MM-YYYY",

"ranges":[

{"to":"now-9M/M"},

{"to":"now-9M"},

{"from":"now-9M/M","to":"now-6M/M"},

{"from":"now-3M/M"}

]

}

}

}

}

The key here is expressions such as now-9M. Elasticsearch does the math and generates the appropriate value. You can use the following units: y (year), M (month), w (week), d (day), h (hour), m (minute), and s (second). For example, the expression now+3d means three days from now. The /M in our example rounds the date down to the month. Thanks to such notation, we only count full months. The second advantage is that the calculated date is more cache-friendly: without the rounding, the date changes every millisecond, which makes every cache based on the range irrelevant and basically useless in most cases.
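To make the effect of the rounding more tangible, here is a minimal Python sketch of how such expressions could be resolved. It is not the Elasticsearch parser: the resolve_date_math function is a hypothetical helper that supports only the month unit (M) and the /M rounding, and the value of now is fixed so that the result is reproducible.

```python
import calendar
import re
from datetime import datetime

def resolve_date_math(expr, now):
    """Resolve 'now', 'now-<n>M' or 'now+<n>M', optionally followed by '/M'."""
    match = re.fullmatch(r"now(?:([+-])(\d+)M)?(/M)?", expr)
    if match is None:
        raise ValueError("unsupported expression: " + expr)
    sign, amount, rounding = match.groups()
    result = now
    if amount is not None:
        shift = int(amount) if sign == "+" else -int(amount)
        # Count months from year 0 so that divmod handles year boundaries.
        year, month_index = divmod(now.year * 12 + (now.month - 1) + shift, 12)
        month = month_index + 1
        # Clamp the day for shorter months (for example, 31 March minus one month).
        day = min(now.day, calendar.monthrange(year, month)[1])
        result = now.replace(year=year, month=month, day=day)
    if rounding:
        # /M: round down to the first moment of the month.
        result = result.replace(day=1, hour=0, minute=0, second=0, microsecond=0)
    return result

now = datetime(2016, 2, 15, 10, 30)
print(resolve_date_math("now-9M", now))    # changes whenever now changes
print(resolve_date_math("now-9M/M", now))  # stable for the whole month
```

The rounded value stays the same for every request sent during the same month, which is what makes the resulting range reusable by caches.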

IPv4 range aggregation

A very interesting aggregation is the ip_range one, as it works on Internet addresses. It works on fields defined with the ip type and allows us to define ranges either by their boundaries or by an IP range in CIDR notation (http://en.wikipedia.org/wiki/Classless_Inter-Domain_Routing). An example usage of the ip_range aggregation looks as follows:

{

"aggs":{

"access":{

"ip_range":{

"field":"ip",

"ranges":[

{"from":"192.168.0.1","to":"192.168.0.254"},

{"mask":"192.168.1.0/24"}

]

}

}

}

}

The response to the preceding query is as follows:

"access":{

"buckets":[

{

"from":3232235521,

"from_as_string":"192.168.0.1",

"to":3232235774,

"to_as_string":"192.168.0.254",

"doc_count":0

},

{

"key":"192.168.1.0/24",

"from":3232235776,

"from_as_string":"192.168.1.0",

"to":3232236032,

"to_as_string":"192.168.2.0",

"doc_count":4

}

]

}

Similar to the range aggregation, we define either both ends of a range or a mask. The rest is done by Elasticsearch itself.
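The numeric from and to values in the response are simply the IPv4 addresses written as 32-bit integers. The following sketch uses the ipaddress module from the Python standard library to reproduce them; the mask_to_range helper is a hypothetical name introduced for this illustration.

```python
import ipaddress

def mask_to_range(mask):
    """Return the (inclusive start, exclusive end) of a CIDR block as integers."""
    network = ipaddress.ip_network(mask)
    start = int(network.network_address)
    end = start + network.num_addresses  # first address after the block
    return start, end

start, end = mask_to_range("192.168.1.0/24")
print(start, ipaddress.ip_address(start))
print(end, ipaddress.ip_address(end))

# The explicit boundaries of the first range, converted the same way.
print(int(ipaddress.ip_address("192.168.0.1")))
print(int(ipaddress.ip_address("192.168.0.254")))
```

The upper bound of the mask-based bucket is the first address after the /24 block, which is why the response reports 192.168.2.0 as its to value.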

Missing aggregation

The missing aggregation allows us to create a bucket and see how many documents have no value in a specified field. For example, we can check how many of our books in the library index don't have the original title defined (the otitle field). To do this, we run the following query:

{

"aggs":{

"missing_original_title":{

"missing":{

"field":"otitle"

}

}

}

}

The response returned by Elasticsearch in this case will look as follows:

{

"took":15,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"missing_original_title":{

"doc_count":2

}

}

}

As we can see, we have two documents without the otitle field.

Histogram aggregation

The histogram aggregation is an interesting one because of its automation. This aggregation defines buckets itself. We are only responsible for defining the field and the interval, and the rest is done automatically. The simplest form of a query that uses this aggregation looks as follows:

{

"aggs":{

"years":{

"histogram":{

"field":"year",

"interval":100

}

}

}

}

The new information we need to provide is interval, which defines the length of every range that will be used to create a bucket. We set the interval to 100, which in our case will result in buckets that are 100 years wide. The response to the preceding query that was sent to our library index is as follows:

{

"took":13,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"years":{

"buckets":[{

"key":1800,

"doc_count":1

},{

"key":1900,

"doc_count":3

}]

}

}

}

Similar to the range aggregation, the histogram aggregation allows us to use the keyed property to define named buckets. The other available option is min_doc_count, which allows us to specify the minimum number of documents required to create a bucket. If we set the min_doc_count property to zero, Elasticsearch will also include buckets with the document count of zero. We can also use the missing property to specify the value Elasticsearch should use when a document doesn't have a value in the specified field.
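The bucket keys in the response follow from rounding each value down to a multiple of the interval. The following Python sketch simulates this together with the effect of min_doc_count; it is an illustration rather than the actual implementation, and the publication years are hypothetical values chosen to reproduce the two buckets shown above.

```python
import math
from collections import Counter

def histogram(values, interval, min_doc_count):
    # Each value lands in the bucket whose key is the value rounded
    # down to the nearest multiple of the interval.
    counts = Counter(math.floor(value / interval) * interval for value in values)
    if min_doc_count == 0 and counts:
        # Fill the gaps between the lowest and highest bucket with empty buckets.
        lowest, highest = min(counts), max(counts)
        for key in range(lowest, highest + interval, interval):
            counts.setdefault(key, 0)
    return [
        {"key": key, "doc_count": count}
        for key, count in sorted(counts.items())
        if count >= min_doc_count
    ]

# Hypothetical publication years.
years = [1886, 1929, 1936, 1961]
print(histogram(years, interval=100, min_doc_count=1))

# With min_doc_count set to 0, the empty bucket for 1700 is included as well.
print(histogram([1650, 1886], interval=100, min_doc_count=0))
```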

Date histogram aggregation

Just as the date_range aggregation is a specialized form of the range aggregation, date_histogram is an extension of the histogram aggregation that works on dates. For the purpose of this example, we will again use the data we indexed when discussing the date_range aggregation. This means that we will run our queries against the index called library2. An example query using the date_histogram aggregation looks as follows:

{

"aggs":{

"years":{

"date_histogram":{

"field":"published",

"format":"yyyy-MM-dd HH:mm",

"interval":"10d",

"min_doc_count":1}

}

}

}

The difference between the histogram and date_histogram aggregations is the interval property. The value of this property is now a string describing the time interval, which in our case is 10 days. Of course, we can set it to anything we want. It uses the same unit suffixes we discussed while talking about date expressions in the date_range aggregation. It is worth mentioning that the number can be a float value. For example, 1.5m means that the length of the bucket will be one and a half minutes. The format attribute is the same as in the date_range aggregation. Thanks to it, Elasticsearch can add a human-readable date text according to the defined format. Of course, the format attribute is not required, but it is useful. In addition to that, similar to the other range aggregations, the keyed and min_doc_count attributes still work.

Time zones

Elasticsearch stores all the dates in the UTC time zone. You can define the time zone to be used by Elasticsearch by using the time_zone attribute. By setting this property, we basically tell Elasticsearch which time zone should be used to perform the calculations. There are three notations with which to set this attribute:

We can set the hours offset; for example, time_zone: 5

We can use the time format; for example, time_zone: "-04:30"

We can use the name of the time zone; for example, time_zone: "Europe/Warsaw"

Note

Look at http://joda-time.sourceforge.net/timezones.html to see the available time zones.

Geo distance aggregations

The next two aggregations are connected with maps and spatial searches. We will talk about geo types and queries in the Elasticsearch spatial capabilities section of Chapter 8, Beyond Full-text Searching, so feel free to skip these two topics now and return to them later.

Look at the following query:

{

"aggs":{

"neighborhood":{

"geo_distance":{

"field":"location",

"origin":[-0.1275,51.507222],

"ranges":[

{"to":1200},

{"from":1201}

]

}

}

}

}

You can see that the query is similar to the range aggregation. The preceding aggregation will calculate the number of documents that fall into two buckets: the first one for documents closer than 1200 km and the second one for documents 1201 km or further from the geographical point defined by the origin property (in the preceding case, the origin is London). The aggregation section of the response returned by Elasticsearch looks as follows:

"neighborhood":{

"buckets":[

{

"key":"*-1200.0",

"from":0,

"to":1200,

"doc_count":1

},

{

"key":"1201.0-*",

"from":1201,

"doc_count":4

}

]

}

The keyed and the key attributes work in the geo_distance aggregation as well, so we can easily modify the response to our needs and create named buckets.

The geo_distance aggregation supports a few additional parameters that are shown in the following query:

{

"aggs":{

"neighborhood":{

"geo_distance":{

"field":"location",

"origin":{"lon":-0.1275,"lat":51.507222},

"unit":"m",

"distance_type":"plane",

"ranges":[

{"to":1200},

{"from":1201}

]

}

}

}

}

We have highlighted three things in the preceding query. The first change is how we defined the origin point. This time we specified the location by providing the latitude and longitude explicitly.

The second change is the unit attribute. It defines the units used in the ranges array. The possible values are: km (the default, kilometers), mi (miles), in (inches), yd (yards), m (meters), cm (centimeters), and mm (millimeters).

The last attribute, distance_type, specifies how Elasticsearch calculates the distance. The possible values are (from the fastest but least accurate to the slowest but the most accurate): plane, sloppy_arc (the default), and arc.
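The trade-off between the distance types can be illustrated with two textbook formulas: the haversine formula for a distance along the surface of a sphere and a flat approximation that treats the area as a plane. The following Python sketch is only an analogy for arc and plane; it is not the code Elasticsearch uses, and the Paris and New York coordinates are sample points added for the illustration.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, itself an approximation

def arc_distance_km(lat1, lon1, lat2, lon2):
    # Haversine formula: distance along the surface of a sphere.
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def plane_distance_km(lat1, lon1, lat2, lon2):
    # Flat approximation: cheaper, but less accurate over long distances.
    x = math.radians(lon2 - lon1) * math.cos(math.radians((lat1 + lat2) / 2))
    y = math.radians(lat2 - lat1)
    return EARTH_RADIUS_KM * math.hypot(x, y)

def matching_ranges(distance, ranges):
    # from is treated as inclusive and to as exclusive.
    return [r for r in ranges if r.get("from", 0) <= distance < r.get("to", float("inf"))]

london = (51.507222, -0.1275)   # the origin used in the query
paris = (48.8566, 2.3522)       # sample point
new_york = (40.7128, -74.0060)  # sample point
ranges = [{"to": 1200}, {"from": 1201}]

for name, point in (("Paris", paris), ("New York", new_york)):
    arc = arc_distance_km(*london, *point)
    plane = plane_distance_km(*london, *point)
    print(name, round(arc), round(plane), matching_ranges(arc, ranges))
```

For the short London to Paris hop the two results are nearly identical, while across the Atlantic the flat approximation drifts noticeably, which is the accuracy cost of the faster calculation.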

Geohash grid aggregation

The second aggregation related to geographical analysis is based on grids and is called geohash_grid. It organizes areas into grids and assigns every location to a cell in such a grid. To do this efficiently, Elasticsearch uses Geohash (http://en.wikipedia.org/wiki/Geohash), which encodes the location into a string. The longer the string is, the more accurate the description of a particular location. For example, one letter is sufficient to declare a box of about five thousand by five thousand kilometers, and five letters are enough to increase the accuracy to about five by five kilometers. Let's look at the following query:

{

"aggs":{

"neighborhood":{

"geohash_grid":{

"field":"location",

"precision":5

}

}

}

}

We defined the geohash_grid aggregation with buckets that are about five by five kilometers in size (the precision attribute describes the number of letters used in the geohash string). The table with resolutions versus the length of geohash can be found at https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-bucket-geohashgrid-aggregation.html.

Of course, the more accurate we want the aggregation to be, the more resources Elasticsearch will consume, because of the number of buckets that the aggregation has to calculate. By default, Elasticsearch does not generate more than 10,000 buckets. You can change this behavior by using the size attribute, but keep in mind that the performance may suffer for very wide queries consisting of thousands of buckets.
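How a location turns into a geohash string, and why a longer string describes a smaller cell, can be seen in a compact implementation of the publicly documented geohash encoding. The following Python sketch is a standalone illustration and not Elasticsearch code; the coordinates are the sample point commonly used to explain geohashes.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"

def geohash(lat, lon, precision):
    """Encode a latitude/longitude pair into a geohash of the given length."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    chars = []
    bits = 0
    value = 0
    use_lon = True  # bits alternate between longitude and latitude
    while len(chars) < precision:
        rng, coord = (lon_range, lon) if use_lon else (lat_range, lat)
        mid = (rng[0] + rng[1]) / 2
        if coord >= mid:
            value = value * 2 + 1
            rng[0] = mid  # keep the upper half of the interval
        else:
            value = value * 2
            rng[1] = mid  # keep the lower half of the interval
        use_lon = not use_lon
        bits += 1
        if bits == 5:
            # Every five bits become one base-32 character.
            chars.append(BASE32[value])
            bits = 0
            value = 0
    return "".join(chars)

# Sample coordinates: each extra character narrows the cell further.
for precision in (1, 3, 5):
    print(precision, geohash(57.64911, 10.40744, precision))
```

Because every additional character only refines the previous cell, a shorter geohash is always a prefix of a longer one for the same point, which is what allows locations to be grouped into grid cells by prefix.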

Global aggregation

The global aggregation is an aggregation that defines a single bucket containing all the documents from a given index and type, not influenced by the query itself. The thing that differentiates the global aggregation from all the others is that the global aggregation has an empty body. For example, look at the following query:

{

"query":{

"term":{

"available":"true"

}

},

"aggs":{

"all_books":{

"global":{}

}

}

}

In our library index, we only have two available books, but the response to the preceding query looks as follows:

{

"took":1,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":3,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"all_books":{

"doc_count":4

}

}

}

As you can see, the global aggregation is not bound by the query. Because the result of the global aggregation is a single bucket containing all the documents (not narrowed down by the query itself), it is a perfect candidate for use as a top-level parent aggregation for nesting aggregations.

Significant terms aggregation

The significant_terms aggregation allows us to get the terms that are relevant and probably the most significant for a given query. The good thing is that it doesn't just show the top terms from the results of the given query, but the ones that seem to be the most important.

The use cases for this aggregation type can vary from finding the most troublesome server working in your application environment, to suggesting nicknames from text. Whenever Elasticsearch sees a significant change in the popularity of a term, such a term is a candidate for being significant.

Note

Remember that the significant_terms aggregation is very expensive when it comes to resources and running against large indices. Work is being done to provide a lightweight version of that aggregation; as a result, the API for the significant_terms aggregation may change in the future.

The best way to describe the significant_terms aggregation type is to use an example. Let's start with indexing 12 simple documents, which represent reviews of work done by interns:

curl -XPOST 'localhost:9200/interns/review/1' -d '{"intern":"Richard","grade":"bad","type":"grade"}'

curl -XPOST 'localhost:9200/interns/review/2' -d '{"intern":"Ralf","grade":"perfect","type":"grade"}'

curl -XPOST 'localhost:9200/interns/review/3' -d '{"intern":"Richard","grade":"bad","type":"grade"}'

curl -XPOST 'localhost:9200/interns/review/4' -d '{"intern":"Richard","grade":"bad","type":"review"}'

curl -XPOST 'localhost:9200/interns/review/5' -d '{"intern":"Richard","grade":"good","type":"grade"}'

curl -XPOST 'localhost:9200/interns/review/6' -d '{"intern":"Ralf","grade":"good","type":"grade"}'

curl -XPOST 'localhost:9200/interns/review/7' -d '{"intern":"Ralf","grade":"perfect","type":"review"}'

curl -XPOST 'localhost:9200/interns/review/8' -d '{"intern":"Richard","grade":"medium","type":"review"}'

curl -XPOST 'localhost:9200/interns/review/9' -d '{"intern":"Monica","grade":"medium","type":"grade"}'

curl -XPOST 'localhost:9200/interns/review/10' -d '{"intern":"Monica","grade":"medium","type":"grade"}'

curl -XPOST 'localhost:9200/interns/review/11' -d '{"intern":"Ralf","grade":"good","type":"grade"}'

curl -XPOST 'localhost:9200/interns/review/12' -d '{"intern":"Ralf","grade":"good","type":"grade"}'

Of course, to show the real power of the significant_terms aggregation, we should use a much larger dataset. However, for the purpose of this book, we will concentrate on this example, so it is easier to illustrate how this aggregation works.

Now let's try finding the most significant grade for Richard. To do this we will use the following query:

curl -XGET 'localhost:9200/interns/_search?size=0&pretty' -d '{

"query":{

"match":{

"intern":"Richard"

}

},

"aggregations":{

"description":{

"significant_terms":{

"field":"grade"

}

}

}

}'

The result of the preceding query looks as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":5,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"description":{

"doc_count":5,

"buckets":[{

"key":"bad",

"doc_count":3,

"score":0.84,

"bg_count":3

}]

}

}

}

As you can see, for our query Elasticsearch informed us that the most significant grade for Richard is bad. Maybe it wasn't the best internship for him; who knows.

Choosing significant terms

To calculate significant terms, Elasticsearch looks for data that reports a significant change in its popularity between two sets of data: the foreground set and the background set. The foreground set is the data returned by our query, while the background set is the data in our index (or indices, depending on how we run our queries). If a term exists in 10 documents out of one million indexed, but appears in 5 documents from the 10 returned, then such a term is definitely significant and worth concentrating on.

Let's get back to our preceding example now to analyze it a bit. Richard got three different grades from the reviewers: bad three times, medium once, and good once. In general, the bad grade appeared in three documents (the bg_count property) out of the 12 documents in the index (this is our background set). This gives us 25 percent of the indexed documents. On the other hand, the bad grade appeared in three out of the five documents matching the query (this is our foreground set), which gives us 60 percent of the documents. As you can see, the change in popularity is significant for the bad grade and that's why Elasticsearch has returned it in the significant_terms aggregation results.
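The percentages above can be turned into a score with a few lines of arithmetic. The following Python sketch uses the JLH formula, which is assumed here to be the scoring heuristic in effect because it reproduces the score values shown in the responses of this section; the function name is hypothetical.

```python
def jlh_score(fg_count, fg_total, bg_count, bg_total):
    """Score a term by how much more popular it is in the foreground set."""
    fg_percent = fg_count / fg_total  # popularity among the query results
    bg_percent = bg_count / bg_total  # popularity in the whole index
    if fg_percent <= bg_percent:
        return 0.0  # not more popular than in the background: not significant
    # Absolute change in popularity multiplied by the relative change.
    return (fg_percent - bg_percent) * (fg_percent / bg_percent)

# The bad grade for Richard: 3 of 5 matching documents, 3 of 12 indexed ones.
print(round(jlh_score(3, 5, 3, 12), 2))

# Richard's medium grade: 1 of 5 matching documents, 3 of 12 indexed ones.
print(jlh_score(1, 5, 3, 12))
```

The bad grade is more than twice as popular in the foreground set as in the background set, so it receives a clearly positive score, while the medium grade is less popular among Richard's reviews than in the index as a whole and therefore scores zero.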

Multiple value analysis

The significant_terms aggregation can be nested and provide us with nice data analysis capabilities that connect multiple sets of data. For example, let's try to find a significant grade for each of the interns that we have information about. To do this we will nest the significant_terms aggregation inside the terms aggregation. The query that does that looks as follows:

curl -XGET 'localhost:9200/interns/_search?size=0&pretty' -d '{

"aggregations":{

"grades":{

"terms":{

"field":"intern"

},

"aggregations":{

"significantGrades":{

"significant_terms":{

"field":"grade"

}

}

}

}

}

}'

The results returned by Elasticsearch for the preceding query are as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":12,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"grades":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"ralf",

"doc_count":5,

"significantGrades":{

"doc_count":5,

"buckets":[{

"key":"good",

"doc_count":3,

"score":0.48,

"bg_count":4

}]

}

},{

"key":"richard",

"doc_count":5,

"significantGrades":{

"doc_count":5,

"buckets":[{

"key":"bad",

"doc_count":3,

"score":0.84,

"bg_count":3

}]

}

},{

"key":"monica",

"doc_count":2,

"significantGrades":{

"doc_count":2,

"buckets":[]

}

}]

}

}

}

Sampler aggregation

The sampler aggregation is one of the experimental aggregations in Elasticsearch. It allows us to limit the sub-aggregation processing to a sample of the top-scoring documents. This allows filtering and potential removal of garbage in the data. It is a very nice candidate as a top-level aggregation to limit the amount of data the significant_terms aggregation runs on. The simplest example of using this aggregation is as follows:

{

"aggs":{

"sampler_example":{

"sampler":{

"field":"tags",

"max_docs_per_value":1,

"shard_size":10

},

"aggs":{

"best_terms":{

"terms":{

"field":"title"

}

}

}

}

}

}

To see the real power of sampling, we would have to play with it on a larger dataset, but for now we will discuss the preceding example. The sampler aggregation was defined with three properties: field, max_docs_per_value, and shard_size. The first two properties allow us to control the diversity of the sampling. We tell Elasticsearch how many documents at maximum (the value of the max_docs_per_value property) can be collected on a shard with the same value in the defined field (the value of the field property).

Theshard_sizepropertytellsElasticsearchhowmanydocuments(atmost)tocollectfromeachshard.
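
The interplay of the three sampler properties can be pictured with a small client-side simulation. This is only a sketch in Python under simplifying assumptions: the sample_shard function, the score key, and the document list are hypothetical, the field is treated as single-valued, and this is not how Elasticsearch implements sampling internally.

```python
def sample_shard(docs, field, max_docs_per_value, shard_size):
    """Pick top-scoring docs from one shard with a per-value cap (illustrative only)."""
    picked = []
    seen = {}  # field value -> number of docs already picked for that value
    for doc in sorted(docs, key=lambda d: d["score"], reverse=True):
        if len(picked) >= shard_size:
            break  # shard_size caps the whole sample taken from the shard
        value = doc[field]
        if seen.get(value, 0) >= max_docs_per_value:
            continue  # too many docs with this value already; keeps the sample diverse
        seen[value] = seen.get(value, 0) + 1
        picked.append(doc)
    return picked

# Hypothetical documents living on a single shard
shard_docs = [
    {"id": 1, "tags": "novel", "score": 2.0},
    {"id": 2, "tags": "novel", "score": 1.5},
    {"id": 3, "tags": "classics", "score": 1.0},
]
sample = sample_shard(shard_docs, "tags", max_docs_per_value=1, shard_size=10)
print([d["id"] for d in sample])
```

With max_docs_per_value set to 1, the second document tagged novel is skipped even though it scores higher than the classics one.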

Children aggregation

The children aggregation is a single-bucket aggregation that creates a bucket with all the children of the specified type. Let's get back to the Using the parent-child relationship section in Chapter 5, Extending Your Index Structure, and let's use the shop index created there. To create a bucket of all child documents with the variation type in the shop index, we run the following query:

{

"aggs":{

"variation_children":{

"children":{

"type":"variation"

}

}

}

}

The response returned by Elasticsearch is as follows:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":3,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"variation_children":{

"doc_count":2

}

}

}

Note

Because the children aggregation uses parent-child functionality, it relies on the _parent field, which needs to be present.

Nested aggregation

In the Using nested objects section of Chapter 5, Extending Your Index Structure, we learned about nested documents. Let's use that data to look into the next type of aggregation: the nested one. Let's create the simplest working query, which looks like this (we use the shop_nested index created in the mentioned chapter):

{

"aggs":{

"variations":{

"nested":{

"path":"variation"

}

}

}

}

The preceding query is similar in structure to any other aggregation. However, instead of providing the field name on which the aggregation should be calculated, it contains a single parameter, path, which points to the nested document. In the response we get the number of nested documents:

{

"took":4,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"variations":{

"doc_count":2

}

}

}

The preceding response means that we have two nested documents in the index under the provided path, variation.

Reverse nested aggregation

The reverse_nested aggregation is a special, single-bucket aggregation that allows aggregation on parent documents from the nested documents. Like the global aggregation, the reverse_nested aggregation doesn't have a body. It sounds quite complicated, but it is not. Let's look at the following query that we run against the shop_nested index created in the Using nested objects section of Chapter 5, Extending Your Index Structure:

{

"aggs":{

"variations":{

"nested":{

"path":"variation"

},

"aggs":{

"sizes":{

"terms":{

"field":"variation.size"

},

"aggs":{

"product_name_terms":{

"reverse_nested":{},

"aggs":{

"product_name_terms_per_size":{

"terms":{

"field":"name"

}

}

}

}

}

}

}

}

}

}

We start with the top-level aggregation, which is the same nested aggregation that we used when discussing the nested aggregation. However, we include a sub-aggregation that uses reverse_nested to be able to show terms from the name field for each size returned inside the top-level nested aggregation. This is possible because, when the reverse_nested aggregation is used, Elasticsearch calculates the data on the basis of the parent documents instead of using the nested documents.

Note

Remember that the reverse_nested aggregation must be used inside the nested aggregation.

The response to the preceding query will look as follows:

{

"took":7,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"variations":{

"doc_count":2,

"sizes":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"XL",

"doc_count":1,

"product_name_terms":{

"doc_count":1,

"product_name_terms_per_size":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"shirt",

"doc_count":1

},{

"key":"test",

"doc_count":1

}]

}

}

},{

"key":"XXL",

"doc_count":1,

"product_name_terms":{

"doc_count":1,

"product_name_terms_per_size":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":"shirt",

"doc_count":1

},{

"key":"test",

"doc_count":1

}]

}

}

}]

}

}

}

}

Nesting aggregations and ordering buckets

When talking about bucket aggregations, we need to get back to the topic of nesting aggregations. This is a very powerful technique, because it allows you to further process the data for documents in the buckets. For example, the terms aggregation will return a bucket for each term and the stats aggregation can show us the statistics for documents in each bucket. Let's look at the following query:

{

"aggs":{

"copies":{

"terms":{

"field":"copies"

},

"aggs":{

"years":{

"stats":{

"field":"year"

}

}

}

}

}

}

This is an example of nested aggregations. The terms aggregation will return buckets for each term from the copies field (three buckets in the case of our data), and the stats aggregation will calculate statistics for the year field for the documents falling into each bucket returned by the top aggregation. The response from Elasticsearch for the preceding query looks as follows:

{

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"copies":{

"doc_count_error_upper_bound":0,

"sum_other_doc_count":0,

"buckets":[{

"key":0,

"doc_count":2,

"years":{

"count":2,

"min":1886.0,

"max":1936.0,

"avg":1911.0,

"sum":3822.0

}

},{

"key":1,

"doc_count":1,

"years":{

"count":1,

"min":1929.0,

"max":1929.0,

"avg":1929.0,

"sum":1929.0

}

},{

"key":6,

"doc_count":1,

"years":{

"count":1,

"min":1961.0,

"max":1961.0,

"avg":1961.0,

"sum":1961.0

}

}]

}

}

}

This is a powerful feature and allows us to build very complex data processing pipelines. Of course, we are not limited to a single nested aggregation: we can nest several of them side by side and even nest an aggregation inside another nested aggregation. For example:

{

"aggs":{

"popular_tags":{

"terms":{

"field":"copies"

},

"aggs":{

"years":{

"terms":{

"field":"year"

},

"aggs":{

"available_by_year":{

"stats":{

"field":"available"

}

}

}

},

"available":{

"stats":{

"field":"available"

}

}

}

}

}

}

As you can see, the possibilities are almost unlimited, provided you have enough memory and CPU power to handle very complicated aggregations.

Buckets ordering

There is one more feature related to nested aggregations: the ordering of aggregation results. Elasticsearch can use values from the nested aggregations to sort the parent buckets. For example, let's look at the following query:

{

"aggs":{

"availability":{

"terms":{

"field":"copies",

"order":{"numbers.avg":"desc"}

},

"aggs":{

"numbers":{"stats":{}}

}

}

}

}

In the previous example, the order in the availability aggregation is based on the average value from the numbers aggregation. The numbers.avg notation is required in this case, because stats is a multi-value aggregation that returns several metrics and we are interested in the average. If it were the sum aggregation, the name of the aggregation alone would be sufficient.
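
To make the aggregation.metric notation concrete, the following Python sketch sorts a list of buckets by a nested metric on the client side. The bucket values and the order_buckets helper are hypothetical; they only mimic the shape of a terms aggregation response and are not Elasticsearch code.

```python
# Hypothetical bucket data shaped like a terms aggregation response
# that carries a nested stats aggregation named "numbers".
buckets = [
    {"key": 0, "numbers": {"avg": 0.5}},
    {"key": 1, "numbers": {"avg": 1.0}},
    {"key": 6, "numbers": {"avg": 0.0}},
]

def order_buckets(buckets, path, direction):
    """Sort buckets by a nested metric given as 'aggregation.metric'."""
    agg_name, metric = path.split(".")
    return sorted(
        buckets,
        key=lambda b: b[agg_name][metric],
        reverse=(direction == "desc"),
    )

ordered = order_buckets(buckets, "numbers.avg", "desc")
print([b["key"] for b in ordered])
```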

Pipeline aggregations

The last type of aggregation we will discuss is pipeline aggregations. Until now we've learned about metrics aggregations and bucket aggregations. The first type returned metrics, while the second type returned buckets, and both worked on the basis of the returned documents. Pipeline aggregations are different. They work on the output of other aggregations and their metrics, allowing functionalities such as moving-average calculations (https://en.wikipedia.org/wiki/Moving_average).

Note

Remember that pipeline aggregations were introduced in Elasticsearch 2.0 and are considered experimental. This means that the API can change in the future, breaking backwards compatibility.

Available types

There are two types of pipeline aggregation. The so-called parent aggregations family works on the output of the parent aggregation in which they are nested. They are able to produce new buckets or new aggregations to add to existing buckets. The second type is called sibling aggregations, and these aggregations are able to produce new aggregations on the same level.

Referencing other aggregations

Because of their nature, the pipeline aggregations need to be able to access the results of the other aggregations. We can do that via the buckets_path property, which is defined using a specified format. We can use a few separators that allow us to tell Elasticsearch exactly which aggregation and metric we are interested in. The > character separates the aggregations and the . character separates the aggregation from its metrics. For example, my_sum.sum means that we take the sum metric of an aggregation called my_sum. Another example is popular_tags>my_sum.sum, which means that we are interested in the sum metric of a sub-aggregation called my_sum, which is nested inside the popular_tags aggregation. In addition to this, we can use a special path called _count. This can be used to calculate the pipeline aggregations on the document count instead of specified metrics.
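
The path format described here can be illustrated with a tiny parser. The parse_buckets_path function below is a hypothetical Python helper written for this explanation; it covers only the > and . separators and the _count path mentioned in the text, not any other syntax the real property may accept.

```python
def parse_buckets_path(path):
    """Split a buckets_path string into (aggregation names, metric or None)."""
    *parents, last = path.split(">")
    if last == "_count":
        # Special path: use the bucket document count instead of a metric.
        return parents, "_count"
    name, _, metric = last.partition(".")
    return parents + [name], metric or None

print(parse_buckets_path("my_sum.sum"))
print(parse_buckets_path("popular_tags>my_sum.sum"))
print(parse_buckets_path("periods_histogram>copies_per_100_years"))
```

A path without a metric part, like the last one, refers to a single-value aggregation, so the returned metric is None.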

Gaps in the data

Our data can contain gaps: situations where the data doesn't exist. For such use cases, we have the ability to specify the gap_policy property and set it to skip or insert_zeros. The skip value tells Elasticsearch to ignore the missing data and continue from the next available value, while insert_zeros replaces the missing values with zero.
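
The difference between the two policies is easy to see on a short series. The following Python sketch is an assumption-laden illustration: apply_gap_policy is a hypothetical helper and None stands in for a bucket with no value.

```python
def apply_gap_policy(values, gap_policy="skip"):
    """Handle missing bucket values (None) before a pipeline calculation."""
    if gap_policy == "skip":
        # Ignore the gaps and continue with the next available value.
        return [v for v in values if v is not None]
    if gap_policy == "insert_zeros":
        # Treat every gap as a zero.
        return [0.0 if v is None else v for v in values]
    raise ValueError("unknown gap_policy: " + gap_policy)

series = [3.0, None, 4.0]
print(apply_gap_policy(series, "skip"))
print(apply_gap_policy(series, "insert_zeros"))
```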

Pipeline aggregation types

Most of the aggregations we will show in this section are very similar to the ones we've already seen in the sections about metrics and buckets aggregations. Because of that, we won't discuss them in depth. There are also new, specific pipeline aggregations that we want to talk about in a little more detail.

Min, max, sum, and average bucket aggregations

The min_bucket, max_bucket, sum_bucket, and avg_bucket aggregations are sibling aggregations, similar in what they return to the min, max, sum, and avg aggregations. However, instead of working on the data returned by the query, they work on the results of the other aggregations.

To show you a simple example of how this aggregation works, let's calculate the sum of all the buckets returned by another aggregation. The query that will do that looks as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

}

}

},

"sum_copies":{

"sum_bucket":{

"buckets_path":"periods_histogram>copies_per_100_years"

}

}

}

}

As you can see, we used the histogram aggregation and we included a nested aggregation that calculates the sum of the copies field. Our sum_bucket sibling aggregation is used outside the main aggregation and refers to it using the buckets_path property. It tells Elasticsearch that we are interested in summing the values of metrics returned by the copies_per_100_years aggregation. The result returned by Elasticsearch for this query looks as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

}

}]

},

"sum_copies":{

"value":7.0

}

}

}

As you can see, Elasticsearch added another entry to the aggregation results, called sum_copies, which holds the value we were interested in.
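
As a rough client-side equivalent, the four sibling aggregations boil down to ordinary reductions over the per-bucket metric values. The Python sketch below reuses the two bucket values from the preceding response; the siblings dictionary is just an illustrative name.

```python
# Bucket values copied from the periods_histogram response above.
buckets = [
    {"key": 1800, "copies_per_100_years": {"value": 0.0}},
    {"key": 1900, "copies_per_100_years": {"value": 7.0}},
]

values = [b["copies_per_100_years"]["value"] for b in buckets]

# Client-side equivalents of the four sibling aggregations.
siblings = {
    "min_bucket": min(values),
    "max_bucket": max(values),
    "sum_bucket": sum(values),
    "avg_bucket": sum(values) / len(values),
}
print(siblings)
```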

Cumulative sum aggregation

The cumulative_sum aggregation is a parent pipeline aggregation that allows us to calculate the running total of a metric across the buckets of a histogram or date_histogram aggregation. A simple example of the aggregation looks as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"cumulative_copies_sum":{

"cumulative_sum":{

"buckets_path":"copies_per_100_years"

}

}

}

}

}

}

Because this aggregation is a parent pipeline aggregation, it is defined in the sub-aggregations. The returned result looks as follows:

{

"took":2,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

},

"cumulative_copies_sum":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

},

"cumulative_copies_sum":{

"value":7.0

}

}]

}

}

}

The first cumulative_copies_sum is 0 because the sum calculated for the first bucket is 0. The second is the sum of all the previous buckets and the current one, which gives 7. Each following value would be the sum of all the previous buckets and the bucket being processed.
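
The same running total can be reproduced outside Elasticsearch with Python's itertools.accumulate. This is only a sketch to show the arithmetic; the input list holds the two copies_per_100_years values from the response above.

```python
from itertools import accumulate

# copies_per_100_years values for the 1800 and 1900 buckets
copies_per_100_years = [0.0, 7.0]

# Each element is the sum of the current value and all the previous ones.
cumulative_copies_sum = list(accumulate(copies_per_100_years))
print(cumulative_copies_sum)
```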

Bucket selector aggregation

The bucket_selector aggregation is another parent pipeline aggregation. It allows using a script to decide if a bucket should be retained in the parent multi-bucket aggregation. For example, to keep only buckets that have more than one copy per period, we can run the following query (it needs the script.inline property to be set to on in the elasticsearch.yml file):

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"remove_empty_buckets":{

"bucket_selector":{

"buckets_path":{

"sum_copies":"copies_per_100_years"

},

"script":"sum_copies>1"

}

}

}

}

}

}

There are two important things here. The first is the buckets_path property, which is different from what we've used so far. Now it uses a key and a value. The key is used to reference the value in the script. The second important thing is the script property, which defines the script that decides if the processed bucket should be retained. The results returned by Elasticsearch in this case are as follows:

{

"took":330,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

}

}]

}

}

}

As we can see, the bucket with the copies_per_100_years value equal to 0 has been removed.
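
The effect of the selector can be mimicked in Python by filtering a bucket list with a predicate that plays the role of the script. The select_buckets helper is hypothetical; the sum_copies keyword mirrors the key defined in buckets_path, and the bucket values come from the earlier unfiltered histogram responses.

```python
buckets = [
    {"key": 1800, "copies_per_100_years": {"value": 0.0}},
    {"key": 1900, "copies_per_100_years": {"value": 7.0}},
]

def select_buckets(buckets, predicate):
    """Keep only buckets for which the predicate (the 'script') is true."""
    return [
        b for b in buckets
        if predicate(sum_copies=b["copies_per_100_years"]["value"])
    ]

# Equivalent of the script "sum_copies > 1"
kept = select_buckets(buckets, lambda sum_copies: sum_copies > 1)
print([b["key"] for b in kept])
```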

Bucket script aggregation

The bucket_script aggregation (a parent pipeline aggregation) allows us to define multiple bucket paths and use them inside a script. The metrics used must be numeric and the returned value also needs to be numeric. An example of using this aggregation follows (the following query needs the script.inline property to be set to on in the elasticsearch.yml file):

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"stats_per_100_years":{

"stats":{

"field":"copies"

}

},

"example_bucket_script":{

"bucket_script":{

"buckets_path":{

"sum_copies":"copies_per_100_years",

"count":"stats_per_100_years.count"

},

"script":"sum_copies/count*1000"

}

}

}

}

}

}

There are two things here. The first is that we've defined two entries in the buckets_path property. We are allowed to do that in the bucket_script aggregation. Each entry is a key and a value. The key is the name that we can use in the script to refer to the value. The value is the path to the aggregation metric we are interested in. Of course, the script property defines the script that returns the value.

The returned results for the preceding query are as follows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

},

"stats_per_100_years":{

"count":1,

"min":0.0,

"max":0.0,

"avg":0.0,

"sum":0.0

},

"example_bucket_script":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

},

"stats_per_100_years":{

"count":3,

"min":0.0,

"max":6.0,

"avg":2.3333333333333335,

"sum":7.0

},

"example_bucket_script":{

"value":2333.3333333333335

}

}]

}

}

}
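
The arithmetic behind the example_bucket_script values can be checked by hand. The following Python sketch applies the same formula as the script to the per-bucket metrics from the response; the bucket_script function and the flattened bucket dictionaries are illustrative stand-ins, not Elasticsearch structures.

```python
# Metric values per bucket, copied from the response above.
buckets = [
    {"key": 1800, "sum_copies": 0.0, "count": 1},
    {"key": 1900, "sum_copies": 7.0, "count": 3},
]

def bucket_script(sum_copies, count):
    """Same arithmetic as the script: sum_copies / count * 1000."""
    return sum_copies / count * 1000

results = {b["key"]: bucket_script(b["sum_copies"], b["count"]) for b in buckets}
print(results)
```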

Serial differencing aggregation

The serial_diff aggregation is a parent pipeline aggregation that implements a technique where the values in time series data (such as a histogram or date histogram) are subtracted from themselves at different time periods. This technique allows us to plot the changes between time periods instead of plotting the absolute values. For example, the population of a city grows with time. If we use the serial differencing aggregation with a period of one day, we can see the daily growth.

To calculate the serial_diff aggregation, we need the parent aggregation, which is a histogram or a date_histogram, and we need to provide it with buckets_path, which points to the metric we are interested in, and lag (a positive, non-zero integer value), which tells which previous bucket to subtract from the current one. We can omit lag, in which case Elasticsearch will set it to 1.

Let's now look at a simple query that uses the discussed aggregation:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"first_difference":{

"serial_diff":{

"buckets_path":"copies_per_100_years",

"lag":1

}

}

}

}

}

}

The response to the preceding query looks as follows:

{

"took":68,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":0.0,

"hits":[]

},

"aggregations":{

"periods_histogram":{

"buckets":[{

"key":1800,

"doc_count":1,

"copies_per_100_years":{

"value":0.0

}

},{

"key":1900,

"doc_count":3,

"copies_per_100_years":{

"value":7.0

},

"first_difference":{

"value":7.0

}

}]

}

}

}

As you can see, starting with the second bucket we get our aggregation (we would get it with every bucket after that as well). The calculated value is 7 because the current value of copies_per_100_years is 7 and the previous one is 0. Subtracting 0 from 7 gives us 7.
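
The subtraction with a lag can be written down in a few lines of Python. This is a minimal sketch of the technique, assuming None for buckets that have no earlier bucket to subtract; the serial_diff function is a hypothetical helper, not an Elasticsearch API.

```python
def serial_diff(values, lag=1):
    """Subtract the value 'lag' buckets back from each value.

    The first 'lag' buckets have nothing to subtract, so they yield None.
    """
    if lag < 1:
        raise ValueError("lag must be a positive, non-zero integer")
    return [
        None if i < lag else values[i] - values[i - lag]
        for i in range(len(values))
    ]

# copies_per_100_years values for the 1800 and 1900 buckets
print(serial_diff([0.0, 7.0], lag=1))
```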

Derivative aggregation

The derivative aggregation is another example of a parent pipeline aggregation. As its name suggests, it calculates a derivative (https://en.wikipedia.org/wiki/Derivative) of a given metric from a histogram or date histogram. The only thing we need to provide is buckets_path, which points to the metric we are interested in. An example query using this aggregation looks as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":100

},

"aggs":{

"copies_per_100_years":{

"sum":{

"field":"copies"

}

},

"derivative_example":{

"derivative":{

"buckets_path":"copies_per_100_years"

}

}

}

}

}

}

Moving avg aggregation

The last pipeline aggregation that we want to discuss is the moving_avg one. It calculates the moving average metric (https://en.wikipedia.org/wiki/Moving_average) over the buckets of the parent aggregation (yes, this is a parent pipeline aggregation). Similar to the few previously discussed aggregations, it needs to be run on the parent histogram or date histogram aggregation.

When calculating the moving average, Elasticsearch will take the window (specified by the window property and set to 5 by default), calculate the average for buckets in the window, move the window one bucket further, and repeat. Of course we also need to provide buckets_path, which points to the metric that the moving average should be calculated for.

An example of using this aggregation looks as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":10

},

"aggs":{

"copies_per_10_years":{

"sum":{

"field":"copies"

}

},

"moving_avg_example":{

"moving_avg":{

"buckets_path":"copies_per_10_years"

}

}

}

}

}

}

We omit the response for the preceding query, as it is quite large.
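
To give a feel for the sliding window, here is a simple trailing moving average in Python. It is a sketch under assumptions: the input values are hypothetical, and the exact way Elasticsearch aligns the window with the buckets may differ from this helper.

```python
def simple_moving_average(values, window=5):
    """Average each value with up to window-1 values before it."""
    averages = []
    for i in range(len(values)):
        start = max(0, i - window + 1)  # the window shrinks at the start of the series
        chunk = values[start:i + 1]
        averages.append(sum(chunk) / len(chunk))
    return averages

# Hypothetical per-decade sums, not taken from the book's data.
print(simple_moving_average([0.0, 1.0, 0.0, 6.0], window=2))
```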

Predicting future buckets

A very nice thing about the moving average aggregation is that it supports predictions; it can attempt to extrapolate the data it has and create future buckets. To make the aggregation predict buckets, we just need to add the predict property to any moving average aggregation and set it to the number of predictions we want to get. For example, if we want to add five predictions to the preceding query, we will change it to look as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":10

},

"aggs":{

"copies_per_10_years":{

"sum":{

"field":"copies"

}

},

"moving_avg_example":{

"moving_avg":{

"buckets_path":"copies_per_10_years",

"predict":5

}

}

}

}

}

}

If you look at the results and compare the response returned for the previous query with the one with predictions, you will notice that the last bucket in the previous query ends on the key property equal to 1960, while the query with predictions ends on the key property equal to 2010, which is exactly what we wanted to achieve.

The models

By default, Elasticsearch uses the simplest model for calculating the moving averages aggregation, but we can control that by specifying the model property; this property holds the name of the model, and the settings object lets us provide model properties.

The possible models are: simple, linear, ewma, holt, and holt_winters. Discussing each of the models in detail is beyond the scope of the book, so if you are interested in details about the different models, refer to the official Elasticsearch documentation regarding the moving averages aggregation available at https://www.elastic.co/guide/en/elasticsearch/reference/master/search-aggregations-pipeline-movavg-aggregation.html.

An example query using a different model looks as follows:

{

"aggs":{

"periods_histogram":{

"histogram":{

"field":"year",

"interval":10},

"aggs":{

"copies_per_10_years":{

"sum":{

"field":"copies"

}},

"moving_avg_example":{

"moving_avg":{

"buckets_path":"copies_per_10_years",

"model":"holt",

"settings":{

"alpha":0.6,

"beta":0.4

}

}

}

}

}

}

}

Summary

The chapter we just finished was all about data analysis in Elasticsearch: the aggregations engine. We learned what the aggregations are and how they work. We used metrics, buckets, and newly introduced pipeline aggregations, and learned what we can do with them.

In the next chapter, we'll go beyond full text searching. We will use suggesters to build efficient autocomplete functionality and correct the users' spelling mistakes. We will see what percolation is and how to use it in our application. We will use the geospatial abilities of Elasticsearch and we'll learn how to efficiently fetch large amounts of data from Elasticsearch.

Chapter 8. Beyond Full-text Searching

The previous chapter was fully dedicated to data analysis and how we can perform it with Elasticsearch. We learned how to use aggregations, what types of aggregation are available, what aggregations are available within each type, and how to use them. In this chapter, we will get back to query-related topics. By the end of this chapter, you will have learned the following topics:

What the percolator is and how to use it

What the geospatial capabilities of Elasticsearch are

How to use and build functionalities using Elasticsearch suggesters

How to use the Scroll API to efficiently fetch large numbers of results

Percolator

Have you ever wondered what would happen if we reversed the traditional model of using queries to find documents in Elasticsearch? Does it make sense to have a document and search for queries matching it? It is not surprising that there is a whole range of solutions where this model is very useful. Whenever you operate on an unbounded stream of input data, where you search for the occurrences of particular events, you can use this approach. This can be used for the detection of failures in a monitoring system or for the “Tell me when a product with the defined criteria will be available in this shop” functionality. In this section, we will look at how the Elasticsearch percolator works and how we can use it to implement one of the aforementioned use cases.

The index

In all the examples used when discussing percolator functionality, we will use an index called notifier. The index is created by using the following command:

curl -XPOST 'localhost:9200/notifier' -d '{

"mappings":{

"book":{

"properties":{

"title":{

"type":"string"

},

"otitle":{

"type":"string"

},

"year":{

"type":"integer"

},

"available":{

"type":"boolean"

},

"tags":{

"type":"string",

"index":"not_analyzed"

}

}

}

}

}'

It is quite simple. It contains a single type and five fields, which we will use throughout this section.

Percolator preparation

Elasticsearch exposes a special type called .percolator that is treated differently. This means that we can store any documents in it and also search them like an ordinary type in any index. If you look at any Elasticsearch query, you will notice that each is a valid JSON document, which means that we can index and store it as a document as well. The thing is that the percolator allows us to invert the search logic and search for queries that match a given document. This is possible because of the two features just discussed: the special .percolator type and the fact that queries in Elasticsearch are valid JSON documents.

Let's get back to the library example from Chapter 2, Indexing Your Data, and try to index one of the queries in the percolator. We assume that our users need to be informed when any book matching the criteria defined by the query is available.

Look at the following query1.json file that contains an example query generated by the user:

{

"query":{

"bool":{

"must":{

"term":{

"title":"crime"

}

},

"should":{

"range":{

"year":{

"gt":1900,

"lt":2000

}

}

},

"must_not":{

"term":{

"otitle":"nothing"

}

}

}

}

}

To enhance the example, we also assume that our users are allowed to define filters using our hypothetical user interface. For example, our user may be interested in the available books that were written before the year 2010. An example query that could have been constructed by such a user interface would look as follows (the query was written to the query2.json file):

{

"query":{

"bool":{

"must":{

"range":{

"year":{

"lt":2010

}

}

},

"filter":{

"term":{

"available":true

}

}

}

}

}

Now, let's register both queries in the percolator (note that we are registering the queries and haven't indexed any documents). In order to do this, we will run the following commands:

curl -XPUT 'localhost:9200/notifier/.percolator/1' -d @query1.json

curl -XPUT 'localhost:9200/notifier/.percolator/old_books' -d @query2.json

In the preceding examples, we used two completely different identifiers. We did that in order to show that we can use an identifier that best describes the query. It is up to us to decide under which name we would like the query to be registered.

We are now ready to use our percolator. Our application will provide documents to the percolator and check if any of the already registered queries match the document. This is exactly what a percolator allows us to do: to reverse the search logic. Instead of indexing the documents and running queries against them, we store the queries and send the documents to find the matching queries.
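
The reversed logic can be pictured with a toy model in Python: the queries are stored first, and each incoming document is checked against all of them. The predicates below are simplified, hypothetical stand-ins for the two registered queries (only the title term of the first query and the year and availability conditions of the second are modelled), and the percolate function is not an Elasticsearch API.

```python
# Registered "queries": identifier -> predicate over a document (a dict).
registered_queries = {
    # Stand-in for query1.json: the title must contain the term "crime".
    "1": lambda doc: "crime" in doc.get("title", "").lower().split(),
    # Stand-in for query2.json: year below 2010 and the book is available.
    "old_books": lambda doc: (
        "year" in doc and doc["year"] < 2010 and doc.get("available") is True
    ),
}

def percolate(doc, queries):
    """Return the identifiers of all registered queries matching the document."""
    return sorted(qid for qid, matches in queries.items() if matches(doc))

book = {"title": "Crime and Punishment", "year": 1886, "available": True}
print(percolate(book, registered_queries))
```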

Let's use an example document that will match both stored queries; it will have the required title and the release date, and will mention whether it is currently available. The command to send such a document to the percolator looks as follows:

curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{

"doc":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

}

}'

As we expected, both queries matched and the Elasticsearch response includes the identifiers of the matching queries. Such a response looks as follows:

{

"took":36,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"total":2,

"matches":[{

"_index":"notifier",

"_id":"old_books"

},{

"_index":"notifier",

"_id":"1"

}]

}

This works like a charm. One very important thing to note is the endpoint used in this query: _percolate. Using this endpoint is required when we want to use the percolator. The index name corresponds to the index where the queries were stored, and the type is equal to the type defined in the mappings.

Note

The response format contains information about the index and the query identifier. This information is included for cases when we search against multiple indices at once. When using a single index, adding an additional query parameter, percolate_format=ids, will change the response as follows:

"matches":["old_books","1"]

Getting deeper

Because the queries registered in a percolator are in fact documents, we can use a normal query sent to Elasticsearch in order to choose which queries stored in the .percolator type should be used in the percolation process. This may sound weird, but it really opens up a lot of possibilities. In our library, we can have several groups of users. Let's assume that some of them have permissions to borrow very rare books, or that we have several branches in the city and the user can declare where he or she would like to get the book from.

Let's see how such use cases can be implemented by using the percolator. To do this, we will need to update our mapping and include the branch information. We do that by running the following command:

curl -XPOST 'localhost:9200/notifier/.percolator/_mapping' -d '{

".percolator":{

"properties":{

"branches":{

"type":"string",

"index":"not_analyzed"

}

}

}

}'

Now, in order to register a query, we use the following command:

curl -XPUT 'localhost:9200/notifier/.percolator/3' -d '{

"query":{

"term":{

"title":"crime"

}

},

"branches":["brA","brB","brD"]

}'

In the preceding example, we registered a query that shows a user's interest. Our hypothetical user is interested in any book with the term crime in the title field (the term query is responsible for this). He or she wants to borrow this book from one of the three listed branches. When specifying the mappings, we defined that the branches field is a non-analyzed string field. We can now include a query along with the document we sent previously. Let's look at how to do this.

Our book system just got the book, and it is ready to report the book and check whether the book is of interest to anyone. To check this, we send the document that describes the book and add an additional query to such a request: the query that will limit the users to only the ones interested in the brB branch. Such a request looks as follows:

curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{

"doc":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

},

"size":10,

"filter":{

"term":{

"branches":"brB"

}

}

}'

If everything was executed correctly, the response returned by Elasticsearch should look as follows (we indexed our query with 3 as an identifier):

{

"took":27,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"total":1,

"matches":[{

"_index":"notifier",

"_id":"3"

}]

}

Controlling the size of returned results

The size of the results matters when it comes to the percolator. The more queries a single document matches, the more results will be returned and the more memory will be needed by Elasticsearch. Because of this, there is one additional thing to note: the size parameter. It allows us to limit the number of matches returned.

Percolator and score calculation

In the previous examples, we filtered our queries using a single term query, but we didn't think about the scoring process at all. Elasticsearch allows us to calculate the score when using the percolator. Let's change the previously used document sent to the percolator and adjust it so that scoring is used:

curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{

"doc":{

"title":"Crime and Punishment",

"otitle":"Преступлéние и наказáние",

"author":"Fyodor Dostoevsky",

"year":1886,

"characters":["Raskolnikov","Sofia Semyonovna Marmeladova"],

"tags":[],

"copies":0,

"available":true

},

"size":10,

"query":{

"term":{

"branches":"brB"

}

},

"track_scores":true,

"sort":{

"_score":"desc"

}

}'

As you can see, we used the query section and included an additional track_scores attribute set to true. This is needed because, by default, Elasticsearch won't calculate the score, for performance reasons. If we need scores in the percolation process, we should be aware that such queries will be slightly more demanding when it comes to CPU processing power than the ones that omit calculating the score.

Note

In the preceding example, we told Elasticsearch to sort our result on the basis of the score in descending order. This is the default behavior when track_scores is turned on, so we can omit the sort declaration. At the time of writing, sorting on score in descending direction is the only available option.

Combining percolators with other functionalities

If we are allowed to use queries along with the documents sent for percolation, why can we not use other Elasticsearch functionalities? Of course, this is possible. For example, the following document is sent along with an aggregation and the results will include the aggregation calculation:

curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{

"doc":{

"title":"Crime and Punishment",

"available":true

},

"aggs":{

"test":{

"terms":{

"field":"branches"

}

}

}

}'

As we can see, the percolator allows us to run both queries and aggregations. Look at the following example document:

curl -XGET 'localhost:9200/notifier/book/_percolate?pretty' -d '{

"doc":{

"title":"Crime and Punishment",

"year":1886,

"available":true

},

"size":10,

"highlight":{

"fields":{

"title":{}

}

}

}'

As you can see, it contains a highlighting section. A fragment of the response returned by Elasticsearch looks as follows:

{

"_index":"notifier",

"_id":"3",

"highlight":{

"title":["<em>Crime</em> and Punishment"]

}

}

Note

There are some limitations when it comes to the query types supported by the percolator functionality. In the current implementation, parent-child relations are not available in the percolator, so you can't use queries such as has_child, top_children, and has_parent.

GettingthenumberofmatchingqueriesSometimesyoudon’tcareaboutthematchedqueriesandyouonlywantthenumberofmatchedqueries.Insuchcases,sendingadocumentagainstthestandardpercolatorendpointisnotefficient.Elasticsearchexposesthe_percolate/countendpointtohandlesuchcasesinanefficientway.Anexampleofsuchacommandfollows:

curl -XGET 'localhost:9200/notifier/book/_percolate/count?pretty' -d '{

"doc":{...}

}'

Indexed document percolation

In the final, closing paragraph of the percolation section, we want to show you one more thing: the possibility of percolating a document that is already indexed. To do this, we need to use the GET operation on the document and provide information about which percolator index should be used. Let's look at the following command:

curl -XGET 'localhost:9200/library/book/1/_percolate?percolate_index=notifier'

This command checks the document with the 1 identifier from our library index against the percolator index defined by the percolate_index parameter. Remember that, by default, Elasticsearch will use the percolator in the same index as the document; that's why we've specified the percolate_index parameter.

Elasticsearch spatial capabilities

Search servers such as Elasticsearch are usually looked at from the perspective of full-text searching. Elasticsearch, because of its marketing as part of ELK (Elasticsearch, Logstash, and Kibana), is also well known for being able to handle large amounts of time series data. However, this is only a part of the whole view. Sometimes both of the mentioned use cases are not enough. Imagine searching for local services. For the end user, the most important thing is the accuracy of the results. By accuracy, we not only mean the proper results of the full-text search, but also the results being as near as they can in terms of location. In several cases, this is the same as a text search on geographical names such as cities or streets, but in other cases we can find it very useful to be able to search on the basis of the geographical coordinates of our indexed documents. This is also a functionality that Elasticsearch is capable of handling.

With the release of Elasticsearch 2.2, the geo_point type received a lot of changes, especially internally, where all the optimizations were done. Prior to 2.2, the geo_point type was stored in the index as two not analyzed string values, and this changed. With the release of Elasticsearch 2.2, the geo_point type got all the great improvements from the Apache Lucene library and is now more efficient.

Mapping preparation for spatial searches

In order to discuss the spatial search functionality, let's prepare an index with a list of cities. This will be a very simple index with one type named poi (which stands for the point of interest), the name of the city, and its coordinates. The mappings are as follows:

{

"mappings":{

"poi":{

"properties":{

"name":{"type":"string"},

"location":{"type":"geo_point"}

}

}

}

}

Assuming that we put this definition into the mapping1.json file, we can create an index by running the following command:

curl -XPUT localhost:9200/map -d @mapping1.json

The only new thing in the preceding mappings is the geo_point type, which is used for the location field. By using it, we can store the geographical position of our city and use spatial-based functionalities.

Example data

Our example documents1.json file with documents looks as follows:

{"index":{"_index":"map","_type":"poi","_id":1}}

{"name":"New York","location":"40.664167,-73.938611"}

{"index":{"_index":"map","_type":"poi","_id":2}}

{"name":"London","location":[-0.1275,51.507222]}

{"index":{"_index":"map","_type":"poi","_id":3}}

{"name":"Moscow","location":{"lat":55.75,"lon":37.616667}}

{"index":{"_index":"map","_type":"poi","_id":4}}

{"name":"Sydney","location":"-33.859972,151.211111"}

{"index":{"_index":"map","_type":"poi","_id":5}}

{"name":"Lisbon","location":"eycs0p8ukc7v"}

In order to perform a bulk request, we added information about the index name, type, and unique identifiers of our documents; so, we can now easily import this data using the following command:

curl -XPOST localhost:9200/_bulk --data-binary @documents1.json
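The bulk file pairs each document with an action line that names the index, type, and identifier. As a minimal sketch, assuming we hold the cities in memory, such a body could be produced as follows. The build_bulk_body helper is hypothetical and written only for illustration; it is not part of Elasticsearch or any client library.

```python
import json


def build_bulk_body(index, doc_type, documents):
    """Build a newline-delimited bulk body from (id, source) pairs.

    Hypothetical helper: every document is preceded by an "index"
    action line, and the body ends with a newline.
    """
    lines = []
    for doc_id, source in documents:
        action = {"index": {"_index": index, "_type": doc_type, "_id": doc_id}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


body = build_bulk_body("map", "poi", [
    (1, {"name": "New York", "location": "40.664167,-73.938611"}),
    (2, {"name": "London", "location": [-0.1275, 51.507222]}),
])
print(body)
```

Keeping one JSON object per line matters here, because the action and source lines are read line by line.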

One thing that we should take a closer look at is the location field. We can use various notations for coordinates. We can provide the latitude and longitude values as a string, as a pair of numbers, or as an object. Note that the string and array methods of providing the geographical location have different orders for the latitude and longitude parameters: the string holds the latitude first, while the array holds the longitude first. The last record shows that there is also a possibility to give coordinates as a Geohash value (the notation is described in detail at http://en.wikipedia.org/wiki/Geohash).
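To make the difference between the notations concrete, the following sketch converts each of them to a (latitude, longitude) pair. It only illustrates the conventions described above; the to_lat_lon and decode_geohash functions are hypothetical helpers, not Elasticsearch code, and the Geohash decoder returns the center of the encoded cell.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # Geohash alphabet


def decode_geohash(geohash):
    """Return the (lat, lon) center of the cell described by a geohash."""
    lat_range = [-90.0, 90.0]
    lon_range = [-180.0, 180.0]
    is_lon = True  # bits alternate, starting with longitude
    for char in geohash:
        value = BASE32.index(char)
        for shift in range(4, -1, -1):
            bit = (value >> shift) & 1
            rng = lon_range if is_lon else lat_range
            mid = (rng[0] + rng[1]) / 2
            if bit:
                rng[0] = mid  # keep the upper half
            else:
                rng[1] = mid  # keep the lower half
            is_lon = not is_lon
    return ((lat_range[0] + lat_range[1]) / 2,
            (lon_range[0] + lon_range[1]) / 2)


def to_lat_lon(location):
    """Normalize the notations used in documents1.json to (lat, lon)."""
    if isinstance(location, dict):           # {"lat": ..., "lon": ...}
        return float(location["lat"]), float(location["lon"])
    if isinstance(location, (list, tuple)):  # [lon, lat]
        lon, lat = location
        return float(lat), float(lon)
    if "," in location:                      # "lat,lon"
        lat, lon = location.split(",")
        return float(lat), float(lon)
    return decode_geohash(location)          # anything else: a geohash


print(to_lat_lon("40.664167,-73.938611"))
print(to_lat_lon([-0.1275, 51.507222]))
print(to_lat_lon({"lat": 55.75, "lon": 37.616667}))
print(to_lat_lon("eycs0p8ukc7v"))
```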

Additional geo_field properties

With the release of Elasticsearch 2.2, the number of parameters that the geo_point type can accept has been reduced and is as follows:

geohash: Boolean parameter telling Elasticsearch whether the .geohash field should be created. Defaults to false unless geohash_prefix is used.

geohash_precision: Maximum size of geohash and geohash_prefix.

geohash_prefix: Boolean parameter telling Elasticsearch to index the geohash and its prefixes. Defaults to false.

ignore_malformed: Boolean parameter telling Elasticsearch to ignore a badly written geo_field point instead of rejecting the whole document. Defaults to false, which means that the badly formatted geo_field data will result in an indexation error for the whole document.

lat_lon: Boolean parameter telling Elasticsearch to index the spatial data in two separate fields called .lat and .lon. Defaults to false.

precision_step: Parameter allowing control over how our numeric geographical points will be indexed.

Keep in mind that the geohash field related and lat_lon field related properties were not removed for backward-compatibility reasons. The users can still use them. However, the queries will not use them but will instead use the highly optimized data structure that is built during indexing by the geo_point type.

Sample queries

Now let's look at several examples of using coordinates and solving common requirements in modern applications that require geographical data searching along with full-text searching.

Note

If you are interested in all the geospatial queries that are available for Elasticsearch users, refer to the official documentation available at https://www.elastic.co/guide/en/elasticsearch/reference/current/geo-queries.html.

Distance-based sorting

Let's start with a very common requirement: sorting the returned results by distance from a given point. In our example, we want to get all the cities and sort them by their distances from the capital of France, Paris. To do this, we send the following query to Elasticsearch:

curl -XGET 'localhost:9200/map/_search?pretty' -d '{

"query":{

"match_all":{}

},

"sort":[{

"_geo_distance":{

"location":"48.8567,2.3508",

"unit":"km"

}

}]

}'

If you remember the Sorting data section from Chapter 4, Extending Your Querying Knowledge, you'll notice that the format is slightly different. We are using the _geo_distance key to indicate sorting by distance. We must give the base location (the location attribute, which holds the location of Paris in our case), and we need to specify the unit in which the distances are reported in the results. Commonly used values are km and mi, which stand for kilometers and miles, respectively. The result of such a query will be as follows:

{

"took":5,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":5,

"max_score":null,

"hits":[{

"_index":"map",

"_type":"poi",

"_id":"2",

"_score":null,

"_source":{

"name":"London",

"location":[-0.1275,51.507222]

},

"sort":[343.17487356850313]

},{

"_index":"map",

"_type":"poi",

"_id":"5",

"_score":null,

"_source":{

"name":"Lisbon",

"location":"eycs0p8ukc7v"

},

"sort":[1452.9506736367805]

},{

"_index":"map",

"_type":"poi",

"_id":"3",

"_score":null,

"_source":{

"name":"Moscow",

"location":{

"lat":55.75,

"lon":37.616667

}

},

"sort":[2483.837565935267]

},{

"_index":"map",

"_type":"poi",

"_id":"1",

"_score":null,

"_source":{

"name":"New York",

"location":"40.664167,-73.938611"

},

"sort":[5832.645958617513]

},{

"_index":"map",

"_type":"poi",

"_id":"4",

"_score":null,

"_source":{

"name":"Sydney",

"location":"-33.859972,151.211111"

},

"sort":[16978.094780773998]

}]

}

}

As with the other examples of sorting, Elasticsearch shows information about the value used for sorting. Let's look at the first record. As we can see, the distance between Paris and London is about 343 km, and if you check a traditional map, you will see that this is true.
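We can cross-check the sort values outside Elasticsearch. The sketch below uses the haversine great-circle formula with a mean Earth radius of 6371 km; both choices are assumptions made for illustration, so the numbers may differ slightly from the ones Elasticsearch reports. The Lisbon coordinates are the ones used later in documents2.json.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; an assumption of this sketch


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometers."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = radians(lat2 - lat1)
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


PARIS = (48.8567, 2.3508)
CITIES = {  # name -> (lat, lon)
    "New York": (40.664167, -73.938611),
    "London": (51.507222, -0.1275),
    "Moscow": (55.75, 37.616667),
    "Sydney": (-33.859972, 151.211111),
    "Lisbon": (38.736946, -9.142685),
}


def sort_by_distance(origin, cities):
    """Return (name, distance_km) pairs ordered from nearest to farthest."""
    pairs = ((name, haversine_km(*origin, *coords))
             for name, coords in cities.items())
    return sorted(pairs, key=lambda pair: pair[1])


for name, distance in sort_by_distance(PARIS, CITIES):
    print(name, round(distance, 1))
```

The ordering matches the response shown earlier, with London as the nearest city.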

Bounding box filtering

The next example that we want to show is narrowing down the results to a selected area that is bounded by a given rectangle. This is very handy if we want to show results on the map or when we allow a user to mark the map area for searching. You already read about filters in the Filtering your results section of Chapter 4, Extending Your Querying Knowledge, but there we didn't mention spatial filters. The following query shows how we can filter by using the bounding box:

curl -XGET 'localhost:9200/map/_search?pretty' -d '{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_bounding_box":{

"location":{

"top_left":"52.4796,-1.903",

"bottom_right":"48.8567,2.3508"

}

}

}

}

}

}'

In the preceding example, we selected a map fragment between Birmingham and Paris by providing the top-left and bottom-right corner coordinates. These two corners are enough to specify any rectangle we want, and Elasticsearch will do the rest of the calculation for us. The following screenshot shows the specified rectangle on the map:

As we can see, the only city from our data that meets the criteria is London. So, let's check whether Elasticsearch knows this by running the preceding query. Let's now look at the returned results:

{

"took":38,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"map",

"_type":"poi",

"_id":"2",

"_score":1.0,

"_source":{

"name":"London",

"location":[-0.1275,51.507222]

}

}]

}

}

As you can see, again Elasticsearch agrees with the map.
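The bounding box test itself is a pair of range comparisons. The following sketch applies it to our five cities; the in_bounding_box function is a hypothetical helper that assumes the box does not cross the 180th meridian, and it is not how Elasticsearch evaluates the filter internally.

```python
def in_bounding_box(lat, lon, top_left, bottom_right):
    """Check whether a point lies inside a box given as (lat, lon) corners."""
    top, left = top_left
    bottom, right = bottom_right
    return bottom <= lat <= top and left <= lon <= right


CITIES = {  # name -> (lat, lon)
    "New York": (40.664167, -73.938611),
    "London": (51.507222, -0.1275),
    "Moscow": (55.75, 37.616667),
    "Sydney": (-33.859972, 151.211111),
    "Lisbon": (38.736946, -9.142685),
}

TOP_LEFT = (52.4796, -1.903)       # Birmingham
BOTTOM_RIGHT = (48.8567, 2.3508)   # Paris

matches = [name for name, (lat, lon) in CITIES.items()
           if in_bounding_box(lat, lon, TOP_LEFT, BOTTOM_RIGHT)]
print(matches)
```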

Limiting the distance

The last example shows the next common requirement: limiting the results to the places that are located no further than the defined distance from a given point. For example, if we want to limit our results to all the cities within the 500km radius from Paris, we can use the following query:

curl -XGET 'localhost:9200/map/_search?pretty' -d '{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_distance":{

"location":"48.8567,2.3508",

"distance":"500km"

}

}

}

}

}'

If everything goes well, Elasticsearch should only return a single record for the preceding query, and the record should be London again. However, we will leave it for you as a reader to check.

Arbitrary geo shapes

Sometimes, using a single geographical point or a single rectangle is just not enough. In such cases something more sophisticated is needed, and Elasticsearch addresses this by giving you the possibility to define shapes. In order to show you how we can leverage custom shape-limiting in Elasticsearch, we need to modify our index or create a new one and introduce the geo_shape type. Our new mapping looks as follows (we will use this to create an index called map2):

{

"mappings":{

"poi":{

"properties":{

"name":{"type":"string","index":"not_analyzed"},

"location":{"type":"geo_shape"}

}

}

}

}

Assuming we wrote the preceding mapping definition to the mapping2.json file, we can create an index by using the following command:

curl -XPUT localhost:9200/map2 -d @mapping2.json

Note

Elasticsearch allows us to set several attributes for the geo_shape type. The most commonly used is the precision parameter. During indexing, the shapes have to be converted to a set of terms. The more accuracy required, the more terms should be generated, which is directly reflected in the index size and performance. Precision can be defined in the following units: in, inch, yd, yard, mi, miles, km, kilometers, m, meters, cm, centimeters, or mm, millimeters. By default, the precision is set to 50m.

Next, let's change our example data to match our new index structure and create the documents2.json file with the following contents:

{"index":{"_index":"map2","_type":"poi","_id":1}}
{"name":"New York","location":{"type":"point","coordinates":[-73.938611,40.664167]}}
{"index":{"_index":"map2","_type":"poi","_id":2}}
{"name":"London","location":{"type":"point","coordinates":[-0.1275,51.507222]}}
{"index":{"_index":"map2","_type":"poi","_id":3}}
{"name":"Moscow","location":{"type":"point","coordinates":[37.616667,55.75]}}
{"index":{"_index":"map2","_type":"poi","_id":4}}
{"name":"Sydney","location":{"type":"point","coordinates":[151.211111,-33.865143]}}
{"index":{"_index":"map2","_type":"poi","_id":5}}
{"name":"Lisbon","location":{"type":"point","coordinates":[-9.142685,38.736946]}}

The structure of the field of the geo_shape type is different from geo_point. It follows a syntax called GeoJSON (http://en.wikipedia.org/wiki/GeoJSON). It allows us to define various geographical types. Now it's time to index our data:

curl -XPOST localhost:9200/_bulk --data-binary @documents2.json

Let's sum up the types that we can use during querying, at least the ones that we think are the most useful ones.

Point

A point is defined by an array in which the first element is the longitude and the second is the latitude. An example of such a shape is as follows:

{

"type":"point",

"coordinates":[-0.1275,51.507222]

}

Envelope

An envelope defines a box given by the coordinates of the upper-left and bottom-right corners of the box. An example of such a shape is as follows:

{

"type":"envelope",

"coordinates":[[-0.087890625,51.50874245880332],[2.4169921875,48.80686346108517]]

}

Polygon

A polygon is defined by a list of points that are connected to create the shape. The first and the last point in the array must be the same so that the shape is closed. An example of such a shape is as follows:

{

"type":"polygon",

"coordinates":[[

[-5.756836,49.991408],

[-7.250977,55.124723],

[1.845703,51.500194],

[-5.756836,49.991408]

]]

}

If you look closely at the shape definition, you will find a supplementary level of arrays. Thanks to this, you can define more than a single polygon. In such a case, the first polygon defines the base shape and the rest of the polygons are the shapes that will be excluded from the base shape.

Multipolygon

The multipolygon shape allows us to create a shape that consists of multiple polygons. An example of such a shape is as follows:

{

"type":"multipolygon",

"coordinates":[

[[

[-5.756836,49.991408],

[-7.250977,55.124723],

[1.845703,51.500194],

[-5.756836,49.991408]

]],[[

[-0.087890625,51.50874245880332],

[2.4169921875,48.80686346108517],

[3.88916015625,51.01375465718826],

[-0.087890625,51.50874245880332]

]]]

}

The multipolygon shape contains multiple polygons and follows the same rules as the polygon type. So, we can have multiple polygons and, in addition to this, we can include multiple exclusion shapes.

An example usage

Now that we have our index with the geo_shape fields, we can check which cities are located in the UK. The query that will allow us to do this looks as follows:

curl -XGET 'localhost:9200/map2/_search?pretty' -d '{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_shape":{

"location":{

"shape":{

"type":"polygon",

"coordinates":[[

[-5.756836,49.991408],[-7.250977,55.124723],

[-3.955078,59.352096],[1.845703,51.500194],

[-5.756836,49.991408]

]]

}

}

}

}

}

}

}'

The polygon type defines the boundaries of the UK (in a very, very imprecise way), and Elasticsearch's response is as follows:

{

"took":7,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":1,

"max_score":1.0,

"hits":[{

"_index":"map2",

"_type":"poi",

"_id":"2",

"_score":1.0,

"_source":{

"name":"London",

"location":{

"type":"point",

"coordinates":[-0.1275,51.507222]

}

}

}]

}

}

As far as we know, the response is correct.
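We can double-check this result with a simple point-in-polygon test. The sketch below uses the classic ray casting method on plain longitude and latitude values, treating them as flat coordinates. This is an assumption that is good enough for a rough polygon like ours, and it is not the term-based matching that Elasticsearch performs for geo_shape fields. The point_in_polygon function is a hypothetical helper.

```python
def point_in_polygon(lon, lat, ring):
    """Ray casting test for a closed ring of [lon, lat] points.

    A horizontal ray is cast from the point towards increasing longitude;
    an odd number of edge crossings means the point is inside.
    """
    inside = False
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        if (y1 > lat) != (y2 > lat):  # the edge spans the point's latitude
            x_cross = x1 + (lat - y1) * (x2 - x1) / (y2 - y1)
            if lon < x_cross:
                inside = not inside
    return inside


UK_RING = [  # same rough outline as in the query; first point == last point
    [-5.756836, 49.991408], [-7.250977, 55.124723],
    [-3.955078, 59.352096], [1.845703, 51.500194],
    [-5.756836, 49.991408],
]

CITIES = {  # name -> [lon, lat], GeoJSON order
    "New York": [-73.938611, 40.664167],
    "London": [-0.1275, 51.507222],
    "Moscow": [37.616667, 55.75],
    "Sydney": [151.211111, -33.865143],
    "Lisbon": [-9.142685, 38.736946],
}

in_uk = [name for name, (lon, lat) in CITIES.items()
         if point_in_polygon(lon, lat, UK_RING)]
print(in_uk)
```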

Storing shapes in the index

Usually, shape definitions are complex, and the defined areas don't change too often (for example, the boundaries of the UK). In such cases, it is convenient to define the shapes in the index and use them in queries. This is possible, and we will now discuss how to do it. As usual, we will start with the appropriate mapping, which is as follows:

{

"mappings":{

"country":{

"properties":{

"name":{"type":"string","index":"not_analyzed"},

"area":{"type":"geo_shape"}

}

}

}

}

This mapping is similar to the mapping used previously. We have only changed the type and field names, and we saved it in the mapping3.json file. Let's create a new index by running the following command:

curl -XPUT localhost:9200/countries -d @mapping3.json

The example data that we will use looks as follows (stored in the file called documents3.json):

{"index":{"_index":"countries","_type":"country","_id":1}}
{"name":"UK","area":{"type":"polygon","coordinates":[[[-5.756836,49.991408],[-7.250977,55.124723],[-3.955078,59.352096],[1.845703,51.500194],[-5.756836,49.991408]]]}}
{"index":{"_index":"countries","_type":"country","_id":2}}
{"name":"France","area":{"type":"polygon","coordinates":[[[3.1640625,42.09822241118974],[-1.7578125,43.32517767999296],[-4.21875,48.22467264956519],[2.4609375,50.90303283111257],[7.998046875,48.980216985374994],[7.470703125,44.08758502824516],[3.1640625,42.09822241118974]]]}}
{"index":{"_index":"countries","_type":"country","_id":3}}
{"name":"Spain","area":{"type":"polygon","coordinates":[[[3.33984375,42.22851735620852],[-1.845703125,43.32517767999296],[-9.404296875,43.19716728250127],[-6.6796875,41.57436130598913],[-7.3828125,36.87962060502676],[-2.109375,36.52729481454624],[3.33984375,42.22851735620852]]]}}

To index the data, we just need to run the following command:

curl -XPOST localhost:9200/_bulk --data-binary @documents3.json

As you can see in the data, each document contains a polygon type. The polygons define the area of the given countries (again, it is far from being accurate). If you remember, the first point of a shape needs to be the same as the last one so that the shape is closed. Now, let's change our query to include the shapes from the index. Our new query looks as follows:

curl -XGET 'localhost:9200/map2/_search?pretty' -d '{

"query":{

"bool":{

"must":{"match_all":{}},

"filter":{

"geo_shape":{

"location":{

"indexed_shape":{

"index":"countries",

"type":"country",

"path":"area",

"id":"1"

}

}

}

}

}

}

}'

When comparing these two queries, we can note that the shape object changed to indexed_shape. We need to tell Elasticsearch where to look for this shape. We can do this by defining the index (the index property, which defaults to shapes), the type (the type property), and the path (the path property, which defaults to shape). The one remaining item is the id property of the shape. In our case, this is 1. However, if you want to index more shapes, we advise you to index the shapes with their name as their identifier.

Using suggesters

A long time ago, starting from Elasticsearch 0.90 (which was released on April 29, 2013), we got the ability to use so-called suggesters. We can define a suggester as a functionality allowing us to correct the user's spelling mistakes and build autocomplete functionality keeping performance in mind. This section is dedicated to these functionalities and will help you learn about them. We will discuss each available suggester type and show the most common properties that allow us to control them. However, keep in mind that this section is not a comprehensive guide describing each and every property. A description of all the details about suggesters is a very broad topic and is out of the scope of this book. If you want to dig into their functionality, refer to the official Elasticsearch documentation (https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters.html) or to the Mastering Elasticsearch Second Edition book published by Packt Publishing.

Available suggester types

These have changed since the initial introduction of the Suggest API to Elasticsearch. We are now able to use four types of suggesters:

term: A suggester returning corrections for each word passed to it. Useful for suggestions that are not phrases, such as single term queries.

phrase: A suggester working on phrases, returning a proper phrase.

completion: A suggester designed to provide fast and efficient autocomplete results.

context: Extension to the Suggest API of Elasticsearch. Allows us to handle parts of the suggest queries in memory and is thus very effective in terms of performance.

Including suggestions

Let's now try getting suggestions along with the query results. For example, let's use a match_all query and try getting a suggestion for a serlock holnes phrase, which has two terms spelled incorrectly. To do this, we run the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"suggest":{

"first_suggestion":{

"text":"serlock holnes",

"term":{

"field":"_all"

}

}

}

}'

As you can see, we've introduced a new section to our query: the suggest one. We've specified the text we want to get the correction for by using the text property. We've specified the suggester we want to use (the term one) and configured it by specifying, with the field property, the name of the field that should be used for building suggestions. first_suggestion is the name we give to our suggester; we need to do this because multiple suggesters can be used in a single request. This is how you send a request for suggestions in general.

If we want to get multiple suggestions for the same text, we can embed our suggestions in the suggest object and place the text property directly in the suggest object. For example, if we want to get suggestions for the serlock holnes text for the title field and for the _all field, we run the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"query":{

"match_all":{}

},

"suggest":{

"text":"serlock holnes",

"first_suggestion":{

"term":{

"field":"_all"

}

},

"second_suggestion":{

"term":{

"field":"title"

}

}

}

}'

Suggester response

Now let's look at the response of the first query we sent. As you can guess, the response includes both the query results and the suggestions:

{

"took":10,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":1.0,

"hits":[...]

},

"suggest":{

"first_suggestion":[{

"text":"serlock",

"offset":0,

"length":7,

"options":[{

"text":"sherlock",

"score":0.85714287,

"freq":1

}]

},{

"text":"holnes",

"offset":8,

"length":6,

"options":[{

"text":"holmes",

"score":0.8333333,

"freq":1

}]

}]

}

}

We can see that we got both the search results and the suggestions (we've omitted the results to make the example more readable) in the response. The term suggester returned a list of possible suggestions for each term that was present in the text parameter. For each term, the term suggester returns an array of possible suggestions. Looking at the data returned for the serlock term, we can see the original word (the text parameter), its offset in the original text parameter (the offset parameter), and its length (the length parameter).

The options array contains suggestions for the given word and will be empty if Elasticsearch doesn't find any suggestions. Each entry in this array is a suggestion and is described by the following properties:

text: Text of the suggestion.

score: Suggestion score; the higher the score, the better the suggestion.

freq: Frequency of the suggestion. The frequency represents how many times the word appears in the documents in the index we are running the suggestion query against.
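On the client side, the offset and length values let us splice the best option back into the original text. The following sketch does that for a response shaped like the one shown earlier; the apply_suggestions function is a hypothetical helper, and picking the option with the highest score is only one possible policy.

```python
def apply_suggestions(original, entries):
    """Replace each term with its best-scoring option, if there is one.

    `entries` is the list returned under a suggestion name, with the
    text, offset, length, and options keys described above.
    """
    corrected = original
    # Work from the end of the text so that earlier offsets stay valid.
    for entry in sorted(entries, key=lambda e: e["offset"], reverse=True):
        if not entry["options"]:
            continue  # nothing to suggest for this term
        best = max(entry["options"], key=lambda option: option["score"])
        start = entry["offset"]
        end = start + entry["length"]
        corrected = corrected[:start] + best["text"] + corrected[end:]
    return corrected


first_suggestion = [
    {"text": "serlock", "offset": 0, "length": 7,
     "options": [{"text": "sherlock", "score": 0.85714287, "freq": 1}]},
    {"text": "holnes", "offset": 8, "length": 6,
     "options": [{"text": "holmes", "score": 0.8333333, "freq": 1}]},
]

print(apply_suggestions("serlock holnes", first_suggestion))
```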

Term suggester

The term suggester works on the basis of string edit distance. This means that the suggestion with the fewest characters that need to be changed, added, or removed to make the suggestion look like the original word is the best one. For example, let's take the words worl and work. To change the worl term to work, we need to change the l letter to k, so it means a distance of 1. The text provided to the suggester is of course analyzed, and then terms are chosen to be suggested.
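The edit distance described above can be computed with the well-known Levenshtein dynamic programming algorithm. The sketch below is a plain Levenshtein implementation written for illustration; it is an assumption that it approximates the idea, and it is not the exact string distance implementation that Elasticsearch uses internally.

```python
def edit_distance(source, target):
    """Levenshtein distance: the minimum number of single-character
    insertions, deletions, and substitutions turning source into target."""
    previous = list(range(len(target) + 1))
    for i, source_char in enumerate(source, start=1):
        current = [i]
        for j, target_char in enumerate(target, start=1):
            substitution = previous[j - 1] + (source_char != target_char)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


print(edit_distance("worl", "work"))
print(edit_distance("serlock", "sherlock"))
print(edit_distance("holnes", "holmes"))
```

Each of the three pairs differs by a single edit, which is why the suggestions from the earlier response fall within the default limits.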

Term suggester configuration options

The common and most used term suggester options can be used for all the suggester implementations that are based on the term one. Currently, these are the phrase suggester and of course the base term one. The available options are:

text: The text we want to get the suggestions for. This parameter is required in order for the suggester to work.

field: Another required parameter that we need to provide. The field parameter allows us to set which field the suggestions should be generated for.

analyzer: The name of the analyzer which should be used to analyze the text provided in the text parameter. If not set, Elasticsearch utilizes the analyzer used for the field provided by the field parameter.

size: Defaults to 5 and specifies the maximum number of suggestions allowed to be returned by each term provided in the text parameter.

suggest_mode: Controls which suggestions will be included and for what terms the suggestions will be returned. The possible options are: missing, the default behavior, which means that the suggester will only provide suggestions for terms that are not present in the index; popular, which means that the suggestions will only be returned when they are more frequent than the provided term; and finally always, which means that suggestions will be returned every time.

sort: Allows us to specify how the suggestions are sorted in the result returned by Elasticsearch. By default, it is set to score, which tells Elasticsearch that the suggestions should be sorted by the suggestion score first, the suggestion document frequency next, and finally by the term. The second possible value is frequency, which means that the results are first sorted by the document frequency, then by the score, and finally by the term.

Additional term suggester options

In addition to the preceding common term suggest options, Elasticsearch allows us to use additional ones that only make sense for the term suggester itself. Some of these options are as follows:

lowercase_terms: When set to true, it tells Elasticsearch to lowercase all the terms that are produced from the text field after analysis.

max_edits: It defaults to 2 and specifies the maximum edit distance that the suggestion can have to be returned as a term suggestion. Elasticsearch allows us to set this value to 1 or 2.

prefix_len: By default, it is set to 1. If we are struggling with suggester performance, increasing this value will improve the overall performance, because fewer suggestions will need to be processed.

min_word_len: It defaults to 4 and specifies the minimum number of characters a suggestion must have in order to be returned on the suggestions list.

shard_size: It defaults to the value specified by the size parameter and allows us to set the maximum number of suggestions that should be read from each shard. Setting this property to values higher than the size parameter can result in more accurate document frequency at the cost of degradation in suggester performance.

Note

The provided list of parameters does not contain all the options that are available for the term suggester. Refer to the official Elasticsearch documentation for reference, at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-term.html.

Phrase suggester

The term suggester provides a great way to correct user spelling mistakes on a per-term basis, but it is not great for phrases. That's why the phrase suggester was introduced. It is built on top of the term suggester, but adds phrase calculation logic to it.

Let's start with an example of how to use the phrase suggester. This time we will omit the query section in our query. We do that by running the following command:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{

"suggest":{

"text":"sherlock holnes",

"our_suggestion":{

"phrase":{"field":"_all"}

}

}

}'

As you can see in the preceding command, it is almost the same as what we sent when using the term suggester, but instead of specifying the term suggester type, we've specified the phrase type. The response to the preceding command is as follows:

{

"took":24,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

"max_score":1.0,

"hits":[...]

},

"suggest":{

"our_suggestion":[{

"text":"sherlock holnes",

"offset":0,

"length":15,

"options":[{

"text":"sherlock holmes",

"score":0.12227806

}]

}]

}

}

As you can see, the response is very similar to the one returned by the term suggester but, instead of a single word being returned, it is already combined and returned as a phrase.

Configuration

Because the phrase suggester is based on the term suggester, it can also use some of the configuration options provided by it. Those options are: text, size, analyzer, and shard_size. In addition to the mentioned properties, the phrase suggester exposes additional options. Some of these options are:

max_errors: Specifies the maximum number (or percentage) of terms that can be erroneous in order to create a correction using it. The value of this property can be either an integer number, such as 1, or a float between 0 and 1 which will be treated as a percentage value. By default, it is set to 1, which means that at most a single term can be misspelled in a given correction.

separator: Defaults to a whitespace character and specifies the separator that will be used to divide the terms in the resulting bigram field.

Note

The provided list of parameters does not contain all the options that are available for the phrase suggester. In fact, the list is way more extensive than what we've provided. Refer to the official Elasticsearch documentation for reference, at https://www.elastic.co/guide/en/elasticsearch/reference/current/search-suggesters-phrase.html, or to Mastering Elasticsearch Second Edition published by Packt Publishing.

Completion suggester

The completion suggester allows us to create autocomplete functionality in a very performance-effective way, because it stores complicated structures in the index instead of calculating them during query time. We need to prepare Elasticsearch for that by using a dedicated field type called completion. Let's assume that we want to create an autocomplete feature to allow us to show book authors. In addition to the author's name, we want to return the identifiers of the books she/he wrote. We start with creating the authors index by running the following command:

curl -XPOST 'localhost:9200/authors' -d '{

"mappings":{

"author":{

"properties":{

"name":{"type":"string"},

"ac":{

"type":"completion",

"payloads":true,

"analyzer":"standard",

"search_analyzer":"standard"

}

}

}

}

}'

Our index will contain a single type called author. Each document will have two fields: the name and the ac field, which is the field we will use for autocomplete. We've defined the ac field using the completion type. In addition to that, we've used the standard analyzer for both the index and the query time. The last thing is the payload, the additional, optional information we will return along with the suggestion; in our case, it will be an array of book identifiers.

Indexing data

To index the data, we need to provide some additional information along with what we usually provide during indexing. Let's look at the following commands that index two documents describing the authors:

curl -XPOST 'localhost:9200/authors/author/1' -d '{
  "name":"Fyodor Dostoevsky",
  "ac":{
    "input":["fyodor","dostoevsky"],
    "output":"Fyodor Dostoevsky",
    "payload":{"books":["123456","123457"]}
  }
}'

curl -XPOST 'localhost:9200/authors/author/2' -d '{
  "name":"Joseph Conrad",
  "ac":{
    "input":["joseph","conrad"],
    "output":"Joseph Conrad",
    "payload":{"books":["121211"]}
  }
}'

Note the structure of the data for the ac field. We have provided the input, output, and payload properties. The optional payload property is used to provide the additional information that will be returned. The input property is used to provide the input information that will be used for building the completion used by the suggester. It will be used for user input matching. The optional output property is used to tell the suggester which data should be returned for the document.

We can also omit the additional parameters section and index data in the way we are used to, just like in the following example:

curl -XPOST 'localhost:9200/authors/author/1' -d '{
  "name":"Fyodor Dostoevsky",
  "ac":"Fyodor Dostoevsky"
}'

However, because the completion suggester uses an FST (finite state transducer) under the hood and matches from the beginning of the input, we won't be able to find the preceding document by starting with the second part of the ac field. That's why we think that indexing the data in the way we showed first is more convenient, because we can explicitly control what we want to match and what we want to show as an output.
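The difference between the two ways of indexing can be shown with a small in-memory model of prefix matching. This is only a sketch under loud assumptions: the CompletionIndex class is hypothetical, it scans a plain list instead of using an FST, and lowercasing the text stands in for the analysis done by the standard analyzer.

```python
class CompletionIndex:
    """Toy model of prefix completion with inputs, output, and weight."""

    def __init__(self):
        self._entries = []  # (input, output, weight, payload)

    def add(self, inputs, output=None, weight=1, payload=None):
        if isinstance(inputs, str):
            inputs = [inputs]  # plain value: a single input, output == input
        for text in inputs:
            self._entries.append((text.lower(), output or text, weight, payload))

    def suggest(self, prefix, size=5):
        prefix = prefix.lower()
        best = {}  # output -> (weight, payload), keeping the highest weight
        for text, output, weight, payload in self._entries:
            if text.startswith(prefix):
                if output not in best or weight > best[output][0]:
                    best[output] = (weight, payload)
        ranked = sorted(best.items(), key=lambda item: (-item[1][0], item[0]))
        return [{"text": out, "score": float(w), "payload": p}
                for out, (w, p) in ranked[:size]]


explicit = CompletionIndex()
explicit.add(["fyodor", "dostoevsky"], output="Fyodor Dostoevsky",
             weight=30, payload={"books": ["123456", "123457"]})

plain = CompletionIndex()
plain.add("Fyodor Dostoevsky")

print(explicit.suggest("dost"))  # matches through the second input
print(plain.suggest("dost"))     # no match: the prefix is not at the start
print(plain.suggest("fyo"))
```

With explicit inputs, either part of the name can start a match, while the plain value can only be completed from its first characters.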

Querying indexed completion suggester data

If we want to find documents that have authors starting with fyo, we run the following command:

curl -XGET 'localhost:9200/authors/_suggest?pretty' -d '{

"authorsAutocomplete":{

"text":"fyo",

"completion":{

"field":"ac"

}

}

}'

Before we look at the results, let's discuss the query. As you can see, we've sent the command to the _suggest endpoint, because we don't want to run a standard query; we are just interested in the autocomplete results. The query is quite simple. We set its name to authorsAutocomplete, we set the text we want to get the completion for (the text property), and we added the completion object with the configuration in it. The result of the preceding command looks as follows:

{

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"authorsAutocomplete":[{

"text":"fyo",

"offset":0,

"length":3,

"options":[{

"text":"Fyodor Dostoevsky",

"score":1.0,

"payload":{

"books":["123456","123457"]

}

}]

}]

}

As you can see in the response, we get the document we were looking for along with the payload information, if it is available (in the preceding response, it is the books array).

We can also use fuzzy searches, which allow us to tolerate spelling mistakes. We do that by including the additional fuzzy section in our query. For example, to enable fuzzy matching in the completion suggester and set the maximum edit distance to 2 (which means that a maximum of two errors is allowed), we send the following query:

curl -XGET 'localhost:9200/authors/_suggest?pretty' -d '{

"authorsAutocomplete":{

"text":"fio",

"completion":{

"field":"ac",

"fuzzy":{

"edit_distance":2

}

}

}

}'

Although we've made a spelling mistake, we will still get the same results as we got earlier.

Custom weights

By default, the term frequency is used to determine the weight of the document returned by the prefix suggester. However, this may not be the best solution. In such cases, it is useful to define the weight of the suggestion by specifying the weight property for the field defined as completion. The weight property should be set to an integer value. The higher the weight property value, the more important the suggestion. For example, if we want to specify a weight for the first document in our example, we run the following command:

curl -XPOST 'localhost:9200/authors/author/1' -d '{
  "name":"Fyodor Dostoevsky",
  "ac":{
    "input":["fyodor","dostoevsky"],
    "output":"Fyodor Dostoevsky",
    "payload":{"books":["123456","123457"]},
    "weight":30
  }
}'

Now if we run our example query, the results will be as follows:

{
  ...
  "authorsAutocomplete": [{
    "text": "fyo",
    "offset": 0,
    "length": 3,
    "options": [{
      "text": "Fyodor Dostoevsky",
      "score": 30.0,
      "payload": {
        "books": ["123456", "123457"]
      }
    }]
  }]
}

Look how the score of the result changed. In our initial example, it was 1.0 and now it is 30.0. This is because we set the weight parameter to 30 during indexing.

Context suggester

The context suggester is an extension to the Elasticsearch Suggest API for Elasticsearch 2.1 and older versions that we just discussed. When describing the completion suggester for Elasticsearch 2.1, we mentioned that this suggester allows us to handle suggester-related searches entirely in memory. Using this suggester, we can define the so-called context for the query that will limit the suggestions to a subset of documents. Because we define the context in the mappings, it is calculated during indexing, which makes query-time calculations easier and less demanding in terms of performance.

Note

Remember that this section is related to Elasticsearch 2.1. Contexts in Elasticsearch 2.2 are handled differently and were covered when discussing the completion suggester.

Context types

Elasticsearch 2.1 supports two types of context: category and geo. The category type of context allows us to assign a document to one or more categories during index time. Later, during query time, we can tell Elasticsearch which category we are interested in and Elasticsearch will limit the suggestions to those categories. The geo context allows us to limit the documents returned by the suggesters to a given location or to a certain distance from a point. The nice thing about context is that we can have multiple contexts. For example, we can have both the category context and the geo context for the same document. Let’s now see what we need to do to use context in suggestions.

Using context

Using the geo and category context is very similar; they just differ in parameters. We will show you how to use contexts in an example using the simpler category context and later we will get back to the geo context and show you what we need to provide.

The first step when using the context suggester is creating a proper mapping. Let’s get back to our author mapping, but this time let’s assume that each author can be given one or more categories: the brand of books she or he is writing. This will be our context. The mappings using the context look as follows:

curl -XPOST 'localhost:9200/authors_context' -d '{

"mappings":{

"author":{

"properties":{

"name":{"type":"string"},

"ac":{

"type":"completion",

"analyzer":"simple",

"search_analyzer":"simple",

"context":{

"brand":{

"type":"category",

"default":["none"]

}

}

}

}

}

}

}'

We’ve introduced a new section in our ac field definition: context. Each context is given a name, which is brand in our case, and inside that object we provide the configuration. We need to provide the type using the type property; we will be using the category context now. In addition to that, we’ve set the default array, which provides the value or values that should be used as the default context. If we want, we can also provide the path property, which will point Elasticsearch to a field in the documents from which the context value should be taken.

We can now index a single author by modifying the commands we used earlier, because we need to provide the context:

curl -XPOST 'localhost:9200/authors_context/author/1' -d '{
  "name": "Fyodor Dostoevsky",
  "ac": {
    "input": "Fyodor Dostoevsky",
    "context": {
      "brand": "drama"
    }
  }
}'

As you can see, the ac field definition is a bit different now; it is an object. The input property is used to provide the value for autocomplete and the context object is used to provide the values for each of the contexts defined in the mappings.

Finally, we can query the data. As you could imagine, we will again provide the context we are interested in. The query that does that looks as follows:

curl -XGET 'localhost:9200/authors_context/_suggest?pretty' -d '{

"authorsAutocomplete":{

"text":"fyo",

"completion":{

"field":"ac",

"context":{

"brand":"drama"

}

}

}

}'

As you can see, we’ve included the context object in the query inside the completion section and we’ve set the context we are interested in using the context name. The response returned by Elasticsearch is as follows:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "authorsAutocomplete": [{
    "text": "fyo",
    "offset": 0,
    "length": 3,
    "options": [{
      "text": "Fyodor Dostoevsky",
      "score": 1.0
    }]
  }]
}

However, if we change the brand context to comedy, for example, Elasticsearch will return no results, because we don’t have authors with such a context. Let’s test it by running the following query:

curl -XGET 'localhost:9200/authors_context/_suggest?pretty' -d '{

"authorsAutocomplete":{

"text":"fyo",

"completion":{

"field":"ac",

"context":{

"brand":"comedy"

}

}

}

}'

This time Elasticsearch returns the following response:

{

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"authorsAutocomplete":[{

"text":"fyo",

"offset":0,

"length":3,

"options":[]

}]

}

This is because no author with the brand context set to comedy is present in the authors_context index.
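The filtering behavior we have just observed can be modeled with a few lines of Python. This is only a toy in-memory model, assuming a simple case-insensitive prefix match; the suggestions list and the suggest function are hypothetical names, and Elasticsearch resolves contexts inside its own suggester structures rather than by scanning a list.

```python
# Toy model: one indexed suggestion with its category context values.
suggestions = [
    {"input": "Fyodor Dostoevsky", "context": {"brand": ["drama"]}},
]


def suggest(prefix, brand):
    """Return inputs that start with the prefix and carry the given brand."""
    prefix = prefix.lower()
    return [
        entry["input"]
        for entry in suggestions
        if entry["input"].lower().startswith(prefix)
        and brand in entry["context"]["brand"]
    ]


drama_hits = suggest("fyo", "drama")    # the author is in this context
comedy_hits = suggest("fyo", "comedy")  # no author has this context
```

The same prefix yields a hit for drama and an empty list for comedy, which mirrors the two responses shown earlier.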

Using the geo location context

The geo context is similar to the category context when it comes to using it. However, instead of filtering by terms, we filter using geographical points and distances. When we use the geo context, we need to provide precision, which defines the precision of the calculated geohash. The second property that we provide is neighbors, which can be set to true or false. By default, it is set to true, which means that the neighboring geohashes will be included in the context.

In addition to that, similar to the category context, we can provide path, which specifies which field to use as the lookup for the geographical point, and the default property, specifying the default geo point for the documents.

For example, let’s assume that we want to filter on the birth place of our authors. The mappings for such a suggester will look as follows:

curl -XPOST 'localhost:9200/authors_geo_context' -d '{

"mappings":{

"author":{

"properties":{

"name":{"type":"string"},

"ac":{

"type":"completion",

"analyzer":"simple",

"search_analyzer":"simple",

"context":{

"birth_location":{

"type":"geo",

"precision":["1000km"],

"neighbors":true,

"default":{

"lat":0.0,

"lon":0.0

}

}

}

}

}

}

}

}'

Now we can index the documents and provide the birth location. For our example author, it will look as follows (the centre of Moscow):

curl -XPOST 'localhost:9200/authors_geo_context/author/1' -d '{
  "name": "Fyodor Dostoevsky",
  "ac": {
    "input": "Fyodor Dostoevsky",
    "context": {
      "birth_location": {
        "lat": 55.75,
        "lon": 37.61
      }
    }
  }
}'

As you can see, we’ve provided the birth_location context for our author.

Now during query time, we need to provide the context that we are interested in and we can (but we are not obligated to) provide the precision as a subset of the precision values provided in the mappings. We’ve defined the precision as 1000km, so let’s find all the authors starting with fyo that were born in Kazan, which is about 800 km from Moscow. We should find our example author.

The query that does that looks as follows:

curl -XGET 'localhost:9200/authors_geo_context/_suggest?pretty' -d '{

"authorsAutocomplete":{

"text":"fyo",

"completion":{

"field":"ac",

"context":{

"birth_location":{

"lat":55.45,

"lon":49.8

}

}

}

}

}'

The response returned by Elasticsearch looks as follows:

{
  "_shards": {
    "total": 5,
    "successful": 5,
    "failed": 0
  },
  "authorsAutocomplete": [{
    "text": "fyo",
    "offset": 0,
    "length": 3,
    "options": [{
      "text": "Fyodor Dostoevsky",
      "score": 1.0
    }]
  }]
}

However, if we run the same query but point to a distant location (latitude 0.0, longitude 0.0, which lies in the Gulf of Guinea), we will get no results:

curl -XGET 'localhost:9200/authors_geo_context/_suggest?pretty' -d '{

"authorsAutocomplete":{

"text":"fyo",

"completion":{

"field":"ac",

"context":{

"birth_location":{

"lat":0.0,

"lon":0.0

}

}

}

}

}'

The following is the response from Elasticsearch in this case:

{

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"authorsAutocomplete":[{

"text":"fyo",

"offset":0,

"length":3,

"options":[]

}]

}
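As a rough sanity check of the distances involved in these two queries, we can compute great-circle distances with the haversine formula. This Python sketch is our own illustration: the geo context matches on geohash cells of the configured precision, not on an exact radius, so the numbers only indicate why one query point is close enough and the other is not. The function name and the Earth radius constant are assumptions of this sketch.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2, radius_km=6371.0):
    """Great-circle distance between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


moscow = (55.75, 37.61)      # indexed birth_location
kazan_query = (55.45, 49.8)  # first query point
far_query = (0.0, 0.0)       # second query point

near_km = haversine_km(*moscow, *kazan_query)
far_km = haversine_km(*moscow, *far_query)
```

The first query point falls within the 1000km precision we configured, while the second is several thousand kilometers away.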

The Scroll API

Let’s imagine that we have an index with several million documents. We already know how to build our query and so on. However, when trying to fetch a large number of documents, you see that as you get further and further into the pages of results, the queries slow down and finally time out or result in memory issues.

The reason for this is that full-text search engines, especially those that are distributed, don’t handle paging very well. Of course, getting a few hundred pages of results is not a problem for Elasticsearch, but for going through all the indexed documents or through a large result set, a specialized API has been introduced.

Problem definition

When Elasticsearch generates a response, it must determine the order of the documents that form the result. If we are on the first page, this is not a big problem. Elasticsearch just finds the set of documents and collects the first ones; let’s say, 20 documents. But if we are on the tenth page, Elasticsearch has to take all the documents from pages one to ten and then discard the ones that are on pages one to nine. This is even more complicated if we have a distributed environment, because we don’t know from which nodes the results will come. Because of that, each node needs to build the response and keep it in memory for some time. The problem is not Elasticsearch-specific; a similar situation can be found in database systems and, generally, in every system that uses a so-called priority queue.
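The cost of deep paging can be illustrated with a small simulation. The following Python sketch is a simplified model, assuming five shards that each hold random scores; fetch_page is a hypothetical helper, not an Elasticsearch API. Each shard has to return its best page × size hits, and the coordinating step merges them and keeps only the last page.

```python
import heapq
import random


def fetch_page(shards, page, size):
    """Merge per-shard top hits and return (hits of the page, hits collected)."""
    needed = page * size
    # Every shard must return its own top `needed` hits, because any of
    # them could end up on the requested page.
    per_shard = [heapq.nlargest(needed, shard) for shard in shards]
    collected = sum(len(hits) for hits in per_shard)
    merged = heapq.nlargest(
        needed, (score for hits in per_shard for score in hits))
    return merged[needed - size:], collected


random.seed(42)
shards = [[random.random() for _ in range(1000)] for _ in range(5)]

first_page, cost_first = fetch_page(shards, page=1, size=20)
tenth_page, cost_tenth = fetch_page(shards, page=10, size=20)
```

With five shards and a page size of 20, the first page requires collecting 100 hits, while the tenth page requires collecting 1,000 hits only to return 20 of them.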

Scrolling to the rescue

The solution is simple. Since Elasticsearch has to do some operations (determine the documents for the previous pages) for each request, we can ask Elasticsearch to store this information for subsequent queries. The drawback is that we cannot store this information forever due to limited resources. Elasticsearch assumes that we can declare how long we need this information to be available. Let’s see how it works in practice.

First of all, we query Elasticsearch as we usually do. However, in addition to all the known parameters, we add one more: the scroll parameter, which tells Elasticsearch that we want to use scrolling and how long it should keep the information about the results. We can do this by sending a query as follows:

curl 'localhost:9200/library/_search?pretty&scroll=5m' -d '{

"size":1,

"query":{

"match_all":{}

}

}'

The content of this query is irrelevant. The important thing is how Elasticsearch modifies the response. Look at the first few lines of the response returned by Elasticsearch:

{

"_scroll_id":"cXVlcnlUaGVuRmV0Y2g7NTsxNjo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzE3OjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7MTg6NUQza2J5X29SU3lNbF9tMktDUUlGdzsxOTo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzIwOjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7MDs=",

"took":3,

"timed_out":false,

"_shards":{

"total":5,

"successful":5,

"failed":0

},

"hits":{

"total":4,

...

The new part is the _scroll_id section. This is a handle that we will use in the queries that follow. Elasticsearch has a special endpoint for this: the _search/scroll endpoint. Let’s look at the following example:

curl -XGET 'localhost:9200/_search/scroll?pretty' -d '{
  "scroll": "5m",
  "scroll_id": "cXVlcnlUaGVuRmV0Y2g7NTsyNjo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzI3OjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7Mjg6NUQza2J5X29SU3lNbF9tMktDUUlGdzsyOTo1RDNrYnlfb1JTeU1sX20yS0NRSUZ3OzMwOjVEM2tieV9vUlN5TWxfbTJLQ1FJRnc7MDs="
}'

Now every call to this endpoint with scroll_id returns the next page of results.

Remember that this handle is only valid for the defined time of inactivity.

Of course, this solution is not ideal, and it is not very appropriate when there are many requests to random pages of various results or when the time between the requests is difficult to determine. However, you can use this successfully for use cases where you want to get larger result sets, such as transferring data between several systems.
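For a data transfer use case, the two requests shown earlier are typically wrapped in a loop. The following Python sketch shows one possible shape of such a loop. It assumes a hypothetical send(method, path, body) callable that performs the HTTP request and returns the decoded JSON response; that callable, the scroll_all name, and the stop condition (an empty page of hits) are assumptions of this sketch rather than part of any client library.

```python
import json


def scroll_all(send, index, query, keep_alive="5m", size=100):
    """Yield every hit by following scroll ids until a page comes back empty."""
    # Initial search: the scroll parameter asks Elasticsearch to keep
    # the search context alive between requests.
    response = send(
        "GET",
        f"/{index}/_search?scroll={keep_alive}",
        json.dumps({"size": size, "query": query}),
    )
    while response["hits"]["hits"]:
        yield from response["hits"]["hits"]
        # Follow-up request: always use the most recently returned handle.
        response = send(
            "GET",
            "/_search/scroll",
            json.dumps({"scroll": keep_alive,
                        "scroll_id": response["_scroll_id"]}),
        )
```

Because the transport is passed in as a parameter, the loop can be exercised with a stub before it is pointed at a real cluster. Note that each follow-up request also passes the keep-alive value again, so the handle stays valid as long as the requests keep arriving within that time.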

Summary

In the chapter that we just finished, we learned about some functionalities of Elasticsearch that we probably won’t use every day, or at least not every one of us will use them. We discussed the percolator, an upside-down search functionality that allows us to index queries and find which documents match them. We learned about the spatial capabilities of Elasticsearch and we used suggesters to correct user spelling mistakes and build a highly efficient autocomplete functionality. We also used the Scroll API to efficiently fetch a large number of results from our Elasticsearch indices.

In the next chapter, we will focus on clusters and their configuration. We will discuss the node discovery, gateway, and recovery modules: what they are responsible for and how to configure them to match our needs. We will use templates and dynamic templates, and we will see how to install plugins extending Elasticsearch’s out-of-the-box functionalities. We will learn what the caches of Elasticsearch are and how to configure them efficiently to make the most out of them. Finally, we will use the update settings API to update Elasticsearch configuration on live and running clusters.

Chapter 9. Elasticsearch Cluster in Detail

The previous chapter was fully dedicated to search functionalities that are not only about full text searching. We learned how to use the percolator, an inverted search that allows us to build alerting functionalities on top of Elasticsearch. We learned to use the spatial functionalities of Elasticsearch and we used the suggest API that allowed us to correct users’ spelling mistakes as well as build very efficient autocomplete functionalities. But let’s now focus on running and administering Elasticsearch. By the end of this chapter, you will have learned the following topics:

How Elasticsearch finds new nodes that should join the cluster

What the gateway and recovery modules are

How templates work

How to use dynamic templates

How to use the Elasticsearch plugin mechanism

What the caches in Elasticsearch are and how to tune them

How to use the Update Settings API to update Elasticsearch settings on running clusters

Understanding node discovery

When starting your Elasticsearch node, one of the first things that happens is looking for a master node that has the same cluster name and is visible. If a master is found, the node joins the already formed cluster. If no master is found, then the node itself is selected as a master (of course, if the configuration allows such behavior). The process of forming a cluster and finding nodes is called discovery. The module responsible for discovery has two main purposes: electing a master and discovering new nodes within a cluster. In this section, we will discuss how we can configure and tune the discovery module.

Discovery types

By default, without installing additional plugins, Elasticsearch allows us to use Zen discovery, which provides us with unicast discovery. Unicast (http://en.wikipedia.org/wiki/Unicast) allows transmission of a single message over the network to a single host at once. An Elasticsearch node sends the message to the nodes defined in the configuration and waits for a response. When the node is accepted into the cluster, the recovery module kicks in and starts the recovery process if needed, or the master election process if the master is still not elected.

Note

Prior to Elasticsearch 2.0, the Zen discovery module allowed us to use multicast discovery. On a multicast-capable network, Elasticsearch was able to automatically discover nodes without specifying any IP addresses of other Elasticsearch servers sharing the same cluster name. This was very error prone and not advised for production use, and thus it was deprecated and moved to a plugin.

Elasticsearch architecture is designed to be peer to peer. When running operations such as indexing or searching, the master node doesn’t take part in communication and the relevant nodes communicate with each other directly.

Node roles

Elasticsearch nodes can be configured to work in one of the following roles:

Master: The node responsible for maintaining the global cluster state, changing it depending on the needs, and handling the addition and removal of nodes. There can only be a single master node active in a single cluster.

Data: The node responsible for holding the data and executing data related operations (indexing and searching) on the shards that are present locally for the node.

Client: The node responsible for handling requests. For indexing requests, the client node forwards the request to the appropriate primary shard and, for search requests, it sends the request to all the relevant shards and aggregates the results.

By default, each node can work as master, data, or client. It can be a data and a client node at the same time, for example. On large and highly loaded clusters, it is very important to divide the roles of the nodes in the cluster and have each node perform only a single role. When dealing with such clusters, you will often see at least three master nodes, multiple data nodes, and a few client-only nodes as part of the whole cluster.

Master node

It is the most important node type from the Elasticsearch cluster’s point of view. It handles the cluster state, changes it, manages the nodes joining and leaving the cluster, checks the health of the other nodes in the cluster (by running ping requests), and manages the shard relocation operations. If the master is somehow disconnected from the cluster, the remaining nodes will select a new master from among themselves. All these processes are done automatically on the basis of the configuration values we provide. You usually want the master nodes to only communicate with the other Elasticsearch nodes, using the internal Java communication. To avoid hitting the master nodes by mistake, it is advised to turn off the HTTP module for them in the configuration.

Data node

The data node is responsible for holding the data in the indices. The data nodes are the ones that need the most disk space, and they are the ones loaded with indexing requests and with searches on the data they hold locally. The data nodes, similar to the master nodes, can have the HTTP module disabled.

Client node

The client nodes are in most cases nodes that don’t have any data and are not master nodes. The client nodes are the ones that communicate with the outside world and with all the nodes in the cluster. They forward the data to the appropriate shards and aggregate the search and aggregations results from all the other nodes.

Keep in mind that client nodes can have data as well, but in such a case they will run both the indexing requests and the search requests for the local data and will aggregate the data from the other nodes, which in large clusters may be too much work for a single node.

Configuring node roles

By default, Elasticsearch allows every node to be a master node, a data node, or a client node. However, as we already mentioned, in certain situations you may want to have nodes that only hold data, client nodes that are only used to process requests, and master hosts to manage the cluster. One such situation is when massive amounts of data need to be handled, where the data nodes should be as performant as possible. To tell Elasticsearch what role a node should take, we use three Boolean properties set in the elasticsearch.yml configuration file:

node.master: When set to true, we tell Elasticsearch that the node is master eligible, which means that it can take the role of a master. However, note that the node will be automatically marked as not master eligible as soon as it is assigned a client role.

node.data: When set to true, we tell Elasticsearch that the node can be used to hold data.

node.client: When set to true, we tell Elasticsearch that the node should be used as a client.

So, to set a node to only hold data, we should add the following properties to the elasticsearch.yml configuration file:

node.master: false
node.data: true
node.client: false

To set the node to not hold data and only be a master node, we need to instruct Elasticsearch that we don’t want the node to hold data. In order to do this, we add the following properties to the elasticsearch.yml configuration file:

node.master: true
node.data: false
node.client: false
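When many nodes have to be configured, the two sets of properties shown above can be kept in one place and rendered on demand. The following Python sketch is only a convenience illustration; the ROLE_FLAGS table, the role labels, and the render_settings helper are names we made up, and only the two configurations from the text are included.

```python
# The two role configurations described in the text, keyed by a label
# of our own choosing.
ROLE_FLAGS = {
    "data_only": {"node.master": False, "node.data": True, "node.client": False},
    "master_only": {"node.master": True, "node.data": False, "node.client": False},
}


def render_settings(role):
    """Render the flags of a role as lines for elasticsearch.yml."""
    flags = ROLE_FLAGS[role]
    return "\n".join(
        f"{key}: {str(value).lower()}" for key, value in flags.items())


data_node_yaml = render_settings("data_only")
master_node_yaml = render_settings("master_only")
```

Each rendered string contains one property per line, with a space after the colon as YAML requires.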

Setting the cluster’s name

If we don’t set the cluster.name property in our elasticsearch.yml file, Elasticsearch uses the elasticsearch default value. This is not a good thing, because each new Elasticsearch node will have the same cluster name and you may want to have multiple clusters in the same network. In such a case, connecting the wrong nodes together is just a matter of time. Because of that, we suggest setting the cluster.name property to some other value of your choice. Usually, it is a good idea to choose cluster names based on cluster responsibilities.

Zen discovery

The default discovery method used by Elasticsearch, and one that is commonly used in the Elasticsearch world, is called Zen discovery. It supports unicast discovery and allows adjusting various parts of its configuration.

Note

Note that there are additional discovery types available as plugins, such as Amazon EC2 discovery, Microsoft Azure discovery, and Google Compute Engine discovery.

Master election configuration

Imagine that you have a cluster that is built of 10 nodes. Everything is working fine until one day when your network fails and 3 of your nodes are disconnected from the cluster, but they still see each other. Because of the Zen discovery and master election process, the nodes that got disconnected elect a new master and you end up with two clusters with the same name, with two master nodes. Such a situation is called a split-brain and you must avoid it as much as possible. When split-brain happens, you end up with two (or more) clusters that won’t join each other until the network (or any other) problems are fixed. The thing to remember is that split-brain may result in unrecoverable errors, such as data conflicts in which you end up with data corruption or partial data loss. That’s why it is important to avoid such situations at all costs.

In order to prevent split-brain situations, Elasticsearch provides a discovery.zen.minimum_master_nodes property. This property defines the minimum number of master eligible nodes that should be connected to each other in order to form a cluster. So now let’s get back to our cluster; if we set the discovery.zen.minimum_master_nodes property to 50 percent of the total nodes available + 1 (which is 6 in our case), we will end up with a single cluster. Why is that? Before the network failure, we had 10 nodes, which is more than six nodes, and those nodes formed a cluster. After the disconnection of the three nodes, we would still have the first cluster up and running, because the seven remaining nodes still satisfy the minimum of six. However, because only three nodes got disconnected and three is less than six, these three nodes wouldn’t be allowed to elect a new master and they would wait for reconnection with the original cluster.

Of course this is also not a perfect scenario. It is advised to have dedicated master eligible nodes that don’t work as data or client nodes. To have a quorum in such a case, we need at least three dedicated master eligible nodes, because that will allow us to have a single master offline and still keep the quorum. This is usually enough to keep the cluster in good shape when it comes to master related features and to be split-brain proof. Of course, in such a case, the discovery.zen.minimum_master_nodes property should be set to 2 and we should have the three master nodes up and running.
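The quorum arithmetic used in this section can be written down in a few lines. The following Python sketch assumes, as in the example, that all 10 nodes are master eligible; the function names are ours and the code only models the rule, it does not talk to a cluster.

```python
def minimum_master_nodes(master_eligible):
    """Quorum size: half of the master eligible nodes (rounded down) plus one."""
    return master_eligible // 2 + 1


def can_elect_master(visible_master_eligible, quorum):
    """A partition may elect a master only if it still sees a quorum."""
    return visible_master_eligible >= quorum


quorum = minimum_master_nodes(10)              # the 10-node cluster
majority_side = can_elect_master(7, quorum)    # the nodes that stayed connected
minority_side = can_elect_master(3, quorum)    # the three disconnected nodes
dedicated_quorum = minimum_master_nodes(3)     # three dedicated masters
```

For the 10-node cluster the quorum is 6, so only the side with seven nodes can keep or elect a master. For three dedicated master eligible nodes the quorum is 2, which matches the value recommended above.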

Furthermore, Elasticsearch allows us to specify two additional Boolean properties: discovery.zen.master_election.filter_client and discovery.zen.master_election.filter_data. They allow us to tell Elasticsearch to ignore ping requests from the client and data nodes during master election. By default, the first mentioned property is set to true and the second is set to false. This allows Elasticsearch to focus on the master election and not be overloaded with ping requests from the nodes that are not master eligible.

In addition to the mentioned properties, Elasticsearch allows configuring timeouts related to the master election process. The discovery.zen.ping_timeout property, which defaults to 3s (three seconds), allows configuring the timeout for slow networks: the higher the value, the lower the chance of failure, but the election process can take longer. The second property is called discovery.zen.join_timeout and specifies the timeout for the join request to the master. It defaults to 20 times the discovery.zen.ping_timeout property.

Configuring unicast

Because of the way unicast works, we need to specify at least one host that the unicast message should be sent to. To do this, we should add the discovery.zen.ping.unicast.hosts property to our elasticsearch.yml configuration file. Basically, we should specify the hosts that form the cluster in the discovery.zen.ping.unicast.hosts property (we don’t have to specify all the hosts; we just need to provide enough so that we are sure that at least one will be reachable). For example, if we want to use the hosts 192.168.2.1, 192.168.2.2, and 192.168.2.3, we should specify the preceding property in the following way:

discovery.zen.ping.unicast.hosts: 192.168.2.1:9300, 192.168.2.2:9300, 192.168.2.3:9300

One can also define a range of the ports Elasticsearch can use. For example, to say that ports from 9300 to 9399 can be used, we specify the following:

discovery.zen.ping.unicast.hosts: 192.168.2.1:[9300-9399], 192.168.2.2:[9300-9399], 192.168.2.3:[9300-9399]

Note that the hosts are separated with a comma character and we’ve specified the port on which we expect unicast messages.
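To show how the two notations above can be read, here is a small Python sketch that splits such a value into hosts and port ranges. It is our own illustration for validating a configuration value before deploying it, not the parser Elasticsearch uses; the parse_unicast_hosts name and the returned tuple layout are assumptions.

```python
import re

# host, then optionally :port or :[first-last]
_ENTRY = re.compile(r"([^:\[\]]+)(?::(?:\[(\d+)-(\d+)\]|(\d+)))?")


def parse_unicast_hosts(value):
    """Split a comma-separated hosts value into (host, first_port, last_port)."""
    entries = []
    for item in value.split(","):
        item = item.strip()
        match = _ENTRY.fullmatch(item)
        if match is None:
            raise ValueError(f"unrecognized host entry: {item!r}")
        host, range_start, range_end, single = match.groups()
        if single is not None:
            entries.append((host, int(single), int(single)))
        elif range_start is not None:
            entries.append((host, int(range_start), int(range_end)))
        else:
            entries.append((host, None, None))  # no port given
    return entries


single_ports = parse_unicast_hosts("192.168.2.1:9300, 192.168.2.2:9300")
port_ranges = parse_unicast_hosts("192.168.2.1:[9300-9399]")
```

A single port is reported as a range whose first and last port are the same, so both notations can be handled uniformly by the calling code.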

Fault detection ping settings

In addition to the settings discussed previously, we can also control or alter the default ping configuration. Ping is a signal sent between the nodes to check if they are running and responsive. The master node pings all the other nodes in the cluster and each of the other nodes in the cluster pings the master node. The following properties can be set:

discovery.zen.fd.ping_interval: This defaults to 1s (one second) and specifies how often the nodes ping each other

discovery.zen.fd.ping_timeout: This defaults to 30s (30 seconds) and defines how long a node will wait for the response to its ping message before considering a node as unresponsive

discovery.zen.fd.ping_retries: This defaults to 3 and specifies how many retries should be taken before considering a node as not working

If you experience some problems with your network, or you know that your nodes need more time to see the ping response, you can adjust the preceding values to the ones that are good for your deployment.

Cluster state updates control

As we have already discussed, the master node is the one responsible for handling the changes of the cluster state and Elasticsearch allows us to control that process. For most use cases, the default settings are more than enough, but you may run into situations where changing the settings is required.

The master node processes a single cluster state command at a time. First the master node propagates the changes to other nodes and then it waits for a response. A cluster state change is not considered finished until enough nodes respond to the master with an acknowledgment. The number of nodes that need to respond is specified by discovery.zen.minimum_master_nodes, which we are already aware of. The maximum time the master node waits for the nodes to respond is 30s by default and is specified by the discovery.zen.commit_timeout property. If not enough nodes respond to the master, the cluster state change is rejected.

Once enough nodes respond to the master publish message, the cluster state change is accepted on the master and the cluster state is changed. Once that is done, the master sends a message to all the nodes saying that the change can be applied. The timeout of this message is again set to 30 seconds and is controlled using the discovery.zen.publish_timeout property.
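The accept-or-reject decision described above can be sketched as follows. This Python model is deliberately simplified: it takes the acknowledgments that arrived within the commit timeout as given, and it leaves out details such as which nodes are counted toward the minimum. The publish_cluster_state name and its arguments are hypothetical.

```python
def publish_cluster_state(acks, minimum_master_nodes):
    """Decide the outcome of a cluster state change.

    `acks` maps a node name to True when that node acknowledged the
    change before the commit timeout expired, and to False otherwise.
    """
    acknowledged = sum(1 for ok in acks.values() if ok)
    if acknowledged < minimum_master_nodes:
        return "rejected"   # not enough acknowledgments in time
    return "committed"      # the master tells the nodes to apply the change


healthy = publish_cluster_state(
    {"node-1": True, "node-2": True, "node-3": False}, minimum_master_nodes=2)
degraded = publish_cluster_state(
    {"node-1": True, "node-2": False, "node-3": False}, minimum_master_nodes=2)
```

With a minimum of 2, two acknowledgments are enough to commit the change, while a single acknowledgment leads to rejection.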

Dealing with master unavailability

If a cluster has no master node, whatever the reason may be, it is not fully operational. By default, we can’t change the metadata, cluster wide commands will not be working, and so on. Elasticsearch allows us to configure the behavior of the nodes when the master node is not elected. To do that, we can use the discovery.zen.no_master_block property, which accepts the values all and write. Setting this property to all means that all the operations on the node will be rejected, that is, the search operations, the write related operations, and the cluster wide operations such as health or mappings retrieval. Setting this property to write means that only the write operations will be rejected; this is the default behavior of Elasticsearch.

Adjusting HTTP transport settings

While discussing the node discovery module and process, we mentioned the HTTP module a few times. We would like to get back to that topic now and discuss a few properties that are useful when using Elasticsearch.

Disabling HTTP

The first thing is disabling HTTP completely. This is useful to ensure that the master and data nodes won’t accept any queries or requests in general from users. To disable the HTTP transport completely, we just need to add the http.enabled property and set it to false in our elasticsearch.yml file.

HTTP port

Elasticsearch allows us to define the port on which it will listen for HTTP requests. This is done by using the http.port property. It defaults to 9200-9300, which means that Elasticsearch will start from port 9200 and increase the port number if the port is not available (so the next instance will use port 9201, and so on). There is also http.publish_port, which is very useful when running Elasticsearch behind a firewall and when the HTTP port is not directly accessible. It defines which port should be used by the clients connecting to Elasticsearch and defaults to the same value as the http.port property.

HTTP host

We can also define the host to which Elasticsearch will bind. To specify it, we need to define the http.host property. The default value is the one set by the network module. If needed, we can set the publish host and the bind host separately using the http.publish_host and http.bind_host properties. You usually don’t have to specify these properties unless your nodes have nonstandard host names or multiple names and you want Elasticsearch to bind to a single one only.

You can find the full list of properties allowed for the HTTP module in the official Elasticsearch documentation available at https://www.elastic.co/guide/en/elasticsearch/reference/2.2/modules-http.html.

The gateway and recovery modules

Apart from our indices and the data indexed inside them, Elasticsearch needs to hold the metadata, such as the type mappings, the index level settings, and so on. This information needs to be persisted somewhere so it can be read during cluster recovery. Of course, it could be stored in memory, but a full cluster restart or a fatal failure would result in this information being lost, which is not something that we want. This is why Elasticsearch introduced the gateway module. You can think about it as a safe haven for your cluster data and metadata. Each time you start your cluster, all the needed data is read from the gateway and, when you make a change to your cluster, it is persisted using the gateway module.

The gateway

In order to set the type of gateway we want to use, we need to add the gateway.type property to the elasticsearch.yml configuration file and set it to the local value. Currently, Elasticsearch recommends using the local gateway type (gateway.type set to local), which is the default one and the only one available without additional plugins.

The default local gateway type stores the indices and their metadata in the local file system. Compared to the other gateways, the write operation to this gateway is not performed in an asynchronous way, so, whenever a write succeeds, you can be sure that the data was written into the gateway (so basically indexed or stored in the transaction log).

Recovery control

In addition to choosing the gateway type, Elasticsearch allows us to configure when to start the initial recovery process. The recovery is a process of initializing all the shards and replicas, reading all the data from the transaction log, and applying it on the shards. Basically, it’s a process needed to start Elasticsearch.

For example, let’s imagine that we have a cluster that consists of 10 Elasticsearch nodes. We should inform Elasticsearch about the number of nodes by setting gateway.expected_nodes to that value, so 10 in our case. We inform Elasticsearch about the number of expected nodes that are eligible to hold the data and eligible to be selected as a master. Elasticsearch will start the recovery process immediately if the number of nodes in the cluster is equal to that property.

We would also like to start the recovery after six nodes are together. To do this, we should set the gateway.recover_after_nodes property to 6. This property should be set to a value that ensures that the newest version of the cluster state snapshot will be available, which usually means that you should start recovery when most of your nodes are available.

There is also one more thing. We would like the gateway recovery process to start 5 minutes after the gateway.recover_after_nodes condition is met. To do this, we set the gateway.recover_after_time property to 5m. This property tells the gateway module how long to wait with the recovery process after the number of nodes reached the minimum specified by the gateway.recover_after_nodes property. We may want to do this because we know that our network is quite slow and we want the nodes communication to be stable. Note that Elasticsearch won’t delay the recovery if the number of master and data eligible nodes that formed the cluster is equal to the value of the gateway.expected_nodes property.

The preceding property values should be set in the elasticsearch.yml configuration file. For example, if we would like to have the previously discussed values in the mentioned file, we would end up with the following section in the file:

gateway.recover_after_nodes: 6
gateway.recover_after_time: 5m
gateway.expected_nodes: 10
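How these three settings interact can be summarized as a small decision function. The following Python sketch models the rules exactly as described in this section, with the values from the example as defaults; should_start_recovery and its parameters are hypothetical names, and seconds_waited stands for the time elapsed since the gateway.recover_after_nodes condition was met.

```python
def should_start_recovery(nodes_present, seconds_waited,
                          recover_after_nodes=6,
                          recover_after_seconds=300,  # 5m
                          expected_nodes=10):
    """Decide whether the initial recovery may begin."""
    if nodes_present >= expected_nodes:
        return True   # all expected nodes are present: no delay
    if nodes_present < recover_after_nodes:
        return False  # too few nodes: keep waiting
    # Enough nodes, but not all: wait out the configured delay first.
    return seconds_waited >= recover_after_seconds


all_present = should_start_recovery(10, seconds_waited=0)
too_few = should_start_recovery(5, seconds_waited=1000)
still_waiting = should_start_recovery(6, seconds_waited=100)
delay_elapsed = should_start_recovery(6, seconds_waited=300)
```

With all 10 nodes present the recovery starts immediately, with 5 nodes it never starts, and with 6 to 9 nodes it starts once the 5 minute delay has passed.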

Additional gateway recovery options

In addition to the mentioned options, Elasticsearch allows us some additional degree of control. These additional options are:

gateway.recover_after_master_nodes: This is similar to the gateway.recover_after_nodes property, but instead of taking into consideration all the nodes, it allows us to specify how many master eligible nodes should be present in the cluster before recovery starts

gateway.recover_after_data_nodes: This is also similar to the gateway.recover_after_nodes property, but it allows specifying how many data nodes should be present in the cluster before recovery starts

gateway.expected_master_nodes: This is similar to the gateway.expected_nodes property, but instead of specifying the number of all the nodes that we expect in the cluster, it allows specifying how many master eligible nodes we expect to be present

gateway.expected_data_nodes: This is also similar to the gateway.expected_nodes property, but allows specifying how many data nodes we expect to be present

Indices recovery API

There is also one other thing when it comes to the recovery process: the indices recovery API. It allows us to see the progress of index or indices recovery. To use it, we just need to specify the indices and use the _recovery endpoint. For example, to check the recovery process of the library index, we will run the following command:

curl -XGET 'localhost:9200/library/_recovery?pretty'

The response for the preceding command can be large and depends on the number of shards in the index and of course the number of indices we want to get information for. In our case, the response looks as follows (we left information about a single shard to make it less extensive):

{

"library":{

"shards":[{

"id":0,

"type":"STORE",

"stage":"DONE",

"primary":true,

"start_time_in_millis":1444030695956,

"stop_time_in_millis":1444030695962,

"total_time_in_millis":5,

"source":{

"id":"Brt5ejEVSVCkIfvY9iDMRQ",

"host":"127.0.0.1",

"transport_address":"127.0.0.1:9300",

"ip":"127.0.0.1",

"name":"PuffAdder"

},

"target":{

"id":"Brt5ejEVSVCkIfvY9iDMRQ",

"host":"127.0.0.1",

"transport_address":"127.0.0.1:9300",

"ip":"127.0.0.1",

"name":"PuffAdder"

},

"index":{

"size":{

"total_in_bytes":157,

"reused_in_bytes":157,

"recovered_in_bytes":0,

"percent":"100.0%"

},

"files":{

"total":1,

"reused":1,

"recovered":0,

"percent":"100.0%"

},

"total_time_in_millis":1,

"source_throttle_time_in_millis":0,

"target_throttle_time_in_millis":0

},

"translog":{

"recovered":0,

"total":-1,

"percent":"-1.0%",

"total_on_start":-1,

"total_time_in_millis":4

},

"verify_index":{

"check_index_time_in_millis":0,

"total_time_in_millis":0

}

},

...

]

}

}

As you can see, the response contains information about each shard. For each shard, we see the type of the operation (the type property), the stage (the stage property) describing what part of the recovery process is in progress, and whether it is a primary shard (the primary property). In addition to this, we see sections about the source shard, the target shard, the index the shard is part of, the transaction log, and finally the index verification. All of this allows us to see the status of the recovery of our indices.

Delayed allocation

We already discussed that, by default, Elasticsearch tries to balance the shards in the cluster according to the number of nodes in that cluster. Because of that, when a node drops off the cluster (or multiple nodes do) or when nodes join the cluster, Elasticsearch starts rebalancing the cluster, moving the shards and the replicas around. This is usually very expensive: new primary shards may be promoted out of the available replicas, large amounts of data may be copied between the new primary and its replicas, and so on. And all this may be happening because a single node was restarted for 30 seconds of maintenance.

To avoid such situations, Elasticsearch provides us with the possibility to control how long to wait before beginning allocation of shards that are in the unassigned state. We can control the delay by using the index.unassigned.node_left.delayed_timeout property and setting it on a per-index basis. For example, to configure the allocation timeout for the library index to 10 minutes, we run the following command:

curl -XPUT 'localhost:9200/library/_settings' -d '{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}'

We can also configure the allocation timeout for all the indices by running the following command:

curl -XPUT 'localhost:9200/_all/_settings' -d '{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "10m"
  }
}'

Index recovery prioritization

Elasticsearch 2.2 exposes one more feature when it comes to the indices recovery process that allows us to define which indices should be prioritized when it comes to recovery. By specifying the index.priority property in the index settings and assigning it a positive integer value, we define the order in which Elasticsearch should recover the indices; the ones with the higher index.priority property will be started first.

For example, let’s assume that we have two indices, library and map, and we want the library index to be recovered before the map index. To do this, we will run the following commands:

curl -XPUT 'localhost:9200/library/_settings' -d '{
  "settings": {
    "index.priority": 10
  }
}'

curl -XPUT 'localhost:9200/map/_settings' -d '{
  "settings": {
    "index.priority": 1
  }
}'

We assigned a higher priority to the library index and, because of that, its recovery will be started before the recovery of the map index.
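The ordering rule can be pictured with a tiny Python sketch. It only models what the text states (a higher index.priority is recovered first); the function name is hypothetical and any tie-breaking Elasticsearch may apply between equal priorities is deliberately left out.

```python
def recovery_order(index_priorities):
    """Return index names sorted so that higher index.priority comes first.

    index_priorities maps an index name to its index.priority value.
    This is an illustration only, not Elasticsearch's implementation.
    """
    return sorted(index_priorities,
                  key=lambda name: index_priorities[name],
                  reverse=True)


# The two indices from the example above.
order = recovery_order({"map": 1, "library": 10})
```

Here order lists library before map, matching the priorities we set.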

Templates and dynamic templates

In the Mappings configuration section of Chapter 2, Indexing Your Data, we discussed mappings, how they are created, and how the type-determining mechanism works. Now we will get into more advanced topics. We will show you how to dynamically create mappings for new indices and how to apply some logic to the templates, so that new indices are already created with predefined mappings.

Templates

In various parts of the book, when discussing index configuration and its structure, we’ve seen that this can become complicated, especially when we have sophisticated data structures that we want to index, search, and aggregate. If you have a lot of similar indices, taking care of the mappings in each of them can be a very painful process, because each new index has to be created with appropriate mappings. Elasticsearch creators predicted this and implemented a feature called index templates. Each template defines a pattern, which is compared to the name of a newly created index. When they match, the values defined in the template are copied to the index structure definition. When multiple templates match the name of the newly created index, all of them are applied, and the values from the templates that are applied later override the values defined in the previously applied templates. This is very convenient because we can define a few common settings in the general templates and change them in the more specialized ones. In addition, there is an order parameter that lets us force the desired template ordering. You can think of templates as dynamic mappings that are applied not to the types in documents but to the indices.

An example of a template

Let’s see a real example of a template. Imagine that we want to create many indices in which we don’t want to store the source of the documents so that our indices are smaller. We also don’t need any replicas. We can create a template that matches our need by using the Elasticsearch REST API and the /_template endpoint, by sending the following command:

curl -XPUT 'http://localhost:9200/_template/main_template?pretty' -d '{
  "template": "*",
  "order": 1,
  "settings": {
    "index.number_of_replicas": 0
  },
  "mappings": {
    "_default_": {
      "_source": {
        "enabled": false
      }
    }
  }
}'

From now on, all the created indices will have no replicas and no source stored. This is because the template parameter value is set to *, which matches all the names of the indices. Note the _default_ type name in our example. This is a special type name which indicates that the current rule should be applied to every document type. The second interesting thing is the order parameter. Let’s define a second template by using the following command:

curl -XPUT 'http://localhost:9200/_template/ha_template?pretty' -d '{
  "template": "ha_*",
  "order": 10,
  "settings": {
    "index.number_of_replicas": 5
  }
}'

After running the preceding command, all the new indices will behave as earlier except the ones with names beginning with ha_. In the case of these indices, both templates are applied. First, the template with the lower order value is used, and then the next template overwrites the replicas setting. So, the indices whose names start with ha_ will have five replicas and source storing disabled.
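The way matching templates are layered can be sketched in Python. This is only a model of the behavior described above, not Elasticsearch's implementation: apply_templates and deep_merge are hypothetical helpers, and fnmatchcase is used as an approximation of the index name pattern (it accepts a few glob forms beyond the plain * wildcard).

```python
from fnmatch import fnmatchcase


def deep_merge(base, override):
    """Return a copy of base with override merged in, recursing into dicts."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def apply_templates(index_name, templates):
    """Merge all matching templates, lowest order first, so later ones win."""
    matching = [t for t in templates if fnmatchcase(index_name, t["template"])]
    result = {}
    for template in sorted(matching, key=lambda t: t.get("order", 0)):
        body = {k: v for k, v in template.items() if k in ("settings", "mappings")}
        result = deep_merge(result, body)
    return result


# The two templates defined in this section.
MAIN = {"template": "*", "order": 1,
        "settings": {"index.number_of_replicas": 0},
        "mappings": {"_default_": {"_source": {"enabled": False}}}}
HA = {"template": "ha_*", "order": 10,
      "settings": {"index.number_of_replicas": 5}}
```

Calling apply_templates("ha_logs", [HA, MAIN]) yields five replicas with the source still disabled, while any other index name gets zero replicas.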

Note

Before version 2.0, Elasticsearch templates could also be stored in files. Starting with Elasticsearch 2.0, this feature is no longer available.

Dynamic templates

Sometimes we want to define a mapping that depends on the field name and the detected type. This is where dynamic templates can help. Dynamic templates are similar to the usual mappings, but each template has its own pattern defined, which is applied to a document’s field name. If a field name matches the pattern, the template is used.

Let’s have a look at the following example:

curl -XPOST 'localhost:9200/news' -d '{
  "mappings": {
    "article": {
      "dynamic_templates": [
        {
          "template_test": {
            "match": "*",
            "mapping": {
              "index": "analyzed",
              "fields": {
                "str": {
                  "type": "{dynamic_type}",
                  "index": "not_analyzed"
                }
              }
            }
          }
        }
      ]
    }
  }
}'

In the preceding example, we defined the mapping for the article type. In this mapping, we have only one dynamic template named template_test. This template is applied for every field in the input document because of the single asterisk pattern in the match property. Each field will be treated as a multifield, consisting of a field named as the original field (for example, title) and the second field with a name suffixed with str (for example, title.str). The first field will have its type determined by Elasticsearch (with the {dynamic_type} type), and the second field will be a string (because of the string type).

The matching pattern

We have two ways of defining the matching pattern. They are as follows:

match: This template is used if the name of the field matches the pattern (this pattern type was used in our example)

unmatch: This template is used if the name of the field doesn’t match the pattern

By default, the pattern is very simple and uses glob patterns. This can be changed by setting the match_pattern property to regex. After adding this property, we can use all the magic provided by regular expressions in the match and unmatch patterns. There are also variations such as path_match and path_unmatch that can be used to match the names in nested documents (by providing the path, similar to queries).

Field definitions

When writing a target field definition, the following variables can be used:

{name}: The name of the original field found in the input document

{dynamic_type}: The type determined from the original document

Note

Note that Elasticsearch checks the templates in the order of their definitions and the first matching template is applied. This means that the most generic templates (for example, with "match":"*") must be defined at the end.
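The first-match rule and the two variables can be illustrated with a short Python sketch. It is a simplified model, not the Elasticsearch implementation: pick_mapping and substitute are hypothetical helpers, only glob-style match and unmatch are handled, and the copy_to value in the usage note is purely illustrative.

```python
from fnmatch import fnmatchcase


def substitute(value, name, dynamic_type):
    """Replace {name} and {dynamic_type} in every string value of a mapping."""
    if isinstance(value, dict):
        return {k: substitute(v, name, dynamic_type) for k, v in value.items()}
    if isinstance(value, str):
        return value.replace("{name}", name).replace("{dynamic_type}", dynamic_type)
    return value


def pick_mapping(field_name, detected_type, dynamic_templates):
    """Return (template name, resolved mapping) for the first matching template."""
    for entry in dynamic_templates:
        for template_name, rule in entry.items():
            if "match" in rule and not fnmatchcase(field_name, rule["match"]):
                continue
            if "unmatch" in rule and fnmatchcase(field_name, rule["unmatch"]):
                continue
            return template_name, substitute(rule["mapping"], field_name, detected_type)
    return None, None
```

For example, with a specific template for "*_id" fields listed before a catch-all one whose mapping is {"type": "{dynamic_type}", "copy_to": "{name}_all"}, a field called user_id resolves through the first template, and a field called title falls through to the catch-all with copy_to resolved to title_all. Reversing the order would make the catch-all win every time, which is why generic templates go last.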

Elasticsearch plugins

At various places in this book, we have used different plugins that have been able to extend the core functionality of Elasticsearch. You probably remember the additional programming languages used in scripts described in the Scripting capabilities of Elasticsearch section of Chapter 6, Make Your Search Better. In this section, we will look at how the plugins work and how to install them.

The basics

By default, Elasticsearch plugins are located in their own subdirectory in the plugins subdirectory of the search engine home directory. If you have downloaded a new plugin manually, you can just create a new directory with the plugin name and unpack that plugin archive to this directory. There is also a more convenient way to install plugins: by using the plugin script. We have used it several times in this book without talking about it, so this time let’s take the time and describe this tool.

Elasticsearch has two main types of plugins. These two types can be categorized based on the content of the plugin-descriptor.properties file: Java plugins and site plugins. Let’s start with the site plugins. They usually contain sets of HTML, CSS, and JavaScript files and add additional UI components to Elasticsearch. Elasticsearch treats the site plugins as a file set that should be served by the built-in HTTP server under the /_plugin/plugin_name/ URL (for example, /_plugin/bigdesk/). This type of plugin doesn’t change anything in core Elasticsearch functionality.

The Java plugins are the ones that add or modify the core Elasticsearch features. They usually contain the JAR files. The plugin-descriptor.properties file contains information about the main class that should be used by Elasticsearch as an entry point to configure plugins and allow them to extend the Elasticsearch functionality. The nice thing about the Java plugins is that they can contain the site part as well. The site part of the plugin needs to be placed in the _site directory if we are unpacking the plugin manually.

Installing plugins

Plugins can be downloaded from three source types. The first is the official repository located at https://download.elastic.co. All plugins from this source can be installed by referring to the plugin name. For example:

bin/plugin install lang-javascript

The preceding command results in installation of a plugin that allows us to use an additional scripting language, JavaScript. Elasticsearch automatically tries to find a plugin version that is the same as the version of Elasticsearch we are using. Sometimes, like in the following example, a plugin may ask for additional permissions during installation.

Just so we know what to expect, this is an example result of running the preceding command:

-> Installing lang-javascript…
Trying https://download.elastic.co/elasticsearch/release/org/elasticsearch/plugin/lang-javascript/2.2.0/lang-javascript-2.2.0.zip …
Downloading ........................................................................DONE
Verifying https://download.elastic.co/elasticsearch/release/org/elasticsearch/plugin/lang-javascript/2.2.0/lang-javascript-2.2.0.zip checksums if available …
Downloading .DONE
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: plugin requires additional permissions @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
* java.lang.RuntimePermission createClassLoader
* org.elasticsearch.script.ClassPermission <<STANDARD>>
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.ContextFactory
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.Callable
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.NativeFunction
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.Script
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.ScriptRuntime
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.Undefined
* org.elasticsearch.script.ClassPermission org.mozilla.javascript.optimizer.OptRuntime
See http://docs.oracle.com/javase/8/docs/technotes/guides/security/permissions.html
for descriptions of what these permissions allow and the associated risks.

Continue with installation? [y/N]y
Installed lang-javascript into /Users/someplace/elasticsearch-2.2.0/plugins/lang-javascript

If the plugin is not available at the first location, it may be available in one of the Apache Maven repositories: Maven Central (https://search.maven.org/) or Maven Sonatype (https://oss.sonatype.org/). In this case, the plugin name used for installation should have the form groupId/artifactId/version, just like every library for Maven (http://maven.apache.org/). For example:

bin/plugin install org.elasticsearch/elasticsearch-mapper-attachments/3.0.1

The third source is the GitHub (https://github.com/) repositories. The plugin tool assumes that the given plugin address contains the organization name followed by the plugin name and, optionally, the version number. Let’s look at the following command example:

bin/plugin install mobz/elasticsearch-head

If you write your own plugin and you have no access to the earlier-mentioned sites, there is no problem. Instead of the name of the plugin, the plugin tool accepts the URL from which the plugin should be downloaded. This option allows us to use any location for the plugins, including the local file system (using the file:// prefix) or a remote file (using the http:// prefix). For example, the following command will result in the installation of a plugin archived on the local file system as /tmp/elasticsearch-lang-javascript-3.0.0.RC1.zip:

bin/plugin install file:///tmp/elasticsearch-lang-javascript-3.0.0.RC1.zip

Removing plugins

Removing a plugin is as simple as removing its directory. You can also do this by using the plugin tool. For example, to remove the previously installed JavaScript plugin, we run a command as follows:

bin/plugin remove lang-javascript

The output from the command just confirms that the plugin was removed:

-> Removing lang-javascript…
Removed lang-javascript

Note

You need to restart the Elasticsearch node for the plugin installation or removal to take effect.

Elasticsearch caches

Until now we haven’t mentioned Elasticsearch caches much in the book. However, like most systems, Elasticsearch uses a variety of caches to perform more complicated operations or to speed up heavy data retrieval from disk-based Lucene indices. In this section, we will look at the most common caches of Elasticsearch, what they are used for, what the performance implications of using them are, and how to configure them.

Fielddata cache

In the beginning of the book, we discussed that Elasticsearch uses the so-called inverted index data structure to quickly and efficiently search through the documents. This is very good when searching and filtering the data, but for features such as aggregations, sorting, or script usage, Elasticsearch needs an un-inverted data structure, because these functions rely on per-document data.

Because of the need for un-inverted data, Elasticsearch has included, since its first release, an in-memory data structure called fielddata. Fielddata is used to store all the values of a given field in memory to provide very fast document-based lookup. However, the cost of using fielddata is memory and increased garbage collection. Because of the memory and performance cost, starting from Elasticsearch 2.0, each indexed, not analyzed field uses doc values by default. Other fields, such as analyzed text fields, still use fielddata, and because of that it is good to know how to handle fielddata.

Fielddata size

Elasticsearch allows us to control how much memory the fielddata cache uses. By default, the cache is unbounded, which is very dangerous. If you have large indices, you may run into memory issues, where the fielddata cache will eat most of the memory given to Elasticsearch and will result in node failure. We are allowed to configure the size of the fielddata cache by using the static indices.fielddata.cache.size property set to an explicit value (like 10GB) or to a percentage of the whole memory given to Elasticsearch (like 20%).

Remember that the fielddata cache is very expensive to build as it needs to load all the values of a given field to memory. This can take a lot of time, resulting in degradation in the performance of the queries. Because of this, it is advised to have enough memory to keep the needed cache permanently in Elasticsearch memory. However, we understand that this is not always possible because of hardware costs.

Circuit breakers

The nice thing about Elasticsearch is that it allows us to achieve a similar thing in multiple ways, and we have the same situation when it comes to fielddata and limiting the memory usage. Elasticsearch allows us to use a functionality called circuit breakers, which can estimate how much memory a request or a query will use, and if it is above a defined threshold, it won’t be executed at all, resulting in no memory usage and an exception thrown. This is very nice when we don’t want to limit the size of the fielddata cache but we also don’t want a single query to cause memory issues and make the cluster unstable. There are two main circuit breakers: the fielddata circuit breaker and the request circuit breaker.

The first circuit breaker, the fielddata one, estimates the amount of memory that will need to be used to load data to the fielddata cache for a given query. We can configure the limit by using the indices.breaker.fielddata.limit property, which is by default set to 60%, which means that a fielddata cache for a single query can’t use more than 60 percent of the memory given to Elasticsearch.

The second circuit breaker, the request one, estimates the memory used by per-request data structures and prevents them from using more than the amount specified by the indices.breaker.request.limit property. By default, the mentioned property is set to 40%, which means that single request data structures, such as the ones used for aggregation calculation, can’t use more than 40% of the memory given to Elasticsearch.

Finally, there is one more circuit breaker that is defined by the indices.breaker.total.limit property (by default set to 70%). This circuit breaker defines the total amount of memory that can be used by both the per-request data structures and fielddata.

Remember that the settings for circuit breakers are dynamic and can be updated using the cluster update settings API.
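How the three limits relate to each other can be shown with a small Python sketch. It is only arithmetic over the default percentages quoted above; the function name is hypothetical, and the way Elasticsearch actually estimates memory usage is not modeled.

```python
def tripped_breakers(heap_bytes, fielddata_bytes, request_bytes,
                     fielddata_limit=0.60, request_limit=0.40, total_limit=0.70):
    """Return the names of the circuit breakers an estimate would trip.

    The limits are fractions of the heap and default to the values
    mentioned in the text (60%, 40%, and 70%).
    """
    tripped = []
    if fielddata_bytes > heap_bytes * fielddata_limit:
        tripped.append("fielddata")
    if request_bytes > heap_bytes * request_limit:
        tripped.append("request")
    # The total breaker looks at fielddata and request memory combined.
    if fielddata_bytes + request_bytes > heap_bytes * total_limit:
        tripped.append("total")
    return tripped
```

Note that the two individual limits add up to 100%, so a request can stay under both of them and still trip the total breaker: with a heap of 1,000 units, 500 units of fielddata plus 300 units of request data exceed only the 70% total limit.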

Fielddata and doc values

As we already discussed, instead of the fielddata cache, doc values can be used. Of course, this is only true for not analyzed fields and ones using numeric data types and not multivalued ones. This will save memory and should be faster than the fielddata cache during query time, at the cost of a slight indexing speed degradation (very small) and a slightly larger index. If you can use doc values, do that; it will help your Elasticsearch cluster to maintain stability and respond to queries quickly.

Shard request cache

The shard request cache is the first of the caches that operate on queries. It caches the aggregations and suggestions produced by a query but, at the time of writing this book, it was not caching query hits. When Elasticsearch executes the query, this cache can save the resource-consuming aggregations for the query and speed up subsequent queries by retrieving the aggregations or suggestions from memory.

Note

During the writing of this book, the shard request cache was only used when the size=0 parameter was set for the query. This means that only the total number of hits, aggregation results, and suggestions will be cached. Remember that when running queries with dates and using the now constant, the shard request cache won’t be used either.

The shard request cache, as its name says, caches the results of the queries on each shard, before they are returned to the node that aggregates the results. This can be very good when your aggregations are heavy, like the ones that do a lot of computation on the data returned by the query. If you run a lot of aggregations with your queries and the queries can be repeated, think about using the shard request cache, as it should help you with query latency.

Enabling and configuring the shard request cache

The shard request cache is disabled by default, but can be easily enabled. To enable it, we should set the index.requests.cache.enable property to true when creating the index. For example, to enable the shard request cache for an index called new_library, we use the following command:

curl -XPUT 'localhost:9200/new_library' -d '{
  "settings": {
    "index.requests.cache.enable": true
  }
}'

One thing to remember is that the mentioned setting is not dynamically updatable. We need to include it in the index creation command or we can update it when the index is closed.

The maximum size of the cache is specified using the indices.requests.cache.size property and is set to 1% by default (which means 1% of the total memory given to Elasticsearch). We can also specify how long each entry should be kept by using the indices.requests.cache.expire property, but it is not set by default. Also, the cache is invalidated once the index is refreshed (during index searcher reopening), which makes the expire setting useless most of the time.

Note

Note that in the earlier versions of Elasticsearch, for example in the 1.x branch, the index.cache.query.enable property was used to enable or disable this cache. This may be important when migrating from older Elasticsearch versions.

Per request shard request cache disabling

Elasticsearch allows us to control the use of the shard request cache on a per-request basis. If we have the mentioned cache enabled, we can still force the search engine to omit caching for selected requests. This is done by using the request_cache parameter. If set to true, the request will be cached and, if set to false, the request won’t be cached. This is especially useful when we want to cache our requests in general but omit caching for some queries that are rarely used. It is also wise not to cache requests that use non-deterministic scripts and time ranges.
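The conditions mentioned so far can be combined into one small Python sketch. It is a simplified reading of the text rather than the actual Elasticsearch logic: the function and parameter names are hypothetical, and letting request_cache override the index-level setting in both directions is an assumption.

```python
def uses_shard_request_cache(index_cache_enabled, size, uses_now,
                             request_cache=None):
    """Decide whether a search request would be served through the cache.

    index_cache_enabled: value of index.requests.cache.enable for the index
    size:                the size parameter of the search request
    uses_now:            True if the query relies on the now constant
    request_cache:       per-request override (True, False, or None if absent)
    """
    # A per-request flag, when present, takes precedence over the index setting.
    enabled = index_cache_enabled if request_cache is None else request_cache
    # Only size=0 requests without the now constant are cached.
    return bool(enabled and size == 0 and not uses_now)
```

For example, uses_shard_request_cache(True, 0, False, request_cache=False) returns False: the index allows caching, but this particular request opted out.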

Shard request cache usage monitoring

If we don’t use any monitoring software that allows monitoring the cache usage, we can use the Elasticsearch API to check the metrics around the shard request cache. This can be done both at the indices level and at the nodes level.

To check the metrics for the shard request cache for all the indices, we should use the indices stats API and run the following command:

curl 'localhost:9200/_stats/request_cache?pretty'

To check the request cache metrics, but in a per-node view, we run the following command:

curl 'localhost:9200/_nodes/stats/indices/request_cache?pretty'

Node query cache

The node query cache is responsible for holding the results of queries for the whole node. Its size is defined using indices.queries.cache.size, defaulting to 10%, and it is shared across all the shards present on the node. We can set it both to a percentage of the heap memory given to Elasticsearch, like the default one, or to an explicit value, like 1024mb. One thing to remember about the cache is that its configuration is static; it can’t be updated dynamically and should be set in the elasticsearch.yml file. The node query cache uses the least recently used eviction policy, which means that, when full, it removes the data that was used least recently.

This cache is very useful when you run queries that are repetitive and heavy, such as the ones used to generate category pages or the main page in an e-commerce application.
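For readers unfamiliar with the least recently used policy, the following Python sketch shows the general technique with an entry-count limit. It illustrates the eviction idea only; the class is hypothetical, and the real node query cache is bounded by memory size rather than by a number of entries.

```python
from collections import OrderedDict


class LRUCache:
    """A minimal least-recently-used cache bounded by a number of entries."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get(self, key):
        if key not in self._entries:
            return None
        # Reading an entry marks it as the most recently used one.
        self._entries.move_to_end(key)
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.capacity:
            # Evict the entry that has gone unused for the longest time.
            self._entries.popitem(last=False)
```

With a capacity of two, storing a and b, reading a, and then storing c evicts b, because b is the entry that was touched least recently.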

Indexing buffers

The last cache we want to discuss is the indexing buffer that allows us to improve indexing throughput. The indexing buffer is divided between all the shards on the node and is used to store newly indexed documents. Once the cache fills up, Elasticsearch flushes the data from the cache to disk, creating a new Lucene segment in the index.

There are four static properties that allow us to configure the indexing buffer size. They need to be set in the elasticsearch.yml file and can’t be changed dynamically using the Settings API. These properties are:

indices.memory.index_buffer_size: This property defines the amount of memory used by a node for the indexing buffer. It accepts both a percentage value as well as an explicit value in bytes. It defaults to 10%, which means that 10% of the heap memory given to a node will be used as the indexing buffer.

indices.memory.min_index_buffer_size: This property defaults to 48mb and specifies the minimum memory that will be used by the indexing buffer. It is useful when indices.memory.index_buffer_size is defined as a percentage value, so that the indexing buffer is never smaller than the value defined by this property.

indices.memory.max_index_buffer_size: This property specifies the maximum memory that will be used by the indexing buffer. It is useful when indices.memory.index_buffer_size is defined as a percentage value, so that the indexing buffer never crosses a certain amount of memory usage.

indices.memory.min_shard_index_buffer_size: This property defaults to 4mb and sets the hard minimum limit of the indexing buffer that is given to each shard on a node. The indexing buffer for each shard will not be lower than the value set by this property.
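The way these four properties bound each other can be sketched with a little Python arithmetic. This is a sketch under assumptions, not the Elasticsearch algorithm: the function names are hypothetical, and splitting the buffer evenly between shards is a simplification made for illustration.

```python
MB = 1024 * 1024


def indexing_buffer_bytes(heap_bytes, percent=10,
                          min_bytes=48 * MB, max_bytes=None):
    """Node-level buffer: a percentage of the heap, clamped to min and max."""
    size = heap_bytes * percent // 100
    size = max(size, min_bytes)
    if max_bytes is not None:
        size = min(size, max_bytes)
    return size


def per_shard_buffer_bytes(total_buffer_bytes, shards,
                           min_shard_bytes=4 * MB):
    """Per-shard share (assumed to be an even split), never below the minimum."""
    return max(total_buffer_bytes // shards, min_shard_bytes)
```

For a node with a 256 MB heap, 10% would be about 25 MB, so the 48 MB minimum applies; with 24 shards on that node, an even split would give 2 MB per shard, so the 4 MB per-shard minimum applies instead.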

When it comes to indexing performance, if you need higher indexing throughput, consider setting the indexing buffer size to a value higher than the default size. It will allow Elasticsearch to flush the data to disk less often and create fewer segments. This will result in fewer merges, and thus fewer I/O and CPU intensive operations. Because of that, Elasticsearch will be able to use more resources for indexing purposes.

When caches should be avoided

The usual question that may be asked by users is whether they should really cache all their requests. The answer is no: caches are not the right tool for everyone. Using caching is not free; it requires memory and additional operations to put the data into the cache or get the data out of it.

What’s more, you should remember that Elasticsearch round robins queries between primary shards and replicas, so, if you have replicas, not every request after the first one will use the cache. Imagine that you have an index which has a single primary shard and two replicas. When the first request comes, it will hit a random shard, but the next request, even with the same query, will hit another shard, not the same one (unless routing is used). You should take this into consideration when using caches, because if your queries are not repeated, you may have them running longer because of a cache being used.

So to answer the question if you should use caching or not, we would advise taking your data, taking your queries, and running performance tests using tools such as JMeter (http://jmeter.apache.org). This will let you see how your cluster behaves with real data under a test load and see if the queries are actually faster with or without the caches.

The update settings API

Elasticsearch lets us tune it by specifying various parameters in the elasticsearch.yml file. But you should treat this file as the set of default values that can be changed at runtime using the Elasticsearch REST API. We can change both the per-index settings and the cluster-wide settings. However, you should remember that not all properties can be dynamically changed. If you try to alter a property that can’t be changed dynamically, Elasticsearch will respond with a proper error.

The cluster settings API

In order to set one of the cluster properties, we need to use the HTTP PUT method and send a proper request to the _cluster/settings URI. However, we have two options: adding the changes as transient or persistent.

The first one, transient, will set the property only until the first restart. In order to do this, we send the following command:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "PROPERTY_NAME": "PROPERTY_VALUE"
  }
}'

As you can see, in the preceding command, we used the object named transient and we added our property definition there. This means that the property will be valid only until the restart. If we want our property settings to persist between restarts, instead of using the object named transient, we use the one named persistent.
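The only difference between the two variants is the name of the enclosing object, which the following Python sketch makes explicit. The helper name is hypothetical; it only builds the JSON body that would be sent with PUT to _cluster/settings and does not contact a cluster.

```python
import json


def cluster_settings_body(settings, persistent=False):
    """Build the request body for the cluster settings API.

    settings:   a dict of property names and values
    persistent: False wraps them in "transient" (lost on restart),
                True wraps them in "persistent" (kept across restarts)
    """
    scope = "persistent" if persistent else "transient"
    return json.dumps({scope: settings})


# Example: one of the dynamic circuit breaker settings discussed earlier.
body = cluster_settings_body({"indices.breaker.fielddata.limit": "50%"},
                             persistent=True)
```

The resulting string can be passed as the -d argument of the curl command shown above.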

At any moment, you can fetch these settings using the following command:

curl -XGET localhost:9200/_cluster/settings

The indices settings API

To change the indices-related settings, Elasticsearch provides the /_settings endpoint for changing the parameters for all the indices and the /index_name/_settings endpoint for modifying the settings of a single index. In contrast to the cluster-wide settings, all the changes made to indices using the API are always persistent and remain valid after Elasticsearch restarts. To change the settings for all the indices, we send the following command:

curl -XPUT 'localhost:9200/_settings' -d '{
  "index": {
    "PROPERTY_NAME": "PROPERTY_VALUE"
  }
}'

The current settings for all the indices can be listed using the following command:

curl -XGET localhost:9200/_settings

To set a property for a single index, we run the following command:

curl -XPUT 'localhost:9200/index_name/_settings' -d '{
  "index": {
    "PROPERTY_NAME": "PROPERTY_VALUE"
  }
}'

To get the settings for the library index, we run the following command:

curl -XGET localhost:9200/library/_settings

Summary

In the chapter we just finished, we learned a few very important things about Elasticsearch. First of all, we learned how we can configure the node discovery mechanism. In addition to that, we learned to control what happens after the cluster is initially formed using the recovery and gateway modules. We used dynamic and non-dynamic templates to handle our indices more easily, and we learned what type of caches Elasticsearch has and how to control them. Finally, we used the update settings API to update the various Elasticsearch configuration variables on an already live cluster.

In the next chapter, we will focus on cluster administration. We will start with learning how to back up our data and how to monitor the key cluster metrics. We’ll see the way to control cluster rebalancing and shard allocation, and we will use a human-friendly Cat API that allows us to get varied information about the cluster. Finally, we will learn about warming up our indices and aliasing.

Chapter 10. Administrating Your Cluster

In the previous chapter, we focused on Elasticsearch nodes and cluster configuration. We started by discussing the node discovery process, what it is and how to configure it. We’ve discussed gateway and recovery modules and tuned them to match our needs. We’ve used templates and dynamic templates to manage data structure easily and learned how to install plugins to extend the functionalities of Elasticsearch. Finally, we’ve learned about the caches of Elasticsearch and how to update indices and cluster settings using a dedicated API. By the end of this chapter, you will have learned the following topics:

Backing up your indices in Elasticsearch

Monitoring your clusters

Controlling shards and rebalancing replicas

Controlling shards and allocating replicas

Using CAT API to learn about cluster state

Warming up

Aliasing

Elasticsearch time machine

A good piece of software is one that can manage exceptional situations such as hardware failure or human error. Even though a cluster of a few servers is less dependent on hardware problems, bad things can still happen. For example, let’s imagine that you need to restore your indices. One possible solution is to reindex all your data from a primary data store such as a SQL database. But what will you do if it takes too long or, even worse, the only data store is Elasticsearch? Before Elasticsearch 1.0, creating backups of indices was not easy. The procedure included stopping indexing, flushing the data to disk, shutting down the cluster, and, finally, copying the data to a backup device.

Fortunately, now we can take snapshots, and this section will guide you and show how this functionality works.

Creating a snapshot repository

A snapshot keeps all the data related to the cluster from the time the snapshot creation starts, and it includes information about the cluster state and indices. Before we create snapshots, at least the first one, a snapshot repository must be created. Each repository is recognized by its name and should define the following aspects:

name: A unique name of the repository; we will need it later.

type: The type of the repository. The possible values are fs (a repository on a shared file system) and url (a read-only repository available via URL).

settings: Additional information needed depending on the repository type.

Now, let’s create a file system repository. Before this, we have to make sure that the directory for our backups fulfils two requirements. The first is related to security. Every repository has to be placed in a path defined in the Elasticsearch configuration file as path.repo. For example, our elasticsearch.yml includes a line similar to the following one:

path.repo: ["/tmp/es_backup_folder", "/tmp/backup/es"]

The second requirement says that every node in the cluster should be able to access the directory we set for the repository.

So now, let’s create a new file system repository by running the following command:

curl -XPUT localhost:9200/_snapshot/backup -d '{
  "type": "fs",
  "settings": {
    "location": "/tmp/es_backup_folder/cluster1"
  }
}'

The preceding command creates a repository named backup, which stores the backup files in the directory given by the location attribute. Elasticsearch responds with the following information:

{"acknowledged":true}

At the same time, es_backup_folder is created on the local file system, without any content yet.

Note

You can also set a relative path with the location parameter. In this case, Elasticsearch determines the absolute path by first getting the directory defined in path.repo.
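The security requirement and the relative path rule can be pictured with the following Python sketch. It is an illustration only: both helper functions are hypothetical, resolving a relative location against the first path.repo entry is an assumption, and the check works on path strings without touching the file system.

```python
import posixpath
from pathlib import PurePosixPath


def resolve_location(location, repo_paths):
    """Turn a repository location into a normalized absolute path."""
    path = PurePosixPath(location)
    if not path.is_absolute():
        # Assumption: relative locations are resolved against the first entry.
        path = PurePosixPath(repo_paths[0]) / path
    return PurePosixPath(posixpath.normpath(str(path)))


def is_location_allowed(location, repo_paths):
    """Check that the location lies inside one of the path.repo directories."""
    resolved = resolve_location(location, repo_paths)
    for repo in repo_paths:
        root = PurePosixPath(posixpath.normpath(repo))
        if resolved == root or root in resolved.parents:
            return True
    return False


# The path.repo value from the elasticsearch.yml example above.
REPO_PATHS = ["/tmp/es_backup_folder", "/tmp/backup/es"]
```

With these values, the location used for the backup repository, /tmp/es_backup_folder/cluster1, passes the check, while a directory such as /var/backups/es would be rejected.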

As we said, the second repository type is url. It requires a url parameter instead of the location, which points to the address where the repository resides, for example, the HTTP address. As in the previous case, the address should be defined in the repositories.url.allowed_urls parameter in the Elasticsearch configuration. The parameter allows the use of wildcards in the address.

Note

Note that file:// addresses are checked against the paths defined in the path.repo parameter.

YoucanalsostoresnapshotsinAmazonS3,HDFS,orAzureusingtheadditionalpluginsavailable.Tolearnaboutthese,pleasevisitthefollowingpages:

https://github.com/elastic/elasticsearch-cloud-aws#s3-repositoryhttps://github.com/elastic/elasticsearch-hadoop/tree/master/repository-hdfshttps://github.com/elastic/elasticsearch-cloud-azure#azure-repository

Now that we have our first repository, we can see its definition using the following command:

curl -XGET 'localhost:9200/_snapshot/backup?pretty'

We can also check all the repositories by running a command like the following:

curl -XGET 'localhost:9200/_snapshot/_all?pretty'

Or simply, we can use this:

curl -XGET 'localhost:9200/_snapshot/?pretty'

If you want to delete a snapshot repository, the standard DELETE command helps:

curl -XDELETE 'localhost:9200/_snapshot/backup?pretty'

Creating snapshots

By default, Elasticsearch takes all the indices and cluster settings (except the transient ones) when creating snapshots. You can create any number of snapshots, and each will hold the information available at the time the snapshot was created. The snapshots are created in a smart way; only new information is copied. This means that Elasticsearch knows which segments are already stored in the repository and doesn’t have to save them again.

To create a new snapshot, we need to choose a unique name and use the following command:

curl -XPUT 'localhost:9200/_snapshot/backup/bckp1'

The preceding command defines a new snapshot named bckp1 (you can only have one snapshot with a given name; Elasticsearch will check its uniqueness), and the data is stored in the previously defined backup repository. The command returns an immediate response, which looks as follows:

{"accepted":true}

The preceding response means that the snapshot process has started and continues in the background. If you would like the response to be returned only when the actual snapshot is created, you can add the wait_for_completion=true parameter as shown in the following example:

curl -XPUT 'localhost:9200/_snapshot/backup/bckp2?wait_for_completion=true&pretty'

The response to the preceding command shows the status of a created snapshot:

{

"snapshot":{

"snapshot":"bckp2",

"version_id":2000099,

"version":"2.2.0",

"indices":["news"],

"state":"SUCCESS",

"start_time":"2016-01-07T21:21:43.740Z",

"start_time_in_millis":1446931303740,

"end_time":"2016-01-07T21:21:44.750Z",

"end_time_in_millis":1446931304750,

"duration_in_millis":1010,

"failures":[],

"shards":{

"total":5,

"failed":0,

"successful":5

}

}

}

As you can see, Elasticsearch presents information about the time taken by the snapshot process, its status, and the indices affected.

Additional parameters

The snapshot command also accepts the following additional parameters:

indices: The names of the indices of which we want to take snapshots.
ignore_unavailable: When this is set to false (the default), Elasticsearch will return an error if any index listed using the indices parameter is missing. When set to true, Elasticsearch will just ignore the missing indices during backup.
include_global_state: When this is set to true (the default), the cluster state is also written to the snapshot (except for the transient settings).
partial: The snapshot operation success depends on the availability of all the shards. If any of the shards is not available, the snapshot operation will fail. Setting partial to true causes Elasticsearch to save only the available shards and omit the lost ones.

An example of using additional parameters can look as follows:

curl -XPUT 'localhost:9200/_snapshot/backup/bckp3?wait_for_completion=true&pretty' -d '{
  "indices": "b*",
  "include_global_state": false
}'
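When snapshots are taken from a script, it can help to assemble the request path and body in one place. The following Python sketch only builds the strings and sends nothing; the function name snapshot_request is hypothetical, and it uses only the parameters described above.

```python
import json

def snapshot_request(repository, snapshot, indices=None,
                     include_global_state=True, partial=False):
    """Build the path and JSON body for a snapshot creation request."""
    path = f"/_snapshot/{repository}/{snapshot}?wait_for_completion=true"
    body = {"include_global_state": include_global_state, "partial": partial}
    if indices is not None:
        # Comma-separated names or wildcard patterns, as in the curl example
        body["indices"] = indices
    return path, json.dumps(body, sort_keys=True)

path, body = snapshot_request("backup", "bckp3", indices="b*",
                              include_global_state=False)
print(path)
print(body)
```

The resulting path and body can then be passed to any HTTP client as a PUT request.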

Restoring a snapshot

Now that we have our snapshots done, we will also learn how to restore data from a given snapshot. As we said earlier, a snapshot can be addressed by its name. We can list all the snapshots using the following command:

curl -XGET 'localhost:9200/_snapshot/backup/_all?pretty'

The response returned by Elasticsearch to the preceding command shows the list of all available backups. Every list item is similar to the following:

{

"snapshot":{

"snapshot":"bckp2",

"version_id":2000099,

"version":"2.2.0",

"indices":["news"],

"state":"SUCCESS",

"start_time":"2016-01-07T21:21:43.740Z",

"start_time_in_millis":1446931303740,

"end_time":"2016-01-07T21:21:44.750Z",

"end_time_in_millis":1446931304750,

"duration_in_millis":1010,

"failures":[],

"shards":{

"total":5,

"failed":0,

"successful":5

}

}

}

The repository we created earlier is called backup. To restore a snapshot named bckp1 from our snapshot repository, run the following command:

curl -XPOST 'localhost:9200/_snapshot/backup/bckp1/_restore'

During the execution of this command, Elasticsearch takes the indices defined in the snapshot and creates them with the data from the snapshot. However, if an index already exists and is not closed, the command will fail. In this case, you may find it convenient to only restore certain indices, for example:

curl -XPOST 'localhost:9200/_snapshot/backup/bckp1/_restore?pretty' -d '{
  "indices": "c*"
}'

The preceding command restores only the indices that begin with the letter c. The other available parameters are as follows:

ignore_unavailable: When set to false (the default behavior), this parameter will cause Elasticsearch to fail the restore process if any of the expected indices is not available.
include_global_state: When set to true, which is also the default behavior, this parameter will cause Elasticsearch to restore the global state included in the snapshot.
rename_pattern: This parameter allows the renaming of an index during a restore operation. Thanks to this, the restored index will have a different name. The value of this parameter is a regular expression that defines the source index name. If the pattern matches the name of the index, name substitution will occur. In the pattern, you should use groups delimited by parentheses, which can then be used in the rename_replacement parameter.
rename_replacement: This parameter, along with rename_pattern, defines the target index name. Using the dollar sign and a number, you can recall the appropriate group from rename_pattern.

For example, with rename_pattern=products_(.*), only the indices with names that begin with products_ will be renamed. The rest of the index name will be used during replacement. rename_pattern=products_(.*) together with rename_replacement=items_$1 causes the products_cars index to be restored to an index called items_cars.
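The effect of rename_pattern and rename_replacement can be previewed locally before running a restore. The sketch below is an approximation in Python: the helper renamed_index is hypothetical, and it assumes that translating $1-style group references into Python's syntax is enough to mimic the substitution for simple patterns such as the one above.

```python
import re

def renamed_index(name, rename_pattern, rename_replacement):
    """Preview the index name a restore would produce for simple patterns."""
    # Translate $1, $2, ... into Python's \g<1>, \g<2>, ... group references
    py_replacement = re.sub(r"\$(\d+)", r"\\g<\1>", rename_replacement)
    return re.sub(rename_pattern, py_replacement, name)

print(renamed_index("products_cars", "products_(.*)", "items_$1"))
print(renamed_index("news", "products_(.*)", "items_$1"))
```

The first call yields items_cars, while news is returned unchanged because the pattern does not match it.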

Cleaning up – deleting old snapshots

Elasticsearch leaves snapshot repository management up to you. Currently, there is no automatic clean-up process. But don’t worry; this is simple. For example, let’s remove our previously taken snapshot:

curl -XDELETE 'localhost:9200/_snapshot/backup/bckp1?pretty'

And that’s all. The command causes the snapshot named bckp1 to be deleted from the backup repository.

Monitoring your cluster’s state and health

Monitoring is essential when it comes to handling your cluster and ensuring it is in a healthy state. It allows administrators and developers to detect possible problems and prevent them before they occur, or to act as soon as they start showing. In the worst case, monitoring allows us to do a postmortem analysis of what happened to the application, in this case, our Elasticsearch cluster and each of the nodes.

Elasticsearch provides very detailed information that allows us to check and monitor our nodes or the cluster as a whole. This includes statistics and information about the servers, nodes, indices, and shards. Of course, we are also able to get information about the entire cluster state. Before we get into the details of the mentioned APIs, please remember that they are complex and we are only describing the basics. We will try to show you where to start so you’ll know what to look for when you need very detailed information.

Cluster health API

One of the most basic APIs is the cluster health API, which allows us to get information about the entire cluster state with a single HTTP command. For example, let’s run the following command:

curl -XGET 'localhost:9200/_cluster/health?pretty'

A sample response returned by Elasticsearch for the preceding command looks as follows:

{

"cluster_name":"elasticsearch",

"status":"yellow",

"timed_out":false,

"number_of_nodes":1,

"number_of_data_nodes":1,

"active_primary_shards":11,

"active_shards":11,

"relocating_shards":0,

"initializing_shards":0,

"unassigned_shards":11,

"delayed_unassigned_shards":0,

"number_of_pending_tasks":0,

"number_of_in_flight_fetch":0,

"task_max_waiting_in_queue_millis":0,

"active_shards_percent_as_number":50.0

}

The most important information is about the status of the cluster. In our example, we see that the cluster is in yellow status. This means that all the primary shards have been allocated properly, but the replicas were not (because there is only a single node in the cluster, but that doesn’t matter for now).

Of course, apart from the cluster name and status, we can see whether the request timed out, how many nodes there are, how many data nodes, primary shards, initializing shards, unassigned ones, and so on.

Let’s stop here and talk about when the cluster, as a whole, is fully operational. The cluster is fully operational when Elasticsearch is able to allocate all the shards and replicas according to the configuration. This is when the cluster is in the green state. The yellow state means that we are ready to handle requests because the primary shards are allocated, but some (or all) replicas are not. The last state, the red one, means that at least one primary shard was not allocated and, because of this, the cluster is not ready yet. That means that queries may return errors or incomplete results.
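The three states can be summarized as a small decision rule. The following Python sketch is illustrative only; the function name and its two inputs are hypothetical, and the health API response itself does not report unassigned primaries and replicas as separate fields.

```python
def cluster_status(unassigned_primaries, unassigned_replicas):
    """Map counts of unassigned shards to the status color described above."""
    if unassigned_primaries > 0:
        return "red"      # at least one primary shard is not allocated
    if unassigned_replicas > 0:
        return "yellow"   # all primaries allocated, some replicas are not
    return "green"        # every primary and replica is allocated

# Single-node example from the text: 11 primaries active, 11 replicas unassigned
print(cluster_status(unassigned_primaries=0, unassigned_replicas=11))
```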

The preceding command can also be executed to check the health state of certain indices. For example, if we would like to check the health of the library and map indices, we would run the following command:

curl -XGET 'localhost:9200/_cluster/health/library,map/?pretty'

Controlling information details

Elasticsearch allows us to specify a special level parameter, which can take the value of cluster (default), indices, or shards. This allows us to control the details of information returned by the health API. We’ve already seen the default behavior. When setting the level parameter to indices, apart from the cluster information, we will also get per-index health. Setting the mentioned parameter to shards tells Elasticsearch to return per-shard information in addition to what we’ve seen in the example.

Additional parameters

In addition to the level parameter, we have a few additional parameters that can control the behavior of the health API.

The first of the mentioned parameters is timeout. It allows us to control how long, at most, the command execution will wait when one of the following parameters is used: wait_for_status, wait_for_nodes, wait_for_relocating_shards, and wait_for_active_shards. By default, it is set to 30s, which means that the health command will wait 30 seconds at most and then return the response.

The wait_for_status parameter allows us to tell Elasticsearch which health status the cluster should be at for the command to return. It can take the values of green, yellow, and red. For example, when set to green, the health API call will not return until the green status or the timeout is reached.

The wait_for_nodes parameter allows us to set the number of nodes that must be available before the health command returns its response (or until the defined timeout is reached). It can be set to an integer number like 3 or to a simple expression like >=3 (meaning greater than or equal to three nodes) or <=3 (meaning less than or equal to three nodes).

The wait_for_active_shards parameter means that Elasticsearch will wait for a specified number of active shards to be present before returning the response.

The last parameter is wait_for_relocating_shards, which is by default not specified. It allows us to tell Elasticsearch how many relocating shards it should wait for (or until the timeout is reached). Setting this parameter to 0 means that Elasticsearch should wait until no shards are relocating.

An example usage of the health command with some of the mentioned parameters is as follows:

curl -XGET 'localhost:9200/_cluster/health?wait_for_status=green&wait_for_nodes=>=3&timeout=100s'
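When the health request is built programmatically, values such as >=3 are usually percent-encoded in the query string. The following Python sketch only constructs the URL; the helper name health_url is hypothetical.

```python
from urllib.parse import urlencode

def health_url(host, **params):
    """Build a cluster health URL with percent-encoded query parameters."""
    query = urlencode(params)
    return f"http://{host}/_cluster/health?{query}"

url = health_url("localhost:9200",
                 wait_for_status="green",
                 wait_for_nodes=">=3",
                 timeout="100s")
print(url)
```

Here the > and = characters of the wait_for_nodes value become %3E and %3D, so the value cannot be confused with the separators of the query string.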

Indices stats API

An Elasticsearch index is the place where our data lives, and it is a crucial part of most deployments. With the use of the indices stats API, available using the _stats endpoint, we can get a lot of information about the indices living inside our cluster. Of course, as with most of the APIs in Elasticsearch, we can send a command to get the information about all the indices (using the pure _stats endpoint), about one particular index (for example, library/_stats), or about several indices at the same time (for example, library,map/_stats). For example, to check the statistics for the map and library indices we’ve used in the book, we could run the following command:

curl -XGET 'localhost:9200/library,map/_stats?pretty'

The response to the preceding command has more than 700 lines, so we only describe its structure, omitting the response itself. Apart from the information about the response status and the response time, we can see three objects named primaries, total (in the _all object), and indices. The indices object contains information about the library and map indices. The primaries object contains information about the primary shards only, and the total object contains information about all the shards including replicas. All these objects can contain objects describing a particular statistic, such as the following: docs, store, indexing, get, search, merges, refresh, flush, warmer, query_cache, fielddata, percolate, completion, segments, translog, suggest, request_cache, and recovery.

We can limit the amount of information that we get from the indices stats API by providing the type of data we are interested in, using the names of the statistics mentioned previously. For example, if we want to get information about indexing and searching, we can run the following command:

curl -XGET 'localhost:9200/library,map/_stats/indexing,search?pretty'

Let’s discuss the information stored in those objects.

Docs

The docs section of the response shows information about indexed documents. For example, it could look as follows:

"docs":{

"count":4,

"deleted":0

}

The main information is the count, indicating the number of documents in the described index. When we delete documents from the index, Elasticsearch doesn’t remove these documents immediately and only marks them as deleted. Documents are physically deleted during the segment merge process. The number of documents marked as deleted is presented by the deleted attribute and should be 0 right after the merge.

Store

The next statistic, the store one, provides information regarding storage. For example, such a section could look as follows:

"store":{

"size_in_bytes":6003,

"throttle_time_in_millis":0

}

The main information is about the index (or indices) size. We can also look at throttling statistics. This information is useful when the system has problems with I/O performance and has limits configured on internal operations during segment merging.

Indexing, get, and search

The indexing, get, and search sections of the response provide information about data manipulation: indexing (together with delete operations), real-time get, and searching. Let’s look at the following example returned by Elasticsearch:

"indexing":{

"index_total":0,

"index_time_in_millis":0,

"index_current":0,

"delete_total":0,

"delete_time_in_millis":0,

"delete_current":0,

"noop_update_total":0,

"is_throttled":false,

"throttle_time_in_millis":0

},

"get":{

"total":0,

"time_in_millis":0,

"exists_total":0,

"exists_time_in_millis":0,

"missing_total":0,

"missing_time_in_millis":0,

"current":0

},

"search":{

"open_contexts":0,

"query_total":0,

"query_time_in_millis":0,

"query_current":0,

"fetch_total":0,

"fetch_time_in_millis":0,

"fetch_current":0,

"scroll_total":0,

"scroll_time_in_millis":0,

"scroll_current":0

}

As you can see, all of these statistics have similar structures. We can read the total time spent in various request types (in milliseconds) and the number of requests (which, together with the total time, allows us to calculate the average time of a single query). In the case of get requests, valuable information is how many fetches were unsuccessful (missing documents); an indexing request has information about throttling, and search includes information regarding scrolling.
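The average time mentioned above is simply the total time divided by the number of requests. A minimal Python sketch follows; the numbers are made up for illustration (the sample response above contains only zeros), and the helper name average_millis is hypothetical.

```python
def average_millis(total_time_in_millis, total_count):
    """Average time per request in milliseconds, or None when nothing was counted."""
    if total_count == 0:
        return None
    return total_time_in_millis / total_count

# Hypothetical values in the shape of the "search" section
search_stats = {"query_total": 40, "query_time_in_millis": 100}
print(average_millis(search_stats["query_time_in_millis"],
                     search_stats["query_total"]))
```

The same helper applies to the fetch, get, and indexing counters, since they follow the same total and time_in_millis naming.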

Additional information

In addition to the previously described sections, Elasticsearch provides the following information:

merges: This section contains information about Lucene segment merges
refresh: This section contains information about the refresh operation
flush: This section contains information about flushes
warmer: This section contains information about warmers and for how long they were executed
query_cache: This section contains query cache statistics
fielddata: This section contains field data cache statistics
percolate: This section contains information about the percolator usage
completion: This section contains information about the completion suggester
segments: This section contains information about Lucene segments
translog: This section contains information about the transaction log count and size
suggest: This section contains suggester-related statistics
request_cache: This section contains shard request cache statistics
recovery: This section contains shard recovery information

Nodes info API

The nodes info API provides us with information about the nodes in the cluster. To get information from this API, we need to send the request to the _nodes REST endpoint. The simplest command to retrieve node-related information from Elasticsearch would be as follows:

curl -XGET 'localhost:9200/_nodes?pretty'

This API can be used to fetch information about particular nodes or a single node using the following:

Node name: If we would like to get information about the node named Pulse, we could run a command to the following REST endpoint: _nodes/Pulse
Node identifier: If we would like to get information about the node with an identifier equal to ny4hftjNQtuKMyEvpUdQWg, we could run a command to the following REST endpoint: _nodes/ny4hftjNQtuKMyEvpUdQWg
IP address: We can use IP addresses to get information about the nodes. For example, if we would like to get information about the node with an IP address equal to 192.168.1.103, we could run a command to the following REST endpoint: _nodes/192.168.1.103
Parameters from the Elasticsearch configuration: If we would like to get information about all the nodes with the node.rack property set to 2, we could run a command to the following REST endpoint: /_nodes/rack:2

This API also allows us to get information about several nodes at once using these:

Patterns, for example: _nodes/192.168.1.* or _nodes/P*
Nodes enumeration, for example: _nodes/Pulse,Slab
Both patterns and enumerations, for example: /_nodes/P*,S*

Returned information

By default, the nodes API will return extensive information about each node along with the name, identifier, and addresses. This extensive information includes the following:

settings: The Elasticsearch configuration
os: Information about the server such as processor, RAM, and swap space
process: Process identifier and refresh interval
jvm: Information about the Java Virtual Machine such as memory limits, memory pools, and garbage collectors
thread_pool: The configuration of thread pools for various operations
transport: Listening addresses for the transport protocol
http: Information about listening addresses for the HTTP-based API
plugins: Information about the plugins installed by the user
modules: Information about the built-in plugins

An example usage of this API can be illustrated by the following command:

curl 'localhost:9200/_nodes/Pulse/os,jvm,plugins'

The preceding command will return the basic information about the node named Pulse and, in addition to this, it will include the operating system information, Java Virtual Machine information, and plugin-related information.

Nodes stats API

The nodes stats API is similar to the nodes info API described in the preceding section. The main difference is that the previous API provided information about the environment in which the node is running, while the one we are currently discussing tells us what happened with the cluster during its work. To use the nodes stats API, you need to send a command to the /_nodes/stats REST endpoint. However, similar to the nodes info API, we can also retrieve information about specific nodes (for example: _nodes/Pulse/stats).

The simplest command to retrieve node statistics from Elasticsearch would be as follows:

curl -XGET 'localhost:9200/_nodes/stats?pretty'

By default, Elasticsearch returns all the available statistics, but we can limit the response to the ones we are interested in. The available options are as follows:

indices: Information about the indices including size, document count, indexing-related statistics, search and get time, caches, segment merges, and so on
os: Operating system-related information such as free disk space, memory, swap usage, and so on
process: Memory, CPU, and file handler usage related to the Elasticsearch process
jvm: Java Virtual Machine memory and garbage collector statistics
transport: Information about data sent and received by the transport module
http: Information about HTTP connections
fs: Information about available disk space and I/O operation statistics
thread_pool: Information about the state of the threads assigned to various operations
breakers: Information about circuit breakers
script: Scripting engine-related information

An example usage of this API can be illustrated by the following command:

curl 'localhost:9200/_nodes/Pulse/stats/os,jvm,breakers'

Cluster state API

Another API provided by Elasticsearch is the cluster state API. As its name suggests, it allows us to get information about the entire cluster (we can also limit the returned information to a local node by adding the local=true parameter to the request). The basic command used to get all the information returned by this API looks as follows:

curl -XGET 'localhost:9200/_cluster/state?pretty'

We can also limit the provided information to the given metrics in comma-separated form, specified after the _cluster/state part of the REST call. For example:

curl -XGET 'localhost:9200/_cluster/state/version,nodes?pretty'

We can also limit the information to the given metrics and indices. For example, if we would like to get the metadata for the library index, we could run the following command:

curl -XGET 'localhost:9200/_cluster/state/metadata/library?pretty'

The following metrics are allowed to be used:

version: This returns information about the cluster state version.
master_node: This returns information about the elected master node.
nodes: This returns nodes information.
routing_table: This returns routing-related information.
metadata: This returns metadata-related information. When retrieving the metadata metric, we can also include an additional parameter such as index_templates=true, which will result in including the defined index templates.
blocks: This returns the blocks part of the response.

Cluster stats API

The cluster stats API allows us to get statistics about the indices and nodes from the cluster-wide perspective. To use this API, we need to run a GET request to the /_cluster/stats REST endpoint, for example:

curl -XGET 'localhost:9200/_cluster/stats?pretty'

The response size depends on the number of shards, indices, and nodes in the cluster. It will include basic indices information such as shards, their state, recovery information, caches information, and node-related information.

Pending tasks API

The pending tasks API is one of the APIs that helps us see what Elasticsearch is doing; it allows us to check which tasks are waiting to be executed. To retrieve this information, we need to send a request to the /_cluster/pending_tasks REST endpoint. In the response, we will see an array of tasks with information about them, such as task priority and time in queue.

Indices recovery API

The recovery API gives us insight into the recovery status of the shards that are building indices in our cluster (learn more about recovery in The gateway and recovery modules section of Chapter 9, Elasticsearch Cluster in Detail).

The simplest command that would return the information about the recovery of all the shards in the cluster would look as follows:

curl -XGET 'http://localhost:9200/_recovery?pretty'

We can also get information about recovery for particular indices, such as the library index, for example:

curl -XGET 'http://localhost:9200/library/_recovery?pretty'

The response returned by Elasticsearch is divided by indices and shards. A response for a single shard could look as follows:

{

"id":2,

"type":"STORE",

"stage":"DONE",

"primary":true,

"start_time_in_millis":1446132761730,

"stop_time_in_millis":1446132761734,

"total_time_in_millis":4,

"source":{

"id":"DboTibRlT1KJSQYnDPxwZQ",

"host":"127.0.0.1",

"transport_address":"127.0.0.1:9300",

"ip":"127.0.0.1",

"name":"Plague"

},

"target":{

"id":"DboTibRlT1KJSQYnDPxwZQ",

"host":"127.0.0.1",

"transport_address":"127.0.0.1:9300",

"ip":"127.0.0.1",

"name":"Plague"

},

"index":{

"size":{

"total_in_bytes":156,

"reused_in_bytes":156,

"recovered_in_bytes":0,

"percent":"100.0%"

},

"files":{

"total":1,

"reused":1,

"recovered":0,

"percent":"100.0%"

},

"total_time_in_millis":0,

"source_throttle_time_in_millis":0,

"target_throttle_time_in_millis":0

},

"translog":{

"recovered":0,

"total":-1,

"percent":"-1.0%",

"total_on_start":-1,

"total_time_in_millis":3

},

"verify_index":{

"check_index_time_in_millis":0,

"total_time_in_millis":0

}

}

In the preceding response, we can see information about the shard identifier, the stage of recovery, information on whether the shard is a primary or a replica, the timestamps of the start and end of recovery, and the total time the recovery process took. We can see the source node, the target node, and information about the shard’s physical statistics, such as size, number of files, transaction log-related statistics, and index verification time.

It is worth knowing about the stages and types of recovery. When it comes to the types of recovery (the type attribute in the response), we can expect the following values: STORE, SNAPSHOT, REPLICA, and RELOCATING. When it comes to the stage of recovery (the stage attribute in the response), we can expect values such as INIT (recovery has not started), INDEX (Elasticsearch copies metadata information and data from source to destination), START (Elasticsearch is opening the shard for use), FINALIZE (the final stage, which cleans up garbage), and DONE (recovery has ended).

We can limit the response returned by the indices recovery API to only the shards that are currently in active recovery by including the active_only=true parameter in the request. Finally, we can request more detailed information by adding the detailed=true parameter in the API call.
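To get a quick overview of a larger recovery response, the shards can be counted per stage. The Python sketch below assumes the parsed response maps each index name to an object holding a shards array whose entries look like the one shown above; that outer structure, the sample data, and the function name are assumptions made for illustration.

```python
from collections import Counter

def stages_by_count(recovery_response):
    """Count shards per recovery stage in a parsed _recovery response."""
    counts = Counter()
    for index_info in recovery_response.values():
        for shard in index_info.get("shards", []):
            counts[shard["stage"]] += 1
    return dict(counts)

# Hypothetical, heavily trimmed response for two indices
sample = {
    "library": {"shards": [{"id": 0, "stage": "DONE"}, {"id": 1, "stage": "INDEX"}]},
    "map": {"shards": [{"id": 0, "stage": "DONE"}]},
}
print(stages_by_count(sample))
```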

Indices shard stores API

The indices shard stores API gives us information about the store for the shards of our indices. We use this API by running a simple command to the /_shard_stores REST endpoint, optionally providing comma-separated index names.

For example, to get information about all the indices, we would run the following command:

curl -XGET 'http://localhost:9200/_shard_stores?pretty'

We can also get information about particular indices, such as the library and map ones:

curl -XGET 'http://localhost:9200/library,map/_shard_stores?pretty'

The response returned by Elasticsearch contains information about the store for each shard. For example, this is what Elasticsearch returned for one of the shards of the library index:

"0":{

"stores":[{

"DboTibRlT1KJSQYnDPxwZQ":{

"name":"Plague",

"transport_address":"127.0.0.1:9300",

"attributes":{}

},

"version":6,

"allocation":"primary"

}]

}

We can see information about the node in the stores array. Each entry contains node-related information (the node where the shard is physically located), the version of the store copy, and the allocation, which can take the values of primary (for primary shards), replica (for replicas), and unused (for unassigned shards).

Indices segments API

The last API we want to mention is the Lucene segments API, which is available using the /_segments endpoint. We can run it for the entire cluster, for example, like this:

curl -XGET 'localhost:9200/_segments?pretty'

We can also run the command for individual indices. For example, if we would like to get segment-related information for the map and library indices, we would use the following command:

curl -XGET 'localhost:9200/library,map/_segments?pretty'

This API provides information about shards, their placement, and the segments connected with the physical index managed by the Apache Lucene library.

Controlling the shard and replica allocation

The indices that live inside your Elasticsearch cluster can be built from many shards, and each shard can have many replicas. The ability to divide a single index into multiple shards gives us the possibility of dividing the data into multiple physical instances. The reasons why we want to do this may be different. We may want to parallelize indexing to get more throughput, or we may want to have smaller shards so that our queries are faster. Of course, we may also have too many documents to fit on a single machine and want to split the index into shards because of this. With replicas, we can parallelize the query load by having multiple physical copies of each shard. We can say that, using shards and replicas, we can scale out Elasticsearch. However, Elasticsearch has to figure out where in the cluster it should place shards and replicas. It needs to figure out on which servers/nodes each shard or replica should be placed.

Explicitly controlling allocation

One of the most common use cases for explicitly controlling shard and replica allocation in Elasticsearch is time-based data, for example, logs. Each log event has a timestamp associated with it; however, the amount of logs in most organizations is just enormous. The thing is that you need a lot of processing power to index them, but you don’t usually search historical data. Of course, you may want to do that, but it will be done less frequently than the queries for the most recent data.

Because of this, we can divide the cluster into two so-called tiers: the cold and the hot tier. The hot tier contains more powerful nodes, ones that have very fast disks, lots of CPU processing power, and memory. These nodes will handle both a lot of indexing and queries for recent data. The cold tier, on the other hand, will contain nodes that have very large disks, but are not very fast. We won’t be indexing into the cold tier; we will only store our historical indices here and search them from time to time. With the default Elasticsearch behavior, we can’t be sure where the shards and replicas will be placed, but luckily Elasticsearch allows us to control this.

Note
The main assumption when it comes to time series data is that once they are indexed, they are not being updated. This is true for log indexing use cases, and we assume we are creating an Elasticsearch deployment for such a use case.

The idea is to create the index that holds today’s data on the hot nodes and, when we stop using it (when another day starts), to update the index settings so that it is moved to the tier called cold. Let’s now see how we can do this.

Specifying node parameters

So let’s divide our cluster into two tiers. We say tiers, but they can have any name you want; we just like the term “tier” and it is commonly used. We assume that we have six nodes. We want our more powerful nodes, numbered 1 and 2, to be placed in the tier called hot, and the nodes numbered 3, 4, 5, and 6, which are smaller in terms of CPU and memory but very large in terms of disk space, to be placed in a tier called cold.

Configuration

To configure this, we add the following property to the elasticsearch.yml configuration file on nodes 1 and 2 (the ones that are more powerful):

node.tier: hot

Of course, we will add a similar property to the elasticsearch.yml configuration file on nodes 3, 4, 5, and 6 (the less powerful ones):

node.tier: cold

Index creation

Now let’s create our daily index for today’s data, one called logs_2015-12-10. As we said earlier, we want this to be placed on the nodes in the hot tier. We do this by running the following command:

curl -XPUT 'http://localhost:9200/logs_2015-12-10' -d '{
  "settings": {
    "index": {
      "routing.allocation.include.tier": "hot"
    }
  }
}'

The preceding command will result in the creation of the logs_2015-12-10 index with the index.routing.allocation.include.tier property specified for it. We set this property to the hot value, which means that we want to place the logs_2015-12-10 index on the nodes that have the node.tier property set to hot.

Now, when the day ends and we need to create a new index, we again put it on the hot nodes. We do this by running the following command:

curl -XPUT 'http://localhost:9200/logs_2015-12-11' -d '{
  "settings": {
    "index": {
      "routing.allocation.include.tier": "hot"
    }
  }
}'

Finally, we need to tell Elasticsearch to move the index holding the data for the previous day to the cold tier. We do this by updating the index settings and setting the index.routing.allocation.include.tier property to cold. This is done using the following command:

curl -XPUT 'http://localhost:9200/logs_2015-12-10/_settings' -d '{
  "index.routing.allocation.include.tier": "cold"
}'

After running the preceding command, Elasticsearch will start relocating the index called logs_2015-12-10 to the nodes that have the node.tier property set to cold in the elasticsearch.yml file, without any manual work needed from us.
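The daily routine described above (create today’s index on the hot tier, then move yesterday’s index to the cold tier) is easy to script. The following Python sketch only prepares the requests and sends nothing; the function names are hypothetical, and the request bodies mirror the curl commands shown above.

```python
import json
from datetime import date, timedelta

def daily_index(day):
    """Name of the daily log index, for example logs_2015-12-10."""
    return f"logs_{day.isoformat()}"

def rollover_requests(today):
    """Return (method, path, body) tuples for the daily hot/cold rotation."""
    yesterday = today - timedelta(days=1)
    create_body = {"settings": {"index": {"routing.allocation.include.tier": "hot"}}}
    move_body = {"index.routing.allocation.include.tier": "cold"}
    return [
        ("PUT", f"/{daily_index(today)}", create_body),
        ("PUT", f"/{daily_index(yesterday)}/_settings", move_body),
    ]

for method, path, body in rollover_requests(date(2015, 12, 11)):
    print(method, path, json.dumps(body))
```

Run once a day, for example from a scheduler, such a script would keep only the current index on the hot nodes.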

Excluding nodes from allocation

In the same manner as we specified on which nodes the index should be placed, we can also exclude nodes from index allocation. Referring to the previously shown example, if we want the index called logs_2015-12-10 to not be placed on the nodes with the node.tier property set to cold, we would run the following command:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
  "index.routing.allocation.exclude.tier": "cold"
}'

Notice that instead of the index.routing.allocation.include.tier property, we’ve used the index.routing.allocation.exclude.tier property.

Requiring node attributes

In addition to inclusion and exclusion rules, we can also specify rules that must all match in order for a shard to be allocated to a given node. The difference is that when using the index.routing.allocation.include property, the index will be placed on any node that matches at least one of the provided property values. Using index.routing.allocation.require, Elasticsearch will place the index only on a node that has all the defined values. For example, let’s assume that we’ve set the following settings for the logs_2015-12-10 index:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
  "index.routing.allocation.require.tier": "hot",
  "index.routing.allocation.require.disk_type": "ssd"
}'

After running the preceding command, Elasticsearch would only place the shards of the logs_2015-12-10 index on a node with the node.tier property set to hot and the node.disk_type property set to ssd.
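The difference between the include, require, and exclude rules can be illustrated with a small simulation. The Python sketch below is a simplified model and not the allocation code used by Elasticsearch: the function name node_allowed is hypothetical, values are compared literally, and wildcards are not supported.

```python
def node_allowed(node_attrs, include=None, require=None, exclude=None):
    """Decide whether a node passes simplified allocation filtering rules."""
    include, require, exclude = include or {}, require or {}, exclude or {}

    def matches(attr, values):
        # A rule value may list several comma-separated alternatives
        return node_attrs.get(attr) in values.split(",")

    if any(matches(a, v) for a, v in exclude.items()):
        return False      # exclude: no rule may match
    if not all(matches(a, v) for a, v in require.items()):
        return False      # require: every rule must match
    if include and not any(matches(a, v) for a, v in include.items()):
        return False      # include: at least one rule must match
    return True

hot_ssd = {"tier": "hot", "disk_type": "ssd"}
hot_hdd = {"tier": "hot", "disk_type": "hdd"}
rules = {"tier": "hot", "disk_type": "ssd"}
print(node_allowed(hot_hdd, include=rules))   # one matching value is enough
print(node_allowed(hot_hdd, require=rules))   # all values must match
print(node_allowed(hot_ssd, require=rules))
```

With the same two rules, the node with the slower disk is accepted under include but rejected under require.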

Using the IP address for shard allocation

Instead of adding a special parameter to the node configuration, we can use IP addresses to specify which nodes we want to include in or exclude from shard and replica allocation. In order to do this, instead of using the tier part of the index.routing.allocation.include.tier or index.routing.allocation.exclude.tier properties, we should use _ip. For example, if we would like our logs_2015-12-10 index to be placed only on the nodes with the 10.1.2.10 and 10.1.2.11 IP addresses, we would run the following command:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
  "index.routing.allocation.include._ip" : "10.1.2.10,10.1.2.11"
}'

Note

In addition to _ip, Elasticsearch also allows us to use _name to specify allocation rules using node names and _host to specify allocation rules using host names.

Disk-based shard allocation

In addition to the already described allocation filtering methods, Elasticsearch gives us disk-based shard allocation rules. They allow us to set allocation rules based on the nodes’ disk usage.

Configuring disk-based shard allocation

There are four properties that control the behavior of disk-based shard allocation. All of them can be updated dynamically or set in the elasticsearch.yml configuration file.

The first of these is cluster.info.update.interval, which is by default set to 30 seconds and defines how often Elasticsearch updates information about disk usage on nodes.

The second property is cluster.routing.allocation.disk.watermark.low, which is by default set to 0.85. This means that Elasticsearch will not allocate new shards to a node that uses more than 85% of its disk space.

The third property is cluster.routing.allocation.disk.watermark.high, which controls when Elasticsearch will start relocating shards from a given node. It defaults to 0.90 and means that Elasticsearch will start relocating shards away from a node when the disk usage on that node is equal to or more than 90%.

Both the cluster.routing.allocation.disk.watermark.low and cluster.routing.allocation.disk.watermark.high properties can be set either to a percentage value (such as 0.60, meaning 60%) or to an absolute value (such as 600mb, meaning 600 megabytes).

Finally, the last property is cluster.routing.allocation.disk.include_relocations, which by default is set to true. It tells Elasticsearch to take into account the shards that are not yet copied to the node but are in the process of being copied there. Having this behavior turned on by default means that the disk-based allocation mechanism will be more pessimistic when it comes to available disk space (when shards are relocating), but we won’t run into situations where shards can’t be relocated because the assumptions about disk space were wrong.
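
The interplay of the two watermarks can be sketched in a few lines of Python. This is only an illustration of the rules as described above, not Elasticsearch's own decision logic; the function name and the returned labels are made up for this example, and only percentage-style watermarks are handled.

```python
def disk_allocation_decision(used_fraction, low=0.85, high=0.90):
    """Classify a node by disk usage, following the two watermarks.

    used_fraction is the share of disk space in use (0.0 to 1.0).
    The defaults mirror the default low and high watermark values.
    """
    if not 0.0 <= used_fraction <= 1.0:
        raise ValueError("used_fraction must be between 0 and 1")
    if used_fraction >= high:
        # At or above the high watermark: shards are relocated away.
        return "relocate_shards_away"
    if used_fraction > low:
        # Above the low watermark: existing shards stay, no new ones arrive.
        return "no_new_shards"
    return "accepts_new_shards"


if __name__ == "__main__":
    for usage in (0.50, 0.87, 0.93):
        print(usage, disk_allocation_decision(usage))
```

A node at 87% usage keeps its shards but receives no new ones, while a node at 93% becomes a source of relocations.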

Disabling disk-based shard allocation

Disk-based shard allocation is enabled by default. We can disable it by specifying the cluster.routing.allocation.disk.threshold_enabled property and setting it to false. We can do this in the elasticsearch.yml file or dynamically using the cluster settings API:

curl -XPUT localhost:9200/_cluster/settings -d '{
  "transient" : {
    "cluster.routing.allocation.disk.threshold_enabled" : false
  }
}'

The number of shards and replicas per node

In addition to specifying shard and replica allocation, we are also allowed to specify the maximum number of shards that can be placed on a single node for a single index. For example, if we would like our logs_2015-12-10 index to have only a single shard per node, we would run the following command:

curl -XPUT 'localhost:9200/logs_2015-12-10/_settings' -d '{
  "index.routing.allocation.total_shards_per_node" : 1
}'

This property can be placed in the elasticsearch.yml file or can be updated on live indices using the preceding command. Please remember that your cluster can stay in the red state if this limit prevents Elasticsearch from allocating all the primary shards.
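
To see why a limit that is too strict can leave the cluster red, here is a rough capacity check in Python. It is a simplified best-case estimate under stated assumptions: it looks at a single index, ignores every other allocation rule, and assumes Elasticsearch finds the best possible placement. The function name is hypothetical.

```python
def expected_health(nodes, primaries, replicas, total_shards_per_node):
    """Best-case health of one index under a per-node shard limit."""
    if nodes < 1 or total_shards_per_node < 1:
        raise ValueError("nodes and total_shards_per_node must be at least 1")
    capacity = nodes * total_shards_per_node   # shard copies the nodes may hold
    copies = primaries * (1 + replicas)        # primaries plus all replicas
    if capacity < primaries:
        # Some primary shards cannot be placed anywhere.
        return "red"
    if capacity < copies or nodes < 1 + replicas:
        # All primaries fit, but some replicas stay unassigned
        # (a replica never shares a node with its primary).
        return "yellow"
    return "green"


if __name__ == "__main__":
    # Five primaries, one replica each, at most one shard per node.
    for node_count in (1, 5, 10):
        print(node_count, expected_health(node_count, 5, 1, 1))
```

With five primaries and a limit of one shard per node, fewer than five nodes cannot hold all the primaries, which is exactly the situation the warning above describes.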

Allocation throttling

The Elasticsearch allocation mechanism can be throttled, which means that we can control how many resources Elasticsearch will use during the shard allocation and recovery process. We are given five properties to control this, which are as follows:

cluster.routing.allocation.node_concurrent_recoveries: This property defines how many concurrent shard recoveries may be happening at the same time on a node. This defaults to 2 and should be increased if you would like more shards to be recovered at the same time on a single node. However, increasing this value will result in more resource consumption during recovery. Also, please remember that during the replica recovery process, data will be copied from the other nodes over the network, which can be slow.

cluster.routing.allocation.node_initial_primaries_recoveries: This property defaults to 4 and defines how many primary shards are recovered at the same time on a given node. Because primary shard recovery uses data from local disks, this process should be very fast.

cluster.routing.allocation.same_shard.host: A Boolean property that defaults to false and is applicable only when multiple Elasticsearch nodes are started on the same machine. When set to true, this will force Elasticsearch to check, based on the host name and host address, whether copies of the same shard are present on a single physical machine, and to prevent such an allocation. The default false value means no check is done.

indices.recovery.concurrent_streams: This is the number of network streams used to copy data from other nodes that can be used concurrently on a single node. The more streams there are, the faster the data will be copied, but this will result in more resource consumption. This property defaults to 3.

indices.recovery.concurrent_small_file_streams: This is similar to the indices.recovery.concurrent_streams property, but defines how many concurrent data streams Elasticsearch will use to copy small files (ones that are under 5mb in size). This property defaults to 2.

Cluster-wide allocation

In addition to the per-index allocation settings, Elasticsearch also allows us to control shard and index allocation on a cluster-wide basis, with so-called shard allocation awareness. This is especially useful when we have nodes in different physical racks and we would like to place shards and their replicas on nodes in different racks.

Let’s start with a simple example. We assume that we have a cluster built of four nodes, with each node in a different physical rack. The simple graphic that illustrates this is as follows:

As you can see, our cluster is built from four nodes. Each node was bound to a specific IP address and each node was given a tag property and a group property (added to elasticsearch.yml as the node.tag and node.group properties). This cluster will serve the purpose of showing how shard allocation filtering works. The group and tag properties can be given whatever names you want; you just need to prefix your desired property name with node. For example, if you would like to use a party property name, you would just add node.party: party1 to your elasticsearch.yml.

Allocation awareness

Allocation awareness allows us to configure the allocation of shards and their replicas with the use of generic parameters. In order to illustrate how allocation awareness works, we will use our example cluster. For the example to work, we should add the following property to the elasticsearch.yml file:

cluster.routing.allocation.awareness.attributes: group

This will tell Elasticsearch to use the node.group property as the awareness parameter.

Note

You can specify multiple attributes when setting the cluster.routing.allocation.awareness.attributes property. For example: cluster.routing.allocation.awareness.attributes: group, node

After this, let’s start the first two nodes, the ones with the node.group parameter equal to groupA, and let’s create an index by running the following command:

curl -XPOST 'localhost:9200/awarness' -d '{
  "settings" : {
    "index" : {
      "number_of_shards" : 1,
      "number_of_replicas" : 1
    }
  }
}'

After this command, our two-node cluster will look more or less like this:

As you can see, the index was divided between the two nodes evenly. Now let’s see what happens when we launch the rest of the nodes (the ones with node.group set to groupB):

Notice the difference: the primary shards were not moved from their original allocation nodes, but the replica shards were moved to the nodes with a different node.group value. That’s exactly right; when using shard allocation awareness, Elasticsearch won’t allocate the primary shards and replicas of the same index to the nodes with the same value of the property used to determine the allocation awareness (which in our case is node.group).

Note

Please remember that when using allocation awareness, shards will not be allocated to a node that doesn’t have the expected attributes set. So in our example, a node without the node.group property set will not be taken into consideration by the allocation mechanism.
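
The awareness rule described above (copies of the same shard should not share an awareness attribute value) can be expressed as a small check. This is a sketch for reasoning about a layout on paper, not part of any Elasticsearch API; the node names and the data structures are assumptions made for the example.

```python
from collections import defaultdict


def awareness_violations(shard_copies, node_attrs, attribute="group"):
    """Return the ids of shards whose copies share an attribute value.

    shard_copies: list of (shard_id, node_name) pairs, one per shard copy.
    node_attrs:   dict mapping node_name to that node's attributes.
    """
    seen = defaultdict(set)   # shard_id -> attribute values already used
    violations = set()
    for shard_id, node in shard_copies:
        value = node_attrs[node].get(attribute)
        if value in seen[shard_id]:
            violations.add(shard_id)
        seen[shard_id].add(value)
    return sorted(violations)


# Hypothetical node names; the groups follow the example cluster.
NODE_ATTRS = {
    "node1": {"group": "groupA"},
    "node2": {"group": "groupA"},
    "node3": {"group": "groupB"},
    "node4": {"group": "groupB"},
}

if __name__ == "__main__":
    # Two-node cluster: primary and replica both sit in groupA.
    print(awareness_violations([(0, "node1"), (0, "node2")], NODE_ATTRS))
    # After the groupB nodes join, the replica moves to a groupB node.
    print(awareness_violations([(0, "node1"), (0, "node3")], NODE_ATTRS))
```

The first layout is the only one possible while just the groupA nodes are running; the second is the layout Elasticsearch moves to once nodes with another group value are available.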

Forcing allocation awareness

Forcing allocation awareness can come in handy when we know, in advance, how many values our awareness attributes can take and we don’t want more replicas than needed to be allocated in our cluster, for example, so as not to overload our cluster with too many replicas. For this, we can force the allocation awareness to be active only for certain attribute values. We can specify these values using a property that includes the attribute name, which in our case is cluster.routing.allocation.awareness.force.group.values, and providing a list of comma-separated values to it. For example, if we would like the allocation awareness to use only the groupA and groupB values of the node.group property, we would add the following to the elasticsearch.yml file:

cluster.routing.allocation.awareness.attributes: group

cluster.routing.allocation.awareness.force.group.values: groupA, groupB

Filtering

Elasticsearch allows us to configure allocation for the entire cluster or at the index level. In the case of cluster allocation, we can use the following property prefixes:

cluster.routing.allocation.include

cluster.routing.allocation.require

cluster.routing.allocation.exclude

When it comes to index-specific allocation, we can use the following property prefixes:

index.routing.allocation.include

index.routing.allocation.require

index.routing.allocation.exclude

The previously mentioned prefixes can be used with the properties that we’ve defined in the elasticsearch.yml file (our tag and group properties) and with a special property called _ip that allows us to match or exclude nodes by their IP addresses, for example, like this:

cluster.routing.allocation.include._ip: 192.168.2.1

If we would like to include nodes with a group property matching the groupA value, we would set the following property:

cluster.routing.allocation.include.group: groupA

Notice that we’ve used the cluster.routing.allocation.include prefix and we’ve concatenated it with the name of the property, which is group in our case.

What do include, exclude, and require mean

If you look closely at the preceding parameters, you will notice that there are three kinds:

include: This type will result in including all the nodes with this parameter defined. If multiple include conditions are present, then all the nodes that match at least one of these conditions will be taken into consideration when allocating shards. For example, if we add two cluster.routing.allocation.include.tag parameters to our configuration, one with the value of node1 and the second with the value of node2, we would end up with indices (actually their shards) being allocated to the first and second node (counting from left to right). To sum up, the nodes that match the include allocation parameter type will be taken into consideration by Elasticsearch when choosing the nodes to place shards on, but this doesn’t mean that Elasticsearch will put shards on them.

require: This type of allocation filter, which was introduced in Elasticsearch 0.90, requires a node to match all of the specified properties. For example, if we add one cluster.routing.allocation.require.tag parameter to our configuration with the value of node1 and a cluster.routing.allocation.require.group parameter with the value of groupA, we would end up with shards allocated only to the first node (the one with an IP address of 192.168.2.1).

exclude: This parameter allows us to exclude nodes with given properties from the allocation process. For example, if we set cluster.routing.allocation.exclude.group to groupA, we would end up with indices being allocated only to the nodes with IP addresses 192.168.3.1 and 192.168.3.2 (the third and fourth nodes in our example).

Note

The property value can use simple wildcard characters. For example, if we want to include all the nodes that have the group parameter value beginning with group, we could set the cluster.routing.allocation.include.group property to group*. In the example cluster case, this would result in matching nodes with the groupA and groupB group parameter values.
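
The three filter types can be modelled with a short Python function, which makes it easy to try rule combinations before applying them to a cluster. This is a sketch of the semantics described above, not the Elasticsearch implementation: the fnmatch module stands in for the simple wildcard matching, comma-separated values inside one rule are treated as alternatives, and the node3 and node4 tag values are assumed because only node1 and node2 are named in the text.

```python
from fnmatch import fnmatchcase


def _matches(node_attrs, attribute, patterns):
    """True if the node's attribute matches any comma-separated pattern."""
    value = node_attrs.get(attribute)
    if value is None:
        return False
    return any(fnmatchcase(value, p.strip()) for p in patterns.split(","))


def node_eligible(node_attrs, include=None, require=None, exclude=None):
    """Decide whether a node passes the include/require/exclude filters."""
    include = include or {}
    require = require or {}
    exclude = exclude or {}
    # exclude: a single matching rule removes the node.
    if any(_matches(node_attrs, a, p) for a, p in exclude.items()):
        return False
    # require: every rule has to match.
    if not all(_matches(node_attrs, a, p) for a, p in require.items()):
        return False
    # include: when rules are given, at least one has to match.
    if include and not any(_matches(node_attrs, a, p)
                           for a, p in include.items()):
        return False
    return True


# The four-node example cluster (tags of node3 and node4 are assumed).
NODES = {
    "node1": {"tag": "node1", "group": "groupA"},
    "node2": {"tag": "node2", "group": "groupA"},
    "node3": {"tag": "node3", "group": "groupB"},
    "node4": {"tag": "node4", "group": "groupB"},
}


def eligible(**rules):
    """Names of the example nodes that pass the given rules."""
    return sorted(name for name, attrs in NODES.items()
                  if node_eligible(attrs, **rules))


if __name__ == "__main__":
    print(eligible(include={"tag": "node1,node2"}))
    print(eligible(require={"tag": "node1", "group": "groupA"}))
    print(eligible(exclude={"group": "groupA"}))
    print(eligible(include={"group": "group*"}))
```

Passing a node through this function only says that the node is a candidate; as noted above, Elasticsearch still decides where among the candidates the shards actually go.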

Manually moving shards and replicas

The last thing we wanted to discuss is the ability to manually move shards between nodes. Elasticsearch exposes the _cluster/reroute REST end-point, which allows us to control that. The following operations are available:

Moving a shard from node to node

Canceling shard allocation

Forcing shard allocation

Now let’s look closely at all of the preceding operations.

Moving shards

Let’s say we have two nodes called es_node_one and es_node_two, and we have two shards of the shop index placed by Elasticsearch on the first node and we would like to move the second shard to the second node. In order to do this, we can run the following command:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [ {
    "move" : {
      "index" : "shop",
      "shard" : 1,
      "from_node" : "es_node_one",
      "to_node" : "es_node_two"
    }
  } ]
}'

We’ve specified the move command, which allows us to move shards (and replicas) of the index specified by the index property. The shard property is the number of the shard we want to move (shards are numbered from 0, so the second shard has the number 1). Finally, the from_node property specifies the name of the node we want to move the shard from and the to_node property specifies the name of the node we want the shard to be placed on.

Canceling shard allocation

If we would like to cancel an ongoing allocation process, we can run the cancel command and specify the index, node, and shard we want to cancel the allocation for. For example:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [ {
    "cancel" : {
      "index" : "shop",
      "shard" : 0,
      "node" : "es_node_one"
    }
  } ]
}'

The preceding command would cancel the allocation of shard 0 of the shop index on the es_node_one node.

Forcing shard allocation

In addition to canceling and moving shards and replicas, we are also allowed to allocate an unallocated shard to a specific node. For example, if we have an unallocated shard numbered 0 for the users index and we would like it to be allocated to es_node_two by Elasticsearch, we would run the following command:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [ {
    "allocate" : {
      "index" : "users",
      "shard" : 0,
      "node" : "es_node_two"
    }
  } ]
}'

Multiple commands per HTTP request

We can, of course, include multiple commands in a single HTTP request. For example:

curl -XPOST 'localhost:9200/_cluster/reroute' -d '{
  "commands" : [
    {"move" : {"index" : "shop", "shard" : 1, "from_node" : "es_node_one", "to_node" : "es_node_two"}},
    {"cancel" : {"index" : "shop", "shard" : 0, "node" : "es_node_one"}}
  ]
}'

Allowing operations on primary shards

The cancel and allocate commands accept an additional allow_primary parameter. If set to true, it tells Elasticsearch that the operation can be performed on the primary shard. Please be advised that operations with the allow_primary parameter set to true may result in data loss.
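
When reroute requests are generated by a script, it helps to build the JSON body from small helper functions rather than by string concatenation. The following Python sketch only produces the request body shown in the preceding examples; the helper names are invented for this illustration, and sending the body to the _cluster/reroute end-point is left to whatever HTTP client you use.

```python
import json


def move(index, shard, from_node, to_node):
    """Build a move command for one shard."""
    return {"move": {"index": index, "shard": shard,
                     "from_node": from_node, "to_node": to_node}}


def cancel(index, shard, node, allow_primary=False):
    """Build a cancel command; allow_primary is added only when requested."""
    body = {"index": index, "shard": shard, "node": node}
    if allow_primary:
        body["allow_primary"] = True   # may result in data loss
    return {"cancel": body}


def allocate(index, shard, node, allow_primary=False):
    """Build an allocate command for an unallocated shard."""
    body = {"index": index, "shard": shard, "node": node}
    if allow_primary:
        body["allow_primary"] = True   # may result in data loss
    return {"allocate": body}


def reroute_body(*commands):
    """Serialize any number of commands into a single request body."""
    return json.dumps({"commands": list(commands)}, indent=2)


if __name__ == "__main__":
    print(reroute_body(
        move("shop", 1, "es_node_one", "es_node_two"),
        cancel("shop", 0, "es_node_one"),
    ))
```

Keeping allow_primary off by default means that an operation on a primary shard has to be requested explicitly.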

Handling rolling restarts

There is one more thing that we would like to discuss when it comes to shard and replica allocation: handling rolling restarts. When an Elasticsearch node is restarted, it may take some time for it to rejoin the cluster. During this time, the rest of the cluster may decide to rebalance and move shards around. When we know we are doing rolling restarts, for example, to update Elasticsearch to a new version or install a plugin, we may want to tell this to Elasticsearch. The procedure for restarting each node should be as follows:

First, before you do any maintenance, you should stop the allocation by sending the following command:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "none"
  }
}'

This will tell Elasticsearch to stop allocation. After this, we will stop the node we want to do maintenance on and start it again. After it joins the cluster, we can enable the allocation again by running the following:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient" : {
    "cluster.routing.allocation.enable" : "all"
  }
}'

This will enable the allocation again. This procedure should be repeated for each node we want to perform maintenance on.
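
The procedure is easy to get out of order when many nodes are involved, so it can help to write it down as an explicit plan first. The Python sketch below only generates the ordered list of steps with the two request bodies shown above; it does not talk to a cluster, and the step labels and node names are placeholders chosen for this example.

```python
import json


def allocation_setting(value):
    """Request body that switches cluster.routing.allocation.enable."""
    return json.dumps(
        {"transient": {"cluster.routing.allocation.enable": value}})


def rolling_restart_plan(nodes):
    """Return the ordered (step, detail) pairs for restarting each node."""
    steps = []
    for node in nodes:
        steps.append(("PUT /_cluster/settings", allocation_setting("none")))
        steps.append(("restart node", node))
        steps.append(("wait until node rejoins the cluster", node))
        steps.append(("PUT /_cluster/settings", allocation_setting("all")))
    return steps


if __name__ == "__main__":
    for step, detail in rolling_restart_plan(["es_node_one", "es_node_two"]):
        print(step, "->", detail)
```

Each node gets its own disable, restart, wait, and enable cycle, which matches the instruction to repeat the procedure for every node under maintenance.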

Controlling cluster rebalancing

By default, Elasticsearch tries to keep the shards and their replicas evenly balanced across the cluster. Such behavior is good in most cases, but there are times when we want to control this behavior, for example, during rolling restarts. We don’t want to rebalance the entire cluster when one or two nodes are restarted. In this section, we will look at how to avoid cluster rebalance and control this process’ behavior in depth.

Imagine a situation where you know that your network can handle very high amounts of traffic or the opposite of this: your network is used extensively and you want to avoid too much load on it. Another example is that you may want to decrease the pressure that is put on your I/O subsystem after a full-cluster restart and you want to have fewer shards and replicas being initialized at the same time. These are only two examples where rebalance control may be handy.

Understanding rebalance

Rebalancing is the process of moving shards between different nodes in our cluster. As we have already mentioned, it is fine in most situations, but sometimes you may want to completely avoid this. For example, if we define how our shards are placed and we want to keep it this way, we may want to avoid rebalancing. However, by default, Elasticsearch will try to rebalance the cluster whenever the cluster state changes and Elasticsearch thinks a rebalance is needed (and the delayed timeout has passed as discussed in The gateway and recovery modules section of Chapter 9, Elasticsearch Cluster in Detail).

Cluster being ready

We already know that our indices are built from shards and replicas. Primary shards, or just shards, are the ones that get the data first. The replicas are physical copies of the primaries and get the data from them. You can think of the cluster as being ready to be used when all the primary shards are assigned to their nodes in your cluster, that is, as soon as the yellow health state is achieved. At this point, Elasticsearch may still be initializing other shards, the replicas. However, you can already use your cluster and be sure that you can search your entire data set and send index change commands, and that the commands will be processed properly.

The cluster rebalance settings

Elasticsearch lets us control the rebalance process with the use of a few properties that can be set in the elasticsearch.yml file or by using the Elasticsearch REST API (as described in The update settings API section of Chapter 9, Elasticsearch Cluster in Detail).

Controlling when rebalancing will be allowed

The cluster.routing.allocation.allow_rebalance property allows us to specify when rebalancing is allowed. This property can take the following values:

always: Rebalancing will be allowed as soon as it’s needed

indices_primaries_active: Rebalancing will be allowed when all the primary shards are initialized

indices_all_active: The default one, which means that rebalancing will be allowed when all the shards and replicas are initialized

The cluster.routing.allocation.allow_rebalance property can be set in the elasticsearch.yml configuration file and updated dynamically as well.

Controlling the number of shards being moved between nodes concurrently

The cluster.routing.allocation.cluster_concurrent_rebalance property allows us to specify how many shards can be moved between nodes at once in the entire cluster. This value defaults to 2. If you have a cluster that is built from many nodes, or if you would like the rebalancing to be performed faster, you can increase this value, but this will put more pressure on your cluster resources and will affect indexing and querying. The cluster.routing.allocation.cluster_concurrent_rebalance property can be set in the elasticsearch.yml configuration file and updated dynamically as well.

Controlling which shards may be rebalanced

The cluster.routing.rebalance.enable property allows us to specify which shards will be allowed to be rebalanced by Elasticsearch. (It should not be confused with cluster.routing.allocation.enable, used in the rolling restart procedure, which controls shard allocation rather than rebalancing.) This property can take the following values:

all: The default behavior, which tells Elasticsearch to rebalance all the shards in the cluster

primaries: This value allows the rebalancing of the primary shards only

replicas: This value allows the rebalancing of the replica shards only

none: This value disables the rebalancing of all types of shards for all indices in the cluster

The cluster.routing.rebalance.enable property can be set in the elasticsearch.yml configuration file and updated dynamically as well.

The Cat API

The Elasticsearch Admin API is quite extensive and covers almost every part of the Elasticsearch architecture: from low-level information about Lucene to high-level information about the cluster nodes and their health. All this information is available using the Elasticsearch Java API as well as the REST API. However, the returned data, even though it is a JSON document, is not very readable by a user, at least when it comes to the amount of information given.

Because of this, Elasticsearch provides us with a more human-friendly API: the Cat API. The special Cat API returns data in a simple text, tabular format and, what’s more, it provides aggregated data that is usually usable without any further processing.

The basics

The base endpoint for the Cat API is quite obvious: it is /_cat. Without any parameters, it shows all the available endpoints for this API. We can check this by running the following command:

curl -XGET 'localhost:9200/_cat'

The response returned by Elasticsearch should be similar or identical (depending on your Elasticsearch version) to the following one:

=^.^=

/_cat/allocation

/_cat/shards

/_cat/shards/{index}

/_cat/master

/_cat/nodes

/_cat/indices

/_cat/indices/{index}

/_cat/segments

/_cat/segments/{index}

/_cat/count

/_cat/count/{index}

/_cat/recovery

/_cat/recovery/{index}

/_cat/health

/_cat/pending_tasks

/_cat/aliases

/_cat/aliases/{alias}

/_cat/thread_pool

/_cat/plugins

/_cat/fielddata

/_cat/fielddata/{fields}

/_cat/nodeattrs

/_cat/repositories

/_cat/snapshots/{repository}

So, looking from the top, Elasticsearch allows us to get the following information using the Cat API:

Shard allocation-related information

All shards-related information (also one limited to a given index)

Information about the master node

Nodes information

Indices statistics (also one limited to a given index)

Segments statistics (also one limited to a given index)

Documents count (also one limited to a given index)

Recovery information (also one limited to a given index)

Cluster health

Tasks pending for execution

Index aliases and indices for a given alias

Thread pool configuration

Plugins installed on each node

Field data cache size and field data cache sizes for individual fields

Node attributes information

Defined backup repositories

Snapshots created in the backup repository

Using Cat API

Using the Cat API is as simple as running a GET request to one of the previously mentioned REST end-points. For example, to get information about the cluster state, we could run the following command:

curl -XGET 'localhost:9200/_cat/health'

The response returned by Elasticsearch for the preceding command should be similar to the following one, but, of course, will be dependent on your cluster:

1446292041 12:47:21 elasticsearch yellow 1 1 21 21 0 0 21 0 - 50.0%

This is clean and nice. Because it is in tabular format, it is also easy to use the response in tools such as grep, awk, or sed, a standard set of tools for every administrator. It is also more readable once you know what it is all about.

To add a header describing the purpose of each column, we just need to add an additional v parameter, just like this:

curl -XGET 'localhost:9200/_cat/health?v'

Common arguments

Every Cat API endpoint has its own arguments, but there are a few common options that are shared among all of them:

v: This adds a header line to the response with the names of presented items.

h: This allows us to show only the chosen columns, for example h=status,node.total,shards,pri.

help: This lists all the possible columns that this particular endpoint is able to show. The command shows the name of the parameter, its abbreviation, and description.

bytes: This is the format for the information representing the values in bytes. As we said earlier, the Cat API is designed to be used by humans and because of this, by default, these values are represented in human-readable form, for example: 3.5kB or 40GB. The bytes option allows the setting of the same base for all the numbers, so sorting or numerical comparison will be easier. For example, bytes=b presents all values in bytes, bytes=k in kilobytes, and so on.

Note

For the full list of arguments for each Cat API endpoint, please refer to the official Elasticsearch documentation available at: https://www.elastic.co/guide/en/elasticsearch/reference/2.2/cat.html.
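
The common arguments combine into an ordinary query string, which a small helper can assemble. The following Python function is a sketch for scripts that generate Cat API URLs; the function and parameter names are made up for this example, and localhost:9200 is simply the address used throughout this chapter.

```python
def cat_url(endpoint, columns=None, header=False, bytes_unit=None,
            base="localhost:9200"):
    """Assemble a Cat API URL from the common v, h, and bytes arguments."""
    params = []
    if header:
        params.append("v")                        # header line with column names
    if columns:
        params.append("h=" + ",".join(columns))   # only the chosen columns
    if bytes_unit:
        params.append("bytes=" + bytes_unit)      # same unit for all sizes
    url = "{}/_cat/{}".format(base, endpoint.strip("/"))
    if params:
        url += "?" + "&".join(params)
    return url


if __name__ == "__main__":
    print(cat_url("health", header=True))
    print(cat_url("health", columns=["status", "node.total", "shards", "pri"]))
    print(cat_url("indices", header=True, bytes_unit="b"))
```

Remember to quote the resulting URL when passing it to curl in a shell, because the ? and & characters are otherwise interpreted by the shell.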

The examples

When we wrote this book, the Cat API had twenty-two endpoints. We don’t want to describe them all, as it would only repeat information contained in the documentation. However, we didn’t want to leave this section without an example regarding the usage of the Cat API. Because of this, we decided to show how easily you can get information using the Cat API compared to the standard JSON API exposed by Elasticsearch.

Getting information about the master node

The first example shows how easy it is to get information about which node in our cluster is the master node. By calling the /_cat/master REST endpoint, we can find out which of the nodes is currently elected as the master. For example, let’s run the following command:

curl -XGET 'localhost:9200/_cat/master?v'

The response returned by Elasticsearch for my local two-node cluster looks as follows:

id                     host      ip        node
Cfj3tzqpSNi5SZx4g8osAg 127.0.0.1 127.0.0.1 Skin

As you can see in the response, we’ve got the information about which node is currently elected as the master: we can see its identifier, host, IP address, and name.

Getting information about the nodes

The /_cat/nodes REST endpoint provides information about all the nodes in the cluster. Let’s see what Elasticsearch will return after running the following command:

curl -XGET 'localhost:9200/_cat/nodes?v&h=name,node.role,load,uptime'

In the preceding example, we have used the possibility of choosing what information we want to get from the approximately seventy options of this endpoint. We have chosen to get only the node name, its role (whether the node is a data or client node), the node load, and its uptime.

The response returned by Elasticsearch looks as follows:

name node.role load uptime
Skin d         2.00 1.3h

As you can see, the /_cat/nodes REST endpoint provides all the requested information about the nodes in the cluster.

Retrieving recovery information for an index

Another nice example of using the Cat API is getting information about the recovery of a single index or all the indices. In our case, we will retrieve recovery information for a single library index by running the following command:

curl -XGET 'localhost:9200/_cat/recovery/library?v&h=index,shard,time,type,stage,files_percent'

The response for the preceding command looks as follows:

index   shard time type  stage files_percent
library 0     75   store done  100.0%
library 1     83   store done  100.0%
library 2     88   store done  100.0%
library 3     79   store done  100.0%
library 4     5    store done  100.0%
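
Because the Cat API output is a plain whitespace-separated table, it is also simple to consume from a script. The following Python sketch turns a response that was requested with the v parameter into a list of dictionaries. It is a minimal parser written for this example: it assumes that no cell is empty and that no value contains spaces, which holds for the sample shown above but not necessarily for every endpoint.

```python
def parse_cat_table(text):
    """Parse Cat API output that starts with a header line (the v option)."""
    lines = [line for line in text.splitlines() if line.strip()]
    if not lines:
        return []
    header = lines[0].split()
    # Pair every cell with the column name from the header line.
    return [dict(zip(header, line.split())) for line in lines[1:]]


SAMPLE = """\
index   shard time type  stage files_percent
library 0     75   store done  100.0%
library 1     83   store done  100.0%
library 2     88   store done  100.0%
library 3     79   store done  100.0%
library 4     5    store done  100.0%
"""

if __name__ == "__main__":
    rows = parse_cat_table(SAMPLE)
    # Shards sorted by the value of the time column, slowest first.
    for row in sorted(rows, key=lambda r: int(r["time"]), reverse=True):
        print(row["shard"], row["time"])
```

All values are kept as strings, so numeric columns need an explicit conversion, as done for the time column in the usage example.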

Warming up

Sometimes, there may be a need to prepare Elasticsearch to handle your queries. Maybe it’s because you heavily rely on the field data cache and you want it to be loaded before your production queries arrive, or maybe you want to warm up your operating system’s I/O cache so that the index data files are read from the cache. Whatever the reason, Elasticsearch allows us to use so-called warming queries for our types and indices.

Defining a new warming query

A warming query is nothing more than the usual query stored in a special type called _warmer in Elasticsearch. Let’s assume that we have the following query that we want to use for warming up:

curl -XGET 'localhost:9200/library/_search?pretty' -d '{
  "query" : {
    "match_all" : {}
  },
  "aggs" : {
    "warming_aggs" : {
      "terms" : {
        "field" : "tags"
      }
    }
  }
}'

To store the preceding query as a warming query for our library index, we will run the following command:

curl -XPUT 'localhost:9200/library/_warmer/tags_warming_query' -d '{
  "query" : {
    "match_all" : {}
  },
  "aggs" : {
    "warming_aggs" : {
      "terms" : {
        "field" : "tags"
      }
    }
  }
}'

The preceding command will register our query as a warming query with the tags_warming_query name. You can have multiple warming queries for your index, but each of these queries needs to have a unique name.

We can define warming queries not only for the entire index, but also for a specific type in it. For example, to store our previously shown query as the warming query only for the book type in the library index, run the preceding command not against the /library/_warmer URI but against /library/book/_warmer. So, the entire command will be as follows:

curl -XPUT 'localhost:9200/library/book/_warmer/tags_warming_query' -d '{
  "query" : {
    "match_all" : {}
  },
  "aggs" : {
    "warming_aggs" : {
      "terms" : {
        "field" : "tags"
      }
    }
  }
}'

After a warming query is added, Elasticsearch will warm up each new segment by running the defined warming queries on it before that segment is allowed to be searched. This allows Elasticsearch and the operating system to cache data and, thus, speed up searching.

Just as we read in the Full text searching section of Chapter 1, Getting Started with Elasticsearch Cluster, Lucene divides the index into parts called segments, which once written can’t be changed. Every new commit operation creates a new segment (which is eventually merged if the number of segments is too high), which Lucene uses for searching.

Note

Please note that the Warmer API will be removed in the future versions of Elasticsearch.

Retrieving the defined warming queries

In order to get a specific warming query for our index, we just need to know its name. For example, if we want to get the warming query named tags_warming_query for our library index, we will run the following command:

curl -XGET 'localhost:9200/library/_warmer/tags_warming_query?pretty'

The result returned by Elasticsearch will be as follows:

{
  "library" : {
    "warmers" : {
      "tags_warming_query" : {
        "types" : [ "book" ],
        "source" : {
          "query" : {
            "match_all" : {}
          },
          "aggs" : {
            "warming_aggs" : {
              "terms" : {
                "field" : "tags"
              }
            }
          }
        }
      }
    }
  }
}

We can also get all the warming queries for the index and type using the following command:

curl -XGET 'localhost:9200/library/_warmer?pretty'

And finally, we can also get all the warming queries that start with a given prefix. For example, if we want to get all the warming queries for the library index that start with the tags prefix, we will run the following command:

curl -XGET 'localhost:9200/library/_warmer/tags*?pretty'

Deleting a warming query

Deleting a warming query is very similar to getting one; we just need to use the DELETE HTTP method. To delete a specific warming query from our index, we just need to know its name. For example, if we want to delete the warming query named tags_warming_query for our library index, we will run the following command:

curl -XDELETE 'localhost:9200/library/_warmer/tags_warming_query'

We can also delete all the warming queries for the index using the following command:

curl -XDELETE 'localhost:9200/library/_warmer/_all'

And finally, we can also remove all the warming queries that start with a given prefix. For example, if we want to remove all the warming queries for the library index that start with the tags prefix, we will run the following command:

curl -XDELETE 'localhost:9200/library/_warmer/tags*'

Disabling the warming up functionality

To disable the warming queries completely while keeping their definitions stored in _warmer, you should set the index.warmer.enabled configuration property to false (setting this property to true will result in enabling the warming up functionality). This setting can be either put in the elasticsearch.yml file or just set using the REST API on a live cluster.

For example, if we want to disable the warming up functionality for the library index, we will run the following command:

curl -XPUT 'localhost:9200/library/_settings' -d '{
  "index.warmer.enabled" : false
}'

Choosing queries for warming

Finally, we should ask ourselves one question: which queries should be considered as candidates for warming? Typically, you’ll want to choose ones that are expensive to execute and ones that require caches to be populated. So you’ll probably want to choose queries that include aggregations and sorting based on the fields in your index. This will force the operating system to load the parts of the indices that hold the data related to such queries and improve the performance of consecutive queries that are run. In addition to this, parent-child queries and nested queries are also potential candidates for warming. You may also choose other queries by looking at the logs and finding where your performance is not as great as you want it to be. Such queries may also be perfect candidates for warming up.

For example, let’s say that we have the following logging configuration set in the elasticsearch.yml file:

index.search.slowlog.threshold.query.warn: 10s
index.search.slowlog.threshold.query.info: 5s
index.search.slowlog.threshold.query.debug: 2s
index.search.slowlog.threshold.query.trace: 1s

And we have the following logging level set in the logging.yml configuration file:

logger:
  index.search.slowlog: TRACE, index_search_slow_log_file

Notice that the index.search.slowlog.threshold.query.trace property is set to 1s and the index.search.slowlog logging level is set to TRACE. This means that whenever a query is executed for longer than one second (on a shard, not in total), it will be logged into the slow log file (the name of which is specified by the index_search_slow_log_file configuration section of the logging.yml configuration file). For example, the following can be found in a slow log file:

[2015-11-25 19:53:00,248][TRACE][index.search.slowlog.query] took[340000.2ms], took_millis[3400], types[], stats[], search_type[QUERY_THEN_FETCH], total_shards[5], source[{"query":{"match_all":{}},"aggs":{"warming_aggs":{"terms":{"field":"tags"}}}}], extra_source[],

As you can see, in the preceding log line, we have the query time, search type, and the query source, which shows us the executed query.
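
When the slow log grows, picking warming candidates by eye becomes tedious, so a short script that extracts the interesting fields can help. The Python sketch below parses the sample entry shown above with a regular expression. It is tailored to that one line: the exact spacing between the fields is an assumption, other Elasticsearch versions may format entries differently, and the names used here are invented for the example.

```python
import json
import re

# The sample entry from the text, written as one logical line.
SLOWLOG_LINE = (
    '[2015-11-25 19:53:00,248][TRACE][index.search.slowlog.query] '
    'took[340000.2ms], took_millis[3400], types[], stats[], '
    'search_type[QUERY_THEN_FETCH], total_shards[5], '
    'source[{"query":{"match_all":{}},"aggs":{"warming_aggs":'
    '{"terms":{"field":"tags"}}}}], extra_source[],'
)

PATTERN = re.compile(
    r"\[(?P<level>[A-Z]+)\]\[index\.search\.slowlog\.query\].*?"
    r"took_millis\[(?P<millis>\d+)\].*?"
    r"search_type\[(?P<search_type>\w+)\].*?"
    r"source\[(?P<source>.*)\], extra_source\["
)


def parse_slowlog(line):
    """Extract level, time, search type, and query from a slow log entry."""
    match = PATTERN.search(line)
    if match is None:
        return None
    return {
        "level": match.group("level"),
        "took_millis": int(match.group("millis")),
        "search_type": match.group("search_type"),
        "source": json.loads(match.group("source")),  # the executed query
    }


if __name__ == "__main__":
    entry = parse_slowlog(SLOWLOG_LINE)
    print(entry["took_millis"], entry["source"]["aggs"])
```

Queries that show up repeatedly with high took_millis values are the ones worth considering for a warming query.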

Of course, the values can be different in your configuration, but the slow log can be a valuable source of the queries that have been running too long and may need to have some warm up defined; maybe these are parent-child queries and need some identifiers to be fetched to perform better, or maybe you are using a filter that is expensive when you execute it for the first time.

There is one thing you should remember: don’t overload your Elasticsearch cluster with too many warming queries because you may end up spending too much time in warming up instead of processing your production queries.

Index aliasing and using it to simplify your everyday work

When working with multiple indices in Elasticsearch, you can sometimes lose track of them. Imagine a situation where you store logs in your indices or time-based data in general. Usually, the amount of data in such cases is quite large and, therefore, it is a good solution to have the data divided somehow. A logical division of such data is obtained by creating a single index for a single day of logs (if you are interested in an open source solution used to manage logs, look at Logstash from the Elasticsearch suite at https://www.elastic.co/products/logstash).

However, after some time, if we keep all the indices, we will start having a problem in taking care of all that. An application needs to take care of all the information, such as which index to send data to, which to query, and so on. With the help of aliases, we can change this to work with a single name just as we would use a single index, but we will work with multiple indices.


An alias

What is an index alias? It’s an additional name for one or more indices that allows us to use these indices by referring to them with those additional names. A single alias can have multiple indices as well as the other way round; a single index can be a part of multiple aliases.

However, please remember that you can’t use an alias that has multiple indices for indexing or for real-time GET operations. Elasticsearch will throw an exception if you do this, because it doesn’t know in which index the data should be indexed or from which index the document should be fetched. We can still use an alias that links to only a single index for indexing, though.


Creating an alias

To create an index alias, we need to run the HTTP POST method to the _aliases REST end-point with a defined action. For example, the following request will create a new alias called week12 that will include the indices named day10, day11, and day12 (we need to create those indices first):

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "day10", "alias": "week12" } },
    { "add": { "index": "day11", "alias": "week12" } },
    { "add": { "index": "day12", "alias": "week12" } }
  ]
}'

If the week12 alias isn’t present in our Elasticsearch cluster, the preceding command will create it. If it is present, the command will just add the specified indices to it.

We would run a search across the three indices as follows:

curl -XGET 'localhost:9200/day10,day11,day12/_search?q=test'

If everything goes well, we can instead run it as follows:

curl -XGET 'localhost:9200/week12/_search?q=test'

Isn’t this better?

Sometimes we have a set of indices where every index serves independent information but some queries should go across all of them; for example, we have dedicated indices for countries (country_en, country_us, country_de, and so on). In this case, we would create the alias by grouping them all:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "country_*", "alias": "countries" } }
  ]
}'

The last command created only one alias. Elasticsearch allows you to rewrite this to something less verbose:

curl -XPUT 'localhost:9200/country_*/_alias/countries'


Modifying aliases

Of course, you can also remove indices from an alias. We can do this similarly to how we add indices to an alias, but instead of the add command, we use the remove one. For example, to remove the index named day9 from the week12 alias, we will run the following command:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "remove": { "index": "day9", "alias": "week12" } }
  ]
}'


Combining commands

The add and remove commands can be sent as a single request. For example, if you would like to combine all the previously sent commands into a single request, you will have to send the following command:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    { "add": { "index": "day10", "alias": "week12" } },
    { "add": { "index": "day11", "alias": "week12" } },
    { "add": { "index": "day12", "alias": "week12" } },
    { "remove": { "index": "day9", "alias": "week12" } }
  ]
}'
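When an application manages aliases itself, it is usually easier to generate the actions list than to write it by hand. The following Python sketch builds the same request body as the combined command; the build_alias_actions helper is a hypothetical name introduced for illustration, and sending the body to the _aliases end-point is left to whatever HTTP client you use.

```python
import json

def build_alias_actions(alias, add=(), remove=()):
    """Build the body for a POST request to the _aliases end-point."""
    actions = [{"add": {"index": index, "alias": alias}} for index in add]
    actions += [{"remove": {"index": index, "alias": alias}} for index in remove]
    return {"actions": actions}

body = build_alias_actions("week12",
                           add=["day10", "day11", "day12"],
                           remove=["day9"])
print(json.dumps(body, indent=2))
```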


Retrieving aliases

In addition to adding or removing indices to or from aliases, we and our applications that use Elasticsearch may need to retrieve all the aliases available in the cluster or all the aliases that an index is connected to. To retrieve these aliases, we send a request using the HTTP GET command. For example, the first of the following commands gets all the aliases for the day10 index and the second one gets all the available aliases:

curl -XGET 'localhost:9200/day10/_aliases'
curl -XGET 'localhost:9200/_aliases'

The response from the second command is as follows:

{
  "day12": {
    "aliases": {
      "week12": {}
    }
  },
  "library": {
    "aliases": {}
  },
  "day11": {
    "aliases": {
      "week12": {}
    }
  },
  "day9": {
    "aliases": {}
  },
  "day10": {
    "aliases": {
      "week12": {}
    }
  }
}
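The response is keyed by index name, while an application often wants the opposite view: which indices stand behind each alias. A minimal Python sketch of that inversion follows, using the response above as input; the indices_by_alias function is a hypothetical helper, not an Elasticsearch API.

```python
def indices_by_alias(response):
    """Invert the index -> aliases mapping into alias -> sorted index names."""
    result = {}
    for index, info in response.items():
        for alias in info.get("aliases", {}):
            result.setdefault(alias, []).append(index)
    return {alias: sorted(indices) for alias, indices in result.items()}

response = {
    "day12": {"aliases": {"week12": {}}},
    "library": {"aliases": {}},
    "day11": {"aliases": {"week12": {}}},
    "day9": {"aliases": {}},
    "day10": {"aliases": {"week12": {}}},
}
print(indices_by_alias(response))
```

Indices without any alias, such as library and day9 here, simply do not appear in the result.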

You can also use the _alias endpoint to get all aliases from the given index:

curl -XGET 'localhost:9200/day10/_alias/*'

To get a particular alias definition, you name the alias instead of using the wildcard; for example, for the week12 alias created earlier:

curl -XGET 'localhost:9200/day10/_alias/week12'


Removing aliases

You can also remove an alias using the _alias endpoint. For example, sending the following command will remove the client alias from the data index:

curl -XDELETE 'localhost:9200/data/_alias/client'


Filtering aliases

Aliases can be used in a way similar to how views are used in SQL databases. You can use the full Query DSL (discussed in detail in Chapter 3, Searching Your Data) and have your filter applied to all count, search, delete by query, and similar operations run against the alias.

Let’s look at an example. Imagine that we want to have aliases that return data for a certain client so we can use it in our application. Let’s say that the client identifier we are interested in is stored in the clientId field and we are interested in the 12345 client. So, let’s create the alias named client with our data index, which will apply a query for clientId automatically:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {
      "add": {
        "index": "data",
        "alias": "client",
        "filter": { "term": { "clientId": 12345 } }
      }
    }
  ]
}'

So when using the defined alias, you will always get your request filtered by a term query that ensures that all the documents have the 12345 value in the clientId field.
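Conceptually, a search sent through the filtered alias behaves as if the alias filter were combined with whatever query we send. The following Python sketch illustrates that effect by wrapping a user query in a bool query with a filter clause. It only illustrates the outcome and does not describe how Elasticsearch implements filtered aliases internally; the apply_alias_filter helper and the title field are assumptions made for the example.

```python
import json

def apply_alias_filter(user_query, alias_filter):
    """Return a query that matches user_query restricted by alias_filter."""
    return {"bool": {"must": user_query, "filter": alias_filter}}

alias_filter = {"term": {"clientId": 12345}}
user_query = {"match": {"title": "test"}}  # "title" is a hypothetical field

effective_query = apply_alias_filter(user_query, alias_filter)
print(json.dumps({"query": effective_query}, indent=2))
```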


Aliases and routing

In the Introduction to routing section of Chapter 2, Indexing Your Data, we talked about routing. Similar to aliases that use filtering, we can add routing values to the aliases. Imagine that we are using routing on the basis of user identifier and we want to use the same routing values with our aliases. So, for the alias named client, we will use the routing values of 12345, 12346, and 12347 for querying, and only 12345 for indexing. To do this, we will create an alias using the following command:

curl -XPOST 'localhost:9200/_aliases' -d '{
  "actions": [
    {
      "add": {
        "index": "data",
        "alias": "client",
        "search_routing": "12345,12346,12347",
        "index_routing": "12345"
      }
    }
  ]
}'

This way, when we index our data using the client alias, the values specified by the index_routing property will be used. At the time of querying, the values specified by the search_routing property will be used.

There is one more thing. Please look at the following query sent to the previously defined alias:

curl -XGET 'localhost:9200/client/_search?q=test&routing=99999,12345'

The value used as a routing value will be 12345. This is because Elasticsearch will take the common values of the search_routing attribute and the query routing parameter, which in our case is 12345.
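The rule described here is a plain set intersection, which a few lines of Python can make concrete. This is a sketch of the rule as stated in the text, not Elasticsearch code, and effective_routing is a hypothetical name.

```python
def effective_routing(alias_search_routing, request_routing):
    """Return the routing values common to the alias and the request."""
    alias_values = set(alias_search_routing.split(","))
    request_values = set(request_routing.split(","))
    return sorted(alias_values & request_values)

print(effective_routing("12345,12346,12347", "99999,12345"))
```

The value 99999 is dropped because it is not among the search_routing values of the alias.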


Zero downtime reindexing and aliases

One of the greatest advantages of using aliases is the ability to re-index the data without any downtime from the system using Elasticsearch. To achieve this, you would need to interact with your indices only through aliases, both for indexing and querying. In such a case, you can just create a new index, index the data there, and switch aliases when needed. During indexing, aliases would still point to the old index, so the application could work as usual.
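The switch itself can be expressed with the add and remove actions shown earlier, sent together in one _aliases request so that the alias never points at nothing. The Python sketch below only builds that request body; the catalog alias and the catalog_v1 and catalog_v2 index names are hypothetical and chosen purely for illustration.

```python
import json

def build_alias_swap(alias, old_index, new_index):
    """Build one _aliases body that moves alias from old_index to new_index."""
    return {
        "actions": [
            {"remove": {"index": old_index, "alias": alias}},
            {"add": {"index": new_index, "alias": alias}},
        ]
    }

swap = build_alias_swap("catalog", "catalog_v1", "catalog_v2")
print(json.dumps(swap, indent=2))
```

Once the request succeeds, the old index can be kept for a while as a fallback and deleted later.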


Summary

In this chapter, we discussed Elasticsearch administration. We started by learning how to perform backups of our indices and how to monitor our cluster health and state using its API. We controlled cluster shard rebalancing and learned how to adjust shard allocation according to our needs. We’ve used the CAT API to get information about Elasticsearch in human-readable form and we’ve warmed up our queries to make them faster. Finally, we’ve used aliases to allow a better management of our indices and to have more flexibility.

In the next and final chapter of the book, we will focus on a hypothetical online library store to see how to make Elasticsearch work in practice. We will start with a brief introduction and hardware considerations. We will tune a single instance of Elasticsearch and properly configure our cluster by discussing each of its parts and providing a proper architecture. We will vertically expand the cluster and prepare it for both high querying and high indexing load. Finally, we will learn how to monitor such a prepared cluster.


Chapter 11. Scaling by Example

In the previous chapter, we discussed Elasticsearch administration. We started with a discussion about backups and how we can perform them by using the available API. We monitored the health and state of our clusters and nodes and we learned how to control shard rebalancing. We controlled the shard and replica allocation and used the human-friendly Cat API to get information about the cluster and nodes. We saw how to use warmers to speed up potentially heavy queries and we used index aliasing to manage our indices more easily. By the end of this chapter, you will have learned the following topics:

Hardware preparations for running Elasticsearch
Tuning a single Elasticsearch node
Preparing highly available and fault tolerant clusters
Expanding Elasticsearch vertically
Preparing Elasticsearch for high query and indexing throughput
Monitoring Elasticsearch


Hardware

One of the first decisions that we need to make when starting every serious software project is a set of choices related to hardware. And believe us, this is not only a very important choice, but also one of the most difficult ones. Often the decisions are made at early project stages, when only the basic architecture is known and we don’t have precise information regarding the queries, data load, and so on. The project architect has to balance caution against the projected cost of the whole solution. Too many times it is an intersection of experience and clairvoyance, which can lead to either great or terrible results.


Physical servers or a cloud

Let’s start with a decision: a cloud, virtual, or physical machines. Nowadays, these are all valid options, but it was not always the case. Some time ago the only option was to buy new servers for each environment part or share resources with the other applications on the same machine. The second option makes perfect sense as it is more cost-effective but introduces risk. Problems with one application, especially when they are hardware related, will result in problems for another application. You can imagine one of your applications using most of the I/O subsystem of the physical machine and all the other applications struggling with lots of I/O waits and performance problems because of that. Virtualization promises application separation and a more convenient way of managing resources, but you are still limited by the underlying hardware. Any unexpected traffic spike could be a problem and affect service availability. Imagine that your ecommerce site suddenly gains a massive number of customers. Instead of being glad that the spike appeared and you have more potential customers, you search for a place where you can buy additional hardware that will be supplied as soon as possible.

Cloud computing, on the other hand, means a more flexible cost model. We can easily add new machines whenever we need. We can add them temporarily when we expect a greater load (for example, before Christmas for an ecommerce site) and pay only for the actually used processing power. It is just a few clicks in the admin panel. Even more, we can also set up automatic scaling, that is, new virtual machines can appear automatically when we need them. Cloud-based software can also shut them down when we do not need them anymore. The cloud has many advantages, such as lower initial cost, ability to easily grow your business, and insensitivity to temporary fluctuations of resource requirements, but it also has several flaws. The costs of cloud servers rise faster than those of physical machines. Also, mass storage, although practically unlimited, has worse characteristics (number of operations per second) than physical servers. This is sometimes a great problem for us, especially with disk based storage such as Elasticsearch.

In practice, as usual, the choice can be hard but going through a few points can help you with your decision:

Business requirements may directly point to your own servers; for example, some procedures related to financial or medical data automatically exclude cloud solutions hosted by third-party vendors
For proof of concept and low/medium load services, the cloud can be a good choice because of simplicity, scalability, and low cost
Solutions with strong requirements connected with I/O subsystems will probably work better on bare metal machines where you have greater influence over what storage type is available to you
When the traffic can greatly change within a short time, the cloud is a perfect place for you

For the purpose of further discussion, let’s assume that we want to buy our own servers. We are in the computer store now and let’s buy something!


CPU

In most cases, this is the least important part. You can choose any modern CPU model, but you should know that a higher number of cores means a higher number of concurrent queries and indexing threads. That will allow you to index data faster, especially with complicated analysis and lots of merges.


RAM memory

More gigabytes of RAM is always better than fewer gigabytes of RAM. Memory is necessary, especially for aggregation and sorting. It is less of a problem now, with Elasticsearch 2.0 and doc values, but complicated queries with lots of aggregations still require memory to process the data. Memory is also used for indexing buffers and can lead to indexing speed improvements, because more data can be buffered in memory and thus disks will be used less frequently. If you try to use more memory than available, the operating system will use the hard disks as temporary space (it starts swapping) and you should avoid this at all cost. Note that you should never try to force Elasticsearch to use as much memory as possible. The first reason is the Java garbage collector: less memory is more GC friendly. The second reason is that the unused memory is actually used by the operating system for buffers and disk cache. In fact, when your index can fit in this space, all data is read from these caches and not from the disks directly. This can drastically improve the performance. By default, Elasticsearch and the I/O subsystem share the same I/O cache, which gives another reason to leave even more memory for the operating system itself.

In practice, 8 GB is the lowest requirement for memory. It does not mean that Elasticsearch will never work with less memory, but for most situations and data intensive applications, it is the reasonable minimum. On the other hand, more than 64 GB is rarely needed. Instead of assigning such amounts of memory to a single Elasticsearch node, think about scaling the system horizontally.


Mass storage

We said that we are in a good situation when the whole index fits into memory. In practice this can be difficult to achieve, so good and fast disks are very important. It is even more important if one of the requirements is high indexing throughput. In such a case, you may consider fast SSD disks. Unfortunately, these disks are expensive if your data volume is big. You can improve the situation by avoiding using RAID (see https://en.wikipedia.org/wiki/RAID), except RAID 0. In most cases, when you handle fault tolerance by having multiple servers, the additional level of security on the RAID level is unnecessary. The last thing is to avoid using external storage, such as network attached storage (NAS) or NFS volumes. The network latency in such cases always kills all the advantages of these solutions.


The network

When you use an Elasticsearch cluster, each node opens several connections to other nodes for various uses. When you index, the data is forwarded to different shards and replicas. When you query for data, the node used for querying can run multiple partial queries to the other nodes and compose the reply from the data fetched from the other nodes. This is why you should make sure that your network is not the bottleneck. In practice, use one network for all the servers in the cluster and avoid solutions in which the nodes in the cluster are spread between data centers.


How many servers

The answer is always the same: it depends. It depends on many factors: the number of requests per second, the data volume, the level of the query’s complexity, the aggregations and sorting usage, the number of new documents per unit of time, how fast new data should be available for searching (the refresh time), the average document size, and the analyzers used. In practice, the handiest answer is: test it and approximate.

The one thing that is often underestimated is data security. When you think about fault tolerance and availability, you should start from three servers. Why? We talked about the split brain situation in the Master election configuration section of Chapter 9, Elasticsearch Cluster in Detail. Starting from three servers we are able to handle a single Elasticsearch node failure without taking down the whole cluster.
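The reason three is the starting point comes down to majority arithmetic. To avoid split brain, a master should only be elected when more than half of the master-eligible nodes are visible (in Elasticsearch 2.x this is what the discovery.zen.minimum_master_nodes setting expresses). The following Python sketch shows the calculation; the function names are ours and the code is only an illustration of the arithmetic.

```python
def quorum(master_eligible_nodes):
    """Smallest number of nodes that forms a strict majority."""
    return master_eligible_nodes // 2 + 1

def tolerated_failures(master_eligible_nodes):
    """How many nodes can be lost while a majority is still reachable."""
    return master_eligible_nodes - quorum(master_eligible_nodes)

for nodes in (1, 2, 3, 5):
    print(nodes, "nodes: quorum", quorum(nodes),
          "tolerated failures", tolerated_failures(nodes))
```

With one or two nodes no failure can be tolerated without giving up the majority rule, while three nodes tolerate exactly one.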


Cost cutting

You did some tests, considered carefully planned functionalities, estimated volumes and load, and went to the project owner with an architecture draft. “It’s too expensive”, he said and asked you to think about servers once again. What can we do?

Let’s think about server roles and try to introduce some differences between them. If one of the requirements is indexing massive amounts of data connected with time (maybe logs), a possible way is having two groups of servers: hot nodes, where new data arrives, and cold nodes, where old data is moved. Thanks to this approach, hot nodes may have faster but smaller disks (that is, solid state drives), in contrast to the cold nodes, where fast disks are not so important but space is. You can also divide your architecture into several groups such as master servers (less powerful, with relatively small disks), data nodes (bigger disks), and query aggregator nodes (more RAM). We will talk about this in the following sections.


Preparing a single Elasticsearch node

When we talk about vertical scaling, we often mean adding more resources to the server Elasticsearch is running on. We can add memory or we can switch to a machine with a better CPU or faster disk storage. Of course, with better machines we can expect an increase in performance; depending on our deployment and its bottlenecks, it can be a small or large improvement. However, there are limitations when it comes to vertical scaling. For example, one of the limitations is the maximum amount of physical memory available for your servers or the total memory required by the JVM to operate. When having large data and complicated queries, you can very soon run into memory issues and adding new memory may not help at all. In this section, we will try to give you general advice on where to look and what to tune when it comes to a single Elasticsearch node.

The thing to remember when tuning your system is performance tests, ones that can be repeated under the same circumstances. Once you make a change, you need to be able to see how it affects the overall performance. In addition to that, Elasticsearch scales well. Using that knowledge, we can run performance tests on a single machine (or a few of them) and extrapolate the results. Such observations may be a good starting point for further tuning.

Also keep in mind that this section doesn’t contain a deep dive into all performance related topics, but is dedicated to showing you the most common things.


The general preparations

Apart from all the things we will discuss in this section, there are three major, operating system related things you need to remember: the number of allowed file descriptors, the virtual memory, and avoiding swapping.

Note that the following section contains information for Linux operating systems, but you can also achieve similar options on Microsoft Windows.

Avoiding swapping

Let’s start with the third one. Elasticsearch and Java Virtual Machine based applications, in general, don’t like to be swapped. This means that these applications work best if the operating system doesn’t put the memory that they use in the swap space. The reason is simple: to access the swapped memory, the operating system will have to read it from the disk, which is slow and which would affect the performance in a very bad way.

If we have enough memory, and we should have if we want our Elasticsearch instance to perform well, we can configure Elasticsearch to avoid swapping. To do that, we just need to modify the elasticsearch.yml file and include the following property:

bootstrap.mlockall: true

This is one of the options. The second one is to set the property vm.swappiness in the /etc/sysctl.conf file to 0 (for complete swap disabling) or 1 for swapping only in emergency (for Kernel versions 3.5 and above).

The third option is to disable swapping by editing /etc/fstab and removing the lines that contain the swap word. The following is an example /etc/fstab content:

LABEL=cloudimg-rootfs / ext4 defaults,discard 0 0
/dev/xvdb swap swap defaults 0 0

To disable swapping we would just remove the second line from the above contents. We could also run the following command to disable swapping:

sudo swapoff -a

However, remember that this effect won’t persist across a system restart, so this is only a temporary solution.

Also, remember that if you don’t have enough memory to run Elasticsearch, the operating system will just kill the process when swapping is disabled.

File descriptors

Make sure you have enough limits related to file descriptors for the user running Elasticsearch (when installing from official packages, that user will be called elasticsearch). If you don’t, you may end up with problems when Elasticsearch tries to flush the data and create new segments or merge segments together, which can result in index corruption.

To adjust the number of allowed file descriptors, you will need to adjust the /etc/security/limits.conf file (at least on most common Linux systems) and adjust or add an entry related to a given user (for both soft and hard limits). For example:

elasticsearch soft nofile 65536
elasticsearch hard nofile 65536

It is advised to set the number of allowed file descriptors to at least 65536, but even more can be needed, depending on your index size.

On some Linux systems, you may also need to load an appropriate limits module for the preceding setting to take effect. To load that module, you need to adjust the /etc/pam.d/login file and add or uncomment the following line:

session required pam_limits.so

There is also a possibility to display the number of file descriptors available for Elasticsearch by adding the -Des.max-open-files=true parameter to Elasticsearch startup parameters. For example, like this:

bin/elasticsearch -Des.max-open-files=true

When doing that, Elasticsearch will include information about the file descriptors in the logs:

[2015-12-20 00:22:19,869][INFO][bootstrap] max_open_files [10240]
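The limits can also be inspected from a script before Elasticsearch is started. The following Python sketch uses the standard resource module (available on Unix-like systems only) to read the open file limits of the current process and compare the soft limit with the 65536 value advised above. Note the assumption: it reports the limits of the user running the script, so it is only meaningful when run as the same user that runs Elasticsearch.

```python
import resource

RECOMMENDED_MIN = 65536  # value advised in the text above

def open_files_limits():
    """Return the (soft, hard) open file limits of the current process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)

def soft_limit_is_sufficient(minimum=RECOMMENDED_MIN):
    """True when the soft limit is unlimited or at least the given minimum."""
    soft, _hard = open_files_limits()
    return soft == resource.RLIM_INFINITY or soft >= minimum

soft, hard = open_files_limits()
print("soft:", soft, "hard:", hard, "sufficient:", soft_limit_is_sufficient())
```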

Virtual memory

Elasticsearch 2.2 uses a hybrid directory implementation, which is a combination of mmapfs and niofs directories. Because of that, especially when your indices are large, you may need a lot of virtual memory on your system. By default, the operating system limits the amount of memory mapped files and that can cause errors when running Elasticsearch. Because of that, we recommend increasing the default values. To do that, you just need to edit the /etc/sysctl.conf file and set the vm.max_map_count property; for example, to a value equal to 262144.

You can also change the value temporarily by running the following command:

sysctl -w vm.max_map_count=262144


The memory

Before thinking about Elasticsearch configuration related things, we should remember to give enough memory to Elasticsearch. In general, we shouldn’t give more than 50-60 percent of the total available memory to the JVM process running Elasticsearch. We do that because we want to leave memory for the operating system and for the operating system I/O cache. However, we need to remember that the 50-60 percent figure is not always true. You can imagine having nodes with 256 GB of RAM and having indices of 30 GB in total on such a node. In such circumstances, even assigning more than 60 percent of physical RAM to Elasticsearch would leave plenty of RAM for the operating system. It is also a good idea to set the Xmx and Xms properties to the same values to avoid JVM heap size resizing.

Another thing to remember is the so called compressed oops (http://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html#compressedOop), the ordinary object pointers. The Java virtual machine can be told to use them by adding the -XX:+UseCompressedOops switch. This allows the Java virtual machine to use less memory to address the objects on the heap. However, this is only true for heap sizes less than or equal to 31 GB. Going for a larger heap means no compressed oops and higher memory usage for addressing the objects on the heap.
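The two rules of thumb from this section, a share of physical RAM and the 31 GB compressed oops boundary, can be combined into a simple starting value for the heap. The Python sketch below does that with a 50 percent share as an assumed default. It is only a heuristic for a first configuration; as noted above, nodes with a lot of RAM and small indices may reasonably deviate from it.

```python
COMPRESSED_OOPS_LIMIT_GB = 31  # boundary mentioned in the text above

def suggested_heap_gb(total_ram_gb, fraction=0.5):
    """Return a starting heap size in GB: a share of RAM, capped at 31 GB."""
    return min(int(total_ram_gb * fraction), COMPRESSED_OOPS_LIMIT_GB)

for ram in (16, 64, 256):
    heap = suggested_heap_gb(ram)
    # Xms and Xmx are set to the same value to avoid heap resizing.
    print(f"{ram} GB RAM -> -Xms{heap}g -Xmx{heap}g")
```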


Field data cache and breaking the circuit

As we know, by default the field data cache in Elasticsearch is unbounded. This can be very dangerous, especially when you are using aggregations and sorting on many fields that are analyzed, because they don’t use doc values by default. If those fields are high cardinality ones, then you can run into even more trouble. By trouble we mean running out of memory.

We have two different factors we can tune to be sure that we don’t run into out of memory errors. First of all, we can limit the size of the field data cache and we should do that. The second thing is the circuit breaker, which we can easily configure to just throw exceptions instead of loading too much data. Combining these two things together will ensure that we don’t run into memory issues.

However, we should also remember that Elasticsearch will evict data from the field data cache if its size is not enough to handle aggregation requests or sorting. This will affect the query performance because loading the field data information is not very efficient and is resource intensive. However, in our opinion, it is better to have our queries slower than having our cluster blown up because of out of memory errors.

The field data cache and caches in general were discussed in the Elasticsearch caches section of Chapter 9, Elasticsearch Cluster in Detail.


Use doc values

If you plan to use sorting, aggregations, or scripting heavily, you should use doc values wherever you can. This will not only save you the memory needed for the field data cache, because fewer objects are produced, but will also make the Java virtual machine work better, with less time spent in garbage collection. Doc values were discussed in the Mappings Configuration section of Chapter 2, Indexing Your Data.


RAM buffer for indexing

In the Elasticsearch caches section of Chapter 9, Elasticsearch Cluster in Detail, we also discussed the RAM buffer used for indexing. There are a few things we would like to mention. First of all, the more RAM for the indexing buffer, the more documents Elasticsearch will be able to hold in memory. So the more memory we have for indexing, the less often the flush to disk will happen and fewer segments will be created. Because of that, your indexing will be faster. But of course, we don’t want Elasticsearch to occupy 100 percent of the available memory. Keep in mind that the RAM buffers are set per shard, so the amount of memory that will be used depends on the number of shards and replicas that are assigned on the given node and on the number of documents you index. You should set the upper limits so your node doesn’t blow up when it has multiple shards assigned.


Index refresh rate

As we know by now, Elasticsearch uses Lucene. The thing with Lucene is that the view of the index is not refreshed when new data is indexed or segments are created. To see the newly indexed data, we need to refresh the index. By default, Elasticsearch does that once every second and the period of refresh is controlled by using the index.refresh_interval property, specified per index. The shorter the refresh interval, the sooner the documents will be visible for search operations. However, that also means that Elasticsearch will need to put more resources into refreshing the index view, meaning that the indexing and searching operations will be slower. The longer the refresh interval, the more time you will have to wait before being able to see the data in the search results, but your indexing and querying will be faster.


Thread pools

We haven’t talked about thread pools until now, but we would like to mention them now. Each Elasticsearch node holds several thread pools that control the execution queues for operations such as indexing or querying. Elasticsearch uses several pools to allow control over how the threads are handled and how much memory consumption is allowed for user requests.

Note

The Java virtual machine allows applications to use multiple threads, concurrently running multiple application tasks. For more information about Java threads, refer to http://docs.oracle.com/javase/7/docs/api/java/lang/Thread.html.

There are many thread pools (we can specify the type we are configuring by specifying the type property). However, for performance, the most important are:

generic: This is the thread pool for generic operations, such as node discovery. By default, the generic thread pool is of type cached.

index: This is the thread pool used for indexing and deleting operations. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 200.

search: This is the thread pool used for search and count requests. Its type defaults to fixed and its size to the number of available processors multiplied by 3 and divided by 2, with the size of the queue defaulting to 1000.

suggest: This is the thread pool used for suggest requests. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 1000.

get: This is the thread pool used for real time get requests. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 1000.

bulk: As you can guess, this is the thread pool used for bulk operations. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 50.

percolate: This is the thread pool for percolation requests. Its type defaults to fixed, its size to the number of available processors, and the size of the queue to 1000.
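To see what these defaults mean on a concrete machine, the list above can be turned into a small calculation. The following Python sketch follows the description in the text literally, using integer division for the search pool; the exact rounding used by a given Elasticsearch release may differ, so treat the numbers as approximations. The default_thread_pools name is ours.

```python
def default_thread_pools(processors):
    """Default size and queue size of the fixed pools, as described above."""
    return {
        "index": {"size": processors, "queue_size": 200},
        "search": {"size": processors * 3 // 2, "queue_size": 1000},
        "suggest": {"size": processors, "queue_size": 1000},
        "get": {"size": processors, "queue_size": 1000},
        "bulk": {"size": processors, "queue_size": 50},
        "percolate": {"size": processors, "queue_size": 1000},
    }

for name, settings in default_thread_pools(8).items():
    print(name, settings)
```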

Note

Before Elasticsearch 2.1, we could control the type of the thread pool. Starting with Elasticsearch 2.1 we can no longer do that. For more information please refer to the official documentation: https://www.elastic.co/guide/en/elasticsearch/reference/2.1/breaking_21_removed_features.html

For example, if we want to configure the thread pool for indexing operations to have a size of 100 and a queue of 500, we will set the following in the elasticsearch.yml configuration file:

threadpool.index.size: 100
threadpool.index.queue_size: 500


Also remember that the thread pool configuration can be updated using the cluster update API. For example, like this:

curl -XPUT 'localhost:9200/_cluster/settings' -d '{
  "transient": {
    "threadpool.index.size": 100,
    "threadpool.index.queue_size": 500
  }
}'

In general, you don’t need to work with the thread pools and their configuration. However, when configuring your cluster, you may want to put more emphasis on indexing or querying and, in such cases, giving more threads or larger queues to the prioritized operation may result in more resources being used for such operations.


Horizontal expansion

Elasticsearch is a highly scalable search and analytics platform. We can scale it both horizontally and vertically. We discussed how to tune a single node in the Preparing a single Elasticsearch node section earlier in this chapter and we would like to focus on horizontal scaling now: how to handle multiple nodes in the same cluster, what roles they should have, and how to tune the configuration to have a highly reliable, available, and fault tolerant cluster.

You can imagine vertical scaling like building a skyscraper: we have limited space available and we need to go as high as we can. Of course, that is expensive and requires a lot of engineering done right. On the other hand, we have horizontal scaling, which is like having many houses in a residential area. Instead of investing into hardware and having powerful machines, we choose to have multiple machines and our data split between them. Horizontal scaling gives us virtually unlimited scaling possibilities. Even with the most powerful hardware, the time comes when a single machine is not enough to handle the data, the queries, or both of them. In such cases, spreading the data among multiple servers is what saves us and allows us to have terabytes of data in multiple indices spread across the whole cluster, just like the one in the following image:

We have our 4 nodes cluster with the library index created and built of four shards.

If we want to increase the querying capabilities of our cluster, we can just add additional nodes, for example, four of them. After adding new nodes to the cluster, we can either create new indices that will be built of more shards to spread the load more evenly or add replicas to the already existing shards. Both options are viable. We have to choose between them because we don’t have the possibility of splitting shards or adding more primary shards to an existing index. We should go for having more primary shards when our hardware is not enough to handle the amount of data it holds. In such cases, we usually run into out of memory situations, long shard query execution time, swapping, or high I/O waits. The second option, that is having replicas, is the way to go when our hardware is happily handling the data we have but the traffic is so high that the nodes just can’t keep up.


The first option is simple, but let’s look at the second case: having more replicas. So with four additional nodes, our cluster would look as follows:

Now, let’s run the following command to add a single replica:

curl -XPUT 'localhost:9200/library/_settings' -d '{
  "index": {
    "number_of_replicas": 1
  }
}'

Our cluster view would look more or less as follows:

As you can see, each of the initial shards building the library index has a single replica stored on another node. The nice thing about shards and their replicas is that Elasticsearch is smart enough to balance the shards in a single index and put them on separate nodes. For example, you won’t ever end up in a situation where you have a shard and its replicas on the same node. Also, Elasticsearch is able to round robin the queries between the shards and their replicas, which means that all the nodes will be hit by the queries and we don’t have to care about that. Because of that, we are able to handle almost double the query load compared to our initial deployment.
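The arithmetic behind this layout is simple: every primary shard exists once, plus once per replica. The following Python sketch computes the total number of shard copies and how many of them land on each node when they are spread evenly, using the four-shard library index from the example. The function names are ours.

```python
def total_shard_copies(primaries, replicas):
    """Number of shard copies in an index: primaries plus all replicas."""
    return primaries * (1 + replicas)

def copies_per_node(primaries, replicas, nodes):
    """Average number of shard copies per node with an even spread."""
    return total_shard_copies(primaries, replicas) / nodes

# Initial deployment: 4 primaries, no replicas, 4 nodes.
print(copies_per_node(4, 0, 4))
# After adding 4 nodes and one replica: 8 copies on 8 nodes.
print(copies_per_node(4, 1, 8))
```

In both cases each node holds one shard copy, but in the second case twice as many nodes can answer queries for the same data.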


Automatically creating the replicas

Let’s stay a bit longer around replicas. Elasticsearch allows us to automatically expand replicas when the cluster is big enough. This means that the replicas can be created automatically when new nodes are added to the cluster. You may wonder where such functionality can be useful. Imagine a situation where you have a small index that you would like to be present on every node so that your plugins don’t have to run distributed queries just to get the data from it. In addition to that, your cluster is dynamically changing, that is, you add and remove nodes from it. The simplest way to achieve such functionality is to allow Elasticsearch to automatically expand the replicas. To do that, we need to set index.auto_expand_replicas to 0-all, which means that the index can have 0 replicas or be present on all the nodes. So if our small index is called shops and we would like Elasticsearch to automatically expand its replicas to all the nodes in the cluster, we would use the following command to create the index:

curl -XPOST 'localhost:9200/shops/' -d '{
  "settings": {
    "index": {
      "auto_expand_replicas": "0-all"
    }
  }
}'

We can also update the settings of that index if it is already created by running the following command:

curl -XPUT 'localhost:9200/shops/_settings' -d '{
  "index": {
    "auto_expand_replicas": "0-all"
  }
}'
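To get a feeling for what the 0-all value leads to, the following Python sketch estimates the number of replicas for a given number of data nodes. It assumes the setting has the form min-max or min-all and that one copy on every node means one primary plus nodes minus one replicas. This is our reading of the behavior described above, not code taken from Elasticsearch, and expanded_replicas is a hypothetical name.

```python
def expanded_replicas(setting, data_nodes):
    """Estimate the replica count for an auto_expand_replicas value."""
    low, high = setting.split("-")
    minimum = int(low)
    # One copy is the primary, so at most data_nodes - 1 replicas fit.
    wanted = data_nodes - 1
    maximum = wanted if high == "all" else int(high)
    return max(minimum, min(wanted, maximum))

for nodes in (1, 3, 5):
    print(nodes, "nodes ->", expanded_replicas("0-all", nodes), "replicas")
```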


RedundancyandhighavailabilityTheElasticsearchreplicationmechanismnotonlygivesusabilitytohandlehigherquerythroughput,butalsogivesusredundancyandhighavailability.ImagineanElasticsearchclusterhostingasingleindexcalledlibrarythatisbuiltof2shardsand0replicas.Suchaclusterwouldlookasfollows:

Now what happens when one of the nodes fails? The simplest answer is that we lose about 50 percent of the data and, if the failure is fatal, we lose that data forever. Even when having backups, we would need to spin up another node and restore the backup, and that takes time. During that time, your application, or the parts of it that are based on Elasticsearch, can’t work correctly. If your business relies on Elasticsearch, downtime means lost money. Of course, we can use replicas to create more reliable clusters that can handle hardware and software failures. One thing to remember is that everything will fail eventually: if the software won’t, the hardware will. For example, some time ago Google said that in each of their clusters, during the first year at least 1000 machines will fail (you can read more on that topic at http://www.cnet.com/news/google-spotlights-data-center-inner-workings/). Because of that, we need to be ready to handle such cases.

Let’s look at the same cluster but with one replica:

Now losing a single Elasticsearch node means that we still have the whole data available and we can work on restoring the full cluster structure without downtime. Of course, this is only a very small cluster built of two Elasticsearch nodes. The larger the cluster and the more replicas it has, the more failures you will be able to handle without worrying about data loss. Of course, you will have lower performance, depending on the percentage of nodes that fail, but the data will still be there and the cluster will be operational.

That’s why, when designing your architecture and deciding on the number of nodes and indices and their architecture, you should take into consideration how many node failures you want to live with. Of course, you can’t forget about the performance part of the equation, but redundancy and high availability should be one of the factors of the scaling equation.
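The two library clusters described above can be compared with a few lines of Python. This is only an illustrative sketch: the node names, the allocation dictionaries, and the lost_shards function are our assumptions and are not part of any Elasticsearch API.

```python
def lost_shards(allocation, failed_nodes):
    """Return the shard ids that have no surviving copy.

    allocation maps a node name to the set of shard ids (primary or replica) it holds.
    """
    all_shards = set().union(*allocation.values())
    surviving = set()
    for node, shards in allocation.items():
        if node not in failed_nodes:
            surviving |= shards
    return all_shards - surviving


# library index: 2 shards, 0 replicas, one shard per node
no_replicas = {"node1": {0}, "node2": {1}}
# library index: 2 shards, 1 replica, every node holds a copy of both shards
one_replica = {"node1": {0, 1}, "node2": {0, 1}}

print(lost_shards(no_replicas, {"node2"}))  # half of the data is gone
print(lost_shards(one_replica, {"node2"}))  # nothing is lost
```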

Cost and performance flexibility

The default distributed nature of Elasticsearch and its ability to scale horizontally allow us to be flexible when it comes to the performance and costs that we have when running our environment. First of all, high-end servers with high performance disks, numerous CPU cores, and a lot of RAM are still expensive. In addition to that, cloud computing is getting more and more popular and if you need a lot of flexibility and don’t want to have your own hardware, you can choose solutions such as Amazon (http://aws.amazon.com/), Rackspace (http://www.rackspace.com/), DigitalOcean (https://www.digitalocean.com/), and so on. They not only allow us to run our software on rented machines, but also allow us to scale on demand. We just need to add more machines, which is a few clicks away or can even be automated with some degree of work.

Using a hosted solution with one-click machine renting allows us to have a truly horizontally scalable solution. Of course, that’s not cheap: you pay for the flexibility. But we can easily sacrifice performance if costs are the most crucial factor in our business plan. Of course, we can also go the other way. If we can afford large bare metal machines, Elasticsearch clusters can be pushed to hundreds of terabytes of data stored in the indices and still get decent performance (of course, with proper hardware and properly distributed).

Continuous upgrades

High availability, cost and performance flexibility, and virtually endless growth are not the only things worth talking about when discussing the scalability side of Elasticsearch. At some point in time, you will want to have your Elasticsearch cluster upgraded to a new version. It can be because of bug fixes, performance improvements, new features, or anything that you can think of. The thing is that when you have a single instance of each shard, without replicas, an upgrade means unavailability of Elasticsearch (or at least its parts) and that may mean downtime of the applications that use Elasticsearch. This is another reason why horizontal scaling is so important; you can perform upgrades without downtime, at least to the point where software such as Elasticsearch supports it. For example, you can take Elasticsearch 2.0 and upgrade to Elasticsearch 2.1 with only rolling restarts (getting one node out of the cluster, upgrading it, bringing it back, and continuing with the next node until all the nodes are done), thus having all the data still available for searching and indexing happening at the same time.

Multiple Elasticsearch instances on a single physical machine

Having a large physical machine with a lot of memory and CPU cores has advantages and some challenges. First of all, if you decide to run a single Elasticsearch node on that machine, you will sooner or later run into garbage collection issues, you will have lots of shards on a single node which will require a high number of I/O operations for the internal Elasticsearch communication (retrieving cluster statistics), and so on. What’s more, you usually shouldn’t go above 31 GB of heap memory for a single JVM process because above that you can’t use compressed ordinary object pointers (https://docs.oracle.com/javase/7/docs/technotes/guides/vm/performance-enhancements-7.html).

In such cases, you can either run multiple Elasticsearch instances on the same bare metal machine, run multiple virtual machines and a single Elasticsearch inside each one, or run Elasticsearch in a container, such as Docker (http://www.docker.com/). This is out of the scope of the book, but, because we are talking about scaling, we thought it may be a good thing to mention what can be done in such cases.

Note

There is also the possibility of running multiple Elasticsearch servers on a single physical machine without running multiple virtual machines. Which road to take - virtual machines or multiple instances - is really your choice. However, we like to keep things separate and because of that we usually go for dividing any large server into multiple virtual machines. When dividing one large server into multiple smaller virtual machines, remember that the I/O subsystem will be shared across those smaller virtual machines. Because of that, it may be good to wisely divide the disks between the virtual machines.

Preventing a shard and its replicas from being on the same node

There is one additional thing worth mentioning. When you have multiple physical servers divided into virtual machines, it is crucial to ensure that the shard and its replica don’t end up on the same physical machine. By default, Elasticsearch is smart enough to not put the shard and its replica on the same Elasticsearch instance, but it doesn’t know anything about bare metal machines, so we need to tell it. We can tell Elasticsearch to separate the shards and replicas by using cluster allocation awareness. In our previous case, we had three physical servers. Let’s call them: server1, server2, and server3.

Now for each Elasticsearch node on a physical server, we define the node.server_name property and we set it to the identifier of the server (the name of the property can be anything we want). So for example, for all Elasticsearch nodes on the first physical server, we would set the following property in the elasticsearch.yml configuration file:

node.server_name: server1

In addition to that, each Elasticsearch node (no matter on which physical server) needs to have the following property added to the elasticsearch.yml configuration file:

cluster.routing.allocation.awareness.attributes: server_name

It tells Elasticsearch not to put the primary shard and its replicas on the nodes with the same value in the node.server_name property. This is enough for us and Elasticsearch will take care of the rest.
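If you want to verify a planned layout by hand, the rule can be expressed as a short check. The following Python sketch is only an illustration of the idea: the virtual machine names, the shard layout, and the awareness_violations function are hypothetical, and the real allocation logic inside Elasticsearch is more involved than this.

```python
def awareness_violations(node_server, shard_copies):
    """List the shards that have two or more copies on the same physical server.

    node_server maps a node name to its node.server_name value;
    shard_copies maps a shard id to the nodes holding its primary and replicas.
    """
    violations = []
    for shard, nodes in shard_copies.items():
        servers = [node_server[node] for node in nodes]
        # Fewer distinct servers than copies means at least two copies share a server.
        if len(set(servers)) != len(servers):
            violations.append(shard)
    return violations


# Hypothetical layout: two virtual machines on server1, one on server2.
node_server = {"vm1a": "server1", "vm1b": "server1", "vm2a": "server2"}
shard_copies = {0: ["vm1a", "vm1b"], 1: ["vm1a", "vm2a"]}

print(awareness_violations(node_server, shard_copies))
```

Shard 0 is reported because both of its copies live on virtual machines hosted by server1, which is exactly the situation allocation awareness is meant to prevent.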

Designated node roles for larger clusters

There is one more thing that we want to discuss and emphasise. When it comes to large clusters, it is important to assign roles to all the nodes in the cluster. This allows for a truly fault tolerant and highly available Elasticsearch cluster. The roles we can assign to each Elasticsearch node are as follows:

Master eligible node
Data node
Query aggregator node

By default, each Elasticsearch node is master eligible (it can serve as a master node), can hold data, and can work as a query aggregator node. You may wonder why assigning roles is needed. Let us give you a simple example: if the master node is under a lot of stress, it may not be able to handle the cluster state related commands fast enough and the cluster could become unstable. This is only a single, simple example and you can think of numerous others.

Because of that, most Elasticsearch clusters that are larger than a few nodes usually look like the one presented in the following picture:

As you can see, our hypothetical cluster contains three client nodes (because we know that there will be a lot of queries), a large number of data nodes because the amount of data will be large, and at least three master eligible nodes that shouldn’t be doing anything else. Why three master nodes when Elasticsearch will only use a single one at any given time? Again, because of redundancy and to be able to prevent split brain situations by setting discovery.zen.minimum_master_nodes to 2, which would allow us to easily handle the failure of a single master eligible node in the cluster.
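The value of 2 for three master eligible nodes follows the commonly used majority rule, that is, half of the master eligible nodes plus one. The Python sketch below only illustrates that arithmetic; the function names are ours, and we assume the majority rule is the policy you want to apply.

```python
def minimum_master_nodes(master_eligible):
    """Majority (quorum) of the master eligible nodes: half of them plus one."""
    return master_eligible // 2 + 1


def tolerated_master_failures(master_eligible):
    """How many master eligible nodes can fail while a quorum is still reachable."""
    return master_eligible - minimum_master_nodes(master_eligible)


for count in (1, 2, 3, 5):
    print(count, "master eligible:",
          "quorum", minimum_master_nodes(count),
          "tolerated failures", tolerated_master_failures(count))
```

Note that with two master eligible nodes the quorum is also 2, so no failure is tolerated; three is the smallest count that survives the loss of one node.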

Let us now give you snippets of the configuration for each type of node in our cluster. We already talked about that in the Understanding node discovery section in Chapter 9, Elasticsearch Cluster in Detail, but we would like to mention that once again.

Query aggregator nodes

The query aggregator nodes configuration is quite simple. To configure those, we just need to tell Elasticsearch that we don’t want those nodes to be master eligible or to hold data. This corresponds to the following configuration snippet in the elasticsearch.yml file:

node.master: false
node.data: false

Data nodes

Data nodes are also very simple to configure. We just need to tell Elasticsearch that they should not be master eligible. However, we are not big fans of default configurations (because they tend to change) and thus our Elasticsearch data nodes configuration looks as follows:

node.master: false
node.data: true

Master eligible nodes

We’ve left the master eligible nodes to the end of the general scaling section. Of course, such Elasticsearch nodes shouldn’t be allowed to hold data, but, in addition to that, it is a good practice to disable the HTTP protocol on such nodes. This is done to avoid accidentally querying those nodes. Master eligible nodes can use fewer resources than data and query aggregator nodes and because of that we should ensure that they are only used for master related purposes. So our configuration for master eligible nodes looks more or less as follows:

node.master: true
node.data: false
http.enabled: false
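The three snippets differ only in the node.master and node.data flags, so the role of a node can be read directly from them. The small Python helper below summarizes that mapping; it is our own illustration (the function name and the returned labels are hypothetical), not something Elasticsearch provides.

```python
def node_role(master=True, data=True):
    """Name the role implied by the node.master and node.data settings."""
    if master and data:
        return "default (master eligible and data)"
    if master:
        return "master eligible"
    if data:
        return "data"
    return "query aggregator"


print(node_role(master=False, data=False))
print(node_role(master=False, data=True))
print(node_role(master=True, data=False))
```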

Preparing the cluster for high indexing and querying throughput

Until this chapter, we mostly talked about different functionalities of Elasticsearch, in terms of handling queries, indexing data, and tuning. However, running a cluster in production is not only about using this great search engine, but also about preparing the cluster to handle both the indexing and querying load. Let’s now summarize the knowledge we have and see what things we need to care about when it comes to preparing the cluster for high indexing and querying throughput.

Indexing related advice

In this section, we will look at the indexing related advice around tuning Elasticsearch. In each production environment the data is different, the index rate is different, and the users’ behavior is different. Take that into consideration and run performance tests on your environment. This will give you the best idea about what to expect and what works best in the case of your system.

Index refresh rate

One of the general things you should pay attention to is the index refresh rate. We know that the refresh rate specifies how fast the documents will be visible for search operations. The equation is quite simple: the faster the refresh rate, the slower the queries will be and the lower the indexing throughput. If we can allow ourselves to have a slower refresh rate, such as 10s or 30s, go for it. It will put less pressure on Elasticsearch, Lucene, and hardware in general. Remember that by default the refresh rate is set to 1s, which basically means that the index searcher object is reopened every second.

To give you a bit of insight into what performance gains we are talking about, we did some performance tests including Elasticsearch and different refresh rates. With the refresh rate of 1s we were able to index about 1000 documents per second using a single Elasticsearch node. Increasing the refresh interval to 5s gave us an increase in indexing throughput of more than 25 percent and we were able to index about 1250 documents per second. Setting the refresh interval to 25s gave us about 70 percent more throughput as compared to the 1s refresh rate, which was about 1700 documents per second on the same infrastructure. It is also worth remembering that increasing the time indefinitely doesn’t make much sense, because after a certain point (depending on your data load and the amount of data you have) the increase in performance is negligible.
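The percentages quoted above are simple relative gains over the 1s baseline. The following Python sketch reproduces the arithmetic from the rounded throughput figures given in the text, so treat the results as approximations; the function name is our own.

```python
def throughput_gain_percent(baseline, measured):
    """Relative gain of the measured throughput over the baseline, in percent."""
    return (measured - baseline) * 100.0 / baseline


baseline = 1000  # documents per second with the 1s refresh
for label, docs_per_second in (("5s", 1250), ("25s", 1700)):
    gain = throughput_gain_percent(baseline, docs_per_second)
    print("refresh every", label, "gain:", gain, "percent")
```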

Some performance comparisons related to indexing throughput and index refresh rate can be found in the blog post at http://blog.sematext.com/2013/07/08/elasticsearch-refresh-interval-vs-indexing-performance/.

Thread pools tuning

By default, Elasticsearch comes with very good defaults when it comes to all thread pools configuration. You should remember that tuning the default thread pools configuration should be done only when you really see that your nodes are filling up the queues and they still have processing power left that could be designated to the processing of the waiting operations, or when you want to increase the priority of one or more operations.

For example, if you did your performance tests and you saw your Elasticsearch instances not being saturated 100 percent, but on the other hand you experienced a rejected execution error, then that is the point when you should start adjusting the thread pools. You can either increase the number of threads that are allowed to be executed at the same time or increase the queue. Of course, you should also remember that increasing the number of concurrently running threads to very high numbers will lead to many CPU context switches (http://en.wikipedia.org/wiki/Context_switch), which will result in a performance drop.

Automatic store throttling

Before Elasticsearch 2.0, we had to care about how our segment merge process was configured and how much disk I/O merging could use in general, but that changed. Right now Elasticsearch looks at how the I/O subsystem behaves and adjusts the throttling and merging process if the merges are falling behind the indexing. So, we no longer need to manually adjust throttling for disk based operations. You can read more about the related changes on GitHub at https://github.com/elastic/elasticsearch/pull/9243.

Handling time-based data

When you have time-based data, such as logs for example, the architecture of your indices plays a very important role. Let’s assume that we have logs indexed into Elasticsearch. These usually come in large numbers, are constantly indexed, and are time related (an event that is logged happened at a certain point in time). The assumption is that you have a certain retention period for your data, that is, a time during which you would like the data to be present and searchable in Elasticsearch. After that time, you just delete the data and forget about it.

With such assumptions in mind, you could just create a single index with a lot of shards and try to index large amounts of logs there. However, that’s not the perfect solution. First of all, because of merges: the larger the index gets, the more expensive the merges are. Elasticsearch needs to merge larger and larger segments and more I/O and CPU is required to handle them. This means slowdowns. In addition to that, deletes will be expensive because you will have to delete the data either by using TTL or by using the delete by query plugin; both are expensive in terms of performance and will cause even more merging. And this is not everything: during querying you will have to run through the whole index to get even the smallest slice of the data. So, are there better index architectures for time-based data?

Yes, one of the most common and best solutions is to use time-based indices. Depending on the data volume, you can have daily, weekly, monthly, or even hourly indices. The downside is the number of shards you will have when the number of indices grows, but apart from that there are only pros: you can control each index, change the number of shards if that is needed, and have faster merging because the indices will be smaller compared to only one big index. What’s more, deleting data won’t be painful at all: the idea is to delete whole indices; for example, a day’s worth of data in the case of daily indices. Queries will also benefit: you can just run the query on a single time-based index to narrow down the search results. Finally, Elasticsearch, by default, will create the indices for us. For example, when using daily indices, we can have names such as logs_2016-01-01, logs_2016-01-02, and so on.

The only thing we need to care about is providing the index name on the basis of the date and creating templates to configure each newly created index; Elasticsearch will do the rest.
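Deriving the index name from the date, and finding the indices that have fallen out of the retention period, takes only a few lines on the indexing side. The Python sketch below assumes daily indices with the logs_ prefix used in the example names above; the function names and the retention value are hypothetical.

```python
from datetime import date, timedelta


def daily_index_name(day, prefix="logs"):
    """Build a daily index name such as logs_2016-01-01."""
    return prefix + "_" + day.isoformat()


def expired_indices(today, retention_days, existing, prefix="logs"):
    """Return the existing daily indices that are older than the retention period."""
    keep = {daily_index_name(today - timedelta(days=offset), prefix)
            for offset in range(retention_days)}
    return sorted(name for name in existing
                  if name.startswith(prefix + "_") and name not in keep)


existing = ["logs_2016-01-01", "logs_2016-01-02", "logs_2016-01-03"]
print(daily_index_name(date(2016, 1, 1)))
print(expired_indices(date(2016, 1, 3), 2, existing))
```

Each name returned by expired_indices can then be removed by deleting the whole index, which is much cheaper than deleting individual documents.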

Multiple data paths

With the release of Elasticsearch 2.0, we were given the ability to specify multiple path.data properties in our elasticsearch.yml pointing to different directories on different physical devices. Elasticsearch can now leverage that by putting different shards on different devices and using the multiple paths in the most efficient way. Because of that, we can parallelize writing to disks if we have more than a single disk. This is especially useful for high indexing use cases where you index a lot of data.

Data distribution

As we know, each index in the Elasticsearch world can be divided into multiple shards and each shard can have multiple replicas. In cases when you have multiple Elasticsearch nodes (and you will probably have them in production), you should think about the number of shards and replicas and how that will affect your nodes. Data distribution may be crucial to even out the load on the cluster and not have some nodes doing more work than the other ones.

Let’s take the following example. Imagine we have a cluster that is built of 4 nodes and it has a single index called book built of 3 shards and one replica. Such a deployment will look as follows:

As you can see, the first two nodes have two physical shards allocated to them, while the last two nodes have only one shard allocated each. The actual data allocation is not even. When sending the queries and indexing data, we will have the first two nodes do more work than the other two - this is what we want to avoid. One option is to have the book index have two shards and one replica, so it looks as follows:

This architecture will work and it is perfectly fine. We don’t have to have primary shards on all our nodes, we can have replicas, depending on what bottleneck we expect. For querying we may want to have more replicas, for indexing more primaries.

We can also have our primary shards split evenly, like in the following image:

The thing to remember though is that in both cases we will end up with an even distribution of shards and replicas and Elasticsearch will do a similar amount of work on all the nodes. Of course, with more indices (like having daily indices) it may be trickier to get the data evenly distributed and it may not be possible to have evenly distributed shards, but we should try to get to such a point.
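Whether a given combination of shards and replicas can be spread evenly is easy to check before creating the index. The following Python sketch counts the physical shard copies each node ends up with, assuming Elasticsearch balances them as evenly as possible; the function name is ours and the model ignores other indices living in the cluster.

```python
def copies_per_node(shards, replicas, nodes):
    """Spread all physical shard copies over the nodes as evenly as possible."""
    total = shards * (1 + replicas)
    base, extra = divmod(total, nodes)
    # The first `extra` nodes get one copy more than the rest.
    return [base + 1] * extra + [base] * (nodes - extra)


print(copies_per_node(shards=3, replicas=1, nodes=4))  # uneven
print(copies_per_node(shards=2, replicas=1, nodes=4))  # even
```

For the book index on 4 nodes, 3 shards with one replica give 6 copies that cannot be divided evenly, while 2 shards with one replica give exactly one copy per node.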

One more thing to remember when it comes to data distribution and shards and replicas is that when designing your index architecture, you should remember what you want to achieve. If you are going for a very high indexing use case, you may want to spread the index into multiple shards to lower the pressure that is put on the CPU and the I/O subsystem of the server. This is also true for running expensive queries, because with more shards you can lower the load on a single server. However, with the queries there is one more thing - if your nodes can’t keep up with the load caused by queries, you can add more Elasticsearch nodes and increase the number of replicas so that the physical copies of the primary shards are placed on those nodes. That will make indexing a bit slower but will give you the capacity to handle more queries at the same time.

Bulk indexing

This is very obvious advice, but you would be surprised how many Elasticsearch users forget about indexing data in bulks instead of sending the documents one by one. So the advice here is to do bulks instead of one by one indexing whenever possible. The thing to remember though is not to overload Elasticsearch with too many bulk requests and to keep them at a reasonable size (do not push millions of documents in a single request). Remember about the bulk thread pool and its size and try to adjust your indexers not to go beyond it, or you will first start to queue these requests and, if Elasticsearch is not able to process them, you will quickly start seeing rejected execution exceptions and your data won’t be indexed.

Just as an example, we would like to show results of tests we did some time ago for the two types of indexing: one by one and bulks. In the following image, we have the indexing throughput when indexing documents one by one:

In this next image, we do the same, but instead of indexing documents one by one, we index them in batches of 10 documents (which is still a relatively low number of documents in a bulk):

As you can see, when indexing documents one by one, we were able to index about 30 documents per second and it was stable. The situation changed with bulk indexing and batches of 10 documents; we were able to index slightly more than 200 documents per second. So the difference can be clearly seen.

Of course this is a very basic comparison of indexing speed. To show the real difference, we should use dozens of threads and push Elasticsearch to its limits. However, the preceding comparison should give you a basic view of the indexing throughput gains when using bulk indexing.
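To show what batching looks like on the client side, here is a minimal Python sketch that groups documents into bulk request bodies of a fixed size. It only builds the newline-delimited payloads (an action line followed by the document source, each terminated with a newline) and does not send anything; the index name, type name, sample documents, and the bulk_bodies function are our assumptions for the illustration.

```python
import json


def bulk_bodies(docs, index, doc_type, batch_size=10):
    """Yield bulk request bodies, each holding at most batch_size documents."""
    for start in range(0, len(docs), batch_size):
        lines = []
        for doc in docs[start:start + batch_size]:
            # Action line first, then the document itself on the next line.
            lines.append(json.dumps({"index": {"_index": index, "_type": doc_type}}))
            lines.append(json.dumps(doc))
        # The bulk format requires a trailing newline after the last line.
        yield "\n".join(lines) + "\n"


docs = [{"title": "Book %d" % number} for number in range(25)]
bodies = list(bulk_bodies(docs, "library", "book"))
print(len(bodies), "bulk requests instead of", len(docs), "single requests")
```

Each body could then be sent to the _bulk endpoint. Keeping batch_size moderate follows the earlier advice of not pushing millions of documents in a single request.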

RAM buffer for indexing

Remember, the more RAM available for the indexing buffer (the indices.memory.index_buffer_size property), the more documents Elasticsearch can hold in memory. However, we don’t want to have Elasticsearch occupy 100 percent of the available memory. The indexing buffer can help us with delaying the flush to disk, which will mean less I/O pressure and fewer merges. You can read more about indexing buffer configuration in Chapter 9, Elasticsearch Cluster in Detail.

Advice for high query rate scenarios

One of the great features of Elasticsearch is its ability to search and analyze the data that was indexed. However, sometimes it is necessary to adjust Elasticsearch and our queries to not only get the results of the query, but also get them fast (or in a reasonable amount of time). In this section, we will look at the possibilities of preparing Elasticsearch for high query throughput use cases, but not just that. We will also look at general performance tips when it comes to querying.

Shard request cache

The purpose of the shard request cache is to cache aggregations, suggester results, and numbers of hits (it will not cache the returned documents and thus only works with size=0). When your queries use aggregations or suggestions, it may be a good idea to enable this cache (it is disabled by default) so that Elasticsearch can re-use the data stored there. The best thing about the cache is that it promises the same near real-time search as a search that is not cached. You can read more about caches and the shard request cache in particular in Chapter 9, Elasticsearch Cluster in Detail.

Think about the queries

This is the most general advice we can actually give: you should always think about optimal query structure, filter usage, and so on. For example, let’s look at the following query:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "mastering AND department:it AND category:book",
            "default_field": "name"
          }
        },
        {
          "term": {
            "tag": "popular"
          }
        },
        {
          "term": {
            "tag": "2014"
          }
        }
      ]
    }
  }
}

It returns the book matching a few conditions. However, there are a few things we can improve in the preceding query. For example, we can move the static things such as the tag, department, and category field related conditions to the filter section of the Boolean query, so that the next time we use some parts of the query we save CPU cycles and re-use the information stored in cache. That static filtering information is also not relevant when it comes to scoring. Because of that we can move those static elements to the filter section and omit scoring calculation for them. For example, this is what the optimized query will look like:

{
  "query": {
    "bool": {
      "must": [
        {
          "query_string": {
            "query": "mastering",
            "default_field": "name"
          }
        }
      ],
      "filter": [
        {
          "term": {
            "tag": "popular"
          }
        },
        {
          "term": {
            "tag": "2014"
          }
        },
        {
          "term": {
            "department": "it"
          }
        },
        {
          "term": {
            "category": "book"
          }
        }
      ]
    }
  }
}

As you can see, there are a few things that we did. We still used the bool query, but we introduced the use of the filter section. We used filtering for the static, non-analyzed fields. This allows us to easily re-use the filters in the next queries that we execute. Because of such query restructuring, we were able to simplify the main query. This is exactly what you should be doing when optimizing your queries or designing them - have optimization and performance in mind and try to keep them as optimal as they can be. This will result in faster execution of the queries, lower resource consumption, and better health of the whole Elasticsearch cluster.
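If your application builds queries in code, the same separation can be kept there: the user-provided text goes to the must section and the static conditions go to the filter section. The following Python sketch produces a bool query of that shape; the function name filtered_bool_query and its parameters are hypothetical and only illustrate the structure.

```python
import json


def filtered_bool_query(text, default_field, static_terms):
    """Build a bool query: scored full text part in must, static terms in filter."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"query_string": {"query": text, "default_field": default_field}}
                ],
                # Filters are not scored and can be re-used between queries.
                "filter": [{"term": {field: value}} for field, value in static_terms],
            }
        }
    }


query = filtered_bool_query(
    "mastering",
    "name",
    [("tag", "popular"), ("tag", "2014"), ("department", "it"), ("category", "book")],
)
print(json.dumps(query, indent=2))
```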

Parallelize your queries

One thing that is usually forgotten is the need for parallelizing queries. Imagine that you have a dozen nodes in your cluster but your index is built of a single shard. If the index is large, your queries will perform worse than you expect. Of course you can increase the number of replicas, but that won’t help. A single query will still go to a single shard in that index, because replicas are nothing more than copies of the primary shard and they contain the same data (or at least they should). This is true not only for indices having one shard; if you have more than one shard, but they are very large, you can still have performance related problems. It is said that the query is only as fast as the slowest partial query response.

Of course, the parallelization also depends on the use case. If you run a lot of queries against Elasticsearch, you may not need to parallelize the queries, especially when the shards are small enough and you don’t see problems at shard level. In general, look at your Elasticsearch nodes and see if they have unused CPU cores and, if that’s the case, you may have room for improvement and parallelization.

Field data cache and breaking the circuit

We have two different factors we can tune to be sure that we don’t run into out of memory errors. First of all, we can limit the size of the field data cache. The second thing is the circuit breaker, which we can easily configure to just throw an exception instead of loading too much data. Combining these two things will ensure that we don’t run into memory issues. Even if you are using doc values a lot, you may still run into out of memory issues, for example, with analysed fields, which can’t use doc values and will use the field data cache. So configure the field data cache and circuit breakers correctly. You can read more about how to configure them in Chapter 9, Elasticsearch Cluster in Detail.

Keep size and shard size under control

When dealing with some of the queries that use aggregations, we have the possibility of using two properties: size and shard_size. The size parameter defines how many buckets should be returned by the final aggregation results; the node that aggregates the final results will get the top buckets from each shard that returns the result and will only return the top size of them to the client. The shard_size parameter tells Elasticsearch about the same but at the shard level. Increasing the value of the shard_size parameter will lead to more accurate aggregations (like in the case of significant terms aggregation) at the cost of network traffic and memory usage. Lowering that parameter will cause aggregation results to be less precise, but we will benefit from lower memory consumption and lower network traffic. If we see that the memory usage is too large, we can lower the size and shard_size properties for problematic queries and see if the quality of the results is still acceptable.
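The precision trade-off can be seen in a tiny simulation. The Python sketch below is our simplified model of a terms-style aggregation: every shard returns only its top shard_size buckets and the aggregating node merges them and keeps the top size. The shard contents are made up for the illustration and the function is not an Elasticsearch API.

```python
from collections import Counter


def top_terms(shards, size, shard_size):
    """Merge the top shard_size buckets of every shard and return the top size."""
    merged = Counter()
    for counts in shards:
        for term, count in Counter(counts).most_common(shard_size):
            merged[term] += count
    return merged.most_common(size)


# Term "z" is never the top term on a single shard, but it is the top term overall.
shards = [{"x": 10, "y": 9, "z": 8}, {"w": 10, "v": 9, "z": 8}]

print(top_terms(shards, size=1, shard_size=1))  # misses "z"
print(top_terms(shards, size=1, shard_size=3))  # finds "z" with a count of 16
```

With shard_size set to 1, each shard sends a single bucket and the globally most frequent term is lost; with a larger shard_size the result is correct, at the price of more buckets being transferred and held in memory.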

Monitoring

Elasticsearch monitoring APIs expose a lot of information, both about the search engine itself as well as about the environment, such as the operating system. We saw that in Chapter 10, Administrating Your Cluster. Because of this and the ease of retrieving this information, numerous applications were built: ones that allow us to do monitoring and beyond. Some of these applications are simple and just read the data in real time without any persistent storage, while others allow us to read historical data about our cluster behavior. In this chapter, we will only slightly touch the top of the pile of information about such applications, but we strongly advise you to get familiar with some of them as they can make your everyday work with Elasticsearch easier.

We chose three examples of monitoring solutions which take different approaches to integration with Elasticsearch. The first two tools are available as Elasticsearch plugins and the third takes a different approach to integration.

Elasticsearch HQ

This tool is available as an Elasticsearch plugin but can also be downloaded separately as a JavaScript application run in a browser.

Elasticsearch HQ uses JavaScript and AJAX techniques where data is fetched periodically from the cluster, prepared for visualization on the browser side, and shown to the user.

The tool allows us to track statistics on a particular node. The browser can present vital information about the cluster and particular nodes. The following screenshot shows the graphical user interface from Elasticsearch HQ:

We have the basic information about the cluster, the number of nodes, and Elasticsearch health. We can also see which node we are looking at and some statistics about the node, which include the memory usage (both heap and non-heap), the number of threads, Java virtual machine garbage collector work, and so on. The plugin also presents simplified information about schema and shards and allows execution of simple queries.

In order to install Elasticsearch HQ, one should just run the following command:

bin/plugin install royrusso/elasticsearch-HQ

After that, the GUI will be available at http://localhost:9200/_plugin/hq/.

One thing to remember is that Elasticsearch HQ doesn’t persist the fetched data anywhere, so the data is only fetched when your browser is running and has Elasticsearch HQ opened. If something has happened in the past, you won’t be able to diagnose it.

Marvel

Marvel is the tool created by the Elasticsearch team. In the current version, it is built as a plugin for a visualization platform called Kibana (https://www.elastic.co/products/kibana).

Note

Kibana is out of the scope of this book. You can find more about Kibana on the official product page available at https://www.elastic.co/.

Marvel also visualizes basic information about clusters and nodes by drawing nice graphs that are dynamically updated over time. The main difference from Elasticsearch HQ is that the performance data is stored on the server side (in the same or an external Elasticsearch cluster), so historical data is available. The example screenshot is presented next:

The installation procedure for Marvel contains three steps:

bin/plugin install license
bin/plugin install marvel-agent

And finally, the third step is to install the Marvel plugin in Kibana by running the following command:

bin/kibana plugin --install elasticsearch/marvel/latest

SPM for Elasticsearch

This tool presents a different approach than the previously mentioned tools. SPM is a Software as a Service (SaaS) solution created for monitoring Elasticsearch installations of any size and allows monitoring several clusters and different technologies. Though its roots are SaaS-based, it is also available on premises, which means that you can run SPM on your own machines without the need for sending your metrics to the cloud.

Information is sent by simple client software installed on the Elasticsearch machine to the SPM servers. The main advantage is the possibility of storing information for a wider range of time and seeing what was happening in the past. You can create your own dashboards and correlate metrics with logs between multiple applications (SPM allows you to monitor a wide variety of applications).

The following screenshot shows the dashboard of SPM for Elasticsearch:

The overview dashboard shown in the preceding screenshot provides information about the cluster nodes, the request rate and latency, the number of documents in the indices, CPU usage, load, memory details, Java virtual machine memory, the disk space usage, and finally network traffic. You can get detailed information about each of these elements by going into the tab dedicated to it.

You can find additional information about SPM installation and available options at http://sematext.com/spm/index.html.

Summary

In this chapter, we focused on scaling and tuning Elasticsearch. We started with the hardware preparations and decisions we need to make. Next, we tuned a single Elasticsearch node as much as we could and after that we configured the whole cluster to work as well as it could. We discussed vertical expansion possibilities and we learned how to monitor our cluster once it hits the production environment.

So now we have reached the end of the book. We hope that it was a nice reading experience and that you found the book interesting. Since the previous edition of the book, Elasticsearch has changed a lot, not only when it comes to versions, but also when it comes to functionalities. Some of the features are no longer there, some of them were moved to plugins, and of course new features were added. We really hope that you have learned something from this book and now you will find it easier to use Elasticsearch every day, no matter if you are a beginner in this world or a semi-experienced Elasticsearch user. As the authors of this book, but also as Elasticsearch users ourselves, we tried to bring you, our readers, the best reading experience we could. Of course Elasticsearch is more than we described in the book, especially when it comes to monitoring and administration capabilities and API. However, the number of pages is limited and if we were to describe everything in great detail we would have ended up with a book one thousand pages long. We need to remember that Elasticsearch is not only user friendly but also provides a large number of configuration options, querying possibilities, and so on. Due to that, we had to choose which functionalities to describe in greater detail, which had to be only mentioned, and which had to be totally skipped. As with the two previous editions of the book you are holding, we hope that we made the right choice and that you are happy about what you’ve read.

We would also like to say that it is worth remembering that Elasticsearch is constantly evolving. When writing this book, we went through a few stable versions, finally making it to the release of Elasticsearch 2.2. Even back then we knew that new features and improvements were coming, like some of the changes mentioned in the book that will be part of the next release, or at least they are planned to be. Be sure to check the official documentation of Elasticsearch periodically for the release notes for new versions of Elasticsearch, if you want to be up to date with the new features being added. We will also be writing about new features that we think are worth mentioning on www.elasticsearchserverbook.com. So if you are interested, visit the site from time to time.

Once again thank you for the time you’ve spent with the book.

Index

A

advices, for high query rate scenarios
    about / Advice for high query rate scenarios
    shard request cache / Shard request cache
    queries / Think about the queries
    queries, parallelizing / Parallelize your queries
    field data cache / Field data cache and breaking the circuit
    circuit, breaking / Field data cache and breaking the circuit
    size, controlling / Keep size and shard size under control
    shard size, controlling / Keep size and shard size under control

aggregation engine
    working / Inside the aggregations engine

aggregations
    about / Aggregations
    general query structure / General query structure
    types / Aggregation types
    date_histogram / Date histogram aggregation
    geo distance aggregations / Geo distance aggregations
    geohash grid aggregation / Geohash grid aggregation
    global aggregation / Global aggregation
    significant_terms aggregation / Significant terms aggregation
    sampler aggregation / Sampler aggregation
    children aggregation / Children aggregation
    nested aggregation / Nested aggregation
    reverse_nested aggregation / Reverse nested aggregation
    nesting aggregations / Nesting aggregations and ordering buckets

aggregations, types
    metrics / Aggregation types, Metrics aggregations
    buckets / Aggregation types, Buckets aggregations
    pipeline / Aggregation types

Amazon
    URL / Cost and performance flexibility

Amazon S3
    URL / Creating a snapshot repository

Analyze API
    URL / Defining your own analyzers

analyzers
    using / Using analyzers
    out-of-the-box analyzers / Out-of-the-box analyzers
    defining / Defining your own analyzers
    default analyzers / Default analyzers

www.EBooksWorld.ir

ApacheLucene/GettingbacktoApacheLuceneURL/Fulltextsearchingglossary/TheLuceneglossaryandarchitecturearchitecture/TheLuceneglossaryandarchitecturedocument/TheLuceneglossaryandarchitecturefield/TheLuceneglossaryandarchitectureterm/TheLuceneglossaryandarchitecturetoken/TheLuceneglossaryandarchitecturetokenizer/Inputdataanalysisscoring/IntroductiontoApacheLucenescoring

ApacheLuceneJavadocsfortheTFIDFURL/Scoringandqueryrelevance

ApacheLucenescoringabout/IntroductiontoApacheLucenescoringdocumentmatching,factors/Whenadocumentismatcheddefaultscoringformula/Defaultscoringformularelevantdocuments/Relevancymatters

ApacheSolrURL/UsingApacheSolrsynonyms

ApacheSolrsynonymsusing/UsingApacheSolrsynonymsexplicitsynonyms/Explicitsynonymsequivalentsynonyms/Equivalentsynonymsexpandproperty/Expandingsynonyms

ApacheTikaURL/Detectingthelanguageofthedocument

arbitrarygeoshapesabout/Arbitrarygeoshapespoint/Pointenvelope/Envelopepolygon/Polygonmultipolygon/Multipolygonexampleusage/Anexampleusagestoring,inindex/Storingshapesintheindex

arguments,CatAPIURL/Commonarguments

attributes,indexstructuremappingindex_name/Commonattributesindex/Commonattributesstore/Commonattributesdoc_values/Commonattributesboost/Commonattributesnull_value/Commonattributescopy_to/Commonattributes

www.EBooksWorld.ir

include_in_all/Commonattributesprecision_step/Number,Datecoerce/Numberignore_malformed/Number,Dateformat/Dateformat,referencelink/Datenumeric_resolution/Date

availableobjects,scriptexecution_doc/Objectsavailableduringscriptexecution_source/Objectsavailableduringscriptexecution_fields/Objectsavailableduringscriptexecution

AzureURL/Creatingasnapshotrepository

www.EBooksWorld.ir

B

basic queries
  about / Basic queries
  term query / The term query
  terms query / The terms query
  match all query / The match all query
  type query / The type query
  exists query / The exists query
  missing query / The missing query
  common terms query / The common terms query
  match query / The match query
  multi match query / The multi match query
  query string query / The query string query
  simple query string query / The simple query string query
  identifiers query / The identifiers query
  prefix query / The prefix query
  fuzzy query / The fuzzy query
  wildcard query / The wildcard query
  range query / The range query
  regular expression query / Regular expression query
  more like this query / The more like this query
batch indexing
  used, for speeding up indexing process / Batch indexing to speed up your indexing process
Boolean properties set
  node.master / Configuring node roles
  node.data / Configuring node roles
  node.client / Configuring node roles
bool query
  about / The bool query
  should section / The bool query
  must section / The bool query
  must_not section / The bool query
  filter parameter / The bool query
  boost parameter / The bool query
  minimum_should_match parameter / The bool query
  disable_coord parameter / The bool query
  used, for explicit filtering / Explicit filtering with bool query
boosting query / The boosting query
boost_mode parameter
  multiply value / Structure of the function query
  replace value / Structure of the function query
  sum value / Structure of the function query
  avg value / Structure of the function query
  max value / Structure of the function query
  min value / Structure of the function query
bucket aggregations
  ordering / Nesting aggregations and ordering buckets, Buckets ordering
buckets / General query structure
buckets aggregations
  about / Buckets aggregations
  filter aggregation / Filter aggregation
  filters aggregation / Filters aggregation
  terms aggregation / Terms aggregation
  range aggregation / Range aggregation
  date_range aggregation / Date range aggregation
  ip_range aggregation / IPv4 range aggregation
  missing aggregation / Missing aggregation
  histogram aggregation / Histogram aggregation
bulk indexing
  data, preparing / Preparing data for bulk indexing

C

caches
  about / Elasticsearch caches
  field data cache / Field data cache
  field data, using with doc values / Field data and doc values
  shard request cache / Shard request cache
  node query cache / Node query cache
  indexing buffers / Indexing buffers
  avoiding, scenarios / When caches should be avoided
Cat API
  about / The Cat API
  defining / The basics
  using / Using Cat API
  common arguments / Common arguments
  examples / The examples, Getting information about the nodes
children aggregation
  about / Children aggregation
CIDR notation
  URL / IPv4 range aggregation
Class DateTimeFormat
  URL / Tuning the type determining mechanism for dates
client node
  about / Node roles, Client node
cluster
  about / Nodes and clusters
  installing / Installing and configuring your cluster
  configuring / Installing and configuring your cluster
  directory layout / The directory layout
  system-specific installation and configuration / The system-specific installation and configuration
cluster health API
  about / Cluster health API
  information details, controlling / Controlling information details
  additional parameters / Additional parameters
cluster rebalancing
  controlling / Controlling cluster rebalancing
  defining / Understanding rebalance
  implementing / Cluster being ready
  settings / The cluster rebalance settings, Controlling the number of shards being moved between nodes concurrently
cluster settings API / The cluster settings API
cluster wide allocation
  about / Cluster-wide allocation
  allocation awareness / Allocation awareness
  allocation awareness, forcing / Forcing allocation awareness
  filtering / Filtering
CMS system
  URL / Creating a new document
common terms query / The common terms query
completion suggester
  about / Completion suggester
  in Elasticsearch 2.2 / Completion suggester
completion suggester, Elasticsearch 2.1
  data, indexing / Indexing data
  indexed data, querying / Querying indexed completion suggester data
  custom weights / Custom weights
completion suggester, Elasticsearch 2.2
  about / Completion suggester
compound queries
  about / Compound queries
  bool query / The bool query
  dis_max query / The dis_max query
  boosting query / The boosting query
  constant_score query / The constant_score query
  indices query / The indices query
compressed oops
  URL / The memory
compressed ordinary object pointers
  reference link / Multiple Elasticsearch instances on a single physical machine
configuration options, phrase suggester
  max_errors / Configuration
  separator / Configuration
configuration options, term suggester
  text / Term suggester configuration options
  field / Term suggester configuration options
  analyzer / Term suggester configuration options
  size / Term suggester configuration options
  suggest_mode / Term suggester configuration options
  sort / Term suggester configuration options
constant_score query / The constant_score query
content
  searching, in different languages / Searching content in different languages
content, searching in different languages
  about / Searching content in different languages
  languages, handling / Handling languages differently
  multiple languages, handling / Handling multiple languages
  document language, detecting / Detecting the language of the document
  sample document / Sample document
  mappings / The mappings
  data, querying / Querying
  queries, combining / Combining queries
context suggester
  about / Context suggester
  types / Context types
  using / Using context
  geo location context, using / Using the geo location context
context switches
  reference link / Thread pools tuning
core types, index structure mapping
  about / Core types
  common attributes / Common attributes
  string / String
  number / Number
  boolean / Boolean
  binary / Binary
  date / Date
counttoit field / Adding partial documents
create, retrieve, update, delete (CRUD)
  URL / Manipulating data with the REST API
cURL command
  URL / Installing Elasticsearch

D

data
  manipulating, with REST API / Manipulating data with the REST API
  storing, in Elasticsearch / Storing data in Elasticsearch
  preparing, for bulk indexing / Preparing data for bulk indexing
  indexing / Indexing the data
  _all field / The _all field
  _source field / The _source field
  internal fields / Additional internal fields
  sorting / Sorting data
  default sorting / Default sorting
  querying, in child documents / Querying data in the child documents
  querying, in parent documents / Querying data in the parent documents
data, that is not flat
  indexing / Indexing data that is not flat
  data / Data
  objects / Objects
  arrays / Arrays
  mappings / Mappings
  dynamic behavior / To be or not to be dynamic
  object indexing, disabling / Disabling object indexing
data node
  about / Node roles
data querying, cases
  identified language, using / Queries with an identified language
  unknown language, using / Queries with an unknown language
data sets
  foreground sets / Choosing significant terms
  background sets / Choosing significant terms
data sorting
  about / Sorting data
  default sorting / Default sorting
  fields, selecting / Selecting fields used for sorting
  mode / Sorting mode
  behavior for missing fields, specifying / Specifying behavior for missing fields
  dynamic criteria / Dynamic criteria
  scoring, calculating / Calculate scoring when sorting
date_histogram aggregations
  about / Date histogram aggregation
  time zones / Time zones
DEB package
  used, for installing Elasticsearch / Installing Elasticsearch using the DEB package
default indexing / Default indexing
derivative aggregation
  URL / Derivative aggregation
designated nodes roles for larger clusters
  about / Designated node roles for larger clusters
  query aggregator nodes / Query aggregator nodes
  data nodes / Data nodes
  master eligible nodes / Master eligible nodes
DigitalOcean
  URL / Cost and performance flexibility
directory layout, cluster
  bin / The directory layout
  config / The directory layout
  lib / The directory layout
  modules / The directory layout
  data / The directory layout
  logs / The directory layout
  plugins / The directory layout
  work / The directory layout
disk-based shard allocation
  about / Disk-based shard allocation
  configuring / Configuring disk based shard allocation
  disabling / Disabling disk based shard allocation
dis_max query / The dis_max query
Docker
  reference link / Multiple Elasticsearch instances on a single physical machine
document
  about / Document
  creating / Creating a new document
  automatic identifier creation, creating / Automatic identifier creation
  retrieving / Retrieving documents
  updating / Updating documents
  non-existing documents, dealing with / Dealing with non-existing documents
  partial documents, adding / Adding partial documents
  deleting / Deleting documents
document type / Document type
double type
  URL / Number
dynamic templates
  about / Templates and dynamic templates, Dynamic templates
  matching pattern / The matching pattern
  target field definition, writing / Field definitions

E

Elasticsearch
  about / The basics of Elasticsearch
  key concepts / Key concepts of Elasticsearch
  index / Index
  document / Document
  document type / Document type
  mapping / Mapping
  indexing / Indexing and searching, Elasticsearch indexing
  searching / Indexing and searching
  URL / Installing Elasticsearch, Available similarity models
  installing / Installing Elasticsearch
  running / Running Elasticsearch
  shutting down / Shutting down Elasticsearch
  configuring / Configuring Elasticsearch
  installing, with RPM package / Installing Elasticsearch using RPM packages
  installing, with DEB package / Installing Elasticsearch using the DEB package
  configuration files, localization / Elasticsearch configuration file localization
  querying / Querying Elasticsearch, A simple query
  example data / The example data
  paging / Paging and result size
  result size, controlling / Paging and result size
  version value, returning / Returning the version value
  score, limiting / Limiting the score
  return fields, selecting / Choosing the fields that we want to return
  source filtering / Source filtering
  script fields, using / Using the script fields
  parameters, passing to script fields / Passing parameters to the script fields
  scripting capabilities / Scripting capabilities of Elasticsearch
  spatial capabilities / Elasticsearch spatial capabilities
  reference documentation, URL / Configuration
  plugins / Elasticsearch plugins
  caches / Elasticsearch caches
  hardware preparations / Hardware
  monitoring / Monitoring
  Kibana, URL / Marvel
Elasticsearch 2.1
  URL / Thread pools
Elasticsearch 2.2
  completion suggester / Completion suggester
Elasticsearch cluster
  preparing, for high indexing / Preparing the cluster for high indexing and querying throughput
  preparing, for high querying / Preparing the cluster for high indexing and querying throughput
Elasticsearch HQ tool
  using / Elasticsearch HQ
Elasticsearch indexing
  about / Elasticsearch indexing
  shards / Shards and replicas
  replicas / Shards and replicas
  indices, creating / Creating indices
Elasticsearch infrastructure
  key concepts / Key concepts of the Elasticsearch infrastructure
  node / Nodes and clusters
  cluster / Nodes and clusters
  shard / Shards
  replica / Replicas
  gateway / Gateway
Elasticsearch monitoring
  about / Monitoring
  Elasticsearch HQ tool, using / Elasticsearch HQ
  Marvel tool, using / Marvel
  SPM tool, using / SPM for Elasticsearch
Elasticsearch time machine
  about / Elasticsearch time machine
  snapshot repository, creating / Creating a snapshot repository
  snapshots, creating / Creating snapshots
  snapshot, restoring / Restoring a snapshot
  parameters / Restoring a snapshot
  old snapshots, deleting / Cleaning up - deleting old snapshots
exists query / The exists query
Explain API
  URL / Explaining the query
explain information
  about / Understanding the explain information
  field analysis / Understanding field analysis
  query, explaining / Explaining the query

F

factors, for score property calculation
  document boost / When a document is matched
  field boost / When a document is matched
  coord / When a document is matched
  inverse document frequency / When a document is matched
  length norm / When a document is matched
  term frequency / When a document is matched
  query norm / When a document is matched
Fast Vector Highlighter
  URL / Under the hood
Fedora Linux
  URL / Installing Elasticsearch using RPM packages
field data cache
  about / Field data cache
  size, controlling / Field data size
  circuit breakers / Circuit breakers
field definition variables, dynamic templates
  {name} / Field definitions
  {dynamic_type} / Field definitions
filtering
  about / Filtering
  include / What do include, exclude, and require mean
  require / What do include, exclude, and require mean
  exclude / What do include, exclude, and require mean
filters
  lowercase filter / Input data analysis
  synonyms filter / Input data analysis
  language stemming filters / Input data analysis
filters and tokenizers
  URL / Defining your own analyzers
filter types
  URL / Defining your own analyzers
full text searching
  about / Full text searching
  Apache Lucene, glossary / The Lucene glossary and architecture
  Apache Lucene, architecture / The Lucene glossary and architecture
  input data analysis / Input data analysis
  indexing / Indexing and querying
  querying / Indexing and querying
  scoring / Scoring and query relevance
  query relevance / Scoring and query relevance
function score query
  about / The function score query
  structure / Structure of the function query
  weight factor function / The weight factor function
  field_value_factor function / Field value factor function
  script_score function / The script score function
  random_score function / The random score function
  decay functions / Decay functions
function_score query
  URL / Decay functions
fuzzy query
  about / The fuzzy query

G

gateway / Gateway
gateway module
  about / The gateway and recovery modules, The gateway
gateway recovery options
  gateway.recover_after_master_nodes / Additional gateway recovery options
  gateway.recover_after_data_nodes / Additional gateway recovery options
  gateway.expected_master_nodes / Additional gateway recovery options
  gateway.expected_data_nodes / Additional gateway recovery options
general preparations, single Elasticsearch node
  about / The general preparations
  swapping, avoiding / Avoiding swapping
  file descriptors / File descriptors
  virtual memory / Virtual memory, The memory
Geo / Geo bounds aggregation
geo distance aggregations
  about / Geo distance aggregations
Geohash
  URL / Geohash grid aggregation
geohash grid aggregation
  about / Geohash grid aggregation
  URL / Geohash grid aggregation
Geohash value
  URL / Example data
GeoJSON
  URL / Arbitrary geo shapes
geospatial queries
  URL / Sample queries
geo_field properties
  geohash / Additional geo_field properties
  geohash_precision / Additional geo_field properties
  geohash_prefix / Additional geo_field properties
  ignore_malformed / Additional geo_field properties
  lat_lon / Additional geo_field properties
  precision_step / Additional geo_field properties
GitHub
  URL / Installing plugins
  automatic store throttling, URL / Automatic store throttling
Github
  issue, URL / String
global aggregation
  about / Global aggregation
Groovy
  URL / Scripting capabilities of Elasticsearch

H

hardware preparations, for running Elasticsearch
  about / Hardware
  physical servers / Physical servers or a cloud
  cloud / Physical servers or a cloud
  CPU / CPU
  RAM memory / RAM memory
  mass storage / Mass storage
  network / The network
  servers counting / How many servers
  cost cutting / Cost cutting
HDFS
  URL / Creating a snapshot repository
highlighted fragments
  controlling / Controlling highlighted fragments
highlighter type
  selecting / Forcing highlighter type
highlighting
  about / Highlighting
  using / Getting started with highlighting
  field configuration / Field configuration
  Apache Lucene, using / Under the hood
  highlighter type, selecting / Forcing highlighter type
  HTML tags, configuring / Configuring HTML tags
  global settings / Global and local settings
  local settings / Global and local settings
  matching need / Require matching
  custom query / Custom highlighting query
  Postings highlighter / The Postings highlighter, Validating your queries
horizontal expansion
  about / Horizontal expansion
  replicas, automatic creation / Automatically creating the replicas
  redundancy / Redundancy and high availability
  high availability / Redundancy and high availability
  reference links / Redundancy and high availability
  cost and performance flexibility / Cost and performance flexibility
  continuous upgrades / Continuous upgrades
  multiple Elasticsearch instances, on single physical machine / Multiple Elasticsearch instances on a single physical machine
  designated nodes roles for larger clusters / Designated node roles for larger clusters
how similar phrase / Understanding the explain information
HTTP module
  properties, URL / HTTP host
HTTP protocol
  URL / Understanding the REST API
HTTP transport settings, adjusting
  node / Adjusting HTTP transport settings
  HTTP, disabling / Disabling HTTP
  HTTP port / HTTP port
  HTTP host / HTTP host
HyperLogLog++ algorithm
  URL / Field cardinality

I

identifiers query
  about / The identifiers query
index
  segments / The Lucene glossary and architecture
  about / Index
index-time boosting
  using / When does index-time boosting make sense?
  defining, in mappings / Defining boosting in the mappings
index alias
  about / Index aliasing and using it to simplify your everyday work
  defining / An alias
  creating / Creating an alias
  modifying / Modifying aliases
  commands, combining / Combining commands
  retrieving / Retrieving aliases
  removing / Removing aliases
  filtering / Filtering aliases
  and routing / Aliases and routing
  and zero downtime reindexing / Zero downtime reindexing and aliases
indexation / Input data analysis
indexing process
  speeding up, batch indexing used / Batch indexing to speed up your indexing process
indexing related advices
  about / Indexing related advice
  index refresh rate / Index refresh rate
  thread pools, tuning / Thread pools tuning
  automatic store throttling / Automatic store throttling
  time-based data, handling / Handling time-based data
  multiple data paths / Multiple data paths
  data distribution / Data distribution
  bulk indexing / Bulk indexing
  RAM buffer, used for indexing / RAM buffer for indexing
index refresh rate
  reference link / Index refresh rate
index structure
  modifying, with update API / Modifying your index structure with the update API
index structure, modifying
  mappings / The mappings
  new field, adding / Adding a new field to the existing index
  existing index fields, modifying / Modifying fields of an existing index
index structure, parent-child relationship
  about / Index structure and data indexing
  child mappings / Child mappings
  parent mappings / Parent mappings
  parent document / The parent document
  children documents / Child documents
index structure mapping
  about / Index structure mapping
  types / Type and types definition
  types definition / Type and types definition
  fields / Fields
  core types / Core types
  multi fields / Multi fields
  IP address type / The IP address type
  token count type / Token count type
indices, Elasticsearch indexing
  creating / Creating indices
  automatic creation, altering / Altering automatic index creation
  newly created index, settings / Settings for a newly created index
  deleting / Index deletion
indices analyze API
  URL / Query analysis
indices query / The indices query
indices settings API / The indices settings API
indices stats API
  about / Indices stats API
  docs / Docs
  store / Store
  indexing / Indexing, get, and search
  get / Indexing, get, and search
  search / Indexing, get, and search
  defining / Additional information
internal fields
  _id / Additional internal fields
  _uid / Additional internal fields
  _type / Additional internal fields
  _field_names / Additional internal fields
inverted index
  about / The Lucene glossary and architecture
  URL / Index

J

Java
  URL / Full text searching
  installing / Installing Java
JavaScript Object Notation (JSON)
  URL / Running Elasticsearch
Java threads
  URL / Thread pools
Java types
  URL / Number
Java Version 7
  URL / Installing Java
Java Virtual Machine (JVM) / Configuring Elasticsearch
JMeter
  URL / When caches should be avoided
Joda Time library
  URL / Date range aggregation
JSON
  URL / Document

K

Kibana
  URL / Marvel

L

language analyzer
  URL / Out-of-the-box analyzers
language analyzers
  URL / Sample document
language detection
  URL / Detecting the language of the document
Levenshtein algorithm
  URL / The Boolean match query
Linux
  Elasticsearch, installing / Installing Elasticsearch on Linux
  Elasticsearch, configuring as system service / Configuring Elasticsearch as a system service on Linux
Logstash
  URL / Index aliasing and using it to simplify your everyday work
Lucene Javadocs
  URL / Default scoring formula
Lucene query syntax
  about / Lucene query syntax

M

mapping / Mapping
mappings
  configuration / Mappings configuration
  type determining mechanism / Type determining mechanism
  index structure mapping / Index structure mapping
  analyzers, using / Using analyzers
  similarity models / Different similarity models
  about / Mappings
  final mappings / Final mappings
  sending, to Elasticsearch / Sending the mappings to Elasticsearch
  new field, adding to existing index / Adding a new field to the existing index
  field of existing index, modifying / Modifying fields of an existing index
Marvel tool
  using / Marvel
master node
  about / Node roles, Master node
match all query / The match all query
matching pattern, dynamic templates
  match / The matching pattern
  unmatch / The matching pattern
match query
  about / The match query
  Boolean match query / The Boolean match query
  phrase match query / The phrase match query
  match phrase prefix query / The match phrase prefix query
Maven
  URL / Installing plugins
Maven Central
  URL / Installing plugins
Maven Sonatype
  URL / Installing plugins
merge policy
  about / The merge policy
  properties / The merge policy
merge scheduler / The merge scheduler
metrics aggregations
  about / Metrics aggregations
  min / Minimum, maximum, average, and sum
  max / Minimum, maximum, average, and sum
  avg / Minimum, maximum, average, and sum
  sum / Minimum, maximum, average, and sum
  missing values / Missing values
  scripts, using / Using scripts
  field value statistics / Field value statistics and extended statistics
  extended_statistics / Field value statistics and extended statistics
  value_count aggregation / Value count
  field cardinality aggregation / Field cardinality
  percentiles aggregation / Percentiles
  percentile_ranks aggregation / Percentile ranks
  top_hits aggregation / Top hits aggregation
  top_hits aggregation, additional parameters / Additional parameters
  geo_bounds aggregation / Geo bounds aggregation
  scripted metrics aggregation / Scripted metrics aggregation
Microsoft Windows platform
  file handles, URL / Configuring Elasticsearch
minimum_should_match parameter
  URL / The bool query
missing query / The missing query
more like this query
  about / The more like this query
moving averages calculation
  URL / Pipeline aggregations
moving_avg aggregation
  URL / Moving avg aggregation
  about / Moving avg aggregation
  future buckets, predicting / Predicting future buckets
  models / The models
  models, URL / The models
multi match query / The multi match query
multiple Elasticsearch instances, on single physical machine
  about / Multiple Elasticsearch instances on a single physical machine
  shard, preventing on same node / Preventing a shard and its replicas from being on the same node
  replicas, preventing on same node / Preventing a shard and its replicas from being on the same node
multiple indices
  URL / URI search
multi term / Query rewrite
multivalued field / Document
Mustache
  URL / Scripting capabilities of Elasticsearch

N

native code, using
  factory implementation / The factory implementation
  native script implementation / Implementing the native script
  plugin definition / The plugin definition
  plugin, installing / Installing the plugin
  script, running / Running the script
nested aggregation
  about / Nested aggregation
nested objects
  using / Using nested objects
  URL / Using nested objects
  nested queries / Scoring and nested queries
  score_mode property, setting / Scoring and nested queries
nesting aggregations
  about / Nesting aggregations and ordering buckets
network attached storage (NAS) / Mass storage
node / Nodes and clusters
  discovery, about / Understanding node discovery
  discovery types / Understanding node discovery, Discovery types
  roles / Node roles
  cluster name, setting / Setting the cluster's name
  Zen discovery / Zen discovery
  HTTP transport settings, adjusting / Adjusting HTTP transport settings
node roles
  master node / Node roles, Master node
  data node / Node roles, Data node
  client node / Node roles, Client node
  configuring / Configuring node roles
nodes info API
  about / Nodes info API
  requisites / Nodes info API
  extensive information, returning / Returned information
NoSQL
  URL / Manipulating data with the REST API
number, index structure mapping
  byte / Number
  short / Number
  integer / Number
  long / Number
  float, URL / Number
  float / Number
  double / Number
  double, URL / Number

O

object indexing
  disabling / Disabling object indexing
official repository
  URL / Installing plugins
OpenJDK
  URL / Installing Java
optimistic locking
  URL / Versioning
options, term suggester
  lowercase_terms / Additional term suggester options
  max_edits / Additional term suggester options
  prefix_len / Additional term suggester options
  min_word_len / Additional term suggester options
  shard_size / Additional term suggester options
out-of-the-box analyzers
  standard / Out-of-the-box analyzers
  simple / Out-of-the-box analyzers
  whitespace / Out-of-the-box analyzers
  stop / Out-of-the-box analyzers
  keyword / Out-of-the-box analyzers
  pattern / Out-of-the-box analyzers
  language / Out-of-the-box analyzers
  snowball / Out-of-the-box analyzers

P

parameters, Boolean match query
  operator / The Boolean match query
  analyzer / The Boolean match query
  fuzziness / The Boolean match query
  prefix_length / The Boolean match query
  max_expansions / The Boolean match query
  zero_terms_query / The Boolean match query
  lenient / The Boolean match query
parameters, fuzzy query
  value / The fuzzy query
  boost / The fuzzy query
  fuzziness / The fuzzy query
  prefix_length / The fuzzy query
  max_expansions / The fuzzy query
parameters, more like this query
  fields / The more like this query
  like / The more like this query
  unlike / The more like this query
  min_term_freq / The more like this query
  max_query_terms / The more like this query
  stop_words / The more like this query
  min_doc_freq / The more like this query
  min_word_len / The more like this query
  max_word_len / The more like this query
  boost_terms / The more like this query
  boost / The more like this query
  include / The more like this query
  minimum_should_match / The more like this query
  analyzer / The more like this query
parameters, query string query
  query / The query string query
  default_field / The query string query
  default_operator / The query string query
  analyzer / The query string query
  allow_leading_wildcard / The query string query
  lowercase_expand_terms / The query string query
  enable_position_increments / The query string query
  fuzzy_max_expansions / The query string query
  fuzzy_prefix_length / The query string query
  phrase_slop / The query string query
  boost / The query string query
  analyze_wildcard / The query string query
  auto_generate_phrase_queries / The query string query
  minimum_should_match / The query string query
  fuzziness / The query string query
  max_determinized_states / The query string query
  locale / The query string query
  time_zone / The query string query
  lenient / The query string query
parameters, range query
  gte / The range query
  gt / The range query
  lte / The range query
  lt / The range query
parent-child relationship
  using / Using the parent-child relationship
  index structure / Index structure and data indexing
  data indexing / Index structure and data indexing
  querying / Querying
  performance considerations / Performance considerations
parent aggregations / Available types
pattern analyzer
  URL / Out-of-the-box analyzers
percolator
  about / Percolator
  index / The index
  preparing / Percolator preparation
  exploring / Getting deeper
  returned results size, controlling / Controlling the size of returned results
  using, for score calculation / Percolator and score calculation
  combining, with other functionalities / Combining percolators with other functionalities
  matching queries count, obtaining / Getting the number of matching queries
  indexed documents percolation / Indexed document percolation
phrase match query
  slop / The phrase match query
  analyzer / The phrase match query
phrase suggester
  about / Phrase suggester
  configuration / Configuration
pipeline aggregations
  about / Pipeline aggregations
  URL / Pipeline aggregations
  parent aggregation family / Available types
  sibling aggregation family / Available types
  types / Available types, Pipeline aggregation types
  other aggregations, referencing / Referencing other aggregations
  data, gaps / Gaps in the data
pipeline aggregations, types
  sum_bucket / Min, max, sum, and average bucket aggregations
  min_bucket / Min, max, sum, and average bucket aggregations
  max_bucket / Min, max, sum, and average bucket aggregations
  avg_bucket / Min, max, sum, and average bucket aggregations
  cumulative_sum aggregation / Cumulative sum aggregation
  bucket_selector aggregation / Bucket selector aggregation
  bucket_script aggregation / Bucket script aggregation
  serial_diff aggregation / Serial differencing aggregation
  derivative aggregation / Derivative aggregation
  moving_avg aggregation / Moving avg aggregation
plugins
  about / Elasticsearch plugins
  basics / The basics
  installing / Installing plugins
  removing / Removing plugins
Postings Highlighter
  URL / Under the hood
  about / The Postings highlighter
prefix query / The prefix query
properties, fault detection ping settings
  discovery.zen.fd.ping_interval / Fault detection ping settings
  discovery.zen.fd.ping_timeout / Fault detection ping settings
  discovery.zen.fd.ping_retries / Fault detection ping settings
properties, merge policy
  index.merge.policy.expunge_deletes_allowed / The merge policy
  index.merge.policy.max_merge_at_once / The merge policy
  index.merge.policy.max_merge_at_once_explicit / The merge policy
  index.merge.policy.max_merged_segment / The merge policy
  index.merge.policy.segments_per_tier / The merge policy
  index.merge.policy.reclaim_deletes_weight / The merge policy

Q

queries
  selecting, for warming / Choosing queries for warming
query boost
  applying, to document / The boost
query boosts
  used, for influencing scores / Influencing scores with query boosts
  about / The boost
  adding, to queries / The boost, Adding the boost to queries
  score, modifying / Modifying the score
querying
  data, in child documents / Querying data in the child documents
  data, in parent documents / Querying data in the parent documents
querying process
  about / Understanding the querying process
  query logic / Query logic
  search type, specifying / Search type
  search execution preference, specifying / Search execution preference
  search shards API, specifying / Search shards API
query parser
  URL / Lucene query syntax
query rewrite
  about / Query rewrite
  prefix query, example / Prefix query as an example
  Apache Lucene, using / Getting back to Apache Lucene
  properties / Query rewrite properties
query string query
  about / The query string query
  running, against multiple fields / Running the query string query against multiple fields

RRackspace

URL/CostandperformanceflexibilityRAID

URL/Massstoragerangeaggregation

about/Rangeaggregationkeyedbuckets/Keyedbuckets

rangequery/Therangequeryrecoverymodules

about/Thegatewayandrecoverymodulesrecoveryprocess

about/Recoverycontrolgatewayrecoveryoptions/AdditionalgatewayrecoveryoptionsindicesrecoveryAPI/IndicesrecoveryAPIdelayedallocation/Delayedallocationindexrecoveryprioritization/Indexrecoveryprioritization

regularexpressionqueryabout/RegularexpressionqueryURL/Regularexpressionquery

replica/Replicasreplicas,Elasticsearchindexing

about/Shardsandreplicaswriteconsistency,controlling/Writeconsistency

RESTAPIused,fordatamanipulation/ManipulatingdatawiththeRESTAPIabout/UnderstandingtheRESTAPIURL/UnderstandingtheRESTAPIdata,storinginElasticsearch/StoringdatainElasticsearchdocuments,retrieving/Retrievingdocumentsdocuments,updating/Updatingdocumentsdocuments,deleting/Deletingdocumentsversioning/Versioning

resultsfiltering/Filteringyourresultsquerycontext/Thecontextisthekeyexplicitfiltering,boolqueryused/Explicitfilteringwithboolquery

reverse_nestedaggregationabout/Reversenestedaggregation

rewriteproperty,valuesscoring_boolean/Queryrewritepropertiesconstant_score/Queryrewritepropertiesconstant_score_boolean/Queryrewriteproperties

www.EBooksWorld.ir

top_terms/Queryrewritepropertiestop_terms_blendedfreqs/Queryrewritepropertiestop_terms_boost_N/Queryrewriteproperties

rightqueryselecting/Choosingtherightqueryusecases/Theusecasesresults,limitingtogiventags/Limitingresultstogiventagsvaluesinrange,searching/Searchingforvaluesinarange

routingabout/Introductiontorouting,Routingdefaultindexing/Defaultindexingdefaultsearching/Defaultsearchingparameters/Theroutingparametersfields/Routingfields

RPMpackageused,forinstallingElasticsearch/InstallingElasticsearchusingRPMpackages

S

sample

distance-basedsorting/Distance-basedsortingboundingboxfiltering/Boundingboxfilteringdistance,limiting/Limitingthedistance

samplequeriesabout/Samplequeries

sampleraggregationabout/Sampleraggregation

scoreabout/IntroductiontoApacheLucenescoringinfluencing,withqueryboosts/Influencingscoreswithqueryboostsmodifying/Modifyingthescore

score,modifyingabout/Modifyingthescoreconstant_scorequery/Constantscorequeryboostingquery/Boostingqueryfunctionscorequery/Thefunctionscorequery

score_modeparameterabout/Structureofthefunctionquerymultiplevalue/Structureofthefunctionquerysumvalue/Structureofthefunctionqueryavgvalue/Structureofthefunctionqueryfirstvalue/Structureofthefunctionquerymaxvalue/Structureofthefunctionqueryminvalue/Structureofthefunctionquery

scriptfieldsselecting/Usingthescriptfieldsparameters,passingto/Passingparameterstothescriptfields

scriptingcapabilitiesabout/ScriptingcapabilitiesofElasticsearchscriptexecution,availableobjects/Objectsavailableduringscriptexecutionscript,types/Scripttypesquerying,scriptsused/Queryingwithscriptsparameters,using/Scriptingwithparameterslanguages,Groovy/Scriptlanguagesotherthanembeddedlanguages,using/Usingotherthanembeddedlanguagesnativecode,using/Usingnativecode

scriptpropertiesscript/Queryingwithscriptsinline/Queryingwithscriptsid/Queryingwithscriptsfile/Queryingwithscripts

lang/Queryingwithscriptsparams/Queryingwithscripts

scripts,scripted_metricaggregationinit_script/Scriptedmetricsaggregationmap_script/Scriptedmetricsaggregationcombine_script/Scriptedmetricsaggregationreduce_script/Scriptedmetricsaggregation

scripttypesabout/Scripttypesinlinescripts/Scripttypes,Inlinescriptsinfilescripts/Infilescriptsindexedscripts/Indexedscripts

ScrollAPIabout/TheScrollAPIproblemdefinition/Problemdefinitionproblemdefinition,solution/Scrollingtotherescue

searching/Defaultsearchingsearchingrequestexecution/Indexingandsearchingsegmentmerging

about/Introductiontosegmentmerging,Segmentmergingneedfor/Theneedforsegmentmergingmergepolicy/Themergepolicymergepolicy,basicproperties/Themergepolicymergescheduler/Themergeschedulerthrottling/Throttling

shardallocationIPaddress,usingfor/UsingtheIPaddressforshardallocationcancelling/Cancelingshardallocationforcing/ForcingshardallocationmultiplecommandsperHTTPrequest/MultiplecommandsperHTTPrequestoperations,allowingonprimaryshards/Allowingoperationsonprimaryshards

shardandreplicaallocationcontrolling/Controllingtheshardandreplicaallocationcontrolling,explicitly/Explicitlycontrollingallocationnodeparameters,specifying/Specifyingnodeparametersconfiguration/Configurationindex,creating/Indexcreationnodes,excluding/Excludingnodesfromallocationnodeattributes,requiring/Requiringnodeattributesnumberofshardsandreplicaspernode/Thenumberofshardsandreplicaspernodeallocationthrottling/Allocationthrottlingclusterwideallocation/Cluster-wideallocationshardsandreplicas,movingmanually/Manuallymovingshardsandreplicas

rollingrestarts,handling/Handlingrollingrestartsshardrequestcache

about/Shardrequestcacheenabling/Enablingandconfiguringtheshardrequestcacheconfiguring/Enablingandconfiguringtheshardrequestcacheperrequestshardrequestcache,disabling/Perrequestshardrequestcachedisablingusagemonitoring/Shardrequestcacheusagemonitoring

shards/Index,Shardsmoving/Movingshards

shards,Elasticsearchindexingabout/Shardsandreplicaswriteconsistency,controlling/Writeconsistency

siblingaggregations/Availabletypessignificant_termsaggregation

about/Significanttermsaggregationsignificantterms,selecting/Choosingsignificanttermsmultiplevalue,analyzing/Multiplevalueanalysis

similaritymodelsabout/Differentsimilaritymodelsper-fieldsimilarity,setting/Settingper-fieldsimilarityOkapiBM25model/Availablesimilaritymodelsrandomnessmodel,divergence/Availablesimilaritymodelsinformation-basedmodel/Availablesimilaritymodelsdefaultsimilarity,configuring/ConfiguringdefaultsimilarityBM25similarity,configuring/ConfiguringBM25similarityDFRsimilarity,configuring/ConfiguringDFRsimilarityIBsimilarity,configuring/ConfiguringIBsimilarity

simplequerystringqueryabout/ThesimplequerystringqueryURL/Thesimplequerystringquery

singleElasticsearchnodetuning/PreparingasingleElasticsearchnodegeneralpreparations/Thegeneralpreparationsfielddatacache/Fielddatacacheandbreakingthecircuitcircuit,breaking/Fielddatacacheandbreakingthecircuitdocvalues,using/UsedocvaluesRAMbuffer,usedforindexing/RAMbufferforindexingindexrefreshrate/Indexrefreshratethreadpools/Threadpools

snapshotscreating/Creatingsnapshotsadditionalparameters/Additionalparameters

snowballanalyzer

URL/Out-of-the-boxanalyzersSoftwareasaService(SaaS)/SPMforElasticsearchsourcefiltering/Sourcefilteringspan/Aspanspanfirstquery/Spanfirstqueryspannearquery/Spannearqueryspannotquery/Spannotqueryspanorquery/Spanorqueryspanqueries

using/Usingspanqueriesspan/Aspanspan_termquery/Spantermqueryspanfirstquery/Spanfirstqueryspannearquery/Spannearqueryspanorquery/Spanorqueryspannotquery/Spannotqueryspan_withinquery/Spanwithinqueryspan_containingquery/Spancontainingqueryspan_multiquery/Spanmultiqueryperformanceconsiderations/Performanceconsiderations

span_containingquery/Spancontainingqueryspan_multiquery/Spanmultiqueryspan_termquery/Spantermqueryspan_withinquery/Spanwithinqueryspatialcapabilities

about/Elasticsearchspatialcapabilitiesmappingspreparation/Mappingpreparationforspatialsearchesexampledata/Exampledatageo_fieldproperties/Additionalgeo_fieldproperties

SPMtoolURL/SPMforElasticsearch

standardanalyzerURL/Out-of-the-boxanalyzers

stateandhealth,clustermonitoring/Monitoringyourcluster’sstateandhealthclusterhealthAPI/ClusterhealthAPIindicesstatsAPI/IndicesstatsAPInodesinfoAPI/NodesinfoAPInodesstatsAPI/NodesstatsAPIclusterstateAPI/ClusterstateAPIclusterstatsAPI/ClusterstatsAPIpendingtasksAPI/PendingtasksAPIindicesrecoveryAPI/IndicesrecoveryAPIindicesshardstoresAPI/IndicesshardstoresAPI

indicessegmentsAPI/IndicessegmentsAPIstaticproperties,forindexingbuffersizeconfiguration

indices.memory.index_buffer_size/Indexingbuffersindices.memory.min_index_buffer_size/Indexingbuffersindices.memory.max_index_buffer_size/Indexingbuffersindices.memory.min_shard_index_buffer_size/Indexingbuffers

statuscodedefinitionURL/Indexingthedata

stemmingURL/Out-of-the-boxanalyzers

stopanalyzerURL/Out-of-the-boxanalyzers

stopwordsURL/Thecommontermsquery

string,indexstructuremappingterm_vector/Stringanalyzer/Stringsearch_analyzer/Stringnorms.enabled/Stringnorms.loading/Stringposition_offset_gap/Stringindex_options/Stringignore_above/String

suggestersusing/UsingsuggestersURL/Usingsuggesters,Additionaltermsuggesteroptionstypes/Availablesuggestertypessuggestions,including/Includingsuggestionsresponse/Suggesterresponsetextproperty/Suggesterresponsescoreproperty/Suggesterresponsefreqproperty/Suggesterresponse

synonymrulesdefining/DefiningsynonymrulesApacheSolrsynonyms,using/UsingApacheSolrsynonymsWordNetsynonyms,using/UsingWordNetsynonyms

synonymsabout/Wordswiththesamemeaningfiltering/Synonymfilterinmappings/Synonymsinthemappingsstoring,infilesystem/Synonymsstoredonthefilesystemrules,defining/Definingsynonymrulesindex-timesynonymsexpansion/Queryorindex-timesynonymexpansionquery-timesynonymexpansion/Queryorindex-timesynonymexpansion

synonymsfilterusing/Synonymfilter

system-specificinstallationandconfigurationabout/Thesystem-specificinstallationandconfigurationElasticsearch,installingonLinux/InstallingElasticsearchonLinuxElasticsearch,configuringassystemserviceonLinux/ConfiguringElasticsearchasasystemserviceonLinuxElasticsearch,usingassystemserviceonWindows/ElasticsearchasasystemserviceonWindows

T

T-Digestalgorithm

URL/Percentilestemplates

about/Templatesexample/Anexampleofatemplate

termquery/Thetermquerytermsaggregation

about/Termsaggregationapproximatecounts/Countsareapproximateminimumdocumentcount/Minimumdocumentcount

termsquery/Thetermsquerytermsuggester

about/Termsuggesterconfigurationoptions/Termsuggesterconfigurationoptionsoptions/Additionaltermsuggesteroptions

threadpoolsabout/Threadpoolsgeneric/Threadpoolsindex/Threadpoolssearch/Threadpoolssuggest/Threadpoolsget/Threadpoolsbulk/Threadpoolspercolate/Threadpools

throttling,adjustingtypesetting/Throttlingvalue/Throttlingnonevalue/Throttlingmergevalue/Throttlingallvalue/Throttling

timezonesURL/Timezones

tree-likestructuresindexing/Indexingtree-likestructuresdatastructure/Datastructureanalysis/Analysis

typedeterminingmechanismabout/Typedeterminingmechanismdisabling/Disablingthetypedeterminingmechanismtuning,fornumerictypes/Tuningthetypedeterminingmechanismfornumerictypestuning,fordates/Tuningthetypedeterminingmechanismfordates

typeproperty,valuesplain/Forcinghighlightertypefvh/Forcinghighlightertypepostings/Forcinghighlightertype

typequery/Thetypequerytypes,suggesters

term/Availablesuggestertypes,Termsuggesterphrase/Availablesuggestertypes,Phrasesuggestercompletion/Availablesuggestertypes,Completionsuggestercontext/Availablesuggestertypes,Contextsuggester

U

Unicast

URL/DiscoverytypesupdateAPI

used,formodifyingindexstructure/ModifyingyourindexstructurewiththeupdateAPI

UpdateAPIURL/Addingpartialdocuments

updatesettingsAPIabout/TheupdatesettingsAPIclustersettingsAPI/TheclustersettingsAPIindicessettingsAPI/TheindicessettingsAPI

URIquerystringparametersabout/URIquerystringparametersquery/Thequerydefaultsearchfield/Thedefaultsearchfieldanalyzerproperty/Analyzerdefaultoperator/Thedefaultoperatorpropertyexplainparameter/Queryexplanationfieldsreturned/Thefieldsreturnedresults,sorting/Sortingtheresultssearchtimeout/Thesearchtimeoutresultswindow/Theresultswindowpershardresults,limiting/Limitingper-shardresultsunavailableindices,ignoring/Ignoringunavailableindicessearchtype/Thesearchtypelowercasingtermsexpansion/Lowercasingtermexpansionwildcardqueriesanalysis/Wildcardandprefixanalysisanalyze_wildcardproperty/Wildcardandprefixanalysisprefixqueriesanalysis/Wildcardandprefixanalysis

URIrequestqueryused,forsearching/SearchingwiththeURIrequestquerysampledata/SampledataURIsearch/URIsearchanalyzing/Queryanalysisparameters/URIquerystringparametersLucenequerysyntax/Lucenequerysyntax

URIsearchabout/URIsearchElasticsearchqueryresponse/ElasticsearchqueryresponseURL/Wildcardandprefixanalysis

V

ValidateAPI

using/UsingtheValidateAPIvalues,has_childqueryparameter

none/Queryingdatainthechilddocumentsmin/Queryingdatainthechilddocumentsmax/Queryingdatainthechilddocumentssum/Queryingdatainthechilddocumentsavg/Queryingdatainthechilddocuments

values,inrangesearching/Searchingforvaluesinarangematcheddocuments,boosting/Boostingsomeofthematcheddocumentslowerscoringpartialqueries,ignoring/IgnoringlowerscoringpartialqueriesLucenequerysyntax,usinginqueries/UsingLucenequerysyntaxinqueriesuserquerieswithouterrors,handling/Handlinguserquerieswithouterrorsprefixes,usedforprovidingautocompletefunctionality/Autocompleteusingprefixessimilarterms,finding/Findingtermssimilartoagivenonespans/Spans,spanseverywhere

values,score_modepropertyavg/Scoringandnestedqueriessum/Scoringandnestedqueriesmin/Scoringandnestedqueriesmax/Scoringandnestedqueriesnone/Scoringandnestedqueries

versioningabout/Versioningusageexample/Usageexamplefromexternalsystem/Versioningfromexternalsystems

verticalscaling/PreparingasingleElasticsearchnode

W

warmingquery

about/Warmingupdefining/Defininganewwarmingquerydefinedwarmingqueries,retrieving/Retrievingthedefinedwarmingqueriesdeleting/Deletingawarmingquerywarmingupfunctionality,disabling/Disablingthewarmingupfunctionality

wildcardquery/ThewildcardqueryWindows

Elasticsearch,configuringassystemservice/ElasticsearchasasystemserviceonWindows

WordNetURL/UsingWordNetsynonyms

Z

Zendiscovery

about/Zendiscoverymasterelectionconfiguration/Masterelectionconfigurationunicast,configuring/Configuringunicastfaultdetectionpingsettings/Faultdetectionpingsettingsclusterstateupdatescontrol/Clusterstateupdatescontrolmasterunavailability,dealingwith/Dealingwithmasterunavailability
