evolve-aem-community-summit
DEMYSTIFYING OAK SEARCH
PRESENTED BY
Justin Edelson & Darin Kuntze | Adobe
• Oak Query Implementation
• Cost Calculation
• Oak Index Implementations
AGENDA
CAVEAT
Covers Oak 1.0.5 (AEM 6.0 SP1)
• Search is the most significant change for AEM developers between CRX2 and
Oak.
WHY SHOULD YOU CARE?
CRX2 Search – Limited Optimization Opportunities
Baseline Search Performance – OK
No “Plan” Output
Single Index
Minimal Configuration
WHY SHOULD YOU CARE?
Oak Search – Many Optimization Opportunities
Baseline Performance – Slow
Viewable Plan
Different Index Types
WHY SHOULD YOU CARE?
OAK QUERY IMPLEMENTATION OVERVIEW
EXAMPLE
/jcr:root/content/geometrixx/en/products//element(*,
nt:unstructured)[@sling:resourceType = 'geometrixx/components/title' and
@jcr:title = 'Triangle']
Oak supports an “explain” query prefix, similar to what many RDBMSs support.
explain /jcr:root/content/geometrixx/en/products//element(*,
nt:unstructured)[@sling:resourceType = 'geometrixx/components/title']
Shows you which index was used.
queryResult.getRows().nextRow().getValue("plan")
SEEING THE PLAN
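A minimal Java sketch of reading the plan, under stated assumptions. `QueryRunner` is a hypothetical interface (not a JCR or Oak type) that stands in for the JCR calls on the slide, so the example can run without a repository. In AEM, an implementation would execute the statement through the session's query manager and read the "plan" column of the first row, as in the expression above.

```java
class ExplainSketch {
    /** Hypothetical stand-in for the JCR query calls; not a real JCR or Oak type. */
    interface QueryRunner {
        String firstRowValue(String statement, String column);
    }

    /** Prefixes the statement with "explain" and reads the "plan" column of the first row. */
    static String plan(QueryRunner runner, String statement) {
        return runner.firstRowValue("explain " + statement, "plan");
    }

    public static void main(String[] args) {
        // Fake runner that only echoes what it was asked to execute.
        QueryRunner fake = (statement, column) -> column + " for: " + statement;
        System.out.println(plan(fake, "//element(*, nt:unstructured)[@jcr:title = 'Triangle']"));
    }
}
```

Keeping the "explain" prefix in one helper makes it easy to log the plan next to the query it belongs to.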
SEEING THE PLAN – EXPLAIN QUERY TOOL
Plan
Explanation
Stored in the repository as nodes under /oak:index
Node Type is oak:QueryIndexDefinition
Single mandatory property – “type”
Optional generic properties:
async – set to “async” to do index updates asynchronously
reindex – set to true to trigger a reindex
declaringNodeTypes – one or more node types to restrict indexing
entryCount – used to weight indexes
INDEX DEFINITIONS
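As an illustration only, the properties listed above can be modelled as a plain Java map. Real definitions are repository nodes under /oak:index; the class and method names here are hypothetical, and the check covers only what the slide states (the node type and the mandatory "type" property).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class IndexDefinitionSketch {
    /** Builds the properties of an index definition as a plain map (illustration only). */
    static Map<String, Object> definition(String type, boolean async) {
        Map<String, Object> props = new LinkedHashMap<>();
        props.put("jcr:primaryType", "oak:QueryIndexDefinition");
        props.put("type", type);          // the single mandatory property
        if (async) {
            props.put("async", "async");  // request asynchronous index updates
        }
        props.put("reindex", true);       // trigger a reindex
        return props;
    }

    /** True if the map carries the node type and a non-empty "type" property. */
    static boolean isValid(Map<String, Object> props) {
        Object type = props.get("type");
        return "oak:QueryIndexDefinition".equals(props.get("jcr:primaryType"))
                && type instanceof String
                && !((String) type).isEmpty();
    }
}
```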
Sync indexes (the default) update in the context of a save() call
Async indexes do not.
Every 5 seconds, the diff between the last successful indexed state and the
HEAD state is read and used to update the index
CONSEQUENCE - async indexes may not return up-to-date results
The OOTB ordered and Lucene indexes are defined as async.
All external indexes (e.g. Solr) should also be async.
SYNC VS. ASYNC INDEX
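The consequence can be seen in a small simulation. This is a toy model, not Oak code: the class, its fields, and the explicit `runAsyncCycle()` call (standing in for the periodic background run) are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class AsyncIndexSketch {
    private final List<String> commits = new ArrayList<>(); // saved paths in commit order; the end is HEAD
    private final Set<String> syncIndex = new HashSet<>();
    private final Set<String> asyncIndex = new HashSet<>();
    private int lastIndexed = 0;                             // position of the last successfully indexed state

    /** A save() updates the sync index immediately; the async index is left alone. */
    void save(String path) {
        commits.add(path);
        syncIndex.add(path);
    }

    /** One async cycle: apply only the diff between the last indexed state and HEAD. */
    void runAsyncCycle() {
        for (int i = lastIndexed; i < commits.size(); i++) {
            asyncIndex.add(commits.get(i));
        }
        lastIndexed = commits.size();
    }

    boolean foundBySyncIndex(String path) { return syncIndex.contains(path); }

    boolean foundByAsyncIndex(String path) { return asyncIndex.contains(path); }
}
```

Between a save and the next cycle, a query served by the async index would miss the new node, while one served by a sync index would find it.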
VIEWING CURRENT INDEXES
Many indexes store their content in the repository, but hidden.
Cannot be viewed using CRXDE Lite.
Must use oak-run
TarMK – use either “explore” (GUI) or “console” (CLI) command
MongoMK – use “console” command
• Vote for OAK-2096 to get “explore” support working for MongoMK
VIEWING INDEX CONTENT
Created as content via CRXDE Lite / deployed using content package
Created through code.
Created through configuration.
CREATING AN INDEX
When the configuration changes
For example, changing the declaringNodeTypes
But not the entryCount
(Sometimes) After updating Oak
Check the release notes; this should be prominently indicated.
But not arbitrarily…
Reindexing is a resource intensive process.
Reindexing will NOT help query performance.
WHEN SHOULD YOU REINDEX?
Each Index calculates a relative cost for the query
Number between 0 and Infinity
0 = “Pick me!”
Infinity = “Don’t Pick Me!”
COST CALCULATION
Enable DEBUG logging on org.apache.jackrabbit.oak.query.QueryImpl
Per Index Type Cost
Enable DEBUG logging on
org.apache.jackrabbit.oak.plugins.index.property.PropertyIndex
Detailed Property Cost
Enable DEBUG logging on
org.apache.jackrabbit.oak.plugins.index.property.OrderedPropertyIndex
Detailed Ordered Property Cost
Enable DEBUG logging on org.apache.jackrabbit.oak.plugins.index.lucene
Detailed Lucene Cost
DEBUGGING COST CALCULATION
Query = /jcr:root/content/geometrixx/en/products//element(*,
nt:unstructured)[@sling:resourceType = 'geometrixx/components/title' and @jcr:title
= 'Triangle']
cost for aggregate lucene is Infinity
cost for reference is Infinity
cost for ordered is Infinity
cost for nodeType is Infinity
property cost for sling:resourceType is 10003.0
property cost for jcr:title is Infinity
Cheapest property cost is 10003.0 for property sling:resourceType
cost for property is 10003.0
cost for traverse is 199996.0
SAMPLE DEBUG OUTPUT
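Reading the log above: the index reporting the lowest cost wins. A short Java sketch of that selection, using the per-index costs from this sample output; the class and method names are made up for illustration and this is not Oak's query engine code.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class CheapestIndexSketch {
    /** Returns the name of the index with the lowest cost, or null if all report Infinity. */
    static String cheapest(Map<String, Double> costs) {
        String best = null;
        double bestCost = Double.POSITIVE_INFINITY;
        for (Map.Entry<String, Double> entry : costs.entrySet()) {
            if (entry.getValue() < bestCost) {
                bestCost = entry.getValue();
                best = entry.getKey();
            }
        }
        return best;
    }

    /** Per-index costs copied from the sample debug output. */
    static Map<String, Double> sampleCosts() {
        Map<String, Double> costs = new LinkedHashMap<>();
        costs.put("aggregate lucene", Double.POSITIVE_INFINITY);
        costs.put("reference", Double.POSITIVE_INFINITY);
        costs.put("ordered", Double.POSITIVE_INFINITY);
        costs.put("nodeType", Double.POSITIVE_INFINITY);
        costs.put("property", 10003.0);
        costs.put("traverse", 199996.0);
        return costs;
    }

    public static void main(String[] args) {
        System.out.println(cheapest(sampleCosts())); // prints "property"
    }
}
```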
Query = /jcr:root/content/geometrixx/en/products//element(*,
nt:unstructured)[@sling:resourceType = 'geometrixx/components/title' and
@type='large']
cost for aggregate lucene is Infinity
cost for reference is Infinity
cost for ordered is Infinity
cost for nodeType is Infinity
property cost for sling:resourceType is 10003.0
property cost for type is 21.0
Cheapest property cost is 21.0 for property type
cost for property is 21.0
cost for traverse is 199996.0
SAMPLE DEBUG OUTPUT
You can create new indexes of these types
Property
Ordered Property
Solr
Lucene
These you shouldn’t
Reference
Node Type
And then there is a special one
Traversing
INDEX IMPLEMENTATIONS
Stores node paths indexed by a particular property value
Example: /oak:index/slingResourceType
Can be unique (unique = true)
Examples: rep:principalName & jcr:uuid
Only usable with sync indexes
PROPERTY INDEX
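Conceptually, this is a map from property value to the set of node paths carrying that value. The Java sketch below models only that idea plus the unique flag. It is not how Oak stores the index in the repository, and all names in it are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class PropertyIndexSketch {
    private final Map<String, Set<String>> pathsByValue = new HashMap<>();
    private final boolean unique;

    PropertyIndexSketch(boolean unique) {
        this.unique = unique;
    }

    /** Records that the node at 'path' has the indexed property set to 'value'. */
    void add(String value, String path) {
        Set<String> paths = pathsByValue.computeIfAbsent(value, v -> new TreeSet<>());
        if (unique && !paths.isEmpty() && !paths.contains(path)) {
            // A unique index allows at most one node per value.
            throw new IllegalStateException("Uniqueness violated for value " + value);
        }
        paths.add(path);
    }

    /** Returns the paths stored for 'value' (empty if there are none). */
    Set<String> lookup(String value) {
        return pathsByValue.getOrDefault(value, new TreeSet<>());
    }
}
```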
PROPERTY INDEX – IN OAK EXPLORER
A Match!
PROPERTY INDEX – IN OAK EXPLORER
PROPERTY INDEX - UNIQUE
Generalized Cost Calculation:
Cost per Execution + (Estimated Matches * Cost per Entry)
Cost per Execution – 2
Cost per Entry – 1
PROPERTY INDEX – COST CALCULATION
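The formula written out as a Java sketch. The constants come from this slide. The estimated-match counts in `main` (19 and 10001) are not stated anywhere in the deck: they are inferred by working backward from the property costs 21.0 and 10003.0 in the sample debug output.

```java
class PropertyCostSketch {
    static final double COST_PER_EXECUTION = 2.0;
    static final double COST_PER_ENTRY = 1.0;

    /** Cost per Execution + (Estimated Matches * Cost per Entry). */
    static double cost(double estimatedMatches) {
        return COST_PER_EXECUTION + estimatedMatches * COST_PER_ENTRY;
    }

    public static void main(String[] args) {
        System.out.println(cost(19));    // prints 21.0
        System.out.println(cost(10001)); // prints 10003.0
    }
}
```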
For name=value queries (e.g.
[@sling:resourceType='foundation/components/text']), including lists
If entry count provided, the estimated cost is entry count / key count + number
of values in the query
• Key count defaults to entry count / 10000, but can be manually specified
Otherwise, counts up to 100 matches across the first three values.
If > 100 matches, estimated matches are 1.1 ^ (the average depth of matches)
If > 3 values, estimated matches are extrapolated from the first three values.
For exists queries (e.g. [@sling:resourceType])
If entry count provided, it is the estimated count.
Otherwise, counts up to 100 matches across all values.
If > 100 matches, estimated matches are 1.1 ^ (the average depth of matches)
PROPERTY INDEX – ESTIMATING MATCHES
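A sketch of the entry-count branch for name=value queries, read literally from the slide as (entry count / key count) + number of values, with key count defaulting to entry count / 10000. This is one reading of the slide text, not Oak's source; the names are hypothetical and floating-point division is used for simplicity.

```java
class MatchEstimateSketch {
    /**
     * Estimated matches when entryCount is set on the index definition.
     * keyCount == null means "not specified", so the default of entryCount / 10000 applies.
     */
    static double estimate(long entryCount, Long keyCount, int valuesInQuery) {
        if (entryCount <= 0) {
            throw new IllegalArgumentException("entryCount must be positive");
        }
        double keys = (keyCount != null) ? keyCount : entryCount / 10000.0;
        if (keys <= 0) {
            throw new IllegalArgumentException("keyCount must be positive");
        }
        return entryCount / keys + valuesInQuery;
    }

    public static void main(String[] args) {
        // With the default key count, the estimate is 10000 plus the number of values.
        System.out.println(estimate(50000L, null, 1)); // prints 10001.0
    }
}
```

With the default key count and a single value, the estimate is 10001. Plugged into the cost formula from the previous slide (2 + matches * 1), that gives 10003.0, the same figure as the sling:resourceType cost in the sample debug output.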
Stores node paths indexed by a particular property value
Has extra :next property on each value node to handle ordering
Example: /oak:index/cqLastModified
WARNING – only supports lexicographic sorting
ORDERED INDEX
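What lexicographic ordering implies, shown with plain Java strings. This illustrates the general behaviour of string ordering, not the ordered index's internal storage, and the sample values are invented.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class LexicographicSketch {
    /** Sorts values as strings, character by character, the way a lexicographic order does. */
    static List<String> sortAsStrings(List<String> values) {
        List<String> copy = new ArrayList<>(values);
        Collections.sort(copy); // natural String order is lexicographic
        return copy;
    }

    public static void main(String[] args) {
        List<String> values = new ArrayList<>();
        values.add("9");
        values.add("10");
        values.add("100");
        System.out.println(sortAsStrings(values)); // prints [10, 100, 9]
    }
}
```

Numeric values stored as strings therefore come back in an order that differs from their numeric order unless they are padded to a fixed width.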
ORDERED INDEX – IN OAK EXPLORER
1 + (Estimated Matches * 1.3)
Similar to Property Index
Doesn’t support entryCount
ORDERED INDEX - COST CALCULATION
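For comparison, a sketch that puts this formula next to the property index formula from the earlier slide (2 + matches * 1). It compares the two formulas only. Whether both indexes actually offer a finite cost for a given query depends on the query, which this sketch does not model.

```java
class OrderedCostSketch {
    /** Ordered index: 1 + (Estimated Matches * 1.3). */
    static double orderedCost(double estimatedMatches) {
        return 1.0 + estimatedMatches * 1.3;
    }

    /** Property index, from the earlier slide: 2 + (Estimated Matches * 1). */
    static double propertyCost(double estimatedMatches) {
        return 2.0 + estimatedMatches * 1.0;
    }

    public static void main(String[] args) {
        for (int matches : new int[] {1, 3, 4, 19}) {
            System.out.println(matches + " matches: ordered=" + orderedCost(matches)
                    + " property=" + propertyCost(matches));
        }
    }
}
```

On the formulas alone, the ordered index is cheaper only for very small match counts (below roughly 3.3 matches). Above that, the higher per-entry factor outweighs the lower fixed cost.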
Flat list of UUIDs.
Each node points to a path.
Cost is always 1 if a match is available
REFERENCE INDEX
REFERENCE INDEX – IN OAK EXPLORER
Special type of Property Index
Note that not all node types are indexed by default
Has a default entryCount of a very high value
NODE TYPE INDEX
Oak Index Implementation:
LUCENE
What Oak Lucene is (and is not)
LUCENE
FLOW
jcr:contains query detected → Repo-based Lucene index queried → Results returned
//*[jcr:contains(., 'Experience Manager')]
Any query that includes a full text condition
Native queries
FULL TEXT QUERIES
oak:QueryIndexDefinition
type = lucene
async = async
includePropertyTypes[] = String, Binary
excludePropertyNames[] = …
reindex = true
LUCENE DEFINITION
What can’t you do?
LUCENE
Customize the tika configuration
Configurable analyzers (OAK-2177)
Synonyms
Boost Terms at index time (OAK-2178)
LUCENE
SOLR
Based on Lucene
Fault Tolerant
Rich Document Handlers
Geospatial Search
Load Balancing
AEM 6.0 Configurable:
Full Text Search
Indexing
Native Queries
There are 4 configurable components
Oak Solr embedded server
Oak Solr indexing / search
Oak Solr remote server
Oak Solr server provider
SOLR CONFIGURATION
oak:QueryIndexDefinition
type = solr
async = async
reindex = true
SOLR DEFINITION
//*[jcr:contains(., 'Experience Manager')]
SOLR FULL TEXT QUERIES
Solr enables restrictions based on:
• Path
• Property
• Primary Type
jcr:contains query detected → Remote Solr index queried → Results returned
FLOW
• In oak-solr-core 1.0.1+ (AEM 6 SP1) you can add property, path & primary type restrictions to your query
Types of Solr that Oak uses
SOLR TYPES
Embedded Solr
Primarily used for development work. The Solr instance runs within AEM and can be configured similarly to the remote instance.
Remote Solr
Used for non-development environments. Typically these instances take advantage of the fault-tolerant features of SolrCloud. In many cases, existing Solr instances are used.
SOLR CONCEPTUAL ARCHITECTURE
[Diagram: AEM 6 Node 1 and AEM 6 Node 2; Solr Cloud with Solr Shard 1, Solr Shard 2, and Zookeeper]
Main differences with the Lucene index
You create and control the solr config
Analyzers
Schema
• You must have a schema.xml that accurately reflects the properties and fields you want indexed (and queried), which is similar to how the property indexes are configured.
Currency
Language
Enabling additional Solr native functionality (example: mlt - more like this)
Some indexing overhead offloaded
All of this is configured on the Solr servers
LUCENE VS. SOLR
//*[rep:native('lucene', 'wine OR beer')]
NATIVE QUERIES
Callouts: native function (rep:native); query type (solr or lucene); query
select [jcr:path] from [nt:base]
where native('solr', 'mlt?q=Wine&mlt.fl=text&mlt.mindf=1&mlt.mintf=1')
JCR BASED SOLR QUERIES
• Oak index cost is factored
• Transparent to executor
• Familiar JCR query syntax
• Easy access to repository objects
SOLR TROUBLESHOOTING
[Diagram: AEM 6 (SolrJ)]
Oak 1.0.8
Lucene Property Indexes
Copy on Write for Lucene Indexes
ONE MORE THING…
XPath still works.
AND ONE MORE THING…
ACS AEM Commons & ACS AEM Tools - http://adobe-consulting-services.github.io/
AEM Docs - http://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/queries-and-indexing.html
Oak Docs - http://docs.adobe.com/docs/en/aem/6-0/deploy/upgrade/queries-and-indexing.html
QUERY RESOURCES
Training http://bit.ly/AEMTraining
Documentation http://bit.ly/AEM5Docs & http://bit.ly/AEM6Docs
GEMs Webinar Knowledge Exchange www.adobe.com/go/gems
Mobile Dev: Get started with Adobe PhoneGap
https://github.com/blefebvre/aem-phonegap-kitchen-sink
https://github.com/blefebvre/aem-phonegap-starter-kit
Community
Meet with your peers online and in person, get technical help from the community, access community articles
• AEM Technologist Community: http://adobe.ly/Qe5BBw
• Evolve for AEM Technologists: http://bit.ly/EvolveDev
• AEM Help Forum: http://adobe.ly/OYdtY0
PackageShare
Sign in to the Adobe Marketing Cloud to access packages http://bit.ly/AMCPKGSHARE
Marketing Cloud Exchange http://bit.ly/MCXChange
ADOBE EXPERIENCE MANAGER Developer Resources