Alfresco Search Internals

Andy Hind, Senior Developer, Alfresco (twitter: @andy_hind)

DESCRIPTION

This session will first explain the index-related options that are available when developing a data model and how these choices affect indexing and searching. We will cover Alfresco FTS in detail, and compare SQL-92 with the CMIS QL. We will also consider sorting and other ways to control the results returned, and how query performance may be affected by ACL evaluation.

Page 1: Alfresco Search Internals

Alfresco Search Internals

Andy Hind, Senior Developer, Alfresco

twitter: @andy_hind

Page 2: Alfresco Search Internals

Agenda

• Overview
• Direction
• Challenges
• Alfresco FTS
• CMIS Query Language

Page 3: Alfresco Search Internals

Overview

Data Modelling Options

<property name="cmis:name">
  ...
  <type>d:text</type>
  ...
  <index enabled="true">
    <tokenised>both</tokenised>
    <atomic>true</atomic>
    <stored>false</stored>
  </index>
  ....
</property>

Page 4: Alfresco Search Internals

Overview

Data Modelling Options

<type>d:text</type>: Type drives analysis

Page 5: Alfresco Search Internals

Overview

Data Modelling Options

index enabled: true | false

Page 6: Alfresco Search Internals

Overview

Data Modelling Options

tokenised: true | false | both
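As a rough illustration of the three tokenised values, the sketch below builds the terms a toy index would hold for one property value. It is not Alfresco's indexing code: `index_terms` is a hypothetical name, and a lowercase whitespace split stands in for the real analyzer, which depends on data type and locale.

```python
def index_terms(value: str, tokenised: str) -> dict:
    """Return the terms a toy index would hold for one property value.

    Hypothetical illustration only: a lowercase/whitespace split stands in
    for the real, type- and locale-dependent analyzer.
    """
    if tokenised not in ("true", "false", "both"):
        raise ValueError(f"unknown tokenised option: {tokenised}")
    terms = {}
    if tokenised in ("true", "both"):
        # Analysed form: lets a query match individual words.
        terms["tokenised"] = value.lower().split()
    if tokenised in ("false", "both"):
        # Whole value kept as a single term: lets a query match it exactly.
        terms["untokenised"] = [value]
    return terms


print(index_terms("Big Report.doc", "both"))
```

With "both", the value is held in both forms, which is the trade-off the option name suggests: more index entries in exchange for supporting word-level and whole-value matching on the same property.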

Page 7: Alfresco Search Internals

Overview

Data Modelling Options

atomic: true | false (d:content)

Page 8: Alfresco Search Internals

Overview

Data Modelling Options

stored: true | false

Page 9: Alfresco Search Internals

Overview

Configuration

• IndexerAndSearcher interface and related factory
• Redirection by store protocol or value
• Factories
  • AVM
  • DM
  • Unindexed
  • All lucene based with options set via properties
• Analysis by data type and locale
  • alfresco/model/dataTypeAnalyzers_{locale}.properties

Page 10: Alfresco Search Internals

Overview

Configuration

<bean id="indexerAndSearcherFactory" class="org.alfresco.repo.service.StoreRedirectorProxyFactory">
  <property name="proxyInterface">
    <value>org.alfresco.repo.search.impl.lucene.LuceneIndexerAndSearcher</value>
  </property>
  <property name="defaultBinding">
    <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
  </property>
  <property name="redirectedProtocolBindings">
    <map>
      <entry key="workspace">
        <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
      </entry>
      <entry key="avm">
        <ref bean="avmLuceneIndexerAndSearcherFactory"></ref>
      </entry>
    </map>
  </property>
  <property name="redirectedStoreBindings">
    <map>
      <entry key="workspace://lightWeightVersionStore">
        <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
      </entry>
      <entry key="workspace://version2Store">
        <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
      </entry>
    </map>
  </property>
</bean>
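The lookup implied by this bean can be sketched as follows. The resolution order used here (exact store first, then protocol, then the default binding) is an assumption based on the property names, not something the slide states. `resolve_factory` is a hypothetical helper, and the store references `workspace://SpacesStore`, `avm://site` and `archive://SpacesStore` are example values only.

```python
# Bindings copied from the bean definition; values are the referenced bean ids.
STORE_BINDINGS = {
    "workspace://lightWeightVersionStore": "admLuceneUnIndexedIndexerAndSearcherFactory",
    "workspace://version2Store": "admLuceneUnIndexedIndexerAndSearcherFactory",
}
PROTOCOL_BINDINGS = {
    "workspace": "admLuceneIndexerAndSearcherFactory",
    "avm": "avmLuceneIndexerAndSearcherFactory",
}
DEFAULT_BINDING = "admLuceneIndexerAndSearcherFactory"


def resolve_factory(store_ref: str) -> str:
    """Pick a factory bean id for a store reference such as 'workspace://SpacesStore'.

    Assumed order: exact store match, then protocol match, then the default.
    """
    if store_ref in STORE_BINDINGS:
        return STORE_BINDINGS[store_ref]
    protocol = store_ref.split("://", 1)[0]
    return PROTOCOL_BINDINGS.get(protocol, DEFAULT_BINDING)


for ref in ("workspace://version2Store", "workspace://SpacesStore",
            "avm://site", "archive://SpacesStore"):
    print(ref, "->", resolve_factory(ref))
```

Under that assumption, the two version stores are routed to the unindexed factory even though their protocol, workspace, is otherwise bound to the indexed one.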

Page 11: Alfresco Search Internals

Overview

Configuration properties

lucene.maxAtomicTransformationTime=20

lucene.query.maxClauses=10000

lucene.indexer.cacheEnabled=true

lucene.indexer.maxDocIdCacheSize=10000

lucene.indexer.maxDocumentCacheSize=100

lucene.indexer.maxParentCacheSize=10000

lucene.indexer.maxIsCategoryCacheSize=-1

lucene.indexer.maxLinkAspectCacheSize=10000

lucene.indexer.maxPathCacheSize=10000

lucene.indexer.maxTypeCacheSize=10000

Page 12: Alfresco Search Internals

Overview

Configuration properties

lucene.indexer.mergerTargetIndexCount=5

lucene.indexer.mergerTargetOverlayCount=5

lucene.indexer.mergerTargetOverlaysBlockingFactor=1

lucene.indexer.mergerMergeBlockingFactor=1

lucene.indexer.maxDocsForInMemoryMerge=10000

lucene.indexer.maxRamInMbForInMemoryMerge=16

lucene.indexer.postSortDateTime=true

lucene.indexer.defaultMLIndexAnalysisMode=EXACT_LANGUAGE_AND_ALL

lucene.indexer.defaultMLSearchAnalysisMode=EXACT_LANGUAGE_AND_ALL

lucene.indexer.maxFieldLength=10000

Page 13: Alfresco Search Internals

Overview

Authorization

• Post query filter for READ
• Configuration
  • system.acl.maxPermissionCheckTimeMillis=10000
  • system.acl.maxPermissionChecks=1000
• Also set at query time
• Read performance
  • 1.4 old model
  • 2.2/3.0 new
  • 3.4 improved read
  • system/admin/others
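A post-query READ filter bounded by the two limits above might look roughly like this. It is a minimal sketch, not Alfresco's implementation: `filter_readable` and `can_read` are hypothetical names, the defaults simply mirror the two properties, and stopping with a partial result when a limit is reached is an assumption.

```python
import time


def filter_readable(results, can_read, max_checks=1000, max_time_ms=10000):
    """Filter query results by a READ check, bounded by count and elapsed time.

    Returns (readable, limited). limited is True when a limit stopped the
    pass before every result had been checked.
    """
    readable = []
    deadline = time.monotonic() + max_time_ms / 1000.0
    for checks_done, node in enumerate(results):
        # Stop once either budget is used up; later results are never checked.
        if checks_done >= max_checks or time.monotonic() > deadline:
            return readable, True
        if can_read(node):
            readable.append(node)
    return readable, False


# Toy data: the user may read even-numbered nodes only.
hits = list(range(10))
print(filter_readable(hits, lambda n: n % 2 == 0, max_checks=4))
```

The sketch makes one consequence visible: because filtering happens after the query, a limit can cut the result set short even when further readable results exist, and the cost grows with the number of hits the user cannot read.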

Page 14: Alfresco Search Internals

Overview

Query Support

• Lucene based
  • Lucene with Alfresco extensions (PATH, ...)
  • Alfresco FTS
  • CMIS QL + extensions
• DB based
  • XPath
  • Specific APIs – (using the child association table)
    • NodeService – selecting children
    • PersonService – lookup people

Page 15: Alfresco Search Internals

Overview

Issues

• Factory abstraction
• Transaction vs Snapshot
• Query language abstraction
• Repo reliance on the lucene index
• Cross locale support
• Rebuild
• Cluster (loss of consistency)
• Lucene limitations
  • Delete/add and reindexing
• DB schema for properties
• Read permission evaluation
• One big store
• Analyser configuration
• Associations
• Richer data model control - analysis

Page 16: Alfresco Search Internals

Direction

Query Language Abstraction

• Alfresco FTS
• CMIS QL

Parser (Alfresco FTS, CMIS QL, ...) -> Abstract Query Representation -> Query Engine (Lucene)

Page 17: Alfresco Search Internals

Direction

Query Language Abstraction

Parser (Alfresco FTS, CMIS QL, ...) -> Abstract Query Representation -> Query Engine (Lucene, SOLR, DB/SQL)
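The idea of one abstract query representation feeding several engines can be sketched with a tiny query tree and two renderers. Everything here is illustrative: the `Term`, `And` and `Or` classes and both rendering functions are hypothetical, the output strings are only Lucene-like and SQL-like, and no escaping of values is attempted.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class Term:
    field: str
    value: str


@dataclass(frozen=True)
class And:
    parts: Tuple["Node", ...]


@dataclass(frozen=True)
class Or:
    parts: Tuple["Node", ...]


Node = Union[Term, And, Or]


def to_lucene_like(node: Node) -> str:
    """Render the tree as a Lucene-style query string (no escaping)."""
    if isinstance(node, Term):
        return f'{node.field}:"{node.value}"'
    joiner = " AND " if isinstance(node, And) else " OR "
    return "(" + joiner.join(to_lucene_like(p) for p in node.parts) + ")"


def to_sql_like(node: Node) -> str:
    """Render the same tree as a SQL-style predicate (no escaping)."""
    if isinstance(node, Term):
        return f"{node.field} = '{node.value}'"
    joiner = " AND " if isinstance(node, And) else " OR "
    return "(" + joiner.join(to_sql_like(p) for p in node.parts) + ")"


query = And((Term("cm:name", "woof"), Or((Term("cm:title", "a"), Term("TEXT", "b")))))
print(to_lucene_like(query))
print(to_sql_like(query))
```

Each parser would only need to produce the tree, and each engine would only need a renderer for it, which is the decoupling the diagram describes.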

Page 18: Alfresco Search Internals

Direction

SOLR

• Data model integration
• Tracking – eventual consistency
  • Not suitable for RM (Records Management)
• Query time ACL filtering
• PATH support
• SOLR scalability and elasticity
• faceting etc

Page 19: Alfresco Search Internals

Alfresco FTS

Introduction

• CMIS QL FTS (almost)
• Google
• Lucene
• Developer/App Customisation
  • Define the default namespace (e.g. allow the user to drop cm:)
  • Disable/enable/modify certain language features
  • Define templates
  • Define the default field, simple templates for users
  • Share defines the “keywords” template as the default field
    • %(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)

Page 20: Alfresco Search Internals

Alfresco FTS

Syntax

• Term (exact/tokenised)
• Phrase
• Conjunction/Disjunction/Negation/Boosting
• Fields
• Wildcards
• Ranges
• Fuzzy matching
• Proximity
• Templates
• See http://wiki.alfresco.com/wiki/Full_Text_Search_Query_Syntax

Page 21: Alfresco Search Internals

Alfresco FTS

Template example

The keywords template:

%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)

The query

=keywords:woof

expands to:

=cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof

Page 22: Alfresco Search Internals

Alfresco FTS

Template example – relevance tuning

The keywords template, with a boost on TEXT:

%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT^2)

The query

=keywords:woof

expands to:

=cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof^2
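The expansion shown on the two template slides can be reproduced with a few lines of string handling. This is only a sketch of that one behaviour, not the Alfresco FTS template engine: `expand_template` is a hypothetical helper, it assumes a template of the form %(field field ...), and the real template language supports more than a field list.

```python
import re


def expand_template(template: str, term: str, prefix: str = "=") -> str:
    """Expand one search term across every field named in a %(...) template."""
    match = re.fullmatch(r"%\((.*)\)", template.strip())
    if match is None:
        raise ValueError("expected a template of the form %(field field ...)")
    clauses = []
    for entry in match.group(1).split():
        # An optional ^boost suffix on a field is carried over to its clause.
        field, _, boost = entry.partition("^")
        clause = f"{prefix}{field}:{term}"
        if boost:
            clause += f"^{boost}"
        clauses.append(clause)
    return " ".join(clauses)


keywords = ("%(cm:name cm:title cm:description ia:whatEvent "
            "ia:descriptionEvent lnk:title lnk:description TEXT^2)")
print(expand_template(keywords, "woof"))
```

Changing a boost in the template changes the relevance weighting of every query that goes through the keywords field, without any change to what users type.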

Page 23: Alfresco Search Internals

CMIS QL

Introduction

• Use via CMIS or the SearchService
• Read-only relational view of the repository
• Subset of SQL-92 with extensions
  • Type inheritance
  • Multi-valued properties
  • Full text search
    • CONTAINS()
    • SCORE()
  • Location
    • IN_FOLDER()
    • IN_TREE()

Page 24: Alfresco Search Internals

CMIS QL

Alfresco extensions

• JOIN to aspects only
  • SELECT D.*, O.* FROM cmis:document AS D JOIN cm:ownable AS O ON D.cmis:objectId = O.cmis:objectId
  • no JOIN between types/nodes yet
• use Alfresco FTS instead of SQL QL FTS
  • SELECT * FROM cmis:document WHERE CONTAINS('cmis:name:\'test*\'')
• relax some constraints
  • SCORE() can be used on its own
  • multi-valued properties (mvps) can use single-valued property (svp) syntax for IN, LIKE and comparisons
• Queries more robust if the data model changes
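Embedding an Alfresco FTS expression in CONTAINS() means quoting it as a CMIS QL string literal, as the \' in the example query shows. The helper below is a minimal sketch of that step. `cmis_contains_query` is a hypothetical name, escaping backslashes as well as single quotes is an assumption about the literal syntax, and the type name is inserted as given without validation.

```python
def cmis_contains_query(type_name: str, fts: str) -> str:
    """Build a CMIS QL query whose CONTAINS() argument is an FTS expression."""
    # Escape backslashes first, then single quotes, so that the FTS text
    # survives being wrapped in a single-quoted string literal.
    escaped = fts.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT * FROM {type_name} WHERE CONTAINS('{escaped}')"


print(cmis_contains_query("cmis:document", "cmis:name:'test*'"))
```

Building the literal in one place keeps the two layers of syntax apart: the FTS expression is written as a user would type it, and the quoting for CMIS QL is applied once on the way in.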

Page 25: Alfresco Search Internals

Learn More

wiki.alfresco.com
forums.alfresco.com
twitter: @AlfrescoECM