This session will first explain the index-related options that are available when developing a data model and how these choices affect indexing and searching. We will cover Alfresco FTS in detail, and compare SQL-92 with CMIS QL. We will also consider sorting and other ways to control the results returned, and how query performance may be affected by ACL evaluation.
1
Alfresco Search Internals
Andy Hind
Senior Developer, Alfresco
twitter: @andy_hind
2
Agenda
• Overview
• Direction
• Challenges
• Alfresco FTS
• CMIS Query Language
3
Overview
Data Modelling Options
<property name="cmis:name">
  ...
  <type>d:text</type>
  ...
  <index enabled="true">
    <tokenised>both</tokenised>
    <atomic>true</atomic>
    <stored>false</stored>
  </index>
  ....
</property>
4-8
Overview
Data Modelling Options (values for each element of the property definition)
• <type>: type drives analysis
• <index enabled="...">: true | false
• <tokenised>: true | false | both
• <atomic>: true | false (d:content)
• <stored>: true | false
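As a rough illustration of the tokenised options, the following Python sketch shows which terms a toy indexer might keep for each setting. It is not Alfresco's analyser: the function name `index_terms` and the lower-casing word splitter are assumptions made for this example, and real analysis depends on the data type and locale.

```python
import re


def index_terms(value, tokenised="both"):
    """Return the terms a toy indexer would keep for one text value.

    Hypothetical helper: a simplified stand-in for type- and
    locale-driven analysis, not the real Alfresco behaviour.
    """
    if tokenised not in ("true", "false", "both"):
        raise ValueError(f"unsupported tokenised mode: {tokenised}")
    terms = []
    if tokenised in ("true", "both"):
        # Tokenised form: lower-cased words, good for full text matching.
        terms.extend(re.findall(r"\w+", value.lower()))
    if tokenised in ("false", "both"):
        # Untokenised form: the whole value as one term, good for exact
        # matching and sorting.
        terms.append(value)
    return terms


for mode in ("true", "false", "both"):
    print(mode, index_terms("Quarterly Report.pdf", mode))
```

With "both", the value can be found by individual words and by its exact form, at the cost of a larger index.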
9
Overview
Configuration
• IndexerAndSearcher interface and related factory
• Redirection by store protocol or value
• Factories
  • AVM
  • DM
  • Unindexed
  • All Lucene based with options set via properties
• Analysis by data type and locale
• alfresco/model/dataTypeAnalyzers_{locale}.properties
10
Overview
Configuration
<bean id="indexerAndSearcherFactory" class="org.alfresco.repo.service.StoreRedirectorProxyFactory">
  <property name="proxyInterface">
    <value>org.alfresco.repo.search.impl.lucene.LuceneIndexerAndSearcher</value>
  </property>
  <property name="defaultBinding">
    <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
  </property>
  <property name="redirectedProtocolBindings">
    <map>
      <entry key="workspace">
        <ref bean="admLuceneIndexerAndSearcherFactory"></ref>
      </entry>
      <entry key="avm">
        <ref bean="avmLuceneIndexerAndSearcherFactory"></ref>
      </entry>
    </map>
  </property>
  <property name="redirectedStoreBindings">
    <map>
      <entry key="workspace://lightWeightVersionStore">
        <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
      </entry>
      <entry key="workspace://version2Store">
        <ref bean="admLuceneUnIndexedIndexerAndSearcherFactory"></ref>
      </entry>
    </map>
  </property>
</bean>
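The redirection in this bean can be pictured with a small Python sketch. The lookup order used here (exact store first, then protocol, then the default binding) is an assumption for illustration, and `resolve_factory` is a hypothetical name; only the bean names and store keys come from the configuration above.

```python
# Bindings copied from the bean definition above.
STORE_BINDINGS = {
    "workspace://lightWeightVersionStore": "admLuceneUnIndexedIndexerAndSearcherFactory",
    "workspace://version2Store": "admLuceneUnIndexedIndexerAndSearcherFactory",
}
PROTOCOL_BINDINGS = {
    "workspace": "admLuceneIndexerAndSearcherFactory",
    "avm": "avmLuceneIndexerAndSearcherFactory",
}
DEFAULT_BINDING = "admLuceneIndexerAndSearcherFactory"


def resolve_factory(store_ref):
    """Pick a factory bean name for a store reference such as 'workspace://SpacesStore'.

    Assumed precedence: exact store binding, then protocol binding, then default.
    """
    if store_ref in STORE_BINDINGS:
        return STORE_BINDINGS[store_ref]
    protocol = store_ref.split("://", 1)[0]
    return PROTOCOL_BINDINGS.get(protocol, DEFAULT_BINDING)


print(resolve_factory("workspace://SpacesStore"))
print(resolve_factory("workspace://version2Store"))
```

Under this reading, the version stores use the unindexed factory even though their protocol is "workspace".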
11
Overview
Configuration properties
lucene.maxAtomicTransformationTime=20
lucene.query.maxClauses=10000
lucene.indexer.cacheEnabled=true
lucene.indexer.maxDocIdCacheSize=10000
lucene.indexer.maxDocumentCacheSize=100
lucene.indexer.maxParentCacheSize=10000
lucene.indexer.maxIsCategoryCacheSize=-1
lucene.indexer.maxLinkAspectCacheSize=10000
lucene.indexer.maxPathCacheSize=10000
lucene.indexer.maxTypeCacheSize=10000
12
Overview
Configuration properties
lucene.indexer.mergerTargetIndexCount=5
lucene.indexer.mergerTargetOverlayCount=5
lucene.indexer.mergerTargetOverlaysBlockingFactor=1
lucene.indexer.mergerMergeBlockingFactor=1
lucene.indexer.maxDocsForInMemoryMerge=10000
lucene.indexer.maxRamInMbForInMemoryMerge=16
lucene.indexer.postSortDateTime=true
lucene.indexer.defaultMLIndexAnalysisMode=EXACT_LANGUAGE_AND_ALL
lucene.indexer.defaultMLSearchAnalysisMode=EXACT_LANGUAGE_AND_ALL
lucene.indexer.maxFieldLength=10000
13
Overview
Authorization
• Post query filter for READ
• Configuration
  • system.acl.maxPermissionCheckTimeMillis=10000
  • system.acl.maxPermissionChecks=1000
• Also set at query time
• Read performance
  • 1.4 old model
  • 2.2/3.0 new
  • 3.4 improved read
  • system/admin/others
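A post-query READ filter with the two limits above might look roughly like the following Python sketch. The function `filter_readable` and the `can_read` callback are hypothetical; the defaults mirror the property values on the slide, but how Alfresco actually counts checks and reports truncation is not shown in the source.

```python
import time


def filter_readable(candidates, can_read, max_checks=1000, max_time_ms=10000):
    """Keep only the candidates the user may read, within a check and time budget.

    Returns (readable, truncated). 'truncated' is True when a limit stopped
    the evaluation before every candidate was checked.
    """
    readable = []
    checks = 0
    deadline = time.monotonic() + max_time_ms / 1000.0
    truncated = False
    for node in candidates:
        if checks >= max_checks or time.monotonic() > deadline:
            truncated = True
            break
        checks += 1
        if can_read(node):
            readable.append(node)
    return readable, truncated


# Hypothetical permission rule: even-numbered nodes are readable.
hits, cut_short = filter_readable(range(10), lambda n: n % 2 == 0, max_checks=4)
print(hits, cut_short)
```

The sketch suggests why ACL evaluation affects query performance: a user who can read few nodes may exhaust the budget before a full page of results is found.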
14
Overview
Query Support
• Lucene based
  • Lucene with Alfresco extensions (PATH, ...)
  • Alfresco FTS
  • CMIS QL + extensions
• DB based
  • XPath
  • Specific APIs – (using the child association table)
    • NodeService – selecting children
    • PersonService – lookup people
15
Overview
Issues
• Factory abstraction
• Transaction vs Snapshot
• Query language abstraction
• Repo reliance on the Lucene index
• Cross locale support
• Rebuild
• Cluster (loss of consistency)
• Lucene limitations
  • Delete/add and reindexing
• DB schema for properties
• Read permission evaluation
• One big store
• Analyser configuration
• Associations
• Richer data model control - analysis
16
Direction
Query Language Abstraction
• Alfresco FTS
• CMIS QL
[Diagram: Parser (Alfresco FTS, CMIS QL, ...) → Abstract Query Representation → Query Engine (Lucene)]
17
Direction
Query Language Abstraction
• Alfresco FTS
• CMIS QL
[Diagram: Parser (Alfresco FTS, CMIS QL, ...) → Abstract Query Representation → Query Engine (Lucene, SOLR, DB/SQL)]
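The parser / abstract representation / engine split can be sketched in a few lines of Python. Everything here is hypothetical and heavily simplified: `TermQuery`, the two toy parsers, and the two renderers are invented for illustration, and the SQL table and column names do not describe the Alfresco schema.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class TermQuery:
    """Engine-neutral representation of 'field contains term'."""
    field: str
    term: str


def parse_fts(text):
    """Toy FTS parser: 'prefix:field:term', or a bare term searched in TEXT."""
    field, sep, term = text.rpartition(":")
    return TermQuery(field if sep else "TEXT", term)


def parse_cmis(text):
    """Toy CMIS QL parser: only understands ... WHERE CONTAINS('...')."""
    match = re.search(r"CONTAINS\('([^']*)'\)", text)
    if match is None:
        raise ValueError("only CONTAINS('...') is supported in this sketch")
    return parse_fts(match.group(1))


def to_lucene(query):
    """Render for a Lucene-style engine (property fields written as @prefix\\:name)."""
    if query.field == "TEXT":
        return f'TEXT:"{query.term}"'
    escaped = query.field.replace(":", "\\:")
    return f'@{escaped}:"{query.term}"'


def to_sql(query):
    """Render for a hypothetical relational engine (made-up table and columns)."""
    return ("SELECT node_id FROM props WHERE qname = ? AND value LIKE ?",
            [query.field, f"%{query.term}%"])


abstract = parse_cmis("SELECT * FROM cmis:document WHERE CONTAINS('cm:name:woof')")
print(to_lucene(abstract))
print(to_sql(abstract))
```

Both front ends produce the same `TermQuery`, so adding an engine means writing one new renderer rather than one per query language.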
18
Direction
SOLR
• Data model integration
• Tracking – eventual consistency
  • Not suitable for RM
• Query time ACL filtering
• PATH support
• SOLR scalability and elasticity
• Faceting etc.
19
Alfresco FTS
Introduction
• CMIS QL FTS (almost)
• Google
• Lucene
• Developer/App Customisation
  • Define the default namespace (e.g. allow the user to drop cm:)
  • Disable/enable/modify certain language features
  • Define templates
  • Define the default field, simple templates for users
  • Share defines the “keywords” template as the default field
    • "%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)"
20
Alfresco FTS
Syntax
• Term (exact/tokenised)
• Phrase
• Conjunction/Disjunction/Negation/Boosting
• Fields
• Wildcards
• Ranges
• Fuzzy matching
• Proximity
• Templates
• See http://wiki.alfresco.com/wiki/Full_Text_Search_Query_Syntax
21
Alfresco FTS
Template example
Template (keywords):
"%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT)"
Query:
=keywords:woof
Expands to:
=cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof
22
Alfresco FTS
Template example – relevance tuning
Template (keywords):
"%(cm:name cm:title cm:description ia:whatEvent ia:descriptionEvent lnk:title lnk:description TEXT^2)"
Query:
=keywords:woof
Expands to:
=cm:name:woof =cm:title:woof =cm:description:woof =ia:whatEvent:woof =ia:descriptionEvent:woof =lnk:title:woof =lnk:description:woof =TEXT:woof^2
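The expansion shown on the two template slides can be reproduced with a short Python sketch. `expand_template` is a hypothetical helper written for this example, not the Alfresco parser; it handles only the `%(field ...)` form with an optional `^boost` per field.

```python
import re


def expand_template(template, term, prefix="="):
    """Expand '%(field1 field2^2)' into one 'prefix field:term' clause per field."""
    match = re.fullmatch(r"%\((.+)\)", template.strip())
    if match is None:
        raise ValueError("expected a template of the form %(field ...)")
    clauses = []
    for entry in match.group(1).split():
        # A trailing ^n on a field is carried over to the expanded clause.
        field, _, boost = entry.partition("^")
        clause = f"{prefix}{field}:{term}"
        if boost:
            clause += f"^{boost}"
        clauses.append(clause)
    return " ".join(clauses)


KEYWORDS = ("%(cm:name cm:title cm:description ia:whatEvent "
            "ia:descriptionEvent lnk:title lnk:description TEXT^2)")
print(expand_template(KEYWORDS, "woof"))
```

Boosting only TEXT, as in the relevance tuning slide, changes the final clause to `=TEXT:woof^2` and leaves the others untouched.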
23
CMIS QL
Introduction
• Use via CMIS or the SearchService
• Read-only relational view of the repository
• Subset of SQL-92 with extensions
  • Type inheritance
  • Multi-valued properties
  • Full text search
    • CONTAINS()
    • SCORE()
  • Location
    • IN_FOLDER()
    • IN_TREE()
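To show how the full text and location predicates combine, here is a small Python sketch that assembles a CMIS QL string. `build_query` and `cmis_literal` are hypothetical helpers, and the folder id in the usage line is a placeholder rather than a real object id.

```python
def cmis_literal(value):
    """Quote a string literal, backslash-escaping backslashes and single quotes."""
    escaped = value.replace("\\", "\\\\").replace("'", "\\'")
    return f"'{escaped}'"


def build_query(type_name, text=None, folder_id=None, whole_tree=False):
    """Build a CMIS QL query with optional CONTAINS() and IN_FOLDER()/IN_TREE()."""
    columns = ["cmis:objectId", "cmis:name"]
    where = []
    order_by = ""
    if text is not None:
        # SCORE() is aliased so that ORDER BY can refer to it.
        columns.append("SCORE() AS score")
        where.append(f"CONTAINS({cmis_literal(text)})")
        order_by = " ORDER BY score DESC"
    if folder_id is not None:
        predicate = "IN_TREE" if whole_tree else "IN_FOLDER"
        where.append(f"{predicate}({cmis_literal(folder_id)})")
    query = f"SELECT {', '.join(columns)} FROM {type_name}"
    if where:
        query += " WHERE " + " AND ".join(where)
    return query + order_by


# 'folder-id-placeholder' stands in for a real cmis:objectId.
print(build_query("cmis:document", text="test",
                  folder_id="folder-id-placeholder", whole_tree=True))
```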
24
CMIS QL
Alfresco extensions
• JOIN to aspects only
  • SELECT D.*, O.* FROM cmis:document AS D JOIN cm:ownable AS O ON D.cmis:objectId = O.cmis:objectId
  • No JOIN between types/nodes yet
• Use Alfresco FTS instead of CMIS QL FTS
  • SELECT * FROM cmis:document WHERE CONTAINS('cmis:name:\'test*\'')
• Relax some constraints
  • SCORE() can be used on its own
  • Multi-valued properties (MVPs) can use single-valued property (SVP) syntax for IN, LIKE and comparisons
• Queries more robust if the data model changes
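Embedding an Alfresco FTS expression inside CONTAINS() needs its single quotes escaped, as in the example above. The following Python sketch shows one way to generate both extension forms; `contains_fts` and `aspect_join` are hypothetical helper names introduced only for this illustration.

```python
def contains_fts(fts_query):
    """Wrap an Alfresco FTS expression in CONTAINS(), escaping \\ and ' for CMIS QL."""
    escaped = fts_query.replace("\\", "\\\\").replace("'", "\\'")
    return f"CONTAINS('{escaped}')"


def aspect_join(type_name, aspect_name):
    """Join a type to one of its aspects on cmis:objectId (the only JOIN supported)."""
    return (f"SELECT D.*, O.* FROM {type_name} AS D "
            f"JOIN {aspect_name} AS O ON D.cmis:objectId = O.cmis:objectId")


print("SELECT * FROM cmis:document WHERE " + contains_fts("cmis:name:'test*'"))
print(aspect_join("cmis:document", "cm:ownable"))
```

Generating the escaping in one place avoids hand-written `\'` sequences, which are easy to get wrong when the FTS expression itself contains quoted phrases.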
25
Learn More
wiki.alfresco.com
forums.alfresco.com
twitter: @AlfrescoECM