Transcript
Page 1: How to Gain Greater Business Intelligence from Lucene/Solr

Patrick Beaucamp, Founder of the Vanilla Project

Mail : [email protected]

How to Gain Greater Business Intelligence with Vanilla from Solr/Lucene

Lucene Revolution, Boston

Page 2: How to Gain Greater Business Intelligence from Lucene/Solr

Presentation Agenda

Vanilla powered by Lucene
- Report indexation, search interface
- External document management
- Evolution & constraints

Steps to Solr/Lucene Adoption
- Indexation, storage, search
- Embedded Solr/Lucene
- External Solr/Lucene platform

Key Benefits for Vanilla powered by Solr/Lucene
- Cluster architecture
- Cache mechanism
- Support for enhanced search language

Page 3: How to Gain Greater Business Intelligence from Lucene/Solr

Some Vanilla Features

- Flash maps and charts: reports, cubes and dashboards
- Vanilla apps: Android and iPhone

Page 4: How to Gain Greater Business Intelligence from Lucene/Solr

Vanilla Powered by Lucene (1/6)

Vanilla is a full Business Intelligence platform that provides:
- Reporting, OLAP, dashboards, KPIs, map visualisation
- ETL, workflow, document management, search engine

Page 5: How to Gain Greater Business Intelligence from Lucene/Solr

Vanilla Powered by Lucene (2/6)

Report Indexation
- The search engine is Apache Lucene (summer 2010)
- External documents and Vanilla reports are indexed
- Different indexation strategies for documents:
  – No indexation
  – Real-time indexation
  – Late indexation

Two modules manage the indexation strategy:
- Enterprise Services, to set document properties
- Norparena, to manage indexation
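The three indexation strategies can be pictured with a small sketch. This is not Vanilla code: the `IndexationStrategy` and `IndexationManager` names, and the idea that late indexation is a queue flushed by a batch job, are assumptions made for illustration only.

```python
from enum import Enum


class IndexationStrategy(Enum):
    """The three strategies listed on the slide."""
    NONE = "no indexation"
    REAL_TIME = "real-time indexation"
    LATE = "late indexation"


class IndexationManager:
    """Hypothetical manager: indexes now, queues for later, or skips."""

    def __init__(self):
        self.indexed = []   # documents already in the search index
        self.pending = []   # documents waiting for a later batch run

    def submit(self, doc_id, strategy):
        if strategy is IndexationStrategy.REAL_TIME:
            self.indexed.append(doc_id)
        elif strategy is IndexationStrategy.LATE:
            self.pending.append(doc_id)
        # IndexationStrategy.NONE: the document is stored but never indexed

    def run_late_indexation(self):
        """Index everything that was queued (assumed to run as a batch job)."""
        self.indexed.extend(self.pending)
        self.pending.clear()


manager = IndexationManager()
manager.submit("report-1", IndexationStrategy.REAL_TIME)
manager.submit("contract.pdf", IndexationStrategy.LATE)
manager.submit("draft.doc", IndexationStrategy.NONE)
```

After these calls only `report-1` is searchable; `contract.pdf` becomes searchable once `run_late_indexation()` is called, and `draft.doc` never does.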

Page 6: How to Gain Greater Business Intelligence from Lucene/Solr

Vanilla Powered by Lucene (3/6)

Search Interface
- Search interface available from the Vanilla Portal
- Search runs against the Lucene index (inside Vanilla)
- Search results are combined with document security:
  – The list contains all documents
  – Documents are ordered by popularity
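One possible reading of "combined with security" is that raw search hits are filtered by the user's rights and then ordered by popularity. The sketch below illustrates that reading only; the `Hit` type, the group-based security model and the popularity counter are all assumptions, not Vanilla's actual data model.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    doc_id: str
    allowed_groups: frozenset  # groups permitted to read the document (assumed model)
    popularity: int            # e.g. number of times the document was opened


def visible_hits(hits, user_groups):
    """Keep the hits the user may read, most popular first."""
    allowed = [h for h in hits if h.allowed_groups & set(user_groups)]
    return sorted(allowed, key=lambda h: h.popularity, reverse=True)


hits = [
    Hit("sales-report", frozenset({"sales", "management"}), 12),
    Hit("hr-dashboard", frozenset({"hr"}), 40),
    Hit("kpi-overview", frozenset({"sales"}), 25),
]
result = [h.doc_id for h in visible_hits(hits, {"sales"})]
```

Filtering after the search keeps the index free of per-user data, at the cost of sometimes discarding hits the engine has already scored.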

Page 7: How to Gain Greater Business Intelligence from Lucene/Solr

Vanilla Powered by Lucene (4/6)

External document management
- Various document formats are supported (Lucene)
- Additional properties can be set on documents, for later use in search criteria
- Check-in / check-out on documents for versioning
- Search runs on the latest document version
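A minimal sketch of how check-out / check-in versioning, extra properties and "search the latest version" could fit together. The `VersionedDocument` class and the `search` helper are hypothetical and use a naive substring match instead of a real Lucene index.

```python
class VersionedDocument:
    """Hypothetical document with check-out / check-in versioning."""

    def __init__(self, name, content, properties=None):
        self.name = name
        self.versions = [content]              # first version at index 0
        self.properties = dict(properties or {})
        self.checked_out_by = None

    def check_out(self, user):
        if self.checked_out_by is not None:
            raise RuntimeError(f"{self.name} is already checked out")
        self.checked_out_by = user

    def check_in(self, user, new_content):
        if self.checked_out_by != user:
            raise RuntimeError(f"{self.name} is not checked out by {user}")
        self.versions.append(new_content)      # each check-in adds a version
        self.checked_out_by = None

    @property
    def latest(self):
        return self.versions[-1]


def search(documents, term, **criteria):
    """Match the term against the latest version only, then apply property criteria."""
    return [
        d.name for d in documents
        if term.lower() in d.latest.lower()
        and all(d.properties.get(k) == v for k, v in criteria.items())
    ]


doc = VersionedDocument("budget.txt", "draft forecast", {"department": "finance"})
doc.check_out("alice")
doc.check_in("alice", "approved forecast")
```

Here `search([doc], "approved", department="finance")` finds the document, while a search for "draft" does not, because only the latest version is matched.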

Page 8: How to Gain Greater Business Intelligence from Lucene/Solr

Vanilla Powered by Lucene (5/6)

Evolution and constraints
- No clustering available for the search engine (embedded API), as opposed to Vanilla Report Services
- Limitations in language and keywords (internal search)
- No cache to manage search result sets, as opposed to Vanilla datasets, powered by Memcached
- Requests from customers to be compliant with enterprise search engines → need to set up an external search architecture

Page 9: How to Gain Greater Business Intelligence from Lucene/Solr

Vanilla Powered by Lucene (6/6)

Embedded Lucene API inside the Vanilla Platform - Video

Page 10: How to Gain Greater Business Intelligence from Lucene/Solr

Steps to Solr/Lucene Adoption (1/9)

Solr/Lucene is the natural evolution of any embedded Lucene platform

Solr version: 3.5

Indexation
A Vanilla Lucene index can be transferred to and read by Solr/Lucene (a Solr/Lucene index is not usable inside the Vanilla Platform)

Storage
Vanilla search indexes can be managed by a Solr/Lucene platform

Search
The search language is compatible

Page 11: How to Gain Greater Business Intelligence from Lucene/Solr

Steps to Solr/Lucene Adoption (2/9)

Embedded Solr/Lucene inside the Vanilla Platform

No need for any change in Vanilla code: use of the SolrJ API

Immediately provides additional features such as new keywords

Potential upgrade to Solr/Lucene Enterprise

Page 12: How to Gain Greater Business Intelligence from Lucene/Solr

Steps to Solr/Lucene Adoption (3/9)

From embedded Lucene to embedded Solr/Lucene inside the Vanilla Platform

Page 13: How to Gain Greater Business Intelligence from Lucene/Solr

Steps to Solr/Lucene Adoption (4/9)

Embedded Solr/Lucene inside the Vanilla Platform - Video

Page 14: How to Gain Greater Business Intelligence from Lucene/Solr

Steps to Solr/Lucene Adoption (5/9)

Solr/Lucene platform with a Vanilla Platform

Changes are needed in Vanilla code to separate the document management, indexation and search APIs → 10 man-days of work

Document Management API
Easy to move to any CMIS-compliant repository

Indexation & Search API
Solr/Lucene oriented and compliant, but now open to any other search platform
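The split described above can be pictured as two independent interfaces that the platform code depends on. The sketch below is an illustration under assumptions, not the Vanilla API: every class and function name is hypothetical, and the in-memory implementations stand in for a real repository and a real search engine.

```python
from abc import ABC, abstractmethod


class DocumentStore(ABC):
    """Document management API (could later be backed by a CMIS repository)."""

    @abstractmethod
    def load(self, doc_id): ...


class SearchBackend(ABC):
    """Indexation and search API (Solr/Lucene today, another engine later)."""

    @abstractmethod
    def index(self, doc_id, text): ...

    @abstractmethod
    def search(self, term): ...


class InMemoryStore(DocumentStore):
    def __init__(self, docs):
        self._docs = docs

    def load(self, doc_id):
        return self._docs[doc_id]


class InMemorySearch(SearchBackend):
    """Stand-in for a real engine: a naive, case-insensitive substring match."""

    def __init__(self):
        self._texts = {}

    def index(self, doc_id, text):
        self._texts[doc_id] = text.lower()

    def search(self, term):
        return sorted(d for d, t in self._texts.items() if term.lower() in t)


def index_all(store, backend, doc_ids):
    """Platform code depends only on the two interfaces, not on an engine."""
    for doc_id in doc_ids:
        backend.index(doc_id, store.load(doc_id))


store = InMemoryStore({"r1": "Quarterly sales report", "r2": "HR dashboard"})
backend = InMemorySearch()
index_all(store, backend, ["r1", "r2"])
```

Swapping `InMemorySearch` for a Solr-backed class would not touch `index_all`, which is the point of separating the APIs.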

Page 15: How to Gain Greater Business Intelligence from Lucene/Solr

Steps to Solr/Lucene Adoption (6/9)

Coding Before

Example of code (API) before the split
- Direct use of the Lucene API
- Parse the document content using Apache Tika
- Generate Lucene queries
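The code shown on this slide is not part of the transcript. As a stand-in for the "generate Lucene queries" step, here is a hedged sketch that builds a query string in the classic Lucene query syntax, escaping the characters the query parser treats as operators. The field names `title` and `department` and both helper functions are hypothetical; the original code used the Lucene Java API directly.

```python
# Characters with special meaning in the classic Lucene query syntax.
LUCENE_SPECIAL = set('+-&|!(){}[]^"~*?:\\/')


def escape_term(text):
    """Backslash-escape characters the Lucene query parser treats as syntax."""
    return "".join("\\" + ch if ch in LUCENE_SPECIAL else ch for ch in text)


def build_query(criteria):
    """Combine field/value pairs into one AND query made of quoted phrases."""
    clauses = [f'{field}:"{escape_term(value)}"' for field, value in criteria.items()]
    return " AND ".join(clauses)


query = build_query({"title": "sales report", "department": "finance"})
```

Hand-building query strings like this is one of the chores that a higher-level client API can take over.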

Page 16: How to Gain Greater Business Intelligence from Lucene/Solr

Steps to Solr/Lucene Adoption (7/9)

Coding After

Example of code (API) after the split
- Easy-to-use SolrJ API
- Distributed search
- Indexation with automatic parsing (using Apache Tika)
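The slide's code is not in the transcript, and SolrJ itself is a Java client. As a language-neutral illustration, the sketch below only builds the HTTP requests that such a client would send to Solr: a `/select` query with the `shards` parameter for distributed search, and an `/update/extract` request, where Solr parses the uploaded file with Apache Tika. The base URL, host names and helper functions are assumptions, and the extract handler must be enabled in the Solr configuration.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr instance


def select_url(query, shards=None, rows=10):
    """URL for the /select handler; `shards` turns on distributed search."""
    params = {"q": query, "rows": rows, "wt": "json"}
    if shards:
        params["shards"] = ",".join(shards)
    return f"{SOLR_BASE}/select?{urlencode(params)}"


def extract_url(doc_id, commit=True):
    """URL for the extracting handler; the file is sent in the POST body
    and parsed on the server side with Apache Tika."""
    params = {"literal.id": doc_id, "commit": str(commit).lower()}
    return f"{SOLR_BASE}/update/extract?{urlencode(params)}"


url = select_url("title:report", shards=["host1:8983/solr", "host2:8983/solr"])
params = parse_qs(urlsplit(url).query)
```

With this arrangement the caller no longer parses documents or merges shard results itself; both are delegated to the Solr server.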

Page 17: How to Gain Greater Business Intelligence from Lucene/Solr

Steps to Solr/Lucene Adoption (8/9)

Solr/Lucene Platform with Vanilla Platform - Screenshot

Page 18: How to Gain Greater Business Intelligence from Lucene/Solr

Steps to Solr/Lucene Adoption (9/9)

Solr/Lucene Platform with Vanilla Platform - Video

Page 19: How to Gain Greater Business Intelligence from Lucene/Solr

Key Benefits for Vanilla Powered by Solr/Lucene (1/4)

Clustering Search Architecture, outside of Vanilla

Search results clustering implementation (CarrotClusteringEngine) is based on the Carrot2 framework.

Page 20: How to Gain Greater Business Intelligence from Lucene/Solr

Key Benefits for Vanilla Powered by Solr/Lucene (2/4)

Additional query language to perform search

Solr Uses the Lucene Search Library and Extends it!

- A real data schema, with numeric types, dynamic fields, unique keys
- Powerful extensions to the Lucene query language
- Faceted search and filtering
- Geospatial search
- Advanced, configurable text analysis
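As an example of the faceting and filtering extensions, the sketch below assembles the standard Solr request parameters `facet`, `facet.field` and `fq`. The field names `doc_type`, `author` and `year` are hypothetical and would have to exist in the Solr schema.

```python
from urllib.parse import urlencode, parse_qsl


def facet_params(query, facet_fields, filters=()):
    """Query-string parameters for a faceted, filtered Solr search."""
    params = [("q", query), ("facet", "true")]
    params += [("facet.field", f) for f in facet_fields]  # one entry per facet
    params += [("fq", fq) for fq in filters]              # filter queries
    return urlencode(params)


qs = facet_params("sales", ["doc_type", "author"], filters=["year:[2010 TO 2012]"])
decoded = parse_qsl(qs)
```

Filter queries (`fq`) restrict the result set without affecting relevance scoring, which suits drill-down navigation in a BI portal.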

Page 21: How to Gain Greater Business Intelligence from Lucene/Solr

Key Benefits for Vanilla Powered by Solr/Lucene (3/4)

New methods to manage result set (binary, Xml, Json)

Solr is an enterprise search server with a REST-like API. You put documents in it (called "indexing") via XML, JSON or binary over HTTP. You query it via HTTP GET and receive XML, JSON, or binary results.

- Advanced full-text search capabilities
- Optimized for high-volume web traffic
- Standards-based open interfaces: XML, JSON and HTTP
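To make the JSON result format concrete, here is a small sketch that reads a response shaped like the one Solr returns with `wt=json`. The sample payload is hand-written for illustration (it is not output from a real server), and the `id` and `title` fields are assumed schema fields.

```python
import json

# Hand-written sample in the shape of a Solr JSON response (illustrative only).
SAMPLE_RESPONSE = """
{
  "responseHeader": {"status": 0, "QTime": 3},
  "response": {
    "numFound": 2,
    "start": 0,
    "docs": [
      {"id": "report-1", "title": "Quarterly sales"},
      {"id": "report-2", "title": "Sales by region"}
    ]
  }
}
"""


def parse_results(body):
    """Return the total hit count and the ids of the documents on this page."""
    payload = json.loads(body)
    if payload["responseHeader"]["status"] != 0:
        raise RuntimeError("Solr reported an error")
    response = payload["response"]
    return response["numFound"], [doc["id"] for doc in response["docs"]]


total, ids = parse_results(SAMPLE_RESPONSE)
```

Note that `numFound` is the total number of matches, which can be larger than the number of documents returned in `docs` for one page.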

Page 22: How to Gain Greater Business Intelligence from Lucene/Solr

Key Benefits for Vanilla Powered by Solr/Lucene (4/4)

Cache Mechanism

Solr caches are associated with an Index Searcher

Three cache implementations: solr.LRUCache (LRU = Least Recently Used, in memory), solr.FastLRUCache, solr.LFUCache (LFU = Least Frequently Used)

Many configuration parameters for cache optimisation
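To illustrate what "least recently used" means for a search cache, here is a minimal sketch of the eviction policy. It is not Solr's implementation; the `LRUCache` class below is a toy written for this example, and the query strings used as keys are made up.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache: when full, drop the entry that was used longest ago."""

    def __init__(self, size):
        self.size = size
        self._entries = OrderedDict()

    def get(self, key):
        if key not in self._entries:
            return None
        self._entries.move_to_end(key)          # mark as most recently used
        return self._entries[key]

    def put(self, key, value):
        self._entries[key] = value
        self._entries.move_to_end(key)
        if len(self._entries) > self.size:
            self._entries.popitem(last=False)   # evict the least recently used


cache = LRUCache(size=2)
cache.put("q=sales", ["r1"])
cache.put("q=hr", ["r2"])
cache.get("q=sales")          # "q=sales" is now the most recently used entry
cache.put("q=kpi", ["r3"])    # cache is full: "q=hr" is evicted
```

An LFU cache would instead evict the entry with the fewest hits, which favours queries that are repeated often over queries that were merely recent.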

Page 23: How to Gain Greater Business Intelligence from Lucene/Solr

Next Steps

Upgrade to Solr 4.0

New features for document lifecycle management

Roadmap for better internationalisation:
- 10 languages available (not Japanese)
- Search translation management

Page 24: How to Gain Greater Business Intelligence from Lucene/Solr

Documentation and tutorials are available on our web sites:

www.bpm-conseil.com and forge.bpm-conseil.com

Thanks for your attention
