Fusion for Business Intelligence
Allan Syiek Senior Sales Engineer September 14, 2016
Session Objectives
By the end of this session, you will:
– Have a high-level awareness of the variety of search and discovery functionality available
– Be able to select the right product for a particular use case
– Know why this baby is so happy
Agenda
• The Beer and Diaper Legend
• DIKW Pyramid
• What is Enterprise Search
• Indexing 101
• Statistics vs. Data Mining vs. Machine Learning
• What is Business Intelligence
• Where does Fusion Fit?
Parable of the Beer and the Diapers
Illustrates the difference between querying and data mining, already firmly enshrined in BI mythology
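A minimal sketch of the querying vs. data mining distinction, assuming a handful of invented shopping baskets. The basket data, function names, and the 0.4 support threshold are all hypothetical and chosen only for illustration. Querying answers a question the analyst already thought to ask; mining scans every pair and surfaces the frequent ones.

```python
from collections import Counter
from itertools import combinations

# Hypothetical shopping baskets, invented for illustration only.
baskets = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"milk", "diapers"},
    {"beer", "chips"},
    {"milk", "bread"},
]


def query_count(item_a, item_b):
    """Querying: the analyst already suspects the pair and asks about it."""
    return sum(1 for basket in baskets if item_a in basket and item_b in basket)


def mine_pairs(min_support=0.4):
    """Data mining: count every item pair and keep the frequent ones.

    Support is the fraction of baskets that contain both items.
    """
    pair_counts = Counter()
    for basket in baskets:
        for pair in combinations(sorted(basket), 2):
            pair_counts[pair] += 1
    total = len(baskets)
    return {
        pair: count / total
        for pair, count in pair_counts.items()
        if count / total >= min_support
    }


if __name__ == "__main__":
    print(query_count("beer", "diapers"))
    print(mine_pairs())
```

The mining function needs no prior hypothesis about which items belong together, which is the point of the parable.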
The DIKW Pyramid
What is Enterprise Search
Q. What do you do with a mountain of data located everywhere? A. Depends…. What do you need it for?
• Crawling, Parsing, Indexing, Searching • Advanced Searches • Searching Structured Data • Searching Unstructured Data • Metadata • Ranking • Results • Access Control • UI • Tuning • Reporting • Scale and Performance
Aspects of Enterprise Search
Index Pipeline
• Tika Parser • Exclusion Filter • Field Mapper • HTML Transform Stage • XML Transform Stage • OpenNLP Entity Extraction • Gazetteer Extraction • Regular Expression • Aggregating • Javascript (custom scripts) • …and others…
Query Pipeline
• Search Fields/Parameters • Facets • Landing Pages • Boost Documents • Block Documents • Security Trimming • Recommendation Boosting • Rollup Aggregator • Sub Query • Javascript (custom scripts) • …and others…
[Diagram labels: Documents, Search Collection, Search UI]
Indexing 101
A system used to make finding information easier.
Every word is converted into a wordID by using an in-memory hash table, the lexicon. Occurrences in the current document are translated into hit lists and are written into the forward “barrels”. Inverted Barrels have been sorted.
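The lexicon, hit lists, and forward/inverted barrels described above can be sketched in a few lines. This is a toy illustration under stated assumptions, not how any particular engine stores its index: `build_index` and `search` are hypothetical names, tokenization is a plain whitespace split, and a hit is just a word position.

```python
from collections import defaultdict


def build_index(docs):
    """Build a lexicon, a forward index, and an inverted index.

    docs maps a document ID to its text.
    """
    lexicon = {}   # word -> wordID (the in-memory hash table)
    forward = {}   # docID -> {wordID: [positions]} (hit lists per document)
    for doc_id, text in docs.items():
        hits = defaultdict(list)
        for position, word in enumerate(text.lower().split()):
            word_id = lexicon.setdefault(word, len(lexicon))
            hits[word_id].append(position)
        forward[doc_id] = dict(hits)

    # Invert the forward index: wordID -> [(docID, positions)], sorted by docID.
    inverted = defaultdict(list)
    for doc_id in sorted(forward):
        for word_id, positions in forward[doc_id].items():
            inverted[word_id].append((doc_id, positions))
    return lexicon, forward, dict(inverted)


def search(word, lexicon, inverted):
    """Return the IDs of documents that contain the word."""
    word_id = lexicon.get(word.lower())
    if word_id is None:
        return []
    return [doc_id for doc_id, _ in inverted.get(word_id, [])]


if __name__ == "__main__":
    docs = {1: "beer and diapers", 2: "diapers on sale"}
    lexicon, forward, inverted = build_index(docs)
    print(search("diapers", lexicon, inverted))
```

A lookup touches only the posting list for one wordID instead of scanning every document, which is what makes finding information easier at scale.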
Indexing 101 - Ranking
• Score Results for Presentation
– Weighted by Term Frequency-Inverse Document Frequency (TF-IDF)
– Clustering
– Complex proprietary algorithms
Indexing 101 - Relevance
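A rough sketch of TF-IDF weighting, assuming one common textbook variant: term frequency normalized by document length, multiplied by the natural log of (number of documents / documents containing the term). Production engines use more elaborate scoring formulas; the function name `tf_idf_scores` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def tf_idf_scores(query, docs):
    """Rank documents for a query with a simple TF-IDF sum.

    docs maps a document ID to its text. Returns (doc_id, score) pairs,
    highest score first.
    """
    tokenized = {doc_id: text.lower().split() for doc_id, text in docs.items()}
    n_docs = len(tokenized)
    scores = {}
    for doc_id, tokens in tokenized.items():
        counts = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            # Document frequency: how many documents contain the term.
            df = sum(1 for other in tokenized.values() if term in other)
            if df == 0 or not tokens:
                continue
            tf = counts[term] / len(tokens)
            idf = math.log(n_docs / df)
            score += tf * idf
        scores[doc_id] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    docs = {"a": "beer beer diapers", "b": "diapers wipes", "c": "milk bread"}
    for doc_id, score in tf_idf_scores("beer", docs):
        print(doc_id, round(score, 3))
```

A term that appears often in one document but rarely across the collection scores highest, while a term found in every document contributes nothing because its IDF is log(1) = 0.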
Statistics vs. Data Mining vs. Machine Learning
– Statistics quantifies numbers
– Data Mining explains patterns
– Machine Learning predicts with models
– Artificial Intelligence behaves and reasons
What is Business Intelligence
• BI technologies provide historical, current and predictive views of business operations
• Business intelligence is made up of an increasing number of components including:
– Multidimensional aggregation and allocation (OLAP: Online Analytical Processing)
– Denormalization, tagging and standardization (relational database)
– Real-time reporting with analytical alert
– A method of interfacing with unstructured data sources (data mining)
– Group consolidation, budgeting and rolling forecasts
– Statistical inference and probabilistic simulation
– Key performance indicators optimization
– Version control and process management
– Open item management
• Why Fusion for Log Analytics?
• Secure access to dashboards
• ETL of logs using Index pipelines
• Run log analysis models in Spark and apply them through ML index pipeline stages
• Time series index management
Massive-scale log analytics
• Index billions of log events per day, real-time
• Recent event and historical analysis: Analyze logs over time: today, recent, past week, past 30 days, …
• Easy-to-use dashboards to visualize common questions and allow for ad hoc analysis
• Ability to scale linearly as business grows … with sub-linear growth in costs!
• Easy to set up, easy to manage, easy to use
• Signals & Recommendations
Fusion can capture, store, and aggregate signals from a variety of sources to drive predictive search capabilities and continuous relevancy tuning
Signals can include:
• Clicks and queries
• Add-to-cart and purchase behavior
• Geo-location
• User behavior and preferences
• User history and past orders
• Device
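A minimal sketch of the capture-and-aggregate idea, assuming raw signals arrive as simple records with a query, a document ID, and a signal type. This is not Fusion's signals API: the record fields, the `SIGNAL_WEIGHTS` values, and both function names are hypothetical, chosen only to show how aggregated behavior could feed a boost.

```python
from collections import defaultdict

# Hypothetical weights: stronger intent counts for more.
SIGNAL_WEIGHTS = {"click": 1.0, "add_to_cart": 3.0, "purchase": 5.0}


def aggregate_signals(signals):
    """Roll raw signal events up into one weight per (query, document)."""
    totals = defaultdict(float)
    for signal in signals:
        weight = SIGNAL_WEIGHTS.get(signal["type"], 0.0)
        totals[(signal["query"], signal["doc_id"])] += weight
    return dict(totals)


def boosts_for_query(query, aggregated, top_n=3):
    """Return the documents to boost for a query, strongest first."""
    matches = [
        (doc_id, weight)
        for (seen_query, doc_id), weight in aggregated.items()
        if seen_query == query
    ]
    return sorted(matches, key=lambda match: match[1], reverse=True)[:top_n]


if __name__ == "__main__":
    raw = [
        {"query": "diapers", "doc_id": "d1", "type": "click"},
        {"query": "diapers", "doc_id": "d2", "type": "click"},
        {"query": "diapers", "doc_id": "d2", "type": "purchase"},
        {"query": "beer", "doc_id": "d3", "type": "add_to_cart"},
    ]
    print(boosts_for_query("diapers", aggregate_signals(raw)))
```

Re-running the aggregation as new events arrive is one simple way to picture continuous relevancy tuning: documents that users act on rise for the queries that led to them.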
Visualization & Insight with SILK
• SILK Dashboards provide a rich visual interface for users to search, inspect and visualize event/log data
• Gives users the power to perform ad hoc search and analysis on massive amounts of multi-structured and time series data
• Real-time insights and trends for on-the-fly decision making using the most accurate and up-to-date data
• Users can share visualizations and dashboards
[Architecture diagram labels]
• Lucidworks View • Admin UI • REST API • Security built-in
• Core Services: Connectors, Alerting/Messaging, NLP, Pipelines, Blob Storage, Scheduling, Recommenders/Signals, …
• Connectors: Database, Web, File, Logs, Hadoop, Cloud
• Apache Spark: Worker, Worker, Cluster Mgr.
• Apache Solr: Shards, Shards
• HDFS (Optional)
• Apache Zookeeper (ZK 1 … ZK N): Shared Config Mgmt, Leader Election, Load Balancing
Where Does Fusion Fit?
Learn more at lucenerevolution.org
Thank You Q & A