Agenda
• Why search monitoring is important
• How and what to monitor
• How and what to analyze
• What can be forecast
• How to forecast
• Infrastructure and algorithms to forecast
• Example with the AklaBox platform
• Software packages – forecast infrastructure
Why search monitoring (dashboards, forecasts) is important
User behavior analysis for the sales and marketing team and the web design team
Before: the website as a showcase
• Which menus and submenus are visited?
• Where are the dead branches?
• No real « search approach »
Why search monitoring (dashboards, forecasts) is important
User behavior analysis for the sales and marketing team and the web design team
Now: the website as a search interface
• What are people looking for?
• How are they searching?
• Review your SEO
Why search monitoring (dashboards, forecasts) is important
Technology evolution: Solr, Drupal-Solr, WordPress-Solr, etc.
• It's easy to add a « search » feature
• Website (Drupal hosting) companies don't want to live through this again!
Why search monitoring (dashboards, forecasts) is important
Website content evolution: providing relevant content
• Search engine optimisation & keyword strategy
• Followers & alerts
How and what to monitor – infrastructure activity
Infrastructure activity (technical side)
• CPU, memory and bandwidth
• Log analysis or JMX, or a product like Nagios
How and what to monitor – Solr activity
Solr indexing activity
• Nutch or ManifoldCF processing
• Fusion's Anda crawler
• User activity (document loading)
Solr search activity (functional side)
• Timestamp (when people are searching)
• Search criteria and user behavior (combined searches)
Solr log analysis (cluster of applications)
SolrCore metrics: since Solr 6.4, the <solr><metrics> section in solr.xml (250 metrics)
Web server log analysis (Apache logs)
Web application analysis (your own search platform)
How and what to visualize – Solr activity
Ready-to-use packages to explore Solr activity and logs
• Lucidworks Banana
• ELK
• Others, such as Thoth (Trulia), Carbon/Graphite, etc.
How and what to monitor – architecture impact
Architecture for the French Ministry of Environment: one web platform with a clustered Solr infrastructure
How and what to monitor – architecture
French region with BI (Vanilla) and AklaBox (document management): two web platforms share the same Solr infrastructure
How and what to monitor – Solr activity logging
Remember:
• You can only explore what you log
• Solr log configuration is easy, but you need to practice
• Good and bad news: this session is not a Solr logging tutorial
How and what to analyze
Be aware of your infrastructure (shared Solr). Where do you log?
https://findwise.com/blog/using-log4j-tomcat-solr-how-make-customized-file-appender/
Important: you may need to write your own parser and log aggregator (remember the L in the ELK suite of programs: Logstash).
You need to clarify your objective:
• Solr response time analysis (infrastructure side)
• Search keyword analysis (user behavior side)
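As an illustration of "your own parser", here is a minimal Python sketch that pulls the timestamp, the search keywords (`q`) and the response time (`QTime`) out of Solr request log lines, which covers both objectives above. The log layout in `SAMPLE` is an assumption: the real layout depends on your log4j pattern, so the regular expression will likely need adjusting. The names `parse_line` and `SAMPLE` are hypothetical.

```python
import re
from urllib.parse import parse_qs

# Assumed layout of a Solr request log line; adapt to your own log4j pattern.
LINE_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}) .*?"
    r"path=(?P<path>\S+) params=\{(?P<params>.*?)\} .*?QTime=(?P<qtime>\d+)"
)

def parse_line(line):
    """Return timestamp, query and response time for a search request, else None."""
    match = LINE_RE.search(line)
    if match is None or match.group("path") != "/select":
        return None  # not a search request (or not a request line at all)
    params = parse_qs(match.group("params"))  # also decodes '+' and %xx escapes
    return {
        "timestamp": match.group("ts"),
        "query": params.get("q", [""])[0],
        "qtime_ms": int(match.group("qtime")),
    }

# Illustrative lines only, not taken from a real server.
SAMPLE = [
    "2017-03-01 09:15:02.120 INFO  (qtp1-20) [   x:docs] o.a.s.c.S.Request [docs]  "
    "webapp=/solr path=/select params={q=irish+passport&rows=10} hits=42 status=0 QTime=12",
    "2017-03-01 09:15:03.480 INFO  (qtp1-21) [   x:docs] o.a.s.c.S.Request [docs]  "
    "webapp=/solr path=/update params={commit=true} status=0 QTime=85",
]

records = [r for r in map(parse_line, SAMPLE) if r is not None]
for record in records:
    print(record)
```

The `qtime_ms` field feeds the response time analysis, while `query` feeds the keyword analysis.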
Understand Solr activity … understand your activity
• Why are there peaks of activity?
• Do they have an impact on the server (response time, stability)?
• How can we anticipate peaks of activity? By gathering information about external data:
• Events: « Accord de Paris » (Paris Agreement), « NBA Finals »
• Related events: Brexit, elections
• Marketing campaigns
What can be forecast … and is it of interest?
• Server consumption (CPU, memory)
• User activity per period
• Search criteria
• Much more …
Basically, anything that can be logged and then analyzed is a candidate for forecasting, using either time series or predictive events.
How to forecast
Data collection
• Server logs
• External data
Data preparation to rework some data
• Work on the logs, for example to extract search keywords
• Work on intermediate results (for example, search keywords) to classify them, group them, etc. Example: aggregating searches for « potus » with « Obama » (between two given dates) and with « Trump » (from a given date)
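The « potus » example can be sketched as a date-bounded synonym table, shown here in Python. The `RULES` table, the function names and the sample searches are hypothetical; the date bounds simply follow the US inauguration dates.

```python
from collections import Counter
from datetime import date

# Hypothetical rules: (alias, valid from, valid to, canonical keyword).
RULES = [
    ("potus", date(2009, 1, 20), date(2017, 1, 19), "obama"),
    ("potus", date(2017, 1, 20), date.max, "trump"),
]

def canonical(keyword, day):
    """Lower-case the keyword and map an alias to its target for that date."""
    key = keyword.strip().lower()
    for alias, start, end, target in RULES:
        if key == alias and start <= day <= end:
            return target
    return key

def aggregate(searches):
    """Count searches per canonical keyword; searches are (keyword, date) pairs."""
    return Counter(canonical(keyword, day) for keyword, day in searches)

# Illustrative data only.
searches = [
    ("POTUS", date(2016, 6, 1)),
    ("Obama", date(2016, 6, 2)),
    ("potus", date(2017, 3, 1)),
    ("Trump", date(2017, 3, 2)),
]
counts = aggregate(searches)
print(counts)
```

Keeping the rules in data rather than in code makes it easy to add new aliases as events change what a keyword means.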
How to forecast
You need basic statistics and analytics:
• What is a cluster used for?
• What does it mean for two data series to be correlated (or not correlated)?
• What does data dependency mean?
• Why do I have to deal with outliers?
Question: are outliers wrong data or exceptional data?
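One common way to flag outliers before answering that question is Tukey's interquartile-range rule, sketched below in Python. The rule and the factor `k=1.5` are a conventional choice, not something this deck prescribes, and the daily counts are invented for illustration.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Return the values outside [Q1 - k*IQR, Q3 + k*IQR] (Tukey's fences)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    spread = q3 - q1
    low, high = q1 - k * spread, q3 + k * spread
    return [v for v in values if v < low or v > high]

# Illustrative daily search counts with one unusual day.
daily_searches = [120, 131, 118, 125, 122, 127, 119, 640, 124, 121]
print(iqr_outliers(daily_searches))
```

The rule only flags the point. Whether it is a logging error or a genuine peak (an event, a campaign) still has to be decided by looking at external data.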
How to forecast – it's all about data
A language and platform to:
• Explore and visualize (statistical methods) … data understanding
• Analyze and understand (outliers, trends, correlation, dependency)
• Build a forecast model
• Insert external data that impacts behavior (weather, marketing campaigns, business events)
Review your model: compare reality with the forecast data!
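Comparing reality with forecast data usually comes down to an error measure. A minimal Python sketch with two standard ones, mean absolute error (MAE) and mean absolute percentage error (MAPE), follows; the numbers are invented for illustration.

```python
def mean_absolute_error(actual, forecast):
    """Average absolute gap between observed and forecast values."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def mean_absolute_percentage_error(actual, forecast):
    """Average absolute gap as a percentage of the observed value (actual must be non-zero)."""
    return 100 * sum(abs(a - f) / abs(a) for a, f in zip(actual, forecast)) / len(actual)

# Illustrative daily search counts: observed versus forecast.
actual = [100, 120, 80, 100]
forecast = [110, 114, 88, 95]

mae = mean_absolute_error(actual, forecast)
mape = mean_absolute_percentage_error(actual, forecast)
print(f"MAE = {mae:.2f} searches, MAPE = {mape:.1f} %")
```

Tracking these figures over time shows when a model has drifted and needs to be rebuilt.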
Data workflow: from log to dashboard
• Log analyzer/manager, such as Logstash, but also your own parser
• Load logs into a DBMS and/or Spark 2 (depending on your analysis strategy)
• Statistical programs running inside Spark (R, Python, Scala, Julia …)
• Data preparation interface (exploration, classification, recoding of data)
• Exploration: any dashboard package that can run R or Python programs, such as Tableau, Vanilla FlexBoard, Zeppelin, Jupyter …
Forecast using R – what is R?
R is a programming language and software environment for statistical computing and graphics.
www.R-project.org
R Challenge – Package Management
R Challenge – Development Studios
• Web based
R Challenge – Visualization & Dashboard
• Shiny
• Zeppelin
• Jupyter
• Vanilla Air
R Challenge – Enterprise Ready Platform
• Shiny
• Microsoft R Server (Revolution Analytics)
• Oracle R Server
• Vanilla Air
2015: creation of the R Consortium
Certified packages, server-side architecture
Forecast using R
Packages to analyze Solr logs:
• Clusters of events: cluster package (algorithms like clara)
• Time series for some events: ts, timeSeries package
• Search keywords: qdap package
• User IP (origin): rgeolocate package
• Semantic analysis (similar words): tm package
• Your own package to analyze data
Some basic statistics or data exploration:
• Evolution of keyword searches
• Groups of keywords …
Forecast using R
Need to integrate external data, for example:
• Brexit -> searches for Irish passports
• US election & related events
• Marketing campaigns & their impact
Basic database integration:
• Financial data (Yahoo, Quandl)
• Weather data
• Social media data (Twitter, Facebook)
Algorithms & Visualisation - Time-Series Analysis
• Frequency & time representation
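To make the idea of frequency concrete, here is a minimal Python sketch of a seasonal-mean forecast: each future point is the average of past points at the same position in the cycle (for example the same weekday). This is a deliberately simple baseline chosen for illustration, not the method used in the talk; the function name and the data are hypothetical.

```python
def seasonal_mean_forecast(series, period, horizon):
    """Forecast `horizon` points, each as the mean of past values at the same phase."""
    if len(series) < period:
        raise ValueError("need at least one full period of history")
    forecast = []
    for step in range(horizon):
        phase = (len(series) + step) % period   # position in the cycle
        history = series[phase::period]         # all past values at that position
        forecast.append(sum(history) / len(history))
    return forecast

# Two illustrative weeks of daily search counts (Monday to Sunday).
daily = [100, 110, 105, 120, 130, 60, 50,
         104, 114, 109, 124, 134, 64, 54]

next_days = seasonal_mean_forecast(daily, period=7, horizon=3)
print(next_days)
```

A baseline like this is useful mainly as a yardstick: a more elaborate model should beat it before it is trusted.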
Algorithms & Visualisation - Correlation Analysis
Marketing campaigns & keywords
• Searches on keywords A and B are correlated with campaign 1
• Campaign 1 has no effect on searches on keyword C
Knowing that two facts are correlated is as important as knowing that they are not.
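The campaign example can be checked with a Pearson correlation coefficient, sketched here in plain Python. The campaign flags and keyword counts are invented so that keyword A follows the campaign while keyword C does not.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Illustrative daily data: 1 means campaign 1 was running that day.
campaign = [0, 0, 1, 1, 1, 0, 0, 1]
keyword_a = [20, 22, 41, 39, 43, 21, 19, 40]
keyword_c = [30, 31, 29, 30, 31, 30, 29, 30]

r_a = pearson(campaign, keyword_a)
r_c = pearson(campaign, keyword_c)
print(f"campaign vs keyword A: {r_a:.2f}")
print(f"campaign vs keyword C: {r_c:.2f}")
```

A coefficient close to 1 or -1 indicates a strong linear relation, and one close to 0 indicates none. Correlation alone does not show that the campaign caused the searches.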
Algorithms & Visualisation - Cluster
Building a 2D visualisation with 2 dimensions, and creating groups
• Group 1: US visitors, keyword group « Finance »
• Group 2: West European visitors, keyword group « History »
Cluster of search criteria: synonym management (Valls -> Prime Minister)
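Creating groups from two dimensions can be sketched with a small k-means loop in Python. Note that this is plain k-means with fixed starting centroids, swapped in for brevity; it is not the clara algorithm from the R cluster package. The two dimensions (share of « Finance » searches, share of « History » searches per visitor) and the data are assumptions made for the example.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then re-centre."""
    groups = []
    for _ in range(iterations):
        groups = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(point, centroids[i]))
            groups[nearest].append(point)
        # An empty group keeps its previous centroid.
        centroids = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else centroids[i]
            for i, group in enumerate(groups)
        ]
    return centroids, groups

# Illustrative visitors: (share of "Finance" searches, share of "History" searches).
visitors = [(0.9, 0.1), (0.8, 0.2), (0.85, 0.1),
            (0.1, 0.9), (0.2, 0.8), (0.15, 0.85)]

centres, groups = kmeans(visitors, centroids=[visitors[0], visitors[3]])
for number, group in enumerate(groups, start=1):
    print(f"Group {number}: {group}")
```

With real data the starting centroids and the number of groups have to be chosen with care, since k-means results depend on both.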
Algorithms & Visualisation - Principal Component Analysis
Building a 2D or 3D visualisation from multiple dimensions, and detecting differences between axes (usually good if a dimension carries 30% of the information)
• Axis 1: US visitors, multiple searches, keyword group « Finance »
• Axis 2: West European visitors, keyword group « History »
Algorithms & Visualisation - Dependency
Building relations between keyword searches (pairs of terms)
Equivalent in retail: detecting products being bought together
Also available with the Solr Semantic Knowledge Graph
• People searching for « Sport » also search for « Base Ball »
• People searching for « History » also search for « France »
… if the first search is not relevant, the second search may never occur
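The retail analogy corresponds to the confidence of an association rule: among sessions that contain term A, the share that also contain term B. A minimal Python sketch follows; the session data is invented, and the order of searches within a session is ignored for simplicity.

```python
from collections import Counter
from itertools import permutations

def pair_confidence(sessions):
    """Map (a, b) to the share of sessions containing a that also contain b."""
    term_count = Counter()
    pair_count = Counter()
    for session in sessions:
        terms = set(session)                 # count each term once per session
        term_count.update(terms)
        pair_count.update(permutations(sorted(terms), 2))
    return {(a, b): n / term_count[a] for (a, b), n in pair_count.items()}

# Illustrative search sessions (one list of keywords per visitor session).
sessions = [
    ["sport", "baseball"],
    ["sport", "baseball"],
    ["sport", "tennis"],
    ["history", "france"],
]

confidence = pair_confidence(sessions)
print(confidence[("sport", "baseball")])
print(confidence[("history", "france")])
```

The measure is directional: here every session with « baseball » also contains « sport », but only two of the three « sport » sessions contain « baseball ».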
R & AklaBox - Solr: usage
• R to analyse documents during document upload
• R to create custom search algorithms
Souvenir
R & AklaBox - Solr: usage
• R to run analysis and predictive models using Solr log activity
• Search engine powered by R (create custom search algorithms)
• Document digitization & OCR: document recognition (R program to analyze document content and classify documents)
Platforms: Vanilla & Vanilla Air
• Vanilla Air as a server-side R infrastructure
• Vanilla Hub to integrate external data
• Vanilla Portal to display FlexBoard dashboards powered by R
Taking advantage of document management features:
• CMIS support (dashboard publication & distribution)
• Solr indexing (dashboard indexing)
• Search engine (dashboard access)
Platforms: Vanilla & Vanilla Air
Vanilla FlexBoard – some Solr data visualization
Q & A ?
Thank You