OCTOBER 13-16, 2016 • AUSTIN, TX

Solr & R to Deploy Custom Search Interface: Presented by Patrick Beaucamp, Bpm-Conseil



Page 2

Solr & R to Deploy Custom Search Interfaces

Patrick Beaucamp

Chairman, Bpm-Conseil, France

patrick.beaucamp@bpm-conseil.com

Page 3

Presentation Agenda

• Solr & R Integration inside AklaBox
• AklaBox Presentation
• AklaBox & Solr + R & GoJS & OSM
• Demo Platform: AklaBox
• Going further: Vanilla Air, Spark & R & Solr

Page 4

Aklabox Presentation

Certified on Cloudera & Hortonworks

Runs on Hadoop: Solr (SolrCloud), HDFS ...

Ready for OpenStack

Page 5

Aklabox Presentation

User Interface

Page 6

Aklabox Presentation

Upload your documents

Share your documents

Collaborate on documents

Search your documents

Synchronize your documents

Publish your documents

Document Viewer

Page 7

Aklabox Presentation

Workflow

Synchro

Mobile

Page 8

Aklabox Presentation

Standard Search Interface

Page 9

Solr & R Integration inside AklaBox

• Why do I get this particular list of results when I search inside the document repository?

• What gives a result its value when I run a search: the weight of every word?
   • If a word appears 100 times in a document, is that document more valuable for my search?

• What if the document I'm looking for does not use the exact spelling of the word?

• How do I take multi-language support into account?
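Whether a word repeated 100 times makes a document 100 times more relevant is usually answered by term-frequency saturation. A minimal sketch, assuming the BM25 ranking function with its common textbook defaults (k1 = 1.2, b = 0.75); these values are illustrative and are not taken from the AklaBox or Solr configuration:

```python
# Sketch: how a BM25-style score damps repeated terms compared with a raw count.
# k1 and b are textbook defaults (an assumption, not AklaBox settings).


def bm25_tf_weight(tf: int, doc_len: int, avg_doc_len: float,
                   k1: float = 1.2, b: float = 0.75) -> float:
    """Term-frequency component of BM25: grows with tf but never exceeds k1 + 1."""
    length_norm = 1.0 - b + b * (doc_len / avg_doc_len)  # longer docs are penalised
    return tf * (k1 + 1.0) / (tf + k1 * length_norm)


def raw_tf_weight(tf: int) -> float:
    """Naive scoring: every extra occurrence adds the same amount."""
    return float(tf)


if __name__ == "__main__":
    for tf in (1, 10, 100):
        print(f"tf={tf:>3}  raw={raw_tf_weight(tf):6.1f}  "
              f"bm25={bm25_tf_weight(tf, doc_len=500, avg_doc_len=500.0):.3f}")
```

With the document at average length, one occurrence scores 1.0 and 100 occurrences score about 2.17, never exceeding k1 + 1 = 2.2, whereas the naive count climbs to 100. The b parameter additionally lowers the weight for documents longer than average.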

Page 10

Solr & R Integration inside AklaBox

• We need to review our search module and rethink how we can help users deploy their own search policy

• R was a natural choice for creating a new search algorithm
   • We use R for our data-mining development
   • R has packages for inspecting documents
   • R puts virtually no limit on how documents can be analyzed and classified
   • We had read a lot about R & search engines …

 

Page 11

Solr & R Integration inside AklaBox

• When should documents be analyzed with R?
   • Before Solr indexing
   • After Solr indexing

• Our choice: before Solr indexing
   • We add metadata to every document, such as top words, document class …

• We create classes for documents, and relations between classes
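Adding metadata before indexing can be sketched as follows. This is a Python illustration, not the R code AklaBox runs; the stop-word list, the `top_words` and `doc_class` field names, and the use of a plain frequency count are all assumptions:

```python
import re
from collections import Counter

# Hypothetical English stop-word list; a real deployment would use one list per language.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on", "with"}


def top_words(text: str, n: int = 5) -> list:
    """Return the n most frequent words, skipping stop words and very short tokens.
    Ties are broken alphabetically so the result is deterministic."""
    tokens = re.findall(r"[a-z]+", text.lower())  # ASCII-only tokenizer, for brevity
    counts = Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


def enrich(doc_id: str, text: str, doc_class: str, n: int = 5) -> dict:
    """Attach metadata to a document before it is sent to Solr.
    The field names top_words and doc_class are hypothetical."""
    return {
        "id": doc_id,
        "content": text,
        "top_words": top_words(text, n),
        "doc_class": doc_class,
    }


if __name__ == "__main__":
    text = "Invoice for cloud storage. The invoice lists storage fees and support fees."
    print(enrich("doc-1", text, doc_class="finance", n=3))
```

The `n` argument stands in for a "number of words kept per document" setting; the document class would come from whatever classifier runs upstream.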

 

Page 12

Solr & R Integration inside AklaBox

Keywords are added to the Solr index
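Once keywords are computed, they can be sent to Solr together with the document. A minimal sketch, assuming Solr's JSON update handler at `/solr/<collection>/update`; the collection name `aklabox_docs` and the multivalued `keywords` field are hypothetical and would have to exist in the schema:

```python
import json
import urllib.request


def build_update_request(solr_base: str, collection: str,
                         docs: list) -> urllib.request.Request:
    """Prepare (without sending) a POST of JSON documents to Solr's /update handler."""
    url = f"{solr_base.rstrip('/')}/{collection}/update?commit=true"
    body = json.dumps(docs).encode("utf-8")  # a JSON array of documents
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    docs = [{"id": "doc-1", "keywords": ["invoice", "storage", "fees"]}]
    req = build_update_request("http://localhost:8983/solr", "aklabox_docs", docs)
    print(req.get_method(), req.full_url)
    # urllib.request.urlopen(req) would send it to a running Solr instance.
```

`commit=true` makes the documents searchable immediately; for bulk loads, relying on Solr's autoCommit settings is usually preferable.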

Page 13

Solr & R Integration inside AklaBox

Page 14

Solr & R Integration inside AklaBox

Page 15

Solr & R Integration inside AklaBox

R packages:

• tm: text-mining functions (stemming, word frequency, word manipulation, etc.)
   • TF-IDF weighting (term frequency × inverse document frequency)

• Matrix: for complex matrix manipulation

• cluster (fanny) & stats (kmeans): functions to compute classes over various groups

• libsvm: svm, predict & tune functions, for automatic word classification

• Sampling: to create & manipulate different data sets
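The TF-IDF weighting listed above can be illustrated in a few lines of Python. This sketch uses the textbook formula (term count divided by document length, times the natural log of N over document frequency); it is not a port of the tm package, whose defaults (log base, normalisation) may differ:

```python
import math
from collections import Counter


def tf_idf(docs: list) -> list:
    """Textbook TF-IDF for tokenised documents.
    tf  = occurrences of the term / number of tokens in the document
    idf = ln(N / number of documents containing the term)"""
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        weights.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


if __name__ == "__main__":
    corpus = [
        ["solr", "index", "search"],
        ["solr", "r", "cluster"],
        ["solr", "search", "r"],
    ]
    for doc_weights in tf_idf(corpus):
        print({t: round(w, 3) for t, w in doc_weights.items()})
```

A word that occurs in every document (here `solr`) gets weight 0, which is why repeating a very common word does little for ranking.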

Page 16

Solr & R Integration inside AklaBox

Advantages (+)

• The R algorithm runs when the document is uploaded
   • We keep only a small number of words per document (a configurable parameter)
   • We create classes for documents
   • We can manage other concerns, such as internationalisation

• The R package can be swapped (another algorithm, a new deployment)
   • easy & flexible to deploy and maintain

• No impact on Solr

Drawback (-)

• The Solr index is a gold mine … and we don't run any analysis on it
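Creating document classes with k-means can be sketched as below. This is plain Lloyd's algorithm in Python with the first k points as seeds, an assumption made to keep the result deterministic; R's `kmeans` defaults to the Hartigan-Wong algorithm with random starts, and `fanny` performs fuzzy rather than hard clustering:

```python
import math


def kmeans(points: list, k: int, iterations: int = 20) -> list:
    """Lloyd's k-means: returns one cluster label per point.
    The first k points are used as initial centroids (deterministic seeding)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels


if __name__ == "__main__":
    # Toy 2-D "document vectors"; real input would be e.g. TF-IDF weights.
    vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
    print(kmeans(vectors, k=2))
```

In practice each point would be a document vector over the few words kept per document; the toy vectors only show the mechanics.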

 

Page 17

AklaBox & Solr + R & GoJS & OSM

Page 18

AklaBox & Solr + R & GoJS & OSM

Mind map with word associations
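One way to derive the word associations behind such a mind map is to count how often two words co-occur in the same document. This is an assumption about the approach (the slide does not say how associations are computed; tm's `findAssocs`, for instance, uses correlations instead). The sketch emits `nodeDataArray` / `linkDataArray` arrays in the shape a GoJS `GraphLinksModel` accepts (`key` on nodes, `from`/`to` on links):

```python
import json
from collections import Counter
from itertools import combinations


def word_association_graph(docs: list, min_count: int = 2) -> dict:
    """Link two words when they appear together in at least `min_count` documents.
    Each document is given as a set of (already selected) keywords."""
    pair_counts = Counter()
    for words in docs:
        # sorted() gives each pair a stable (a, b) orientation
        pair_counts.update(combinations(sorted(words), 2))
    links = [
        {"from": a, "to": b, "weight": count}
        for (a, b), count in sorted(pair_counts.items())
        if count >= min_count
    ]
    keys = sorted({word for link in links for word in (link["from"], link["to"])})
    return {
        "nodeDataArray": [{"key": k} for k in keys],
        "linkDataArray": links,
    }


if __name__ == "__main__":
    keywords_per_doc = [
        {"solr", "search", "index"},
        {"solr", "search", "r"},
        {"r", "cluster"},
    ]
    print(json.dumps(word_association_graph(keywords_per_doc), indent=2))
```

On the JavaScript side, the two arrays can be passed to `new go.GraphLinksModel(nodeDataArray, linkDataArray)`; `weight` is custom link data that a template could bind to line thickness.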

Page 19

AklaBox & Solr + R & GoJS & OSM

Map Visualization

OSM (OpenStreetMap) Visualization

Page 20

Demonstration

Page 21

Demonstration

• Other business cases
   • Document management: pre-classification of documents (pharmaceutical industry)
   • Search engine: analysis of websites during the crawling process

• Opens the door to new developments
   • Phonetic search (to solve the word-spelling problem)
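Phonetic search attacks the spelling problem by indexing a sound-based code next to the raw word. A minimal sketch of American Soundex in Python, chosen only because it is short; it is tuned for English names, so a French or multilingual repository would more likely rely on one of the phonetic encoders Solr already ships (for example `solr.PhoneticFilterFactory` with `DoubleMetaphone`, or `solr.BeiderMorseFilterFactory`):

```python
# American Soundex digit for each consonant; vowels, h, w and y get no digit.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(word: str) -> str:
    """Return the 4-character American Soundex code of an ASCII word."""
    letters = [ch for ch in word.lower() if "a" <= ch <= "z"]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")  # first letter's digit still blocks repeats
    for ch in letters[1:]:
        digit = SOUNDEX_CODES.get(ch, "")
        if digit and digit != prev:
            code += digit
            if len(code) == 4:
                break
        if ch not in "hw":  # h and w do not separate letters with the same digit
            prev = digit     # a vowel resets prev, so a repeated digit is coded again
    return code.ljust(4, "0")
```

`soundex("Robert")` and `soundex("Rupert")` both return `R163`, so indexing the code lets a query with one spelling match the other.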

Page 22

Vanilla Air, Spark, Spark SQL for Solr

New technologies are emerging … well, they are already here!

Page 23

Vanilla Air, Spark, Spark SQL for Solr

• Vanilla Air
   – can process R packages
   – can scale with a growing number of documents

www.vanillasmartdata.com

 

Page 24

Vanilla Air, Spark, Spark SQL for Solr

Easy switch in architecture → scalability

Page 25

Vanilla Air, Spark, Spark & R & Solr

Spark 1.5 (September 2015): support for YARN cluster mode in R (SparkR)

Page 26

Vanilla Air, Spark, Spark & R & Solr

We now have Spark & Solr tools: SolrRDD, tools for reading data from Solr as a Spark RDD and indexing objects from Spark into Solr using SolrJ

https://github.com/LucidWorks/spark-solr

Page 27

Vanilla Air, Spark, Spark & R & Solr

Admin side: running complex R programs on the Solr index, using Vanilla Air
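Running an analysis over the whole Solr index first requires streaming documents out of it. A minimal sketch using Solr's `cursorMark` deep paging, assuming `id` is the uniqueKey field; the collection name `aklabox_docs`, the `top_words` field and the `http_fetch` helper are hypothetical, and the transport is injected so the paging logic can be tested without a server. At larger scale, the spark-solr project reads Solr into Spark instead of looping like this:

```python
import json
import urllib.parse
import urllib.request
from typing import Callable, Iterator

# A fetch function takes Solr query parameters and returns the parsed JSON response.
Fetch = Callable[[dict], dict]


def export_all(fetch: Fetch, query: str = "*:*", rows: int = 500,
               fields: str = "id,top_words") -> Iterator[dict]:
    """Yield every document matching `query`, page by page, using cursorMark."""
    cursor = "*"  # "*" asks Solr for the first page
    while True:
        params = {
            "q": query,
            "fl": fields,
            "rows": rows,
            "sort": "id asc",  # cursorMark requires a sort that includes the uniqueKey
            "cursorMark": cursor,
            "wt": "json",
        }
        resp = fetch(params)
        yield from resp["response"]["docs"]
        next_cursor = resp["nextCursorMark"]
        if next_cursor == cursor:  # an unchanged mark means there are no more results
            return
        cursor = next_cursor


def http_fetch(params: dict,
               base: str = "http://localhost:8983/solr/aklabox_docs") -> dict:
    """Hypothetical transport: query the /select handler of a local collection."""
    url = f"{base}/select?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)
```

Usage would be `for doc in export_all(http_fetch): ...`, feeding each document to the analysis step; because the cursor is tied to the sort, documents are neither skipped nor duplicated while paging.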

Page 28

Page 29

Lucky One!

Page 30