CloudSearch and the Democratization of Information Retrieval
Daniel E. Rose, A9.com, danrose@a9.com
SIGIR 2012, Portland



What Does A9 Do?

• Product Search
• Visual Search
• Advertising Technology
• Community Q&A

15 August 2012, D. Rose, CloudSearch, SIGIR 2012

… and CloudSearch

A new hosted search service offered by AWS

Democratizing Information Retrieval: A Brief History

Democratizing Information Retrieval

• Giving more users access to search tools, and making those tools easier to use and more powerful

• Giving more content owners (businesses, organizations, research teams, government offices, etc.) the ability to be search providers

1970s: Online Metered Search Services

• Examples: Dialog, ORBIT, Lexis/Nexis, Westlaw, BRS

• Cost and requirements [users]:
  – Installation and rental of dedicated terminal
  – Usage cost per hour (e.g. $50)
  – Cost per page printed, etc.

• Content available: Research corpora (e.g. journal articles), news stories, court cases

• Improved access for:
  – Users (researchers, lawyers, etc.)

1970s: Online Metered Search Services

• Typical query:

assum! /5 risk /p ic* snow*** snowfall /s slip! fell fall***

• Results: Often the first screen of the first retrieved document

• Restrictions encouraged batch-style search

1980s: Enterprise Search Products

• Examples: Verity Topic, Personal Library Systems, Fulcrum SearchServer, Excalibur RetrievalWare

• Cost and requirements:
  – $10-100K per year license fee, also per seat
  – Beefy hardware to install it on

• Improved access for:
  – Content owners (usually large businesses)

1990s: Web Search

• Examples: WebCrawler, Lycos, Infoseek, AltaVista, Excite, Inktomi, Yandex, Google, AllTheWeb, Teoma

• Cost and requirements:
  – For users: Free, web browser
  – For search providers: web server, high-speed service

• Improved access for:
  – Users
  – Content owners (as long as your data was HTML, and you put it on your website, and a search engine chose to crawl it)

2000s: Open Source Search

• Examples: Lucene/Solr, Indri

• Cost and requirements [providers]:
  – No cost for software
  – Need hardware to run it on

• Improved access for:
  – Content owners (with resources and expertise)

2010s: CloudSearch

• Put your content in the cloud and make it searchable

• You decide what content gets searched and who can see it

• Self-service

• Improves access for:
  – Content owners

An Introduction to CloudSearch

What Is Amazon CloudSearch?

• A hosted web search service developed by A9

• Powered by the same search engine used by Amazon.com and other retailers

• Designed from the ground up to support:
  – semistructured data
  – faceted metadata search
  – numeric range searches
  – memory-resident indexes
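To make "faceted metadata search" and "numeric range searches" concrete, here is a minimal in-memory sketch in Python. It is a toy illustration of the two ideas, not a description of how CloudSearch implements them; the record layout, field names, and genre labels are assumptions chosen for the example.

```python
from collections import Counter

# Illustrative records only; field names and genre labels are assumptions.
MOVIES = [
    {"title": "Jaws", "director": "spielberg", "year": 1975, "genre": "thriller"},
    {"title": "Star Wars", "director": "lucas", "year": 1977, "genre": "sci-fi"},
    {"title": "Close Encounters of the Third Kind", "director": "spielberg",
     "year": 1977, "genre": "sci-fi"},
    {"title": "Raiders of the Lost Ark", "director": "spielberg", "year": 1981,
     "genre": "adventure"},
]


def range_filter(docs, field, low, high):
    """Keep documents whose numeric field lies in the inclusive range [low, high]."""
    return [d for d in docs if low <= d[field] <= high]


def facet_counts(docs, field):
    """Count how many of the given documents carry each value of a metadata field."""
    return Counter(d[field] for d in docs)


# Numeric range search: films from 1975 through 1980.
hits = range_filter(MOVIES, "year", 1975, 1980)
print([d["title"] for d in hits])

# Facet: how the matching films are distributed across genres.
print(facet_counts(hits, "genre"))
```

The facet counts are computed over the documents that matched, which is what lets an interface show "refine by genre" options next to a result list.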

CloudSearch Uses AWS Services

• Elastic Compute Cloud (EC2) for computation

• Simple Storage Service (S3) for storage

• Elastic MapReduce (EMR) for index construction

• Simple Workflow (SWF) for coordinating customer actions

• Elastic Load Balancing (ELB) for routing traffic

CloudSearch Dashboard

Setting Up the Data

Indexing Documents (additions, updates, deletions)
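The slide names three indexing operations without showing a payload. The Python sketch below builds a small JSON batch of add and delete operations. The shape of each operation (type, id, version, lang, fields) is an assumption based on the document batch format of the CloudSearch API of that period and should be checked against the AWS documentation; the document ids and field values are made up for illustration.

```python
import json


def add_op(doc_id, version, fields, lang="en"):
    """Build an 'add' operation (assumed layout); an update is modeled here
    as an add for the same id with a higher version number."""
    return {"type": "add", "id": doc_id, "version": version,
            "lang": lang, "fields": fields}


def delete_op(doc_id, version):
    """Build a 'delete' operation (assumed layout)."""
    return {"type": "delete", "id": doc_id, "version": version}


# Hypothetical ids and field values, for illustration only.
batch = [
    add_op("movie_1", 1, {"title": "Close Encounters", "year": 1977}),
    add_op("movie_1", 2, {"title": "Close Encounters of the Third Kind",
                          "director": "spielberg", "year": 1977}),
    delete_op("movie_2", 3),
]

payload = json.dumps(batch, indent=2)
print(payload)
```

Only the payload is constructed here; sending it to a domain's document endpoint is left out because the endpoint and authentication details are specific to each account.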

Testing Queries from Dashboard

Testing Queries from Web Browser

Search API

    q = close+encounters
    bq = (and (or director:'spielberg' director:'lucas') year:1975..1980)
    rank = -year,title
    return-fields = director,title,year
    facet = genre
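The parameters on the Search API slide can be assembled into a request URL as sketched below in Python. The endpoint is a placeholder and an assumption (each search domain has its own endpoint), and the sketch only builds the URL without sending a request. The parameter names and values are the ones shown on the slide.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint: substitute the search endpoint of your own domain.
SEARCH_ENDPOINT = "http://search-example.invalid/search"

# Parameter names and values as listed on the slide.
params = {
    "q": "close encounters",
    "bq": "(and (or director:'spielberg' director:'lucas') year:1975..1980)",
    "rank": "-year,title",
    "return-fields": "director,title,year",
    "facet": "genre",
}


def build_search_url(endpoint, query_params):
    """Percent-encode the parameters and append them to the endpoint."""
    return endpoint + "?" + urlencode(query_params)


url = build_search_url(SEARCH_ENDPOINT, params)
print(url)
```

Read together, the parameters ask for documents matching "close encounters", restricted to films directed by Spielberg or Lucas between 1975 and 1980, sorted by descending year and then title, returning three fields plus facet counts by genre.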

CloudSearch Relevance Ranking

• Configurable ranking functions

• Can combine tf x idf-style text matching score with query-independent ranking features

• Rank expressions:

    (0.4 * log2(time()/31536000000 - year)) + (0.6 * text_relevance)
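The rank expression above can be evaluated by hand with a short Python sketch. Two assumptions are made loudly here. First, `time()` is taken to return milliseconds since 1970, which is consistent with the constant 31536000000 (the number of milliseconds in a 365-day year). Second, taken literally, `time()/31536000000` is the number of years since 1970, so subtracting a four-digit calendar year would make the logarithm's argument negative; the sketch therefore adds the 1970 offset so that the argument is the document's age in years. That adjustment is our reading, not something the slide states.

```python
import math

MS_PER_YEAR = 31536000000  # milliseconds in a 365-day year, as on the slide


def rank_score(text_relevance, year, now_ms):
    """Blend a query-independent age feature with the text-matching score.

    Assumption: now_ms is milliseconds since 1970, and 1970 is added so that
    subtracting a calendar year yields an age in years.
    """
    age_years = 1970 + now_ms / MS_PER_YEAR - year
    if age_years <= 0:
        raise ValueError("document year is not in the past")
    return 0.4 * math.log2(age_years) + 0.6 * text_relevance


# Example: 42 nominal years after 1970, a document from 1980 is 32 years old,
# so the age term contributes 0.4 * log2(32) = 0.4 * 5.
now_ms = 42 * MS_PER_YEAR
score = rank_score(text_relevance=100, year=1980, now_ms=now_ms)
print(score)
```

The weights 0.4 and 0.6 come from the slide; the `text_relevance` value of 100 is an arbitrary input chosen for the example.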

Elastic Scaling

To Recap

Benefits:

• Easy to make any semi-structured data searchable

• Easy to set up and configure

• No hardware or software management

• Scalable and elastic

Anyone can be a search provider.

Implications for Search User Experience

Progress in Search User Experience 1972-1995

• From highly structured boolean queries to unstructured text

• From binary matching to relevance-ranking

• From batch-like to interactive

• From command line to GUI

• From monospace 80 x 24 text to rich presentation

Web Search in 1995

Hardware & Software Capabilities

• Search engine designers passed along the absolute minimum needed to decide whether to click: title, URL, small excerpt.

• Being able to point and click on remote content was a big deal!

            1994
CPU         120 MHz Intel Pentium
Memory      512 KB
OS          Windows 3.1
Bandwidth   28.8 kbps
Cost        $2000

Hardware & Software Capabilities

            1994                    2012                    ∆
CPU         120 MHz Intel Pentium   2.5 GHz Intel Core i5   > 40x
Memory      1 MB                    4 GB                    4000x
OS          Windows 3.1             Windows 7, Mac OS X
Bandwidth   28.8 kbps               5.8 Mbps                200x
Cost        $2000                   $500-1000               0.5x

What have search engine UX designers done with all that additional power?
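The ∆ column in the 1994 versus 2012 comparison can be checked with a few lines of Python arithmetic, using only the figures on the slide. Binary units for memory (1 MB = 1024² bytes) are an assumption. The clock-rate ratio alone comes to roughly 21x, so the slide's "> 40x" for CPU presumably reflects more than clock rate; that reading, for instance counting multiple cores, is ours and is not stated in the talk.

```python
# Figures taken from the slide; binary memory units are an assumption.
cpu_clock_ratio = 2.5e9 / 120e6                    # 2.5 GHz vs 120 MHz
memory_ratio = (4 * 1024**3) / (1 * 1024**2)       # 4 GB vs 1 MB
bandwidth_ratio = 5.8e6 / 28.8e3                   # 5.8 Mbps vs 28.8 kbps
cost_ratio_low = 500 / 2000                        # cheapest 2012 machine
cost_ratio_high = 1000 / 2000                      # priciest 2012 machine

print(f"CPU clock: {cpu_clock_ratio:.1f}x")
print(f"Memory:    {memory_ratio:.0f}x")
print(f"Bandwidth: {bandwidth_ratio:.0f}x")
print(f"Cost:      {cost_ratio_low:.2f}x to {cost_ratio_high:.2f}x")
```

Memory and bandwidth land close to the slide's rounded 4000x and 200x, and the 0.5x cost figure corresponds to the upper end of the $500-1000 range.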

Web Search in 2012

Why Lack of Progress?

• Most users’ first (and sometimes only) experience with search is with web search

• Search dominated by a few players

“What makes a good search engine user experience?”

• Results as relevant as possible
• Short delay between query and results
• Clean and uncluttered presentation
• Gives user a feeling of direct engagement
• Allows seamless transition between search and browsing
• Fun to use
• Rewards user for giving more information
• Interaction appropriate for type of task
• Limit visual noise / optimize data-ink ratio
• Minimizes scrolling

D. E. Rose and S. Raju, “Encouraging Exploration with Elroy: A New User Experience for Web Search,” SIGIR 2007 Workshop on Exploratory Search and HCI

Optimizing for Other Properties

What does this have to do with CloudSearch?

• Unprecedented opportunity to build new search applications

• We’re not constrained by how web search works.

Not all search is web search.

Simple Illustrations with CloudSearch

A Typical Search Interface

Results Can Be Interactive (and Contain Lots of Information)

Different Interface Controls in Different Situations

Results Don’t Have to Be a List

Conclusions

• CloudSearch represents the next stage in the democratization of search.

• You no longer need to be a search expert to be a search provider.

• As the number and variety of search applications increases, we should see an increase in the variety of search interfaces.

Questions?

To learn more about CloudSearch:

[email protected]
https://aws.amazon.com/cloudsearch/

Thanks to Matt Amacker, Puneet Gupta, Asif Makhani, Brian Pinkerton, Joel Tesler