CloudSearch and the Democratization of Information Retrieval
Daniel E. Rose A9.com
SIGIR 2012 Portland
What Does A9 Do?
• Product Search
• Visual Search
• Advertising Technology
• Community Q&A
15 August 2012 D. Rose, CloudSearch, SIGIR 2012 2
… and CloudSearch
A new hosted search service offered by AWS
Democratizing Information Retrieval: A Brief History
Democratizing Information Retrieval
• Giving more users access to search tools, and making those tools easier to use and more powerful
• Giving more content owners (businesses, organizations, research teams, government offices, etc.) the ability to be search providers.
1970s: Online Metered Search Services
• Examples: Dialog, ORBIT, Lexis/Nexis, Westlaw, BRS
• Cost and requirements [users]:
  – Installation and rental of a dedicated terminal
  – Usage cost per hour (e.g. $50)
  – Cost per page printed, etc.
• Content available: Research corpora (e.g. journal articles), news stories, court cases.
• Improved access for: – Users (researchers, lawyers, etc.)
1970s: Online Metered Search Services
• Typical Query:
assum! /5 risk /p ic* snow*** snowfall /s slip! fell fall***
• Results: Often the first screen of the first retrieved document.
• Restrictions encouraged batch-style search
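The typical query above uses terms-and-connectors syntax. Assuming Westlaw conventions (a space means OR, "!" expands a root, "*" stands for characters, "/5" means within five words, and "/s" and "/p" mean same sentence and same paragraph), the Python sketch below shows how one connector, "/n", could be evaluated over a tokenized document. The helper names and the sample sentence are hypothetical, and the sentence and paragraph connectors are left out.

```python
import re


def term_to_regex(term: str) -> re.Pattern:
    """Compile one search term into a whole-word regex.

    Assumed semantics: a trailing '!' matches any ending ('assum!' -> assume,
    assumed, assumption); a run of k trailing '*' allows up to k more characters
    ('fall***' -> fall, falls, fallen, falling); an interior '*' is exactly one character.
    """
    stem, tail = term.lower(), ""
    if stem.endswith("!"):
        stem, tail = stem[:-1], r"\w*"
    else:
        stars = len(stem) - len(stem.rstrip("*"))
        if stars:
            stem, tail = stem[:-stars], r"\w{0,%d}" % stars
    body = "".join(r"\w" if ch == "*" else re.escape(ch) for ch in stem)
    return re.compile(body + tail + r"\Z")


def positions(tokens: list[str], term: str) -> list[int]:
    """Word offsets of every token that matches the term."""
    rx = term_to_regex(term)
    return [i for i, tok in enumerate(tokens) if rx.match(tok.lower())]


def within(tokens: list[str], left: str, right: str, n: int) -> bool:
    """The '/n' connector: some match of left lies within n words of some match of right."""
    return any(abs(i - j) <= n
               for i in positions(tokens, left)
               for j in positions(tokens, right))


tokens = "the plaintiff assumed the risk of a fall on the icy steps".split()
print(within(tokens, "assum!", "risk", 5))   # 'assumed' is two words from 'risk'
print(within(tokens, "assum!", "steps", 5))  # nine words apart
```

A full engine would index positions once rather than rescanning tokens per term; the linear scan keeps the connector logic visible.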
1980s: Enterprise Search Products
• Examples: Verity Topic, Personal Library Systems, Fulcrum SearchServer, Excalibur RetrievalWare.
• Cost and requirements:
  – $10-100K per year license fee, also charged per seat
  – Beefy hardware to install it on
• Improved access for: – Content owners (usually large businesses).
1990s: Web Search
• Examples: WebCrawler, Lycos, Infoseek, AltaVista, Excite, Inktomi, Yandex, Google, AllTheWeb, Teoma.
• Cost and requirements:
  – For users: free; a web browser
  – For search providers: a web server and high-speed Internet service
• Improved access for:
  – Users
  – Content owners (as long as their data was HTML, they put it on their website, and a search engine chose to crawl it)
2000s: Open Source Search
• Examples: Lucene/Solr, Indri
• Cost and requirements [providers]:
  – No cost for software
  – Need hardware to run it on
• Improved access for:
  – Content owners (with resources and expertise)
2010s: CloudSearch
• Put your content in the cloud and make it searchable
• You decide what content gets searched and who can see it
• Self-service
• Improves access for:
  – Content owners
An Introduction to CloudSearch
What Is Amazon CloudSearch?
• A hosted web search service developed by A9
• Powered by the same search engine used by Amazon.com and other retailers
• Designed from the ground up to support:
  – semistructured data
  – faceted metadata search
  – numeric range searches
  – memory-resident indexes
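To make semistructured data, faceted metadata search, and numeric range searches concrete, here is a minimal in-memory Python sketch. It is not CloudSearch's implementation: the documents, field names, and helper functions are hypothetical, and the genre labels are illustrative. Documents need not share the same fields, a numeric range restricts the hits, and facet counts tally the values of a multi-valued field across those hits.

```python
from collections import Counter

# Hypothetical semistructured documents: fields vary from document to document,
# and 'genre' is multi-valued. Genre labels are illustrative only.
MOVIES = [
    {"id": "m1", "title": "Jaws", "director": "spielberg", "year": 1975, "genre": ["thriller"]},
    {"id": "m2", "title": "Star Wars", "director": "lucas", "year": 1977, "genre": ["sci-fi", "adventure"]},
    {"id": "m3", "title": "Close Encounters of the Third Kind", "director": "spielberg",
     "year": 1977, "genre": ["sci-fi", "drama"]},
    {"id": "m4", "title": "Raiders of the Lost Ark", "director": "spielberg", "year": 1981, "genre": ["adventure"]},
    {"id": "m5", "title": "Untitled draft", "director": "lucas"},  # no year, no genre
]


def filter_docs(docs, field, allowed, num_field, lo, hi):
    """Keep docs whose field value is in allowed and whose numeric field lies in [lo, hi]."""
    hits = []
    for doc in docs:
        value = doc.get(num_field)
        if doc.get(field) in allowed and value is not None and lo <= value <= hi:
            hits.append(doc)
    return hits


def facet_counts(hits, facet_field):
    """Count how many hits carry each value of a (possibly multi-valued) facet field."""
    counts = Counter()
    for doc in hits:
        counts.update(doc.get(facet_field, []))
    return counts


hits = filter_docs(MOVIES, "director", {"spielberg", "lucas"}, "year", 1975, 1980)
print([d["title"] for d in hits])
print(facet_counts(hits, "genre"))
```

Facet counts are computed over the filtered hits rather than the whole collection, which is what lets a user see how many results remain under each value before drilling down.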
CloudSearch Uses AWS Services
• Elastic Compute Cloud (EC2) for computation
• Simple Storage Service (S3) for storage
• Elastic MapReduce (EMR) for index construction
• Simple Workflow Service (SWF) for coordinating customer actions
• Elastic Load Balancing (ELB) for routing traffic
CloudSearch Dashboard
Setting Up the Data
Indexing Documents (additions, updates, deletions)
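Additions, updates, and deletions are submitted together as batches. As we understand the 2012-era API, batches were written in Search Data Format (SDF): a JSON array of "add" and "delete" operations, each carrying a document id and a version number, with an update expressed as an "add" that reuses an existing id with a higher version. Treat the exact field names below as assumptions to check against the CloudSearch API reference; the ids and field values are hypothetical.

```python
import json


def add_op(doc_id, version, fields, lang="en"):
    """An addition, or an update when doc_id already exists (assumed: version must be higher)."""
    # Assumed SDF shape for adds: type, id, version, lang, fields.
    return {"type": "add", "id": doc_id, "version": version, "lang": lang, "fields": fields}


def delete_op(doc_id, version):
    """A deletion (assumed: version must exceed that of the stored document)."""
    # Assumed SDF shape for deletes: type, id, version only.
    return {"type": "delete", "id": doc_id, "version": version}


batch = [
    # New document.
    add_op("m3", 1, {"title": "Close Encounters of the Third Kind", "director": "spielberg", "year": 1977}),
    # Update: same id as an existing document, higher version.
    add_op("m1", 2, {"title": "Jaws", "director": "spielberg", "year": 1975}),
    # Deletion.
    delete_op("m5", 2),
]

sdf = json.dumps(batch, indent=2)
print(sdf)
```

Versioning each operation lets the service apply batches that arrive out of order without an older change overwriting a newer one.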
Testing Queries from the Dashboard
Testing Queries from a Web Browser
Search API
q = close+encounters
bq = (and (or director:'spielberg' director:'lucas') year:1975..1980)
rank = -year,title
return-fields = director,title,year
facet = genre
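The search parameters are sent as an HTTP GET query string. The Python sketch below assembles such a request URL. The endpoint hostname is a placeholder (each search domain has its own endpoint), and the /2011-02-01/search path reflects our understanding of the API version current in 2012, so treat both as assumptions. Standard URL encoding turns the space in q into "+", which is why the query appears as close+encounters.

```python
from urllib.parse import urlencode

# Placeholder endpoint (assumption): substitute your own search domain's endpoint.
SEARCH_ENDPOINT = "http://search-movies-xxxxxxxx.us-east-1.cloudsearch.amazonaws.com"
API_VERSION = "2011-02-01"  # assumed API version path for the 2012-era service


def build_search_url(params: dict) -> str:
    """Join endpoint, version, and URL-encoded parameters into one GET request URL."""
    return f"{SEARCH_ENDPOINT}/{API_VERSION}/search?{urlencode(params)}"


params = {
    "q": "close encounters",  # free-text query
    "bq": "(and (or director:'spielberg' director:'lucas') year:1975..1980)",  # structured boolean query
    "rank": "-year,title",  # newest first, then by title
    "return-fields": "director,title,year",
    "facet": "genre",
}

url = build_search_url(params)
print(url)
```

Letting urlencode handle escaping avoids hand-encoding the parentheses, colons, and quotes in the bq expression.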
CloudSearch Relevance Ranking
• Configurable ranking functions
• Can combine a tf × idf-style text-matching score with query-independent ranking features
• Rank expressions, e.g.:
  (0.4 * log2(time()/31536000000 - year)) + (0.6 * text_relevance)
Elastic Scaling
To Recap: Benefits
• Easy to make any semi-structured data searchable
• Easy to set up and configure
• No hardware or software management
• Scalable and elastic
Anyone can be a search provider.
Implications for Search User Experience
Progress in Search User Experience, 1972-1995
• From highly structured boolean queries to unstructured text
• From binary matching to relevance ranking
• From batch-like to interactive
• From command line to GUI
• From monospace 80 x 24 text to rich presentation
Web Search in 1995
Hardware & Software Capabilities
• Search engine designers presented the absolute minimum needed to decide whether to click: title, URL, and a small excerpt.
• Being able to point and click on remote content was a big deal!
1994:
• CPU: 120 MHz Intel Pentium
• Memory: 512 KB
• OS: Windows 3.1
• Bandwidth: 28.8 kbps
• Cost: $2000
Hardware & Software Capabilities
1994 → 2012 (∆):
• CPU: 120 MHz Intel Pentium → 2.5 GHz Intel Core i5 (> 40x)
• Memory: 1 MB → 4 GB (4000x)
• OS: Windows 3.1 → Windows 7, Mac OS X
• Bandwidth: 28.8 kbps → 5.8 Mbps (200x)
• Cost: $2000 → $500-1000 (0.5x)
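The ∆ figures can be checked with simple ratios. Memory (4 GB / 1 MB = 4096, about 4000x) and bandwidth (5.8 Mbps / 28.8 kbps, about 201x) match the rounded values; the clock-speed ratio alone is only about 21x, so the > 40x CPU figure presumably also reflects multiple cores and per-clock improvements (our assumption; the slide does not say), and the cost ratio spans 0.25x to 0.5x across the quoted price range.

```python
# Figures from the 1994 and 2012 columns of the capability comparison.
clock_ratio = 2.5e9 / 120e6              # 2.5 GHz vs. 120 MHz, clock speed only
memory_ratio = (4 * 2**30) / 2**20       # 4 GB vs. 1 MB, binary units
bandwidth_ratio = 5.8e6 / 28.8e3         # 5.8 Mbps vs. 28.8 kbps
cost_ratios = (500 / 2000, 1000 / 2000)  # $500-1000 vs. $2000

print(f"clock {clock_ratio:.1f}x, memory {memory_ratio:.0f}x, "
      f"bandwidth {bandwidth_ratio:.0f}x, cost {cost_ratios[0]}-{cost_ratios[1]}x")
```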
What have search engine UX designers done with all that additional power?
Web Search in 2012
Why the Lack of Progress?
• Most users’ first (and sometimes only) experience with search is with web search
• Search dominated by a few players
“What makes a good search engine user experience?”
• Results as relevant as possible
• Short delay between query and results
• Clean and uncluttered presentation
• Gives user a feeling of direct engagement
• Allows seamless transition between search and browsing
• Fun to use
• Rewards user for giving more information
• Interaction appropriate for type of task
• Limits visual noise / optimizes data-ink ratio
• Minimizes scrolling
D. E. Rose and S. Raju, “Encouraging Exploration with Elroy: A New User Experience for Web Search,” SIGIR 2007 Workshop on Exploratory Search and HCI.
Optimizing for Other Properties
What does this have to do with CloudSearch?
• Unprecedented opportunity to build new search applications
• We’re not constrained by how web search works.
Not all search is web search.
Simple Illustrations with CloudSearch
A Typical Search Interface
Results Can Be Interactive (and Contain Lots of Information)
Different Interface Controls in Different Situations
Results Don’t Have to Be a List
Conclusions
• CloudSearch represents the next stage in the democratization of search.
• You no longer need to be a search expert to be a search provider.
• As the number and variety of search applications increases, we should see an increase in the variety of search interfaces.
Questions?
To learn more about CloudSearch:
[email protected] https://aws.amazon.com/cloudsearch/
Thanks to Matt Amacker, Puneet Gupta, Asif Makhani, Brian Pinkerton, Joel Tesler