This tutorial, offered at the 10th International Conference on Web Engineering, presents the peculiarities of advanced Web search applications, describes some tools and techniques that can be exploited, and offers a methodological approach to development. The proposed approach is based on the paradigm of Model Driven Development (MDD), where models are the core artifacts of the application life-cycle and model transformations progressively refine models to achieve an executable version of the system. To cope with the process-intensive nature of the main interactions (e.g., content analysis, query management), we describe the use of Process Models (e.g., BPMN models). Indeed, search-based applications are considered process- and content-intensive applications, due to the trends towards exploratory search and the "search as a process" vision.
- 1. Engineering Web Search Applications. Alessandro Bozzon, Marco Brambilla. Vienna, July 5, 2010
2. About the speakers
- Alessandro Bozzon: Post-doc @ Politecnico di Milano
- http://home.dei.polimi.it/bozzon
- Marco Brambilla: Assistant Professor @ Politecnico di Milano
- http://home.dei.polimi.it/mbrambil
2010 Alessandro Bozzon, Marco Brambilla
- Research background and interests
- Web engineering and model-driven development
- Complex enterprise application design
- BPM, SOA and integration with Web application development
- Search engine and complex search application development
- Search Computing: multidomain search
- Pharos: multimedia search framework
3. About the tutorial
- Information Retrieval is a more than 40-year-old discipline, tackled from a myriad of viewpoints
- Development process driven
- using real-world case studies as examples
- The tutorial is necessarily shallow
- But we provide references and links
4. Agenda
5. AGENDA
- What are Web search applications?
- What are their requirements?
- How to measure their success?
6. Introduction
7. Search prevails
- Search is an integral part of people's online life
- Web search has become a standard (and often preferred) means of finding information
- "... 92% of Internet users say the Internet is a good place to go for getting everyday information ..." (2004 Pew Internet Survey)
- Web search engines are now the second most frequently used online computer application, after email
- Search is fully integrated into operating systems and is viewed as an essential part of most information systems
8. Some numbers
- Estimated size: ~60 billion pages (22/06/2010)
- http://www.worldwidewebsize.com/
- > 9.3 billion queries just in the U.S. in May 2010
- http://blog.nielsen.com/nielsenwire/online_mobile/top-u-s-search-sites-for-may-2010/
- # of new tweets per day: 55 million
- # of search queries per day: 600 million
- 400 million global users (and growing)
- The average Facebook user spends 55 minutes per day
9. More numbers
- IDC Digital Universe report estimates:
- digital data grew by 62% between 2008 and 2009
- it is expected to reach 35 ZB (zettabytes) by 2020
[Ramakrishnan and Tomkins 2007]
10. Information Retrieval
- Information retrieval (IR) deals with the representation, storage, organization of, and access to information items.
- As an academic field of study:
- Information retrieval (IR) is devoted to finding relevant documents, not finding simple matches to patterns.
- Information retrieval (IR) is finding material (usually documents) of an unstructured nature (usually text) that satisfies an information need from within large collections (usually stored on computers).
11. Information Retrieval Applications
- Static document collection
- Document collection constantly changing
- Example: corporate mails routed by predefined queries to different parts of the organization
[Figure labels: Static Document Collection, Ad-Hoc query, Ranked Result; Document Routing System, Predetermined queries or User profiles, Incoming Documents]
12. The nature of information retrieval
- Retrieving all objects which might be useful or relevant to the user information need
- Usually unstructured queries (no formal semantics)
- The IR system interprets the contents of the information items
- Examples: keyword-based queries, context queries, proximity, phrases, natural language queries
- Also structural queries and, in recent systems, structured query languages are supported (but with a different semantics)
- Errors in the results are tolerated
- Relevance ranking (according to the user need)
- It is not clear what degree of relevance the user is happy with
- The user starts from the top of the ranked list and explores down until satisfied
13. Information Retrieval is NOT Data Retrieval
- Data Retrieval (RDBMS, XML DB)
- Retrieving all objects which satisfy clearly defined conditions expressed through a query language
- Data has a well-defined structure and semantics
- Regular expressions, relational algebra expressions, etc.
- Results are EXACT matches: errors are not tolerated
- No ranking w.r.t. the user information need
- Binary retrieval: does not allow the user to control the magnitude of the output
- For a given query, the system may return:
14. The Information Retrieval Process
[Figure labels: Generic search-oriented application; BACK END, FRONT END; Content Management, Query analysis, Query Interaction, Search, Result Composition, Result Manipulation; q, r]
15. Search Engine vs. Search Application
- Search engine: a data management system which uses information retrieval algorithms to retrieve information items from one or more sources upon the submission of a query
- Search application: a data management system where search engines are a piece of a more complex puzzle, that includes:
- data source integration (e.g., databases, legacy systems, the Web)
- content analysis technologies orchestration
- Web-mediated social interactions, etc.
16. Characterization of the user information need
- It is not a simple problem:
- Gap between the object in the world and the information in a (computational) description
- Lack of coincidence between the (computational) description of the information and their interpretation
17. Evaluating an IR System
- Precision: fraction of retrieved docs that are relevant
- degree of soundness of the system
- not considering the total number of documents
- Recall: fraction of relevant docs that are retrieved
- degree of completeness of the system
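The two measures above can be sketched in a few lines. This is a minimal illustration, assuming that both the retrieved results and the relevance judgments for a query are available as collections of document identifiers; the function name and the sample identifiers are hypothetical.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Four documents returned, three documents judged relevant, two in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"], ["d1", "d3", "d7"])
```

Here two of the four retrieved documents are relevant (precision 2/4) and two of the three relevant documents are retrieved (recall 2/3).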
18. Enterprise search
- Public Web search engines are the ones known to the general public
- But there is also a huge need (and market share!) for professional search over enterprise repositories
- Enterprise search is covered by
19. Case Studies
- The Search Computing project
- Example of Web Search Application
20. YaGoBi
- 92% of market share in the U.S.
- Web pages, Blogs, News, Books, Scientific Publications, Emails
- Images and Videos (but only through textual descriptions)
21. The PHAROS Project
- FP6 IP, 3 years, 12 partners, ~15 M budget
- Mission: develop an SOA-compliant, open and distributed technology platform for the development of information access solutions for audio-visual content
- www.pharos-audiovisual-search.eu
22. The Search Computing Project
- European Research Council (ERC), 2008 Call for "IDEAS Advanced Grants", 5y (started in 2009)
- Mission: provide the abstractions, foundations, methods, and tools required to answer multi-domain queries by interacting with a constellation of cooperating search services, using ranking and joining of results
- as the dominant factors for service
23. Chansonnier
- Open source video analysis
- open frameworks (SMILA / SOLR)
- Keyframe extraction for video snippets
- http://github.com/giorgiosironi/Chansonnier
24. Requirements
25. Key Requirements and Design Dimensions for Web Search
26. Data Sources
- Sensors (in a wide sense) and streams
27. Data Type
28. Data Analysis
- An activity performed for the purpose of providing a representation of a content item suited for the application
- Deals with basic language units (morphemes, roots, stems, words, phrases, sentences, etc.)
- Deals with media contents
29. Search Engine _1
- Textual contents represented as a collection of unstructured text terms
- Textual contents structured in fields (e.g., metadata)
- Textual contents organized in a complex (possibly heterogeneous) structure (e.g., XML, HTML)
30. Search Engine _2
- Media contents described by low-level features
- Geographic and other special dimensions
- Content featuring geo-spatial features
- Streaming content searched by temporal features (e.g., recency)
31. Query Format
- Representation of the user information need
- For instance through vocal interfaces
- Set of text items, plus Boolean (AND/OR/NOT), proximity (lexical nearness) and/or wildcard conditions
- Text items defined on one or more fields
- Queries to semi-structured search engines and faceted queries
- Query by example (text, image, video, audio, etc.)
- Geographic and other special dimensions
- Geographic coordinates plus spatial operator terms (near, north of, within X kilometers from, etc.)
- Timestamps plus temporal operator terms (recent, near, interval, etc.)
32. YaGoBi
- Web: crawling of Web resources
- Users: comments, preferences, relationships
- Unstructured data: Web pages
- Documents: PDF, PPT, DOC, etc.
- Textual: for content, document, and user-generated comments
- Media: some basic image analysis for color, faces, size
- Fielded: filetype, page title, site, page content
- Content-based: image similarity in Google
33. PHAROS
- Web: crawling of audio/video files
- File System: NAS and content provider media archives
- Users: comments, preferences, relationships
- Structured data: content provider description metadata
- Media: high-quality video and audio files
- Semi-structured data: MPEG-7 description of processed media files and user annotations
- Textual: for content metadata and user-generated comments
- Media: for audio and video
- Audio/video mood classification, image concept classification, music genre, danceability classification, face recognition and identification, speech to text
34. PHAROS
- Semi-structured: XML search engine for MPEG-7 content description
- Plus geographic annotations and geo-based ranking
- 3 content-based engines:
- one for images (shots of the video)
- Fielded-keyword: XQuery for XML search engine
- Query by example: for image, music and faces
- MPQF: high-level query language
- AND/OR/AND THEN for fielded keyword and by-example queries
35. Query Federation in PHAROS
[Figure labels: Query analysis, Federation; JPG, Long/Lat, XPath, Keywords; amsterdam; here[contains(amsterdam)]and opic[contains(building)]; Geo search, R-tree index, 52.37N 4.89E; Text search, Inverted index; XML search, Semantic index; Image search, Similarity index]
36. User Behavior
- People don't want to search
- People want to get tasks done and get answers
- Moving towards identifying a user's task
- Enabling means for task completion
- Support the user in the search process
- (try to) Infer the user intent to help them accomplish their task
Ricardo Baeza-Yates, Next Generation Search, 2nd SeCo Workshop, Milan, 24/06/2010
[Figure labels: Start, End; "I am craving for a good Wiener Schnitzel and a Sachertorte in Vienna"; Search, Menu, Reviews, Map]
37. Information Seeking [Bates, 2002]
Bates, Marcia J. 2002. Toward an integrated model for information seeking and searching. In: The Fourth International Conference on Information Needs, Seeking and Use in Different Contexts.
38. Information Foraging
- Information foraging applies ideas from optimal foraging theory to understand how human users search for information.
- Assumption: humans use "built-in" foraging mechanisms that evolved to help our animal ancestors find food.
- Fu, Wai-Tat; Pirolli, Peter (2007), "SNIF-ACT: a cognitive model of user navigation on the world wide web", Human-Computer Interaction: 335-412
- Jason Withrow, "Do your links stink?", American Society for Information Science Bulletin, June 1, 2002
- Pirolli, Peter (2009), "An elementary social information foraging model", Proceedings of the 27th international conference on Human factors in computing systems: 605-614
39. Moving between patches
- Patches of information = websites
- Problem: should I continue foraging in the current patch or look for another patch?
- Expected gain from continuing in current patch vs. moving to another
40. Information seeking funnel [D. Rose, 2008]
- Wandering: the user does not have an information-seeking goal in mind.
- Exploring: the user has a general goal but not a plan for how to achieve it.
- Seeking: the user has started to identify information needs that must be satisfied, but the needs are open-ended.
- Asking: the user has a very specific information need that corresponds to a closed-class question.
41. Berrypicking vs. Orienteering vs. Teleporting ...
- Information needs change during interactions
- M.J. Bates. The design of browsing and berrypicking techniques for the online search interface. Online Review, 13(5):407-431, 1989.
- Orienteering [Teevan et al., CHI 2004]: the searcher issues a quick, imprecise query to get to approximately the right region of the information space, and then follows known paths that require small steps that move them closer to their goal. Easy! (perfect query not needed)
- Teleporting: expert searchers issue longer queries to jump directly to the target. Requires more effort and experience.
42. vs. exploratory search
- Exploratory search: the user's intent is primarily to learn more on a topic of interest, by exploring various directions and sources
- exploratory search blends querying and browsing strategies and is different from retrieval, which is best served by analytical strategies
- Marchionini, G. Exploratory search: from finding to understanding. Communications of the ACM 49(4): 41-46 (2006)
- Definition and analysis of the problem
- White, R. W., and Drucker, S. M. Investigating behavioral variability in web search. 16th WWW Conf. (Banff, Canada, 2007)
- Complex Search and Exploratory Search
- Aula, A., and Russell, D.M. Complex and Exploratory Web Search. ISSS: Information Seeking Support Systems Workshop (Chapel Hill, June 2008)
43. Multi-domain Exploratory Search
- Search for upcoming concerts close to an attractive location (like a beach, lake, mountain, natural park, and so on), considering also the availability of good, close-by hotels
- Current approach the user can adopt:
- Independently explore search services
- Manually combine findings
44. Multi-domain Exploratory Search
- Expand the search to get information about available restaurants near the candidate concert locations, news associated with the event, and possible options to combine further events scheduled on the same days and located close to the first one
45. Existing Approaches _1
- Topic-based search: instance of exploratory search centered on the goal of collecting information on a subject matter of interest from multiple sources
- Kosmix: topic discovery engine, keyword search; a topic page summarizes the most relevant information on the subject
- Hakia: resume pages for topics associated with users' queries, natural language processing techniques
46. Existing Approaches _2
- Structured Object Search: process queries and present results that address entities or real-world objects described in Web pages
- Google Squared: keyword search, results collected in a table (called a square) featuring all the attributes relevant to the result items as column headers
- Google Fusion Tables: upload data tables (e.g., spreadsheet files) and join (or fuse) the data in some column with other tables
47. The note-taking limit
- There is a limit after which the found options need to be noted down. [Aula and Russell, 2008]
48. Liquid Queries
- A new paradigm allowing users to formulate and get responses to multi-domain queries through an exploratory information seeking approach, based upon structured information sources exposed as software services
- Composite answers obtained by aggregating search results from various domains
- Highlight the contribution of each search service
- Join of results based on the structural information afforded by the search service interfaces
- Alessandro Bozzon, Marco Brambilla, Piero Fraternali, Stefano Ceri. Liquid Query: multi-domain exploratory search on the Web. WWW 2010, Raleigh, USA
49. Liquid Queries Definition _1
- It consists of subsetting and parametrizing the resource graph...
[Figure labels: resource graph with Concert, Artist, Exhibition, Restaurant, Hotel, Movie, Metro Station, Theatre, Photo, Landmark, News; selected subset with Photo, Concert, Metro Station, Restaurant, News, Exhibition, Artist, Hotel; legend: inputs, outputs, GR = global ranking]
50. Liquid Queries Definition _2
- And then characterizing the user interaction
- Parametrization of global ranking
- Data visualization options
[Figure labels: Photo, Concert, Metro Station, Restaurant, News, Exhibition, Artist, Hotel; Expand]
51. Result Exploration Support
- If the current set of combinations is not satisfactory, the user may ask for more values for a service (more one) or for all services (more all)
- More concerts, more hotels, or more combinations
- Add new information about further domains for selected combinations (expand)
- Find close-by restaurants or co-located events
- Aggregate information to ease analysis and readability (clustering, grouping)
- Reduce the number of shown items through filtering
- Total walked distance for the night
- Re-order (ranking or sorting)
- Calculate derived values from existing ones
- Total walked distance for the night
- Alternative data visualization
- Map, parallel coordinates,
- http://demo.search-computing.org
52. User Intent
- Understand the user information need
- User intent taxonomy (Broder 2002)
- Informational: want to learn about something (~40% / 65%)
- Navigational: want to go to a given page (~25% / 15%)
- Transactional: want to do something (web-mediated) (~35% / 20%)
[from SIGIR 2008 Tutorial, Baeza-Yates and Jones]
[Figure labels: History; nyonya food, Singapore Airlines, Jakarta Weather, Nikon Finepix, Car Rental Kuala Lumpur]
53. Contextual Content Delivery
- Context vs. Personalization
- Trigger the right search depending on the context
- Not interested in your personal profile
- Your favorite restaurant?
- It depends on where you are!
from Ricardo Baeza-Yates, Next Generation Search, 2nd Search Computing Workshop, Milan, 24/06/2010
Demo: http://sandbox.yahoo.com/Motif
54. Relevance: the Top-k problem
- Relevance of the results with respect to the request is the main expectation for search engine users
- Top-k relevant items: quickly retrieve a number (k) of highest-ranking tuples in the presence of monotone ranking functions defined on the attributes of underlying relations
- R. Fagin. Combining fuzzy information from multiple systems. J. Comput. Syst. Sci., 58(1):83-99, 1999.
- F. Ilyas, R. Shah, W. G. Aref, J. S. Vitter, and A. K. Elmagarmid. Rank-aware query optimization. In SIGMOD Conference, pages 203-214, 2004
- D. Martinenghi and M. Tagliasacchi. Proximity Rank Join, to appear in PVLDB
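To make the top-k idea concrete, the following is a minimal sketch of the threshold algorithm (TA), a well-known technique from the same line of work as the Fagin reference above. It is not necessarily the method presented in the tutorial. It assumes every source scores every item, that each source supports both sorted and random access, and that the aggregation function is monotone. The function name and the sample data are hypothetical.

```python
import heapq


def threshold_top_k(sources, k, aggregate=sum):
    """Top-k items over several scored sources (threshold algorithm sketch).

    sources: list of dicts mapping item -> score; assumes every item
    appears in every source and `aggregate` is monotone.
    """
    # Sorted access: each source lists its items by descending score.
    sorted_lists = [sorted(s.items(), key=lambda kv: -kv[1]) for s in sources]
    seen = {}
    max_depth = max(len(lst) for lst in sorted_lists)
    for depth in range(max_depth):
        last_scores = []
        for lst in sorted_lists:
            item, score = lst[min(depth, len(lst) - 1)]
            last_scores.append(score)
            if item not in seen:
                # Random access: fetch this item's score from every source.
                seen[item] = aggregate(s.get(item, 0.0) for s in sources)
        # No unseen item can score higher than the aggregate of the
        # scores last read under sorted access.
        threshold = aggregate(last_scores)
        top = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(top) == k and top[-1][1] >= threshold:
            return top
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])


# Hypothetical scores of three places according to two search services.
hotel_scores = {"h1": 0.9, "h2": 0.8, "h3": 0.3}
concert_scores = {"h1": 0.2, "h2": 0.7, "h3": 0.9}
best = threshold_top_k([hotel_scores, concert_scores], k=1)
```

In this example the scan stops after two rounds of sorted access, because the best item seen so far already reaches the threshold, so the third entry of each list is never read.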
55. Result Diversification
- Relevance is not the only success factor for a result set
- User satisfaction is increased if the first items cover a good spectrum of options
- If user intent is ambiguous, diversification tries to cover the most likely intents
- If several top-k items are very similar, they can be clustered together
- Thus: an optimization problem
- Objective: find the set of k elements that contains the most relevant and diverse items
- Maximal Marginal Relevance [Carbonell and Goldstein 1998]
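The greedy selection behind Maximal Marginal Relevance can be sketched as follows. This is a simplified illustration, assuming precomputed relevance scores and a pairwise similarity function; the parameter `lam` (trade-off between relevance and diversity), the function names, and the sample data are assumptions made for the example.

```python
def mmr_select(relevance, similarity, k, lam=0.5):
    """Greedily pick k items, trading off relevance against redundancy.

    relevance: dict item -> relevance to the query
    similarity: function (item, item) -> similarity in [0, 1]
    lam: 1.0 means pure relevance, 0.0 means pure diversity
    """
    selected = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def mmr_score(item):
            # Redundancy: similarity to the closest already-selected item.
            redundancy = max((similarity(item, s) for s in selected), default=0.0)
            return lam * relevance[item] - (1 - lam) * redundancy

        best = max(sorted(candidates), key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical data: "a" and "b" are near-duplicates, "c" is different.
relevance = {"a": 0.9, "b": 0.85, "c": 0.5}


def sim(x, y):
    return 0.95 if {x, y} == {"a", "b"} else 0.1


diverse = mmr_select(relevance, sim, k=2, lam=0.5)
relevant_only = mmr_select(relevance, sim, k=2, lam=1.0)
```

With `lam=0.5` the near-duplicate "b" is skipped in favor of the less relevant but different "c"; with `lam=1.0` the selection falls back to plain relevance ranking.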
[Figure labels: Relevance, Diversity]
56. User Interface
- More complete information on one search
[Figure labels: Shortcuts, Deep Links, Enhanced Results]
57. User Interface
58. User Interface
- Optimization of the result set layout (and of page space)
59. User Interface
- Optimization of the result set layout (and of page space)
60. User Interface
- Optimization of the result set layout (and of page space)
61. User Interface
- Optimization of the result set layout (and of page space)
62. Performance
- Users don't want to waste their time waiting for a search result
- Performance is the leading factor in the evaluation of Web search applications
- Queries per second (QPS)
- Service-oriented computing
- Content Delivery Networks
- But intellectual property may be a concern
- More in section (ARCHITECTURE)
63. Other Requirements
- User relationships and actions as additional content description
- Collection vs. item level
- Who I am = What I like + What I do + Where I am?
- A search process tells a lot about who is doing it
- Alessandro Bozzon, Tereza Iofciu, Wolfgang Nejdl, Antonio V. Taddeo, Sascha Tönnies. Role Based Access Control for the interaction with Search Engines, (COOPER) 2007, Crete, Greece.
64. Design
65. Designing Web Search Applications
- Reference execution processes
- Tools supporting the methodology
66. Search Applications from 1000 feet
67. Bird's-eye view on Search Applications
68. Search Application Processes
69. An example of Indexing Process
70. Pharos: the architecture
71. Search Computing: the architecture
[Figure labels: Main Query flow relation]
72. Search Computing: the architecture
- High level query: Where can I attend a DB scientific conference close to a beautiful beach reachable with cheap flights?
- Sub query 1: Where can I attend a DB scientific conference?
- Sub query 2: place close to a beautiful beach?
- Sub query 3: place reachable with cheap flight?
73. Search Computing: the architecture
- Low level query 1: ConfSearch(DB,placeX,dateY)
- Low level query 2: TourSearch(Beach,PlaceX)
- Low level query 3: Flight(cost
Finland? Finlands?, Finlands?
- Hewlett-Packard -> Hewlett and Packard as two tokens?
- San Francisco: one token or two? How do you decide it is one token?
- Language issues (normalization)
- Accents: résumé vs. resume.
- L'ensemble -> one token or two?
- How are your users likely to write their queries for these words? Use locale?
- Punctuation (e.g., U.S.A. vs. USA)
- Numbers (100.45 vs. 100,45 vs. 1.0045 E+2)
- Dates (e.g., March 1st 2009 vs. 03/01/09 vs. 1/03/2009)
- It depends on the addressed language
- E.g., in Chinese spaces do not separate words
- (tokenization based on vocabulary)
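A few of the normalization choices listed above can be illustrated with a short sketch. It is only one possible policy, assuming space-separated languages: accents are stripped, case is folded, hyphenated and apostrophized words stay single tokens, and periods inside acronyms are dropped. The function names and the regular expressions are choices made for this example, not rules from the tutorial.

```python
import re
import unicodedata


def strip_accents(text):
    """Remove diacritics, e.g. 'résumé' -> 'resume'."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def tokenize(text):
    """Split text into normalized tokens (one possible policy)."""
    text = strip_accents(text).lower()
    # Keep internal '.', apostrophes and hyphens, so 'hewlett-packard',
    # "l'ensemble" and '100.45' each remain a single token.
    tokens = re.findall(r"\w+(?:[.'-]\w+)*", text)
    # Collapse dotted acronyms: 'u.s.a' -> 'usa'.
    return [
        t.replace(".", "") if re.fullmatch(r"(?:[a-z]\.)+[a-z]", t) else t
        for t in tokens
    ]


tokens = tokenize("Résumé: U.S.A. vs. USA, Hewlett-Packard, L'ensemble, 100.45")
```

Under this policy "U.S.A." and "USA" map to the same token, while "Hewlett-Packard" is kept whole. The opposite choices are equally defensible, as long as documents and queries are processed the same way.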
127. Stopword Removal
- Removal of high-frequency words, which carry less information
- Statistical analysis on the indexed collection
- Functional terms (articles, conjunctions, auxiliary verbs)
- A-priori knowledge, based on the IR system domain
- Creation of a stop-list with all the terms to remove
- English stop list is about 200-300 terms (e.g., been, a, about, otherwise, the, etc.)
- http://www.dcs.gla.ac.uk/idom/ir_resources/linguistic_utils/stop_words
- < 30% - 50% of tokens (smaller dictionary)
- It can decrease recall (e.g., "to be or not to be", "let it be")
- Most Web search engines do not remove stopwords [ManningIR]
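The statistical route to a stop-list can be sketched as below. This is a minimal illustration, assuming already-tokenized documents; the document-frequency threshold `max_doc_ratio` and the function names are hypothetical choices.

```python
from collections import Counter


def build_stop_list(documents, max_doc_ratio=0.9):
    """Collect terms that occur in at least max_doc_ratio of the documents.

    documents: list of token lists.
    """
    doc_freq = Counter()
    for tokens in documents:
        doc_freq.update(set(tokens))  # count each term once per document
    total = len(documents)
    return {term for term, df in doc_freq.items() if df / total >= max_doc_ratio}


def remove_stopwords(tokens, stop_list):
    """Drop every token that appears in the stop-list."""
    return [t for t in tokens if t not in stop_list]


docs = [
    "the cat sat on the mat".split(),
    "the dog sat on the log".split(),
    "the bird is on the roof".split(),
]
stop_list = build_stop_list(docs)
filtered = remove_stopwords(docs[0], stop_list)

# The recall problem: a query made only of stopwords disappears entirely.
hamlet = remove_stopwords("to be or not to be".split(), {"to", "be", "or", "not"})
```

The last line shows why stopword removal can hurt recall: nothing is left of the query to match against the index.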
128. Phrases (noun groups)
- Phrases capture the meaning behind the bag of words and result in multi-term phrases
- Added to the query: a query New York should be modified to search for "New York" (> 10% in precision and recall)
- Replace terms in index: empirically considered not as good as query rewriting
129. Phrases (noun groups) - Strategies
- Many systems identify phrases as any pairs of terms not separated by:
- Phrases occurring fewer than 25 times are removed (decrease in memory requirements)
- Part-of-speech and word-sense tagging
- statistical or rule-based methods to identify the part of speech (noun, verb, adjective) of each token
- Identify the key syntactic components of a sentence, usually by tagging according to POS and then applying a grammar (FSA and NFSA)
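The pair-based strategy can be sketched as follows. The list of separators is cut off in the slide, so this example assumes that punctuation is the only separator; that assumption, the function name, and the sample text are hypothetical. The frequency cut-off corresponds to the threshold of 25 mentioned above and is lowered in the usage example to fit the tiny sample.

```python
import re
from collections import Counter


def candidate_phrases(documents, min_count=25):
    """Count adjacent term pairs and keep the frequent ones as phrases."""
    counts = Counter()
    for text in documents:
        # Assumption: punctuation breaks a phrase, so pairs never span it.
        for segment in re.split(r"[.,;:!?()]", text.lower()):
            words = segment.split()
            counts.update(zip(words, words[1:]))
    return {" ".join(pair): n for pair, n in counts.items() if n >= min_count}


docs = ["New York is big. I love New York, truly."]
phrases = candidate_phrases(docs, min_count=2)
```

Only "new york" survives the cut-off in this sample. The pair "york truly" is never counted, because the comma separates the two words.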
130. Thesauri
- A thesaurus is a classification scheme composed of words and phrases whose organization aims at facilitating the expression of ideas in written text
- E.g.: synonyms and homonyms
- Example entry from Roget's thesaurus: cowardly, adjective
- Ignobly lacking in courage: cowardly turncoats.
- Syns: chicken (slang), chicken-hearted, craven, dastardly, faint-hearted, gutless, lily-livered
- Thematic: specific to the IR system's domain of application (most frequent case)
- E.g.: Thesaurus of Engineering and Scientific Terms
- A thesaurus can be used to
- Help the user formulate queries
- Modification of queries by the system
131. Thesauri
- Many kinds of thesauri have been developed for IR systems
- Hierarchical: synonyms (RT related terms, UF use for), generalization (BT broader term), specialization (NT narrower term)
- ISO and ANSI standards, almost always thematic
- Manually built and updated by domain experts
- Clustered: cluster (or synset) of words
- Non-typed semantic relationships among clusters
- Each cluster is a set of words having a strong semantic relationship (usually UF)
- Clustered thesauri can be automatically generated if no distinction is made among semantic relationships
- Associative: graph of words, where nodes represent words and edges represent semantic similarity among words
- Edges can be oriented or not, according to the symmetry of the similarity relationship
- Edges can be weighted (fuzzy pseudo-thesauri)
- Can be automatically generated from a collection of documents using co-occurrence relationships
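As an illustration of the last point, an associative thesaurus can be derived from document-level co-occurrence. This sketch assumes tokenized documents and uses the Jaccard coefficient as edge weight, which is one common symmetric choice and therefore yields non-oriented edges. The weight threshold, the function name, and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_thesaurus(documents, min_weight=0.5):
    """Build weighted, non-oriented edges between co-occurring terms.

    documents: list of token lists.
    Returns a dict mapping (term_a, term_b) -> Jaccard weight.
    """
    doc_freq = Counter()
    pair_freq = Counter()
    for tokens in documents:
        terms = sorted(set(tokens))
        doc_freq.update(terms)
        pair_freq.update(combinations(terms, 2))
    edges = {}
    for (a, b), both in pair_freq.items():
        # Jaccard: documents with both terms / documents with either term.
        weight = both / (doc_freq[a] + doc_freq[b] - both)
        if weight >= min_weight:
            edges[(a, b)] = weight
    return edges


docs = [["car", "engine"], ["car", "engine", "wheel"], ["car", "road"]]
edges = cooccurrence_thesaurus(docs)
```

An asymmetric measure, such as the conditional probability of one term given the other, would produce oriented edges instead.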
132. Stemming and Lemmatization
- Reduce terms to their roots before indexing
- Reduce inflectional/variant forms to base form
- car, cars, car's, cars' -> car
- the boy's cars are different colors -> the boy car be different color
- Stemming: heuristic process that chops off the ends of words in the hope of achieving the goal correctly most of the time
- Stemming collapses derivationally related words
- Lemmatization: NLP tool. It uses dictionaries and morphological analysis of words in order to return the base or dictionary form of a word
- Lemmatization collapses the different inflectional forms of a lemma
- Not widely used because it harms performance
133. Stemming
- Many different algorithms:
- Commonest algorithm for stemming English
- Porter, Martin F. 1980. An algorithm for suffix stripping. Program 14:130-137.
- http://www.tartarus.org/martin/PorterStemmer/
- Lovins, Julie Beth. 1968. Development of a stemming algorithm. Translation and
- http://www.comp.lancs.ac.uk/computing/research/stemming/
- Paice, Chris D. 1990. Another stemmer. SIGIR Forum 24:56-61
- http://snowball.tartarus.org/demo.php
- Stemming increases recall while harming precision
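The recall/precision trade-off can be seen even with a toy suffix stripper. The sketch below is deliberately naive and is not the Porter, Lovins, or Paice algorithm; the suffix list and the minimum stem length are hypothetical choices made for the example.

```python
# Toy suffix list, ordered from longest to shortest (an assumption,
# not taken from any published stemmer).
SUFFIXES = ("ations", "ation", "ings", "ing", "ies", "es", "ed", "s")


def naive_stem(word, min_stem=3):
    """Chop the first matching suffix, keeping at least min_stem letters."""
    word = word.lower()
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[: -len(suffix)]
    return word


# Recall improves: the variants are conflated to one index term.
variants = {naive_stem(w) for w in ("connected", "connecting", "connects")}

# Precision suffers: unrelated words are conflated as well.
clash = (naive_stem("news"), naive_stem("new"))
```

A query for "connecting" now also matches documents containing "connected", but a query for "news" also matches every document containing "new".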
134. Stemming Example
135. Tools for text analysis _1
- Lucene and Solr contains a lot of text analyzer working on
several languages
-
-
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters
-
-
- CharFilters, Tokenizer, Token Analyzers
-
- toolkit for detecting and extracting metadata and structured
text content from various documents using existing parser
libraries
- GATE(General Architecture for Text Engineering)
-
- ANNIE (A Nearly-New Information Extraction System)
-
-
- tokenizer, gazetteer, sentence splitter, part of speech
tagger,
-
-
- named entities transducer, coreference tagger
-
-
- Support for English, Spanish, Chinese, Arabic, French,
German,
-
-
-
- Hindi, Italian, Cebuano, Romanian, Russian
- MALLET(Machine Learning for Language Toolkit)
-
- http://mallet.cs.umass.edu/index.php
-
- Java-based package for statistical natural language processing,
document classification, clustering, topic modeling, information
extraction, and other machine learning applications to text
136. Tools for text analysis (2)
- http://opennlp.sourceforge.net/projects.html
  - open source projects related to natural language processing
- Cognitive Computation Group, University of Illinois
  - http://l2r.cs.uiuc.edu/~cogcomp/software.php
    - Chunker, Part of Speech tagger, String similarity, Semantic Role Labeler, Named Entity Extractor, etc.
- http://medialab.di.unipi.it/wiki/SuperSense_Tagger
  - tool for assigning to each noun, verb, adjective and adverb of a sentence one of the 45 standard WordNet supersenses
- http://wndomains.fbk.eu/hierarchy.html
- http://www.synesketch.krcadinac.com/
  - Open source textual emotion recognition
137. Multimedia Content Analysis
- Computers are not able to catch the underlying meaning of multimedia content. Annotation is needed.
  - It can take up to 10x the duration of the video
  - Problems in scaling to millions of content items
  - People might not be able to holistically catch all the meanings associated with a multimedia object
  - Some content is tedious to describe with words
    - E.g., a melody without lyrics
  - Some technologies reach ~90% precision
138. Audio Segmentation
- GOAL: split an audio track according to the information it contains
  - Identification and removal of ads
139. Video Segmentation
- segment a video track according to its keyframes
  - fixed-length temporal segments
- automated detection of transitions between shots
  - a shot is a series of consecutive pictures taken contiguously by a single camera and representing a continuous action in time and space.
Credits: Thorsten Hermes @ SSMT2006
140. Speech Analysis
- Speaker Identification: identify people participating in a discussion
- Speech To Text: automatically recognize spoken words belonging to an open dictionary
141. Classification of Music Genre
- GOAL: automatically classify the genre and mood of a song
  - Rock, Pop, Jazz, Blues, etc.
  - Happy, aggressive, sad, melancholic, etc.
  - Automatic selection of songs for playlist composition
- Tutorial from PHAROS Summer School
  - http://www.pharos-audiovisual-search.eu/res/files/SummerSchool/Programme_Summer_School_file.zip
142. Images: Low-level features
- GOAL: extract implicit characteristics of a picture
143. Face Identification and Recognition
- GOAL: recognize and identify faces in an image
Credits: Thorsten Hermes @ SSMT2006
144. Image Concept Detection
- GOAL: recognize the context/concepts of an image
  - E.g., playground, seaside, road, ...
- Extraction of low-level features from raw data
  - color histograms, color correlograms, color moments, co-occurrence texture matrices, edge direction histograms, etc.
- Features can be used to build discrete classifiers, which may associate semantic concepts to images or regions thereof
  - The MediaMill semantic search engine defines 491 semantic concepts
    - http://www.science.uva.nl/research/mediamill/demo
- Concepts can also be detected from text (e.g., from manual or automatic metadata) using NLP techniques
145. Image Object Identification
- GOAL: identify objects appearing in a picture
  - Basketballs, cars, planes, players, etc.
146. Tools for media analysis (1)
- http://opencv.willowgarage.com/wiki/
  - Framework for image analysis
- http://www.gnu.org/software/octave/
  - high-level language, primarily intended for numerical computations; it works well with Matlab
- Marsyas (Music Analysis, Retrieval and Synthesis for Audio Signals)
  - http://marsyas.sness.net/
  - Framework for music analysis and retrieval
147. Tools for media analysis (2)
- http://www.tina-vision.net/
  - an open source environment developed to accelerate the process of image analysis research
- http://cmusphinx.sourceforge.net/sphinx4/
  - speech recognition system written entirely in Java
- http://www.cs.waikato.ac.nz/ml/weka/
  - A collection of machine learning algorithms for data mining
148. Validation
149. Disclaimer
- This section is inspired by the tutorial by Dasdan, Tsioutsiouliklis, Velipasaoglu @ WWW2010
- Web Search Engine Metrics for Measuring User Satisfaction
- http://analytics.ncsu.edu/reports/wsmt.pdf
150. Measures for IR Systems
- How fast does it process (index) documents?
- Latency as a function of index size
- Expressiveness of query language
- The key measure: user happiness
  - Speed of response/size of index are factors
    - But blindingly fast, useless answers won't make a user happy
  - How do we quantify user happiness?
151. Measuring User Happiness
- Who is the user we are trying to make happy?
  - Web engine: the user finds what they want and returns to the engine
    - Can measure the rate of return users
  - eCommerce site: the user finds what they want and makes a purchase
    - Is it the end-user, or the eCommerce site, whose happiness we measure?
    - Measure time to purchase, or the fraction of searchers who become buyers?
  - Enterprise (company/govt/academic): care about user productivity
    - How much time do my users save when looking for information?
    - Many other criteria having to do with breadth of access, secure access
152. Evaluation measures
- Presence of content of interest in the catalog
- How many new resources (in the collection) are in the catalog?
- How long did it take to get the new resources into the catalog?
153. Relevance as a measure of user happiness
- How do you measure relevance?
- In order to assess the performance of an IR system you need a test collection composed of:
  - A benchmark document collection
  - A benchmark suite of queries
  - A binary assessment of either Relevant or Irrelevant for each query-doc pair (gold standard, or ground truth)
- The test collection must be of a reasonable size
  - Need to average performance, since results vary widely over different documents and information needs
154. Evaluating Relevance
- Rank-based evaluation with explicit judgment
- Rank-based evaluation with implicit judgment
  - Direct and indirect evaluation by clicks
Not covered here
155. Information Need Translation
- Relevance is assessed relative to the need, not to the query
  - I'm looking for information on whether drinking red wine is more effective at reducing your risk of heart attacks than white wine.
  - Query: wine red white heart attack effective
- A document is relevant if it addresses the stated information need, not just because it contains all the words in the query
156. Set-based evaluation
- The two most frequent and basic measures for IR effectiveness are precision and recall
  - Precision: fraction of retrieved docs that are relevant
    - Provides a measure of the degree of soundness of the system
    - This does not consider the total number of documents
  - Recall: fraction of relevant docs that are retrieved
    - Provides a measure of the degree of completeness of the system
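The two definitions above translate directly into set operations. The following is a minimal sketch; the function name and the document identifiers are hypothetical, and returning 0.0 for an empty set is a convention assumed here rather than something the definitions require.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query, using set semantics."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant docs that were retrieved
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical result set and gold standard for a single query.
p, r = precision_recall(
    retrieved=["d1", "d2", "d3", "d4"],
    relevant=["d1", "d3", "d7"],
)
```

Here two of the four retrieved documents are relevant (precision 0.5) and two of the three relevant documents were found (recall 2/3).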
157. Precision / Recall
- Can get high recall (but low precision) by retrieving all docs for all queries!
  - Recall is a non-decreasing function of the number of docs retrieved
  - Precision usually decreases as more docs are retrieved (in a good system)
- Precision can be computed at different levels of recall
  - Perhaps most appropriate for web search: all people want are good matches on the first one or two results pages
  - Professional searchers, paralegals, intelligence analysts
158. F-Measure
- Combined measure that assesses the tradeoff between precision and recall (weighted harmonic mean): F = (β² + 1)PR / (β²P + R)
  - Values of β > 1 emphasize recall
- People usually use the balanced F1 measure (β = 1)
- Harmonic mean is a conservative average
  - [C.J. van Rijsbergen, Information Retrieval]
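A short sketch of the weighted harmonic mean, assuming the usual parameterization by a weight beta, where the score is (beta² + 1)·P·R / (beta²·P + R). The function name and the guard clauses are illustrative choices.

```python
def f_measure(precision, recall, beta=1.0):
    """Weighted harmonic mean of precision and recall.

    beta > 1 weights recall more heavily, beta < 1 weights precision more
    heavily, and beta == 1 gives the balanced F1 measure.
    """
    if beta <= 0:
        raise ValueError("beta must be positive")
    if precision == 0.0 and recall == 0.0:
        return 0.0  # convention assumed here to avoid dividing by zero
    b2 = beta * beta
    return (b2 + 1) * precision * recall / (b2 * precision + recall)


# Hypothetical system with precision 0.5 and recall 2/3.
f1 = f_measure(0.5, 2 / 3)            # balanced
f2 = f_measure(0.5, 2 / 3, beta=2.0)  # recall-oriented
```

With these inputs F1 is 4/7 (about 0.571), below the arithmetic mean of 0.583, which illustrates why the harmonic mean is called conservative. F2 is 0.625, pulled toward the higher recall value.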
159. Difficulties in using precision/recall
- Need to average over a large corpus/query set
  - Need human relevance assessments
    - People aren't reliable assessors
  - Assessments have to be binary
  - Heavily skewed by corpus/authorship
    - Results may not translate from one domain to another
- The relevance of one document is treated as independent of the relevance of other documents
  - This is also an assumption in most retrieval systems
160. Rank-based evaluation
- In ranked retrieval systems, P and R are values relative to a rank position
- Evaluation is performed by computing precision as a function of recall
- The function is computed at each rank position at which a relevant document has been retrieved
- The resulting values are interpolated, yielding a precision/recall plot
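The procedure above can be sketched in two steps: collect a (recall, precision) point at every rank that holds a relevant document, then interpolate. The interpolation rule used here (the precision at a recall level is the highest precision observed at that or any higher recall level) is the common textbook convention and an assumption on our part; the ranking and judgments are hypothetical.

```python
def precision_recall_points(ranking, relevant):
    """(recall, precision) at each rank where a relevant doc appears."""
    relevant = set(relevant)
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))
    return points


def interpolate(points):
    """Replace each precision by the max precision at any recall >= r."""
    interpolated, best = [], 0.0
    for recall, precision in reversed(points):
        best = max(best, precision)
        interpolated.append((recall, best))
    return list(reversed(interpolated))


ranking = ["d1", "d2", "d3", "d4", "d5", "d6"]
relevant = {"d1", "d4", "d5"}
raw = precision_recall_points(ranking, relevant)
curve = interpolate(raw)
```

In this example the raw precision dips to 0.5 at recall 2/3 and rises again to 0.6 at recall 1.0; interpolation removes the dip, so the plotted curve never increases with recall.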
161. Measures for rank-based evaluation
- Mean average precision (MAP)
  - Measure of quality at all recall levels
  - Not all queries will have more than K relevant results
  - Even a perfect system may have a score less than 1.0 for some queries
  - Use a variable result set cut-off for each query, based on the number of its relevant results
- Mean Reciprocal Rank (MRR) [Voorhees 1999]
  - Reciprocal of the rank of the first relevant result, averaged over a population of queries
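A minimal sketch of both measures over a small, hypothetical set of queries. Average precision is computed here as the mean of the precision values at the ranks of the relevant documents, divided by the total number of relevant documents; this is the common definition and is assumed rather than taken from the slides. All names and data are illustrative.

```python
def average_precision(ranking, relevant):
    """Mean of precision@rank over the ranks holding relevant docs."""
    relevant = set(relevant)
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def reciprocal_rank(ranking, relevant):
    """1 / rank of the first relevant result, or 0.0 if there is none."""
    relevant = set(relevant)
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            return 1.0 / rank
    return 0.0


def mean(values):
    values = list(values)
    return sum(values) / len(values) if values else 0.0


# Each entry: (ranked result list, set of relevant docs) for one query.
runs = [
    (["d1", "d2", "d3", "d4"], {"d1", "d3"}),
    (["d5", "d6", "d7"], {"d6"}),
]
map_score = mean(average_precision(r, rel) for r, rel in runs)
mrr_score = mean(reciprocal_rank(r, rel) for r, rel in runs)
```

For the first query the average precision is (1/1 + 2/3) / 2 = 5/6 and for the second it is 1/2, so MAP is 2/3. The reciprocal ranks are 1 and 1/2, so MRR is 0.75.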
162. Discounted Cumulative Gain (DCG)
- [Järvelin and Kekäläinen 2002]
- Gain is adjustable for the importance of different relevance grades for user satisfaction
- Discounting is desirable for web ranking
  - Most users don't browse deep
  - Search engines truncate the list of results returned.
- DCG yields unbounded scores
  - For each query, divide the DCG by the best attainable DCG for that query
  - Normalized Discounted Cumulative Gain (nDCG)
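A small sketch of DCG and its normalization. The gain is taken to be the relevance grade itself and the discount is log2(rank + 1); both are common choices but only one of several formulations in use, so treat them as assumptions. The ideal ranking is obtained by sorting the same graded list, which presumes that every judged document appears in the list.

```python
import math


def dcg(gains):
    """Sum of graded gains, each discounted by log2(rank + 1)."""
    return sum(g / math.log2(rank + 1) for rank, g in enumerate(gains, start=1))


def ndcg(gains):
    """DCG divided by the best attainable DCG for the same judgments."""
    ideal = dcg(sorted(gains, reverse=True))
    return dcg(gains) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades (0 = irrelevant, 3 = highly relevant),
# listed in the order the engine returned the documents.
grades = [3, 2, 3, 0, 1, 2]
score = dcg(grades)
normalized = ndcg(grades)
```

The raw score depends on the number and grade of the results, whereas the normalized score always falls between 0 and 1 and equals 1 only when the ranking is already in ideal order.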
163. Preference Judgment
- Based on counts of preferences
- Robust for incomplete judgments
- Binary Preference (bpref)
  - Buckley and Voorhees (2004)
  - Designed for incomplete judgments
  - Generalized to graded judgment
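A hedged sketch of bpref, following our reading of the formulation attributed to Buckley and Voorhees: with R judged relevant documents, each relevant document r contributes 1 minus the fraction N_r / R, where N_r counts the judged non-relevant documents ranked above r, capped at R. Unjudged documents are ignored. Later variants normalize differently, so the exact formula, like the function name and data, should be taken as an assumption.

```python
def bpref(ranking, relevant, non_relevant):
    """Binary preference score using only judged documents."""
    relevant, non_relevant = set(relevant), set(non_relevant)
    num_relevant = len(relevant)
    if num_relevant == 0:
        return 0.0
    score, non_relevant_seen = 0.0, 0
    for doc in ranking:
        if doc in non_relevant:
            non_relevant_seen += 1
        elif doc in relevant:
            # Only the first R judged non-relevant docs count against r.
            penalty = min(non_relevant_seen, num_relevant) / num_relevant
            score += 1.0 - penalty
        # Documents without a judgment do not affect the score.
    return score / num_relevant


# Hypothetical ranking: "u1" has no relevance judgment.
value = bpref(
    ranking=["n1", "r1", "u1", "r2", "n2", "r3"],
    relevant={"r1", "r2", "r3"},
    non_relevant={"n1", "n2"},
)
```

In this example the three relevant documents contribute 2/3, 2/3 and 1/3, giving a score of 5/9. Adding or removing unjudged documents such as "u1" leaves the score unchanged, which is the robustness to incomplete judgments that the measure was designed for.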
Legend for the preference-based formulas:
- A: number of preferences in agreement
- D: number of preferences in disagreement
- N_r: number of non-relevant docs ranked above relevant doc r, among the first R non-relevant docs
- R: number of relevant results for the query
164. Presentation Metrics
- How to present information?
  - Where should the results be displayed?
  - Which presentation elements should be used?
    - Font, colors, design elements, interaction design
    - On-line, on-home, usability, eye tracking, focus group, surveys
    - Comparative, perceived vs. actual
165. Not all results are likely to be reviewed
(Source: iprospect.com, WhitePaper_2006_SearchEngineUserBehavior.pdf)
166. Clicks and views depend on rank
[Joachims et al, 2005]
167. Eye Tracking Studies
168. Heat Maps
- The first result is always considered more trusted and more relevant by default
- The user spends less time reading the lower part of the page
- [Marti A. Hearst, Search User Interfaces, Cambridge University Press, 2009]
169. Thank you for your attention!
Alessandro Bozzon
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
[email_address]
http://home.dei.polimi.it/bozzon
Marco Brambilla
Dipartimento di Elettronica e Informazione, Politecnico di Milano, Milano, Italy
[email_address]
http://home.dei.polimi.it/mbrambil
http://www.search-computing.org/book
170. References - Books
- Modern Information Retrieval
  - Ricardo Baeza-Yates, Berthier Ribeiro-Neto. Addison Wesley Longman Publishing Co. Inc., 2010
- [ManningIR] Introduction to Information Retrieval
  - Christopher D. Manning, Prabhakar Raghavan and Hinrich Schütze. Cambridge University Press, 2008
- Information Retrieval: Algorithms and Heuristics
  - D.A. Grossman, O. Frieder. Springer, 2004
- I.H. Witten, A. Moffat, T.C. Bell. Morgan Kaufmann, 1999
- Mining the Web: Analysis of Hypertext and Semi Structured Data
  - S. Chakrabarti. Morgan Kaufmann, 2002
- Search User Interfaces
  - Marti A. Hearst. Cambridge University Press, 2009
- Search Computing: Challenges and Directions
  - Stefano Ceri, Marco Brambilla (eds.). Springer LNCS, vol. 5950, 2010
171. References - Tutorial
- Web Search Engine Metrics: Direct Metrics to Measure User Satisfaction
  - Ali Dasdan, Kostas Tsioutsiouliklis, Emre Velipasaoglu (Yahoo!)
- Recent Progress on Inferring Web Searcher Intent
  - Eugene Agichtein (Emory University)
- Applications of Open Search Tools
  - Rosie Jones, Ted Drake (Yahoo!)
- [BAEZASeco2010] New Frontiers for Search
  - Ricardo Baeza-Yates and Rosie Jones (Yahoo!)
172. References - Papers
- [Ramakrishnan and Tomkins 2007] Raghu Ramakrishnan, Andrew Tomkins. Toward a PeopleWeb
  - IEEE Computer 40(8): 63-72 (2007)
- [Broder2002] A. Broder. A taxonomy of web search
  - SIGIR Forum, 36(2):3-10, 2002.
- [BATES2002] Bates, Marcia J. Toward an integrated model for information seeking and searching
  - In: The Fourth International Conference on Information Needs, Seeking and Use in Different Contexts, 2002
- [FU2007] Fu, Wai-Tat; Pirolli, Peter. SNIF-ACT: a cognitive model of user navigation on the world wide web
  - Human-Computer Interaction: 335-412, 2007
- [Withrow2002] Jason Withrow. Do your links stink?
  - American Society for Information Science Bulletin, June 1, 2002
- [Pirolli2009] Pirolli, Peter. An elementary social information foraging model
  - Proceedings of the 27th international conference on Human factors in computing systems: 605-614, 2009
- [BATES1989] M.J. Bates. The design of browsing and berrypicking techniques for the online search interface
  - Online Review, 13(5):407-431, 1989.
- [Teevan et al., CHI 2004] Teevan, J., Alvarado, C., Ackerman, M. and Karger, D. The Perfect Search Engine is Not Enough: A Study of Orienteering Behavior in Directed Search
  - Proceedings of ACM CHI 2004, pp. 415-422.
- [MARCHIONINI2006] Marchionini, G. Exploratory search: from finding to understanding
  - Communications of the ACM 49(4): 41-46 (2006)
- [WHITE2007] White, R. W., and Drucker, S. M. Investigating behavioral variability in web search
  - 16th WWW Conf. (Banff, Canada, 2007)
- [AULA2008] Aula, A., and Russell, D.M. Complex and Exploratory Web Search
  - ISSS: Information Seeking Support Systems Workshop (Chapel Hill, June 2008)
173. References - Papers
- [BozzonEtAL2010] Alessandro Bozzon, Marco Brambilla, Piero Fraternali, Stefano Ceri. Liquid Query: multi-domain exploratory search on the Web
- [FAGIN1999] R. Fagin. Combining fuzzy information from multiple systems
  - J. Comput. Syst. Sci., 58(1):83-99, 1999.
- [ILYAS1999] F. Ilyas, R. Shah, W. G. Aref, J. S. Vitter, and A. K. Elmagarmid. Rank-aware query optimization
  - In SIGMOD Conference, pages 203-214, 2004.
- [MARTINENGHI2010] D. Martinenghi and M. Tagliasacchi. Proximity Rank Join
- [Carbonell and Goldstein 1998] J. Goldstein and J. Carbonell (1998). Summarization: Using MMR for Diversity-based Reranking
- [BozzonEtAl2007] Alessandro Bozzon et al. Role Based Access Control for the interaction with Search Engines
  - International Workshop on Collaborative Open Environments for Project-Centered Learning (COOPER) 2007, Crete, Greece.
- [BozzonEtAl2009] Alessandro Bozzon, Marco Brambilla, Piero Fraternali. Conceptual Modeling of Multimedia Search Applications using Rich Process Models
  - ICWE 2009, June 24-26, 2009, San Sebastian, Spain
- [BozzonThesis2009] Alessandro Bozzon. Model-driven development of Search Based Web Applications
  - Ph.D. Thesis, Politecnico di Milano, April 2009.
- [BragaEtAl2010] D. Braga, S. Ceri, F. Corcoglioniti, M. Grossniklaus, and S. Vadacca. Panta Rhei: An Execution Model for Queries over Web Information Sources
  - http://www.search-computing.it/sites/cms.web.seco/files/pantarhei2010.pdf
174. References - Papers
- [Allan 2005] J. Allan (2005). HARD track overview in TREC 2005: High accuracy retrieval from documents.
- [Voorhees 1999] E.M. Voorhees (1999). TREC-8 question answering track report
- [Järvelin and Kekäläinen 2002] K. Järvelin and J. Kekäläinen. Cumulated gain-based evaluation of IR techniques
  - ACM Trans. IS, 20(4): 422-446, 2002
- [Buckley and Voorhees (2004)] C. Buckley and E.M. Voorhees. Retrieval evaluation with incomplete information
- [De Beer and Moens (2006)] De Beer, Jan; Moens, Marie-Francine. Rpref: a generalization of Bpref towards graded relevance judgments
  - SIGIR 2006, Seattle, USA, 6-11 August 2006, pages 637-638, ACM
175. References - Links
- Search Computing Course Lecture Notes
  - http://www.search-computing.it/course
- Fabio Aiolli, Università di Padova, http://www.math.unipd.it/~aiolli/corsi/0809/IR/IR.html
- http://www.ir.disco.unimib.it/