Conceptual structures in
modern information retrieval
Claudio Carpineto
Fondazione Ugo Bordoni
[email protected]@fub.it
Overview
• Keyword-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions
Vector-based IR
[Diagram: the documents are represented as vectors of weighted keywords and the query as a vector of weighted keywords; matching them yields the retrieved documents.]
Term weighting
• tf.idf and the vector space model (Salton), very popular in the 70s and 80s
• BM25 (Robertson), the state of the art in the 90s
• Several recent term-weighting functions based on statistical language modeling (Ponte, Lafferty)
• A new weighting framework based on deviation from randomness + information gain (FUB + UG)
W = Inf1 · Inf2
tf · log[(N + 1) / (n + 0.5)], …
tf / (tf + 1), …
tfn = tf · log(1 + K · avg_l / l)
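As a rough illustration, the three expressions listed on this slide can be evaluated directly. This is only a sketch: the function names are hypothetical, the base-2 logarithm is an assumption (the slide does not state the base), and how the expressions plug into W = Inf1 · Inf2 is left open because the slide elides the alternatives.

```python
import math

def normalized_tf(tf, doc_len, avg_len, k=1.0):
    """tfn = tf * log(1 + K * avg_l / l); log base 2 is an assumption."""
    return tf * math.log2(1.0 + k * avg_len / doc_len)

def idf_like_component(tf, n_docs, doc_freq):
    """tf * log[(N + 1) / (n + 0.5)], with N documents and n containing the term."""
    return tf * math.log2((n_docs + 1) / (doc_freq + 0.5))

def saturation_component(tf):
    """tf / (tf + 1): grows with tf but stays below 1."""
    return tf / (tf + 1.0)

if __name__ == "__main__":
    # A document of average length keeps its raw tf when K = 1.
    print(normalized_tf(3, doc_len=100, avg_len=100))
    # Shorter documents get a larger normalized frequency.
    print(normalized_tf(3, doc_len=50, avg_len=100))
    print(idf_like_component(2, n_docs=1000, doc_freq=10))
    print(saturation_component(2))
```

With K = 1 and a document of exactly average length, the logarithm equals 1, so the normalization leaves tf unchanged; documents shorter than average are boosted and longer ones are damped.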
Inherent limitations of keyword-based IR
• Vocabulary problem
• Relations are ignored
Early approaches to conceptual IR
• n-grams (Salton 1975, Maarek 1989)
• parse tree (Dillon 1983, Metzler 1989)
• case relations (Fillmore 1968, Somers 1987)
• conceptual graphs (Dick 1991)
Why early conceptual IR was not successful
• No best representation scheme
• Manual coding too costly
• Automated coding too hard
• Training required both for the indexer and the user
• Effectiveness not clearly demonstrated
• Retrieval task often not appropriate
Context and concepts in modern topical IR
Evolution of topical IR
• Very short queries
• Heterogeneous collections
• Unreliable sources
• Interactive sessions
Model of modern topical IR
[Diagram; box labels: Docs, Query, Context, Indexing, Ranking, Visualization, Interaction, Use]
[Flow diagram of query expansion; box labels: Query, Weighted Query, Inverted File, Ranking, Form. Docs, +norm, Select top D docs, Compute σ(w), Select top E terms, Query Expansion]
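The box labels in the diagram above suggest a feedback loop: rank with the original query, take the top D documents, score candidate terms with σ(w), keep the top E terms and add them to the query. The sketch below follows that reading. The slide does not define σ(w), so the KL-style score used here is an assumption, as are the toy first-pass ranking, the max-normalization step, the `beta` mixing weight and all function names.

```python
import math
from collections import Counter

def rank(query, docs):
    """Toy first-pass ranking: sum of query weight times term frequency."""
    scores = {d: sum(w * terms.count(t) for t, w in query.items())
              for d, terms in docs.items()}
    return sorted(scores, key=scores.get, reverse=True)

def sigma(top_docs, docs):
    """Assumed term score: p_R(w) * log(p_R(w) / p_C(w)), where p_R is the
    term distribution in the top documents and p_C in the whole collection."""
    coll = Counter(t for terms in docs.values() for t in terms)
    fb = Counter(t for d in top_docs for t in docs[d])
    n_coll, n_fb = sum(coll.values()), sum(fb.values())
    return {t: (c / n_fb) * math.log((c / n_fb) / (coll[t] / n_coll))
            for t, c in fb.items()}

def expand_query(query, docs, top_d=2, top_e=3, beta=0.5):
    """Return a reweighted query containing up to top_e expansion terms."""
    top_docs = rank(query, docs)[:top_d]            # select top D docs
    scores = sigma(top_docs, docs)                  # compute sigma(w)
    ordered = sorted(scores, key=scores.get, reverse=True)
    best = [t for t in ordered if scores[t] > 0][:top_e]  # select top E terms
    expanded = dict(query)
    if not best:
        return expanded
    norm = scores[best[0]]                          # normalize by the best score
    for t in best:
        expanded[t] = expanded.get(t, 0.0) + beta * scores[t] / norm
    return expanded

if __name__ == "__main__":
    docs = {
        "d1": ["bank", "finance", "credit"],
        "d2": ["bank", "finance", "loan"],
        "d3": ["river", "bank", "water"],
        "d4": ["river", "water", "fish"],
    }
    print(expand_query({"finance": 1.0}, docs))
```

In the toy collection, "bank" occurs in both feedback documents but is also common elsewhere, so it scores lower than "credit" and "loan" and is left out of the expanded query.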
Performance of retrieval feedback versus query difficulty
Ranking based on interdocument similarity
Cluster hypothesis (van Rijsbergen 1978)
Approaches
- Matching the query against document clusters (Willett 1988)
- Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990)
- Computing the conceptual distance between query and documents (order-theoretical ranking, Carpineto 2000)
Order-theoretical ranking
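[Figure: concept lattice for an example collection. Each node carries a number (0 for the query node) and a set of terms; document nodes are marked D1 to D7.]
0  NNS FINANCE (Query)
1  NNS FINANCE CREDIT KBS (D7)
1  NNS FINANCE BANK ACCOUNT (D1)
1  NNS
1  FINANCE
2  NNS BANK
2  NNS BANK ACCOUNT (D3)
2  FINANCE CREDIT KBS (D4)
3  CREDIT KBS (D5)
3  NNS BANK RIVER (D2)
3  BANK
4  KBS
4  BANK KBS WATERS (D6)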
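One way to read the example lattice above is as a shortest-path computation: build the concept lattice over the query and the documents, then rank each document by the number of lattice edges between its node and the query node. The sketch below follows that reading and is not taken from the slides. The grouping of terms into documents is our reading of the figure, and leaving out the empty-intent top node is an assumption that happens to reproduce the numbers shown in the figure.

```python
from collections import deque
from itertools import combinations

# Term sets as read off the figure (an interpretation of the garbled layout).
QUERY = frozenset({"NNS", "FINANCE"})
DOCS = {
    "D1": frozenset({"NNS", "FINANCE", "BANK", "ACCOUNT"}),
    "D2": frozenset({"NNS", "BANK", "RIVER"}),
    "D3": frozenset({"NNS", "BANK", "ACCOUNT"}),
    "D4": frozenset({"FINANCE", "CREDIT", "KBS"}),
    "D5": frozenset({"CREDIT", "KBS"}),
    "D6": frozenset({"BANK", "KBS", "WATERS"}),
    "D7": frozenset({"NNS", "FINANCE", "CREDIT", "KBS"}),
}

def concept_intents(term_sets):
    """Close the term sets under intersection; each result is a concept intent."""
    intents = set(term_sets)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(intents), 2):
            common = a & b
            if common not in intents:
                intents.add(common)
                changed = True
    return intents

def cover_edges(intents):
    """Undirected Hasse-diagram edges: a -- b when a is a proper subset of b
    and no other intent lies strictly between them."""
    edges = {i: set() for i in intents}
    for a in intents:
        for b in intents:
            if a < b and not any(a < c < b for c in intents):
                edges[a].add(b)
                edges[b].add(a)
    return edges

def lattice_distances(query, docs):
    """Breadth-first search from the query node; returns doc name -> distance."""
    intents = concept_intents([query, *docs.values()])
    intents.discard(frozenset())  # assumption: ignore the empty-intent top node
    edges = cover_edges(intents)
    dist = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        for nxt in edges[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return {name: dist.get(terms) for name, terms in docs.items()}

if __name__ == "__main__":
    distances = lattice_distances(QUERY, DOCS)
    for name in sorted(distances, key=lambda d: (distances[d], d)):
        print(name, distances[name])
```

Note that D3 and D4 share only one term with the query, yet they sit two edges away because they are lattice neighbours of D1 and D7; a pure term-matching score would not capture that relation.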
Performance of order-theoretical ranking
• Better than hierarchic clustering and comparable to best matching on the whole collection
• Markedly better than both hierarchic clustering and best matching on non-matching relevant documents
• Order-theoretical ranking does not scale up well but it is synergistic with best matching document ranking
Emerging IR tasks requiring knowledge structures
Question Answering
Task:
Closed-class questions in unrestricted domains with
no guarantee of answer and result possibly scattered
over multiple documents
Question Answering
Approach:
1. Recognize the type of query
2. Retrieve relevant documents
3. Find sought entities near question words
4. Fall back to best-matching passage retrieval in case of failure
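The four steps above can be sketched as a small pipeline. Everything concrete here is hypothetical: the question types, the regular expressions standing in for entity recognition, the stopword list, the character window and the term-overlap passage score are illustrative choices, not the system described in the talk.

```python
import re

# Hypothetical answer patterns per question type.
ANSWER_PATTERNS = {
    "when": re.compile(r"\b(?:1[0-9]{3}|20[0-9]{2})\b"),       # a year
    "who": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),    # a capitalised name
}
STOPWORDS = {"the", "a", "an", "was", "is", "were", "did", "does", "of",
             "in", "by", "when", "who", "why", "what", "how"}

def question_type(question):
    """Step 1: recognize the query type from its first word (toy rule)."""
    first = question.lower().split()[0]
    return first if first in ANSWER_PATTERNS else None

def content_words(text):
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS

def answer(question, passages, window=40):
    q_words = content_words(question)
    # Step 2: retrieve, here by ranking passages on term overlap.
    ranked = sorted(passages,
                    key=lambda p: len(q_words & content_words(p)),
                    reverse=True)
    qtype = question_type(question)
    if qtype:
        # Step 3: look for an entity of the sought type near question words.
        for passage in ranked:
            for m in ANSWER_PATTERNS[qtype].finditer(passage):
                context = passage[max(0, m.start() - window):m.end() + window]
                if q_words & content_words(context):
                    return m.group(0)
    # Step 4: fall back to the best-matching passage (None if no passages).
    return ranked[0] if ranked else None

if __name__ == "__main__":
    passages = [
        "The vector space model was introduced by Salton in 1975.",
        "BM25 is a ranking function.",
    ]
    print(answer("When was the vector space model introduced?", passages))
    print(answer("Why does ranking matter?", passages))
```

The fallback in step 4 also covers the "no guarantee of answer" case only partially: it always returns some passage, so a real system would need a confidence threshold to decline to answer.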
Web Information Retrieval
Current tasks:
• named-entity finding task
• topic distillation task
Approach:
1. Use of multiple methods
2. Combination of results via interpolation and normalization schemes
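A common way to combine results from several methods is to bring each method's scores onto a shared scale and then take a weighted sum. The sketch below uses min-max normalization and linear interpolation; these are one standard choice among several, and the slides do not say which schemes were actually used. The function names and the example scores are made up for illustration.

```python
def min_max(scores):
    """Rescale a {doc: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}

def interpolate(runs, weights):
    """Weighted sum of normalized scores; a document missing from a run
    contributes 0 for that run. Returns (ranking, fused scores)."""
    fused = {}
    for run, weight in zip(runs, weights):
        for doc, s in min_max(run).items():
            fused[doc] = fused.get(doc, 0.0) + weight * s
    ranking = sorted(fused, key=fused.get, reverse=True)
    return ranking, fused

if __name__ == "__main__":
    content_scores = {"a": 10.0, "b": 5.0, "c": 0.0}   # e.g. a text-based method
    link_scores = {"a": 0.0, "b": 1.0, "c": 0.2}       # e.g. a link-based method
    ranking, fused = interpolate([content_scores, link_scores], [0.5, 0.5])
    print(ranking, fused)
```

Without the normalization step the method with the larger raw score range would dominate the sum regardless of the interpolation weights.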
XML document retrieval
Goal:
Use document structure to improve precision and recall of unstructured queries
“concerts this weekend at Sofia under 20 euros”
Approaches:
• Automatic inference of query structure
• Semi-automatic query annotation
• Hybrid query languages
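To make the first approach concrete, here is a toy sketch of inferring structure from the example query. The field names (`max_price`, `currency`, `place`, `keywords`) and the two regular expressions are hypothetical; a real system would map query fragments onto the elements of the actual XML schema.

```python
import re

def infer_structure(query):
    """Split a free-text query into assumed structured constraints plus
    leftover keywords."""
    structure = {}
    rest = query
    # Price constraint of the form "under <number> euro(s)".
    price = re.search(r"\bunder (\d+) (euros?)\b", rest)
    if price:
        structure["max_price"] = int(price.group(1))
        structure["currency"] = price.group(2)
        rest = rest.replace(price.group(0), " ")
    # Place constraint of the form "at <Capitalised word>".
    place = re.search(r"\bat ([A-Z][a-z]+)\b", rest)
    if place:
        structure["place"] = place.group(1)
        rest = rest.replace(place.group(0), " ")
    structure["keywords"] = rest.split()
    return structure

if __name__ == "__main__":
    print(infer_structure("concerts this weekend at Sofia under 20 euros"))
```

The remaining keywords ("concerts this weekend") would still be matched as ordinary text, which is why the approach can improve precision without requiring the user to write a structured query.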
Research at FUB
Recommender systems
“Related keyword” feature
versus
Context-dependent query reformulation
[Diagram; labels: Query, Docs, Document Ranking, Term ranking 1, Term ranking 2, Term ranking 3, +]
Combining text retrieval and text mining with concept lattices
Goal:
Integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface
Conclusions
The use of conceptual structures surfaces in traditional topic relevance retrieval and it is at the heart of many non-topical retrieval tasks
Towards conceptual search
• Understand term meaning
• Adapt to the user
• Can translate between applications
• Explainable
• Capable of filtering and summarization