Conceptual structures in modern information retrieval

Claudio Carpineto
Fondazione Ugo Bordoni, Roma
carpinet@fub.it

Overview

• Keyword-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions

Vector-based IR

[Diagram: documents and query are each represented as vectors of weighted keywords; matching the two yields the retrieved documents]
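As a rough illustration of the matching step, the sketch below represents the query and each document as a sparse vector of weighted keywords and ranks documents by cosine similarity. Cosine is only one possible matching function, and the names `cosine` and `rank` as well as the toy weights are assumptions made for this example, not something taken from the slides.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as {term: weight}."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_vec, doc_vecs):
    """Return document ids sorted by decreasing similarity to the query."""
    return sorted(doc_vecs, key=lambda d: cosine(query_vec, doc_vecs[d]), reverse=True)

# Toy weights, invented for the example.
docs = {
    "d1": {"bank": 1.0, "finance": 2.0},
    "d2": {"bank": 1.0, "river": 2.0},
}
query = {"finance": 1.0}
print(rank(query, docs))
```

A document that shares no keyword with the query gets similarity zero, which is the vocabulary problem in its simplest form.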

Term weighting

• tf.idf and the vector space model (Salton) were very popular in the 1970s and 1980s
• BM25 (Robertson) was the state of the art in the 1990s
• Several recent term-weighting functions are based on statistical language modeling (Ponte, Lafferty)
• A new weighting framework based on deviation from randomness + information gain (FUB + UG)
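To make one of these weighting functions concrete, here is a minimal sketch of BM25 scoring. It assumes a commonly used formulation with parameters `k1` and `b` and a smoothed idf; the default values, the whitespace tokenization, and the function name `bm25_scores` are illustrative choices and are not taken from the slides.

```python
import math
from collections import Counter

def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document in `docs` (list of strings) against `query`."""
    tokenized = [d.lower().split() for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if tf[term] == 0:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents need more occurrences.
            denom = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

scores = bm25_scores("finance", ["bank account finance", "river bank water", "credit finance"])
print(scores)
```

With equal term frequency, the shorter document receives the higher score, which is the effect of the length normalization controlled by `b`.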

W = Inf1 · Inf2

tf · log[(N + 1) / (n + 0.5)]  …

tf / (tf + 1)  …

tfn = tf · log(1 + K · avg_l / l)
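The last formula, the length-normalized term frequency tfn, can be tried out directly. The sketch below is a worked example of that single expression only; it assumes a base-2 logarithm (the slide does not state the base), reads `l` as the document length and `avg_l` as the average document length, and uses a hypothetical function name.

```python
import math

def normalized_tf(tf, doc_len, avg_len, K=1.0):
    """tfn = tf * log(1 + K * avg_l / l); log base 2 is an assumption."""
    return tf * math.log2(1 + K * avg_len / doc_len)

# Same raw term frequency, three document lengths.
for doc_len in (50, 100, 200):
    print(doc_len, normalized_tf(3, doc_len, avg_len=100))
```

Under these assumptions a document of average length keeps its raw frequency when K = 1, shorter documents are boosted, and longer ones are damped.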

Inherent limitations of keyword-based IR

• Vocabulary problem
• Relations are ignored

Early approaches to conceptual IR

• n-grams (Salton 1975, Maarek 1989)
• parse trees (Dillon 1983, Metzler 1989)
• case relations (Fillmore 1968, Somers 1987)
• conceptual graphs (Dick 1991)

Why early conceptual IR was not successful

• No best representation scheme
• Manual coding too costly
• Automated coding too hard
• Training required for both the indexer and the user
• Effectiveness not clearly demonstrated
• Retrieval task often not appropriate

Overview

• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions

Evolution of topical IR

• Very short queries
• Heterogeneous collections
• Unreliable sources
• Interactive sessions

Model of modern topical IR

[Diagram with components: Docs, Query, Context, Indexing, Ranking, Visualization, Interaction, Use]

Ranking

[Flow diagram with steps: Query, Inverted File, Weighted Query, Form. Docs + norm, Select top D docs, Compute σ(w), Select top E terms, Query Expansion]
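The feedback loop in this diagram (take the top D documents, score their terms, add the top E terms to the query) can be sketched as follows. The slide does not define σ(w), so the sketch substitutes raw frequency in the feedback documents as a stand-in scoring function; that choice, the function name `expand_query`, and the toy documents are all assumptions.

```python
from collections import Counter

def expand_query(query_terms, ranked_docs, top_d=2, top_e=2):
    """Expand a query with terms from the top-ranked documents.

    ranked_docs: list of token lists, best-ranked first.
    """
    feedback = ranked_docs[:top_d]  # select top D docs
    # Stand-in for sigma(w): term frequency within the feedback documents.
    scores = Counter(t for doc in feedback for t in doc if t not in query_terms)
    new_terms = [t for t, _ in scores.most_common(top_e)]  # select top E terms
    return list(query_terms) + new_terms

ranked = [
    ["bank", "credit", "loan", "credit"],
    ["credit", "loan", "finance"],
    ["river", "water"],
]
print(expand_query(["finance"], ranked))
```

The expanded query would then be re-weighted and run against the inverted file for a second ranking pass.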

Performance of retrieval feedback versus query difficulty

Ranking based on interdocument similarity

Cluster hypothesis (van Rijsbergen 1978)

Approaches
- Matching the query against document clusters (Willett 1988)
- Matching the query against transformed document representations (GVSM, Wong 1987; LSI, Deerwester 1990)
- Computing the conceptual distance between query and documents (order-theoretical ranking, Carpineto 2000)

Order-theoretical ranking

[Figure: concept lattice in which each node is labelled with its distance from the query node]

Distance  Node terms                   Document
0         NNS, FINANCE                 (Query)
1         NNS, FINANCE, CREDIT, KBS    D7
1         NNS, FINANCE, BANK, ACCOUNT  D1
1         NNS
1         FINANCE
2         NNS, BANK
2         NNS, BANK, ACCOUNT           D3
2         FINANCE, CREDIT, KBS         D4
3         CREDIT, KBS                  D5
3         NNS, BANK, RIVER             D2
3         BANK
4         BANK, KBS, WATERS            D6
4         KBS
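The numbers in the lattice above are consistent with a shortest-path distance from the query node. The sketch below is one way to reproduce them under stated assumptions: nodes are the non-empty intersections of document term sets (plus the query), edges are the covering relations between those sets, the empty top node and the bottom node are left out, and distance is the number of edges on the shortest path. This is an illustrative reading of the figure and not necessarily the exact procedure of Carpineto 2000; all function names are hypothetical.

```python
from collections import deque
from itertools import combinations

def intents(term_sets):
    """Close a family of term sets under (non-empty) intersection."""
    closed = set(term_sets)
    changed = True
    while changed:
        changed = False
        for a, b in combinations(list(closed), 2):
            c = a & b
            if c and c not in closed:
                closed.add(c)
                changed = True
    return closed

def hasse_edges(nodes):
    """Undirected covering relation: a -- b if a < b with nothing in between."""
    edges = {n: set() for n in nodes}
    for a in nodes:
        for b in nodes:
            if a < b and not any(a < c < b for c in nodes):
                edges[a].add(b)
                edges[b].add(a)
    return edges

def lattice_distances(docs, query):
    """Shortest-path distance from the query node to each document node."""
    q = frozenset(query)
    nodes = intents([frozenset(t) for t in docs.values()] + [q])
    edges = hasse_edges(nodes)
    dist = {q: 0}
    queue = deque([q])
    while queue:
        cur = queue.popleft()
        for nxt in edges[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return {name: dist.get(frozenset(t)) for name, t in docs.items()}

DOCS = {
    "D1": {"NNS", "FINANCE", "BANK", "ACCOUNT"},
    "D2": {"NNS", "BANK", "RIVER"},
    "D3": {"NNS", "BANK", "ACCOUNT"},
    "D4": {"FINANCE", "CREDIT", "KBS"},
    "D5": {"CREDIT", "KBS"},
    "D6": {"BANK", "KBS", "WATERS"},
    "D7": {"NNS", "FINANCE", "CREDIT", "KBS"},
}
print(lattice_distances(DOCS, {"NNS", "FINANCE"}))
```

Note that D3 and D4 each share only one term with the query, and D5 shares none, yet they still receive a finite rank through the lattice structure. The brute-force closure is fine for a toy collection but grows quickly with collection size.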

Performance of order-theoretical ranking

• Better than hierarchic clustering and comparable to best matching on the whole collection
• Markedly better than both hierarchic clustering and best matching on non-matching relevant documents
• Order-theoretical ranking does not scale up well, but it is synergistic with best-matching document ranking

Overview

• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions

Question Answering

Task:

Closed-class questions in unrestricted domains, with no guarantee that an answer exists and with the answer possibly scattered over multiple documents

Question Answering

Approach:

1. Recognize the type of query
2. Retrieve relevant documents
3. Find sought entities near question words
4. Fall back to best-matching passage retrieval in case of failure
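The four steps can be sketched in a few lines. Everything below is a toy illustration under stated assumptions: the question type is read off the first word, candidate entities are found with two hand-written regular expressions (`PATTERNS`, hypothetical), retrieval is assumed to have already produced `passages` ranked best first, and proximity is measured in tokens. A real system would use far richer question typing and entity recognition.

```python
import re

# Hypothetical entity patterns per question type.
PATTERNS = {
    "when": re.compile(r"\b\d{4}\b"),                          # a year
    "who": re.compile(r"\b[A-Z][a-z]+(?: [A-Z][a-z]+)+\b"),    # a multi-word name
}

def answer(question, passages):
    """Return the candidate entity closest to a question word, else the top passage."""
    words = question.lower().rstrip("?").split()
    pattern = PATTERNS.get(words[0])        # step 1: recognize the question type
    keywords = set(words[1:])
    best = None
    for passage in passages:                # step 2: passages assumed already retrieved
        tokens = passage.split()
        hits = [i for i, t in enumerate(tokens) if t.lower().strip(".,?") in keywords]
        if pattern is None or not hits:
            continue
        for m in pattern.finditer(passage):  # step 3: entities near question words
            pos = len(passage[:m.start()].split())
            d = min(abs(pos - h) for h in hits)
            if best is None or d < best[0]:
                best = (d, m.group())
    if best is not None:
        return best[1]
    return passages[0] if passages else None  # step 4: fall back to the best passage

passages = ["The telescope was invented in 1608 by Hans Lipperhey."]
print(answer("When was the telescope invented?", passages))
print(answer("Who invented the telescope?", passages))
```

Questions whose type is not recognized simply return the top-ranked passage, which mirrors the fallback in step 4.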

Web Information Retrieval

Current tasks:

• named-entity finding task
• topic distillation task

Approach:

1. Use of multiple methods
2. Combination of results via interpolation and normalization schemes
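One simple instance of combining results through normalization and interpolation is sketched below: each method's scores are min-max normalized to [0, 1] and then merged with a weighted sum. The slides do not say which schemes were used, so min-max normalization, linear interpolation, the weights, and the run names `content` and `links` are all assumptions for the example.

```python
def min_max(scores):
    """Rescale a {doc: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {d: 0.0 for d in scores}
    return {d: (s - lo) / (hi - lo) for d, s in scores.items()}

def interpolate(runs, weights):
    """Weighted sum of normalized scores; returns doc ids, best first."""
    combined = {}
    for run, w in zip(runs, weights):
        for d, s in min_max(run).items():
            combined[d] = combined.get(d, 0.0) + w * s
    return sorted(combined, key=combined.get, reverse=True)

# Toy scores from two hypothetical methods on different scales.
content = {"a": 10.0, "b": 5.0, "c": 0.0}
links = {"a": 0.1, "b": 0.9, "c": 0.5}
print(interpolate([content, links], [0.5, 0.5]))
```

Normalizing first matters because the two methods score on different scales; without it the content scores would dominate the sum.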

XML document retrieval

Goal:

Use document structure to improve precision and recall of unstructured queries

“concerts this weekend in Sofia under 20 euros”

Approaches:

• Automatic inference of query structure
• Semi-automatic query annotation
• Hybrid query languages

Overview

• Vector-based IR and early conceptual approaches
• Context and concepts in modern topical IR
• Emerging IR tasks requiring knowledge structures
• Research at FUB
• Conclusions

Recommender systems

“Related keyword” feature

versus

Context-dependent query reformulation

[Diagram with components: Docs, Query, Document Ranking, Term ranking 1, Term ranking 2, Term ranking 3, and a combination step (+)]

Combining text retrieval and text mining with concept lattices

Goal

Integration of multiple search strategies (querying, browsing, thesaurus climbing, bounding) into a single Web interface

Conclusions

The use of conceptual structures surfaces in traditional topic relevance retrieval, and it is at the heart of many non-topical retrieval tasks

Towards conceptual search

• Understands term meaning
• Adapts to the user
• Can translate between applications
• Explainable
• Capable of filtering and summarization