Matrix Decomposition Methods in Information Retrieval
Thomas Hofmann, Department of Computer Science, Brown University, www.cs.brown.edu/people/th (& Chief Scientist, RecomMind Inc.)
In collaboration with: Jan Puzicha, UC Berkeley & RecomMind; David Cohen, CMU & Burning Glass
KerMIT & NeuroCOLT Workshop, April 30th to May 2nd 2001, Cumberland Lodge



Overview

1. Introduction: A Brief History of Mechanical IR
2. Latent Semantic Analysis
3. Probabilistic Latent Semantic Analysis
4. Learning (from) Hyperlink Graphs
5. Collaborative Filtering
6. Future Work and Conclusion


1. Introduction: A Brief History of Mechanical IR


Memex – “As we may think.”

Vannevar Bush (1945)

The idea of an easily accessible, individually configurable storehouse of knowledge, the beginning of the literature on mechanized information retrieval:

“Consider a future device for individual use, which is a sort of mechanized private file and library. It needs a name, and to coin one at random, ‘memex’ will do. A memex is a device in which an individual stores all his books, records, and communications, and which is mechanized so that it may be consulted with exceeding speed and flexibility. It is an enlarged intimate supplement to his memory.”

“The world has arrived at an age of cheap complex devices of great reliability; and something is bound to come of it.”


Memex – “As we may think.”

Vannevar Bush (1945)

The civilizational challenge:

“The difficulty seems to be, not so much that we publish unduly in view of the extent and variety of present day interests, but rather that publication has been extended far beyond our present ability to make real use of the record. The summation of human experience is being expanded at a prodigious rate, and the means we use for threading through the consequent maze to the momentarily important item is the same as was used in the days of square-rigged ships.”

V. Bush, “As we may think”, Atlantic Monthly, 176 (1945), pp. 101-108.


The Thesaurus Approach

Hans Peter Luhn (1957, 1961)
- Words of similar or related meaning are grouped into “notional families”
- Encoding of documents in terms of notional elements
- Matching by measuring the degree of notional similarity
- A common language for annotating documents; key word in context (KWIC) indexing
- “… the faculty of interpretation is beyond the talent of machines.”
- Statistical cues extracted by machines to assist the human indexer; a vocabulary method for detecting similarities

H.P. Luhn, “A statistical approach to mechanical literature searching”, New York, IBM Research Center, 1957.
H.P. Luhn, “The Automatic Derivation of Information Retrieval Encodements from Machine-Readable Text”, Information Retrieval and Machine Translation, 3(2), pp. 1021-1028, 1961.


To Punch or not to punch …

T. Joyce & R.M. Needham (1958)

Lattices & hierarchies of search terms:
“As in other systems, the documents are represented by holes in punched cards which represent the various terms, and in addition, when a hole is punched in any term card, all the terms at higher levels of the lattice […] are also punched.”
The postcoordinate revolution: card sorting at search time!
“Investigations […] to lessen the physical work are continuing.”

T. Joyce & R.M. Needham, “The Thesaurus Approach to Information Retrieval”, American Documentation, 9, pp. 192-197, 1958.


Term Associations

Lauren B. Doyle (1962)

Unusual co-occurrences of pairs of words = associations of words in text

Statistical testing: Chi-square and Pearson correlation coefficient to determine pairwise correlations

Term association maps for interactive retrieval

Today: semantic maps

L.B. Doyle, “Indexing and Abstracting by Association”, Unisys Corporation, 1962.


Vector Space Model

Gerard Salton (1960s/70s)
- Instead of indexing documents by selected index terms, preserve (almost) all terms in automatic indexing
- Represent documents by a high-dimensional vector; each term can be associated with a weight
- Geometrical interpretation

G. Salton, “The SMART Retrieval System – Experiments in Automatic Document Processing”, 1971.


Term-Document Matrix

D = {documents in database}, W = {terms in vocabulary}
The term-document matrix has one row per document $d_i$ and one column per term $w_j$, with entries $c(d_i, w_j)$: the frequency of term $w_j$ in document $d_i$, possibly transformed by term weighting.
Example: the document “Texas Instruments said it has developed the first 32-bit computer chip designed specifically for artificial intelligence applications [...]” maps to a sparse count vector $x_d$ with non-zero entries for terms such as “artificial” and “intelligence” and zero entries for terms such as “interest” and “artifact”.


Documents in “Inner” Space
Retrieval method:
- rank documents according to their similarity with the query
- term weighting schemes, for example, TFIDF
- used in the SMART system and many successor systems; highly popular

Similarity between document and query: the cosine of the angle between the query and document vectors,
$\mathrm{sim}(d, q) = \cos(d, q) = \frac{\langle d, q \rangle}{\|d\|\,\|q\|}$
[Figure: documents and a query as points on the unit sphere, with example cosine similarities 0.75 and 0.64]


Advantages of the Vector Space Model

- No subjective selection of index terms
- Partial matching of queries and documents (dealing with the case where no document contains all search terms)
- Ranking according to similarity score (dealing with large result sets)
- Term weighting schemes (improve retrieval performance)
- Various extensions: document clustering, relevance feedback (modifying the query vector)
- Geometric foundation


2. Latent Semantic Analysis


Limitations of the Vector Space Model
- Dimensionality: the vector space representation is high-dimensional (several 10-100K); learning and estimation have to deal with the curse of dimensionality.
- Sparseness: document vectors are typically very sparse, so cosine similarity can be noisy and inaccurate.
- Semantics: the inner product can only match occurrences of exactly the same terms; the vector representation does not capture semantic relations between words.
- Independence: the bag-of-words representation is unable to capture phrases and semantic/syntactic regularities.


The Lost Meaning of Words …

Ambiguity and association in natural language

- Polysemy: words often have a multitude of meanings and different types of usage (more urgent for very heterogeneous collections). The vector space model is unable to discriminate between different meanings of the same word, so $\mathrm{sim}(d,q) = \cos(d,q)$ can overestimate the true similarity.
- Synonymy: different terms may have an identical or a similar meaning (weaker: words indicating the same topic). No associations between words are made in the vector space representation, so $\mathrm{sim}(d,q) = \cos(d,q)$ can underestimate the true similarity.


Polysemy and Context

Document similarity on the single-word level: polysemy and context.
[Figure: a term such as “saturn” with two context clusters. Meaning 1: ring, jupiter, space, voyager, planet, …; meaning 2: car, company, dodge, ford, … The shared term contributes to similarity if used in the 1st meaning in both documents, but not if one document uses the 2nd.]


Latent Semantic Analysis

General idea:
- Map documents (and terms) to a low-dimensional representation.
- Design the mapping such that the low-dimensional space reflects semantic associations (latent semantic space).
- Compute document similarity based on the inner product in the latent semantic space.
Goals:
- Similar terms map to similar locations in the low-dimensional space.
- Noise reduction by dimension reduction.


LSA: Matrix Decomposition by SVD
Dimension reduction by singular value decomposition of the term-document matrix:
$C = U \Sigma V^t \;\approx\; \hat U \hat\Sigma \hat V^t = \hat C$
- original term-document matrix $C = (c_{ij})$, $c_{ij} = c(d_i, w_j)$: word frequencies, possibly transformed (document length normalization, sublinear transformation such as log, global term weight)
- $U$, $V$: term/document vectors; $\hat\Sigma$: thresholded singular values
- $\hat C$: reconstructed term-document matrix, the L2-optimal approximation of $C$


Background: SVD
Singular value decomposition, definition:
$C = U \Sigma V^t$, \quad sizes: $(n \times m) = (n \times n)(n \times n)(n \times m)$
- $U$, $V$: orthonormal columns
- $\Sigma$: diagonal with singular values (ordered)
Properties:
- existence & uniqueness
- thresholding small singular values yields an optimal low-rank approximation (in the sense of the Frobenius norm):
$\hat C = \hat U \hat\Sigma \hat V^t$, \quad sizes: $(n \times m) = (n \times k)(k \times k)(k \times m)$
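A small numpy sketch of the rank-k truncation described above (the toy matrix and the choice k = 2 are illustrative): compute the SVD, keep the top k singular values, and measure the Frobenius reconstruction error.

```python
import numpy as np

# toy term-document count matrix C (terms x documents)
C = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 3, 1],
              [0, 2, 0, 1]], dtype=float)

U, s, Vt = np.linalg.svd(C, full_matrices=False)

k = 2  # number of retained dimensions
C_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Eckart-Young: C_hat is the best rank-k approximation in Frobenius norm
err = np.linalg.norm(C - C_hat, "fro")
print("singular values:", np.round(s, 3))
print("rank-2 reconstruction error:", round(err, 3))
```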


SVD and PCA
If (!) the rows of $C$ were shifted such that their mean is zero, then
$C C^t = U \Sigma V^t (U \Sigma V^t)^t = U \Sigma^2 U^t$,
and one would essentially perform a projection onto the principal axes defined by the columns of $U$.
Yet this shift would destroy the sparseness of the term-document matrix (and consequently might hurt the performance of SVD methods).


Canonical Analysis
Hirschfeld 1935, Hotelling 1936, Fisher 1940: correlation analysis for contingency tables,
$c_{ij} = c_{i\cdot}\, c_{\cdot j} \Big( 1 + \sum_{k=2}^{K} \lambda_k u_{ik} v_{jk} \Big)$,
with marginals $c_{i\cdot} = \sum_{j=1}^{J} c_{ij}$ and $c_{\cdot j} = \sum_{i=1}^{I} c_{ij}$, subject to the constraints
$\sum_{i=1}^{I} c_{i\cdot} u_{ik} = \sum_{j=1}^{J} c_{\cdot j} v_{jk} = 0$ \quad and \quad $\sum_{i=1}^{I} c_{i\cdot} u_{ik} u_{il} = \sum_{j=1}^{J} c_{\cdot j} v_{jk} v_{jl} = \delta_{kl}$.


Canonical & Correspondence Analysis
Correspondence analysis (as a method of scaling): Guttman 1941, Torgerson 1958, Benzecri 1969, Hill 1974; Whitaker 1967: “gradient analysis”.
“Reciprocal averaging”:
$u_i = \frac{1}{c_{i\cdot}} \sum_j c_{ij} v_j, \qquad v_j = \frac{1}{c_{\cdot j}} \sum_i c_{ij} u_i$
Solutions: the unit vectors and the scores of canonical analysis; equivalently, the SVD of the rescaled matrix with entries $c_{ij} / \sqrt{c_{i\cdot}\, c_{\cdot j}}$.
(Not exactly what is done in LSA.)


Semantic Inner Product / Kernel
Similarity: the inner product in the lower dimensional space. Note that
$\hat C^t \hat C = \hat V \hat\Sigma \hat U^t\, \hat U \hat\Sigma \hat V^t = (\hat\Sigma \hat V^t)^t (\hat\Sigma \hat V^t)$,
so the columns of $\hat\Sigma \hat V^t$ provide the lower dimensional document representation.
For a given decomposition, additional documents or queries can be mapped to the semantic space (folding-in). Since $C = U \Sigma V^t$ implies $V^t = \Sigma^{-1} U^t C$, a new document/query $q$ maps to
$\hat q = \hat\Sigma^{-1} \hat U^t q$.
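A self-contained numpy sketch of folding-in under the identities above (the toy data is reused from the SVD sketch; scoring both query and documents in Sigma-scaled coordinates is my choice, consistent with the inner product $\hat C^t \hat C$):

```python
import numpy as np

C = np.array([[2, 0, 1, 0],
              [1, 1, 0, 0],
              [0, 0, 3, 1],
              [0, 2, 0, 1]], dtype=float)
U, s, Vt = np.linalg.svd(C, full_matrices=False)
k = 2
Uk, sk, Vtk = U[:, :k], s[:k], Vt[:k, :]

# folding-in: from C = U Sigma V^t it follows that V^t = Sigma^{-1} U^t C,
# so a new term-count vector q maps to q_hat = Sigma_k^{-1} U_k^t q
q = np.array([1.0, 1.0, 0.0, 0.0])
q_hat = (Uk.T @ q) / sk

# semantic inner products between the folded-in query and the documents
scores = (sk * q_hat) @ (np.diag(sk) @ Vtk)
print(np.round(scores, 3))
```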


Term Associations from LSA

(taken from slide by S. Dumais)

[Figure: terms plotted on axes “Term 1” and “Term 2”, with a “Concept” direction between them]


LSA: Discussion

Pros:
- The low-dimensional document representation is able to capture synonyms.
- Noise removal and robustness by dimension reduction.
- Experimentally: advantages over the naïve vector space model.

Cons:
- “Formally”: the L2 norm is inappropriate as a distance function for count vectors (the reconstruction may contain negative entries).
- “Conceptually”:
  - The problem of polysemy is not addressed: principle of linear superposition, no active disambiguation.
  - The context of terms is not taken into account.
  - Directions in latent space are hard to interpret.
  - No probabilistic model of term occurrences.
  - [ad hoc selection of the number of dimensions, ...]


Features of IR Methods

Feature                          | VSM | LSA
Quantitative relevance score     | yes | yes
Partial query matching           | yes | yes
Document similarity              | yes | yes
Word correlations, synonyms      | no  | yes
Low-dimensional representation   | no  | yes
Notional families, concepts      | no  | not really
Dealing with polysemy            | no  | no
Probabilistic model              | no  | no
Sparse representation            | yes | no


3. Probabilistic Latent Semantic Analysis


Documents as Information Sources

D = {documents in database}, W = {words in vocabulary}; term-document matrix with entries $c(d_i, w_j)$.
A “real” document gives an empirical probability distribution over words, the relative frequencies $\hat P(w|d) = \frac{c(d,w)}{c(d)}$.
Think of it as a sample from an “ideal” document: a (memoryless) information source with distribution $P(w|d)$, from which other documents could also be drawn.


Information Source Models in IR

Bayes rule: the probability of relevance of a document w.r.t. a query,
$P(d|q) \propto P(q|d)\, P(d)$, \quad with prior probability of relevance $P(d)$.
Query translation model: the probability that $q$ is “generated” from $d$,
$P(q|d) = \prod_{t \in q} P(t|d)$,
where the probability that a query term is generated combines a translation model and a language model:
$P(t|d) = \sum_{w} P(t|w)\, P(w|d)$.

J. Ponte & W.B. Croft, “A Language Model Approach to Information Retrieval”, SIGIR 1998.
A. Berger & J. Lafferty, “Information Retrieval as Statistical Translation”, SIGIR 1999.


Probabilistic Latent Semantic Analysis
- How can we learn document-specific language models? Sparseness problem, even for unigrams.
- Probabilistic dimension reduction techniques to overcome the data sparseness problem.
- Factor analysis for count data: factors = concepts.

$P(w|d) = \sum_z P(w|z)\, P(z|d)$
with (topic) factor “sources” $P(w|z)$, document-specific mixing proportions $P(z|d)$, and a latent variable $z$ with a “small” number of states; as a joint model over document “sources”,
$P(d,w) = \sum_z P(w|z)\, P(d|z)\, P(z)$.

T. Hofmann, “Probabilistic Latent Semantic Analysis”, UAI 1999.


PLSA: Graphical Model
[Plate diagram, built up over several slides: an outer plate ranges over the N documents of the collection, an inner plate over the c(d) word occurrences within a single document. For each occurrence, a latent topic $z$ is drawn from $P(z|d)$, which is shared by all words in a document, and a word $w$ is drawn from $P(w|z)$, which is shared by all documents in the collection; together $P(w|d) = \sum_z P(w|z)\, P(z|d)$.]


Probabilistic Latent Semantic Space
- Documents are represented as points in a low-dimensional sub-simplex (dimensionality reduction for probability distributions).
- KL-divergence projection, not an orthogonal one.
[Figure: the probability simplex over words with vertices $P(w|z_1)$, $P(w|z_2)$, $P(w|z_3)$ spanning a sub-simplex; the empirical distribution $\hat P(w|d)$ is embedded in the simplex and projected onto the sub-simplex as $P(w|d)$]


Positive Matrix Decomposition
Mixture decomposition in matrix notation:
$\tilde C = P_d\, \Sigma\, P_w^t$, \quad $(P_d)_{i,k} = P(d_i|z_k)$, \; $(P_w)_{j,k} = P(w_j|z_k)$, \; $\Sigma = \mathrm{diag}(P(z_1), \ldots, P(z_K))$
Constraints:
- non-negativity of all matrices
- normalization according to the L1 norm
- (no orthogonality)

D.D. Lee & H.S. Seung, “Learning the parts of objects by non-negative matrix factorization”, Nature, 1999.


Positive Matrix Decomposition & SVD
Mixture decomposition in matrix notation:
$\tilde C = P_d\, \Sigma\, P_w^t$, \quad compare to \quad $C = U \Sigma V^t \approx \hat U \hat\Sigma \hat V^t = \hat C$
- probabilistic approach vs. linear algebra decomposition
- the conditional independence assumption “replaces” the outer product
- class-conditional distributions “replace” the left/right eigenvectors
- maximum likelihood instead of minimum L2 norm as the fitting criterion:
$\mathcal{L} = \sum_{i,j} c_{ij} \log \tilde c_{ij} = \sum_{i,j} c_{ij} \log \sum_z P(w_j|z)\, P(d_i|z)\, P(z)$


Expectation Maximization Algorithm
Maximizing the log-likelihood by (tempered) EM iterations.
E-step (posterior probabilities of the latent variables):
$P(z|d,w) = \frac{P(d|z)\, P(w|z)\, P(z)}{\sum_{z'} P(d|z')\, P(w|z')\, P(z')}$
(the probability that a term occurrence of $w$ within $d$ is “explained” by topic $z$); in the tempered variant the likelihood part is raised to a power $\beta$:
$P(z|d,w) \propto [P(d|z)\, P(w|z)]^{\beta}\, P(z)$
M-step (maximization of the expected complete-data log-likelihood):
$P(w|z) \propto \sum_d c(d,w)\, P(z|d,w)$, \quad $P(d|z) \propto \sum_w c(d,w)\, P(z|d,w)$, \quad $P(z) \propto \sum_{d,w} c(d,w)\, P(z|d,w)$
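A compact numpy sketch of these updates (an illustration of the formulas above, not the talk's implementation; beta = 1 gives plain, untempered EM, and the random counts are a stand-in for real data):

```python
import numpy as np

def plsa_em(C, K, iters=100, beta=1.0, seed=0):
    """Fit PLSA to a count matrix C (I x J) by (tempered) EM; beta=1 is plain EM."""
    rng = np.random.default_rng(seed)
    I, J = C.shape
    Pz = np.full(K, 1.0 / K)                        # P(z)
    Pd = rng.random((I, K)); Pd /= Pd.sum(axis=0)   # P(d|z), columns sum to 1
    Pw = rng.random((J, K)); Pw /= Pw.sum(axis=0)   # P(w|z)
    for _ in range(iters):
        # E-step: P(z|d,w) proportional to [P(d|z) P(w|z)]^beta P(z)
        post = (Pd[:, None, :] * Pw[None, :, :]) ** beta * Pz
        post /= post.sum(axis=2, keepdims=True)
        # M-step: reweight the posteriors by the observed counts c(d,w)
        Nz = np.einsum("ij,ijk->k", C, post)
        Pd = np.einsum("ij,ijk->ik", C, post) / Nz
        Pw = np.einsum("ij,ijk->jk", C, post) / Nz
        Pz = Nz / C.sum()
    return Pz, Pd, Pw

# toy run; the log-likelihood is L = sum_ij c_ij log c~_ij
C = np.random.default_rng(1).integers(1, 5, size=(8, 12)).astype(float)
Pz, Pd, Pw = plsa_em(C, K=3)
print(round((C * np.log(np.einsum("ik,jk,k->ij", Pd, Pw, Pz))).sum(), 2))
```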


Example: Science Magazine Papers

- Dataset with approx. 12K papers from Science Magazine
- Selected concepts from a model with K=200


Example: TDT1 News Stories
- TDT1 = document collection with approx. 16,000 news stories (Reuters, CNN, years 1994/95)
- Results based on a decomposition with 128 concepts
- 2 main factors each for “flight” and “love” (most probable words by $P(w|z)$):

“flight”, factor 1: plane, airport, crash, flight, safety, aircraft, air, passenger, board, airline
“flight”, factor 2: space, shuttle, mission, astronauts, launch, station, crew, nasa, satellite, earth
“love”, factor 1: home, family, like, just, kids, mother, life, happy, friends, cnn
“love”, factor 2: film, movie, music, new, best, hollywood, love, actor, entertainment, star


Folding-in a Document/Query
- TDT1 collection: approx. 16,000 news stories; PLSA model with 128 dimensions
- Query keywords: “aid food medical people UN war”
- 4 most probable factors for the query, with their most probable keywords:
  1. un, bosnian, serbs, bosnia, serb, sarajevo, nato, peacekeep., nations, peace, bihac, war
  2. iraq, iraqui, sanctions, kuwait, un, council, gulf, saddam, baghdad, hussein, resolution, border
  3. refugees, aid, rwanda, relief, people, camps, zaire, camp, food, rwandan, un, goma
  4. building, city, people, rescue, buildings, workers, kobe, victims, area, earthquake, disaster, missing
- Track the posteriors for every keyword during folding-in.


Folding-in a Document/Query (Iteration 1)
[Figure: posterior probabilities of the four factors above for each query keyword (aid, food, medical, people, un, war) after the first folding-in iteration]


Folding-in a Document/Query (Iteration 2)
[Figure: posterior probabilities of the four factors for each query keyword after two iterations]


Folding-in a Document/Query (Iteration 5)
[Figure: posterior probabilities of the four factors for each query keyword after five iterations]


Folding-in a Document/Query (further iterations)
[Figure: posterior probabilities of the four factors for each query keyword after further iterations]


Experiments: Precision-Recall
- 4 test collections (each with approx. 1000-3500 docs)
[Figure: precision [%] vs. recall [%] curves on MED, CRAN, CACM, and CISI, comparing cos (vector space), LSI, and PLSI*]


Experimental Results: TFIDF
[Bar chart: average precision-recall on Medline, CRAN, CACM, and CISI for VSM, LSA, and PLSA]


Experimental Results: TFIDF
[Bar chart: relative gain in average precision-recall (0-50% axis) on Medline, CRAN, CACM, and CISI for VSM, LSA, and PLSA]


From Probabilistic Models to Kernels: The Fisher Kernel
Use the idea of a Fisher kernel:
- Main idea: derive a kernel or similarity function from a generative model.
- How do ML estimates of the parameters change around a point in sample space?
- Derive Fisher scores from the model: $U_x = \nabla_\theta \log P(x|\theta)$, where $\theta$ are the model parameters and $x$ is a sample point.
- Kernel/similarity function: $\mathrm{sim}(x,y) = U_x^t\, I(\hat\theta)^{-1}\, U_y$, with Fisher information matrix $I(\hat\theta)$.

T. Jaakkola & D. Haussler, “Exploiting Generative Models for Discriminative Training”, NIPS 1999.


Semantic Kernel from PLSA: Outline

Outline of the technical derivation:
- Parameterize the multinomials by variance-stabilizing parameters (= square-root parameterization).
- Assume information orthogonality of the parameters of different multinomials (an approximation).
- In each block, an isometric embedding with constant Fisher information is obtained (the inversion problem for the information matrix is circumvented).
… and the result …


Semantic Kernel from PLSA: Result
$\mathrm{sim}(d_i, d_m) = \sum_{k=1}^{K} P(z_k|d_i)\, P(z_k|d_m) \;+\; \alpha \sum_{j=1}^{J} \frac{c_{ij}\, c_{mj}}{c_i\, c_m} \sum_{k=1}^{K} P(z_k|d_i, w_j)\, P(z_k|d_m, w_j)$
- First term: topical overlap, the probability that a randomly chosen word in the first and in the second document refer to the same topic/concept.
- Second term: word overlap (do both documents contain common terms?), weighted by word sense(!) overlap (do both terms refer to the same concept?).
- K=1 essentially reduces to the Vector Space Model (!)
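A small numpy sketch of this similarity as I read the formula above (alpha, the toy inputs, and the function name plsa_fisher_sim are illustrative assumptions, not the paper's code):

```python
import numpy as np

def plsa_fisher_sim(ci, cm, Pz_di, Pz_dm, Pz_diw, Pz_dmw, alpha=1.0):
    """sim(d_i, d_m) per the slide: topical overlap plus
    alpha-weighted word overlap scored by word-sense agreement.

    ci, cm       : term-count vectors of the two documents, shape (J,)
    Pz_di, Pz_dm : P(z|d) for each document, shape (K,)
    Pz_diw       : P(z|d_i, w_j), shape (J, K); likewise Pz_dmw
    """
    topical = Pz_di @ Pz_dm
    word_w = (ci / ci.sum()) * (cm / cm.sum())      # c_ij c_mj / (c_i c_m)
    sense = np.einsum("jk,jk->j", Pz_diw, Pz_dmw)   # per-term topic agreement
    return topical + alpha * (word_w @ sense)

# toy example: J = 4 terms, K = 2 topics
ci = np.array([2.0, 0.0, 1.0, 0.0]); cm = np.array([1.0, 1.0, 1.0, 0.0])
Pz_di = np.array([0.8, 0.2]);        Pz_dm = np.array([0.6, 0.4])
Pz_diw = np.tile(Pz_di, (4, 1));     Pz_dmw = np.tile(Pz_dm, (4, 1))
print(round(plsa_fisher_sim(ci, cm, Pz_di, Pz_dm, Pz_diw, Pz_dmw), 4))
```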


Text Categorization: SVM with PLSA
- Standard text collection: Reuters-21578 (5 main categories), with the standard kernel and the PLSA (Fisher) kernel
- Substantial improvement if additional unlabeled documents are available
[Bar chart: classification error (%) on the categories earn, acq, money, grain, crude for SVM vs. SVM+ (PLSA kernel), with 5%, 20%, and 100% of the labeled training data]


Latent Class Analysis: Example
- Document collection with approx. 1,400 abstracts on “clustering” (INSPEC 1991-1997); preprocessing: stemming, stop word list
- 4 main factors (K=128) for the term “SEGMENT” (most probable, stemmed words):

image segmentation: imag, SEGMENT, textur, color, tissu, brain, slice, cluster, mri, volum
motion segmentation: video, sequenc, motion, frame, scene, SEGMENT, shot, imag, cluster, visual
line matching: constraint, line, match, locat, imag, geometr, impos, SEGMENT, fundament, recogn
speech recognition: speaker, speech, recogni, signal, train, HMM, sourc, speaker-indep., SEGMENT, sound


Document Similarity: Example (1)
Factor loadings on the four “SEGMENT” factors (“image”, “speech”, “video”, “line”):

Multiresolution wavelet decomposition and neuro-fuzzy clustering for segmentation of radiographic images. (loadings: 0.5534, 0.0000, 0.0012, 0.0000)
“Segmentation of medical images is a challenging problem in the field of image analysis. Several diagnostics are based on proper segmentation of the digitized image. Segmentation of medical images is needed for applications involving estimation of the boundary of an object, classification of tissue abnormalities, shape analysis, contour detection and texture segmentation. […]”

Unknown-multiple signal source clustering problem using ergodic HMM and applied to speaker classification. (loadings: 0.0002, 0.6689, 0.0455, 0.0000)
“The authors consider signals originated from a sequence of sources. More specifically, the problems of segmenting such signals and relating the segments to their sources are addressed. This issue has wide applications in many fields. The report describes a resolution method that is based on an ergodic hidden Markov model (HMM), in which each HMM state corresponds to a signal source. […]”

relative similarity (VSM): 1.4; relative similarity (PLSA): 0.7


Document Similarity: Example (2)

McCalpin, J.P.; Nishenko, S.P.: Holocene paleoseismicity, temporal clustering, and probabilities of future large (M>7) earthquakes on the Wasatch fault zone, Utah.
“The chronology of M>7 paleoearthquakes on the central five segments of the Wasatch fault zone (WFZ) contains 16 earthquakes in the past 5500 years with an average repeat time of 350 years. Four of the central five segments ruptured between 620+or-30 and 1230+or-60 calendar years B.P. The remaining segment (Brigham City segment) has not ruptured in the past 2120+or-100 years. Comparison of the WFZ space-time diagram of paleoearthquakes with synthetic paleoseismic histories indicates that the observed temporal clusters and gaps have about an equal probability (depending on model assumptions) of reflecting random coincidence as opposed to intersegment contagion. Regional seismicity suggests […]”

Blatt, M.; Wiseman, S.; Domany, E.: Clustering data through an analogy to the Potts model.
“A new approach for clustering is proposed. This method is based on an analogy to a physical model; the ferromagnetic Potts model at thermal equilibrium is used as an analog computer for this hard optimization problem. We do not assume any structure of the underlying distribution of the data. Phase space of the Potts model is divided into three regions; ferromagnetic, super-paramagnetic and paramagnetic phases. The region of interest is that corresponding to the super-paramagnetic one, where domains of aligned spins appear. The range of temperatures where these structures are stable is indicated by […]”

relative similarity (VSM): 1.0; relative similarity (PLSA): 0.5


Features of IR Methods

Feature                          | LSA        | PLSA
Quantitative relevance score     | yes        | yes
Partial query matching           | yes        | yes
Document similarity              | yes        | yes
Word correlations, synonyms      | yes        | yes
Low-dimensional representation   | yes        | yes
Notional families, concepts      | not really | yes
Dealing with polysemy            | no         | yes
Probabilistic model              | no         | yes
Sparse representation            | no         | yes


4. Learning (from) Hyperlink Graphs


The Importance of Hyperlinks in IR

- Hyperlinks provide latent human annotation: a hyperlink represents an implicit endorsement of the page being pointed to.
- Social structures are reflected in the Web graph (cyber/virtual/Web communities).
- The link structure allows an assessment of page authority: it goes beyond content-based analysis and potentially discriminates between high- and low-quality sites.


HITS (Hyperlink Induced Topic Search)
Jon Kleinberg and the Smart group (IBM). HITS:
- Retrieve a subset of Web pages based on a query-based search: result set + context graph.
- Extract the hyperlink graph of the pages in the subset.
- Rescoring method with hub and authority weights, using the adjacency matrix of the Web subgraph:
Authority scores: $x_p^{(t+1)} = \sum_{q\,:\,(q,p)\in E} y_q^{(t)}$ \quad Hub scores: $y_q^{(t+1)} = \sum_{p\,:\,(q,p)\in E} x_p^{(t+1)}$
- Solution: the left/right eigenvectors (SVD) of the adjacency matrix.

J. Kleinberg, “Authoritative Sources in a Hyperlinked Environment”, 1998.


Learning a Semantic Model of the Web

Making sense of the text:
- Probabilistic latent semantic analysis automatically identifies concepts and topics.
Making sense of the link structure:
- A probabilistic graph model, i.e., a predictive model for additional links/nodes based on existing ones
- Centered around the notion of “Web communities”
- A probabilistic version of HITS
- Enables predicting the existence of hyperlinks: estimate the entropy of the Web graph


Finding Web Communities
Probabilistic model: $P(s,t) = \sum_z P(z)\, P(s|z)\, P(t|z)$
- source nodes $s$ and target nodes $t$ (the two node sets are identical), with community-specific distributions $P(s|z)$ and $P(t|z)$
- Web community: a densely connected bipartite subgraph
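Since this is the PLSA mixture with the term-document matrix swapped for the link matrix, the plsa_em sketch given after the EM slide applies unchanged; an illustrative use (the 5-node link matrix is made up, and plsa_em is the assumed helper from that earlier sketch):

```python
import numpy as np

# links[s, t] = 1 iff source page s links to target page t
links = np.array([[1, 1, 0, 0, 0],
                  [1, 1, 1, 0, 0],
                  [0, 1, 1, 0, 0],
                  [0, 0, 0, 1, 1],
                  [0, 0, 1, 1, 1]], dtype=float)

# reusing plsa_em from the earlier EM sketch: P(z), P(s|z), P(t|z)
Pz, Ps, Pt = plsa_em(links, K=2)

# community membership of each source node: P(z|s) proportional to P(s|z) P(z)
Pzs = Ps * Pz
Pzs /= Pzs.sum(axis=1, keepdims=True)
print(np.round(Pzs, 2))
```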


Decomposing the Web Graph
[Figure: a Web subgraph decomposed into Community 1, Community 2, and Community 3]
- Links (probabilistically) belong to exactly one community.
- Nodes may belong to multiple communities.


Linking Hyperlinks and Content
PLSA and PHITS (probabilistic HITS) can be combined into one joint decomposition model:
[Diagram: a shared latent concept/topic variable $z$ with source distribution $P(z|s)$ generates both words $w$ via $P(w|z)$ (content) and link targets $t$ via $P(t|z)$ (Web community)]


“Ulysses” Webs: Space, War, and Genius (no heroes wanted)
- Decomposition of a base set generated from Altavista with the query “Ulysses”
- Combined decomposition based on links and text; three factors, with their most probable terms and URLs:

Space: ulysses 0.022082, space 0.015334, page 0.013885, home 0.011904, nasa 0.008915, science 0.007417, solar 0.007143, esa 0.006757, mission 0.006090; ulysses.jpl.nasa.gov/ 0.028583, helio.estec.esa.nl/ulysses 0.026384, www.sp.ph.ic.ak.uk/Ulysses 0.026384

War: grant 0.019197, s 0.017092, ulysses 0.013781, online 0.006809, war 0.006619, school 0.005966, poetry 0.005762, president 0.005259, civil 0.005065; www.lib.siu.edu/projects/usgrant/ 0.019358, www.whitehouse.gov/WH/glimpse/presidents/ug18.html 0.017598, saints.css.edu/mkelsey/gppg.html 0.015838

Genius: page 0.020032, ulysses 0.013361, new 0.010455, web 0.009060, site 0.009009, joyce 0.008430, net 0.007799, teachers 0.007236, information 0.007170; http://www.purchase.edu/Joyce/Ulysses.htm 0.008469, http://www.bibliomania.com/Fiction/joyce/ulysses/index.html 0.007274, http://teachers.net/chatroom/ 0.005082

D. Cohn & T. Hofmann, “The Missing Link”, NIPS 2001.


5. Collaborative Filtering


Personalized Information Filtering
[Diagram: users/customers connected to objects by a judgement/selection relation, e.g., “likes”, “has seen”]


Predicting Preferences and Actions
Example user profile:
- Dr. Strangelove: *****
- Three Colors: Blue: *****
- Fargo: *****
- Pretty Woman: *
Movie? Rating? Predict the rating this user would assign to a movie not yet rated.


Collaborative and Content-Based Filtering

Collaborative/social filtering:
- Properties of persons or similarities between persons are used to improve predictions.
- Makes use of user profile data.
- Formally: the starting point is a sparse matrix of user ratings.

Content-based filtering:
- Properties of objects or similarities between objects are used to improve predictions.


PLSA for Predicting User Ratings

- Multi-valued (or real-valued) rating $v \in \{0, 1, 2, 3, 4, 5\}$
- Latent variable model over user $u$, item $y$, and rating $v$: the preference $v$ is independent of the person $u$ given the latent state $z$ (“community-based” variant)
- Each user is represented by a specific probability distribution $P(z|u)$
- Analogy to IR: [user = document], [items = terms]
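A sketch of the prediction step under this model (my notation; $P(v|y,z)$ and $P(z|u)$ would come from an EM fit like the PLSA sketch earlier): the expected rating is $E[v|u,y] = \sum_v v \sum_z P(v|y,z)\, P(z|u)$.

```python
import numpy as np

rng = np.random.default_rng(0)
K, Y, V = 3, 5, 6          # communities, items, rating values 0..5

# assumed model parameters (in practice fitted by EM)
Pv_yz = rng.random((Y, K, V))
Pv_yz /= Pv_yz.sum(axis=2, keepdims=True)   # P(v | y, z)
Pz_u = np.array([0.7, 0.2, 0.1])            # P(z | u) for one user

def expected_rating(y):
    # P(v|u,y) = sum_z P(v|y,z) P(z|u);  E[v] = sum_v v P(v|u,y)
    Pv = Pz_u @ Pv_yz[y]                    # shape (V,)
    return float(np.arange(V) @ Pv)

for y in range(Y):
    print(f"item {y}: predicted rating {expected_rating(y):.2f}")
```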


PLSA vs. Memory-Based Approaches

Standard approach (memory-based):
- Given the active user, compute the correlation with all user profiles in the database (e.g., Pearson).
- Transform the correlations into relative weights and perform a weighted prediction over the neighbors.

PLSA:
- Explicitly decomposes preferences: interests are inherently “multi-dimensional”; no global similarity function is used (!)
- Probabilistic model
- Data mining: interest groups


EachMovie Data Set (I)
- EachMovie: >40K users, >1.6K movies, >2M votes
- Experimental evaluation: comparison with a memory-based method (competitive), leave-one-out protocol
- Prediction accuracy: Baseline 33.4, Memory-based 35.3, PLSA (K=20) 39.9, PLSA (K=200) 40.8


EachMovie Data Set (II)
- Absolute deviation (lower is better): Baseline 1.091, Memory-based 0.951, PLSA (K=20) 0.947, PLSA (K=200) 0.924


EachMovie Data Set (III)
- Ranking score (exponential fall-off of weights with position in the recommendation list): Baseline 26.95, Memory-based 27.89, PLSA (K=20) 44.64, PLSA (K=200) 45.98


Interest Groups, EachMovie
[Figure: an example interest group extracted from the EachMovie data]


Dis-Interest Groups, EachMovie
[Figure: an example dis-interest group extracted from the EachMovie data]


6. Open Problems & Conclusions


Scalability of Matrix Decomposition

- RecomMind Inc., retrieval engine: >1M documents, >50K vocabulary, >1K concepts
- Internet Archive (www.archive.org): large-scale Web experiments, >10M sites


Conclusion: Matrix Decomposition

- Enables semantic document indexing: concepts, notional families
- Increased robustness in information retrieval
- Text/data mining: finding regularities & patterns
- Improved categorization by providing more suitable document representations
- The probabilistic nature of the models allows the use of formal inference
- Very versatile: term-document matrix, adjacency matrix, rating matrix, etc.


Open Problems

Conceptual:
- Bayesian model learning and model combination
- Distributed learning of latent class models
- Relational Bayesian networks (Koller et al.)
- Principled ways to exploit sparseness in algorithm design
- Beyond bag-of-words models (string kernels, bigram language models)

Applications:
- Combining content filtering with collaborative filtering
- Personalized information retrieval
- Interactive retrieval using extracted structure
- Multimedia retrieval
- New application domains