Application of Ensemble Models in Web Ranking

Homa B. Hashemi, Nasser Yazdani, Azadeh Shakery, Mahdi Pakdaman Naeini

School of Electrical and Computer Engineering, University of Tehran


Information Explosion

Web Challenges

- Huge size of information
  - 25 billion pages
- Proliferation and dynamic nature
  - Creation of new pages
  - New links are created at a rate of 25% per week
- Heterogeneous contents
  - HTML/Text/Audio/…


Search Engine as a Tool


http://seo-related.com/

Inside Search Engine

- Crawling
- Indexing
- Ranking


Ranking Approaches

- Content-based (query dependent)
  - TF, IDF
  - BM25
  - Classical IR
  - …
- Connectivity-based (web)
  - PageRank
  - HITS
  - …


Our General Framework

[Diagram: Query → Retrieval Model → List 1, List 2, …, List N → Ensemble Model → Final List]

Simple Ensemble Models

- Sum rule
  - Add the (normalized) values of the different methods
- Product rule
  - Multiply the (normalized) values of the different methods
- Borda rule
  - Combination of rankings
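The three simple rules can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: it assumes every method scores the same set of documents, uses min-max normalization (the slides do not say which normalization was used), and uses one common Borda variant in which the top document of n receives n points. The score values are invented for the example.

```python
import math
from typing import Dict, List

Scores = Dict[str, float]  # document id -> score from one ranking method


def min_max_normalize(scores: Scores) -> Scores:
    """Rescale one method's scores to [0, 1] (assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def sum_rule(methods: List[Scores]) -> Scores:
    """Add the normalized scores of all methods for each document."""
    normalized = [min_max_normalize(m) for m in methods]
    return {doc: sum(m[doc] for m in normalized) for doc in normalized[0]}


def product_rule(methods: List[Scores]) -> Scores:
    """Multiply the normalized scores of all methods for each document."""
    normalized = [min_max_normalize(m) for m in methods]
    return {doc: math.prod(m[doc] for m in normalized) for doc in normalized[0]}


def borda_rule(methods: List[Scores]) -> Scores:
    """Combine rankings: a document at 0-based position p of n gets n - p points."""
    fused = {doc: 0.0 for doc in methods[0]}
    for scores in methods:
        ordered = sorted(scores, key=scores.get, reverse=True)
        for position, doc in enumerate(ordered):
            fused[doc] += len(ordered) - position
    return fused


def rank_documents(fused: Scores) -> List[str]:
    """Sort by fused score, highest first; ties are broken by document id."""
    return sorted(fused, key=lambda doc: (-fused[doc], doc))


# Hypothetical scores from a content-based and a connectivity-based method.
content_scores = {"D1": 12.0, "D2": 7.0, "D3": 2.0}
link_scores = {"D1": 0.25, "D2": 0.75, "D3": 0.5}
methods = [content_scores, link_scores]

print(rank_documents(sum_rule(methods)))
print(rank_documents(product_rule(methods)))
print(rank_documents(borda_rule(methods)))
```

Note that the product rule drives a document to zero as soon as one method gives it the lowest normalized score, which makes it far less forgiving than the sum rule.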


Complicated Ensemble Models

- OWA (Ordered Weighted Averaging)
- Click-through data
- SVM
  - Use the distance from the discriminating hyperplane as the measure of a page's relevance to a specific query
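To illustrate the SVM idea, the sketch below computes the signed distance of a feature vector from a linear hyperplane and ranks pages by it. The weight vector, bias, and two-dimensional features (for instance a content score and a connectivity score) are hypothetical; in practice they would come from a trained SVM, and an RBF kernel would need the kernel decision function instead.

```python
import math
from typing import Dict, List, Sequence


def hyperplane_distance(features: Sequence[float],
                        weights: Sequence[float],
                        bias: float) -> float:
    """Signed distance of a point from the hyperplane w.x + b = 0."""
    margin = sum(w * x for w, x in zip(weights, features)) + bias
    norm = math.sqrt(sum(w * w for w in weights))
    return margin / norm


def rank_by_distance(pages: Dict[str, Sequence[float]],
                     weights: Sequence[float],
                     bias: float) -> List[str]:
    """Pages further on the 'relevant' side of the hyperplane rank higher."""
    return sorted(pages,
                  key=lambda p: hyperplane_distance(pages[p], weights, bias),
                  reverse=True)


# Hypothetical linear model and per-page feature vectors for one query.
WEIGHTS = (3.0, 4.0)
BIAS = -5.0
pages = {"D1": (0.0, 0.0), "D2": (3.0, 4.0), "D3": (1.0, 0.5)}

print(rank_by_distance(pages, WEIGHTS, BIAS))
```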


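OWA operator

$$F(a_1, \ldots, a_n) = \sum_{j=1}^{n} w_j b_j$$

where $b_j$ is the $j$-th largest of the $a_i$ and $w_1, \ldots, w_n$ are the weights of the weighting vector.

[Equation: definitions of the weights $w_1, w_2, w_3, \ldots, w_{n-1}, w_n$, with a parameter value of 0.3]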
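The OWA operator applies its weights to the sorted inputs rather than to particular methods: the scores are ordered from largest to smallest and then combined as a weighted sum. A minimal sketch follows; the weight vectors here are illustrative assumptions and are not the weights used in the presentation.

```python
from typing import Sequence


def owa(values: Sequence[float], weights: Sequence[float]) -> float:
    """Ordered Weighted Averaging: weights apply to values sorted descending."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    ordered = sorted(values, reverse=True)  # b_1 >= b_2 >= ... >= b_n
    return sum(w * b for w, b in zip(weights, ordered))


# Hypothetical normalized scores of one page from three ranking methods.
scores = [0.2, 0.9, 0.5]

print(owa(scores, [0.5, 0.3, 0.2]))   # emphasizes the highest scores
print(owa(scores, [1.0, 0.0, 0.0]))   # reduces to the maximum
print(owa(scores, [0.0, 0.0, 1.0]))   # reduces to the minimum
```

Choosing the weight vector moves the operator between the maximum, the arithmetic mean, and the minimum of the combined scores.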

Simulated Click-Through Data

- How can we use the user behavior?
  - 80% of user clicks are related to the query
- Click-through data


Simulated Click-Through Data (example)

Rank  L(a)  L(b)
1.    D1    D1
2.    D3    D4
3.    D2    D7
4.    D4    D9
5.    D5    D2
6.    D6    D8

Interleaved results L(a,b)

1. D1 (first click)
2. D4
3. D3
4. D7
5. D2 (second click)
6. D9
7. D5 (third click)
8. D8
9. D6
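The slides do not name the merging scheme, but the interleaved list in this example can be reproduced by balanced interleaving that starts from L(b), so the sketch below assumes that scheme. The function name and the `a_first` flag are illustrative; the clicked documents are the three marked in the example.

```python
from typing import List


def balanced_interleave(list_a: List[str], list_b: List[str],
                        a_first: bool) -> List[str]:
    """Alternate between two rankings, skipping documents already taken."""
    merged: List[str] = []
    seen = set()
    ka = kb = 0  # next unread position in each list
    while ka < len(list_a) or kb < len(list_b):
        # Read from A when B is exhausted, or when A is behind
        # (or level with B and A has priority).
        take_a = kb >= len(list_b) or (
            ka < len(list_a) and (ka < kb or (ka == kb and a_first)))
        if take_a:
            doc, ka = list_a[ka], ka + 1
        else:
            doc, kb = list_b[kb], kb + 1
        if doc not in seen:
            seen.add(doc)
            merged.append(doc)
    return merged


l_a = ["D1", "D3", "D2", "D4", "D5", "D6"]
l_b = ["D1", "D4", "D7", "D9", "D2", "D8"]
interleaved = balanced_interleave(l_a, l_b, a_first=False)

clicked = ["D1", "D2", "D5"]  # first, second, and third click in the example
click_ranks = [interleaved.index(doc) + 1 for doc in clicked]

print(interleaved)
print(click_ranks)
```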

Experimental Datasets

- LETOR benchmark (English)
  - Microsoft Research Asia, 2007
- DotIR benchmark (Persian)
  - Iran Telecommunication Research Center (ITRC), 2009


LETOR Benchmark – p@k

[Figure: P@k on LETOR in two panels, vertical axis from 0 to 0.35. Panel 1: TF-IDF, BM25, HITS, PageRank, Borda. Panel 2: product, Normal_sum, Sum, Weighted_Sum, SVM_Linear, SVM_RBF, OWA, SimClick.]
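For reference, P@k is the fraction of the top k results that are relevant. The sketch below shows the computation; the ranking and relevance judgments are made up for illustration and are not taken from LETOR.

```python
from typing import List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the first k ranked documents that are relevant."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for doc in ranking[:k] if doc in relevant)
    return hits / k


# Hypothetical ranked list and relevance judgments for one query.
ranking = ["D1", "D4", "D3", "D7", "D2", "D9", "D5", "D8", "D6"]
relevant = {"D1", "D2", "D5"}

for k in (1, 5, 7):
    print(k, precision_at_k(ranking, relevant, k))
```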

LETOR Benchmark – MAP

[Figure: MAP on LETOR for each method, vertical axis from 0 to 0.25. Methods: TF-IDF, BM25, HITS, PageRank, Borda, product, Normal_sum, Sum, Weighted_Sum, Weighted_Normal_Sum, SVM_Linear, SVM_RBF, OWA, SimClick.]
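MAP averages, over all queries, the average precision of each query's ranked list, where average precision is the mean of the precision values at the ranks of the relevant documents. A minimal sketch with invented queries and judgments:

```python
from typing import List, Set, Tuple


def average_precision(ranking: List[str], relevant: Set[str]) -> float:
    """Mean of precision values at each rank where a relevant document appears."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            total += hits / rank
    # Relevant documents that were never retrieved contribute zero.
    return total / len(relevant)


def mean_average_precision(runs: List[Tuple[List[str], Set[str]]]) -> float:
    """Average of per-query average precision over all queries."""
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Hypothetical results for two queries: (ranked list, relevant documents).
runs = [
    (["D1", "D4", "D2"], {"D1", "D2"}),
    (["D7", "D9"], {"D9"}),
]

print(mean_average_precision(runs))
```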

DotIR Benchmark – p@k

[Figure: P@k on DotIR in two panels, vertical axis from 0 to 0.9. Panel 1: TF-IDF, BM25, HITS, PageRank, Borda. Panel 2: product, Normal_sum, Sum, SVM_RBF, OWA, SimClick.]

DotIR Benchmark – MAP

[Figure: MAP on DotIR for each method, vertical axis from 0 to 0.6. Methods: TF-IDF, BM25, HITS, PageRank, Borda, product, Normal_sum, Sum, Weighted_Sum, Weighted_Normal_Sum, SVM_Linear, SVM_RBF, OWA, SimClick.]

Summary

- Motivation:
  - Important role of ranking algorithms
  - Low precision of content-based and connectivity-based algorithms
- Solution:
  - Use different ensemble models to combine ranking algorithms based on learning
- Results:
  - The LETOR benchmark was used for evaluation
  - More research is needed on the newly built DotIR collection


LABS


References

- Ali Mohammad Zareh Bidoki, Pedram Ghodsnia, Nasser Yazdani, “A3CRank: An Adaptive Ranking method based on Connectivity, Content and Click-through data”, Information Processing and Management, 2010.
- Ali Mohammad Zareh Bidoki, “Combination of Documents Features Based on Simulated Click-through Data”, ECIR 2009.


Thank You

Any Questions?