
Use of Click Data for Web Search

Tao Yang

UCSB 290N, 2014

Table of Contents

• Search Engine Logs

• Eyetracking data on position bias

• Click data for ranker training [Joachims, KDD02]

• Case study: use of click data for search ranking [Agichtein et al., SIGIR 06]


Search Logs

• Query logs recorded by search engines

• Huge amounts of data: e.g., 10 TB/day at Bing

Query sessions and analysis

[Figure: an example query session. Example queries include "mustang", "ford mustang", and "Nova"; clicked results for the mustang queries include www.fordvehicles.com/cars/mustang, www.mustang.com, and en.wikipedia.org/wiki/Ford_Mustang, shown alongside "Also Try" query suggestions.]

Session

[Figure: hierarchy of a session. A session divides into missions; each mission contains a sequence of queries; each query is followed by clicks; each click carries eye fixations. Analysis can operate at the query level, the click level, or the eye-tracking level.]

Query-URL correlations (tallied in the sketch after this list):

• Query-to-pick

• Query-to-query

• Pick-to-pick
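The three correlation types above can be tallied directly from session data. A minimal sketch, assuming sessions arrive as lists of (query, clicked_url) events; the function name and log layout are illustrative, not from the slides:

```python
from collections import Counter
from itertools import combinations

def correlations(sessions):
    """Each session is a list of (query, clicked_url) events.
    Tallies query-to-pick, query-to-query, and pick-to-pick pairs."""
    q2p, q2q, p2p = Counter(), Counter(), Counter()
    for session in sessions:
        for query, url in session:
            q2p[(query, url)] += 1           # query-to-pick
        queries = [q for q, _ in session]
        for a, b in zip(queries, queries[1:]):
            q2q[(a, b)] += 1                 # consecutive reformulations
        for a, b in combinations({u for _, u in session}, 2):
            p2p[frozenset((a, b))] += 1      # co-clicked pages
    return q2p, q2q, p2p

sessions = [[("mustang", "www.mustang.com"),
             ("ford mustang", "en.wikipedia.org/wiki/Ford_Mustang")]]
q2p, q2q, p2p = correlations(sessions)
print(q2q[("mustang", "ford mustang")])  # 1
```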

Examples of behavior analysis with search logs

• Query-pick (click) analysis

• Session detection (see the sketch after this list)

• Classification: x1, x2, …, xN → y

e.g., whether the session has a commercial intent

• Sequence labeling: x1, x2, …, xN → y1, y2, …, yN

e.g., segment a search sequence into missions and goals

• Prediction: x1, x2, …, xN-1 → yN

• Similarity: Similarity(S1, S2)
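As an illustration of the session-detection task above, a minimal sketch that splits a chronologically sorted query stream into sessions on an inactivity gap; the 30-minute timeout is a common heuristic and an assumption here, not something the slides specify:

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)  # assumed inactivity threshold

def segment_sessions(events):
    """Split a chronologically sorted list of (timestamp, query) pairs
    into sessions wherever the gap between consecutive events exceeds
    SESSION_TIMEOUT."""
    sessions, current, last_time = [], [], None
    for ts, query in events:
        if last_time is not None and ts - last_time > SESSION_TIMEOUT:
            sessions.append(current)
            current = []
        current.append((ts, query))
        last_time = ts
    if current:
        sessions.append(current)
    return sessions

log = [(datetime(2014, 3, 6, 9, 0), "mustang"),
       (datetime(2014, 3, 6, 9, 2), "ford mustang"),
       (datetime(2014, 3, 6, 11, 0), "cikm 2014")]
print(len(segment_sessions(log)))  # 2: the two-hour gap starts a new session
```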

Query-pick (click) analysis

• Search results for "CIKM"

[Figure: a results page for the query "CIKM", annotated with the number of clicks each result received.]

Interpret Clicks: an Example

• Clicks are good… but are these two clicks equally "good"?

• Non-clicks may have excuses:

Not relevant

Not examined

Use of behavior data

• Adapt ranking to user clicks?

[Figure: search results annotated with the number of clicks received.]

Non-trivial cases

• Tools needed for non-trivial cases

[Figure: search results annotated with the number of clicks received.]

Eye-tracking User Study


Eye tracking for different web sites

[Figure: eye-fixation heatmaps, including Google user patterns.]

Higher positions receive more user attention (eye fixation) and clicks than lower positions.

This is true even in the extreme setting where the order of positions is reversed.

“Clicks are informative but biased”.


[Joachims+07]

Click Position-bias

[Figure: percentage of clicks by rank position under the normal presentation order and under a reversed impression.]

Clicks as Relative Judgments for Rank Training

• "Clicked > Skipped Above" [Joachims, KDD02]

Preference pairs: #5>#2, #5>#3, #5>#4.

Use Rank SVM to optimize the retrieval function.

Limitations:

Confidence of judgments

Little implication for user modeling

[Figure: a ranked list of results at positions 1–8; clicks on positions 1 and 5 yield the preference pairs above.]

Additional relations for relative relevance judgments

click > skip above (see the sketch after this list)

last click > click above

click > click earlier

last click > click previous

click > no-click next
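A sketch of how the base "click > skip above" strategy turns a single results-page impression into training pairs; the function name and input format are mine, not from [Joachims, KDD02]. Run on clicks at positions 1 and 5, it reproduces the pairs #5>#2, #5>#3, #5>#4 from the earlier example:

```python
def skip_above_pairs(clicked_positions):
    """'Clicked > Skipped Above': a clicked result is preferred over every
    higher-ranked result that was shown above it but not clicked.
    Positions are 1-based ranks within one impression."""
    clicked = set(clicked_positions)
    pairs = []
    for c in sorted(clicked):
        for above in range(1, c):
            if above not in clicked:       # skipped result ranked above the click
                pairs.append((c, above))   # rank c preferred over rank `above`
    return pairs

print(skip_above_pairs([1, 5]))  # [(5, 2), (5, 3), (5, 4)]
```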


Web Search Ranking by Incorporating User Behavior Information

Rank pages relevant for a query

• Eugene Agichtein, Eric Brill, Susan Dumais, SIGIR 2006

• Categories of features (signals) for web search ranking

Content match

– e.g., page terms, anchor text, term weights, term span

Document quality

– e.g., web topology, spam features

• Add one more category:

Implicit user feedback from click data

18

Related Work

• Personalization

Rerank results based on the user's clickthrough and browsing history

• Collaborative filtering

Amazon, Ask.com DirectHit: rank by clickthrough

• General ranking

Joachims et al. [KDD 2002], Radlinski et al. [KDD 2005]: tuning ranking functions with clickthrough


Rich User Behavior Feature Space

• Observed and distributional features

Aggregate observed values over all user interactions for each query and result pair

Distributional features: deviations from the "expected" behavior for the query

• Represent user interactions as vectors in user behavior space

Presentation: what a user sees before a click

Clickthrough: frequency and timing of clicks

Browsing: what users do after a click


Ranking Features (Signals)

Presentation

ResultPosition: position of the URL in the current ranking
QueryTitleOverlap: fraction of query terms in the result title

Clickthrough

DeliberationTime: seconds between the query and the first click
ClickFrequency: fraction of all clicks landing on the page
ClickDeviation: deviation from the expected click frequency

Browsing

DwellTime: result page dwell time
DwellTimeDeviation: deviation from the expected dwell time for the query

[Tables of additional presentation, clickthrough, and browsing features omitted.]
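A sketch, under assumed log and aggregation conventions, of how two of the clickthrough signals above could be computed. The slides do not spell out the "expected" behavior baseline, so the per-position background click rate used for ClickDeviation here is illustrative:

```python
from collections import Counter

def click_features(impressions):
    """impressions: (query, url, position, clicked) tuples.
    Returns {(query, url): (ClickFrequency, ClickDeviation)} where
    ClickFrequency is the fraction of the query's clicks landing on the
    URL and ClickDeviation is its gap from the background click rate
    observed at that rank position."""
    clicks, query_clicks = Counter(), Counter()
    shown_at, clicked_at = Counter(), Counter()
    position_of = {}
    for query, url, pos, was_clicked in impressions:
        position_of[(query, url)] = pos
        shown_at[pos] += 1
        if was_clicked:
            clicks[(query, url)] += 1
            query_clicks[query] += 1
            clicked_at[pos] += 1
    features = {}
    for (query, url), n in clicks.items():
        freq = n / query_clicks[query]
        pos = position_of[(query, url)]
        expected = clicked_at[pos] / shown_at[pos]  # rank-position baseline
        features[(query, url)] = (freq, freq - expected)
    return features
```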


Training a User Behavior Model

• Map user behavior features to relevance judgments

• RankNet: Burges et al. [ICML 2005]

Neural-net-based learning

Input: user behavior features + relevance labels

Output: weights for behavior feature values

Used as the testbed for all experiments (a sketch of the pairwise loss follows)
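RankNet's details are in Burges et al. [ICML 2005]; the sketch below shows only its core idea, a pairwise cross-entropy loss on the sigmoid of score differences, with a toy linear scorer standing in for the neural net:

```python
import numpy as np

def ranknet_loss_and_grad(w, x_i, x_j):
    """One preference 'i beats j': P = sigmoid(s_i - s_j) with linear
    scores s = w.x; loss = -log P for target probability 1."""
    p = 1.0 / (1.0 + np.exp(-(w @ x_i - w @ x_j)))
    return -np.log(p), (p - 1.0) * (x_i - x_j)  # loss, d(loss)/dw

rng = np.random.default_rng(0)
w = np.zeros(5)
x_rel, x_irr = rng.normal(size=5), rng.normal(size=5)
for _ in range(100):                 # gradient descent on a single pair
    loss, grad = ranknet_loss_and_grad(w, x_rel, x_irr)
    w -= 0.1 * grad
print(round(float(loss), 3))         # loss shrinks as w orders the pair correctly
```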


User Behavior Models for Ranking

• Use interactions from previous instances of the query

General-purpose (not personalized)

Only available for queries with past user interactions

• 3 models (the first is sketched below):

Rerank, clickthrough only: reorder results by number of clicks

Rerank, predicted preferences (all user behavior features): reorder results by predicted preferences

Integrate directly into the ranker: incorporate user behavior features alongside the other categories of ranking features (e.g., text matching)
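The first model is simple enough to sketch directly: reorder the engine's results by historical click counts. Keeping unclicked results in their original engine order (via a stable sort) is my assumption, not stated on the slide:

```python
def rerank_by_clicks(ranked_urls, click_counts):
    """Rerank-CT sketch: stable sort of the current ranking by descending
    historical click count, so ties keep their engine order."""
    return sorted(ranked_urls, key=lambda u: -click_counts.get(u, 0))

print(rerank_by_clicks(["a.com", "b.com", "c.com"],
                       {"c.com": 40, "a.com": 12}))
# ['c.com', 'a.com', 'b.com']
```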


Evaluation Metrics

• Precision at K: fraction of relevant results in the top K

• NDCG at K: normalized discounted cumulative gain; emphasizes top-ranked results

• MAP: mean average precision; a query's average precision is the mean of the precision values computed after each relevant document is retrieved

For a query q, NDCG at K is

N_q = M_q · Σ_{j=1}^{K} (2^{r(j)} − 1) / log(1 + j),

where r(j) is the relevance grade of the result at position j and M_q normalizes the sum so that a perfect ordering scores 1.
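A sketch of the NDCG at K computation as defined above; the graded relevance values and the use of the natural log are assumptions where the slide does not pin them down:

```python
import math

def dcg_at_k(grades, k):
    """DCG@K = sum over the top-K positions j of (2^r(j) - 1) / log(1 + j)."""
    return sum((2 ** r - 1) / math.log(1 + j)
               for j, r in enumerate(grades[:k], start=1))

def ndcg_at_k(grades, k):
    """Normalize by the DCG of the ideal (descending-grade) ordering,
    so a perfect ranking scores 1.0."""
    ideal = dcg_at_k(sorted(grades, reverse=True), k)
    return dcg_at_k(grades, k) / ideal if ideal > 0 else 0.0

print(round(ndcg_at_k([2, 0, 1], k=3), 3))  # < 1.0: positions 2 and 3 swapped
```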


Datasets

• 8 weeks of user behavior data from anonymized opt-in client instrumentation

• Millions of unique queries and interaction traces

• Random sample of 3,000 queries

Gathered independently of user behavior

1,500 train, 500 validation, 1,000 test

• Explicit relevance assessments for top 10 results for each query in sample


Methods Compared

• Full Search Engine

Content-match feature uses BM25F

– A variant of the TF-IDF model

• Compare 4 ranking models

BM25F only

Clickthrough only: called Rerank-CT

– Rerank queries that have sufficient historical click data

Full user behavior model predictions: called Rerank-All

Integrate all user behavior features directly: +All

– User behavior features + content match

29

Content, User Behavior: Precision at K (queries with interactions)

BM25 < Rerank-CT < Rerank-All < +All

[Figure: Precision at K for K = 1, 3, 5, 10; curves for BM25, Rerank-CT, Rerank-All, and BM25+All.]

Content, User Behavior: NDCG

BM25 < Rerank-CT < Rerank-All < +All

[Figure: NDCG at K for K = 1–10; curves for BM25, Rerank-CT, Rerank-All, and BM25+All.]

Impact: All Queries, Precision at K

Fewer than 50% of test queries have prior interactions

+0.06–0.12 precision over all test queries

[Figure: Precision at K for K = 1, 3, 5, 10; curves for RN, Rerank-All, and RN+All.]

Impact: All Queries, NDCG

+0.03–0.05 NDCG over all test queries

[Figure: NDCG at K for K = 1–10; curves for RN, Rerank-All, and RN+All.]

Which Queries Benefit Most

[Figure: histogram of per-query average gain (x-axis: average gain from −0.4 to 0.2; y-axis: query frequency, 0 to 350).]

Most gains are for queries with poor ranking

Conclusions

• Incorporating user behavior into web search ranking dramatically improves relevance

• Providing rich user interaction features directly to the ranker is the most effective strategy

• Large improvements shown for up to 50% of test queries


Full Search Engine, User Behavior: NDCG, MAP

Method     MAP    Gain
RN         0.270
RN+All     0.321  +0.052 (19.13%)
BM25       0.236
BM25+All   0.292  +0.056 (23.71%)

[Figure: NDCG at K for K = 1–10; curves for RN, Rerank-All, and RN+All.]