Understanding Temporal Intent of User Query based on Time-based Query Classification

Pengjie Ren, Zhumin Chen and Jun Ma
Information Retrieval Lab, Shandong University
Presenter: Pengjie Ren
November 18, 2013


Outline

Why Temporal Intent Detection?
Query Temporal Pattern Taxonomy
Query Pattern Detection Framework
Experiment Results
Application
Conclusion and Future Work


Why Temporal Intent Detection?

Richard McCreadie (SIGIR 2013): Users tend to prefer rankings that integrate tweets or newswire articles soon after an event breaks, while blogs and Wikipedia pages become more useful over time.

Hideo Joho (WWW 2013):
48.2% seek information about the same day as they perform the search;
32.7% look for past information;
8.1% look for future information;
10.9% say that their information needs do not have specific temporal attributes.

Automatic temporal intent detection is therefore important for time-sensitive information retrieval, temporal diversity, and related tasks.


Different Temporal Patterns Imply Different Temporal Intents

Kulkarni et al. (WSDM 2011) find some temporal patterns of queries by mining query logs. However, they do not propose methods to identify those patterns automatically.

In this paper, we propose an approach to identify the different temporal patterns automatically.

[Figure: query frequency curves from Google Trends.]


Query Temporal Pattern Taxonomy

[Figure: example query frequency curves for the queries "Java JDK", "Haiti Earthquake", "Christmas Present" and "Earthquake".]

Clearly, we can use spikes to detect query temporal patterns.


What is a Spike?

A spike is a run of consecutive points on the query frequency curve that bursts well above its surroundings. Generally, it represents an event.

Spikes are hard to detect effectively and precisely. In particular, we found that learning a single cutoff line is not effective for identifying all spikes.

[Figure: spikes on the curve labeled Southeast Asia earthquake, Pakistan earthquake, China earthquake, Haiti earthquake, Japan earthquake and Virginia earthquake.]


Query Pattern Detection Framework

[Diagram: Query Classification System. Components shown: Query Log, Query, query frequency curves, Preprocess, Feature Extraction, Training Set, Classifier (SVM), Query Pattern.]


(1). Preprocess

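According to time series analysis, any curve contains three components:

$F_t = m_t + s_t + Y_t$

where $m_t$ is the trend component, $s_t$ is the seasonal component, and $Y_t$ is the random component.

The seasonal and random components are what we care about in this paper, so we should remove the trend component.

We use polynomial regression to model the trend component:

$m_t = \mathbf{w}^T \mathbf{x} + \xi$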
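The trend-removal step can be sketched as follows. This is only an illustration under stated assumptions: it fits the polynomial by ordinary least squares (which corresponds to Gaussian noise, not the Student-t noise model used in the talk), and the function names `fit_polynomial` and `remove_trend` as well as the default degree are hypothetical choices, not taken from the slides.

```python
def fit_polynomial(values, degree):
    """Least-squares polynomial fit of a curve (needs at least 2 points).

    Returns (w, rows): coefficients w (lowest power first) and the
    design-matrix rows x = (1, t, t^2, ...) for each time step.
    """
    n = len(values)
    # Rescale time to [0, 1] to keep the normal equations well conditioned.
    ts = [i / (n - 1) for i in range(n)]
    rows = [[t ** p for p in range(degree + 1)] for t in ts]
    k = degree + 1
    # Augmented matrix of the normal equations (X^T X) w = X^T f.
    a = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * f for r, f in zip(rows, values))]
         for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, k):
            factor = a[r][col] / a[col][col]
            for c in range(col, k + 1):
                a[r][c] -= factor * a[col][c]
    # Back substitution.
    w = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(a[i][j] * w[j] for j in range(i + 1, k))
        w[i] = (a[i][k] - tail) / a[i][i]
    return w, rows


def remove_trend(values, degree=2):
    """Return (trend, residual); the residual keeps seasonal + random parts."""
    w, rows = fit_polynomial(values, degree)
    trend = [sum(wi * xi for wi, xi in zip(w, row)) for row in rows]
    residual = [f - m for f, m in zip(values, trend)]
    return trend, residual


# A flat curve with a single burst: the burst survives detrending.
curve = [10.0] * 9
curve[4] = 50.0
trend, residual = remove_trend(curve, degree=2)
```

The residual curve, not the raw curve, is what the later feature definitions would operate on.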


$\xi \sim \mathrm{St}(\xi \mid 0, \lambda, \nu)$

We use a Student-t distribution instead of a Gaussian distribution because we do not have exact training pairs $(X, m_t)$; we have to use $(X, F)$ instead. The $s_t$ and $Y_t$ components therefore act as noise during training, and the Student-t distribution is more robust to noise than the Gaussian distribution.

[Figure, from PRML: Student-t and Gaussian fits compared with and without noise. Without noise, both work well.]

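Log-likelihood loss function:

$L(\mathbf{w}, \lambda, \nu) = -\sum_{i=1}^{n} \log \mathrm{St}(f_i \mid \mathbf{w}^T \mathbf{x}_i, \lambda, \nu) + \frac{1}{2}\|\mathbf{w}\|^2$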
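To make the robustness argument concrete, here is a small sketch of a Student-t negative log-likelihood loss. It assumes the PRML parameterisation St(x | mu, lambda, nu), with lambda a precision-like parameter and nu the degrees of freedom, and an L2 penalty of 0.5 * ||w||^2; the function names and the example values are hypothetical.

```python
import math


def student_t_logpdf(x, mu, lam, nu):
    """log St(x | mu, lam, nu), PRML-style (assumed parameterisation)."""
    return (math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
            + 0.5 * math.log(lam / (math.pi * nu))
            - (nu + 1) / 2 * math.log1p(lam * (x - mu) ** 2 / nu))


def gaussian_logpdf(x, mu, lam):
    """log N(x | mu, 1/lam), used here only for comparison."""
    return 0.5 * math.log(lam / (2 * math.pi)) - 0.5 * lam * (x - mu) ** 2


def student_t_loss(w, xs, fs, lam, nu):
    """Negative log-likelihood of the curve plus 0.5 * ||w||^2."""
    nll = 0.0
    for x, f in zip(xs, fs):
        prediction = sum(wi * xi for wi, xi in zip(w, x))
        nll -= student_t_logpdf(f, prediction, lam, nu)
    return nll + 0.5 * sum(wi * wi for wi in w)


# How much does one point with residual 10 cost relative to a perfect fit?
t_penalty = student_t_logpdf(0.0, 0.0, 1.0, 3.0) - student_t_logpdf(10.0, 0.0, 1.0, 3.0)
g_penalty = gaussian_logpdf(0.0, 0.0, 1.0) - gaussian_logpdf(10.0, 0.0, 1.0)
```

The Gaussian penalty grows quadratically with the residual, while the Student-t penalty grows only logarithmically, so a spike pulls the fitted trend far less under the Student-t model.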


[Figure: the original query curve, its trend component, and the remaining seasonal and random component.]


(2). Feature Extraction

For preprocessed query frequency curves, we define the following features.

Basic Features: Mean, Standard Deviation, MR (Max Rate), SR (Spike Rate)

Curve Distance Features: DQoT, DOQ, DAMQ, DPMQ

Regression Features: Cutoff, Spikes, PD (Periodic Deviation)


MR (Max Rate)

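$MR = \dfrac{f_M}{\sum_t f_t}$

where $f_M = \max(\{f_1, f_2, \ldots, f_N\})$ is the maximum point of the curve.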


SR (Spike Rate)

$SR = \dfrac{\sum \{f_{M-m}, \ldots, f_{M-1}, f_M, f_{M+1}, \ldots, f_{M+m}\}}{\sum_t f_t}$

$f_M = \max(\{f_1, f_2, \ldots, f_N\})$

m is half the period of a spike.

[Figure labels: MQ, OQ, QoT.]
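A minimal sketch of the two rate features, MR and SR, under the assumption that the curve is a list of non-negative frequencies with a positive total and that the window around the maximum is simply clipped at the curve boundaries. The function names are hypothetical.

```python
def max_rate(curve):
    """MR: share of the total frequency carried by the maximum point."""
    return max(curve) / sum(curve)


def spike_rate(curve, m):
    """SR: share of the total frequency inside the window of half-width m
    centred on the maximum point (window clipped at the curve boundaries)."""
    peak = max(range(len(curve)), key=curve.__getitem__)
    lo = max(0, peak - m)
    hi = min(len(curve), peak + m + 1)
    return sum(curve[lo:hi]) / sum(curve)


curve = [1, 1, 2, 8, 2, 1, 1]
mr = max_rate(curve)         # 8 / 16
sr = spike_rate(curve, m=1)  # (2 + 8 + 2) / 16
```

A curve dominated by one burst gives SR close to 1, while a flat curve gives SR close to (2m + 1) / N.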


How to determine the value of m?

$\{f_{M-i}, \ldots, f_{M-1}, f_M, f_{M+1}, \ldots, f_{M+j}\} \ge r \cdot f_M$
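One possible reading of this rule is sketched below: starting from the maximum point, extend the spike to the left and right while the neighbouring points stay at or above a fraction r of the maximum. The comparison operator, the expansion procedure, and the function name `spike_extent` are assumptions made for illustration; the slide does not spell them out.

```python
def spike_extent(curve, r):
    """Return (i, j): how far the spike around the maximum extends to the
    left and right while points stay >= r * f_M (assumed rule)."""
    peak = max(range(len(curve)), key=curve.__getitem__)
    threshold = r * curve[peak]
    left = peak
    while left > 0 and curve[left - 1] >= threshold:
        left -= 1
    right = peak
    while right < len(curve) - 1 and curve[right + 1] >= threshold:
        right += 1
    return peak - left, right - peak


curve = [1, 1, 5, 10, 6, 4, 1]
narrow = spike_extent(curve, r=0.5)  # threshold 5.0
wide = spike_extent(curve, r=0.4)    # threshold 4.0
```

A value for m could then be derived from i and j, for example their maximum; that choice is likewise an assumption.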


Distance between Two Curves

$\mathrm{Distance}(F_1, F_2) = \min_{\alpha, q} \dfrac{\|F_1 - \alpha F_{2(q)}\|}{\|F_1\|}$

$F_{i(q)}$: the time series $F_i$ shifted by q time units.

$\|\cdot\|$: the $l_2$ norm.

This measure finds the optimal alignment (translation q) and the scaling coefficient α for matching the shapes of the two time series. The exact optimum is difficult to find, so in practice we try all possible shifts q to obtain an approximate solution.

Jaewon Yang and Jure Leskovec. Patterns of temporal variation in online media. WSDM, 2011.
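The shift-and-scale distance can be sketched as below. For a fixed shift q, the best scaling coefficient has the closed form alpha = (F1 · F2(q)) / ||F2(q)||^2, so only the shifts need to be enumerated. Padding shifted curves with zeros, requiring equal-length curves with a non-zero first curve, and the names `shift` and `curve_distance` are assumptions of this sketch.

```python
import math


def shift(series, q):
    """Shift right by q time units (left if q < 0), padding with zeros."""
    n = len(series)
    if q >= 0:
        return [0.0] * min(q, n) + list(series[:max(n - q, 0)])
    return list(series[-q:]) + [0.0] * min(-q, n)


def curve_distance(f1, f2, max_shift=None):
    """min over q and alpha of ||f1 - alpha * f2(q)|| / ||f1||."""
    n = len(f1)
    if max_shift is None:
        max_shift = n - 1
    norm1 = math.sqrt(sum(v * v for v in f1))
    best = float("inf")
    for q in range(-max_shift, max_shift + 1):
        g = shift(f2, q)
        gg = sum(v * v for v in g)
        # Closed-form optimal scaling for this shift.
        alpha = sum(a * b for a, b in zip(f1, g)) / gg if gg > 0 else 0.0
        err = math.sqrt(sum((a - alpha * b) ** 2 for a, b in zip(f1, g)))
        best = min(best, err / norm1)
    return best


# Same shape, different scale and position: the distance is (near) zero.
a = [0, 0, 1, 3, 1, 0, 0]
b = [2, 6, 2, 0, 0, 0, 0]
d = curve_distance(a, b)
```

Because alpha = 0 is always allowed, the distance never exceeds 1, which makes values comparable across queries with very different volumes.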


DQoT DOQ DAMQ DPMQ

DQoT: Average distance from annotated QoT curves.
DOQ: Average distance from annotated OQ curves.
DAMQ: Average distance from annotated AMQ curves.
DPMQ: Average distance from annotated PMQ curves.

This is similar to KNN but costs much less time.
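The four distance features reduce to one average per annotated class, which might be computed as in the sketch below. The function name, the dictionary layout of the annotated set, and the stand-in Euclidean distance are hypothetical; in the talk's setting the shift-and-scale curve distance would be passed in instead.

```python
import math


def class_distance_features(curve, annotated, distance):
    """Average distance from `curve` to the annotated curves of each class.

    annotated: dict mapping a class name (e.g. "QoT") to a list of curves.
    distance:  any callable taking two curves and returning a float.
    """
    return {
        "D" + name: sum(distance(curve, c) for c in curves) / len(curves)
        for name, curves in annotated.items()
    }


def euclidean(a, b):
    """Stand-in distance used only to keep this example self-contained."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


annotated = {
    "QoT": [[1, 1, 1], [1, 1, 1]],
    "OQ": [[0, 4, 0]],
}
features = class_distance_features([1, 1, 1], annotated, euclidean)
```

Each query thus contributes a fixed-length vector of class-level distances that can be fed to the classifier alongside the other features.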


Cutoff Spikes PD

Cutoff: the 8 features above are combined to learn a cutoff line,

$\mathrm{Cutoff} = W^T X$

What about training data? The (F, Cutoff) pair is not known. We can use the annotated pair (F, Pattern Category) to approximate (F, Cutoff).

[Figure: for this curve, because we annotate it as MQ, the cutoff value lies in the pink area.]

Spikes: number of spikes.

PD: measures periodicity.
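As an illustration of how the Cutoff and Spikes features could fit together, the sketch below computes a linear cutoff from a feature vector and then counts spikes as maximal runs of consecutive points strictly above that cutoff. Counting runs this way, the function names, and the example weights are assumptions; the slides do not give the exact definition.

```python
def cutoff_value(weights, features):
    """Cutoff = W^T X for a weight vector W and a feature vector X."""
    return sum(w * x for w, x in zip(weights, features))


def count_spikes(curve, cutoff):
    """Count maximal runs of consecutive points strictly above the cutoff."""
    count = 0
    inside = False
    for value in curve:
        above = value > cutoff
        if above and not inside:
            count += 1
        inside = above
    return count


# Hypothetical weights and features, chosen only to produce a cutoff of 4.
cutoff = cutoff_value([0.5, 1.0], [2.0, 3.0])
spikes = count_spikes([0, 5, 6, 0, 1, 7, 0, 8], cutoff)
```

Under this reading, a curve annotated as a multi-spike query constrains the cutoff to the range of values that yields more than one run.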


Experiment Results

5,000 queries from Query Track 07-09 of TREC.
Corresponding query frequency files from Google Trends.
Categories of these queries are annotated manually according to their frequency curves.
5-fold cross-validation.

Classification Performance Comparison for Different Query Categories

Query Class   QoT     OQ      AMQ     PMQ     Average
P             0.952   0.928   0.846   0.914   0.910
R             0.973   0.915   0.831   0.924   0.911
F1            0.962   0.922   0.838   0.919   0.910

[Figure panels: AMQ, PMQ, QoT, OQ.]


Feature Effectiveness Analysis


Application – Temporal Diversity

The temporal intents of a user query are uncertain, so we should diversify the search results along the time dimension in order to cover more of the important time units of the query.

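$P(S \mid q) = \sum_{t \in T} P(t \mid q) \sum_{z \in Z} P(z \mid q) \left(1 - \prod_{d \in S} \left(1 - P(d \mid q, t, z)\right)\right)$

$P(t \mid q)$: temporal intent coverage.
$P(z \mid q)$: subtopic coverage.
$1 - \prod_{d \in S} (1 - P(d \mid q, t, z))$: novelty.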
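The objective can be evaluated directly, as in the sketch below. The greedy selection loop is an assumption added for illustration (a common way to optimise objectives of this shape), not something stated on the slide; the function names, the dictionary-based probability tables, and the toy numbers are likewise hypothetical.

```python
def diversity_score(selected, p_t, p_z, p_rel):
    """P(S|q) = sum_t P(t|q) sum_z P(z|q) (1 - prod_{d in S} (1 - P(d|q,t,z))).

    p_t:   dict time unit -> P(t|q)
    p_z:   dict subtopic  -> P(z|q)
    p_rel: dict (doc, time unit, subtopic) -> P(d|q,t,z); missing keys mean 0
    """
    score = 0.0
    for t, pt in p_t.items():
        for z, pz in p_z.items():
            miss = 1.0  # probability that no selected document satisfies (t, z)
            for d in selected:
                miss *= 1.0 - p_rel.get((d, t, z), 0.0)
            score += pt * pz * (1.0 - miss)
    return score


def greedy_select(candidates, k, p_t, p_z, p_rel):
    """Pick k documents, each time adding the one that raises P(S|q) most."""
    selected = []
    remaining = list(candidates)
    while remaining and len(selected) < k:
        best = max(
            remaining,
            key=lambda d: diversity_score(selected + [d], p_t, p_z, p_rel),
        )
        selected.append(best)
        remaining.remove(best)
    return selected


p_t = {"2010": 0.5, "2011": 0.5}
p_z = {"z": 1.0}
p_rel = {
    ("a", "2010", "z"): 0.9,
    ("b", "2010", "z"): 0.8,
    ("c", "2011", "z"): 0.6,
}
ranking = greedy_select(["a", "b", "c"], 2, p_t, p_z, p_rel)
```

In this toy example the second pick is "c" rather than the more relevant "b", because "b" covers a time unit that "a" already covers.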


Compared methods:
MMR (SIGIR'98)
xQuAD (WWW'10)
IA-Select (WSDM'09)
LM+T+D (SIGIR'13)
RM+T+S+D (our method)


Conclusion

We cast temporal intent detection as a classification problem.

We propose effective features for detecting temporal intents.

We apply the detected temporal intents to temporal diversity and achieve high performance.


Future Work

More effective features
Data sparsity problem for long queries


Thanks a lot for your attention!