Generating Queries from User-Selected Text Date : 2013/03/04 Resource : IIiX’12 Advisor : Dr. Jia- Ling Koh Speaker : I-Chih Chiu

Page 1: Generating Queries from  User-Selected Text

Generating Queries from User-Selected Text

Date: 2013/03/04
Resource: IIiX’12
Advisor: Dr. Jia-Ling Koh
Speaker: I-Chih Chiu

Page 2: Generating Queries from  User-Selected Text

Outline Introduction

Approaches

Experiments

Conclusion

Page 3: Generating Queries from  User-Selected Text

Outline Introduction

Motivation Goal Flow Chart

Approaches Experiments Conclusion

Page 4: Generating Queries from  User-Selected Text

Motivation

Annotations, which are becoming more common in various tablet applications, can help improve understanding of content.

Queries constructed from the annotated texts can be very effective.

Page 5: Generating Queries from  User-Selected Text

Motivation

Manual query construction based on text passages is common; however, such formulation can involve considerable effort for users, and an effective search is not guaranteed.

Past research: log history, relevance feedback, more-like-this

Page 6: Generating Queries from  User-Selected Text

Goal

The authors propose techniques for generating queries from user-selected or annotated text passages.

A user can select any text segment of interest while browsing, and queries are then generated automatically from that segment.

Page 7: Generating Queries from  User-Selected Text

Flow Chart

The use of noun phrases or named entities as the minimum semantic building blocks has proven reliable in past research on information retrieval and natural language processing.

The authors propose to identify important noun phrases and named entities, called “chunks”, within the selected text segment as the basic building blocks for query formulation.

Page 8: Generating Queries from  User-Selected Text

Flow Chart

TS: text segment, C: chunks, $C_e$: effective chunks

Page 9: Generating Queries from  User-Selected Text

Outline Introduction Approaches

Chunk Extraction Chunk Selection Query Generation

Experiments Conclusion

Page 10: Generating Queries from  User-Selected Text

Chunk Extraction

Page 11: Generating Queries from  User-Selected Text

Chunk Selection Frequency-based approach

Learning-based approach

Page 12: Generating Queries from  User-Selected Text

Chunk Selection

Frequency-based

Following the common belief in the effectiveness of inverse document frequency, chunk $c_i$ is considered more important than chunk $c_j$ if $n_i < n_j$.

Each chunk is submitted to a Web search API, giving the numbers of returned results $N = \{n_1, n_2, \ldots, n_n\}$.

Based on the number of returned results, select the top k most infrequent chunks → $C_e$.
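The frequency-based selection can be illustrated with a minimal Python sketch. It assumes the result counts have already been fetched; the function name `select_infrequent_chunks` and the counts below are hypothetical and stand in for real Web search API calls, which the slides do not specify.

```python
from typing import Dict, List


def select_infrequent_chunks(result_counts: Dict[str, int], k: int) -> List[str]:
    """Return the k chunks with the fewest returned results, rarest first."""
    # Sort by result count ascending; ties are broken alphabetically.
    ranked = sorted(result_counts.items(), key=lambda item: (item[1], item[0]))
    return [chunk for chunk, _ in ranked[:k]]


# Hypothetical result counts (not real search statistics).
example_counts = {
    "Taiwan": 2_000_000,
    "baseball player": 500_000,
    "money": 9_000_000,
}

effective_chunks = select_infrequent_chunks(example_counts, k=2)
print(effective_chunks)
```

With these made-up counts, "money" is dropped because it is the most frequent chunk.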

Page 13: Generating Queries from  User-Selected Text

Chunk Selection

Learning-based: CRF-perf model (Conditional Random Field)

To identify important chunks in C

Features

Labeling problem: each chunk $c_i \in C$ is assigned a label $l_i \in \{1, 0\}$, where 1 and 0 mean “keep” and “don’t keep” respectively.

Page 14: Generating Queries from  User-Selected Text

Learning-based CRF-perf model

In the training phase, the model parameters

Chunk Selection

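$$P(L \mid C) = \frac{\exp\left(\sum_{j=1}^{J} \lambda_j f_j(L, C)\right)}{Z(C)}$$

$$Z(C) = \sum_{L} \exp\left(\sum_{j=1}^{J} \lambda_j f_j(L, C)\right)$$

$f_j$: the features
$\lambda_j$: the weight of $f_j$
$J$: the number of features
$Z(C)$: a normalizer

$$Obj(\theta) = \prod_{C} \sum_{L} P(L \mid C)\, m(L)$$

$m(L)$: the retrieval performance (MAP)
$l(\theta)$: log-likelihood
$R$: a regularization term that avoids unbounded parameter values

$$l(\theta) = \sum_{C} \log \sum_{L} \exp\left(\sum_{j} \lambda_j f_j(L, C)\right) m(L) - \sum_{C} \log Z(C) - R$$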
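The step from the objective to the log-likelihood is not spelled out on the slide. A short derivation, using only the definitions of $P(L \mid C)$ and $Z(C)$ given here and noting that $Z(C)$ does not depend on $L$:

```latex
\begin{align*}
\log Obj(\theta)
  &= \sum_{C} \log \sum_{L} P(L \mid C)\, m(L) \\
  &= \sum_{C} \log \frac{\sum_{L} \exp\bigl(\sum_{j=1}^{J} \lambda_j f_j(L, C)\bigr)\, m(L)}{Z(C)} \\
  &= \sum_{C} \log \sum_{L} \exp\Bigl(\sum_{j=1}^{J} \lambda_j f_j(L, C)\Bigr)\, m(L)
     - \sum_{C} \log Z(C), \\
l(\theta) &= \log Obj(\theta) - R .
\end{align*}
```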

Page 15: Generating Queries from  User-Selected Text

Chunk Selection

Learning-based: for example

C = {Taiwan, baseball player, money}

L has eight possible combinations of “keep” and “don’t keep”, e.g. L = {1, 1, 0}.

$$P(L \mid C) = \frac{\exp\left(\sum_{j=1}^{J} \lambda_j f_j(L, C)\right)}{Z(C)}$$

$$Z(C) = \sum_{L} \exp\left(\sum_{j=1}^{J} \lambda_j f_j(L, C)\right)$$
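For a chunk set this small, the distribution over labelings can be computed by brute-force enumeration. The sketch below is illustrative only. The features are an assumption (one indicator per chunk that fires when the chunk is kept) and the weights are made up, since the slides do not list the actual features or learned parameters.

```python
import itertools
import math
from typing import Dict, Sequence, Tuple

Labeling = Tuple[int, ...]


def score(labeling: Labeling, chunks: Sequence[str], weights: Dict[str, float]) -> float:
    """Weighted feature sum for one labeling: sum_j lambda_j * f_j(L, C).

    Assumed features: one indicator per chunk, active when that chunk is kept.
    """
    return sum(weights[c] for c, keep in zip(chunks, labeling) if keep == 1)


def label_distribution(chunks: Sequence[str], weights: Dict[str, float]) -> Dict[Labeling, float]:
    """Compute P(L | C) for every labeling by explicit enumeration."""
    labelings = list(itertools.product([0, 1], repeat=len(chunks)))
    unnormalized = {lab: math.exp(score(lab, chunks, weights)) for lab in labelings}
    z = sum(unnormalized.values())  # the normalizer Z(C)
    return {lab: value / z for lab, value in unnormalized.items()}


chunks = ["Taiwan", "baseball player", "money"]
# Made-up weights (lambdas), for illustration only.
weights = {"Taiwan": 0.8, "baseball player": 1.2, "money": -0.5}

dist = label_distribution(chunks, weights)
best = max(dist, key=dist.get)
print(len(dist), "labelings; most probable:", best)
```

Enumeration grows as 2^n in the number of chunks, so it is only practical for short text segments.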

Page 16: Generating Queries from  User-Selected Text

Select effective chunks

Three ways to construct the final chunk set:

CombC: the chunk combination with the highest probability.

CombC + TopC(2): CombC plus the two top-performing single chunks with the highest probability.

TopC(k): the top k effective chunks, as selected by the algorithm.

Page 17: Generating Queries from  User-Selected Text

Select effective chunks

TopC(k)

Threshold = 0.42
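The slide gives only the threshold value, not the algorithm itself. One possible reading, which is an assumption here and not the authors' algorithm, is that each chunk receives a keep probability from the model and chunks reaching the threshold are retained, so k is determined by the data and not fixed in advance. The function name `top_c` and the probabilities below are hypothetical.

```python
from typing import Dict, List

THRESHOLD = 0.42  # the value shown on the slide


def top_c(keep_probability: Dict[str, float], threshold: float = THRESHOLD) -> List[str]:
    """Keep chunks whose (assumed) keep probability reaches the threshold."""
    kept = [c for c, p in keep_probability.items() if p >= threshold]
    # Order the kept chunks by probability, highest first.
    return sorted(kept, key=lambda c: keep_probability[c], reverse=True)


# Made-up probabilities, for illustration only.
probabilities = {"Taiwan": 0.69, "baseball player": 0.77, "money": 0.38}

selected = top_c(probabilities)
print(selected, "k =", len(selected))
```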

Page 18: Generating Queries from  User-Selected Text

Query Generation

According to frequency based approach , , : document frequency

The query is generated by combining the best chunk combination (max ) with

denotes the corresponding with no stopwords.

Page 19: Generating Queries from  User-Selected Text

Query Generation

Based on the model ,

Using model and Algorithm

Page 20: Generating Queries from  User-Selected Text

Outline Introduction Approaches Experiments Conclusion

Page 21: Generating Queries from  User-Selected Text

Experiment

Experimental Setup

TREC Gov2 collection: 25,205,179 documents.

Average number of words in text segments and documents before/after removing stopwords for the selected 50 topics.

10-fold cross-validation is used for training and testing the CRF-perf models.
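A minimal sketch of a 10-fold split over the 50 topics, in plain Python. How the topics were actually assigned to folds is not stated in the slides. The strided assignment and the placeholder topic ids below are assumptions.

```python
from typing import Iterator, List, Tuple


def k_fold_splits(items: List[int], n_folds: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) partitions; each item appears in exactly one test fold."""
    for fold in range(n_folds):
        test = items[fold::n_folds]  # every n_folds-th item, offset by fold
        test_set = set(test)
        train = [item for item in items if item not in test_set]
        yield train, test


topic_ids = list(range(1, 51))  # placeholder ids for the 50 selected topics
splits = list(k_fold_splits(topic_ids))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
```

Each fold trains on 45 topics and tests on the remaining 5, so every topic is evaluated once with a model that never saw it during training.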

Page 22: Generating Queries from  User-Selected Text

Experiment

PRF (pseudo-relevance feedback): extract the top 10 and 20 tf-idf weighted terms from

Page 23: Generating Queries from  User-Selected Text

Experiment

TopC(k)

The average k value is 3.85.

Page 24: Generating Queries from  User-Selected Text

Outline Introduction Approaches Experiments Conclusion

Page 25: Generating Queries from  User-Selected Text

Conclusion

The authors present approaches for generating queries based on user-selected text segments from a document.

They propose several learning-based approaches to selecting effective chunks from the text segments.

In the experiments, TopC(k), which has the advantage of determining k automatically, can significantly improve retrieval performance.

Page 26: Generating Queries from  User-Selected Text

Thank you for listening.