Personalized Query Expansion for the Web Paul-Alexandru Chirita, Claudiu S. Firan, Wolfgang Nejdl Gabriel Barata


Personalized Query Expansion for the Web

Paul-Alexandru Chirita, Claudiu S. Firan, Wolfgang Nejdl

Gabriel Barata


Motivation

by Tojosan @ Flickr


What is query expansion?

Add meaningful search terms to the query…


What is PIR based query expansion?

Add meaningful search terms to the query…

… related to the user’s interests.


Why PIR based query expansion?

More personalization quality!

More privacy!


Example

Google search: “canon book”


Example

Top 3 results:
• The Canon: A Whirligig Tour of the Beautiful Basics of Science (Hardcover) @ Amazon
• Western Canon @ Wikipedia
• Biblical Canon @ Wikipedia


Example

Expanded query: “canon book bible”


Example

Top 3 results:
• Biblical Canon @ Wikipedia
• Books of the Bible @ Wikipedia
• The Canon of the Bible @ catholicapologetics.org


Query Expansion using Desktop data

by Old Shoe Woman @ Flickr


Algorithms

• Expanding with Local Desktop Analysis
• Expanding with Global Desktop Analysis


Expanding with Local Desktop Analysis

• Term and Document Frequency
• Lexical Compounds
• Sentence Selection


Term and Document Frequency

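$$\mathit{TermScore} = \left[\frac{1}{2} + \frac{1}{2}\cdot\frac{\mathit{nrWords} - \mathit{pos}}{\mathit{nrWords}}\right]\cdot\log(1 + \mathit{TF})$$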
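The formula above can be sketched in a few lines of Python. This is only an illustration: the function name `term_score` is hypothetical, and the sketch assumes that `pos` is the zero-based position of the term in the document, `nr_words` is the document length in words, and `tf` is the term frequency. The slides do not spell out these conventions, nor the base of the logarithm (the natural logarithm is assumed here).

```python
import math


def term_score(tf: int, pos: int, nr_words: int) -> float:
    """TermScore = [1/2 + 1/2 * (nrWords - pos) / nrWords] * log(1 + TF).

    Assumptions (not stated on the slide): `pos` is zero-based and the
    logarithm is the natural logarithm.
    """
    if nr_words <= 0:
        raise ValueError("nr_words must be positive")
    # Weight in [0.5, 1.0]: terms appearing earlier get a larger weight.
    position_weight = 0.5 + 0.5 * (nr_words - pos) / nr_words
    return position_weight * math.log(1 + tf)
```

Under these assumptions, a term at the very start of a document keeps the full `log(1 + TF)` score, while one near the end keeps roughly half of it.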


Lexical Compounds

{ adjective? Noun+ }
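The pattern above (an optional adjective followed by one or more nouns) could be matched over part-of-speech-tagged text roughly as follows. This is a minimal sketch, not the authors' implementation: the function name `extract_compounds` and the tag names `"ADJ"` and `"NOUN"` are assumptions, and the input is taken to be already tagged by some external tagger.

```python
from typing import List, Tuple


def extract_compounds(tagged: List[Tuple[str, str]]) -> List[str]:
    """Return every maximal match of the pattern { adjective? noun+ }.

    `tagged` is a list of (word, tag) pairs; the tags "ADJ" and "NOUN"
    are an assumed tagset used only for this illustration.
    """
    compounds = []
    i, n = 0, len(tagged)
    while i < n:
        start = i
        # Optional single adjective.
        if tagged[i][1] == "ADJ":
            i += 1
        # One or more nouns.
        first_noun = i
        while i < n and tagged[i][1] == "NOUN":
            i += 1
        if i > first_noun:
            compounds.append(" ".join(word for word, _ in tagged[start:i]))
        elif i == start:
            # Token is neither an adjective nor a noun: skip it.
            i += 1
        # Otherwise an adjective was not followed by a noun; `i` already
        # points past it, so the loop simply continues from there.
    return compounds
```

Note that a literal reading of the pattern also accepts a single noun; a caller interested only in multi-word compounds would filter those out.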


Sentence Selection

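$$\mathit{SentenceScore} = \frac{SW^2}{TW} + PS + \frac{TQ^2}{NQ}$$

$$PS = \begin{cases} \dfrac{\mathit{Avg}(NS) - \mathit{SentenceIndex}}{\mathit{Avg}^2(NS)}, & \text{if } \mathit{SentenceIndex} \le 10 \\ 0, & \text{if } \mathit{SentenceIndex} > 10 \end{cases}$$

$$TF > ms = \begin{cases} 7 - 0.1 \times (25 - NS), & \text{if } NS < 25 \\ 7, & \text{if } NS \in [25, 40] \\ 7 + 0.1 \times (NS - 40), & \text{if } NS > 40 \end{cases}$$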
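The three formulas above fit together as in the following Python sketch. The slides do not define the symbols, so the reading used here is an assumption: `sw` is the number of significant words in the sentence (words whose TF exceeds the threshold `ms`), `tw` the total number of words in the sentence, `tq` the number of query terms in the sentence, `nq` the number of terms in the query, `ns` the number of sentences in the document, and `avg_ns` the average number of sentences per document. All function names are hypothetical.

```python
def significance_threshold(ns: int) -> float:
    """Threshold `ms`: a word counts as significant when its TF exceeds it."""
    if ns < 25:
        return 7 - 0.1 * (25 - ns)
    if ns <= 40:
        return 7.0
    return 7 + 0.1 * (ns - 40)


def position_score(sentence_index: int, avg_ns: float) -> float:
    """PS: positive only for the first ten sentences, zero afterwards."""
    if sentence_index <= 10:
        return (avg_ns - sentence_index) / avg_ns ** 2
    return 0.0


def sentence_score(sw: int, tw: int, tq: int, nq: int,
                   sentence_index: int, avg_ns: float) -> float:
    """SentenceScore = SW^2 / TW + PS + TQ^2 / NQ."""
    if tw <= 0 or nq <= 0:
        raise ValueError("tw and nq must be positive")
    return sw ** 2 / tw + position_score(sentence_index, avg_ns) + tq ** 2 / nq
```

Read this way, a sentence scores highly when it is dense in significant words, appears early in the document, and covers many of the query terms.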


Expanding with Global Desktop Analysis

• Term Co-occurrence Statistics
• Thesaurus based Expansion


Term Co-occurrence Statistics


Thesaurus based Expansion


Experiments & Evaluation

by Canadian Museum of Nature @ Flickr


Experiments

• 18 users
• Files indexed within user-selected paths, emails and Web cache


Experiments

• Each user chose 4 queries:
– 1 from the top 2% log queries (avg. length = 2.0)
– 1 random log query (avg. length = 2.3)
– 1 self-selected specific query (avg. length = 2.9)
– 1 self-selected ambiguous query (avg. length = 1.8)


Evaluation

$$DCG(i) = \begin{cases} G(1), & \text{if } i = 1 \\ DCG(i - 1) + \dfrac{G(i)}{\log_2(i)}, & \text{otherwise} \end{cases}$$
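The recursive DCG definition above can be computed iteratively, as in this short Python sketch. The function name `dcg` and the choice to return the running value at every rank are illustrative assumptions; `gains` holds the relevance gain G(i) of the result at rank i, listed from rank 1.

```python
import math
from typing import List, Sequence


def dcg(gains: Sequence[float]) -> List[float]:
    """Return [DCG(1), DCG(2), ...] for gains listed in rank order.

    DCG(1) = G(1); DCG(i) = DCG(i - 1) + G(i) / log2(i) for i > 1.
    """
    values: List[float] = []
    for rank, gain in enumerate(gains, start=1):
        if rank == 1:
            values.append(float(gain))
        else:
            # Gains at lower ranks are discounted by log2 of the rank.
            values.append(values[-1] + gain / math.log2(rank))
    return values
```

Because the discount at rank 2 is log2(2) = 1, the first two results contribute their full gain, and the discount only takes effect from rank 3 onward.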


Evaluation

• Evaluated algorithms:
– Google: Google query output
– TF, DF: Term and Document Frequency
– LC, LC[O]: Regular and Optimized Lexical Compounds
– TC[CS], TC[MI], TC[LR]: Term Co-occurrence Statistics using Cosine Similarity, Mutual Information and Likelihood Ratio
– WN[SYN], WN[SUB], WN[SUP]: WordNet based expansion with synonyms, sub-concepts and super-concepts


Results

Log queries:


Results

Self-selected queries:


Introducing Adaptivity

by RavenCore17 @ Flickr


Query Clarity


Adaptive Expansion


Experiments

• Same experimental setup as for the previous analysis.


Results

Log queries:


Results

Self-selected queries:


Results


Conclusions

by ThisIsIt2 @ Flickr


Conclusions

• Five techniques for determining expansion terms from personal documents.

• Empirical analysis showed that these approaches perform very well.

• Expansion process adapts to query features.

• Adaptive expansion process proved to yield significant improvements over the static one.


End

Any questions?