Personalized Query Expansion for the Web

Paul-Alexandru Chirita, Claudiu S. Firan, Wolfgang Nejdl

Gabriel Barata

Motivation

by Tojosan @ Flickr

What is query expansion?

Add meaningful search terms to the query…

What is PIR based query expansion?

Add meaningful search terms to the query…

… related to the user’s interests.

Why PIR based query expansion?

More personalization quality!

More privacy!

Example

Google search: “canon book”

Example

Top 3 results:

• The Canon: A Whirligig Tour of the Beautiful Basics of Science (Hardcover) @ Amazon

• Western Canon @ Wikipedia

• Biblical Canon @ Wikipedia

Example

Expanded query: “canon book bible”

Example

Top 3 results:

• Biblical Canon @ Wikipedia

• Books of the Bible @ Wikipedia

• The Canon of the Bible @ catholicapologetics.org

Query Expansion using Desktop data

by Old Shoe Woman @ Flickr

Algorithms

• Expanding with Local Desktop Analysis

• Expanding with Global Desktop Analysis

Expanding with Local Desktop Analysis

• Term and Document Frequency

• Lexical Compounds

• Sentence Selection

Term and Document Frequency

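\[
\mathit{TermScore} = \left[ \frac{1}{2} + \frac{1}{2} \cdot \frac{\mathit{nrWords} - \mathit{pos}}{\mathit{nrWords}} \right] \cdot \log(1 + \mathit{TF})
\]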
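The TermScore formula can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the slide does not define its symbols, so `pos` is taken to be the position of the term in the document, `nr_words` the document length in words, and `tf` the term frequency. The natural logarithm is also an assumption, since the slide does not give the base.

```python
import math


def term_score(pos: int, nr_words: int, tf: int) -> float:
    """TermScore = [1/2 + 1/2 * (nrWords - pos) / nrWords] * log(1 + TF).

    Assumed meanings (not defined on the slide):
      pos      -- position of the term in the document
      nr_words -- total number of words in the document
      tf       -- frequency of the term in the document
    """
    if nr_words <= 0:
        raise ValueError("nr_words must be positive")
    # Terms near the start of the document get a weight close to 1,
    # terms near the end a weight close to 1/2.
    position_weight = 0.5 + 0.5 * (nr_words - pos) / nr_words
    return position_weight * math.log(1 + tf)


if __name__ == "__main__":
    # Same frequency, different positions: the earlier term scores higher.
    print(term_score(pos=0, nr_words=100, tf=3))
    print(term_score(pos=90, nr_words=100, tf=3))
```

With this weighting, position can at most halve a term's score, while frequency contributes only logarithmically.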

Lexical Compounds

{ adjective? Noun+ }
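The pattern above (an optional adjective followed by one or more nouns) can be matched over part-of-speech tagged text. The sketch below assumes the input is already tagged and uses the hypothetical tag names `"ADJ"` and `"NOUN"`; the slide does not say which tagger or tag set was used.

```python
from typing import List, Tuple


def lexical_compounds(tagged: List[Tuple[str, str]]) -> List[str]:
    """Extract maximal matches of { adjective? noun+ } from (token, tag) pairs.

    The tag names "ADJ" and "NOUN" are assumptions made for this sketch.
    Note that the pattern as written also matches a single noun.
    """
    compounds = []
    i = 0
    n = len(tagged)
    while i < n:
        start = i
        # Optional single adjective.
        if tagged[i][1] == "ADJ":
            i += 1
        # One or more nouns.
        j = i
        while j < n and tagged[j][1] == "NOUN":
            j += 1
        if j > i:
            compounds.append(" ".join(token for token, _ in tagged[start:j]))
            i = j
        else:
            # No noun followed: move on from the starting token.
            i = start + 1
    return compounds


if __name__ == "__main__":
    sentence = [
        ("the", "DET"), ("digital", "ADJ"), ("camera", "NOUN"),
        ("lens", "NOUN"), ("is", "VERB"), ("sharp", "ADJ"),
    ]
    print(lexical_compounds(sentence))
```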

Sentence Selection

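\[
\mathit{SentenceScore} = \frac{SW^2}{TW} + PS + \frac{TQ^2}{NQ}
\]

\[
PS = \begin{cases}
\dfrac{\mathit{Avg}(NS) - \mathit{SentenceIndex}}{\mathit{Avg}^2(NS)}, & \text{if } \mathit{SentenceIndex} \le 10 \\
0, & \text{if } \mathit{SentenceIndex} > 10
\end{cases}
\]

\[
TF > ms = \begin{cases}
7 - 0.1 \times (25 - NS), & \text{if } NS < 25 \\
7, & \text{if } NS \in [25, 40] \\
7 + 0.1 \times (NS - 40), & \text{if } NS > 40
\end{cases}
\]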
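The three formulas can be combined into a small scoring sketch. The symbol meanings are assumptions, since the slide does not define them: `sw` is the number of significant words in the sentence (words whose TF exceeds the threshold ms), `tw` the total number of words in the sentence, `tq` the number of query terms in the sentence, `nq` the number of query terms, `ns` the number of sentences in the document, and `avg_ns` the average number of sentences per document.

```python
def significance_threshold(ns: int) -> float:
    """Threshold ms: a word counts as significant when its TF exceeds ms."""
    if ns < 25:
        return 7 - 0.1 * (25 - ns)
    if ns <= 40:
        return 7.0
    return 7 + 0.1 * (ns - 40)


def position_score(sentence_index: int, avg_ns: float) -> float:
    """PS: only the first ten sentences receive a position bonus."""
    if sentence_index <= 10:
        return (avg_ns - sentence_index) / avg_ns ** 2
    return 0.0


def sentence_score(sw: int, tw: int, sentence_index: int,
                   avg_ns: float, tq: int, nq: int) -> float:
    """SentenceScore = SW^2 / TW + PS + TQ^2 / NQ."""
    if tw <= 0 or nq <= 0:
        raise ValueError("tw and nq must be positive")
    return sw ** 2 / tw + position_score(sentence_index, avg_ns) + tq ** 2 / nq


if __name__ == "__main__":
    print(significance_threshold(15))
    print(sentence_score(sw=3, tw=9, sentence_index=2,
                         avg_ns=20.0, tq=2, nq=4))
```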

Expanding with Global Desktop Analysis

• Term Co-occurrence Statistics

• Thesaurus based Expansion

Term Co-occurrence Statistics
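The slide gives no formula for this step. As a rough illustration only, one common way to measure co-occurrence treats each term as a binary vector over documents, in which case the cosine similarity of two terms reduces to the number of shared documents divided by the square root of the product of their document counts. The function name and the `term_docs` index below are hypothetical, and this is not necessarily the exact measure the authors used.

```python
import math
from typing import Dict, Set


def cosine_cooccurrence(term_docs: Dict[str, Set[int]], x: str, y: str) -> float:
    """Cosine similarity between the binary document vectors of two terms.

    term_docs maps each term to the set of document ids containing it
    (a hypothetical index built from the user's desktop data).
    """
    docs_x = term_docs.get(x, set())
    docs_y = term_docs.get(y, set())
    if not docs_x or not docs_y:
        return 0.0
    shared = len(docs_x & docs_y)
    return shared / math.sqrt(len(docs_x) * len(docs_y))


if __name__ == "__main__":
    index = {
        "canon": {1, 2, 3, 4},
        "bible": {2, 3, 4, 5},
        "camera": {9},
    }
    # Terms that share many documents with the query term rank higher
    # as expansion candidates.
    print(cosine_cooccurrence(index, "canon", "bible"))
    print(cosine_cooccurrence(index, "canon", "camera"))
```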

Thesaurus based Expansion

Experiments & Evaluation

by Canadian Museum of Nature @ Flickr

Experiments

• 18 users

• Files indexed within user selected paths, Emails and Web cache

Experiments

• They chose 4 queries:

– 1 from the top 2% log queries (avg. length = 2.0)

– 1 random log query (avg. length = 2.3)

– 1 self-selected specific query (avg. length = 2.9)

– 1 self-selected ambiguous query (avg. length = 1.8)

Evaluation

\[
DCG(i) = \begin{cases}
G(1), & \text{if } i = 1 \\
DCG(i - 1) + \dfrac{G(i)}{\log_2(i)}, & \text{otherwise}
\end{cases}
\]
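The recursive DCG definition unrolls into a simple loop. The sketch below assumes `gains` holds the relevance grades G(1), G(2), ... of the results in ranked order; the grading scale itself is not given on the slide.

```python
import math
from typing import List, Sequence


def dcg(gains: Sequence[float]) -> List[float]:
    """Return [DCG(1), ..., DCG(n)] for gains G(1), ..., G(n) in rank order."""
    scores: List[float] = []
    for i, gain in enumerate(gains, start=1):
        if i == 1:
            # Base case: DCG(1) = G(1).
            scores.append(float(gain))
        else:
            # Recursive case: DCG(i) = DCG(i - 1) + G(i) / log2(i).
            scores.append(scores[-1] + gain / math.log2(i))
    return scores


if __name__ == "__main__":
    # Relevant results placed lower in the ranking contribute less.
    print(dcg([2, 1, 0, 1]))
```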

Evaluation

• Evaluated algorithms:

– Google: Google query output

– TF, DF: Term and Document Frequency

– LC, LC[O]: Regular and Optimized Lexical Compounds

– TC[CS], TC[MI], TC[LR]: Term Co-occurrence Statistics using Cosine Similarity, Mutual Information and Likelihood Ratio

– WN[SYN], WN[SUB], WN[SUP]: WordNet based expansion with synonyms, sub-concepts and super-concepts

Results

Log queries:

Results

Self-selected queries:

Introducing Adaptivity

by RavenCore17 @ Flickr

Query Clarity

Adaptive Expansion

Experiments

• Same experimental setup as for the previous analysis.

Results

Log queries:

Results

Self-selected queries:

Results

Conclusions

by ThisIsIt2 @ Flickr

Conclusions

• Five techniques for determining expansion terms from personal documents.

• Empirical analysis showed that these approaches perform very well.

• The expansion process was made adaptive to query features.

• Adaptive expansion process proved to yield significant improvements over the static one.

End

Any questions?
