ADDING STRUCTURE TO TOP-K: FROM ITEMS TO EXPANSIONS
Date: 2012.5.21
Source: CIKM'11
Speaker: I-Chih Chiu
Advisor: Dr. Jia-Ling Koh


INDEX
Introduction, Problem Definition, Basic Algorithm, Semantic Optimization, Experiments, Conclusion

INTRODUCTION
Keyword-based search interfaces are extremely popular, but current result interfaces have drawbacks:
- Google search: for the query "What's the weather today?", the results contain the words "what", "weather", and "today", but the matching lacks semantics.
- Del.icio.us: search results are shown through a faceted interface, but the expansions come from a fixed set of tags.

Motivated by these drawbacks, the authors consider a search scenario in which each item is annotated with a set of keywords. They do not need to assume the existence of a pre-defined categorical hierarchy; instead, they want to automatically group query result items into different expansions of the query, each corresponding to a subset of keywords.

PROBLEM DEFINITION
Each attribute value $t_i.a_j$ of item $t_i$ is normalized to [0,1], and the item utility $u(t_i)$ is computed from these values.
[Figure: worked example of $u(t_i)$ for items with attributes Author (0.3) and Click (0.6); result shown: 0.51]

The goal is to group items into different expansions of the query Q and return high-quality expansions. An expansion is a subset of keywords $e \subseteq K - Q$, where K is the set of all keywords.
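The slides show only that attribute values $t_i.a_j$ are normalized to [0,1] and combined into an item utility $u(t_i)$. As a minimal sketch, assuming min-max normalization and a weighted sum $u(t_i) = \sum_j w_j \cdot t_i.a_j$ (both are assumptions, as are the attribute names, weights, and raw values below), the computation could look like this:

```python
# Hypothetical scoring: weighted sum of min-max normalized attributes.
# Neither the normalization nor the weighted sum is confirmed by the slides.


def normalize(values):
    """Min-max scale raw attribute values into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:  # constant column: avoid division by zero
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def item_utilities(items, weights):
    """items: {item_id: {attribute: raw value}}; weights: {attribute: weight}.

    Returns {item_id: u(t_i)}, where u(t_i) = sum_j w_j * t_i.a_j after each
    attribute a_j has been normalized across all items.
    """
    ids = list(items)
    norm = {i: {} for i in ids}
    for attr in weights:
        column = normalize([items[i][attr] for i in ids])
        for i, value in zip(ids, column):
            norm[i][attr] = value
    return {i: sum(w * norm[i][a] for a, w in weights.items()) for i in ids}


if __name__ == "__main__":
    items = {  # hypothetical raw attribute values
        "t1": {"author": 10, "clicks": 200},
        "t2": {"author": 30, "clicks": 50},
        "t3": {"author": 20, "clicks": 125},
    }
    weights = {"author": 0.4, "clicks": 0.6}  # hypothetical weights
    print(item_utilities(items, weights))
```

Min-max scaling is only one way to map raw values into [0,1]; any increasing map into that range preserves the per-attribute ordering of items.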
[Figure: subset-of relationship among expansions for $K - Q = \{k_1, k_2, k_3, k_4\}$]

DETERMINING IMPORTANCE OF AN EXPANSION
The importance of an expansion e is given by a function $g(S_e)$ of its item set $S_e$. Example:

Item (keywords)   S_k1   S_k1,k2   S_k2,k3
t1 (k1)           0.4    X         X
t2 (k1, k2)       0.6    0.5       X
t3 (k3)           X      X         0.6

BASIC ALGORITHM: NAIVE ALGORITHM
TopExp-Naive algorithm:
1. Access items in non-increasing order of their attribute values, in round-robin fashion.
2. For each matching item accessed, enumerate all possible expansions and update their lower-bound and upper-bound utility values.

IMPROVED ALGORITHM
TopExp-Lazy algorithm: access items in non-increasing order of their attribute values.
To count how many expansions correspond to the same set of items, TopExp-Lazy uses the classical inclusion-exclusion principle. The number of such expansions for a keyword set e is $2^{|e|} - \mathit{count} - 1$, after which $\mathit{count} \mathrel{+}= 2^{|e|} - 1$.
Example: for $e = \{k_1, k_2, k_3\}$, $2^{|e|} = 8$. With $\{k_1, k_2\}$ and $\{k_3\}$ already counted, $\mathit{count} = 3 + 1 = 4$, so $8 - 4 - 1 = 3$ expansions remain: $\{k_1, k_2, k_3\}$, $\{k_1, k_3\}$, and $\{k_2, k_3\}$.

SEMANTIC OPTIMIZATION: WEIGHTING EXPANSIONS
[Figure: weighting expansions]

PATH EXCLUSION BASED ALGORITHM
[Figure: example graphs H1, H2, and G, assuming all weights equal 1]
Top-PEkExp algorithm:
1. Generate the necessary expansions (list L) using TopExp-Lazy.
2. $R_G \leftarrow \mathrm{GreedyMWIS}(L)$ (greedy maximum-weight independent set).
3. $E_{topk} \leftarrow$ the k expansions in L with the largest upper-bound utilities.

EXPERIMENTS
Synthetic datasets: 5 synthetic datasets were generated, with sizes from 8000 to …, to evaluate efficiency, scalability, and memory savings.
Real dataset: the ACM Digital Library, used to demonstrate the quality of the returned expansions.
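The inclusion-exclusion counting step of TopExp-Lazy described above can be sketched as follows, under our reading that "count" tracks expansions already attributed to earlier keyword sets. The function name and the representation of keyword sets as Python sets are hypothetical. A subset of e lies inside an earlier set s exactly when it is a subset of $e \cap s$, and $P(A) \cap P(B) = P(A \cap B)$, so the covered subsets are counted by an alternating sum of powers of two:

```python
from itertools import combinations


def new_expansion_count(e, earlier):
    """Number of non-empty subsets of keyword set e that are not a subset of
    any keyword set in `earlier`, counted by inclusion-exclusion.

    Hypothetical helper; the slides only give the running-count formula.
    """
    e = frozenset(e)
    parts = [e & frozenset(s) for s in earlier]
    if not parts:
        return 2 ** len(e) - 1  # only the empty set is excluded
    # |union of P(e & s)|; this union always contains the empty set
    covered = 0
    for r in range(1, len(parts) + 1):
        for combo in combinations(parts, r):
            common = frozenset.intersection(*combo)
            covered += (-1) ** (r + 1) * 2 ** len(common)
    # (2^|e| - 1) non-empty subsets minus (covered - 1) non-empty covered ones
    return 2 ** len(e) - covered


if __name__ == "__main__":
    # Slide example: e = {k1, k2, k3} after {k1, k2} and {k3}
    print(new_expansion_count({"k1", "k2", "k3"}, [{"k1", "k2"}, {"k3"}]))  # 3
```

The slide's shortcut $2^{|e|} - \mathit{count} - 1$ gives the same answer when the earlier sets are disjoint subsets of e, as in the example; when earlier sets overlap, their shared subsets would be subtracted more than once, which the alternating terms correct. The sketch enumerates $2^m - 1$ intersections for m earlier sets, so it is only practical for small m.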
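The Top-PEkExp steps above name GreedyMWIS but the slides do not show its body. Below is a minimal sketch of one standard greedy heuristic for maximum-weight independent set (keep the heaviest remaining vertex, discard its neighbours). The conflict rule, which joins two expansions when one is a proper subset of the other (so they lie on a common subset-of path), is our assumption about "path exclusion", and the expansion utilities are hypothetical:

```python
from itertools import combinations


def conflict_edges(expansions):
    """Hypothetical conflict rule: two expansions conflict when one is a
    proper subset of the other, i.e. they lie on one subset-of path."""
    return [(a, b) for a, b in combinations(expansions, 2) if a < b or b < a]


def greedy_mwis(weights, edges):
    """Greedy maximum-weight independent set heuristic: repeatedly keep the
    heaviest remaining vertex and discard its neighbours. No optimality
    guarantee; MWIS itself is NP-hard."""
    neighbours = {v: set() for v in weights}
    for a, b in edges:
        neighbours[a].add(b)
        neighbours[b].add(a)
    remaining = set(weights)
    chosen = []
    while remaining:
        # heaviest first; ties broken by sorted keywords for determinism
        best = min(remaining, key=lambda v: (-weights[v], sorted(v)))
        chosen.append(best)
        remaining -= neighbours[best] | {best}
    return chosen


if __name__ == "__main__":
    E = frozenset
    weights = {  # hypothetical expansion utilities
        E({"k1"}): 0.9, E({"k1", "k2"}): 0.8, E({"k2"}): 0.5,
        E({"k3"}): 0.4, E({"k2", "k3"}): 0.7,
    }
    picked = greedy_mwis(weights, conflict_edges(weights))
    print([sorted(e) for e in picked])  # [['k1'], ['k2', 'k3']]
```

Picking by weight alone is the simplest variant; another common greedy rule ranks vertices by weight divided by (degree + 1), which favours vertices that exclude fewer neighbours.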
Result plots (synthetic data):
- Fixed N = 10 and k = 10.
- Fixed number of items = 10000, N = 10.
- Fixed number of items = 10000, k = 10.

Study on the ACM Digital Library:
- Queries: xml, histogram, privacy.
- Attributes: the average author publication count; the citation count.
- Keywords: title, keyword list, abstract.

CONCLUSION
The authors studied how to better present search/query results to users and proposed several efficient algorithms for computing the top-k expansions. They not only demonstrated the performance of these algorithms but also validated the quality of the returned expansions with a study on a real dataset.