EASE: An Effective 3-in-1 Keyword Search Method for
Unstructured, Semi-structured and Structured
Data
Guoliang Li et al.
The Problem
Keyword search introduces false positives
e.g., “Conference 2008 Canada Data Integration”
The Problem
Websites are organized through content
e.g., “Dr Pain, Math 343, Linear Algebra”
The Solution
Combine linked pages for search, ordered by ranking
The Solution
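r-Radius Steiner Graph Problem
r-Radius Graph
Centric distance: the largest shortest-path distance from a node to any other node in the graph
Radius: the minimal centric distance over all nodes in the graph
[Figure: example graph with nodes u, v, t, r, s]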
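The two definitions above can be sketched in a few lines of Python. This is only an illustration, assuming an unweighted, undirected graph stored as an adjacency dictionary. The edges between u, t, r, v and s are hypothetical, since the connections in the slide's figure did not survive, and the function names are made up for this example.

```python
from collections import deque


def shortest_distances(graph, source):
    """Breadth-first search: hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def centric_distance(graph, node):
    """Largest shortest-path distance from node to any other node."""
    dist = shortest_distances(graph, node)
    if len(dist) < len(graph):
        return float("inf")  # some node is unreachable
    return max(dist.values())


def radius(graph):
    """Minimal centric distance over all nodes."""
    return min(centric_distance(graph, node) for node in graph)


# Hypothetical edges; only the node names come from the slide.
graph = {
    "u": {"r"},
    "t": {"r"},
    "r": {"u", "t", "v"},
    "v": {"r", "s"},
    "s": {"v"},
}

print({node: centric_distance(graph, node) for node in sorted(graph)})
print("radius:", radius(graph))
```

With these assumed edges, r and v are the most central nodes, so the graph would count as a 2-radius graph.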
The Solution
r-Radius Steiner Graph Problem
Content node: contains a query keyword
Steiner node: lies on a path between two content nodes
[Figure: example graph with nodes u, t, r, v, s; “Dr Pain” and “Math 343” label the content nodes]
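A rough sketch of how content nodes and Steiner nodes could be picked out of a graph follows. Several things here are assumptions made for illustration: the edges, the text attached to each node, and the simplification that only shortest paths between content nodes are considered (the slide just says "path"). Content nodes are treated as Steiner nodes themselves.

```python
from collections import deque
from itertools import combinations


def bfs(graph, source):
    """Hop distance from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def content_nodes(texts, keywords):
    """Nodes whose text contains at least one query keyword."""
    lowered = [k.lower() for k in keywords]
    return {n for n, text in texts.items()
            if any(k in text.lower() for k in lowered)}


def steiner_nodes(graph, content):
    """Content nodes plus every node on a shortest path between two of them."""
    dist = {c: bfs(graph, c) for c in content}
    result = set(content)
    for a, b in combinations(sorted(content), 2):
        if b not in dist[a]:
            continue  # the two content nodes are not connected
        for node in graph:
            if (node in dist[a] and node in dist[b]
                    and dist[a][node] + dist[b][node] == dist[a][b]):
                result.add(node)
    return result


# Hypothetical graph and page texts; only the node names and the two
# quoted keywords come from the slide.
graph = {
    "u": {"r"}, "t": {"r"}, "r": {"u", "t", "v"},
    "v": {"r", "s"}, "s": {"v"},
}
texts = {
    "u": "Dr Pain", "t": "Office hours", "r": "Math department",
    "v": "Course list", "s": "Math 343, Linear Algebra",
}

content = content_nodes(texts, ["Dr Pain", "Math 343"])
steiner = steiner_nodes(graph, content)
print(sorted(content), sorted(steiner))
```

In this toy graph node t does not lie between the two content nodes, so it would be left out of the Steiner graph.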
Finding r-Radius Graphs
Query: “Shanmugasundaram, Guo, XRANK”
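The slide does not say how the r-radius graphs are found. One simple way to generate candidates, shown here purely as an assumption and not as the paper's procedure, is to collect every node within r hops of a chosen center. In the subgraph induced by that ball, the center's centric distance is at most r. The graph and function names below are hypothetical.

```python
from collections import deque


def ball(graph, center, r):
    """Set of nodes within r hops of center (breadth-first, depth-limited)."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == r:
            continue  # do not expand beyond r hops
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return frozenset(dist)


def candidate_subgraphs(graph, r):
    """One candidate node set per possible center."""
    return {center: ball(graph, center, r) for center in graph}


# Hypothetical graph for illustration.
graph = {
    "u": {"r"}, "t": {"r"}, "r": {"u", "t", "v"},
    "v": {"r", "s"}, "s": {"v"},
}

candidates = candidate_subgraphs(graph, 1)
for center in sorted(candidates):
    print(center, sorted(candidates[center]))
```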
Avoiding Overlapping
Maximal r-Radius Graph
It is not contained in another r-Radius subgraph
But wait! There is still overlap
No problem:
Graph Clustering
Graph Partitioning
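The maximality test above can be illustrated with plain set containment. This sketch assumes each candidate subgraph is represented only by its node set, which is a simplification, and the function name is hypothetical.

```python
def maximal_graphs(candidates):
    """Keep only node sets that are not strictly contained in another one."""
    unique = {frozenset(c) for c in candidates}
    kept = [g for g in unique if not any(g < other for other in unique)]
    return sorted(kept, key=sorted)  # deterministic order for display


# Hypothetical candidate node sets.
candidates = [{"a", "b"}, {"a", "b", "c"}, {"c", "d"}]
result = maximal_graphs(candidates)
print([sorted(g) for g in result])
```

Here {a, b} is dropped because it sits inside {a, b, c}, yet the two remaining graphs still share node c. That leftover overlap is what the clustering and partitioning step is meant to deal with.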
Ranking
TF-IDF-based IR ranking (tf, idf, ndl) is ok
Better yet: structural compactness-based DB ranking (SIM)
More compact, more relevant
Length of path inversely proportional to ranking
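To make the two kinds of score concrete, here is a small sketch. Neither formula is taken from the paper: `tf_idf` is a common textbook variant (length-normalised term frequency times log-scaled inverse document frequency), and `compactness` is a hypothetical stand-in that simply rewards shorter paths between content nodes.

```python
import math


def tf_idf(term, doc_tokens, corpus):
    """Length-normalised tf times log-scaled idf (a textbook variant)."""
    if not doc_tokens:
        return 0.0
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for tokens in corpus if term in tokens)
    if df == 0:
        return 0.0
    return tf * math.log(len(corpus) / df)


def compactness(path_lengths):
    """Hypothetical structural score: shorter paths contribute more."""
    return sum(1.0 / (1 + length) for length in path_lengths)


# Toy corpus of tokenised pages, made up for illustration.
corpus = [
    ["xrank", "keyword", "search"],
    ["keyword", "search", "graphs"],
    ["steiner", "graphs"],
]

ir_score = tf_idf("xrank", corpus[0], corpus)
tight = compactness([1])  # content nodes one hop apart
loose = compactness([3])  # content nodes three hops apart
print(ir_score, tight, loose)
```

Under this stand-in, a result whose content nodes are one hop apart scores higher than one where they are three hops apart, which matches the "more compact, more relevant" idea.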
Indexing
IR score and Sim score are combined
An inverted index (EI-Index) is created
The inverted index stores keyword pairs and scores
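A minimal sketch of an inverted index keyed by keyword pairs is given below. It is not the EI-Index layout from the paper. The combined scores are assumed to be precomputed, and the graph ids, score values, and the choice to sum pair scores at query time are all hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def build_pair_index(pair_scores):
    """Map each unordered keyword pair to (graph_id, score), best first."""
    index = defaultdict(list)
    for graph_id, kw_a, kw_b, score in pair_scores:
        index[tuple(sorted((kw_a, kw_b)))].append((graph_id, score))
    for postings in index.values():
        postings.sort(key=lambda posting: posting[1], reverse=True)
    return index


def search(index, keywords):
    """Sum the stored pair scores per graph and rank graphs by the total."""
    totals = defaultdict(float)
    for pair in combinations(sorted(set(keywords)), 2):
        for graph_id, score in index.get(pair, []):
            totals[graph_id] += score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical precomputed scores: (graph id, keyword, keyword, combined score).
entries = [
    ("g1", "guo", "xrank", 0.9),
    ("g2", "guo", "xrank", 0.4),
    ("g1", "guo", "shanmugasundaram", 0.5),
]

index = build_pair_index(entries)
results = search(index, ["xrank", "guo"])
print(results)
```

Storing scores per keyword pair means a query only has to look up the pairs it contains, at the cost of a larger index than a single-keyword inverted list.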
Strengths of the Paper
Very well written paper
Deep research on the topic
Mathematically grounded, with proofs
Compared against current methods as baselines
Good results
Weaknesses and Future Work
It might be too complex
Could work on ways to find Steiner graphs faster
It doesn’t consider cases of farming sites or bogus sites