Swoogle: A Semantic Web Search and Metadata Engine
Li Ding, Tim Finin, Anupam Joshi, Rong Pan, R. Scott Cost, Yun Peng
Pavan Reddivari, Vishal Doshi, Joel Sachs
Department of Computer Science and Electrical Engineering, University of Maryland Baltimore County
CIKM '04
-------
Dongmin Shin
IDS Lab
2008.10.22
Copyright 2008 by CEBT
Index
Introduction
Semantic Web Documents
Swoogle Architecture
Finding SWDs
SWD Metadata
Ranking SWDs
Indexing and Retrieval of SWDs
Conclusions
Evaluation and Discussion
Center for E-Business Technology
Introduction
Semantic Web documents (SWDs) are characterized by semantic annotations and meaningful references to other SWDs
Conventional search engines do not take advantage of these features
A search engine customized for SWDs is needed
Swoogle is a crawler-based indexing and retrieval system for the Semantic Web
Introduction
Three Activities of Swoogle
Finding appropriate ontologies
– Allows users to query for ontologies that contain specified terms anywhere in the document
– The ontologies returned are ranked
Finding instance data
– Enables querying SWDs with constraints on which classes and properties they use or define
Characterizing the Semantic Web
– By collecting metadata about the Semantic Web, Swoogle reveals interesting structural properties
Swoogle automatically discovers SWDs, indexes their metadata and answers queries about it
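To make the first two activities concrete, here is a minimal in-memory sketch of ranked ontology search and constraint-based instance-data search. The `SwdRecord` fields, the `ex:` terms, the example.org URLs and the rank values are all hypothetical; the slides do not describe Swoogle's actual metadata schema or query interface.

```python
from dataclasses import dataclass, field


@dataclass
class SwdRecord:
    """Hypothetical per-SWD metadata record (not Swoogle's real schema)."""
    url: str
    defines: set = field(default_factory=set)  # terms this SWD defines
    uses: set = field(default_factory=set)     # terms this SWD uses
    rank: float = 0.0                          # precomputed document rank


def find_ontologies(index, term):
    """Return SWDs that define `term`, highest rank first."""
    hits = [r for r in index if term in r.defines]
    return sorted(hits, key=lambda r: r.rank, reverse=True)


def find_instance_data(index, uses=frozenset(), defines=frozenset()):
    """Return SWDs that use every term in `uses` and define every term in `defines`."""
    return [r for r in index if uses <= r.uses and defines <= r.defines]


index = [
    SwdRecord("http://example.org/a.owl", defines={"ex:Person"}, rank=0.9),
    SwdRecord("http://example.org/b.owl", defines={"ex:Person", "ex:Pet"}, rank=2.5),
    SwdRecord("http://example.org/me.rdf", uses={"ex:Person", "ex:name"}),
]

print([r.url for r in find_ontologies(index, "ex:Person")])
print([r.url for r in find_instance_data(index, uses={"ex:Person", "ex:name"})])
```

A real engine would answer these queries from an inverted index over terms rather than a linear scan; the scan keeps the sketch short.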
Semantic Web Documents
SWD: a document in a Semantic Web language that is online and accessible to web users and software agents
Two kinds of SWD
SWOs (Semantic Web Ontologies)
– Correspond to T-Boxes
– A significant proportion of the statements a SWO makes define new terms or extend the definitions of terms defined in other SWDs
SWDBs (Semantic Web Databases)
– Correspond to A-Boxes
– Do not define or extend a significant number of terms
– Can introduce individuals and make assertions about them, or make assertions about individuals defined in other SWDs
Swoogle Architecture
SWD discovery
Discovers potential SWDs throughout the Web
Metadata creation
Caches a snapshot of a SWD and generates objective metadata about SWDs
Data analysis
Uses the cached SWDs and the created metadata to derive analytical reports
Interface
Provides data services to the Semantic Web community
Finding SWDs
Google Crawler
Uses the Google Web Service
Starts with queries for file-type extensions (e.g. rdf, owl)
Appends constraints (keywords) to construct more specific queries, then combines their results
Focused Crawler
Crawls documents within a given website
– Extension constraint: e.g. skip “.jpg” or “.html” files
– Focus constraint: only crawl URLs under the given base URL
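The filtering logic of both crawlers can be sketched as below. The helper names, the `filetype:` query syntax (Google's search operator, assumed here rather than confirmed as Swoogle's exact form) and the default exclusion list are illustrative assumptions; the actual calls to the Google Web Service are left out.

```python
from urllib.parse import urljoin, urlparse


def build_queries(extensions, keywords):
    """One broad query per file extension, plus narrower keyword variants.

    Assumption: queries use Google's "filetype:" operator.
    """
    queries = []
    for ext in extensions:
        queries.append(f"filetype:{ext}")
        queries.extend(f"filetype:{ext} {kw}" for kw in keywords)
    return queries


def combine_results(result_lists):
    """Merge result URL lists, keeping first-seen order and dropping duplicates."""
    seen, merged = set(), []
    for results in result_lists:
        for url in results:
            if url not in seen:
                seen.add(url)
                merged.append(url)
    return merged


def should_crawl(link, base_url, excluded=(".jpg", ".html")):
    """Focused-crawler filter: extension constraint plus focus constraint."""
    absolute = urljoin(base_url, link)               # resolve relative links
    if urlparse(absolute).path.lower().endswith(excluded):
        return False                                 # extension constraint
    return absolute.startswith(base_url)             # focus constraint


print(build_queries(["rdf", "owl"], ["person"]))
print(should_crawl("data/x.rdf", "http://example.org/onto/"))
```

Splitting one broad query into many keyword-constrained ones and merging the results is a common way to work around per-query result limits of a search API.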
Finding SWDs
Web interface
Registered users can submit a URL of either a SWD or a web directory
JENA2-based Swoogle Crawler
Analyzes the content of a SWD and discovers new SWDs
– e.g. via URIrefs, owl:imports, rdfs:seeAlso, foaf:Person
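The link-following step can be sketched over a SWD's triples, represented here as plain `(subject, predicate, object)` string tuples rather than through the Jena2 API. Two assumptions: a hash URIref such as `http://x.org/o.owl#Name` is taken to be defined in the document before `#`, and the FOAF-specific heuristics the slide alludes to are not modeled. All example URLs are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
OWL_IMPORTS = "http://www.w3.org/2002/07/owl#imports"
RDFS_SEE_ALSO = "http://www.w3.org/2000/01/rdf-schema#seeAlso"
LINK_PREDICATES = {OWL_IMPORTS, RDFS_SEE_ALSO}


def defining_document(uri):
    """Assumption: a hash URIref (doc#Name) is defined in the document before '#'."""
    if uri.startswith("http") and "#" in uri:
        return uri.split("#", 1)[0]
    return None


def candidate_swds(triples, source_url):
    """Collect URLs worth crawling next from one SWD's (s, p, o) triples."""
    found = set()
    for s, p, o in triples:
        if p in LINK_PREDICATES and o.startswith("http"):
            found.add(o)                      # explicit link to another document
        for term in (s, p, o):
            doc = defining_document(term)
            if doc is not None:
                found.add(doc)                # document that may define this term
    found.discard(source_url)                 # don't re-queue the current SWD
    return found


me = "http://example.org/me.rdf"
triples = [
    (me + "#me", RDF_TYPE, "http://xmlns.com/foaf/0.1/Person"),
    (me + "#me", RDFS_SEE_ALSO, "http://example.org/friends.rdf"),
    (me, OWL_IMPORTS, "http://example.org/base.owl"),
]
print(sorted(candidate_swds(triples, me)))
```

Slash-style URIrefs such as the FOAF namespace are skipped by `defining_document`, since their defining document cannot be read off the URI without further heuristics.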
SWD Metadata – Basic Metadata
Language features
Properties describing the syntactic or semantic features of a SWD
– Encoding: syntactic encoding of a SWD: RDF/XML, N-TRIPLE, N3
– Language: Semantic Web language used by a SWD: OWL, DAML, RDFS, RDF
– OWL species: language species of a SWD written in OWL: OWL-LITE, OWL-DL, OWL-FULL
RDF statistics
Properties summarizing the node distribution of the RDF graph
– Focus on how SWDs define new classes, properties and individuals
– A SWD is classified as a SWO or a SWDB by its ontology ratio R(foo)
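A minimal sketch of two of these metadata properties. The language is guessed from the namespaces a SWD mentions, and the ontology ratio is assumed to be R = (classes + properties) / (classes + properties + individuals), counting subjects of `rdf:type` triples. Both the exact form of R and the 0.8 SWO/SWDB threshold are assumptions for illustration, not values stated on these slides.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
DAML = "http://www.daml.org/"  # prefix shared by the DAML+OIL namespaces

CLASS_TYPES = {RDFS + "Class", OWL + "Class"}
PROPERTY_TYPES = {RDF + "Property", OWL + "ObjectProperty", OWL + "DatatypeProperty"}
IGNORED_TYPES = {OWL + "Ontology"}  # the ontology header is not counted


def language_of(triples):
    """Guess the Semantic Web language from the namespaces a SWD mentions."""
    terms = {t for triple in triples for t in triple}
    for name, ns in (("OWL", OWL), ("DAML", DAML), ("RDFS", RDFS)):
        if any(t.startswith(ns) for t in terms):
            return name
    return "RDF"


def ontology_ratio(triples):
    """Assumed form: (classes + properties) / (classes + properties + individuals)."""
    classes, props, individuals = set(), set(), set()
    for s, p, o in triples:
        if p != RDF + "type" or o in IGNORED_TYPES:
            continue
        if o in CLASS_TYPES:
            classes.add(s)
        elif o in PROPERTY_TYPES:
            props.add(s)
        else:
            individuals.add(s)
    individuals -= classes | props
    terms = len(classes) + len(props)
    total = terms + len(individuals)
    return terms / total if total else 0.0


def classify(triples, threshold=0.8):  # hypothetical threshold
    return "SWO" if ontology_ratio(triples) >= threshold else "SWDB"


EX = "http://example.org/onto#"
onto = [(EX + "Person", RDF + "type", OWL + "Class"),
        (EX + "name", RDF + "type", OWL + "DatatypeProperty")]
data = [(EX + "alice", RDF + "type", EX + "Person")]
print(language_of(onto), classify(onto), classify(onto + data))
```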
SWD Metadata – Basic Metadata
Ontology annotation
Properties that describe a SWD as an ontology
– label: rdfs:label
– comment: rdfs:comment
– versionInfo: owl:versionInfo and daml:versionInfo
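Extracting these annotations amounts to reading three properties off the ontology header. In this sketch, triples are plain string tuples with literals as bare strings, and the header is assumed to be whichever subject is typed `owl:Ontology` or `daml:Ontology`; the function name and example document are hypothetical.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
DAML = "http://www.daml.org/2001/03/daml+oil#"

ONTOLOGY_TYPES = {OWL + "Ontology", DAML + "Ontology"}
ANNOTATION_PROPERTIES = {
    RDFS + "label": "label",
    RDFS + "comment": "comment",
    OWL + "versionInfo": "versionInfo",
    DAML + "versionInfo": "versionInfo",
}


def ontology_annotations(triples):
    """Collect label/comment/versionInfo values attached to the ontology header."""
    headers = {s for s, p, o in triples if p == RDF_TYPE and o in ONTOLOGY_TYPES}
    annotations = {"label": [], "comment": [], "versionInfo": []}
    for s, p, o in triples:
        if s in headers and p in ANNOTATION_PROPERTIES:
            annotations[ANNOTATION_PROPERTIES[p]].append(o)
    return annotations


doc = "http://example.org/pets.owl"
triples = [
    (doc, RDF_TYPE, OWL + "Ontology"),
    (doc, RDFS + "label", "Pet ontology"),
    (doc, OWL + "versionInfo", "1.0"),
    (doc + "#Pet", RDFS + "label", "Pet"),  # a class label, not an ontology annotation
]
print(ontology_annotations(triples))
```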
SWD Metadata – Relations among SWDs
TM/IN: term reference relations between two SWDs
– i.e. a SWD uses terms defined by other SWDs
IM: an ontology imports another ontology
EX: an ontology extends another
– e.g. ontology A defines a class AC that is an rdfs:subClassOf a class BC defined in ontology B
PV: an ontology is a prior version of another
CPV: an ontology is a prior version of, and compatible with, another
IPV: an ontology is a prior version of, but incompatible with, another
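A sketch of deriving these relations from one SWD's triples. It assumes the relations are read off the standard OWL properties (`owl:imports`, `rdfs:subClassOf`, `owl:priorVersion`, `owl:backwardCompatibleWith`, `owl:incompatibleWith`), and that a hash URIref belongs to the document before `#`. TM and IN are merged into a single hypothetical "TM" label here because the slide does not say how they differ.

```python
OWL = "http://www.w3.org/2002/07/owl#"
RDFS_SUBCLASS_OF = "http://www.w3.org/2000/01/rdf-schema#subClassOf"
# Each OWL versioning property states that the object is a prior version of the subject.
VERSION_RELATIONS = {
    OWL + "priorVersion": "PV",
    OWL + "backwardCompatibleWith": "CPV",
    OWL + "incompatibleWith": "IPV",
}


def doc_of(uri):
    """Assumption: a hash URIref (doc#Name) is defined in the document before '#'."""
    if uri.startswith("http") and "#" in uri:
        return uri.split("#", 1)[0]
    return None


def swd_relations(doc, triples):
    """Derive (relation, from_doc, to_doc) tuples from one SWD's triples."""
    rels = set()
    for s, p, o in triples:
        if p == OWL + "imports":
            rels.add(("IM", doc, o))
        elif p in VERSION_RELATIONS:
            rels.add((VERSION_RELATIONS[p], o, doc))   # o is a prior version of doc
        elif p == RDFS_SUBCLASS_OF and doc_of(s) == doc and doc_of(o) not in (None, doc):
            rels.add(("EX", doc, doc_of(o)))           # local class extends a foreign one
        for term in (p, o):                            # TM and IN merged into "TM" here
            other = doc_of(term)
            if other is not None and other != doc:
                rels.add(("TM", doc, other))
    return rels


a, b = "http://example.org/a.owl", "http://example.org/b.owl"
triples = [
    (a, OWL + "imports", b),
    (a + "#AC", RDFS_SUBCLASS_OF, b + "#BC"),
    (a, OWL + "priorVersion", "http://example.org/a-v1.owl"),
]
for rel in sorted(swd_relations(a, triples)):
    print(rel)
```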
Ranking SWDs
Random surfing model (PageRank)
Not appropriate for the Semantic Web
– The semantics of links lead to a non-uniform probability of following a particular outgoing link
Rational random surfing model
Classifies inter-SWD links into four categories
– imports(A,B), uses-term(A,B), extends(A,B), asserts(A,B)
The more terms in B referenced by A, the more likely a surfer will follow the link from A to B
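The "more referenced terms, more likely to follow" rule can be sketched as link weights that sum per-term references and are then normalized over each source's outgoing weight. The per-category weights below are hypothetical placeholders (all 1.0, so the weight is a plain reference count); the slides do not give Swoogle's values.

```python
from collections import defaultdict

# Hypothetical per-category weights; Swoogle's actual values are not given here.
CATEGORY_WEIGHT = {"imports": 1.0, "extends": 1.0, "uses-term": 1.0, "asserts": 1.0}


def link_weights(references):
    """references: (source, target, category) tuples, one per referenced term."""
    f = defaultdict(float)
    for src, dst, cat in references:
        f[(src, dst)] += CATEGORY_WEIGHT[cat]
    return dict(f)


def transition_probabilities(f):
    """P(follow src -> dst) = f(src, dst) / total outgoing weight of src."""
    out_total = defaultdict(float)
    for (src, _), w in f.items():
        out_total[src] += w
    return {(src, dst): w / out_total[src] for (src, dst), w in f.items()}


# A references three terms of B and one term of C.
refs = [("A", "B", "uses-term")] * 3 + [("A", "C", "uses-term")]
print(transition_probabilities(link_weights(refs)))
```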
Ranking SWDs
[Figure: example link graph over SWDs A, B, C and D; edge labels are link weights]
PageRank: $PR(A) = (1-d) + d\left(\frac{1}{4} + \frac{1}{2} + \frac{1}{3}\right)$
Swoogle: $rawPR(A) = (1-d) + d\left(\frac{0.4}{0.4+0.3+0.2+0.4} + \frac{0.6}{0.6+0.1} + \frac{0.5}{0.5+0.1+0.7}\right)$
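The example above reads as one step of the weighted update rawPR(a) = (1-d) + d · Σ_{x ∈ L(a)} rawPR(x) · f(x,a) / f(x), where L(a) is the set of SWDs linking to a, f(x,a) is the weight of the links from x to a, and f(x) is x's total outgoing weight. The slide's example appears to take each in-linking SWD's own rank as 1. This sketch evaluates that single step and also iterates the update over a whole graph; d = 0.85 and the iteration count are assumed values, and the helper names are hypothetical.

```python
from collections import defaultdict


def raw_pr_step(in_links, d=0.85):
    """One application of rawPR(a) = (1-d) + d * sum(rawPR(x) * f(x,a) / f(x)).

    in_links: (rawPR(x), f(x,a), f(x)) for each SWD x linking to a.
    """
    return (1 - d) + d * sum(rank * w / total for rank, w, total in in_links)


def raw_pagerank(weights, d=0.85, iterations=50):
    """Iterate rawPR over a weighted link graph {(x, a): f(x, a)}."""
    nodes = {n for edge in weights for n in edge}
    out_total = defaultdict(float)              # f(x): total outgoing weight of x
    for (src, _), w in weights.items():
        out_total[src] += w
    rank = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new = {n: 1 - d for n in nodes}
        for (src, dst), w in weights.items():
            new[dst] += d * rank[src] * w / out_total[src]
        rank = new
    return rank


# The slide's example for A, taking each in-linking SWD's rank as 1:
weighted = raw_pr_step([(1.0, 0.4, 0.4 + 0.3 + 0.2 + 0.4),
                        (1.0, 0.6, 0.6 + 0.1),
                        (1.0, 0.5, 0.5 + 0.1 + 0.7)])
plain = raw_pr_step([(1.0, 1, 4), (1.0, 1, 2), (1.0, 1, 3)])  # classic PageRank case
print(round(weighted, 4), round(plain, 4))
```

Setting every link weight to 1 recovers the unweighted PageRank update, which is how `plain` reproduces the first formula.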
Ranking SWDs
Indexing and Retrieval of SWDs
Using traditional IR techniques
Reasoning over large collections of documents can be expensive
IR techniques have the advantage of being faster, while taking a somewhat coarser view of the text
They include well-researched methods for ranking matches and computing similarity between documents
Using N-grams
Can result in a larger vocabulary
Inter-word relationships are preserved
Somewhat resistant to certain kinds of errors
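To show why N-grams tolerate small errors, here is a sketch of N-gram indexing with TF-IDF weighting and cosine similarity. The choice of character trigrams, the log(N/df) IDF variant and the toy documents are assumptions for illustration; the slide says only that Swoogle uses N-grams with traditional IR techniques.

```python
import math
from collections import Counter


def char_ngrams(text, n=3):
    """Character n-grams of lower-cased text, padded so word edges count."""
    padded = f" {text.lower()} "
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


def build_index(docs, n=3):
    """TF-IDF weighted n-gram vectors for each document, plus the IDF table."""
    counts = {name: Counter(char_ngrams(text, n)) for name, text in docs.items()}
    df = Counter(g for c in counts.values() for g in c)
    idf = {g: math.log(len(docs) / df_g) for g, df_g in df.items()}
    vectors = {name: {g: tf * idf[g] for g, tf in c.items()} for name, c in counts.items()}
    return vectors, idf


def cosine(u, v):
    dot = sum(w * v.get(g, 0.0) for g, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def search(query, vectors, idf, n=3):
    """Rank documents by cosine similarity to the query's n-gram vector."""
    q = {g: tf * idf[g] for g, tf in Counter(char_ngrams(query, n)).items() if g in idf}
    scores = [(name, cosine(q, vec)) for name, vec in vectors.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)


docs = {
    "foaf": "Person name mbox knows",
    "wine": "Wine Winery hasColor",
}
vectors, idf = build_index(docs)
print(search("persn name", vectors, idf))  # the misspelling still matches "foaf"
```

The misspelled query shares most of its trigrams with "Person name", which is the error resistance the slide mentions; the cost is a vocabulary of n-grams rather than words.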
Conclusions
Current web search engines
Do not work well with SWDs, as they are designed to work with natural languages and expect documents to contain unstructured text composed of words
Swoogle
A prototype crawler-based indexing and retrieval system for Semantic Web documents
Evaluation and Discussion
Pros
Clear contribution on the methods:
– How to discover potential SWDs
– How to rank SWDs
Cons
Poor explanation of the ranking algorithm
– Why they differentiate between SWOs and SWDBs
– How the ranking formulas (which differ depending on the type of SWD) were derived
Discussion
How can a Semantic Web retrieval system handle conflicts between SWDs?
By ranking? By TF-IDF? Or by some other method?
Current Status (@ 2005)
Referenced from:
– Li Ding et al., "Finding and Ranking Knowledge on the Semantic Web", Proceedings of the 4th International Semantic Web Conference, November 2005.
– Tim Finin et al., "Swoogle: Searching for Knowledge on the Semantic Web", AAAI 05 (Intelligent Systems Demo), July 2005.
System architecture
Metadata creation -> Digest
– Computes metadata for SWDs and Semantic Web terms (SWTs), and identifies relations among them
Current Status (@ 2005)
Size
SWDs: 135K -> 368K
SWOs: 13.29% of SWDs -> 1% of SWDs
Ranking SWDs and SWTs