Swoogle: search and metadata for the semantic web. Presented by eBiquity group, UMBC, CIKM’04, Nov 12, 2004. Partial research support was provided by DARPA contract F30602-00-0591 and by NSF awards NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.
Swoogle: search and metadata for the semantic web
Presented by eBiquity group, UMBC
CIKM’04, Nov 12, 2004
Partial research support was provided by DARPA contract F30602-00-0591 and by NSF awards NSF-ITR-IIS-0326460 and NSF-ITR-IDM-0219649.
Swoogle, cikm'04 -- http://swoogle.umbc.edu/
Outline
• Motivation
• Concepts
• Demo
• Architecture: document discovery, metadata creation, ontology rank
• Status
• Summary
http://swoogle.umbc.edu/
Motivation
• (Google + Web) has made us all smarter.
• Something similar is needed by people and software agents for information on the semantic web.
Motivation – Common Questions
Find an ontology
• What are the ontologies about “time”?
• Shall I use an existing ontology or create one?
Find instance data
• Show me the instances of the class “http://foo.com/Person”.
• Gather relevant information for my application.
Characterize the Semantic Web
• How many RDF documents are online?
• What are the most popular ontologies?
• What graph properties does the semantic web have?
• Does a namespace URI link to the corresponding ontology?
The Role of Swoogle in the Semantic Web
[Figure: diagram of services. Box labels: Software Agents, Applications; Semantic Web Services; Data Service; SW data service; database; (Web) document; RDF document; Directory/Digest Service; Service Finder; Data Finder (Swoogle). Edge labels: uses, digests, searches.]
Related work
Ontology-based annotation & search
• Annotate web documents: SHOE (UMCP, 1997), Ontobroker (AIFB, Karlsruhe, 1998), WebKB (Martin & Eklund, 1999), QuizRDF (BT, 2002)
• Annotate proper references & relations: CREAM (AIFB, 2003)
Ontology repositories
• Ontology level: DAML Ontology Library, Schema Web, SemWebCentral
• Term level: W3C’s Ontaria (2004)
Ontology management systems
• Stanford’s Ontolingua
• IBM’s Snobase
Swoogle aims to be a Google-like online ontology repository
• Based on both ontology and instance documents
• Automated discovery
• Search and rank ontologies and terms
• Digest but not store
• Create metadata based on RDF and OWL semantics
• Provide services to both human and software agents
Concepts
Document
• A Semantic Web Document (SWD) is an online document written in a semantic web language (i.e. RDF or OWL).
• An ontology document (SWO) is a SWD that contains mostly term definitions (i.e. classes and properties). It corresponds to the T-Box in Description Logic.
• An instance document (SWI or SWDB) is a SWD that contains mostly class individuals. It corresponds to the A-Box in Description Logic.
Term
• A term is a non-anonymous RDF resource which is the URI reference of either a class or a property.
Individual
• An individual is a non-anonymous RDF resource which is the URI reference of a class member.
In Swoogle, a document D is a valid SWD iff JENA* correctly parses D and produces at least one triple.
*JENA is a Java framework for writing Semantic Web applications. http://www.hpl.hp.com/semweb/jena2.htm
[Figure: foaf:Person rdf:type rdfs:Class (a term); http://.../foaf.rdf#finin rdf:type foaf:Person (an individual).]
Concepts Example
[Figure: two example SWDs.
SWO http://xmlns.com/foaf/1.0/ defines two terms: the Class foaf:Person (rdf:type rdfs:Class, rdfs:subClassOf wordNet:Agent) and the Property foaf:mbox (rdf:type rdf:Property, rdfs:domain foaf:Person).
SWI: the Individual http://foo.com/foaf.rdf#finin has rdf:type foaf:Person and a foaf:mbox property.]
NOTE: Qualified Names (QNames) are used to shorten well-known namespaces as follows:
• rdf: => http://www.w3.org/1999/02/22-rdf-syntax-ns#
• rdfs: => http://www.w3.org/2000/01/rdf-schema#
• foaf: => http://xmlns.com/foaf/1.0/
• wordNet: => http://xmlns.com/wordnet/1.6/
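The term and individual definitions can be illustrated with a small sketch. It treats a document as a list of (subject, predicate, object) tuples, which is an assumption made here for illustration (Swoogle itself parses documents with JENA). The function name classify and the set TERM_TYPES are hypothetical, and anonymous resources are not modelled.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
FOAF = "http://xmlns.com/foaf/1.0/"      # namespace as written on the slide
WORDNET = "http://xmlns.com/wordnet/1.6/"

RDF_TYPE = RDF + "type"
# Assumption: a resource typed as one of these counts as a term in this sketch.
TERM_TYPES = {RDFS + "Class", RDF + "Property"}


def classify(triples):
    """Split the typed subjects of a document into terms and individuals."""
    terms, individuals = set(), set()
    for subject, predicate, obj in triples:
        if predicate != RDF_TYPE:
            continue
        if obj in TERM_TYPES:
            terms.add(subject)        # a class or a property
        else:
            individuals.add(subject)  # a member of some class
    return terms, individuals


# The ontology document (SWO) from the example slide.
swo = [
    (FOAF + "Person", RDF_TYPE, RDFS + "Class"),
    (FOAF + "Person", RDFS + "subClassOf", WORDNET + "Agent"),
    (FOAF + "mbox", RDF_TYPE, RDF + "Property"),
    (FOAF + "mbox", RDFS + "domain", FOAF + "Person"),
]
# The instance document (SWI) from the example slide.
swi = [
    ("http://foo.com/foaf.rdf#finin", RDF_TYPE, FOAF + "Person"),
]

swo_terms, swo_individuals = classify(swo)
swi_terms, swi_individuals = classify(swi)
```

Under these assumptions the SWO yields two terms and no individuals, and the SWI yields one individual and no terms.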
Demo
1. Find “Time” Ontology (Swoogle Search)
2. Digest “Time” Ontology
• Document view
• Term view
3. Find Term “Person” (Ontology Dictionary)
4. Digest Term “Person”
• Class properties
• (Instance) properties
5. Swoogle Statistics
Demo 1: Find “Time” Ontology
We can use a set of keywords to search for ontologies. For example, “time, before, after” are basic concepts for a “Time” ontology.
Usage of Terms in SWD
[Figure: how terms are used in two SWDs.
In the ontology http://xmlns.com/foaf/1.0/, foaf:Person is a defined Class (rdf:type rdfs:Class, rdfs:subClassOf wordNet:Agent) and foaf:mbox is a defined Property (rdf:type rdf:Property, rdfs:domain foaf:Person).
In the instance document (labelled http://www.cs.umbc.edu/~finin/foaf.rdf and http://foo.com/foaf.rdf in the figure), foaf:Person is a populated Class, foaf:mbox is a populated Property, and http://foo.com/foaf.rdf#finin is a defined Individual.]
Demo 2(a): Digest “Time” Ontology (term view)
[Screenshot: term list including TimeZone, before, intAfter, ...]
Document Metadata
Web document metadata
• When/how discovered/fetched
• Suffix of URL
• Last modified time
• Document size
SWD metadata
• Language features: OWL species, RDF encoding
• Statistical features: defined/used terms, declared/used namespaces, Ontology Ratio, Ontology Rank
• Ontology annotation: label, version, comment
Related relational metadata
• Links to other SWDs: imported SWDs, referenced SWDs, extended SWDs, prior version
• Links to terms: classes/properties defined/used
Demo 2(b): Digest “Time” Ontology (document view)
Demo 3: Find Term “Person”
Not capitalized! URIrefs are case-sensitive!
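Term Metadata: An integrated definition
Class definition
• rdfs:subClassOf -- foaf:Agent
• rdfs:label -- “Person”
Properties (from SWO)
• foaf:mbox
• foaf:name
Properties (from SWI)
• foaf:name
• dc:title
[Figure: the definition of foaf:Person assembled from three documents (Onto 1, Onto 2, SWD3). Triples shown: foaf:Person rdf:type owl:Class; foaf:Person rdfs:label “Person”; foaf:Person rdfs:subClassOf foaf:Agent; foaf:name rdfs:domain foaf:Person; foaf:mbox rdfs:domain foaf:Person; and an instance with rdf:type foaf:Person and foaf:name “Tim Finin”.]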
Demo 4: Digest Term “Person”
[Screenshot callouts: 167 different properties; 562 different properties]
Demo 5: Swoogle Statistics
Swoogle Architecture
[Figure: architecture diagram. Components: SWD discovery, metadata creation, data analysis, interface. Boxes: The Web, Candidate URLs, Web Crawler, SWD Reader, SWD Cache, SWD Metadata, IR analyzer, SWD analyzer, Web Server, Web Service, Agent Service.]
1. SWD Discovery
Swoogle uses three crawlers to discover likely SWD URLs:
• A Google Crawler uses Google to find URLs using keywords (http://www.w3.org/2000/01/rdf-schema, ...) and file type suffixes (.rdf, .owl).
• A Focused Crawler crawls through HTML files recursively within a given website.
• A SWD Crawler crawls through SWDs and discovers URLs according to term semantics.
To determine the likely SWD URLs:
• Non-SWD extension filter: .jpg, .mp3, etc.
• Protocol filter: file://, urn:, etc.
• Namespace of RDF resources in SWD
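The extension and protocol filters might look like the following sketch. The filter lists contain only the entries named on the slide, and the function name is_candidate_swd_url is hypothetical; this is not Swoogle's actual crawler code.

```python
import posixpath
from urllib.parse import urlparse

# Only the examples named on the slide; a real list would be longer.
NON_SWD_EXTENSIONS = {".jpg", ".mp3"}
REJECTED_SCHEMES = {"file", "urn"}


def is_candidate_swd_url(url: str) -> bool:
    """Return True if the URL survives the protocol and extension filters."""
    parsed = urlparse(url)
    if parsed.scheme.lower() in REJECTED_SCHEMES:
        return False  # protocol filter
    extension = posixpath.splitext(parsed.path)[1].lower()
    return extension not in NON_SWD_EXTENSIONS  # non-SWD extension filter
```

A URL that passes is only a candidate: whether it is a valid SWD is still decided by parsing the fetched document.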
2. Metadata Creation
Document metadata
• General metadata
• SWD metadata
• Ontology metadata
Term metadata (definition)
• Class property
• (Instance) property, i.e. class-property bond
Relational metadata
            | Term                            | Document
Term        | rdfs:subClassOf, rdfs:domain, … | rdfs:seeAlso, …
Document    | uses, defines, …                | owl:imports, …
2.1 Ontology Ratio
Why?
• The distinction between ontology and instance documents is fuzzy.
Given a SWD foo, let
• C(foo) be the set of classes defined in foo
• P(foo) be the set of properties defined in foo
• I(foo) be the set of instances defined in foo
\[ \mathrm{Ratio}_{\mathrm{ontology}}(foo) = \frac{|C(foo)| + |P(foo)|}{|C(foo)| + |P(foo)| + |I(foo)|} \]
The Ontology Ratio is a heuristic for the classification:
• 0: pure SWI
• 1: pure SWO
• > 0.8: foo is said to be an ontology.
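As a minimal sketch, the ratio is the number of defined classes and properties divided by the number of defined classes, properties and instances. The function names are hypothetical, and raising an error for a document that defines nothing is an assumption, since the slide does not cover that case.

```python
def ontology_ratio(num_classes: int, num_properties: int, num_instances: int) -> float:
    """(|C| + |P|) / (|C| + |P| + |I|) for one SWD."""
    defined_terms = num_classes + num_properties
    total = defined_terms + num_instances
    if total == 0:
        # Assumption: the ratio is undefined when nothing is defined.
        raise ValueError("document defines no classes, properties, or instances")
    return defined_terms / total


def is_ontology(ratio: float, threshold: float = 0.8) -> bool:
    """Apply the slide's heuristic: a ratio above 0.8 means ontology."""
    return ratio > threshold
```

For example, a document defining 8 classes, 1 property and 1 instance has ratio 9/10 and is classified as an ontology, while a document with only instances has ratio 0 (a pure SWI).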
2.2 Relational Metadata
Inter-document relation
• rdfs:seeAlso
• IMport (IM), e.g. owl:imports
• Similar/Equal SWD
Inter-term relation
• EXtension (EX), e.g. rdfs:subClassOf
• use-TerM (TM), e.g. rdfs:range
• use-INdividual (IN), e.g. owl:sameAs
• Prior Version (PV, IPV, CPV)
Generalized inter-document relations
• Generalized from individual-level relations
• Capture more relations with less complexity
Usage
• Link SWDs
• Ontology rank
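One way to picture the generalization from term-level relations to document-level links is the sketch below. The predicate table holds only the examples given on the slide; the function document_relations and the home_of mapping (resource to the document that defines it) are hypothetical helpers, not Swoogle's implementation.

```python
# Predicate -> generalized relation, using only the slide's examples.
RELATION_OF = {
    "owl:imports": "IM",
    "rdfs:subClassOf": "EX",
    "rdfs:range": "TM",
    "owl:sameAs": "IN",
}


def document_relations(source_doc, triples, home_of):
    """Lift the triples found in source_doc to (source, relation, target) links."""
    links = set()
    for _subject, predicate, obj in triples:
        relation = RELATION_OF.get(predicate)
        target = home_of.get(obj)  # document assumed to define the object
        if relation and target and target != source_doc:
            links.add((source_doc, relation, target))
    return links


triples = [
    ("foaf:Person", "rdfs:subClassOf", "wordNet:Agent"),
    ("foaf:mbox", "rdfs:domain", "foaf:Person"),
]
home_of = {
    "wordNet:Agent": "http://xmlns.com/wordnet/1.6/",
    "foaf:Person": "http://xmlns.com/foaf/1.0/",
}
links = document_relations("http://xmlns.com/foaf/1.0/", triples, home_of)
```

Here the single rdfs:subClassOf triple becomes one EX link from the FOAF document to the WordNet document; the rdfs:domain triple is skipped because the slide lists no relation for it.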
3. Data analysis: Ranking SWD
Why?
• Ranking captures page importance and popularity.
• Ranking has proven useful in HTML search.
• SWDs differ from HTML documents and carry more semantics.
• So a new SWD ranking mechanism is needed!
Related ideas?
• Google’s PageRank
• Kleinberg’s HITS
[Figure: kinds of Web content: SWOs, SWIs, HTML documents, images, audio files, video files.]
3.1 Random surfer model (PageRank)
How is PageRank computed?
Page A’s rank is
\[ P(A) = (1 - d) + d \sum_{i=1}^{n} \frac{P(T_i)}{C(T_i)} \]
where
• {T_i} are the pages that link to A
• C(X) is the number of page X’s out-links
• d is a damping factor (e.g., 0.85)
Compute by iterating until convergence.
A uniform probability of following any link is the convention on the Web, but not on the SW:
• Links have semantics that influence the probability of following them.
• Rational users read an ontology and all the ontologies it references.
[Figure: random surfer flowchart. Read page; bored? If no, follow a random link; if yes, jump to a random page.]
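The iteration can be sketched in a few lines. This is a plain implementation of the classic PageRank update, where a page's rank is (1 - d) plus d times the sum of rank/out-degree over the pages linking to it. The function name, the tolerance and the handling of pages without out-links are assumptions made for illustration.

```python
def pagerank(out_links, d=0.85, tol=1e-9, max_iter=1000):
    """out_links maps each page to a list of the distinct pages it links to."""
    pages = set(out_links)
    for targets in out_links.values():
        pages.update(targets)
    if not pages:
        return {}
    rank = {page: 1.0 for page in pages}
    for _ in range(max_iter):
        new_rank = {page: 1.0 - d for page in pages}
        for source, targets in out_links.items():
            if not targets:
                continue  # assumption: a page with no out-links passes on no rank
            share = d * rank[source] / len(targets)  # d * P(T) / C(T)
            for target in targets:
                new_rank[target] += share
        delta = max(abs(new_rank[page] - rank[page]) for page in pages)
        rank = new_rank
        if delta < tol:  # iterate until the ranks stop changing
            break
    return rank


ranks = pagerank({"A": ["B"], "C": ["B"], "B": []})
```

In this toy graph A and C receive no links, so each keeps the base rank 1 - d = 0.15, and B ends at 0.15 + 0.85 * (0.15 + 0.15) = 0.405.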
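3.2 Rational Random Surfer Model
Weighted random behavior
\[ \mathrm{rawPR}(A) = (1 - d) + d \sum_{i=1}^{n} \mathrm{rawPR}(X_i)\,\frac{\mathrm{flow}(X_i, A)}{\mathrm{flow}(X_i)} \]
\[ \mathrm{flow}(X_i, A) = \sum_{l \in \mathrm{links}(X_i, A)} \mathrm{weight}(l) \]
\[ \mathrm{flow}(X_i) = \sum_{j=1}^{m} \mathrm{flow}(X_i, A_j) \]
By analogy with 3.1, {X_i} are the SWDs that link to A and {A_j} are the SWDs that X_i links to.
Rational behavior
Rank of a SWI:
\[ \mathrm{PR}(A) = \mathrm{rawPR}(A) \]
Rank of a SWO:
\[ \mathrm{PR}(A) = \sum_{X_i \in TC(A)} \mathrm{rawPR}(X_i) \]
where TC(A) is the transitive closure of SWOs referencing A.
[Figure: rational random surfer flowchart. Read page; SWO? If yes, read the referenced SWOs. Bored? If no, follow a random link; if yes, jump to a random page.]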
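The weighted random behavior can be sketched as follows: each document X passes on d * rawPR(X), split among its targets in proportion to the summed weights of its links to each target. The relation weights used in the example are hypothetical (the slide gives no values), as are the function and parameter names.

```python
def raw_pagerank(links, weights, d=0.85, tol=1e-9, max_iter=1000):
    """links: iterable of (source, relation, target); weights: relation -> weight."""
    flow = {}   # flow[x][a] = sum of weight(l) over all links l from x to a
    docs = set()
    for source, relation, target in links:
        docs.update((source, target))
        by_target = flow.setdefault(source, {})
        by_target[target] = by_target.get(target, 0.0) + weights[relation]
    if not docs:
        return {}
    raw = {doc: 1.0 for doc in docs}
    for _ in range(max_iter):
        new_raw = {doc: 1.0 - d for doc in docs}
        for source, by_target in flow.items():
            total = sum(by_target.values())  # flow(X): all weight leaving X
            if total == 0:
                continue
            for target, f in by_target.items():
                new_raw[target] += d * raw[source] * f / total
        delta = max(abs(new_raw[doc] - raw[doc]) for doc in docs)
        raw = new_raw
        if delta < tol:
            break
    return raw


# Hypothetical weights: an EX link is followed three times as often as a TM link.
example_weights = {"EX": 3.0, "TM": 1.0}
raw = raw_pagerank([("x", "EX", "a"), ("x", "TM", "b")], example_weights)
```

With these weights, x sends three quarters of its outgoing rank to a and one quarter to b; with equal weights the computation reduces to the uniform random surfer.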
3.3 Ontology Rank Example
[Figure: four SWDs and the links between them.
http://www.cs.umbc.edu/~finin/foaf.rdf: an instance with rdf:type foaf:Person and a foaf:mbox property
http://xmlns.com/foaf/1.0/: foaf:Person rdf:type rdfs:Class; foaf:Person rdfs:subClassOf wordNet:Person
http://xmlns.com/wordnet/1.6/: wordNet:Person rdf:type rdfs:Class; wordNet:Person rdfs:subClassOf wordNet:Individual
http://www.w3.org/2000/01/rdf-schema: rdf:type, rdfs:subClassOf, rdf:Property
Link labels: one EX link and three TM links.]
3.3 Ontology Rank Example (cont’d)
[Figure: the same four SWDs (http://www.cs.umbc.edu/~finin/foaf.rdf, http://xmlns.com/wordnet/1.6/, http://xmlns.com/foaf/1.0/, http://www.w3.org/2000/01/rdf-schema), joined by one EX link and three TM links and annotated with these ranks:]
• rawPR = 0.2, PR = 0.2
• rawPR = 100, PR = 100
• rawPR = 3, PR = 103
• rawPR = 300, PR = 403
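The PR values on this slide follow from summing rawPR over the transitive closure of referencing SWOs. The sketch below reproduces them under stated assumptions: the extracted text does not say which document carries which rawPR, so the assignment here is simply one that yields the slide's numbers (103 = 3 + 100, 403 = 300 + 100 + 3); the closure is taken to include the document itself; and the names ontology_rank, referenced_by and is_swo are hypothetical.

```python
def ontology_rank(raw_pr, referenced_by, is_swo):
    """PR(A) = rawPR(A) for a SWI; for a SWO, the sum of rawPR over TC(A)."""

    def closure(doc):
        # doc plus every SWO that references it directly or transitively
        seen, stack = {doc}, [doc]
        while stack:
            current = stack.pop()
            for referrer in referenced_by.get(current, ()):
                if is_swo[referrer] and referrer not in seen:
                    seen.add(referrer)
                    stack.append(referrer)
        return seen

    return {
        doc: sum(raw_pr[x] for x in closure(doc)) if is_swo[doc] else raw_pr[doc]
        for doc in raw_pr
    }


# Assumed assignment of the slide's rawPR values to the four documents.
raw_pr = {"finin": 0.2, "foaf": 100, "wordnet": 3, "rdfs": 300}
is_swo = {"finin": False, "foaf": True, "wordnet": True, "rdfs": True}
# Assumed links: finin uses foaf terms, foaf extends wordnet, both use rdfs terms.
referenced_by = {
    "foaf": ["finin"],
    "wordnet": ["foaf"],
    "rdfs": ["foaf", "wordnet"],
}
pr = ontology_rank(raw_pr, referenced_by, is_swo)
```

Under this assignment the instance document keeps its raw rank, the FOAF ontology gets 100 (no SWO references it), WordNet gets 3 + 100, and RDF Schema gets 300 + 100 + 3.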
Current Status
Swoogle Watch reported (Nov 7, 2004)
• 40 M triples
• 270 K SWDs: 4 K ontologies
• 144 K terms: 91 K classes & 51 K properties
Ongoing work
• Ontology Dictionary
• Swoogle Statistics
• Web Service interface (see Swoogle website)
• IR with the Semantic Web (content search): Character N-Grams, Bag of URIrefs, Swangling
Summary
Swoogle (Mar, 2004)
• Automated SWD discovery
• SWD metadata creation and search
• Ontology rank (rational surfer model)
• Swoogle watch
• Web interface
Swoogle2 (Sep, 2004)
• Ontology dictionary
• Swoogle statistics
• Web service interface (WSDL)
• Bag of URIref IR search
Swoogle3
• Better crawl & refresh strategies
• More metadata (ontology mapping)
• More IR features
• Better web service interfaces
• Capture and store all triples
• More reasoning
[Timeline: 2004 to 2005]
The End
Website: http://swoogle.umbc.edu
Slides at: http://ebiquity.umbc.edu/v2.1/resource/html/id/66/
Demo: http://ebiquity.umbc.edu/v2.1/resource/html/id/65/