UMBC, an Honors University in Maryland
1
Finding knowledge, data and answers on the Semantic Web
Tim Finin
University of Maryland, Baltimore County
http://ebiquity.umbc.edu/resource/html/id/223/
Joint work with Li Ding, Anupam Joshi, Cynthia Parr, Joel Sachs, Andriy Parafiynyk and Lushan Han
http://creativecommons.org/licenses/by-nc-sa/2.0/
This work was partially supported by DARPA contract F30602-97-1-0215, NSF grants CCR007080 and IIS9875433
2
This talk
• Motivation
• Semantic Web background
• Swoogle Semantic Web search engine
• Use cases and applications
• Social Semantic Web
• Conclusions
3
Google has made us smarter
4
But what about our agents?
Agents still have a very minimal understanding of text and images.
5
But what about our agents?
A Google for knowledge on the Semantic Web is needed by software agents and programs
6
This talk
• Motivation
• Semantic Web background
• Swoogle Semantic Web search engine
• Use cases and applications
• Social Semantic Web
• Conclusions
7
Brief history of the Semantic Web
Tim Berners-Lee’s original 1989 WWW proposal described a web of relationships among named objects, unifying many information management tasks.
• Guha’s MCF (~94)
• XML+MCF=>RDF (~96)
• Semantic Web coined (~97)
• RDF+OO=>RDFS (~99)
• RDFS+KR=>DAML+OIL (00)
• W3C’s SW activity (01)
• W3C’s OWL (03)
• SPARQL (06)
• Rules, RDFa, ….
http://www.w3.org/History/1989/proposal.html
8
Interest is high
• Interest in industry, government and VCs is high
• RDF is in Adobe’s products, Oracle 10g and 11g, Microsoft Vista, and Yahoo’s food portal
• Several high-visibility startups use RDF
– Joost (internet TV), Teranode (bioinformatics), Garlik (personal info monitoring)
• And, if you want more evidence that interest is high …
9
$1795
$695 (CD only)
10
What do we mean by “Semantic Web”?
[Figure: the range of meanings of “Semantic Web”. Labels: explicit semantics; KR based (RDF+OWL); other structured (Freebase, Google Base); ad hoc approaches (Microformats, Tags, Folksonomies); XML; topic maps; “a smarter Google”; “NLP” (PowerSet)]
11
RDF is the first SW language
RDF Data Model, shown three ways:
• XML encoding (good for machine processing): <rdf:RDF ……..> <….> <….></rdf:RDF>
• Graph (good for human viewing)
• Triples (good for reasoning):
stmt(docInst, rdf_type, Document)
stmt(personInst, rdf_type, Person)
stmt(inroomInst, rdf_type, InRoom)
stmt(personInst, holding, docInst)
stmt(inroomInst, person, personInst)
• RDF is a simple language for building graph-based representations
• Grounded in web standards
• With terms to support ontologies, description logic, rules and much of first-order logic
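To make the triple view concrete, here is a minimal Python sketch (not part of the original slides) that stores the five stmt(...) statements from this slide as (subject, predicate, object) tuples and answers simple pattern queries. The helper name `match` and the use of `None` as a wildcard are illustrative assumptions, not an RDF library API.

```python
# The five statements from the slide, as (subject, predicate, object) tuples.
TRIPLES = [
    ("docInst", "rdf_type", "Document"),
    ("personInst", "rdf_type", "Person"),
    ("inroomInst", "rdf_type", "InRoom"),
    ("personInst", "holding", "docInst"),
    ("inroomInst", "person", "personInst"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that fit the pattern; None is a wildcard (assumed convention)."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Everything asserted with personInst as the subject.
about_person = match(TRIPLES, s="personInst")
# All type assertions in the graph.
typed = match(TRIPLES, p="rdf_type")
```

A real application would use an RDF toolkit and full URIs rather than bare strings; the point here is only that a graph is a set of three-part statements that can be filtered by position.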
12
IMHO
• Better NLP will help search engines, but it’s a long-term, incremental project
• We need a well-defined and extensible representation system for explicit knowledge
• It should be backed by open, non-proprietary standards supported by industry, government and other interested parties
• The W3C approach is not perfect
• But “The perfect is the enemy of the good.”
• “Semantic Web” vs. “semantic web”
13
This talk
• Motivation
• Semantic Web background
• Swoogle Semantic Web search engine
• Use cases and applications
• Social Semantic Web
• Conclusions
14
• http://swoogle.umbc.edu/
• Running since summer 2004
• 2.1M RDF docs, 420M triples, 10K ontologies, 15K namespaces, 1.5M classes, 185K properties, 49M instances, 800 registered users
15
Swoogle Architecture
[Figure: Swoogle architecture. Discovery: SwoogleBot, bounded Web crawler and Google crawler, fed by candidate URLs from the Web and the Semantic Web. Index: IR indexer and SWD indexer. Analysis: SWD classifier and ranking. Search services: Web server (human, html) and Web service (machine, rdf/xml). Shared stores: document cache and Semantic Web metadata. Legend: information flow; Swoogle’s web interface]
16
A Hybrid Harvesting Framework
[Figure: harvesting framework. Submissions and pings, plus an inductive learner trained on the Swoogle sample dataset, produce three seed sets: Seeds M for meta crawling (Google API calls), Seeds H for bounded HTML crawling, and Seeds R for RDF crawling of the Web]
19
This talk
• Motivation
• Semantic Web background
• Swoogle Semantic Web search engine
• Use cases and applications
• Social Semantic Web
• Conclusions
20
Applications and use cases
1. Supporting Semantic Web developers
– Ontology designers, vocabulary discovery, who’s using my ontologies or data?, use analysis, errors, statistics, etc.
2. Searching specialized collections
– Spire: aggregating observations and data from biologists
– InferenceWeb: searching over and enhancing proofs
– SemNews: text meaning of news stories
3. Supporting SW tools
– Triple shop: finding data for SPARQL queries
21
1
22
By default, ontologies are ordered by their ‘popularity’, but they can also be ordered by recency or size.
80 ontologies were found that had these three terms
Let’s look at this one
23
Basic Metadata
hasDateDiscovered: 2005-01-17
hasDatePing: 2006-03-21
hasPingState: PingModified
type: SemanticWebDocument
isEmbedded: false
hasGrammar: RDFXML
hasParseState: ParseSuccess
hasDateLastmodified: 2005-04-29
hasDateCache: 2006-03-21
hasEncoding: ISO-8859-1
hasLength: 18K
hasCntTriple: 311.00
hasOntoRatio: 0.98
hasCntSwt: 94.00
hasCntSwtDef: 72.00
hasCntInstance: 8.00
24
Who uses this ontology and how do they access it?
25
rdfs:range was used 41 times to assert a value.
owl:ObjectProperty was instantiated 28 times
time:Cal… defined once and used 24 times (e.g., as range)
26
These are the namespaces this ontology uses. Clicking on one shows all of the documents using the namespace.
All of this is available in RDF form for the agents among us.
27
Here’s what the agent sees. Note the swoogle and wob (web of belief) ontologies.
28
We can also search for terms (classes, properties) like terms for “person”.
29
10K terms associated with “person”! Ordered by use.
Let’s look at foaf:Person’s metadata
30
Metadata stored for a term is information about its definition: both what and by whom
31
10K terms associated with “person”! Ordered by use.
32
How do other terms use foaf:Person? 100 documents assert that foaf:publication is a property of a foaf:Person
33
87K documents used foaf:gender with a foaf:Person instance as the subject
34
3K documents used dc:creator with a foaf:Person instance as the object
35
Swoogle’s archive saves every version of a SWD it’s seen.
36
37
2
An NSF ITR collaborative project with
• University of Maryland, Baltimore County
• University of Maryland, College Park
• University of California, Davis
• Rocky Mountain Biological Laboratory
38
An invasive species scenario
• Nile Tilapia fish have been found in a California lake.
• Can this invasive species thrive in this environment?
• If so, what will be the likely consequences for the ecology?
• So… we need to understand the effects of introducing this fish into the food web of a typical California lake
39
Food Webs
• A food web models the trophic (feeding) relationships between organisms in an ecology
– Food web simulators are used to explore the consequences of changes in the ecology, such as the introduction or removal of a species
– A location’s food web is usually constructed from studies of the frequencies of the species found there and the known trophic relations among them.
• Goal: automatically construct a food web for a new location using existing data and knowledge
• ELVIS: Ecosystem Location Visualization and Information System
40
East River Valley Trophic Web
http://www.foodwebs.org/
41
Species List Constructor
Click a county, get a species list
42
The problem
• We have data on what species are known to be in the location and can further restrict and fill in with other ecological models
• But we don’t know which of these the Nile Tilapia eats or what might eat it.
• We can reason from taxonomic data (similar species) and known natural history data (size, mass, habitat, etc.) to fill in the gaps.
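The slides do not say how the reasoning is implemented, so the following Python sketch is only one plausible reading of "reason from taxonomic data (similar species)": a new species inherits the known trophic links of its same-genus relatives, restricted to species present at the location. All data, names and the `predict_links` helper are hypothetical toy values, not records from the Spire or ELVIS databases.

```python
# Hypothetical toy data: none of these records come from the Spire/ELVIS databases.
GENUS = {
    "Oreochromis niloticus": "Oreochromis",   # Nile tilapia
    "Oreochromis aureus": "Oreochromis",      # blue tilapia, a close relative
    "Micropterus salmoides": "Micropterus",   # largemouth bass
}

# Known trophic links as (predator, prey) pairs.
KNOWN_LINKS = {
    ("Oreochromis aureus", "algae"),
    ("Oreochromis aureus", "zooplankton"),
    ("Micropterus salmoides", "Oreochromis aureus"),
}


def predict_links(new_species, local_species):
    """Guess links for new_species by copying those of its same-genus relatives."""
    relatives = {
        s for s, g in GENUS.items()
        if g == GENUS[new_species] and s != new_species
    }
    predicted = set()
    for predator, prey in KNOWN_LINKS:
        if predator in relatives and prey in local_species:
            predicted.add((new_species, prey))       # may eat what its relatives eat
        if prey in relatives and predator in local_species:
            predicted.add((predator, new_species))   # may be eaten by what eats them
    return predicted


lake = {"algae", "Micropterus salmoides"}
links = predict_links("Oreochromis niloticus", lake)
```

A fuller version would also weigh the natural history attributes the slide mentions (size, mass, habitat) and attach evidence to each predicted link, rather than treating genus membership as sufficient.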
43
44
Food Web Constructor
Predict food web links using database and taxonomic reasoning.
In a new estuary, Nile Tilapia could compete with ostracods (green) to eat algae. Predators (red) and prey (blue) of ostracods may be affected
45
Evidence Provider
46
Status
• ELVIS (Ecosystem Location Visualization and Information System) as an integrated set of web services for constructing food webs for a given location.
• Background ontologies
– SpireEcoConcepts: concepts and properties to represent food webs, and ELVIS related tasks, inputs and outputs
– ETHAN (Evolutionary Trees and Natural History): concepts and properties for ‘natural history’ information on species derived from data in the Animal Diversity Web and other taxonomic sources. 250K classes on plants and animals
• Under development
– Connect to visualization software
– Connect to triple shop to discover more data
47
Supporting SW Tools
• Semantic Web applications can access Swoogle through a REST-based Web interface or via SQL.
• Two examples:
– A system to help scientists construct datasets from RDF documents on the Web
– Tools to manage Semantic Web data in blogs and other forms of social media
3
48
UMBC Triple Shop
• http://sparql.cs.umbc.edu/
• Online SPARQL RDF query processing with several interesting features
• Automatically finds SWDs for given queries using the Swoogle backend database
• Datasets, queries and results can be saved, tagged, annotated, shared, searched for, etc.
• RDF datasets as first-class objects
– Can be stored on our server or downloaded
– Can be materialized in a database or (soon) as a Jena model
49
What’s SPARQL?
• SPARQL is the standard language (& protocol) for querying RDF graphs
• Think: SQL for RDF
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name ?email
FROM <http://rdf.example.org/people.rdf>
WHERE {
  ?person a foaf:Person .
  ?person foaf:name ?name .
  OPTIONAL { ?person foaf:mbox ?email } .
}
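The OPTIONAL clause in the query on this slide is what separates it from a plain join: a person with no foaf:mbox still appears in the results, with ?email left unbound. The Python sketch below hand-evaluates that query over a tiny in-memory graph to show the effect. The sample data, the prefixed-string encoding of terms and the function names are assumptions made for illustration; this is not how a SPARQL engine is implemented.

```python
RDF_TYPE = "rdf:type"  # what the SPARQL keyword `a` abbreviates

# Hypothetical sample data standing in for people.rdf.
GRAPH = {
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex:bob", RDF_TYPE, "foaf:Person"),
    ("ex:bob", "foaf:name", "Bob"),  # Bob has no foaf:mbox
}


def objects(graph, subject, predicate):
    """All objects o such that (subject, predicate, o) is in the graph."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def select_people(graph):
    """Hand-written evaluation of: SELECT ?person ?name ?email WHERE {...}."""
    rows = []
    persons = sorted(s for s, p, o in graph if p == RDF_TYPE and o == "foaf:Person")
    for person in persons:
        for name in objects(graph, person, "foaf:name"):  # required pattern
            # OPTIONAL: if nothing matches, keep the row with ?email unbound (None).
            emails = objects(graph, person, "foaf:mbox") or [None]
            for email in emails:
                rows.append((person, name, email))
    return rows


rows = select_people(GRAPH)
```

Without OPTIONAL the mbox pattern would be required, and Bob would drop out of the result set entirely.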
52
Who knows Anupam Joshi?
Show me their names, email addresses and pictures
53
The UMBC ebiquity site publishes lots of RDF data, including FOAF profiles
54
No FROM clause!
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
FROM ???
WHERE {
  ?p1 foaf:surname "Joshi" .
  ?p1 foaf:firstName "Anupam" .
  ?p1 foaf:mbox ?p1mbox .
  ?p2 foaf:knows ?p3 .
  ?p3 foaf:mbox ?p1mbox .
  ?p2 foaf:name ?p2name .
  ?p2 foaf:mbox ?p2mbox .
  OPTIONAL { ?p2 foaf:depiction ?p2pix } .
}
ORDER BY ?p2name
55
Enter query w/o FROM clause!
log in
specify dataset
56
We want to create a reusable dataset
57
Find RDF data using terms found in the query
That also satisfy some simple constraints (e.g., for trust)
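The slides say only that Triple Shop finds RDF data "using terms found in the query"; how the terms are extracted is not described. As a rough sketch under that caveat, the Python below pulls the prefixed names out of the "Who knows Anupam Joshi?" query with regular expressions and expands them to full URIs that could then be handed to a search backend. The regex approach and the `query_terms` helper are assumptions, not a real SPARQL parser.

```python
import re

QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?p2name ?p2mbox ?p2pix
WHERE { ?p1 foaf:surname "Joshi" . ?p1 foaf:firstName "Anupam" .
        ?p1 foaf:mbox ?p1mbox . ?p2 foaf:knows ?p3 . ?p3 foaf:mbox ?p1mbox .
        ?p2 foaf:name ?p2name . ?p2 foaf:mbox ?p2mbox .
        OPTIONAL { ?p2 foaf:depiction ?p2pix } . }
ORDER BY ?p2name
"""


def query_terms(query):
    """Return the full URIs of all prefixed names used in the query body."""
    prefixes = dict(re.findall(r"PREFIX\s+(\w+):\s*<([^>]+)>", query))
    body = re.sub(r"PREFIX\s+\w+:\s*<[^>]+>", "", query)  # drop prefix declarations
    body = re.sub(r'"[^"]*"', "", body)                    # drop string literals
    names = re.findall(r"(?<![\w?<])(\w+):(\w+)", body)    # prefix:local pairs
    return sorted({prefixes[p] + local for p, local in names if p in prefixes})


terms = query_terms(QUERY)
```

Each resulting URI (for example the one for foaf:knows) is a candidate search key; documents that use several of the query's terms are more likely to contribute answers.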
58
302 RDF documents were found that might have useful data.
59
We’ll select them all and add them to the current dataset.
60
We’ll run the query against this dataset to see if the results are as expected.
61
The results can be produced in any of several formats
62
63
Looks like a useful dataset. Let’s save it and also materialize it in the TS triple store.
An extension will let us ask that it be automatically updated when constituents change
65
We can also annotate, save and share queries.
66
This talk
• Motivation
• Semantic Web background
• Swoogle Semantic Web search engine
• Use cases and applications
• Social Semantic Web
• Conclusions
67
• Social media sites have become the biggest source of new content on the Web
• Blogs, Wikis, Photo sites, forums, etc.
• Accounting for ~1/3 of new Web content
69
• Social media sites have embraced new ways of letting users add semantic information
• Showing users the potential of semantics
70
Social Media and the Semantic Web
• Many are exploring how Semantic Web technology can work with social media
• Social media like blogs are typically temporally organized
– valued for their timely and dynamic information!
• If static pages form the Web’s long term memory, then the Blogosphere is its stream of consciousness
• Maybe we can (1) help people publish data in RDF on their blogs and (2) mine social media sites for useful information
71
The OWL icon links to the data in RDF
A BioBlitz involves going out to an area and recording every organism you see
72
73
A good Semantic Web opportunity
• We want to make it easy for scientists to enter and collect information from social media
– Professionals, students and amateurs!
• Two early examples
– SPOTter: a tool to add Semantic Web data to blogs
– Splickr: a system to mine Flickr for images of organisms
74
SPOTter: SPire Observation Tool
• We’ve developed some simple components to help people add RDF data to blogs and ping Swoogle to get it indexed.
• SPOTter is an initial prototype that uses the ETHAN ontology and is being used in some BioBlitz activities with students.
• We’re working toward a version that uses Twitter so that people can make blog entries from their cell phones via SMS
– The SPOTter agent will get the entries (via RSS) and index the data
75
SPOTter button
Once entered, the data is embedded into the blog post and Swoogle is pinged to index it
76
Prototype SPOTter search engine
• We can draw a bounding box on the map and find observations
• An RSS feed is provided for each query
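A bounding-box query of the kind described above reduces to a range test on latitude and longitude. The Python sketch below shows that test over a few made-up observations; the `Observation` record shape, the coordinates and the `in_box` helper are hypothetical and do not reflect SPOTter's actual schema or code.

```python
from dataclasses import dataclass


@dataclass
class Observation:  # hypothetical record shape, not SPOTter's actual schema
    species: str
    lat: float
    lon: float


def in_box(obs, south, west, north, east):
    """True if obs lies inside the box (assumes the box does not cross the antimeridian)."""
    return south <= obs.lat <= north and west <= obs.lon <= east


# Made-up observations: two in Maryland, one in California.
OBSERVATIONS = [
    Observation("Danaus plexippus", 39.25, -76.71),
    Observation("Cardinalis cardinalis", 38.99, -76.94),
    Observation("Oreochromis niloticus", 36.60, -121.90),
]

# Box drawn roughly around central Maryland.
hits = [o.species for o in OBSERVATIONS if in_box(o, 38.5, -77.5, 39.5, -76.0)]
```

The same filter, re-run whenever new posts are indexed, is all that is needed to back a per-query feed of matching observations.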
77
Flickr
• The Flickr “photo sharing” site has millions of photographs
– Many of plants and animals
• Most of them have descriptions, timestamps, tags and even geo-tags
– Flickr has even introduced “machine tags” that can be mapped into RDF
• Any Flickr user (human or bot) can add comments and annotations
• There’s a good API
• It could be a good source of ecological information
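Flickr machine tags have the shape namespace:predicate=value, which lines up naturally with an RDF triple whose subject is the photo. The Python sketch below illustrates one such mapping. The namespace table, the placeholder URIs and the `tag_to_triple` helper are assumptions made for this example; the slides do not describe how Splickr performs the mapping, and no Flickr API call is shown.

```python
import re

# Machine tags look like namespace:predicate=value.
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")

# Hypothetical mapping from tag namespaces to RDF namespace URIs.
NAMESPACES = {
    "geo": "http://www.w3.org/2003/01/geo/wgs84_pos#",
    "taxonomy": "http://example.org/taxonomy#",  # placeholder URI
}


def tag_to_triple(photo_uri, tag):
    """Map one machine tag on a photo to a triple, or None if it is not a known machine tag."""
    m = MACHINE_TAG.match(tag)
    if m is None or m.group(1) not in NAMESPACES:
        return None
    namespace, predicate, value = m.groups()
    return (photo_uri, NAMESPACES[namespace] + predicate, value)


photo = "http://example.org/photos/1"  # placeholder photo URI
t1 = tag_to_triple(photo, "geo:lat=39.25")
t2 = tag_to_triple(photo, "taxonomy:binomial=Danaus plexippus")
t3 = tag_to_triple(photo, "butterfly")  # ordinary tag, not a machine tag
```

Ordinary free-text tags fall through as None here; turning those into structured data would need a separate step such as matching against species names.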
78
79
80
Results for people and machines
81
This talk
• Motivation
• Semantic Web background
• Swoogle Semantic Web search engine
• Use cases and applications
• Social Semantic Web
• Conclusions
82
Conclusion
• The web will contain the world’s knowledge in forms accessible to people and computers
– We need better ways to discover, index, search and reason over SW knowledge
• SW search engines address different tasks than HTML search engines
– So they require different techniques and APIs
• Swoogle-like systems can help create consensus ontologies and foster best practices
• Social media provide new challenges and opportunities for the Semantic Web
83
For more information
http://ebiquity.umbc.edu/ (annotated in OWL)