The role of Thesauri and Standard Vocabularies in linking data
Dr. Johannes Keizer
FAO of the United Nations
Office of Knowledge Exchange, Research and Extension
Knowledge and Capacity for Development
dr johannes keizer - FAO of the United Nations - knowledge and capacity for development
Thesaurus Workshop – CAS Beijing, 2010-10-22
The Development of the Internet
Open World Mindset
“Closed” (“normal”) IT environments:
• Data sources carefully controlled.
• Data formats “custom-defined” for an application.
Linked data based on an “open world mindset”:
• Integrating data from the open Web
• Systems designed to incorporate new information incrementally
• By design, tolerance of incomplete information
The Linked Data Universe:
http://www.linkeddata.org (July 2009)
The Linked Data Universe:
http://www.linkeddata.org (July 2010)
Example: BBC Wildlife Finder
Humboldt Squid page, pulled together from a diversity of Linked Data sources:
• Animal Diversity Web: nocturnal way of life
• BBC TV documentary
• BBC News item
• Wikipedia
RDF: a grammar for the language of data
[Diagram: ResourceA relatedTo ResourceB; ResourceA describedBy “Some text”]
1. Describe resources using interrelated “statements” (“triples”).
2. Use URIs (unique, globally managed identifiers) as the “words” of statements.
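A minimal sketch of the two rules above, assuming nothing beyond plain Python: each statement is a (subject, predicate, object) triple, and URIs act as the words. The `http://example.org/` URIs and the helper `objects_of` are hypothetical and exist only for illustration.

```python
# Hypothetical URIs, used only for illustration.
EX = "http://example.org/"
RESOURCE_A = EX + "ResourceA"
RESOURCE_B = EX + "ResourceB"
RELATED_TO = EX + "relatedTo"
DESCRIBED_BY = EX + "describedBy"

# Each statement is one (subject, predicate, object) triple.
triples = {
    (RESOURCE_A, RELATED_TO, RESOURCE_B),     # object is another resource
    (RESOURCE_A, DESCRIBED_BY, "Some text"),  # object is a literal
}


def objects_of(subject, predicate, graph):
    """Return every object stated for a subject/predicate pair."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


print(objects_of(RESOURCE_A, RELATED_TO, triples))
print(objects_of(RESOURCE_A, DESCRIBED_BY, triples))
```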
RDF as a common format for merging data
• http://www.w3.org/2007/Talks/0221-Bangalore-IH/
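Why a common triple format makes merging easy can be sketched as follows: when two datasets use the same URI for the same thing, merging is a set union. The two datasets and the `http://example.org/` predicates are hypothetical; only the maize URI is taken from these slides.

```python
# Concept URI as shown in the slides; everything else is made up.
MAIZE = "http://aims.fao.org/aos/agrovoc/c_12332"

# Two hypothetical datasets that happen to describe the same URI.
dataset_a = {
    (MAIZE, "http://example.org/hasLabel", "maize"),
}
dataset_b = {
    (MAIZE, "http://example.org/mentionedIn", "http://example.org/doc/1"),
}


def merge(*graphs):
    """Merge graphs by set union; identical triples collapse to one."""
    merged = set()
    for graph in graphs:
        merged |= graph
    return merged


merged = merge(dataset_a, dataset_b)

# Everything known about the maize URI, regardless of the source dataset.
about_maize = sorted(p for s, p, o in merged if s == MAIZE)
print(about_maize)
```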
Finding things related to “genes” across databases
Source: Joanne Luciano, Mitre, and the W3C HCLS IG
Role of thesauri/concept schemes
Born as tools to ensure consistency in the indexing of library collections
Thesauri were based on “terms”, but the terms already represented concepts implicitly
Hierarchical and associative relationships represented generic ontological domain knowledge
Candidate building blocks for the Semantic Web
…from thesaurus to ontologies…
AGROVOC today
Around 30,000 concepts
600,000 labels in around 20 languages
A one-stop shop for terminological knowledge related to agriculture in general
A knowledge base of related concepts organized in ontological relationships (hierarchical, associative, equivalence)
A concept/term/string-based system
Concepts may be organized in multiple categories.
Semantic Relationships
Concept to Concept: isA (hierarchy), isPestOf, hasPest
Concept to Term: has_lexicalization (links concepts to their lexical realizations)
Term to Term: isSynonymOf, isTranslationOf, hasAcronym, hasAbbreviation
Term to String: hasSpellingVariant, hasSingular
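The concept/term/string layering can be sketched as a small data model. This is an illustration under stated assumptions, not the AGROVOC implementation: the class names, the `http://example.org/` URIs and the sample data are hypothetical; only the relationship names come from the slide.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    text: str
    lang: str
    # Term to String level, e.g. hasSpellingVariant
    spelling_variants: list = field(default_factory=list)


@dataclass
class Concept:
    uri: str
    # Concept to Term level: has_lexicalization
    terms: list = field(default_factory=list)
    # Concept to Concept level: (relationship name, target concept URI)
    relations: list = field(default_factory=list)


# Hypothetical sample data.
cereals = Concept("http://example.org/concept/cereals")
maize = Concept(
    "http://example.org/concept/maize",
    terms=[Term("maize", "en"), Term("corn", "en"), Term("maïs", "fr")],
    relations=[("isA", cereals.uri)],
)


def labels(concept, lang):
    """Return the lexicalizations of a concept in one language."""
    return [t.text for t in concept.terms if t.lang == lang]


print(labels(maize, "en"))
print(labels(maize, "fr"))
```

Keeping labels as objects of their own, rather than as plain strings on the concept, is what lets term-level relationships such as isTranslationOf be stated at all.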
The AGROVOC SKOS-XL Model
[Diagram: AGROVOC concepts (8171, 1474, 12332, 6211) typed via rdf:type as SKOS Concept and linked by skos:broader; the AGROVOC concept scheme, typed as SKOS Concept Scheme, reached via skos:inScheme and skos:topConceptOf; SKOS-XL labels :foo and :bar, typed as SKOS Label, attached via skosxl:prefLabel and skosxl:altLabel, with skosxl:literalForm values “corn” and “maize”]
http://www.w3.org/2004/02/skos/
SKOS-XL output
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/agrovocScheme">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#ConceptScheme"/>
</rdf:Description>
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
  <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
  <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
  <skos:topConceptOf rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
</rdf:Description>
<rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/xl_en_1278479064610">
  <literalForm xmlns="http://www.w3.org/2008/05/skos-xl#" xml:lang="en">subjects</literalForm>
  <rdf:type rdf:resource="http://www.w3.org/2008/05/skos-xl#Label"/>
</rdf:Description>
URI of AGROVOC concept
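As a rough illustration of how such output can be consumed, the sketch below reads one of the descriptions above with Python's standard XML parser and lists the rdf:type of each described URI. The enclosing `rdf:RDF` element and its namespace declarations are an assumption added to make the excerpt well-formed; the function name `types_by_subject` is hypothetical.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

# The concept description from the slide, wrapped in an rdf:RDF root
# with namespace declarations (the wrapper is an assumption).
DOC = """<rdf:RDF
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:skos="http://www.w3.org/2004/02/skos/core#">
  <rdf:Description rdf:about="http://aims.fao.org/aos/agrovoc/c_330829">
    <rdf:type rdf:resource="http://www.w3.org/2004/02/skos/core#Concept"/>
    <skos:inScheme rdf:resource="http://aims.fao.org/aos/agrovoc/agrovocScheme"/>
  </rdf:Description>
</rdf:RDF>"""


def types_by_subject(xml_text):
    """Map each described URI to the list of its rdf:type URIs."""
    result = {}
    root = ET.fromstring(xml_text)
    for desc in root.iter(f"{{{RDF}}}Description"):
        subject = desc.get(f"{{{RDF}}}about")
        types = [t.get(f"{{{RDF}}}resource") for t in desc.findall(f"{{{RDF}}}type")]
        result[subject] = types
    return result


print(types_by_subject(DOC))
```

A dedicated RDF library would normally be preferable to raw XML parsing, since the same triples can be serialized in many equivalent XML shapes.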
Linking vocabularies
AGROVOC | EUROVOC | UNBIS | Relationship
http://aims.fao.org/aos/agrovoc/c_207 | http://eurovoc.europa.eu/219055 | agroforestry | skos:exactMatch / owl:sameAs
http://aims.fao.org/aos/agrovoc/c_4826 | http://eurovoc.europa.eu/220018 | MILK | skos:exactMatch / owl:sameAs
http://aims.fao.org/aos/agrovoc/c_12332 | http://eurovoc.europa.eu/219871 | MAIZE | skos:exactMatch / owl:sameAs
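The mappings on this slide can be written out as explicit statements. The sketch below serializes the AGROVOC-to-EUROVOC pairs as N-Triples lines using skos:exactMatch; the function name `to_ntriples` is hypothetical, and the choice of skos:exactMatch over owl:sameAs is only for illustration.

```python
SKOS_EXACT_MATCH = "http://www.w3.org/2004/02/skos/core#exactMatch"

# AGROVOC-to-EUROVOC pairs copied from the slide.
MAPPINGS = [
    ("http://aims.fao.org/aos/agrovoc/c_207", "http://eurovoc.europa.eu/219055"),
    ("http://aims.fao.org/aos/agrovoc/c_4826", "http://eurovoc.europa.eu/220018"),
    ("http://aims.fao.org/aos/agrovoc/c_12332", "http://eurovoc.europa.eu/219871"),
]


def to_ntriples(pairs, predicate=SKOS_EXACT_MATCH):
    """Serialize each (subject, object) pair as one N-Triples line."""
    return [f"<{s}> <{predicate}> <{o}> ." for s, o in pairs]


for line in to_ntriples(MAPPINGS):
    print(line)
```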
http://agris.fao.org/agris-search/search/display.do?f=2004/ZA/ZA04002.xml;ZA2004000049
http://aims.fao.org/aos/agrovoc/c_7825
http://eurovoc.europa.eu/218754
Linking data through common URIs
http://aims.fao.org/aos/agrovoc/c_12332 owl:sameAs http://eurovoc.europa.eu/219871
[Diagram: the AGROVOC concept http://aims.fao.org/aos/agrovoc/c_12332 and the Eurovoc concept http://eurovoc.europa.eu/219871, each with skosxl:literalForm “Maize”, are connected by owl:sameAs/exactMatch, as is the UNBIS term “Maize”. Documents indexed with each vocabulary are linked through these identifiers:]
AGROVOC: http://agris.fao.org/agris-search/search/display.do?f=1996/TR/TR96001.xml;TR9600026
Eurovoc: http://eur-lex.europa.eu/LexUriServ/LexUriServ.do?uri=OJ:L:2010:202:0011:0015:EN:PDF
UNBIS: http://unbisnet.un.org:8080/ipac20/ipac.jsp?session=128F308557F34.283092&profile=bib&uri=full=3100001~!685149~!1&ri=1&aspect=subtab124&menu=search&source=~!horizon
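How equivalence links let a query on one vocabulary reach documents indexed with another can be sketched as below. The two concept URIs come from the slide; the document identifiers in `INDEX`, the `http://example.org/` URI and both function names are hypothetical.

```python
AGROVOC_MAIZE = "http://aims.fao.org/aos/agrovoc/c_12332"
EUROVOC_MAIZE = "http://eurovoc.europa.eu/219871"

# owl:sameAs / skos:exactMatch links, treated here as symmetric.
SAME_AS = [(AGROVOC_MAIZE, EUROVOC_MAIZE)]

# Hypothetical index: document identifier -> concept URI it is tagged with.
INDEX = {
    "agris:TR9600026": AGROVOC_MAIZE,
    "eur-lex:OJ-L-2010-202": EUROVOC_MAIZE,
    "example:unrelated": "http://example.org/concept/other",
}


def equivalents(uri, links):
    """Return uri plus every URI reachable through the equivalence links."""
    found, frontier = {uri}, [uri]
    while frontier:
        current = frontier.pop()
        for a, b in links:
            if current in (a, b):
                other = b if a == current else a
                if other not in found:
                    found.add(other)
                    frontier.append(other)
    return found


def documents_about(uri, index=INDEX, links=SAME_AS):
    """Return documents tagged with uri or with any equivalent URI."""
    same = equivalents(uri, links)
    return sorted(doc for doc, concept in index.items() if concept in same)


print(documents_about(AGROVOC_MAIZE))
```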
What are we doing with unstructured data?
• We have enormous amounts of unstructured material
• Most of the documents we produce are still semantically unstructured
• Human cataloguing and indexing work is becoming increasingly rare
• We need machines to do automatic semantic markup of text
• If machines are trained on and based on concept schemes, they are able to do so
AgroTagger
• Identifies concepts in unstructured texts
• Uses AGROVOC as a controlled vocabulary
• Prototype under testing with excellent results (entire repository of ICARDA indexed)
• Will in future produce structured RDF files that can be used to link data, like “OpenCalais”
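The idea of concept identification against a controlled vocabulary can be illustrated with a deliberately naive sketch: look up each word of a text in a label-to-URI table. This is not AgroTagger's actual method, which the slides do not describe. The concept URIs are those shown in the slides; pairing the label “corn” with the maize concept, and the function name `tag`, are assumptions.

```python
import re

# Miniature vocabulary: label -> concept URI.
# URIs are taken from the slides; the label "corn" is an assumed synonym.
VOCABULARY = {
    "maize": "http://aims.fao.org/aos/agrovoc/c_12332",
    "corn": "http://aims.fao.org/aos/agrovoc/c_12332",
    "agroforestry": "http://aims.fao.org/aos/agrovoc/c_207",
}


def tag(text, vocabulary=VOCABULARY):
    """Return the sorted concept URIs whose labels occur as words in text."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    return sorted({uri for label, uri in vocabulary.items() if label in words})


print(tag("Corn yields improved under agroforestry systems."))
```

Because synonyms resolve to the same URI, a text that says “corn” and one that says “maize” end up tagged with the same concept, which is what makes the resulting markup linkable.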
Live demo: semantic markup
http://viewer.opencalais.com/
http://agropedialabs.iitk.ac.in/Tagger/Agrotagger_text.php
The concept scheme workbench
The workbench
Is a web-based working environment for managing the AGROVOC Concept Server
Facilitates the collaborative editing of multilingual terminology and semantic concept information
Includes administration and group management features
Includes workflows for maintenance, validation and quality assurance of the data pool
The CS is freely accessible to everybody, to facilitate collaborative editing
Group/Action/Status
GROUP
Non registered users
Term editors
Ontology editors
Validators
Publishers
Administrators
ACTION
concept-create
concept-delete
concept-edit
term-create
term-edit
term-delete
..........
STATUS
Proposed by guest
Proposed
Revised by guest
Revised
Validated
Published
Proposed deprecated
Deprecated
Concept Life Cycle
GUEST <concept-create> → Proposed by guest
VALIDATOR <validates> → Validated
PUBLISHER <publishes> → Published
TERM EDITOR <concept-edit> → Revised
ADMINISTRATOR <validates> → Published
ONTOLOGY EDITOR <concept-delete> → Proposed deprecated
PUBLISHER <validates> → Deprecated
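The life cycle above can be read as a small state machine. The sketch below encodes each step as (current status, role, action) → new status. Taking the status that precedes each step from the order of the slide is an assumption, as are the function names; the workbench's real rules may allow other paths.

```python
# (current status, role, action) -> new status.
# The "current status" of each step is inferred from the slide order
# and is an assumption, not a documented rule.
TRANSITIONS = {
    (None, "GUEST", "concept-create"): "Proposed by guest",
    ("Proposed by guest", "VALIDATOR", "validates"): "Validated",
    ("Validated", "PUBLISHER", "publishes"): "Published",
    ("Published", "TERM EDITOR", "concept-edit"): "Revised",
    ("Revised", "ADMINISTRATOR", "validates"): "Published",
    ("Published", "ONTOLOGY EDITOR", "concept-delete"): "Proposed deprecated",
    ("Proposed deprecated", "PUBLISHER", "validates"): "Deprecated",
}


def next_status(status, role, action):
    """Return the status after a step, or raise if the step is not listed."""
    try:
        return TRANSITIONS[(status, role, action)]
    except KeyError:
        raise ValueError(
            f"{role} may not perform {action} on a concept in status {status!r}"
        ) from None


def replay(steps):
    """Apply (role, action) steps in order, starting from a new concept."""
    status = None
    for role, action in steps:
        status = next_status(status, role, action)
    return status


print(replay([
    ("GUEST", "concept-create"),
    ("VALIDATOR", "validates"),
    ("PUBLISHER", "publishes"),
]))
```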
Modules
• Home
• Search
• Concept/Term Management
• Relationship Management
• Classification Scheme Management
• Validation
• Consistency Check
• Import/Export
• User/Group Management
• Statistics/Preferences
Search
• by string: the user can specify whether the system should search by exact match, beginning with, contains or fuzzy
• by URI or term code, or by range of term codes (e.g. between 123 and 9876)
• by classification scheme
• by creation or modification date
• by specific relationship (e.g. search all concepts using the “has_pest” relationship)
• by status or language
• by notes/attributes
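The four string-search modes can be sketched as follows. The label list, the mode names and the use of `difflib` for the fuzzy case are assumptions made for illustration; the slides do not say how the workbench implements fuzzy matching.

```python
import difflib

# Hypothetical labels standing in for the term list.
LABELS = ["maize", "maize flour", "milk", "agroforestry"]


def search(query, mode, labels=LABELS):
    """Search labels by exact match, prefix, substring or fuzzy similarity."""
    q = query.lower()
    if mode == "exact":
        return [label for label in labels if label.lower() == q]
    if mode == "begins":
        return [label for label in labels if label.lower().startswith(q)]
    if mode == "contains":
        return [label for label in labels if q in label.lower()]
    if mode == "fuzzy":
        # Similarity-ranked matches; tolerates small misspellings.
        return difflib.get_close_matches(q, labels, n=3, cutoff=0.7)
    raise ValueError(f"unknown mode: {mode}")


print(search("mai", "begins"))
print(search("maise", "fuzzy"))
```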
Graph Visualization
Java applet based on TouchGraph
Visualizes a concept and its relationships with other concepts in a graphical view
Web services
[Diagram: the AGROVOC CS Workbench maintains the SKOS triple store; other applications use web services to access it and receive a response]
AGROVOC Web Services
Architecture of the System
System Overview
Front end: Web Toolkit (GWT), GWT Incubator, graph visualization
Middleware: Hibernate layer, Protégé OWL API, Gilead intermediate layer
Back end: administrative database (MySQL), Protégé triple store (MySQL)
Also shown: web services
Giving it a try…
A demo version of the AWB, with all functionalities, available to users for testing purposes: http://202.73.13.50:55234/agrovocdevv10d/
Latest stable release, version 1.0 (read/write): http://202.73.13.50:55381/agrovocv10i/
Latest stable release, version 1.0 (read only; visitors have view privileges only): http://202.73.13.50:55481/agrovocv10i/
…and more: http://aims.fao.org
Thank You!