Introduction to Ontology
Barry Smith
August 11, 2012
The problem of (big) data
Some questions
• How to find data?
• How to understand data when you find it?
• How to use data when you find it?
• How to integrate with other data?
• How to label the data you are collecting?
• How to build a set of labels for a new domain that will integrate well with labels used in neighboring domains?
Big problem: nearly all of this data is siloed
Sources
• Examples of databases containing person data and data pertaining to skills

PersonID  SkillID
111       222

SkillID  Name  Description
222      Java  Programming

ID   SkillDescr
333  SQL

EmplID  SkillName
444     Java
The problem: many, many silos
• DoD spends more than $6B annually developing a portfolio of more than 2,000 business systems and Web services
• these systems are poorly integrated
• deliver redundant capabilities
• make data hard to access, foster error and waste
• prevent secondary uses of data
https://ditpr.dod.mil/ Based on FY11 Defense Information Technology Repository (DITPR) data
One road to a solution: Exploit the network effects of the Web
• You build a site.
• Others discover the site and they link to it
• The more they link to it, the more important and well known the page becomes (this is what Google exploits)
• Your page becomes important, and others begin to rely on it
• Many people link to the data, use it
• New ‘secondary uses’ of the data are discovered
With thanks to Ivan Herman
Unfortunately the Web is ruled by anarchy. However much we try to link web content together à la Google, we will still be left with many, many silos.
Photo credit “nepatterson”, Flickr
To avoid silos, data must be available on the Web in a standard way.
Use “ontologies” to capture common meanings with logical definitions that are understandable to both humans and computers, using a common language such as OWL (Web Ontology Language).
The idea of the Semantic Web
Annotate data using ontologies
Source Term          Ontology Label
Db1.Name             SE.Skill
Db2.SkillDescr       SE.ComputerSkill
Db3.SkillName        SE.ProgrammingSkill
Db1.PersonID         SE.PersonID
Db2.ID               SE.PersonID
Db3.EmplID           SE.PersonID
SE.ComputerSkill     SE.Skill
SE.ProgrammingSkill  SE.ComputerSkill
Inconsistent and idiosyncratic terms used in source data are associated with single preferred labels from ontologies
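One way to read the annotation table is as a lookup from source-specific column names to shared ontology labels, plus a small subtype hierarchy among the labels themselves. The following is a minimal Python sketch under that reading. The `Db1`/`Db2`/`Db3` and `SE.` names come from the table; the helper names `ancestors` and `columns_for` are hypothetical and exist only for illustration.

```python
# Source term -> ontology label, copied from the annotation table.
ANNOTATIONS = {
    "Db1.Name": "SE.Skill",
    "Db2.SkillDescr": "SE.ComputerSkill",
    "Db3.SkillName": "SE.ProgrammingSkill",
    "Db1.PersonID": "SE.PersonID",
    "Db2.ID": "SE.PersonID",
    "Db3.EmplID": "SE.PersonID",
}

# Label -> parent label (the last two rows of the table).
PARENT = {
    "SE.ComputerSkill": "SE.Skill",
    "SE.ProgrammingSkill": "SE.ComputerSkill",
}


def ancestors(label):
    """Return the label followed by all of its parents, most specific first."""
    chain = [label]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain


def columns_for(label):
    """Return every source column annotated with the label or one of its subtypes."""
    return sorted(
        column
        for column, annotated in ANNOTATIONS.items()
        if label in ancestors(annotated)
    )


if __name__ == "__main__":
    # Three differently named ID columns resolve to one shared label.
    print(columns_for("SE.PersonID"))
    # A query for the general label also finds the more specific skill columns.
    print(columns_for("SE.Skill"))
```

Because the lookup follows the subtype chain, a query for `SE.Skill` reaches columns from all three databases even though each database names and classifies its skill column differently.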
Where we stand today
• HTML demonstrated the power of the Web to allow sharing of information
• increasing availability of semantically enhanced data
• increasing power of semantic software to allow automatic reasoning over online information
• increasing use of OWL in attempts to break down silos, and create useful integration of on-line data and information
Linked Open Data as of September 2010
Ontology success stories, and some reasons for failure
unfortunately this data is not really linked
The result: the more Semantic Technology is successful, the more it fails to achieve its goals
The very success of the approach leads to the creation of ever new controlled vocabularies (semantic silos), because multiple ontologies are being created in ad hoc ways
The Semantic Web framework as currently conceived yields minimal standardization
It creates semantic silos
Basic Formal Ontology (BFO)
top-level architecture used in over 120 ontology projects world wide
Next tutorial in this series: August 18-19, 2012
http://ncorwiki.buffalo.edu/index.php/Basic_Formal_Ontology_2.0
People will tell you, all you need is …
XML gives you: processable tagging + syntactic interoperability
RDF gives you: net-centricity (URIs for unique and consistent naming), linked data
OWL (Web Ontology Language) gives you: RDF + semantic interoperability, richer logic
Levels of coordination
but these are just tools:
• they do not rule out stovepipes
• they do not prevent redundant efforts
• they do not imply high quality ontologies of the sort that will support reasoning
Even if we all speak Irish, this does not mean that we all understand each other
Warning 1.
• OWL implementation is not enough
• the issues we face are not only logical, but also sociological
• they are the same issues already endemic in the database world
  – database architecture is inflexible
  – database systems, once distributed, degrade very quickly; create stovepipes, forking, silos …
• How to ensure coordinated ontology development over time?
Suggested principles for an ontologist’s code of ethics
1. I hereby swear that I will reuse existing ontology content wherever possible
2. I hereby swear that whenever I reuse terms from an existing ontology, I will keep their original source IDs
3. I hereby swear that before releasing an ontology I will aggressively test it in multiple independent real-world applications
4. I hereby swear that before committing a new term and definition to an ontology I will always think first
Some governance principles
• Information sharing: to avoid ontology redundancy and inconsistency, there must be sharing of information at every stage
• Collaborative development: where ontology development needs overlap, the communities involved must either develop shared resources or agree to a division of labor
• Leverage of existing resources: ontology development should wherever possible involve reuse of existing ontologies.
• Guiding role of subject-matter experts, who should be involved in the construction and maintenance of all domain ontology content
Warning 2.
Ontology is a multi-disciplinary enterprise, in which the same terms are used in conflicting ways by different communities of ontologists
• universal, type, kind, class
• instance
• concept, model
• representation
• datum
The ontology spectrum (data focus)
glossary: A simple list of terms and their definitions.
data dictionary: Terms, definitions, naming conventions and representations of the data elements in a computer system.
data model (e.g. JC3IEDM): Terms, definitions, naming conventions, representations and the beginning of specification of the relationships between data elements.
taxonomy: A complete data model in an inheritance hierarchy where all data elements inherit their behaviors from a single "super data element".
ontology: A complete, machine-readable specification of a conceptualization = conceptual data model
The ontology spectrum (reality focus)
glossary: A simple list of terms and their definitions.
controlled vocabulary: A simple list of terms, definitions and naming conventions to ensure consistency.
taxonomy: A controlled vocabulary in which the terms form a hierarchical representation of the types and subtypes of entities in a given domain.
The hierarchy is organized by the is_a (subtype) relation
ontology: A controlled vocabulary organized by is_a and by further formally defined relations, for example part_of.
[Figure: Foundational Model of Anatomy (FMA). A graph of anatomical terms (Anatomical Structure, Anatomical Space, Organ, Organ Part, Organ Cavity, Organ Subdivision, Organ Component, Organ Cavity Subdivision, Tissue, Serous Sac, Serous Sac Cavity, Serous Sac Cavity Subdivision, Pleural Sac, Pleural Cavity, Pleura (Wall of Sac), Parietal Pleura, Visceral Pleura, Mediastinal Pleura, Mesothelium of Pleura, Interlobar recess) connected by is_a and part_of edges.]
In graph-theoretical terms:
Ontology Components:
• alphanumeric IDs form nodes of the graph
• each node is associated with some single term (preferred label)
• relationships between nodes, such as is_a, form the edges of the graph
• definitions and synonyms are associated with each node
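The graph view of an ontology can be sketched as a small data structure: nodes keyed by ID, one preferred label per node, and labelled edges between IDs. The Python below is an illustrative sketch only. The `EX:` identifiers, the class names `Node` and `OntologyGraph`, and the helper `build_example` are hypothetical; the two example assertions (lung is_a anatomical structure, lobe of lung part_of lung) are the ones used elsewhere in these slides.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One ontology term: an ID, a single preferred label, optional extras."""
    term_id: str
    label: str
    definition: str = ""
    synonyms: list = field(default_factory=list)


class OntologyGraph:
    def __init__(self):
        self.nodes = {}     # term_id -> Node
        self.edges = set()  # (subject_id, relation, object_id)

    def add_node(self, node):
        self.nodes[node.term_id] = node

    def add_edge(self, subject_id, relation, object_id):
        # Refuse edges that mention IDs not present as nodes.
        if subject_id not in self.nodes or object_id not in self.nodes:
            raise KeyError("both ends of an edge must be existing nodes")
        self.edges.add((subject_id, relation, object_id))

    def related(self, subject_id, relation):
        """Labels of nodes reached from subject_id by one edge of the given relation."""
        return sorted(
            self.nodes[obj].label
            for subj, rel, obj in self.edges
            if subj == subject_id and rel == relation
        )


def build_example():
    """Build a three-node graph; the EX: identifiers are made up for this sketch."""
    graph = OntologyGraph()
    graph.add_node(Node("EX:0001", "anatomical structure"))
    graph.add_node(Node("EX:0002", "lung"))
    graph.add_node(Node("EX:0003", "lobe of lung"))
    graph.add_edge("EX:0002", "is_a", "EX:0001")
    graph.add_edge("EX:0003", "part_of", "EX:0002")
    return graph


if __name__ == "__main__":
    g = build_example()
    print(g.related("EX:0002", "is_a"))
    print(g.related("EX:0003", "part_of"))
```

Keying everything on the ID rather than the label means a preferred label or synonym can change without breaking any edge.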
Entity =def
anything which exists, including things and processes, functions and qualities, beliefs and actions, documents and software
A 515287 DC3300 Dust Collector Fan
B 521683 Gilmer Belt
C 521682 Motor Drive Belt
instances
universals
Catalog vs. inventory
Ontology vs. list of items in your warehouse
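The catalog/inventory contrast can be illustrated in code by keeping kinds of part and individual items in separate structures. This is a rough Python sketch, not a modelling recommendation. The three catalog rows are taken from the slide, and reading the six-digit number as a part number is an assumption; the inventory items, their `serial` values, and the names `CatalogEntry`, `InventoryItem` and `stock_by_kind` are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    """A kind of part (a universal): listed once in the catalog."""
    part_number: str
    name: str


@dataclass(frozen=True)
class InventoryItem:
    """One physical item (an instance) of a catalogued kind."""
    serial: str          # hypothetical identifier, not from the slide
    entry: CatalogEntry


# Catalog rows from the slide (number read as a part number by assumption).
FAN = CatalogEntry("515287", "DC3300 Dust Collector Fan")
GILMER_BELT = CatalogEntry("521683", "Gilmer Belt")
DRIVE_BELT = CatalogEntry("521682", "Motor Drive Belt")
CATALOG = [FAN, GILMER_BELT, DRIVE_BELT]

# Made-up warehouse contents: several instances may share one catalog entry,
# and a catalogued kind may have no instances in stock at all.
INVENTORY = [
    InventoryItem("item-001", FAN),
    InventoryItem("item-002", GILMER_BELT),
    InventoryItem("item-003", GILMER_BELT),
]


def stock_by_kind(inventory):
    """Count how many instances of each catalogued kind are in stock."""
    return Counter(item.entry.name for item in inventory)


if __name__ == "__main__":
    counts = stock_by_kind(INVENTORY)
    for entry in CATALOG:
        print(entry.name, counts[entry.name])
```

The catalog stays the same size whether the warehouse holds zero items or thousands, which is the sense in which an ontology describes kinds rather than listing individual things.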
Warning 3.
Do not confuse things with words and ideas
• Level 1: the entities in reality, both instances and universals
• Level 2: cognitive representations of this reality on the part of scientists ...
• Level 3: publicly accessible concretizations of these cognitive representations in textual and graphical artifacts
Ontology development
starts with: Level 2 = the cognitive representations of practitioners or researchers in the relevant domain
results in: Level 3 representational artifacts (comparable to maps, science texts, dictionaries)
Domain =def.
a portion of reality that forms the subject-matter of a single science or technology or mode of study;
proteomics
HIV
demographics
...
Representation =def.
an image, idea, map, picture, name or description ... of some entity or entities
two kinds of representation:
analogue (photographs)
digital/composite/syntactically structured
Class =def.
a maximal collection of particulars referred to by a general term
the class A =def. the collection of all particular A’s
where ‘A’ is a general term (e.g. ‘brother of Elvis fan’, ‘cell’)
Classes are on the same level as the instances which they contain
(Scientific) Ontology =def.
a representational artifact whose representational units (which may be drawn from a natural or from some formalized language) are intended to represent
1. universals in reality
2. those relations between these universals which obtain universally (= for all instances)
lung is_a anatomical structure
lobe of lung part_of lung
Ontology (science)
the science of the kinds and structures of objects, properties, events, processes and relations in every domain of reality