eXtended Metadata Registry (XMDR). Interagency/International Cooperation on Ecoinformatics, Ispra, Italy, January 17, 2006. Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California. Tel: +1 510-495-2905 [email protected]
1
eXtended Metadata Registry (XMDR)
Interagency/International Cooperation on Ecoinformatics
Ispra, Italy
January 17, 2006
Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California
Tel: +1 510-495-2905 [email protected]
2
XMDR Project Collaboration
Collaborative, interagency effort
  EPA, USGS, NCI, Mayo Clinic, DOD, LBNL ... & others
Draws on and contributes to Interagency/International Cooperation on Ecoinformatics
Involves Ecoterm, international, national, state, local government agencies, other organizations as content providers and potential users
Interacts with many organizations around the world through ISO/IEC standards committees
3
XMDR Project Results: Bootstrapping Semantic Computing
Design for next generation metadata registries, expressed as a standard
XMDR Prototype, open source software
Content loaded in prototype: millions of concepts, terms, and relations between concepts
Demonstrations for healthcare and the environment
4
Metadata Registry Extensions
Register (and manage) any semantics that are useful for managing data.
  E.g., this may include registering not only permissible values (concepts) and definitions, but may extend to registration of the full concept systems in which the permissible values are found.
  E.g., may want to register keywords, thesauri, taxonomies, ontologies, axiomatized ontologies ...
Support traditional data management and data administration
Lay foundation for semantic computing: Semantics Service Oriented Architecture, Semantic Grids, Semantics-based workflows, Semantic Web ...
5
Where have we been? Where are we planning to go?
[Figure: roadmap diagram. Labels: System manuals; Data dictionaries; Data engineering; Data Standards; 11179 E1; 11179 E2; 11179 E3; XML & related standards; XMDR Project; Terminologies, ontologies; Data + ontology lifecycle management; Complex semantics management; Semantics services (SSOA); Semantic grids; Semantic Web. Group labels: Data Management/Data Administration; Semantics.]
6
XMDR Draws Together
[Figure: Metadata Registry diagram. Labels: Terminology; Thesaurus; Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users; Metadata Registries; Terminology]
[Figure: semantic triangle. Labels: CONCEPT; Referent; Refers To; Symbolizes; Stands For; "Rose", "ClipArt"]
7
Concept System Store
[Figure: Metadata Registry diagram. Labels: Concept System; Thesaurus; Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users]
Concept systems:
  Keywords
  Controlled Vocabularies
  Thesauri
  Taxonomies
  Ontologies
  Axiomatized Ontologies
(Essentially graphs: node-relation-node + axioms)
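The "node-relation-node" view of a concept system can be sketched in a few lines of Python. This is only an illustration of the data shape, not the XMDR implementation; the concept names and the relation labels `subClassOf` and `relatedTerm` are placeholders chosen for the example, and axioms are left out.

```python
from collections import defaultdict

# A concept system reduced to (node, relation, node) triples.
# All names below are illustrative placeholders, not registry content.
triples = [
    ("Wetland", "subClassOf", "LandRegion"),
    ("Marsh", "subClassOf", "Wetland"),
    ("Marsh", "relatedTerm", "Swamp"),
]


def build_graph(triples):
    """Index triples by subject: node -> relation -> set of target nodes."""
    graph = defaultdict(lambda: defaultdict(set))
    for subject, relation, obj in triples:
        graph[subject][relation].add(obj)
    return graph


graph = build_graph(triples)
print(sorted(graph["Marsh"]["subClassOf"]))  # ['Wetland']
```

A keyword list, a thesaurus, and an ontology differ mainly in which relation labels appear and in whether axioms constrain them, which is why one store can hold all of them.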
8
Management of Concept Systems
[Figure: Metadata Registry diagram. Labels: Concept System; Thesaurus; Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users]
Concept system:
  Registration
  Harmonization
  Standardization
  Acceptance (vetting)
  Mapping (correspondences)
9
Life Cycle Management
[Figure: Metadata Registry diagram. Labels: Concept System; Thesaurus; Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users]
Life cycle management: data and concept systems (ontologies)
10
Grounding Semantics
[Figure: Metadata Registry diagram. Labels: Concept System; Thesaurus; Themes; Data Standards; Ontology; GEMET; Structured Metadata; Users; Metadata Registries; Semantic Web; Ontologies]
RDF Triples:
  Subject (node URI)
  Verb (relation URI)
  Object (node URI)
11
XMDR Prototype Architecture: Initial Modules
[Figure: module diagram. Legend: Generalization; Composition (tight ownership); Aggregation (loose ownership)]
Ontology Editor (Protege)
11179 OWL Ontology
MetadataValidator
AuthenticationService
MappingEngine
RegistryExternalInterface
RetrievalIndex
  FullTextIndex (Lucene)
  LogicBasedIndex (Jena, OWI KS, Racer)
RegistryStore
  WritableRegistryStore (Subversion)
Other technology labels: Java; Jena, Xerces
12
XMDR Prototype Architecture: Initial Implemented Modules
[Figure: module diagram. Legend: Generalization; Composition (tight ownership); Aggregation (loose ownership)]
Ontology Editor (Protege)
11179 OWL Ontology
RegistryExternalInterface
RetrievalIndex
  FullTextIndex (Lucene)
  LogicBasedIndex (Jena, Racer, etc.)
RegistryStore
  WritableRegistryStore (Subversion)
Other technology labels: Java
13
UML is Used for 11179 Metamodel, XMDR uses OWL, RDF & XML Schema
[Figure: relationships among the schema artifacts. Labels: OWL XMDR Ontology & annotations; XMDR's Relax NG Schema; XMDR XML Schema; UML 11179 Metamodel; 11179 Relational Schema; Relational Metadata; RDF Spec; TRang; XML Schema Language spec; XML Objects]
Annotations on the figure:
  Types & cardinalities
  What things go in own files?
  Which property direction stored?
  Sequential ordering of properties
  Triples: binary labeled relationships
14
Refined XMDR Subclasses Improve Organization & Enable Inference
15
XMDR Example Content Loaded from Diverse Sources via LexGrid & XSLT
[Figure: loading pipeline. Labels, in order: Original Source A; LexGrid Source A; XSLT script; Concept System A (A Concepts, A Relationships)]
Harold Solbrig (Mayo Clinic)
Content loaded to date: 2.7 million triples
16
XMDR Content List (partial)
NBII Biocomplexity Thesaurus
NCI Thesaurus (National Cancer Institute Thesaurus)
NCI Data Elements (National Cancer Institute Data Standards Registry)
UMLS (non-proprietary portions)
GEMET (General Multilingual Environmental Thesaurus)
EDR Data Elements (Environmental Data Registry)
USGS Geographic Names Information System (GNIS)
HL7 Terminology, Data Elements
Mouse Anatomy
GO (Gene Ontology)
EPA Web Registry Controlled Vocabulary
BioPAX Ontology
NASA SWEET Ontologies
…
17
NASA-JPL Semantic Web for Earth and Environmental Terminology
SWEET written in OWL ontology language (W3C)
  Can view with Internet Explorer 5+, Netscape 7+, etc.
  Can also use OWL-specific tools (e.g., SWOOP, Protégé)
Terms in other taxonomies can be mapped to SWEET using
  Global Change Master Directory (GCMD)
  CF Standard Names
http://sweet.jpl.nasa.gov/ontology/
  Earth Realms
  Physical Phenomena (any transient feature)
  Physical Processes
  Physical Properties
  Physical Substances
  Sun Realms
  Biosphere Data
  Data Centers
  Human Activities
  Material Things
  Numerics
  Sensors
  Space
  Time
  Units
18
Content Loaded from EPA EDR and NASA SWEET Ontology
[Figure: loading diagram. Labels: EDR; SWEET (OWL); java; concepts & relationships; XMDR ontology; XMDR files; XMDR files (ontologies)]
19
What happens to XMDR files before they can be used for text searching or inference?
[Figure: processing pipeline for xmdr files]
Inputs: Concept System A (A Concepts, A Relationships), Concept System B (B Concepts, B Relationships), etc.
Lucene [all xmdr files]
  Lucene indexes
  Text queries (Lucene)
Jena [each system (A, B, ... etc.) loaded individually]
  Model A, Model B, XMDR Ontology, ... etc.
  Union of all models
  Inference queries (Jena)
Search/query results are sets of URLs for the xmdr files pictured above
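The two preparation paths on this slide can be mimicked with plain Python data structures. The sketch below is a stand-in only: an inverted index plays the role of the Lucene indexes and a set union plays the role of the combined Jena model. It does not use or imitate either library's API, and the URLs, text, and triples are hypothetical.

```python
from collections import defaultdict

# Hypothetical registry files: URL -> (free text, triples). Illustrative only.
xmdr_files = {
    "http://example.org/xmdr/A/1": (
        "wetland marsh",
        {("A:Marsh", "subClassOf", "A:Wetland")},
    ),
    "http://example.org/xmdr/B/1": (
        "lung carcinoma",
        {("B:LungCarcinoma", "subClassOf", "B:Carcinoma")},
    ),
}


def build_text_index(files):
    """Inverted index over all files: token -> set of file URLs."""
    index = defaultdict(set)
    for url, (text, _triples) in files.items():
        for token in text.lower().split():
            index[token].add(url)
    return index


def union_model(files):
    """Merge the triples of every file into one queryable set."""
    model = set()
    for _text, triples in files.values():
        model |= triples
    return model


def text_query(index, term):
    """Return the URLs of the files whose text contains the term."""
    return set(index.get(term.lower(), set()))


index = build_text_index(xmdr_files)
model = union_model(xmdr_files)
print(text_query(index, "Marsh"))
```

As on the slide, a query result is a set of file URLs rather than the file content itself.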
21
How to Search/Query Complex Concepts & Relationships
[Figure B - Revised Overview] (Keck, 2005-10-14)
[Figure: UML class diagram. Classes: Object_Class; Property; Data_Element_Concept; Conceptual_Domain; Value_Meaning; Permissible_Value; Data_Element; Value_Domain; Enumerated_Conceptual_Domain; Nonenumerated_Conceptual_Domain; Enumerated_Value_Domain; Nonenumerated_Value_Domain; Ontology; Concept; Link; Relationship. Association labels: link+; type+. Multiplicities (*, 0..1, 1..*, 2..*) appear on the associations but their placement is not recoverable here. Legend: New Proposed Objects; Current 11179 Objects.]
23
XMDR RDF Graph Query Facilities Complement Text Query Capabilities
SQL-like queries
  e.g., names of ontologies in a registry
Span items that are only indirectly connected
  e.g., data elements associated with a conceptual domain
Expand queries to subsumed classes in hierarchy
  e.g., ConceptualDomain includes EnumeratedConc..
Transitivity
  e.g., all subclasses subsumed by a higher order class
  e.g., all superclasses (ancestors) of a particular class
Least common ancestor
  e.g., closest subsuming concept for 2 concepts
24
Example Subclass Queries: (Inference with Transitivity)
Environmental: What are all the (sub)types of Wetland (in SWEET)?
RDQL:
  SELECT ?x
  WHERE (?x rdfs:subClassOf earthrealm:Wetland)
  USING earthrealm FOR <http://sweet.jpl.nasa.gov/ontology/earthrealm.owl#>
Health: Find all the types of "Lung Carcinoma"
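Without inference, a subClassOf pattern matches only the directly asserted children; transitivity is what brings in the deeper subtypes. A minimal sketch of that closure in Python follows. The class names are hypothetical and are not taken from SWEET.

```python
from collections import defaultdict

# Hypothetical (child, parent) subClassOf edges; not taken from SWEET.
SUBCLASS_OF = [
    ("Marsh", "Wetland"),
    ("Swamp", "Wetland"),
    ("SaltMarsh", "Marsh"),
    ("Wetland", "LandRegion"),
]


def all_subclasses(edges, root):
    """Return every direct or indirect subclass of root."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    found, stack = set(), [root]
    while stack:
        node = stack.pop()
        for child in children[node]:
            if child not in found:
                found.add(child)
                stack.append(child)
    return found


print(sorted(all_subclasses(SUBCLASS_OF, "Wetland")))
# ['Marsh', 'SaltMarsh', 'Swamp']
```

Here `SaltMarsh` is reached only through `Marsh`, which is the case a non-transitive query would miss.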
25
More Complex “Sibling” Queries: Concepts with Multiple Ancestors
Health: Find all the siblings of Breast Neoplasm
Note: This is complex, since Breast Neoplasm has two parents, Neoplasm by Site and Breast Disorder. The query would return both the by-site neoplasms, such as Eye Neoplasm and Respiratory System Neoplasm, and the Breast Disorder siblings, such as Non-Neoplastic Breast Disorder.
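The sibling lookup can be sketched as "for each parent, collect its other children". The parent links below are the ones named on this slide; the dictionary layout and function name are assumptions made for the example.

```python
# Parent links as described on the slide (concept -> set of parents).
PARENTS = {
    "Breast Neoplasm": {"Neoplasm by Site", "Breast Disorder"},
    "Eye Neoplasm": {"Neoplasm by Site"},
    "Respiratory System Neoplasm": {"Neoplasm by Site"},
    "Non-Neoplastic Breast Disorder": {"Breast Disorder"},
}


def siblings(parents, concept):
    """Group the siblings of a concept by the parent they share with it."""
    result = {}
    for parent in parents.get(concept, set()):
        result[parent] = {
            other
            for other, other_parents in parents.items()
            if parent in other_parents and other != concept
        }
    return result


for parent, sibs in sorted(siblings(PARENTS, "Breast Neoplasm").items()):
    print(parent, "->", sorted(sibs))
```

Grouping the result by parent keeps the two families of siblings apart, which is usually what a user browsing a multi-parent hierarchy needs.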
26
Least Common Ancestor Queries: (Inference with Transitivity)
Health: "Morphine Sulfate" and "Acetaminophen"
The least common ancestor should be Analgesic Agent (with multiple intervening concepts).
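One way to compute a least common ancestor over a hierarchy with multiple parents is to collect each concept's ancestors with their distances and pick the shared one with the smallest combined distance. The sketch below assumes that definition of "closest". Only the two drug names and Analgesic Agent come from the slide; the intervening concepts and the root are hypothetical placeholders.

```python
from collections import deque

# Only the two leaf concepts and "Analgesic Agent" come from the slide;
# every other name is a hypothetical placeholder.
PARENTS = {
    "Morphine Sulfate": ["Intermediate A1"],
    "Intermediate A1": ["Intermediate A2"],
    "Intermediate A2": ["Analgesic Agent"],
    "Acetaminophen": ["Intermediate B1"],
    "Intermediate B1": ["Analgesic Agent"],
    "Analgesic Agent": ["Hypothetical Root"],
}


def ancestor_depths(parents, start):
    """Breadth-first walk upward: ancestor -> shortest distance from start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for parent in parents.get(node, []):
            if parent not in depths:
                depths[parent] = depths[node] + 1
                queue.append(parent)
    return depths


def least_common_ancestor(parents, a, b):
    """Shared ancestor with the smallest combined distance, or None."""
    depths_a = ancestor_depths(parents, a)
    depths_b = ancestor_depths(parents, b)
    common = set(depths_a) & set(depths_b)
    if not common:
        return None
    return min(common, key=lambda c: (depths_a[c] + depths_b[c], c))


print(least_common_ancestor(PARENTS, "Morphine Sulfate", "Acetaminophen"))
# Analgesic Agent
```

Ties are broken alphabetically here for determinism; a registry might instead return all equally close ancestors.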
27
Searching caDSR for Data Elements via Concepts and Vice-Versa
Common Data Elements (CDEs) are 'connected' to concepts through the Object Class and Property of the CDE. A query such as this should look for the CDE's Object Class derivation rule and select only those data elements associated with those object classes. Alternatively, you could query the caDSR Concept Class and find all related OCs where the concept was flagged as "primary concept", then get all the Data Elements by leveraging the ISO 11179 relationships, e.g., an Object Class has related Data Element Concepts (DECs), and DECs have related DEs. Concepts can also be associated with Value Meanings. So, search the Concept Class with a concept code, find all related Value Meanings, find all Value Domains that used the value meaning, and find all Data Elements that used the Value Domain.
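Both routes described above are chains of lookups, which can be sketched as repeated hops across link tables. This is not the caDSR API; the table names and every identifier (C1, OC1, DEC1, and so on) are hypothetical, and only the order of the hops follows the text.

```python
# Hypothetical identifiers; only the hop order follows the slide text.
CONCEPT_TO_OBJECT_CLASSES = {"C1": {"OC1"}}
OBJECT_CLASS_TO_DECS = {"OC1": {"DEC1", "DEC2"}}
DEC_TO_DATA_ELEMENTS = {"DEC1": {"DE1"}, "DEC2": {"DE2", "DE3"}}

CONCEPT_TO_VALUE_MEANINGS = {"C1": {"VM1"}}
VALUE_MEANING_TO_VALUE_DOMAINS = {"VM1": {"VD1"}}
VALUE_DOMAIN_TO_DATA_ELEMENTS = {"VD1": {"DE4"}}


def follow(start, *hops):
    """Walk a set of items through successive link tables."""
    frontier = set(start)
    for hop in hops:
        frontier = {target for item in frontier for target in hop.get(item, set())}
    return frontier


def data_elements_for_concept(code):
    """Data elements reachable from a concept code by either route."""
    via_object_class = follow(
        {code},
        CONCEPT_TO_OBJECT_CLASSES,
        OBJECT_CLASS_TO_DECS,
        DEC_TO_DATA_ELEMENTS,
    )
    via_value_meaning = follow(
        {code},
        CONCEPT_TO_VALUE_MEANINGS,
        VALUE_MEANING_TO_VALUE_DOMAINS,
        VALUE_DOMAIN_TO_DATA_ELEMENTS,
    )
    return via_object_class | via_value_meaning


print(sorted(data_elements_for_concept("C1")))
# ['DE1', 'DE2', 'DE3', 'DE4']
```

Reversing the tables gives the "vice-versa" direction, from a data element back to its concepts.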
29
Comparison of Different Reasoners (on 2.7m triples)
type             # of results Q1   # of results Q2   execution time Q1 (ms)   execution time Q2 (ms)
no inferencing   7                 0                 13                       15
RDFS simple      14                7                 11                       16
RDFS default     14                7                 14                       5
RDFS full        14                7                 13                       18
Transitive       7                 0                 16                       32
OWL micro        21                7                 79                       18
OWL mini
OWL default
Pellet
30
Challenges and Future Goals for XMDR Prototype
Scalability & performance
Tools
  RDF tool adaptation for metadata registries
  User-friendly interface
  Form interface for registration & uploading metadata
References to externally maintained sources
  Data, ontologies, terminologies
Evaluate alternative technologies
  For different modules
Demonstrate for key use cases and ecoinformatics applications
31
Challenges and Future Goals (cont)
Progress proposals through standards committees
Harmonization with W3C and OMG standards
Incorporate Common Logic, Web Services, etc.
Ontology Lifecycle Management (OLM)
Improve link of concepts to data
Generate schemas from axiomatized ontologies
32
Ecoinformatics Challenges
How does this fit into the research, development, and demonstration activities of the Interagency/International Cooperation on Ecoinformatics?
Should this be a part of the EU-US collaborative R&D?