eXtended Metadata Registry (XMDR) Interagency/International Cooperation on Ecoinformatics



eXtended Metadata Registry (XMDR)

Interagency/International Cooperation on Ecoinformatics, Ispra, Italy

January 17, 2006

Bruce Bargmeyer, Lawrence Berkeley National Laboratory, University of California

Tel: +1 510-495-2905, [email protected]


XMDR Project Collaboration

Collaborative, interagency effort: EPA, USGS, NCI, Mayo Clinic, DOD, LBNL, & others

Draws on and contributes to the Interagency/International Cooperation on Ecoinformatics

Involves Ecoterm; international, national, state, and local government agencies; and other organizations as content providers and potential users

Interacts with many organizations around the world through ISO/IEC standards committees


XMDR Project Results: Bootstrapping Semantic Computing

Design for next-generation metadata registries, expressed as a standard

XMDR prototype: open source software

Content loaded in prototype: millions of concepts, terms, and relations between concepts

Demonstrations for healthcare and the environment


Metadata Registry Extensions

Register (and manage) any semantics that are useful for managing data. For example, this may include registering not only permissible values (concepts) and definitions, but also the full concept systems in which the permissible values are found.

The registered semantics may include keywords, thesauri, taxonomies, ontologies, and axiomatized ontologies.

Support traditional data management and data administration

Lay the foundation for semantic computing: Semantics Service Oriented Architecture, semantic grids, semantics-based workflows, the Semantic Web, etc.


Where have we been? Where are we planning to go?

[Figure: progression from Data Management/Data Administration toward Semantics. Elements shown: system manuals; data dictionaries; data engineering; data standards; 11179 E1, 11179 E2, 11179 E3; XML & related standards; terminologies, ontologies; complex semantics management; data + ontology lifecycle management; semantics services (SSOA); semantic grids; Semantic Web; XMDR Project]


XMDR Draws Together

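[Figure: the metadata registry drawing together terminology, thesaurus themes, data standards, ontologies (e.g., GEMET), structured metadata, other metadata registries, and users. Inset: the meaning triangle, in which a term such as “Rose” symbolizes a CONCEPT, the concept refers to a referent (shown as clip art), and the term stands for the referent.]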


Concept System Store

[Figure: metadata registry diagram (concept systems, thesaurus themes, data standards, ontologies such as GEMET, structured metadata, users)]

Concept systems: keywords, controlled vocabularies, thesauri, taxonomies, ontologies, axiomatized ontologies

(Essentially graphs: node-relation-node + axioms)
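The slide treats every kind of concept system as essentially a graph of node-relation-node edges plus axioms. A minimal Python sketch of that idea follows; the `ConceptSystem` class, the `broader`/`related` relation names, the example terms, and the acyclicity axiom are illustrative assumptions, not XMDR's actual data model.

```python
from dataclasses import dataclass, field
from typing import Callable

# A directed, labeled edge: (node, relation, node).
Triple = tuple[str, str, str]


@dataclass
class ConceptSystem:
    """Hypothetical model: a concept system as a labeled graph plus axioms."""
    name: str
    edges: set[Triple] = field(default_factory=set)
    # Each axiom is a named predicate that must hold over the whole edge set.
    axioms: dict[str, Callable[[set[Triple]], bool]] = field(default_factory=dict)

    def add(self, subject: str, relation: str, obj: str) -> None:
        self.edges.add((subject, relation, obj))

    def violations(self) -> list[str]:
        """Names of the axioms that the current graph breaks."""
        return [name for name, holds in self.axioms.items() if not holds(self.edges)]


def broader_is_acyclic(edges: set[Triple]) -> bool:
    """Example axiom (an assumption): following 'broader' links never returns to a start node."""
    graph: dict[str, list[str]] = {}
    for s, r, o in edges:
        if r == "broader":
            graph.setdefault(s, []).append(o)
    visiting: set[str] = set()
    done: set[str] = set()

    def dfs(node: str) -> bool:
        if node in visiting:
            return False  # node is still on the current DFS path: cycle found
        if node in done:
            return True
        visiting.add(node)
        ok = all(dfs(nxt) for nxt in graph.get(node, []))
        visiting.discard(node)
        done.add(node)
        return ok

    return all(dfs(node) for node in list(graph))


thesaurus = ConceptSystem("example thesaurus", axioms={"broader is acyclic": broader_is_acyclic})
thesaurus.add("Marsh", "broader", "Wetland")
thesaurus.add("Wetland", "broader", "Land Surface")
thesaurus.add("Marsh", "related", "Peat")
print(thesaurus.violations())  # []
thesaurus.add("Land Surface", "broader", "Marsh")  # closes a cycle
print(thesaurus.violations())  # ['broader is acyclic']
```

Keeping axioms as named predicates over the edge set means a richer concept system (say, an axiomatized ontology) differs from a keyword list only in how many axioms it carries, not in how its graph is stored.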


Management of Concept Systems

[Figure: metadata registry diagram (concept systems, thesaurus themes, data standards, ontologies such as GEMET, structured metadata, users)]

Concept system: registration, harmonization, standardization, acceptance (vetting), mapping (correspondences)


Life Cycle Management

[Figure: metadata registry diagram (concept systems, thesaurus themes, data standards, ontologies such as GEMET, structured metadata, users)]

Life cycle management: data and concept systems (ontologies)


Grounding Semantics

[Figure: metadata registry diagram (concept systems, thesaurus themes, data standards, ontologies such as GEMET, structured metadata, users)]

Metadata registries and the Semantic Web: RDF triples, each made of a subject (node URI), a verb, i.e., predicate (relation URI), and an object (node URI)

Ontologies
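Every RDF triple names its subject, verb (predicate), and object by URI. A minimal Python sketch that renders such triples in the W3C N-Triples line format follows. The `rdfs:subClassOf` URI and the SWEET `earthrealm` namespace are real; the `http://example.org/concept/` namespace and the `Marsh` concept are hypothetical placeholders, and the IRI check is a simplified subset of the N-Triples grammar.

```python
from typing import NamedTuple

RDFS = "http://www.w3.org/2000/01/rdf-schema#"
EARTHREALM = "http://sweet.jpl.nasa.gov/ontology/earthrealm.owl#"
EX = "http://example.org/concept/"  # hypothetical namespace for illustration

# Characters that may not appear inside an N-Triples IRI reference.
_FORBIDDEN = set('<>"{}|^`\\')


class Triple(NamedTuple):
    subject: str  # node URI
    verb: str     # relation (predicate) URI
    obj: str      # node URI

    def to_ntriples(self) -> str:
        """Render the triple as one N-Triples line: <s> <p> <o> ."""
        for uri in self:
            if not uri or any(ch in _FORBIDDEN or ord(ch) <= 0x20 for ch in uri):
                raise ValueError(f"not a usable IRI: {uri!r}")
        return f"<{self.subject}> <{self.verb}> <{self.obj}> ."


t = Triple(EX + "Marsh", RDFS + "subClassOf", EARTHREALM + "Wetland")
print(t.to_ntriples())
```

Because every position holds a URI, triples from different registries can be merged by simple set union whenever they share identifiers, which is what lets registry content serve as grounding for Semantic Web data.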


XMDR Prototype Architecture: Initial Modules

[Figure: UML-style module diagram; legend: generalization, composition (tight ownership), aggregation (loose ownership). Components shown: Registry External Interface; Metadata Validator; Authentication Service; Mapping Engine; Retrieval Index, comprising a Full Text Index (Lucene) and a Logic Based Index (Jena, OWI KS, Racer); Registry Store with a Writable Registry Store (Subversion); Ontology Editor (Protégé) and the 11179 OWL Ontology. Technologies also shown: Java; Jena, Xerces.]


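XMDR Prototype Architecture: Initial Implemented Modules

[Figure: UML-style module diagram; legend: generalization, composition (tight ownership), aggregation (loose ownership). Implemented components shown: Registry External Interface; Retrieval Index, comprising a Full Text Index (Lucene) and a Logic Based Index (Jena, Racer, etc.); Registry Store with a Writable Registry Store (Subversion); Ontology Editor (Protégé) and the 11179 OWL Ontology. Technologies also shown: Java.]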


UML Is Used for the 11179 Metamodel; XMDR Uses OWL, RDF & XML Schema

[Figure: artifacts and tools shown: UML 11179 metamodel; 11179 relational schema; relational metadata; OWL XMDR ontology & annotations; RDF spec; XMDR's Relax NG schema; Trang; XMDR XML Schema; XML Schema language spec; XML objects. Annotations: types & cardinalities; triples (binary labeled relationships); design questions: What things go in their own files? Which property direction is stored? Sequential ordering of properties.]


Refined XMDR Subclasses Improve Organization & Enable Inference


XMDR Example Content Loaded from Diverse Sources via LexGrid & XSLT

[Figure: Original Source A → LexGrid Source A → XSLT script (credit: Harold Solbrig, Mayo Clinic) → Concept System A, with A Concepts and A Relationships]

Content loaded to date: 2.7 million triples
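The slide's pipeline uses an XSLT script. The Python sketch below swaps in the standard-library `xml.etree.ElementTree` parser in place of an XSLT processor, only to show the shape of the step: one source becomes a concept system with separate concept and relationship collections. The input format (`codingScheme`, `concept`, `association` elements) is a hypothetical, greatly simplified stand-in; real LexGrid XML is considerably richer.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for a LexGrid-format source;
# the element and attribute names are assumptions for illustration.
SOURCE_A = """
<codingScheme name="A">
  <concept code="C1" name="Wetland"/>
  <concept code="C2" name="Marsh"/>
  <association name="subClassOf" source="C2" target="C1"/>
</codingScheme>
"""


def split_concept_system(xml_text: str):
    """Split one source into (system name, concepts, relationships), mirroring
    the Concept System A / A Concepts / A Relationships outputs."""
    root = ET.fromstring(xml_text.strip())
    concepts = {c.get("code"): c.get("name") for c in root.iter("concept")}
    relationships = []
    for assoc in root.iter("association"):
        src, tgt = assoc.get("source"), assoc.get("target")
        # Reject dangling references instead of loading a broken graph.
        if src not in concepts or tgt not in concepts:
            raise ValueError(f"association refers to an unknown concept: {src} -> {tgt}")
        relationships.append((src, assoc.get("name"), tgt))
    return root.get("name"), concepts, relationships


name, concepts, relationships = split_concept_system(SOURCE_A)
print(name, concepts, relationships)
# A {'C1': 'Wetland', 'C2': 'Marsh'} [('C2', 'subClassOf', 'C1')]
```

Checking references at load time is one design choice; a loader could instead record dangling associations for later harmonization.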


XMDR Content List (partial)

NBII Biocomplexity Thesaurus

NCI Thesaurus (National Cancer Institute Thesaurus)

NCI Data Elements (National Cancer Institute Data Standards Registry)

UMLS (non-proprietary portions)

GEMET (General Multilingual Environmental Thesaurus)

EDR Data Elements (Environmental Data Registry)

USGS Geographic Names Information System (GNIS)

HL7 Terminology, Data Elements

Mouse Anatomy

GO (Gene Ontology)

EPA Web Registry Controlled Vocabulary

BioPAX Ontology

NASA SWEET Ontologies


NASA-JPL Semantic Web for Earth and Environmental Terminology

SWEET is written in the OWL ontology language (W3C). It can be viewed with Internet Explorer 5+, Netscape 7+, etc., or with OWL-specific tools (e.g., SWOOP, Protégé).

Terms in other taxonomies can be mapped to SWEET using the Global Change Master Directory (GCMD) and CF Standard Names

http://sweet.jpl.nasa.gov/ontology/

– Earth Realms
– Physical Phenomena (any transient feature)
– Physical Processes
– Physical Properties
– Physical Substances
– Sun Realms
– Biosphere Data
– Data Centers
– Human Activities
– Material Things
– Numerics
– Sensors
– Space
– Time
– Units


Content Loaded from EPA EDR and NASA SWEET Ontology

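[Figure: loading pipeline. Inputs: EPA EDR concepts & relationships and the NASA SWEET ontology (OWL). Processing: Java, using the XMDR ontology. Outputs: XMDR files, including XMDR files for ontologies.]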


What happens to XMDR files before they can be used for text searching or inference?

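[Figure: XMDR files for each concept system (Concept System A with A Concepts and A Relationships; Concept System B with B Concepts and B Relationships; etc.) feed two paths. Text path: all XMDR files are indexed by Lucene into Lucene indexes, which serve text queries. Inference path: each system (A, B, … etc.) is loaded individually into Jena as its own model (Model A, Model B, XMDR Ontology, … etc.), and the union of all models serves inference queries.]

Search/query results are sets of URLs for the XMDR files pictured above.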
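A minimal Python sketch of the two paths in the figure, under stated assumptions: the file URLs, text, and triples are hypothetical; a dictionary-based inverted index stands in for Lucene; plain Python sets stand in for Jena models; and concepts are bare names rather than URLs. It shows why inference queries run over the union of the per-system models: a pattern that chains facts from systems A and B matches only there.

```python
import re
from collections import defaultdict

# Hypothetical XMDR files, grouped by concept system: URL -> (text, triples).
SYSTEMS = {
    "A": {
        "http://example.org/xmdr/A/concepts": ("Wetland Marsh", set()),
        "http://example.org/xmdr/A/relationships": (
            "Marsh subClassOf Wetland", {("Marsh", "subClassOf", "Wetland")}),
    },
    "B": {
        "http://example.org/xmdr/B/relationships": (
            "SaltMarsh subClassOf Marsh", {("SaltMarsh", "subClassOf", "Marsh")}),
    },
}


def build_text_index(systems):
    """Text path: a single inverted index over all files (a stand-in for Lucene)."""
    index = defaultdict(set)
    for files in systems.values():
        for url, (text, _triples) in files.items():
            for word in re.findall(r"\w+", text.lower()):
                index[word].add(url)
    return index


def text_query(index, word):
    """Return the set of file URLs whose text contains the word."""
    return set(index.get(word.lower(), set()))


def build_models(systems):
    """Inference path: load each system into its own model, then take the union."""
    models = {name: set().union(*(triples for _text, triples in files.values()))
              for name, files in systems.items()}
    return models, set().union(*models.values())


def two_hop_subclasses(model, cls):
    """Match (?x subClassOf ?y) and (?y subClassOf cls) as one joined pattern."""
    return {x for x, r1, y in model if r1 == "subClassOf"
            for y2, r2, z in model if r2 == "subClassOf" and y2 == y and z == cls}


index = build_text_index(SYSTEMS)
models, union = build_models(SYSTEMS)
print(sorted(text_query(index, "Marsh")))          # all three file URLs
print(two_hop_subclasses(models["A"], "Wetland"))  # set(): A alone lacks the second hop
print(two_hop_subclasses(union, "Wetland"))        # {'SaltMarsh'}: needs facts from A and B
```

Loading each system into its own model before taking the union keeps per-system provenance available (for example, to reload one system) while still letting queries cross system boundaries.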


How to Search/Query Complex Concepts & Relationships

[Figure B - Revised Overview] (Keck, 2005-10-14): UML class diagram. Current 11179 objects: Object_Class, Property, Data_Element_Concept, Conceptual_Domain (subclasses Enumerated_Conceptual_Domain and Nonenumerated_Conceptual_Domain), Value_Meaning, Permissible_Value, Data_Element, Value_Domain (subclasses Enumerated_Value_Domain and Nonenumerated_Value_Domain). New proposed objects: Ontology, Concept, Relationship (with link 2..* and type associations).


XMDR RDF Graph Query Facilities Complement Text Query Capabilities

SQL-like queries, e.g., names of ontologies in a registry

Span items that are only indirectly connected, e.g., data elements associated with a conceptual domain

Expand queries to subsumed classes in the hierarchy, e.g., ConceptualDomain includes EnumeratedConc...

Transitivity, e.g., all subclasses subsumed by a higher-order class, or all superclasses (ancestors) of a particular class

Least common ancestor, e.g., the closest subsuming concept for two concepts


Example Subclass Queries (Inference with Transitivity)

Environmental: What are all the (sub)types of Wetland (in SWEET)?

RDQL:
SELECT ?x
WHERE (?x rdfs:subClassOf earthrealm:Wetland)
USING earthrealm FOR <http://sweet.jpl.nasa.gov/ontology/earthrealm.owl#>

Health: Find all the types of "Lung Carcinoma".
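Without inference, the RDQL pattern above matches only direct subclasses of Wetland; with transitivity it also returns subclasses of those subclasses. The Python sketch below contrasts the two over a small set of (child, parent) subClassOf pairs. The class names are hypothetical and are not taken from SWEET.

```python
from collections import defaultdict

# Hypothetical (child, parent) rdfs:subClassOf pairs; real SWEET content differs.
SUBCLASS_OF = [
    ("Marsh", "Wetland"),
    ("Swamp", "Wetland"),
    ("SaltMarsh", "Marsh"),
    ("MangroveSwamp", "Swamp"),
]


def direct_subclasses(pairs, cls):
    """What the plain pattern (?x rdfs:subClassOf cls) matches with no inference."""
    return {child for child, parent in pairs if parent == cls}


def all_subclasses(pairs, cls):
    """Transitive closure: follow subClassOf links downward to any depth."""
    children = defaultdict(set)
    for child, parent in pairs:
        children[parent].add(child)
    found, stack = set(), [cls]
    while stack:
        for child in children[stack.pop()]:
            if child not in found:  # the check also stops on cycles
                found.add(child)
                stack.append(child)
    return found


print(sorted(direct_subclasses(SUBCLASS_OF, "Wetland")))  # ['Marsh', 'Swamp']
print(sorted(all_subclasses(SUBCLASS_OF, "Wetland")))
# ['MangroveSwamp', 'Marsh', 'SaltMarsh', 'Swamp']
```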


More Complex “Sibling” Queries: Concepts with Multiple Ancestors

Health: Find all the siblings of Breast Neoplasm.

Note: This query is complex because Breast Neoplasm has two parents, Neoplasm by Site and Breast Disorder. It returns both the by-site neoplasms, such as Eye Neoplasm and Respiratory System Neoplasm, and the Breast Disorder siblings, such as Non-Neoplastic Breast Disorder.
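The Python sketch below computes siblings for a concept with several parents and keeps them grouped by parent, so the two kinds of result in the note stay distinguishable. The hierarchy fragment uses only the parent links named on this slide; it is not a full NCI Thesaurus extract, and `siblings_by_parent` is a hypothetical helper.

```python
from collections import defaultdict

# Partial hierarchy using only the parent links named on the slide.
PARENTS = {
    "Breast Neoplasm": ["Neoplasm by Site", "Breast Disorder"],
    "Eye Neoplasm": ["Neoplasm by Site"],
    "Respiratory System Neoplasm": ["Neoplasm by Site"],
    "Non-Neoplastic Breast Disorder": ["Breast Disorder"],
}


def siblings_by_parent(parents, concept):
    """Map each parent of `concept` to that parent's other children."""
    children = defaultdict(set)
    for child, child_parents in parents.items():
        for parent in child_parents:
            children[parent].add(child)
    return {parent: children[parent] - {concept} for parent in parents.get(concept, [])}


groups = siblings_by_parent(PARENTS, "Breast Neoplasm")
for parent, sibs in groups.items():
    print(parent, "->", sorted(sibs))
# Neoplasm by Site -> ['Eye Neoplasm', 'Respiratory System Neoplasm']
# Breast Disorder -> ['Non-Neoplastic Breast Disorder']
```

When a single combined answer is wanted, `set().union(*groups.values())` flattens the groups.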


Least Common Ancestor Queries (Inference with Transitivity)

Health: For "Morphine Sulfate" and "Acetaminophen", the least common ancestor should be Analgesic Agent (with multiple intervening concepts).
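Because concepts can have several parents, the hierarchy is a directed acyclic graph rather than a tree, so a common ancestor counts as "least" when no other common ancestor lies below it. A Python sketch using that definition follows. Only the two drugs and Analgesic Agent come from the slide; the "Intermediate ..." concepts and the `Drug` root are hypothetical placeholders, since the slide does not name the intervening concepts.

```python
# Child -> parents. "Intermediate ..." nodes and "Drug" are hypothetical placeholders.
PARENTS = {
    "Morphine Sulfate": ["Intermediate A2"],
    "Intermediate A2": ["Intermediate A1"],
    "Intermediate A1": ["Analgesic Agent"],
    "Acetaminophen": ["Intermediate B1"],
    "Intermediate B1": ["Analgesic Agent"],
    "Analgesic Agent": ["Drug"],
    "Drug": [],
}


def proper_ancestors(parents, concept):
    """All concepts reachable by following parent links transitively."""
    seen, stack = set(), list(parents.get(concept, []))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, []))
    return seen


def least_common_ancestors(parents, a, b):
    """Common ancestors (a and b included) that sit above no other common ancestor."""
    common = (proper_ancestors(parents, a) | {a}) & (proper_ancestors(parents, b) | {b})
    return {c for c in common
            if not any(c in proper_ancestors(parents, d) for d in common)}


print(least_common_ancestors(PARENTS, "Morphine Sulfate", "Acetaminophen"))
# {'Analgesic Agent'}
```

The function returns a set because, with multiple inheritance, two concepts can have more than one least common ancestor.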


Searching caDSR for Data Elements via Concepts and Vice-Versa

Common Data Elements (CDEs) are 'connected' to concepts through the Object Class and Property of the CDE. A query of this kind should look up the CDE's Object Class derivation rule and select only the data elements associated with those object classes. Alternatively, one could query the caDSR Concept class, find all related Object Classes (OCs) for which the concept was flagged as "primary concept", and then retrieve all the Data Elements by leveraging the ISO 11179 relationships: an Object Class has related Data Element Concepts (DECs), and DECs have related Data Elements (DEs). Concepts can also be associated with Value Meanings. So: search the Concept class by concept code, find all related Value Meanings, find all Value Domains that use those Value Meanings, and find all Data Elements that use those Value Domains.
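The Object Class route and the Value Meaning route described above can be sketched in Python over a toy, in-memory registry. The class and field names loosely follow the ISO/IEC 11179 objects named in the text, but they are assumptions, not the caDSR API, and the concept codes (`C001`, `C002`, `C050`) and element names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ObjectClass:
    name: str
    primary_concept: str  # concept code flagged as the "primary concept"


@dataclass(frozen=True)
class DataElementConcept:
    name: str
    object_class: ObjectClass


@dataclass(frozen=True)
class ValueMeaning:
    name: str
    concept: str  # associated concept code


@dataclass(frozen=True)
class ValueDomain:
    name: str
    meanings: tuple[ValueMeaning, ...]


@dataclass(frozen=True)
class DataElement:
    name: str
    dec: DataElementConcept
    value_domain: ValueDomain


def via_object_class(code, elements):
    """Concept -> Object Classes (primary concept) -> DECs -> Data Elements."""
    ocs = {de.dec.object_class for de in elements if de.dec.object_class.primary_concept == code}
    decs = {de.dec for de in elements if de.dec.object_class in ocs}
    return {de.name for de in elements if de.dec in decs}


def via_value_meaning(code, elements):
    """Concept -> Value Meanings -> Value Domains -> Data Elements."""
    vms = {vm for de in elements for vm in de.value_domain.meanings if vm.concept == code}
    vds = {de.value_domain for de in elements if vms & set(de.value_domain.meanings)}
    return {de.name for de in elements if de.value_domain in vds}


lung = ObjectClass("Lung", primary_concept="C001")
patient = ObjectClass("Patient", primary_concept="C050")
site_vd = ValueDomain("Tumor Site", (ValueMeaning("Lung", "C001"), ValueMeaning("Breast", "C002")))
cm_vd = ValueDomain("Centimeters", ())
elements = [
    DataElement("Lung Lesion Size", DataElementConcept("Lung Lesion Size", lung), cm_vd),
    DataElement("Patient Tumor Site", DataElementConcept("Patient Tumor Site", patient), site_vd),
]
print(via_object_class("C001", elements))   # {'Lung Lesion Size'}
print(via_value_meaning("C001", elements))  # {'Patient Tumor Site'}
```

The same concept code reaches different data elements by the two routes, which is why a complete search would combine both result sets.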


Comparison of Different Reasoners (on 2.7 million triples)

Type                Results Q1    Results Q2  Time Q1 (ms)  Time Q2 (ms)
No inferencing               7             0            13            15
RDFS simple                 14             7            11            16
RDFS default                14             7            14             5
RDFS full                   14             7            13            18
Transitive                   7             0            16            32
OWL micro                   21             7            79            18
OWL mini                     –             –             –             –
OWL default                  –             –             –             –
Pellet                       –             –             –             –


Challenges and Future Goals for XMDR Prototype

Scalability & performance

Tools: RDF tool adaptation for metadata registries; user-friendly interface; form interface for registration & uploading metadata

References to externally maintained sources: data, ontologies, terminologies

Evaluate alternative technologies for different modules

Demonstrate for key use cases and ecoinformatics applications


Challenges and Future Goals (cont.)

Progress proposals through standards committees

Harmonization with W3C and OMG standards

Incorporate Common Logic, Web Services, etc.

Ontology Lifecycle Management (OLM)

Improve link of concepts to data

Generate schemas from axiomatized ontologies


Ecoinformatics Challenges

How does this fit into the research, development, and demonstration activities of the Interagency/International Cooperation on Ecoinformatics?

Should this be a part of the EU-US collaborative R&D?