UMLS and semantic integration Olivier Bodenreider Lister Hill National Center for Biomedical Communications Bethesda, Maryland - USA Semantic Data Integration Workshop Amsterdam, The Netherlands May 18, 2009


UMLS and semantic integration

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications

Bethesda, Maryland - USA

Semantic Data Integration Workshop Amsterdam, The Netherlands

May 18, 2009


Outline

• Unified Medical Language System overview
• UMLS Metathesaurus
• UMLS Semantic Network

Data integration questions


Uses of biomedical ontologies

Knowledge management
• Annotating data and resources
• Accessing biomedical information
• Mapping across biomedical ontologies

Data integration, exchange and semantic interoperability

Decision support
• Data selection and aggregation
• Decision support
• NLP applications
• Knowledge discovery

[Bodenreider, YBMI 2008]

Unified Medical Language System

Overview


Motivation

• Started in 1986
• National Library of Medicine
• “Long-term R&D project”

«[…] the UMLS project is an effort to overcome two significant barriers to effective retrieval of machine-readable information.

• The first is the variety of ways the same concepts are expressed in different machine-readable sources and by different people.

• The second is the distribution of useful information among many disparate databases and systems.»


The UMLS in practice

Database
• Series of relational files

Interfaces
• Web interface: Knowledge Source Server (UMLSKS)
• Application programming interfaces (Java and XML-based)

Applications
• lvg (lexical programs)
• MetamorphoSys (installation and customization)
• RRF browser (browsing subsets)

The UMLS is not an end-user application


UMLS 3 components

Lexical resources
• SPECIALIST Lexicon
• Lexical tools

Metathesaurus
• Concepts
• Inter-concept relationships

Semantic Network
• Semantic types
• Semantic network relationships

[Figure labels: Lexical resources, Ontological resources, Terminological resources]

UMLS Knowledge Sources

UMLS Metathesaurus


Metathesaurus Basic organization

Concepts
• Synonymous terms are clustered into a concept
• Properties are attached to concepts, e.g.,
  – Unique identifier
  – Definition

Relations
• Concepts are related to other concepts
• Properties are attached to relations, e.g.,
  – Type of relationship
  – Source
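
The organization above can be pictured as two small record types. The following is a minimal Python sketch, not the Metathesaurus file format. The class and field names (`Concept`, `Relation`, `cui`, `rel_type`, `source`) are illustrative assumptions, and the identifier `C_EXAMPLE`, the relation type `related_to` and the source `EXAMPLE_SOURCE` are placeholders rather than real UMLS data. The identifier C0001403 and its names come from the Addison's disease example used later in this deck.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Concept:
    """A cluster of synonymous terms with properties attached."""
    cui: str                                    # unique identifier
    names: Set[str] = field(default_factory=set)
    definition: Optional[str] = None


@dataclass(frozen=True)
class Relation:
    """A link between two concepts, carrying its own properties."""
    from_cui: str
    to_cui: str
    rel_type: str   # type of relationship
    source: str     # vocabulary asserting the relation


addison = Concept(
    cui="C0001403",
    names={
        "Addison's Disease",
        "Bronzed disease",
        "Primary adrenocortical insufficiency",
    },
)

# Placeholder relation: "C_EXAMPLE", "related_to" and "EXAMPLE_SOURCE" are made up.
rel = Relation("C0001403", "C_EXAMPLE", rel_type="related_to", source="EXAMPLE_SOURCE")

print(addison.cui, len(addison.names), rel.rel_type)
```

Keeping the names in a set reflects the idea that a concept is a cluster of synonyms with no privileged order; a real implementation would also record the source and language of each name.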


Source Vocabularies

152 source vocabularies
• 19 languages

Broad coverage of biomedicine
• 9.7M names
• 2.1M concepts
• >10M relations

Common presentation

(2009AA)


Biomedical terminologies

General vocabularies
• anatomy (UWDA, Neuronames)
• drugs (RxNorm, First DataBank, Micromedex)
• medical devices (UMD, SPN)

Several perspectives
• clinical terms (SNOMED CT)
• information sciences (MeSH, CRISP)
• administrative terminologies (ICD-9-CM, CPT-4)
• data exchange terminologies (HL7, LOINC)


Biomedical terminologies (cont’d)

Specialized vocabularies
• nursing (NIC, NOC, NANDA, Omaha, PCDS)
• dentistry (CDT)
• oncology (PDQ)
• psychiatry (DSM, APA)
• adverse reactions (COSTART, WHO ART)
• primary care (ICPC)

Terminology of knowledge bases (AI/Rheum, DXplain, QMR)


Integrating subdomains

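[Figure: the UMLS at the center, linking subdomains through their terminologies: biomedical literature (MeSH), genome annotations (GO), model organisms (NCBI Taxonomy), genetic knowledge bases (OMIM), clinical repositories (SNOMED CT), anatomy (FMA), and other subdomains]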


Integrating subdomains

[Figure: the subdomains alone, without terminologies or the UMLS: biomedical literature, genome annotations, model organisms, genetic knowledge bases, clinical repositories, anatomy, and other subdomains]


Trans-namespace integration

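[Figure: the subdomain diagram again (genome annotations/GO, model organisms/NCBI Taxonomy, genetic knowledge bases/OMIM, anatomy/FMA, other subdomains), highlighting two namespaces. In the biomedical literature, MeSH has Addison Disease (D000224); in clinical repositories, SNOMED CT has Addison's disease (363732003). Both map to the UMLS concept C0001403.]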
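
The trans-namespace idea can be sketched as a lookup that goes from a source code to the shared UMLS concept and back out to another namespace. This is an illustrative Python sketch only: the table `CODE_TO_CUI` and the function `translate` are hypothetical names, the namespace labels are written as they appear on the slide, and the table holds just the two codes shown there.

```python
from typing import Dict, List, Tuple

# (namespace, code) -> UMLS concept identifier; both entries come from the slide.
CODE_TO_CUI: Dict[Tuple[str, str], str] = {
    ("MeSH", "D000224"): "C0001403",         # Addison Disease
    ("SNOMED CT", "363732003"): "C0001403",  # Addison's disease
}


def translate(namespace: str, code: str, target: str) -> List[str]:
    """Return codes in `target` that share a UMLS concept with (namespace, code)."""
    cui = CODE_TO_CUI.get((namespace, code))
    if cui is None:
        return []  # unknown code: nothing to map
    return sorted(
        other_code
        for (ns, other_code), other_cui in CODE_TO_CUI.items()
        if ns == target and other_cui == cui
    )


print(translate("MeSH", "D000224", "SNOMED CT"))  # ['363732003']
```

Routing every mapping through the concept identifier means each namespace needs only one link to the UMLS, rather than a separate mapping table for every pair of vocabularies.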


Addison’s Disease: Concept

Addison’s Disease

C0001403

• ADRENAL INSUFFICIENCY (ADDISON'S DISEASE)
• ADRENOCORTICAL INSUFFICIENCY, PRIMARY FAILURE
• Hypoadrenalisms, Primary
• Melasma addisonii
• Primary adrenal deficiency
• Asthenia pigmentosa
• Bronzed disease
• Insufficiency, adrenal primary
• Primary adrenocortical insufficiency
• Addison's, disease

• Maladie d'Addison - French
• Addison-Krankheit - German
• Morbo di Addison - Italian
• Doença de Addison - Portuguese
• АДДИСОНОВА БОЛЕЗНЬ - Russian
• アジソン病 - Japanese

An adrenal disease characterized by the progressive destruction of the adrenal cortex, resulting in insufficient production of aldosterone and hydrocortisone. Clinical symptoms include anorexia; nausea; weight loss; muscle weakness; and hyperpigmentation of the skin due to an increase in circulating levels of ACTH precursor hormone, which stimulates melanocytes.

Disease or Syndrome

SNOMED CT, SNOMED Intl, MeSH, MedDRA, …


Metathesaurus Relationships

• Symbolic relations: ~8 M pairs of concepts
• Statistical relations: ~6 M pairs of concepts (co-occurring concepts)
• Mapping relations: ~150,000

Categorization: Relationships between concepts and semantic types from the Semantic Network

[Figure: two layers. In the Metathesaurus layer, the concept Heart is linked to other concepts (Esophagus, Left Phrenic Nerve, Heart Valves, Fetal Heart, Mediastinum, Saccular Viscus, Angina Pectoris, Cardiotonic Agents, Tissue Donors). In the Semantic Network layer, these concepts are categorized by semantic types (Anatomical Structure; Fully Formed Anatomical Structure; Embryonic Structure; Body Part, Organ or Organ Component; Pharmacologic Substance; Disease or Syndrome; Population Group).]

UMLS Knowledge Sources

UMLS Semantic Network


Semantic Network

Semantic network relationships (54)
• hierarchical (isa = is a kind of)
  among types
  – Animal isa Organism
  – Enzyme isa Biologically Active Substance
  among relations
  – treats isa affects
• non-hierarchical
  – Sign or Symptom diagnoses Pathologic Function
  – Pharmacologic Substance treats Pathologic Function
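
Both kinds of relationship can be held in plain lookup tables. The sketch below is an assumption-laden illustration in Python, not the Semantic Network distribution format: it contains only the examples from this slide, assumes each node has at most one isa parent and that the links contain no cycles, and the names `TYPE_ISA`, `RELATION_ISA`, `ASSOCIATIVE`, `ancestors` and `holds` are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Child -> parent "isa" links, limited to the examples on the slide.
TYPE_ISA: Dict[str, str] = {
    "Animal": "Organism",
    "Enzyme": "Biologically Active Substance",
}
RELATION_ISA: Dict[str, str] = {"treats": "affects"}

# Non-hierarchical relationships as (subject type, relation, object type).
ASSOCIATIVE: Set[Tuple[str, str, str]] = {
    ("Sign or Symptom", "diagnoses", "Pathologic Function"),
    ("Pharmacologic Substance", "treats", "Pathologic Function"),
}


def ancestors(node: str, parent_of: Dict[str, str]) -> List[str]:
    """Follow isa links upward and return every ancestor, nearest first."""
    found: List[str] = []
    while node in parent_of:
        node = parent_of[node]
        found.append(node)
    return found


def holds(subject: str, relation: str, obj: str) -> bool:
    """True if the triple is asserted directly or through a more specific relation."""
    for s, r, o in ASSOCIATIVE:
        if s == subject and o == obj:
            if r == relation or relation in ancestors(r, RELATION_ISA):
                return True
    return False


print(ancestors("Enzyme", TYPE_ISA))
print(holds("Pharmacologic Substance", "affects", "Pathologic Function"))
```

Because treats isa affects, the second call succeeds even though only the treats triple is stored: a more specific relation implies its more general parent.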


“Biologic Function” hierarchy (isa)

Biologic Function
  Physiologic Function
    Organism Function
      Mental Process
    Organ or Tissue Function
    Cell Function
    Molecular Function
      Genetic Function
  Pathologic Function
    Disease or Syndrome
      Mental or Behavioral Dysfunction
      Neoplastic Process
    Cell or Molecular Dysfunction
    Experimental Model of Disease


Associative (non-isa) relationships

[Figure: Semantic Network diagram of associative relationships among semantic types. Types shown: Organism; Embryonic Structure; Anatomical Abnormality; Congenital Abnormality; Acquired Abnormality; Fully Formed Anatomical Structure; Anatomical Structure; Organism Attribute; Body Substance; Body System; Body Part, Organ or Organ Component; Tissue; Cell; Cell Component; Gene or Genome; Body Space or Junction; Body Location or Region; Finding; Laboratory or Test Result; Sign or Symptom; Biologic Function; Physiologic Function; Pathologic Function; Injury or Poisoning. Edge labels: process of, part of, conceptual part of, property of, contains, produces, evaluation of, adjacent to, location of, disrupts, co-occurs with.]


Why a semantic network?

Semantic Types serve as high level categories assigned to Metathesaurus concepts, independently of their position in a hierarchy

A relationship between 2 Semantic Types (ST) is a possible link between 2 concepts that have been assigned to those STs

• The relationship may or may not hold at the concept level
• Other relationships may apply at the concept level


Relationships can inherit semantics

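[Figure: two layers. In the Semantic Network layer, Body Part, Organ, or Organ Component isa Fully Formed Anatomical Structure, and Disease or Syndrome isa Pathologic Function isa Biologic Function; a location of relationship links the anatomical side to the functional side. In the Metathesaurus layer, Adrenal Cortex (Body Part, Organ, or Organ Component) is linked by location of to Adrenal Cortical hypofunction (Disease or Syndrome).]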
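
The inheritance shown in this figure can be sketched as a check that walks up the isa links of each concept's semantic type and looks for a relationship asserted between any pair of ancestors. This is a minimal Python sketch under stated assumptions: the figure is read as asserting location of between Fully Formed Anatomical Structure and Biologic Function, each type has a single isa parent, and all names (`TYPE_ISA`, `SN_RELATIONS`, `CONCEPT_TYPE`, `candidate_relations`) are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# isa links among semantic types, as drawn in the figure.
TYPE_ISA: Dict[str, str] = {
    "Body Part, Organ, or Organ Component": "Fully Formed Anatomical Structure",
    "Disease or Syndrome": "Pathologic Function",
    "Pathologic Function": "Biologic Function",
}

# Relationship between high-level semantic types (assumed reading of the figure).
SN_RELATIONS: Set[Tuple[str, str, str]] = {
    ("Fully Formed Anatomical Structure", "location of", "Biologic Function"),
}

# Semantic type assigned to each Metathesaurus concept in the figure.
CONCEPT_TYPE: Dict[str, str] = {
    "Adrenal Cortex": "Body Part, Organ, or Organ Component",
    "Adrenal Cortical hypofunction": "Disease or Syndrome",
}


def self_and_ancestors(sem_type: str) -> List[str]:
    """Return the type followed by all of its isa ancestors."""
    chain = [sem_type]
    while chain[-1] in TYPE_ISA:
        chain.append(TYPE_ISA[chain[-1]])
    return chain


def candidate_relations(concept_a: str, concept_b: str) -> Set[str]:
    """Relations the Semantic Network permits from concept_a to concept_b."""
    types_a = self_and_ancestors(CONCEPT_TYPE[concept_a])
    types_b = self_and_ancestors(CONCEPT_TYPE[concept_b])
    return {r for (s, r, o) in SN_RELATIONS if s in types_a and o in types_b}


print(candidate_relations("Adrenal Cortex", "Adrenal Cortical hypofunction"))
```

The result is only a set of candidates. As noted on the previous slide, a relationship permitted between two semantic types may or may not hold between a given pair of concepts.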

UMLS and semantic integration

Data integration questions


Semantic interoperability through the UMLS

Metathesaurus: Terminology/ontology integration
• Terms from various terminologies linked through UMLS

Semantic Network: Top domain ontology
• Framework for semantic categorization of concepts
• Template for potential relations among concepts


Potential contribution of UMLS to integration

Data consistency
• SN as a source of domain and range constraints for relations

Data query
• Resolve terms into concepts
• Source of synonymy
• [Lexical variants, normalization]

Service query
• Service interoperability
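
Resolving terms into concepts for data query can be illustrated with a toy normalizer and index. The Python sketch below is a deliberately crude stand-in and does not reproduce the lvg lexical programs: `normalize`, `NAMES`, `INDEX` and `resolve` are hypothetical names, the normalization steps (lowercasing, dropping possessive 's, stripping punctuation, sorting words) are assumptions chosen for illustration, and the names are a small sample taken from the Addison's disease slide.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set


def normalize(term: str) -> str:
    """Crude normalization: lowercase, drop possessive 's, strip punctuation, sort words."""
    lowered = re.sub(r"['’]s\b", "", term.lower())
    words = re.findall(r"[a-z0-9]+", lowered)  # ASCII only; other scripts need more care
    return " ".join(sorted(words))


# A few names of C0001403 taken from the Addison's disease slide.
NAMES: Dict[str, List[str]] = {
    "C0001403": [
        "Addison's Disease",
        "Insufficiency, adrenal primary",
        "Primary adrenocortical insufficiency",
    ],
}

# Normalized string -> set of concept identifiers.
INDEX: Dict[str, Set[str]] = defaultdict(set)
for cui, names in NAMES.items():
    for name in names:
        INDEX[normalize(name)].add(cui)


def resolve(term: str) -> Set[str]:
    """Return the concept identifiers whose names normalize to the same string."""
    return set(INDEX.get(normalize(term), set()))


print(resolve("Addison Disease"))
print(resolve("primary adrenal insufficiency"))
```

Both queries resolve to C0001403 although neither string is stored verbatim: the first differs from a stored name only by the possessive, the second only by word order and punctuation.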


Potential contribution of UMLS to integration

Provenance
• Rich source of metadata about terms

Data integration
• Map terms/concepts across vocabularies
• Data integration through terminology integration

Semantic mediation
• UMLS as the global schema

Reasoning
• Limited

[Mougin, DILS 2008]


Data, metadata and semantics

Not specifically in UMLS

caBIG
• Cancer Biomedical Informatics Grid
• http://cabig.cancer.gov/
• National Cancer Institute

Cancer Data Standards Registry and Repository (caDSR)
• http://ncicb.nci.nih.gov/NCICB/infrastructure/cacore_overview/cadsr
• Common data elements
• Metadata repository


Use of the Metathesaurus in applications

• Indexing, semantic annotation, coding
• Mapping across vocabularies
• Aggregation
• Support for Natural Language Processing applications (entity recognition)
• Source of value sets for information models


Use of the Semantic Network in applications

• Partition concepts into subdomains
• Aggregation
• Support for Natural Language Processing applications (language understanding)
• Consistency checking of relations

Medical Ontology Research

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Contact: [email protected]
Web:


References

UMLS: umlsinfo.nlm.nih.gov

UMLS browsers (free, but UMLS license required)
• Knowledge Source Server: umlsks.nlm.nih.gov
• Semantic Navigator: http://mor.nlm.nih.gov/perl/semnav.pl
• RRF browser (standalone application distributed with the UMLS)


References

Recent overviews
• Bodenreider O. (2004). The Unified Medical Language System (UMLS): Integrating biomedical terminology. Nucleic Acids Research; D267-D270.
• Bodenreider O. From terminology integration to information integration: Unified Medical Language System (UMLS). BioRDF Teleconference, W3C Semantic Web Health Care and Life Sciences Interest Group, June 5, 2006. http://mor.nlm.nih.gov/pubs/pres/060605-BioRDF.pdf


Biodiversity in the UMLS