Description: A summary of the UMLS (Unified Medical Language System) and related research. These slides were presented in a medical information standards class.

Medical Informatics Laboratory, Department of Biomedical Engineering, College of Medicine, Seoul National Univ.

Eunsil Yoon

U.S. National Library of Medicine, National Institutes of Health

UMLS (The Unified Medical Language System)

2012.11.29 Reviewed by Eunsil Yoon

Contents

• Introduction

– What is the UMLS?

– UMLS in use

– www.nlm.nih.gov/research/umls

• The Three UMLS Tools (Knowledge Sources)

– Metathesaurus

– Semantic network

– SPECIALIST Lexicon

• UMLS in JAMIA papers

What is the UMLS?

• Started in 1986 by the NLM (U.S. National Library of Medicine)

• NLM is a member of the IHTSDO (International Health Terminology Standards Development Organisation), the owner of SNOMED CT

What is the UMLS?

• Unified Medical Language System® (UMLS®)

• A set of files and software that brings together many health and biomedical vocabularies and standards to enable interoperability between computer systems.

• You can use the UMLS to enhance or develop applications, such as electronic health records, classification tools, dictionaries and language translators.

The UMLS is not an end-user application.

NLM main page

NLM > UMLS

NLM > UMLS > UTS

NLM > UMLS > UTS > Metathesaurus browser

Metathesaurus Browser > Synonyms

Synonyms (246), excerpt: (Acute nasopharyngitis or rhinitis) or (common cold); (Acute nasopharyngitis or rhinitis) or (common cold) (disorder); ARNAS IBILBIDE GARAIETAKO ZOLDURA/ HOTZALDI ARRUNTA; Acut nasopharyngitis (meghűlés); Acut rhinitis; Acute Nasopharyngitis; Acute coryza; Acute infectie bovenste luchtwegen; Acute infective rhinitis; Acute nasal catarrh; Acute nasofaryngitis [verkoudheid]; Acute nasopharyngitis; Acute nasopharyngitis (common cold); Acute nasopharyngitis [common cold]; Acute nasopharyngitis, NOS; Acute rhinitis; Acute rhinitis (disorder); Akute Rhinopharyngitis [Erkaeltungsschnupfen]; Akutní nazofaryngitida; Akutní rinitida; Akutní zánět nosohltanu (prosté nachlazení); COLD; COMMON COLD; CORIZA; CORYZA

ПРОСТУДА; かぜ; かぜひき; かぜ症候群; コリーザ - 急性; 急性コリーザ; 急性鼻咽頭炎; 急性鼻咽頭炎(感冒); 急性鼻炎; 感冒; 感冒 - 普通; 感冒症候群; 感染性鼻炎; 普通感冒; 頭部感冒; 風邪; 鼻感冒; 鼻炎(感染性); 급성 코인두염 [ 감기 ]; カンセンセイビエン; カンボウ; カンボウショウコウグン; キュウセイハナイントウエンカンボウ; キュウセイビイントウエン

Metathesaurus Browser > Relations

NLM > UMLS > UTS > Metathesaurus browser

NLM > UMLS > UTS > Semantic Network Browser

NLM > UMLS > UTS > Semantic Network Browser

The Three UMLS Tools (Knowledge Sources)

• Metathesaurus

• Semantic Network

• SPECIALIST Lexicon

Metathesaurus

• The Metathesaurus is a large, multi-purpose, and multilingual vocabulary database that contains information about biomedical and health-related concepts, their various names, and the relationships among them.

• Over 100 vocabularies, code sets, and thesauri, or "source vocabularies", are brought together to create the Metathesaurus.

• Names from the source vocabularies are organized by meaning, and each concept is assigned a concept unique identifier (CUI).

• 62% of the Metathesaurus source vocabularies are in English.

• The Metathesaurus also contains terms from 17 other languages.

Ex. "Atrial Fibrillation": one concept, named differently by each source vocabulary

– Atrial fibrillation (ICD-9-CM)

– AF (NCI Thesaurus)

– AFib (MedDRA)

– Atrial fibrillation (disorder) (SNOMED Clinical Terms)

– atrium; fibrillation (ICPC2-ICD10 Thesaurus)

Metathesaurus Basic organization

• Concepts

– Synonymous terms are clustered into a concept

– Properties are attached to concepts, e.g.,

• Unique identifier

• Definition

• Relations

– Concepts are related to other concepts

– Properties are attached to relations, e.g.,

• Type of relationship

• Source
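The concept/relation organization above can be sketched as two small record types. This is a minimal illustration under stated assumptions, not the Metathesaurus file layout: the class names, the CUIs `C_HYPO_AF` and `C_HYPO_ARRHYTHMIA`, the relation label `"parent"` and the source `SRC_HYPO` are all hypothetical. The names clustered into the concept are the "Atrial Fibrillation" synonyms listed earlier.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A cluster of synonymous names sharing one meaning (hypothetical model)."""
    cui: str                                      # unique identifier
    names: set[str] = field(default_factory=set)  # synonymous terms
    definition: str = ""                          # optional definition


@dataclass(frozen=True)
class Relation:
    """A link between two concepts that carries its own properties."""
    from_cui: str
    to_cui: str
    rel_type: str  # type of relationship (hypothetical label)
    source: str    # source vocabulary that asserted the relation


def related(cui: str, relations: list[Relation], rel_type: str) -> list[str]:
    """Return the CUIs linked from `cui` by relations of the given type."""
    return [r.to_cui for r in relations if r.from_cui == cui and r.rel_type == rel_type]


# Usage: four source names clustered into one (hypothetical) concept.
af = Concept(
    cui="C_HYPO_AF",
    names={"Atrial fibrillation", "AF", "AFib", "Atrial fibrillation (disorder)"},
    definition="(definition text would go here)",
)
relations = [Relation("C_HYPO_AF", "C_HYPO_ARRHYTHMIA", "parent", "SRC_HYPO")]
print(related(af.cui, relations, "parent"))  # ['C_HYPO_ARRHYTHMIA']
```

Keeping the type and source on the relation itself, rather than on the concept, mirrors the slide's point that relations have their own properties.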

Metathesaurus - subsets

• Users create a useful subset, or smaller grouping of concepts, by choosing source vocabularies.

• Examples of subsets include

– Source vocabularies in a language (all Spanish vocabularies)

– All terms that are free for use within the United States

– CPT codes to be used for billing purposes

– Terms with the semantic type 'Clinical Drug'
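In practice, subsets are built with NLM's MetamorphoSys installer. The sketch below only illustrates the idea that a subset is a selection rule applied to atoms. The `Atom` fields, the source names `SRC_EN`, `SRC_ES` and `SRC_DRUGS`, and the example rows are hypothetical; `ENG` and `SPA` are UMLS-style language codes.

```python
from typing import Callable, Iterable, NamedTuple


class Atom(NamedTuple):
    string: str
    sab: str  # source vocabulary abbreviation (hypothetical values below)
    lat: str  # language code, e.g. "ENG", "SPA"
    sty: str  # semantic type of the atom's concept


def make_subset(atoms: Iterable[Atom], keep: Callable[[Atom], bool]) -> list[Atom]:
    """Keep only the atoms that satisfy the selection rule."""
    return [a for a in atoms if keep(a)]


atoms = [
    Atom("Common cold", "SRC_EN", "ENG", "Disease or Syndrome"),
    Atom("Resfriado común", "SRC_ES", "SPA", "Disease or Syndrome"),
    Atom("Aspirin 81 MG Oral Tablet", "SRC_DRUGS", "ENG", "Clinical Drug"),
]

# Subset by language (e.g. all Spanish vocabularies).
spanish = make_subset(atoms, lambda a: a.lat == "SPA")
# Subset by a chosen set of source vocabularies.
chosen = {"SRC_EN", "SRC_DRUGS"}
by_source = make_subset(atoms, lambda a: a.sab in chosen)
# Subset by semantic type.
clinical_drugs = make_subset(atoms, lambda a: a.sty == "Clinical Drug")
print([a.string for a in clinical_drugs])  # ['Aspirin 81 MG Oral Tablet']
```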

Metathesaurus – Unique Identifiers

• Concept Unique Identifiers (CUI)

– A concept is a meaning. A meaning can have many different names. A key

goal of Metathesaurus construction is to understand the intended meaning of

each name in each source vocabulary and to link all the names from all of

the source vocabularies that mean the same thing (the synonyms).

• Lexical (term) Unique Identifiers (LUI)

– LUIs link strings that are lexical variants. Lexical variants are detected using the Lexical Variant Generator (LVG) program, one of the UMLS lexical tools.

• String Unique Identifiers (SUI)

– Each unique concept name or string in each language in the Metathesaurus has a unique and permanent string identifier (SUI). Any variation in character set, upper/lower case, or punctuation makes a separate string, with a separate SUI. A SUI consists of the letter S followed by seven digits. In the example on the right there are four strings with four different SUIs.

• Atom Unique Identifiers (AUI)

– The basic building blocks, or "atoms", from which the Metathesaurus is constructed are the concept names or strings from each of the source vocabularies. Every occurrence of a string in each source vocabulary is assigned a unique atom identifier (AUI).
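The relationship between LUI, SUI and AUI can be sketched with a toy registry that hands out one ID per distinct key. This is only an illustration under stated assumptions. `crude_norm` is a hypothetical stand-in that is much cruder than LVG normalization. AUIs are keyed here on (string, source) only, which is a simplification. IDs are numbered sequentially for readability, whereas real identifiers are permanent across releases. The CUI is fixed by hand because synonymy is an editorial decision, not something computed from the strings.

```python
import re
from itertools import count


def crude_norm(s: str) -> str:
    """Hypothetical stand-in for LVG normalization: lowercase, drop punctuation, sort words."""
    return " ".join(sorted(re.findall(r"[a-z0-9]+", s.lower())))


class IdRegistry:
    """Hands out one stable identifier per distinct key, e.g. S0000001, S0000002, ..."""

    def __init__(self, prefix: str) -> None:
        self.prefix = prefix
        self._ids: dict[object, str] = {}
        self._counter = count(1)

    def get(self, key: object) -> str:
        if key not in self._ids:
            self._ids[key] = f"{self.prefix}{next(self._counter):07d}"
        return self._ids[key]


auis, suis, luis = IdRegistry("A"), IdRegistry("S"), IdRegistry("L")

# (string, source vocabulary) occurrences, all judged synonymous (one concept).
occurrences = [
    ("Atrial fibrillation", "SRC1"),
    ("Atrial fibrillation", "SRC2"),
    ("atrial fibrillation", "SRC3"),
    ("Fibrillation, atrial", "SRC4"),
]

rows = [
    {
        "CUI": "C_HYPO_AF",                   # hypothetical; set by editorial judgement
        "LUI": luis.get(crude_norm(string)),  # lexical variants share one LUI
        "SUI": suis.get(string),              # exact string: case and punctuation matter
        "AUI": auis.get((string, sab)),       # one per occurrence in a source
        "STR": string,
        "SAB": sab,
    }
    for string, sab in occurrences
]

for row in rows:
    print(row["AUI"], row["SUI"], row["LUI"], row["STR"])
```

Four atoms collapse to three SUIs (the two "Atrial fibrillation" occurrences share one), and all three strings collapse to one LUI because they normalize to the same word set.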

Metathesaurus – Unique Identifiers > Atom

[Figure: atom examples, with labels "obsolete" and "suppressible"]

Metathesaurus – Data Files

• The Metathesaurus consists of forty data, metadata, and index files.

• The data files listed below contain information obtained from the source vocabularies.

• The table below illustrates what information populates each data file.

File name      Contents
MRCONSO.RRF    Names, Synonyms, Terms, Term Types, Codes
MRREL.RRF      Relationships
MRHIER.RRF     Hierarchies
MRSAT.RRF      Attributes
MRDEF.RRF      Definitions
MRMAP.RRF      Mappings
MRSMAP.RRF     Simplified Mappings
MRSTY.RRF      Semantic Types
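RRF files are pipe-delimited text, one row per line, with a trailing `|`. The sketch below reads MRCONSO.RRF rows into dictionaries. The column order is the one documented for MRCONSO in the UMLS Reference Manual, but treat it as an assumption and confirm it against the MRFILES.RRF metadata of the release you use. The sample row is made up, and all of its identifiers are hypothetical.

```python
from typing import Iterator

# Assumed MRCONSO.RRF column order; verify against MRFILES.RRF in your release.
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_rrf_line(line: str, columns: list[str]) -> dict[str, str]:
    """Split one pipe-delimited RRF row into a {column: value} dict."""
    fields = line.rstrip("\r\n").split("|")
    # Each RRF row ends with "|", which produces one extra empty field.
    if fields and fields[-1] == "":
        fields.pop()
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return dict(zip(columns, fields))


def iter_mrconso(path: str) -> Iterator[dict[str, str]]:
    """Stream MRCONSO.RRF row by row instead of loading the large file at once."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            yield parse_rrf_line(line, MRCONSO_COLUMNS)


# Usage with a made-up row (all identifiers hypothetical).
values = ["C0000001", "ENG", "P", "L0000001", "PF", "S0000001", "Y", "A0000001",
          "", "", "", "SRC1", "PT", "X1", "Atrial fibrillation", "0", "N", ""]
row = parse_rrf_line("|".join(values) + "|\n", MRCONSO_COLUMNS)
print(row["CUI"], row["SAB"], row["STR"])  # C0000001 SRC1 Atrial fibrillation
```

Only one trailing empty field is dropped, so a legitimately empty last column (CVF above) is still preserved.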

Metathesaurus – Data Files > RRF (Rich Release Format)

The Semantic Network

• The Semantic Network

– Semantic types (high level categories)

– Semantic relationships (relationships between semantic types)

• The Semantic Network can be used to categorize any medical vocabulary.

• 133 semantic types in the Semantic Network

• Every Metathesaurus concept is assigned at least one semantic

type; very few terms are assigned as many as five semantic types.
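The semantic type assignments live in MRSTY.RRF, whose rows pair a CUI with a semantic type (alongside further columns such as the type's identifier). The sketch below groups types by concept and checks the "at least one semantic type" rule. It uses only (CUI, type name) pairs, and the CUIs are hypothetical.

```python
from collections import defaultdict

# (CUI, semantic type) pairs, the core content of MRSTY.RRF; CUIs are hypothetical.
mrsty_rows = [
    ("C_HYPO_1", "Disease or Syndrome"),
    ("C_HYPO_2", "Organic Chemical"),
    ("C_HYPO_2", "Pharmacologic Substance"),
]


def types_by_concept(rows: list[tuple[str, str]]) -> dict[str, list[str]]:
    """Group semantic types by concept; one concept may carry several types."""
    grouped: dict[str, list[str]] = defaultdict(list)
    for cui, sty in rows:
        grouped[cui].append(sty)
    return dict(grouped)


def untyped(cuis: set[str], grouped: dict[str, list[str]]) -> list[str]:
    """Concepts violating 'every concept has at least one semantic type'."""
    return sorted(c for c in cuis if not grouped.get(c))


grouped = types_by_concept(mrsty_rows)
print(grouped["C_HYPO_2"])                                     # ['Organic Chemical', 'Pharmacologic Substance']
print(untyped({"C_HYPO_1", "C_HYPO_2", "C_HYPO_3"}, grouped))  # ['C_HYPO_3']
```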

The Semantic Network - Types

• Entity

• A broad type for grouping physical and

conceptual entities.

• Examples of Entity semantic types are:

• Amphibian

• Gene or Genome

• Carbohydrate

• Event

• A broad type for grouping activities,

processes and states.

• Examples of Event semantic types are:

• Social Behavior

• Laboratory Procedure

• Mental Process

Semantic Network: Physical Object (figure; semantic types shown, grouped under the main branches of Physical Object, which sits beside Conceptual Entity under Entity)

– Organism: Plant; Alga; Fungus; Virus; Rickettsia or Chlamydia; Bacterium; Archaeon; Animal; Invertebrate; Vertebrate; Amphibian; Bird; Fish; Reptile; Mammal; Human

– Anatomical Structure: Embryonic Structure; Anatomical Abnormality; Congenital Abnormality; Acquired Abnormality; Fully Formed Anatomical Structure; Body Part, Organ, or Organ Component; Tissue; Cell; Cell Component; Gene or Genome

– Manufactured Object: Medical Device; Research Device; Clinical Drug

– Substance: Body Substance; Food; Chemical; Chemical Viewed Functionally; Pharmacologic Substance; Antibiotic; Biomedical or Dental Material; Biologically Active Substance; Neuroreactive Substance or Biogenic Amine; Hormone; Enzyme; Vitamin; Immunologic Factor; Receptor; Indicator, Reagent, or Diagnostic Aid; Hazardous or Poisonous Substance; Chemical Viewed Structurally; Organic Chemical; Amino Acid, Peptide, or Protein; Nucleic Acid, Nucleoside, or Nucleotide; Organophosphorus Compound; Carbohydrate; Lipid; Steroid; Eicosanoid; Inorganic Chemical; Element, Ion, or Isotope

Semantic Network: Conceptual Entity (figure; semantic types shown, grouped under the main branches of Conceptual Entity, which sits beside Physical Object under Entity)

– Idea or Concept: Temporal Concept; Qualitative Concept; Quantitative Concept; Functional Concept; Body System; Spatial Concept; Body Space or Junction; Body Location or Region; Molecular Sequence; Nucleotide Sequence; Amino Acid Sequence; Carbohydrate Sequence; Geographic Area

– Finding: Laboratory or Test Result; Sign or Symptom

– Organism Attribute: Clinical Attribute

– Intellectual Product: Classification; Regulation or Law

– Language

– Occupation or Discipline: Biomedical Occupation or Discipline

– Organization

– Group Attribute

– Group: Professional or Occupational Group; Population Group; Family Group; Age Group; Patient or Disabled Group

Semantic Network - Event (figure; semantic types shown, grouped under the two main branches of Event)

– Activity: Behavior; Social Behavior; Individual Behavior; Daily or Recreational Activity; Occupational Activity; Health Care Activity; Laboratory Procedure; Diagnostic Procedure; Therapeutic or Preventive Procedure; Research Activity; Molecular Biology Research Technique; Governmental or Regulatory Activity; Educational Activity; Machine Activity

– Phenomenon or Process: Human-caused Phenomenon or Process; Environmental Effect of Humans; Natural Phenomenon or Process; Injury or Poisoning; Biologic Function; Physiologic Function; Organism Function; Mental Process; Organ or Tissue Function; Cell Function; Molecular Function; Genetic Function; Pathologic Function; Disease or Syndrome; Mental or Behavioral Dysfunction; Neoplastic Process; Cell or Molecular Dysfunction; Experimental Model of Disease

The Semantic Network - Relationships

• 54 Semantic Relationships

• The primary link between most semantic

types is the ‘isa’ relationship.

• Animal isa Entity

• Carbohydrate isa Chemical

• Human isa Mammal

[ Relation Label ]

isa

part_of

result_of

co-occurs_with

evaluation_of

location_of
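Because `isa` links form a hierarchy, statements such as "Animal isa Entity" hold through a chain of intermediate types. The sketch below follows parent links upward to answer such questions. `ISA_PARENT` is a small hand-copied fragment of the Semantic Network used for illustration only. Release files should be the authority, since the hierarchy has changed between versions.

```python
# Hand-copied fragment of the Semantic Network 'isa' hierarchy (illustrative only).
ISA_PARENT = {
    "Physical Object": "Entity",
    "Organism": "Physical Object",
    "Animal": "Organism",
    "Vertebrate": "Animal",
    "Mammal": "Vertebrate",
    "Human": "Mammal",
    "Substance": "Physical Object",
    "Chemical": "Substance",
    "Chemical Viewed Structurally": "Chemical",
    "Organic Chemical": "Chemical Viewed Structurally",
    "Carbohydrate": "Organic Chemical",
}


def ancestors(sty: str) -> list[str]:
    """Follow 'isa' links upward, nearest parent first."""
    chain, seen = [], {sty}
    while sty in ISA_PARENT:
        sty = ISA_PARENT[sty]
        if sty in seen:  # guard against accidental cycles in hand-made data
            raise ValueError(f"cycle at {sty}")
        seen.add(sty)
        chain.append(sty)
    return chain


def isa(child: str, ancestor: str) -> bool:
    """True if `ancestor` is reachable from `child` through 'isa' links."""
    return ancestor in ancestors(child)


print(isa("Human", "Mammal"), isa("Animal", "Entity"), isa("Carbohydrate", "Chemical"))  # True True True
```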

The Semantic Network - Relationships

SPECIALIST Lexicon

• A lexicon is necessarily a core component of any natural language processing system.

• Coverage includes both commonly occurring English words and biomedical vocabulary discovered in the NLM Test Collection and the UMLS Metathesaurus.

• The lexicon entry for each word or term records its syntactic, morphological, and graphemic information.

– Syntactic information includes syntactic category (part of speech) and complementation patterns for verbs, adjectives, and nouns, as well as positional and modification types for adjectives and adverbs.

– Inflectional morphology is indicated for those syntactic categories that inflect, and spelling variation is recorded for each lexical item known to exhibit such variation.
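To make the idea of a lexicon entry concrete, here is a hypothetical, much-simplified entry that records a base form, a syntactic category, and spelling variants, and generates regular noun plurals. The real SPECIALIST Lexicon uses its own record format (see its documentation). `LexEntry` and `regular_plural` are illustrative names, and the plural rule only roughly approximates regular English inflection.

```python
from dataclasses import dataclass, field

VOWELS = set("aeiou")


def regular_plural(noun: str) -> str:
    """Rough approximation of regular English noun plurals."""
    if noun.endswith(("s", "x", "z", "ch", "sh")):
        return noun + "es"
    if noun.endswith("y") and len(noun) > 1 and noun[-2] not in VOWELS:
        return noun[:-1] + "ies"
    return noun + "s"


@dataclass
class LexEntry:
    """Hypothetical, simplified lexicon entry (not the SPECIALIST record format)."""
    base: str
    cat: str                                                     # syntactic category
    spelling_variants: list[str] = field(default_factory=list)  # e.g. US/UK spellings
    inflects_regularly: bool = True

    def forms(self) -> set[str]:
        """All spellings plus their inflected forms."""
        spellings = [self.base, *self.spelling_variants]
        result = set(spellings)
        if self.cat == "noun" and self.inflects_regularly:
            result |= {regular_plural(s) for s in spellings}
        return result


entry = LexEntry("anesthetic", "noun", ["anaesthetic"])
print(sorted(entry.forms()))  # ['anaesthetic', 'anaesthetics', 'anesthetic', 'anesthetics']
```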

SPECIALIST NLP Tools

Related Research

[1] Wu ST, Liu H, et al. (2012). Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis. Journal of the American Medical Informatics Association (JAMIA), 19(e1), e149–e156.

[1] UMLS term occurrences in clinical notes

• Objective

– To characterise empirical instances of Unified Medical Language System (UMLS) Metathesaurus term strings in a large clinical corpus, and to illustrate what types of term characteristics are generalisable across data sources.

• Data Sources

– The data source for the corpus analysis of clinical text was Mayo Clinic clinical notes written between 1 January 2001 and 31 December 2010, retrieved from Mayo's Enterprise Data Trust (EDT).

– 51,945,627 documents

– 296,167 unique terms

– 2,319,010,575 case-insensitive exact term matches
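Case-insensitive exact term matching can be sketched with one whole-word regular expression per term. This is not the authors' pipeline. The notes and terms below are made up, and the whole-word boundary rule is an assumption. A per-term regex loop would also not scale to hundreds of thousands of terms over tens of millions of notes, where a dictionary- or trie-based matcher would be the practical choice.

```python
import re
from collections import Counter


def count_term_matches(notes: list[str], terms: list[str]) -> Counter:
    """Count case-insensitive, whole-word exact matches of each term across notes."""
    # Terms differing only in case (e.g. "Common cold" / "COMMON COLD") share one key.
    patterns = {
        t.casefold(): re.compile(r"(?<!\w)" + re.escape(t) + r"(?!\w)", re.IGNORECASE)
        for t in terms
    }
    counts: Counter = Counter()
    for note in notes:
        for key, pattern in patterns.items():
            counts[key] += len(pattern.findall(note))
    return counts


notes = [
    "Pt with atrial fibrillation. AFib rate controlled.",
    "No common cold; COMMON COLD symptoms resolved.",
]
terms = ["Atrial fibrillation", "AFib", "Common cold", "COMMON COLD", "fibrillation"]
counts = count_term_matches(notes, terms)
print(counts["common cold"], sum(counts.values()))  # 2 5
```

A nested term such as "fibrillation" is counted separately from "atrial fibrillation" here. Whether to count such overlaps is a design choice the sketch does not settle.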

[1] UMLS term occurrences in clinical notes

• Corpus Analysis – Word Statistics

• Figure 1 shows histograms of the number of words in the UMLS and in the subset that is empirically found in Mayo Clinic data.

[1] UMLS term occurrences in clinical notes

• Corpus Analysis - Term Frequency

[1] UMLS term occurrences in clinical notes

• Corpus Analysis – Source Terminology

[1] UMLS term occurrences in clinical notes

• Corpus Analysis – syntactic categories

[1] UMLS term occurrences in clinical notes

• Cross-Institutional analysis

① Special characters

② Maximum number of words

③ Maximum number of characters

④ Language

⑤ Source terminology

⑥ Semantic group

⑦ Empirical occurrence filter

⑧ Term frequency

Source terminologies:

• SNOMED CT
• Consumer Health Vocabulary
• National Cancer Institute (NCI) Thesaurus
• Medical Subject Headings (MSH)
• Read Codes
• Medical Dictionary for Regulatory Activities Terminology (MedDRA)
• SNOMED International
• MEDCIN
• UMLS Metathesaurus
• National Drug File Reference Terminology (NDF-RT)
• The original SNOMED
• Online Mendelian Inheritance in Man (OMIM)
• Logical Observation Identifiers Names and Codes (LOINC)
• Computer Retrieval of Information on Scientific Projects (CRISP)

Semantic groups:

• Anatomy
• Chemicals & Drugs
• Concepts & Ideas
• Disorders
• Living Beings
• Physiology
• Procedures
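Filters ① to ⑧ can be combined into a single keep/drop predicate over candidate terms. This is a sketch only. The slide does not give the study's definitions or cut-offs, so every threshold, the "special character" rule, the source names and the example counts below are hypothetical placeholders. `ENG` and `GER` are UMLS-style language codes.

```python
import re
from dataclasses import dataclass
from typing import FrozenSet, Mapping, Optional


@dataclass(frozen=True)
class Term:
    string: str
    lat: str    # language code
    sab: str    # source terminology (hypothetical values below)
    group: str  # semantic group, e.g. "Disorders"


@dataclass(frozen=True)
class FilterConfig:
    # All thresholds are hypothetical placeholders, not values from the study.
    max_words: int = 6
    max_chars: int = 40
    languages: FrozenSet[str] = frozenset({"ENG"})
    sources: Optional[FrozenSet[str]] = None  # None means "any source"
    groups: Optional[FrozenSet[str]] = None   # None means "any semantic group"
    min_count: int = 1                        # empirical occurrence / frequency cut-off


# Hypothetical "special character" rule: anything but letters, digits, space, comma, hyphen.
SPECIAL = re.compile(r"[^A-Za-z0-9 ,\-]")


def keep(term: Term, cfg: FilterConfig, counts: Mapping[str, int]) -> bool:
    s = term.string
    return (
        not SPECIAL.search(s)                                 # (1) special characters
        and len(s.split()) <= cfg.max_words                   # (2) number of words
        and len(s) <= cfg.max_chars                           # (3) number of characters
        and term.lat in cfg.languages                         # (4) language
        and (cfg.sources is None or term.sab in cfg.sources)  # (5) source terminology
        and (cfg.groups is None or term.group in cfg.groups)  # (6) semantic group
        and counts.get(s.casefold(), 0) >= cfg.min_count      # (7)-(8) occurrence, frequency
    )


terms = [
    Term("Atrial fibrillation", "ENG", "SRC_A", "Disorders"),                  # kept
    Term("Acute nasopharyngitis [common cold]", "ENG", "SRC_A", "Disorders"),  # brackets
    Term("Akute Rhinopharyngitis", "GER", "SRC_B", "Disorders"),               # language
    Term("Acute rhinitis", "ENG", "SRC_A", "Disorders"),                       # never seen
]
counts = {"atrial fibrillation": 12, "acute nasopharyngitis [common cold]": 3,
          "akute rhinopharyngitis": 1}
kept = [t.string for t in terms if keep(t, FilterConfig(), counts)]
print(kept)  # ['Atrial fibrillation']
```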

[1] UMLS term occurrences in clinical notes

Reference

• UMLS; http://www.nlm.nih.gov/research/umls

• UMLS Basics Tutorial; http://www.nlm.nih.gov/research/umls/new_users/online_learning/index.htm

• UTS; https://uts.nlm.nih.gov/

• Wu ST, Liu H, et al. (2012). Unified Medical Language System term occurrences in clinical notes: a large-scale corpus analysis. Journal of the American Medical Informatics Association (JAMIA), 19(e1), e149–e156.

• 한승빈, 김승희, 최진욱. "A new file structure for the UMLS Metathesaurus 2004: introduction to the Rich Release Format (RRF)" (in Korean).