Betsy L. Humphreys
Associate Director for Library Operations
NLM, NIH, HHS
National Library of Medicine

CENDI Staff Workshop
Knowledge Organization Systems: Current and Future Uses
September 16, 2004

NLM “Knowledge Organization Systems”

- Name and Series/Journal Authority Files
- Library Materials Classification
- Individual Controlled Vocabularies
  - MeSH, MedlinePlus Health Topics, NCBI Taxonomy, RxNorm clinical drug vocabulary
- Unified Medical Language System (UMLS) Knowledge Sources
  - Metathesaurus: many vocabularies in a common, integrated format
  - Semantic Network
  - Lexicon
  - Associated tools

NLM “Knowledge Organization Systems”

Common Characteristics
- Searchable on the Web, often interlinked with other NLM resources
- Distributed in one or more electronic formats
- Used within NLM for:
  - Information retrieval and display
  - Data creation
  - Natural language interpretation
- Heavily used outside NLM for a wide range of applications
- Most built and maintained with custom systems

http://wwwcf.nlm.nih.gov/class/

Medical Subject Headings (MeSH)

Structure of MeSH upgraded in 2000:
- Descriptor Class: closely related concepts grouped to enhance retrieval
- Concept: distinct meaning
- Term: concept name

http://www.nlm.nih.gov/mesh/meshrels.html
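The descriptor/concept/term layering can be pictured as a small nested data structure. The Python sketch below is illustrative only: the class names, fields, and sample names are hypothetical and do not reproduce MeSH's actual record format. It shows why grouping concepts under a descriptor helps retrieval: a search on any term of any grouped concept leads to the same descriptor.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A name for a concept (hypothetical model, not the MeSH record format)."""
    name: str


@dataclass
class Concept:
    """A distinct meaning, named by a preferred term plus optional entry terms."""
    preferred: Term
    entry_terms: list[Term] = field(default_factory=list)

    def names(self) -> list[str]:
        return [self.preferred.name] + [t.name for t in self.entry_terms]


@dataclass
class Descriptor:
    """Closely related concepts grouped together to enhance retrieval."""
    preferred_concept: Concept
    other_concepts: list[Concept] = field(default_factory=list)

    def retrieval_names(self) -> list[str]:
        # Any name of any grouped concept points back to this one descriptor.
        names = self.preferred_concept.names()
        for concept in self.other_concepts:
            names.extend(concept.names())
        return names


# Hypothetical example content; not copied from an actual MeSH record.
headache = Descriptor(
    preferred_concept=Concept(Term("Headache"), [Term("Cephalgia")]),
    other_concepts=[Concept(Term("Hemicrania"))],
)

print(headache.retrieval_names())  # ['Headache', 'Cephalgia', 'Hemicrania']
```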

Known Translations of MeSH

- In UMLS: Dutch, Finnish, French, German, Italian, Japanese, Portuguese, Russian, Spanish, Swedish
- Other complete translations: Arabic, Chinese, Czech, Greek, Thai, Turkish
- In progress, planned, or hoped for: Korean, Slovenian, Vietnamese, Lithuanian, Polish, Slovakian, Norwegian, Kiswahili

Coordinating Translations: How?

- Single database with a Web interface
- Language added as a term property
- Translated terms added to the existing concept
- Non-English concepts added to the descriptor
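The coordination approach can be sketched as operations on one shared record: each term carries a language property, a translation becomes another term of the existing concept, and a concept with no English counterpart is attached directly to the descriptor. The Python below is a hypothetical model, not the actual translation system; the function names are invented for illustration, and the three-letter language codes merely follow the style of the UMLS LAT field (ENG, FRE, GER).

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    name: str
    language: str  # language stored as a property of each term


@dataclass
class Concept:
    terms: list[Term] = field(default_factory=list)


@dataclass
class Descriptor:
    concepts: list[Concept] = field(default_factory=list)


def add_translation(concept: Concept, name: str, language: str) -> None:
    """A translated term is attached to the existing concept."""
    concept.terms.append(Term(name, language))


def add_non_english_concept(descriptor: Descriptor, name: str, language: str) -> Concept:
    """A concept with no English counterpart is added directly to the descriptor."""
    concept = Concept([Term(name, language)])
    descriptor.concepts.append(concept)
    return concept


def names_in(descriptor: Descriptor, language: str) -> list[str]:
    """All names in one language, across every concept of the descriptor."""
    return [t.name for c in descriptor.concepts for t in c.terms if t.language == language]


# Hypothetical shared record edited by two translation groups.
headache = Concept([Term("Headache", "ENG")])
descriptor = Descriptor([headache])
add_translation(headache, "Céphalée", "FRE")
add_translation(headache, "Kopfschmerz", "GER")
# "Beispielbegriff" (German for "example term") is a placeholder name.
add_non_english_concept(descriptor, "Beispielbegriff", "GER")

print(names_in(descriptor, "GER"))  # ['Kopfschmerz', 'Beispielbegriff']
```

Keeping every language in a single record means each group's additions stay attached to the same descriptor, which is the point of using one database rather than separate per-language copies.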

Status of Use

- Currently active groups: German, French, Italian, Vietnamese
- Groups beginning work with MTMS (MeSH Translation Maintenance System): Dutch, Finnish, Japanese, Polish, Slovakian
- Groups starting soon: Czech, Portuguese, Korean, Norwegian, Russian, Spanish

http://www.ncbi.nlm.nih.gov/Taxonomy/

http://umlsinfo.nlm.nih.gov

The UMLS in practice

- Database: series of relational files
- Interfaces:
  - Web interface: Knowledge Source Server (UMLSKS)
  - Application programming interfaces (Java and XML-based)
- Applications:
  - lvg (lexical programs)
  - MetamorphoSys (installation and customization)
  - Soon: Metathesaurus browser

The UMLS is not an end-user application.
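Because the Metathesaurus is delivered as a series of relational files rather than as an end-user application, a common first step is to load those files into a database and query them. The sketch below uses Python's built-in sqlite3 with an in-memory database. The two-table schema, the column names, the relationship label, and the sample rows are all hypothetical simplifications; the actual UMLS files have many more columns.

```python
import sqlite3

# Hypothetical, simplified pipe-delimited rows (placeholder identifiers).
NAMES = """C0000001|A0000001|headache|SRC1
C0000001|A0000003|Headache|SRC1
C0000001|A0000005|Cephalgia|SRC1
C0000002|A0000006|Migraine|SRC2"""

# Hypothetical relationship row: concept 1, concept 2, relationship label.
RELATIONS = """C0000002|C0000001|narrower"""


def load(conn: sqlite3.Connection) -> None:
    """Create two simplified tables and bulk-load the delimited rows."""
    conn.execute("CREATE TABLE names (cui TEXT, aui TEXT, str TEXT, sab TEXT)")
    conn.execute("CREATE TABLE rels (cui1 TEXT, cui2 TEXT, rel TEXT)")
    conn.executemany("INSERT INTO names VALUES (?, ?, ?, ?)",
                     (line.split("|") for line in NAMES.splitlines()))
    conn.executemany("INSERT INTO rels VALUES (?, ?, ?)",
                     (line.split("|") for line in RELATIONS.splitlines()))


conn = sqlite3.connect(":memory:")
load(conn)

# All names recorded for one concept, ordered by atom identifier.
synonyms = [row[0] for row in conn.execute(
    "SELECT str FROM names WHERE cui = ? ORDER BY aui", ("C0000001",))]

# Names of concepts that have a relationship pointing at that concept.
related = [row[0] for row in conn.execute(
    "SELECT DISTINCT n.str FROM rels r JOIN names n ON n.cui = r.cui1 "
    "WHERE r.cui2 = ?", ("C0000001",))]

print(synonyms)  # ['headache', 'Headache', 'Cephalgia']
print(related)   # ['Migraine']
```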

UMLS: 3 components

- Metathesaurus
  - Concepts
  - Inter-concept relationships
- Semantic Network
  - Semantic types
  - Semantic network relationships
- Lexical resources
  - SPECIALIST Lexicon
  - Lexical tools
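The Semantic Network supplies a coarse category layer over the Metathesaurus: each concept is assigned one or more semantic types, and the network records which relationships are meaningful between types. One way to use that layer is to test whether a proposed concept-to-concept relationship is plausible at the type level. In the sketch below, the concept identifiers are placeholders and the tiny network excerpt is hypothetical; "Sign or Symptom", "Pharmacologic Substance", and "treats" are real Semantic Network names, but the triple shown has not been checked against the actual network.

```python
# Hypothetical categorization: concept -> semantic types.
SEMANTIC_TYPES = {
    "C0000001": {"Sign or Symptom"},          # e.g., a headache concept
    "C0000010": {"Pharmacologic Substance"},  # e.g., an analgesic concept
}

# Hypothetical excerpt of relationships between semantic types.
NETWORK = {
    ("Pharmacologic Substance", "treats", "Sign or Symptom"),
}


def plausible(cui1: str, relation: str, cui2: str) -> bool:
    """True if some pair of the two concepts' semantic types is linked by `relation`."""
    return any(
        (t1, relation, t2) in NETWORK
        for t1 in SEMANTIC_TYPES.get(cui1, set())
        for t2 in SEMANTIC_TYPES.get(cui2, set())
    )


print(plausible("C0000010", "treats", "C0000001"))  # True
print(plausible("C0000001", "treats", "C0000010"))  # False: direction matters
```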

Metathesaurus Source Vocabularies (2004AB)

- 134 source vocabularies, 126 of them contributing concept names
- 73 families of vocabularies
  - Multiple translations (e.g., MeSH, ICPC, ICD-10)
  - Variants (American-English equivalents, Australian extension/adaptation)
  - Subsequent editions usually considered distinct families (ICD-9 and ICD-10; DSM-III-R and DSM-IV)
- Broad coverage of biomedicine
- Common presentation

Metathesaurus Concepts

- Concept (CUI), > 1M: set of synonymous concept names
- Term (LUI), > 3.8M: set of normalized names
- String (SUI), > 4.3M: distinct concept name
- Atom (AUI), > 5.1M: concept name in a given source

(Counts as of 2004AB)

Example:
- C0000001
  - L0000001
    - S0000001
      - A0000001 headache (source 1)
      - A0000002 headache (source 2)
    - S0000002
      - A0000003 Headache (source 1)
      - A0000004 Headache (source 2)
  - L0000002
    - S0000003
      - A0000005 Cephalgia (source 1)
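The four identifier levels in the headache/Cephalgia example can be derived mechanically, except for the concept level: atoms with the exact same string share a SUI, strings with the same normalized form share a LUI, and the grouping into one CUI reflects editorial synonymy, so it is simply given here. The `normalize` function below is a deliberately simplified, hypothetical stand-in (lowercase and sort words); the real lvg normalization does considerably more, such as removing inflections. Identifier numbering is sequential by first appearance, an assumption chosen to reproduce the example.

```python
from collections import defaultdict

# Atoms from the headache/Cephalgia example: (AUI, string, source).
ATOMS = [
    ("A0000001", "headache", "source 1"),
    ("A0000002", "headache", "source 2"),
    ("A0000003", "Headache", "source 1"),
    ("A0000004", "Headache", "source 2"),
    ("A0000005", "Cephalgia", "source 1"),
]
CUI = "C0000001"  # synonymy is an editorial decision, so the CUI is given


def normalize(s: str) -> str:
    """Simplified stand-in for lexical normalization: lowercase, sort words."""
    return " ".join(sorted(s.lower().split()))


def build_ids(atoms):
    """Group atoms as LUI -> SUI -> [AUI], numbering IDs by first appearance."""
    sui_of, lui_of = {}, {}
    tree = defaultdict(lambda: defaultdict(list))
    for aui, string, _source in atoms:
        # One SUI per distinct exact (case-sensitive) string.
        sui = sui_of.setdefault(string, f"S{len(sui_of) + 1:07d}")
        # One LUI per distinct normalized form.
        lui = lui_of.setdefault(normalize(string), f"L{len(lui_of) + 1:07d}")
        tree[lui][sui].append(aui)
    return tree


tree = build_ids(ATOMS)
for lui, strings in tree.items():
    for sui, auis in strings.items():
        print(CUI, lui, sui, auis)
# C0000001 L0000001 S0000001 ['A0000001', 'A0000002']
# C0000001 L0000001 S0000002 ['A0000003', 'A0000004']
# C0000001 L0000002 S0000003 ['A0000005']
```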

Metathesaurus Relationships

- Symbolic relations: ~9M pairs of concepts
- Statistical relations (co-occurring concepts): ~7M pairs of concepts
- Mapping relations: 100,000 pairs of concepts
- Categorization: relationships between concepts and semantic types from the Semantic Network
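Statistical relations are based on concepts that co-occur. A minimal way to picture this is to count how often two concepts are assigned to the same record and keep pairs above a threshold. The sketch below assumes hypothetical records and a hypothetical `min_count` cutoff; the slide does not say how the Metathesaurus co-occurrence data are actually computed, so treat this as an illustration of the general idea only.

```python
from collections import Counter
from itertools import combinations

# Hypothetical records, each listing the concepts assigned to it.
RECORDS = [
    {"C0000001", "C0000010", "C0000020"},
    {"C0000001", "C0000010"},
    {"C0000001", "C0000030"},
]


def cooccurrence(records, min_count=2):
    """Count unordered concept pairs per record; keep pairs seen min_count+ times."""
    counts = Counter()
    for concepts in records:
        # sorted() gives each unordered pair a single canonical key.
        for pair in combinations(sorted(concepts), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_count}


print(cooccurrence(RECORDS))  # {('C0000001', 'C0000010'): 2}
```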

Why you might care about the UMLS

- Content with applicability outside of biomedicine
- Tools generally useful in NLP and data mining
- New Metathesaurus Rich Release Format (RRF)
  - Potentially useful as a format for distributing any set of vocabularies/ontologies, and for robust, purpose-specific mappings between such systems
  - May well lead to the development of a variety of tools that can output or ingest the format
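Part of what makes the Rich Release Format attractive as a general distribution format is how simple it is to ingest: each file is plain text, one row per line, with fields separated by `|` and a trailing `|` at the end of every row. The sketch below parses one such row into a dictionary. The MRCONSO column list is reproduced from later UMLS RRF documentation and should be checked against the documentation for the release in use; the sample row is hypothetical and uses placeholder identifiers.

```python
# MRCONSO column order as documented for RRF releases (verify per release).
MRCONSO_COLUMNS = [
    "CUI", "LAT", "TS", "LUI", "STT", "SUI", "ISPREF", "AUI", "SAUI",
    "SCUI", "SDUI", "SAB", "TTY", "CODE", "STR", "SRL", "SUPPRESS", "CVF",
]


def parse_rrf_line(line: str, columns: list[str]) -> dict[str, str]:
    """Split one pipe-delimited RRF row into a {column: value} dict."""
    line = line.rstrip("\r\n")
    if line.endswith("|"):
        line = line[:-1]  # every RRF row ends with one trailing delimiter
    fields = line.split("|")
    if len(fields) != len(columns):
        raise ValueError(f"expected {len(columns)} fields, got {len(fields)}")
    return dict(zip(columns, fields))


# Hypothetical row with placeholder identifiers and an empty CVF field.
row = "C0000001|ENG|P|L0000001|PF|S0000001|Y|A0000001||||SRC1|PT|123|headache|0|N||"
atom = parse_rrf_line(row, MRCONSO_COLUMNS)
print(atom["CUI"], atom["SAB"], atom["STR"])  # C0000001 SRC1 headache
```

Passing the column list as a parameter, rather than hard-coding it, lets the same function read any RRF file once its columns are known, which fits the idea of a format that many tools can output or ingest.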