NLP for Biomedical Applications: Information integration through terminology integration. Olivier Bodenreider, Lister Hill National Center for Biomedical Communications, Bethesda, Maryland - USA. AMIA Symposium, Washington, DC, November 12, 2003.


Page 1: NLP for Biomedical Applications

NLP for Biomedical Applications
Information integration through terminology integration

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

AMIA Symposium
Washington, DC

November 12, 2003

Page 2: NLP for Biomedical Applications

Introduction

◆ NLP and text mining require
  ● Terminology
  ● Domain knowledge
◆ Biomedical terminologies
  ● Usually provide vocabulary
  ● May provide some domain knowledge
  ● Enable semantic integration
◆ Semantic integration may benefit NLP by enabling links to external resources

Page 3: NLP for Biomedical Applications

Terminology integration

The Unified Medical Language System

Page 4: NLP for Biomedical Applications

Unified Medical Language System

◆ Started in 1986
◆ National Library of Medicine
◆ Terminology integration
  ● 60 families of biomedical vocabularies

«[…] the UMLS project is an effort to overcome two significant barriers to effective retrieval of machine-readable information.
• The first is the variety of ways the same concepts are expressed in different machine-readable sources and by different people.
• The second is the distribution of useful information among many disparate databases and systems.»
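
The first barrier in the quotation (one concept, many surface forms) can be illustrated with a minimal sketch. It is not the UMLS implementation: the `normalize` and `lookup` helpers are hypothetical, the normalization is deliberately simplistic, and the table contains only the names and concept identifiers that appear on the NF2 slides of this deck.

```python
# Toy synonym table: concept identifier -> names, taken from the NF2 slides.
SYNONYMS = {
    "C0027832": ["Neurofibromatosis 2", "Type II neurofibromatosis",
                 "Bilateral acoustic neurofibromatosis"],
    "C0085114": ["NF2", "Neurofibromin 2 gene"],
    "C0254123": ["Merlin", "Schwannomin", "Neurofibromin 2"],
}


def normalize(term):
    """Hypothetical normalizer: lowercase, treat hyphens as spaces,
    and collapse runs of whitespace."""
    return " ".join(term.lower().replace("-", " ").split())


# Inverted index: normalized name -> concept identifier.
INDEX = {normalize(name): cui
         for cui, names in SYNONYMS.items()
         for name in names}


def lookup(term):
    """Return the concept identifier for a term, or None if it is unknown."""
    return INDEX.get(normalize(term))


if __name__ == "__main__":
    for term in ["schwannomin", "Neurofibromin-2", "type II  neurofibromatosis"]:
        print(term, "->", lookup(term))
```

Strings that differ only in case, hyphenation, or spacing resolve to the same identifier, which is the property that lets text from different sources be linked through a shared concept.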

Page 5: NLP for Biomedical Applications

Integrating subdomains

[Figure: biomedical subdomains, each labeled with a vocabulary, arranged around the UMLS]

◆ Biomedical literature: MeSH
◆ Genome annotations: GO
◆ Model organisms: NCBI Taxonomy
◆ Genetic knowledge bases: OMIM
◆ Clinical repositories: SNOMED
◆ Anatomy: UWDA
◆ Other subdomains
◆ Center: UMLS

Page 6: NLP for Biomedical Applications

Integrating subdomains

[Figure: the subdomains shown without vocabulary labels: Biomedical literature, Genome annotations, Model organisms, Genetic knowledge bases, Clinical repositories, Other subdomains, Anatomy]

Page 7: NLP for Biomedical Applications

Information integration

Genetics as an example

Page 8: NLP for Biomedical Applications

NF2 Gene, protein, and disease

Neurofibromatosis 2 is an autosomal dominant disease characterized by tumors called schwannomas involving the acoustic nerve, as well as other features. The disorder is caused by mutations of the NF2 gene resulting in absence or inactivation of the protein product. The protein product of NF2 is commonly called merlin (but also neurofibromin 2 and schwannomin) and functions as a tumor suppressor.

Page 9: NLP for Biomedical Applications

Schwannoma (acoustic neuroma)

http://www.mayoclinic.com

Page 10: NLP for Biomedical Applications

Page 11: NLP for Biomedical Applications

NF2 gene

http://staff.washington.edu/timk/cyto/human/ http://www.ncbi.nlm.nih.gov/mapview/

Page 12: NLP for Biomedical Applications

Page 13: NLP for Biomedical Applications

Merlin

◆ Synonyms
  ● Neurofibromin 2
  ● Schwannomin
  ● Schwannomerlin
  ● Neurofibromatosis-2
◆ 10 isoforms
◆ Annotations
  ● Negative regulation of cell proliferation
  ● Cytoskeleton
  ● Plasma membrane

Page 14: NLP for Biomedical Applications

Page 15: NLP for Biomedical Applications

[Figure: diagram linking the concepts, semantic types, and external resources listed below]

UMLS Metathesaurus (Concepts and relations)
  ● Neurofibromatosis 2 (Type II neurofibromatosis, Bilateral acoustic neurofibromatosis): C0027832
  ● NF2 (Neurofibromin 2 gene): C0085114
  ● Merlin (Schwannomin, Neurofibromin 2): C0254123
  ● Neighboring concepts: Merlin, Drosophila; Tumor suppressor genes; Benign neoplasms of cranial nerves; Neurofibromatoses; Tumor suppressor proteins

UMLS Semantic Network (Semantic Types)
  ● Amino Acid, Peptide, or Protein
  ● Biologically Active Substance
  ● Neoplastic Process
  ● Gene or Genome

External resources
  ● OMIM: NEUROFIBROMATOSIS, TYPE II; NF2 (#101000)
  ● GenBank: U49724
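
As a rough illustration of how one concept record can carry both a semantic type and links to external resources, here is a small sketch built from the identifiers on this slide. The `Concept` class and both helper functions are hypothetical. The pairing of each semantic type and each external identifier with a particular concept is an assumption, since the extracted slide text does not state which arrow goes where.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    cui: str
    name: str
    semantic_types: list
    xrefs: dict = field(default_factory=dict)  # resource name -> identifier


# ASSUMPTION: type and cross-reference assignments are inferred, not read
# from the slide; only the identifiers themselves come from the deck.
CONCEPTS = {
    "C0027832": Concept("C0027832", "Neurofibromatosis 2",
                        ["Neoplastic Process"], {"OMIM": "101000"}),
    "C0085114": Concept("C0085114", "NF2",
                        ["Gene or Genome"], {"GenBank": "U49724"}),
    "C0254123": Concept("C0254123", "Merlin",
                        ["Amino Acid, Peptide, or Protein",
                         "Biologically Active Substance"]),
}


def concepts_with_type(semantic_type):
    """Return the sorted identifiers of concepts carrying a semantic type."""
    return sorted(c.cui for c in CONCEPTS.values()
                  if semantic_type in c.semantic_types)


def external_links(cui):
    """Return a copy of the cross-references recorded for one concept."""
    return dict(CONCEPTS[cui].xrefs)


if __name__ == "__main__":
    print(concepts_with_type("Gene or Genome"))
    print(external_links("C0027832"))
```

Once a term in text has been mapped to a concept identifier, the cross-references give a path from that text to records in external resources, which is the kind of link the introduction describes.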

Page 16: NLP for Biomedical Applications

Limitations

◆ Genes not systematically represented
  ● Most gene products and diseases are
◆ Gene/Gene product-Disease relations
  ● Not systematically represented
  ● Not explicitly represented (e.g., co-occurrence)
◆ Cross-references not systematically represented
◆ Naming conventions (genes)
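
To make the co-occurrence point concrete, the sketch below counts how often pairs of concepts appear in the same document. The document sets in `DOCS` are invented toy data, not results from any corpus, and `cooccurrence_counts` is a hypothetical helper. Only the three concept identifiers come from this deck.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy data: each document is the set of concepts found in it.
DOCS = [
    {"C0085114", "C0027832"},              # gene + disease
    {"C0254123", "C0027832"},              # protein + disease
    {"C0085114", "C0254123", "C0027832"},  # all three
    {"C0085114"},                          # gene only
]


def cooccurrence_counts(docs):
    """Count documents in which each unordered pair of concepts co-occurs.
    Pairs are stored with their identifiers in sorted order."""
    counts = Counter()
    for concepts in docs:
        for a, b in combinations(sorted(concepts), 2):
            counts[(a, b)] += 1
    return counts


if __name__ == "__main__":
    for pair, n in sorted(cooccurrence_counts(DOCS).items()):
        print(pair, n)
```

A count of this kind says that two concepts are associated but not how: it does not distinguish "mutations of the gene cause the disease" from any other relation, which is why a co-occurrence link is not an explicit one.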

Page 17: NLP for Biomedical Applications

Medical Ontology Research

Olivier Bodenreider

Lister Hill National Center for Biomedical Communications
Bethesda, Maryland - USA

Contact: [email protected]
Web: mor.nlm.nih.gov