NLP Fundamentals: Methods and Shared Lexical Resources
Guergana Savova, PhD, Boston Children's Hospital and Harvard Medical School


Page 1: NLP Fundamentals: Methods and Shared Lexical Resources

NLP Fundamentals: Methods and Shared Lexical Resources

Guergana Savova, PhD
Boston Children's Hospital and Harvard Medical School

Page 2: NLP Fundamentals: Methods and Shared Lexical Resources

Overview

• Clinical Element Model (CEM) templates as normalization targets for SHARP NLP
• NLP areas of research
• Methods
• Shared Lexical Resources

Page 3: NLP Fundamentals: Methods and Shared Lexical Resources

CEMs as NLP Normalization Target

Page 4: NLP Fundamentals: Methods and Shared Lexical Resources
Page 5: NLP Fundamentals: Methods and Shared Lexical Resources

Processing Clinical Notes

A 43-year-old woman was diagnosed with type 2 diabetes mellitus by her family physician 3 months before this presentation. Her initial blood glucose was 340 mg/dL. Glyburide 2.5 mg once daily was prescribed. Since then, self-monitoring of blood glucose (SMBG) showed blood glucose levels of 250-270 mg/dL. She was referred to an endocrinologist for further evaluation. On examination, she was normotensive and not acutely ill. Her body mass index (BMI) was 18.7 kg/m2 following a recent 10 lb weight loss. Her thyroid was symmetrically enlarged and ankle reflexes absent. Her blood glucose was 272 mg/dL, and her hemoglobin A1c (HbA1c) was 10.3%. A lipid profile showed a total cholesterol of 261 mg/dL, triglyceride level of 321 mg/dL, HDL level of 48 mg/dL, and an LDL of 150 mg/dL. Thyroid function was normal. Urinalysis showed trace ketones. She adhered to a regular exercise program and vitamin regimen, smoked 2 packs of cigarettes daily for the past 25 years, and limited her alcohol intake to 1 drink daily. Her mother's brother was diabetic.

Page 6: NLP Fundamentals: Methods and Shared Lexical Resources

Clinical Element Model

Disorder CEM
  – text: diabetes mellitus
  – code: 73211009
  – subject: patient
  – relative temporal context: 3 months ago
  – negation indicator: not negated

Disorder CEM
  – text: diabetes mellitus
  – code: 73211009
  – subject: family member
  – relative temporal context:
  – negation indicator: not negated

Tobacco Use CEM
  – text: smoking
  – code: 365981007
  – subject: patient
  – relative temporal context: 25 years
  – negation indicator: not negated

Medication CEM
  – text: Glyburide
  – code: 315989
  – subject: patient
  – frequency: once daily
  – negation indicator: not negated
  – strength: 2.5 mg
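The four filled templates on this slide can be pictured as plain records. The sketch below is only an illustration: the class name `CEM`, its field names, and the `extra` dictionary are assumptions made for this example and do not reproduce the official CEM schema or the cTAKES type system. The values are copied from the slide.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class CEM:
    """One filled template; fields mirror the slide, not the official CEM schema."""
    template: str                 # e.g. "Disorder", "Tobacco Use", "Medication"
    text: str                     # mention as it appears in the note
    code: str                     # concept code shown on the slide
    subject: str = "patient"
    negated: bool = False
    relative_temporal_context: Optional[str] = None
    extra: Dict[str, str] = field(default_factory=dict)  # template-specific attributes


cems = [
    CEM("Disorder", "diabetes mellitus", "73211009",
        relative_temporal_context="3 months ago"),
    CEM("Disorder", "diabetes mellitus", "73211009", subject="family member"),
    CEM("Tobacco Use", "smoking", "365981007",
        relative_temporal_context="25 years"),
    CEM("Medication", "Glyburide", "315989",
        extra={"frequency": "once daily", "strength": "2.5 mg"}),
]

for cem in cems:
    print(cem.template, "|", cem.text, "|", cem.subject)
```

Keeping `subject` and `negated` as explicit fields is what lets the two diabetes mentions stay distinct: one is about the patient, the other about a family member.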


Page 7: NLP Fundamentals: Methods and Shared Lexical Resources

Comparative Effectiveness

Disorder CEM
  – text: diabetes mellitus
  – code: 73211009
  – subject: patient
  – relative temporal context: 3 months ago
  – negation indicator: not negated

Disorder CEM
  – text: diabetes mellitus
  – code: 73211009
  – subject: family member
  – relative temporal context:
  – negation indicator: not negated

Tobacco Use CEM
  – text: smoking
  – code: 365981007
  – subject: patient
  – relative temporal context: 25 years
  – negation indicator: not negated

Medication CEM
  – text: Glyburide
  – code: 315989
  – subject: patient
  – frequency: once daily
  – negation indicator: not negated
  – strength: 2.5 mg

Compare the effectiveness of different treatment strategies (e.g., modifying target levels for glucose, lipid, or blood pressure) in reducing cardiovascular complications in newly diagnosed adolescents and adults with type 2 diabetes.

Compare the effectiveness of traditional behavioral interventions versus economic incentives in motivating behavior changes (e.g., weight loss, smoking cessation, avoiding alcohol and substance abuse) in children and adults.

Page 8: NLP Fundamentals: Methods and Shared Lexical Resources

Meaningful Use

Disorder CEM
  – text: diabetes mellitus
  – code: 73211009
  – subject: patient
  – relative temporal context: 3 months ago
  – negation indicator: not negated

Disorder CEM
  – text: diabetes mellitus
  – code: 73211009
  – subject: family member
  – relative temporal context:
  – negation indicator: not negated

Tobacco Use CEM
  – text: smoking
  – code: 365981007
  – subject: patient
  – relative temporal context: 25 years
  – negation indicator: not negated

Medication CEM
  – text: Glyburide
  – code: 315989
  – subject: patient
  – frequency: once daily
  – negation indicator: not negated
  – strength: 2.5 mg

• Maintain problem list
• Maintain active med list
• Record smoking status
• Provide clinical summaries for each office visit
• Generate patient lists for specific conditions
• Submit syndromic surveillance data
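As a rough illustration of how extracted CEMs could feed the first three items, the sketch below filters the slide's records down to findings that are about the patient and not negated. The dictionary keys, the helper names, and the `"smoker"`/`"unknown"` labels are assumptions made for this example, not part of any Meaningful Use specification or of cTAKES.

```python
from typing import Dict, List

# Records copied from the slide; the dict keys are illustrative, not a CEM schema.
records: List[Dict[str, object]] = [
    {"template": "Disorder", "text": "diabetes mellitus",
     "subject": "patient", "negated": False},
    {"template": "Disorder", "text": "diabetes mellitus",
     "subject": "family member", "negated": False},
    {"template": "Tobacco Use", "text": "smoking",
     "subject": "patient", "negated": False},
    {"template": "Medication", "text": "Glyburide",
     "subject": "patient", "negated": False},
]


def asserted_for_patient(record: Dict[str, object]) -> bool:
    """Keep only findings that are about the patient and not negated."""
    return record["subject"] == "patient" and not record["negated"]


def select(template: str) -> List[str]:
    """Mention texts of one template type that pass the patient/negation filter."""
    return [str(r["text"]) for r in records
            if r["template"] == template and asserted_for_patient(r)]


problem_list = select("Disorder")
medication_list = select("Medication")
smoking_status = "smoker" if select("Tobacco Use") else "unknown"

print(problem_list, medication_list, smoking_status)
```

The family member's diabetes is excluded from the problem list only because the subject attribute was captured, which is why the modifiers matter as much as the entity mention itself.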

Page 9: NLP Fundamentals: Methods and Shared Lexical Resources

Clinical Practice

Disorder CEM
  – text: diabetes mellitus
  – code: 73211009
  – subject: patient
  – relative temporal context: 3 months ago
  – negation indicator: not negated

Medication CEM
  – text: Glyburide
  – code: 315989
  – subject: patient
  – frequency: once daily
  – negation indicator: not negated
  – strength: 2.5 mg

• Provide problem list and meds from the visit

Page 10: NLP Fundamentals: Methods and Shared Lexical Resources

Applications

• Meaningful use of the EMR
• Comparative effectiveness
• Clinical investigation
  – Patient cohort identification
  – Phenotype extraction
• Epidemiology
• Clinical practice
• …

Page 11: NLP Fundamentals: Methods and Shared Lexical Resources

The Science of NLP: Research Areas

Page 12: NLP Fundamentals: Methods and Shared Lexical Resources

NLP Areas of Research

• Part of speech tagging
• Parsing – constituency and dependency
• Predicate-argument structure (semantic role labeling)
• Named entity recognition
• Word sense disambiguation
• Relation discovery and classification
• Discourse parsing (text cohesiveness)
• Language generation
• Machine translation
• Summarization
• Creating datasets to be used for learning
  – a.k.a. computable gold annotations
  – Active learning

Page 13: NLP Fundamentals: Methods and Shared Lexical Resources

Methods

• Principled approaches
  – Linguistic theory
  – Computational science
• Machine Learning
  – Supervised
  – Unsupervised
  – Lightly supervised
• Rules derived by domain experts
• Combination
• How to integrate knowledge-based information with data-driven methods

Page 14: NLP Fundamentals: Methods and Shared Lexical Resources

Applications (all apply to biomedicine)

• Information extraction
  – “No evidence of adenocarcinoma.”
  – Disorder
    • Text: adenocarcinoma
    • Associated code: C0001418
    • Certainty: confirmed
    • Context: current
    • Subject: patient
    • Status: negated
• Information retrieval
• Question answering
• Document classification
• Input for
  – Decision support systems
  – Recommender systems
  – …
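To make the "Status: negated" attribute in the information extraction example concrete, here is a minimal trigger-phrase sketch. It is a deliberately simplified rule of the kind used by lexicon-based negation detectors, not the method of the presenters or of cTAKES; the trigger list, the `WINDOW` size, and the function name `is_negated` are assumptions made for this example.

```python
import re

# Hypothetical, deliberately tiny trigger list; real systems use far larger lexicons.
NEGATION_TRIGGERS = ["no evidence of", "no", "denies", "without"]
WINDOW = 5  # tokens after a trigger that fall inside its scope (an assumption)


def is_negated(sentence: str, mention: str) -> bool:
    """Return True if `mention` occurs within WINDOW tokens after a trigger."""
    tokens = re.findall(r"\w+", sentence.lower())
    target = re.findall(r"\w+", mention.lower())
    if not target:
        return False
    for trigger in NEGATION_TRIGGERS:
        trig = trigger.split()
        for i in range(len(tokens) - len(trig) + 1):
            if tokens[i:i + len(trig)] != trig:
                continue
            # Scope of the trigger: the next WINDOW tokens to its right.
            scope_start = i + len(trig)
            scope = tokens[scope_start:scope_start + WINDOW]
            for j in range(len(scope) - len(target) + 1):
                if scope[j:j + len(target)] == target:
                    return True
    return False


sentence = "No evidence of adenocarcinoma."
disorder = {
    "text": "adenocarcinoma",
    "code": "C0001418",  # code as given on the slide
    "subject": "patient",
    "status": "negated" if is_negated(sentence, "adenocarcinoma") else "not negated",
}
print(disorder)
```

A fixed token window is the crudest possible notion of scope. It is one place where rules derived by domain experts and data-driven methods can be combined, for example by replacing the window with a scope derived from a parse.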

Page 15: NLP Fundamentals: Methods and Shared Lexical Resources

Shared Lexical Resources

Page 16: NLP Fundamentals: Methods and Shared Lexical Resources

Why

• Developing algorithms
• System evaluation
• Community-wide training and test sets
  – Compare results and establish state-of-the-art
  – Establishing standards (ISO TC37)
• Long tradition in the general NLP domain
  – Linguistic Data Consortium and PTB
• Layers of annotations on the same text
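One common way to hold several annotation layers over the same text is standoff annotation: the text is never modified, and each layer stores character offsets into it. The sketch below is illustrative only; the `Annotation` class, the layer names, and the labels are assumptions for this example and do not reflect the actual file formats of the corpora discussed here.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Annotation:
    layer: str   # e.g. "pos", "entity"; layer names here are illustrative
    begin: int   # character offset, inclusive
    end: int     # character offset, exclusive
    label: str


text = "Glyburide 2.5 mg once daily was prescribed."


def span(layer: str, surface: str, label: str) -> Annotation:
    """Locate `surface` in the text and wrap it as a standoff annotation."""
    begin = text.index(surface)
    return Annotation(layer, begin, begin + len(surface), label)


annotations: List[Annotation] = [
    span("pos", "Glyburide", "NNP"),
    span("pos", "prescribed", "VBN"),
    span("entity", "Glyburide", "Medication"),
    span("modifier", "2.5 mg", "strength"),
    span("modifier", "once daily", "frequency"),
]


def covering(begin: int, end: int) -> List[Annotation]:
    """All annotations, from any layer, that overlap the given span."""
    return [a for a in annotations if a.begin < end and begin < a.end]


# Every layer's view of the first token.
for a in covering(0, 9):
    print(a.layer, a.label, repr(text[a.begin:a.end]))
```

Because every layer points into the same character sequence, a new layer (for instance temporal relations) can be added later without touching the existing ones.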

Page 17: NLP Fundamentals: Methods and Shared Lexical Resources

Available gold annotations: clinical narrative

• MiPACQ
  – 120K words of clinical narrative
  – Layers of annotations: POS tags, treebanking, propbanking, UMLS entities and modifiers, UMLS relations and modifiers, coreference
• ShARe (Shared Annotated Resources)
  – 500K words of clinical narrative
  – Layers of annotations: POS tags, phrasal chunks, UMLS entity mentions of type Disease/Disorder and modifiers
• i2b2 shared tasks
  – Medication
  – Coreference

Page 18: NLP Fundamentals: Methods and Shared Lexical Resources

Available gold annotations (cont.)

• SHARPn
  – 500K words of clinical narrative
  – Layers of annotations: POS tags, treebanking, propbanking, UMLS entities (Diseases/disorders, Signs/Symptoms, Procedures, Anatomical sites, Medications) and modifiers, UMLS relations (locationOf, degreeOf, resultsOf, treats/manages) and modifiers, coreference, template (Clinical Element Model; http://intermountainhealthcare.org/cem)
• THYME (Temporal Histories of Your Medical Events)
  – 500K words of clinical narrative
  – Layers of annotations: same as MiPACQ and SHARPn, plus temporal relations (ISO TimeML extensions to the clinical domain)

Page 19: NLP Fundamentals: Methods and Shared Lexical Resources

Sample Annotations

Page 20: NLP Fundamentals: Methods and Shared Lexical Resources

Presentation Lineup

Page 21: NLP Fundamentals: Methods and Shared Lexical Resources

Presentations

• Dr. Steven Bethard
  – Enabling NLP technologies: dependency parsing and dependency-based semantic role labeling
  – Critical for discovering CEM attributes and populating the CEM template
• Dr. Dmitriy Dligach
  – Focus on discovering two CEM modifiers: body site and severity
• Dr. Stephen Wu
  – Focus on discovering CEM modifiers related to the subject of the clinical event
• Dr. Cheryl Clark
  – Focus on discovering CEM modifiers for negation and uncertainty
• Implemented and released in cTAKES
  – Monday 1-2:30 pm, cTAKES tutorial and demo
  – Monday 3-5 pm, cTAKES coding sprint

Page 22: NLP Fundamentals: Methods and Shared Lexical Resources

SHARPn NLP Investigators (in alpha site order)

• Children's Hospital Boston and HMS (site PI: Guergana Savova)
• Mayo Clinic (Hongfang Liu)
• MIT (site PI: Peter Szolovits)
• MITRE Corporation (site PI: Lynette Hirschman)
• Seattle Group Health (site PI: David Carrell)
• SUNY Albany (site PI: Ozlem Uzuner)
• University of California, San Diego (site PI: Wendy Chapman)
• University of Colorado (site PI: Martha Palmer)
• University of Utah and Intermountain Healthcare (site PI: Peter Haug)