29
Distillation of Knowledge from the Research Literatures on Alzheimer’s Dementia Wutthipong Kongburan, Mark Chignell, and Jonathan H. Chan School of Information Technology King Mongkut's University of Technology Thonburi JSCI 2017 1

Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

  • Upload
    others

  • View
    4

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Distillation of Knowledge from the Research Literatures on Alzheimer’s

Dementia

Wutthipong Kongburan, Mark Chignell, and Jonathan H. Chan

School of Information TechnologyKing Mongkut's University of Technology Thonburi

JSCI 2017 1

Page 2: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Agenda

INTRODUCTION

PROPOSED METHOD

EXPERIMENT, RESULT AND DISCUSSION

CONCLUSION AND FUTURE WORK

2

Page 3: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Agenda

INTRODUCTION

PROPOSED METHOD

EXPERIMENTAL RESULT AND DISCUSSION

CONCLUSION AND FUTURE WORK

3

Page 4: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

4

Page 5: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Alzheimer Disease 5

Page 6: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Number of publications related to Alzheimer treatment

added into PubMed every year

0

500

1000

1500

2000

2500

300019

9519

9619

9719

9819

9920

0020

0120

0220

0320

0420

0520

0620

0720

0820

0920

1020

1120

1220

1320

1420

15

#Publications

6

Page 7: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Text Mining Research & Applications 7

“the process of automatic deriving high-quality information from natural language text”

Page 8: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Overview of Information ExtractionOmega-3 fatty acids in fish may help prevent cognitive decline in Alzheimer patient.

8

Page 9: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Named Entity Recognition (NER)

Classify tokens in to predefined categories such as person name, organizations, diseases, proteins.

9

Page 10: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

NER Features and Models

Campos, David, José Luís Oliveira, and Sérgio Matos. Biomedical named entity recognition: a survey of machine-learning tools. INTECH Open Access Publisher, 2012.

10

Page 11: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

NER Features

http://nlp.stanford.edu/software/jenny-ner-2007.pdf

11

Page 12: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Conditional Random Fields (CRFs) 12

a type of discriminative probabilistic model.

O is observation words sequence S is label sequence

Linear chain CRFs define the conditional probability of a state sequence given an input sequence to be:

Z0 is a normalization factor of all state sequences [0, 1] fj(si−1 , si , o, i) is one of m functions that describes a feature j is a learned weight for each such feature function (define) If j < 0 : CRFs to avoid using this feature If j = 0 : this feature does not affect the probability If j > 0 : improve probability of S

si−1 , si represent previous and current label for observation words position i

Page 13: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Conditional Random Fields (CRFs) 13

fj(si−1 , si , o, i) 1 if si = Intervention and o begins with

memantin- 0 Otherwise

= 1 Z0 is a normalization factor

memantine Interventionmemantine monotherapy Interventionmemantine treatment Intervention

Morphological featurePrefixmemantin-…

Testing tokenmemantinchlorid

P(Disease | memantinchlorid) = 0.0P(Other | memantinchlorid) = 0.0P(Intervention | memantinchlorid) = 1.0

Classifier/Model

Training Corpus

Extract feature/Train model NER

memantinchlorid Intervention

Page 14: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Distillation of Knowledge from the Research Literatures on Alzheimer’s Dementia

14

“Text mining technology may assist in distilling knowledge from the vast corpus of research literature on Alzheimer’s dementia.”

Page 15: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Agenda

INTRODUCTION

PROPOSED METHOD

EXPERIMENTAL RESULT AND DISCUSSION

CONCLUSION AND FUTURE WORK

15

Page 16: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Proposed method

Manual annotation

AD Intervention

corpus

Comparison

Evaluation results

Annotationresults

Named entity recognition

Medical abstracts

Train classifier

Manually annotated sentence

16

Page 17: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Corpus construction 50 abstracts

“Alzheimer treatment” “Alzheimer treatment” “Nonpharmacologictreatment for Alzheimer”

20 abstracts 10 abstracts 20 abstracts

TitleAbstract

17

Page 18: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Annotation Procedure

The three labels assigned are Disease, Intervention, and Other.

All entities to be annotated were nouns.

A general term will be excluded.

Pronouns and co-references were excluded.

Special characters (e.g., quote, dash, or parenthesis) at the beginning or end of entities were not considered.

18

Page 19: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Disease entity

Disease included both synonymous mentions and abbreviated forms of AD.

19

Page 20: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Intervention entity

Any object or action that is performed by doctor or other clinician targeted in order to help the AD patient.Treatment, prevention, diagnosis, cause, risk factor

Such typical treatment - the side effects.Medical interventions

alternative treatment increasing patient satisfaction lead to better health outcomes

20

Page 21: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Agenda

INTRODUCTION

PROPOSED METHOD

EXPERIMENTAL RESULT AND DISCUSSION

CONCLUSION AND FUTURE WORK

21

Page 22: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Experimental Design

Manual annotation

AD Intervention

corpus

Comparison

Evaluation results

Annotationresults

Named entity recognition

Medical abstracts

Train classifier

Manually annotated sentence

22

Page 23: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Experiment (1)

Manual annotation

AD Intervention

corpus

Comparison

Evaluation results

Annotationresults

Named entity recognition

Medical abstracts

Train classifier

Manually annotated sentence

“Alzheimer treatment”

first 10 webpages

23

Page 24: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Experiment (1)Measure PerformancePrecision 0.987Recall 0.397F1-score 0.566Accuracy 0.982

24

Page 25: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Discussion

Further analysis on confusion matrix, false positive and false negative tokens of intervention entity errors between intervention and other class -> such

intervention entities are not in corpus (e.g., Aromatherapy, relaxing song, art therapy, coconut oil, pet therapy, Rivastigmine or Exelon, Galantamine or Razadyne). errors between intervention and disease class -> entity's

boundary. For instance, Disease Alzheimer’s should be labeled as Disease, however when the token is Disease Alzheimer’s drugs, they must be labeled as Intervention. a case sensitive in configuration. Although we can exclude

a proper noun from annotation, such as Alzheimer’s Healthcare Center, a mislabeled entity can occur for example, Memantine is not annotated as Intervention.

25

Page 26: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Experiment (2)“Alzheimer treatment”

Search criteria Unseen/new InterventionNormal search unfamiliar music, dance therapy, environmental

cueing, Chinese medicine, coral calcium, animal-assisted therapy, multi-sensory therapy, massage

physical therapy, intranasal insulin therapy

physical therapy, intranasal insulin therapy, antiepileptic drugs, light therapy, flashing light therapy, mouse’s spineflashing light therapy, routine screeningbiogen therapy, nilvadipine, BACE inhibitors

26

Page 27: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Agenda

INTRODUCTION

PROPOSED METHOD

EXPERIMENTAL RESULT AND DISCUSSION

CONCLUSION AND FUTURE WORK

27

Page 28: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

Conclusion and Future work 28

Page 29: Distillation of Knowledge from the Research Literatures on ... · errors between intervention and other class -> such intervention entities are not in corpus (e.g., Aromatherapy,

29