Towards Applying Text Mining and Natural Language Processing for Biomedical Ontology Acquisition

Inniss T., Light M., Thomas G., Lee J., Grassi M., Williams A. TMBIO (2006)

Amit Satsangi [email protected]

Faculty of Computer Science, CMPUT 605

December 06, 2007; March 31, 2008

© 2006



Page 2: Amit Satsangi amit@cs.ualberta

© 2006

Department of Computing Science

CMPUT 605

Focus

Ontology for describing age-related macular degeneration (AMD)

Comparison of the accuracy of three methods for ontology acquisition:

– Natural Language Processing (NLP)

– Text Mining (SAS Text Miner)

– Human Expert

Manual and ad hoc knowledge acquisition

IDOCS (Intelligent Distributed Ontology Consensus System)


Introduction

No common, standardized vocabulary exists for classifying disease types in certain eye diseases

Clinicians, dispersed geographically, may use different terms to describe the same condition

Research aimed at extracting feature and attribute descriptions for the vocabulary of AMD and building an ontology from them


Related Work

Much research since the 1990s on applying NLP techniques in medicine, biomedicine, etc.

NLP & Text Data Mining have been recognized to play an important role in this endeavor

Research focused on online repositories such as Medline & PubMed

NLP systems developed: MedLEE, UMLS, GENIES, etc.


IDOCS


Methodology

Four clinical experts in retinal diseases were enlisted to view 100 sample eye images of AMD

Experts in different geographic locations

Described the observations using digital voice recorders – no artificially imposed vocabulary constraints

Another retinal expert manually parsed the transcribed text: extracting keywords, organizing the keywords into categories, etc.


Results: Human Experts


Methodology: NLP

NLP: Used for information extraction and automatic summarization.

Identify collocations: short sequences of words whose meaning goes beyond the meaning composed directly from their parts, e.g. “extreme programming”

Ngram Statistics Package (NSP) used for collocation discovery among bigrams

Word-pair associations measured by pointwise mutual information (PMI)


Methodology: NLP

A larger PMI indicates a stronger association between the words
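The formula on this slide did not survive in the transcript. As a rough illustration only, the sketch below uses the standard definition PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ), estimated from raw bigram and unigram counts. It is not the NSP implementation (NSP is a separate Perl package), and the function name `bigram_pmi` and the sample sentence are hypothetical.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=1):
    """Return {(w1, w2): PMI} for adjacent word pairs in a token list.

    Probabilities are plain relative frequencies; no smoothing is applied.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = sum(unigrams.values())
    n_bi = sum(bigrams.values())

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # skip pairs seen too rarely to trust
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Hypothetical sample text, not taken from the study's transcripts.
tokens = "soft drusen near the macula and soft drusen with pigment".split()
scores = bigram_pmi(tokens)
for pair, pmi in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(pair, round(pmi, 2))
```

PMI computed this way tends to favour pairs of rare words, so a `min_count` threshold above 1 is a common precaution on real corpora.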


Results: NLP


Methodology: Text Mining (SAS Text Miner)

Collection of documents (corpus) used as input to any text mining algorithm

Corpus broken into tokens or terms (tokens in a particular language)

Term weighting measures: Entropy, Inverse Document Frequency (IDF), Global Frequency-IDF (GF-IDF), None (global weight of 1), and Normal
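To make the weighting measures concrete, here is a minimal sketch of three of them (IDF, GF-IDF, and entropy) over a tokenized corpus. The formulas are commonly cited definitions; whether SAS Text Miner uses exactly these forms is an assumption, and the "Normal" weight is left out. The function name `term_weights` and the sample documents are hypothetical.

```python
import math
from collections import Counter


def term_weights(docs):
    """docs: list of token lists. Returns {term: {"idf", "gf_idf", "entropy"}}.

    Assumed definitions (n = number of documents, d = documents containing
    the term, g = total occurrences of the term, f_j = occurrences in doc j):
      idf     = log2(n / d) + 1
      gf_idf  = g / d
      entropy = 1 + sum_j (f_j / g) * log2(f_j / g) / log2(n)
    """
    n_docs = len(docs)
    per_doc = [Counter(doc) for doc in docs]
    vocab = set().union(*per_doc)

    weights = {}
    for term in vocab:
        freqs = [c[term] for c in per_doc if term in c]
        global_freq = sum(freqs)
        doc_freq = len(freqs)
        if n_docs > 1:
            spread = sum((f / global_freq) * math.log2(f / global_freq)
                         for f in freqs)
            entropy = 1 + spread / math.log2(n_docs)
        else:
            entropy = 1.0  # a single document gives no spread to measure
        weights[term] = {
            "idf": math.log2(n_docs / doc_freq) + 1,
            "gf_idf": global_freq / doc_freq,
            "entropy": entropy,
        }
    return weights


# Hypothetical three-document corpus, already tokenized.
docs = [
    ["macula", "drusen"],
    ["macula", "pigment"],
    ["macula", "drusen", "drusen"],
]
weights = term_weights(docs)
for term in sorted(weights):
    print(term, weights[term])
```

Under these definitions a term spread evenly across every document gets an entropy weight near 0, while a term confined to a single document gets 1, so the weight rewards terms that discriminate between documents.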


Results: Text Miner

Frequency weight: None

Term weight: Normal


Common Terms



Comparison

Thus text mining is a viable and effective method for determining vocabulary to describe a particular disease

Text mining found many of the terms that NLP found

The human expert is the best ground truth
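The slides do not say how the term lists were scored against each other. One simple possibility, assuming the human expert's terms are treated as ground truth, is set overlap with precision and recall. The sketch below illustrates that idea; the function name `compare_to_ground_truth` and the term lists are hypothetical and are not results from the study.

```python
def compare_to_ground_truth(found, truth):
    """Compare an automatically extracted term list against expert terms."""
    found, truth = set(found), set(truth)
    overlap = found & truth
    return {
        "overlap": overlap,
        # share of extracted terms that the expert also listed
        "precision": len(overlap) / len(found) if found else 0.0,
        # share of expert terms that the tool recovered
        "recall": len(overlap) / len(truth) if truth else 0.0,
        # extracted terms the expert did not list (noise or missed descriptors)
        "extra": found - truth,
    }


# Hypothetical term lists for illustration only.
expert_terms = {"drusen", "pigment", "hemorrhage", "atrophy"}
text_miner_terms = {"drusen", "pigment", "macula"}

report = compare_to_ground_truth(text_miner_terms, expert_terms)
print(report)
```

The `extra` set is the interesting part for ontology building: each term in it is either noise or a descriptor the expert overlooked, and deciding which requires expert review.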


Ontology Generation


Conclusion and Future Work

Human experts are the best, but they did miss some key descriptors

Text mining and NLP can enhance the generation of feature descriptions by catching key descriptors that human experts miss

As a consequence, a more robust vocabulary can be generated

Extension – evaluate the effectiveness of the automated tools, text mining & NLP

Different weighting schemes to be tried in the future


Thank You For Your Attention!