Faculty of Computer Science
CMPUT 605
December 06, 2007 / March 31, 2008
© 2006
Towards Applying Text Mining and Natural Language Processing for Biomedical Ontology Acquisition
Inniss T., Light M., Thomas G., Lee J., Grassi M., Williams A. TMBIO(2006)
Amit Satsangi [email protected]
© 2006
Department of Computing Science
CMPUT 605
Focus
Ontology for describing age-related macular degeneration (AMD)
Comparison of the accuracy of three methods for ontology acquisition:
– Natural Language Processing (NLP)
– Text Mining (SAS Text Miner)
– Human Expert
Manual and ad hoc knowledge acquisition
IDOCS (Intelligent Distributed Ontology Consensus System)
Introduction
No common, standardized vocabulary exists for classifying disease types for certain eye diseases
Clinicians, dispersed geographically, may use different terms to describe the same condition
Research aimed at extracting feature and attribute descriptions for the vocabulary of AMD and building an ontology from them
Related Work
Much research since the 1990s on applying NLP techniques in medicine, biomedicine, etc.
NLP & Text Data Mining have been recognized to play an important role in this endeavor
Research focused on online repositories such as Medline & PubMed
NLP systems and resources developed: MedLEE, UMLS, GENIES, etc.
IDOCS
Methodology
Four clinical experts in retinal diseases enlisted to view 100 sample eye images of AMD
Experts located in different geographic locations
Experts described their observations using digital voice recorders – no artificially imposed vocabulary constraints
Another retinal expert manually parsed the transcribed text – extracting keywords, organizing keywords into categories, etc.
Results: Human Experts
Methodology: NLP
NLP: Used for information extraction and automatic summarization.
Identify collocations: short sequences of words whose meaning goes over and above the meaning composed directly from their parts, e.g. “extreme programming”
Ngram Statistics Package (NSP) used for collocation discovery among bigrams
Word-pair associations measured by pointwise mutual information (PMI)
Methodology: NLP
A larger PMI indicates a stronger degree of association between the two words
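The PMI of a word pair is commonly defined as PMI(x, y) = log2( P(x, y) / (P(x) P(y)) ). A minimal sketch of scoring adjacent word pairs this way is shown here; it is not the NSP implementation used in the study, and the function name `bigram_pmi`, the `min_count` parameter, and the sample sentence are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_pmi(tokens, min_count=1):
    """Return {(w1, w2): PMI} for adjacent word pairs in a token list.

    Probabilities are simple relative frequencies (an assumption of this
    sketch; NSP offers several association measures and counting options).
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_uni = len(tokens)
    n_bi = max(len(tokens) - 1, 1)

    scores = {}
    for (w1, w2), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable PMI estimates
        p_xy = count / n_bi
        p_x = unigrams[w1] / n_uni
        p_y = unigrams[w2] / n_uni
        scores[(w1, w2)] = math.log2(p_xy / (p_x * p_y))
    return scores


if __name__ == "__main__":
    # Hypothetical transcript fragment, not data from the study.
    text = "soft drusen near the macula and soft drusen in the macula"
    scores = bigram_pmi(text.split())
    for pair, pmi in sorted(scores.items(), key=lambda kv: -kv[1])[:5]:
        print(pair, round(pmi, 3))
```

PMI tends to favor rare pairs, which is why a frequency cutoff such as `min_count` is usually applied before ranking candidates.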
Results: NLP
Methodology: Text Mining (SAS Text Miner)
Collection of documents (corpus) used as input to any text mining algorithm
Corpus broken into tokens; terms are tokens that belong to a particular language
Term weighting measures: Entropy, Inverse Document Frequency (IDF), Global Frequency-IDF (GF-IDF), None (global weight of 1), and Normal
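The weighting measures listed above can be sketched as follows. The formulas are the commonly documented definitions of these weights; whether they match exactly what SAS Text Miner computed in the study is an assumption, and the function name `term_weights` and the sample counts are hypothetical.

```python
import math


def term_weights(doc_term_counts):
    """Compute global term weights from per-document term counts.

    doc_term_counts: list of {term: count} dicts, one per document.
    Returns {term: {"entropy", "idf", "gf_idf", "normal", "none"}}.
    """
    n_docs = len(doc_term_counts)
    terms = set().union(*doc_term_counts)
    weights = {}
    for term in terms:
        freqs = [doc.get(term, 0) for doc in doc_term_counts]
        global_freq = sum(freqs)                    # total occurrences
        doc_freq = sum(1 for f in freqs if f > 0)   # documents containing term

        # Entropy weight: 1 + sum_j p_j * log2(p_j) / log2(n_docs).
        # Close to 0 for evenly spread terms, 1 for single-document terms.
        entropy = 1.0
        if n_docs > 1:
            entropy += sum(
                (f / global_freq) * math.log2(f / global_freq)
                for f in freqs if f > 0
            ) / math.log2(n_docs)

        weights[term] = {
            "entropy": entropy,
            "idf": math.log2(n_docs / doc_freq) + 1,
            "gf_idf": global_freq / doc_freq,
            "normal": 1 / math.sqrt(sum(f * f for f in freqs)),
            "none": 1.0,  # every term gets a global weight of 1
        }
    return weights


if __name__ == "__main__":
    # Hypothetical counts for three transcripts, not data from the study.
    docs = [
        {"drusen": 2, "macula": 1},
        {"drusen": 2},
        {"drusen": 2, "hemorrhage": 3},
    ]
    for term, w in sorted(term_weights(docs).items()):
        print(term, {k: round(v, 3) for k, v in w.items()})
```

Entropy and IDF both reward terms concentrated in few documents, so a term that appears evenly in every transcript receives a low weight under either scheme.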
Results: Text Miner
Frequency weight: None
Term weight: Normal
Common Terms
Comparison
Thus text mining is a viable and effective method for determining vocabulary to describe a particular disease
Text mining found many of the terms that NLP found
The human expert remains the best ground truth
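A comparison of this kind can be sketched as a set overlap against the expert's term list. The slides do not state how accuracy was measured, so the precision and recall figures here are an assumed stand-in, and the function name `compare_to_ground_truth` and the sample terms are hypothetical.

```python
def compare_to_ground_truth(candidate_terms, expert_terms):
    """Compare an automated method's terms with the expert's term list.

    Terms are matched after lowercasing and trimming whitespace only
    (an assumption; the study may have matched terms differently).
    """
    candidate = {t.lower().strip() for t in candidate_terms}
    expert = {t.lower().strip() for t in expert_terms}
    shared = candidate & expert
    return {
        "shared": shared,
        "missed_by_method": expert - candidate,   # expert terms not found
        "extra_from_method": candidate - expert,  # possible expert omissions
        "precision": len(shared) / len(candidate) if candidate else 0.0,
        "recall": len(shared) / len(expert) if expert else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical term lists, not results from the study.
    text_miner_terms = ["Drusen", "pigment", "scar"]
    expert_terms = ["drusen", "scar", "hemorrhage", "atrophy"]
    report = compare_to_ground_truth(text_miner_terms, expert_terms)
    for key, value in report.items():
        print(key, value)
```

The `extra_from_method` set is worth reviewing by hand, since it may contain descriptors the expert overlooked rather than errors.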
Ontology Generation
Conclusion and Future Work
Human experts are the best, but they did miss some key descriptors
Text mining and NLP can enhance the generation of feature descriptions by catching descriptors that the experts miss
As a consequence, a more robust vocabulary can be generated
Extension – evaluate the effectiveness of the automated tools, text mining & NLP
Different weighting schemes to be tried in the future
Thank You For Your Attention!