Information Extraction (IE) focuses on retrieving certain types of information from natural language texts by automatic processing. IE plays an important role in the biomedical domain, where the body of knowledge is growing rapidly. Relations between entities in this domain can support a variety of tasks. This thesis work therefore focuses on extracting semantic information using semantic role labeling (SRL), with the help of background knowledge sources such as ontologies.
1
Thesis Final Presentation
Semantic Role Labeling for Information Extraction of Bio-medical data
Shikha Jacob Mathew, Vinith Varghese
School of Engineering, Jönköping University
03/26/2014
2
Agenda
Description of the problem
Purpose of this study
Methodology
IE System
Results
Conclusion & Future Work
3
Description of the problem
The biomedical domain is flooded with a large amount of information.
Research questions
• How can the Natural Language Processing (NLP) components be improved so that high-quality information can be extracted from the biomedical domain?
• How can domain-specific knowledge be used to generate high-quality relations between entities within the domain, based on Semantic Role Labeling (SRL)?
[Diagram: extract relevant and useful information; structure; manage]
4
Purpose of this study
Develop a useful Information Extraction (IE) system that extracts relations between entities within the biomedical domain, using information obtained from SRL.
This is accomplished by introducing two features:
• Named Entity Recognition (NER)
• Ontology
5
Methodology: Research Approach (1/2)
Design Science Research
Quantitative Evaluation
Adaptive Software Development
6
Methodology: Research Framework (2/2)
The general methodology of design science research
7
IE System - Framework (1/2)
8
IE System - Framework (2/2)
Relational Detector. The steps include:
• Dependency Parser (Parser Model)
• Semantic Role Labeler
9
Semantic Role Labeling (SRL)
• The SRL process includes: Pre-processing, Argument Identification, Argument Classification, Post-processing
• Features used: Word Form, Lemma, POS Tag, Head Word, Dependency Label
• Domain-specific features introduced: Ontology, NER
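The feature list can be made concrete with a small sketch. This is an illustration only, not the system's actual code: the `Token` fields, the `token_features` helper, the tag names and the example sentence are all assumptions chosen to mirror the features named on this slide.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Token:
    form: str                            # surface word form
    lemma: str                           # output of the lemmatizer
    pos: str                             # part-of-speech tag
    head: int                            # index of the head token, -1 for the root
    deprel: str                          # dependency label to the head
    ner: Optional[str] = None            # domain-specific NER tag (assumed tag set)
    semantic_type: Optional[str] = None  # ontology semantic type (assumed)


def token_features(sentence, i):
    """Build the feature dictionary for token i of a parsed sentence."""
    tok = sentence[i]
    head_word = sentence[tok.head].form.lower() if tok.head >= 0 else "ROOT"
    return {
        "form": tok.form.lower(),
        "lemma": tok.lemma,
        "pos": tok.pos,
        "head_word": head_word,
        "deprel": tok.deprel,
        # The two domain-specific features fall back to a default when absent.
        "ner": tok.ner or "O",
        "ontology": tok.semantic_type or "NONE",
    }


# Illustrative sentence: "Aspirin inhibits cyclooxygenase"
sentence = [
    Token("Aspirin", "aspirin", "NNP", 1, "nsubj", ner="DRUG"),
    Token("inhibits", "inhibit", "VBZ", -1, "root", semantic_type="process"),
    Token("cyclooxygenase", "cyclooxygenase", "NN", 1, "dobj", ner="PROTEIN"),
]
features = [token_features(sentence, i) for i in range(len(sentence))]
```

Each dictionary in `features` is the kind of per-token input a statistical classifier for argument identification or classification could consume.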
10
SRL Example
A0: Agent
A1: Patient or Theme
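A minimal sketch of how a labeled frame with these roles could be turned into a relation between two entities. The `Frame` type, the `frame_to_relation` helper and the example values are hypothetical and only illustrate the idea.

```python
from typing import NamedTuple


class Frame(NamedTuple):
    predicate: str   # lemma of the predicate
    arguments: dict  # role label (e.g. "A0", "A1") -> argument text


def frame_to_relation(frame):
    """Return (agent, predicate, theme), or None if either role is missing."""
    agent = frame.arguments.get("A0")
    theme = frame.arguments.get("A1")
    if agent is None or theme is None:
        return None
    return (agent, frame.predicate, theme)


frame = Frame("inhibit", {"A0": "aspirin", "A1": "cyclooxygenase"})
relation = frame_to_relation(frame)
```

A frame that lacks either A0 or A1 yields no relation here, which is one simple policy among several possible ones.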
11
Domain-Specific Named Entity Recognizer (NER)
The concept used for NER:
• Conditional Random Field (CRF): a statistical modeling method used in pattern recognition for structured prediction.
• Use: identifying argument boundaries
• Patterns from POS tags
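A sequence labeler such as a CRF emits one label per token, and boundaries are then read off the label sequence. The sketch below assumes a BIO tagging scheme (B- begins a span, I- continues it, O is outside), which the slides do not specify; `bio_to_spans` and the tag names are hypothetical.

```python
def bio_to_spans(tags):
    """Convert BIO tags into (start, end, label) spans, end exclusive."""
    spans = []
    start, label = None, None
    # A trailing "O" acts as a sentinel that closes a span ending the sentence.
    for i, tag in enumerate(list(tags) + ["O"]):
        if tag.startswith("I-") and label == tag[2:]:
            continue  # still inside the current span
        if label is not None:
            spans.append((start, i, label))
            start, label = None, None
        if tag.startswith("B-"):
            start, label = i, tag[2:]
        # An "I-" tag with no matching open span is ignored.
    return spans


tags = ["B-PROTEIN", "I-PROTEIN", "O", "B-DRUG"]
spans = bio_to_spans(tags)
```

With these tags, tokens 0 to 1 form one PROTEIN span and token 3 forms a DRUG span.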
12
Domain-specific NER Example
13
Domain-Specific Ontology
• Conceptual knowledge organised in a computer-based representation.
• IE needs ontologies for interpreting the texts and extracting relevant information.
• Metathesaurus and Semantic Network
• UIMA Semantic Types: broad categories for concepts (Metathesaurus)
• Use: predicate identification, based on process/function semantic types (ST)
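As a rough sketch of the ontology feature, a word's semantic type can be looked up and used to flag predicate candidates. The lexicon below is a hypothetical toy dictionary with made-up type names; a real system would query the ontology rather than a hard-coded table.

```python
# Hypothetical toy lexicon: lemma -> semantic type.
SEMANTIC_TYPES = {
    "inhibit": "process",
    "phosphorylation": "process",
    "regulate": "function",
    "aspirin": "substance",
}

# Semantic types treated as evidence for a predicate (assumed names).
PREDICATE_TYPES = {"process", "function"}


def ontology_feature(lemma):
    """Return the semantic type of a lemma, or "NONE" if it is unknown."""
    return SEMANTIC_TYPES.get(lemma.lower(), "NONE")


def is_predicate_candidate(lemma):
    """Flag lemmas whose semantic type is a process or a function."""
    return ontology_feature(lemma) in PREDICATE_TYPES


candidates = [w for w in ["Aspirin", "inhibit", "regulate", "table"]
              if is_predicate_candidate(w)]
```

Words missing from the lexicon fall back to "NONE", so the feature never blocks the other features from firing.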
14
Results: Predicate Identification (1/2)
[Bar chart: precision (P), recall (R) and F1 for predicate identification, on a 0-100 scale, before and after adding the feature]
Evaluation criteria: Precision, Recall, F1-measure
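For reference, the three measures can be computed from a gold set and a predicted set as sketched below. The item identifiers are invented for illustration and are not the thesis results.

```python
def precision_recall_f1(gold, predicted):
    """Compute precision, recall and F1 over two collections of items."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Invented example: 4 gold predicates, 3 predicted, 2 of them correct.
gold = {"p1", "p2", "p3", "p4"}
predicted = {"p2", "p3", "p5"}
p, r, f1 = precision_recall_f1(gold, predicted)
```

Here precision is 2/3, recall is 1/2, and F1 is their harmonic mean, 4/7.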
15
Results: Predicate Identification (2/2)
• More biomedical predicates identified
• Drawbacks: a few false negatives (missing predicates)
16
Results: Argument Identification (1/2)
[Bar chart: precision (P), recall (R) and F1 for argument identification, on a 0-100 scale, before and after adding the feature]
17
Results: Argument Identification (2/2)
• The boundary of the predicate is small, so part of an argument can fall outside it, e.g. "and a ball" in:
[John is playing with a bat] and a ball.
• Some predicates are not identified
18
Conclusion (1/2)
• High-quality information extraction
• More predicates, more arguments, more relations
• For researchers in the biomedical field: integrated information for further study or investigation; information that is managed and structured; easy access
[Diagram: a predicate and its arguments form RELATIONS]
19
Conclusion (2/2)
Drawbacks
Speed
Missing predicates
False Negatives
Small predicate boundary
20
Further investigation
Feature Engineering based on context of the text
Named entity classification (NER)
How to introduce features so as not to compromise the performance of the system
Categories, relationships, synonyms
Clause boundary/prepositional attachments
For making the system specific
Speed/Performance
Predicate boundary identification
Better result-Ontology Information
21
Thank You!!
Questions