DSpace Institution
DSpace Repository http://dspace.org
Computer Science thesis
2021-03-17
CONTEXT-BASED SEMANTIC ROLE
LABELER FOR AMHARIC
SENTENCES USING DEEP LEARNING
ALEMU, BELAY
http://ir.bdu.edu.et/handle/123456789/12394
Downloaded from DSpace Repository, DSpace Institution's institutional repository
BAHIR DAR UNIVERSITY
BAHIR DAR INSTITUTE OF TECHNOLOGY
SCHOOL OF RESEARCH AND POSTGRADUATE STUDIES
FACULTY OF COMPUTING
CONTEXT-BASED SEMANTIC ROLE LABELER FOR AMHARIC
SENTENCES USING DEEP LEARNING
MSC THESIS
BY
ALEMU BELAY TESSEMA
March 17, 2021
BAHIR DAR, ETHIOPIA
CONTEXT-BASED SEMANTIC ROLE LABELER FOR AMHARIC
SENTENCES USING DEEP LEARNING
BY
ALEMU BELAY TESSEMA
A thesis submitted to the school of graduate studies of Bahir Dar
Institute of Technology, BDU in partial fulfillment of the requirements
for the degree of Masters in Information Technology program in the
Faculty of Computing
Advisor Name: Tesfa Tegegne (PhD)
March 17, 2021
Bahir Dar, Ethiopia
© 2021
ALEMU BELAY TESSEMA
All Rights Reserved
ACKNOWLEDGEMENT
First and foremost, I would like to thank God for making everything complete in its time
and giving me the strength to complete the thesis.
Secondly, I would like to express my deepest gratitude to my advisor, Tesfa Tegegne (PhD),
for his generous and fruitful advice, his willingness to help me at all times, his
timely and critical evaluation of my thesis, and his constructive comments
at each step of the thesis.
Last but not least, I would like to thank my family, Mr. Misgan Belay and Mekdes Belay,
and my friend Bedru Yimam for their encouragement, inspiration and day-to-day support
in undertaking this endeavor from beginning to end.
Alemu Belay
LIST OF ABBREVIATIONS
AI Artificial Intelligence
Bi-LSTM Bidirectional Long Short-Term Memory
CRF Conditional Random Field
IB1 Instance Based 1
IE Information Extraction
IR Information Retrieval
LOO-CV Leave One Out Cross Validation
LSTM Long Short-Term Memory
MBL Memory Based Learning
ME Maximum Entropy
MLP Multilayer Perceptron
MT Machine Translation
NLP Natural Language Processing
NMT Neural Machine Translation
POS tagging Part of Speech tagging
QA Question Answering
ReLU Rectified Linear Unit
RNN Recurrent Neural Network
SMO Sequential Minimal Optimization algorithm
SMT Social Media Text
SRL Semantic Role Labeling
SVM Support Vector Machine
TTS Text to Speech
Table of Contents
ACKNOWLEDGEMENT
LIST OF ABBREVIATIONS
ABSTRACT
CHAPTER ONE: INTRODUCTION
1.1. Background
1.2. Statement of the Problem
1.3. Objectives of the Study
1.3.1. General Objective
1.3.2. Specific Objectives
1.4. Scope and Limitation
1.5. Significance of the Study
1.6. Methodology
1.6.1. Problem identification and motivation
1.6.2. Objectives for a solution
1.6.3. Design and development
1.6.4. Demonstration
1.6.5. Communication
1.6.6. Evaluation Metrics
1.7. Design and Development Tools
1.7.1. Python
1.7.2. TensorFlow
1.8. Organization of the Thesis
CHAPTER TWO: LITERATURE REVIEW
2.1. Natural Language Processing
2.2. Semantic Role
2.2.1. Common List of Semantic Roles
2.2.2. Semantic Role Labeling Challenges
2.3. Lexical Resources for SRL
2.3.1. PropBank
2.3.2. FrameNet
2.4. Classification Methods for SRL
2.4.1. Maximum Entropy
2.4.2. Support Vector Machines
2.4.3. Conditional Random Field (CRF)
2.5. Deep Learning
2.5.1. Recurrent Neural Network (RNN)
2.5.2. Long Short-Term Memory (LSTM)
2.5.3. Multi-Layer Perceptron (MLP)
2.6. Amharic Language
2.6.1. Amharic Sentences
CHAPTER THREE: RELATED WORKS
3.1. Introduction
3.2. Semantic Role Labeling for English and European Languages
3.3. Semantic Role Labeling for Chinese Language
3.4. Semantic Role Labeling for Arabic Language
3.5. Semantic Role Labeling for Amharic Language
CHAPTER FOUR: SYSTEM DESIGN AND IMPLEMENTATION
4.1. Introduction
4.2. Proposed System Architecture
4.2.1. Preprocessing Phase
4.2.1. Training Phase
4.2.2. Testing/Semantic Role Labeling Phase
4.3. The Proposed Network Model
CHAPTER FIVE: RESULTS AND DISCUSSION
5.1. Introduction
5.1.1. Dataset Collection
5.1.2. Dataset Preparation
5.2. Experiment/Implementation
5.2.1. Hyperparameter tuning
5.2.2. Proposed model training and Validation accuracy
5.2.3. Proposed model Training and Validation loss
5.3. Experiments
5.3.1. Evaluation
5.4. Discussion of Results
CHAPTER SIX: CONCLUSION AND RECOMMENDATION
6.1. Conclusion
6.2. Contribution of the study
6.3. Recommendation
REFERENCES
APPENDIXES
LIST OF TABLES
Table 2. 1 List of arguments in PropBank and their description
Table 2. 2 List of Annotated Adjuncts in PropBank with their Explanations
Table 4. 1 Sample output of Preprocessing Phase
Table 4. 2 Sample Normalized verbs in the Dataset
Table 4. 3 Sample Amharic Sentence Tagged by online Amharic tagger module
Table 4. 4 The Forward LSTM and backward LSTM Result for the sentence “ሳሙኤል ምሳሩን በሞረድ ሳለ”
Table 4. 5 A Work of MLP Scoring function on the sentence “ዳንኤል ቢላዋ ሳለ”
Table 4. 6 Sample Labelled Amharic Sentence Generated by the model
Table 5. 1 Collected Sample Amharic Sentences with their Domains
Table 5. 2 Collected Amount of Dataset from Different social media platforms
Table 5. 3 Predicate-Argument Relation
Table 5. 4 Training, Testing and Validation Dataset Description
Table 5. 5 List of Hyperparameters Used in the Model with their Description
Table 5. 6 Comparison of Role Labeler Performance with and without context of multi-sense predicate
Table 5. 7 Average precision, recall, and F-score result for testing the model
Table 5. 8 Individual role label performance on predicate sense based annotated data
Table 5. 9 Testing Result of the Semantic Role Labeler Model
LIST OF FIGURES
Figure 4. 1 Proposed System Architecture
Figure 4. 2 Concatenation of Argument and POS tag embedding
Figure 4. 3 Visualize MLP classifier model used in semantic role labeling
Figure 4. 4 Proposed Network model
Figure 5. 1 Proposed Model Training and validation status rate
Figure 5. 2 Proposed model Training and Validation Loss Rate
Figure 5. 3 Role labeling Performance with Predicate Sense Based Annotated Data
Figure 5. 4 Sample Semantic role labels assigned by the model with the Testing data
ABSTRACT
Scholars are currently conducting research on natural language processing tasks such as
machine translation, question answering, information extraction and text summarization
for languages such as Arabic, English and Chinese. Semantic role labeling (SRL) has become
an active research area, because it is a crucial sentence-level semantic task that
specifies the role of each argument in a given text and serves as an input to other NLP
tasks. However, previous research has given little attention to the semantic relationships
between constituents and predicates in Amharic sentences. To fill this gap, we developed a
context-based semantic role labeler for the Amharic language using a deep learning
approach, Bidirectional Long Short-Term Memory (Bi-LSTM) networks, and considered the
different senses of predicates in simple Amharic sentences during dataset annotation. The
data were collected from social media platforms and student Amharic textbooks and were
annotated semantically following the PropBank annotation guidelines with the help of
linguistic experts from Wollo University. The dataset contains 40 predicates that have
more than one contextual meaning. Each was annotated according to its sense, and its
arguments were assigned different role labels, giving multi-sense predicate data for
training and testing the model. MLP classifiers were used to classify each argument into
its role label based on the score predicted by a biaffine attentional scorer. The proposed
model achieved 95.9% training accuracy and 84.9% testing accuracy.
CHAPTER ONE: INTRODUCTION
1.1. Background
Natural language processing (NLP) is a branch of artificial intelligence (AI) that deals with
the interaction between computers and humans through natural language and helps computers
to understand, interpret and manipulate human language [1].
NLP is important for machines to communicate with humans in their own language. Using
NLP to create a seamless and interactive interface between humans and machines will
continue to be a top priority for today's and tomorrow's increasingly cognitive applications
[3]. To create such an interface and to resolve natural language ambiguity, semantic
analysis and the identification of argument roles are necessary to capture the meaning of
the content of a text [3, 4].
Semantic analysis is the process of understanding the meaning of words in natural
language [4]. It starts by reading all of the words in the content to capture the real
meaning of the text, identifies the text elements and assigns them to their logical and
grammatical roles.
NLP tasks such as text classification and categorization, textual entailment, semantic
parsing, question answering (QA), speech and character recognition, and machine
translation (MT) require semantic role labeling to derive meaning from human languages
using different deep neural network approaches [1, 2], since lower-level tasks such as
parsing pay very little attention to the semantics of the text in a sentence [4].
For this reason, scholars across the world have been motivated to develop Semantic Role
Labeling (SRL) models for different languages that pay greater attention to the semantic
relations between the major constituents and the predicate of a sentence.
Semantic role labeling is an important technique for performing semantic role analysis in
these NLP applications. Applications such as question answering [5], textual entailment
[6], machine translation [6], text summarization [7] and information extraction [8] have
used semantic role labeling as an intermediate task to capture the semantic relationship
between a predicate and its constituents, and have achieved good performance as a result.
In NLP, semantic role labeling, also called shallow semantic parsing [9], is the process
of assigning labels to words or phrases in a sentence that indicate their semantic
role in the sentence. It is a crucial sentence-level semantic task that specifies who did
what to whom, and how, when and where, in a given sentence [1, 9]. For example, the
following Amharic sentence can be labeled as አበበ [ARG0-AGT] ቦርሳውን [ARG1-PAT] ለአልማዝ
[ARG2-BEN] ሰጠ [REL].
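As a minimal illustration (not the implementation used in this study), the labeled sentence above can be represented as a list of (constituent, role label) pairs. The variable and function names in this sketch are assumptions introduced only for the example.

```python
# Illustrative only: the role-labeled example sentence as (constituent, label) pairs.
labeled_sentence = [
    ("አበበ", "ARG0-AGT"),     # who gave (agent)
    ("ቦርሳውን", "ARG1-PAT"),  # what was given (patient)
    ("ለአልማዝ", "ARG2-BEN"),  # to whom it was given (beneficiary)
    ("ሰጠ", "REL"),           # the predicate
]


def constituents_with_role(sentence, role):
    """Return every constituent that carries the given role label."""
    return [text for text, label in sentence if label == role]


print(constituents_with_role(labeled_sentence, "ARG0-AGT"))
```

Looking up a role label in this structure corresponds to answering one of the "who did what to whom" questions for the sentence.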
When assigning role labels to the arguments in a sentence, it is important to consider
the sense of the predicate in different domains, because predicates can have different
senses and each sense can take its own set of semantic roles for its arguments [15].
Ignoring the sense affects the performance of semantic role assignment, because the
semantic roles depend on the sense. Therefore, the sense of the predicate has to be
considered: polysemous verbs should be disambiguated and given the set of semantic roles
appropriate to the predicate sense [24].
Consider the sense of the predicate “አለፈ” in the following two simple Amharic sentences:
(1) የሞላ አባት ትናንት ህይወቱ አለፈ and (2) የወንጀለኞቹ የፍርድ ቤት ቀጠሮ ትናንት አለፈ.
In these two sentences the predicate “አለፈ” has two senses, each of which takes
different semantic roles.
In the first sentence, the predicate “አለፈ” has the sense of “ሞተ፣ አረፈ፣ ከዚህ አለም ተለየ”.
Therefore, the predicate takes the roles of who died (the thing that died), when it died
(the time of death) and the cause of death. In the second sentence, however, the meaning
of the predicate “አለፈ” differs from the first and it takes different argument roles,
i.e., the thing that passes and the time at which it passes. Therefore, a different set of
roles must be considered depending on the sense of the predicate.
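The dependence of the role set on the predicate sense can be sketched as a lookup keyed by predicate and sense. This is only an illustration: the sense identifiers ("die", "pass") and the role descriptions are assumptions paraphrased from the two example sentences, not entries from an annotated lexicon.

```python
# Hypothetical sense inventory for the predicate "አለፈ"; the sense identifiers
# and role descriptions are assumptions made for illustration.
SENSE_ROLES = {
    ("አለፈ", "die"): ["who died", "time of death", "cause of death"],
    ("አለፈ", "pass"): ["thing that passes", "time of passing"],
}


def roles_for(predicate, sense):
    """Look up the set of roles that a predicate takes in a given sense."""
    return SENSE_ROLES[(predicate, sense)]


for sense in ("die", "pass"):
    print(sense, roles_for("አለፈ", sense))
```

The same surface form yields two different role sets, which is why the sense has to be resolved before roles are assigned.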
For this reason, we were motivated to develop a context-based semantic role labeler
for simple Amharic sentences using Bidirectional Long Short-Term Memory (Bi-LSTM)
networks. The focus of this study is to identify the polysemous verbs in our collected
dataset, to assign semantically annotated role labels to the arguments of each predicate
according to its contextual meaning, and to mix the multi-sense predicate dataset with
the single-sense predicate dataset when training the model. In doing so, we try to show
the role that predicate (verb) sense disambiguation, a subcomponent of word sense
disambiguation, plays in semantic role labeling of simple Amharic sentences.
1.2. Statement of the Problem
Semantic role labeling is an important step towards meaning understanding in natural
language processing. It identifies and classifies the arguments associated with the
predicates of a sentence [5]. Several works on semantic role labeling consider the sense
of predicates, using different deep neural networks for different languages, and achieve
better performance than other state-of-the-art models [15]. However, these methods cannot
be applied directly to Amharic, because Amharic is a highly inflectional,
morphologically rich language and requires a different way of extracting semantic roles [17].
To address this problem, [18] developed a semantic role labeler for simple
Amharic sentences using a conventional machine learning approach called Memory Based
Learning. They developed a general architecture for the semantic role labeler and
implemented feature extraction algorithms that extracted 551 instances from 240 simple
Amharic sentences. They obtained an F1-score of 82.51% with the default parameters (i.e.,
number of nearest neighbors, distance metric, class voting weight and feature weighting
metric) and 89.29% with optimized parameters of IB1. However, predicates can have
different senses in different Amharic sentences. For example, in the two sentences “አማረ
ትልቅ የስጦታ ስዕል ሳለ” and “አማረ ጉንፋን ስለያዘው ክፉኛ ሳለ”, the predicate “ሳለ” has different
senses and takes different argument roles.
However, the work of E. Yirga and her colleagues [18] did not consider the sense of the
predicate in simple Amharic sentences taken from different domains. For example, the
following simple Amharic sentences contain the same predicate “ሳለ” with different
contextual meanings, each of which takes different semantic roles for the arguments
in its sentence.
Based on their context, the sentences can be labeled as follows:
አማረ [ARG0-AGT] ትልቅ ስዕል [ARG1-BEN] ሳለ [REL] and
አማረ [ARG0-AGT] ጉንፋን ስለያዘው [ARGM-CAU] ክፉኛ [ARGM-MNR] ሳለ [REL]
The two sentences above are taken from two different domains, Entertainment and
Sport, respectively. In the labeled sentences, the question "who drew?" (ማን ሳለ?) is
answered by the argument labeled [ARG0-AGT], and the question "what was drawn?"
(ምን ተሳለ?) is answered by the argument labeled [ARG1-BEN], whereas the question "why
did Amare cough?" (አማረ ለምን ሳለ?) is answered by the argument labeled [ARGM-CAU].
Thus the predicate “ሳለ” has a different sense in each domain, and the arguments in each
sentence take different role labels. The work of E. Yirga and her colleagues did not
consider such predicates across domains; it considered only a single-sense predicate
dataset.
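The question-answering reading of the two labeled sentences can be sketched as follows. This is a minimal illustration rather than part of the proposed system; the mapping from question words to role labels and the function name are assumptions.

```python
# The two labeled sentences from the text, one per sense of the predicate "ሳለ".
drew = [("አማረ", "ARG0-AGT"), ("ትልቅ ስዕል", "ARG1-BEN"), ("ሳለ", "REL")]
coughed = [("አማረ", "ARG0-AGT"), ("ጉንፋን ስለያዘው", "ARGM-CAU"),
           ("ክፉኛ", "ARGM-MNR"), ("ሳለ", "REL")]

# Assumed mapping from a question type to the role label that answers it.
QUESTION_TO_ROLE = {"who": "ARG0-AGT", "what": "ARG1-BEN", "why": "ARGM-CAU"}


def answer(question, sentence):
    """Return the constituent that answers the question, or None if the role is absent."""
    wanted = QUESTION_TO_ROLE[question]
    for text, label in sentence:
        if label == wanted:
            return text
    return None


print(answer("what", drew))
print(answer("why", coughed))
```

The "why" question has an answer only in the second sentence, because only the coughing sense of the predicate takes a cause argument here.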
E. Yirga [18] uses the Memory Based Learning technique, in which the learning component
is memory based: all training examples are stored in memory, and new concepts are
classified based on their similarity to previously seen concepts in memory [20].
Memory Based Learning is a shallow machine learning technique, so the model has to be
trained with manually extracted, task-specific features [22]. Extracting such handcrafted
features is time-consuming and causes feature imbalance across concepts [21].
The study in [18] also uses the k-Nearest Neighbor (k-NN) algorithm, a supervised
machine learning algorithm used for both classification and regression problems [36].
However, k-NN is a lazy learner: it does not learn a model from the training data but
uses the training data itself at classification time, which becomes costly as the size of
the data increases, and, according to [36], it is used only for classifying numeric data.
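The lazy behavior described above can be made concrete with a small sketch. This is a generic k-NN classifier on toy numeric data, written for illustration; it is not the IB1 configuration used in [18], and the class name and data are assumptions.

```python
from collections import Counter


class LazyKNN:
    """Minimal k-NN classifier: 'training' only stores the examples."""

    def __init__(self, k=3):
        self.k = k
        self.examples = []

    def fit(self, features, labels):
        # No parameters are learned; the training data is simply memorised.
        self.examples = list(zip(features, labels))

    def predict(self, query):
        # All the work happens at classification time: the query is compared
        # with every stored example (squared Euclidean distance).
        def distance(example):
            point, _ = example
            return sum((a - b) ** 2 for a, b in zip(point, query))

        nearest = sorted(self.examples, key=distance)[: self.k]
        votes = Counter(label for _, label in nearest)
        return votes.most_common(1)[0][0]


# Toy data, assumed for the example.
knn = LazyKNN(k=3)
knn.fit([(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)],
        ["A", "A", "A", "B", "B", "B"])
print(knn.predict((0.2, 0.1)))
```

Because every prediction scans all stored examples, the cost of classification grows with the size of the training set.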
In this study, we developed a context-based semantic role labeling model for simple
Amharic sentence structures using a deep learning technique, Bidirectional Long
Short-Term Memory networks, and considered the different senses of predicates during
dataset preparation. This addresses the problem of the Memory Based Learning technique
stated above by extracting task-specific features automatically. In addition, to overcome
the limitations of the k-NN algorithm, we used a deep learning classifier, a multilayer
perceptron (MLP), which works on the scores of a biaffine attentional scorer function and
can classify any type of data independent of the size of the dataset.
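The equation of the biaffine attentional scorer is not given in this section. The sketch below therefore assumes the commonly used biaffine form, a bilinear term plus a linear term over the concatenated inputs plus a bias, and uses toy parameters. The function names and parameter layout are assumptions, not the scorer implemented in this study.

```python
def biaffine_score(h_arg, h_pred, U, w, b):
    """Score one (argument, predicate) pair for a single role label.

    Assumed form: score = h_arg^T U h_pred + w . [h_arg ; h_pred] + b
    """
    bilinear = sum(h_arg[i] * U[i][j] * h_pred[j]
                   for i in range(len(h_arg))
                   for j in range(len(h_pred)))
    linear = sum(wi * xi for wi, xi in zip(w, h_arg + h_pred))
    return bilinear + linear + b


def best_role(h_arg, h_pred, params):
    """Return the role with the highest score; params maps role -> (U, w, b)."""
    return max(params, key=lambda role: biaffine_score(h_arg, h_pred, *params[role]))


# Toy parameters, assumed for the example (one parameter set per role label).
toy_params = {
    "ARG0-AGT": ([[1, 0], [0, 1]], [0, 0, 0, 0], 0.0),
    "ARG1-PAT": ([[-1, 0], [0, -1]], [0, 0, 0, 0], 0.0),
}
print(best_role([1.0, 2.0], [3.0, 4.0], toy_params))
```

In a trained model the vectors would come from the Bi-LSTM and MLP layers and the parameters would be learned; here they are fixed by hand only to show how a score per role label leads to a label choice.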
Bidirectional Long Short-Term Memory networks are a special kind of Recurrent Neural
Network that is capable of learning long-term dependencies [12]. They contain an
additional state, called the cell state, that helps to avoid the vanishing/exploding
gradient problems [21] during backpropagation.
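The role of the cell state and the bidirectional reading of a sequence can be sketched with a scalar LSTM cell. This is a simplified illustration using the standard LSTM gate equations with one-dimensional states; the parameter names and values are assumptions, and the model in this study uses vector-valued states learned during training.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p holds the (assumed) weights and biases."""
    f = sigmoid(p["wf"] * x + p["uf"] * h_prev + p["bf"])    # forget gate
    i = sigmoid(p["wi"] * x + p["ui"] * h_prev + p["bi"])    # input gate
    o = sigmoid(p["wo"] * x + p["uo"] * h_prev + p["bo"])    # output gate
    g = math.tanh(p["wg"] * x + p["ug"] * h_prev + p["bg"])  # candidate value
    c = f * c_prev + i * g   # cell state: gated, additive update
    h = o * math.tanh(c)     # hidden state passed to the next step
    return h, c


def run_lstm(xs, p):
    """Run the cell over a sequence and return the hidden state at each position."""
    h, c, outputs = 0.0, 0.0, []
    for x in xs:
        h, c = lstm_step(x, h, c, p)
        outputs.append(h)
    return outputs


def run_bilstm(xs, p_fwd, p_bwd):
    """Pair the forward state with the backward state at every position."""
    forward = run_lstm(xs, p_fwd)
    backward = run_lstm(xs[::-1], p_bwd)[::-1]
    return list(zip(forward, backward))


KEYS = ["wf", "uf", "bf", "wi", "ui", "bi", "wo", "uo", "bo", "wg", "ug", "bg"]
toy = {k: 0.5 for k in KEYS}  # toy parameter values, assumed for the example
print(run_bilstm([1.0, -1.0, 0.5], toy, toy))
```

When the forget gate is close to 1 and the input gate is close to 0, the cell state is carried forward almost unchanged, which is the mechanism that lets information and gradients survive over long distances. The backward pass gives each position access to the words that follow it as well as those that precede it.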
To this end, this study attempts to answer the following research questions.
RQ 1: To what extent does a deep Bi-LSTM classifier improve semantic role labeling
performance on Amharic sentences?
RQ 2: How does predicate sense disambiguation affect semantic role labeling of simple
Amharic sentences?
1.3. Objectives of the Study
1.3.1. General Objective
The general objective of this thesis is to design a context-based semantic role labeler for
simple Amharic sentences using deep learning, considering the different senses of predicates.
1.3.2. Specific Objectives
In order to achieve the general objective, we set the following specific objectives.
To review the literature on SRL and verb sense disambiguation.
To identify the importance of considering predicate sense for the semantic role labeling task on simple Amharic sentences.
To select an appropriate deep learning classifier for our model.
To develop an architecture for the semantic role labeling system.
To prepare a semantically annotated Amharic corpus for training and testing the system.
To design a semantic role labeler network model.
To evaluate the performance of the system using different performance evaluation metrics.
1.4. Scope and Limitation
In this study, we developed a context-based semantic role labeling model for simple
Amharic sentences by considering different senses of predicates. The model does not work
for languages other than Amharic and considers only simple Amharic sentences.
The work covers identifying the predicate and its associated arguments in a given simple
Amharic sentence and assigning to each argument a label that indicates its role with
respect to the predicate. The model uses normalized predicates and a tagged dataset to
assign role labels to each token.
1.5. Significance of the Study
This research benefits other high-level NLP tasks, because understanding the semantic
content and the roles of arguments in a given text is an intermediate task in developing
such natural language processing applications. In particular, the result of this study
can be used as an input by researchers working on machine translation,
information extraction and text summarization.
Text Summarization: Text summarization is a technique for automatically producing short
and important summaries from a large document based on the user's needs. Text
summarization uses the SRL result as an input for identifying the semantic relations in
which entities participate and for estimating sentence similarity, that is, determining
which entities participating in which semantic relations are contained in two sentences
[24]. Therefore, predicates and the heads of roles are important for summarizing the
content of each document.
Information Extraction: Information extraction refers to automatically extracting
structured information from unstructured machine-readable documents [8].
Information extraction applies SRL for generalization purposes in template systems.
Therefore, semantic role labeling techniques are used to construct useful rules for
information extraction.
Machine Translation: Machine translation is the process of translating text from a source
language into a target language. Before performing translation, it is important to
understand the grammatical structure (word order) of each language. Therefore, SRL is
important for identifying the structure of the sentence in different languages.
For example, to translate from 1. Abebe [AGENT/S] kicked [REL/V] the ball [ARG1/O] to
2. አበበ [AGENT/S] ኳሷን [ARG1/O] መታት [REL/V]
Therefore, understanding sentence formation and word order is very important for
translation.
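How role labels expose the difference in word order can be sketched by reordering labeled constituents. The sketch only rearranges the English constituents from subject-verb-object order into the subject-object-verb order of Amharic. It does not translate any words, and the names used are assumptions for illustration.

```python
# English constituents in subject-verb-object order, with the labels used above.
english = [("Abebe", "AGENT"), ("kicked", "REL"), ("the ball", "ARG1")]

# Assumed target order for a subject-object-verb language such as Amharic.
SOV_ORDER = ["AGENT", "ARG1", "REL"]


def reorder(constituents, role_order):
    """Sort constituents by the position of their role label in role_order."""
    return sorted(constituents, key=lambda item: role_order.index(item[1]))


print(reorder(english, SOV_ORDER))
```

Once each constituent carries a role label, the target word order can be stated over the labels rather than over individual words.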
1.6. Methodology
This is an empirical research work that follows the design science research
methodology. Design science research follows a basic deductive logic of discovery:
an unsolved problem is taken and the researcher tries to find justificatory knowledge, or
a kernel theory, that helps in solving the problem. The design science research
methodology process generally follows six steps [19]. These steps and their descriptions
are given below.
1.6.1. Problem identification and motivation
In this step, we define the research problem so that an effective model can be developed
as a solution. We also justify the significance of the research, state the motivation for
doing it, and identify the resources necessary for the study, including knowledge of the
state of the problem.
1.6.2. Objectives for a solution
In this step, we explain the objectives of the study, which are inferred from the
problem identification step, and review resources related to our research to learn the
state of the problem and the existing solutions.
1.6.3. Design and development
In this step, the artifact's desired functionality is determined and the artifactual
solution is created. We used PyTorch (using TensorFlow as a backend) to design the model,
and Visual Studio Code to write the source code. In addition, we
collected simple Amharic sentences from different sources such as student textbooks,
Walta newspapers and the Addis Admas newspaper with the help of linguistics students.
1.6.4. Demonstration
In this step, the developed artifact is demonstrated by simulating how the model
labels the roles in new Amharic sentences. We used the Windows
10 operating system and a scientific Python development environment to implement the
model.
1.6.5. Communication
In this step, the problem and its importance, the artifact (the model) and its
effectiveness, and other relevant information are communicated to researchers and other
relevant audiences when appropriate.
1.6.6. Evaluation Metrics
After developing the proposed system, we evaluated its performance using evaluation
metrics such as accuracy, precision and recall, in order to check whether the proposed
model achieves promising results and properly addresses the statement of the problem.
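As a rough illustration of these metrics, the sketch below computes token-level accuracy and the precision and recall of a single role label from a gold and a predicted label sequence. The label names and the toy sequences are assumptions made for illustration only; they are not data or results from this study.

```python
def accuracy(gold, pred):
    """Fraction of tokens whose predicted label equals the gold label."""
    correct = sum(1 for g, p in zip(gold, pred) if g == p)
    return correct / len(gold)


def precision_recall(gold, pred, label):
    """Precision and recall for one role label."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical gold and predicted role labels for five tokens.
gold = ["AGENT", "PATIENT", "INSTRUMENT", "REL", "AGENT"]
pred = ["AGENT", "PATIENT", "PATIENT", "REL", "PATIENT"]

print(accuracy(gold, pred))
print(precision_recall(gold, pred, "PATIENT"))
```

In this toy case three of the five labels match, and PATIENT is predicted three times but is correct only once, so its recall is high while its precision is low.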
1.7. Design and Development Tools
1.7.1. Python
We have used the Python programming language to develop the model. Python is a
general-purpose, high-level programming language that is widely used for text and image
processing. It is open-source, freely available, extensible, portable and easy to use.
For the purpose of our study, we have used Python version 3.7.
1.7.2. TensorFlow
TensorFlow is an open-source, Python-based library created by the Google Brain team
for numerical computation and large-scale machine learning. TensorFlow bundles together
many machine learning and deep learning models and algorithms and makes them usable
through a common interface. It uses data flow graphs to build models and allows
developers to create large neural network models, mainly used for classification,
understanding, prediction and creation.
For our study, we have used TensorFlow 2.1.0 to build a sequential deep learning model
and classify input data.
1.8. Organization of the Thesis
This section presents an overview of the remaining chapters of this thesis. The rest
of the thesis is organized as follows.
Chapter Two reviews the literature on semantic role labeling, the classification
methods used for SRL and the lexical resources available for SRL. It also describes
deep learning architectures and different neural network classifiers such as the
Multi-Layer Perceptron.
Chapter Three discusses related works on semantic role labeling that use different
approaches (artificial neural networks, memory-based learning, and neural network
classifiers such as MLP).
Chapter Four describes the overall architecture of the proposed context-based semantic
role labeler for Amharic text. It gives a detailed description of the phases of the
proposed model (preprocessing, training and semantic role labeling) and the tasks done
in each phase.
Chapter Five presents the evaluation of the proposed context-based semantic role labeler
on simple Amharic sentences. It also describes in detail the preparation of the
semantically annotated dataset and the training and evaluation of the proposed network
model.
Chapter Six presents the major findings of the research and the conclusions drawn about
the problem statement. It also outlines the major gaps that this research does not cover
and offers them as recommendations for future researchers.
CHAPTER TWO: LITERATURE REVIEW
2.1. Natural Language Processing
Natural language processing (NLP) is a field of computer science and human language
technology that deals with the interaction between computers and humans using natural
language [1]. It allows computers to analyze natural language and convert it into a useful
form of data representation, and so helps machines to understand human language.
NLP focuses on making computers perform important and interesting tasks that involve
human language, such as text-to-speech and speech-to-text conversion, natural language
understanding and machine translation, so that machines (computers) can communicate
with humans in their own language [3].
To understand the nature, applications and behavior of NLP, one first needs knowledge
of how meaning is conveyed in a language.
In particular, linguistic analysis draws on the following levels of knowledge of
language:
➢ Phonology: concerns how words are related to the sounds that realize them.
➢ Syntax: concerns the correct form of a sentence and the structural role that
each word plays in the sentence.
➢ Semantics: studies the meaningful formation of sentences, the context-independent
meaning of words and how these meanings are combined.
➢ Pragmatics: focuses on how sentences are used in different situations and how
this use affects the meaning of the sentence.
➢ Disambiguation: concerns resolving ambiguities that occur at different levels
of language.
➢ Discourse: studies how the immediately preceding sentences affect the interpretation
of the next sentence.
Therefore, natural language processing relies on the above levels of language analysis to
process natural language. Our study focuses on the context-based semantic annotation of
simple Amharic sentences, known as semantic role labeling, which is important for
resolving ambiguities in natural language understanding [29]. It involves considering
the senses of predicates in different domains, identifying their associated arguments and
assigning an appropriate role to each of them.
2.2. Semantic Role
Semantic roles, also called thematic roles, characterize the semantic relationships
between syntactic constituents (arguments) and a predicate [30]. They are abstract models
of the role that an argument plays in the event expressed by the verb, usually an agent,
a patient or an experiencer [18].
Semantic roles describe the relation between a predicate, typically a verb, and its
arguments, whereas semantic role labeling extracts these relations from sentences and
assigns roles to them.
According to [32], semantic roles can be expressed at three different levels of generality:
➢ Verb-specific semantic roles: roles associated with a specific verb in a
sentence.
E.g., runner, seller, cutter, broker, etc.
➢ Thematic relations: roles that express generalizations across the verb-specific roles.
E.g., agent, instrument, experiencer, theme, patient
➢ Generalized semantic roles: roles that express generalizations across thematic
relations using two arguments, Actor and Undergoer. Actor generalizes across the
doer-like roles of the action such as agent, experiencer and instrument, while
Undergoer generalizes across patient, theme, recipient and other roles.
Semantic role labeling (SRL) is an NLP task that aims to capture and represent the
participants and circumstances of events expressed in human languages. It provides
answers to questions such as who did what to whom, where, when and how, by assigning
semantic roles to the constituents of the sentence [31]. SRL approaches are usually
considered intermediary techniques for extracting meaning from text, and they play an
important role in natural language understanding. The SRL task involves identifying the
constituents that belong to each target predicate (argument identification) and
classifying those arguments (argument classification) so that each receives its role label.
However, in order to perform argument identification and classification accurately,
the SRL task must first find the target predicates in the given sentence and assign a
sense number to each of them. In addition to the role labeling information, the input
contains several levels of annotation, such as parse trees, POS tags and named
entities [32].
2.2.1. Common List of Semantic Roles
According to their semantic roles, arguments are grouped into two major types: (i)
necessary arguments, which represent the central participants in an action, such as
agent, patient and instrument; and (ii) optional arguments (adjuncts), which are not
required by an event but provide more information about the action, such as manner,
location, cause and time.
The major thematic roles usually considered, based on [5], are listed below.
Agent: the participant, usually the subject of the sentence, that the verb specifies as
the doer or initiator of the action and that performs the action deliberately.
Example: [መላኩ] Agent እባቡን በዱላ ገደለው
Patient: the entity that is affected by the action, undergoes it and shows a change of
state.
Example: ግንበኞቹ [ድንጋዩን] Patient ፈለጡት
Experiencer: the entity that receives sensory or emotional input.
Example: የጤና መድን ድርጅት [አቅመ ደካሞችን] Experiencer ረዳ
Theme: the entity that moves or undergoes the action but does not show any change of
state. It is sometimes used interchangeably with patient.
Example: ጊፍት ሪል እስቴት [8 አፓርታሞችን] Theme ገነባ
Instrument: Used to perform the stated action.
Example: ካሳ በሩን [በቁልፍ] Instrument ከፈተ
Manner: it represents the way in which an action is carried out.
Example: በብራዚል የኮሮና ቫይረስ ተጠቂወች ቁጥር [በእጥፍ] Manner ጨመረ
Location: the thematic role associated with expressing where the action occurs or shows
the place of the action performed.
Example: ጥራቱን የጠበቀ 250 ካራት ወርቅ [በደቡብ አፍሪካ] Location ተገኘ
Direction or Goal: the place towards which the action is directed.
Example: የአሜሪካ ጎብኝዎች [ወደ ላሊበላ] Direction ተጓዙ
Source: the thematic role associated with where the action originated.
Example: ደረጀ [ከድሬዳዋ] Source ወደ ባህር ዳር መጣ
Purpose: the aim or goal for which an action is performed.
Example: አሜሪካ በኢትዮጵያ [የኩፍኝ በሽታን ለመከላከል] Purpose አራት ሚሊየን ዶላር ድጋፍ አደረገች
Time: The time that the action occurred.
Example: ጠቅላይ ሚኒስትሩ ለይፋዊ የስራ ጉብኝት [ትናንት ማታ] Time ፈረንሳይ ገቡ
Beneficiary: the entity that benefits from the action.
Example: ትምህርት ሚኒስቴር [ለ2000 አቅመ ደካማ ወላጅ ልጆች] Beneficiary የትምህርት ቁሳቁስ ሰጠ
Cause: the reason why something happened or why the action occurred in the first
place.
Example: [በህገወጥ ስደት] Cause የ30 ኢትዮጵያዊያን ስደተኞች ህይዎት አለፈ
2.2.2. Semantic Role Labeling Challenges
Efficient semantic role assignment is useful for developing many NLP applications.
However, several problems arise during role assignment. The first is structural: the
input is a sentence enriched with morpho-syntactic information and the output is a
sequence of labeled arguments, so it is difficult to map from the input structure of a
sentence to the output structure. This leads to sequential segmentation and labeling
problems [34].
The second is the syntactic variation of sentences: a single sentence can be expressed in
syntactically different structures that all have the same semantic interpretation or
representation. In addition, it is very difficult to decide on a standard set or number
of roles and to produce a formal definition for roles such as AGENT, TIME, SOURCE or
INSTRUMENT [35].
2.3. Lexical Resources for SRL
In this section, we review the important lexical resources that different scholars have
developed for semantic role labeling, namely PropBank and FrameNet.
2.3.1. PropBank
PropBank is a corpus built on a treebank: it provides predicate-argument annotation
for the entire Penn Treebank and so creates a corpus of text annotated with
information about basic semantic propositions. Each verb occurrence in the treebank is
annotated by a single instance in PropBank, which contains information about the
location of the verb and the location and identity of its arguments [42, 45].
It consists of over 1M annotated words of Wall Street Journal text with existing
gold-standard parse trees, whose syntactically annotated sentences are labeled with
predicate-argument pairs in order to provide consistent argument labels across different
syntactic realizations of the same verb.
In addition to semantic role annotation, PropBank annotation requires the choice of a sense
id for each predicate. The consistency of argument labels across different syntactic
realizations of the same verb is shown in the following sentences:
1. [ARG0 Jemal] broke [ARG1 the glass]
2. [ARG1 the glass] broke
The arguments of the verb “broke” are labeled as numbered arguments, i.e., ARG0,
ARG1, ARG2, etc.
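The bracketed notation used in the two sentences above can be produced from token spans, as in the following minimal sketch. The function name `render` and the span format (a label mapped to an inclusive token range) are assumptions chosen for illustration; they are not part of PropBank itself.

```python
def render(tokens, spans):
    """Render tokens with PropBank-style brackets.

    spans maps a label to an inclusive (start, end) token range.
    """
    starts = {start: label for label, (start, end) in spans.items()}
    ends = {end for (start, end) in spans.values()}
    out = []
    for i, tok in enumerate(tokens):
        piece = tok
        if i in starts:
            piece = "[" + starts[i] + " " + piece
        if i in ends:
            piece = piece + "]"
        out.append(piece)
    return " ".join(out)


# "the glass" keeps the label ARG1 in both syntactic realizations.
s1 = render(["Jemal", "broke", "the", "glass"], {"ARG0": (0, 0), "ARG1": (2, 3)})
s2 = render(["the", "glass", "broke"], {"ARG1": (0, 1)})
print(s1)  # [ARG0 Jemal] broke [ARG1 the glass]
print(s2)  # [ARG1 the glass] broke
```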
Secondly, PropBank annotation assigns a functional tag to all modifiers of verbs, such
as direction (DIR), manner (MNR), source (SRC) and temporal (TMP), which provide
additional information about the arguments of a sentence [42].
For example: Melaku came from Addis Ababa to Gondar by plane yesterday.
Arg0: Melaku
Rel: come
ArgM-SRC: from Addis Ababa
ArgM-DES: to Gondar
ArgM-INS: by plane
ArgM-TMP: yesterday
The PropBank project takes a practical approach to semantic representation, adding a layer
of predicate-argument information, or semantic role labels, to the syntactic structures of
the Penn Treebank [6, 43]. It contains sentences annotated with proto-roles and
verb-specific semantic roles. In PropBank, the roles (Arg0 to ArgN) are specific to each
individual verb, which avoids having to agree on a universal set; in general, Arg0 is
basically the “agent” and Arg1 is basically the “patient”.
According to [42, 45], arguments labeled from ARG0 to ARG4 are called core
arguments. These numbered labels represent very general kinds of semantic roles. The
following table lists the numbered arguments in PropBank and their descriptions,
taken from [42].
Table 2. 1 List of arguments in PropBank and their description
No Label Description
1 ARG0 Agent, Experiencer, Causer
2 ARG1 Patient, Theme
3 ARG2 Beneficiary, Instrument, Attribute, End state
4 ARG3 Starting point, Instrument, Attribute
5 ARG4 Ending Point
In addition to verb-specific numbered arguments, PropBank defines several more general
roles that can apply to any verb, and it assigns functional tags to all modifiers of
verbs. These adjuncts (circumstantial objects) can appear in any verb's frame to provide
additional information about the arguments found in a given sentence, and they are
marked as ARGMs (modifiers).
Based on [42], there are 12 secondary tags for ARGMs used in the Proposition Bank, i.e.,
DIR, LOC, MNR, TMP, EXT, REC, PRD, PRP, DIS, ADV, MOD and NEG. The following
table shows the list of PropBank annotated adjuncts, or modifier arguments, and their
descriptions as given in [45].
Table 2. 2 List of Annotated Adjuncts in PropBank with their Explanations
Modifier Type Description
DIR Direction
TMP Time
LOC Location
MNR Manner
EXT Extent
CAU Cause
PUR Purpose
MOD Modal verb
NEG Negation marker
REC Reciprocal
ADV General-purpose adverbs
DIS Discourse connectives
PropBank arguments are interpreted in two ways: numbered arguments are interpreted in
a predicate-specific manner, whereas ARGMs have a global interpretation. As shown in
[6], the general procedure for building PropBank is to create frame sets for each verb
and then use them as guidelines for the annotation.
PropBank consists of two basic development processes: framing and annotation.
Framing
Framing is the first process in PropBank and refers to the creation of the frame
files. A frame file is the collection of frameset entries for a verb, and its creation
begins with the examination of a sample of the sentences from the corpus that contain
the verb under consideration.
In PropBank, frame files provide a verb-specific description of all possible semantic
roles by identifying the predicate and its possible arguments.
For example, frame set for the verb ‘ሰበረ’ (break):
Roles: -
ARG0: Breaker
ARG1: Things broken
ARG2: Instrument
Example: Transitive, active:
መልካሙ የህንፃውን መስኮት በመዶሻ ሰበረ (Melkamu broke the window of the building with a hammer)
• ARG0: መልካሙ
• REL: ሰበረ
• ARG1: የህንፃውን መስኮት
• ARGM-INS: በመዶሻ
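The frame set above can be pictured as a small lookup table. The sketch below is a hypothetical in-memory form, not the actual PropBank frame file format: the dictionary names and the helper `describe` are assumptions for illustration. It shows numbered arguments being interpreted per verb while ARGM labels keep one meaning for every verb.

```python
# Hypothetical in-memory form of the frame set for the verb shown above.
FRAMESETS = {
    "ሰበረ": {"ARG0": "Breaker", "ARG1": "Things broken", "ARG2": "Instrument"},
}

# Modifier tags have a global meaning (only a few are listed here).
MODIFIERS = {"ARGM-INS": "Instrument", "ARGM-LOC": "Location", "ARGM-TMP": "Time"}


def describe(verb, label):
    """Interpret a label: numbered arguments per verb, ARGMs globally."""
    if label.startswith("ARGM-"):
        return MODIFIERS.get(label, "Unknown modifier")
    return FRAMESETS.get(verb, {}).get(label, "Undefined for this verb")


annotation = [("መልካሙ", "ARG0"), ("የህንፃውን መስኮት", "ARG1"), ("በመዶሻ", "ARGM-INS")]
for phrase, label in annotation:
    print(phrase, label, describe("ሰበረ", label))
```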
Some verbs have different contextual senses, so it is impossible to provide the same
set of semantic roles for all senses of the verb. In such cases, frame files distinguish
the two or more verb senses and define specific argument labels for each frame set. For
example, in the following two sentences, the verb “አከበረ” takes different arguments.
1. ሚኪያስ (ARG0-AGT) የፋሲካ በዓልን (ARG1-BEN) በጎንደር (ARGM-LOC) አከበረ
2. ሚኪያስ (ARG0-AGT) አለቃውን (ARG1-TEM) አከበረ
Annotation
Choosing ARG0 versus ARG1
In many cases, choosing an argument label for a given sentence is simple and
straightforward, given the verb-specific definition of the label in the frame
files [42, 45]. However, in some cases it is difficult to decide whether an argument
should be annotated as Arg0 or Arg1. The annotator must then decide between these labels
based on the following explanations of what generally characterizes Arg0 and Arg1. The
Arg0 label is assigned to arguments that are understood as agents, causers or
experiencers, whereas the Arg1 label is usually assigned to the patient argument, i.e.,
the argument that undergoes the change of state or is affected by the action.
In addition, Arg0 arguments (external arguments) correspond to the subjects of transitive
verbs and of a class of intransitive verbs. For example:
➢ አበበ (ARG0) ፈተናዉን አለፈ
➢ አበበ (ARG0) መጣ
Internal arguments (labeled as ARG1), in contrast, are the objects of transitive verbs and
the subjects of the intransitive verbs called unaccusatives. For example:
➢ ኪሩቤል ሌባውን(ARG1) ያዘው
➢ ሌባው (ARG1) ተያዘ
Semantically, external arguments have Proto-Agent properties such as [6, 45]:
➢ Volitional involvement in the event or state
➢ Causing an event or change of state in another participant
➢ Movement relative to the position of another participant
Internal arguments have Proto-Patient properties, which means that they undergo a
change of state, are causally affected by another participant and are stationary
relative to the movement of another participant.
Annotation Modifiers
PropBank uses the following annotation modifiers.
Comitatives (COM)
Comitative modifiers indicate who an action was done with. This can include people or
organizations (entities that have the characteristics of prototypical agents: animacy,
volition) but excludes objects, which would be considered instrumental
modifiers [42, 45].
E.g. መላኩ ከ አያቱ ጋር ምሳውን በላ
• ARG0: መላኩ
• REL: በላ
• ARG1: ምሳውን
• ARGM-COM: ከ አያቱ ጋር
Locatives (LOC)
Locative modifiers indicate where some action takes place [42]. The notion of a locative
is not restricted to physical locations, but it also represents abstract locations.
E.g. መላኩ ከ አያቱ ጋር በ ካፒታል ሆቴል ምሳውን በላ
• ARG0: መላኩ
• ARGM-COM: ከ አያቱ ጋር
• ARGM- LOC: በ ካፒታል ሆቴል
• ARG1: ምሳውን
• REL: በላ
Destination (DES)
Destination modifier indicates the final resting place or destination of motion. However, if
there is no clear path being followed, a “location” marker should be used instead [42].
E.g. መስፍን ወደ ጎንደር ሄደ
• ARG1: መስፍን
• ARGM-DES: ወደ ጎንደር
• REL: ሄደ
Extent (EXT)
ARGM-EXT indicates the amount of change occurring from an action. ARGM-EXT
are used mostly for the following [42, 45]:
➢ Numerical adjuncts like ‘(raised prices) by 15%’,
➢ Quantifiers such as ‘a lot’ and
➢ Comparatives such as ‘(he raised prices) more than she did’
E.g. የጃፓን አመታዊ ገቢ ከባለፈው አመት በ 15% አደገ
• ARG0: የጃፓን አመታዊ ገቢ
• REL: አደገ
• ARGM-EXT: በ 15%
• ARGM- TMP: ከባለፈው አመት
Manner (MNR)
Manner adverbs specify how an action is performed. Manner tags should be used when an
adverb could be an answer to a question starting with ‘how?’ [42].
E.g. የመስፍን አባት ክፉኛ ታመመ
• ARG1: የመስፍን አባት
• ARGM-MNR: ክፉኛ
• REL: ታመመ
Modals (MOD)
Modals are will, may, can, must, shall, might, should, could and would, which are
consistently labeled in the Treebank as ‘MOD’. They are among the few elements that are
selected and tagged directly on the modal word itself, as opposed to selecting a higher
node that contains the lexical item [42, 45].
Temporal (TMP)
Temporal argument modifiers show when an action took place, such as ‘in 1987’, ‘last
Wednesday’, ‘soon’ or ‘immediately’. This category also includes adverbs of
frequency (e.g., often, always, sometimes), adverbs of duration (e.g., for a year, in a
year) and adverbs of order (e.g., first) [42].
E.g. መስፍን ትናንት ከወንድሙ ጋር ወደ ጎንደር ሄደ
• ARG1: መስፍን
• ARGM-TMP: ትናንት
• ARGM-COM: ከወንድሙ ጋር
• ARGM-DES: ወደ ጎንደር
• REL: ሄደ
Purpose Clauses (PRP)
Purpose clauses are used to show the motivation for some action. Clauses beginning with
‘in order to’ and ‘so that’ are canonical purpose clauses [42].
E.g. የጤና ሚኒስቴር ለፖሊዮ ክትባት ሁለት ሚሊየን ብር ድጋፍ አደረገ
• ARG0: የጤና ሚኒስቴር
• ARGM- PRP: ለፖሊዮ ክትባት
• ARG1-TEM: ሁለት ሚሊየን ብር
• ARG2: ድጋፍ
• REL: አደረገ
Cause Clauses (CAU)
Similar to purpose clauses, these indicate the reason for an action. Clauses beginning
with ‘because’ or ‘due to’ are canonical cause clauses. Questions starting with ‘why’,
which are always characterized by a trace linking back to the question word, are always
treated as cause. However, in these question phrases it can often be difficult or
impossible to determine whether the ‘why’ truly represents purpose or cause. Thus, as a
general rule, if the annotator cannot determine whether an argument is more appropriately
purpose or cause, cause is the default choice [42, 45].
E.g. የናይጀሪያው ፕሬዚዳንት በገንዘበ ማጭበርበር ተከሰሱ
• ARG1: የናይጀሪያው ፕሬዚዳንት
• ARGM- CAU: በገንዘበ ማጭበርበር
• REL: ተከሰሱ
Negation (NEG)
Negation is an important notion in PropBank annotation, so all markers that indicate
negation should be marked as NEG. In Amharic sentences, negation is most often
indicated by the prefixes “አይ”፣ “አል”፣ “አት”, as in አይ መጣም፣ አት
መጣም፣ አል በላም. These markers are identified using a morphological analyzer.
Both FrameNet and PropBank are used to specify what counts as a predicate, to define
the set of roles used in the task, and to provide training and test sets. The difference
between these two models of semantic roles is that FrameNet employs many frame-specific
frame elements as roles, while PropBank uses a smaller number of numbered argument
labels that can be interpreted as verb-specific labels, along with the more general ARGM
labels.
2.3.2. FrameNet
FrameNet is a lexical database of English, available in both human- and machine-readable
formats, that describes English words using Frame Semantics; sentence arguments are
defined in terms of frames rather than verbs [43].
FrameNet is an electronic resource and a framework for explicit description of the lexical
semantics of words. The key concept in the FrameNet method of annotation is a semantic
frame [44]. A semantic frame can be described as a representation of an object, event or
situation in which each frame has its own set of roles. For example, the roles defined for
the frame research are field, question, researcher and topic.
In the FrameNet dataset, the sentences are arranged in a hierarchical order with each frame
referring to a concept. Frames at the higher level refer to a more generic concept while
frames at the lower level refer to more specific concepts [44].
According to [43], three levels of semantic roles can be distinguished in FrameNet:
Core Elements: elements that are conceptually necessary for the frame and roughly
correspond to syntactically obligatory constituents. The core elements also
distinguish one frame from other frames.
Peripheral Elements: frame elements that are not central to the frame but provide
additional information about the event, such as cause, time and place. These elements
are similar to modifiers.
Extra-thematic Elements: elements that are neither specific to the frame nor standard
modifiers, but that describe the frame with respect to a broader context.
Every frame has invoking predicates attached to it. Figure 2.1 [43] shows the structure
of frames in the FrameNet lexicon. The invoking predicates are the verbs and some nouns
that evoke the concept referred to by the frame they are attached to. Sentences that
contain these predicates have constituents that play the roles given by the frame
elements of the invoked frame. For example, in [Judge She] blames [Evaluee the
government] [Reason for failing to do enough to help], the predicate blame invokes the
Judgment frame and the other constituents in the sentence play the invoked semantic
roles: (She) plays the role (Judge), (the government) plays the role (Evaluee), and
(for failing to do enough to help) plays the role (Reason).
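The Judgment example can be written down as a small data structure. The following sketch is a simplified, hypothetical record and does not reproduce the real FrameNet data format; the dictionary layout and the helper `check_roles` are assumptions for illustration.

```python
# Hypothetical, simplified record for the Judgment frame discussed above.
JUDGMENT = {
    "name": "Judgment",
    "frame_elements": ["Judge", "Evaluee", "Reason"],
    "predicates": ["blame"],
}

# The example sentence as (role, text) pairs; None marks the predicate itself.
sentence = [
    ("Judge", "she"),
    (None, "blames"),
    ("Evaluee", "the government"),
    ("Reason", "for failing to do enough to help"),
]


def check_roles(frame, labeled):
    """Return the roles used in the sentence that the frame does not define."""
    return [role for role, _ in labeled
            if role is not None and role not in frame["frame_elements"]]


print(check_roles(JUDGMENT, sentence))  # []
```

An empty result means that every labeled constituent uses a frame element that the invoked frame actually defines.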
Figure 2. 1 Sample Domain and Frames from the FrameNet Lexicon
2.4. Classification Methods for SRL
2.4.1. Maximum Entropy
Maximum entropy is a probabilistic classifier that belongs to the class of exponential
models. It works on the principle of maximum entropy: from all the models that fit the
training data, it selects the one that has the largest entropy.
Figure 2. 2 Maximum Entropy Classifier
The maximum entropy classifier is a discriminative classifier commonly used in natural
language processing, sentiment analysis, speech and information retrieval problems [38].
For SRL, the maximum entropy classifier is trained to identify and classify the semantic
arguments of the predicates jointly, and among embedding constituents it tries to keep
only those with the largest probability.
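To make the exponential form concrete, the sketch below computes label probabilities as a normalized exponential of weighted feature sums, which is the standard form of a maximum entropy (log-linear) classifier. The feature names and weights are hypothetical and hand-picked for illustration; in practice the weights are learned from annotated data.

```python
import math


def maxent_probabilities(features, weights):
    """Return p(label | features) under an exponential (log-linear) model.

    features: dict mapping feature name -> value
    weights:  dict mapping label -> dict of feature name -> weight
    """
    scores = {
        label: sum(w.get(name, 0.0) * value for name, value in features.items())
        for label, w in weights.items()
    }
    top = max(scores.values())  # subtract the maximum for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


# Hypothetical weights for two labels and two binary features.
weights = {
    "ARG0": {"before_verb": 1.2, "has_object_marker": -0.8},
    "ARG1": {"before_verb": 0.3, "has_object_marker": 1.5},
}
probs = maxent_probabilities({"before_verb": 1.0, "has_object_marker": 1.0}, weights)
print(max(probs, key=probs.get))
```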
2.4.2. Support Vector Machines
Support Vector Machine (SVM) is a supervised machine learning algorithm that can be
used for both classification and regression problems [39]. SVM is a binary classifier
that works on the maximum-margin strategy: it finds the hyperplane (a line in two
dimensions) that best separates the observations of the two classes, and this hyperplane
divides the dataset into two classes [46].
Figure 2. 3 Support vector Machine Classifier
The basic advantage of SVM algorithms is that they have high generalization
performance independent of the dimensionality of the feature vectors, and that they can
learn from combinations of multiple features by using the polynomial kernel function [39].
Because of this, SVM classifiers have been applied to different natural language
processing tasks such as document classification, semantic role labeling and named entity
recognition, and have achieved good performance [46]. In SVM classifiers, support vectors
are the data points that lie closest to the hyperplane and are considered the critical
elements of a data set. Because the decision function depends only on this subset of the
training points, SVM classification can be both accurate and efficient [27].
However, SVM classification and regression algorithms require a clean dataset to work
well: the classifier becomes less effective on a noisier dataset that contains
overlapping classes, and it takes longer to train on larger datasets [39].
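The roles of the hyperplane, the margin and the support vectors can be seen in a tiny linear example. The sketch below assumes an already trained linear SVM in two dimensions; the weight vector, bias and points are hypothetical values chosen by hand, and no training is performed.

```python
import math


def decision_value(w, b, x):
    """Signed score w.x + b; its sign gives the predicted class."""
    return sum(wi * xi for wi, xi in zip(w, x)) + b


def margin_width(w):
    """Distance between the two margin boundaries, 2 / ||w||."""
    return 2.0 / math.sqrt(sum(wi * wi for wi in w))


# Hypothetical separating hyperplane and toy points.
w, b = [1.0, 1.0], -3.0
points = [[1.0, 1.0], [2.0, 2.0], [0.0, 0.0]]

labels = [1 if decision_value(w, b, x) >= 0 else -1 for x in points]

# With this scaling, points whose score is exactly +1 or -1 lie on the margin
# boundaries and act as the support vectors.
support = [x for x in points if abs(abs(decision_value(w, b, x)) - 1.0) < 1e-9]

print(labels, support, margin_width(w))
```

The point at the origin lies far from the hyperplane, so removing it would not change the separating line; only the two support vectors determine it.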
2.4.3. Conditional Random Field (CRF) and Random Forest
Two different classifiers are discussed here under closely related names. A conditional
random field (CRF) is a discriminative model with an undirected graphical structure. It
belongs to the general class of graphical models aimed at structured learning problems
such as sequence, graph and tree labeling, which makes it appropriate for labeling
natural language data [28]. A random forest, in contrast, is a flexible supervised
machine learning algorithm applicable to both classification and regression tasks; it
builds multiple decision trees and merges them together to get a more accurate and
stable prediction [41].
These classifiers have been successfully applied to a variety of natural language
processing tasks for labeling or parsing sequential data, such as POS tagging, shallow
parsing, semantic understanding and named entity recognition, as well as to computer
vision [41].
As each tree grows, instead of searching for the most important feature among all
features, random forest searches for the best feature among a random subset of features.
This additional randomness, together with a sufficient number of trees in the forest,
helps to avoid the overfitting problems that occur in machine learning [28, 41].
Figure 2. 4 Conditional Random Field Classifier
However, as the number of decision trees increases, the random forest algorithm
becomes slow and ineffective for real-time predictions. It is fast to train but slow to
predict once trained: more accurate prediction requires more trees, which makes the
model slower [41].
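The way a forest combines its trees, and the reason prediction cost grows with the number of trees, can be sketched with a simple majority vote. The three "trees" below are hypothetical one-question rules written by hand, not trees learned from data, and the feature names are assumptions for illustration.

```python
from collections import Counter


def forest_predict(trees, x):
    """Majority vote over the predictions of the individual trees."""
    votes = Counter(tree(x) for tree in trees)  # one call per tree
    return votes.most_common(1)[0][0]


# Hypothetical "trees": each is a one-question rule over a feature dict.
trees = [
    lambda x: "ARG0" if x["before_verb"] else "ARG1",
    lambda x: "ARG1" if x["has_object_marker"] else "ARG0",
    lambda x: "ARG0" if x["is_animate"] else "ARG1",
]

x = {"before_verb": True, "has_object_marker": True, "is_animate": True}
print(forest_predict(trees, x))
```

Every prediction evaluates every tree once, so doubling the number of trees roughly doubles the prediction time.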
2.5. Deep Learning
Deep learning uses self-taught learning and layers of neural-network algorithms to
construct and decipher higher-level information from raw input data, using many hidden
layers and powerful computational resources [13].
Deep neural networks are designed as components of larger machine learning applications
involving algorithms for reinforcement learning, classification and regression. They
recognize patterns in data, and their design is based on an early understanding of how
the human brain functions [22].
Figure 2. 5 A Deep Learning Architecture
Deep learning architectures are applied to natural language processing, semantic role
labelling, audio recognition, computer vision and semantic analysis [13]. A deep neural
network is built from an input layer, two or more hidden layers and an output layer,
each composed of many perceptrons that are connected in different ways and that use
different activation functions.
Nowadays, scholars apply different neural network approaches to different areas of
natural language processing in order to obtain better performance and to overcome the
limitation of traditional (shallow) machine learning approaches, which need hand-crafted
features to train their models [22, 23, 28].
A neural network model with more hidden layers is deeper and can produce higher
training and testing accuracy. It is capable of extracting important features from the
given data for learning, and of making predictions when new, unseen data arrives.
However, the model requires a large dataset to extract important features automatically
and to achieve a good result [13].
2.5.1. Recurrent Neural Network (RNN)
A recurrent neural network (RNN) is a powerful ANN approach that has a recurrent
connection to itself in order to process sequential data [25]. It is typically trained
with a stochastic gradient descent (SGD) optimizer along with the backpropagation
algorithm. A recurrent neural network processes the input sequentially through this
recurrent connection, which allows the network to learn the effect of the previous input
x(t-1) along with the current input x(t) while predicting the output at time t [28].
RNN inputs and outputs are dependent on each other, and the hidden layer preserves
sequential information from previous steps. This means that the output from an earlier
step is used as an input to the next step, and the same weights are applied repeatedly
at every step, so the steps can be viewed as a single recurrent layer [28].
Plain RNNs suffer from the vanishing gradient problem, in which the error signal shrinks
during backpropagation and information about earlier inputs is lost. Long Short-Term
Memory and Bidirectional Long Short-Term Memory networks are used to train the model
without this problem [21]. A bidirectional recurrent neural network puts two independent
RNNs together, which gives the network both backward and forward information about the
sequence at every time step.
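The recurrence described above can be sketched with a single scalar unit. This is a minimal illustration under simplifying assumptions (one hidden unit, a tanh activation, hand-picked weights); it is not the network used in this study.

```python
import math


def rnn_forward(inputs, w_x, w_h, b, h0=0.0):
    """Run a scalar recurrent unit over a sequence and return every hidden state.

    h_t = tanh(w_x * x_t + w_h * h_(t-1) + b), with the same weights at each step.
    """
    h = h0
    states = []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h + b)
        states.append(h)
    return states


# Only the first input is non-zero, yet later states are non-zero as well,
# because each state depends on the previous one.
states = rnn_forward([1.0, 0.0, 0.0], w_x=0.5, w_h=0.8, b=0.0)
print(states)
```

The influence of the first input also shrinks from step to step, which gives a small-scale picture of why long-range information is hard for a plain RNN to keep.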
2.5.2. Long-Short Term Memory (LSTM)
LSTM networks are a special kind of recurrent neural network that are able to hold data
in memory for a longer period of time in order to learn long-term dependencies [12].
They introduce a new state called the cell state, with a Constant Error Carousel (CEC)
that allows the error to propagate back without vanishing [21].
LSTM is suited to natural language understanding, semantic role labelling,
document classification and prediction on time series data, because it reads the data
sequentially. It enables data scientists to create deep neural network models using large
stacked networks and to handle complex sequence problems in machine learning more
efficiently [9, 25].
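The cell state can be illustrated with one step of a scalar LSTM cell. The sketch below assumes the commonly used formulation with forget, input and output gates and a single hidden unit; the parameter layout `p` and the hand-picked values are assumptions for illustration only.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell; p holds (w_x, w_h, b) for each gate."""
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x + w_h * h_prev + b

    f = sigmoid(pre("forget"))       # how much of the old cell state to keep
    i = sigmoid(pre("input"))        # how much new content to write
    o = sigmoid(pre("output"))       # how much of the cell state to expose
    g = math.tanh(pre("candidate"))  # candidate content
    c = f * c_prev + i * g           # additive update of the cell state
    h = o * math.tanh(c)
    return h, c


# Hypothetical parameters: forget gate almost open, input gate almost closed,
# so the cell state is carried along nearly unchanged.
p = {
    "forget": (0.0, 0.0, 10.0),
    "input": (0.0, 0.0, -10.0),
    "output": (0.0, 0.0, 0.0),
    "candidate": (1.0, 0.0, 0.0),
}
h, c = 0.0, 1.0
for x in [0.3, -0.2, 0.9]:
    h, c = lstm_step(x, h, c, p)
print(c)
```

Because the cell state is updated by addition rather than by repeated squashing, a value stored in it can survive many steps.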
Figure 2. 6 Visualization of Long-Short Term Memory Networks
The Bidirectional Long Short-Term Memory (Bi-LSTM) model combines the outputs of
two separate LSTMs, the forward and the backward LSTM. The forward LSTM reads the
sequence in its regular order, starting from the beginning of the sentence, while the
backward LSTM starts from the end of the input sentence and handles the information of
the input sequence in the backward direction.
A bidirectional LSTM thus processes the data in both directions with two separate hidden
layers: it computes the forward hidden sequence and the backward hidden sequence, which
are then fed to the same output layer [2].
Figure 2. 7 Working of Bidirectional Long Short-Term Memory Networks
2.5.3. Multi-Layer Perceptron (MLP)
The multilayer perceptron (MLP) is one of the most common types of deep neural network (DNN)
and is a quintessential deep learning model for classification tasks
[49]. An MLP classifier consists of multiple layers of nodes in which each
layer is fully connected to the next layer in the network. These are the input layer, the hidden
(or intermediate) layers and the output layer.
The input layer consists of neurons that accept the input values. The output of these
neurons is the same as the input predictors, so the nodes in the input layer represent the input data.
The output layer makes a decision or prediction about the input. The number of
hidden layers between the input and output layers ranges from one to many; they act
as the central computation layers, holding the functions that map the input of a node to its
output [49].
The output layer returns the result to the user environment. According to [49], when
an MLP is the last component of a neural network, the dimension
of its output matches the number of classes. Often, a softmax function is applied to the
output to form a probability distribution over the classes.
An MLP consists of fully connected layers [49]. In a fully connected layer, the parameters of
each unit are independent of the rest of the units in the layer; that is, each unit possesses
a unique set of weights. Each input vector is associated with a label, or ground truth,
that defines its class and is given with the data. The output of the network gives a
class score, or prediction, for each input.
To measure the performance of the classifier, a loss function is defined [50]. The loss
is high if the predicted class does not correspond to the true class and low
otherwise. MLPs always have one input layer and one output layer, but may have multiple hidden
layers between them.
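The behaviour of the softmax output and the loss can be illustrated with a small sketch. The text does not name a specific loss function, so cross-entropy is assumed here as a common choice; the score values are made up for illustration.

```python
import math

def softmax(scores):
    """Turn raw class scores into a probability distribution."""
    m = max(scores)                              # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(scores, true_class):
    """Negative log-probability that the network assigns to the true class."""
    return -math.log(softmax(scores)[true_class])

scores = [2.0, 0.5, -1.0]        # hypothetical class scores for one input
loss_correct = cross_entropy(scores, true_class=0)   # top-scored class is the true class
loss_wrong = cross_entropy(scores, true_class=2)     # lowest-scored class is the true class
```

With these scores, `loss_correct` is smaller than `loss_wrong`, which matches the description above: the loss is low when the prediction agrees with the true class and high otherwise.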
Figure 2. 8 Multilayer Perceptron Architecture
An MLP works by passing the input data forward from the input layer: it takes the dot product
of the input with the weights between the input layer and the hidden layer, and this
dot product yields the input value at the hidden layer. In the hidden layers, MLPs apply
an activation function, such as the rectified linear unit (ReLU),
sigmoid or tanh. Once the calculated output at a hidden layer has been
pushed through the activation function, it is passed to the next layer in the MLP by taking the
dot product with the corresponding weights. This is repeated until the output layer is reached.
At the output layer, the result is either used by the backpropagation algorithm,
which depends on the activation function selected for the MLP (in the case of
training), or a decision is made based on the output (in the case of testing) [49].
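The forward pass described above can be sketched as follows. This is a minimal illustration in plain Python, not the implementation used in this thesis; the layer sizes and weight values are hypothetical, and ReLU is assumed on the hidden layer only.

```python
def relu(v):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, x) for x in v]

def dense(v, weights, bias):
    """Fully connected layer: one dot product per output unit, plus its bias."""
    return [sum(w * x for w, x in zip(row, v)) + b
            for row, b in zip(weights, bias)]

def mlp_forward(x, layers):
    """Push x through each (weights, bias) pair; ReLU on hidden layers only."""
    for i, (weights, bias) in enumerate(layers):
        x = dense(x, weights, bias)
        if i < len(layers) - 1:      # the last layer returns raw class scores
            x = relu(x)
    return x

# Hypothetical network: 2 inputs, 2 hidden units, 2 classes, fixed weights.
layers = [
    ([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]),    # input -> hidden
    ([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]),     # hidden -> output
]
scores = mlp_forward([2.0, 1.0], layers)
predicted_class = scores.index(max(scores))     # decision made at test time
```

During training, the scores would instead be passed to a loss function and the weights adjusted by backpropagation.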
In a multi-layer perceptron, we apply a non-linear activation function in the hidden and
output layers to the weighted sum of a neuron's inputs plus its bias. In other
words, it maps the weighted inputs to the output. We also use a learning
algorithm called backpropagation, which continuously adjusts the weights of the
connections after each round of processing. The adjustment is based on the error in the output;
in other words, the system learns from its mistakes. The process continues until the cost
of the error is as low as possible.
2.6. Amharic Language
Amharic is a Semitic language spoken in the central highlands of Ethiopia. It is the most widely
spoken language in the country next to the Oromo language and is used as the official working
language of Ethiopia [17]. It is written from left to right and follows a Subject / Object / Verb
(SOV) word order, which is different from that of English.
Amharic uses a script that originated from the Ge’ez alphabet. The script has 33 basic characters,
each of which has 7 forms, one for each consonant-vowel combination. Unlike Arabic and
Hebrew, Amharic uses a semi-syllabic system written from left to right.
Figure 2. 9 Amharic Writing Script (Source:
https://www.amharicmachine.com/default/alphabet, accessed 9/25/2020)
Semantics is the study of the meaning of words, phrases and sentences, drawing the
exact or dictionary meaning from the text. In semantic analysis, the focus is always
on what the words conventionally mean, rather than on what a speaker
might want the words to mean on a particular occasion. This technical approach to meaning
emphasizes the objective and the general and avoids the subjective and the local. Linguistic
semantics thus deals with the conventional meaning conveyed by the use of the words and sentences
of a language [14].
2.6.1. Amharic Sentences
A sentence is the basic unit of language: a group of words or phrases that are put together
to express a grammatically complete thought. The phrase is the basic building block of a
sentence [52]. A phrase is a structure that is constructed from one or more words in the
language; it is a syntactic structure that is wider than a word and smaller than a sentence.
A phrase can be constructed from a head word alone or from a head word together with other
constituents.
In Amharic, phrases are classified into five categories, namely noun phrases, verb phrases,
adjectival phrases, adverbial phrases and prepositional phrases [14, 51].
A noun phrase is a syntactic unit in which the head is a noun or a pronoun. For example,
in the noun phrase ነጭ እርግብ/ “White dove”, ነጭ “White” is an adjective that modifies
the noun እርግብ/ “Dove”. A verb phrase is composed of a verb as a head and constituents
such as complements, modifiers and specifiers. For example, in the verb phrase ወደ አሜሪካ
ሄደ “He went to America”, ወደ አሜሪካ ‘to America’ is a prepositional phrase that modifies the
verb ሄደ ‘Went’ with respect to place.
In Amharic [51], the adjectival phrase is structured similarly to the noun phrase and the verb
phrase: it is composed of an adjective as a head and constituents such as complements,
modifiers and specifiers. For example, in the adjectival phrase በጣም ትልቅ “Very large”,
በጣም ‘Very’ is a modifier of the head adjective, ትልቅ “large”.
Based on the number of predicates they contain [17], sentences are
either simple, compound or complex.
Simple sentences
Simple sentences contain only one main predicate in their sentence structure.
For example, the following is a simple Amharic sentence.
Example: አበበ ምሳውን በላ (Abebe ate his lunch)
A simple sentence may also describe the state of being of the subject or an action that takes
place in the sentence. Example: አበበ መምህር ነው/ Abebe is a teacher; this sentence describes
the present state of being of Abebe. We call the above sentences simple Amharic sentences
because they contain only one predicate.
Simple sentences are classified into four kinds: declarative, interrogative,
imperative and negative sentences [51].
Declarative sentence
Declarative sentences are used to convey ideas and feelings that the speaker has about
things, happenings, feelings, etc., that could be physical, mental, real or imaginary.
Example: ዳዊት መምህር ሆነ/Dawit became a teacher
Interrogative sentence
An interrogative sentence is a sentence that asks a question about the subject or the action
that the verb specifies.
Example: አስቴር መቼ መጣች? /When did Aster come?
Negative sentence
Negative sentences simply negate a declarative statement made about something.
Example: ቢኒያም ትምህርት ቤት አልሄደም/Binyam did not go to school
Imperative sentence
Simple imperative sentences convey instructions, and their subject is mostly a second
person pronoun that is usually not stated but implied by the suffix on the verb.
Example: ዝም በል/Shut up!
Compound
Compound sentences consist of two or more independent clauses, that is, more than one predicate
with one main verb, in their sentence structure [17,52]. A coordinating conjunction (for, and, nor,
but, or, yet, so) often links the two independent clauses and is preceded by a comma. For
example, the following is a compound Amharic sentence.
Example: አማረ ቁርሱን በልቶ ወደ ትምህርት ቤት ሄደ/Amare went to school after eating his breakfast
(Compound)
Complex Sentences
Complex sentences are formed from either complex noun phrases or complex verb phrases
or both [51,52]. In other words, a complex sentence can have a complex NP and a simple
VP, a simple NP and a complex VP, or both a complex NP and a complex VP. Complex NPs
contain at least one embedded sentence, which can be a complement or another type of phrase.
Complex VPs, on the other hand, contain at least one sentence or more than one verb.
A complex sentence contains one independent clause and one or more dependent clauses, and
it includes at least one subordinating conjunction.
Example: መሳይ ውድድሩን አሸንፏል፣ ነገር ግን ሽልማት አልተሰጠውም /Mesay won the competition,
but he was not given a reward (Complex)
However, for the purpose of this study we have used only simple Amharic sentence types,
including sentences which have ambiguous predicates or verbs and their associated roles.
CHAPTER THREE: RELATED WORKS
3.1. Introduction
In this chapter, we review works on semantic role labeling
by different scholars, who used different neural network classification approaches for
languages such as English, Chinese and Vietnamese. Among these, we focus on
semantic role labeling works that use recurrent neural networks, specifically the LSTM and
Bi-LSTM approaches, and the multilayer perceptron classifier.
3.2. Semantic Role Labeling for English and European Languages
Most previously proposed SRL systems focused on end-to-end SRL learning
without parsing techniques and worked with predefined features and syntactic
structures of sentences. Building an end-to-end SRL learning system without parsing
was less successful and gave low accuracy [28]. Because of
this, researchers were motivated to develop different sequential neural network SRL models that
integrate techniques such as automatic feature selection, parsing and
POS tagging to achieve better accuracy [13, 28].
In [28] the authors proposed end-to-end learning of semantic role labeling using recurrent
neural networks. They used a deep bi-directional recurrent network as an end-to-end
system: a deep bi-directional long short-term memory model that takes only the
original text as input features, without the need for predefined features or any
intermediate tags as syntactic knowledge. Their network model processed the input features
with 8 bi-directional LSTM layers.
The latent variables of the model implicitly capture the syntactic structure of a sentence.
Additionally, the authors placed a conditional random field (CRF) model on top of the
network layers for tag sequence prediction. Their model achieves an F1
score of 81.07 on the CoNLL-2005 shared task and 81.27 on the CoNLL-2012 shared task.
In [13] the authors proposed deep semantic role labeling with self-attention. They
developed a simple and effective deep attentional neural network architecture for SRL to
handle the structural information and long-range dependency problems of end-to-end SRL
with recurrent neural networks (RNN). Their network model is based on self-attention,
which can directly capture the relationship between two tokens regardless of their
distance.
In addition, the authors implemented a position embedding technique to provide
positional encoding in the attention mechanism and distinguish the position of each input word.
The authors conducted experiments on the two commonly used datasets from the CoNLL-2005
and CoNLL-2012 shared tasks and achieved F1-scores of 83.4 on
the CoNLL-2005 dataset and 82.7 on the CoNLL-2012 dataset.
The study in [33] presented selectively connected self-attentions for semantic role labeling
to address the long training times and large dataset requirements of recent deep neural
network models. The authors proposed a novel deep neural network model for semantic role
labeling based on stacked attentive representations, which provides selective
connections among the attentive representations and captures the hierarchical structures of
languages.
According to their experimental results, their model achieved better accuracy than
the state-of-the-art studies, reduced the training time by 62 percent and achieved F1
scores of 86.6 and 83.6 on the CoNLL 2005 and CoNLL 2012 shared tasks, respectively.
3.3. Semantic Role Labeling for Chinese Language
In [29] Wang proposed a semantic role labelling system for the Chinese language that uses
neither syntactic parsing nor POS tagging; instead, the task is divided
into two main steps, i.e., clustering and labeling. The clustering step groups
similar sentences together to partially replace syntactic parsing, whereas in the labeling
step an ANN is fed a number of cluster features together with the chunks of a sentence and
labels the chunks with their associated semantic roles.
In the experimentation part, the author manually annotated more than one thousand
Chinese clausal sentences, including imperatives, assertions and queries, and grouped
the syntactic elements into four categories: predicate verb, argument, particle, and marker.
The model achieved an accuracy of 83.8%, and the experimental results show the
effectiveness of clustering.
The study in [26] proposed the first automatic semantic role labeling system for Chinese verbs,
using a pre-release version of the Chinese Proposition Bank. The authors used a Maximum
Entropy classifier from the Mallet Toolkit with a tunable Gaussian prior, which they adjusted
to minimize overfitting. The authors reported results for two
experiments. In the first experiment they obtained an F1-score of 92.7% using the
handcrafted parses in the treebank, and in the second an F1-score of 93.9% using a fully
automatic Chinese parser that integrates word segmentation, POS tagging and parsing.
In their experiments, the authors adapted to Chinese features that had been described in
recent work on English semantic role labeling, and focused on how verb
classes can be induced from the “frame files” of the Penn Proposition Bank and used as
features.
3.4. Semantic Role Labeling for Arabic Language
In [46] the authors proposed the first semantic role labeling system for Modern Standard
Arabic, based on a supervised machine learning model that uses support vector
machines (SVM) and standard features. Their SRL model uses
SVMs to implement a two-step classification approach, i.e., boundary detection and
argument classification. They trained and tested the model on the pilot Arabic
PropBank data released as part of the SemEval 2007 data and obtained an F1-score of 94.06
on argument boundary detection and 81.43 on the complete semantic role labeling task
using gold parse trees.
The study in [27] developed semantic role labeling systems for Arabic using kernel
methods that are capable of exploiting many aspects of the rich morphological
features of the language. The proposed system is based on a supervised model that
uses support vector machines (SVM) for argument boundary detection and
argument classification. The authors trained and tested their model on the pilot Arabic
PropBank data released as part of the SemEval 2007 data, using the SVM-Light-TK toolkit
for their experiments, and obtained an F1 score
of 82.17%.
3.5. Semantic Role Labeling for Amharic Language
The authors of [18] developed an automatic semantic role labeler for Amharic text using
memory-based learning. They proposed the general architecture of a semantic role labeler for
Amharic text using Memory Based Learning (MBL) in 2017. They developed and
implemented feature extraction algorithms to extract 551 instances from 240 simple
Amharic sentences and achieved an accuracy of 82.51% with default parameter values (i.e.,
number of nearest neighbors, distance metric, class voting weights and feature weighting
metric) and 89.29% with optimized parameters of the MBL algorithm.
Different predicates have different senses in different Amharic sentences. For example, the
following two sentences, “አማረ ቢላዋውን በሞረድ ሳለ” and “አማረ ጉንፋን ስለያዘው ክፉኛ ሳለ”, are
taken from two different domains. In these sentences, the predicate “ሳለ” has different
contextual meanings and takes different semantic roles. However, the work of E. Yirga and her
colleagues did not consider multiple senses of a predicate in simple Amharic
sentences; it considered only semantic role labeling on simple Amharic
sentences whose predicates have a single sense. For example, the following simple
Amharic sentences are labelled as follows:
1. አማረ [ARG0-AGT] ቢላዋ [ARG1-BEN] በሞረድ [ARGM-INS] ሳለ [REL] and
2. አማረ [ARG0-AGT] ጉንፋን ስለያዘው [ARGM-CAU] ክፉኛ [ARGM-MNR] ሳለ [REL]
In the first of the labeled sentences above, the arguments [ARG0-AGT], [ARG1-BEN] and
[ARGM-INS] answer the questions: Who sharpens? What is sharpened? What tool is used to
sharpen? In the second sentence, the arguments [ARG0-AGT], [ARGM-CAU] and
[ARGM-MNR] answer the questions: Who coughs? Why did Amare cough? In what way did Amare
cough? Therefore, in these sentences the predicate “ሳለ” has different contextual meanings, and
each meaning takes different role arguments. However, the previous work did not consider
predicates with multiple contextual meanings. To address this problem, this study considers
the contextual meaning of predicates during dataset collection, and we have
annotated the arguments of each sentence with respect to the sense of the predicate, based on
the PropBank annotation guidelines provided by [42], before feeding them to our model.
E. Yrga [18] used the Memory Based Learning (MBL) technique. MBL is a direct descendant
of the classical k-Nearest Neighbor (k-NN) approach. However, MBL is sensitive to the
chosen features and algorithm parameters. MBL is a shallow machine learning algorithm,
so it requires manually extracted, task-specific features to train the model and cannot learn
representations the way deep networks do; the loss of information through vanishing/exploding
gradients during backpropagation [21, 22] is a concern of recurrent networks rather than of MBL.
Moreover, this handcrafted feature extraction consumes more time and causes
feature imbalance across the concepts.
The study in [18] also used the k-NN algorithm, which is a supervised machine learning
algorithm used for both classification and regression problems [36]. However, k-NN is a
lazy learner (i.e., it does not learn anything from the training data, however large, and
simply uses the training data itself for classification) and is used only for
classifying numeric data [36].
CHAPTER FOUR: SYSTEM DESIGN AND IMPLEMENTATION
4.1. Introduction
In this chapter, we discuss the overall architecture of the proposed Context-Based
semantic role labeler model for Amharic text. First, we illustrate the overall architecture of
the proposed model, and then we describe the individual components of the proposed
system architecture.
4.2. Proposed System Architecture
Figure 4. 1 Proposed System Architecture
The proposed system architecture consists of three main phases: Preprocessing,
Training and Testing (semantic role labeling). Each phase contains
different tasks. In data preprocessing, we prepared the collected simple Amharic
sentences in a suitable format. In the proposed system architecture, we used embedding
techniques to represent the preprocessed Amharic sentences as dense vectors, a Bi-LSTM
network for sentence encoding, which handles the sequence information of the input sentence
in the forward and backward directions, and a Multi-Layer Perceptron (MLP) neural network
classifier for generating the scores of the role labels for each argument.
This section describes each phase of the proposed system architecture in
detail.
4.2.1. Preprocessing Phase
The preprocessing phase makes our dataset suitable for further processing by our neural
network model. Text preprocessing is a preliminary step of natural language
processing that transforms a text into a machine-understandable format and so makes the
work of machine learning algorithms easier [40]. In this study, the preprocessing stage was
done manually and consists of normalization and POS tagging of each token, or
argument, of the input Amharic sentences.
As shown in Table 4.1 below, the final result of this phase is a normalized Amharic sentence
and its POS-tagged form.
Table 4. 1 Sample output of Preprocessing Phase
No | Sentence | Normalized | POS Tagged
1 | ቀነኒሳ በቀለ የማራቶን ሪከርድ ሰበረ | ቀነኒሳ በቀለ የማራቶን ሪከርድ ሰበረ | ቀነኒሳ NOUN በቀለ NOUN የማራቶን NOUNP ሪከርድ ADV ሰበረ VERB
2 | ዳንኤል እንጨት ሠበረ | ዳንኤል እንጨት ሰበረ | ዳንኤል NOUN እንጨት NOUN ሰበረ VERB
4.2.1.1. Normalization
Normalization is the process of converting a list of words to a more uniform sequence of
words to improve text matching [40]. In machine learning more generally, normalization is
also used during data preparation to bring the values of numeric columns in the dataset to a
common scale without distorting differences in the ranges of values. For text, the technique
prepares the data for later processing by transforming the words to a standard format. For
example, Amharic has characters with the same sound, such as “ሠ” and “ሰ” or “ሀ” and “ሐ”;
converting all words to a single representation simplifies the searching process.
In this study, we normalized only the predicates found in our collected dataset
to a common representation, to simplify the consideration of predicate senses during the
semantic annotation of the dataset. Our proposed network model requires a normalized
dataset, and normalized predicates in particular, to assign the correct role label to each
argument in a sentence based on the given predicate, since it uses the sense of the predicate
as one feature during the semantic role labeling task. Predicates that have the same sense but
different spellings therefore have to be mapped to a common form, as shown in Table 4.2 below,
to reduce the overall time the gradients oscillate while representing each feature during
training and to reduce the difficulty of the task for our neural network classifier.
Table 4. 2 Sample Normalized verbs in the Dataset
Word (spelling 1) | Word (spelling 2) | Normalized
ሰበረ | ሠበረ | ሰበረ
ሰራ | ሠራ | ሰራ
አገኘ | ዐገኘ | አገኘ
ሄደ | ሔደ | ሄደ
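The character-level mapping behind Table 4.2 can be sketched as below. In this study the normalization was done manually, so this is only an illustration of the idea; the choice of which series is canonical follows the table, and the assumption that each series covers seven consecutive Unicode code points (one per vowel order) is stated in the code.

```python
# Homophone series in the Ethiopic Unicode block: (variant start, canonical start).
# Assumption: each series occupies seven consecutive code points, one per vowel order.
_HOMOPHONE_SERIES = [
    (0x1220, 0x1230),   # ሠ-series -> ሰ-series
    (0x1210, 0x1200),   # ሐ-series -> ሀ-series
    (0x12D0, 0x12A0),   # ዐ-series -> አ-series
]

# Translation table from each variant character to its canonical counterpart.
_TABLE = {
    variant + order: canonical + order
    for variant, canonical in _HOMOPHONE_SERIES
    for order in range(7)
}

def normalize(text):
    """Replace variant homophone characters with their canonical form."""
    return text.translate(_TABLE)

# The variant spellings from Table 4.2, paired with their normalized forms.
pairs = [(w, normalize(w)) for w in ["ሠበረ", "ሠራ", "ዐገኘ", "ሔደ"]]
```

Characters outside the listed series, including already normalized words and non-Amharic text, pass through unchanged.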
In neural network models [40], different words have different features, representations and
ranges of values. Hence, during training the gradients can oscillate back and forth for a
long period of time before finding the global representation of each unique feature. To overcome
this problem, we normalize our dataset and make sure that the different features
take on similar ranges of values before we annotate the data and feed it to our
model, so that gradient descent converges more quickly.
4.2.1.2. POS tagging
POS tagging, also called grammatical tagging or word category disambiguation, is the
process of assigning a part-of-speech tag to a word in a corpus based on its
context and definition [37]. POS tagging can be used as an intermediate step for higher
level NLP tasks such as parsing, Text to Speech (TTS), Information Retrieval (IR), shallow
parsing, Information Extraction (IE), semantic analysis and machine translation.
This process allows us to correctly classify the semantic roles of the sentence constituents
(arguments) and predicates found in a given sentence. In the process of POS tagging, the
given Amharic sentence is tokenized into a sequence of words/phrases, and each
word/phrase represents an argument found in the sentence. To tag each sentence, in addition
to the online Amharic tagger module, we consulted two linguistic experts from the Amharic
department of Wollo University, who assigned the part-of-speech tag of each individual word
in a sentence. In the tagged sentence, tokens with the
“Verb” POS tag represent the predicates of the sentence. Each sentence in our collected
dataset has only one “main verb” or “predicate”, since our study focuses only on the simple
Amharic sentence structure. For example, “አየለ ትናንት ቤት ገዛ” is a simple Amharic sentence taken
from our collected dataset, in which the predicate is “ገዛ”. After the HaBiT online Amharic POS
tagger module is applied, the sentence “አየለ ትናንት ቤት ገዛ” looks as shown in Table 4.3.
Table 4. 3 Sample Amharic Sentence Tagged by online Amharic tagger module
ID Words/Tokens POS Tag
1 አየለ NOUN
2 ትናንት ADV
3 ቤት NOUN
4 ገዛ VERB
1 ደርበው NOUN
2 ስራውን NDet
3 ለቀቀ VERB
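Picking out the predicate from a tagged sentence, as described above, can be sketched as follows. The function name and the (token, tag) pair representation are assumptions made for this illustration; the tagged tokens are those of Table 4.3.

```python
def find_predicates(tagged_sentence):
    """Return the tokens whose POS tag marks them as verbs (predicates)."""
    return [token for token, pos in tagged_sentence if pos.upper() == "VERB"]

# Tagged tokens taken from Table 4.3.
sentence_1 = [("አየለ", "NOUN"), ("ትናንት", "ADV"), ("ቤት", "NOUN"), ("ገዛ", "VERB")]
sentence_2 = [("ደርበው", "NOUN"), ("ስራውን", "NDet"), ("ለቀቀ", "VERB")]

predicates_1 = find_predicates(sentence_1)
predicates_2 = find_predicates(sentence_2)
```

Because the dataset contains only simple sentences, each call is expected to return exactly one token.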
4.2.1. Training Phase
The training phase of the system architecture contains two main components: the
Bi-LSTM encoder and the biaffine attentional scorer. The Bi-LSTM encoder takes the
word embedding of each word of the given sentence as input and generates a dense vector for
each word, and the biaffine attentional scorer takes the hidden vectors of a given word pair as
input and predicts a label score vector for that pair of words.
4.2.1.1. Annotated Amharic Sentences
As explained in the preprocessing phase, after collecting simple Amharic
sentences from different social media platforms, we prepared semantically annotated
Amharic sentences based on the PropBank annotation guidelines developed by [42]. Each
annotated sentence contains its tokens, the POS tag of each token and the associated
head-dependent role of each argument. The head-dependent role represents the predicate “ID” and
the relationship of the predicate with each argument in the sentence. We used these annotated
Amharic sentences to train our neural network model. In the annotated dataset, the
arguments/tokens, their POS tags and the head/predicate of each argument are listed
in the CoNLL file format, separated from each other by whitespace. We also
placed a newline between sentences to separate one sentence from the next, so the proposed
model uses this empty line to distinguish the sentences in our annotated
dataset during training.
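Reading such a file can be sketched as below. The exact column layout of the dataset is not given in the text, so a four-column layout (token, POS tag, head ID, role) is assumed here. The first sample sentence uses the role labels from the example in Section 3.5; its POS tags and head IDs, and the `_` placeholders in the second sentence, are illustrative assumptions.

```python
def read_conll(text):
    """Split CoNLL-style text into sentences of (token, pos, head, role) rows.

    Assumed layout (hypothetical): four whitespace-separated columns per line,
    with a blank line between sentences.
    """
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                 # a blank line closes the current sentence
            if current:
                sentences.append(current)
                current = []
            continue
        token, pos, head, role = line.split()
        current.append((token, pos, int(head), role))
    if current:                      # the last sentence may lack a trailing blank line
        sentences.append(current)
    return sentences

sample = """\
አማረ NOUN 4 ARG0-AGT
ቢላዋ NOUN 4 ARG1-BEN
በሞረድ NOUN 4 ARGM-INS
ሳለ VERB 0 REL

አየለ NOUN 4 _
ትናንት ADV 4 _
ቤት NOUN 4 _
ገዛ VERB 0 _
"""
sentences = read_conll(sample)
```

Each returned sentence is a list of rows, so the predicate and its arguments can be looked up by position or by the head column.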
4.2.1.2. Embedding Layer
The embedding layer is used in text processing to encode the input data: each token is first
represented by a unique integer, which the layer maps to a dense vector. The embedding layer
is initialized with random weights and learns an embedding for all words in the training
dataset [21]. The embedding layer takes three parameters, i.e., input dimension, output
dimension and input length. The input dimension is the size of the vocabulary in the text data.
The output dimension is the size of the output vectors of the embedding layer, i.e., of the
vector space in which words will be embedded, and the input length is the length of the input
sequences.
In our case, we have two embedding representations: word embedding and POS embedding.
Word embedding represents each argument in a sentence as a dense vector. For the word
embedding, we set the three parameters to 100, 200 and 30 for the input dimension, output
dimension and input length, respectively.
POS embedding likewise represents each POS tag used in a sentence as a dense vector. For the
POS embedding, we set the three parameters to 100, 200 and 28 for the input dimension, output
dimension and input length, respectively.
Finally, these vectors are concatenated and passed to the next layer. The vectors of the
word/argument embedding and the POS embedding are concatenated as shown in Figure 4.2.
Figure 4. 2 Concatenation of Argument and POS tag embedding
The word representation of our model is the concatenation of a randomly initialized
word/argument embedding e(a) and a randomly initialized part-of-speech tag embedding
e(pos). The final word representation is therefore given by e = e(a) ⊕ e(pos), where ⊕ is the
concatenation operator.
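The concatenation e = e(a) ⊕ e(pos) can be sketched as follows. This is a plain-Python illustration with randomly initialized lookup tables, not the embedding layer used in the model; the 200-dimensional size follows the output dimension reported above, while the initialization range and the helper name `make_embedding` are assumptions.

```python
import random

random.seed(0)                       # fixed seed for a repeatable illustration

def make_embedding(vocab, dim):
    """Randomly initialised lookup table: one dense vector per vocabulary item."""
    return {item: [random.uniform(-0.1, 0.1) for _ in range(dim)]
            for item in sorted(set(vocab))}

EMB_DIM = 200                        # output dimension reported in the text
tokens = ["አየለ", "ትናንት", "ቤት", "ገዛ"]
pos_tags = ["NOUN", "ADV", "NOUN", "VERB"]

word_emb = make_embedding(tokens, EMB_DIM)      # e(a)
pos_emb = make_embedding(pos_tags, EMB_DIM)     # e(pos)

# e = e(a) ⊕ e(pos): list concatenation gives one 400-dimensional vector per token
sentence_repr = [word_emb[t] + pos_emb[p] for t, p in zip(tokens, pos_tags)]
```

Tokens that share a POS tag share the second half of their vector, while the first half stays specific to the word.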
4.2.1.3. Bi-LSTM Encoder
Bi-LSTM is a special kind of RNN that is powerful in sequence prediction problems because
it is able to store past information and to process the sequence of input data in both the
forward and backward directions [2].
In semantic role labeling, the predicate of a sentence is treated as the root of a graph, whereas
the arguments of the sentence are treated as nodes that depend on the root
node, i.e., the predicate. Therefore, in our study we used a Bi-LSTM sentence encoder to
handle the position information of the words in a sentence and to consider the context
from left to right and from right to left in a sequence. In addition to encoding positional
information, Bi-LSTM networks capture a sentence-aware representation of the
given input sequence and reduce the need for feature engineering.
For example, let us show how the Bi-LSTM works on the following Amharic sentence, which
is taken from our dataset: input sequence x = (ሳሙኤል ምሳሩን በሞረድ ሳለ). The forward LSTM
handles the sequence information from left to right, sequentially from 1 to n, where
n is the number of arguments/tokens in the sentence. In the above example n = 4;
therefore, the forward LSTM handles the sequence information starting
from the word “ሳሙኤል” up to the final word “ሳለ”. The LSTM learns the positions of the words in
the forward direction and works until the end token of the sentence. Finally, it passes the
sequence information to the output layer.
The backward LSTM handles the information of the given sentence in the reverse
direction, i.e., from n to 1. It handles the sequence information of the sentence
starting from the token “ሳለ” back to the token “ሳሙኤል”. Similarly, this LSTM handles the
positions of the words in the sentence in the reverse direction.
Based on this, the Bi-LSTM network combines the results of the forward LSTM and the backward
LSTM to produce a final context-aware output.
Table 4. 4 The Forward LSTM and backward LSTM Result for the sentence
“ሳሙኤል ምሳሩን በሞረድ ሳለ”
Word Forward LSTM Backward LSTM
ሳሙኤል ምሳሩን, በሞረድ, ሳለ ----
ምሳሩን በሞረድ, ሳለ ሳሙኤል
በሞረድ ሳለ ምሳሩን, ሳሙኤል
ሳለ -- በሞረድ, ምሳሩን, ሳሙኤል
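The way the two directions are combined can be sketched with a much simpler recurrent cell. The code below uses a one-unit tanh cell as a stand-in for an LSTM cell, with hypothetical weights and a made-up one-number feature per token, so it only illustrates the direction handling, not the model trained in this study.

```python
import math

def run_rnn(xs, w_x=0.5, w_h=0.8):
    """Simple tanh recurrent cell, used here as a stand-in for an LSTM cell."""
    h, states = 0.0, []
    for x in xs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional(xs):
    """Pair each position's forward state with its backward state."""
    forward = run_rnn(xs)                       # reads positions 1..n
    backward = run_rnn(xs[::-1])[::-1]          # reads n..1, then re-aligned to 1..n
    return list(zip(forward, backward))

tokens = ["ሳሙኤል", "ምሳሩን", "በሞረድ", "ሳለ"]
features = [1.0, 0.0, 0.0, 0.0]     # hypothetical one-number "embedding" per token
encoded = bidirectional(features)
```

Only the first token has a non-zero feature here. Its effect is still visible in the forward state at the last token, while the backward state at the last token is zero because the backward pass has not yet reached the first token at that point.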
4.2.1.4. Multi-Layer Perceptron (MLP) Classifier
The Multi-Layer Perceptron (MLP) is an artificial neural network model that maps the
given input data onto a set of appropriate outputs. It consists of at least three layers of
nodes: an input layer, a hidden layer and an output layer. Except for the input nodes, the
layers use a nonlinear activation function. An MLP is trained with a combination of
backpropagation and gradient descent [49].
In this study, we used a role MLP classifier to predict the score of the roles
between the arguments and the predicate in a sentence. To determine the score of the roles,
the input layer of the role MLP takes two vectors: one for each word in the sentence, treated
as an argument, and one for the predicate of the sentence. The role MLP classifier takes the
predicate word as the head and the possible arguments in the sentence as dependents. The
inputs are then pushed forward through the MLP by taking the dot product of the input with
the weights between the input layer and the hidden layer. The result of this dot product is
the input value of the hidden layer, to which the MLP applies the rectified linear unit (ReLU)
activation function to generate its output. Once the calculated output of
the hidden layer has been pushed through the activation function, it is passed to the next layer
in the MLP by taking the dot product with the corresponding weights, and this is repeated until
the output layer is reached. At the output layer, the scores for the predicate word and its
possible arguments in the sentence are generated.
Figure 4. 3 Visualize MLP classifier model used in semantic role labeling
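The forward pass described above can be sketched in a few lines of Python. This is a minimal illustration under assumed values: the layer sizes, weights, biases and input vector are hypothetical, and the bias term is an addition that the text does not mention explicitly.

```python
def relu(vector):
    """Rectified linear unit applied element-wise."""
    return [max(0.0, v) for v in vector]


def linear(x, weights, bias):
    """Dot product of the input with each weight row, plus a bias."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mlp_forward(x, layers):
    """Push x through the layers; ReLU follows every layer except the last."""
    for index, (weights, bias) in enumerate(layers):
        x = linear(x, weights, bias)
        if index < len(layers) - 1:
            x = relu(x)
    return x


# Hypothetical 2-2-1 network.
layers = [
    ([[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]),  # input -> hidden
    ([[2.0, 1.0]], [0.5]),                    # hidden -> output score
]
score = mlp_forward([1.0, -2.0], layers)
print(score)
```

Here the hidden pre-activations are [-1.0, 3.0], ReLU turns them into [0.0, 3.0], and the output layer combines them into a single score.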
Activation Function
In a neural network, the activation function transforms the summed weighted input of a
node into the activation, or output, of that node. Some activation functions map the result
to a value between 0 and 1 or between -1 and 1 [50]. Activation functions are a crucial
component of deep learning: they affect the output of a model, its accuracy and the
computational efficiency of training, which can make or break a large-scale neural
network.
ReLU is a non-linear activation function used in multi-layer or deep neural networks [50].
It accelerates the training of deep neural networks compared to traditional activation
functions because the derivative of ReLU is 1 for a positive input. Because this derivative
is constant, deep neural networks do not need additional time to compute error terms
during the training phase, and the vanishing gradient problem is mitigated.
For our study, we used the Rectified Linear Unit (ReLU) activation function because it is
computationally efficient and makes the network model converge quickly. ReLU also has
a derivative, which allows for backpropagation.
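The point about the derivative can be made concrete with a small Python sketch. It compares the derivative of ReLU with that of the sigmoid function, used here only as an example of a traditional activation function; the depth of 10 layers is a hypothetical value chosen for illustration.

```python
import math


def relu_grad(x):
    """Derivative of ReLU: 1 for a positive input, 0 otherwise."""
    return 1.0 if x > 0 else 0.0


def sigmoid_grad(x):
    """Derivative of the sigmoid function, which is never larger than 0.25."""
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)


# Backpropagation multiplies one derivative per layer.
depth = 10
relu_chain = relu_grad(2.0) ** depth        # stays at 1 for positive inputs
sigmoid_chain = sigmoid_grad(0.0) ** depth  # shrinks towards 0
print(relu_chain, sigmoid_chain)
```

Even at its maximum, the sigmoid derivative makes the product shrink quickly with depth, whereas the ReLU derivative leaves it unchanged for positive inputs.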
To illustrate how our model calculates the score of roles between the predicate and the
possible arguments in a sentence, consider the sentence S = “ዳንኤል ቢላዋ ሳለ”, rooted at an
artificial node “ROOT”. In this sentence, “ሳለ” is the predicate and there are two
arguments, “ዳንኤል” and “ቢላዋ”. The input layer takes the words/arguments and their
part-of-speech (POS) tags, which are fed to the embedding layer. The embedding layer
generates the vector representation of each input word and POS tag. These vectors are
concatenated and used as input to the LSTM layers (forward LSTM layer and backward
LSTM layer). The outputs of the LSTM layers are concatenated to produce the input of
the role MLP layers, which predict the score of roles between the predicate and the
possible arguments in the sentence.
Finally, the role MLP classifier generates scores between the arguments and the predicate
word in the sentence with respect to each given semantic role label.
Table 4. 5 A Work of MLP Scoring function on the sentence “ዳንኤል ቢላዋ ሳለ”
Arguments Predicate Semantic role Role MLP Classifier
ዳንኤል ሳለ ARG0-AGT S (ዳንኤል, ሳለ, ARG0-AGT)
ዳንኤል ሳለ ARG1-BEN S (ዳንኤል, ሳለ, ARG1-BEN)
ዳንኤል ሳለ REL S (ዳንኤል, ሳለ, REL)
ቢላዋ ሳለ ARG0-AGT S (ቢላዋ, ሳለ, ARG0-AGT)
ቢላዋ ሳለ ARG1-BEN S (ቢላዋ, ሳለ, ARG1-BEN)
ቢላዋ ሳለ REL S (ቢላዋ, ሳለ, REL)
As shown in Table 4.5 above, the MLP scoring function calculates, for each argument and
the given predicate, a score for every candidate semantic role. Based on these scores, the
“Role MLP Classifier” classifies each argument by selecting the role with the maximum
score. For example, suppose the score of S (ዳንኤል, ሳለ, ARG0-AGT) is 5 and the score of
S (ዳንኤል, ሳለ, ARG1-BEN) is 4. The “Role MLP Classifier” then selects
S (ዳንኤል, ሳለ, ARG0-AGT) because it has the maximum score.
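The selection step can be sketched in Python as follows. The scores 5 and 4 for “ዳንኤል” come from the example above; all other scores are hypothetical values added only so that the sketch runs, and the function name is illustrative.

```python
def label_arguments(score_table):
    """For each argument, pick the role label with the highest score."""
    return {argument: max(scores, key=scores.get)
            for argument, scores in score_table.items()}


score_table = {
    # ARG0-AGT = 5 and ARG1-BEN = 4 are from the text; REL is hypothetical.
    "ዳንኤል": {"ARG0-AGT": 5.0, "ARG1-BEN": 4.0, "REL": 1.0},
    # All scores for this argument are hypothetical.
    "ቢላዋ": {"ARG0-AGT": 2.0, "ARG1-BEN": 6.0, "REL": 0.5},
}
print(label_arguments(score_table))
```

Because each argument is labeled independently of the others, this is a greedy selection: no constraint links the label chosen for one argument to the label chosen for another.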
4.2.2. Testing/Semantic Role Labeling Phase
The testing/semantic role labeling phase of the proposed system architecture uses the
network parameters of the trained model obtained from the training phase to predict the
associated role label for each argument in a given sentence and to generate a
semantically annotated Amharic sentence.
4.2.2.1. Predict Score of Role Label
In semantic role labeling, the neural network described previously is used only to predict
the scores for arguments and relations based on the content of the word and its POS tag.
In this stage, the trained model is loaded to compute a role label score for each
(predicate, argument, sentence) combination. After computing the scores for all
(predicate, argument) combinations in a given sentence, we apply a maximum selection
algorithm to select the semantic role label with the highest score. Finally, for each
(predicate, argument) pair previously detected, the semantic role labeler selects the
highest-scoring label for the connected words.
4.2.2.2. Select Maximum Score of Role Label
In this step, we apply the maximum score selection algorithm to select the highest-scoring
semantic role label between the predicate and each argument in a sentence, in order to
build a semantically labeled Amharic sentence. The algorithm takes the incoming role
label scores for each word in a sentence as input. It then greedily selects the role label
with the highest score for that word, and repeats this for each word in the sentence.
4.2.2.3. Generate Labelled Amharic Sentences
This section shows semantically annotated Amharic sentences generated by our neural
network model, based on the score of each argument and role label pair predicted by the
proposed network model.
The following table shows the result generated for the sentence “መልካሙ አምስት ፊልሞችን
ደረሰ”.
Table 4. 6 Sample Labelled Amharic Sentence Generated by the model
No Tokens/Arguments POS Tag Predicate Role Label
1 መልካሙ NOUN 4 ARG0-AGT
2 አምስት NUMCR 4 ARG1-BEN
3 ፊልሞችን NDet 4 ARG1-BEN
4 ደረሰ VERB 0 REL
4.3. The Proposed Network Model
Figure 4. 4 Proposed Network model
The proposed network model takes as input a sequence of tokens from the input Amharic
sentence, where each token represents an argument, together with its associated POS tag.
It performs word and tag embedding with separate embedding dimensions: 100
dimensions for words and 30 dimensions for their associated POS tags. The two
embedding results are concatenated before being fed to the Bi-LSTM encoder. The
Bi-LSTM encoder takes the sequence of word and POS tag embeddings of the given
sentence from the embedding layer as input and generates a dense vector for each word
as output.
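The embedding and concatenation step can be illustrated with the following Python sketch. The dimensions 100 and 30 are taken from the text; the randomly initialized lookup tables, the helper names and the seeds are assumptions made for illustration, since in the actual model the embeddings are learned during training.

```python
import random

WORD_DIM = 100  # word embedding dimension stated in the text
TAG_DIM = 30    # POS tag embedding dimension stated in the text


def make_table(vocabulary, dim, seed):
    """Hypothetical lookup table with small random vectors."""
    rng = random.Random(seed)
    return {item: [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for item in sorted(vocabulary)}


def embed(tokens, tags, word_table, tag_table):
    """Concatenate the word vector and the tag vector of every token."""
    return [word_table[w] + tag_table[t] for w, t in zip(tokens, tags)]


tokens = ["አየለ", "ትናንት", "ቤት", "ገዛ"]
tags = ["NOUN", "ADV", "NOUN", "VERB"]
word_table = make_table(set(tokens), WORD_DIM, seed=0)
tag_table = make_table(set(tags), TAG_DIM, seed=1)
vectors = embed(tokens, tags, word_table, tag_table)
print(len(vectors), len(vectors[0]))
```

Each token is therefore represented by a 130-dimensional vector before it reaches the encoder.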
The Multi-Layer Perceptron layer maps the output of the Bi-LSTM encoder onto a set of
appropriate outputs. It uses a biaffine attentional scorer, which takes the hidden vectors of
a given word pair as input and predicts a label score vector. For the purpose of our study,
we used the role MLP classifier to predict the score of roles between the arguments and
the predicate in a sentence. This MLP classifier generates a score for each role label as
output.
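The text does not give the exact form of the biaffine scorer. A commonly used form combines a bilinear term, a linear term over the concatenated vectors and a bias for every label, and the Python sketch below assumes that form. All parameter values, the two-dimensional vectors and the function names are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def biaffine_score(h_arg, h_pred, U, w, b):
    """score = h_arg^T U h_pred + w . [h_arg; h_pred] + b (assumed form)."""
    bilinear = sum(h_arg[i] * dot(U[i], h_pred) for i in range(len(h_arg)))
    linear = dot(w, h_arg + h_pred)  # list concatenation of the two vectors
    return bilinear + linear + b


def label_scores(h_arg, h_pred, params):
    """One score per role label; params maps label -> (U, w, b)."""
    return {label: biaffine_score(h_arg, h_pred, U, w, b)
            for label, (U, w, b) in params.items()}


# Hypothetical hidden vectors and parameters for two labels.
h_arg, h_pred = [1.0, 2.0], [3.0, 4.0]
params = {
    "ARG0-AGT": ([[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0, 1.0, 1.0], 0.5),
    "ARG1-BEN": ([[0.0, 0.0], [0.0, 0.0]], [1.0, 0.0, 0.0, 0.0], 0.0),
}
print(label_scores(h_arg, h_pred, params))
```

The resulting dictionary plays the role of the label score vector from which the highest-scoring role is later selected.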
CHAPTER FIVE: RESULTS AND DISCUSSION
5.1. Introduction
This chapter presents the experimental environment, the tools and algorithms used, the
evaluation techniques, and the results obtained from the experiments. It also describes
the data that we collected and the dataset preparation techniques that we applied to
transform the raw data into the CoNLL format, which is the input format suitable for our
neural network model.
5.1.1. Dataset Collection
We collected 2000 non-domain-specific simple Amharic sentences from different social
media platforms and student textbooks, including sentences that contain predicates with
more than one contextual meaning. We then applied data preparation steps such as
normalization and POS tagging to the sentences before feeding them to the neural
network model.
Table 5. 1 Collected Sample Amharic Sentences with their Domains
No | Collected Sentences | Domains
1 | ሞላ ትናንት ህይወቱ አለፈ | Health
2 | የኢትዮጵያ ብሄራዊ ቡድን ለአፍሪካ ዋንጫ አለፈ | Sport
3 | መስፍን አዲስ ጫማ ገዛ | Science and Technology
4 | አማረ ጉንፋን ስለያዘው ክፉኛ ሳለ | Health
5 | ሰለሞን ልቦለድ ደረሰ | Entertainment
6 | ኢትዮጵያዊው አትሌት በለንደን ኦሎምፒክ የወርቅ ሜዳሊያ አገኘ | Sport
7 | አማረ ትልቅ ስዕል ሳለ | Entertainment
8 | የጀርመኑ ጠፈር ተመራማሪ አዲስ ፕላኔት አገኘ | Science and Technology
The following table shows the amount of data that we have collected from different social
media platforms and student textbooks in different domains.
Table 5. 2 Collected Amount of Dataset from Different social media platforms
No of sentences in each domain
No | Source | Health | Sport | Science & Technology | Entertainment | Sum
1 | Walta News | 300 | 200 | 200 | 200 | 900
2 | Amhara Mass Media | 200 | 300 | 200 | 200 | 900
3 | Student Textbooks | 200 | 200
Total | 2000
5.1.2. Dataset Preparation
This section describes the data used for training and testing the proposed network model
and the data preparation methods applied. After collecting the data, we performed data
preprocessing (i.e., normalization and POS tagging), identified predicates with more than
one contextual meaning in the sentences, and identified the equivalent arguments for
them. The data was then semantically annotated based on the PropBank annotation
guidelines defined by [42].
As explained above, the role of Semantic Role Labeling is to determine the arguments
that are semantically related to the predicate, for the purpose of answering the questions
“who did what to whom”, “when” and “where”. Predicates are the main verbs in the
sentence and take different arguments, so each individual argument of a sentence is
assigned a role label depending on the predicate (main verb) of the sentence. For our
study, we treated the “predicate” as the “head” or “root” and each associated argument as
a dependent. Each token is assigned an “ID”, and each argument records the ID of the
predicate it depends on.
For example, the sentence “አየለ ትናንት ቤት ገዛ” looks as follows after applying POS tagging
and assigning an “ID” to each token.
Table 5. 3 Predicate-Argument Relation
ID Words/Tokens POS Tag Predicate Role
1 አየለ NOUN 4 ARG0-AGT
2 ትናንት ADV 4 ARGM-TMP
3 ቤት NOUN 4 ARG1-BEN
4 ገዛ VERB 0 REL
The ID, words/tokens, POS tag and role of a sentence are separated by white space, as in
“አየለ NOUN 4 ARG0-AGT”, and two different sentences are separated by a newline.
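A file in this layout can be read with a few lines of Python. The sketch below assumes the five columns of Table 5.3 (ID, token, POS tag, predicate ID and role) on each line and an empty line between sentences; the function name and the dictionary keys are illustrative and are not taken from the thesis code.

```python
def parse_conll(text):
    """Parse whitespace-separated token lines into a list of sentences."""
    sentences, current = [], []
    for line in text.splitlines():
        line = line.strip()
        if not line:                 # an empty line closes the sentence
            if current:
                sentences.append(current)
                current = []
            continue
        token_id, word, pos, head, role = line.split()
        current.append({"id": int(token_id), "word": word, "pos": pos,
                        "head": int(head), "role": role})
    if current:
        sentences.append(current)
    return sentences


sample = """1 አየለ NOUN 4 ARG0-AGT
2 ትናንት ADV 4 ARGM-TMP
3 ቤት NOUN 4 ARG1-BEN
4 ገዛ VERB 0 REL
"""
sentences = parse_conll(sample)
predicate = [t for t in sentences[0] if t["head"] == 0][0]
print(predicate["word"], predicate["role"])
```

The token whose predicate column is 0 is the root, i.e., the predicate itself, and every other token points to it through its ID.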
5.2. Experiment/Implementation
We used the Python 3.7 programming language on a Windows environment to develop
the model. The experiments were run on a prototype developed with the Keras package
(using TensorFlow as a backend) on an Intel Core™ i5-6200 CPU with 4 GB of RAM.
For the experiments, we used 70% of the dataset for training and split the remaining 30%
into two equal parts for testing and validation. Given the size of the dataset, this gives
1400 sentences for training, 300 for testing and 300 for validation throughout the
experiments. The collected dataset contains a total of 40 ambiguous verbs or predicates
that can take different sets of role labels for their associated arguments according to their
senses. In our case, 30 of the predicates have two different interpretations and 10 of the
predicates have three contextual meanings, each of which takes a different set of role
labels to represent its arguments. During our data annotation process, we considered two
to three senses of each ambiguous verb, so we have a total of 80 sentences that are
annotated based on the sense of their predicates. In addition to the single-sense simple
Amharic sentences, we used 50 of the multi-sense predicate sentences for training the
model, 20 for testing and 10 for validation. In contrast, the dataset of E. Yirga and her
colleagues consists of 200 simple Amharic sentences collected from student Amharic
material. It does not consider the senses of polysemous verbs during data annotation
(semantic role label assignment), and we were not able to obtain it. This makes it difficult
to use their dataset in our work, since disambiguating polysemous verbs during data
annotation is one of the tasks of this research.
Table 5. 4 Training, Testing and Validation Dataset Description
No | Dataset Section | Number of Sentences Used
1 | Training Dataset | 1400
2 | Testing Dataset | 300
3 | Validation Dataset | 300
4 | Total Dataset | 2000
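The 70/15/15 split in Table 5.4 can be reproduced with a short Python sketch. Whether the sentences were shuffled before splitting is not stated in the text, so the shuffle, the seed and the function name below are assumptions.

```python
import random


def split_dataset(sentences, train_fraction=0.7, seed=42):
    """Shuffle, keep train_fraction for training, halve the rest."""
    items = list(sentences)
    random.Random(seed).shuffle(items)      # assumed: random shuffle first
    n_train = round(len(items) * train_fraction)
    n_test = (len(items) - n_train) // 2
    train = items[:n_train]
    test = items[n_train:n_train + n_test]
    validation = items[n_train + n_test:]
    return train, test, validation


# Integers stand in for the 2000 annotated sentences.
train, test, validation = split_dataset(range(2000))
print(len(train), len(test), len(validation))
```

With 2000 items this yields the 1400, 300 and 300 sentences listed in the table, and no sentence appears in more than one part.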
5.2.1. Hyperparameter tuning
When building the LSTM model, hyperparameters such as dropout, learning rate and
batch size have to be set properly to get accurate predictions when testing the proposed
network model. A study by [21] suggests that finding the optimal hyperparameters for a
model leads to higher accuracy and reduces the risk of overfitting. For our study, we
identified the most important hyperparameters and the values that fit our network model
and achieved the best accuracy; these are presented in Table 5.5 below. While training our
network model, we tuned the hyperparameters one at a time and selected, for each, the
value that achieved good training and testing accuracy.
The learning rate is a tuning parameter of the optimization algorithm that determines the
step size at each iteration, i.e., how strongly the model responds to the estimated error
each time the weights are updated. Choosing a good learning rate is therefore very
important when training a deep neural network: a value that is too large causes the model
to converge too quickly to a suboptimal solution, whereas a value that is too small leads
to a long training process that can get stuck.
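The effect of the step size can be seen on a toy problem. The following Python sketch minimizes f(x) = x² with plain gradient descent, not with the Adam optimizer used in this study; the function, the starting point, the number of steps and the three learning rates are assumptions chosen only for illustration.

```python
def gradient_descent(learning_rate, x0=1.0, steps=50):
    """Minimize f(x) = x**2 and return the final x."""
    x = x0
    for _ in range(steps):
        gradient = 2 * x               # derivative of x**2
        x -= learning_rate * gradient  # one update with the given step size
    return x


for learning_rate in (0.001, 0.1, 1.1):
    print(learning_rate, abs(gradient_descent(learning_rate)))
```

On this toy problem the smallest rate has barely moved away from the starting point after 50 steps, the middle rate ends very close to the minimum at 0, and the largest rate overshoots the minimum on every update and moves further away from it.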
We trained the Bi-LSTM model with different dropout and learning rate values. With
dropout = 0.33 and learning rate = 0.001, the proposed model achieved 92.5% training
and 79.9% testing accuracy. With dropout = 0.5 and the default learning rate value =
0.0001, the proposed model achieved 94.5% training and 83.8% testing accuracy. Based
on these results, the Bi-LSTM dropout = 0.5 and the default learning rate of Adam =
0.0001 were selected. Accordingly, to improve the performance of the deep Bi-LSTM
network classifier for semantic role labeling, we increased the dropout value of the
Bi-LSTM network, used the default learning rate of the Adam optimizer, and decreased
the dropout value of the MLP classifier to 0.01 in order to reduce misclassified role labels
on unseen data.
Table 5. 5 List of Hyperparameters Used in the Model with their Description
No | Parameter | Value | Description
1 | Epochs | 50 | ✓ The number of passes of the whole training data through the network; one epoch is one pass of the whole training data through the network. ✓ The model is trained for 50 cycles or iterations over the given training data.
2 | Learning rate | 0.001 | ✓ It controls how much to change the model in response to the estimated error each time the model weights are updated.
3 | Embedding Dropout | 0.3 | ✓ Dropout randomly chooses cells in a layer according to the chosen probability and sets their output value to 0 to reduce model overfitting. ✓ For our study, we tried randomly chosen dropout values between 0 and 1 in the experiments and took the values that gave the best result for our model.
  | Bi-LSTM Dropout | 0.5 |
  | MLP Dropout | 0.01 |
4 | Bi-LSTM Layer | 3 | ✓ The number of layers used in the Bi-LSTM network encoder.
5 | Word Embedding Dimension | 100 | ✓ The dimension of the vector representing each token. ✓ For our study, we used 100 embedding dimensions for arguments and 30 for their associated POS tags.
  | Tag Embedding Dimension | 30 |
6 | Batch size | 4 | ✓ We used 4 training sentences in one iteration to estimate the error gradient for the learning algorithm.
7 | Activation Function | ReLU | ✓ In neural networks, activation functions are used for determining the output of a deep learning model and its accuracy. ✓ For our study, we used the ReLU activation function since it is computationally efficient for backpropagation.
8 | Optimizer | Adam | ✓ Optimizers are algorithms that change attributes of the neural network, such as its weights and learning rate, to reduce the loss. ✓ For our study, we used the Adam optimization algorithm since it is computationally efficient and requires little memory.
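The dropout operation described in the table can be sketched in Python as follows. The rescaling of the kept values by 1 / (1 - rate), often called inverted dropout, is a common convention and an assumption about the implementation; the function name and the sample values are illustrative.

```python
import random


def dropout(values, rate, rng):
    """Zero each value with probability `rate` and rescale the others."""
    if not 0.0 <= rate < 1.0:
        raise ValueError("rate must be in [0, 1)")
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


rng = random.Random(0)
hidden = [0.2, -1.0, 0.7, 1.5, -0.3, 0.9]
print(dropout(hidden, 0.5, rng))  # rate used for the Bi-LSTM in Table 5.5
```

Dropout is applied only during training; at prediction time all cells are kept.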
5.2.2. Proposed model training and Validation accuracy
As depicted in Figure 5.1 below, our proposed model achieved 95.5% training accuracy
and 81.2% validation accuracy on the dataset annotated without considering multiple
senses of predicates. In this case, we have 19 classes or sets of role labels to classify each
argument in a sentence.
Figure 5. 1 Proposed Model Training and validation status rate
As shown in Figure 5.1 above, the training and validation accuracy curves follow the
same trend, i.e., both increase from epoch 2 up to epoch 10. This suggests that both the
training and validation datasets contain easier examples and few complex ones, so the
model can easily learn important features from the input data, is less challenging to test,
and can generalize to unseen data.
Looking at the overall performance of the proposed model, it achieved a very good
training accuracy, but the validation accuracy is not as good. From epoch 10 up to epoch
30 the training and validation curves run neck and neck, whereas from epoch 30 up to
epoch 45 the training curve rises faster than the validation curve, which means the model
is very good at feature learning but did not get enough information to generalize to
unseen data. This indicates that the training dataset consists of easier examples than the
validation dataset, i.e., because of the complexity of the validation dataset, the model has
difficulty generalizing for some features.
Overall, from epoch 0 to epoch 50 both the training and validation curves increase, which
means the model obtains important features from the input data as training proceeds and
can make decisions for new data based on what it learned in the training phase.
5.2.3. Proposed model Training and Validation loss
Training loss is the error on the training set of data whereas validation loss is the error after
running the validation set of data through the trained network. The training and validation
loss of the proposed model is shown in Figure 5.2 below.
Figure 5. 2 Proposed model Training and Validation Loss Rate
The aim of the study is to make the validation loss as low as possible. Most researchers
explain that the appearance of some overfitting in the network model is nearly always a
good thing and still allows the neural network model to make accurate predictions for
unseen data.
As depicted in Figure 5.2 above, as the number of epochs increases, both the validation
and training loss drop. This indicates that the network model learns the given data better
and better. However, due to the nature of the dataset, at a certain point the training loss
curve continues to drop but the validation loss curve begins to rise, which indicates that
there is some overfitting in the model.
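One way to locate the point at which the validation loss starts to rise is to track the epoch with the lowest validation loss and stop looking once it has not improved for a few epochs. The Python sketch below illustrates this idea, commonly known as early stopping; the text does not state that early stopping was used in this study, and the loss values and the patience of 3 epochs are hypothetical.

```python
def best_epoch(validation_losses, patience=3):
    """Index of the lowest validation loss seen before patience runs out."""
    best_index, best_loss = 0, float("inf")
    for epoch, loss in enumerate(validation_losses):
        if loss < best_loss:
            best_index, best_loss = epoch, loss
        elif epoch - best_index >= patience:
            break                      # no improvement for `patience` epochs
    return best_index


# Hypothetical validation losses: falling at first, then rising again.
losses = [1.00, 0.80, 0.60, 0.55, 0.60, 0.65, 0.70, 0.75]
print(best_epoch(losses))
```

The returned index marks the epoch after which further training mainly lowers the training loss while the validation loss no longer improves.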
5.3. Experiments
In this study, we performed an additional experiment to show the role of sense
consideration in our dataset for the performance of the semantic role labeler model. The
experiment examined the usefulness of context-based data annotation, i.e., preparing
semantically annotated data for each sense of a predicate, for semantic role labeling. The
proposed neural network model uses a sequence of words/arguments and the
corresponding sequence of tags as input to assign to each argument in a given sentence a
label that indicates its role with respect to the main verb/predicate of the sentence.
We were therefore motivated to consider the context of predicates when assigning a
semantic role label to each argument. We considered each sense of the predicates in our
dataset, prepared data for each sense, and then ran an experiment with these datasets to
evaluate our model performance. To conduct the experiment, we used 70% of the dataset
for training and the remaining 30% for testing and validation, with the same
hyperparameter values as given in Table 5.5 above. The only difference is that the
context-based data annotation uses two additional modifier classes or role labels,
“ARGM-CAU” and “ARGM-EXT”, which are included only in this experiment. In
addition, 50 sentences with ambiguous words are used for training, 20 sentences for
validation and 10 sentences for testing the model.
Experiment I: Evaluating the role of Predicate Sense based data annotation on
Semantic Role labeling Performance
In this study, the proposed model takes a sequence of words/arguments and predicates
and determines the relationship between each argument and the predicate based on the
MLP scoring function. The score values of the predicate and arguments are then used as
input to label the given sentence. During role labeling, the classifier classifies each
argument based on its maximum score with the predicate in the given sentence.
As shown in Table 5.6 below, the proposed model achieved 95.5% training and 81.2%
validation accuracy on our dataset annotated without regard to the sense of predicates,
whereas it achieved 95.9% training and 83.8% validation accuracy on our dataset
annotated according to the sense of predicates. Based on this result, we conclude that
context-based data annotation depending on the sense of predicates, together with
suitable hyperparameter values such as learning rate and dropout for training the
Bi-LSTM model, increased the performance of our proposed Bi-LSTM network model
on semantic role labeling.
Table 5. 6 Comparison of Role Labeler Performance with and without context of multi-sense predicate
Evaluation | Without Predicate Sense Dataset | With Predicate Sense Dataset
Training Accuracy | 95.5% | 95.9%
Validation Accuracy | 81.2% | 83.8%
Testing Accuracy | 82.6% | 84.9%
Some overfitting appeared in the network model. To address this problem, researchers
recommend different techniques for reducing overfitting, such as increasing the dropout
of the neural network and training the model with a large amount of data. For our study,
we increased the Bi-LSTM dropout from 0.3 to 0.5 to reduce model overfitting.
Figure 5. 3 Role labeling Performance with Predicate Sense Based Annotated Data
As shown in Figure 5.3 above, the experimental result shows the effectiveness of
considering predicates from different domains and of predicate sense-based data
annotation for semantic role labeling, by reducing the overfitting of our neural network
model. Based on this, we suggest that annotating the dataset depending on the sense of
the predicate and using optimal hyperparameter values achieves better training and
testing accuracy on semantic role labeling tasks.
5.3.1. Evaluation
To evaluate the performance of our model, we split our dataset into training, testing and
validation data. From our total of 2000 sentences, we used 1400 for training, 300 for
testing and 300 for validating the MLP classifier, i.e., we trained the classifier on 70% of
the dataset and used the remaining 30% for testing and validating it. This splitting rule is
appropriate when the training and testing datasets contain enough examples to give
reliable accuracy results. The classifier is tested with unseen data, and its classification
performance is evaluated by comparing the role labels manually assigned to each
argument of a sentence in the validation dataset with the role labels or classes correctly
recognized by the proposed model. To do this, we used a validation dataset which
contains 200 simple Amharic sentences.
5.3.1.1. Evaluation Metrics
To evaluate the performance of our proposed network model, we used the Precision,
Recall and F1-score evaluation metrics. These metrics are built on the notions of true
positives, true negatives, false positives and false negatives, which are computed by
comparing the number of labels or classes correctly recognized by the classifier in our
dataset with the number of incorrectly recognized labels. The terms true positive, true
negative, false positive and false negative can be described as follows:
True positive (TP): refers to the number of classes or labels in which we predicted YES
and the actual output was also YES.
False positive (FP): refers to the number of classes or labels in which we predicted YES
and the actual output was NO.
True negative (TN): refers to the number of classes or labels in which we predicted NO
and the actual output was NO.
False negative (FN): refers to the number of classes or labels in which we predicted NO
and the actual output was YES.
Precision: the number of correctly predicted positive results divided by the total number
of positive results predicted by the classifier.
$$\text{Precision} = \frac{TP}{TP + FP}$$
Recall: the ratio of correctly predicted positive observations to all observations in the
actual class YES.
$$\text{Recall} = \frac{TP}{TP + FN}$$
F1-score: also known as the F-measure, it is used to measure the test accuracy of the
model. It is the harmonic mean of precision and recall and ranges between 0 and 1. It
tells you how precise your classifier is (how many instances it classifies correctly), as
well as how robust it is (it does not miss a significant number of instances).
$$\text{F1-Score} = \frac{2 \cdot (\text{Recall} \cdot \text{Precision})}{\text{Recall} + \text{Precision}}$$
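The three formulas translate directly into Python. In the sketch below the counts TP = 8, FP = 2 and FN = 2 are hypothetical values for a single role label and are not results from this study; the guards against division by zero are an added convention.

```python
def precision(tp, fp):
    """TP / (TP + FP); defined as 0 when nothing was predicted positive."""
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    """TP / (TP + FN); defined as 0 when there are no actual positives."""
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if p + r else 0.0


# Hypothetical counts for one role label.
p = precision(tp=8, fp=2)
r = recall(tp=8, fn=2)
print(p, r, f1_score(p, r))
```

For a multi-class task such as role labeling, these values are computed per label and then averaged to obtain the overall figures.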
On the testing dataset, the model achieved the average precision, recall and F1-score
values shown in Table 5.7 below.
Table 5. 7 Average precision, recall, and F-score result for testing the model
Dataset Accuracy Precision Recall F1-Score
Testing Dataset 84.9% 82.56% 80.85% 81.72%
In addition to the average values, the precision, recall and F1-score of each individual
class are shown in Table 5.8 below.
Table 5. 8 Individual role label performance on predicate sense based annotated data
Table 5. 9 Testing Result of the Semantic Role Labeler Model
Total number of sentences | Total number of arguments in a sentence | Total number of arguments (Manually Assigned) | Number of arguments (the system correctly determined) | Testing Result (%)
300 | 1530 | 1530 | 1300 | 84.9%
As shown in Table 5.9 above, we used 300 simple Amharic sentences, containing a total
of 1530 arguments, for testing our model. Our dataset has 21 role labels or unique
classes. We manually assigned a role label to each of the 1530 arguments during data
annotation. Of these 1530 arguments, our proposed system assigned the correct role label
to only 1300, which means our model achieves 84.9% testing accuracy on unseen data,
i.e., on the testing dataset.
The system also assigned a role label to the remaining 230 arguments based on the
information obtained from the training phase, but these labels are incorrect, i.e., they do
not match the labels assigned during data annotation.
This indicates that the proposed model labeled 1300 arguments correctly based on the
information obtained during model training, giving an argument role labeling accuracy of
84.9% on the testing data. This accuracy is obtained by comparing the role assigned to
each argument manually during data annotation with the role assigned to the same
argument by the proposed model.
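The accuracy computation amounts to counting matching labels. The following Python sketch shows it on a short hypothetical label sequence and then repeats the arithmetic for the counts reported above (1300 correct out of 1530); the function name and the example labels are illustrative.

```python
def accuracy(gold_labels, predicted_labels):
    """Fraction of arguments whose predicted role equals the manual role."""
    if len(gold_labels) != len(predicted_labels):
        raise ValueError("both sequences must have the same length")
    correct = sum(g == p for g, p in zip(gold_labels, predicted_labels))
    return correct / len(gold_labels)


# Hypothetical labels for one four-token sentence.
gold = ["ARG0-AGT", "ARGM-TMP", "ARG1-BEN", "REL"]
predicted = ["ARG0-AGT", "ARGM-TMP", "ARG0-AGT", "REL"]
print(accuracy(gold, predicted))

# Counts reported for the testing dataset.
reported = 100 * 1300 / 1530
print(round(reported, 2))
```

The ratio 1300 / 1530 is approximately 0.8497, which corresponds to the reported 84.9% when the percentage is truncated to one decimal place.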
Figure 5. 4 Sample Semantic role labels assigned by the model with the Testing data
As depicted in Figure 5.4 above, some arguments are classified incorrectly by the model.
These errors arise from the nature of the dataset, since we used a manual data annotation
technique, and from the strong similarity between some pairs of classes.
Furthermore, Amharic is a morphologically rich language that carries morpho-syntactic
information and contains many ambiguous words. This creates a challenge for the model
when building the dense vector representation of each word, and as a result some
arguments are misclassified. Such data representation challenges can be reduced by
language expert-based data collection and preparation, by integrating a morphological
analyzer, and by reducing ambiguous words in the dataset.
Another cause of classification errors on unseen data is the sequential segmenting and
labeling problem [34], which arises from the difficulty of deciding on a standard set or
number of roles and of producing a formal definition for these roles, which creates
syntactic variation in a sentence.
5.4. Discussion of Results
In this study, we designed and developed a Context-Based Semantic Role Labeler model
for the Amharic language, which uses a multilayer perceptron classifier to predict the
score of predicate-argument relationships in simple Amharic sentences. We used the
trained neural network model to assign a role label to each argument of a new Amharic
sentence. The model contains a neural embedding layer which takes a sequence of words
and their associated POS tags and creates dense representation vectors for the given
natural text. We then applied a Bi-LSTM encoder to the output of the embedding layer to
handle the sequence information in both the forward and backward directions. On the
output of the Bi-LSTM encoder, we used an MLP classifier to classify the arguments in a
given sentence based on their role label scores. The MLP classifier uses a biaffine
attentional scorer to predict the score of each pair of argument and associated role label
in a given sentence. After generating a score for each argument and possible role label
pair, the MLP classifier selects the maximum score among the possible pairs and
classifies each argument in the sentence accordingly. We conducted experiments to
examine the performance of the proposed model on simple Amharic sentences annotated
based on the sense of predicates. The experimental results show that the proposed model
achieves 95.9% training and 84.9% testing accuracy, and that it performs well with the
predicate sense-based annotated dataset and the selected hyperparameter values such as
dropout, learning rate and batch size.
We have not compared the performance of the model with other semantic role labeler
models due to the absence of pretrained language models for the Amharic language. We
have also not compared it with the work of E. Yirga and her colleagues, because their
dataset is not available from an online data source and was therefore difficult to access.
In addition, we used dependency-based data annotation, so the data format, data size and
classification methods are different.
CHAPTER SIX: CONCLUSION AND RECOMMENDATION
6.1. Conclusion
Semantic Role Labeling is the process of identifying the constituents and predicates in a
sentence and assigning each of them a label that expresses its semantic role in the
sentence. When assigning role labels, the context of the predicate must be considered in
order to annotate each argument in the sentence correctly.
In this study, we developed a context-based semantic role labeler model for simple
Amharic sentences using a deep neural network trained on data annotated according to
predicate sense. Simple Amharic sentences were collected from different social media
platforms, predicates with more than one sense were identified, and the dataset was
annotated semantically for each predicate, depending on its sense, following the PropBank
annotation guidelines, in order to show the role of predicate sense in the SRL task. In
addition, we used a Bi-LSTM encoder and an MLP neural network classifier that uses a
biaffine attentional scoring technique to predict the scores of labels. The proposed model
achieves 95.9% training accuracy and 84.9% testing accuracy on the predicate sense-based
annotated dataset.
6.2. Contribution of the study
This study provides the following scientific contributions:
• The study contributes a Context-Based Semantic Role Labeler for the Amharic
language, built on dependency-based SRL for simple Amharic sentences.
• The study contributes a semantically annotated Amharic dataset of 2000 entries
that can be used as a resource by future researchers.
• The study includes 40 ambiguous verbs, with more than 80 predicate senses in total,
in our simple Amharic sentence dataset together with their associated role labels,
which previous researchers did not consider, and tests their effect on the SRL task.
This increases the training and testing accuracy by 0.4% and 2.6% respectively.
• The study shows the effectiveness of deep Bi-LSTM networks and verb senses
on semantic role labeling tasks.
6.3. Recommendation
• In this research, verb sense was considered as a feature for semantic role labeling
during dataset annotation, and only simple Amharic sentences were targeted. Since
the Amharic language has other kinds of sentences, future work should apply this
feature to all kinds of Amharic sentences, which is expected to improve the
performance of the SRL task.
• For this study, the role labeler model was trained on a corpus that was annotated
semantically by language experts. Preparing a large dataset in this way is time
consuming and tedious. In future, integrating automatic POS tagger and semantic
role annotator modules into the proposed solution would minimize the manual
effort.
• In deep learning, the size of the corpus and the balance of its class distribution
strongly influence the performance of a system on classification tasks; the need for
a large corpus is one of the basic limitations of deep learning. Our study used a
corpus of only 2000 entries whose class distribution is not well balanced. In future,
the performance of the proposed solution should be tested with a larger and
well-balanced corpus.
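As a rough illustration of the class-balance concern raised in the last recommendation, the distribution of role labels in an annotated corpus could be inspected before training. The sketch below is only an assumption about how such a check might look; the helper names and the toy label list are hypothetical and do not reflect the actual distribution of the thesis dataset.

```python
from collections import Counter


def label_distribution(labels):
    """Return the relative frequency of each role label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: count / total for label, count in counts.items()}


def imbalance_ratio(labels):
    """Ratio between the most frequent and the least frequent label."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy label list (hypothetical, not taken from the thesis dataset).
toy_labels = ["ARG0-AGT"] * 6 + ["ARG1-PAT"] * 3 + ["ARGM-LOC"] * 1

print(label_distribution(toy_labels))
print(imbalance_ratio(toy_labels))  # prints 6.0
```

A ratio close to 1 would indicate a well-balanced corpus, while a large ratio signals that some role labels are under-represented and may be classified poorly.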
REFERENCES
[1] A. Lopez, “Natural language understanding on semantic role labeling,” School of
Informatics, University of Edinburgh, Mar. 27, 2018.
[2] J. Cai, S. He, Z. Li and H. Zhao, “A Full End-to-End Semantic Role Labeler,
Syntax-agnostic Over Syntax-aware,” proceedings of the 27th International Conference on
Computational Linguistics, pages 2753–2765, Santa Fe, New Mexico, USA, Aug. 20-26,
2018.
[3] K. Jaideep, “Five applications of natural language processing for businesses,” Ju. 28, 2019.
[Online]. Available:
https://www.upgrad.com/blog/5-applications-of-natural-language-processing-for-businesses/
[Accessed 25 Oct. 2019].
[4] Dr. P. Merlo, “Semantic roles in natural language processing and in linguistic theory,”
Université de Genève, Département de Linguistique, Oct. 9, 2009.
[5] D. Shen, and M. Lapata, "Using semantic roles to improve question answering," In
EMNLP-CoNLL, pp. 12-21. Ju. 2007.
[6] M. Palmer, D. Gildea and P. Kingsbury, “The proposition bank: an annotated corpus of
semantic roles,” association for computational linguistics, 11 Ju., 2005.
[7] M. Mohamed and M. Oussalah, “SRL-ESA-TextSum: A text summarization approach based
on semantic role labeling and explicit semantic analysis,” Information in processing and
management at the university of Birmingham, pages 1356–1372, retrieved 20 Feb. 2019;
accepted 12 Ap. 2019.
[8] J. Christensen, Mausam, S. Soderland and O. Etzioni, “Semantic role labeling for open
information extraction,” in proceedings of the NAACL HLT 2010 first international
workshop on formalisms and methodology for learning by reading, pages 52–60, Los
Angeles, California, Ju. 2010.
[9] L. He, K. Lee, M. Lewis and L. Zettlemoyer, “Deep semantic role labeling: what works and
what’s next,” proceedings of the 55th annual meeting of the association for computational
linguistics, pages 473–483, Vancouver, Canada, Jul. 30 – Aug. 4, 2017.
[10] L. He, M. Lewis and L. Zettlemoyer, “Question-Answer driven semantic role labeling using
natural language to annotate natural language,” University of Washington Seattle, WA.
[11] R. Cai and M. Lapata, “Syntax-aware semantic role labeling without parsing,” transactions
of the association for computational linguistics, vol. 7, pp. 343–356, 6 Ju., 2019.
[12] F. Qian, L. Sha, B. Chang, L. Liu and M. Zhang, “Syntax aware LSTM model for semantic
role labeling,” in proceedings of the 2nd workshop on structured prediction for natural
language processing, pages 27–32 Copenhagen, Denmark, Sep. 7–11, 2017.
[13] Z. Tan, M. Wang, J. Xie, Y. Chen and X. Shi, “Deep semantic role labeling with self-
attention,” association for the advancement of artificial intelligence, Xiamen University,
China, Dec. 5, 2018.
[14] ጌታሁን አማረ “ዘመናዊ የአማርኛ ሰዋሰው በቀላል አቀራረብ”, Addis Ababa, 1989 (EC).
[15] D. Alfano, R. Abbruzzese and D. Cappetta, “Neural semantic role labeling using verb sense
disambiguation,” 2019.
[16] T. Pham, X. Pham and P. Le-Hong, “Building a semantic role labelling system for
Vietnamese,” 11 May 2017.
[17] ባ. ይማም, የአማርኛ ሰዋሰው የተሻሻለ ሁለተኛ ዕትም, አዲስ አበባ, ጥቅምት 2001.
[18] E. Yirga, “Semantic role labeler for Amharic text using memory-based learning,” (MSC
thesis) Addis Ababa University, Ju. 2017.
[19] K. Peffers, T. Tuunanen, M. A. Rothenberger and S. Chatterjee, “A Design Science
Research Methodology for Information Systems Research,” Published in Journal of
Management Information Systems, Volume 24 Issue 3, pp. 45-78, Winter 2007-8.
[20] W. Daelemans and A. van den Bosch, “Memory-Based Learning,” in A draft chapter for
the Blackwell Computational Linguistics and Natural Language Processing Handbook,
University of Antwerp, Alex Clark, Chris Fox and Shalom Lapping, 2009.
[21] W. Ahmed and M. Bahador, “The accuracy of the LSTM model for predicting the S&P
500 index and the difference between prediction and back testing,” School of Electrical
Engineering and Computer Science, Stockholm, Sweden, Ju. 4, 2018.
[22] A. Shelmanov and D. Devyatkin, “Semantic role labeling with neural networks for texts in
Russian,” Proceedings of the International Conference “Dialogue 2017”, Moscow, May
31—Ju. 3, 2017.
[23] M. Roth and M. Lapata, “Neural semantic role labeling with dependency path embeddings,”
School of Informatics, University of Edinburgh, Ju. 18, 2016.
[24] P. Moreda and M. Palomar, “The role of verb sense disambiguation in semantic role
labeling,” natural language processing research group, University of Alicante, 2006.
[25] T. Li and B. Chang, “Semantic role labeling using recursive neural network,” Collaborative
Innovation Center for Language Ability, Xuzhou 221009 China, 2015.
[26] N. Xue, M. Palmer, “Automatic semantic role labeling for Chinese verbs,” University of
Pennsylvania, PA 19104, USA.
[27] M. Diab, A. Moschitti and D. Pighin, “Semantic Role Labeling Systems for Arabic using
Kernel Methods,” Proceedings of Association for Computational Linguistics, pages 798–
806, Columbus, USA, Ju. 2008.
[28] J. Zhou and W. Xu., “End-to-end learning of semantic role labeling using recurrent neural
networks,” in proceedings of the 53rd annual meeting of the association for computational
linguistics and the 7th international joint conference on natural language processing,
pages 1127–1137, Beijing, China, Ju. 26-31, 2015.
[29] K. Wang, “Automatic Semantic Role Labeling for Chinese,” 2010 IEEE/WIC/ACM
International Conference on Web Intelligence and Intelligent Agent Technology, Dalian
University of Technology, China, 2010.
[30] A. Ghalibaf, S. Rahati and A. Estaji, “Shallow Semantic Parsing of Persian Sentences,”
23rd Pacific Asia Conference on Language, Information and Computation, pages 150–
159, 2009.
[31] D. Henrique et al., “Using recurrent neural networks for semantic role labeling in
Portuguese,” 01 Oct. 2019.
[32] D. Jurafsky and J. H. Martin, "Speech and Language Processing: An Introduction to
Natural Language Processing, Computational Linguistics, and Speech Recognition," Book
Review, University of Colorado, Boulder, 2015.
[33] J. Park, “Selectively connected self-attentions for semantic role labeling,” Department of
Computer Science and Engineering, Incheon National University; South Korea, 8 Mar.
2019.
[34] L. Marquez, “Exploring challenges in Semantic Role Labeling,” TALP Research Center
Technical University of Catalonia, Moscow, Russia, May 28, 2013.
[35] T. Samardzic, "Semantic Roles in Natural Language Processing and in Linguistic Theory,"
Unpublished PhD Dissertation, Thesis de lic., Universidad de Ginebra, 2009.
[36] G. Stevens, “Automatic semantic role labeling in a Dutch corpus,” Master thesis in
Universiteit Utrecht, Sep. 2006.
[37] D. Godayal and S. Malhotra, “An introduction to part-of-speech tagging and the Hidden
Markov Model,” 8 Ju., 2018.
[38] T. Liu, W. Che, S. Li, Y. Hu and H. Liu, “Semantic Role Labeling System using Maximum
Entropy Classifier,” Proceedings of the 9th Conference on Computational Natural
Language Learning (CoNLL), pages 189–192, Ann Arbor, Ju., 2005.
[39] T. Mitsumori, M. Murata, Y. Fukuda, K. Doi and H. Doi “Semantic Role Labeling Using
Support Vector Machines,” Graduate School of Information Science, Nara Institute of
Science and Technology.
[40] J. Camacho-Collados and M. Pilehvar, “On the Role of Text Preprocessing in Neural
Network Architectures: An Evaluation Study on Text Categorization and Sentiment
Analysis,” proceedings of the 2018 EMNLP Workshop Blackbox NLP: Analyzing and
Interpreting Neural Networks for NLP, pages 40–46, Brussels, Belgium, Nov. 1, 2018.
[41] T. Cohn and P. Blunsom “Semantic Role Labelling with Tree Conditional Random Fields,”
proceedings of the 9th Conference on Computational Natural Language Learning
(CoNLL), pages 169–172, University of Melbourne, Australia, Ju. 2005.
[42] C. Bonial, O. Babko-Malaya, J. D. Choi, J. Hwang and M. Palmer, “PropBank annotation
guidelines,” Center for Computational Language and Education Research Institute of
Cognitive Science, University of Colorado at Boulder, Dec. 23, 2010.
[43] R. Johansson and P. Nugues, “A FrameNet-based Semantic Role Labeler for Swedish,”
Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions, pages 436–
443, Sydney, Jul 2006.
[44] Fillmore, Charles J., R. Lee-Goldman, and R. Rhodes. "The FrameNet construction," Sign-
based construction grammar (2012): 309-372.
[45] C. Bonial, J. Hwang, J. Bonn, K. Conger, O. Babko-Malaya and M. Palmer, “English
PropBank annotation guidelines,” Center for Computational Language and Education
Research, Institute of Cognitive Science, University of Colorado at Boulder, Nov. 14,
2012.
[46] M. Diab, A. Moschitti and D. Pighin, “CUNIT: A Semantic Role Labeling System for
Modern Standard Arabic,”
[47] D. Marcheggiani, A. Frolov and I. Titov, “A Simple and Accurate Syntax-Agnostic
Neural Model for Dependency-based Semantic Role Labeling,” proceedings of the 21st
Conference on Computational Natural Language Learning (CoNLL 2017), pages 411–
420, Vancouver, Canada, Aug. 3 – Aug. 4, 2017.
[48] T. Dozat, P. Qi and C. Manning, “Stanford’s graph-based dependency parser at the
CoNLL 2017 shared task,” 2017.
[49] L. M. Belue, “An investigation of multilayer perceptrons for classification,” MSc thesis,
Air Force Institute of Technology, Air University, Mar. 1992.
[50] A. Panigrahi, A. Shetty and N. Goyal, “Effect of activation functions on the training of
overparametrized neural nets,” Published as a conference paper at ICLR, Mar. 14, 2020.
[51] A. Gebremariam, “Amharic-to-Tigrigna Machine Translation Using Hybrid Approach,”
Unpublished Master Thesis, College of Natural Science, Addis Ababa University, Oct. 7,
2017.
[52] I. Putrayasa and D. Ramendra, “The Type of Sentence in The Essays of Grade VI
Elementary School Students in Bali Province: A Syntactic Study,” 4th PRASASTI
International Conference on Recent Linguistics Research, volume 166, 2018.
APPENDIXES
Appendix A: List of part of speech tags used in this system and their Description
No.  POS tag Name  Description
1    NOUN          Noun
2    VERB          Verb
3    NOUNP         Noun Phrase
4    NDet          Noun Determiner
5    ADJ           Adjective
6    ADP           Adposition
7    ADV           Adverb
8    ADJP          Adjectival Phrase
9    NUMP          Numeric Phrase
10   NUMCR         Cardinal Number
11   NUMOR         Ordinal Number
12   PROPN         Preposition Noun
13   ADVP          Adverb Phrase
14   CCONJ         Coordinating Conjunction
15   ADPP          Adposition Phrase
16   VP            Verb Phrase
Appendix E: List of PropBank Semantic Roles Used in the System
No.  Semantic Role  Description
1    ARG0-AGT       Agent
2    ARG0-EXP       Experiencer
3    ARG1-PAT       Patient
4    ARG1-DES       Direction or Goal
5    ARG1-BEN       Beneficiary
6    ARG1-SRC       Source
7    ARG1-EXP       Experiencer
8    ARG2-INS       Instrument
9    ARG2-EXP       Experiencer
10   ARG2-BEN       Beneficiary
11   ARG1-TEM       Theme
12   ARGM-LOC       Location
13   ARGM-DES       Destination
14   ARGM-PUR       Purpose
15   ARGM-CAU       Cause
16   ARGM-INS       Instrument
17   ARGM-COM       Comitative
18   ARGM-MNR       Manner
19   ARGM-SRC       Source
20   ARGM-TMP       Time
21   ARGM-EXT       Extent (added during Predicate sense disambiguation)
22   REL            Predicate or Relative (Optional class)
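
The role labels listed above combine a PropBank-style argument slot (ARG0, ARG1, ARG2 or ARGM) with a function tag (AGT, PAT, LOC and so on). The following sketch shows one possible way to split such a label into its two parts, for example when grouping results by argument slot. The helper names are hypothetical and are not part of the system described in this thesis.

```python
def split_role(label):
    """Split a role label such as 'ARG0-AGT' into (slot, function tag).

    Labels without a function tag, such as 'REL', return None as the tag.
    """
    slot, _, function = label.partition("-")
    return slot, (function or None)


def is_modifier(label):
    """True for ARGM-* labels, which mark modifier (adjunct) roles."""
    return split_role(label)[0] == "ARGM"


for label in ["ARG0-AGT", "ARGM-LOC", "REL"]:
    print(label, split_role(label), is_modifier(label))
```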