Semantic Role Labeling based on dependency Tree with multi-features
Hanxiao Shi1,2, Guodong Zhou1, Peide Qian1 1School of Computer Science & Technology
Soochow University Suzhou, China
e-mail: [email protected]
Xiaojun Li2 2Computer and Information Engineering College
Zhejiang Gongshang University Hangzhou, China
Abstract—In this paper, a dependency tree-based semantic role labeling (SRL) system is proposed. We first review the current state of SRL research and compare constituent tree-based SRL with dependency tree-based SRL. The system identifies predicates and automatically creates dependency relations using a dependency parser. It then prunes the nodes that are unrelated to the predicate with an effective pruning algorithm, and extracts additional features beyond Hacioglu's baseline feature set. Finally, the features are fed into a Maximum Entropy classifier to determine the corresponding semantic role label. The system achieves an F1-measure of 81.95 on the WSJ corpus of the CoNLL-2008 SRL shared task.
Keywords- Semantic Role Labeling; Dependency relation; Feature extraction
I. INTRODUCTION
Automatic semantic parsing has always been one of the main goals of natural language understanding. Deep semantic parsing translates natural language into a formal language so that computers can communicate with human beings freely. However, because the problem is so complex, current results are far from ideal. Shallow semantic parsing is a simplified form of deep semantic parsing: it labels only the constituents of a sentence that are related to a predicate with semantic roles such as agent, patient, time, and place. The technique benefits many applications, such as question answering, information extraction, and machine translation. Semantic role labeling (SRL) is one way to achieve shallow semantic parsing; it has a clear task definition and is easy to evaluate, and more and more researchers have paid attention to it in recent years.
At present, mainstream studies of semantic role labeling focus on a variety of statistical machine learning techniques and linguistic features for identifying and classifying semantic roles. Many scholars have contributed to this line of research, including Gildea [1], Surdeanu [2], Xue [3], Pradhan [4], Hacioglu [5], and Liu T. [6]. Gildea [1] carried out early research on semantic role labeling based on syntactic analysis. He presented seven baseline features for classifying roles: predicate, phrase type, sub-categorization, parse tree path, position, voice, and head word. His system achieved an F1-measure of 87%, based on syntactic features computed from the hand-corrected PropBank test set. Surdeanu et al. [2] introduced a new way of automatically identifying predicate-argument structures, based on an extended set of features and inductive decision tree learning; the experimental results were promising for information extraction. Xue [3] proposed an additional set of features, such as syntactic frame, lexicalized constituent type, lexicalized head word, and voice-position combination; experiments showed that these features lead to fairly significant improvements. He also proposed a useful pruning algorithm that filters out the nodes unrelated to the predicate; his system achieved an F1-measure of 88.51% on the PropBank test set. Pradhan et al. [4] formulated the problem as multi-class classification and used a Support Vector Machine (SVM) classifier. Since SVM training time scales exponentially with the number of examples, and about 80% of the nodes in a syntactic tree carry NULL argument labels, they found it efficient to divide training into two stages while maintaining the same accuracy: 1) filter out the nodes that have a very high probability of being NULL; 2) use the remaining training data to train one-versus-all classifiers, one of which is the NULL vs. NON-NULL classifier. The NON-NULL nodes are then further classified into the set of argument labels using the baseline features and some new features. The system achieved an F1-measure of 87%, based on Charniak automatic syntactic analysis of PropBank. Liu T. [6] used a Maximum Entropy classifier and achieved an F1-measure of 77.13% based on a single automatic syntactic analysis.
Compared with SRL based on constituent syntactic analysis, research on SRL based on dependency analysis is less common. Hacioglu [5] first formulated semantic role labeling as the classification of dependency relations into one of several semantic roles. A dependency tree is created from a constituency parse of the input sentence and then linearized into a sequence of dependency relations. A number of features are extracted for each dependency relation from a predefined linguistic context, such as family membership, position, dependent word, and headword. Finally, the features are input to a set of one-versus-all support vector machine (SVM) classifiers to determine the corresponding semantic role label. Experiments on the DepBank dev set and the CoNLL-2004 dev set (created directly from PropBank) yielded F1-measures of 84.6% and 79.8%, respectively. Recently, there has also been interest in deterministic machine-learning based approaches to dependency parsing (Yamada and Matsumoto
2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing
978-0-7695-3739-9 2009
U.S. Government Work Not Protected by U.S. Copyright
DOI 10.1109/IJCBS.2009.99
618
[7]). In addition to easier portability to other domains and languages, deterministic dependency parsing promises robust and efficient algorithms.
II. SYSTEM DESCRIPTION
Our experiments were carried out on the CoNLL-2008 data set, which includes both verb and noun predicates. Labeling the two predicate types together degrades performance, because their characteristics differ and so do the dependency relations around them: the arguments of a verb predicate can be sibling or child nodes in addition to ancestor nodes, whereas the arguments of a noun predicate more often include ancestor nodes. We therefore treat the two types separately.
The system consists of three main parts: creation of dependency relations, predicate identification, and role labeling (including identification and classification). The process is shown in Fig. 1.
Figure 1. The workflow of the system.
A. Concepts of dependency analysis
There are two types of predicates in SRL, noun predicates and verb predicates. In example (1) there are two predicates, a noun predicate (markets) and a verb predicate (had); the semantic roles below are labeled with respect to the verb predicate 'had':
[A0 Economic news] had [A1 little effect] [AM-LOC on financial markets]. (1)
Fig. 2 shows the dependency relations of example (1), and Fig. 3 illustrates the corresponding dependency tree.
Figure 2. Dependency relation of example (1)
W=had, R=root
  W=news, R=SBJ, ARG=A0
    W=Economic, R=NMOD
  W=effect, R=OBJ, ARG=A1
    W=little, R=NMOD
    W=on, R=NMOD, ARG=AM-LOC
      W=markets, R=PC
        W=financial, R=NMOD
  W=., R=P
Figure 3. Dependency tree of example (1)
In Fig. 3, W denotes the word and R the dependency relation.
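The dependency tree of example (1) can be encoded as a small recursive data structure. The following is a minimal sketch in Python; the class and method names are our own illustration, not the paper's implementation:

```python
from dataclasses import dataclass, field

# A minimal dependency-tree node: W (word), R (dependency relation to the
# head), and an optional semantic role (ARG), as in Fig. 3.
@dataclass
class DepNode:
    word: str
    rel: str
    arg: str = None
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

# Encode the dependency tree of example (1), rooted at the verb "had".
root = DepNode("had", "root")
news = root.add(DepNode("news", "SBJ", arg="A0"))
news.add(DepNode("Economic", "NMOD"))
effect = root.add(DepNode("effect", "OBJ", arg="A1"))
effect.add(DepNode("little", "NMOD"))
on = effect.add(DepNode("on", "NMOD", arg="AM-LOC"))
markets = on.add(DepNode("markets", "PC"))
markets.add(DepNode("financial", "NMOD"))
root.add(DepNode(".", "P"))

def roles(node):
    """Collect (word, relation, role) for every role-bearing node, depth-first."""
    out = [(node.word, node.rel, node.arg)] if node.arg else []
    for c in node.children:
        out += roles(c)
    return out
```

Walking the tree with roles(root) recovers exactly the three labeled arguments of example (1): A0 on "news", A1 on "effect", and AM-LOC on "on".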
B. Predicate identification and creation of dependency relations
Our experiments use the CoNLL-2008 shared task test set, in which predicates are not given and must be determined automatically by the system. With reference to the CoNLL-2005 shared task (where verb predicates are labeled), our system automatically labels verb predicates in the CoNLL-2008 test set using the CoNLL-2005 corpus, achieving an accuracy of 98.7%. Noun predicates are identified by machine learning: the system uses a Maximum Entropy classifier and achieves an accuracy of 94.5%. The features include the dependency relation, word, POS, headword, headword POS, and so on.
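The feature extraction for noun-predicate identification can be sketched as follows. A Maximum Entropy classifier over such binary indicator features is equivalent to multinomial logistic regression; the token layout and function name below are our own illustration, not the paper's code:

```python
# Features for deciding whether a token is a noun predicate, following the
# feature list above: dependency relation, word, POS, headword, headword POS.
def noun_predicate_features(tokens, i):
    """tokens: list of dicts with keys word, pos, rel (dependency relation
    to the head) and head (index of the head token, -1 for the root)."""
    tok = tokens[i]
    head = tokens[tok["head"]] if tok["head"] >= 0 else None
    return {
        "rel=" + tok["rel"]: 1,
        "word=" + tok["word"].lower(): 1,
        "pos=" + tok["pos"]: 1,
        "headword=" + (head["word"].lower() if head else "ROOT"): 1,
        "headpos=" + (head["pos"] if head else "ROOT"): 1,
    }

# Example (1): "Economic news had little effect on financial markets ."
sent = [
    {"word": "Economic",  "pos": "JJ",  "rel": "NMOD", "head": 1},
    {"word": "news",      "pos": "NN",  "rel": "SBJ",  "head": 2},
    {"word": "had",       "pos": "VBD", "rel": "root", "head": -1},
    {"word": "little",    "pos": "JJ",  "rel": "NMOD", "head": 4},
    {"word": "effect",    "pos": "NN",  "rel": "OBJ",  "head": 2},
    {"word": "on",        "pos": "IN",  "rel": "NMOD", "head": 4},
    {"word": "financial", "pos": "JJ",  "rel": "NMOD", "head": 7},
    {"word": "markets",   "pos": "NNS", "rel": "PC",   "head": 5},
    {"word": ".",         "pos": ".",   "rel": "P",    "head": 2},
]

feats = noun_predicate_features(sent, 7)   # candidate noun predicate: "markets"
```

For the candidate "markets", the extracted features include its PC relation, its NNS tag, and its headword "on" with tag IN; a trained Maximum Entropy model weights such indicators to decide predicate vs. non-predicate.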
At present, many parser tools are available for creating dependency relations. Our system uses MaltParser and MSTParser to generate the dependency relations; a comparison of their results is shown in Table I. The results show that MSTParser performs better.
TABLE I. PERFORMANCE COMPARISON OF MALTPARSER AND MSTPARSER
Parser       Labeled attachment   Unlabeled attachment   Label accuracy
MaltParser   85.50%               88.41%                 90.41%
MSTParser    87.01%               89.72%                 91.75%
C. Pruning and preprocessing
Our system prunes the nodes that are unrelated to the predicate; three different pruning algorithms are combined for noun and verb predicates.
(1) Hacioglu's algorithm filters out unlikely dependency relation nodes in a dependency tree by keeping only the parent, children, and grandchildren of the predicate, the siblings of the predicate, and the children and grandchildren of those siblings.
(2) The new Hacioglu algorithm extends Hacioglu's algorithm by including nodes more layers upward and downward with respect to the predicate's parent, such as the predicate's grandparent, the grandparent's children, the grandchildren's children, the siblings of children, and the siblings of grandchildren.
(3) Xue's algorithm keeps all nodes, and their children, on the path from the current predicate node to the root.
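Algorithm (1), Hacioglu's pruning rule, can be sketched as follows. The node representation (a parent index per node) is our own illustration, assuming the rule exactly as stated above:

```python
# Hacioglu's pruning rule: keep only the predicate's parent, the children
# and grandchildren of the predicate, the siblings of the predicate, and
# the children and grandchildren of those siblings.
def hacioglu_prune(parents, pred):
    """parents: list where parents[i] is the parent index of node i (-1 = root).
    pred: index of the predicate node. Returns the set of kept candidate nodes."""
    def children(n):
        return [i for i, p in enumerate(parents) if p == n]

    keep = set()
    par = parents[pred]
    if par >= 0:
        keep.add(par)
    # siblings of the predicate (other children of its parent)
    sibs = [s for s in children(par) if s != pred] if par >= 0 else []
    for base in [pred] + sibs:
        if base != pred:
            keep.add(base)
        for c in children(base):           # children of predicate / siblings
            keep.add(c)
            keep.update(children(c))       # their grandchildren
    return keep
```

On the tree of example (1), indexing the words 0-8 in sentence order with the verb predicate "had" at index 2, the rule keeps news, effect, ".", Economic, little, and on, so all three arguments (A0, A1, AM-LOC) survive, while markets and financial are pruned away.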
After the two-class (NULL vs. non-NULL) identification and classification, the system keeps all nodes labeled with non-NULL roles, together with the NULL nodes whose NULL probability falls below the threshold. The results are shown in Table II.
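The thresholded filtering described above amounts to a one-line decision per node. In this sketch the probabilities are made-up illustrations; only the rule (keep a node when its NULL probability is below the threshold, e.g. 0.9 for verb predicates) comes from the text:

```python
# Keep a node as a role candidate when its NULL probability is below the
# pruning threshold; confidently-NULL nodes are discarded.
def keep_candidate(null_prob, threshold=0.9):
    return null_prob < threshold

# Illustrative (word, P(NULL)) pairs: "little" is confidently NULL and is dropped.
cands = [(w, p) for w, p in [("news", 0.05), ("little", 0.97), ("on", 0.40)]
         if keep_candidate(p)]
```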
TABLE II. PREDICATE IDENTIFICATION USING PRUNING ALGORITHM
                  Pruning algorithm   Threshold   Miss-pruning (train set)   Miss-pruning (test set, gold)
Verb predicate    Xue + Hacioglu      0.9         0.7%                       0.7%
Noun predicate    New Hacioglu        0.95        2.9%                       35.5%
In the feature files after preprocessing, our system replaces the roles that appear fewer than 200 times, such as A5, AM-PRD, C-AM-ADV, and R-A2, with NULL, since such rare roles can mislead the trained model.
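This rare-role replacement is a simple frequency filter over the training labels. A minimal sketch, with the function name as our own illustration:

```python
from collections import Counter

# Replace roles that appear fewer than min_count times in the training data
# with NULL, so rare labels (A5, AM-PRD, C-AM-ADV, R-A2, ...) do not
# mislead the model. The paper uses min_count = 200.
def replace_rare_roles(labels, min_count=200):
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else "NULL" for lab in labels]
```

For example, with min_count=2 the single occurrence of A5 in ["A0", "A0", "A5"] is mapped to NULL while the frequent A0 labels are kept.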
D. Features
Starting from the eight baseline features proposed by Hacioglu, the system adds several advanced features. All features and their illustrations are shown in Table III, where we take the word "markets" in Fig. 2 as the predicate and the node "had" as the node in focus.
TABLE III. FEATURES AND DEMONSTRATION
Baseline features:
- Predicate: the predicate lemma (market)
- Predicate POS: the POS of the current predicate (NN)
- Predicate voice: whether the (verb) predicate is realized as an active or passive construction (_)
- Sub-categorization: the relation label sequence of the predicate's children (NMOD)
- Path: the chain of relations from the relation node to the predicate (NMOD->PC->NMOD->OBJ)
- Position: the position of the headword of the dependency relation with respect to the predicate in the sentence (before)
- Dependency relation type: the type of the dependency relation (NMOD)
- Headword: the modified (head) word in the relation (on)

Advanced features:
- Dependency relation pattern of predicate's children: the left-to-right chain of the relation labels of the current predicate's children (NMOD)
- POS chain of predicate's siblings: the left-to-right chain of the POS tags of the current predicate's siblings
- Dependency relation chain of predicate's siblings: the left-to-right chain of the relation labels of the predicate's siblings (_)
- Dependency relation pattern of predicate: the left-to-right chain of the relation labels of the predicate (NMOD)
- POS chain of predicate's children: the left-to-right chain of the POS tags of the current predicate's children
- Family member: the relationship between the current node and the predicate node in the family tree, such as parent, child, or sibling (child)
- Dependent word: the word of the current node (had)
- POS of headword: the POS tag of the headword of the current word (IN)
- POS of dependent word: the POS tag of the current word (VBD)
- Predicate + headword: the combination of the predicate lemma and the headword (market + on)
- Headword + relation type: the combination of the headword and the relation type (on + NMOD)
III. SYSTEM RESULTS AND PERFORMANCE ANALYSIS
Our system uses the WSJ corpus supplied by the CoNLL-2008 shared task; the data comes from PropBank and NomBank and includes train, dev, and test sets.
A. Feature performance
In order to better evaluate the contribution of the various additional features, we build a baseline system using hand-corrected dependency relations and the eight basic features. Table IV shows the effect of adding each additional feature individually to the baseline system.
TABLE IV. THE CHANGE IN PERFORMANCE AFTER ADDING EACH SINGLE FEATURE (%)

                                                  Precision   Recall   F1
Gold baseline                                     85.29       79.71    82.39
+ family member                                   85.69       79.88    82.69
+ dependent word                                  87.74       84.01    85.84
+ POS of headword                                 85.44       79.55    82.37
+ POS of dependent word                           85.42       79.33    82.47
+ POS chain of predicate's children               85.35       79.73    82.47
+ dependency relation chain of pred.'s children   85.77       79.99    82.81
+ dependency relation chain of pred.'s siblings   85.29       79.52    82.30
+ POS chain of predicate's siblings               84.75       79.32    81.95
+ predicate                                       85.03       79.83    82.34
+ predicate + headword                            84.30       79.94    82.30
+ relation type + headword                        85.65       80.36    82.93
+ POS of headword + POS of dependent word         85.33       79.81    82.46
+ family member + headword                        85.44       79.70    82.42
+ predicate + relation type + headword            86.13       79.99    83.12
Table IV shows that the dependent word feature is the most useful, improving the labeled F1 score from 82.39% to 85.84%. This large gain is understandable: a dependency relation mainly captures the relation between a dependent word and its headword, so the dependent word strongly influences dependency analysis and hence overall system performance. The table also shows that the two features involving the predicate's siblings deteriorate performance, so we removed them from our experiments. Although the combined feature "predicate + headword" is useful in constituent tree-based SRL, it slightly decreases performance in dependency tree-based SRL; for convenience, we still keep it in our system.
B. Performance comparison
After applying the pruning algorithms and the features above, the performance of the dependency-based analysis improves noticeably; the results are shown in Table V.
TABLE V. PERFORMANCE COMPARISON USING MSTPARSER AND MALTPARSER WITH PREDICATES AUTOMATICALLY IDENTIFIED ON THE CONLL-2008 WSJ TEST SET

             Precision   Recall    F1
Gold         88.46%      84.84%    86.63%
MaltParser   77.11%      73.12%    75.06%
MSTParser    83.49%      80.50%    81.95%
The results show that MSTParser-based SRL performs slightly better than MaltParser-based SRL, but by much less than the performance difference between the two parsers on dependency parsing itself. This suggests that the gap between these two state-of-the-art dependency parsers does not strongly affect the corresponding SRL systems. Our results also improve on those of Hacioglu et al., who achieved an F1-measure of 79.8% on the CoNLL-2004 shared task test set, because we adopt additional features and an effective pruning algorithm.
IV. CONCLUSIONS
This paper describes an SRL system based on dependency analysis. To improve system performance, we proposed several additional features. Our system automatically labels verb predicates and identifies noun predicates by machine learning, using a Maximum Entropy classifier that achieves an accuracy of 94.5%. Our pruning algorithms cut off the nodes that are unrelated to the predicate, which further improves overall performance. With these techniques, the system achieves good performance in our experiments. Several directions remain for further improvement:
(1) Improve the pruning algorithm, in particular the pruning algorithm for noun predicates.
(2) Select effective features or multi-features, and combine different features to achieve better performance.
(3) Try an SVM classifier for determining the semantic role label; Pradhan et al. report that SVMs can work better, and performance can often be improved further by tuning the SVM parameters.
ACKNOWLEDGMENT
This work was supported by the Grand Science Project of Zhejiang Province of China under Grant No. 2008C13082. We would also like to thank the referees for their helpful comments and suggestions.
REFERENCES
[1] Gildea D., Jurafsky D., "Automatic labeling of semantic roles," Computational Linguistics, vol. 28, Mar. 2002, pp. 245-288.
[2] Surdeanu M., Harabagiu S., Williams J., et al., "Using predicate-argument structures for information extraction," Proc. Association for Computational Linguistics (ACL 2003), 2003, pp. 8-15.
[3] Xue N., Palmer M., "Calibrating features for semantic role labeling," Proc. Empirical Methods in Natural Language Processing (EMNLP 2004), 2004, pp. 88-94.
[4] Pradhan S., Ward W., Hacioglu K., et al., "Shallow semantic parsing using support vector machines," Proc. NAACL-HLT 2004, 2004.
[5] Hacioglu K., "Semantic role labeling using dependency trees," Proc. Computational Natural Language Learning (CoNLL-2004), 2004.
[6] Liu T., Che W., Li S., "Semantic role labeling with Maximum Entropy classifier," Journal of Software, vol. 18, Mar. 2007, pp. 565-573.
[7] Yamada H., Matsumoto Y., "Statistical dependency analysis with support vector machines," Proc. IWPT'03, 2003.