Semantic Role Labeling based on dependency Tree with multi-features
Hanxiao Shi1,2, Guodong Zhou1, Peide Qian1 1School of Computer Science & Technology
Soochow University Suzhou, China
e-mail: [email protected]
Xiaojun Li2 2Computer and Information Engineering College
Zhejiang Gongshang University Hangzhou, China
Abstract—In this paper, a dependency tree-based semantic role labeling (SRL) system is proposed. We first review the current state of SRL research and compare constituent tree-based SRL with dependency tree-based SRL. The system identifies predicates and automatically creates dependency relations using a dependency parser. It then prunes the nodes that are unrelated to the predicate with an effective pruning algorithm, and extracts additional features beyond Hacioglu's baseline feature set. Finally, the features are fed into a Maximum Entropy classifier to determine the corresponding semantic role label. The system achieves an F1-measure of 81.95 on the WSJ corpus of the CoNLL-2008 SRL shared task.
Keywords- Semantic Role Labeling; Dependency relation; Feature extraction
I. INTRODUCTION
Automatic semantic parsing has always been one of the main goals of natural language understanding. Deep semantic parsing translates natural language into a formal language so that computers can communicate with human beings freely. However, because the problem is so complex, current results are far from ideal. Shallow semantic parsing is a simplified form of deep semantic parsing: it labels only the constituents of a sentence that are related to a predicate with semantic roles such as agent, patient, time, and place. The technique benefits many applications, such as question answering, information extraction, and machine translation. Semantic role labeling (SRL) is one way to achieve shallow semantic parsing; it has a clear task definition and is easy to evaluate, and more and more researchers have paid attention to it in recent years.
At present, mainstream studies of semantic role labeling focus on a variety of statistical machine learning techniques and linguistic features for identifying and classifying semantic roles. Many scholars have contributed to this line of research, including Gildea [1], Surdeanu [2], Xue [3], Pradhan [4], Hacioglu [5], and Liu T. [6]. Gildea [1] carried out early research on semantic role labeling based on syntactic analysis. He presented seven baseline features for classifying roles: predicate, phrase type, sub-categorization, parse tree path, position, voice, and head word. His system achieved an F1-measure of 87%, based on syntactic features computed from the hand-corrected PropBank test set. Surdeanu et al. [2] introduced a new way of automatically identifying predicate-argument structures, based on an extended set of features and inductive decision tree learning; the experimental results were promising for information extraction. Xue [3] proposed an additional set of features, such as syntactic frame, lexicalized constituent type, lexicalized head word, and voice-position combination; experiments showed that these features lead to fairly significant improvements. He also proposed a useful pruning algorithm that filters out the nodes unrelated to the predicate; his system achieved an F1-measure of 88.51% on the PropBank test set. Pradhan et al. [4] formulated the problem as multi-class classification and used a Support Vector Machine (SVM) classifier. Since SVM training time scales exponentially with the number of examples, and about 80% of the nodes in a syntactic tree carry NULL argument labels, they found it efficient to divide training into two stages while maintaining the same accuracy: 1) filter out the nodes that have a very high probability of being NULL; 2) use the remaining training data to train one-versus-all classifiers, one of which is the NULL vs. NON-NULL classifier. The NON-NULL nodes are then further classified into the set of argument labels using the baseline features and some new features. The system achieved an F1-measure of 87%, based on Charniak automatic syntactic analysis of PropBank. Liu T. [6] used a Maximum Entropy classifier and achieved an F1-measure of 77.13% based on a single automatic syntactic analysis.
Compared with SRL based on constituent syntactic analysis, research on SRL based on dependency analysis is less common. Hacioglu [5] first formulated semantic role labeling as the classification of dependency relations into one of several semantic roles. A dependency tree is created from a constituency parse of the input sentence and then linearized into a sequence of dependency relations. A number of features are extracted for each dependency relation from a predefined linguistic context, such as family membership, position, dependent word, and headword. Finally, the features are input to a set of one-versus-all support vector machine (SVM) classifiers to determine the corresponding semantic role label. Experiments on the DepBank dev set and the CoNLL-2004 dev set (created directly from PropBank) yielded F1-measures of 84.6% and 79.8%, respectively. Recently, there has also been interest in deterministic machine-learning based approaches to dependency parsing (Yamada and Matsumoto
2009 International Joint Conference on Bioinformatics, Systems Biology and Intelligent Computing
978-0-7695-3739-9 2009
U.S. Government Work Not Protected by U.S. Copyright
DOI 10.1109/IJCBS.2009.99
618
[7]). In addition to easier portability to other domains and languages, deterministic dependency parsing promises robust and efficient algorithms.
II. SYSTEM DESCRIPTION
Our experiments were carried out on the CoNLL-2008 data set, which includes both verb and noun predicates. Labeling the two predicate types together degrades performance, because their characteristics differ and so do the dependency relations around them: the arguments of a verb predicate can be sibling or child nodes in addition to ancestor nodes, whereas the arguments of a noun predicate more often include ancestor nodes. We therefore treat the two types separately.
The system consists of three main parts: creation of dependency relations, predicate identification, and role labeling (including identification and classification). The process is shown in Fig. 1.
Figure 1. The workflow of the system.
A. Concepts of dependency analysis
There are two types of predicates in SRL, noun predicates and verb predicates. In example (1) there are two predicates, a noun predicate (markets) and a verb predicate (had); the semantic roles below are labeled with respect to the verb predicate 'had':
[A0 Economic news] had [A1 little effect] [AM-LOC on financial markets]. (1)
Fig. 2 shows the dependency relations of example (1), and Fig. 3 illustrates the corresponding dependency tree.
Figure 2. Dependency relation of example (1)
W=had, R=root
  W=news, R=SBJ, ARG=A0
    W=Economic, R=NMOD
  W=effect, R=OBJ, ARG=A1
    W=little, R=NMOD
    W=on, R=NMOD, ARG=AM-LOC
      W=markets, R=PC
        W=financial, R=NMOD
  W=., R=P
Figure 3. Dependency tree of example (1)
In Fig. 3, W denotes the word and R the dependency relation.
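The dependency tree of example (1) can be encoded as a small recursive data structure. The following is a minimal sketch in Python; the class and method names are our own illustration, not the paper's implementation:

```python
from dataclasses import dataclass, field

# A minimal dependency-tree node: W (word), R (dependency relation to the
# head), and an optional semantic role (ARG), as in Fig. 3.
@dataclass
class DepNode:
    word: str
    rel: str
    arg: str = None
    children: list = field(default_factory=list)

    def add(self, child):
        self.children.append(child)
        return child

# Encode the dependency tree of example (1), rooted at the verb "had".
root = DepNode("had", "root")
news = root.add(DepNode("news", "SBJ", arg="A0"))
news.add(DepNode("Economic", "NMOD"))
effect = root.add(DepNode("effect", "OBJ", arg="A1"))
effect.add(DepNode("little", "NMOD"))
on = effect.add(DepNode("on", "NMOD", arg="AM-LOC"))
markets = on.add(DepNode("markets", "PC"))
markets.add(DepNode("financial", "NMOD"))
root.add(DepNode(".", "P"))

def roles(node):
    """Collect (word, relation, role) for every role-bearing node, depth-first."""
    out = [(node.word, node.rel, node.arg)] if node.arg else []
    for c in node.children:
        out += roles(c)
    return out
```

Walking the tree with roles(root) recovers exactly the three labeled arguments of example (1): A0 on "news", A1 on "effect", and AM-LOC on "on".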
B. Predicate identification and creation of dependency relations
Our experiments use the CoNLL-2008 shared task test set, in which predicates are not given and must be determined automatically by the system. With reference to the CoNLL-2005 shared task (where verb predicates are labeled), our system automatically labels verb predicates in the CoNLL-2008 test set using the CoNLL-2005 corpus, achieving an accuracy of 98.7%. Noun predicates are identified by machine learning: the system uses a Maximum Entropy classifier and achieves an accuracy of 94.5%. The features include the dependency relation, word, POS, headword, headword POS, and so on.
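The feature extraction for noun-predicate identification can be sketched as follows. A Maximum Entropy classifier over such binary indicator features is equivalent to multinomial logistic regression; the token layout and function name below are our own illustration, not the paper's code:

```python
# Features for deciding whether a token is a noun predicate, following the
# feature list above: dependency relation, word, POS, headword, headword POS.
def noun_predicate_features(tokens, i):
    """tokens: list of dicts with keys word, pos, rel (dependency relation
    to the head) and head (index of the head token, -1 for the root)."""
    tok = tokens[i]
    head = tokens[tok["head"]] if tok["head"] >= 0 else None
    return {
        "rel=" + tok["rel"]: 1,
        "word=" + tok["word"].lower(): 1,
        "pos=" + tok["pos"]: 1,
        "headword=" + (head["word"].lower() if head else "ROOT"): 1,
        "headpos=" + (head["pos"] if head else "ROOT"): 1,
    }

# Example (1): "Economic news had little effect on financial markets ."
sent = [
    {"word": "Economic",  "pos": "JJ",  "rel": "NMOD", "head": 1},
    {"word": "news",      "pos": "NN",  "rel": "SBJ",  "head": 2},
    {"word": "had",       "pos": "VBD", "rel": "root", "head": -1},
    {"word": "little",    "pos": "JJ",  "rel": "NMOD", "head": 4},
    {"word": "effect",    "pos": "NN",  "rel": "OBJ",  "head": 2},
    {"word": "on",        "pos": "IN",  "rel": "NMOD", "head": 4},
    {"word": "financial", "pos": "JJ",  "rel": "NMOD", "head": 7},
    {"word": "markets",   "pos": "NNS", "rel": "PC",   "head": 5},
    {"word": ".",         "pos": ".",   "rel": "P",    "head": 2},
]

feats = noun_predicate_features(sent, 7)   # candidate noun predicate: "markets"
```

For the candidate "markets", the extracted features include its PC relation, its NNS tag, and its headword "on" with tag IN; a trained Maximum Entropy model weights such indicators to decide predicate vs. non-predicate.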
At present, many parser tools are available for creating dependency relations. Our system uses MaltParser and MSTParser to generate the dependency relations; a comparison of their results is shown in Table I. The results show that MSTParser performs better.
TABLE I. PERFORMANCE COMPARISON OF MALTPARSER AND MSTPARSER
Parser       Labeled attachment   Unlabeled attachment   Label accuracy
MaltParser   85.50%               88.41%                 90.41%
MSTParser    87.01%               89.72%                 91.75%
C. Pruning and preprocessing
Our system prunes the nodes that are unrelated to the predicate; three different pruning algorithms are combined for noun and verb predicates.
(1) Hacioglu's algorithm filters out unlikely dependency relation nodes in a dependency tree by keeping only the parent, children, and grandchildren of the predicate, the siblings of the predicate, and the children and grandchildren of those siblings.
(2) The new Hacioglu algorithm extends Hacioglu's algorithm by including nodes more layers upward and downward with respect to the predicate's parent, such as the predicate's grandparent, the grandparent's children, the grandchildren's children, the siblings of children, and the siblings of grandchildren.
(3) Xue's algorithm keeps all nodes, and their children, on the path from the current predicate node to the root.
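Algorithm (1), Hacioglu's pruning rule, can be sketched as follows. The node representation (a parent index per node) is our own illustration, assuming the rule exactly as stated above:

```python
# Hacioglu's pruning rule: keep only the predicate's parent, the children
# and grandchildren of the predicate, the siblings of the predicate, and
# the children and grandchildren of those siblings.
def hacioglu_prune(parents, pred):
    """parents: list where parents[i] is the parent index of node i (-1 = root).
    pred: index of the predicate node. Returns the set of kept candidate nodes."""
    def children(n):
        return [i for i, p in enumerate(parents) if p == n]

    keep = set()
    par = parents[pred]
    if par >= 0:
        keep.add(par)
    # siblings of the predicate (other children of its parent)
    sibs = [s for s in children(par) if s != pred] if par >= 0 else []
    for base in [pred] + sibs:
        if base != pred:
            keep.add(base)
        for c in children(base):           # children of predicate / siblings
            keep.add(c)
            keep.update(children(c))       # their grandchildren
    return keep
```

On the tree of example (1), indexing the words 0-8 in sentence order with the verb predicate "had" at index 2, the rule keeps news, effect, ".", Economic, little, and on, so all three arguments (A0, A1, AM-LOC) survive, while markets and financial are pruned away.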
After the two-class (NULL vs. non-NULL) identification and classification, the system keeps all nodes labeled with non-NULL roles, together with the NULL nodes whose NULL probability falls below the threshold. The results are shown in Table II.
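The thresholded filtering described above amounts to a one-line decision per node. In this sketch the probabilities are made-up illustrations; only the rule (keep a node when its NULL probability is below the threshold, e.g. 0.9 for verb predicates) comes from the text:

```python
# Keep a node as a role candidate when its NULL probability is below the
# pruning threshold; confidently-NULL nodes are discarded.
def keep_candidate(null_prob, threshold=0.9):
    return null_prob < threshold

# Illustrative (word, P(NULL)) pairs: "little" is confidently NULL and is dropped.
cands = [(w, p) for w, p in [("news", 0.05), ("little", 0.97), ("on", 0.40)]
         if keep_candidate(p)]
```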
TABLE II. PREDICATE IDENTIFICATION USING PRUNING ALGORITHM
                  Pruning algorithm   Threshold   Miss-pruning (train set)   Miss-pruning (test set, gold)
Verb predicate    Xue + Hacioglu      0.9         0.7%                       0.7%
Noun predicate    New Hacioglu        0.95        2.9%                       35.5%
In the feature files after preprocessing, our system replaces the roles that appear fewer than 200 times, such as A5, AM-PRD, C-AM-ADV, and R-A2, with NULL, since such rare roles can mislead the trained model.
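This rare-role replacement is a simple frequency filter over the training labels. A minimal sketch, with the function name as our own illustration:

```python
from collections import Counter

# Replace roles that appear fewer than min_count times in the training data
# with NULL, so rare labels (A5, AM-PRD, C-AM-ADV, R-A2, ...) do not
# mislead the model. The paper uses min_count = 200.
def replace_rare_roles(labels, min_count=200):
    counts = Counter(labels)
    return [lab if counts[lab] >= min_count else "NULL" for lab in labels]
```

For example, with min_count=2 the single occurrence of A5 in ["A0", "A0", "A5"] is mapped to NULL while the frequent A0 labels are kept.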
D. Features
Starting from the eight baseline features proposed by Hacioglu, the system adds several advanced features. All features and their illustrations are shown in Table III, where we take the word "markets" in Fig. 2 as the predicate and the node "had" as the node in focus.
TABLE III. FEATURES AND DEMONSTRATION
Baseline features:
- Predicate: the predicate lemma (market)
- Predicate POS: the POS of the current predicate (NN)
- Predicate voice: whether the (verb) predicate is realized as an active or passive construction (_)
- Sub-categorization: the relation label sequence of the predicate's children (NMOD)
- Path: the chain of relations from the relation node to the predicate (NMOD->PC->NMOD->OBJ)
- Position: the position of the headword of the dependency relation with respect to the predicate in the sentence (before)
- Dependency relation type: the type of the dependency relation (NMOD)
- Headword: the modified (head) word in the relation (on)

Advanced features:
- Dependency relation pattern of predicate's children: the left-to-right chain of the relation labels of the current predicate's children (NMOD)
- POS chain of predicate's siblings: the left-to-right chain of the POS tags of the current predicate's siblings
- Dependency relation chain of predicate's siblings: the left-to-right chain of the relation labels of the predicate's siblings (_)
- Dependency relation pattern of predicate: the left-to-right chain of the relation labels of the predicate (NMOD)
- POS chain of predicate's children: the left-to-right chain of the POS tags of the current predicate's children
- Family member: the relationship between the current node and the predicate node in the family tree, such as parent, child, or sibling (child)
- Dependent word: the word of the current node (had)
- POS of headword: the POS tag of the headword of the current word (IN)
- POS of dependent word: the POS tag of the current word (VBD)
- Predicate + headword: the combination of the predicate lemma and the headword (market + on)
- Headword + relation type: the combination of the headword and the relation type (on + NMOD)
III. SYSTEM RESULTS AND PERFORMANCE ANALYSIS
Our system uses the WSJ corpus supplied by the CoNLL-2008 shared task; the data comes from PropBank and NomBank and includes train, dev, and test sets.
A. Feature performance
In order to better evaluate the contribution of the various additional features, we build a baseline system using hand-corrected dependency relations and the eight basic features. Table IV shows the effect of adding each additional feature individually to the baseline system.
TABLE IV. THE CHANGE IN PERFORMANCE AFTER ADDING EACH SINGLE FEATURE (%)

                                                  Precision   Recall   F1
Gold baseline                                     85.29       79.71    82.39
+ family member                                   85.69       79.88    82.69
+ dependent word                                  87.74       84.01    85.84
+ POS of headword                                 85.44       79.55    82.37
+ POS of dependent word                           85.42       79.33    82.47
+ POS chain of predicate's children               85.35       79.73    82.47
+ dependency relation chain of pred.'s children   85.77       79.99    82.81
+ dependency relation chain of pred.'s siblings   85.29       79.52    82.30
+ POS chain of predicate's siblings               84.75       79.32    81.95
+ predicate                                       85.03       79.83    82.34
+ predicate + headword                            84.30       79.94    82.30
+ relation type + headword                        85.65       80.36    82.93
+ POS of headword + POS of dependent word         85.33       79.81    82.46
+ family member + headword                        85.44       79.70    82.42
+ predicate + relation type + headword            86.13       79.99    83.12
Table IV shows that the dependent word feature is the most useful, improving the labeled F1 score from 82.39% to 85.84%. This large gain is understandable: a dependency relation mainly captures the relation between a dependent word and its headword, so the dependent word strongly influences dependency analysis and hence overall system performance. The table also shows that the two features involving the predicate's siblings deteriorate performance, so we removed them from our experiments. Although the combined feature "predicate + headword" is useful in constituent tree-based SRL, it slightly decreases performance in dependency tree-based SRL; for convenience, we still keep it in our system.
B. Performance comparison
After applying the pruning algorithms and the features above, the performance of the dependency-based analysis improves noticeably; the results are shown in Table V.
TABLE V. PERFORMANCE COMPARISON USING MSTPARSER AND MALTPARSER WITH PREDICATES AUTOMATICALLY IDENTIFIED ON THE CONLL-2008 WSJ TEST SET

             Precision   Recall    F1
Gold         88.46%      84.84%    86.63%
MaltParser   77.11%      73.12%    75.06%
MSTParser    83.49%      80.50%    81.95%
The results show that MSTParser-based SRL performs slightly better than MaltParser-based SRL, but by much less than the performance difference between the two parsers on dependency parsing itself. This suggests that the gap between these two state-of-the-art dependency parsers does not strongly affect the corresponding SRL systems. Our results also improve on those of Hacioglu et al., who achieved an F1-measure of 79.8% on the CoNLL-2004 shared task test set, because we adopt additional features and an effective pruning algorithm.
IV. CONCLUSIONS
This paper describes an SRL system based on dependency analysis. To improve system performance, we proposed several additional features. Our system automatically labels verb predicates and identifies noun predicates by machine learning, using a Maximum Entropy classifier that achieves an accuracy of 94.5%. Our pruning algorithms cut off the nodes that are unrelated to the predicate, which further improves overall performance. With these techniques, the system achieves good performance in our experiments. Several directions remain for further improvement:
(1) Improve the pruning algorithm, in particular the pruning algorithm for noun predicates.
(2) Select effective features or multi-features, and combine different features to achieve better performance.
(3) Try an SVM classifier for determining the semantic role label; Pradhan et al. report that SVMs can work better, and performance can often be improved further by tuning the SVM parameters.
ACKNOWLEDGMENT
This work was supported by the Grand Science Project of Zhejiang Province of China under Grant No. 2008C13082. We would also like to thank the referees for their helpful comments and suggestions.
REFERENCES
[1] Gildea D., Jurafsky D., "Automatic labeling of semantic roles," Computational Linguistics, vol. 28, Mar. 2002, pp. 245-288.
[2] Surdeanu M., Harabagiu S., Williams J., et al., "Using predicate-argument structures for information extraction," Proc. Association for Computational Linguistics (ACL 2003), 2003, pp. 8-15.
[3] Xue N., Palmer M., "Calibrating features for semantic role labeling," Proc. Empirical Methods in Natural Language Processing (EMNLP 2004), 2004, pp. 88-94.
[4] Pradhan S., Ward W., Hacioglu K., et al., "Shallow semantic parsing using support vector machines," Proc. NAACL-HLT 2004, 2004.
[5] Hacioglu K., "Semantic role labeling using dependency trees," Proc. Computational Natural Language Learning (CoNLL-2004), 2004.
[6] Liu T., Che W., Li S., "Semantic role labeling with Maximum Entropy classifier," Journal of Software, vol. 18, Mar. 2007, pp. 565-573.
[7] Yamada H., Matsumoto Y., "Statistical dependency analysis with support vector machines," Proc. IWPT'03, 2003.