Automated Verb Sense Labelling Based on Linked Lexical Resources. Presentation by Judith Eckle-Kohler at EACL 2014 in Gothenburg. Joint work with Kostadin Cholakov and Iryna Gurevych


Kostadin Cholakov, Judith Eckle-Kohler and Iryna Gurevych

Automated Verb Sense Labelling

Based on Linked Lexical Resources


Outline

Automated Verb Sense Labelling in a Nutshell

Evaluation

Take Home Messages

April 28, 2014 | Computer Science Department | UKP Lab Prof. Iryna Gurevych | Dr. Judith Eckle-Kohler


Motivation


Sense-annotated corpora are important resources in NLP

They are usually created manually, which is time-consuming and expensive

Verbs have more senses than other word classes, so annotating verb senses is more difficult

Solution

Use a large-scale linked lexical resource (UBY) to create data annotated with verb senses automatically

Linking Lexical Resources at the Word Sense Level – example: UBY

[Figure: lexical resources linked in UBY; surviving labels: Web 2.0, IMSLex-Subcat]


Open Source Java API: http://code.google.com/p/uby/


Automated Verb Sense Labelling: Approach

UBY: verb sense patterns derived from lexical information

Corpus: verb sense patterns derived from verb instances

The two kinds of patterns are compared with a similarity metric

WN ask%2:32:01 (make a request or demand for something to somebody) is linked to FN Id 639 (request to do or give something):

As twenty are required it might pay to ask your supplier for a "bulk discount".


Step 1: Creation of sense patterns from enriched senses

UBY: [ask%2:32:01] be PP VV to ask person for a JJ act

[Figure labels: sense enrichment; predicate argument structure information]


Step 2: Automated Labelling based on Pattern Similarity

WN ask%2:32:01 is linked to FN Id 639:

As twenty are required it might pay to ask your supplier for a "bulk discount".

Corpus instance:

he wouldn't be pleased if a rumdum like me were to ask his daughter for a date

UBY: [ask%2:32:01] be PP VV to ask person for a JJ act

Corpus: if PP be to ask person for a time

Similarity score: 0.217 > threshold
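The labelling decision illustrated on this slide can be sketched roughly as follows: a corpus pattern receives the sense of the most similar lexicon pattern, provided the score exceeds the threshold. The names `label_instance` and `token_overlap` are hypothetical, the second sense key is invented for the demo, and the stand-in similarity is not the metric used in the presentation.

```python
def label_instance(corpus_pattern, sense_patterns, similarity, threshold):
    """Return the sense key of the best-matching pattern, or None.

    sense_patterns: list of (sense_key, pattern) pairs derived from the lexicon.
    similarity: any function mapping two pattern strings to a score.
    """
    best_sense, best_score = None, threshold
    for sense_key, pattern in sense_patterns:
        score = similarity(pattern, corpus_pattern)
        if score > best_score:  # must be strictly above the threshold
            best_sense, best_score = sense_key, score
    return best_sense


def token_overlap(p1, p2):
    """Stand-in similarity (Jaccard over tokens); it ignores word order."""
    a, b = set(p1.split()), set(p2.split())
    return len(a & b) / len(a | b) if a | b else 0.0


# Demo data: the first pattern is from the slide, the second is made up.
patterns = [
    ("ask%2:32:01", "be PP VV to ask person for a JJ act"),
    ("ask%hypothetical", "PP ask that PP VV"),
]
label = label_instance("if PP be to ask person for a time",
                       patterns, token_overlap, threshold=0.1)
```

Passing the similarity in as a parameter keeps the decision rule separate from the metric, so a stricter or looser measure can be swapped in without touching the labelling logic.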


Step 2: Automated Labelling based on Pattern Similarity (continued)

A similarity metric compares patterns derived from UBY with patterns derived from verb instances found in corpora

It considers the common bi-, tri-, and four-grams of two patterns and therefore takes word order into account

In its definition, w >= 1 is the window around the verb and G_n(p_i) is the set of n-grams occurring in pattern p_i
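The formula itself is not part of this text, so the following is only a sketch of an n-gram overlap measure in the spirit of the description: it collects the bi-, tri-, and four-grams within a window of w tokens on each side of the verb and scores two patterns by a Dice-style ratio of shared n-grams. The Dice normalisation, the default window size, and the function names are assumptions, not the authors' definition.

```python
def ngrams(tokens, n):
    """Return the set of contiguous n-grams in a token list."""
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def window(tokens, verb, w):
    """Keep at most w tokens on each side of the first occurrence of the verb.

    Raises ValueError if the verb does not occur in the pattern.
    """
    i = tokens.index(verb)
    return tokens[max(0, i - w):i + w + 1]


def pattern_similarity(p1, p2, verb, w=7):
    """Dice-style overlap of the bi-, tri-, and four-grams of two patterns.

    Assumption: this normalisation is illustrative, not the one on the slide.
    """
    t1 = window(p1.split(), verb, w)
    t2 = window(p2.split(), verb, w)
    shared = total = 0
    for n in (2, 3, 4):
        g1, g2 = ngrams(t1, n), ngrams(t2, n)
        shared += len(g1 & g2)
        total += len(g1) + len(g2)
    return 2 * shared / total if total else 0.0


uby = "be PP VV to ask person for a JJ act"
corpus = "if PP be to ask person for a time"
score = pattern_similarity(uby, corpus, "ask")
```

Because only contiguous n-grams are compared, two patterns with the same tokens in reversed order share nothing and score 0. Under this sketch the example pair scores 0.4, which differs from the 0.217 reported in the presentation because the normalisation is not the original one.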




Intrinsic Evaluation

Evaluation on occurrences of the Senseval-3 verbs in SemCor (152 verbs)

Approx. 33,000 sense patterns generated from WordNet-FrameNet-Wiktionary (WN-FN-WKT) for these verbs

Various similarity thresholds t


Extrinsic Evaluation – Experimental Setup

Comparison of two supervised classifiers for verb sense disambiguation:

1. Trained on an automatically labelled corpus (ALC): verb senses for the test verbs in MASC and Senseval-3 are labelled in a very large web corpus with similarity threshold t=0.1

2. Trained on SemCor 3.0

Test data:

1. MASC corpus: 16 verbs annotated with WordNet 3.0 senses, 11,997 test instances

2. Senseval-3 dataset for all-words WSD: 152 verbs annotated with WordNet 3.0 senses, 442 test instances


Training Sets

[Bar chart: number of training instances (axis from 0 to 400,000) for ALC and SemCor]

SemCor 3.0: approx. 22,000 training instances of the 16 MASC and 152 Senseval-3 verbs

Automatically labelled corpus (ALC): approx. 350,000 training instances of the 16 MASC and 152 Senseval-3 verbs


Classification

Preprocessing: POS tagging, dependency parsing and named entity recognition, using the TreeTagger and the Stanford Parser and Named Entity Recognizer from the DKPro Core component collection, http://dkpro-core-asl.googlecode.com

Features: lexical, syntactic and semantic features

Classification: a separate logistic regression classifier is trained for each of the test verbs, using WEKA, http://www.cs.waikato.ac.nz/ml/weka/

Performance of classifiers (accuracy in %) evaluated on MASC / Senseval-3

SemCor 3.0

Evaluation on MASC: 50.23

Evaluation on Senseval-3: 48.64 (45.20 with back-off)

Automatically labelled corpus (ALC)

Evaluation on MASC: 49.00

Evaluation on Senseval-3: 47.51 (43.24 with back-off)


Most frequent sense (MFS) baseline for the two test sets

1. MASC: MFS baseline: 41.72

2. Senseval-3: MFS baseline: 25.34
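For reference, a most-frequent-sense baseline can be computed along the following lines: every test instance is assigned the sense seen most often for its verb. Where the sense frequencies come from is not stated here; taking them from a list of labelled training instances is an assumption of this sketch, and the function name and toy data are illustrative.

```python
from collections import Counter, defaultdict


def mfs_baseline_accuracy(train, test):
    """Accuracy of always predicting each verb's most frequent training sense.

    train, test: lists of (verb, sense) pairs. Test verbs that never occur
    in the training list count as errors.
    """
    counts = defaultdict(Counter)
    for verb, sense in train:
        counts[verb][sense] += 1
    mfs = {verb: c.most_common(1)[0][0] for verb, c in counts.items()}
    correct = sum(1 for verb, sense in test if mfs.get(verb) == sense)
    return correct / len(test) if test else 0.0


# Toy data with made-up sense labels.
train = [("ask", "s1"), ("ask", "s1"), ("ask", "s2"), ("run", "r1")]
test = [("ask", "s1"), ("ask", "s2"), ("run", "r1"), ("go", "g1")]
accuracy = mfs_baseline_accuracy(train, test)
```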



Extrinsic Evaluation – Effect of Sense Enrichment

Best results with the combination WordNet-FrameNet-Wiktionary

WordNet-FrameNet achieves similar accuracy but the coverage is lower

WordNet-FrameNet-Wiktionary-VerbNet achieves lower accuracy

Using WordNet only achieves the lowest coverage and accuracy

Take Home Messages

Linked lexical resources such as UBY are knowledge bases that can be used to perform automated verb sense labelling

The automatically labelled data can successfully be used to train supervised machine learning systems (distant / weak supervision)

This is due to the enriched sense representation for word senses that are interlinked

Particularly useful for languages such as German, where lexical resources are available but no sense-labelled data exist


Thank You!

Questions?


Training Data Coverage

Coverage of WN senses annotated in MASC in the training data:

There are 22 WN senses with instances in MASC which are not found in SemCor

There are 34 WN senses with instances in MASC which are not found in the ALC

The VSD system cannot correctly classify instances of those senses

The coverage of the WN senses annotated in the test sets by the training data constitutes the upper bound of our classifiers:

ALC: 0.8805 (increasing the size of the ALC does not help)

SemCor: 0.948
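A coverage figure of this kind can be sketched as the share of test instances whose gold sense occurs at least once in the training data; no classifier trained on that data can exceed this accuracy. Whether the reported values are counted per instance or per sense is not stated, so the per-instance reading below is an assumption, as are the function name and the toy data.

```python
def coverage_upper_bound(train_senses, test_senses):
    """Fraction of test instances whose gold sense occurs in the training data.

    train_senses: iterable of sense keys seen in training.
    test_senses: list of gold sense keys, one per test instance.
    """
    seen = set(train_senses)
    if not test_senses:
        return 0.0
    covered = sum(1 for sense in test_senses if sense in seen)
    return covered / len(test_senses)


# Toy data: sense "c" never occurs in training, so one of four
# test instances can never be classified correctly.
bound = coverage_upper_bound(["a", "b"], ["a", "a", "b", "c"])
```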


Comparison with other systems for verb sense disambiguation

State-of-the-art supervised system (Chen and Palmer, 2009) on Senseval-2 data: 0.648 accuracy, MFS baseline: 0.407

Not comparable due to different versions of WordNet used

Best performing Lesk-based system (Miller et al., 2012):

33.86% accuracy for the MASC verbs

30.16% accuracy for the Senseval-3 verbs