1 Automatic Extension of Feature-based Semantic Lexicons via Contextual Features March 10, 2005 29th Annual Conference of Gfkl, 2005 Chris Biemann University of Leipzig Germany Rainer Osswald FernUniversität Hagen Germany


Automatic Extension of Feature-based Semantic Lexicons

via Contextual Features

March 10, 2005

29th Annual Conference of Gfkl, 2005

Chris Biemann, University of Leipzig, Germany

Rainer Osswald, FernUniversität Hagen, Germany


Outline

• Motivation: Lexicon extension for Semantic Parsing

• From co-occurrences to adjective profiles of nouns

• Inheritance mechanism for semantic features

• Results for complex classes

• Results for binary classes and their combination

• Discussion


Motivation

• Semantic parsing aims at finding a semantic representation for a sentence.

• As a prerequisite, semantic parsing needs the semantic features of words.

• Semantic features are obtained by manually creating lexicon entries (expensive in terms of time and money).

• Given a certain number of manually created lexicon entries, it might be possible to train a classifier to find more entries.


HaGenLex: Semantic Lexicon for German

Size: 22,700 entries, of these 11,300 nouns and 6,700 verbs

Word              Semantic class
Aggressivität     nonment-dyn-abs-situation
Agonie            nonment-stat-abs-situation
Agrarprodukt      nat-discrete
Ägypter           human-object
Ahn               human-object
Ahndung           nonment-dyn-abs-situation
Ähnlichkeit       relation
Airbag            nonax-mov-art-discrete
Airbus            mov-nonanimate-con-potag
Airport           art-con-geogr
Ajatollah         human-object
Akademiker        human-object
Akademisierung    nonment-dyn-abs-situation
Akkordeon         nonax-mov-art-discrete
Akkreditierung    nonment-dyn-abs-situation
Akku              ax-mov-art-discrete
Akquisition       nonment-dyn-abs-situation
Akrobat           human-object
...               ...


Characteristics of semantic classes in HaGenLex

In total, 50 semantic classes for nouns are constructed from the allowed combinations of:

• 16 semantic features (binary), e.g. HUMAN+, ARTIFICIAL-

• 17 ontologic sorts, e.g. concrete, abstract-situation, ...

[Figure: sort hierarchy and semantic features combined into semantic classes]


Application: WOCADI-Parser

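„Welche Bücher von Peter Jackson über Expertensysteme wurden bei Addison-Wesley seit 1985 veröffentlicht?“ (Which books by Peter Jackson about expert systems have been published by Addison-Wesley since 1985?)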


Underlying Assumptions

• Harris 1968, Distributional Hypothesis: semantic similarity is a function over the global contexts of words. The more similar the contexts, the more similar the words.

• Projected onto nouns and adjectives: nouns of similar semantic classes are modified by similar adjectives.

• The neighbouring co-occurrence relation between adjectives as left neighbours and nouns as right neighbours approximates typical head-modifier structures.
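The neighbour relation can be illustrated with a minimal sketch. It is not the authors' extraction code: it assumes sentences that are already lemmatized and POS-tagged with the STTS tagset (ADJA for attributive adjectives, NN for common nouns), and the function name `adjective_noun_pairs` is hypothetical.

```python
from collections import Counter

def adjective_noun_pairs(tagged_sentences):
    """Count (adjective, noun) pairs where the adjective is the immediate
    left neighbour of the noun.

    Each sentence is a list of (lemma, tag) tuples; STTS tags are assumed
    (ADJA = attributive adjective, NN = common noun).
    """
    pairs = Counter()
    for sentence in tagged_sentences:
        for (w1, t1), (w2, t2) in zip(sentence, sentence[1:]):
            if t1 == "ADJA" and t2 == "NN":
                pairs[(w1, w2)] += 1
    return pairs

sentences = [
    [("der", "ART"), ("neu", "ADJA"), ("Buch", "NN")],
    [("ein", "ART"), ("alt", "ADJA"), ("dick", "ADJA"), ("Buch", "NN")],
]
pairs = adjective_noun_pairs(sentences)
```

In the second sentence only "dick Buch" is counted: "alt" also modifies "Buch" but is not its direct left neighbour, which shows why the neighbour relation only approximates head-modifier structures.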


Neighbouring Co-occurrences and Profiles

• Significant co-occurrences reflect relations between words. To determine which co-occurrences are significant, a significance measure is used (here: log-likelihood).

• In the following, we look at adjectives that appear significantly often (read: typically) to the left of nouns, and at nouns that appear significantly often to the right of adjectives.

• The set of adjectives that co-occur significantly often to the left of a noun is called its adjective profile (the noun profile of an adjective is defined analogously).

• For the experiments, we use the most recent German corpus of „Projekt Deutscher Wortschatz“, with 500 million tokens.
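The slides only state that a log-likelihood measure is used. The sketch below assumes Dunning's G² statistic on a 2×2 contingency table and, as a simplification, takes the marginal counts from the adjective-noun pair table itself rather than from all corpus bigrams. The threshold 3.84 (the chi-square critical value for one degree of freedom at p = 0.05) and the function names are hypothetical choices for illustration.

```python
from collections import Counter, defaultdict
from math import log

def g2(k11, c_adj, c_noun, n):
    """Dunning's log-likelihood ratio (G^2) for one adjective-noun pair.

    k11: frequency of the pair, c_adj: pairs with this adjective on the left,
    c_noun: pairs with this noun on the right, n: total number of pairs.
    """
    table = [[k11, c_adj - k11],
             [c_noun - k11, n - c_adj - c_noun + k11]]
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    total = 0.0
    for i in range(2):
        for j in range(2):
            if table[i][j] > 0:  # 0 * log(0) is taken as 0
                expected = rows[i] * cols[j] / n
                total += table[i][j] * log(table[i][j] / expected)
    return 2.0 * total

def adjective_profiles(pair_counts, threshold=3.84):
    """Map each noun to the adjectives that precede it significantly often.

    pair_counts: Counter of (adjective, noun) -> frequency.
    threshold: hypothetical significance cut-off for G^2.
    """
    left, right = Counter(), Counter()
    for (adj, noun), f in pair_counts.items():
        left[adj] += f
        right[noun] += f
    n = sum(pair_counts.values())
    profiles = defaultdict(set)
    for (adj, noun), f in pair_counts.items():
        # G^2 is also large for avoided pairs, so require observed > expected
        attracted = f * n > left[adj] * right[noun]
        if attracted and g2(f, left[adj], right[noun], n) >= threshold:
            profiles[noun].add(adj)
    return dict(profiles)
```

The check for observed > expected matters because G² measures deviation from independence in either direction; without it, adjectives that systematically avoid a noun would also enter its profile.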


Example: neighbouring profiles

Size: 125,000 nouns, 25,000 adjectives

Word: adjective / noun profile

Buch: neu, erschienen, erst, neuest, jüngst, gut, geschrieben, letzt, zweit, vorliegend, gleichnamig, herausgegeben, nächst, dick, veröffentlicht, ...

Käse: gerieben, überbacken, kleinkariert, fett, französisch, fettarm, löchrig, holländisch, handgemacht, grün, würzig, selbstgemacht, produziert, schimmelig, ...

Camembert: gebacken, fettarm, reif

überbacken: Schweinesteak, Aubergine, Blumenkohl, Käse

erlegt: Tier, Wild, Reh, Stück, Beute, Großwild, Wildkatzen, Büffel, Rehbock, Beutetier, Wal, Hirsch, Hase, Grizzly, Wildschwein, Thier, Eber, Bär, Mücke, ...

ganz: Leben, Bündel, Stück, Volk, Wesen, Vermögen, Herz, Heer, Arsenal, Dorf, Land, Können, Berufsleben, Paket, Kapitel, Stadtviertel, Rudel, Jahrzehnt, ...

Word (translation): adjective / noun profile (translation)

book: new, published, first, newest, most recent, recently, good, written, last, second, at hand, eponymous, next, thick, ...

cheese: grated, au gratin, small-minded, fat, French, low-fat, holey, Dutch, hand-made, green, spicy, self-made, produced, moldy, ...

camembert: baked, low-fat, ripe

au gratin: pork steak, aubergine, cauliflower, cheese

brought down: animal, game, deer, piece, prey, big game, wild cats, buffalo, roebuck, prey animal, whale, hart, hare, grizzly, wild pig, boar, bear, ...

whole: life, bundle, piece, population, kind, fortune, heart, army, arsenal, village, country, ability, career, packet, chapter, quarter, pack, decade, ...


Mechanism of Inheritance

Algorithm:
    Initialize adjective and noun profiles;
    Initialize the start set;
    As long as new nouns get classified {
        Calculate class probabilities for each adjective;
        For all yet unclassified nouns n {
            Multiply the class probabilities of the modifying adjectives per class;
            Assign the class with the highest probability to n;
        }
    }

Which class is assigned to N4 in the next step?

Class probabilities per adjective:

• count how often each class occurs among the classified nouns the adjective modifies

• normalize by the total number of nouns in each class

• normalize to 1
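The inheritance loop can be sketched in Python as follows. This is a hedged reconstruction, not the authors' code: we read "count" as counting how often each class occurs among the classified nouns an adjective modifies and "normalize" as dividing by the number of classified nouns in that class, then rescaling to sum to 1. The smoothing constant `eps` for classes an adjective was never seen with is our assumption (the tiny but non-zero probabilities in the Topf example suggest some smoothing, but its value is not given), and all function names are hypothetical.

```python
from collections import Counter, defaultdict
from math import prod

def adjective_class_probs(nouns_of_adj, labels, max_class):
    """Class distribution per adjective, estimated from the nouns classified so far.

    nouns_of_adj: adjective -> set of nouns it modifies (its noun profile)
    labels:       noun -> semantic class (start set plus nouns classified so far)
    """
    class_size = Counter(labels.values())
    probs = {}
    for adj, nouns in nouns_of_adj.items():
        counts = Counter(labels[n] for n in nouns if n in labels)
        # skip adjectives without classified nouns and unspecific ones (maxClass)
        if not counts or len(counts) > max_class:
            continue
        # divide each count by the size of its class, then rescale to sum to 1
        weighted = {c: k / class_size[c] for c, k in counts.items()}
        total = sum(weighted.values())
        probs[adj] = {c: w / total for c, w in weighted.items()}
    return probs

def inherit_classes(adj_profiles, start_set, min_adj=5, max_class=2, eps=1e-10):
    """adj_profiles: noun -> set of adjectives (adjective profile);
    start_set: noun -> class for the manually classified nouns."""
    labels = dict(start_set)
    nouns_of_adj = defaultdict(set)
    for noun, adjs in adj_profiles.items():
        for adj in adjs:
            nouns_of_adj[adj].add(noun)
    while True:  # as long as new nouns get classified
        probs = adjective_class_probs(nouns_of_adj, labels, max_class)
        new_labels = {}
        for noun, adjs in adj_profiles.items():
            if noun in labels:
                continue
            usable = [probs[a] for a in adjs if a in probs]
            if not usable or len(usable) < min_adj:
                continue
            # multiply per-class probabilities; eps stands in for unseen classes
            classes = sorted(set().union(*usable))
            scores = {c: prod(p.get(c, eps) for p in usable) for c in classes}
            new_labels[noun] = max(scores, key=scores.get)
        if not new_labels:
            return labels
        labels.update(new_labels)
```

All nouns scored in one round are labelled together, so a noun classified in one round only influences the adjective probabilities from the next round on.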


Example: Topf (pot)

Adjective profile of Topf (pot), class ax-mov-art-discrete (X = adjective used for classification, - = not used):

angebrannt(X) heiß(-) ehern(-) fremd(-) divers(-) zerbeult(X) brodelnd(-) staatlich(-) gußeisern(-) tönern(X) gemeinsam(-) groß(-) irden(X) verschieden(-) verschlossen(-) anonym(-) rund(-) flach(-) Bremer(-) geschlossen(-) passend(-) gesondert(-) andere(-) riesig(-) Golden(-) eisern(-) europäisch(-) viel(-) öffentlich(-) mehr(-) golden(-) leer(-) klein(-) getrennt(-) möglich(-) speziell(-) übervoll(X) dampfend(-) gleich(-) gefüllt(-)

Number of classes per adjective:

angebrannt (burnt): {nat-substance=1, art-substance=1, ax-mov-art-discrete=1}
Suppe (soup) art-substance; Zigarette (cigarette) ax-mov-art-discrete; Milch (milk) nat-substance

zerbeult (dented): {nonmov-art-discrete=1, mov-nonanimate-con-potag=2, nonax-mov-art-discrete=1, ax-mov-art-discrete=3}
Wagen, Auto (wagon, car) mov-nonanimate-con-potag; Fahrzeug, Mountainbike, Posaune (vehicle, mountain bike, trombone) ax-mov-art-discrete; Mantel (coat) nonax-mov-art-discrete; Dach (roof) nonmov-art-discrete

irden (earthen): {art-con-geogr=1, nonax-mov-art-discrete=1, ax-mov-art-discrete=9}
Schal (shawl) nonax-mov-art-discrete; Hafen (port) art-con-geogr; Teller, Flasche, Schüssel, Becher, Geschirr, Vase, Krug, Gefäß, Napf (plate, bottle, bowl, cup, dishes, vase, jug, vessel, small bowl) ax-mov-art-discrete

tönern (clay-made): {ax-mov-art-discrete=1, prot-discrete=1}
Fuß (foot) prot-discrete; Gefäß (vessel) ax-mov-art-discrete

übervoll (over-filled): {nonmov-art-discrete=3, art-con-geogr=1, nonment-dyn-abs-situation=1, nonax-mov-art-discrete=1}
Zimmer, Saal, Lager (room, hall, encampment) nonmov-art-discrete; Stall (stable) art-con-geogr; Vorlesung (lecture) nonment-dyn-abs-situation; Tablett (tray) nonax-mov-art-discrete

Class probabilities: {mov-nonanimate-con-potag=2.8E-25, ax-mov-art-discrete=5.8E-8, art-con-geogr=1.5E-20, nonax-mov-art-discrete=2.1E-15, nat-substance=3.3E-25, nonment-dyn-abs-situation=1.6E-25, prot-discrete=5.0E-25, art-substance=3.3E-25, nonmov-art-discrete=7.1E-20}, so ax-mov-art-discrete is assigned.


Parameters

• Minimal number of adjectives: minAdj
A noun needs at least minAdj classifying adjectives. This avoids statistical noise and implies a frequency threshold.

• Maximal number of classes per adjective: maxClass
An adjective is only used for classification if it favours at most maxClass different classes, so that unspecific adjectives do not distort the results.
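Both parameters act on the classifying adjectives of a noun. The sketch below applies them to the class counts of the five classifying adjectives from the Topf example; the helper names `usable_adjectives` and `is_classifiable` are hypothetical.

```python
# Class counts of Topf's classifying adjectives, copied from the Topf example.
TOPF = {
    "angebrannt": {"nat-substance": 1, "art-substance": 1, "ax-mov-art-discrete": 1},
    "zerbeult": {"nonmov-art-discrete": 1, "mov-nonanimate-con-potag": 2,
                 "nonax-mov-art-discrete": 1, "ax-mov-art-discrete": 3},
    "irden": {"art-con-geogr": 1, "nonax-mov-art-discrete": 1, "ax-mov-art-discrete": 9},
    "tönern": {"ax-mov-art-discrete": 1, "prot-discrete": 1},
    "übervoll": {"nonmov-art-discrete": 3, "art-con-geogr": 1,
                 "nonment-dyn-abs-situation": 1, "nonax-mov-art-discrete": 1},
}

def usable_adjectives(class_counts, max_class):
    """Adjectives that favour at least one and at most max_class classes."""
    return sorted(a for a, counts in class_counts.items() if 0 < len(counts) <= max_class)

def is_classifiable(class_counts, min_adj, max_class):
    """A noun is classified only if at least min_adj adjectives remain usable."""
    return len(usable_adjectives(class_counts, max_class)) >= min_adj
```

With maxClass=4 all five adjectives stay usable and Topf passes minAdj=5; with maxClass=2 only tönern remains, so these counts alone would not let Topf be classified.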


Experimental Data

Distribution of semantic classes (total: 6045)

[Figure: distribution of semantic classes; legend: nonment-dyn-abs-situation, human-object, prot-theor-concept, nonoper-attribute, ax-mov-art-discrete, nonment-stat-abs-situation, animal-object, nonmov-art-discrete, ment-stat-abs-situation, nonax-mov-art-discrete, tem-abstractum, mov-nonanimate-con-potag, art-con-geogr, abs-info, art-substance, nat-discrete, nat-substance, prot-discrete, nat-con-geogr, prot-substance, mov-art-discrete, meas-unit, oper-attribute, institution, ment-dyn-abs-situation, plant-object, mov-nat-discrete, con-info, con-geogr, con-object, animate-object, prot-method, dyn-abs-situation, object, nonmov-nonanimate-con-potag, abs-geogr, stat-abs-situation, modality, relation, con-potag, prot-con-object, nonmov-nat-discrete, noninstit-abs-potag, thc-relation, nonanimate-con-potag, abs-situation, abs-potag]

• 4726 nouns satisfy minAdj=5, so the maximal recall is 4726/6045 = 78.2%

• In all experiments, 10-fold cross-validation was used
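The evaluation setup can be sketched as follows. The slides do not say how the folds were drawn, so shuffling with a fixed seed is our assumption; `k_fold_splits` and `max_recall` are hypothetical names. The second function shows why recall is capped: nouns with fewer than minAdj adjectives can never be classified.

```python
import random

def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists for k-fold cross-validation."""
    shuffled = sorted(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test

def max_recall(n_classifiable, n_total):
    """Upper bound on recall when only n_classifiable nouns can get a class."""
    return n_classifiable / n_total

print(f"{100 * max_recall(4726, 6045):.1f}%")
```

Every noun appears in exactly one test fold, so each of the 6045 nouns is evaluated once across the ten runs.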


Results: global classification

• Classification was carried out directly on the 50 semantic classes

• The measuring points correspond to the parameter settings minAdj in {5, 10, 15, 20} and maxClass in {2, 5, 50}

• Results are too poor for lexicon extension

[Figure: Precision/Recall for the global classifier; x-axis: precision (0 to 1), y-axis: recall (0 to 1)]
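Precision and recall are not defined on the slides. The sketch below assumes precision = correctly classified / classified nouns and recall = correctly classified / all test nouns; this reading is consistent with the maximal recall of 78.2% stated for minAdj=5. It also enumerates the (minAdj, maxClass) settings behind the measuring points. `precision_recall` and `SETTINGS` are hypothetical names.

```python
from itertools import product

def precision_recall(gold, predicted):
    """gold: noun -> class for every test noun;
    predicted: noun -> class for the test nouns the classifier labelled
    (nouns with too few usable adjectives stay unlabelled)."""
    correct = sum(1 for noun, c in predicted.items() if gold.get(noun) == c)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

# the 4 x 3 = 12 (minAdj, maxClass) settings behind the measuring points
SETTINGS = list(product([5, 10, 15, 20], [2, 5, 50]))
```

Raising minAdj typically leaves more nouns unlabelled, which lowers recall directly under this definition while precision is computed only over the nouns that were labelled.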


Combining single classifiers

Architecture: binary classifiers for single features, then combining the outcome. Parameters: minAdj=5, maxClass=2

ANIMAL +/-, ANIMATE +/-, ARTIF +/-, AXIAL +/-, ... (16 features)

ab +/-, abs +/-, ad +/-, as +/-, ... (17 sorts)

Selection: compatible semantic classes that are minimal w.r.t. the hierarchy and unambiguous

Result: a semantic class, or reject
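The selection step can be sketched as below. The class definitions in the usage example are hypothetical toy entries, not HaGenLex definitions, and we read "minimal w.r.t. the hierarchy" as "most specific": a compatible class is dropped if another compatible class adds further requirements. A feature or sort without a classifier decision does not rule out a class.

```python
def select_class(predicted, class_defs):
    """predicted: feature/sort name -> bool from the binary classifiers.
    class_defs: class name -> dict of required feature/sort values.
    Returns the single minimal compatible class, or None (reject)."""
    compatible = {
        name: frozenset(req.items())
        for name, req in class_defs.items()
        # compatible: no predicted value contradicts a requirement
        if all(predicted.get(f, v) == v for f, v in req.items())
    }
    # drop classes whose requirements are a proper subset of another candidate's
    minimal = [name for name, req in compatible.items()
               if not any(req < other for other in compatible.values())]
    return minimal[0] if len(minimal) == 1 else None

# hypothetical toy definitions for illustration only
TOY_CLASSES = {
    "object": {"co": True},
    "animate-object": {"co": True, "animate": True},
    "human-object": {"co": True, "animate": True, "human": True},
    "animal-object": {"co": True, "animate": True, "animal": True},
    "abs-situation": {"co": False, "abs": True},
}
```

If the classifiers answer HUMAN+ and ANIMAL+ at the same time, both human-object and animal-object stay minimal and the noun is rejected rather than guessed.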


Results: single semantic features

• For bias > 0.05: good to excellent precision

• Total precision: 93.8% (86.8% for feature +)

• Total recall: 70.7% (69.2% for feature +)

[Figure: Precision/Recall vs. bias, semantic features; x-axis: bias in data (0 to 0.5), y-axis: precision/recall (0 to 1); series: total Prec, Prec +, total Rec, Rec +]

Name      Count     +      -    Bias
method     6004    12   5992  0.0020
instit     6032    39   5993  0.0065
mental     9008   162   8846  0.0180
info       6015   119   5896  0.0198
animal     5995   143   5852  0.0239
geogr      6015   188   5827  0.0313
thconc     6028   518   5510  0.0859
instru     5932   969   4963  0.1634
human      5995  1313   4682  0.2190
legper     6009  1352   4657  0.2250
animate    6010  1505   4505  0.2504
potag      6015  1664   4351  0.2766
artif      5864  2204   3660  0.3759
axial      5892  2260   3632  0.3836
movable    5827  2345   3482  0.4024
spatial    6033  2910   3123  0.4823
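The bias column appears to be the share of the minority value (+ or -) among all labelled nouns. For example, method gives 12/6004 ≈ 0.0020, and in the ontologic-sorts table the row o- (5994 +, 39 -) gives 39/6033 ≈ 0.0065, which only fits the minority reading. Under that inferred definition the column can be recomputed as below; the names are hypothetical.

```python
def bias(pos, neg):
    """Share of the minority value (+ or -) among all labelled nouns."""
    return min(pos, neg) / (pos + neg)

# (+, -) counts per semantic feature, copied from the table above
FEATURE_COUNTS = {
    "method": (12, 5992), "instit": (39, 5993), "mental": (162, 8846),
    "info": (119, 5896), "animal": (143, 5852), "geogr": (188, 5827),
    "thconc": (518, 5510), "instru": (969, 4963), "human": (1313, 4682),
    "legper": (1352, 4657), "animate": (1505, 4505), "potag": (1664, 4351),
    "artif": (2204, 3660), "axial": (2260, 3632), "movable": (2345, 3482),
    "spatial": (2910, 3123),
}

BIAS = {name: round(bias(p, n), 4) for name, (p, n) in FEATURE_COUNTS.items()}

# features above the bias level where the slide reports good to excellent precision
balanced = sorted(name for name, (p, n) in FEATURE_COUNTS.items() if bias(p, n) > 0.05)
```

Ten of the sixteen features lie above the 0.05 level; the six below it (method, instit, mental, info, animal, geogr) have very few positive examples.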


Results: ontologic sorts

• For bias > 0.10: good to excellent precision

• Total precision: 94.1% (89.5% for sort +)

• Total recall: 73.6% (69.6% for sort +)

[Figure: Precision/Recall vs. bias, ontologic sorts; x-axis: bias in data (0 to 0.5), y-axis: precision/recall (0 to 1); series: total Prec, Prec +, total Rec, Rec +]

Name   Count     +      -    Bias
re      6033     7   6026  0.0012
mo      6033     8   6025  0.0013
o-      6033  5994     39  0.0065
oa      6045    41   6004  0.0068
me      6045    41   6004  0.0068
qn      6045    41   6004  0.0068
ta      6033   107   5926  0.0177
s       6010   224   5786  0.0373
as      6031   363   5668  0.0602
na      6033   411   5622  0.0681
at      6033   450   5583  0.0746
io      6033   664   5369  0.1101
ad      6031  1481   4550  0.2456
abs     6033  1846   4187  0.3060
d       6010  2663   3347  0.4431
co      6033  2910   3123  0.4823
ab-     6033  3082   2951  0.4891


Results: comb. semantic classes

• No relation between class size and results is visible

• Total precision: 80.2%

• Total recall: 34.2%; number of newly classified nouns: 6649

[Figure: Precision/Recall in % vs. size of semantic class in training data (0 to 1500); series: %Recall, %Precision]

Class                        Count  Prec (%)  Rec (%)
nonment-dyn-abs-situation     1421     89.19    34.27
human-object                  1313     96.82    69.54
prot-theor-concept             516     53.71    18.22
nonoper-attribute              411      0.00     0.00
ax-mov-art-discrete            362     55.64    40.88
nonment-stat-abs-situation     226     36.84     6.19
animal-object                  143    100.00    26.57
nonmov-art-discrete            133     57.41    23.31
ment-stat-abs-situation        126     51.28    15.87
nonax-mov-art-discrete         108     31.48    15.74
tem-abstractum                 107     96.77    28.04
mov-nonanimate-con-potag        98     70.45    31.63
art-con-geogr                   96     58.70    28.12
abs-info                        94     42.31    11.70
art-substance                   88     60.47    29.55
nat-discrete                    88    100.00    31.82
nat-substance                   86     57.14     9.30
prot-discrete                   73    100.00    57.53
nat-con-geogr                   63     65.00    20.63
prot-substance                  50    100.00    40.00
mov-art-discrete                45    100.00    37.78
meas-unit                       41     90.91    24.39
oper-attribute                  39      0.00     0.00
institution                     39      0.00     0.00
ment-dyn-abs-situation          36      0.00     0.00
plant-object                    34    100.00     8.82
mov-nat-discrete                27     22.22    22.22
con-info                        25     40.00     8.00
Rest                           157     39.24    19.75


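Typical mistakes

Pflanze (plant): animal-object instead of plant-object
zart (tender), fleischfressend (carnivorous), fressend (eating), verändert (modified), genmanipuliert (genetically modified), transgen (transgenic), exotisch (exotic), selten (rare), giftig (poisonous), stinkend (stinking), wachsend (growing), ...

Nachwuchs (offspring): human-object instead of animal-object
wissenschaftlich (scientific), qualifiziert (qualified), akademisch (academic), eigen (own), talentiert (talented), weiblich (female), hoffnungsvoll (promising), geeignet (suitable), begabt (gifted), journalistisch (journalistic), ...

Café (café): art-con-geogr instead of nonmov-art-discrete (cf. Restaurant)
Wiener (Viennese), klein (small), türkisch (Turkish), kurdisch (Kurdish), romanisch (Romanesque), cyber, philosophisch (philosophical), besucht (frequented), traditionsreich (long-established), schnieke (posh), gutbesucht (well-frequented), ...

Neger (negro): animal-object instead of human-object
weiß (white), dreckig (dirty), gefangen (captive), faul (lazy), alt (old), schwarz (black), nackt (naked), lieb (dear), gut (good), brav (well-behaved)

but:

Skinhead (skinhead): human-object (correct)
{16,17,18,19,20,21,22,23,30}jährig (aged 16 to 23 or 30), gleichaltrig (of the same age), zusammengeprügelt (beaten up), rechtsradikal (right-wing extremist), brutal (brutal)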

In most cases the wrong class is semantically close. Evaluation metrics did not account for that.


Any Questions?

Thank you very much!