Transcript
Page 1: Automatic Acquisition of Paradigmatic Relations using Iterated Co-occurrences Chris Biemann, Stefan Bordag, Uwe Quasthoff University of Leipzig, NLP Department

Automatic Acquisition of Paradigmatic Relations

using Iterated Co-occurrences

Chris Biemann, Stefan Bordag, Uwe Quasthoff

University of Leipzig, NLP Department

LREC 2004, Learning & Acquisition (II), 27th of May 2004

Page 2:

Sets of Words

• Our goal is the automatic extension of homogeneous word sets, e.g. WordNet synsets or small subtrees of some hierarchy

• We collect methods and apply them, possibly in combination

• Thought experiment: the computer as „associator“:
- Input: some example concepts
- Detection of the relation
- Output of additional instances
This can be done semi-supervised

• Necessary:
- very large text corpus
- features
- methods

Page 3:

Statistical Co-occurrences

• occurrence of two or more words within a well-defined unit of information (sentence, nearest neighbors)

• Significant Co-occurrences reflect relations between words

• Significance Measure (log-likelihood):
- k is the number of sentences containing a and b together
- ab is (number of sentences with a) * (number of sentences with b)
- n is the total number of sentences in the corpus

$$\operatorname{sig}(A, B) = x - k \log x + \log k!, \quad \text{with } n \text{ the number of sentences and } x = \frac{ab}{n}$$
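A minimal sketch of how this measure could be computed, assuming it reads sig(A, B) = x - k log x + log k! with x = ab/n. The function name `significance` and the use of the natural logarithm are assumptions; the slide does not state the log base. `math.lgamma(k + 1)` is used for log k! so that large k does not overflow.

```python
import math

def significance(k, a, b, n):
    """Significance of a word pair (illustrative helper, not from the slides).

    k: number of sentences containing both words
    a, b: number of sentences containing the first / the second word
    n: total number of sentences in the corpus
    """
    x = a * b / n  # x = ab/n as defined on the slide
    # log k! is computed as lgamma(k + 1); natural log is an assumption
    return x - k * math.log(x) + math.lgamma(k + 1)

# Two words with 10 and 20 sentences each in a 100-sentence corpus
print(significance(2, 10, 20, 100))
print(significance(5, 10, 20, 100))
```

With a and b fixed, the value grows as the joint count k rises above x, which is what makes it usable for ranking co-occurrences.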

Page 4:

Iterating Co-occurrences

• (sentence-based) co-occurrences of first order: words that co-occur significantly often together in sentences

• co-occurrences of second order: words that co-occur significantly often in collocation sets of first order

• co-occurrences of n-th order: words that co-occur significantly often in collocation sets of (n-1)th order

When calculating a higher order, the significance values of the preceding order are not relevant. A co-occurrence set consists of the N highest ranked co-occurrences of a word.
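The iteration can be sketched as follows: compute the top-N co-occurrence set of every word, then treat those sets as the "sentences" of the next order, discarding the significance values. This is a toy sketch under assumptions, not the authors' implementation: the function names are illustrative, the measure is taken to be x - k log x + log k! with x = ab/n and natural log, ties are broken alphabetically, and no significance threshold is applied.

```python
import math
from collections import Counter
from itertools import combinations

def significance(k, a, b, n):
    # Assumed form of the slide's measure: x - k log x + log k!, x = ab/n
    x = a * b / n
    return x - k * math.log(x) + math.lgamma(k + 1)

def cooccurrence_sets(units, top_n):
    """Map each word to its top_n co-occurring words over the given units."""
    n = len(units)
    word_freq = Counter(w for unit in units for w in set(unit))
    pair_freq = Counter(
        pair for unit in units for pair in combinations(sorted(set(unit)), 2)
    )
    ranked = {}
    for (w1, w2), k in pair_freq.items():
        sig = significance(k, word_freq[w1], word_freq[w2], n)
        ranked.setdefault(w1, []).append((sig, w2))
        ranked.setdefault(w2, []).append((sig, w1))
    # Keep only the words of the N highest ranked co-occurrences
    return {
        w: [v for _, v in sorted(cands, key=lambda t: (-t[0], t[1]))[:top_n]]
        for w, cands in ranked.items()
    }

def iterated_cooccurrences(sentences, order, top_n=10):
    """Co-occurrence sets of the given order (order >= 1)."""
    units = [list(s) for s in sentences]
    sets = {}
    for _ in range(order):
        sets = cooccurrence_sets(units, top_n)
        # The sets of this order are the units of the next order;
        # their significance values are dropped.
        units = list(sets.values())
    return sets

toy = [["dog", "barking"], ["terrier", "barking"],
       ["dog", "yelp"], ["terrier", "yelp"], ["cat", "mouse"]]
print(iterated_cooccurrences(toy, 1)["dog"])
print(iterated_cooccurrences(toy, 2)["dog"])
```

In the toy corpus "dog" and "terrier" never share a sentence, yet they become second-order co-occurrences because they share first-order partners.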

Page 5:

Constructed Example I

Ord 1    dog      terrier  cat      mouse    barking  bite     yelp
dog               -        -        -        X        x        X
terrier  -                 -        -        x        x        X
cat      -        -                 x        -        x        -
mouse    -        -        X                 -        x        -
barking  X        X        -        -                 -        -
bite     X        X        x        x        -                 -
yelp     x        x        -        -        -        -

Ord 2    dog      terrier  cat      mouse    barking  bite     yelp
dog               3        1        1        -        -        -
terrier  3                 1        1        -        -        -
cat      1        1                 1        -        -        -
mouse    1        1        1                 -        1        -
barking  -        -        -        -                 2        2
bite     -        -        -        1        2                 2
yelp     -        -        -        -        2        2

Page 6:

Constructed Example II

Ord 3    dog      terrier  cat      mouse    barking  bite     yelp
dog               -        -        -        -        -        -
terrier  -                 -        -        -        -        -
cat      -        -                 -        -        -        -
mouse    -        -        -                 -        -        -
barking  -        -        -        -                 1        1
bite     -        -        -        -        1                 1
yelp     -        -        -        -        1        1

Ord 2    dog      terrier  cat      mouse    barking  bite     yelp
dog               x        -        -        -        -        -
terrier  x                 -        -        -        -        -
cat      -        -                 -        -        -        -
mouse    -        -        -                 -        -        -
barking  -        -        -        -                 x        x
bite     -        -        -        -        x                 x
yelp     -        -        -        -        x        x
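The constructed example can be retraced with a small sketch. It reads the Ord 1 marks as a symmetric relation (X and x are treated alike, which is an assumption), counts for every word pair how many partners they share, and keeps only pairs with at least two shared partners as the relation of the next order. The threshold of 2 is an assumption, chosen because it reproduces the x marks of the thresholded Ord 2 table and the counts of the Ord 3 table; individual cells of the counted Ord 2 table may differ from this plain symmetric count.

```python
from itertools import combinations

# First-order relation as read from the Ord 1 table (X and x treated alike)
ORD1 = {
    "dog": {"barking", "bite", "yelp"},
    "terrier": {"barking", "bite", "yelp"},
    "cat": {"mouse", "bite"},
    "mouse": {"cat", "bite"},
    "barking": {"dog", "terrier"},
    "bite": {"dog", "terrier", "cat", "mouse"},
    "yelp": {"dog", "terrier"},
}

def shared_counts(relation):
    """Number of shared partners for every word pair with at least one."""
    counts = {}
    for a, b in combinations(sorted(relation), 2):
        shared = len(relation[a] & relation[b])
        if shared:
            counts[frozenset((a, b))] = shared
    return counts

def next_order(relation, min_shared):
    """Relation of the next order: pairs sharing >= min_shared partners."""
    result = {w: set() for w in relation}
    for pair, count in shared_counts(relation).items():
        if count >= min_shared:
            a, b = tuple(pair)
            result[a].add(b)
            result[b].add(a)
    return result

ord2_counts = shared_counts(ORD1)
ord2 = next_order(ORD1, min_shared=2)  # threshold 2 is an assumption
ord3_counts = shared_counts(ord2)

print(ord2_counts[frozenset(("dog", "terrier"))])
print(sorted(ord2["barking"]))
print(ord3_counts.get(frozenset(("dog", "terrier")), 0))
```

After the second step the animals and the activities have separated into two groups, and the three activity words keep pointing at each other in the third order.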

Page 7:

Properties of Iterated Co-occurrences

• after some iterations the sets remain more or less stable
• the sets are somewhat semantically homogeneous
• sometimes, they have nothing to do with the reference word
• calculations were performed up to the 10th order

• Example for TOP 20 NB-collocations of 10th order for „erklärte“ [explained]: sagte, schwärmte, lobt, schimpfte, meinte, jubelte, lobte, resümierte, schwärmt, Reinhard Heß, ärgerte, kommentierte, urteilte, analysierte, bilanzierte, freute, freute sich, Bundestrainer, freut, gefreut [said, enthused, praises, grumbled, opined, was jubilant, praised, summarized, enthuses, Reinhard Hess, annoyed, commented, judged, analyzed, took stock, made happy, was pleased, coach of the national team, is pleased, been pleased]

Page 8:

Mapping co-occurrences to graphs

• For all words having co-occurrences, form nodes in a graph.

• Connect them all by edges, initialize each edge weight to 0

• For every co-occurrence of two words in a sentence, increase the edge weight by the significance
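A minimal sketch of this mapping, under assumptions: the function name `build_graph` is illustrative, the significance values are passed in as a precomputed dictionary keyed by word pair, and edges of weight 0 are left implicit (simply absent) rather than stored.

```python
from collections import defaultdict
from itertools import combinations

def build_graph(sentences, significance):
    """Weighted co-occurrence graph as nested dicts: graph[a][b] -> weight.

    significance: dict mapping frozenset({a, b}) to the significance of
    that pair; pairs missing from it are treated as not significant.
    """
    graph = defaultdict(lambda: defaultdict(float))
    for sentence in sentences:
        for a, b in combinations(sorted(set(sentence)), 2):
            sig = significance.get(frozenset((a, b)))
            if sig is None:
                continue  # no significant co-occurrence, no edge
            # Every joint occurrence in a sentence adds the significance
            graph[a][b] += sig
            graph[b][a] += sig
    return graph

sentences = [["dog", "barking"], ["dog", "barking", "cat"]]
sig = {frozenset(("dog", "barking")): 1.5}  # invented value for illustration
graph = build_graph(sentences, sig)
print(graph["dog"]["barking"])
```

Storing the graph as a dictionary of dictionaries keeps it sparse, which matters for a very large corpus where most word pairs never co-occur.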

Page 9:

First Iteration Step

• The two black nodes A and B get connected in the step if there are many nodes C which are connected to both A and B

• The more Cs, the higher the weight of the new edge

[Figure legend: new connection, existing connection]

Page 10:

Second Iteration Step

• The two black nodes A and B get connected in this step if there are many (dark grey) nodes Ds which are connected to both A and B.

• The connections between the nodes Ds and the nodes A and B were constructed because of (light grey) nodes Es and Fs, respectively

[Figure: graph with nodes A, B, Ds, Es, Fs; legend: new connection, former connection, existing connection]

Page 11:

Collapsing bridging nodes

• Upper bound for path length in iteration n is 2^n.

• However, some of the bridging nodes collapse, giving rise to self-sustaining clusters of arbitrary path length, which are invariant under iteration.

[Figure: the upper 5 nodes form an invariant cluster; A and B are being absorbed by this cluster]

Page 12:

Examples of Iterated Co-occurrences

Order  Reference word  TOP-10 collocations
N2     wine            wines, champagne, beer, water, tea, coffee, Wine, alcoholic, beers, cider
S10    wine            wines, grape, sauvignon, chardonnay, noir, pinot, cabernet, spicy, bottle, grapes
S1     ringing         phone, bells, phones, hook, bell, endorsement, distinctive, ears, alarm, telephone
S2     ringing         rung, Centrex, rang, phone, sounded, bell, ring, FaxxMaster, sound, tolled
S4     ringing         sounded, rung, rang, tolled, tolling, sound, tone, toll, ring, doorbell
S10    pressing        Ctrl, Shift, press, keypad, keys, key, keyboard, you, cursor, menu, PgDn, keyboards, numeric, Alt, Caps, CapsLock, NUMLOCK, NumLock, Scroll

Page 13:

Intersection of Co-occurrence Sets: resolving ambiguity

Herz-Bube

Stich

Becker

Achtelfinale - Aufschlag - Boris Becker - Daviscup - Doppel - DTB - Edberg - Finale - Graf - Haas - Halbfinale - Match - Pilic - Runde - Sampras - Satz - Tennis - Turnier - Viertelfinale - Weltrangliste - Wimbledon

Alleinspieler - Herz - Herz-Dame - Herz-König - Hinterhand - Karo - Karo-As - Karo-Bube - Kreuz-As - Kreuz-Bube - Pik-As - Pik-Bube - Pik-König - Vorhand

Becker - Courier - Einzel - Elmshorn - French Open - Herz-As - ins - Kafelnikow - Karbacher - Krajicek - Kreuz-As - Kreuz-Bube - Michael Stich - Mittelhand - Pik-As - Pik-Bube - Pik-König

bedient - folgenden - gereizt - Karo-Buben - Karo-Dame - Karo-König - Karte - Karten - Kreuz-Ass - Kreuz-Dame - Kreuz-Hand - Kreuz-König - legt - Mittelhand - Null ouvert - Pik - Pik-Ass - Pik-Dame - schmiert - Skat - spielt - Spielverlauf - sticht - übernimmt - zieht

Agassi - Australian Open - Bindewald - Boris - Break - Chang - Dickhaut - gewann - Ivanisevic - Kafelnikow - Kiefer - Komljenovic - Leimen - Matchball - Michael Stich - Monte Carlo - Prinosil - Sieg - Spiel - spielen - Steeb - Teamchef

Stich

Page 14:

Example: NB-collocations of 2nd order warm, kühl, kalt

• Intersection and filtering for adjectives of collocation sets for warm, kühl, kalt [warm, cool, cold] results in:

abgekühlt, aufgeheizt, eingefroren, erhitzt, erwärmt, gebrannt, gelagert, heiß, heruntergekühlt, verbrannt, wärmer [cooled down, heated up, frozen, heated, warmed, burned, stored, hot, chilled, burnt, warmer]

• emotional reading „abweisend“ [dismissive] for kühl, kalt is eliminated

warm              kühl          kalt
abgekühlt         abgeklärt     abgekühlt
abkühlen          abgekühlt     abkühlen
angestiegen       abkühlen      angestiegen
anzeigt           ablehnend     anzeigt
aufgeheizt        abstrakt      aufgeheizt
eingefroren       aggressiv     aushalten
erhitzt           ähnlich       eingefroren
erwärmt           altmodisch    einstellen
fertig            anders        erhitzt
gebrannt          archaisch     ernst
gefallen          aufgeheizt    erwärmt
gehalten          aushalten     frei
geklettert        bedrohlich    gebrannt
gekühlt           bescheiden    gefallen
gelagert          bitter        gehalten
gemessen          blaß          geklettert
gesenkt           blutleer      gekühlt
gestiegen         distanziert   gelagert
gesunken          eingefroren   gemessen
gut               empfindlich   genug
Heiß              empört        gesenkt
heruntergekühlt   entrüstet     gestiegen
hoch              entsetzt      hart
höher             entspannt     heiß
kalt              erhitzt       heruntergekühlt
kalte             erleichtert   hoch
kalten            erschöpft     höher
...               ...           ...

Page 15:

Detection of X-onyms
synonyms, antonyms, (co-)hyponyms, ...

• Idea: Intersection of co-occurrence sets of two X-onyms as reference words should contain X-onyms

• lexical ambiguity of one reference word does not degrade the result set

• Method:
- detect word class for reference words
- calculate co-occurrences for reference words
- filter co-occurrences w.r.t. the word class of the reference words (by means of POS tags)
- perform intersection of the co-occurrence sets
- output result

• ranking can be based on the significance values of the co-occurrences
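The method steps can be sketched as below: filter each reference word's co-occurrence set by the word class, intersect the filtered sets, and rank what remains. Everything concrete here is an assumption for illustration: the function name, the hand-assigned POS tags, the invented significance values, and ranking by the sum of significances (the slides only say that ranking uses the significance values). The words are a small subset of the warm/kühl/kalt collocation lists.

```python
def xonym_candidates(cooc_sets, pos_tags, word_class):
    """Rank words shared by all co-occurrence sets, restricted to one word class.

    cooc_sets: one dict per reference word, mapping co-occurring word -> significance
    pos_tags: dict mapping word -> POS tag
    word_class: POS tag of the reference words
    """
    # Keep only co-occurrences with the same word class as the reference words
    filtered = [
        {w: s for w, s in cooc.items() if pos_tags.get(w) == word_class}
        for cooc in cooc_sets
    ]
    # Words that occur in every filtered set
    common = set.intersection(*(set(f) for f in filtered))
    # Assumed ranking: summed significance, ties broken alphabetically
    return sorted(common, key=lambda w: (-sum(f[w] for f in filtered), w))

# Invented significance values and hand-assigned tags, for illustration only
warm = {"abgekühlt": 5.0, "abkühlen": 4.0, "aufgeheizt": 3.0, "erhitzt": 2.0, "angestiegen": 1.0}
kuehl = {"abgeklärt": 6.0, "abgekühlt": 2.0, "abkühlen": 3.0, "aufgeheizt": 4.0, "erhitzt": 1.0}
kalt = {"abgekühlt": 3.0, "abkühlen": 2.0, "aufgeheizt": 1.0, "erhitzt": 4.0, "aushalten": 2.0}
tags = {"abgekühlt": "ADJ", "aufgeheizt": "ADJ", "erhitzt": "ADJ", "abgeklärt": "ADJ",
        "abkühlen": "VERB", "angestiegen": "VERB", "aushalten": "VERB"}

print(xonym_candidates([warm, kuehl, kalt], tags, "ADJ"))
```

The verb "abkühlen" is shared by all three sets but removed by the POS filter, and "abgeklärt", which belongs to the emotional reading of kühl, drops out because it is missing from the other two sets.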

Page 16:

Mini-Evaluation

• Experiments for different data sources, NB-collocations of 2nd and 3rd order

• fraction of X-onyms in TOP 5 higher than in TOP 10 => ranking method makes sense

• intersection of 2nd-order and 3rd-order collocations almost always empty => different orders exhibit different relations

• quantity: satisfactory, more through larger corpora

• quality: not precise enough for unsupervised extension

Page 17:

Word Sets for Thesaurus Expansion

Application: thesaurus expansion

start set: [warm, kalt] [warm, cold]
result set: [heiß, wärmer, kälter, erwärmt, gut, heißer, hoch, höher, niedriger, schlecht, frei] [hot, warmer, colder, warmed, good, hotter, high, higher, lower, bad, free]

start set: [gelb, rot] [yellow, red]
result set: [blau, grün, schwarz, grau, bunt, leuchtend, rötlich, braun, dunkel, rotbraun, weiß] [blue, green, black, grey, colorful, bright, reddish, brown, dark, red-brown, white]

start set: [Mörder, Killer] [murderer, killer]
result set: [Täter, Straftäter, Verbrecher, Kriegsverbrecher, Räuber, Terroristen, Mann, Mitglieder, Männer, Attentäter] [offender, delinquent, criminal, war criminal, robber, terrorists, man, members, men, assassin]

Page 18:

More Examples in English

Intersection of N2-Order collocation sets

Page 19:

Questions?

THANK YOU !

