A Genetic Algorithm using Semantic Relations for Word Sense Disambiguation Proposal Michael Hausman

A Genetic Algorithm using Semantic Relations for Word Sense Disambiguation

Proposal

Michael Hausman

2

Overview

• Introduction

• Semantic Relations

– Cost Function 1

– Cost Function 2

• Genetic Algorithms

• Testing and Deliverables

• Schedule

3

Introduction• Word Sense Disambiguation

– Another way of saying, “which dictionary definition is correct in context”

– Main problem of the project

• Example: “Time flies like an arrow.”

– Time (noun #1) -- an instance or single occasion for some event

– Time (noun #5) -- the continuum of experience in which events pass from the future through the present to the past

– Fly (noun #1) -- two-winged insects characterized by active flight

– Fly (verb #1) -- travel through the air; be airborne;

– Fly (verb #2) -- move quickly or suddenly

– Arrow (noun #1) -- a mark to indicate a direction or relation

– Arrow (noun #2) -- a projectile with a straight thin shaft and an arrowhead on one end and stabilizing vanes on the other; intended to be shot from a bow

– Etc.

4

Semantic Relations• All relations become from WordNet

– Machine readable dictionary

– Contains several semantic relations

• Semantic Relations use some aspect of a word to define how closely two objects are related

• Early papers focus on one semantic relation

– Somewhat good results

• Later papers focus on two or three relations

– Better results over early papers

• Therefore, this project uses several semantic relations

5

Semantic Relations: Frequency

• WordNet orders senses (definitions) by how often it appears in the corpus used to make WordNet

• More common senses are more likely to be correct

SenseTotal(w): The total number of senses for this word

Sense(w): the sense number of the word currently in use

1. Fly -- travel through the air; be airborne; (Freq = 1)

2. Fly -- move quickly or suddenly (Freq = 0.5)

6

Semantic Relations: Hypernyms• A more generic way of saying a word

• Many hypernyms in a row creates a tree

– More specific down the tree

– Length of path between words is the similarity of the subjects

• Every definition has a different hypernym tree

– Bank: financial institution vs. bank: ground next to a river

LSO: most specific common hypernym

D: length from the root to LSO

A: length from w1 to LSO

B: length from w2 to LSO

Equation from Wu-Palmer (1994)

entity

…

organism

person

male female

boy girl

LSO

A B

D

animal

7

Semantic Relations: Coordinate Sisters

• Two words that have the same hypernym are coordinate sisters

• Slightly more specific ideas from the same concept

w1 and w2: the two words being compared

Coor(w*): all of the coordinate sisters of the word

Coor(w1) w∋ 2: if w2 is any of the coordinate sisters of w1

entity

…

organism

person

male female

boy girl

animal

8

Semantic Relations: Domain

• The word or collection a word belongs to

• A paragraph talks about the same collection


Dom(w*): the domain of the word given

• Computer Science

– Buffer, drive, cache, program, software

• Sports

– Skate, backpack, ball, foul, snorkel

9

Semantic Relations: Synonym

• Two words that can be interchanged in a given context are synonyms

• Many words in a paragraph may be interchangeable


Syn(w*): all of the synonyms of the word

Syn(w1) w∋ 2: if w2 is any of the synonyms of w1

• computer, computing machine, computing device, data processor, electronic computer, information processing system

• sport, athletics

• frolic, lark, rollick, skylark, disport, sport, cavort, gambol, frisk, romp, run around, lark about

10

Semantic Relations: Antonyms

• Two words that are opposites of each other in a given context are antonyms

• Comparisons tend to have opposites


Ant(w*): all of the anonyms of the word

Ant(w1) w∋ 2: if w2 is any of the antonyms of w1

• good, goodness

– evil, evilness

• pure

– defiled, maculate

• vague

– clear, distinct

11

Cost Function 1• Combining several semantic relations has better

results

• Simply separate a solution into every possible word pair combination and add the relations

• Highest scores are the best

c: the current solution

wi and wj: the current word pair combination

12

Cost Function 2• Semantic Relations only work up to a point

– Picking most frequent definition never picks the second definition

• Some semantic relations don’t work on all POS– Don’t have hypernyms for adjectives

• Weighting semantic relations for each POS could help

c: the current solution

wi and wj: the current word pair combination

POS: the part of speech

***Weight(POS): the weight for the relation and part of speech

13

Cost Function 2: Calculating Weights1. Take a translated file and separate the

parts of speech.

– Nouns, Adjectives, verbs, etc.

2. For each part of speech, find every possible word sense combination. Each word sense combination will be a solution.

3. Calculate the semantic relations for every solution.

4. For each semantic relation, make a scatter plot of a part of speech with every point representing a solution. The x axis is the ratio of correct senses. The y axis is the semantic relation calculation.

14

Cost Function 2: Calculating Weights5. Make a trend line for each scatter plot. This trend line should have the best

R2 value for the data. However, do not use a trend line that goes below zero.

6. Use these trend lines to make a system of equations. Note that there should be a different system of equations for each part of speech.

100 = freqnoun(100)*X + Hypnoun(100)*Y…Coornoun(100)*Z, 95 = freqnoun(95)*X + Hypnoun(95)*Y…Coornoun(95)*Z, etc.

7. Solve the system of equations for the weights

15

Why Genetic Algorithms?

• The cost functions give a way to calculate good solutions if not the best solution

• With a cost function, this is now an optimization problem

• Large solution space

– 5 senses per word for 100 words is 5100 = 7.888 * 1069 possible solutions

• Genetic algorithms use a subset of the solutions to make a good solution

• I like them

16

Genetic Algorithms: Overview

1. Start with a set of solutions (1st Generation)

2. Take original “parent” solutions and combine them with each other to create a new set of “child” solutions (Mating)

3. Somehow measure the solutions (Cost Function) and only keep the “better” half of the solutions

– Some may be from “parent” set, others are from “child” set

4. Introduce some random changes in case solutions are “stuck” or are all the same (Mutation)

5. Repeat starting with Step 2

17

Testing and Deliverables

• SemCor

– Princeton University took the brown corpus and tagged it with WordNet senses

– Common resource used by many researchers

– Many papers reference specific SemCor results

• Testing

– All training and testing will use SemCor

– Compare results to several papers

– Compare results with Michael Billot

• Deliverables includes any code, files, binaries, results, and reports from the project, and possibly a paper

18

ScheduleDate Milestone

November 15, 2010 Wrote any code relating to the cost functions or cost function calculations

November 30, 2010 Calculated the weights for the Weighted Semantic Relation Addition Cost Function

December 30, 2010 Wrote code for the mating, mutation, and genetic algorithm portions

January 15, 2011 Tested all code and have results

January 30, 2011 Contact Michael Billot to get his latest results

February ??, 2011 Paper written and submitted to advisors

February ??, 2011 Final presentation

19

Questions?

20

Mating• Dominant Gene Genetic Algorithm Mating

– Somehow calculate best genes and focus on them

– Mating combines two parent solutions to make two child solutions

• Keep the cost for each gene (word sense) as well as the entire chromosome (disambiguated paragraph)

• Mating two chromosomes

1. Keep the best genes (top half, top third, above average, etc.) from parent 1

2. Add the best genes from parent 2

3. Add the senses from lower genes from parent 1

4. Repeat after swapping both parents

21

Mutations

• Dominant Gene Genetic Algorithm Mutations

– Mating focuses on dominant (best cost) genes, so mutations must focus on the recessive (lower cost) genes

– Mutations tend to focus on one solution

• Keep the cost for each gene (word sense) as well as the entire chromosome (disambiguated paragraph)

• Possible Mutations

1. Randomly change one recessive gene

2. Literally calculate the best sense for a randomly picked semantic relation on a recessive gene

22

References1. Chunhui Zhang, Yiming Zhou, Trevor Martin, "Genetic Word Sense Disambiguation Algorithm," iita, vol. 1, pp.123-127,

2008 Second International Symposium on Intelligent Information Technology Application, 2008

2. Tsuruoka, Yoshimasa. “A part-of-speech tagger for English,” University of Tokyo, 2005. http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/postagger/

3. Princeton University, "About WordNet," WordNet, Princeton University, 2010. http://wordnet.princeton.edu

4. Basile, P., Degemmis, M., Gentile, A. L., Lops, P., and Semeraro, G. 2007. The JIGSAW Algorithm for Word Sense Disambiguation and Semantic Indexing of Documents. In Proceedings of the 10th Congress of the Italian Association For Artificial intelligence on AI*IA 2007: Artificial intelligence and Human-Oriented Computing (Rome, Italy, September 10 - 13, 2007). R. Basili and M. T. Pazienza, Eds. Lecture Notes In Artificial Intelligence. Springer-Verlag, Berlin, Heidelberg, 314-325

5. Hausman, Michael, “A Dominant Gene Genetic Algorithm for a Transposition Cipher in Cryptography,” University of Colorado at Colorado Springs, May 2009.

6. Hausman, Michael; Erickson, Derrick; “A Dominant Gene Genetic Algorithm for a Substitution Cipher in Cryptography,” University of Colorado at Colorado Springs, December 2009.

7. Princeton University, " The SemCor corpus," MultiSemCor, 2010. http://multisemcor.fbk.eu/semcor.php

8. Patwardhan, S., Banerjee, S., and Pedersen, T. 2003. Using measures of semantic relatedness for word sense disambiguation. In Proceedings of the Fourth International Conference on Intelligent Text Processing and Computational Linguistics, pp. 241–57, Mexico City, Mexico.

9. TORSTEN ZESCH and IRYNA GUREVYCH (2010). Wisdom of crowds versus wisdom of linguists – measuring the semantic relatedness of words. Natural Language Engineering, 16, pp 25-59

http://www-tsujii.is.s.u-tokyo.ac.jp/~tsuruoka/postagger/

http://wordnet.princeton.edu/

http://multisemcor.fbk.eu/semcor.php