39
MORSE: Semantic-ally Drive-n MORpheme SEgment-er Tarek Sakakini Suma Bhat Pramod Viswanath Samuel MORSE minimized the number of on-off clicks for non-verbal communication. This MORSE minimizes the vocabulary size for Natural Language Processing systems.

ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

  • Upload
    others

  • View
    1

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

MORSE:Semantic-ally Drive-n MORpheme SEgment-er

Tarek Sakakini Suma Bhat Pramod Viswanath

Samuel MORSE minimized the number of on-off clicks for non-verbal communication.

This MORSE minimizes the vocabulary size for Natural Language Processing systems.

Page 2: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Morpheme Segmentation1

Page 3: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Morpheme Segmentation

Hopefully

Page 4: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Not a trivial task

+ing

PlayerPlayingBeijing

Butterflies

s

+er+s

Page 5: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Machine Translation

Applications

Quick

• Rapide

ly

• ment

Sad

• Triste

Sadly

Tristement

Quickly

•Rapidement

Sad

•Triste

Sadly

???

Model:

Test:

Model:

Test:

Page 6: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

ApplicationsInformation Retrieval

Here at Toyota World, we have the cheapest cars in town.We are proudly called the first and last stop.……

Page 7: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Previous Work2

Page 8: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Letter Successor Variety (Harris, 1970)

H e l p l e s s l y

Page 9: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Morfessor (Creutz and Lagos, 2005)

Help: 2387

Helping: 1586

Helper: 498

Helps: 2437

Jump: 1847

Jumping: 1664

Jumper: 1290

Jumps: 2987

Page 10: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Downsides

FreshmanButterfl iesButterfly ies

Page 11: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Locally Semantic

car caring

car cars

Cosine similarity

(Schone and Jurafsky, 2000)(Narasimhan et al., 2015)(Luo et al., 2017)

Page 12: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Distinguishing criteria

car cars

car cars

player players

runner runners

goal goals

play plays

fine fines

wheel wheels

hand hands

laptop laptops

lab labs

Page 13: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

MORSE3

UnsupervisedMorphology Learning

Input:Word Embeddings

Segmentation:Optimization Problem

4 hyperparameters:Small tuning dataset

Page 14: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Learning Morphology

Step 1

Page 15: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Collecting candidate morphological rules

Vocabulary: jump play buy jumping playing buying jumper player buyer ….. and stand

(suf, ∅, ing):

(suf, ∅, er):

(pre, ∅, st):

(jump, jumping) (play, playing) (buy, buying)

(jump, jumper) (play, player) (buy, buyer)

(and, stand) (ore, store) (one, stone)

(jump, jumping)(play, playing)(buy, buying) (and, stand)

(Soricut and Och, 2015)

Page 16: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Signals

Orthographic SemanticWord Embeddings

quick quicklyquick

quicklywrong

wrongly

beautiful

beautifullyconfident

confidently

beautiful beautifully

confident confidently

wrong wrongly

Page 17: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

What makes a good rule?Signal 1: Orthography

(beautiful, beautifully)(quick, quickly)

(confident, confidently)

Rule = (suf, ∅, ly) Size = 8723

……………………………………………………………. (wrong, wrongly)

Rule = (pre, ∅, st) Size= 16

(ore, store) ……(amp, stamp)

Page 18: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

What makes a good rule?Signal 2: Semantics

ore

store

ampstamp

and

stand

one

stone

quick

quicklywrong

wrongly

beautiful

beautifullyconfident

confidently

Page 19: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

What makes a good member of a rule?

Scope: Vocabulary-Wide

quick

quickly

wrong

wrongly

beautiful

beautifully

confident

confidently

on

only

Page 20: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

What makes a good member of a rule?

Scope: Local

confident

confidentlyon

only

Page 21: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Segmenting

Step 2

Page 22: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Linear Optimization Problem

uncaring

(caring, uncaring)(ring, uncaring) (uncare, uncaring)

t1

t2

t3

t4

Page 23: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

(care, caring)(car, caring)

Iterate

caring

t1

t2

t3

t4

(carol, caring)

un + caring

Page 24: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

(car, care)

Iterate

care

t1

t2

t3

t4

(ca, care) (re, care)

un + care + ing

Page 25: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Experiments4

Page 26: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Experimental Setup

Training Languages Gold Datasets

Morpho Challenge

jumping jump ingplaying play ingjumps jump scalls call srooms room s

Page 27: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Experiments

64.35

31.0134.06

70.32

38.07

14.98

0

10

20

30

40

50

60

70

80

English Turkish Finnish

Morfessor MORSE

Page 28: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Morpho Challenge downsides

Business

Turning-point Player’sTurning

Non-compositional

Human error

Trivial instances

Page 29: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Experiments

◉ 2000 words

◉ Compositional

◉ 91% inter-annotator agreement

◉ In canonical (butterfly + ies) and non-canonical version (butterfl + ies)

New Dataset: SD17

Page 30: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Results on SD17

57.31

81.01 83.96

0

10

20

30

40

50

60

70

80

90

Morfessor MORSE (tuned on MC)

MORSE (tuned on SD17)

F-score

Page 31: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Against state-of-the-art

83.9679.9

67.4 67.14

0

10

20

30

40

50

60

70

80

90

MORSE MorphoChain Morfessor S + W Morfessor S + W+ LF-Scores

Page 32: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Negative Dataset

◉ 100 words like: honeymoon, passport, outdoors

◉ Checks for robustness

43

70

5

10

15

20

25

30

35

40

45

50

Morfessor MORSE

#Segmentations

Page 33: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Looking forward

◉ Robustness to highly agglutinative languages

◉ Extending to other languages (non-concatenative)

k a t a b ai A

Page 34: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Looking forward

◉Morphological mappings across languages

(suf, ∅, ly) (suf, ∅, ment)

(suf, ∅, s)

English French

(suf, ∅, es)

(suf, ∅, s)

Page 35: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Links

https://morse.mybluemix.net https://github.com/yoonlee95/morse_segmentation

Page 36: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Thank youQuestions?

Page 37: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Effect of Hyperparameters

PrecisionRecall

Page 38: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

PrerequisiteMorpho-syntactic regularities in word vectors

play

playing

jump

jumping

scream

screaming

s

sing

store

tore

scream

cream

smilemile

slay

lay

Valid rule with an invalid instance Invalid rule(suf, ∅, ing) (s, sing) (pre, ∅, s)

Page 39: ACL MORSE update · MORSE: Semantic-ally Drive-n MORphemeSEgment-er Tarek Sakakini Suma Bhat PramodViswanath Samuel MORSEminimized the number of on-off clicks for non-verbal communication

Demo4

morse.mybluemix.net