Automated Suggestions for Miscollocations

Anne Li-E Liu, David Wible, Nai-lung Tsao

June 5, 2009


Overview

• Introduction

• Methodology

• Experimental Results

• Conclusion


Introduction

• Our study focuses on how to find suggestions for miscollocations automatically.

• In this paper, only verb-noun collocations and miscollocations are considered.


Introduction

• Howarth’s (1998) investigation of collocations found in L1 and L2 writers’ writing.

• Granger’s (1998) analysis of adverb-adjective collocations.

• Liu’s (2002) lexical semantic analysis of the verb-noun miscollocations in the English Taiwan Learner Corpus (TLC).


Introduction

Projects using learner corpora in analyzing and categorizing learner errors:

• NICT JLE (Japanese Learner English) Corpus

• The Chinese Learner English Corpus (CLEC)

• English Taiwan Learner Corpus (or TLC) (Wible et al., 2003).


An example

• She tries to *improve her students’ problems.

1. solve

2. pose

3. tackle

4. grapple

5. alleviate

6. overcome

7. exacerbate

8. compound

9. beset

10. resolve

reduce

V collocates from Collocation Explorer


Method

• Three features of collocate candidates are used:

1. Word association strength

2. Semantic similarity

3. Intercollocability (Cowie and Howarth, 1996)


Resource

• 84 verb-noun (VN) miscollocations in TLC (Liu, 2002).

Training data: 42; testing data: 42

• Two knowledge resources: BNC, WordNet

• Two human evaluators.


Word Association Strength

• Mutual Information (Church et al. 1991)

• Two purposes:

1. All suggested correct collocations have to be identified as collocations.

2. The higher the word association strength, the more likely it is to be a correct substitute for the wrong collocate.
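As a rough illustration of word association strength, the sketch below computes mutual information in its usual pointwise form, log2 of P(v, n) / (P(v) P(n)), from co-occurrence counts. The function name and all counts are hypothetical; the slides do not give the exact estimation procedure used on the BNC.

```python
import math

def mutual_information(pair_count, verb_count, noun_count, total_pairs):
    """Pointwise mutual information (base 2) of a verb-noun pair.

    Probabilities are estimated as relative frequencies over
    `total_pairs` observed verb-noun pairs (an assumed setup).
    """
    if pair_count == 0:
        return float("-inf")  # the pair was never observed together
    p_pair = pair_count / total_pairs
    p_verb = verb_count / total_pairs
    p_noun = noun_count / total_pairs
    return math.log2(p_pair / (p_verb * p_noun))

# Hypothetical counts, not taken from the BNC.
mi_solve_problem = mutual_information(
    pair_count=80, verb_count=200, noun_count=400, total_pairs=100_000
)
mi_improve_problem = mutual_information(
    pair_count=1, verb_count=500, noun_count=400, total_pairs=100_000
)
```

With these made-up counts the first pair scores well above the second, which is the behaviour the second purpose above relies on: a stronger association makes a candidate a more plausible substitute.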


Semantic Similarity

• A semantic relation holds between a miscollocate and its correct counterpart (Gitsaki et al., 2000; Liu, 2002).

• The synsets of WordNet are treated as nodes in a graph, and the graph-theoretic distance between them is measured.

*say a story -> tell a story (synonymous relation)

*say a story -> think of a story (hypernymy relation)


Semantic Similarity

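$$
\mathrm{sim}(w_1, w_2) = \max_{s_i \in \mathrm{synsets}(w_1),\; s_j \in \mathrm{synsets}(w_2)} \left( 1 - \frac{\mathrm{dis}(s_i, s_j)}{2 \times \max(L_{s_i}, L_{s_j})} \right)
$$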
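A minimal sketch of this kind of similarity score: take the best synset pair, and score it as 1 minus the path distance divided by twice the larger level. It runs on a tiny hand-made hierarchy instead of WordNet. Two assumptions are made here: L is read as the level (depth) of a synset with the root at level 1, and the hierarchy is a tree. The synset names and the word-to-synset mapping are hypothetical.

```python
# Toy hypernym tree: child synset -> parent synset (None marks the root).
PARENT = {
    "root": None,
    "express": "root",
    "utter": "express",
    "narrate": "express",
    "cogitate": "root",
}

# Hypothetical word -> synsets mapping (not real WordNet entries).
SYNSETS = {
    "say": ["utter"],
    "tell": ["utter", "narrate"],
    "think": ["cogitate"],
}

def ancestors(synset):
    """Path from `synset` up to the root, starting with the synset itself."""
    path = []
    while synset is not None:
        path.append(synset)
        synset = PARENT[synset]
    return path

def level(synset):
    """Level of a synset in the tree; the root is level 1 (assumed reading of L)."""
    return len(ancestors(synset))

def dis(s1, s2):
    """Number of edges on the tree path between two synsets."""
    up1, up2 = ancestors(s1), ancestors(s2)
    common = next(a for a in up1 if a in up2)  # lowest common ancestor
    return up1.index(common) + up2.index(common)

def sim(w1, w2):
    """Best score over all synset pairs of the two words."""
    return max(
        1 - dis(s1, s2) / (2 * max(level(s1), level(s2)))
        for s1 in SYNSETS[w1]
        for s2 in SYNSETS[w2]
    )
```

In this toy setup, words that share a synset score 1.0, and the score drops as the path between their closest synsets grows.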


Intercollocability

• Cowie and Howarth (1996) propose that certain collocations form clusters on the basis of the shared meaning.

Example collocations in one cluster:

• convey message, convey point, convey feeling

• get across point, get across the message

• express concern

• communicate feeling, communicate concern


Intercollocability

• Collocations in a cluster show a certain degree of intercollocability.

express one’s concern / condolences

convey message, get across point, express concern, communicate feeling

[Figure: express and communicate set against concern and feeling, with one pairing marked "?"]


Intercollocability

She tries to *improve her students’ problems.

Starting point: *improve problem

• improve has 52 noun collocates.

• problem has 86 verb collocates.

Do any of the 86 verbs co-occur with the 52 nouns?

• resolve / improve + situation, matter, way

• reduce / improve + quality, efficiency, effectiveness

• resolve problem, reduce problem


Intercollocability

• The cluster is partially created, and the link between improve, resolve and reduce is established by virtue of their overlapping noun collocates.

[Figure: improve, resolve and reduce linked through the nouns situation, matter, way, quality, efficiency, effectiveness and problem]


Intercollocability

• Intercollocability is quantified as the number of shared collocates.


shared collocate (resolve, improve) = 3

shared collocate (reduce, improve) = 3

The more collocates a verb shares with the wrong verb, the more likely it is to be a good candidate.
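The counting step can be sketched in a few lines of Python. The collocate sets below are a toy reconstruction of the improve / resolve / reduce example, not the full lists of 52 nouns and 86 verbs, and the function names are hypothetical.

```python
# Toy noun-collocate sets rebuilt from the example; real sets would come
# from a corpus such as the BNC.
NOUN_COLLOCATES = {
    "improve": {"situation", "matter", "way", "quality", "efficiency", "effectiveness"},
    "resolve": {"problem", "situation", "matter", "way"},
    "reduce": {"problem", "quality", "efficiency", "effectiveness"},
}

def shared_collocates(candidate, wrong_verb, collocates=NOUN_COLLOCATES):
    """Number of noun collocates the candidate verb shares with the wrong verb."""
    return len(collocates[candidate] & collocates[wrong_verb])

def rank_candidates(wrong_verb, noun, collocates=NOUN_COLLOCATES):
    """Verbs that collocate with `noun`, ordered by shared-collocate count.

    Ties are broken alphabetically, an arbitrary choice made for this sketch.
    """
    candidates = [
        verb for verb, nouns in collocates.items()
        if noun in nouns and verb != wrong_verb
    ]
    return sorted(
        candidates,
        key=lambda verb: (-shared_collocates(verb, wrong_verb, collocates), verb),
    )

ranking = rank_candidates("improve", "problem")
```

Here both resolve and reduce share three nouns with improve, so they tie; in practice the other two features would help separate such candidates.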


Integrate the 3 features

• The probabilistic model

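$$
P(S_c \mid F_{c,m}) = \frac{P(F_{c,m} \mid S_c)\, P(S_c)}{P(F_{c,m})} \approx \frac{\prod_{f \in F_{c,m}} P(f \mid S_c)\; P(S_c)}{\prod_{f \in F_{c,m}} P(f)}
$$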
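One way to read the probabilistic model is as a naive-Bayes-style score: multiply the per-feature likelihoods P(f | S_c) by the prior P(S_c) and divide by the product of the per-feature marginals P(f). The sketch below assumes each feature has already been mapped to a discrete level; the function name and every probability in the tables are hypothetical.

```python
def correctness_score(feature_levels, p_given_correct, p_marginal, p_correct):
    """Approximate P(S_c | F) under a feature-independence assumption.

    feature_levels  : dict feature name -> observed level for one candidate
    p_given_correct : dict feature name -> {level: P(level | S_c)}
    p_marginal      : dict feature name -> {level: P(level)}
    p_correct       : prior P(S_c)
    """
    numerator = p_correct
    denominator = 1.0
    for feature, level in feature_levels.items():
        numerator *= p_given_correct[feature][level]
        denominator *= p_marginal[feature][level]
    if denominator == 0.0:
        return 0.0  # unseen level; a fuller system would smooth instead
    return numerator / denominator

# Hypothetical probability tables for two of the three features.
P_GIVEN_CORRECT = {"MI": {3: 0.5, 0: 0.1}, "SS": {2: 0.4, 0: 0.2}}
P_MARGINAL = {"MI": {3: 0.2, 0: 0.4}, "SS": {2: 0.25, 0: 0.5}}
P_CORRECT = 0.1

strong = correctness_score({"MI": 3, "SS": 2}, P_GIVEN_CORRECT, P_MARGINAL, P_CORRECT)
weak = correctness_score({"MI": 0, "SS": 0}, P_GIVEN_CORRECT, P_MARGINAL, P_CORRECT)
```

Because the factorization is only an approximation, the result is not guaranteed to stay below 1, so it is best treated as a ranking score. Dropping a feature from `feature_levels` gives a reduced model that uses the remaining features only.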


Training

• Probability distribution of word association strength

MI values are discretized into 5 levels (<1.5, 1.5~3.0, 3.0~4.5, 4.5~6, >6).

Estimated distributions: P(MI level) and P(MI level | S_c)
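A sketch of how the two distributions could be estimated from labelled training pairs by relative frequency. The level boundaries are the ones listed for MI; which level a boundary value itself belongs to is not stated, so the lower-inclusive choice here is an assumption, as are the function names and the training pairs.

```python
from bisect import bisect_right
from collections import Counter

# Boundaries between the five MI levels; boundary handling is assumed.
MI_BOUNDS = [1.5, 3.0, 4.5, 6.0]

def mi_level(mi):
    """Map an MI value to a level index 0..4 (0 is '<1.5', 4 is the top level)."""
    return bisect_right(MI_BOUNDS, mi)

def level_distributions(examples):
    """Estimate P(level) and P(level | S_c) by relative frequency.

    `examples` is a non-empty list of (mi_value, is_correct) pairs, where
    is_correct marks candidates judged to be correct substitutes.
    """
    if not examples:
        raise ValueError("need at least one training example")
    levels = [(mi_level(mi), ok) for mi, ok in examples]
    all_counts = Counter(lvl for lvl, _ in levels)
    correct_counts = Counter(lvl for lvl, ok in levels if ok)
    n_all = len(levels)
    n_correct = sum(correct_counts.values())
    p_level = {lvl: all_counts[lvl] / n_all for lvl in range(5)}
    p_level_given_correct = {
        lvl: (correct_counts[lvl] / n_correct if n_correct else 0.0)
        for lvl in range(5)
    }
    return p_level, p_level_given_correct

# Hypothetical training pairs, not data from the study.
EXAMPLES = [(0.5, False), (2.0, False), (5.0, True), (7.2, True), (7.5, False)]
P_LEVEL, P_LEVEL_GIVEN_CORRECT = level_distributions(EXAMPLES)
```

No smoothing is applied, so levels never seen with a correct candidate get probability 0. The same pattern would apply to the other two features with their own level boundaries.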


Training

• Probability distribution of semantic similarity

Similarity scores are discretized into 5 levels (0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8~1.0).

Estimated distributions: P(SS level) and P(SS level | S_c)


Training

• Probability distribution of intercollocability

The normalized number of shared collocates is discretized into 5 levels (0.0~0.2, 0.2~0.4, 0.4~0.6, 0.6~0.8 and 0.8~1.0).

Estimated distributions: P(SC level) and P(SC level | S_c)


Experiments

• Different combinations of the three features.

Model   Feature(s) considered
M1      MI (Mutual Information)
M2      SS (Semantic Similarity)
M3      SC (Shared Collocates)
M4      MI + SS
M5      MI + SC
M6      SS + SC
M7      MI + SS + SC


Results

K-Best   M1      M2 (SS)   M3      M4      M5      M6 (SS+SC)   M7 (MI+SS+SC)
1        16.67   40.48     22.62   48.81   29.76   55.95        53.75
2        36.90   53.45     38.10   60.71   44.05   63.10        67.86
3        47.62   64.29     50.00   71.43   59.52   77.38        78.57
4        52.38   67.86     63.10   77.38   72.62   80.95        82.14
5        64.29   75.00     72.62   83.33   78.57   83.33        85.71
6        65.48   77.38     75.00   85.71   83.33   84.52        88.10
7        67.86   77.38     77.38   86.90   86.90   86.90        89.29
8        70.24   80.95     82.14   86.90   89.29   88.10        91.67
9        72.62   83.33     85.71   88.10   92.86   90.48        92.86
10       76.19   86.90     88.10   88.10   94.05   90.48        94.05
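A K-best figure of this kind can be computed as sketched below, assuming each cell is the percentage of test miscollocations for which at least one acceptable correction appears among the top K suggestions. That reading is an assumption, since the slides do not spell out the metric. The rankings are the M2 suggestions shown for three example miscollocations; the sets of acceptable corrections are hypothetical stand-ins for the evaluators' judgements.

```python
def k_best_accuracy(ranked_suggestions, acceptable, k):
    """Percentage of items with an acceptable correction in the top k."""
    hits = sum(
        1
        for item, ranking in ranked_suggestions.items()
        if any(verb in acceptable[item] for verb in ranking[:k])
    )
    return 100.0 * hits / len(ranked_suggestions)

# M2 suggestions for three example miscollocations.
M2_RANKINGS = {
    "get knowledge": ["aim", "generate", "draw", "obtain", "develop"],
    "reach purpose": ["achieve", "teach", "explain", "account", "trade"],
    "pay time": ["devote", "spend", "expend", "spare", "invest"],
}

# Hypothetical acceptable corrections, standing in for human judgements.
ACCEPTABLE = {
    "get knowledge": {"acquire", "gain", "obtain"},
    "reach purpose": {"achieve", "fulfill", "serve"},
    "pay time": {"spend", "devote"},
}

top1 = k_best_accuracy(M2_RANKINGS, ACCEPTABLE, k=1)
top4 = k_best_accuracy(M2_RANKINGS, ACCEPTABLE, k=4)
```

With these stand-in judgements the score rises between K = 1 and K = 4, because an acceptable verb for one item only appears at rank 4.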


Results (cont.)

The K-Best suggestions for *get knowledge.

K-Best   M2         M6         M7
1        aim        obtain     acquire
2        generate   share      share
3        draw       develop    obtain
4        obtain     generate   develop
5        develop    acquire    gain


The K-Best suggestions for *reach purpose.

K-Best   M2        M6         M7
1        achieve   achieve    achieve
2        teach     account    account
3        explain   trade      trade
4        account   treat      fulfill
5        trade     allocate   serve


The K-Best suggestions for *pay time.

K-Best   M2       M6       M7
1        devote   spend    spend
2        spend    invest   waste
3        expend   devote   devote
4        spare    date     invest
5        invest   waste    date


Conclusion

• A probabilistic model integrates the three features.

• Early experimental results show the potential of this research.


Future work

• Applying such mechanisms to other types of miscollocations.

• Miscollocation detection will be one of the main points of this research.

• A larger number of miscollocations should be included in order to verify our approach.


Thank you!

Q & A

Anne Li-E Liu

David Wible

Nai-Lung Tsao