63
Evaluation Measure in Language Acquisition Charles Yang University of Pennsylvania LING50 MIT

Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Measure inLanguage Acquisition

Charles YangUniversity of Pennsylvania

LING50MIT

Page 2: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

From LSLT to Aspects: A Psychological Turn

Page 3: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

From LSLT to Aspects: A Psychological Turn

• LSLT

• “any simplification along these lines is immediately reflected in the length of the grammar” (§26: MDL)

• “defined the best analysis as the one that minimizes information per word in the generated language of grammatical discourses” (§35.4: entropy)

• 23.771 Mathematical Backgrounds for Communication (Hall/Partee)

Page 4: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

From LSLT to Aspects: A Psychological Turn

• LSLT

• “any simplification along these lines is immediately reflected in the length of the grammar” (§26: MDL)

• “defined the best analysis as the one that minimizes information per word in the generated language of grammatical discourses” (§35.4: entropy)

• 23.771 Mathematical Backgrounds for Communication (Hall/Partee)

• Aspects

• “... theories require supplementation by an evaluation measure if language acquisition is to be accounted for ... such a measure is not given a priori, in some manner. Rather, any proposal concerning such a measure is an empirical hypothesis about the nature of language” (p37)

Page 5: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Three Psychological Conditions

Page 6: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Three Psychological Conditions

• Formal sufficiency: Does the evaluation measure choose the “correct”grammar?

• Distributional methods: Harris (1951), Fowler (1952), Holt (1953)

Page 7: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Three Psychological Conditions

• Formal sufficiency: Does the evaluation measure choose the “correct”grammar?

• Distributional methods: Harris (1951), Fowler (1952), Holt (1953)

• Ecological validity: Does the evaluation measure operate under reasonable assumptions about the learning data and mechanisms?

Page 8: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Three Psychological Conditions

• Formal sufficiency: Does the evaluation measure choose the “correct”grammar?

• Distributional methods: Harris (1951), Fowler (1952), Holt (1953)

• Ecological validity: Does the evaluation measure operate under reasonable assumptions about the learning data and mechanisms?

• Developmental compatibility: Does the evaluation measure employed by the learner produce similar developmental patterns in language acquisition?

Page 9: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL: An Evaluation Measure

DL(D,G) = |G|+ log

1

p(D|G)

Page 10: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL: An Evaluation Measure

• MDL is similar with (or equivalent to) many other approaches such as Bayesian inference

DL(D,G) = |G|+ log

1

p(D|G)

Page 11: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL: An Evaluation Measure

• MDL is similar with (or equivalent to) many other approaches such as Bayesian inference

• A method for hypothesis selection rather than hypothesis proposing

• Three conditions for choosing alternative methods, e.g., reinforcement learning, Fourier transform

DL(D,G) = |G|+ log

1

p(D|G)

Page 12: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL: An Evaluation Measure

• MDL is similar with (or equivalent to) many other approaches such as Bayesian inference

• A method for hypothesis selection rather than hypothesis proposing

• Three conditions for choosing alternative methods, e.g., reinforcement learning, Fourier transform

• The composition of data

DL(D,G) = |G|+ log

1

p(D|G)

Page 13: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL: An Evaluation Measure

• MDL is similar with (or equivalent to) many other approaches such as Bayesian inference

• A method for hypothesis selection rather than hypothesis proposing

• Three conditions for choosing alternative methods, e.g., reinforcement learning, Fourier transform

• The composition of data

• Subset Principle (Berwick 1985): the first Evaluation Measure to influence empirical work in language acquisition

DL(D,G) = |G|+ log

1

p(D|G)

Page 14: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Measuring Parameters

Page 15: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Measuring Parameters

• Aspects: “We want the hypotheses compatible with fixed data to be “scattered” in value, so that choice among them can be made relatively easily” (p61)

Page 16: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Measuring Parameters

• Aspects: “We want the hypotheses compatible with fixed data to be “scattered” in value, so that choice among them can be made relatively easily” (p61)

• Parameters can be viewed as low dimensional description of syntactic variation, or MDL

Page 17: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Measuring Parameters

• Aspects: “We want the hypotheses compatible with fixed data to be “scattered” in value, so that choice among them can be made relatively easily” (p61)

• Parameters can be viewed as low dimensional description of syntactic variation, or MDL

• CUNY CoLAG Parameter Domain (Sakas & J.D.Fodor in press): 13 parameters, 3072 grammars, 48086 distinct degree-0 sentences

• Most parameters are favorable for the learner and can be set independently (thus “scattered” well)

Page 18: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Measuring Parameters

• Aspects: “We want the hypotheses compatible with fixed data to be “scattered” in value, so that choice among them can be made relatively easily” (p61)

• Parameters can be viewed as low dimensional description of syntactic variation, or MDL

• CUNY CoLAG Parameter Domain (Sakas & J.D.Fodor in press): 13 parameters, 3072 grammars, 48086 distinct degree-0 sentences

• Most parameters are favorable for the learner and can be set independently (thus “scattered” well)

• Evidence for parameters in language acquisition

Page 19: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL in Action

Page 20: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL in Action

fly

bring

blow

think

catch

draw

Add -dwalk(Regulars)

flew

brought

blew

thought

caught

drew

walked(Regulars)

Stem Past tense

fly

bring

blow

think

catch

draw

Add -dwalk(Regulars)

flew

brought

blew

thought

caught

drew

walked(Regulars)

Stem Past tense

associations

Page 21: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL in Action

Page 22: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL in Action

Page 23: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL in Action

flybringblowthinkcatchdraw

add -t & Rime !/a/

add -ø & Rime !/u/

Add -dwalk(Regulars)

flewbroughtblewthoughtcaughtdrewwalked(Regulars)

Stem Past tense

Page 24: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

MDL in Action

flybringblowthinkcatchdraw

add -t & Rime !/a/

add -ø & Rime !/u/

Add -dwalk(Regulars)

flewbroughtblewthoughtcaughtdrewwalked(Regulars)

Stem Past tense

“ ... the acceptance of these Laws (Grimm’s and Verner’s) as historicalfact is based wholly on considerations of simplicity”

Halle (1961: On the role of simplicity in linguistic descriptions)

Page 25: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evidence from Children• Largest past tense analysis to date (Gorman & Yang 2011)

• Free-rider effect: verbs belonging to larger rules learned better

Page 26: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evidence from Children• Largest past tense analysis to date (Gorman & Yang 2011)

• Free-rider effect: verbs belonging to larger rules learned better

Spearman ρ Kendall τ G-K γ☞Abstract rules 0.276 0.191 0.202

Surface rules 0.267 0.180 0.190Words only 0.128 0.133 0.140

Page 27: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evidence from Children• Largest past tense analysis to date (Gorman & Yang 2011)

• Free-rider effect: verbs belonging to larger rules learned better

Spearman ρ Kendall τ G-K γ☞Abstract rules 0.276 0.191 0.202

Surface rules 0.267 0.180 0.190Words only 0.128 0.133 0.140

Results and discussion • The effects of verb and pattern frequency on CUR (aggregated across all subjects) can be compared with rank correlation coefficients

• Similarly, verbs with low verb frequency and high pattern frequency verbs have significantly higher CURs than verbs with high verb frequency/low pattern frequency (two-tailed Mann-Whitney W = 193, p = 0.035)

• Both these results indicate that pattern frequency trumps verb frequency

•The relative effects of verb and pattern frequency are finally analyzed by fitting a logistic mixed effects regression model with the maximal random effects structure; as verb and pattern frequency are closely correlated (R = 0.635), log verb frequency is first residualized [14] against log pattern frequency

• Increases in both pattern and verb frequency result in fewer overregularizations

• The results are consistent with linguistic models [3, 6, 7, 15] which derive irregular pasts by stem change and suffixation, but storage models have no locus for the free-rider effect

• Future work will use this data to evaluate similarity-based models [5, 15, 16]

English irregular past tense errors and the free-rider effect

[1] S. Kuczaj. 1977. JoVL&VB 16. [2] S. Prasada, S. Pinker. 1993. L&CP 8 [3] M. Halle, K.P. Mohanan. 1985. LI 16. [4] J. Bybee, D. Slobin. 1982. Lg 58. [5] J. McClelland, K. Patterson. 2002. TiCS 6. [6] L. Stockall, A. Marantz. 2006. Mental Lexicon 1. [7] C. Yang. 2002. Knowledge and learning in natural language. Oxford: OUP. [8] R. Brown. 1973. A first language: The early stages. Cambridge: HUP. [9] K. Demuth, J. Culbertson, J. Alter. 2006. L&S 49. [10] W. Hall, W. Nagy, R. Linn. 1984. Spoken words: Effects of situation and social group on oral word usage and frequency. Hillsdale, NJ: Erlbaum. [11] G. Marcus, S. Pinker, M. Ullman, M. Hollander, J. Rosen, F. Xu. 1992. Overregularization in language acquisition. Chicago: UCP. [12] R. Maslen, A. Theakston, E. Lieven, M. Tomasello. 2004. JoSL&HR 47. [13] E. Brill. 1995. Computational Linguistics 21. [14] K. Gorman. 2010. In PWPL 16.2. [15] A. Albright and B. Hayes. 2003. Cognition 90. [16] S. Chandler. 2010. Cognitive Lingustics 21.

Kyle Gormanad • David Fabera • Elika Bergelsonbd • Charles Yangabcd

aDept. of Linguistics • bDept. of Psychology • cDept. of Computer & Information Science • dInstitute for Research in Cognitive Science

The debate • SOME BACKGROUND ON THE SUBJECT

• Storage accounts [1, 2]

• Other accounts, both symbolic [3, 4] and connectionist [5]

The free-rider effect • (definition of overregularization w/ example)

• (say we’ll ignore *broughted and similar errors)

• Yang [7] identifies the free-rider effect: children are quick to master low-frequency irregulars (e.g, wake/woke, forget/forgot) if they follow the same pattern as high-frequency irregulars (break/broke, get/got)

• This requires some representation to be shared by wake and break, and thus is antithetical to the storage account

• Recent experiments on adults also provide support for the derivational account: irregular past tense verbs (e.g., woke) fully prime their base (e.g., wake) in lexical decision [6]

• This is the largest ever study of children’s overregularization errors

Methods • The raw data are CHILDES transcripts of the spontaneous speech of 50 children ages 2-5 who are acquiring monolingual English and have no known cognitive or hearing deficits [8-12]

• A part-of-speech tagger [13] is used to locate verbs in the transcripts; correct and incorrect past tense tokens of irregular verbs are automatically tabulated and then verified by hand; however, immediate repetitions of a verb by a child are counted only one time

• A separate script aggregates the speech of parents and computes verb input frequency

• This coding is validated by comparing the resulting per-verb correct usage rate (CUR) for each of the children studied by Marcus et al. [11] using the Spearman rank correlation coefficient; the results indicates a close match (Adam: ρ = 0.92, Eve: ρ = 0.99, Sarah: ρ = 0.97)

•The procedure yields 11,735 tokens of 97 irregular verb types, and 839 overregularization tokens across 75 verb types (CUR = 92.9%)

• Irregular verbs are grouped into 22 patterns according to the stem change and/or suffix found in the past tense (relative to the present)

• Unlike prior attempts to categorize the English irregular verbs [4, 7, 11], this larger set of patterns uniquely determines the observed past tense from the present form

Pattern Example N Pattern Example N No change burst/burst 13 [ɩ] slide/slid 3

[oʊ] drive/drove 10 [ɔ] wear/wore 3

[ʌ] dig/dug 9 [oʊ...-d] sell/sold 2

[æ] sit/sat 8 [aʊ] find/found 2

[u:] blow/blew 7 [ɛ...-d] say/said 1

[ɛ] breed/bred 7 [ɔ...-t] lose/lost 1

[ɛ...-t] keep/kept 7 [ɛɚ...-d] hear/heard 1

[-t] build/built 5 was be/was 1

[eɩ] eat/ate 5 went go/went 1

rime→[ɔt] buy/bought 5 make make/made 1

[ɑ] shoot/shot 3 stood stand/stood 1

Spearman ρ Kendall τb G-K γ Pattern frequency 0.267 0.180 0.190

Verb frequency 0.128 0.133 0.144

Estimate Std. err. p-value Log pattern frequency 0.488 0.160 0.002

Resid. log verb frequency 0.413 0.081 4.2e-7

< avg. rule freq. > avg. verb freq. > avg. rule freq. < avg. verb freq.

two-tailed Mann-Whitney W=156.5, p=0.019

Page 28: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Yesterday he ____

Page 29: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Yesterday he ____

Page 30: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Yesterday he ____

• The forgotten Wug test (Berko 1958)

• Only one out of 86 children produced bing-bang, gling-glang

Page 31: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Yesterday he ____

• The forgotten Wug test (Berko 1958)

• Only one out of 86 children produced bing-bang, gling-glang

• Children over-regularize: 8-10%

Page 32: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Yesterday he ____

• The forgotten Wug test (Berko 1958)

• Only one out of 86 children produced bing-bang, gling-glang

• Children over-regularize: 8-10%

• Children over-irregularize: <0.2%

Page 33: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Yesterday he ____

• The forgotten Wug test (Berko 1958)

• Only one out of 86 children produced bing-bang, gling-glang

• Children over-regularize: 8-10%

• Children over-irregularize: <0.2%

• Children’s Evaluation Measure produces a binary outcome: productive or lexical

• probability spreading insufficient

Page 34: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Exceptions in Evaluation Measure

Page 35: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Exceptions in Evaluation Measure

“core”, “basic word order”, “default case”, “unmarked form”vs.

“periphery”, “lexical listing”, “exceptional marking”, “diacritics”

Page 36: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Exceptions in Evaluation Measure

• SPE: “Clearly, we must design our linguistic theory in such a way that the existence of exceptions does not prevent the systematic formulation of those regularities that remain ... Finally, an overriding consideration is that the evaluation measure must be designed in such a way that the wider and more varied the class of exceptions to a rule, the less highly valued is the grammar” (p172)

“core”, “basic word order”, “default case”, “unmarked form”vs.

“periphery”, “lexical listing”, “exceptional marking”, “diacritics”

Page 37: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Exceptions in Evaluation Measure

• SPE: “Clearly, we must design our linguistic theory in such a way that the existence of exceptions does not prevent the systematic formulation of those regularities that remain ... Finally, an overriding consideration is that the evaluation measure must be designed in such a way that the wider and more varied the class of exceptions to a rule, the less highly valued is the grammar” (p172)

• But majority doesn’t rule: 90% of English words in speech are stress initial (Cutler & Carter 1987); Legate & Yang poster

“core”, “basic word order”, “default case”, “unmarked form”vs.

“periphery”, “lexical listing”, “exceptional marking”, “diacritics”

Page 38: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Measuring Rules

Page 39: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Measuring Rules

Page 40: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Measuring Rules

• Optimization again: instead of space, it’s time

Page 41: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Measuring Rules

• Optimization again: instead of space, it’s time

• Exceptions = Real time processing slowdown

• “kicked the bucket” faster than “lifted the bucket” by 51ms (Swinney & Cutler 1979)

• Production: German irregular past participle (-n) faster than regular (-t) by 38ms (Clahsen & Fleischhauer 2011)

• Lexical decision: English irregular verbs faster than regulars by 19ms (English Lexicon Project; Lignos 2011)

Page 42: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Measuring Rules

• Optimization again: instead of space, it’s time

• Exceptions = Real time processing slowdown

• “kicked the bucket” faster than “lifted the bucket” by 51ms (Swinney & Cutler 1979)

• Production: German irregular past participle (-n) faster than regular (-t) by 38ms (Clahsen & Fleischhauer 2011)

• Lexical decision: English irregular verbs faster than regulars by 19ms (English Lexicon Project; Lignos 2011)

• Exceptions delay rule computation

Page 43: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Measuring Rules

• Optimization again: instead of space, it’s time

• Exceptions = Real time processing slowdown

• “kicked the bucket” faster than “lifted the bucket” by 51ms (Swinney & Cutler 1979)

• Production: German irregular past participle (-n) faster than regular (-t) by 38ms (Clahsen & Fleischhauer 2011)

• Lexical decision: English irregular verbs faster than regulars by 19ms (English Lexicon Project; Lignos 2011)

• Exceptions delay rule computation

•Exception 1•Exception 2 •Exception 3•...•Rule

Page 44: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Tolerance Principle

Page 45: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Tolerance Principle

N

lnN

To be productive, the maximum exceptions to a rule/process applicable to N items is

Page 46: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Tolerance Principle

N

lnN

To be productive, the maximum exceptions to a rule/process applicable to N items is

• If English has 150 irregular verbs, we need 900 regulars to have a productive -ed rule: 1050/ln(1050) = 150

Page 47: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Tolerance Principle

N

lnN

To be productive, the maximum exceptions to a rule/process applicable to N items is

• If English has 150 irregular verbs, we need 900 regulars to have a productive -ed rule: 1050/ln(1050) = 150

• Children start over-regularization when they reach the tipping point

Page 48: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Tolerance Principle

N

lnN

To be productive, the maximum exceptions to a rule/process applicable to N items is

• If English has 150 irregular verbs, we need 900 regulars to have a productive -ed rule: 1050/ln(1050) = 150

• Children start over-regularization when they reach the tipping point

• N (e.g., vocabulary size) and the number of exceptions may vary from speaker to speaker, accounting for certain individual patterns in language acquisition and sociolinguistic variation

Page 49: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

A Birth-er Problem

Page 50: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

A Birth-er Problem

• The suffix -er is productive and segmented in real time even for broth-er (Rastle, Davis & New 2004) resulting in slowdown (Lignos 2011)

Page 51: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

A Birth-er Problem

• The suffix -er is productive and segmented in real time even for broth-er (Rastle, Davis & New 2004) resulting in slowdown (Lignos 2011)

• While some -er’s are real (hunt-hunter), some are not (corn-corner, cent-center, sock-soccer): children need to learn -er despite exceptions

Page 52: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

A Birth-er Problem

• The suffix -er is productive and segmented in real time even for broth-er (Rastle, Davis & New 2004) resulting in slowdown (Lignos 2011)

• While some -er’s are real (hunt-hunter), some are not (corn-corner, cent-center, sock-soccer): children need to learn -er despite exceptions

• English Lexicon Project (Balota et al. 2007)

• hunt-hunter type: 94, cent-center type: 18

• The suffix -er is productive: 18 < 112/ln(112)=24

Page 53: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

A Birth-er Problem

• The suffix -er is productive and segmented in real time even for broth-er (Rastle, Davis & New 2004) resulting in slowdown (Lignos 2011)

• While some -er’s are real (hunt-hunter), some are not (corn-corner, cent-center, sock-soccer): children need to learn -er despite exceptions

• English Lexicon Project (Balota et al. 2007)

• hunt-hunter type: 94, cent-center type: 18

• The suffix -er is productive: 18 < 112/ln(112)=24

• The suffix -th fails to reach productivity: warmth, width, depth etc. overwhelmed by tooth, booth, filth, forth, ...

Page 54: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

stride-strode-??? N

lnN

Page 55: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

stride-strode-???

• Mere majority is not sufficient; filibuster proof majority required

N

lnN

Page 56: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

stride-strode-???

• Mere majority is not sufficient; filibuster proof majority required

• [-lexical insertion] (Halle 1972, esp. fn1)

• gaps only arise in unproductive corners of morphology

N

lnN

Page 57: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

stride-strode-???

• Mere majority is not sufficient; filibuster proof majority required

• [-lexical insertion] (Halle 1972, esp. fn1)

• gaps only arise in unproductive corners of morphology

• 102 out of 161 irregular verbs (36%) show preterite and past participle syncretism

• Tolerance Principle only allows 1/ln(161)=20% exceptions

• *forwent, *sightsaw, *stridden

N

lnN

Page 58: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Metrics

Page 59: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Metrics

• What not to do: Computer chess

Page 60: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Metrics

• What not to do: Computer chess

• Resource bounded optimization

Page 61: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Metrics

• What not to do: Computer chess

• Resource bounded optimization

• Convergence of methods and disciplines

Page 62: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Evaluation Metrics

• What not to do: Computer chess

• Resource bounded optimization

• Convergence of methods and disciplines

• Simple theories are usually right ones

Page 63: Evaluation Measure in Language Acquisitionling50.mit.edu/wp-content/uploads/Yang-slides.pdfEvaluation Measuring Parameters • Aspects: “We want the hypotheses compatible with fixed

Thank you, to my teachers

• Bob Berwick

• Noam Chomsky

• Morris Halle