
Similarity, feature-based generalization and bias in novel onset clusters

Adam Albright

Massachusetts Institute of Technology

8 July 2007


Introduction

Well-established relation between lexical statistics and gradient acceptability

Novel items with high-frequency combinations of phonemes, morphemes, etc., tend to sound more “English-like” than items with rare or unattested combinations

E.g., for phonotactics:

Coleman and Pierrehumbert (1997); Treiman, Kessler, Knewasser, Tincoff, and Bowman (2000); Frisch, Large, and Pisoni (2000); Bailey and Hahn (2001); Hammond (2004); Hayes and Wilson (in press); and many others


Example: novel words ending in emp

Bailey and Hahn (2001): wordlikeness ratings of novel words (“How typical sounding is blemp?”)

Correlated against type frequency of onsets from CMU Pronouncing Dictionary, counted by Hayes & Wilson (in press)

Clear preference for more frequently attested onsets

[Figure: mean rating (1=low, 7=high) plotted against onset type frequency (CMU Dict, log scale) for novel -emp words: shremp, glemp, gemp, lemp, blemp, klemp, flemp, slemp, plemp]


The limits of attestedness

Growing body of literature investigating preferences that do not follow straightforwardly from statistics of the input data

+ Preference for some attested sequences over others

(Moreton 2002, 2007; Wilson 2003; Zhang and Lai 2006)

+ Preference for some unattested sequences over others

(Wilson 2006; Finley & Badecker 2007; Berent et al., in press)

Such cases have potential to reveal substantive analytical bias (Wilson 2006)


Example

Berent et al. (in press)

English speakers prefer initial #bn over #bd

More likely to interpret [bdIf] as [b@dIf], without a cluster

Little direct evidence in favor of #bn ≻ #bd

Few if any attested examples: bnai brith, bdellium
Very few words that could potentially exhibit initial /b@n/, /b@d/ → [bn], [bd] (beneath)
Also few words with medial /b@n/, /b@d/ → [bn], [bd] (nobody, ebony, Lebanon; generally fail to syncopate)
In final position, [bd] is well attested (grabbed, described), but [bn] is unattested

Preference evidently not due to greater exposure to [bn]

Perhaps due to bias towards rising sonority profiles?


Indirect generalization

Although English speakers have relatively little direct experience with [bd], [bn], they have plenty of experience with clusters like [bl] and [br]

More generally, initial stops are always followed by a sonorant (C2 or vowel)

Perhaps preference for #bn could be inferred from distribution of occurring clusters

Perceptual similarity

+ #bn perceptually closer to #bl than #bd is (?)

Featural similarity

+ #bn part of a broader pattern of stop+coronal sonorant sequences (Hammond, Pater yesterday)


Goals of this talk

Report on some attempts to model preferences like #bn ≻ #bd based on indirect inference from attested clusters

Test the extent to which they can be predicted by a statistical model, without prior markedness biases
Of course, a successful data-driven model doesn’t prove that humans learn similarly
However, the case for prior bias is diminished

Preview: mixed results

Some preferences potentially learnable, given certain assumptions (e.g., #bn ≻ #bd)
Others not learned by any model tested so far (#bw ≻ #bn)

Provisional claim: best model of speaker preferences combines learned statistical generalizations and markedness biases


Outline

Compare two models of gradient acceptability of attested sequences

A feature-based grammatical model
A similarity-based analogical model (Generalized Neighborhood Model; Bailey and Hahn 2001)

Test models’ ability to capture preferences among unattested onset clusters, by generalization from attested clusters

Sonority preferences in stop+C clusters
Sonority + place preferences in #bw vs. #dl
Place preferences in sC clusters

Pay-off of combining phonetic biases with learned statistical preferences


What we want a model to do

Some desiderata for a statistical model of gradient phonotactics

Trained with realistic L1 input

Child-directed data
Approximated here with adult corpus data (CELEX)

Predict relative preferences among combinations of attested sequences

#kl, #fl ≻ #gl, #sl

Predict relative preferences among combinations of unattested sequences

#bw, #bn > #bd, #bz

Able to make predictions for entire words

stip [stIp], plake [pleIk] ≻ chool [tSu:l], nung [n2N]
mrung [mr2N], vlerk [vlr̩k] ≻ shpale [SpeIl], zhnet [ZnEt]


Two types of generalization

Two fundamentally different modes of generalization

Comparison to the lexicon: how similar are blick, bnick to the set of existing words?

≈ ‘Dictionary’ task (Schütze 2005)

Evaluation of substrings: how probable/legal are the substrings that make up blick, bnick?

≈ Grammatical acceptability

Plausible that speakers perform both types of comparison, to varying degrees depending on the task (Vitevitch & Luce 1999; Bailey & Hahn 2001; Shademan 2007; and others)


Goal of this section

Sketch models that instantiate these two types of generalization

Present benchmarking results on two types of data

Ratings of nonce words with (mostly) attested sequences
Ratings of a mix of attested and unattested onset clusters (Scholes 1966)

+ Although neither model is perfect, both provide a reasonable first-pass estimate of gradient phonotactic acceptability for these data sets


Neighborhood density

A crude but widely used estimate of similarity to the lexicon: neighborhood density

Number of words that differ from target word by one change (Greenberg & Jenkins 1964; Coltheart, Davelaar, Jonasson & Besner 1977; Luce 1986)

Generally inadequate for non-words: most have few or no one-change neighbors (Bailey and Hahn 2001)
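
As a point of reference, here is a minimal sketch (not from the talk) of the one-change neighbor count, assuming words are represented as tuples of segment symbols; the function name and the toy example are purely illustrative.

```python
def neighbors(target, lexicon):
    """Count lexical items that differ from `target` by exactly one
    substitution, insertion, or deletion of a segment."""
    def one_change(a, b):
        if abs(len(a) - len(b)) > 1:
            return False
        if len(a) == len(b):                     # one substitution
            return sum(x != y for x, y in zip(a, b)) == 1
        short, long_ = sorted((a, b), key=len)   # one insertion/deletion
        for i in range(len(long_)):
            if long_[:i] + long_[i + 1:] == short:
                return True
        return False
    return sum(one_change(target, w) for w in lexicon)

# e.g. neighbors(('b','l','I','k'), {('k','l','I','k'), ('b','l','I','t')}) == 2
```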


The Generalized Neighborhood Model

Bailey and Hahn’s (2001) Generalized Neighborhood Model

Support depends on gradient similarity to existing words, rather than one-change threshold

Prob(novel word) ∝ Σ Similarity(novel word, existing word), summed over all existing words

Every existing word contributes some support, but in most cases it’s quite small

To be well supported by the lexicon, a novel word should be relatively similar to a decent number of existing words (for model details, see Bailey and Hahn 2001)
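
A minimal sketch of the summed-similarity idea, under simplifying assumptions: plain Levenshtein distance stands in for the GNM’s feature-based segment similarity, and an exponential transform with a single free parameter s stands in for the fitted weighting parameters; all names are illustrative, not the GNM proper.

```python
import math

def edit_distance(a, b):
    """Plain Levenshtein distance; the GNM proper uses feature-based
    segment similarity rather than a uniform 0/1 substitution cost."""
    m, n = len(a), len(b)
    d = [[i + j if i * j == 0 else 0 for j in range(n + 1)] for i in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            d[i][j] = min(d[i - 1][j] + 1,            # deletion
                          d[i][j - 1] + 1,            # insertion
                          d[i - 1][j - 1] + (a[i - 1] != b[j - 1]))  # substitution
    return d[m][n]

def gnm_score(novel, lexicon, s=0.5):
    """Summed similarity to the lexicon: each existing word contributes
    exp(-distance / s), so close neighbors dominate the total."""
    return sum(math.exp(-edit_distance(novel, w) / s) for w in lexicon)
```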


The Generalized Neighborhood Model

Model parameters

Lexicon modeled as set of lemmas with freq > 0 in CELEX

Similarities calculated using natural class model of Frisch, Pierrehumbert and Broe (2004)

No advantage found for using surface word forms, or token frequency

Remaining parameters selected by fitting


Testing the model

Benchmarking data

92 wug words, used in pre-test to past tense study (Albright and Hayes 2003)

A few rare or illegal sequences (#Sw, V:nT#, etc.)

70 wug words with no unattested sequences (Albright, in prep.)

Chosen randomly from set of 205 words used in a larger study


Testing the model

Experimental details

Presented auditorily in simple frame sentences (e.g.: ‘[stIp]. I like to [stIp].’)

Subjects repeated aloud, and rated from 1 (implausible as an English word) to 7 (would make a fine English word)

Ratings discarded from trials in which subjects repeated word incorrectly


GNM results

Albright and Hayes (2003) data: r(92) = .557; Albright (in prep.) data: r(70) = .592

[Figure: two scatterplots of mean rating (1=low, 7=high) against GNM predicted score (arbitrary units); Albright and Hayes (2003) data: R² = 0.31; Albright (in prep.) data: R² = 0.35]

Significant moderate fit, though room for improvement

Possibly improved by non-linear fit (Hayes and Wilson, to appear)

Reasonable first pass estimate of gradient acceptability


Biphone probabilities

Simple model of attested sequences: biphone probabilities

Biphone = sequence of two segments

Probability of a two-segment sequence:

Probability of bl:
P(bl) = Count(bl) / Count(all biphones)

Transitional probability from b to l:
P(l|b) = Count(bl) / Count(all b-initial biphones)

Probability of a word [blIk]:
Joint probability (product) of biphones
Average probability of biphones
Many other possibilities...
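
A minimal sketch of these counts, assuming each lexical entry is a sequence of segment symbols; word boundaries are ignored here, though # symbols could be added to capture edge effects. Function names are illustrative.

```python
from collections import Counter

def biphone_probs(lexicon):
    """Return the joint probability P(xy) and the transitional probability
    P(y|x), estimated by type counts over all two-segment sequences."""
    pair_counts = Counter()
    first_counts = Counter()
    for word in lexicon:
        for x, y in zip(word, word[1:]):
            pair_counts[(x, y)] += 1
            first_counts[x] += 1
    total = sum(pair_counts.values())
    joint = {xy: c / total for xy, c in pair_counts.items()}
    transitional = {(x, y): c / first_counts[x]
                    for (x, y), c in pair_counts.items()}
    return joint, transitional

# The joint probability of a whole word is then the product of its biphone
# probabilities; an unattested biphone such as (b, d) simply gets 0.
```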


Biphone probabilities

Biphones are not enough

Versions of simple biphone probabilities can do reasonably well modeling acceptability of monosyllabic non-words made up of attested sequences (Bailey and Hahn 2001, and others)

Literal biphones cannot distinguish among unattested sequences

P(bn) = P(bd) = 0


Feature-based generalization

Generalization to novel sequences using phonological features

Halle (1978, attributed originally to Lise Menn): the Bach test

Plural of [bax] is [baxs]/*[baxz]/*[bax@z]
Generalization according to feature [±voice] of final segment


Feature-based generalization

Generalization to novel sequences using phonological features

Even without direct evidence about #bn, #bd, English learners do get evidence about stop+sonorant (or even stop+consonant) sequences, from sequences like bl, br, sn

Interpolate: #br, #sn : [−syllabic, −sonorant] [−syllabic, +sonorant]

Extrapolate: #br, #bl : [−sonorant, −continuant, +voice, +labial] [−syllabic, +sonorant]


Feature-based generalization

Goal of this model

Learn constraints on possible two-segment sequences, stated in terms of natural classes, and evaluate the amount of support they get from the data


Generalizing from segments to natural classes

Minimal Generalization approach (Albright and Hayes 2002, 2003)

b l u + b r u → b [+consonantal, +sonorant, −nasal] u

+ g r u → [−sonorant, −continuant, +voice] [+consonantal, +sonorant, −nasal] u

Input data forms compared pair-wise, extracting what they have in common

Generalize: Shared feature values are retained, unshared values are eliminated
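
A minimal sketch of this pairwise comparison, assuming segments are stored as dictionaries of feature values (the feature specifications below are partial and purely illustrative): shared values are retained, conflicting ones dropped.

```python
def generalize(seg_a, seg_b):
    """Retain only the feature values that two segments share."""
    return {f: v for f, v in seg_a.items() if seg_b.get(f) == v}

# Illustrative (partial) feature specifications
l = {'consonantal': '+', 'sonorant': '+', 'nasal': '-', 'lateral': '+'}
r = {'consonantal': '+', 'sonorant': '+', 'nasal': '-', 'lateral': '-'}

print(generalize(l, r))
# {'consonantal': '+', 'sonorant': '+', 'nasal': '-'}  i.e. [+cons, +son, -nas]
```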


What to compare with what?

Comparison between [bl] and [gr] is sort of obvious (they have a lot in common)

Unfortunately, not all comparisons are so informative

By comparing dissimilar clusters, we support very broad abstractions (almost all feature specifications eliminated)
E.g., b+s or l+p → almost any C

b l a + s p a → [+consonantal, −nasal, −lateral] [+consonantal, −nasal, −strident] a


A potentially fatal prediction

A dangerous misstep: bl + sp → CC

In fact, CC clusters are very well attested in English

Potentially fatal prediction: bdack [bdæk] should be very acceptable, because it contains a well-attested sequence: [+consonantal, −nasal, −lateral] [+consonantal, −nasal, −strident]


The challenge

Find a way to generalize over natural classes such that initial [bl] and [br] provide moderate support for [bn], even though it is outside the feature space that they define

Prevent comparisons like [bl], [sp] from generalizing to [bd], even though it is within the space they define


Penalizing sweeping generalizations

Intuitively, [dw] + [gw] : [bw] isn’t too great an inductive leap

This is, in part, because the resulting abstraction is so specific

Just need to specify labiality to get [bw]

To get [bd] from [+consonantal, −nasal, −lateral] [+consonantal, −nasal, −strident], we must fill in many features


Penalizing sweeping generalizations

Put differently:

[−son, −cont, +voi] [−syl, −cons, +labial] describes a small set of possible sequences (bw, dw, gw)

If such sequences are legal, the probability of finding any one of them at random is 0.33

The set of [+consonantal, −nasal, −lateral] [+consonantal, −nasal, −strident] sequences is large (≈ 16 × 11, or 176 possibilities); the chance of getting [bd] at random < 0.006

Although both are somewhat supported, the chances of actually encountering [bw] are much higher than [bd]


Learning strategy

Find descriptions that make the training data as likely as possible

“English words conform to certain shapes because they have to, not out of sheer coincidence”

Related to OT ranking principles that seek the most restrictive possible grammar (Prince & Tesar 2004; Hayes 2004); also related to Bayesian inference, and Maximum likelihood estimation (MLE)


Trying out the intuition

A simple-minded implementation: instantiation costs

Simple bigram model:

Probability of sequence ab = Number of times ab occurs in corpus / Total number of two-item sequences

Stated over natural classes:

Probability of sequence ab, where a ∈ class x, b ∈ class y
∝ (Number of times xy occurs in corpus / Total number of two-item sequences)
× Prob(choosing a from x)
× Prob(choosing b from y)


Trying out the intuition

Probability of a particular instantiation a of a natural class x

Simple: 1 / Size of x (i.e., number of members)

Weighted: Relative frequency of a × 1 / Size of x

+ I will use unweighted values here
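
Putting the last two slides together, a minimal sketch of the class-based bigram probability with unweighted instantiation costs of 1/|class|; the class names and counts below are invented for illustration, not drawn from the corpus.

```python
def class_bigram_prob(a, b, class_x, class_y, xy_count, total_bigrams):
    """P(ab) via a natural-class description [x][y]: relative frequency of
    the class sequence xy in the corpus, discounted by the cost of picking
    a out of class x and b out of class y."""
    if a not in class_x or b not in class_y:
        return 0.0
    class_seq_prob = xy_count / total_bigrams
    instantiation_cost = (1 / len(class_x)) * (1 / len(class_y))
    return class_seq_prob * instantiation_cost

# e.g. scoring [bw] with [voiced stop][labial glide] (counts are made up):
voiced_stops = {'b', 'd', 'g'}
labial_glides = {'w'}
print(class_bigram_prob('b', 'w', voiced_stops, labial_glides,
                        xy_count=120, total_bigrams=50000))
```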


Example: probability of [bw]

Probability of [bw] using [−son, −cont, +voi] [−syl, −cons, +labial]
= Prob([−son, −cont, +voi] [−syl, −cons, +labial])
× Prob([b]|vcd stops)
× Prob([w]|labial glides)

(relatively high)


Example: probability of [bw]

Probability of [bw] using [+cons, −nas, −lat] [+cons, −nas, −strid]
= Prob([+cons, −nas, −lat] [+cons, −nas, −strid])
× Prob([b]|non-nas/lat C’s)
× Prob([w]|non-nas/strid C’s)

(very low)


Parsing strings of segments

Given multiple possible ways to parse a string of segments, find the one with the highest probability (Coleman and Pierrehumbert 1997; Albright and Hayes 2002)

[bw] can find good support from [−son, −cont, +voi] [−syl, −cons, +labial]

[bd] has no allies that provide such a close fit; it must rely on broader (and weaker) generalizations like [+cons, −nas, −lat] [+cons, −nas, −strid]


Local summary

Procedure for exploring which sequences of natural classes are best supported by the data

Result: a set of statements about relative likelihood of different sequences


Testing the model

Albright and Hayes (2003) data: r(92) = .759; Albright (in prep.) data: r(70) = .454

[Figure: two scatterplots of mean rating (1=low, 7=high) against natural class model predicted score (log prob); Albright and Hayes (2003) data: R² = 0.58; Albright (in prep.) data: R² = 0.21]

Also a reasonable first pass at modeling attested combinations

Quite a bit better for Albright & Hayes (2003) data

Fairly even performance across the range of ratings


Summary of this section

Two different models of gradient acceptability (whole-word similarity, segmental features)

Both provide decent models of attested sequences

See Albright (in prep.) for arguments that feature-based model may be overall better suited to the task

These results are encouraging, but not too surprising given the body of literature showing correlations between acceptability and degree of attestation

Next: attempt to extend this result to unattested sequences


Preferences among onset clusters

Numerous studies have investigated relative acceptability of unattested clusters using novel words

Greenberg & Jenkins (1964); Scholes (1966); Pertz & Bever (1975); Coleman & Pierrehumbert (1997); Moreton (2002); Hay, Pierrehumbert & Beckman (2004); Davidson (2006); Haunz (2007); Berent et al. (in press)

Goal of this section

Examine a selection of findings from this literature, testing the extent to which observed preferences can be predicted by models based on the set of existing clusters


Reason to think some cluster preferences could be learned

Hayes and Wilson (to appear)

Preliminary demonstration that some preferences among unattested clusters may indeed be learnable

Trained variety of inductive and similarity-based models on set of existing English onset clusters

Tested on ability to predict acceptability of novel words, with mix of attested and unattested clusters (Scholes 1966)

Found impressively good fits (r > .83), particularly for their own model (r = .946)

Plot shown in Bruce’s slides yesterday


Strategy here

A first task

Show that feature-based model also does fairly well on Scholes (1966) data

Test models on more specific comparisons

#bw > #bn > #bd (Berent et al., in press; Albright, in prep.)
#bw > #dl (Moreton 2002)
#sp > #sk (Scholes 1966)


Scholes (1966)

Asked 7th graders about acceptability of words with both attested and unattested onset clusters

plung [pl2N], shpale [SpeIl], fkeep [fki:p], ztin [ztIn], zhnet [ZnEt], ...

For each word, counted how many participants deemed it possible as an English word, e.g.:

kr2n, stIn    100%
bl2N, slr̩k    84%
gl2N, Sr2n    72%
nl2N, sr2n    47%
nr2N, zr2n    33%
vtIn, fnEt    19%
ZpeIl, vmæt   11%
vki:p, Zvi:l  0%


Test 1: whole word acceptability

Trained models on English words from CELEX
Tested on Scholes non-words → ratings for entire monosyllable
Models’ predictions rescaled as in Hayes and Wilson

GNM: (r(60) = .756) Natural class model: (r(60) = .503)

[Figure: observed Scholes acceptance rates plotted against model-predicted scores for each onset cluster; left panel GNM, right panel natural class model]


Test 2: Onset acceptability

Used training corpus from Hayes and Wilson (to appear)
Word-initial onsets from CMU pronouncing dictionary, with “exotic” onsets removed (#sf, #zw, etc.)

Results are considerably better

GNM: (r(60) = .881) Natural class model: (r(60) = .830)

[Figure: observed Scholes acceptance rates plotted against model-predicted scores for each onset cluster, onsets-only training; left panel GNM, right panel natural class model]


Points to note

Best results emerge if we assume that subjects based their responses mainly on onset clusters

Not implausible! Scholes used just a few rhymes
Sendlmeier (1987): subjects focus on salient part of test items

Even with this assumption, neither model achieves as good a linear fit as Hayes and Wilson’s model

Bimodal distribution; numerical fits hard to interpret

Nevertheless, all models make headway in predicting cluster-by-cluster preferences

As well or better than on attested sequences
If they do this well in predicting other preferences, we’d conclude that there’s a good chance they’re learnable


Preference for sonority-rising clusters

Berent et al. (in press)

English speakers prefer #bn ≻ #bd

As discussed above, this does not appear to mirror any obvious statistical difference between [bn] and [bd] in English


Some new experimental data

40 non-words with {p, b}-initial clusters

#bl, #br, #bw, #bn, #bz, #bd; #pl, #pr, #pw, #pn, #pt

Paired with a variety of rhymes, controlling neighborhood density and bigram probability as much as possible (full list in Appendix)

Rated for plausibility as English words, in same experiment as 70 wug words used above for benchmarking models on attested sequences


Preference for sonority-rising clusters

Results

Acceptability ratings and repetition accuracy both mirror C2 sonority: #bw ≻ #bn ≻ #bd, #bz

[Figure: mean rating (1=low, 7=high) and % correct repetitions by cluster, for bw, bn, bz, bd]


Similarity-based generalization (GNM)

As above, trained both on whole words (CELEX) and onset corpus (Hayes & Wilson)

Results: #bn ≻ #bd, #bz predicted correctly over whole words, not onsets

a. Whole word predictions b. Onsets only

[Figure: GNM predicted scores (arbitrary units) for bw, bn, bd, bz under whole-word vs. onsets-only training]


Feature-based generalization (Natural class model)

In this case, similar predictions regardless of whether trained on whole words or onsets only

Results are similar to the GNM whole-word predictions

a. Whole word predictions b. Onsets only

[Figure: natural class model predicted scores (log prob) for bw, bn, bd, bz under whole-word vs. onsets-only training]


Feature-based generalization (Natural class model)

#bn ≻ #bd, #bz predicted correctly

P([+labial, −son, −contin] [+coronal, +son]) > P([+labial, −son, −contin] [+coronal, +voice]) ☺

#bw ≻ #bn not correctly predicted

P([+labial, −son, −contin] [+labial, +son]) < P([+labial, −son, −contin] [+coronal, +son]) ☹


Discussion

Preference for #bn ≻ #bd tends to emerge from all models

Similar preference also predicted by Hayes and Wilson model
Berent et al. bniff > bdiff preference is correctly predicted

However, #bw ≻ #bn preference not consistently predicted

Hayes & Wilson model is the only model to predict direction of preference (Hayes, p.c.)

Also (not shown): #pn ≻ #pt, #ps not consistently predicted

Feature-based bigram model predicts correctly; GNM and Hayes & Wilson models do not


A negative result?

Models under consideration here capture certain aspects of speaker preferences, but no model consistently predicts full range of preferences

Must be seen against backdrop of positive results in previous sections

All of these models do well on preferences among attested clusters (benchmarking data)
Models also make significant headway on unattested clusters, at broad pass (Scholes 1966 data)
Failure is specifically in predicting preferences like #bw > #bn


A negative result?

The failure is interpretable!

Human ratings reflect preference for stops to be followed by segments that support perceptible bursts, formant transitions (vowels > glides > liquids > nasals > obstruents)


A negative result?

A suggestive result

Although this by no means proves that humans have a substantive bias for stops to be followed by sonorous segments, it shows that current statistical models falter precisely where such biases would be helpful

The positive payoff of incorporating such a bias will be discussed shortly


A related preference: #bw ≻ #dl

Moreton (2002)

Perceptual bias against hearing #dl when presented with ambiguous (dl∼gl) tokens

Corresponding bias against #bw is weaker or non-existent


A related preference: #bw ≻ #dl

Preference not predicted by any of the models considered here

Natural class model:

P([+labial, −son, −contin] [+labial, +son]) < P([−son, −contin] [+approx]) ☹

GNM:

#dl gets support from #bl, #gl
#bw gets less support from #dw, #gw ☹

Hayes and Wilson model:

*[+labial] [ˆ+approx, +coronal] ≻ *[+coronal, −strident] [+cons] ☹


A related preference: #bw ≻ #dl

Preference for #bw ≻ #dl plausibly supported by markedness considerations

#dl/#gl more confusable than #bw/#gw (?)

Or perhaps an articulatory constraint against coronal+l sequences (?)


Upshot of this section

An initially encouraging result: relative acceptability of #bn over #bd can indeed be supported by comparison to similar stop+sonorant combinations

However, this result fails to extend to other combinations with favorable sonority profiles, including #pn and #bw


A detail from Scholes (1966): #sp ≻ #sk

#sp: accepted by 29/33 subjects

#sk: accepted by 20/33 subjects

Unfortunately, different rhymes were used for these two clusters (spale, skeep)

Evidence above suggests rhymes did not influence responses in Scholes’ study much
If they did, -ale should be worse than -eep (fewer neighbors, lower bigram probability)


A detail from Scholes (1966): #sp ≻ #sk

Dispreference for sk is found elsewhere, as well

Cozier (2007): final -sk simplified to -s in Trinidadian English, but final -sp preserved

Historical change: context-free sk > S (OE, OHG)


Testing #sp ≻ #sk

Preference for #sp > #sk not predicted by any model tested here

Natural class-based model: treats both equally: [+coronal, +continuant, −voice] [−continuant, −voice]

GNM trained on corpus onsets: nearly equal support

#sp (1.0623) vs. #sk (1.0618) ☹

Hayes and Wilson model: equal support

Both receive perfect scores (no violations) ☹


A possible phonetic basis?

Cozier (2007): Anticipatory coarticulation from lip closure alters [s] in [sp] → additional cues for stop place

Greatest benefit in final position, when no following vowel

Perhaps small added benefit in prevocalic position is nonetheless helpful?


Davidson (2007): #fC ≻ #zC ≻ #vC

Productions: approx. avg. performance (est. from graphs)
#fC: ≈64–70%
#zC: ≈57–58%
#vC: ≈30–36%

Trained natural classes on Hayes & Wilson onsets-only corpus

Tested on #fn, #ft, #zn, #zt, #vn, #vt; averaged over n,t

[Figure: mean predicted score (log prob) from the natural class model for fC, zC, vC onsets]


Davidson (2007): #fC ≻ #zC ≻ #vC

Points to note

Predicted order doesn’t hold of #Cn and #Ct independently

#fn ≻ #vn ≻ #zn
#zt ≻ #ft ≻ #vt (!!)

Focusing on the /#_n context, voiceless ≻ voiced predicted successfully, but #zn needs a boost

#pl, #pr, #fl, #fr → [labial obstr][coronal sonorant]
*#tl, *#Tl, *#sr remove support for [anterior obstr][cor son]
Plausible boost: advantage of extra amplitude of frication

For the /#_t context, a voicing agreement bias is needed

Despite initially promising results, details may reveal a useful role for phonetically motivated biases


Incorporating prior biases

Results of preceding section have been largely negative (unlearnable preferences)

In each case, the failure of the models mirrors a possible phonetically-motivated bias

Goal of this section: demonstrate how combining learned statistical generalizations with prior markedness biases can provide a successful overall model

Approach follows Wilson (2006): side-by-side comparison of models with/without explicit markedness bias


The sonority bias

Requirement of interest here: stops should be followed by more sonorous segments

Plausible restatement in phonetically grounded terms: stops should be followed by segments which...

Support perception of burst and VOT
Support perception of formant transitions

These requirements favor following segments which

Are strongly voiced
Do not interfere with formant transitions, either by blurring/removing them (nasals) or providing independent targets (l, r, w, to varying extents)


The sonority bias

For now, I will treat these as independent requirements

C2 sonority: violations reflect availability of voiced formants

C2 son     violations
glide      *
liquid     **
nasal      ******
obstruent  *******

Non-antagonistic place combinations: violated by pw/bw, tl/dl

Ultimately, may be better combined into a single condition on possible contrasts (Flemming 2004, and others)
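
One way these two biases could be encoded as predictors for the regression used later; the violation counts mirror the table above, while the category labels and cluster list are illustrative.

```python
# Hypothetical violation counts, one per asterisk in the table above
C2_SONORITY_VIOLATIONS = {'glide': 1, 'liquid': 2, 'nasal': 6, 'obstruent': 7}

def place_violation(cluster):
    """1 if the cluster is one of the antagonistic place combinations
    named above (pw/bw, tl/dl), else 0."""
    return 1 if cluster in {'pw', 'bw', 'tl', 'dl'} else 0
```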


Baseline: statistical knowledge or markedness alone

Considered separately, neither the inductive model nor the markedness bias is sufficient to model human preferences

Statistical model doesn’t capture systematic sonority bias

Markedness bias ignores differences between rhymes

a. Statistical preferences alone b. Sonority preference alone (r(38) = .182, n.s.)

[Figure: mean rating (1=low, 7=high) for each test item plotted against model predicted score; left panel statistical preferences alone, right panel sonority preference alone]


Combining statistical and markedness preferences

Relative importance of various preferences determined post hoc using a Generalized Linear Model, determining optimal weights (coefficients) by maximum likelihood estimation

When markedness constraints are added to statistical preferences, a very accurate overall model is obtained
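
A minimal sketch of this combination step, assuming we already have, for each test item, a statistical model score and violation counts for the two markedness constraints; ordinary least squares stands in here for the GLM/maximum-likelihood fit reported below, and all numbers are invented for illustration.

```python
import numpy as np

# Hypothetical per-item predictors: statistical model score, C2-sonority
# violations, and antagonistic-place (*bw/dl) violations.
stat_score    = np.array([-10.2, -12.8, -11.5, -13.9, -11.0, -12.1])
sonority_viol = np.array([2, 7, 6, 1, 2, 6])
place_viol    = np.array([0, 0, 0, 1, 1, 0])
mean_rating   = np.array([4.1, 1.9, 2.4, 2.0, 2.6, 2.3])  # observed ratings (invented)

# Fit weights for the combined model: rating ~ const + stat + sonority + place
X = np.column_stack([np.ones_like(stat_score), stat_score,
                     sonority_viol, place_viol])
coefs, *_ = np.linalg.lstsq(X, mean_rating, rcond=None)
predicted = X @ coefs    # combined model's predicted ratings
```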


Combining statistical and markedness preferences

Statistical + sonority preferences: r(38) = .733

[Figure: mean rating (1=low, 7=high) for each test item plotted against the combined statistical + sonority model's predicted score]


Combining statistical and markedness preferences

Adding *bw/dl: r(38) = .971

[Scatter plot: mean rating (1=low, 7=high) vs. model-predicted score (arbitrary units), one point per test item]


Combining statistical and markedness preferences

Points to note

Payoff of “sonority jump” from n to l

Mimics jump in ratings between #bl and #bn
Possibly just due to attested/unattested difference
Happens to correspond to significant difference in availability of formant transitions; perhaps not coincidental that the optimal function has this form?

Bias against lT], nT] would further improve fit

Items like prundge, brelth, brenth not part of original design
Filler items, part of replication of Bailey and Hahn (2001)
Included in analysis here for sake of completeness


Combining analogical and markedness preferences

In this case, similar results can be obtained with a combination of analogy + markedness (r(38) = .969)

[Scatter plot: mean rating (1=low, 7=high) vs. model-predicted score (arbitrary units), points labeled by onset: bd, bl, bn, br, bw, bz, pl, pn, pr, pt, pw]


“Experience trumps typology”?

Hammond (yesterday): “Experience trumps typology?”

Results here show relatively greater effect of phonetic biases, lesser effect of learned statistical generalizations

Full model:

               Coeff.   Std. Err.       z   Sig.
  Stat. model   .2344    .0529        4.43   p < 0.0001
  C2 sonority   .5814    .0248       23.47   p < 0.0001
  OCP          2.4711    .1559       15.85   p < 0.0001
  Const.       4.4536    .6268        7.11   p < 0.0001
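As a quick arithmetic check on the table, each z value is simply the coefficient divided by its standard error; a minimal sketch, with values copied from the table:

# Reproduce the z column of the table above: z = coefficient / standard error.
# Tiny mismatches (e.g. 23.44 vs. 23.47) come from rounding in the reported figures.
rows = {
    "Stat. model": (0.2344, 0.0529),
    "C2 sonority": (0.5814, 0.0248),
    "OCP":         (2.4711, 0.1559),
    "Const.":      (4.4536, 0.6268),
}
for name, (coeff, se) in rows.items():
    print(f"{name:12s} z = {coeff / se:6.2f}")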


The deus ex machina of markedness constraints

It seems impossible to know at what point a less biased approach is doomed, and when a markedness-based explanation is motivated

Benefit of attacking the problem from both ends
Perhaps revealing that the best markedness bias is one that reflects quantitative phonetic differences in availability of cues?

Large distance between liquids and nasals

Assess concrete performance of combined model, as a hand-crafted “standard” for less stipulative models to strive for

Not a proof that a prior markedness bias is required

Merely a demonstration that the current best model is one that incorporates it


The positive result

Two models that do reasonably well on modeling preferences among attested clusters

GNM and natural class-based model both do fairly well on benchmarking data

See Albright (in prep) for arguments that natural class model may ultimately be superior


The negative result

Attempts to infer preferences for certain unattested clusters based on attested data: mixed results

Some preferences evidently inferable given corpus of existing English forms (e.g., #bn > #bd)

Other preferences are not, at least given currently available models (#bw > #bn, #bw > #dl, #sp > #sk)


What to conclude?

What do we conclude from this?

Certainly, it is not possible to rule out that a better model might succeed where these models have failed

Many different avenues to explore

More refined approaches to evaluating support for combinations of natural classes
Different sets of phonological features
Different syntax for referring to combinations of segments
Not clear whether improvement will ultimately come from incorporating biases directly, as suggested here, or from better feature sets and representations


An unsurprising lesson

Successful statistical models require a good theory of phonology

Right features and representations

Right way to apportion “credit” from data to hypotheses (Dresher 2003)

Right set of prior biases/constraints

Externally applied, as in the current GLM analysis
As part of a regularization term in constraint weighting (Wilson 2006); sketched below
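A minimal sketch of the second option, assuming a maxent-style weighting model in which each constraint weight is pulled toward a preferred value by a Gaussian prior, along the lines of Wilson (2006). The toy data term, weight values, and variances below are placeholders for illustration.

import numpy as np

def penalized_log_likelihood(w, log_lik, mu, sigma):
    """Data log-likelihood under constraint weights w, minus a Gaussian prior
    term that pulls each weight toward mu[k]; a smaller sigma[k] means a
    stronger prior bias on constraint k (cf. Wilson 2006)."""
    prior_penalty = np.sum((w - mu) ** 2 / (2 * sigma ** 2))
    return log_lik(w) - prior_penalty

if __name__ == "__main__":
    # Toy illustration: a quadratic stand-in for the data term.
    log_lik = lambda w: -np.sum((w - np.array([2.0, 0.5])) ** 2)
    w = np.array([1.0, 1.0])
    mu = np.array([0.0, 0.0])        # prior prefers low weights
    sigma = np.array([0.5, 2.0])     # constraint 1 is more strongly biased
    print(penalized_log_likelihood(w, log_lik, mu, sigma))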


Future directions

The research program outlined here is a preliminary attempt to build a framework for comparing and testing hypotheses about these different components

Broad base of data for benchmarking inductive models with phonological features and representations, but no explicit markedness biases

Quantitative test of gain from incorporating different pieces of theoretical machinery (Gildea and Jurafsky 1996; Hayes and Wilson, to appear)
