Page 1: Parameter Setting

Parameter Setting

Page 2: Parameter Setting

Compounding Parameter

Resultatives / Productive N-N Compounds

American Sign Language, Austroasiatic (Khmer), Finno-Ugric, Germanic (German, English), Japanese-Korean, Sino-Tibetan (Mandarin), Tai (Thai), Basque, Afroasiatic (Arabic, Hebrew), Austronesian (Javanese), Bantu (Lingala), Romance (French, Spanish), Slavic (Russian, Serbo-Croatian)

Page 3: Parameter Setting

Developmental Evidence

• Complex predicate properties argued to appear as a group in English children’s spontaneous speech (Stromswold & Snyder 1997)

• Appearance of N-N compounding is a good predictor of the appearance of verb-particle constructions and other complex predicate constructions - even after partialing out the contributions of:

– Age of reaching MLU 2.5

– Production of lexical N-N compounds

– Production of adjective-noun combinations

– Correlations are remarkably good

Page 4: Parameter Setting

Sample Learning Problems

1. Null-subject parameter

2. V2

3. Long-distance reflexives

4. Scope inversion

5. Argument structure alternations

6. Wh-scope marking

7. Complex predicates/noun-noun compounding

8. Condition C vs. Language-specific constraints

9. One-substitution

10. Preposition stranding

11. Disjunction

12. Subjacency parameter

etc.

Page 5: Parameter Setting

Classic Parameter Setting

• Language learning is making pre-determined choices based on ‘triggers’ whose form is known in advance

• Challenge I: encoding and identifying reliable triggers

• Challenge II: overgeneralization

• Challenge III: lexically bound parameters

Page 6: Parameter Setting

The Concern…

• “From its inception, UG has been regarded as that which makes acquisition possible. But for lack of a thriving UG-based account of acquisition, UG has come to be regarded instead as an irrelevance or even an impediment. It is clearly open to the taunt: ‘All that innate knowledge, only a few facts to learn, yet you can’t say how!’” (Fodor & Sakas, 2004, p.8)

Page 7: Parameter Setting

[Diagram: input (utterance or corpus) → analyze input → action]

Page 8: Parameter Setting

Gibson & Wexler (1994)

• Triggering Learning Algorithm

– Learner starts with random set of parameter values

– For each sentence, attempts to parse sentence using current settings

– If parse fails using current settings, change one parameter value and attempt re-parsing

– If re-parsing succeeds, change grammar to new parameter setting

[Diagram: two binary parameters A and B, each with values + and −, and an input sentence Si]
Page 9: Parameter Setting

Gibson & Wexler (1994)

• Triggering Learning Algorithm

– Learner starts with random set of parameter values

– For each sentence, attempts to parse sentence using current settings

– If parse fails using current settings, change one parameter value and attempt re-parsing

– If re-parsing succeeds, change grammar to new parameter setting

[Diagram: the same two-parameter space, annotated with the TLA's two constraints]

Greediness Constraint

Single Value Constraint
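As a rough illustration of the loop described above, here is a minimal Python sketch of one TLA update step under the Greediness and Single Value Constraints. The `parses` oracle and the tuple encoding of grammars are assumptions for illustration, not part of Gibson & Wexler's formulation.

```python
import random

def tla_step(grammar, sentence, parses):
    """One update of the Triggering Learning Algorithm (illustrative sketch).

    grammar: tuple of binary parameter values, e.g. (0, 1)
    parses(grammar, sentence) -> bool: hypothetical oracle saying whether
    the sentence can be parsed under that parameter setting.
    """
    if parses(grammar, sentence):
        return grammar                      # current settings work: no change
    # Single Value Constraint: flip exactly one randomly chosen parameter
    i = random.randrange(len(grammar))
    candidate = list(grammar)
    candidate[i] = 1 - candidate[i]
    candidate = tuple(candidate)
    # Greediness Constraint: keep the new setting only if re-parsing succeeds
    if parses(candidate, sentence):
        return candidate
    return grammar
```

Because the learner only moves when the flipped grammar parses the current input, it can stall when the current grammar and all of its one-parameter neighbors fail, which is the local-maxima problem discussed on the following slides.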

Page 10: Parameter Setting

Gibson & Wexler (1994)

• For an extremely simple 2-parameter space, the learning task is easy - any starting point, any destination

• Triggers do not really exist in this model

[Diagram: the 2-parameter word-order space (SV/VS × VO/OV), containing the grammars SVO, SOV, VOS, OVS]

Page 11: Parameter Setting

Gibson & Wexler (1994)

• Extending the space to 3 parameters

– There are non-adjacent grammars

– There are local maxima, where the current grammar and all of its neighbors fail

[Diagram: the 3-parameter space (SV/VS × VO/OV × ±V2), containing SVO, SOV, VOS, OVS grammars in −V2 and +V2 variants]

Page 12: Parameter Setting

Gibson & Wexler (1994)

• Extending the space to 3 parameters

– There are non-adjacent grammars

– There are local maxima, where the current grammar and all of its neighbors fail

[Diagram: the same 3-parameter space (SV/VS × VO/OV × ±V2)]

String: Adv S V O

Page 13: Parameter Setting

Gibson & Wexler (1994)

• Extending the space to 3 parameters

– There are non-adjacent grammars

– There are local maxima, where the current grammar and all of its neighbors fail

– All local maxima involve the impossibility of retracting a +V2 hypothesis

[Diagram: the same 3-parameter space (SV/VS × VO/OV × ±V2)]

String: Adv S V O

Page 14: Parameter Setting

[Diagram: input (utterance or corpus) → analyze input → action]

Page 15: Parameter Setting

Sample Learning Problems

1. Null-subject parameter

2. V2

3. Long-distance reflexives

4. Scope inversion

5. Argument structure alternations

6. Wh-scope marking

7. Complex predicates/noun-noun compounding

8. Condition C vs. Language-specific constraints

9. One-substitution

10. Preposition stranding

11. Disjunction

12. Subjacency parameter

etc.

Page 16: Parameter Setting

Gibson & Wexler (1994)

• Solutions to the local maxima problem

– #1: Initial state is V2: default to [-V2]; S: unset; O: unset

– #2: Extrinsic ordering

[Diagram: the same 3-parameter space (SV/VS × VO/OV × ±V2)]

String: Adv S V O

Page 17: Parameter Setting

Fodor (1998)

• Unambiguous Triggers

– Local maxima in the TLA result from the use of ‘ambiguous triggers’

– If learning only occurs based on unambiguous triggers, local maxima should be avoided

• Difficulties

– How to identify unambiguous triggers?

– An unambiguous trigger can only be parsed by a grammar that includes value Pi of parameter P, and by no grammar that includes value Pj.

– A parameter space with 20 binary parameters implies 2^20 parses for any sentence.

Page 18: Parameter Setting

Fodor (1998)

• Ambiguous Trigger

– SVO can be analyzed by at least 5 of the 8 grammars in G&W’s parameter space

[Diagram: SVO highlighted in the 8-grammar space (SV/VS × VO/OV × ±V2)]

Page 19: Parameter Setting

Fodor (1998)

• Structural Triggers Learner (STL)

– Parameters are treelets

– Learner attempts to parse input sentences using a supergrammar that contains treelets for all values of all unset parameters, e.g., 40 treelets for 20 unset binary parameters.

– Algorithm (see the sketch below)

• #1: Adopt a parameter value/trigger structure if and only if it occurs as a part of every complete well-formed phrase marker assigned to an input sentence by the parser using the supergrammar.

• #2: Adopt a parameter value/trigger structure if and only if it occurs as a part of a unique complete well-formed phrase marker assigned to the input by the parser using the supergrammar.
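A minimal Python sketch of adoption criterion #1, assuming a hypothetical `parse_with_supergrammar` routine that returns, for each complete well-formed parse of the sentence, the set of parameter-value treelets that parse uses; neither the routine nor the set encoding comes from Fodor's paper.

```python
def stl_update(adopted_treelets, sentence, parse_with_supergrammar):
    """Structural Triggers Learner, criterion #1 (illustrative sketch).

    parse_with_supergrammar(sentence) -> list of sets, one set of treelets
    per complete well-formed phrase marker the parser can assign.
    A treelet is adopted only if it occurs in every such parse, i.e. the
    sentence is unambiguous with respect to that parameter value.
    """
    parses = parse_with_supergrammar(sentence)
    if not parses:
        return adopted_treelets             # unparsable input: no learning
    shared = set.intersection(*parses)      # treelets common to all parses
    return adopted_treelets | shared
```

Under criterion #1 a structurally ambiguous sentence contributes nothing, since the intersection keeps only what every parse requires; this is the "slightly wasteful, but conservative" behavior noted on the next slide.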

Page 20: Parameter Setting

Fodor (1998)

• Structural Triggers Learner

– If a sentence is structurally ambiguous, it is taken to be uninformative (slightly wasteful, but conservative)

– Unable to take advantage of collectively unambiguous sets of sentences, e.g. SVO and OVS, which entail [+V2]

– Still unclear (to me) how it manages its parsing task

Page 21: Parameter Setting

[Diagram: input (utterance or corpus) → generate parses (using grammar or supergrammar) → select parses → action]

Page 22: Parameter Setting

Charles Yang

Page 23: Parameter Setting
Page 24: Parameter Setting

Competition Model

• 2 grammars - start with even strength, both get credit for success, one gets punished for failure

• Each grammar is chosen for parsing/production as a function of its current strength

• Must be that increasing Pi for one grammar decreases Pj for other grammars

• Is it the case that the presence of some punishment will guarantee that a grammar will, over time, always fail to survive?

Page 25: Parameter Setting

Competition Model

• Given an input datum s, the child

– Selects a grammar Gi with probability pi.

– Analyzes s with Gi.

– Updates the competition:

• If successful, reward Gi by increasing pi.

• Otherwise, punish Gi by decreasing pi.

• This implies that change only occurs when a selected grammar succeeds or fails

Page 26: Parameter Setting

Competition Model

• Linear reward-penalty scheme (LR-P, Bush & Mosteller, 1951)

– Given an input sentence s, the learner selects a grammar Gi with probability pi from the population of N possible grammars.

– If Gi --> s then

• p'i = pi + γ(1 − pi)

• p'j = (1 − γ)pj, for all j ≠ i

– If Gi -/-> s then

• p'i = (1 − γ)pi

• p'j = γ/(N − 1) + (1 − γ)pj, for all j ≠ i

(γ is the learning rate.)

Page 27: Parameter Setting

Competition Model

• Linear reward-penalty scheme (LR-P, Bush & Mosteller, 1951)

– Given an input sentence s, the learner selects a grammar Gi with probability pi from the population of N possible grammars.

– If Gi --> s then

• p'i = pi + γ(1 − pi)

• p'j = (1 − γ)pj, for all j ≠ i

– If Gi -/-> s then

• p'i = (1 − γ)pi

• p'j = γ/(N − 1) + (1 − γ)pj, for all j ≠ i

The value p is a probability for the entire grammar.

This rule suggests that all grammars are affected on each trial, not only the grammar that is currently being tested. Other discussions in the book do not clearly make reference to this. (A sketch of the update rule follows below.)
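A minimal Python sketch of this update, applied to a probability vector over N grammars. The function name and the default value of the learning rate γ (here `gamma`) are illustrative assumptions; only the update equations come from the slide above.

```python
def lrp_update(p, i, success, gamma=0.01):
    """One linear reward-penalty (LR-P) step over grammar probabilities (sketch).

    p: list of N probabilities summing to 1
    i: index of the grammar selected for the current input sentence
    success: True if G_i parsed the sentence, False otherwise
    gamma: learning rate (illustrative value)
    """
    n = len(p)
    q = list(p)
    if success:                             # G_i --> s: reward G_i
        q[i] = p[i] + gamma * (1 - p[i])
        for j in range(n):
            if j != i:
                q[j] = (1 - gamma) * p[j]
    else:                                   # G_i -/-> s: punish G_i
        q[i] = (1 - gamma) * p[i]
        for j in range(n):
            if j != i:
                q[j] = gamma / (n - 1) + (1 - gamma) * p[j]
    return q
```

The probabilities remain normalized after either branch: whatever mass the selected grammar gains or loses is redistributed across the other N - 1 grammars.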

Page 28: Parameter Setting

From Grammars to Parameters

• Number of Grammars problem

– A space with n parameters implies at least 2^n grammars (e.g., 2^40 is ~1 trillion)

– Only one grammar is used at a time, which implies very slow convergence

• Competition among Parameter Values

– How does this work?

• Each trial involves selection of a vector of parameter values, e.g., [0, 1, 1, 0, 0, 1, 1, 1, …]

• Success or failure rewards/punishes all parameters, regardless of their complicity in the outcome of the trial

– The Naïve Parameter Learning model (NPL) may reward incorrect parameter values as hitchhikers, or punish correct parameter values as accomplices (see the sketch below).
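A minimal Python sketch of the Naïve Parameter Learning idea, which makes the hitchhiker/accomplice problem concrete: every sampled parameter value is rewarded or punished, whether or not it played a role in the parse. The weight representation, the `parses` oracle, and the learning rate are illustrative assumptions.

```python
import random

def npl_step(weights, sentence, parses, gamma=0.01):
    """Naive Parameter Learning step (illustrative sketch).

    weights[k]: current probability of choosing value 1 for parameter k
    parses(grammar, sentence) -> bool: hypothetical parsing oracle
    """
    # Sample a full grammar: one value per parameter
    grammar = tuple(1 if random.random() < w else 0 for w in weights)
    success = parses(grammar, sentence)
    updated = []
    for k, value in enumerate(grammar):
        # Move probability mass toward the sampled value on success and
        # away from it on failure -- for every parameter, relevant or not
        target = value if success else 1 - value
        w = weights[k]
        w = w + gamma * (1 - w) if target == 1 else (1 - gamma) * w
        updated.append(w)
    return updated
```

A correct value that happens to sit in a failing grammar is punished as an "accomplice", and an incorrect value that sits in a succeeding grammar is rewarded as a "hitchhiker".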

Page 29: Parameter Setting

Avoiding Accomplices

• How could ill-placed reward/punishment be avoided?

– Identify which parameters are responsible for success/failure on a given trial

– Parameters associated with lexical items/treelets

Page 30: Parameter Setting

Empirical Predictions

• Hypothesis: Time to settle upon the target grammar is a function of the frequency of sentences that punish the competitor grammars

• First-pass assumptions

– Learning rate set low, so many occurrences are needed to lead to decisive changes

– A similar amount of input is needed to eliminate all competitors

Page 31: Parameter Setting

Empirical Predictions

• ±wh-movement

– Any occurrence of overt wh-movement punishes a [-wh-mvt] grammar

– Wh-questions are highly frequent in input to English-speaking children (~30% estimate!)

– [±wh-mvt] parameter should be set very early

– This applies to the clear-cut contrast between English and Chinese, but …

• French: [+wh-movement] and lots of wh-in-situ

• Japanese: [-wh-movement] plus scrambling

Page 32: Parameter Setting

Empirical Predictions

• Verb-raising

– Reported to be set accurately in speech of French children (Pierce, 1992)

Page 33: Parameter Setting

French: Two Verb Positions

a. Il ne voit pas le canard

he sees not the duck

b. *Il ne pas voit le canard

he not sees the duck

c. *Il veut ne voir pas le canard

he wants to.see not the duck

d. Il veut ne pas voir le canard

he wants not to.see the duck

Page 34: Parameter Setting

French: Two Verb Positions

a. Il ne voit pas le canard

he sees not the duck

b. *Il ne pas voit le canard

he not sees the duck

c. *Il veut ne voir pas le canard

he wants to.see not the duck

d. Il veut ne pas voir le canard

he wants not to.see the duck

agreeing (i.e. finite) forms precede pas

non-agreeing (i.e. infinitive) forms follow pas

Page 35: Parameter Setting

French Children’s Speech

• Verb forms: correct or default (infinitive)

• Verb position changes with verb form

• Just like adults

Counts of verb forms: finite 127, infinitive 119

(Pierce, 1992)

Page 36: Parameter Setting

French Children’s Speech

• Verb forms: correct or default (infinitive)

• Verb position changes with verb form

• Just like adults

Counts of verb positions: V-neg 122, neg-V 124

(Pierce, 1992)

Page 37: Parameter Setting

French Children’s Speech

• Verb forms: correct or default (infinitive)

• Verb position changes with verb form

• Just like adults

          finite   infinitive
V-neg       121          1
neg-V         6        118

(Pierce, 1992)

Page 38: Parameter Setting

Empirical Predictions

• Verb-raising

– Reported to be set accurately in speech of French children (Pierce, 1992)

– Crucial evidence

… verb neg/adv …

– Estimated frequency in adult speech: ~7%

– This frequency is taken as an operational definition of ‘sufficiently frequent’ for early mastery (by the early 2’s)

Page 39: Parameter Setting

Empirical Predictions

• Verb-second

– Classic argument: V2 is mastered very early by German/Dutch-speaking children (Poeppel & Wexler, 1993; Haegeman, 1995)

– Yang’s challenges

• Crucial input is infrequent

• Claims of early mastery are exaggerated

Page 40: Parameter Setting

Two Verb Positions

a. Ich sah den Mann

I saw the man

b. Den Mann sah ich

the man saw I

c. Ich will [den Mann sehen]

I want the man to.see[inf]

d. Den Mann will ich [sehen]

the man want I to.see[inf]

Page 41: Parameter Setting

Two Verb Positions

a. Ich sah den Mann

I saw the man

b. Den Mann sah ich

the man saw I

c. Ich will [den Mann sehen]

I want the man to.see[inf]

d. Den Mann will ich [sehen]

the man want I to.see[inf]

agreeing verbs (i.e. finite verbs) appear in second position

non-agreeing verbs (i.e. infinitive verbs) appear in final position

Page 42: Parameter Setting

German Children’s Speech

• Verb forms: correct or default (infinitive)

• Verb position changes with verb form

• Just like adults

Counts of verb forms: finite 208, infinitive 43

Andreas, age 2;2 (Poeppel & Wexler, 1993)

Page 43: Parameter Setting

German Children’s Speech

• Verb forms: correct or default (infinitive)

• Verb position changes with verb form

• Just like adults

Counts of verb positions: V-2 203, V-final 48

Andreas, age 2;2 (Poeppel & Wexler, 1993)

Page 44: Parameter Setting

German Children’s Speech

• Verb forms: correct or default (infinitive)

• Verb position changes with verb form

• Just like adults

           finite   infinitive
V-2          197          6
V-final       11         37

Andreas, age 2;2 (Poeppel & Wexler, 1993)

Page 45: Parameter Setting

Empirical Predictions

• Cross-language word orders

– 1. Dutch: SVO, XVSO, OVS

– 2. Hebrew: SVO, XVSO, VSO

– 3. English: SVO, XSVO

– 4. Irish: VSO, XVSO

– 5. Hixkaryana: OVS, XOVS

– Order of elimination

• Frequent SVO input quickly eliminates #4 and #5

• Relatively frequent XVSO input eliminates #3

• OVS is needed to eliminate #2 - only ~1.3% of input

Page 46: Parameter Setting

Empirical Predictions

• But what about the classic findings by Poeppel & Wexler (etc.)?

– They show mastery of V-raising, not mastery of V-2

– Yang argues that early Dutch shows lots of V-1 sentences, due to the presence of a Hebrew grammar (based on Hein corpus)

e.g. weet ik niet [know I not]

Page 47: Parameter Setting

Empirical Predictions

• Early Argument Drop

– Resuscitates idea that early argument omission in English is due to mis-set parameter

– Overt expletive subjects (‘there’) ~1.2% frequency in input

Page 48: Parameter Setting

Null Subjects

• Child English

– Eat cookie.

• Hyams (1986)

– English children have an Italian setting of the null-subject parameter

– Trigger for change: expletive subjects

• Valian (1991)

– Usage of English children is different from Italian children (proportion)

• Wang et al. (1992)

– Usage of English children is different from Chinese children (null objects)

Page 49: Parameter Setting

Empirical Predictions

• Early Argument Drop

– Resuscitates idea that early argument omission in English is due to mis-set parameter

– Wang et al. (1992) argument about Chinese was based on mismatch in absolute frequencies between Chinese & English learners

– Yang: if incorrect grammar is used probabilistically, then absolute frequency match not expected - rather, ratios should match

– Ratio of null-subjects and null-objects is similar in Chinese and English learners

– Like Chinese, English learners do not produce wh-obj pro V?

Page 50: Parameter Setting

Empirical Predictions

• Null Subject Parameter Setting

– Italian environment

• English [- null subject] setting is killed off early, due to the presence of a large amount of contradictory input

• Italian children should exhibit an adultlike profile very early

– English environment

• Italian [+ null subject] setting is killed off more slowly, since contradictory input is much rarer (expletive subjects)

• The fact that null subjects are rare in the input seems to play no role

Page 51: Parameter Setting

Poverty of the Stimulus

• Structure-dependent auxiliary fronting

– Is [the man who is sleeping] __ going to make it to class?

– *Is [the man who __ sleeping] is going to make it to class?

• Pullum: relevant positive examples exist (in the Wall Street Journal)

• Yang: even if they do exist, they’re not frequent enough to account for mastery by children aged 3;2 in Crain & Nakayama’s experiments (1987)

Page 52: Parameter Setting

Other Parameters

• Noun compounding/complex predicates

– English

• Novel N-N compounds punish Romance-style grammars

• Simple data can lead to elimination of competitors

– Spanish

• Lack of productive N-N compounds is irrelevant

• Lack of complex predicate constructions is irrelevant

• How can the English (superset) grammar be excluded?

• Subset problem

Page 53: Parameter Setting

Subset Problem

• Subset problem is serious: all grammars are assumed to be present from the start

– How can the survivor model avoid the subset problem?

[Diagram: language Li properly contained within language Lj]

Page 54: Parameter Setting

Other Parameters

• P-stranding Parameter

– English

• Who are you talking with?

• Positive examples of P-stranding punish non P-stranding grammars

– Spanish

• ***Quién hablas con?

• Non-occurrence of P-stranding does not punish P-stranding grammars

• Could anything be learned from the consistent use of pied-piping, or from the absence of P-stranding?

Page 55: Parameter Setting

Other Parameters

• Locative verbs & Verb compounding

– John poured the water into the glass.
*John poured the glass with water.

– (*)John filled the water into the glass. <-- ok if [+V compounding]
John filled the glass with water.

– English

• Absence of V-compounding is irrelevant

• Simple examples above do not punish the Korean grammar (superset)

• Korean grammar may be punished by more liberal properties elsewhere, e.g. pile the table with books.

– Korean

• Occurrence of ground verbs in the figure frame punishes the English grammar

• Occurrence of V-compounds punishes the English grammar

Page 56: Parameter Setting

Other Parameters

• “Or as PPI” Parameter (Takuya)

– John didn’t eat apples or oranges

– English

• Neither…nor reading punishes the Japanese grammar

– Japanese

• Examples of the Japanese reading punish English???

Page 57: Parameter Setting

Other Parameters

• Classic Subjacency Parameter (Rizzi, 1982)

– English: *What do you know whether John likes ___?
Italian: ok
Analysis: bounding nodes are (i) NP, (ii) CP (It.) / IP (Eng.)

– English

• Input regarding subjacency is consistent with the Italian grammar

– Italian

• If wh-island violations occur, this punishes the English grammar

• Worry: production processes in English give rise to non-trivial numbers of wh-island violations.

Page 58: Parameter Setting

Triggers

• Triggers

– Unambiguous triggers - if they exist - do not have the role of identifying the winner as much as punishing the losers

– Distributed triggers - target grammar is identified by conjunction of two different properties, neither of which is sufficient on its own

• Difficult for Fodor and for Gibson/Wexler - no memory

• Survivor model: also no memory, but a distributed trigger works because the winner is never punished, while the losers are each punished separately

Page 59: Parameter Setting

[Diagram: input (utterance or corpus) → analyze input → action]

Page 60: Parameter Setting

[Diagram: input (utterance or corpus) → generate parses (using grammar or supergrammar) → select parses → action]

Page 61: Parameter Setting

[Diagram: input (utterance or corpus) → generate parses (using grammar or supergrammar) → select parses, with success and failure fed back to the learner]

Page 62: Parameter Setting

Overgeneralization

• When two values of a parameter both allow analysis of an input utterance, overgeneralization is a problem.

• Presentation of sentences from a subset grammar will not punish a superset grammar

Page 63: Parameter Setting

Lexical Learning

• Alternative subcategorizations

– John believes Mary.

– John believes that Mary is a spy.

– John understands Mary.

– John understands that Mary is a spy.

– *John hopes Mary.

– John hopes that Mary is not a spy.

Page 64: Parameter Setting

Lexical Learning

• It is easy to update lexical subcategorization frequencies - because the alternatives are mutually incompatible

[but this doesn’t solve the argument structure learning problem entirely]

• This is harder for choices that stand in a superset/subset relation, since the alternatives are not mutually incompatible

• Can parametric choices be characterized in a similar way?

Page 65: Parameter Setting

• Utterance-based learning

– General difficulty in assessing informativeness of input

– Distributional information can be tied to units that are carried forward in time: lexical items, parameters

– Distributional information can be used reliably when alternatives are mutually incompatible

• Does this situation change if we move to corpus-based learning?

Page 66: Parameter Setting

[Diagram: input (utterance or corpus) → analyze input → action]

Page 67: Parameter Setting
Page 68: Parameter Setting

[Diagram: input → analyze input → action, annotated with the dimensions on which the models differ]

Utterance vs. corpus

Parse vs. trigger

Generate vs. select

Grammar vs. supergrammar

Success vs. failure

Reward/punish; nothing

Informed update

Overgeneralization

Mutual incompatibility

Page 69: Parameter Setting
Page 70: Parameter Setting

Questions about Survivor Model

• How to address the Subset Problem?

• How to address the Hitchhiker/Accomplice problem, i.e., improve blame assignment?

• How to use a realistic model of cross-language variation?

• Better empirical evidence for multiple grammars in child speech

Page 71: Parameter Setting

Subset Problem

• Problem

– If a [+P] grammar generates a superset of what a [-P] grammar generates, then the [+P] grammar is never punished by presentation of sentences from [-P]

– The fact that the extra sentences generated by [+P] never occur plays no role

• Analysis-by-synthesis (see the sketch below)

– “How would I have said that?”

– If a comprehender takes the input sentence [meaning] and uses the grammar to re-generate the sentence, feedback is available from the (mis-)match with the original sentence

– If the speaker generates the sentence with the superset [+P] grammar, then a mismatch can be used to punish [+P]

– Generation using the [-P] grammar will never be punished
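A minimal Python sketch of the analysis-by-synthesis check, assuming a hypothetical `generate(grammar, meaning)` production routine; how meanings are recovered and how forms are compared is left entirely open here.

```python
def analysis_by_synthesis_feedback(input_sentence, meaning, grammar, generate):
    """'How would I have said that?' (illustrative sketch).

    generate(grammar, meaning) -> the sentence the learner would produce
    for this meaning under the given grammar.
    Returns True if the regenerated form matches the attested input;
    a False return is usable evidence against the generating grammar.
    """
    regenerated = generate(grammar, meaning)
    return regenerated == input_sentence
```

In a subset ([-P]) environment, only the superset [+P] grammar can produce mismatching regenerations, so only it accrues punishment; the resulting success/failure signal can then feed the same reward-penalty update sketched earlier.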

Page 72: Parameter Setting

Blame Assignment

• Problem

– All grammars are treated as vectors of parameter values

[0, 0, 1, 0, 1, 1, 1, 0, 1, …]

– Success or failure in parsing any individual trial punishes or rewards all parameter values in the grammar, regardless of their contribution to the success/failure (‘hitchhikers’, ‘accomplices’)

– How to update only the relevant parameter values?

• ‘Relevant’ parameter

– Fodor: only treelets that must be used are relevant on any trial

– Survivor model: reward/punish values of only those parameters (treelets) that can be used on any given trial (see the sketch below)
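A minimal Python sketch of the contrast with the NPL sketch above: only the parameter values actually implicated on this trial are rewarded or punished. How the "used" values are identified (the `used_values` argument) is exactly the open blame-assignment question, so it is simply assumed here.

```python
def selective_update(weights, used_values, success, gamma=0.01):
    """Reward/punish only the parameter values used on this trial (sketch).

    weights[k]: probability of value 1 for parameter k
    used_values: dict mapping parameter index -> the value (0 or 1) whose
                 treelet figured in the analysis of the current sentence
    """
    updated = list(weights)
    for k, value in used_values.items():
        target = value if success else 1 - value
        w = updated[k]
        # Shift probability toward the used value on success, away on failure
        updated[k] = w + gamma * (1 - w) if target == 1 else (1 - gamma) * w
    return updated
```

The update rule itself is the same as in the NPL sketch; the only difference is which parameters it touches.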

Page 73: Parameter Setting

Lexicalized Parameters

• Problem

– It is hard to characterize natural language grammars as a fixed-length vector of parameters [0, 0, 1, 0, 1, 1, 1, 0, 1, …]

– Variation is associated with specific lexical items (e.g., domains for anaphors), whose number varies across languages

• Answer

– Task is to learn properties of lexical items

– Reward/punish only properties of the lexical items used in the sentence

• Challenge

– How to handle cases where items compete for realization, e.g., pronoun vs. anaphor?

Page 74: Parameter Setting

Other Information Sources

• All parameter-setting models

– Syntactic properties are learned only through success or failure in parsing input sentences

• Non-syntactic cues (e.g., Pinker ‘semantic bootstrapping’)

– Semantic information cues syntactic properties of words

– ‘Linking rules’ for verb argument structure, e.g., [manner-of-motion] -> V NP[figure] PP[ground]

John poured water into the glass
*John poured the glass with water

Page 75: Parameter Setting

Role of Probabilistic Information

• Probabilistic information has a limited role

– It is used to predict the time-course of hypothesis evaluation

– It contributes to the ultimate likelihood of success in only a very limited sense

– It does not contribute to the generation of hypotheses - these are provided by a pre-given parameter space

– By gradually accumulating evidence for or against hypotheses, the model becomes somewhat robust to noise

• One-grammar-at-a-time models

– Negative evidence (i.e., parse failure) has a drastic effect

– Hard to track degree of confidence in a given hypothesis

– Therefore hard to protect against fragility

Page 76: Parameter Setting

Statistics as Evidence or as Hypothesis

• Phonological learning

– Age 0-1: developmental change reflects tracking of the surface distribution of sounds?

– Age 1-2: [current frontier] developmental change involves abstraction over a growing lexicon, leading to more efficient representations

• Neural Networks etc.

– Statistical generalizations are the hypotheses

• Lexical Access

– Context guides selection among multiple lexical candidates

• Syntactic Learning

– Survivor model: statistics tracks the accumulation of evidence for pre-given hypotheses