Human simulations of vocabulary learning


Presentation: Syntax-Psycholinguistics Interface
Y-Lan Boureau

Gillette, Gleitman, Gleitman, Lederer


Outline

• Some background
   - The problem to be solved
   - Facts: noun acquisition precedes verb acquisition
   - Existing theory
   - Gillette et al.’s hypothesis
• Simulation experiment
   - Learning from observation
   - Learning from linguistic hints
• Discussion


The problem of language learning

Children learn language from scratch. Traditional hypothesis:

• Children hear adults speak.
• They notice that "cat" is uttered most frequently when there is a cat around.
• They infer that "cat" means ‘cat’.
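The traditional word-to-world procedure amounts to cross-situational counting. Below is a minimal Python sketch, assuming an invented toy input in which each utterance is paired with the set of objects in view. The names `situations`, `learn_word_to_world` and `guess_meaning` are hypothetical, and this is not the model tested by Gillette et al.

```python
from collections import Counter, defaultdict

# Invented toy input (hypothetical, for illustration only): each situation
# pairs the words an adult utters with the set of objects in view.
situations = [
    (["look", "the", "cat"], {"cat", "ball"}),
    (["the", "cat", "sleeps"], {"cat", "sofa"}),
    (["throw", "the", "ball"], {"ball", "dog"}),
    (["good", "dog"], {"dog", "cat"}),
    (["the", "ball", "rolls"], {"ball"}),
]


def learn_word_to_world(situations):
    """Count, for each word, how often each object is present when the word is heard."""
    cooc = defaultdict(Counter)
    for words, objects in situations:
        for word in set(words):  # count each word once per situation
            cooc[word].update(objects)
    return cooc


def guess_meaning(cooc, word):
    """Return the object most often present when `word` was heard, or None if unseen."""
    if word not in cooc:
        return None
    obj, _count = cooc[word].most_common(1)[0]
    return obj


cooc = learn_word_to_world(situations)
print(guess_meaning(cooc, "cat"))   # cat
print(guess_meaning(cooc, "ball"))  # ball
```

A learner like this only tracks input statistics, so it predicts that a child's vocabulary should mirror input frequencies. It also assigns a referent to "the" (here "ball"), because co-occurrence counting alone cannot tell which words name objects.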

But babies’ vocabulary does not reflect input frequencies:

it contains many more nouns than verbs, even though verbs are common in the speech addressed to babies.


Nouns are learnt earlier

A conceptual hypothesis:

Verbs are conceptually more difficult, so they cannot be learnt until babies display adequate conceptual knowledge.

Alternative hypothesis: information requirements

Verbs require some syntax to have been acquired first (e.g. in "I know that Mommy is coming", the meaning of "know" is signalled mainly by its sentential complement).


Pairing word to world

Three sources of information:

• Nonlinguistic evidence (e.g. Mommy says "cat" when the cat is there)
• Linguistic evidence:
   - Co-occurrence of semantically related words in sentences (e.g. food names usually appear with verbs like "eat")
   - Syntactic structures in which words occur (e.g. a verb with one subject and two complements is likely to be of the "give" kind)
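The syntactic-structure cue can be illustrated with a toy rule set that reads a coarse verb kind off the number and type of complements. Below is a minimal Python sketch. The `Frame` representation, the rules, the example verbs and the nonsense verb "gorp" are all hypothetical simplifications, and real frame-based learning is probabilistic rather than a fixed lookup.

```python
from dataclasses import dataclass


@dataclass
class Frame:
    """A very simplified syntactic frame around an unknown verb (hypothetical representation)."""
    np_complements: int                   # noun-phrase complements after the verb
    sentential_complement: bool = False   # e.g. "... that Mommy is coming"


def guess_verb_kind(frame: Frame) -> str:
    """Map a frame's shape to a coarse verb kind (illustrative rules, not from the paper)."""
    if frame.sentential_complement:
        return "mental verb (think, know)"
    if frame.np_complements >= 2:
        return "transfer verb (give)"
    if frame.np_complements == 1:
        return "transitive verb (push)"
    return "intransitive verb (sleep)"


# "She gorps him the ball": subject + two complements
print(guess_verb_kind(Frame(np_complements=2)))  # transfer verb (give)
# "I gorp that Mommy is coming": sentential complement
print(guess_verb_kind(Frame(np_complements=0, sentential_complement=True)))  # mental verb (think, know)
```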


Hypothesis

The baby:

(1) acquires a small stock of nouns by word-to-world pairing;
(2) uses that stock of nouns as a scaffold for constructing representations of the linguistic input that will support a more efficient learning procedure.

Support: changes in vocabulary size correlate with the appearance of multiword speech.


A simulation experiment

Principle:

• Adult learners (no conceptual limitations any more)
• Task: guess the most frequently used nouns or verbs
• Observational clues: video clips
• Linguistic clues: co-occurring words, syntactic frame


First experiment

• Only video clips
• Adults try to guess 24 nouns and 24 verbs


Results, Part I: Nouns win

Nouns are identified far more accurately than verbs:

[Figure: Experiment 1, mean % correct final identification, nouns vs. verbs]


Imageability rules

The clues provided are exclusively visual.
The nouns of the set (e.g. elephant, plane, bag) are much more "imageable" than the verbs (e.g. think, know, wait).

[Figure: mean imageability ratings, nouns vs. verbs]


Results, part II

[Figure: mean % correct final identification for nouns and verbs, before and after correcting for imageability]


Conclusion of Experiment I

The only relevant factor seems to be imageability.

This is not that surprising: from a video you can only learn imageable things, since something that is not imageable cannot be pictured.


Linguistic clues vs. observational clues

All nouns removed from the target words (only verbs are tested). 6 conditions:

1: video clips (with a beep in place of the verb)
2: alphabetical lists of the co-occurring nouns
3: 1+2 (video clips + alphabetical lists)
4: syntactic frames, with all content words replaced by nonsense words
5: full sentences with only the verb replaced by a nonsense word
6: 1+5 (video clips + sentences)


Results

[Figure: mean % correct identification of verbs in the six conditions (video clips; lists of nouns; video clips + lists; syntactic frames; sentences with nouns; video clips + sentences). Slide annotations: condition 4 provides no nouns and no visual information; condition 5 reintroduces the nouns; condition 6 reintroduces the visuals.]



Remarkably, accuracy jumps between conditions 3 and 4, whereas the reverse could have been expected, since condition 4 provides neither nouns nor visual information.
Interestingly, the verbs that were learnt best from observation show a decrease between conditions 3 and 4.


Discussion

Verbs show complementary distributions: the 12 verbs never learnt from visual clues are the 12 best learnt from linguistic clues. This distribution corresponds to the "imageability" criterion:

• Quite logically, only what is visually representable can be learnt visually.
• Verbs that require higher-level linguistic representations have to wait until those representations can be constructed.


Discussion: general scheme

First, imageable words are learnt on a word-to-world pairing basis.

These imageable words are mostly nouns, which would explain why nouns make up most of young infants’ vocabulary.

Second, this first set of words allows new words to be learnt by sentence-to-world pairing.

Thus, conceptual (non-imageable) words can be learnt as well.


Some reservations

The argument structure is the same across languages (logical requirements), but:

Adults already know the words, so they could guess the verbs by an exhaustive search of their vocabulary using all the information given (e.g. the best performance is for "look", probably because "look" is used with "at").
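The exhaustive-search worry can be made concrete: an adult who already has a lexicon can simply filter it by every available clue. Below is a minimal Python sketch, assuming an invented four-verb lexicon annotated with typical prepositions. `LEXICON` and `candidate_verbs` are hypothetical names, and the sketch is not a model of the participants' actual strategy.

```python
# Hypothetical mini-lexicon (invented for illustration): verb -> prepositions
# an adult knows it commonly takes.
LEXICON = {
    "look": {"at", "for"},
    "go": {"to"},
    "think": {"about", "of"},
    "put": {"on", "in"},
}


def candidate_verbs(observed_preps):
    """Exhaustive search: keep every known verb compatible with all observed prepositions."""
    return sorted(verb for verb, preps in LEXICON.items() if observed_preps <= preps)


print(candidate_verbs({"at"}))  # ['look']
print(candidate_verbs(set()))   # ['go', 'look', 'put', 'think']
```

A single distinctive cue such as "at" can shrink the candidate set to one verb, so a high score for "look" may reflect lexical lookup rather than learning.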


The End
