
Modeling Situated Language Learning in Early Childhood via Hypernetworks

Zhang, Byoung-Tak (1,2), Lee, Eun Seok (2), Heo, Min-Oh (1), and Kang, Myounggu (1)

(1) School of Computer Science & Engineering, Seoul National University
(2) Interdisciplinary Program in Cognitive Science, Seoul National University

{btzhang, eslee, mgkang, moheo}@bi.snu.ac.kr


Biointelligence Lab, Seoul National University, Seoul 151-744, Korea (http://bi.snu.ac.kr)

Sample sentence-image pairs from Data 2 (Thomas & Friends):

Season 1, 1st Video (40 pairs):
"Thomas is a tank engine who lives on the island of Sodor."
"He has six small wheels, a short funnel, a short boiler and a short dome."

Season 1, 2nd Video (40 pairs):
"Edward was in the shed with the engines."
"They were all bigger than him and boasted about it."
"The driver won't pick you, said Gordon. He wants strong engines."

Season 1, Videos 1-10: 398 pairs in total

References
Barsalou, L. W. (2008). Grounded cognition. Annual Review of Psychology, 59, 617-645.
Griffiths, T., Steyvers, M., & Tenenbaum, J. (2007). Topics in semantic representation. Psychological Review, 114(2), 211-244.
Knoeferle, P., & Crocker, M. W. (2006). The coordinated interplay of scene, utterance, and world knowledge: Evidence from eye-tracking. Cognitive Science, 30, 481-529.
Spivey, M. J. (2007). The Continuity of Mind. Oxford University Press.
Yu, C., Smith, L. B., & Pereira, A. F. (2008). Grounding word learning in multimodal sensorimotor interaction. CogSci-2008, pp. 1017-1022.
Zhang, B.-T. (2008). Hypernetworks: A molecular evolutionary architecture for cognitive learning and memory. IEEE Computational Intelligence Magazine, 3(3), 49-63.
Zwaan, R. A., & Kaschak, M. P. (2008). Language in the brain, body, and world. Chapter 19, Cambridge Handbook of Situated Cognition.


Background
· Human language is grounded, i.e. embodied and situated in the environment (Barsalou, 2008; Zwaan & Kaschak, 2008)
· Grounded language models rely on multimodal sensory data (Knoeferle & Crocker, 2006; Spivey, 2007; Yu et al., 2008)
· Language grounding requires a flexible modeling tool that can deal with high-dimensional data


Research Questions
· What kinds of grounded, linguistic "concept maps" are cognitive-computationally plausible and child-learning friendly?
· How does the "multimodal" concept map of a young child evolve as it is incrementally exposed to situated language environments?
· What kinds of cognitive learning algorithms can naturally simulate the situated language learning process?
· Can we use the situated concept-map building process to endow AI agents with language learning capability?


Key Ideas in This Work
· Hypernetwork structure (Zhang, 2008) as a multimodal concept map
· Population coding as an incremental learning-friendly representation
· Cartoon videos as a surrogate for situated child language corpora

Method
1. Define linguistic words and visual words
2. Learn a hypernetwork incrementally on the videos
3. Plot the multimodal concept map for a given query
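The poster does not give implementation details for steps 2-3, so the following is only a minimal sketch of the key ideas above: a hypernetwork stored as a multiset of small word combinations (hyperedges), with population coding approximated by hyperedge counts, learned incrementally from one sentence-image pair at a time. The class name Hypernetwork, the sampling parameters, and the vw_* visual-word labels are illustrative assumptions, not the authors' implementation.

```python
import random
from collections import Counter

class Hypernetwork:
    """Toy hyperedge memory over mixed linguistic and visual words.
    The multiset of sampled hyperedges (counts = weights) stands in for
    population coding; incremental learning = adding edges pair by pair."""

    def __init__(self, order=3, samples_per_pair=50, seed=0):
        self.order = order                    # words per hyperedge
        self.samples_per_pair = samples_per_pair
        self.edges = Counter()                # hyperedge -> count (weight)
        self.rng = random.Random(seed)

    def observe(self, linguistic_words, visual_words=()):
        """Step 2: incrementally add hyperedges sampled from one
        sentence-image pair."""
        pool = list(linguistic_words) + list(visual_words)
        if len(pool) < self.order:
            return
        for _ in range(self.samples_per_pair):
            self.edges[frozenset(self.rng.sample(pool, self.order))] += 1

    def concept_map(self, query, top_k=10):
        """Step 3: words most strongly associated with `query`, i.e. the
        neighbourhood one would plot as a multimodal concept map."""
        assoc = Counter()
        for edge, weight in self.edges.items():
            if query in edge:
                for word in edge - {query}:
                    assoc[word] += weight
        return assoc.most_common(top_k)

# One Data 2-style pair; the visual words are hypothetical patch labels.
hn = Hypernetwork()
hn.observe("thomas is a tank engine who lives on the island of sodor".split(),
           visual_words=["vw_engine", "vw_track", "vw_island"])
print(hn.concept_map("thomas"))
```

Re-running observe() on each new pair while keeping earlier edge counts is what makes the learning incremental: the concept map for a query can be replotted after every video, as in the MCM panels below.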


Simulation Experiment Results
· Example sentences generated from the learned concept map (Data 1); see the generation sketch after this results block

· Evolution of the multimodal concept map (MCM) for Data 2:
  - MCM for 'Station' after learning the 1st video
  - MCM for 'Station' after learning the 2nd video
  - MCM for 'Station' after learning videos 1-4
  - MCM for 'Station' after learning videos 1-5

· Mental images generated from the learned multimodal concept map (Data 2)
  [Figure: generated mental images per word ('Fat', 'People', 'Went', 'Station') and for the sentence 'Fat people went station', across dataset sequence 1-10]
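The poster shows generated sentences and "mental images" but not the generation procedure. One hedged reading, reusing a plain Counter of hyperedges as above: sentences are grown by chaining weighted co-occurrences, and a "mental image" is the set of visual words most strongly tied to the query words. The toy edges and vw_* labels below are invented for illustration only.

```python
import random
from collections import Counter

def generate_sentence(edges, seed_word, length=5, rng=None):
    """Grow a word chain: repeatedly append a word that co-occurs with the
    last word in some stored hyperedge, sampled by edge weight."""
    rng = rng or random.Random(0)
    words = [seed_word]
    for _ in range(length - 1):
        cooc = Counter()
        for edge, w in edges.items():
            if words[-1] in edge:
                for cand in edge:
                    if cand not in words and not cand.startswith("vw_"):
                        cooc[cand] += w
        if not cooc:
            break
        cands, weights = zip(*cooc.items())
        words.append(rng.choices(cands, weights=weights, k=1)[0])
    return " ".join(words)

def mental_image(edges, query_words, top_k=3):
    """Cross-modal completion: visual words (vw_*) most strongly linked to
    the linguistic query words across the hyperedge memory."""
    scores = Counter()
    query = set(query_words)
    for edge, w in edges.items():
        overlap = len(query & edge)
        if overlap:
            for vw in (x for x in edge if x.startswith("vw_")):
                scores[vw] += w * overlap
    return [vw for vw, _ in scores.most_common(top_k)]

# Hand-built toy memory mixing linguistic and visual words.
edges = Counter({
    frozenset({"fat", "people", "went"}): 4,
    frozenset({"people", "went", "station"}): 6,
    frozenset({"went", "station", "vw_platform"}): 5,
    frozenset({"station", "vw_platform", "vw_clock"}): 3,
})
print(generate_sentence(edges, "fat"))                        # e.g. "fat people went station"
print(mental_image(edges, ["fat", "people", "went", "station"]))
```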

Data Sets
· Data 1: Educational animation video scripts for children (text only): 33,910 sentences, 6,124 word types, 208,116 word tokens; 10 learning stages according to the recommended language levels of difficulty
· Data 2: Video of Thomas & Friends, Season 1, Numbers 1-10: 398 pairs of a dialogue sentence and its corresponding image
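Step 1 of the method ("define linguistic words and visual words") is only named on the poster. A minimal, assumption-laden sketch for a Data 2 pair: linguistic words as lowercase tokens, and visual words as coarse quantization labels of per-patch feature vectors (a real system would use clustered image descriptors). All feature values and labels below are invented.

```python
import re

def linguistic_words(sentence):
    """Lowercase alphabetic tokens as linguistic words."""
    return re.findall(r"[a-z']+", sentence.lower())

def visual_words(patch_features, n_bins=4):
    """Label each image patch by binning its feature values -- a crude
    stand-in for a visual-word codebook over the paired video frame."""
    labels = []
    for feat in patch_features:
        code = tuple(min(int(v * n_bins), n_bins - 1) for v in feat)
        labels.append("vw_" + "_".join(map(str, code)))
    return labels

# One hypothetical Data 2 pair: dialogue sentence + two patch feature vectors in [0, 1).
sentence = "Thomas is a tank engine who lives on the island of Sodor."
patches = [[0.10, 0.85], [0.70, 0.20]]
print(linguistic_words(sentence))   # ['thomas', 'is', 'a', 'tank', ...]
print(visual_words(patches))        # ['vw_0_3', 'vw_2_0']
```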


- Sentences excerpted from earlier learning stages through later ones, showing evolving concept associations


Typical proportion (%) and SD of semantically well-structured sentences among 100 randomly generated sentences per training session, under developmental learning vs. improvised learning (rated by human judges):

Training session | Developmental learning | Improvised learning
1                | 17.7 (1.49)            | 13.0 (1.05)
2                | 23.3 (1.49)            |  1.8 (0.79)
3                | 34.0 (1.83)            | 20.7 (0.95)
4                | 27.7 (1.25)            |  2.2 (0.42)
5                | 48.2 (1.55)            | 16.7 (1.25)
6                | 47.6 (1.17)            | 19.9 (0.88)
7                | 53.8 (1.55)            | 19.0 (1.15)
8                | 45.8 (1.03)            |  6.9 (1.66)
9                | 43.2 (1.75)            | 15.6 (4.06)
10               | 43.0 (1.33)            |  3.8 (1.03)
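The poster does not spell out how the two training orders were built. A plausible reading, assuming "developmental" means presenting Data 1's ten difficulty stages in order and "improvised" means presenting the same sentences shuffled, is sketched below; the staging and sentence names are placeholders.

```python
import random

def training_order(stage_corpora, developmental=True, seed=0):
    """Return a training sequence: stage-by-stage ('developmental learning')
    or fully shuffled ('improvised learning'), over the same sentences."""
    flat = [s for stage in stage_corpora for s in stage]
    if not developmental:
        random.Random(seed).shuffle(flat)
    return flat

# Toy stand-in for the 10 recommended-difficulty stages of Data 1.
stages = [[f"stage {i} sentence {j}" for j in range(3)] for i in range(1, 11)]
print(training_order(stages)[:3])                        # in curriculum order
print(training_order(stages, developmental=False)[:3])   # shuffled
```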