CORPORA-2013
A study of inflectional morpheme development in English-speaking children using the CHILDES Corpus
Myung Sook Min1, Sun-Young Lee2, Jong-Sup Jun1
1 Hankuk University of Foreign Language & 2 Cyber Hankuk University of Foreign Language
26 JUNE, 2013
The International Conference on Corpus Linguistics CORPORA-2013
Research Goal
Using the CHILDES (Child Language Data Exchange System) database, this study investigated the order of acquisition of inflectional morphemes and the overregularization found in English-speaking children’s L1 acquisition.
1. Introduction
Background
Children’s L1 development proceeds by regularizing the linguistic knowledge acquired through diverse input from caregivers.
In English, language development can be measured by the use of inflectional morphemes such as –ing and –(e)d.
Brown(1973) proposed the mean order of acquisition of 14 morphemes, and Marcus et al.(1992) confirmed the U-shape development in the acquisition of the English irregular past tense.
Research purpose
Using the whole CHILDES database, this study tests the findings of previous studies of inflectional morpheme development in child language, which examined only a limited number of subjects.
2. Literature Review
2.1 Acquisition order of inflectional morphemes
Berko(1958)
Studied the acquisition of morphemes in 4-7 year old American children using the WUG Test, which investigates children’s ability to apply the inflectional morphemes to nonsense words.
Order of acquisition of Infl.
(1) Present progressive(-ing)
(2) Past regular(-ed)
(3) Third person regular(-s)
(4) Possessive(-’s)
Brown(1973)
Studied the acquisition of grammatical morphemes by analyzing the spontaneous utterances produced by 3 children.
Order of acquisition of Infl.
(1) Present progressive
(2) Plural
(3) Past irregular
(4) Possessive
(5) Past regular
(6) Third person regular
(7) Third person irregular
2.2 Overregularization
Marcus et al.(1992) – (-ed)
Studied the overregularization of the past tense morpheme in the spontaneous utterances produced by 83 subjects.
The overregularization rate was not high, but the tendency existed.
Overregularization errors were found from the age of 2 till the beginning of school age.
U-shape development confirmed.
Kuczaj(1977) – (-ing)
Studied the overregularization of the present progressive morpheme in the spontaneous utterances produced by 15 subjects.
Overregularization was rarely found.
Claimed that this is because there is no irregular present progressive form for irregular verbs.
2.3 Research questions
Limitations of previous studies
The results of previous studies are insufficient for the
generalization of children’s language development due to a
limited number of participants.
Research questions
1) Do children apply the inflectional morphemes to diverse
verbs as they get older?
2) Is the overregularization error found? And is the U-shape
developmental pattern found in children’s language
acquisition?
3) Related to questions 1-2 above, is there a difference between
the UK and the USA children’s language development? If so,
is it due to mothers’ input?
2.4 Research Method
Method
The number of inflectional word types, their token frequency, the type-token ratio (TTR), and D, a measure of lexical diversity, were calculated to track the development of inflectional morphemes by age.
D indicates lexical diversity measured on randomly selected sentences.
The higher D is, the more diverse the words to which the children apply the inflectional morphemes.
D was calculated with the VocD command in CLAN, applied to CHILDES Corpus texts of different lengths.
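The two measures named in the method can be sketched in a few lines. This is only an illustration, not CLAN's implementation: the sample sizes (35 to 50 tokens), the 100 trials per size, and the curve TTR = (D/N)(sqrt(1 + 2N/D) - 1) follow the commonly published description of vocd, and the function names, the grid search, and the 0.1 to 200 search range are assumptions made for this sketch.

```python
import math
import random

def ttr(tokens):
    """Type-token ratio: distinct word forms divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0

def model_ttr(n, d):
    """TTR predicted for a sample of n tokens by the D model curve."""
    return (d / n) * (math.sqrt(1 + 2 * n / d) - 1)

def estimate_d(tokens, sizes=range(35, 51), trials=100, seed=0):
    """Grid-search the D whose curve best fits the mean sampled TTRs.

    Assumption: a plain least-squares grid search stands in for
    whatever fitting routine the real vocd program uses.
    """
    if len(tokens) < max(sizes):
        raise ValueError("need at least as many tokens as the largest sample")
    rng = random.Random(seed)
    # Mean TTR over random samples (without replacement) at each sample size.
    observed = {
        n: sum(ttr(rng.sample(tokens, n)) for _ in range(trials)) / trials
        for n in sizes
    }
    candidates = [i / 10 for i in range(1, 2001)]  # D from 0.1 to 200.0

    def squared_error(d):
        return sum((model_ttr(n, d) - t) ** 2 for n, t in observed.items())

    return min(candidates, key=squared_error)
```

Called on a list of a child's inflected word tokens, `ttr` gives the raw ratio and `estimate_d` gives a length-robust diversity estimate; unlike TTR, the fitted D does not shrink simply because a transcript is longer, which is presumably why the study reports both.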
3. Corpus Study
3.1 CHILDES Corpus
The CHILDES Corpus is one of the most frequently used corpora for research on language acquisition and on the influence of caregivers’ input.
The entire CHILDES Corpus was rearranged for ease of analysis, focusing on the data from ages 1 to 7, which account for 97% of the entire corpus.
7,841 files were created: 2,272 files from 275 UK children and 5,569 files from 1,355 USA children.
With the FREQ command in CLAN, 35,130 word types with 1,937,624 tokens were extracted from the UK children and 63,705 word types with 2,771,312 tokens from the USA children.
3.3 Analysis
First, classified the 4,700,000 words by regular inflectional morphemes such as –(e)d, then extracted irregularly inflected forms such as ‘wore’ and integrated them with the regularly inflected words.
(1) Present progressive(-ing)
(2) Regular and irregular past tense(-(e)d, irr)
(3) Comparative and superlative(-er, -est, irr)
(4) Third person singular present/plural(-(e)s, irr)
(5) Possessive singular and plural(-’s, -s’)
(6) Pronoun
Calculated Type, Token and TTR with the FREQ command in CLAN
- Command: freq +t*CHI +u +f @ file
Calculated D with the VocD command in CLAN
- Command: vocd +t"CHI" +r6 +s"@C:\CHILDES\CLAN\lib\17133_ ed_d_irr_2556.cut" +u +f @ file
3.4. Results
Extracted 13,528 inflectional word types and 1,221,916 tokens.
TTR and D of inflectional morphemes by country
Inflectional       UK                                  USA
morphemes          Type    Token     TTR     D         Type    Token     TTR     D
-ing               1,229   39,759    0.031   33.26     1,084   47,458    0.023   27.38
-d_ed_irr(V)       1,006   82,474    0.012   10.78     1,472   132,405   0.011   19.23
-er_-est_irr(A)    217     11,499    0.008   0.77      198     13,978    0.014   1.57
-es_-s_irr(N)      4,245   114,524   0.037   18.18     3,905   172,631   0.023   30.78
pronoun            52      165,778   0.000   2.64      51      321,019   0.000   1.82
-s'_-'s            1,359   49,219    0.028   4.48      1,904   71,172    0.027   3.77
Total              8,108   463,253   0.019   11.685    8,614   758,663   0.016   14.09
3.4.1 Present progressive (-ing)
TTR and D
Age      UK                                USA
         Type    Token    TTR     D        Type    Token    TTR     D
1        97      719      0.135   13.83    290     3,378    0.086   27.60
2        1,158   33,264   0.035   33.27    637     15,904   0.040   28.14
3        256     3,071    0.083   20.18    558     10,924   0.051   23.93
4        131     642      0.204   21.84    569     11,343   0.050   26.67
5        154     1,009    0.153   27.07    383     3,740    0.102   28.95
6        83      264      0.314   27.07    244     1,210    0.202   36.05
7        118     790      0.149   22.48    200     959      0.209   27.23
Total    1,997   39,759   0.153   23.68    2,881   47,458   0.106   28.37
No difference in D was found between the UK and the USA children.
The correlations between D and the children’s age were not significant, which seems to indicate that children already apply the present progressive morpheme to diverse verbs from the age of 1.
- UK children: r = 0.025, p > .05 / USA children: r = 0.385, p > .05
80-90% of the 50 most frequently used words in the children’s speech were also among the 50 most frequently used words in the mothers’ speech.
Overregularization errors were rarely found.
- Noun+ing (tennising, swording, appetizing): once or twice each / Adjective+ing (noticeabling): only once
- However, because the present progressive and the gerund share the same form, further study reviewing their usage is needed.
3.4.2 Past tense (-(e)d_irr(V))
TTR and D
Age      UK                                USA
         Type    Token    TTR     D        Type    Token     TTR     D
1        94      1,800    0.052   5.24     223     5,145     0.043   16.34
2        913     68,022   0.013   11.45    726     33,190    0.022   19.92
3        262     6,474    0.040   12.63    757     33,020    0.023   21.24
4        166     1,600    0.104   15.68    820     38,482    0.021   21.00
5        181     2,327    0.078   14.82    547     13,106    0.042   22.60
6        108     610      0.177   12.34    352     4,755     0.074   24.52
7        162     1,641    0.099   13.39    321     4,707     0.068   20.05
Total    1,886   82,474   0.080   12.22    3,746   132,405   0.042   20.81
As children got older, the D of past tense increased until the age of 5 or 6 and decreased at age 7 in both the UK and the USA.
A marginal correlation was found between D and the children’s age (the critical value for a significant correlation coefficient was 0.68). This suggests that children tend to apply past tense morphemes to more diverse verbs as they get older.
- UK children: r = 0.643, p > .05 / USA children: r = 0.66, p > .05
In all age groups, the D of the USA children is higher than that of the UK children.
That the D of past tense is lower than that of present progressive confirms the grammatical morpheme developmental order proposed by Brown(1973).
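The reported coefficients can be checked against the past tense table. The sketch below assumes the authors correlated age (1 to 7) with the per-age D values using Pearson's r; the slides do not state the exact procedure, and the names `AGES`, `D_PAST` and `pearson_r` are introduced here only for illustration.

```python
import math

AGES = [1, 2, 3, 4, 5, 6, 7]
# D of past tense by age, copied from the "TTR and D" table for past tense.
D_PAST = {
    "UK": [5.24, 11.45, 12.63, 15.68, 14.82, 12.34, 13.39],
    "USA": [16.34, 19.92, 21.24, 21.00, 22.60, 24.52, 20.05],
}

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

for country, d_values in D_PAST.items():
    print(country, round(pearson_r(AGES, d_values), 3))
```

Under that assumption the computed values land within rounding of the reported r = 0.643 (UK) and r = 0.66 (USA), which suggests the correlations were taken over the seven age-group values rather than over individual children.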
The highest-frequency words are mostly irregular verbs. Among the 50 most frequent forms, irregular verbs were about four times as numerous as regular verbs in both countries.
- UK: 25 irregular verbs, 8 irregular verbs whose bare form is identical to the past and the past participle, 7 auxiliary verbs, 6 regular verbs, 4 words with regular past tense morphemes but probably used as adjectives
- USA: 25 irregular verbs, 9 irregular verbs whose bare form is identical to the past and the past participle (such as put), 5 auxiliary verbs, 5 regular verbs, 6 words with regular past tense morphemes but probably used as adjectives
‘go’ and ‘fall’ were the irregular verbs most often overregularized with the regular past tense morpheme ‘-(e)d’.
Overregularization error type and frequency of irregular verb ‘go’
UK
Age   Correct                      Overregularization                   Total
      went    gone    subtotal     goed   goned   wented   subtotal
1     -       580     580          1      -       -        1          581
2     784     4,978   5,762        17     3       -        20         5,782
3     73      192     265          3      -       -        3          268
4     23      21      44           -      -       -        -          44
5     33      22      55           -      -       -        -          55
6     24      2       26           -      -       -        -          26
7     100     15      115          -      -       -        -          115
USA
Age   Correct                      Overregularization                   Total
      went    gone    subtotal     goed   goned   wented   subtotal
1     29      239     268          -      -       -        -          268
2     572     627     1,199        38     2       1        41         1,240
3     675     142     817          52     -       -        52         869
4     860     109     969          4      -       -        4          973
5     286     30      316          -      -       -        -          316
6     158     6       164          -      -       -        -          164
7     74      22      96           -      -       -        -          96
Overregularization errors were not found at the age of 1 but appeared between the ages of 2 and 3 and then they began to disappear from the age of 4 or 5.
[Figure: U-shape developmental pattern of irregular verb ‘go’. x-axis: age (1-7); y-axis: 91%-100%; series: UK, USA]
Overregularization rate of ‘go’ between the UK and the USA was significantly different by the Pearson chi-square.
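The curve for ‘go’ can be recomputed from the frequency table. The sketch assumes the plotted percentage is the share of correct forms (went, gone) among all past forms of ‘go’ at each age, which fits the 91%-100% axis range; `GO_COUNTS` and `correct_rate_by_age` are names introduced here for illustration.

```python
# Correct and overregularized token counts for 'go' at ages 1-7,
# copied from the subtotal columns of the frequency table ("-" read as 0).
GO_COUNTS = {
    "UK": {
        "correct": [580, 5762, 265, 44, 55, 26, 115],
        "overreg": [1, 20, 3, 0, 0, 0, 0],
    },
    "USA": {
        "correct": [268, 1199, 817, 969, 316, 164, 96],
        "overreg": [0, 41, 52, 4, 0, 0, 0],
    },
}

def correct_rate_by_age(correct, overreg):
    """Share of correct forms among all attempts, one value per age."""
    return [c / (c + o) for c, o in zip(correct, overreg)]

for country, counts in GO_COUNTS.items():
    rates = correct_rate_by_age(counts["correct"], counts["overreg"])
    lowest_age = rates.index(min(rates)) + 1  # ages start at 1
    print(country, [f"{r:.1%}" for r in rates], "lowest at age", lowest_age)
```

With these counts the correct rate is lowest at age 3 in both countries and returns to 100% from age 5, which is the dip-and-recovery shape the slide describes.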
Overregularization error type and frequency of irregular verb ‘fall’
The overregularization error types of ‘fall’ were found more in the UK children.
UK
Age   Correct                      Overregularization                      Total
      fell    fallen   subtotal    falled   felled   fallened   subtotal
1     3       -        3           -        -        -          -          3
2     315     290      605         57       6        4          67         672
3     37      16       53          4        -        -          4          57
4     15      2        17          1        -        -          1          18
5     16      2        18          -        -        -          -          18
6     7       -        7           1        -        -          1          8
7     5       1        6           -        -        -          -          6
USA
Age   Correct                      Overregularization                      Total
      fell    fallen   subtotal    falled   felled   fallened   subtotal
1     73      -        73          -        1        -          1          74
2     462     2        464         94       5        -          99         563
3     324     5        329         34       2        -          36         365
4     264     1        265         8        1        -          9          274
5     94      -        94          2        1        -          3          97
6     32      1        33          -        -        -          -          33
7     15      1        16          1        1        -          2          18
Overregularization error in irregular past tense tended to appear at the age of 2 and began to decrease from the age of 3 and disappeared at the age of 4 or 5.
[Figure: U-shape developmental pattern of irregular verb ‘fall’. x-axis: age (1-7); y-axis: 80%-100%; series: UK, USA]
Overregularization rate of ‘fall’ between the UK and the USA was not significantly different by the Pearson chi-square.
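The UK/USA comparisons for ‘go’ and ‘fall’ can be illustrated with a 2x2 Pearson chi-square. The slides do not say how the contingency tables were built; this sketch assumes one table per verb, pooling correct and overregularized tokens over ages 1-7, with no continuity correction. The totals are summed by hand from the subtotal columns of the frequency tables, and `chi_square_2x2` and `TOTALS` are names introduced here.

```python
def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]]."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))

# (correct, overregularized) token totals pooled over ages 1-7.
TOTALS = {
    "go": {"UK": (6847, 24), "USA": (3829, 97)},
    "fall": {"UK": (709, 73), "USA": (1274, 150)},
}
CRITICAL_05_DF1 = 3.841  # chi-square critical value for df = 1, alpha = .05

for verb, by_country in TOTALS.items():
    stat = chi_square_2x2(*by_country["UK"], *by_country["USA"])
    verdict = "significant" if stat > CRITICAL_05_DF1 else "not significant"
    print(verb, round(stat, 2), verdict)
```

Under these assumptions the statistic for ‘go’ is far above the critical value and the one for ‘fall’ is below it, matching the direction of both results reported on the slides.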
3.4.3 Comparative and Superlative (-er_-est_irr(A))
TTR and D
Age      UK                              USA
         Type   Token    TTR     D       Type   Token    TTR     D
1        11     661      0.017   0.10    19     1,660    0.011   0.26
2        76     9,940    0.008   0.74    70     4,242    0.017   0.92
3        26     420      0.062   1.24    99     2,758    0.036   2.03
4        21     126      0.167   2.89    122    3,349    0.036   2.70
5        28     194      0.144   2.92    84     1,211    0.069   3.44
6        20     57       0.351   5.82    47     376      0.125   4.06
7        17     101      0.168   2.54    41     382      0.107   2.17
Total    199    11,499   0.131   2.32    482    13,978   0.057   2.23
As children got older, the D of comparative –er and superlative –est increased until the age of 6 and slightly decreased at age 7 in both the UK and the USA. This confirms that children applied comparative and superlative forms to more diverse adjectives as they got older.
Strong correlations were found between D and the children’s age.
- UK: r = 0.779, p < .05 / USA: r = 0.776, p < .05
The difference in D between the UK and the USA was not noticeable.
The most frequent words are more and better, followed by last, bigger, higher in the UK children and cleaner, higher, bigger, later in the USA children.
Overregularization error type and frequency of ‘little’
Age   Correct       Overregularization             Total
      less          littler        littlest
      UK    USA     UK    USA      UK    USA       UK    USA
1     0     0       0     0        0     0         0     0
2     3     1       1     7        0     5         4     12
3     0     3       2     6        1     9         3     18
4     0     6       0     6        1     4         1     16
5     0     7       0     4        1     0         1     11
6     0     0       1     4        1     3         2     7
7     0     1       0     0        0     0         0     1
The overregularization errors were found till the age of 6 but still show the U-shape developmental pattern.
[Figure: U-shape developmental pattern of ‘little’. x-axis: age (1-7); y-axis: 0%-100%; series: UK_littler, UK_littlest, USA_littler, USA_littlest]
Overregularization rate of ‘little’ between the UK and the USA was significantly different by the Pearson chi-square.
In 4 files, littler was found in the speech of both child and mother. In these files, the children produced it 8 times while the mothers produced it 17 times.
This finding points to a possible influence of mothers’ input on child language.
4. Discussion
1) Do children apply the inflectional morphemes to diverse verbs as they get older?
D: Present progressive(23.68~28.37) > Past tense(12.22~20.81) > Comparative/Superlative(2.32~2.23)
- D confirms the grammatical morpheme developmental order proposed by Brown(1973).
The developmental patterns of each inflectional morpheme differed as children got older.
Among the 50 most frequently used verbs, irregular verbs were more than 4 times as numerous as regular verbs, which supports Brown(1973)’s claim that children acquire irregular verbs earlier than regular verbs.
2) Is the overregularization error found? And is the U-shape developmental pattern found in children’s language acquisition?
Overregularization errors were found, and the U-shape developmental pattern claimed in previous studies such as Brown(1973) and Marcus et al.(1992) was confirmed on a large scale in the CHILDES Corpus.
Overregularization errors were found most often in the past tense and only rarely in the present progressive.
3) Related to questions 1-2 above, is there a difference between the UK and the
USA children’s language development? If so, is it due to mothers’ input?
Similarities
(1) As children get older, they apply the inflectional morpheme to more
diverse words.
(2) U-shape developmental patterns were found in both the UK and the
USA.
Differences
(1) The overregularization error rate in UK children was lower than that in USA children.
(2) The data suggest a possible influence of mothers’ input on children’s language.
5. Conclusion
This study investigated inflectional morpheme development in child language using CHILDES Corpus data from children aged 1 to 7.
Our findings are:
1) Children tended to apply the inflectional morpheme to more diverse words as they got older.
2) U-shape developmental pattern was confirmed.
3) The overregularization errors were found while children applied the inflectional morphemes to words.
4) With Ds, this study supports the grammatical developmental order proposed by Brown(1973).
5) This study showed the possible influence of mothers’ input on children’s language by the different developmental aspects of the UK and the USA children.
References
[1] Berko, Jean(1958), The child’s learning of English morphology, Word 14. p.47-56
[2] Brown, Roger(1973), A First Language: The Early Stages, Harvard University Press
[3] CHILDES (http://childes.psy.cmu.edu/)
[4] Johansson, Victoria(2008), Lexical diversity and lexical density in speech and writing: a developmental perspective, Lund University, Dept. of Linguistics and Phonetics, Working Papers 53. p.61-79
[5] Kuczaj, Stan A.(1977), Why do children fail to overgeneralize the progressive inflection?, Journal of Child Language 5. p.167-171
[6] MacWhinney, B. & Snow, C. E.(2000), The Child Language Data Exchange System: An Update, Journal of Child Language 17. p.457-472
[7] Marcus, Gary F.; Pinker, Steven; Ullman, Michael; Hollander, Michelle; Rosen, T. John; and Xu, Fei(1992), Overregularization in Language Acquisition, Monographs of the Society for Research in Child Development Serial No. 228 Vol. 57
[8] Malvern, David; Brian Richards; Ngoni Chipere & Pilar Duran(2004), Lexical diversity and language development: quantification and assessment, New York: Palgrave Macmillan
[9] Maslen, Robert J C; Theakston, Anna L; Lieven, Elena V M; Tomasello, Michael(2004), A Dense Corpus Study of Past Tense and Plural Overregularization in English, Journal of Speech, Language, and Hearing Research 47. 6. p.1319-1333
[10] McCarthy, Philip M. & Jarvis S(2004), vocd: A theoretical and empirical evaluation, Language Testing 24.4 p.459-488
[11] Richards, Brian J. & David Malvern(1997), Quantifying lexical diversity in the study of language development. Reading: Faculty of Education and Community Studies
[12] Templin, M.C.(1957), Certain language skills in children. Minneapolis: University of Minnesota Press
Contact Info.
Myung Sook Min ([email protected])
Sun-Young Lee ([email protected])
Jong-Sup Jun ([email protected])