SON-R 2½-7 - Tests & Test-research


This document contains the greater part of the “SON-R 2½-7 Manual and Research Report”. Not included are chapter 12 (Directions per subtest), chapter 13 (The record form, norm tables and computer program) and the appendices.

The reference for this text is: Tellegen, P.J., Winkel, M., Wijnberg-Williams, B.J. & Laros, J.A. (1998). Snijders-Oomen Nonverbal Intelligence Test. SON-R 2½-7 Manual and Research Report. Lisse: Swets & Zeitlinger B.V.

This English manual is a translation of the Dutch manual, published in 1998 (SON-R 2½-7 Handleiding en Verantwoording). The German translation was also published in 1998 (SON-R 2½-7 Manual). In 2007 a German manual was published with German norms (SON-R 2½-7 Non-verbaler Intelligenztest. Testmanual mit deutscher Normierung und Validierung).

Translation by Johanna Noordam

ISBN 90 265 1534 0

Since 2003, the SON-tests have been published by Hogrefe Verlag, Göttingen, Germany.

© 1998, 2009 Publisher: Hogrefe. Authors: Peter J. Tellegen & Jacob A. Laros

http://www.hogrefe.de
E-mail: [email protected]

Rohnsweg 25, 37085 Göttingen, Germany


CONTENTS

Foreword . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

PART I: THE CONSTRUCTION OF THE SON-R 2½-7

1. Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.1 Characteristics of the SON-R 2½-7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

1.2 History of the SON-tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14

1.3 Rationale for the revision of the Preschool SON . . . . . . . . . . . . . . . . . . . . . . . 16

1.4 Phases of the research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17

1.5 Organization of the manual . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18

2. Preparatory study and construction research . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.1 The preparatory study . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19

2.2 The construction research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22

3. Description of the SON-R 2½-7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.1 The subtests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2 Reasoning tests, spatial tests and performance tests . . . . . . . . . . . . . . . . . . . . 31

3.3 Characteristics of the administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

4. Standardization of the test scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.1 Design and realization of the research . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.2 Composition of the norm group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.3 The standardization model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.4 The scaled scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40

5. Psychometric characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5.1 Distribution characteristics of the scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5.2 Reliability and generalizability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 47

5.3 Relationships between the subtest scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

5.4 Principal components analysis . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 52

5.5 Stability of the test scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54


PART II: VALIDITY RESEARCH

6. Relationships with other variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

6.1 Duration of test administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

6.2 Time of test administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

6.3 Examiner influence . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58

6.4 Regional and local differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

6.5 Differences between boys and girls . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

6.6 SES level of the parents . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61

6.7 Parents’ country of birth . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62

6.8 Evaluation by the examiner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63

6.9 Evaluation by the teacher . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

7. Research on special groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

7.1 Composition of the groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

7.2 The test scores of the groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70

7.3 Relationship with background variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

7.4 Diagnostic data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 74

7.5 Evaluation by the examiner . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75

7.6 Evaluation by institute or school staff . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 77

7.7 Examiner effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78

7.8 Psychometric characteristics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 79

8. Immigrant children . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

8.1 The test results of immigrant children . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

8.2 Relationship with the SES level . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82

8.3 Differentiation according to country of birth . . . . . . . . . . . . . . . . . . . . . . . . . . 82

8.4 Comparison with other tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83

8.5 The test performances of children participating in OPSTAP(JE) . . . . . . . . . . 84

9. Relationship with cognitive tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87

9.1 Correlation with cognitive tests in the standardization research . . . . . . . . . . . 89

9.2 Correlation with nonverbal tests in primary education . . . . . . . . . . . . . . . . . . 93

9.3 Correlation with cognitive tests at OVB-schools . . . . . . . . . . . . . . . . . . . . . . . 94

9.4 Correlation with cognitive tests in special groups . . . . . . . . . . . . . . . . . . . . . . 96

9.5 Correlation with the WPPSI-R in Australia . . . . . . . . . . . . . . . . . . . . . . . . . . . 101

9.6 Correlation with cognitive tests in West Virginia, USA . . . . . . . . . . . . . . . . . . 102

9.7 Correlation with the BAS in Great Britain . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104

9.8 Overview of the correlations with the criterion tests . . . . . . . . . . . . . . . . . . . . 106

9.9 Difference in correlations between the Performance Scale and the Reasoning Scale . . . . . . . . . . 109

9.10 Difference in mean scores on the tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

9.11 Comparisons in relation to external criteria . . . . . . . . . . . . . . . . . . . . . . . . . . . 112


PART III: THE USE OF THE TEST

10. Implications of the research for clinical situations . . . . . . . . . . . . . . . . . . . . . . . . 117

10.1 The objectives of the revision . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 117

10.2 The validity of the test . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119

10.3 The target groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124

10.4 The interpretation of the scores . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 128

10.5 Conclusions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 134

11. General directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

11.1 Preparation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137

11.2 Directions and feedback . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 138

11.3 Scoring the items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140

11.4 The adaptive procedure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

11.5 The subtest score . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144

11.6 Adapting the directions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 145

12. Directions per subtest . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147

12.1 Mosaics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148

12.2 Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 154

12.3 Puzzles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

12.4 Analogies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 163

12.5 Situations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 169

12.6 Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173

13. The record form, norm tables and computer program . . . . . . . . . . . . . . . . . . . . 187

13.1 The use of the record form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187

13.2 The use of the norm tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 191

13.3 The use of the computer program . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 194

13.4 Statistical comparisons . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198

References . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205

Appendix A Norm tables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211

Appendix B The record form . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 250

Appendix C The file SONR2.DAT . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 255

Appendix D Contents of the test kit . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 256


TABLES AND FIGURES IN THE TEXT

Introduction

Table 1.1 Overview of the versions of the SON-tests . . . . . . . . . . . . . . . . . . . . . . . . . 15

Pilot study and construction research

Table 2.1 Relationship between the subtests of the Preschool SON and the SON-R 2½-7 . . . . . . . . . . 21

Table 2.2 Origin of the items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 23

Description of the SON-R 2½-7

Table 3.1 Tasks in the subtests of the SON-R 2½-7 . . . . . . . . . . . . . . . . . . . . . . . . . 26

Figure 3.1 Items from the subtest Mosaics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Figure 3.2 Items from the subtest Categories . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27

Figure 3.3 Items from the subtest Puzzles . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

Figure 3.4 Items from the subtest Analogies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Figure 3.5 Items from the subtest Situations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

Figure 3.6 Items from the subtest Patterns . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 31

Table 3.2 Classification of the subtests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 32

Standardization of the test scores

Table 4.1 Composition of the norm group according to age, sex and phase of research 37

Table 4.2 Demographic characteristics of the norm group in comparison with the Dutch population . . . . . . . . . . 38

Table 4.3 Education and country of birth of the mother in the weighted and unweighted norm group . . . . . . . . . . 38

Psychometric characteristics

Table 5.1 P-value of the items . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

Figure 5.1 Plot of the discrimination and difficulty parameter of the items . . . . . . . . . 45

Table 5.2 Mean and standard deviation of the raw scores . . . . . . . . . . . . . . . . . . . . . . 46

Table 5.3 Distribution characteristics of the standardized scores in the weighted norm group . . . . . . . . . . 46

Table 5.4 Floor and ceiling effects at different ages . . . . . . . . . . . . . . . . . . . . . . . . 47

Table 5.5 Reliability, standard error of measurement and generalizability of the test scores . . . . . . . . . . 48

Table 5.6 Reliability and generalizability of the IQ score of the Preschool SON, the SON-R 2½-7 and the SON-R 5½-17 . . . . . . . . . . 50

Table 5.7 Correlations between the subtests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 51

Table 5.8 Correlations of the subtests with the rest total score and the square of the multiple correlations . . . . . . . . . . 52

Table 5.9 Results of the Principal Components Analysis in the various age and research groups . . . . . . . . . . 53

Table 5.10 Test-retest results with the SON-R 2½-7 . . . . . . . . . . . . . . . . . . . . . . . . . 55

Table 5.11 Examples of test scores from repeated test administrations . . . . . . . . . . . . 56


Relationships with other variables

Table 6.1 Duration of the test administration . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57

Table 6.2 Relationship of the IQ scores with the time of administration . . . . . . . . . . 58

Table 6.3 Examiner effects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59

Table 6.4 Regional and local differences . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Table 6.5 Relationship of the test scores with sex . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

Table 6.6 Relationship of the IQ score with the occupational and educational level of the parents . . . . . . . . . . 61

Table 6.7 Relationship of the IQ score with the SES level . . . . . . . . . . . . . . . . . . . . . 62

Table 6.8 Relationship between IQ and country of birth of the parents . . . . . . . . . . . 63

Table 6.9 Relationship between evaluation by the examiner and the IQ . . . . . . . . . . 64

Table 6.10 Correlations of the total scores with the evaluation by the teacher . . . . . . . 65

Table 6.11 Correlations of the subtest scores with the evaluation by the teacher . . . . 66

Research on special groups

Table 7.1 Subdivision of the research groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 68

Table 7.2 Composition of the research groups . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69

Table 7.3 Test scores per group . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71

Figure 7.1 Distribution of the 80% frequency interval of the IQ scores of the various groups . . . . . . . . . . 73

Table 7.4 Relationship of the IQ scores with background variables . . . . . . . . . . . . 74

Table 7.5 Reasons for referral of children at schools for Special Education and Medical Daycare Centers for preschoolers, with mean IQ scores . . . . . . . . . . 75

Table 7.6 Relationship between IQ and evaluation by the examiner . . . . . . . . . . . 76

Table 7.7 Correlations between test scores and evaluation by institute or school staff member . . . . . . . . . . 77

Table 7.8 Correlations between the subtests and subtest-rest correlations . . . . . . . . . 79

Immigrant children

Table 8.1 Test scores of native Dutch children, immigrant children and children of mixed parentage . . . . . . . . . . 81

Table 8.2 Relationship between group, SES level and IQ . . . . . . . . . . . . . . . . . . . . 82

Table 8.3 Differentiation of mean IQ scores according to country of birth . . . . . . . . 83

Table 8.4 Mean IQ scores of Surinam, Turkish and Moroccan children who had participated in the OPSTAP(JE) project . . . . . . . . . . 84

Relationship with cognitive tests

Table 9.1 Overview of the criterion tests used and the number of children to whom each test was administered . . . . . . . . . . 88

Table 9.2 Characteristics of the children to whom a criterion test was administered in the standardization research . . . . . . . . . . 89

Table 9.3 Correlations with other tests in the standardization research . . . . . . . . . 90

Table 9.4 Correlations with nonverbal cognitive tests in the second year of kindergarten, 5 to 6 years of age . . . . . . . . . . 94


Table 9.5 Correlations with cognitive tests completed by children at low SES schools given educational priority . . . . . . . . . . 95

Table 9.6 Characteristics of the children in the special groups to whom a criterion test was administered . . . . . . . . . . 97

Table 9.7 Correlations with criterion tests in the special groups . . . . . . . . . . . . . . 98

Table 9.8 Correlations with the WPPSI-R in Australia . . . . . . . . . . . . . . . . . . . . . . 102

Table 9.9 Age and sex distribution of the children in the American validation research . . . . . . . . . . 103

Table 9.10 Correlations with criterion tests in the American research . . . . . . . . . . . . . 104

Table 9.11 Correlations with the BAS in Great Britain . . . . . . . . . . . . . . . . . . . . . . . . . 105

Table 9.12 Overview of the correlations with the criterion tests . . . . . . . . . . . . . . . . . . 107

Table 9.13 Difference in scores between SON-IQ and PIQ of the WPPSI-R . . . . . . . . 108

Table 9.14 Correlations of the Performance Scale and the Reasoning Scale with criterion tests, for cases in which the difference between correlations was greater than .10 . . . . . . . . . . 109

Table 9.15 Comparison between the mean test scores of the SON-R 2½-7 and the criterion tests . . . . . . . . . . 111

Table 9.16 Comparisons between tests of the evaluation of the subject’s testability . . 113

Table 9.17 Comparisons between tests in relation to socioeconomic and ethnic background . . . . . . . . . . 114

Table 9.18 Comparisons between tests in relation to evaluation of intelligence and language skills . . . . . . . . . . 116

Implications of the research for clinical situations

Table 10.1 Mean change in IQ score over a period of one month . . . . . . . . . . . . . . . . . 118

Figure 10.1 The components of the variance of the SON-R 2½-7 IQ score . . . . . . . . 123

Table 10.2 Classification of IQ scores and intelligence levels . . . . . . . . . . . . . . . . . . . 130

Table 10.3 Composition of the variance when several tests are administered . . . . . . . 132

Table 10.4 Correction of mean IQ score based on administration of two or three tests 133

Table 10.5 Obsolescence of the norms of the SON-IQ . . . . . . . . . . . . . . . . . . . . . . . . . 133

Record form, norm tables and computer program

Table 13.1 Examples of the calculation of the subject’s age . . . . . . . . . . . . . . . . . . . . . 190

Figure 13.1 Diagram of the working of the computer program . . . . . . . . . . . . . . . . . . . 195

Table 13.2 Comparison between the possibilities using the computer program and using the norm tables . . . . . . . . . . 197

Table 13.3 Examples of probability and reliability intervals for various scores . . . . . 202


[Photographs] Nan Snijders-Oomen (1916-1992) and Jan Snijders (1910-1997)

FOREWORD

The publication of the SON-R 2½-7 completes the third revision of the Snijders-Oomen Non-verbal Intelligence Tests. Over a period of fifty years Nan Snijders-Oomen and Jan Snijders were responsible for the publication of the SON tests. We feel honored to be continuing their work. They were interested in this revision and supported us with advice until their death.

The present authors played different roles in the production of this test and the manual. Peter Tellegen, as project manager, was responsible for the revision of the test and supervised the research. Marjolijn Winkel made a large contribution to all phases of the project in the context of her PhD research. Her thesis on the revision of the test will be published at the end of 1998. Jaap Laros, at present working at the University of Brasilia, participated in the construction of the subtests, in particular Mosaics and Analogies. Barbara Wijnberg-Williams made a large contribution, based on her experience as a practicing psychologist at the University Hospital of Groningen, to the manner in which the test can be administered nonverbally to children with communicative handicaps.

The research was carried out at the department for Personality and Educational Psychology of the University of Groningen. Wim Hofstee, head of the department, supervised the project. Jannie van den Akker and Christine Boersma made an important contribution to the organization of the research.

The research was made financially possible by a subsidy from SVO, the Institute for Educational Research (project 0408), by a subsidy from the Foundation for Behavioral Sciences, a section of the Netherlands Organization for Scientific Research (NWO-project 575-67-033), and by contributions from the SON research fund.


Wolters-Noordhoff, who previously published the SON-tests, made an important contribution to the development of the testing materials. The drawings for the subtests Categories, Puzzles and Situations were made by Anjo Mutsaars. The figures for the subtest Patterns were executed by Govert Sips of the graphical design agency Sips. Wouter Veeman from Studio van Stralen executed the subtests Mosaics and Analogies.

The construction of a test requires a large number of subjects, for the construction research as well as for the standardization and the validation. In the last few years, more than three thousand children were tested with the SON-R 2½-7 in the framework of the research. We are greatly indebted to them, as well as to their parents and the staff members of the schools and institutes where the research was carried out.

In the Netherlands, as well as in Australia, Great Britain and the United States of America, many students, researchers, and practicing psychologists and orthopedagogic specialists contributed to the research. Thanks to their enthusiasm and involvement, the research could be carried out on such a large and international scale. Without claiming to be comprehensive, we would like to mention the following people by name:

Margreet Altena, Rachida El Baroudi, Cornalieke van Beek, Wynie van den Berg, M. van den Besselaar, Marleen Betten, Marjan Bleckman, Nico Bollen, Rene Bos, Ellen Bouwer, Monique Braat, C. Braspenning, Marcel Broesterhuizen, Karen Brok, Ankie Bronsveld, Aletha Brouwer, Anne Brouwer, Sonja Brouwer, Lucia Burnett, Mary Chaney, Janet Cooper, Pernette le Coultre-Martin, Richard Cress, J. van Daal, Shirley Dennehy, M. van Deventer, Dorrit Dickhout-Kuiper, Julie Dockrell, Nynke Driesens, Petra van Driesum, Marcia van Eldik, Marielle Elsjan, Yvonne Eshuis, Arnoud van Gaal, Judith Gould, Marian van Grinsven, Nicola Grove, Renate Grovenstein, Marije Harsta, R.G. den Hartog, Leida van der Heide, Roel van der Helm, Marlou Heppenstrijdt, Valerie Hero, Sini Holm, Marjan Hoohenkerk, E.P.A. Hopster, Jacqueline ten Horn, Jeannet Houwing, Hans Höster, Jo Jenkinson, Jacky de Jong, Myra de Jong, Anne Marie de Jonge, José Kamminga, Jennifer Kampsnider, Claudine Kempa, Debby Kleymeer, Jeanet Koekkoek, Marianne van de Kooi, Annette Koopman, Monique Koster, A.M. Kraal, Marijke Kuiper, Koosje Kuperus, Marijke Künzli-van der Kolk, Judith Landman, Nan Le Large, Del Lawhon, J. van Lith-Petry, Jan Litjens, Amy Louden, Henk Lutje Spelberg, Mannie McClelland, Sanne Meeder, Anke van der Meijde, Jacqueline Meijer, Sjoeke van der Meulen, Bieuwe van der Meulen, Jitty Miedema, Margriet Modderman, Cristal Moore, Marsha Morgan, Renate Mulder, Marian Nienhuis-Katz, F. Nietzen, Theo van Noort, Stephen O’Keefe, Jamila Ouladali, Mary Garcia de Paredes, Inge Paro, Immelie Peeters, Jo Pelzer, Simone Peper, Trudy Peters-ten Have, Dorothy Peterson, Mirea Raaijmakers, Lieke Rasker, Inge Rekveld, Lucienne Remmers, E.J. van Rijn van Alkemade, Susan Roberts, Christa de Rover, Peter van de Sande, A.J. van Santen, Liesbeth Schlichting, Marijn Schoemaker, Ietske Siemann, Margreet Sjouw, Emma Smid, L. Smits, Tom Snijders, Marieke Snippe, P. Steeksma, Han Starren, Lilian van Straten, Penny Swan, Dorine Swartberg, Marjolein Thilleman, Lous Thobokholt-van Esch, Jane Turner, Dick Ufkes, Baukje Veenstra, Nettie van der Veen, Marja Veerman, Carla Vegter, Pytsje Veltman, Harriet Vermeer, Mieke van Vleuten, Jeroen Wensink, Betty Wesdorp-Uytenbogaart, Jantien Wiersma, Aranka Wijnands, G.J.M. van Woerden, Emine Yildiz and Anneke Zijp.

With the publication of this “Manual and Research Report” of the SON-R 2½-7, an important phase of the revision of the test comes to an end. This does not mean that the test is ‘finished’. The value of a test is determined, for a large part, by diagnostic experiences and by ongoing research.


We are, therefore, interested in the experiences of users, and we would appreciate being informed of their research results when these become available as internal or external publications. We intend to inform users and other interested parties about the developments and further research with the SON tests via Internet. The address of the homepage will be: www.ppsw.rug.nl/hi/tests/sonr.

In recent years the need to carry out diagnostic research on children at a young age has greatly increased. Furthermore, the realization has grown that the more traditional intelligence tests are less suitable for important groups of children because they do not take sufficient account of the limitations of these children, or of their cultural background. In these situations the SON tests are frequently used. We hope that this new version of the test will also contribute to reliable and valid diagnostic research with young children.

Groningen, January 1998 Dr. Peter Tellegen

Heymans Institute
University of Groningen
Grote Kruisstraat 2/1
9712 TS Groningen
The Netherlands

tel. +31 50 363 6353
fax +31 50 363 6304

e-mail: [email protected]
http://www.testresearch.nl

Reviewing of the SON-R 2½-7

The test has been reviewed by COTAN, the test commission of the Netherlands Institute for Psychologists. The categories used are insufficient, sufficient and good. The rating is as follows:

Basics of the construction of the test: good
Execution of the materials: good
Execution of the manual: good
Norms: good
Reliability: good
Construct validity: good
Criterion validity: good


1 INTRODUCTION

The new version of the Snijders-Oomen Nonverbal Intelligence Test for children from two-and-a-half to seven years, the SON-R 2½-7, is an instrument that can be individually administered to young children for diagnostic purposes. The test makes a broad assessment of mental functioning possible without being dependent upon language skills.

1.1 CHARACTERISTICS OF THE SON-R 2½-7

The SON-R 2½-7, like the previous version of the test, the SON 2½-7 (Snijders & Snijders-Oomen, 1976), provides a standardized assessment of intelligence. The child’s scores on six different subtests are combined to form an intelligence score that represents the child’s ability relative to his or her age group. Separate norm tables allow total scores to be calculated for the performance tasks and for the tasks mainly requiring reasoning ability.

A distinctive feature of the SON-R 2½-7 is that feedback is given during administration of the test. After the child has given an answer, the examiner tells the child whether it is correct or incorrect. If the answer is incorrect, the examiner demonstrates the correct answer. When possible, the correction is made together with the child. The detailed directions provided in the manual also make the test suitable for the assessment of very young children. In general, the examiner demonstrates the first items of each subtest in part or in full. Examples are included in the test directions and items.

The items on the subtests of the SON-R 2½-7 are arranged in order of increasing difficulty. This way a procedure for determining a starting point appropriate to the age and ability of each individual child can be used. By using the starting point and following the rules for discontinuing the test, the administration time is limited to fifty to sixty minutes.

The test can be administered nonverbally or with verbal directions. The spoken text does not give extra information. The manner of administration can thus be adapted to the communication ability of each individual child, allowing the test to proceed as naturally as possible.

Because the test can be administered without the use of written or spoken language, it is especially suitable for use with children who are handicapped in the areas of communication and language. For the same reason it is also suitable for immigrant children who have little or no command of the language of the examiner.

The testing materials do not need to be translated, making the test suitable for international and cross-cultural research. The SON-tests are used in various countries. The names of the various subtests are shown on the test booklets in the following languages: English, German, Dutch, French, and Spanish. The manual has been published in English and German as well as in Dutch.

A similarity between the SON-R 2½-7 and other intelligence tests for (young) children, such as the BAS (Elliott, Murray & Pearson, 1979-82), the K-ABC (Kaufman & Kaufman, 1983), the RAKIT (Bleichrodt, Drenth, Zaal & Resing, 1984) and the WPPSI-R (Wechsler, 1989), is that intelligence is assessed on the basis of performance on a number of quite diverse tasks. However, verbal test items are not included in the SON-R 2½-7. Such items are often dependent to a great extent on knowledge and experience. The SON-R 2½-7 can therefore be expected to be focused more on the measurement of ‘fluid intelligence’ and less on the measurement of ‘crystallized intelligence’ (Cattell, 1971) than are the other tests.


The subtests of the SON-R 2½-7 differ from the nonverbal subtests in other intelligence tests in two important ways. First, the nonverbal part of other tests is generally limited to typical performance tests. The SON-R 2½-7, however, includes reasoning tasks that take a verbal form in the other tests. Second, while the testing material of the performance part of the other tests is admittedly nonverbal, the directions are given verbally (Tellegen, 1993).

An important difference with regard to other nonverbal intelligence tests such as the CPM (Raven, 1962) and the TONI-2 (Brown, Sherbenou & Johnsen, 1990) is that the latter tests consist of only one item-set and are therefore greatly dependent on the specific ability that is measured by that test. Nonverbal intelligence tests such as the CTONI (Hammill, Pearson & Wiederholt, 1996) and the UNIT (Bracken & McCallum, 1998) consist of various subtests, like the SON-R 2½-7. A fundamental difference, however, is that the directions for these tests are given exclusively with gestures, whereas the directions with the SON-R 2½-7 are intended to create as natural a test situation as possible.

An important way in which the SON-R 2½-7 differs from all the above-mentioned tests is that the child receives assistance and feedback if he or she cannot do the task. In this respect the SON-R 2½-7 resembles tests for learning potential that determine to what extent the child profits from the assistance offered (Tellegen & Laros, 1993a). The LEM (Hessels, 1993) is an example of this kind of test.

In sum, the SON-R 2½-7 differs from other tests for young children in its combination of a friendly approach to children (in the manner of administration and the attractiveness of the materials), a large variation in abilities measured, and the possibility of testing intelligence regardless of the level of language skill.

1.2 HISTORY OF THE SON-TESTS

The publication of the SON-R 2½-7 completes the third revision of the test battery that Nan Snijders-Oomen started more than fifty years ago. In table 1.1 the earlier versions are shown schematically.

The first version of the SON-test was intended for the assessment of cognitive functioning in deaf children from four to fourteen years of age (Snijders-Oomen, 1943). Drawing on existing and newly developed tasks, Snijders-Oomen developed a test battery which included an assortment of nonverbal tasks related to spatial ability and abstract and concrete reasoning. The test was intended to provide a clear indication of the child’s learning ability and chances of succeeding at school. One requirement for the test battery was that upbringing and education should influence the test results as little as possible. Further, a variety of intellectual functions had to be examined with the subtests, and the tasks had to interest the child to prevent him or her becoming bored or disinclined to continue.

No specific concept of intelligence was assumed as a basis for the test battery. However, ‘form’, ‘concrete coherence’, ‘abstraction’ and ‘short-term memory’ were seen as acceptable representations of intellectual functioning typical of subjects suffering from early deafness (Snijders-Oomen, 1943). The aim of the test battery was to break through the one-sidedness of the nonverbal performance tests in use at the time, and to make functions like abstraction, symbolism, understanding of behavioral situations, and memory more accessible for nonverbal testing.

The first revision of the test was published in 1958, the SON-’58 (Snijders & Snijders-Oomen, 1958). In this revision the test battery was expanded and standardized for hearing as well as deaf children from four to sixteen years of age.

Two separate test batteries were developed during the second revision. The most important reason for this was that, in all the subtests of the original SON, a different type of test item had seemed more appropriate for children above six years of age. The bipartite structure that in fact already existed was implemented systematically in this second revision: the SSON (Starren, 1975) was designed for children from seven to seventeen years of age; for children from three to seven years of age the SON 2½-7, commonly known as Preschool SON, or P-SON, was developed (Snijders & Snijders-Oomen, 1976).


The form and contents of the SSON strongly resembled the SON-’58, except that the SSON consisted entirely of multiple choice tests. After the publication of the SSON in 1975, the SON-’58 remained in production because it was still in demand. In comparison to the SSON, the SON-’58 contained more stimulating tasks and provided more opportunity for observation of behavior, because it consisted of tests in which children were asked to manipulate a large variety of test materials. The subtests in the Preschool SON maintained this kind of performance test to provide opportunities for the observation of behavior.

The third revision of the test for older children, the SON-R 5½-17, was published in 1988 (Snijders, Tellegen & Laros, 1989; Laros & Tellegen, 1991; Tellegen & Laros, 1993b). This test replaces both the SON-’58 and the SSON, and is meant for use with hearing and deaf children from five-and-a-half to seventeen years of age. In constructing the SON-R 5½-17 an effort was made to combine the advantages of the SSON and the SON-’58. On the one hand, a range of diverse testing materials was included. On the other hand, a high degree of standardization in the administration and scoring procedures as well as a high degree of reliability of the test was achieved.

The SON-R 5½-17 is composed of abstract and concrete reasoning tests, spatial ability tests and a perceptual test. A few of these tests are newly developed. A memory test was excluded because memory can be examined better by a specific and comprehensive test battery than by a single subtest. In the SON-R 5½-17, the standardization for the deaf is restricted to conversion of the IQ score to a percentile score for the deaf population. The test uses an adaptive procedure in which the items are arranged in parallel series. This way, fewer items that are either too easy or too difficult are administered.

Table 1.1
Overview of the Versions of the SON-Tests

SON (1943)
Snijders-Oomen
Deaf Children
4-14 years

SON-’58 (1958)
Snijders & Snijders-Oomen
Deaf and Hearing Children
4-16 years

SON 2½-7 (Preschool SON) (1975)
Snijders & Snijders-Oomen
Hearing and Deaf Children
3-7 years

SSON (1975)
Starren
Hearing and Deaf Children
7-17 years

SON-R 2½-7 (1998)
Tellegen, Winkel, Wijnberg-Williams & Laros
General Norms
2;6-8;0 years

SON-R 5½-17 (1988)
Snijders, Tellegen & Laros
General Norms
5;6-17;0 years

– under each heading has been listed: the year of publication of the Dutch manual, the authors of the manual, the group and the age range for which the test was standardized


Feedback is given in all subtests; this consists of indicating whether a solution is correct or incorrect. The standardized scores are calculated and printed by a computer program.

The SON-R 5½-17 has been reviewed by COTAN, the commission of the Netherlands Institute for Psychologists responsible for the evaluation of tests. All aspects of the test (Basics of the construction of the test, Execution of the manual and test materials, Norms, Reliability and Validity) were judged to be ‘good’ (Evers, Van Vliet-Mulder & Ter Laak, 1992). This means the SON-R 5½-17 is considered to be among the most highly accredited tests in the Netherlands (Sijtsma, 1993).

After completing the SON-R 5½-17, a revision of the Preschool SON was started, resulting in the publication of the SON-R 2½-7. The test was published in 1996, together with a manual consisting of the directions and the norm tables (Tellegen, Winkel & Wijnberg-Williams, 1997). In the present ‘Manual and Research Report’, the results of research done with the test are also presented: the method of revision, the standardization and the psychometric characteristics, as well as the research concerning the validity of the test. Norm tables allowing the calculation of separate standardized total scores for the performance tests and the reasoning tests have been added. Also, the reference age for the total score can be determined. Norms for experimental usage have been added for the ages of 2;0 to 2;6 years. All standardized scores can easily be calculated and printed using the computer program.

1.3 RATIONALE FOR THE REVISION OF THE PRESCHOOL SON

The most important reasons for revising the Preschool SON were the need to update the norms, to modernize the test materials, to improve the reliability and generalizability of the test, and to provide a good match with the early items of the SON-R 5½-17.

Updating the norms
The Preschool SON was published in 1975. After a period of more than 20 years, revision of an intelligence test is advisable. Test norms tend to grow obsolete in the course of time. Research shows (Lynn & Hampson, 1986; Flynn, 1987) that performance on intelligence tests increases by two or three IQ points over a period of 10 years. Experience in the Netherlands with the revision of the SON-R 5½-17 and the WISC-R is consistent with this (Harinck & Schoorl, 1987). Comparisons in the United States of scores on the WPPSI and WPPSI-R, and scores on the WISC-R and WISC-III showed an average increase in the total IQ scores of more than three points every ten years. The increase in the performance IQ was more than four points every ten years (Wechsler, 1989, 1991).

Changes in the socio-economic environment may explain the increase in the level of performance on intelligence tests (Lynn & Hampson, 1986). Examples of these changes are watching television, increase in leisure time, smaller families, a higher general level of education, and changes in upbringing and education. The composition of the general population has also changed; in the Netherlands the population is ageing and the number of immigrants is increasing. The norms of the Preschool SON from 1975 can be expected to provide scores that are too high, and that no longer represent the child’s performance in comparison to his or her present age group.

The testing materials
The rather old-fashioned testing materials were the second reason for revising the test: some of the drawings used were very dated, and the increasing number of immigrant children in the Netherlands over the last twenty years makes it desirable to reflect the multi-cultural background of potential subjects in the materials (see Hofstee, 1990). The structure of the materials and the storing methods of the test were also in need of improvement.

Improving the reliability and generalizability
A third motive for revision was to improve the reliability and generalizability of the Preschool SON, especially for the lower and upper age ranges.


Analysis of the data presented in the manual of the Preschool SON showed that the subtests differentiated too little at these ages. The range of possible raw scores had a mean of 12 points. In the youngest age group, 20% of the children received the lowest score on the subtests and in the oldest age group, 43% received the highest score (Hofstee & Tellegen, 1991). In other words, the Preschool SON was appropriate for children of four or five years old, but it was often too difficult for younger children and too easy for older children. Further, there was no standardization at the subtest level, only at the level of the total score; this meant that it was not possible to calculate the IQ properly if a subtest had not been administered. Finally, the norms were presented per age group of half a year. This could lead to a deviation of six IQ points if the age did not correspond to the middle of the interval.

Correspondence with the SON-R 5½-17
To be able to compare the results of the SON-R 2½-7 with those of the SON-R 5½-17, the new test for young children should be highly similar to the test for older children. An overlap in the age ranges of the tests was also considered desirable. This way, the choice of a test can be based on the level of the child, or on other specific characteristics that make one test more suitable than the other. Various new characteristics of the SON-R 5½-17, such as the adaptive test procedure, the standardization model and the use of a computer program, were implemented as far as possible in the construction of the SON-R 2½-7.

1.4 PHASES OF THE RESEARCH

On the basis of the above-mentioned arguments it was decided to revise the Preschool SON. The revision was not restricted to the construction of new norms; the items, subtests and directions were also subjected to a thorough revision. The revision proceeded in several phases. This section presents a short review of the research phases.

Preparatory study
In the preparatory study, which started in 1990, the Preschool SON was evaluated. The aim of the preparatory study was to decide how the testing materials of the Preschool SON could best be adapted and expanded. To this end, users of the Preschool SON were interviewed, the literature was reviewed, other intelligence tests were analyzed and a secondary analysis of the data of the standardization research of the Preschool SON was performed.

Construction research phase
The construction research for the SON-R 2½-7 took place in 1991/’92. During this period, three experimental versions of the test were administered to more than 1850 children between two and seven years of age. The final version of the SON-R 2½-7 was compiled on the basis of the data from this research, the experiences and observations of examiners, and the comments and suggestions of psychologists and educators active in the field.

Standardization research phase
The standardization research, in which more than 1100 children in the age range two to seven years participated, took place during the school year 1993/’94. The results of this research formed the basis for the standardization of the SON-R 2½-7, and the evaluation of its psychometric characteristics. During the standardization research, background data relevant for the interpretation of the test scores were collected.

For the validation of the test, other language and intelligence tests were administered to a large number of the children who participated in the standardization research. Administration of these tests was also made possible by collaboration with the project group that was responsible for the standardization of the Reynell Test for Language Skills (Van Eldik, Schlichting, Lutje Spelberg, Sj. van der Meulen & B.D. van der Meulen, 1995) and the Schlichting Test for Language Production (Schlichting, Van Eldik, Lutje Spelberg, Sj. van der Meulen & B.F. van der Meulen, 1995).


Validation research phase
Separate validation research was done for the following groups: children in special educational programs, children at medical preschool daycare centers, children with a language, speech and/or hearing disorder, deaf children, autistic children and immigrant children. Validation research was also carried out in Australia, the United States of America and the United Kingdom. The results of these children on the SON-R 2½-7 have been compared with their performance on many other cognitive tests.

1.5 ORGANIZATION OF THE MANUAL

This manual is made up of three parts. In the first part the construction phase of the test is discussed. Chapter 2 deals with the preparatory study and the construction research during which new testing materials and administration procedures were developed. In chapter 3 a description is given of the subtests and the main characteristics of the administration of the test. The standardization research and the standardization model used are described in chapter 4. Information about psychometric characteristics such as reliability, factor structure and stability can be found in chapter 5.

In the second part research concerning the validity of the test is described. Chapter 6 is based on the results in the norm group and discusses the relations between test performance and other variables, such as socio-economic level, sex and evaluations by the examiner and teachers. In chapter 7 the test results in a number of special groups of children, with whom the SON-tests are often used, are discussed. The special groups include children with a developmental delay, autistic children, language, speech and/or hearing disabled children, and deaf children. Chapter 8 deals with the performance of immigrant children. In chapter 9 the correlations between the SON-R 2½-7 and several other tests for intelligence, language skills, memory and perception are discussed. The research on validity involved both children in regular education and handicapped children, and was partly carried out in other countries.

The third part of this book concerns the practical application of the test. Chapter 10 deals with the implications of the research results in practice, and with problems that can arise with the interpretation of the results. The general directions for the administration and scoring of the test are described in chapter 11; the directions for the separate subtests can be found in chapter 12. Chapter 13 gives guidelines for using the record form, the norm tables and the computer program.

In the appendices the norm tables for determining the reference age, and the standardized subtest and total scores can be found, as well as an example of the record form and a description of the contents of the test kit.

In general, ages in the text and tables are presented in years and months. This means that 4;6 years equals four years and six months. In a few tables the mean ages are presented with a decimal; this means that 4.5 years is the same as 4;6 years. In the norm tables the age of 4;6 years indicates an interval from ‘four years, six months, zero days’ to ‘four years, six months, thirty days’ inclusive.
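As a concrete illustration of the two notations (the helper below is hypothetical and not part of the SON-R computer program), an age written as years;months can be converted to decimal years by dividing the months by twelve:

    def age_to_decimal(age):
        """Convert a years;months age string such as '4;6' to decimal years."""
        years, months = age.split(";")
        return int(years) + int(months) / 12

    print(age_to_decimal("4;6"))  # prints 4.5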

To improve legibility, statistical results have been rounded off. This can lead to seemingly incorrect results. For instance a distribution of 38.5% and 61.5% becomes, when rounded off, 39% and 62%, and this does not add up to 100%. Similar small differences may occur in the presentation of differences in means or between correlations.

Pearson product-moment correlations were used in the analyses. Unless stated otherwise, the correlations were tested one-sided (one-tailed).


2 PREPARATORY STUDY AND CONSTRUCTION RESEARCH

In this chapter, the test construction phase is described. In this phase, the research necessary to construct a provisional version of the test was carried out. Successive improvements resulted in the final test battery.

2.1 THE PREPARATORY STUDY

The preparatory study was carried out to discover how best to adapt and possibly to expand the materials of the Preschool SON. To this end ten users of the Preschool SON were interviewed about their experience with the test (via questionnaires). Secondary analyses were also carried out on the original material from the standardization research of the Preschool SON. A review of the literature and an analysis of other intelligence tests were undertaken as a preparation for the revision (Tellegen, Wijnberg, Laros & Winkel, 1992).

Composition of the Preschool SON
The Preschool SON was composed of fifty items distributed over five subtests: Sorting, Mosaics, Combination, Memory and Copying. In the subtest Sorting, geometrical forms and pictures were sorted according to the category to which they belong. The subtest Mosaics was an action test in which various mosaic patterns had to be copied using red and yellow squares. Combination consisted of matching halves of pictures and doing puzzles. In the subtest Memory, also called the Cat House, the aim was to find either one or two cats that were hidden several times in the house. Copying consisted of copying figures that were drawn by the examiner or shown in a test booklet.

Evaluation by users
An inventory of the comments received from ten users of the Preschool SON was made. These were psychologists employed by school advisory services, audiological centers, institutes for the deaf, medical preschool daycare centers, and in the care for the mentally deficient.

On the whole, the Preschool SON was given a positive assessment as a test to which children respond well and that affords plenty of opportunity to observe the child’s behavior. The users did, however, have the impression that the IQ score of the Preschool SON overrated the level of the children. Clear information about administering and scoring the various subtests was lacking in the manual. The users followed the directions accurately but not literally. Furthermore, they thought the subtests contained too few examples. They were inclined to provide extra help, especially to young and to mentally deficient children. The discontinuation criterion, used in the Preschool SON, was three consecutive mistakes per subtest. This discontinuation rule was considered too strict, particularly for the youngest children, and, in practice, this rule was not always applied.

The subtest Memory was administered in different ways. Some users administered it as a game, playing a kind of hide and seek, whereas others tried to avoid doing this. The users had the impression that this subtest was given too much weight in the total score of the Preschool SON. Also, some doubt existed about the relationship between this subtest and the other ones.


Comparative research on the Preschool SON, the Stanford-Binet and parts of the WPPSI was conducted by Harris in the United States of America. In general, her assessment of the test was positive. Her criticism focused on some of the materials and the global norm tables (Harris, 1982).

Secondary analyses of the standardization dataThe original data from a sample of hearing children (N=503) involved in the standardizationresearch of the Preschool SON was used for the secondary analyses. A study was made of thedistribution of the test scores according to age, the correlation between the test scores and thereliability. The results were as follows:– The standard deviation of the raw subtest scores was usually highest in children from four to

five years of age. For Mosaics and Copying, the range of scores for young children from 2;6to 4 years was very restricted. For most subtests the range decreased greatly in the oldestgroups from 5;6 to 7 years.

– In the conversion of the scores into IQ scores, the distributions were not sufficiently normal-ized, so that they were negatively skewed for children from five years onwards. This couldresult in extremely low IQ scores.

– The reliability for combinations of age groups was recalculated. After this, a correction for age was carried out. The mean reliability of the subtests was .57 for children from 2;6 to four years of age, .66 for children from four to five years, and .61 for children from 5;6 to seven years. The reliability of the total score was .78 for children from 2;6 to four years, .86 for children from four to five years, and .82 for children from 5;6 to seven years. Generally, the reliability was low, especially for the youngest and oldest age groups where strong floor and ceiling effects were present. The reliability of the subtests and the total scores was much lower than the values mentioned in the manual of the Preschool SON. The cause of this discrepancy was that, in the manual, the reliability was calculated for combined age groups with no correction for age.

– The generalizability of the total score is important for the interpretation of the IQ scores. In this case, the subtests are seen as random samples from the domain of possible, relevant subtests. The generalizability coefficient of the Preschool SON was .61 for the age group from 2;6 to four years, .75 for the age group from four to five years and .65 for the age group from 5;6 to seven years.

– The reliability of the subtest Memory was low and the score on this subtest showed a low correlation with age and with the scores on the remaining subtests.

Review of the literature
In the revision of the Preschool SON we attempted to produce a version that was compatible with the early items of the SON-R 5½-17. As the subtest Analogies in the SON-R 5½-17 is one of its strongest components, the possibility of developing a similar analogy test for young children was examined. Based on recent research results (Alexander et al., 1989; Goswami, 1991) it seemed possible to construct an analogy test for children from about 4 years of age onwards. Since an analogy test would most likely be too difficult for the youngest children, starting this test with sorting seemed advisable; the level of abstraction required for sorting is lower than the level of abstraction required for understanding analogies, and, in a certain sense, precedes it.

Implications for the revision
The results of the preparatory study confirmed the need for a new standardization and a thorough revision of the Preschool SON. An important goal in the revision of the Preschool SON was the improvement of the psychometric characteristics of the test. The reliability and the generalizability of the test scores were lower than was desirable, especially in the youngest and oldest of the age groups for which the test was designed. However, an increase in reliability could not be gained simply by expanding the number of items and subtests because an increase in the duration of the test could lead to fatigue, loss of motivation and decrease in concentration. Any expansion of the test had therefore to be combined with an effective adaptive procedure.

For the SON-R 5½-17 with an administration time of about one-and-a-half hours, the mean reliability of the total score is .93 and the generalizability is .85. If the administration of the SON-R 2½-7 was to be limited to one hour, a reliability of .90 and a generalizability of .80 seemed to be realistic goals. The improvement of these characteristics could be achieved by adding very easy and very difficult items to each subtest, and by increasing the number of subtests.

An important objective during the revision of the Preschool SON was to obtain a good match with the early items of the SON-R 5½-17. As the age ranges of the two tests overlapped, the idea was to take the easy items of the SON-R 5½-17 as a starting point for the new, most difficult items of the SON-R 2½-7.

These considerations led to a plan for the revision of the Preschool SON in which the subtest Memory was dropped. The subtest Memory (the Cat House) had a low level of reliability and, what is more, a low correlation with age and the remaining subtests. The interviews with users of the Preschool SON showed that children enjoyed doing the Cat House subtest, but that the directions for administration were often not followed correctly. Another consideration was that assessment of memory can be carried out more effectively with a specific and comprehensive test battery. The results from a single subtest are insufficient to draw valid conclusions about memory. On the basis of similar considerations, no memory subtest had been included in the SON-R 5½-17. The four remaining subtests of the Preschool SON were expanded to six subtests by dividing two existing subtests:
– The subtest Sorting was divided into two subtests: the section Sorting Disks was expanded with simple analogy items consisting of geometrical forms similar to the SON-R 5½-17; the section Sorting Pictures was expanded with easy items from the subtest Categories of the SON-R 5½-17.

– The section of the subtest Combination, in which two halves of a picture had to be combined, was expanded with items from the subtest Situations from the SON-R 5½-17; the section Puzzles was expanded and implemented as a separate subtest.

– The subtest Mosaics was expanded with simple items and with items from the SON-R 5½-17.
– The subtest Copying was adapted to increase its similarity to the subtest Patterns of the SON-R 5½-17.

The relationship between the subtests of the Preschool SON and the SON-R 2½-7 is presented schematically in table 2.1.

Table 2.1
Relationship Between the Subtests of the Preschool SON and the SON-R 2½-7

Preschool SON                          SON-R 2½-7
Subtest       Task                     Subtest      Task

Sorting       Sorting disks            Analogies    Sorting disks;
                                                    Analogies SON-R 5½-17
              Sorting figures          Categories   Sorting figures;
                                                    Categories SON-R 5½-17
Mosaics       Mosaics with/without     Mosaics      Mosaics in a frame;
              a frame                               Mosaics SON-R 5½-17
Combination   Two halves of a picture  Situations   Two halves of a picture;
                                                    Situations SON-R 5½-17
              Puzzles                  Puzzles      Puzzles in a frame;
                                                    'separate puzzles'
Memory        Finding cats             –            –
Copying       Copying drawn figures    Patterns     Copying patterns

2.2 THE CONSTRUCTION RESEARCH

In 1991/'92, extensive research was done with three experimental versions of the test. These were administered to more than 1850 children between two and eight years of age. The research was carried out in preschool play groups, day care centers and primary schools across the Netherlands. The versions were also administered on a small scale to deaf children and children with learning problems. The examiners participating in the construction research were mainly trained psychologists with experience in testing. Psychologists and educators who normally make diagnostic assessments of young children were contacted in an early phase to obtain information about the usability of the construction versions for children with specific problems. More than twenty people in the field, employed by school advisory services, audiological centers and outpatient departments, administered sections of the three versions to a number of children. They commented on and gave suggestions for the construction of the material, the directions and the administration procedure.

Points of departure for the construction
The most important objectives in the construction and administration of the experimental versions were:
– expanding the number of items and subtests to improve the reliability of the test and to make the test more suitable for the youngest and the oldest age groups,
– limiting the mean administration time to a maximum of one hour by using an effective adaptive procedure,
– making the testing materials both attractive for children and durable,
– developing clear directions for the administration of the test and the manner of giving feedback.

Testing materials
From the first experimental version on, the test consisted of the following subtests: Mosaics, Categories, Puzzles, Analogies, Situations and Patterns. This sequence was maintained throughout the three versions. Tests that are spatially oriented are alternated with tests that require reasoning abilities, and abstract testing materials are alternated with materials using concrete (reasoning) pictures. Mosaics is a suitable test to begin with as it requires little direction, the child works actively at a solution, and the task corresponds to activities that are familiar to the child.

The items of the experimental versions consisted of (adapted) items from the Preschool SON and the SON-R 5½-17 and of newly constructed items. Most of the new items were very simple items that would make the test better suited to young children. Table 2.2 shows the origin of the items in the final version of the test. Of a total of 96 items, five of which are example items, 45% are new, 25% are adaptations of Preschool SON items, and 30% are adaptations from the SON-R 5½-17.

In the first experimental version the original items of the Preschool SON and the SON-R 5½-17 were used. In the following versions all items of the subtests were redrawn and reformed to improve the uniformity of the material and to simplify the directions for the tasks. In the pictures of people the emphasis was on pictures of children and care was taken to have an even distribution of boys and girls. More children with a non-western appearance were included.

An effort was made to make the material colorful and attractive, durable and easy to store. A mat was used to prevent the material from sliding around, to facilitate picking up the pieces and to increase the standardization of the test situation.

Adaptive procedure and duration of administration
To make the test suitable for the age range from two to seven years, a broad range of task difficulty is required. An adaptive test procedure is desirable to limit the duration of the test, and to prevent children having to do tasks far above or far below their level. Having to do items that are much too difficult is very frustrating and demotivating for children. When older children are given items that are much too easy, they very quickly consider these childish and may then be inclined not to take the next, more difficult items seriously.

In the Preschool SON a discontinuation rule of three consecutive mistakes was used. Because the mistakes had to be consecutive, children sometimes had to make many mistakes before the test could be stopped. In practice this meant that, especially with young children, examiners often stopped too early. In the SON-R 5½-17 the items are arranged in two or three parallel series and in each series the test is discontinued after a total of two mistakes. In the first series the first item is taken as a starting point; in the following series the starting point depends on the performance in the previous series. This method has great advantages: everyone starts the test at the same point, but tasks that are too easy as well as tasks that are too difficult are skipped. Further, returning to an easier level in the next series is pleasant for the child after he or she has done a few tasks incorrectly.

Research was carried out with the first experimental version to see if the adaptive method of the SON-R 5½-17 could also be applied with the SON-R 2½-7. The problem was, however, that the subtests consist of two different parts. This makes a procedure with parallel series confusing and complicated because switching repeatedly from one part of the test to the other may be necessary. In the subsequent construction research, only one series of items of progressive difficulty was used. However, the discontinuation criterion was varied and research was done on the effect of using an entry procedure in which the item taken as a starting point depended on the age of the child.

Finally, on the basis of the results of this research, a procedure was chosen in which the first, third or fifth item is taken as a starting point and each subtest is discontinued after a total of three mistakes. The performance subtests can also be discontinued when two consecutive mistakes are made in the second section of these tests. The items in these subtests have a high level of discrimination, and the children require a fair amount of time to complete the tasks. They become frustrated if they have to continue when the next item is clearly too difficult for them.

As a result of the adaptive procedure, the number of items to be administered is strictly limited, and the mean duration of the test is less than an hour, but very little information is lost by skipping a few items. Further, the children's motivation remains high during this procedure because only a very few items above their level are administered.
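To make the combined effect of the entry point and the two discontinuation rules concrete, the sketch below simulates the administration of a single subtest. It is an illustration only, not the official scoring procedure: the item numbering, the `answers` mapping and the boundary of the second section are assumptions made for the example, and the possibility of returning to an easier entry level is not modeled.

```python
# Minimal sketch of the administration rules for one subtest (see assumptions above).
# `answers` maps item number -> True (correct) / False (incorrect); items before the
# entry point are skipped and, by design of the entry procedure, counted as correct.

def administer_subtest(answers, n_items, start_item, part_two_start, is_performance):
    total_errors = 0
    run_of_errors_part_two = 0
    administered = []

    for item in range(start_item, n_items + 1):
        correct = answers[item]
        administered.append((item, correct))

        if correct:
            run_of_errors_part_two = 0
        else:
            total_errors += 1
            if item >= part_two_start:
                run_of_errors_part_two += 1

        # discontinue after a total of three mistakes ...
        if total_errors >= 3:
            break
        # ... or, for the performance subtests, after two consecutive
        # mistakes in the second section
        if is_performance and run_of_errors_part_two >= 2:
            break

    return administered
```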

Difficulty of items and ability to discriminate
After each phase of research the results were analyzed per subtest with the 2-parameter logistic model from item response theory (IRT; see Lord, 1980; Hambleton & Swaminathan, 1985). The program BILOG (Mislevy & Bock, 1990) was used for this analysis. With this program the parameters for difficulty and discrimination of items can be estimated for incomplete tests.

Table 2.2
Origin of the Items

                                     Subtests of the SON-R 2½-7
Origin                           Mos  Cat  Puz  Ana  Sit  Pat  Total

Adapted from the Preschool SON     3    4    6    3    2    6     24
Adapted from the SON-R 5½-17       6    9    –    5    9    –     29
New items                          7    3    9   10    4   10     43

Total number of items,
including examples                16   16   15   18   15   16     96

The IRT-model was used because the adaptive administration procedure makes it difficult to evaluate these characteristics on the basis of p-values and item-total correlations. The parameter for difficulty indicates the level of ability at which 50% of the children solve the item correctly; the parameter for discrimination indicates how, at this level, the probability that the item will be answered correctly increases as ability increases.
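For reference, the two parameters just described correspond to the standard formulation of the 2-parameter logistic model (the formula below is standard IRT notation, not quoted from this manual): the probability that a child with ability \(\theta\) answers item i correctly is

\[ P_i(\theta) = \frac{1}{1 + \exp[-a_i(\theta - b_i)]} \]

where \(b_i\) is the difficulty parameter (the ability level at which this probability equals .50) and \(a_i\) is the discrimination parameter, which is proportional to the slope of the curve at that level.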

Because of the use of an adaptive procedure, it was important that the items were administered in the correct order of progressive difficulty; the examiner had to be reasonably certain that items skipped at the beginning would have been solved correctly, and that items skipped at the end would have been solved incorrectly. Also important was a balanced distribution in the difficulty of the items, and sufficient numbers of easy items for young children and difficult items for older ones. On the basis of the results of the IRT-analysis, new items were constructed, some old items were adapted and others were removed from the test. In some cases the order of administration was changed. A problem arising from this was that items may become more difficult when administered early in the test. The help and feedback given after an incorrect solution may benefit the child so that the next, more difficult item becomes relatively easier.

Directions and feedback
An important feature of the SON-tests is that directions can be given verbally as well as nonverbally. This makes the test situation more natural because the directions can correspond to the communication skills of the child. When verbal directions are given, care must be taken not to provide extra information that is not contained in the nonverbal directions. However, nonverbal directions have their limitations, so that explaining to the children exactly what is expected of them is difficult, certainly with young children. Examples were therefore built into the first items to give the child the opportunity to repeat what the examiner had done or to solve a similar task. As the test proceeds, tasks are solved more and more independently. To make the items of the SON-R 5½-17 suitable for this approach, they were also adapted, for example, by first working with cards that have to be arranged correctly instead of pointing out the correct alternative.

Not only does the difficulty of the items increase in the subtests, the manner in which they are administered changes as well. In the construction research this procedure was continuously adapted, and the directions were improved in accordance with the experiences and comments of the examiners and of practicing psychologists. The greatest problems in developing clear directions arose in the second section of the subtest Analogies. Here the child has to apply a similar transformation to a figure as is shown in an example. This is difficult to demonstrate nonverbally because of the high level of abstraction, but it can be explained in a few words. The test therefore provides first for extensive, repeated practice on one example, and then provides an example with every following item.

The feedback and help given after an incorrect solution are important in giving the child a clear understanding of the aim of the tasks. The manner in which feedback and help should be given was worked out in greater detail during the research and is described in the directions.

Scoring Patterns
In the subtest Patterns lines and figures must be copied, with or without the help of preprinted dots. Whether the child can draw neatly or accurately is not important when copying, but whether he or she can see and reproduce the structure of the example is. This makes high demands on the assessment and a certain measure of subjectivity cannot be excluded. During the construction research, a great deal of attention was paid to elucidating the scoring rules, and inter-assessor discrepancies were used to determine which drawings were difficult to evaluate. On this basis, drawings that help to clarify the scoring rules were selected. These drawings are included in the directions for the administration of Patterns.

3 DESCRIPTION OF THE SON-R 2½-7

The SON-R 2½-7 is a general intelligence test for young children. The test assesses a broad spectrum of cognitive abilities without involving the use of language. This makes it especially suitable for children who have problems or handicaps in language, speech or communication, for instance, children with a language, speech or hearing disorder, deaf children, autistic children, children with problems in social development, and immigrant children with a different native language.

A number of features make the test particularly suitable for less gifted children and children who are difficult to test. The materials are attractive, the tasks diverse. The child is given the chance to be active. Extensive examples are provided. Help is available on incorrect responses, and the discontinuation rules restrict the administration of items that are too difficult for the child.

The SON-R 2½-7 differs in various aspects from the more traditional intelligence tests, in content as well as in manner of administration. Therefore, this test can well be administered as a second test in cases where important decisions have to be taken on the basis of the outcome of a test, or if the validity of the first test is in doubt.

Although the reasoning tests in the SON-R 2½-7 are an important addition to the typical performance tests, the nonverbal character of the SON tests limits the range of cognitive abilities that can be tested. Other tests will be required to gain an insight into verbal development and abilities. However, for those groups of children for whom the SON-R 2½-7 has been specifically designed, a clear distinction must be made between intelligence and verbal development.

After describing the composition of the subtests, the most important characteristics of the test administration are presented in this chapter.

3.1 THE SUBTESTS

The SON-R 2½-7 is composed of six subtests:
1. Mosaics,
2. Categories,
3. Puzzles,
4. Analogies,
5. Situations and
6. Patterns.
The subtests are administered in this sequence. The tests can be grouped into two types: reasoning tests (Categories, Analogies and Situations) and more spatial, performance tests (Mosaics, Puzzles and Patterns). The six subtests consist, on average, of 15 items of increasing difficulty. Each subtest consists of two parts that differ in materials and/or directions. In the first part the examples are included in the items. The second part of each subtest, except in the case of the Patterns subtest, is preceded by an example, and the subsequent items are completed independently. In table 3.1 a short description is given of the tasks in both sections of the subtests. In figures 3.1 to 3.6 a few examples of the items are presented.

Mosaics (Mos)
The subtest Mosaics consists of 15 items. In Mosaics, part I, the child is required to copy several simple mosaic patterns in a frame using three to five red squares. The level of difficulty is determined by the number of squares to be used and whether or not the examiner first demonstrates the item.

In Mosaics II, diverse mosaic patterns have to be copied in a frame using red, yellow and red/yellow squares. In the easiest items of part II, only red and yellow squares are used, and the pattern is printed in the actual size. In the most difficult items, all of the squares are used and the pattern is scaled down.

Categories (Cat)
Categories consists of 15 items. In Categories I, four or six cards have to be sorted into two groups according to the category to which they belong. In the first few items, the drawings on the cards belonging to the same category strongly resemble each other. For example, a shoe or a flower is shown in different positions. In the last items of part I, the child must him or herself identify the concept underlying the category: for example, vehicles with or without an engine.

Categories II is a multiple choice test. In this part, the child is shown three pictures of objects that have something in common. Two more pictures that have the same thing in common have then to be chosen from another column of five pictures. The level of difficulty is determined by the level of abstraction of the shared characteristic.

Puzzles (Puz)
The subtest Puzzles consists of 14 items.

Table 3.1
Tasks in the Subtests of the SON-R 2½-7

Mosaics
  Part I:  Copying different simple mosaic patterns in a frame, using red squares.
  Part II: Copying mosaic patterns in a frame, using red, yellow and red/yellow squares.

Categories
  Part I:  Sorting cards into two groups according to the category to which they belong.
  Part II: Three pictures of objects have something in common. From a series of five
           pictures, two must be chosen that have the same thing in common.

Puzzles
  Part I:  Puzzle pieces must be laid in a frame to resemble a given example.
  Part II: Putting three to six separate puzzle pieces together to form a whole.

Analogies
  Part I:  Sorting disks into two compartments on the basis of form and/or color and/or size.
  Part II: Solving an analogy problem by applying the same principle of change as in the
           example analogy.

Situations
  Part I:  Half of each of four pictures is printed. The missing halves must be placed with
           the correct pictures.
  Part II: One or two pieces are missing in a drawing of a situation. The correct piece(s)
           must be chosen from a number of alternatives.

Patterns
  Part I:  Copying a simple pattern.
  Part II: Copying a pattern in which five, nine or sixteen dots must be connected by a line.

In part I, puzzle pieces must be laid in a frame to resemble the given example. Each puzzle has three pieces. The first few puzzles are first demonstrated by the examiner. The most difficult puzzles in part I have to be solved independently.

In Puzzles II, a whole must be formed from three to six separate puzzle pieces. No directions are given as to what the puzzles should represent; no example or frame is used. The number of puzzle pieces partially determines the level of difficulty.

Figure 3.1
Items from the Subtest Mosaics: Item 3 (Part I), Item 9 (Part II), Item 14 (Part II)

Figure 3.2
Items from the Subtest Categories: Item 4 (Part I), Item 11 (Part II)

Analogies (Ana)
The subtest Analogies consists of 17 items. In Analogies I, the child is required to sort three, four or five blocks into two compartments on the basis of either form, color or size. The child must discover the sorting principle him or herself on the basis of an example. In the first few items, the blocks to be sorted are the same as those pictured in the test booklet. In the last items of part I, the child must discover the underlying principle independently: for example, large versus small blocks.

Analogies II is a multiple choice test. Each item consists of an example-analogy in which a geometric figure changes in one or more aspect(s) to form another geometric figure. The examiner demonstrates a similar analogy, using the same principle of change. Together with the child, the examiner chooses the correct alternative from several possibilities. Then, the child has to apply the same principle of change to solve another analogy independently. The level of difficulty of the items is related to the number and complexity of the transformations.

Situations (Sit)
The subtest Situations consists of 14 items. Situations I consists of items in which one half of each of four pictures is shown in the test booklet. The child has to place the missing halves beside the correct pictures. The first item is printed in color in order to make the principle clear. The level of difficulty is determined by the degree of similarity between the different halves belonging to an item.

Situations II is a multiple choice test. Each item consists of a drawing of a situation with one or two pieces missing. The correct piece (or pieces) must be chosen from a number of alternatives to make the situation logically consistent. The number of missing pieces determines the level of difficulty.

Patterns (Pat)
The subtest Patterns consists of 16 items. In this subtest the child is required to copy an example. The first items are drawn freely, then pre-printed dots have to be connected to make the pattern resemble the example. The items of Patterns I are first demonstrated by the examiner and consist of no more than five dots.

Figure 3.3
Items from the Subtest Puzzles: Item 3 (Part I), Item 11 (Part II)

Figure 3.4
Items from the Subtest Analogies: Item 8 (Part I), Item 9 (Part I), Item 16 (Part II)

Figure 3.5
Items from the Subtest Situations: Item 5 (Part I), Item 10 (Part II)

The items in Patterns II consist of five, nine or sixteen dots and have to be copied by the child without help. The level of difficulty is determined by the number of dots and whether or not the dots are pictured in the example pattern.

3.2 REASONING TESTS, SPATIAL TESTS AND PERFORMANCE TESTS

Reasoning tests
Reasoning abilities have traditionally been seen as the basis for intelligent functioning (Carroll, 1993). Reasoning tests form the core of most intelligence tests. They can be divided into abstract and concrete reasoning tests. Abstract reasoning tests, such as Analogies and Categories, are based on relationships between concepts that are abstract, i.e., not bound by time or place. In abstract reasoning tests, a principle of order must be derived from the test materials presented, and applied to new materials. In concrete reasoning tests, like Situations, the object is to bring about a realistic time-space connection between persons or objects (see Snijders, Tellegen & Laros, 1989).

Spatial tests
Spatial tests correspond to concrete reasoning tests in that, in both cases, a relationship within a spatial whole must be constructed. The difference lies in the fact that concrete reasoning tests concern a meaningful relationship between parts of a picture, and spatial tests concern a 'form' relationship between pieces or parts of a figure (see Snijders, Tellegen & Laros, 1989; Carroll, 1993). Spatial tests have long been integral components of intelligence tests. The spatial subtests included in the SON-R 2½-7 are Mosaics and Patterns. The subtest Puzzles is more difficult to classify, as the relationship between the parts concerns form as well as meaning. We expected the performance on Puzzles and Situations to relate to concrete reasoning ability. However, the correlations and factor analysis show that Puzzles is more closely associated with Mosaics and Patterns (see section 5.3).

Figure 3.6
Items from the Subtest Patterns: Item 6 (Part I), Item 13 (Part II), Item 16 (Part II)

Performance tests
An important characteristic that Puzzles, Mosaics and Patterns have in common is that the item is solved while manipulating the test stimuli. That is why these three subtests are called performance tests. In the three reasoning tests (Situations, Categories and Analogies), in contrast, the correct solution has to be chosen from a number of alternatives. In other respects, the six subtests are very similar in that perceptual and spatial aspects as well as reasoning ability play a role in all of them.

The performance subtests of the SON-R 2½-7 can be found in a similar form in other intelligence tests. However, only verbal directions are given in these tests. Reasoning tests can also regularly be found in other intelligence tests, but then they often have a verbal form (such as verbal analogies).

In table 3.2 the classification of the subtests is presented. The empirical classification, in which a distinction is made between performance tests and reasoning tests, is based on the results of principal components analysis of the test scores of several different groups of children (see section 5.4). In table 3.2 the number of each subtest indicates the sequence of administration; the sequence of the subtests in the table is based on similarities of content. This sequence is used in the following chapters when presenting the results.

Table 3.2
Classification of the Subtests

No  Abbr  Subtest     Content             Empirical

6   Pat   Patterns    Spatial insight     Performance test
1   Mos   Mosaics     Spatial insight     Performance test
3   Puz   Puzzles     Concrete reasoning  Performance test
5   Sit   Situations  Concrete reasoning  Reasoning test
2   Cat   Categories  Abstract reasoning  Reasoning test
4   Ana   Analogies   Abstract reasoning  Reasoning test

3.3 CHARACTERISTICS OF THE ADMINISTRATION

In this section the most important characteristics of the SON-R 2½-7 are discussed.

Individual intelligence test
Most intelligence tests for children are administered individually. The SON-R 2½-7 follows this tradition for the following reasons:
– the directions can be given nonverbally,
– feedback can be given in the correct manner,
– testing can be tailored to the level of each individual child,
– the examiner can encourage children who are not very motivated or cannot concentrate; personal contact between the child and the examiner is essential for effective testing, certainly for children up to the age of four to five years.

Nonverbal intelligence test
The SON-R 2½-7 is nonverbal. This means that the test can be administered without the use of spoken or written language. The examiner and the child are not required to speak or write and the testing materials have no language component. One is, however, allowed to speak during the test administration, otherwise an unnatural situation would arise. The manner of administration of the test depends on the communication abilities of the child. The directions can be given verbally, nonverbally with gestures or using a combination of both. Care must be taken when giving verbal directions that no extra information is given.

No knowledge of a specific language is required to solve the items being presented. However, level of language development, for example, being able to name objects, characteristics and concepts, can influence the ability to solve the problems correctly. Therefore the SON-R 2½-7 should be considered a nonverbal test for intelligence rather than a test for nonverbal intelligence.

Directions
An important part of the directions to the child is the demonstration of (part of) the solution to a problem. An example item is included in the administration of the first item on each subtest, and detailed directions are given for all first items. Once the child understands the nature of the task, the examiner can shorten the directions for the following items. If the child does not understand the directions, they can be repeated.

In the second part of each subtest an example is given in advance. Once the child understands this example, he or she can do the following items independently.

Feedback
The examiner gives feedback after each item. In the SON-R 5½-17, feedback is limited to telling the child whether his or her answer is correct or incorrect. In the SON-R 2½-7 the examiner indicates whether the solution is correct or incorrect, and, if the answer is incorrect, he/she also demonstrates the correct solution for the child. The examiner tries to involve the child when correcting the answer, for instance, by letting him or her perform the last action. However, the examiner does not explain why the answer was incorrect.

By giving feedback, a more normal interaction between the examiner and the child occurs, and the child gains a clearer understanding of the task. The child is given the opportunity to learn and to correct him or herself. In this respect a similarity exists between the SON-tests and tests for learning potential (Tellegen & Laros, 1993a).

Entry procedure and discontinuation rule
Each subtest begins with an entry procedure. Based on age and, when possible, the estimated cognitive level of the child, a start is made with the first, third or fifth item. This procedure was chosen to prevent children from becoming demotivated by being required to solve too many items that are below their level. The design of the entry procedure ensures that the first items the child skips would have been solved correctly. Should the level chosen later appear to be too difficult, the examiner can return to a lower level. However, because of the manner in which the test has been constructed, this should occur infrequently.

Each subtest has rules for discontinuation. A subtest is discontinued when a total of three items has been incorrectly solved. The mistakes do not have to be consecutive. The three performance subtests are also discontinued when two consecutive mistakes are made in the second part. Frequent failure often has a drastically demotivating effect on children and can result in refusal to go on.

Time factor
The speed with which the problems are solved plays a very subordinate role in the SON-R 2½-7. A time limit for completing the items is used only in the second part of the performance tests. The time limit is generous. Its goal is to allow the examiner to end the item. The construction research showed that children who go beyond the time limit are seldom able to find a correct solution when given more time.

Duration of test administration
The administration of the SON-R 2½-7 takes about 50 minutes (excluding any short breaks during administration). During the standardization research the administration took between forty and sixty minutes in 60% of the cases. For children with a specific handicap, the administration takes about five minutes longer. For children two years of age, administration time is shorter; nearly 50% of the two-year-olds complete the test in less than forty minutes.

Standardization
The SON-R 2½-7 is meant primarily for children in the age range from 2;6 to 7;0 years. The norms were constructed using a mathematical model in which performance is described as a continuous function of age. An estimate is made of the development of performance in the population, on the basis of the results of the norm groups (see chapter 4). These norms run from 2;0 to 8;0 years. In the age group from 2;0 to 2;6 years, the test should only be used for experimental purposes. In many cases the test is too difficult for children younger than 2;6 years. Often, they are not motivated enough or able to concentrate sufficiently to do the test. However, in the age group from 7;0 to 8;0 years, the test is eminently suitable for children with a cognitive delay or children who are difficult to test. The easy starting level and the help and feedback given can benefit these children. For children of seven years old who are developing normally, the SON-R 5½-17 is generally more appropriate.

The scaled subtest scores are presented as standard scores with a mean of 10 and a standard deviation of 3. The scores range from 1 to 19. The SON-IQ, based on the sum of the scaled subtest scores, has a mean of 100 and a standard deviation of 15. The SON-IQ ranges from 50 to 150. Separate total scores can be calculated for the three performance tests (SON-PS) and the three reasoning tests (SON-RS). These have the same distribution characteristics as the IQ score. When using the computer program, the scaled scores are based on the exact age; in the norm tables age groups of one month are presented. With the computer program, a scaled total score can be calculated for any combination of subtests.

In addition to the scaled scores, based on a comparison with the population of children of the same age, a reference age can be determined for the subtest scores and the total scores. This shows the age at which 50% of the children in the norm population perform better, and 50% perform worse. The reference age ranges from 2;0 to 8;0 years. It provides a different framework for the interpretation of the test results, and can be useful when reporting to persons who are not familiar with the characteristics of deviation scores. The reference age also makes it possible to interpret the performance of older children or adults with a cognitive delay, for whom administration of a test, standardized for their age, is practically impossible and not meaningful.

As with the SON-R 5½-17, no separate norms for deaf children were developed for the SON-R 2½-7. Our basic assumption is that separate norms for specific groups are only required when a test discriminates against a special group of children because of its contents or the manner in which it is administered. Research using the SON-R 2½-7 and the SON-R 5½-17 with deaf children (see chapter 7) shows that this is absolutely not the case for deaf children with the SON tests.

4 STANDARDIZATION OF THE TEST SCORES

Properly standardized test norms are necessary for the interpretation of the results of a test. The test norms make it possible to assess how well or how badly a child performed in comparison to the norm population. The norm population of the SON-R 2½-7 includes all children residing in the Netherlands in the relevant age group, except those with a severe physical and/or mental handicap. The standardization process transforms the raw scores into normal distributions with a fixed mean and standard deviation. This allows comparisons to be made between children, including children of different ages. Intra-individual comparisons between performances on different subtests are also possible. As test performances improve very strongly in the age range from two to seven years, the norms should ideally be related to the exact age of the child and not to an age range, as is the case for most intelligence tests for children.

4.1 DESIGN AND REALIZATION OF THE RESEARCH

Age groups
Eleven age groups, increasing in age by 6 months, from 2;3 years to 7;3 years formed the point of departure for the standardization research. In each group one hundred children were to be tested: fifty boys and fifty girls. When selecting the children, an effort was made to keep the age within each group as homogeneous as possible. The age in the youngest group, for instance, was supposed to deviate as little as possible from two years, three months and zero days.

Regions of research
To ensure a good regional distribution, the research was carried out in ten regions, five of which are in the West, three in the North/East, and two in the South of the Netherlands. The regions were chosen to reflect specific demographic characteristics of the Netherlands. In nine of the ten regions, one examiner administered all the tests. In one region, two examiners shared the test administration. Approximately the same number of children was tested in each region in five separate two-week periods. The test was administered to 22 children, one boy and one girl from each age group in each region in each period. The sample to be tested consisted of 1100 children, i.e., 10 (regions) x 5 (periods) x 11 (age groups) x 2 (one boy and one girl).

Communities
The second phase of the standardization research concerned the selection of the communities in the ten research regions where the test administrations were to take place. In total, 31 communities were selected. Depending on the size of the community, the research was carried out during one, two or three periods. The selected communities were representative of the Netherlands with regard to number of inhabitants and degree of urbanization.

Schools
Children four years and older were tested at primary schools. Research at schools was carried out in the same communities as the research with younger children. One, two or three schools were selected in each community, depending on the number of periods in which research was to be done in that community. To select the schools, a sample was drawn from the schools in each community. The chance of inclusion was proportional to the number of pupils at the school.

Fifty schools were approached; 25 were prepared to participate. Schools that were not prepared to participate were replaced by other schools in the same community. The socio-economic status of the parents was taken into account in the choice of replacement schools.

Selection of the children
The manner of selecting the children depended on their age. For children in the age groups up to four years, samples were drawn from the local population register, which contains data on name, date of birth, sex and address of the parents. The boy or girl whose age corresponded most closely to the required age for each age group was selected. The parents received a letter explaining the aims of the research and asking them to participate. If no reaction to this letter was received, they were approached again by letter or by telephone.

In about one quarter of the cases, the test could not be administered to the child that had originally been selected. Some parents refused permission for their child to participate. Sometimes, the data from the population register were no longer correct, or practical problems made it impossible for the parents to allow their child to participate in the research program. In these cases, the children were replaced, as far as possible, by children from the same community.

For children four years and older, the experimenter selected, per school and per age group, one boy and one girl whose age on the planned test date corresponded as closely as possible to the required age. If the deviation from the required age was too large, either two boys or two girls were selected from one age group, or one extra child was tested at another school. Parents were sent a written request for permission, which was nearly always given.

Practical implementation
The department of Orthopedagogics of the University of Groningen, responsible for the standardization in the Netherlands of the Reynell Test for Language Understanding and the Schlichting Test for Language Production (Lutje Spelberg & Sj. van der Meulen, 1990), collaborated in the design and execution of the standardization research. In three of the five research periods, children who were tested with the SON-R 2½-7 had also participated in the standardization research of the language tests six months earlier. To validate both the language tests and the SON-R 2½-7, a third test was administered to some of the children in the intervening period.

Eleven examiners, eight women and three men, administered most of the tests. Most were psychology graduates, with extensive experience in testing young children, some of which had been gained in the previous research they had carried out with the language tests.

Children below four years old were tested in a local primary health care center, in the presence of one of the parents. In a few cases the child was tested at home. Older children were tested at school in a separate room. An effort was made to administer the whole test in one session. However, a short break between the subtests was allowed. At the schools, breaking off the test for longer periods, or even continuing a test the next day, was sometimes necessary because of school hours and breaks.

In a few cases the test could not be administered correctly. If no more than four subtests could be administered, the test was considered invalid and was not used in the analyses. This situation occurred in the case of ten children, eight of whom were two years old.

Completing the norm group
The greater part of the standardization research took place in the period from September to December 1993. As fewer children than had been planned were tested in the youngest age groups, the norm group was supplemented with 31 children in the spring of 1994. Further, immigrant children appeared to be under-represented in the youngest age groups. Eight immigrant children, who had been tested in a different research project, were therefore added to the norm group. Finally, eight pupils, 4 years or older, from special schools were added. This was a sample from a group of children who had been tested at schools for special education with a preschool department.

4.2 COMPOSITION OF THE NORM GROUP

The norm group consisted of 1124 children. Table 4.1 shows the composition of the group according to age and sex, and the distribution according to age of the children who were added to the norm group for various reasons. The mean age per group is practically identical to the planned age, and the distribution according to age within the age groups is very narrow. In all the groups the number of boys is approximately equal to the number of girls.

The extent to which the distribution of the selected demographic characteristics of the norm group conformed to that of the total Dutch population (Central Bureau for Demographics, CBS, 1993) is presented in table 4.2. Children from the large urban communities are slightly under-represented, but these communities are also characterized by a relatively smaller number of youngsters.

Weighting the norm group
As a result of sample fluctuations and the different sampling methods used for children above and below four years of age, the backgrounds of the children differed from age group to age group. For the standardization, the following factors were weighted within each age group: the percentage of children with a mother born abroad, the educational level of the mother, and the child's sex. This allowed a better comparison between the different age groups. Finally, the observations were weighted so that the number of children per age group was the same. After weighting, every age group consisted of 51 boys and 51 girls, making the size of the total sample 1122.

An example may elucidate this weighting procedure. The percentage of children with a foreign mother in the entire norm group was 11%. If the percentage in the age group 3;9 years, for example, was 8%, the children with a foreign mother in this age group received a weight of 11/8, and the children with a Dutch mother received a weight of (100-11)/(100-8) = 89/92. When using weights, critical limits of 2/3 and 3/2 were adhered to, in order to prevent some children contributing either too much or too little to the composition of the weighted norm group. After the various steps in the weighting procedure, 80% of the children had a weighting factor between .80 and 1.25.
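A minimal sketch of this weighting step for a single background variable is given below. The function and the clipping limits follow the worked example above; combining the weights with those for the mother's education and the child's sex, and rescaling to equal group sizes, are not shown.

```python
# Sketch of the weighting step for one binary background variable
# (mother born abroad), following the worked example in the text.

def weights_for_group(group_share, target_share, lower=2/3, upper=3/2):
    """Return (weight for children with a foreign-born mother,
               weight for children with a Dutch-born mother) in one age group."""
    def clip(w):
        # critical limits prevent a child from contributing too much or too little
        return min(max(w, lower), upper)

    w_foreign = clip(target_share / group_share)
    w_dutch = clip((1 - target_share) / (1 - group_share))
    return w_foreign, w_dutch

# Example from the text: 11% foreign-born mothers overall, 8% in the 3;9 group.
print(weights_for_group(0.08, 0.11))   # (1.375, 0.967...), i.e. 11/8 and 89/92
```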

Table 4.1
Composition of the Norm Group According to Age, Sex and Phase of Research (N=1124)

Age Group         2;3   2;9   3;3   3;9   4;3   4;9   5;3   5;9   6;3   6;9   7;3

Total              98    99    99   100   102   101   105   105   102   107   106

Phase
 1993              94    89    86    90    99   101   104   104   101   105   103
 Addition: 1994     3     9    11     7     2     –     –     –     –     –     –
 Immigrant          1     1     2     3     1     –     –     –     –     –     –
 Spec. Educ.        –     –     –     –     –     –     1     1     1     2     3

Sex
 Boys              47    50    48    52    52    50    52    53    49    53    55
 Girls             51    49    51    48    50    51    53    52    53    54    51

Age
 Mean (years)    2.24  2.76  3.25  3.75  4.25  4.74  5.24  5.74  6.25  6.74  7.24
 SD (days)         14    16    16    14    15    22    22    24    18    23    21

Table 4.3 presents the level of education and country of birth of the mother, before and after weighting, for three age groups. As can be seen, the differences between the age groups were much smaller after weighting. The level of education of the mothers corresponded well to the level of education in the population of women between 25 and 45 years of age (CBS, 1994). The percentages for low, middle and high levels of education in the population are respectively 27%, 54% and 19%. The percentage of children whose mother was born abroad also corresponded to the national percentage of 10% immigrant children in the age range from zero to ten years (Roelandt, Roijen & Veenman, 1992).

Table 4.2
Demographic Characteristics of the Norm Group in Comparison with the Dutch Population (N=1124)

Region                               Norm Group   Population
North/East-Netherlands                      31%          31%
South-Netherlands                           19%          22%
West-Netherlands                            50%          47%

Size of Community                    Norm Group   Population
Less than 10,000 inhabitants                12%          11%
10,000 to 20,000 inhabitants                22%          20%
20,000 to 100,000 inhabitants               44%          42%
More than 100,000 inhabitants               22%          27%

Degree of Urbanization               Norm Group   Population
(Urbanized) Rural Communities               37%          34%
Commuter Communities                        16%          15%
Urban Communities                           47%          51%

Table 4.3
Education and Country of Birth of the Mother in the Weighted and Unweighted Norm Group

                          Education Mother           Country of Birth Mother
Unweighted Norm Group     Low   Middle   High        Netherlands   Abroad
2 and 3 years             26%      57%    17%                91%       9%
4 and 5 years             32%      51%    17%                90%      10%
6 and 7 years             40%      45%    15%                86%      14%
Total                     32%      51%    17%                89%      11%

                          Education Mother           Country of Birth Mother
Weighted Norm Group       Low   Middle   High        Netherlands   Abroad
2 and 3 years             28%      54%    18%                89%      11%
4 and 5 years             32%      52%    16%                89%      11%
6 and 7 years             33%      50%    17%                87%      13%
Total                     31%      52%    17%                89%      11%

4.3 THE STANDARDIZATION MODEL

Subtest scores
The first step in standardization is transforming the raw subtest scores to normally distributed scores with a fixed mean and standard deviation. Usually, these transformations are carried out separately for each age group. The disadvantage of this method, however, is that the relatively small number of subjects in each age group allows chance factors to play an important role in the transformations. In the SON-R 2½-7, a different method, developed for the standardization of the SON-R 5½-17, was applied (Snijders, Tellegen & Laros, 1989, p. 43-45; Laros & Tellegen, 1991, p. 156-157).

In this method, the score distributions for all age groups are fitted simultaneously as a continuous function of age. This is done for each subtest separately. The function gives an estimate, dependent on age, of the distribution of the scores in the population. With the fitting procedure an effort is made to minimize the difference between the observed distribution and the estimated population distribution, while limiting the number of parameters of the function. Within the age range of the model two pre-conditions must be met:
1. For each age, the standardized score must increase if the raw score increases.
2. For each raw score, the standardized score must decrease if the age increases.
A great advantage of this method is that the use of information on all age groups simultaneously makes the standardization much more accurate. Further, the standardized scores can be calculated on the basis of the exact age. The model also allows for extrapolation outside the age range in which the standardization research was carried out. In the SON-R 2½-7, the model had to comply with the pre-conditions for the age range from 2;0 to 8;0 years.

The logistic regression model
The logistic regression model is used to estimate parameters of a function in order to describe the chance of a certain occurrence as precisely as possible. The model has the following form:

Chance(occurrence) = exp[Z]/(1+exp[Z])

Z can be a composite function of independent variables, in our case, age and score. The dependent variable is defined by determining for each person and for each possible score (in the range from 0 to the maximum score minus 1), whether that score or a lower score was received. If this is the case, the dependent variable is given the value 1. If this is not the case, the dependent variable is given the value 0.

Because of the narrow distribution of age in each subgroup, the analysis was based on the mean age in the subgroup. However, our model has the special characteristic that standardization does not need to be based on homogeneous age groups.

The regression procedure was carried out in two phases. In the first phase, Z was defined as follows:

Z = b0 + b1X + b2X² + b3X³ + b4X⁴ + b5X⁵ + b6Y + b7Y² + b8Y³

Here b0 through b8 are the estimated parameters, X through X⁵ are powers of the raw score, and Y through Y³ are powers of age. When fitting the model, the procedure for logistic regression in SPSS was used (SPSS Inc, 1990).
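The sketch below illustrates the phase-1 fit in Python rather than SPSS; it is a simplified reading of the procedure, not the authors' code. The construction of one record per child per possible score value follows the definition of the dependent variable given above, and the stepwise selection of parameters is omitted.

```python
# Simplified sketch of the phase-1 fit: a cumulative indicator (score <= s) is
# regressed on powers of the score value s (X ... X^5) and of age (Y ... Y^3).

import numpy as np
import statsmodels.api as sm

def fit_phase1(raw_scores, ages, max_score):
    rows, target = [], []
    for score, age in zip(raw_scores, ages):
        for s in range(max_score):                        # s = 0 .. max_score - 1
            x_powers = [s ** k for k in range(1, 6)]      # X, X^2, ..., X^5
            y_powers = [age ** k for k in range(1, 4)]    # Y, Y^2, Y^3
            rows.append(x_powers + y_powers)
            target.append(1 if score <= s else 0)         # dependent variable
    X = sm.add_constant(np.asarray(rows, dtype=float))    # adds the intercept b0
    return sm.Logit(np.asarray(target), X).fit(disp=False)

# result.predict(...) then gives the estimated P(score <= s) for any age,
# i.e. the cumulative distribution used to derive the normalized scores.
```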

Using the parameters found for the third degree function of age, age was transformed to Y' in such a manner that the relation between Y' and the test scores in the above-mentioned model became linear. In the following phase, Y' was used in the regression analysis and the interaction between score and age was added to the model. The definition of Z in this second phase was:

Z = b0 + b1X + b2X² + b3X³ + b4X⁴ + b5X⁵ + b6Y' + b7Y'·X + b8Y'·X² + b9Y'·X³ + b10Y'·X⁴ + b11Y'·X⁵

After the stepwise fitting procedure, the number of selected parameters in the subtests varied from six to ten. The cumulative proportion in the population, in the age range from two to eight, could then be estimated for every possible combination of age and score. Normally distributed z-values were then determined by calculating the mean z-value for the normal distribution interval that corresponded to the upper limit and the lower limit of each raw score. The averaging procedure caused a slight loss of dispersion, for which we corrected.
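The averaging step can be illustrated as follows. This is our reading of the description above, not the authors' code: a raw score corresponds to an interval of cumulative proportions, and the z-value assigned to it is the mean of the standard normal distribution over that interval.

```python
# Sketch: mean z-value of the standard normal distribution over the interval
# between the cumulative proportions at a raw score's lower and upper limit.

from scipy.stats import norm

def mean_z_for_interval(p_low, p_high):
    z_low, z_high = norm.ppf(p_low), norm.ppf(p_high)
    # mean of a standard normal truncated to (z_low, z_high)
    return (norm.pdf(z_low) - norm.pdf(z_high)) / (p_high - p_low)

# e.g. a raw score that covers the 30th to the 45th percentile
print(mean_z_for_interval(0.30, 0.45))   # about -0.32
```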

This model may seem to be complicated. However, for simple linear transformations per age group, twenty-two parameters for each subtest would have to be estimated, and in the case of nonlinear transformations based on the cumulative proportions, more than one hundred parameters would have to be estimated.

Reliability
For each subtest and age group the reliability was calculated with the formula for lambda-2 (Guttman, 1945). This is, like lambda-3 (coefficient alpha; Cronbach, 1951), a measure of internal consistency. However, lambda-2 is preferable if the number of items is limited, and if the covariance between the items is not constant (Ten Berge & Zegers, 1978).
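For completeness, Guttman's lambda-2 for a subtest with n items can be written as follows (standard formula, in our notation):

\[ \lambda_2 = \frac{\sigma_X^2 - \sum_j \sigma_j^2 + \sqrt{\tfrac{n}{n-1} \sum_{j \neq k} \sigma_{jk}^2}}{\sigma_X^2} \]

where \(\sigma_X^2\) is the variance of the subtest score, \(\sigma_j^2\) the variance of item j, and \(\sigma_{jk}\) the covariance between items j and k. Lambda-2 is never smaller than lambda-3 (coefficient alpha).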

The reliability for each subtest was fitted as a third degree function of the transformed age (Y'), using the method of stepwise multiple regression. In a few cases, when extrapolating to the ages of 2;0 and 8;0, extreme values occurred for the estimate of reliability. In these cases, the lower limit for the estimated value was set at .30 and the upper limit at .85.

Correlations and total scores
In each age group correlations between the standardized subtest scores were first corrected for unreliability, and then fitted as a third degree function of age. Using the estimated values of the correlations in the population, the standard deviation of the total score could be calculated for every age and every combination of subtests, and transformed into the required standardized distribution.
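The two steps referred to here rest on standard formulas (the notation is ours; the exact way the fitted values are recombined is described in the text above). The correction for unreliability of an observed correlation r_ij between subtests i and j with reliabilities r_ii and r_jj, and the variance of a sum of subtest scores with standard deviations \(\sigma_i\) and correlations r_ij, are

\[ \rho_{ij} = \frac{r_{ij}}{\sqrt{r_{ii}\, r_{jj}}} \qquad \text{and} \qquad \sigma^2_{\mathrm{total}} = \sum_i \sigma_i^2 + \sum_{i \neq j} \sigma_i \sigma_j r_{ij} \]

from which the standard deviation of any combination of subtests follows once the correlations for a given age are known.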

4.4 THE SCALED SCORES

The scaled scores are presented in two different ways, as standard scores and as reference ages. The standard score (also called deviation score) shows how well or how badly the child performs in relation to the population of children of the same age. The reference age (also called mental age or test age) shows at which age 50% of the children in the population perform worse than the subject. Unless stated otherwise, standard scores are meant when scaled scores are mentioned in this manual. In the following section, a short explanation is given of the scaled scores of the SON-R 2½-7.

Standard scoresScaled subtest scores are presented on a normally distributed scale with a mean of 10 and astandard deviation of 3. These so-called Wechsler scores have a range of 1 to 19. As a result of‘floor’ and ‘ceiling’ effects, the most extreme scores will not occur in all age groups. The rawscores of the subtests are less differentiated than the standard scores. As a result, only some ofthe values in the range of 1 to 19 are used in each age group. However, the values show theposition in the normal distribution with more precision which would not be possible with a lessdifferentiated scale.

The sum of the six scaled subtest scores is the basis of the IQ score. This SON-IQ has a meanof 100 and a standard deviation of 15. The range extends from 50 to 150.

The sum of the scaled scores of Mosaics, Puzzles and Patterns is transformed to provide thePerformance Scale (SON-PS), and the sum of Categories, Situations and Analogies forms theReasoning Scale (SON-RS). Both scales, like the IQ-distribution, have a mean of 100 and astandard deviation of 15. The range extends from 50 to 150.

In the Appendix, the norm tables for the subtests are shown for each month of age, for the age

Page 41: SON-R 2 - Tests & Test-research

41STANDARDIZATION OF THE TEST SCORES

range 2;0 to 8;0 years. The tables for calculating the standardized total scores are presented perfour month period. When the computer program is used, all the standardized scores are based onthe exact age.

Reference age
The reference age is derived from the raw score(s); the actual age of the child is not important. For the age range of 2;0 to 8;0 years, the reference age is presented in years and months. The reference age for the subtests can be found in the norm tables. The reference age for the total score is the age at which a child with this raw score would receive an IQ score of 100. This age is determined iteratively, with the help of the computer program, for the Total Score on the test, the Performance Scale and the Reasoning Scale. An approximation of the reference age for the total score is presented in the norm tables in the appendix. This approximation is based on the sum of the six raw subtest scores.
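The iterative idea can be pictured as a search over age. The sketch below is an illustration only, not the program shipped with the test; the function iq_from_raw(raw_sum, age_in_months), standing for the norm model, is a hypothetical placeholder.

```python
# Bisection search for the age (in months, within the norm range 2;0-8;0) at which
# the obtained raw total score corresponds to an IQ of 100.
def reference_age(raw_sum, iq_from_raw, lo=24.0, hi=96.0, tol=0.01):
    # assumes that, for a fixed raw score, iq_from_raw decreases as age increases
    for _ in range(100):
        mid = (lo + hi) / 2.0
        if iq_from_raw(raw_sum, mid) > 100.0:
            lo = mid        # child outperforms this age level: look at older ages
        else:
            hi = mid
        if hi - lo < tol:
            break
    return (lo + hi) / 2.0
```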

For use of the norm tables and the computer program, we refer to chapter 13 (The record form, norm tables and computer program). Directions on the procedure to be used when the test has not been fully administered can also be found in this chapter.


5 PSYCHOMETRIC CHARACTERISTICS

Important psychometric characteristics of the SON-R 2½-7 will be discussed in this chapter. These are the distribution characteristics of the scores, the reliability and generalizability of the test, the relationships between the test scores, and the stability of the scores. In general, these results are based on the weighted norm group (N=1122). In several analyses comparisons have been made between the results in three age groups, namely:
– two- and three-year-olds (the norm groups of 2;3, 2;9, 3;3 and 3;9 years),
– four- and five-year-olds (the norm groups of 4;3, 4;9, 5;3 and 5;9 years),
– six- and seven-year-olds (the norm groups of 6;3, 6;9 and 7;3 years).

The results in this chapter are relevant for the internal structure of the test. Research on validity, carried out in the norm group, will be discussed in chapter 6 (Relationships with other variables) and in chapter 9 (Relationship with cognitive tests).

5.1 DISTRIBUTION CHARACTERISTICS OF THE SCORES

Level of difficulty of the test items
As entry and discontinuation rules are used in the SON-R 2½-7, it is important that successive items of the subtests increase in difficulty. Table 5.1 shows the p-values of the items, calculated over the entire norm group. The p-value represents the proportion of children who completed the item correctly. Items skipped at the beginning of the subtest are scored as correct; items that are not administered after discontinuation of the test are scored as incorrect.

In general, the level of difficulty of the items increased as expected. Six of the 91 items were more difficult than the following item, but in four cases the difference in p-value was only .02.

Table 5.1
P-values of the Items (N=1122)

           Pat    Mos    Puz    Sit    Cat    Ana

item  1    .90    .95    .97    .95    .91    .96
item  2    .88*   .81    .90    .91    .89    .93
item  3    .90    .77    .89    .87    .89    .84*
item  4    .88    .76    .79    .86    .82    .86
item  5    .86    .73    .76    .80    .75    .73
item  6    .79    .70    .72    .67    .69    .52*
item  7    .77    .64    .64    .56    .64    .58
item  8    .62    .58    .59    .54    .51    .57
item  9    .60    .46    .37*   .46    .49    .45
item 10    .43    .33    .44    .32    .33    .28
item 11    .33    .23    .25    .17    .30    .28
item 12    .30    .14    .19    .12    .17    .23
item 13    .21    .08*   .13    .07    .10    .15
item 14    .20    .10    .05    .06    .09    .13
item 15    .13    .06                  .05    .04*
item 16    .04                                .06
item 17                                       .04

*: the p-value is lower than the p-value of the following item


For two items, item 9 of Puzzles and item 6 of Analogies, the difference was larger. The six deviating items are marked with an asterisk in table 5.1.

IRT model
As in the construction research, the item characteristics for the definitive test were estimated with the 2-parameter model from item response theory. The computer program BIMAIN (Zimowski et al., 1994) was used for these calculations. This program does not require all subjects to have completed all the items. The two item parameters estimated for the items of each subtest are the a-parameter and the b-parameter. The a-parameter shows how well the item discriminates and the b-parameter shows how difficult the item is. To obtain a reliable estimate of the item parameters, the analysis was carried out on the test results of 2498 children, almost all the children who were tested during the standardization and the validation research. The estimate is based on the items that were actually administered.

In figure 5.1 the item characteristics are represented in a graph. The distribution of the b-parameters is similar to the results obtained on the basis of the p-values. Except for a few small deviations, the items increase in difficulty. The difficulty of the items is also distributed evenly over the range from -2 to +2.

The mean of the discrimination parameter is highest for Patterns (mean=4.8) and Mosaics (mean=3.8). For Puzzles, Situations, Categories and Analogies, the means are 2.8, 2.4, 2.9 and 2.2 respectively. Within the subtests, however, the discrimination values of the items can diverge strongly.
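For readers less familiar with the 2-parameter model, the item characteristic curve can be written in one common (logistic) parameterization; the sketch below is an illustration only, assuming Python with numpy, and is not the BIMAIN estimation itself.

```python
# 2PL item characteristic curve: probability of solving an item as a function of
# latent ability theta, discrimination a and difficulty b.
import numpy as np

def p_correct(theta, a, b):
    return 1.0 / (1.0 + np.exp(-a * (theta - b)))

# A strongly discriminating item (a=4.8, cf. the mean for Patterns) separates
# abilities just below and above its difficulty much more sharply than a weakly
# discriminating one (a=2.2, cf. the mean for Analogies).
print(p_correct(np.array([-0.5, 0.5]), a=4.8, b=0.0))
print(p_correct(np.array([-0.5, 0.5]), a=2.2, b=0.0))
```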

Initially, we considered basing the scoring and standardization of the SON-R 2½-7 on the estimated latent abilities as represented in the IRT model. A good method for doing this with incomplete test data was described by Warm (1989). Such a method of scoring has important advantages: items which clearly discriminate have more weight in the evaluation, no assumptions need to be made about scores on items that were not administered, and the precision of statements about the ability of a person can be shown more clearly. However, the disadvantages are that this scoring method can only be done with a computer, and that important differences can occur between the standardized computer results and the results obtained with norm tables. The main factor in the decision not to apply the IRT model when standardizing the test, however, was the fact that the data did not fit the model. This is not surprising. The IRT model assumes that the item scores are obtained independently. However, the feedback and the help given with the SON-R 2½-7 create an interdependence among the scores. This works out positively for the test and its validity, but it limits the psychometric methods that can be applied successfully. IRT models that take learning effects into account are being developed (see Verhelst & Glas, 1995), but programs with which the item parameters can be estimated in combination with an adaptive test administration are not yet available.

Correlation of test performances with age
Table 5.2 presents, for each age group, the mean and the standard deviation of the raw subtest scores and of the sum of the six subtest scores. The mean score increases with age for all subtests. The sum of the raw subtest scores increases by about nine points per half year in the youngest age groups, and by about five points per half year in the oldest age groups.

The strong relation with age is also evident from the high correlations between the subtest scores and age. The multiple correlation of age and the square of age with the subtest scores has a mean of .87 and varies from .80 (Analogies) to .91 (Patterns). For the other subtests, the correlations are .85 (Situations), .88 (Categories), .89 (Puzzles) and .90 (Mosaics). For the sum score, the multiple correlation with age is .93. Because of the large increase in test performance with age, the norm tables were constructed for each month of age.

Distribution of the standardized scores
The subtest scores, standardized and normalized for age, are presented on a scale of 1 to 19 with a mean of 10 and a standard deviation of 3. The sum of the six standardized subtest scores is presented on a scale with a mean of 100 and a standard deviation of 15. This score, the SON-IQ, ranges from 50 to 150. A distribution with a mean of 100 and a standard deviation of 15 is also used for the Performance Scale (SON-PS), based on the sum of the scores of Mosaics, Puzzles and Patterns, and for the Reasoning Scale (SON-RS), based on the sum of the scores of Categories, Analogies and Situations.


Figure 5.1
Plot of the Discrimination (a) and Difficulty (b) Parameters of the Items
[Six panels, one per subtest (Patterns, Mosaics, Puzzles, Situations, Categories, Analogies), each plotting the discrimination parameter a (vertical axis) against the difficulty parameter b (horizontal axis, from -3 to +2) for the items of that subtest.]



In table 5.3, the mean and the standard deviation of the standardized scores are presented for the entire weighted norm group and for three age groups. Only very small deviations from the planned distribution were found for the entire group. No significant deviations from the normal distribution were found in tests for skewness and kurtosis. Deviations in mean and dispersion sometimes differed slightly across the three separate age groups, but an analysis of variance showed that the differences between the means were not significant. A test for the homogeneity of the variances also failed to show any significant differences. The kurtosis was not significant in the different groups. The distribution was positively skewed for Puzzles and for the Reasoning Scale in the oldest group. However, the values for skewness were small, .4 and .3 respectively. An analysis of variance was also carried out over the eleven original age groups. No significant differences in mean and variance between the groups were established for any of the variables.

Table 5.2
Mean and Standard Deviation of the Raw Scores

        Pat          Mos          Puz          Sit          Cat          Ana          Sum Subt.
Age     Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)    Mean (SD)

2;3      1.1 (1.5)    1.3 ( .9)    2.0 (1.0)    1.6 (1.7)    1.2 (1.6)    1.8 (1.4)    9.0 ( 5.3)
2;9      3.8 (2.3)    1.8 (1.1)    2.7 (1.3)    3.6 (2.2)    2.8 (2.0)    3.5 (1.9)   18.1 ( 6.8)
3;3      5.7 (1.8)    3.3 (2.1)    4.3 (2.0)    5.1 (1.8)    4.7 (1.9)    5.1 (2.0)   28.3 ( 8.4)
3;9      7.1 (1.3)    5.3 (2.2)    6.3 (1.9)    6.2 (1.6)    6.1 (1.9)    6.2 (2.1)   37.3 ( 7.6)
4;3      8.4 (1.3)    7.3 (1.9)    7.7 (1.7)    7.3 (1.7)    7.6 (1.9)    6.9 (2.1)   45.2 ( 7.2)
4;9      9.5 (1.5)    8.3 (1.8)    8.4 (2.0)    7.8 (1.7)    8.5 (2.0)    8.2 (2.1)   50.7 ( 7.9)
5;3     10.4 (1.6)    9.0 (1.5)    9.4 (1.7)    8.4 (1.7)    8.6 (1.7)    8.4 (2.4)   54.3 ( 6.7)
5;9     11.2 (1.9)    9.8 (1.8)   10.0 (2.0)    9.1 (1.7)    9.9 (1.9)    9.6 (2.6)   59.8 ( 8.6)
6;3     12.9 (2.0)   11.1 (2.0)   11.0 (1.9)   10.2 (1.8)   11.0 (1.7)   10.5 (3.3)   66.6 ( 9.1)
6;9     13.2 (1.8)   11.4 (1.9)   11.2 (1.5)   10.5 (1.7)   11.2 (1.9)   11.3 (3.1)   68.8 ( 8.3)
7;3     14.1 (1.7)   12.2 (2.0)   11.5 (1.5)   11.1 (1.6)   11.9 (1.6)   12.7 (3.0)   73.6 ( 8.3)

Total    8.9 (4.3)    7.4 (4.1)    7.7 (3.7)    7.4 (3.4)    7.6 (3.9)    7.7 (4.0)   46.5 (21.7)

Table 5.3
Distribution Characteristics of the Standardized Scores in the Weighted Norm Group

              Total           2-3 years       4-5 years       6-7 years
              Mean (SD)       Mean (SD)       Mean (SD)       Mean (SD)

Patterns       10.0 ( 2.9)     9.9 ( 2.8)     10.0 ( 2.9)     10.1 ( 3.1)
Mosaics        10.1 ( 3.0)    10.0 ( 3.0)     10.0 ( 3.1)     10.2 ( 3.0)
Puzzles        10.0 ( 3.0)    10.0 ( 2.9)     10.0 ( 3.0)     10.1 ( 3.0)
Situations     10.0 ( 2.9)    10.0 ( 2.8)      9.9 ( 3.1)     10.0 ( 2.8)
Categories     10.0 ( 2.9)    10.0 ( 2.9)     10.0 ( 3.0)     10.1 ( 2.9)
Analogies      10.0 ( 2.9)    10.0 ( 2.7)     10.0 ( 3.0)      9.8 ( 3.1)

SON-PS        100.2 (15.1)   100.1 (15.2)     99.9 (15.0)    100.6 (15.2)
SON-RS         99.9 (15.0)   100.1 (14.5)    100.0 (15.6)    100.0 (14.9)

SON-IQ        100.1 (15.0)   100.1 (14.8)     99.9 (15.2)    100.5 (15.0)


These results indicate that the standardization model is adequate and gives a good estimate of the distribution of the scores in the population; the deviations in the samples can be seen as chance deviations from the population values resulting from sample fluctuations.

'Floor' and 'ceiling' effects
Although the standardization of the subtest scores was based on a distribution with a range from 1 to 19, these scores could not be obtained in all age groups. The youngest children had raw scores of zero so often that the standardized scores were substantially higher than 1. This means that, at this age, the test differentiates less for children with a low performance level. The first part of table 5.4 presents, for a few age ranges, the standardized scores in the situation where the child receives no positive scores. In the age range 2;0 to 2;6 years, considerable 'floor' effects can be seen. From 2;9 years onwards these effects are much smaller. The lowest possible standard subtest scores are about two standard deviations below the mean of 10 and the lowest IQ score that can occur is 51. From 3;6 years onwards, no 'floor' effects occur.

In the second part of table 5.4 the standardized scores are presented for the situation in which all the items are done correctly. From the age of about 6;0 onwards, small 'ceiling' effects can be observed. From 7;0 years onwards, these effects become more important and the maximum IQ score of 150 can no longer be reached.

5.2 RELIABILITY AND GENERALIZABILITY

Reliability of the subtests
The reliability of the subtests is based on the internal consistency of the item scores. The reliability was calculated using the formula for lambda-2 (Guttman, 1945). However, an assumption made by this and similar formulas for internal consistency is that the item scores are obtained independently. The sequence in which the items are administered should therefore have no effect on the scores.

Table 5.4
Floor and Ceiling Effects at Different Ages

Floor Effect (lowest possible standardized score)

Age    Pat   Mos   Puz   Sit   Cat   Ana    PS    RS    IQ

2;0      9     6     4     8     9     7    70    86    73
2;3      8     6     3     7     8     6    68    80    68
2;6      6     5     3     5     7     5    62    72    63
2;9      4     4     2     4     5     3    52    61    51
3;0      3     3     2     3     3     2    52    52    50
3;3      1     2     1     2     2     1    50    50    50
3;6      1     1     1     1     1     1    50    50    50

Ceiling Effect (highest possible standardized score)

Age    Pat   Mos   Puz   Sit   Cat   Ana    PS    RS    IQ

5;0     19    19    19    19    19    19   150   150   150
5;6     19    19    19    19    18    19   150   150   150
6;0     18    18    18    18    18    19   149   150   150
6;6     16    17    17    17    17    18   141   149   150
7;0     15    16    16    16    16    17   137   140   143
7;6     14    15    16    16    16    16   132   138   139
8;0     13    14    15    16    15    15   126   134   133


In the case of the SON-R 2½-7, this condition is not fulfilled for two reasons. First, the entry and discontinuation rules mean that scores on some items determine whether other items are or are not administered. The latter items are, however, scored as 'correct' or 'incorrect'. When item scores become interdependent in this way, reliability is inflated. In the case of the SON-R 5½-17, where this was investigated, the mean overestimation of the reliability of the subtests as a result of the adaptive procedure was .11 (Snijders, Tellegen & Laros, 1989, p. 46-51). The item scores are not independent for a second reason. After every item that a child cannot solve independently, extensive help and feedback are given. This often leads to the next, more difficult item being solved correctly. These inconsistencies, which have a valid cause, lead to an underestimation of reliability.

The net effects of the underestimation of reliability (as a result of valid inconsistencies) on the one hand, and the overestimation of reliability (as a result of artificial consistencies) on the other hand, cannot be determined. Therefore, the reliability of the subtests of the SON-R 2½-7 was based on the formulas for internal consistency and no correction for under- or overestimation was applied. The uncertainty about the correctness of the estimate of reliability is a reason to be reticent about the individual interpretation of results at the subtest level. It was also the reason why the standardized subtest scores were not presented, as was done with the SON-R 5½-17, in a form that takes the reliability of the scores into account.

Table 5.5
Reliability, Standard Error of Measurement and Generalizability of the Test Scores

Reliability

Age    Pat   Mos   Puz   Sit   Cat   Ana   Mean     PS    RS    IQ

2;6    .79   .41   .45   .79   .81   .75    .67    .68   .89   .86
3;6    .73   .76   .75   .66   .73   .73    .73    .86   .84   .90
4;6    .72   .77   .75   .62   .70   .74    .72    .88   .81   .90
5;6    .74   .74   .70   .62   .68   .78    .71    .87   .81   .90
6;6    .76   .78   .69   .66   .68   .83    .73    .87   .84   .91
7;6    .79   .84   .69   .69   .69   .85    .76    .88   .86   .92

Mean   .75   .73   .69   .67   .71   .78    .72    .85   .84   .90

Standard Error of Measurement

Age    Pat   Mos   Puz   Sit   Cat   Ana     PS    RS    IQ

2;6    1.4   2.3   2.2   1.4   1.3   1.5    8.5   5.0   5.6
3;6    1.6   1.5   1.5   1.7   1.6   1.6    5.6   6.1   4.7
4;6    1.6   1.5   1.5   1.9   1.7   1.5    5.3   6.6   4.7
5;6    1.5   1.5   1.6   1.8   1.7   1.4    5.4   6.5   4.7
6;6    1.5   1.4   1.7   1.8   1.7   1.2    5.4   6.0   4.5
7;6    1.4   1.2   1.7   1.7   1.7   1.2    5.2   5.5   4.2

Generalizability                     Standard Error of Estimation

Age     PS    RS    IQ               Age      PS    RS    IQ

2;6    .45   .74   .71               2;6    11.1   7.7   8.0
3;6    .67   .66   .77               3;6     8.7   8.8   7.1
4;6    .77   .57   .78               4;6     7.3   9.8   7.1
5;6    .78   .56   .78               5;6     7.0   9.9   7.0
6;6    .75   .63   .80               6;6     7.5   9.1   6.7
7;6    .71   .71   .82               7;6     8.1   8.1   6.4

Mean   .69   .64   .78


The calculated values of lambda-2 have been fitted in the standardization model as a function of age. The results for a number of ages are presented in table 5.5. The mean reliability of the subtests is .72; it increases, though not regularly, with age. Very low reliabilities were found for Mosaics and Puzzles at the age of 2;6 years. A learning effect may occur with these subtests at a young age when help is offered, and this may result in an underestimation of reliability.

In the second part of table 5.5 the standard errors of measurement are presented. The standard error of measurement is the standard deviation of the standardized scores that would be received by an individual child if the subtest could be administered to him or her many times. It indicates how strongly the test results of a child can fluctuate. Section 13.4 describes how to use the standard error of measurement to test the differences between scores statistically.
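The values in table 5.5 are consistent with the classical formula SEM = SD × √(1 − reliability); a one-line check in Python:

```python
# Standard error of measurement from the scale SD and the reliability.
def sem(sd, reliability):
    return sd * (1.0 - reliability) ** 0.5

print(round(sem(3, 0.79), 1))     # subtest with reliability .79          -> 1.4
print(round(sem(15, 0.86), 1))    # IQ score at age 2;6 (reliability .86) -> 5.6
```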

Reliability of the total scores
The reliability of the Performance Scale, the Reasoning Scale and the SON-IQ was calculated using the formula for stratified alpha. This is a formula for the reliability of linear combinations (Cronbach, Schönemann & McKie, 1965; Nunnally, 1978, p. 246-250). The reliability of the IQ score had a mean of .90. Reliability increased with age, from .86 at 2;6 years to .92 at 7;6 years. The standard error of measurement of the IQ decreased from 5.6 at 2;6 years to 4.2 at 7;6 years (see table 5.5).
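Stratified alpha treats the composite's error variance as the sum of the error variances of its parts. The sketch below is an illustration only (assuming Python with numpy and illustrative inputs), not the scripts used for the manual.

```python
# Stratified alpha for a composite of subtests (Cronbach, Schönemann & McKie, 1965).
import numpy as np

def stratified_alpha(subtest_scores, reliabilities):
    """subtest_scores: 2-D array (children x subtests); reliabilities: per subtest."""
    var_parts = subtest_scores.var(axis=0, ddof=1)          # variance of each subtest
    var_total = subtest_scores.sum(axis=1).var(ddof=1)      # variance of the composite
    error_var = np.sum(var_parts * (1.0 - np.asarray(reliabilities)))
    return 1.0 - error_var / var_total
```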

The mean reliability of the Performance Scale was .85 and the mean reliability of the Reasoning Scale .84. In general, the reliability of the Performance Scale was higher. The youngest children formed an exception: in this group, the reliability of the Reasoning Scale was clearly higher than the reliability of the Performance Scale.

The scores on the Performance Scale and the Reasoning Scale were strongly correlated. In the entire norm group the correlation was .56. In the age groups two and three, four and five, and six and seven years, the correlations were .52, .55 and .61 respectively. The correlation between the two scales decreased the reliability of the difference between the Performance Scale and the Reasoning Scale. The mean reliability of the difference score was .65. The minimum difference between the two scores required for significance at the 1% and 5% level is shown in the norm tables.
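The reported value of .65 is consistent with the classical formula for the reliability of the difference between two equally scaled scores:

```python
# Reliability of a difference score from the two scale reliabilities and their correlation.
def reliability_of_difference(rel_a, rel_b, r_ab):
    return ((rel_a + rel_b) / 2.0 - r_ab) / (1.0 - r_ab)

print(round(reliability_of_difference(0.85, 0.84, 0.56), 2))   # -> 0.65
```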

Generalizability of the total scores
The generalizability of the IQ and the two scale scores was also determined. This shows how well one can generalize, on the basis of the selected subtests, to the total domain of comparable subtests. The generalizability was calculated using the formula for coefficient alpha with subtest scores instead of item scores as the unit of analysis. For homogeneous (sub)tests, alpha, as a measure of internal consistency, can also be used as an estimate of reliability. However, coefficient alpha has a different meaning for a total score based on subtest scores, each of which has its own specific reliable variance. In this case, it can be interpreted as a measure of generalizability. The six subtests of the SON-R 2½-7 can be considered a sample from the domain of similar nonverbal subtests. Alpha represents the expected correlation of the IQ score with the total score on a different, same-sized combination of subtests from the domain. The square root of alpha is the correlation of the IQ score with the hypothetical test score that would be expected if a large number of similar nonverbal subtests had been administered. The same applies for the Performance Scale and the Reasoning Scale; however, here the domain of subtests is limited to similar performance or reasoning tests.

The mean generalizability coefficient (α) of the SON-IQ was .78. It increased from .71 at 2;6 years to .82 at 7;6 years. The mean generalizability for the Performance Scale was .69 (relatively high for the middle age groups) and for the Reasoning Scale .64 (relatively high for the extreme age groups).

In table 5.5 the standard errors of estimation, based on the generalizability coefficient, are also presented. The standard error of estimation for the IQ represents the standard deviation of the distribution of IQ scores of all subjects with the same SON-IQ that would be found if a large number of subtests were administered. The greater the dispersion, the less accurate are the statements about 'the' level of intelligence based on these test results.


The standard error of estimation was used to construct the interval in which the 'domain score' will, with a certain probability, be found. This interval is not situated symmetrically around the obtained score. When the point of departure is the distribution of the scores in the norm population, the middle of the interval equals 100 + √α(IQ-100). The standard error of estimation equals 15√(1-α). In the norm tables, this interval is presented for each IQ score with a margin of 1.28 times the standard error of estimation on either side of the midpoint. This means that the probability that the 'domain score' is in the interval is 80%. When the computer program is used, these intervals are also presented for the Performance Scale and the Reasoning Scale.
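Putting the two expressions together, the 80% 'domain score' interval can be sketched as follows (an illustration only; the IQ value in the example is hypothetical):

```python
# 80% interval for the 'domain score': midpoint 100 + sqrt(alpha)*(IQ-100),
# standard error of estimation 15*sqrt(1-alpha), margin 1.28 standard errors.
def domain_score_interval(iq, alpha, z=1.28):
    mid = 100.0 + alpha**0.5 * (iq - 100.0)
    se_est = 15.0 * (1.0 - alpha) ** 0.5
    return mid - z * se_est, mid + z * se_est

# Example: IQ 120 with the mean generalizability of .78
print(domain_score_interval(120, 0.78))    # roughly (108.7, 126.7)
```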

For individual assessments, the interval gives a good indication of the accuracy with which a statement, based on the test results, can be made about the level of intelligence. The interval is broader than the intervals that are based, as is customary, on the reliability of the test. When interpreting the results of an intelligence test, one will, in general, not want to limit oneself to the specific abilities included in the test. The interval based on generalizability takes into account both the fact that the number of items per subtest is necessarily limited and the fact that the choice of the subtests is itself a limitation.

Given the problems in correctly determining the reliability of the subtests of the SON-R 2½-7, it is fortunate that the calculation of the generalizability of the total scores depends exclusively on the number of subtests and the strength of the correlations between the subtests, and not on the reliability of the subtests.

Comparison with the Preschool SON and the SON-R 5½-17
The reliability and generalizability of the IQ score of the SON-R 2½-7 were compared with the previous version of the test, the Preschool SON, and with the revision of the SON for older children, the SON-R 5½-17. In the manual for the Preschool SON, reliabilities based on calculations over combined age groups were presented. The combination of age groups leads to a high overestimation of reliability. Therefore, new calculations were carried out on the original normalization material, and the reliability and the generalizability for homogeneous age groups were determined (Tellegen et al., 1992).

The reliability and the generalizability of the SON-R 2½-7 were greatly improved with respect to the Preschool SON. This is especially so for the more extreme age groups. However, an improvement can also be seen for the four-year-olds, for whom the reliability and generalizability of the old Preschool SON were highest (table 5.6).

Table 5.6
Reliability and Generalizability of the IQ Score of the Preschool SON, the SON-R 2½-7 and the SON-R 5½-17

             Reliability                             Generalizability
Age          P-SON   SON-R 2½-7   SON-R 5½-17       P-SON   SON-R 2½-7   SON-R 5½-17

2;6 years     .78       .86            –             .54       .71            –
3;6 years               .90            –             .69       .77            –
4;6 years     .86       .90            –             .74       .78            –
5;6 years     .82       .90           .90            .71       .78           .79
6;6 years               .91           .92            .62       .80           .81
7;6 years      –        .92           .93            .52       .82           .83


In comparison with the SON-R 5½-17, the results of similar age groups for reliability and generalizability are practically the same. However, for the total age range of the SON-R 5½-17, the mean reliability (.93) and the generalizability (.85) are higher than for the SON-R 2½-7.

5.3 RELATIONSHIPS BETWEEN THE SUBTEST SCORES

The relationship between the test scores was examined using the correlations between the subtests and the correlations of each subtest with the sum of the remaining subtests.

Correlations between the subtests
The correlations between the standardized subtest scores for the entire norm group and for three age groups are presented in table 5.7. The mean correlation in the entire group was .36. The strongest correlations were found between Patterns and Mosaics (.50) and between Puzzles and Mosaics (.45); the weakest correlations were those of Categories and Analogies with Puzzles (.30 and .28) and of Analogies with Situations (.31).

The mean correlations increased with age. In the youngest group the mean was .33, in the middle group .37, and in the oldest group .40. If we compare the oldest and youngest age groups, nearly all correlations appear to increase. The exception to the rule is Categories; the correlation of Categories with Patterns increased, but the correlations with the other four subtests decreased.

The increase in the correlations with age corresponds to the findings with the SON-R 5½-17. There the mean correlation in the age range 6;6 to 14;6 years increased from .38 to .51. The mean correlation with the SON-R 5½-17 for the six- and seven-year-olds was .39, almost equal to the mean correlation of .40 in the same age group with the SON-R 2½-7.

With the SON-R 2½-7, as with the SON-R 5½-17, the correlation between the performances on the different subtests increased with age. This also increased the reliability and generalizability of the SON-IQ for the older age groups.

Table 5.7
Correlations Between the Subtests

Age: 2-7 years
      Pat   Mos   Puz   Sit   Cat   Ana
Pat    –
Mos   .50    –
Puz   .39   .45    –
Sit   .35   .36   .34    –
Cat   .35   .36   .30   .39    –
Ana   .34   .37   .28   .31   .39    –

Age: 2-3 years
      Pat   Mos   Puz   Sit   Cat   Ana
Pat    –
Mos   .36    –
Puz   .24   .39    –
Sit   .33   .30   .31    –
Cat   .32   .39   .31   .51    –
Ana   .28   .31   .22   .29   .45    –

Age: 4-5 years
      Pat   Mos   Puz   Sit   Cat   Ana
Pat    –
Mos   .60    –
Puz   .50   .49    –
Sit   .33   .34   .32    –
Cat   .36   .34   .32   .33    –
Ana   .32   .40   .28   .28   .33    –

Age: 6-7 years
      Pat   Mos   Puz   Sit   Cat   Ana
Pat    –
Mos   .56    –
Puz   .44   .47    –
Sit   .41   .48   .38    –
Cat   .37   .33   .26   .33    –
Ana   .43   .39   .35   .36   .41    –


Correlation with the total score
The correlation of the subtests with the total score was examined by calculating the correlation with the unweighted sum of the five remaining subtests and the square of the multiple correlation of a subtest with the five remaining subtests (table 5.8). The latter indicates the proportion of variance explained by the optimally weighted combination of the other subtests.

For the entire norm group, Patterns and Mosaics correlated most strongly with the remaining total score. However, this was not the case in the youngest age group. For the two- and three-year-olds, Categories had the strongest correlation with the remaining total score (.59), but for the six- and seven-year-olds this correlation decreased to .46. In this age range, Categories had the weakest correlation with the remaining subtests.

About 70% of the variance of each subtest could not be explained by the scores on the other subtests. This is partially explained by the unreliability of the subtests. However, it also indicates that a substantial part of the reliable variance of each subtest is specific. The importance of the subtest-specific reliable variance decreased as the children grew older.

5.4 PRINCIPAL COMPONENTS ANALYSIS

In order to determine how many dimensions can be distinguished meaningfully when interpreting the test results, a Minimum Rank Factor Analysis was first carried out for the entire norm group (Ten Berge & Kiers, 1991). This method was used to determine how many factors were required to explain the common variance of the variables. One factor explained 87% of the common variance, two factors explained 97% and three factors explained 100%. The third factor added little to the explained variance, and after rotation only one subtest had a high loading on it. As a result, further analyses were based on a solution with two factors.

In the first part of table 5.9 the results of the Principal Components Analysis for the entire norm group and for the three age groups are presented. In the entire norm group, 60% of the total variance is explained by the first two components. The percentage increases slightly, to 64%, in the age groups. The total variance includes the subtest-specific reliable variance and the error of measurement variance of the subtests. Therefore, the percentages of explained variance are lower than for the minimum rank factor analysis, which determines which part of the common variance is explained.

In the entire norm group the loadings on the rotated components showed a clear distinction between the performance subtests (Patterns, Mosaics and Puzzles) and the reasoning subtests (Situations, Categories and Analogies). This distinction was also seen in the middle age group. In the youngest group, however, Patterns had an equally high loading on both components, whereas in the oldest group, Situations, like the performance tests, had its highest loading on the first component.
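The kind of analysis summarized in the second part of table 5.9 can be reproduced in outline from a correlation matrix. The sketch below is an illustration only, not the analysis scripts used for the manual; it assumes Python with numpy, uses the 2-7 year correlation matrix of table 5.7, and the signs and column order of the rotated loadings are arbitrary.

```python
# Two principal components from a subtest correlation matrix, varimax-rotated.
import numpy as np

def varimax(loadings, gamma=1.0, max_iter=100, tol=1e-6):
    p, k = loadings.shape
    rotation = np.eye(k)
    var_old = 0.0
    for _ in range(max_iter):
        rotated = loadings @ rotation
        u, s, vt = np.linalg.svd(loadings.T @ (rotated**3 -
                    (gamma / p) * rotated @ np.diag((rotated**2).sum(axis=0))))
        rotation = u @ vt
        if s.sum() < var_old * (1 + tol):
            break
        var_old = s.sum()
    return loadings @ rotation

corr = np.array([[1.00, .50, .39, .35, .35, .34],     # Pat
                 [ .50, 1.00, .45, .36, .36, .37],    # Mos
                 [ .39, .45, 1.00, .34, .30, .28],    # Puz
                 [ .35, .36, .34, 1.00, .39, .31],    # Sit
                 [ .35, .36, .30, .39, 1.00, .39],    # Cat
                 [ .34, .37, .28, .31, .39, 1.00]])   # Ana

eigvals, eigvecs = np.linalg.eigh(corr)
order = np.argsort(eigvals)[::-1][:2]                 # two largest components
loadings = eigvecs[:, order] * np.sqrt(eigvals[order])
print(np.round(varimax(loadings), 2))                 # performance vs. reasoning split
```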

Table 5.8
Correlations of the Subtests with the Rest Total Score and the Square of the Multiple Correlations

      Correlation with Rest Total       Square of the Multiple Correlation
      2-7 yrs   2-3    4-5    6-7       2-7 yrs   2-3    4-5    6-7

Pat     .56     .44    .61    .63         .33     .20    .43    .41
Mos     .59     .52    .63    .63         .37     .28    .45    .43
Puz     .50     .42    .55    .53         .27     .20    .33    .30
Sit     .49     .51    .45    .54         .25     .31    .20    .30
Cat     .51     .59    .47    .46         .27     .40    .23    .24
Ana     .47     .45    .45    .54         .24     .24    .22    .30


To determine how important the differences in loadings between the three age groups were, a Simultaneous Components Analysis was carried out on these data sets (Millsap & Meredith, 1988; Kiers & Ten Berge, 1989). This was done to examine whether a uniform solution of component weights explained (substantially) less of the variance than the solutions that were optimal for the separate age groups. The analysis with the SCA program (Kiers, 1990) showed that this was not the case: the uniform solution over the three age groups explained 61.1% of the variance and the separate optimal solutions explained 61.4% of the variance. Also important was the fact that the simple weights, being 1 or 0 (depending on the scale to which the subtest belongs), were almost as effective as the optimal uniform solution. Using simple weights, as is done in the construction of the Performance Scale and the Reasoning Scale, the percentage of explained variance was 60.8%.

Table 5.9
Results of the Principal Components Analysis in the Various Age and Research Groups

Eigenvalue and Percentage of the Explained Variance by the first two Main Components

       2-7 years     2-3 years     4-5 years     6-7 years

F1     2.8   47%     2.7   45%     2.9   48%     3.0   50%
F2      .8   13%      .8   14%      .8   14%      .8   14%

Loadings on the first two Varimax-Rotated Components

       2-7 years     2-3 years     4-5 years     6-7 years
       F1    F2      F1    F2      F1    F2      F1    F2

Pat    .72   .29     .44   .43     .82   .23     .69   .37
Mos    .75   .29     .72   .30     .78   .31     .79   .23
Puz    .80   .12     .85   .07     .78   .19     .79   .07

Sit    .35   .59     .30   .65     .22   .68     .65   .29
Cat    .17   .80     .25   .78     .20   .74     .13   .88
Ana    .18   .75     .05   .78     .21   .70     .35   .70

       Boys          Girls         low SES       high SES
       F1    F2      F1    F2      F1    F2      F1    F2

Pat    .84   .13     .71   .31     .81   .15     .62   .39
Mos    .79   .28     .72   .33     .76   .24     .70   .31
Puz    .59   .38     .84   .07     .76   .10     .84   .08

Sit    .24   .70     .39   .55     .36   .59     .23   .71
Cat    .16   .79     .16   .81     .00   .88     .12   .84
Ana    .25   .66     .17   .75     .46   .46     .33   .64

       Immigrant     Tested outside    Gen./Perv.       Speech/language/
                     the Netherlands   Dev. Disorder    Hearing Disorder
       F1    F2      F1    F2          F1    F2         F1    F2

Pat    .90   .03     .85   .28         .80   .37        .78   .28
Mos    .71   .39     .82   .36         .80   .33        .79   .26
Puz    .52   .34     .75   .39         .82   .24        .82   .18

Sit    .10   .87     .30   .78         .42   .66        .57   .32
Cat    .22   .72     .41   .71         .29   .80        .28   .79
Ana    .38   .60     .28   .80         .23   .78        .24   .83


In the second part of table 5.9, the loadings on the first two components are shown for different samples of the norm group. These are the boys (N=561), the girls (N=563), and the children whose parents had either a low (N=233) or a high SES level (N=202). The SES level and its correlation with the test performances are described in section 6.6. In the four groups the loadings of the subtests are consistent with a distinction between performance and reasoning tests, with one exception: the loading of Analogies was the same for both components for the children with a low SES level.

The last part of table 5.9 presents the component loadings for a number of groups who were not, or only partially, tested in the context of the standardization research. The first group consisted of immigrant children (N=118). These were children who lived in the Netherlands and whose parents were both born abroad. About two thirds of this group was tested in the context of the standardization research; the remaining one third was tested at primary schools in the context of the validation research (see chapter 8). The second group consisted of children who were tested in other countries (N=440). This research was conducted in Australia, the United States of America and Great Britain, mainly with children without specific problems or handicaps, although some children with impaired hearing, bilingual children and children with a learning handicap were included (see sections 9.5, 9.6 and 9.7). The third and fourth groups consisted of children with specific problems and handicaps, who were examined in the Netherlands in the context of the validation of the test (see chapter 7). The third group consisted of children with a general developmental delay and children with a pervasive developmental disorder (N=328). The fourth group consisted of children with a language/speech disorder, children with impaired hearing and/or deaf children (N=346). In these four groups, with one exception, the loadings on the first two rotated components corresponded to the distinction between performance and reasoning tests. In the group of children with language/speech disorders and/or impaired hearing, the subtest Situations had its highest loading on the first, performance, component.

The distinction made by the SON-R 2½-7 between the Performance Scale and the Reasoning Scale is supported to a large extent by these results in very different groups. Though the reliability of the difference between scores on the two scales is moderate, this distinction is the most relevant one for the intra-individual interpretation of the test results. The empirical validity of the distinction between the Performance Scale and the Reasoning Scale will be discussed in section 9.9.

5.5 STABILITY OF THE TEST SCORES

Correlations and means
The SON-R 2½-7 was administered a second time to a sample of 141 children who had participated in the standardization research. The mean interval between administrations was 3.5 months, with a standard deviation of 21 days. The age of the children varied from 2;3 to 7;4 years. The mean age at the first administration was 4;6 years with a standard deviation of 1;5 years. The number of boys and girls was almost equal.

The correlations between the scores at each administration, and the mean and standard deviation of the scores, are presented in table 5.10. If the standard deviation of the scores of the first administration was different from the standard deviation in the norm population, the correlations were corrected (see Guilford & Fruchter, 1978).
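A classical correction of this kind rescales the observed correlation by the ratio of the population SD to the sample SD. The sketch below is an illustration only (the numbers in the example are hypothetical), not necessarily the exact variant used for table 5.10.

```python
# Correction of a correlation for restriction of range (cf. Guilford & Fruchter, 1978).
def correct_for_range(r, sd_sample, sd_pop):
    k = sd_pop / sd_sample
    return r * k / (1.0 - r**2 + (r**2) * (k**2)) ** 0.5

# Example: observed r = .76, sample SD 13.7 on the first administration, population SD 15
print(round(correct_for_range(0.76, 13.7, 15.0), 2))   # -> about .79
```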

The test-retest correlation for the IQ score was .79. For the Performance Scale and the Reasoning Scale, it was .74 and .69 respectively, and for the subtests .57 on average. The stability was relatively high for Mosaics and Categories (both .64) and relatively low for Situations and Analogies (.48 and .49 respectively).

The test-retest correlations for all the test scores are clearly lower than the reliability based on internal consistency. This indicates that changes in performance occur which cannot be attributed to errors of measurement. In chapter 10 the significance of this will be discussed.

Performances on all subtests were, on average, better during the second administration. The increase in standardized scores (both times based on the exact age) varied from .5 (Analogies) to 1.2 (Situations). The scores on the Performance Scale and the Reasoning Scale increased by more than 5 points. The IQ score increased, on average, by 6 points. All differences in mean scores were significant at the 1% level, except for the subtests Patterns and Analogies.

A distinction was made between the children who were younger than 4;6 years at the first administration (mean age 3;4 years; N=67) and children who were older (mean age 5;7 years; N=74). In the younger group the test-retest correlation for the IQ was .78, in the older group .81. The correlation for the Reasoning Scale decreased slightly with age (from .71 to .69). For the Performance Scale it increased clearly (from .65 to .80). The increase in the mean IQ in both groups was practically equal.

Profile analysis
A profile analysis was carried out to determine the meaning of the intra-individual differences between the subtest scores of a single subject. One of the characteristics of the profile is the dispersion of the scores. This was calculated as the standard deviation of the six scores (the square root of the mean square of the deviations of the six subtest scores from the individual mean). In the entire norm group the mean of the dispersion was 2.0. For 24% of the children, the intra-individual dispersion was 2.5 or higher, and for 9% the dispersion was 3.0 or higher.

The mean individual dispersion for the 141 children who were tested twice with the SON-R 2½-7 was 2.0 on both occasions. Remarkably, the correlation between the dispersion on the first and second administration was weak (.17) and not significant.

Another important characteristic of the profile is the relative position of the subtest scores. To determine whether this was stable, the six subtest scores from the first administration were correlated, for each child, with the six scores of the second administration. The mean correlation was .32. The strength of the correlation depends very much on the dispersion of the scores on the first administration. Clearly, if the differences are small, they are determined largely by errors of measurement and are therefore unstable. Where the dispersion on the first administration was less than 2.0 (N=69), the mean correlation was .22; where the dispersion was 2.0 to 3.0, the mean correlation was .38, and for the twelve children who had a dispersion of 3.0 or more, the mean correlation was .61. This indicates that the differences between the subtest scores must be substantial before we can conclude that they will remain stable over a period of some months. When the computer program is used, the dispersion is calculated and printed.
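The two profile characteristics used here are easy to compute; the sketch below is an illustration only (assuming Python with numpy) and reproduces example D of table 5.11.

```python
# Intra-individual dispersion (SD of the six scaled scores around the child's own
# mean) and the correlation between the profiles of two administrations.
import numpy as np

def profile_dispersion(scores):
    return np.std(scores)                           # SD around the child's own mean

def profile_correlation(first, second):
    return np.corrcoef(first, second)[0, 1]

# Example D from table 5.11 (Pat, Mos, Puz, Sit, Cat, Ana)
first  = np.array([14,  9, 18, 10, 15, 12])
second = np.array([12, 12, 17, 12, 13, 11])
print(round(profile_dispersion(first), 1))                  # 3.1
print(round(profile_correlation(first, second), 2))         # .78
```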

The difference between the scores on the Performance Scale and the Reasoning Scale in the first administration correlated .46 with the difference between the two scores in the second administration. For the children younger than 4;6 years, the correlation was .43 and for the older children it was .50.

Table 5.10
Test-Retest Results with the SON-R 2½-7 (N=141)

                      Admin. I         Admin. II
               r      Mean (SD)        Mean (SD)       Difference

Patterns      .56     10.2 ( 3.0)      10.8 ( 2.6)        .6
Mosaics       .64     10.6 ( 2.8)      11.6 ( 3.1)       1.0
Puzzles       .60     10.2 ( 2.9)      11.4 ( 2.8)       1.1
Situations    .48     10.4 ( 2.5)      11.6 ( 3.1)       1.2
Categories    .64     10.5 ( 2.8)      11.2 ( 2.9)        .7
Analogies     .49     10.6 ( 2.9)      11.1 ( 3.0)        .5

SON-PS        .74    102.5 (14.3)     107.9 (14.3)       5.5
SON-RS        .69    103.5 (13.7)     108.7 (15.2)       5.2

SON-IQ        .79    103.4 (13.7)     109.4 (14.7)       6.0

Note: the correlations have been corrected for the variance in the first administration.


As an example, the scores of a few children on the two administrations are presented in table 5.11. The dispersion and the correlation between the six scores are also shown. The examples illustrate that important changes can take place in the intra-individual order of the subtest scores.

Table 5.11
Examples of Test Scores from Repeated Test Administrations (I and II)

              Example A     Example B     Example C     Example D
               I     II      I     II      I     II      I     II

SON-IQ         97    108     109   116     106   110     121   120

SON-PS        100    105     108   110     100    94     122   123
SON-RS         93    113     107   118     113   126     116   113

Patterns       11     14      12    10      11     9      14    12
Mosaics         8     11      13    17       9     9       9    12
Puzzles        11      7       9     8      10     9      18    17
Situations      9     14      13    13       8    13      10    12
Categories      9     12      12    11      14    16      15    13
Analogies       9     10       8    14      14    13      12    11

Dispersion    1.1    2.4     2.0   2.9     2.3   2.7     3.1   2.0
Correlation     –.18           .32           .56           .78


6 RELATIONSHIPS WITH OTHER VARIABLES

In this chapter the relationship is discussed between test performance and a number of variables that are important in order to judge the validity of the test. The analyses are based on the results of the standardization research. Other tests were also administered to a large number of the children in order to validate the SON-R 2½-7; the results are described in chapter 9. A comparison is made in section 9.11 between the SON-R 2½-7 and other tests, with respect to their relationship with a number of variables that are discussed in this chapter, i.e. SES index, parents' country of birth, evaluation by the examiner, and the school's evaluation of language skills and intelligence.

6.1 DURATION OF TEST ADMINISTRATION

In general the test was administered in one session, with short breaks if necessary. In the case of 9% of the children a break of longer than a quarter of an hour was taken, usually due to school recess or the end of the school day. In these cases the second part of the test was administered later in the day or on another day. The mean IQ score of the children to whom the test was administered in two parts did not deviate from the mean of the children to whom the test was administered in one session.

The duration of administration (including short breaks) had a mean of 52 minutes with a standard deviation of 12 minutes. For two-year-olds the duration of administration was shorter: in the age group of 2;3 years the mean duration of administration was 38 minutes and in the age group of 2;9 years it was 46 minutes. From three years onwards the mean was fairly constant at 54 minutes. In table 6.1 the frequency distribution of the duration of administration is presented both for the total norm group and for the two-year-olds and the older children as separate groups.

There was a significant positive correlation between the duration of administration and the IQ score. This relationship was especially strong for the two-year-olds (r=.52); for the older children the correlation was .34. The relation can be explained by the fact that children within each group who performed well completed more items on average.

Table 6.1
Duration of the Test Administration

Duration of the complete test (N=1124)

              2-7 years   2 yrs   3-7 yrs
≤ 40 min         16%       49%       9%
41 - 50 min      32%       30%      32%
51 - 60 min      32%       17%      36%
61 - 70 min      14%        3%      16%
> 70 min          6%        1%       7%

Mean duration in minutes (N=1014)

              Mean (SD)
Patterns       7.0 ( 3.0)
Mosaics       10.3 ( 3.9)
Puzzles        8.5 ( 3.1)
Situations     6.3 ( 2.3)
Categories     8.4 ( 3.3)
Analogies      8.8 ( 2.8)

Total         49.2 (10.7)


The duration of the administration of the separate subtests was known for 1014 children (table 6.1). Situations had the shortest duration of administration, with a mean of 6.3 minutes, and also the narrowest dispersion in duration. Mosaics had the longest duration of administration, with a mean of 10.3 minutes, and the widest dispersion.

The duration of administration was also recorded for children who participated in other validation research projects (see chapter 7). The mean duration (including short breaks) for these children, who had varying problems and handicaps in cognitive development and communication, was 57 minutes. This was 5 minutes longer than for the children in the standardization research. The duration of administration was relatively short for children with a general developmental delay (a mean of 53 minutes) and relatively long for deaf children (a mean of 66 minutes).

6.2 TIME OF TEST ADMINISTRATION

The influence of the time of administration on test results was examined in the standardization research. The largest part of the norm group was tested during the first twelve weeks of the school year 1993/94. For these 1065 children, the relationship was examined, using analysis of variance, between the IQ scores and the period of research (four consecutive periods of three weeks), the day of the week on which the test was administered, and the time of day at which the administration was started.

In table 6.2 the mean IQ scores for each category of these three variables are presented as a deviation from the total mean. Each variable was controlled for the effect of the other two variables. The largest differences in mean IQ scores were found for the variable starting time, but the effect was not significant (F[6,997]=1.26; p=.27).

Table 6.2
Relationship of the IQ Scores with the Time of Administration (N=1065)

Starting Time     N    dev.     Day of Week     N    dev.     Period     N    dev.

 8- 9 a.m.       108    1.9     Monday         198    –.8     I         305     .6
 9-10 a.m.       231     .3     Tuesday        287    –.1     II        302    –.2
10-11 a.m.       240    –.4     Wednesday      162    –.5     III       178     .6
11- 1 p.m.       139   –1.9     Thursday       262    1.2     IV        280    –.8
 1- 2 p.m.       162    –.6     Friday         156    –.3
 2- 3 p.m.       115    2.4
After 3 p.m.      70   –1.2

6.3 EXAMINER INFLUENCE

Eleven examiners tested most of the children in the standardization research. The scores of the different examiners were compared, while controlling for the sex of the children, the percentage of immigrant children (children whose parents had both been born abroad) and the SES index. In table 6.3 the deviations from the total mean are shown for the IQ score, after controlling for the other variables.

The beta coefficient, which indicates how strong the association is after controlling for the other variables, was .18 and clearly significant (F[10,1059]=4.09; p<.01). However, the differences between the examiners also occurred partially because of sample fluctuations. About one quarter of the variance in the mean scores of the examiners could be ascribed to this. However, even when this was taken into account, deviations of two to three IQ points, resulting from the manner of administration by the examiner or from other characteristics of the examiner, remained plausible.


The strength of the examiner influence showed no clear relation to age: the beta for the two- and three-year-olds was .23, the beta for the four- and five-year-olds was .18 and the beta coefficient for the six- and seven-year-olds was .28.

The mean IQ score of the children who were tested by the three male examiners was 2.2 points lower than the mean IQ score of the children who were tested by the female examiners. The p-value of the difference was .02. There was no interaction effect between the sex of the child and the sex of the examiner. The number of male examiners was too small to assess whether the difference between the male and the female examiners was based on their sex or whether it was caused by personal characteristics unrelated to their sex.

With the exception of Situations, the examiner influence was significant for the various subtests. The influence was greatest for Mosaics and Categories, the tests that are administered first. This may indicate that the differences between the examiners were related to the manner in which the child was put at ease and motivated at the beginning of the test administration.

6.4 REGIONAL AND LOCAL DIFFERENCES

The selection of the communities where the standardization research was carried out was stratified according to region, community size and degree of urbanization. The IQ scores as a deviation from the total mean are presented in table 6.4. In the first column the observed values are given. In the second column the means after controlling for sex, the SES index and the percentage of immigrant children are given. The entire group consists of 1102 children. The children from special schools and the immigrant children who were later added to the norm group were not included. The children whose SES index was not known were also not included.

Relatively few differences were found between regions or different sized communities. Both variables had a p-value of .04 for the differences. After controlling for the background of the children, the differences decreased and were no longer significant. The differences according to degree of urbanization (rural communities, urbanized rural communities, commuter communities and urban communities) were small before and after controlling for the other variables and were not significant.

The results for region, community size and degree of urbanization correspond to the findings of the standardization research of the SON-R 5½-17 (Snijders, Tellegen & Laros, 1989).

Table 6.3
Examiner Effects (N=1073)

Mean Scores as Deviation            Strength of the Examiner Effect before (eta) and
from the Total Mean                 after (beta) Controlling for other Variables

Examiner     N     dev.             Score           eta    beta

A           104   –4.81             Patterns        .15    .15
B            98   –3.14             Mosaics         .22    .20
C           115   –2.58             Puzzles         .14    .15
D            50   –2.45             Situations      .13    .12
E            92   –0.21             Categories      .21    .19
F            61   –0.20             Analogies       .16    .17
G            97    1.24
H           110    1.76             SON-PS          .17    .16
I           123    2.46             SON-RS          .18    .18
J           115    2.78
K           108    2.99             SON-IQ          .18    .18


An exception was that with the SON-R 5½-17, relatively high performances were found for children in commuter communities, both before and after controlling for other variables.

6.5 DIFFERENCES BETWEEN BOYS AND GIRLS

In table 6.5 the mean test scores of the boys (N=561) and the girls (N=563) from the norm group are presented. The differences that were significant at the 1% level using a t-test are marked with an asterisk. The girls performed significantly better than the boys on four subtests. The biggest differences were found with the abstract reasoning tests Categories and Analogies. Patterns was the only performance subtest in which a clear difference was found. The Performance Scale showed a sex difference of 1.7 points that was not significant. The difference of 4.6 points on the Reasoning Scale, however, was significant. The difference between the mean IQ scores was 3.5 points. This difference tended to decrease for older children. In the group of two- and three-year-olds the difference in IQ scores was 5.7 points, for the four- and five-year-olds it was 2.4 points and for the oldest children 2.1 points. However, the interaction effect between sex and age group was not significant (F[2,1118]=1.73; p=.17). A regression analysis showed that the interaction effect between the exact age and sex on the IQ score was also not significant.

Table 6.4
Regional and Local Differences (N=1102)

Deviations of the IQ Scores in relation to the Total Mean
I: without controlling for other variables
II: after controlling for other variables

Region          N      I     II     Community Size      N      I     II
                                    (x 1000)
North/East     342   –1.0   –.7     < 20               375    –.4     .4
South          212    2.3    1.5    < 100              489     .8     .3
West           548     .2   –.1     > 100              238   –2.1   –1.4

Degree of Urbanization          N      I     II
Rural Community                164     .4     .5
Urbanized Rural Community      250    –.6     .0
Commuter Community             183     .7    –.6
Urban Community                505    –.1     .1

Table 6.5
Relationship of the Test Scores with Sex (N=1124)

              Boys            Girls
Score         Mean (SD)       Mean (SD)       Difference      t

Patterns       9.7 ( 2.8)     10.3 ( 3.0)        0.5         3.06 *
Mosaics        9.9 ( 3.0)     10.1 ( 3.0)        0.2         1.08
Puzzles       10.0 ( 3.0)     10.0 ( 3.0)        0.0         –.04
Situations     9.7 ( 2.9)     10.2 ( 2.9)        0.5         2.83 *
Categories     9.7 ( 2.9)     10.4 ( 3.0)        0.7         3.82 *
Analogies      9.5 ( 2.9)     10.4 ( 2.8)        0.9         5.07 *

SON-PS        99.3 (14.9)    101.0 (15.2)        1.7         1.86
SON-RS        97.6 (14.8)    102.2 (14.9)        4.6         5.19 *

SON-IQ        98.4 (14.8)    101.9 (14.9)        3.5         3.94 *

*: p < .01 with two-tailed testing


On the basis of other research data, the decrease or disappearance of the difference between boys and girls with age is plausible. No sex difference was found during the standardization research of the SON-R 5½-17, in which 1350 children from 6;6 to 14;6 years were tested: the mean IQ score of the boys was 100.1 and that of the girls 100.0. During the American standardization research of the K-ABC (Kaufman & Kaufman, 1983), a positive difference of 4.4 points on the total score was found for girls in the age group from 2½ to 5 years, whereas this difference was only .2 in the age group from 5 to 12½ years. During the standardization of the GOS 2½-4½, the Dutch version of the K-ABC for young children, the total score for the girls proved to be 4.7 points higher than the total score for the boys (Neutel, Van der Meulen & Lutje Spelberg, 1996). The results with regard to sex-related differences found in the SON-R tests and the K-ABC are thus very similar. In the case of adolescents and adults, however, males appear to perform better on intelligence tests (Lynn, 1994).

6.6 SES LEVEL OF THE PARENTS

Education and occupation
The socio-economic level of the parents was based on their occupational and educational level. Information on the level of occupation was provided by the index of occupations of the Institute for Applied Sociology in Nijmegen (Van Westerlaak, Kropman & Collaris, 1975). If a parent was out of work, he or she was classified according to the last job held. The index of occupations distinguishes six levels. The categories 'unskilled worker' (e.g. grocery packer, construction worker) and 'skilled worker' (e.g. dockworker, animal keeper) were combined; the occupation 'housewife' also belongs to that category. The categories 'lower employee' (e.g. bartender, bank teller) and 'small independent businessman' (e.g. druggist, gardener) were also combined. Two more categories are distinguished: 'intermediate employee' (e.g. teacher, librarian) and 'professional' (e.g. psychologist, lawyer).

The level of education was based on the highest level that had been completed. These levels range from the lowest category, i.e. primary school, via the general secondary education stream, the higher general secondary education stream and the pre-university education stream, to the highest category, higher vocational education and university.

The occupational and educational level of both parents was known in the case of 1071 children of the norm group.

Table 6.6
Relationship of the IQ Score with the Occupational and Educational Level of the Parents (N=1071)

                                      Father                  Mother
Occupational Level                    Pct   Mean  (SD)        Pct   Mean  (SD)

0 (Un)Skilled Worker/Housewife        33%   96.1  (14.3)      39%   96.1  (14.8)
1 Lower Empl/Sm. Ind. Business        32%   99.6  (14.2)      44%   101.5 (14.0)
2 Intermediate Employee               19%   102.6 (14.8)      14%   106.3 (15.1)
3 Professional                        16%   108.4 (14.8)      3%    110.4 (14.7)

                                      Father                  Mother
Educational Level                     Pct   Mean  (SD)        Pct   Mean  (SD)

0 Primary Education                   7%    92.9  (12.5)      6%    92.4  (12.3)
1 General Secondary Education         38%   96.9  (14.7)      43%   96.6  (14.3)
2 Higher Gen. Secondary Education     29%   101.2 (13.4)      34%   102.7 (14.2)
3 Tertiary: Non-University            19%   104.7 (14.9)      14%   107.7 (14.5)
4 Tertiary: University                7%    111.6 (15.0)      3%    111.5 (15.8)


The table shows that the IQ score clearly increased with the occupational and educational level of the father and the mother. The correlation with the occupational level of the father was .28 and with the occupational level of the mother .27. The correlation with the educational level of the father was .31 and with the educational level of the mother .32. All these correlations were significant at the 1% level.

SES index and SES level
The SES index was based on a combination of the educational and occupational levels of the parents. The occupational level of the parent with the highest level was added to the educational level of both parents. If the occupational level of one parent was not known, the level of the other parent was used. If the educational level of one parent was not known, the educational level of the other parent was counted twice.
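This combination rule can be written as a short procedure. Below is a minimal sketch in Python, assuming occupational levels coded 0-3 and educational levels coded 0-4 as in table 6.6; the function and variable names are illustrative and are not taken from the original research files.

```python
from typing import Optional

def ses_index(occ_father: Optional[int], occ_mother: Optional[int],
              edu_father: Optional[int], edu_mother: Optional[int]) -> Optional[int]:
    """Combine parental occupation (0-3) and education (0-4) into the SES index.

    Rule as described in the text: the occupational level of the parent with
    the highest level is added to the educational level of both parents.  A
    missing occupational level is replaced by the other parent's level; a
    missing educational level is handled by counting the other parent's level twice.
    """
    occupations = [v for v in (occ_father, occ_mother) if v is not None]
    educations = [v for v in (edu_father, edu_mother) if v is not None]
    if not occupations or not educations:
        return None                       # the index cannot be computed
    if len(educations) == 1:
        educations = educations * 2       # count the known level twice
    return max(occupations) + sum(educations)

# Example: father intermediate employee (2) with non-university tertiary
# education (3); mother's occupation unknown, educational level 2.
print(ses_index(2, None, 3, 2))           # -> 2 + 3 + 2 = 7
```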

In all, the SES index of 1118 children in the norm group could be calculated. The mean was 4.8, the standard deviation 2.6. The SES index was also categorised according to level. Four levels were used: ‘low’, ‘below average’, ‘above average’ and ‘high’. The categories are referred to as SES levels.

The correlation of the SES index with the IQ score was .34. The Performance Scale had a correlation of .29 with the SES index, the Reasoning Scale a correlation of .31. Among the subtests small differences in strength of the correlation were found; Situations had the weakest correlation (.22) and Analogies had the strongest (.25). The differences between boys and girls were also slight; the correlations with the IQ score were .36 and .34 respectively.

The correlation between the SES index and the IQ score increased with age; for children of two and three years of age the correlation was .23, for children of four and five years of age it was .35 and for six- and seven-year-olds it was .46. In table 6.7 the mean IQ scores per SES level are presented for the entire group and for the three age groups separately. In the entire group the difference in mean IQ score between children from a low and from a high SES level was 15 points. In the youngest group this difference was 12 points and in the oldest group 19 points. The fact that the performances, especially of children with a high SES level, increased with age is remarkable. Analysis of variance, however, showed that the interaction effect between age group and SES level was not significant (F[6,1106]=1.23; p=.28). Regression analysis also showed that the interaction effect between the SES index and the age at which the test was taken was not significant.
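As an illustration of such an interaction test (not the software actually used for the standardization research), the age group × SES level interaction can be tested with a two-way analysis of variance; a minimal sketch in Python with statsmodels, using placeholder data and illustrative column names:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
import statsmodels.formula.api as smf

# Placeholder data standing in for the norm group: one row per child with an
# age group (three levels), an SES level (four levels) and an IQ score.
rng = np.random.default_rng(1)
n = 1118
df = pd.DataFrame({
    "age_group": rng.choice(["2-3", "4-5", "6-7"], size=n),
    "ses_level": rng.choice(["low", "below avg", "above avg", "high"], size=n),
    "iq": rng.normal(100, 15, size=n),
})

# Two-way ANOVA of the IQ score on age group and SES level; the interaction
# term corresponds to the F[6,1106] test reported in the text.
model = smf.ols("iq ~ C(age_group) * C(ses_level)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2).loc["C(age_group):C(ses_level)"])
```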

6.7 PARENTS’ COUNTRY OF BIRTH

In this section a short overview is given of the test performances of the immigrant children in the norm group. In chapter 8 the results of immigrant children will be discussed in detail. In table 6.8 the mean IQ scores are presented for three groups of children: native Dutch children (both parents born in the Netherlands), immigrant children (both parents born abroad), and a mixed group (one parent born abroad).

Table 6.7
Relationship of the IQ Score with the SES Level

                       Entire Group        2-3 years        4-5 years        6-7 years
                       (N=1118)            (N=396)          (N=409)          (N=313)
SES Level          Pct  Mean  (SD)         Mean  (SD)       Mean  (SD)       Mean  (SD)

1 Low              21%  92.6  (13.7)       93.6  (15.2)     92.6  (13.2)     92.0  (13.2)
2 Below Average    32%  98.4  (14.0)       98.9  (14.8)     98.0  (14.1)     98.4  (12.6)
3 Above Average    29%  102.8 (14.0)       101.6 (14.5)     102.1 (13.3)     105.7 (13.8)
4 High             18%  107.9 (14.9)       105.5 (12.5)     108.1 (17.1)     111.1 (13.9)


The eight immigrant children who were later added to the norm group are not included in this analysis. With reference to the country of birth of the immigrants, a distinction was made between the three most important groups, i.e. Surinam or the Antilles, Turkey and Morocco. The remaining countries were subdivided into Western (Europe, North America, Australia) and non-Western countries.

The mean IQ score of the immigrant children was 93.2, 7.5 points lower than the mean IQ of native Dutch children. The difference was significant at the 1% level. The mean IQ of the mixed group, with one foreign parent, was slightly higher than that of the native Dutch children. However, this difference was not significant. In the mixed and immigrant groups, the performances of the Turkish and Moroccan children were low; the performances of the Surinam and the Antillean children were above average in the mixed group and low in the immigrant group. The remaining Western children scored above average in both groups and the remaining non-Western children had an average score in both groups.

6.8 EVALUATION BY THE EXAMINER

After the test had been administered, the behavior of the child was evaluated by the examiner as to:
– motivation,
– concentration,
– cooperation with the examiner,
– comprehension of the directions.
The evaluation categories were ‘poor’, ‘mediocre’, ‘varying’ and ‘good’. The evaluations ‘mediocre’ and ‘varying’ were combined in the presentation of the results. In table 6.9, the frequency distribution of the evaluation is presented for three age groups, together with the mean IQ scores.

Clear differences in evaluation existed between the different ages. The evaluation ‘poor’ was rarely given on any aspect for children from four years onwards. Children two and three years of age received the rating ‘poor’ between 3% (motivation and cooperation) and 7% (concentration and comprehension of the directions) of the time. The mean evaluation of the four aspects was ‘good’ for 69% of the children two and three years of age, for 89% of children four and five years of age and for 96% of children six and seven years of age. In all age groups, problems with concentration were mentioned most frequently. In the youngest age group, comprehension of the directions was also evaluated as ‘mediocre’ or ‘varying’ fairly frequently.

In all three age groups the evaluation of concentration and of comprehension of directions correlated significantly with the IQ score. The correlations were strongest in the youngest age group.

Table 6.8
Relationship Between IQ and Country of Birth of the Parents (N=1116)

                      Both Parents            One Parent              Both Parents
                      Native Dutch            Foreign                 Foreign
Country of Birth      N    Mean  (SD)         N    Mean  (SD)         N    Mean  (SD)

The Netherlands       969  100.7 (14.9)       –                       –
Surinam/Antilles      –                       11   103.6 (12.8)       27   91.8  (13.5)
Turkey                –                       2    79.5  (13.4)       18   94.1  (13.0)
Morocco               –                       1    82                 21   88.8  (10.7)
Other Western         –                       27   103.0 (17.2)       3    107.3 (17.9)
Other Non-Western     –                       25   101.0 (14.5)       12   98.9  (16.9)

Total                 969  100.7 (14.9)       66   101.3 (15.7)       81   93.2  (13.8)


In this group the correlations of motivation and cooperation with intelligence were also significant.

The four evaluations were also combined. Zero was the lowest possible combined score (all four evaluations ‘poor’) and eight the highest possible combined score (all four evaluations ‘good’). The combined score, which gives an indication of how well the child responds to being tested, increased greatly with age until the age of four years. In the age groups of 2;3, 2;9, 3;3 and 3;9 the means were respectively 5.3, 6.3, 7.2 and 7.5. From four years onwards the mean gradually increased to 7.9 at the age of 7;3 years.
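The way the combined score is formed is not spelled out, but the 0-8 range implies a coding of 0, 1 and 2 points per aspect. A minimal sketch of that reading, with the coding marked as an assumption:

```python
# Implied coding per aspect: poor = 0, mediocre or varying = 1, good = 2
# (this coding is an assumption; it is not stated explicitly in the text).
RATING_POINTS = {"poor": 0, "mediocre": 1, "varying": 1, "good": 2}

def combined_rating(motivation: str, concentration: str,
                    cooperation: str, comprehension: str) -> int:
    """Sum of the four examiner ratings: 0 (all 'poor') to 8 (all 'good')."""
    return sum(RATING_POINTS[rating] for rating in
               (motivation, concentration, cooperation, comprehension))

print(combined_rating("good", "varying", "good", "good"))  # -> 7
```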

6.9 EVALUATION BY THE TEACHER

The teachers of the children tested at school were asked, at the end of the school year, to evaluate them on a number of aspects. In general, a period of six to eight months existed between the test administration and the evaluation. At that time, the schools had not been informed about the test results. The evaluations were given by teachers of the classes one through four at 48 different schools. (Classes one and two correspond broadly with kindergarten in the American school system, and with preschool in the English school system. Class three corresponds to first grade or form, and class four to second grade or form of primary schools.)

Table 6.9
Relationship Between the Evaluation by the Examiner and the IQ

                       2-3 years (N=396)       4-5 years (N=413)       6-7 years (N=315)

Motivation             Pct   Mean  (SD)        Pct   Mean  (SD)        Pct   Mean  (SD)
Poor                   3%    90.3  (14.9)      1%    85.0  ( 4.2)      –
Mediocre/Varying       24%   98.2  (13.3)      9%    96.5  (13.6)      2%    106.9 (15.4)
Good                   73%   101.4 (15.0)      89%   100.4 (15.4)      98%   100.1 (14.9)
Correlation            .15*                    .12*                    –.07

Concentration          Pct   Mean  (SD)        Pct   Mean  (SD)        Pct   Mean  (SD)
Poor                   7%    89.3  (17.8)      1%    77.0  (11.9)      –
Mediocre/Varying       32%   97.0  (12.7)      17%   95.4  (12.4)      9%    91.8  (16.5)
Good                   61%   103.2 (14.5)      82%   101.2 (15.4)      91%   101.1 (14.5)
Correlation            .28*                    .21*                    .18*

Cooperation            Pct   Mean  (SD)        Pct   Mean  (SD)        Pct   Mean  (SD)
Poor                   3%    89.3  (14.7)      0.2%  87                0.3%  112
Mediocre/Varying       20%   97.8  (13.4)      7%    93.9  (12.4)      4%    89.5  ( 9.2)
Good                   77%   101.4 (14.9)      93%   100.3 (15.4)      96%   100.7 (14.9)
Correlation            .16*                    .11                     .11

Comprehension of
directions             Pct   Mean  (SD)        Pct   Mean  (SD)        Pct   Mean  (SD)
Poor                   7%    87.3  (11.7)      –                       –
Mediocre/Varying       27%   95.8  (14.1)      9%    91.4  (13.2)      3%    88.9  (13.9)
Good                   66%   103.5 (14.1)      91%   100.7 (15.2)      97%   100.6 (14.8)
Correlation            .33*                    .17*                    .14*

*: p < .01 with one-tailed testing


At all schools, an evaluation was requested of motivation, concentration and work tempo of the child, and of intelligence, motor development and language development. In classes 3 and 4 an evaluation of the level of reading, writing and arithmetic was also requested. The evaluation was given on a 5-point scale, ranging from ‘low’ via ‘average’ to ‘high’.

Table 6.10 presents the correlations between the schools’ evaluations of these characteristics and the Performance Scale, the Reasoning Scale and the SON-IQ. Correlations are presented for the entire group (N=616), and for the pupils of classes 1 and 2 (N=344, mean age 5;2 years) and the pupils of classes 3 and 4 (N=272, mean age 6;9 years) separately. All correlations were significant at the 1% level with one-tailed testing.

In classes 1 and 2, the evaluations of intelligence, concentration and language development had strong relationships with the IQ score (the correlations are .47, .47 and .44 respectively). The evaluations of motivation and work tempo also had a reasonably strong correlation with the IQ. The weakest correlation was found for the evaluation of motor development (r=.28). After a stepwise regression analysis, the multiple correlation of the evaluations of intelligence, concentration and language development with the IQ score was .53. The correlations of the evaluations with the Performance Scale were higher than with the Reasoning Scale, except for the evaluation of motor development where little difference was found.
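The multiple correlation reported here is the square root of R² from a regression of the IQ score on the selected evaluations (the stepwise selection of predictors is omitted in the sketch below). A minimal illustration in Python, with placeholder data and illustrative column names:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm

# Placeholder data: one row per pupil, three teacher evaluations on a 5-point
# scale and an IQ score.  Column names are illustrative, the values random.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "intelligence":  rng.integers(1, 6, 344),
    "concentration": rng.integers(1, 6, 344),
    "language":      rng.integers(1, 6, 344),
    "iq":            rng.normal(100, 15, 344),
})

# The multiple correlation of a set of evaluations with the IQ score is the
# square root of R-squared from the corresponding regression.
X = sm.add_constant(df[["intelligence", "concentration", "language"]])
fit = sm.OLS(df["iq"], X).fit()
print(round(np.sqrt(fit.rsquared), 2))
```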

In classes 3 and 4, the correlations of the IQ score with the evaluations of intelligence and language development were slightly weaker than in classes 1 and 2 (.44 and .42 respectively). The correlations with motivation, concentration and work tempo decreased more, as did the correlation with the evaluation of motor development. In classes 3 and 4 an evaluation was also given of the level of reading, writing and arithmetic. Of these, arithmetic had the highest correlation with the IQ score (r=.36). After stepwise regression analysis, the multiple correlation of the evaluations of intelligence, language development and writing skills with the IQ score was .48. In classes 3 and 4 the evaluation of writing skills clearly had a stronger correlation with the Performance Scale than with the Reasoning Scale; this was less so for arithmetic and work tempo. The other evaluations had stronger correlations with the Reasoning Scale.

For all classes combined, the correlation between the SON-IQ and the evaluation of intelligence was .46; the correlations with language development (r=.44) and with the evaluation of concentration by the teacher (r=.40) were also high.

In table 6.11 the correlations of the subtests with the teachers’ evaluations are presented. With the exception of the correlation between writing and Categories, all correlations were significant at the 1% level. Of all the subtests, Mosaics had the strongest correlation with the evaluation of intelligence (r=.38) and Situations the weakest (r=.24). Situations also had a weak correlation with the other evaluations.

Table 6.10
Correlations of the Total Scores with the Evaluation by the Teacher

                        Classes 1 and 2      Classes 3 and 4      Classes 1-4
                        (N=344)              (N=272)              (N=616)
Evaluation              PS   RS   IQ         PS   RS   IQ         PS   RS   IQ

Motivation              .34  .24  .34        .23  .28  .28        .30  .27  .32
Concentration           .45  .37  .47        .27  .31  .32        .37  .35  .40
Tempo                   .33  .27  .34        .26  .23  .27        .30  .26  .31

Intelligence            .44  .37  .47        .37  .42  .44        .42  .40  .46
Motor Development       .25  .24  .28        .17  .18  .19        .22  .22  .24
Language Development    .41  .34  .44        .36  .39  .42        .40  .37  .44

Reading                 –    –    –          .26  .29  .31        –    –    –
Writing                 –    –    –          .31  .23  .30        –    –    –
Arithmetic              –    –    –          .34  .31  .36        –    –    –


Patterns and Analogies had the strongest correlations with the evaluation of motor development. The three performance subtests correlated most highly with the evaluation of language development. Situations and Categories (multiple choice tests) correlated less strongly with the evaluations of motivation, concentration and work tempo than did the other subtests. Puzzles, Categories and Analogies had relatively strong correlations with the evaluation of reading, Patterns and Puzzles with the evaluation of writing, and Patterns and Analogies with the evaluation of arithmetic.

For the group as a whole, the evaluation of intelligence was ‘low’ for six children, ‘below average’ for 63 children, ‘average’ for 343 children, ‘above average’ for 107 children, and ‘high’ for 33 children. The mean IQ scores were 74.2, 89.2, 97.7, 107.0 and 114.6 respectively. This shows a difference of more than 40 IQ points between the children who were evaluated by the teacher, more than half a year after administration of the test, as being either ‘less’ or ‘highly’ intelligent.

Considering that the relationships examined here refer to subjective evaluations by a large number of different teachers, and not to standardized measurements of school achievement, the correlation between the evaluation of intelligence and the SON-IQ can certainly be called good.

Table 6.11
Correlations of the Subtest Scores with the Evaluation by the Teacher

Groups 1-4 (N=616)

Evaluation              Pat  Mos  Puz  Sit  Cat  Ana
Motivation              .25  .24  .25  .15  .21  .24
Concentration           .31  .30  .29  .24  .24  .30
Tempo                   .24  .27  .22  .14  .22  .22
Intelligence            .33  .38  .31  .24  .33  .33
Motor Development       .22  .14  .18  .12  .15  .21
Language Development    .33  .33  .32  .26  .29  .28

Groups 3 and 4 (N=272)

Evaluation              Pat  Mos  Puz  Sit  Cat  Ana
Reading                 .21  .17  .24  .18  .24  .24
Writing                 .28  .18  .28  .17  .13  .23
Arithmetic              .27  .33  .23  .19  .19  .30


7 RESEARCH ON SPECIAL GROUPS

In practice, intelligence tests are administered mainly to children with a cognitive developmental delay and to children with specific handicaps. Many of these children have a handicap in communicative skills such as language or speech, and/or hearing problems. With these children, the use of a nonverbal intelligence test that does not depend on the use of language is a prerequisite for an independent evaluation of their cognitive skills. In this chapter the results are discussed of the research carried out with the SON-R 2½-7 on a number of groups of special children. In chapter 9 the correlations between the SON-IQ and the scores on other tests administered to these children will be discussed.

7.1 COMPOSITION OF THE GROUPS

Research with the SON-R 2½-7 was carried out at a large number of schools and institutes for children with problems and handicaps. This was done partially parallel to, and partially following, the standardization research. An effort was made to examine all the children in the correct age group. However, in a few cases the parents refused permission or the school considered it inadvisable to test the children. The test was administered by staff members at the school/institute, by examiners of the standardization research, and by trained students participating in the research projects within the framework of their study (Brouwer, Koster & Veenstra, 1995; Snippe, 1996).

Types of schools and institutes
Pupils at the following types of schools/institutes participated in the research:

– Schools for Special Education with a department for children at risk in their development
The test was administered to 100 children at six schools for special education with a department for ‘young children at risk in their development’. Children are usually transferred from these schools to schools for children with specific learning and educational problems and to schools for learning disabled children.

– Medical daycare centers for preschoolers
The test was administered to 162 children at three medical daycare centers for preschoolers. These establishments provide daytime treatment, from the age of about one and a half onwards, for children with a developmental disorder. Such disorders are usually the result of a combination of psychic, somatic and social factors.

– Schools for children with Language, Speech and Hearing disorders
Three schools for children with speech or language problems and children with impaired hearing participated in the research. One hundred and eighty-three children were tested at these schools.

– The outpatient’s department for nose, throat and ear surgery
Children with speech or language and hearing problems were also tested at the outpatient’s department for nose, throat and ear surgery of a University hospital, where they were undergoing psychological examination in connection with their problems. This group consisted of 90 children.

– Institutes for the deaf
Children who were being educated at, or receiving guidance from, one of the five institutes for the deaf in the Netherlands were tested. The research group was limited to native Dutch children and children who did not have multiple handicaps.


The results of the pupils at one institute for the deaf were not taken into account in the presentation because of a strong examiner effect. At the four other institutes for the deaf, 95 children were tested with the SON-R 2½-7.

– Autism teams
Three different autism teams tested 44 children who were diagnosed as autistic or as having a developmental disorder related to autism. Autism teams are ambulatory institutions concerned with the diagnosis and guidance of children with these disorders. Autism and autism related disorders belong to the category of pervasive developmental disorders (APA, 1987).

The research groups
Children with different problems can be placed at the same school or institute. For the analysis of the results, the children were therefore grouped according to the nature of their problem rather than the type of school or institute. The relationship between the type of school and the research group is presented in table 7.1. The following research groups were formed:

– General developmental delay
Children with a general developmental delay were pupils at the schools with a department for children at risk in their development and at the medical preschool daycare centers. Various cognitive, social and emotional factors play a role in the referral of these children. The test scores of both groups of children were very similar. The entire group consisted of 238 children.

– Pervasive developmental disorders
Half the group of 90 children with a pervasive developmental disorder were children tested by the autism teams. All the children from the other groups who were diagnosed as autistic or having an autism related disorder were also included in this group. These were mainly pupils from schools for children with language, speech and hearing disorders, from schools with a department for children at risk in their development, and children from medical preschool daycare centers.

– Language and/or speech disabilities
This group consisted of pupils from the schools for children with language, speech and hearing disorders, and children tested in the outpatient’s department for nose, throat and ear surgery, who had a language and/or speech disorder. If they also had a hearing loss, it was less than 30 dB. The entire group consisted of 179 children.

Table 7.1
Subdivision of the Research Groups

                                                            Research Group
                                                   General    Pervasive   Language/
                                                   developm.  developm.   speech     Hearing
School/Institute                              N    delay      disorder    disorder   impaired   Deaf

Special schools for children at risk
  in their development                       100    89         11          –          –          –
Medical daycare centers for preschoolers     162   149         13          –          –          –
Schools for children with lang./sp./hear.
  disorders                                  183     –         21         116         44          2
Outpatient's department for
  nose/throat/ear surgery                     90     –          –          63         27          –
Institute for the deaf                        95     –          1           –          2         92
Autism teams                                  44     –         44           –          –          –

Total                                        674   238         90         179         73         94


– Hearing impaired
This group consisted of 73 hearing-impaired children with a hearing loss of more than 30 dB and less than 90 dB. The children were mainly pupils from the schools for children with language, speech and hearing disorders, and the outpatient’s department for nose, throat and ear surgery. Two pupils who had been tested at the Institute for the deaf were also included in this group.

– Deaf
The deaf children had a hearing loss of at least 90 dB. The group of 94 deaf children consisted mainly of children who had been tested at the Institutes for the deaf. Two pupils from the schools for children with language, speech and hearing disorders, with a hearing loss of more than 90 dB, were also included in this group.

Background of the children
In table 7.2 the distribution of the five research groups is presented according to sex, age, and socio-economic level. A distinction is also made between native Dutch and immigrant children.

Table 7.2
Composition of the Research Groups

                                       Research Group
                     General    Pervasive   Speech/
                     developm.  developm.   language    Hearing
                     delay      disorder    disorder    impaired    Deaf

Sex
Boys                 72%        79%         70%         63%         60%
Girls                28%        21%         30%         37%         40%

Age
Mean                 5;2        5;6         5;1         5;3         5;3
(SD)                 (1;2)      (1;2)       (1;1)       (1;3)       (1;3)
2 years              4%         1%          3%          1%          1%
3 years              13%        11%         17%         18%         19%
4 years              24%        21%         25%         25%         20%
5 years              31%        27%         34%         21%         25%
6 years              24%        30%         20%         29%         28%
7 years              5%         10%         1%          7%          7%

SES Index
Mean                 3.1        5.0         3.9         4.4         5.2
(SD)                 (2.1)      (2.7)       (2.3)       (2.2)       (2.8)
Unknown              8%         4%          17%         23%         7%

Country of birth
Native Dutch         86%        88%         96%         95%         94%
Mixed                8%         5%          3%          2%          6%
Immigrant            6%         8%          1%          4%          0%
Unknown              9%         1%          18%         23%         10%


Children with one parent who was born outside the Netherlands belong to the mixed category. Boys were over-represented in all groups. This is the case, in particular, in the groups with a general developmental delay, a pervasive developmental disorder and with speech or language disorders. In these groups the percentage of boys varied between 70% and 79%. On a national scale, the percentage of boys in the age range up to 8 years in special education is also twice as high as the percentage of girls (CBS, 1993). In the groups of hearing-impaired and deaf children, the ratio of boys to girls was lower, with the percentage of boys approximately 60%.

The age distribution was very similar in the various groups. The mean age varied from 5;1 to 5;6 years. Most children were between 3 and 6 years old at the time of the test administration. A small number of two-year-olds (most older than 2;6 years) and a small number of seven-year-olds (most younger than 7;6 years) were tested.

In the norm group the mean SES level, based on the educational and occupational level of the parents, was 4.8 with a standard deviation of 2.6. The mean SES level of the children with a pervasive developmental disorder and of the deaf children was slightly higher; for the hearing-impaired children it was slightly lower. The SES level of the speech or language disabled children was clearly lower, with a mean of 3.9, and the mean SES index of 3.1 of the children with a general developmental delay was very low. In view of the relationship between the level of intelligence and the SES level in the norm population, a developmental delay may be expected to occur more frequently in children with a low SES level.

The percentage of native Dutch children in the groups of children with a speech or language disorder, of hearing-impaired children and of deaf children was relatively high. This was because, in the group of deaf children, immigrant children did not meet the selection criteria, and in the other two groups, the research was carried out in the North of the Netherlands where relatively few immigrants live.

7.2 THE TEST SCORES OF THE GROUPS

In table 7.3, the means of the scores on the different subtests, and of the total scores, are presented for each group. In the second part of the table, the deviation of the mean of each subtest from the mean of all subtests is shown for each group. In the last part of the table the distribution of the IQ scores is presented for five intervals of 20 points. The results for each group will be discussed, then the results of the different groups will be compared.

Children with a general developmental delay
The children with a general developmental delay had a mean IQ score of 80.3 with a relatively high standard deviation of 17.6. Nearly 30% had a score lower than 70 (this was 2% in the norm group). Three percent of the children had a score higher than 110 (26% in the norm group). There was a small difference between the mean scores on the Performance Scale and on the Reasoning Scale, but it was not statistically significant (t[237]=1.91, p=.06). This group scored lowest on Patterns (mean=6.3) and highest on Puzzles (mean=8.0).

The mean IQ score of the children who were tested at the schools for special education with a department for children at risk in their development (mean=79.5) did not deviate significantly from the mean score of the children who were tested in the medical daycare centers for preschoolers (mean=80.8). However, the dispersion in scores in the latter group was greater (sd=19.1) than in the former group (sd=14.9).

Children with a pervasive developmental disorder
Children with a pervasive developmental disorder had a mean IQ score of 78.3 with a relatively high standard deviation of 18.7. In this group, 75% of the children had a score lower than 90, and 6% had a score higher than 110. Hardly any difference existed between the mean scores on the Performance Scale and on the Reasoning Scale. The lowest scores were obtained on the subtest Patterns (mean=5.6) and the highest scores on the subtest Analogies (mean=7.8).


Table 7.3
Test Scores per Group

Mean and Standard Deviation

            General        Pervasive      Speech/
            developm.      developm.      language       Hearing
            delay          disorder       disorder       impaired       Deaf
            (N=238)        (N=90)         (N=179)        (N=73)         (N=94)
            Mean  (SD)     Mean  (SD)     Mean  (SD)     Mean  (SD)     Mean  (SD)

Pat         6.3   (3.3)    5.6   (3.6)    7.7   (3.0)    8.2   (2.9)    9.9   (3.0)
Mos         6.6   (3.5)    7.1   (3.9)    8.3   (3.2)    8.6   (3.2)    9.9   (2.7)
Puz         8.0   (3.3)    7.5   (3.3)    8.6   (3.0)    9.3   (2.9)    10.3  (3.3)
Sit         7.8   (3.3)    7.0   (3.4)    8.6   (3.0)    9.5   (3.3)    10.5  (2.8)
Cat         7.2   (3.5)    6.5   (3.7)    7.8   (2.9)    8.9   (3.2)    8.4   (2.6)
Ana         7.6   (2.7)    7.8   (3.4)    8.6   (3.1)    9.1   (3.0)    9.2   (3.1)

PS          81.4  (17.8)   80.2  (19.1)   88.8  (15.8)   91.9  (15.5)   100.0 (15.3)
RS          83.2  (17.3)   80.9  (17.8)   88.8  (15.7)   94.4  (16.9)   95.9  (13.6)

IQ          80.3  (17.6)   78.3  (18.7)   87.5  (15.9)   92.2  (16.6)   97.9  (14.4)

Deviation from the Mean Subtest Score per Subtest

            General        Pervasive      Speech/
            developm.      developm.      language       Hearing
            delay          disorder       disorder       impaired       Deaf

Pat         –1.0           –1.3           –.6            –.7            .2
Mos         –.6            .2             .0             –.4            .2
Puz         .8             .6             .3             .3             .6
Sit         .5             .0             .4             .6             .8
Cat         –.1            –.5            –.5            –.1            –1.3
Ana         .3             .9             .4             .2             –.5

Frequency Distribution of the IQ Scores

            Norm           General        Pervasive      Speech/
            group          developm.      developm.      language       Hearing
Interval                   delay          disorder       disorder       impaired       Deaf

50-69       2%             28%            32%            12%            8%             1%
70-89       23%            40%            43%            46%            34%            32%
90-110      49%            28%            19%            36%            48%            46%
111-130     24%            3%             6%             6%             8%             20%
131-150     2%             0%             0%             1%             1%             1%

The children with the diagnosis of autism had a lower IQ score (mean=73.3, N=38) than the children with the diagnosis of autism related disorder (mean=82.0, N=52; t[88]=2.23, p=.03). The largest difference between the autistic children and the children with an autism related disorder was found in the subtests Categories and Situations. Apparently the autistic children had difficulty completing reasoning tests that use concrete pictures and situations.

The mean IQ score of the children tested by the autism teams did not differ from the mean scores of the children with a pervasive developmental disorder who were tested at other schools/institutes.


Children with a speech or language disorder
The mean IQ score of the children with a speech or language disorder was 87.5 with a standard deviation of 15.9. More than one third of the children had a score between 90 and 110. More than half had a score lower than 90, and 7% a score higher than 110. The means of both scale scores were the same. The mean subtest scores deviated less than in the two previous groups.

A slight loss of hearing (less than 30 dB) or varying conductive hearing losses occurred in more than half the children. These children had a lower IQ score (mean=85.7, N=85) than the children with good hearing (mean=89.6, N=70), but this difference was not significant (t[153]=1.52, p=.13).

The difference in mean IQ scores between the children who were tested at the schools for children with a speech, language and hearing disorder and children from the outpatient’s department was not significant.

Hearing-impaired children
The mean IQ score of hearing-impaired children was 92.2 with a standard deviation of 16.6. The mean score on the Reasoning Scale (mean=94.4) was slightly higher than the score on the Performance Scale (mean=91.9). However, the difference was not significant (t[72]=1.58, p=.12). The differences between the mean scores on the subtests were also small.

No difference in IQ scores occurred between the children with a loss of hearing of 30-59 dB (mean=92.5, N=36) and the children with a loss of hearing of 60-89 dB (mean=92.7, N=43).

Hardly any difference in mean IQ scores was found between the children who were tested at the schools for children with speech, language and hearing disorders, and the children from the outpatient’s department.

Deaf children
The research with deaf children was restricted to native Dutch children who were not multiply handicapped. A few children with one parent who was born outside the Netherlands were included in the analysis. The mean IQ score of the deaf children was 97.9 with a standard deviation of 14.4. As in the norm group, nearly half the children had an IQ score between 90 and 110. A clear difference was found between the scores on the Performance Scale (mean=100.0) and on the Reasoning Scale (mean=95.9; t[93]=2.82, p=.01). Deaf children obtained the lowest scores on the subtests Categories (mean=8.4) and Analogies (mean=9.2). The scores on the other subtests deviated only slightly from the mean of 10 found in the norm group.

These results were very similar to those of the research carried out using the SON-R 5½-17 with the entire population of older deaf children (Snijders, Tellegen & Laros, 1989). The native Dutch deaf children who were not multiply handicapped (three quarters of the deaf population) had a mean score on the SON-R 5½-17 of 97.0 and, as on the SON-R 2½-7, the lowest scores were on the subtests Categories and Analogies. In the research with the SON-R 5½-17, these abstract reasoning tests also appeared to have the most substantial relationship with the STADO-R, a written language test for deaf children (De Haan & Tellegen, 1986).

Comparisons between the groups
The differences in mean IQ scores among the five groups were highly significant (F[4,669]=26.10, p<.001). Differences between pairs of groups were tested at the 5% level, using the modified LSD procedure (test for the least-significant differences). The difference between the children with a general developmental delay and the children with a pervasive developmental disorder was not significant. Both groups scored significantly lower than the three other groups. The children with a speech or language disorder differed significantly from the deaf children, but not from the children with impaired hearing (hearing loss less than 90 dB). The children with impaired hearing did not score significantly lower than the deaf children.
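The overall test and the pairwise comparisons can be illustrated with a short sketch. Note that the modified LSD procedure is approximated here by Bonferroni-corrected t tests, and that the data are random placeholders rather than the study data (only the group labels and sample sizes follow the text):

```python
import numpy as np
from itertools import combinations
from scipy import stats

# Placeholder IQ scores per research group (random values; they do not
# reproduce the actual results).
rng = np.random.default_rng(2)
groups = {
    "general delay": rng.normal(80, 18, 238),
    "pervasive disorder": rng.normal(78, 19, 90),
    "speech/language": rng.normal(88, 16, 179),
    "hearing impaired": rng.normal(92, 17, 73),
    "deaf": rng.normal(98, 14, 94),
}

# One-way analysis of variance over the five groups.
f_stat, p_value = stats.f_oneway(*groups.values())
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

# Pairwise comparisons; the manual's 'modified LSD' procedure is approximated
# here by t tests with a Bonferroni correction over all pairs.
pairs = list(combinations(groups, 2))
for a, b in pairs:
    t, p = stats.ttest_ind(groups[a], groups[b])
    verdict = "significant" if p * len(pairs) < .05 else "not significant"
    print(f"{a} vs {b}: {verdict}")
```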

In figure 7.1, 80% intervals of the distribution of the IQ scores are presented for the different groups. In each group 10% of the children have a lower score, and 10% have a higher score. To facilitate comparison, the interval for the children in primary education, four years and older, from the standardization research is also shown. The intervals illustrate the substantial differences between the groups.


Figure 7.1
Distribution of the 80% Frequency Interval of the IQ Scores of the Various Groups
[Figure: horizontal 80% intervals on an IQ-score axis from 50 to 120, shown for Primary Education, Deaf, Hearing Impaired, Speech/language Disorder, Pervasive Developmental Disorder, and General Developmental Delay.]

The children with a general developmental delay and the children with pervasive developmental disorders had low performance levels. Deaf children were very similar to children in primary education. The children with impaired hearing and the children with a speech or language disorder took an intermediate position.

Besides these differences, the figure also shows a large overlap in the distributions of the groups. The mean scores of the children with a developmental disorder or delay were low, but in both groups a good 10% of the children had a score higher than 100, which is the mean of the norm population. In contrast, 10% of the children in these groups had a score of 50 or thereabouts, which means that they performed at such a low level that the test did not differentiate further.

In all the groups, children performed relatively poorly on the subtests Categories and (with the exception of the deaf) Patterns. In all the groups, children performed relatively well on Puzzles, Situations and (with the exception of the deaf) on Analogies. The results on Mosaics varied (see table 7.3).

When evaluating the differences between the groups, the manner in which the groups were selected must be taken into account. Most of the children examined attended special schools and institutes that had strict selection procedures for admittance. Children who had, for example, a pervasive developmental disorder or impaired hearing, but who were in regular education, were strongly under-represented. In their case, a cognitive delay is less likely to occur. On the other hand, autistic children in daycare centers for the mentally disabled were not included in the research. The results are only representative for the children at the kinds of schools and institutes listed above, and then only to a limited extent due to the small number of schools and institutes involved. No statement can be made on the basis of this research about ‘the’ intelligence of autistic children, or ‘the’ intelligence of children with impaired hearing.


Only in the case of the deaf children was an effort made to obtain a representative picture of the intelligence level of (native Dutch) deaf children who are not multiply handicapped.

7.3 RELATIONSHIP WITH BACKGROUND VARIABLES

A variance analysis for a number of background variables such as sex, age, SES level and immigrant status was carried out with the IQ score as dependent variable, controlling for the research group. No significant interaction effect with the research groups was found for any of the variables. In table 7.4 the mean values of the IQ scores are presented as the deviation from the total mean after controlling for the research group.

Few differences were found between boys and girls (p=.64), or among the three age groups of two and three years, four and five years, and six and seven years (p=.59). A relationship with the SES level of the parents (p=.02) was found, but this was much weaker than in the norm group. The difference between the native Dutch children, the immigrant children and the children with a mixed background was not significant (p=.17). However, different background characteristics (like sex and SES level) played an indirect role in the referral to the special schools, because of the relative frequency of developmental problems among boys and among children with a low SES level.

7.4 DIAGNOSTIC DATA

Diagnostic data for a large number of pupils from the schools for special education with a department for children at risk in their development and from the medical daycare centers had been gathered during the admittance procedure to the school or the daycare center in question. The data refer to the home situation, the existence of emotional problems, behavioral problems and communicative handicaps, and also include an evaluation of motor, language and cognitive development. Complete data sets were available for 238 children: 93 children from a department for children at risk and 145 children from a medical daycare center. Twenty-four of these children had a pervasive developmental disorder. The mean IQ score of the entire group of 238 children was 80.9 with a standard deviation of 17.1.

In table 7.5, the distribution of the diagnostic variables is presented together with the mean IQ scores for each category. Various problems and delays appear to be present in all the diagnostic variables. The most favorable evaluation was found in relation to communicative handicaps (60% ‘none’) and motor development (40% ‘normal’). Serious behavioral problems and large delays in language development were mentioned most frequently. With respect to the evaluation of cognitive development, nearly half the children had a small delay and 20% had a large delay.

The correlations between the IQ scores and the evaluation of the home situation, and of emotional and behavioral problems, were weak and not significant. The relationships with the other diagnostic variables were significant at the 1% level. The correlation with communicative handicaps was –.26.

Table 7.4
Relationship of the IQ Scores with Background Variables

Sex          N    Dev     Age          N    Dev     SES Level        N    Dev     Country of birth   N    Dev

Boys        470    .2     2-3 yrs     121   1.3     Low             172  –2.7     Native Dutch      538   –.2
Girls       204   –.5     4-5 yrs     354   –.5     Below aver.     233   –.5     Mixed              32   5.0
                          6-7 yrs     199    .2     Above aver.     115   3.3     Immigrant          23  –2.8
                                                    High             77   2.6


The correlation with both motor and language development was .46. The SON-IQ correlated most strongly, .66, with the evaluation of cognitive development. The mean IQ score of the children whose cognitive development had been evaluated as ‘normal’ was 95.6, whereas the mean IQ score of the children with a large delay was more than 30 points lower, i.e. 64.3. With a stepwise multiple regression the correlation with the IQ increased slightly, from .66 to .67, when motor development was also taken into account.

The Performance Scale and the Reasoning Scale both correlated strongly with the evaluation of cognitive development (.59 and .61). The Performance Scale had a stronger correlation with the evaluation of motor development (r=.43) than with the evaluation of language development (r=.40). The Reasoning Scale had a higher correlation with the evaluation of language development (r=.44) than with the evaluation of motor development (r=.39).

7.5 EVALUATION BY THE EXAMINER

As was done during the standardization research, all the children in the special groups were rated by the examiner on motivation, concentration, cooperation, and comprehension of the directions, following the test administration. In table 7.6 the ratings and the mean IQ scores are presented for each group. In a small number of cases (approximately 2%), motivation, cooperation, or comprehension of the directions was evaluated as ‘poor’. An exception was the group with a pervasive developmental disorder, where cooperation was evaluated as ‘poor’ in 8% of the children. Concentration was evaluated as low in 5% of the children. In the case of the deaf children, however, this was 1%. Concentration and, to a lesser degree, motivation were frequently rated as ‘mediocre’ or ‘varying’. Cooperation and comprehension of the directions were most frequently rated as ‘good’.

The deaf children were evaluated most positively. On average, the children with a pervasive developmental disorder had the lowest evaluation with respect to motivation, concentration, cooperation and comprehension of directions.

Table 7.5
Reasons for Referral of Children at Schools for Special Education and Medical Daycare Centers for Preschoolers (N=238), with Mean IQ Scores

                                         Fairly             Very
                         Normal          Unfavorable        Unfavorable
                         Pct   Mean      Pct   Mean         Pct   Mean

Home situation           29%   80.1      48%   80.6         23%   82.9

                         None            Light              Severe
                         Pct   Mean      Pct   Mean         Pct   Mean

Emotional problems       17%   79.9      59%   83.3         24%   75.9
Behavioral problems      14%   80.5      51%   81.5         35%   80.3
Communicative handicap   60%   83.7      30%   79.6         10%   67.7

                                         Small              Large
                         Normal          Delay              Delay
                         Pct   Mean      Pct   Mean         Pct   Mean

Motor development        40%   91.8      48%   73.4         12%   74.1
Language development     24%   93.6      44%   80.2         32%   72.2
Cognitive development    32%   95.6      48%   78.1         20%   64.3


Sixty-five percent of the entire group of 674 children were rated ‘good’ on all four aspects, or on three aspects with the fourth rated as ‘mediocre’/‘varying’. Eleven percent had a mean rating of ‘mediocre’/‘varying’ or lower.

In comparison to the standardization research, the evaluations of the children from these special groups were most similar to the evaluations of children two and three years of age. However, children from special groups received much higher ratings for comprehension of the directions than did the two- and three-year-olds in the standardization research.

The ratings of motivation, concentration, cooperation, and comprehension of the directions correlated significantly with the IQ score in most groups. The correlations were strongest in the group with impaired hearing and for the evaluation of concentration. The correlations were substantially stronger than in the norm group. The main cause for this is that a negative evaluation was given more frequently in the special groups.

Table 7.6
Relationship Between IQ and Evaluation by the Examiner

                      General        Pervasive      Speech/
                      developm.      developm.      language       Hearing
                      delay          disorder       disorder       impaired       Deaf
                      (N=238)        (N=90)         (N=179)        (N=73)         (N=94)

Motivation            Pct   Mean     Pct   Mean     Pct   Mean     Pct   Mean     Pct   Mean
Poor                  2%    63.7     2%    54.5     2%    76.3     3%    71.5     –
Mediocre/Varying      33%   73.4     29%   69.7     22%   78.1     30%   84.2     21%   92.6
Good                  65%   84.4     69%   82.8     75%   90.6     67%   96.7     79%   99.3
Correlation           .33*           .37*           .33*           .41*           .19

Concentration         Pct   Mean     Pct   Mean     Pct   Mean     Pct   Mean     Pct   Mean
Poor                  6%    62.9     4%    61.3     5%    72.3     6%    74.0     1%    68
Mediocre/Varying      44%   78.1     43%   72.0     39%   80.7     38%   84.7     26%   92.2
Good                  50%   84.1     52%   85.1     56%   93.4     56%   99.1     73%   100.3
Correlation           .28*           .39*           .44*           .49*           .31*

Cooperation           Pct   Mean     Pct   Mean     Pct   Mean     Pct   Mean     Pct   Mean
Poor                  1%    61.7     8%    60.7     2%    66.3     1%    86       –
Mediocre/Varying      24%   78.6     14%   72.2     12%   76.5     12%   78.7     13%   86.1
Good                  75%   81.1     78%   81.3     86%   89.4     86%   94.3     87%   99.6
Correlation           .11            .32*           .32*           .28*           .32*

Comprehension of
directions            Pct   Mean     Pct   Mean     Pct   Mean     Pct   Mean     Pct   Mean
Poor                  3%    55.9     2%    50.0     3%    79.2     3%    63.5     1%    83
Mediocre/Varying      22%   75.0     28%   71.3     15%   76.6     20%   84.9     9%    90.5
Good                  75%   82.9     70%   82.0     82%   89.8     77%   95.2     90%   98.8
Correlation           .31*           .34*           .28*           .38*           .19

*: p < .01 with one-tailed testing


7.6 EVALUATION BY INSTITUTE OR SCHOOL STAFF

In the case of a large number of children who were tested at schools and institutes, a staff member closely concerned with the child evaluated the following four aspects: intelligence, language development, fine motor skills and communicative orientation (the extent to which the child seeks and maintains contact with others in his or her surroundings). The evaluation was given on a five-point scale running from ‘low’ via ‘intermediate’ to ‘high’ in the case of intelligence and language development, and in the case of motor activity and communication from ‘low’ via ‘reasonable’ to ‘high’.

In general the evaluation was carried out after the schools and institutes had received the provisional results on the test. The possibility that the results on the SON-R 2½-7 influenced the evaluation cannot be excluded. However, many other test and research data on the children were available at the schools, so that the question remains whether the test results on the SON-R 2½-7 contributed much to the evaluation. We also do not know whether the person making the evaluation was acquainted with the results. Because a certain amount of contamination may have occurred, the results presented in this section must be interpreted with care.

Mean evaluations and their correlations with the test scores are presented in table 7.7 for two broad groups. The first group consisted of children with a general developmental delay (N=222) and children with a pervasive developmental disorder (N=46). The second group consisted of children with a speech or language disorder (N=105), children with impaired hearing (N=42) and deaf children (N=94).

In the group of children with a general developmental delay or with pervasive developmental disorders, the subjective evaluation of intelligence and language development was generally low. In the speech/language/hearing-impaired group, the children were given a relatively low evaluation regarding their language development; on all other aspects, the mean evaluation was higher than in the first group.

The mean IQ score in the first group was 80.9 with a standard deviation of 17.1; the correlation with the evaluation of intelligence was .68.

Table 7.7
Correlations Between Test Scores and Evaluation by Institute or School Staff Member

                General developmental delay /           Speech/language disorder /
                Perv. developm. disorder (N=268)        Hearing impaired/deaf (N=241)

Distribution    Intell.  Language  Motor  Commun.       Intell.  Language  Motor  Commun.
Mean            2.4      2.3       2.9    3.0           2.9      2.1       3.2    3.5
SD              (.9)     (.8)      (.9)   (.9)          (.7)     (1.0)     (1.1)  (1.0)

Correlation     Intell.  Language  Motor  Commun.       Intell.  Language  Motor  Commun.
Patterns        .56      .42       .44    .24           .53      .35       .34    .21
Mosaics         .60      .41       .36    .15           .51      .27       .21    .21
Puzzles         .37      .19       .36    .19           .45      .22       .24    .18
Situations      .50      .39       .25    .19           .37      .28       .20    .10
Categories      .58      .45       .31    .28           .45      .12       .22    .20
Analogies       .43      .27       .36    .15           .31      .10       .17    .06

SON-PS          .59      .40       .45    .22           .59      .33       .32    .24
SON-RS          .64      .47       .38    .26           .50      .22       .26    .15

SON-IQ          .68      .48       .46    .27           .61      .31       .32    .23

– correlations > .14 are significant at the 1% level


In the second group, where the dispersion of both the IQ scores and the evaluation of intelligence was narrower, the correlation was also weaker, i.e. .61.

In the group of children with a developmental delay, the correlation of the Reasoning Scale with the evaluation of intelligence was higher than that of the Performance Scale. The correlations with the subtests Puzzles and Analogies were relatively weak. In the group of children with language/speech/hearing disorders, the Performance Scale had the highest correlations with the evaluation of intelligence; Situations and Analogies had the lowest correlations. Patterns and Mosaics had strong correlations with the evaluation of intelligence in both groups.

Reasonably strong correlations with the evaluation of language development and fine motor development were also found in both groups. Patterns had the strongest correlation with the evaluation of motor skills. The Performance Scale correlated more strongly than the Reasoning Scale with motor skills. The correlations between the test scores and the evaluation of the communicative orientation of the child were positive but weak.

Using a stepwise multiple regression analysis, the extent of the influence of the other evaluations on the correlation between the evaluation of intelligence and the SON-IQ was examined. In both groups the correlation increased when the evaluation of motor skills was included; in the first group from .68 to .74, and in the second group from .61 to .65.

7.7 EXAMINER EFFECTS

The evaluation of examiner effects was much more difficult in the special groups than in the standardization research, because large differences existed between the groups and because most or all of the children tested by an examiner belonged to one specific group. Furthermore, the number of children tested by each examiner was much smaller than in the standardization research. Using a variance analysis, the differences in IQ scores between the examiners were tested. The school evaluation of intelligence and fine motor activity, and the SES index, were all controlled for. The comparison was limited to examiners who had tested at least 20 children. The number of examiners was 11 and the number of children 446.

The main examiner effect, after controlling for the other variables, was significant (F[10,426]=2.81, p<.01). The beta coefficient was .17 and the mean absolute deviation of the examiners from the total mean was 2.4 IQ points. These results correspond to the size of the examiner effect in the standardization research, where the absolute deviation of the examiners was 2.2 points.

The children from one of the Institutes for the deaf, who were all tested by the same examiner, were not taken into account in the presentation of the results of the deaf children. The mean IQ score of these children was 82.2 (sd=14.7). This deviated significantly (p<.001) from the mean IQ score of 97.9 of the other 94 deaf children. The difference was especially large on the Reasoning Scale, i.e. 20 points; on the Performance Scale the difference was 11 points.

The SON-R 2½-7 and the Performance Scale of the WPPSI-R were administered, a few months apart, to 19 of the 22 children at this Institute. The mean PIQ score was 104.2, a difference of more than 23 points with the SON-IQ. Though the scores on the WPPSI-R may be overestimates, as the directions are adapted and the norms slightly dated, these results suggest a strong examiner effect in the administration of the SON-R 2½-7. Three years later the SON-R 5½-17 was administered to 18 children at this institute. The mean score was 100.2, nearly 17 points higher than the IQ score on the SON-R 2½-7.

The fact that the correlations with both the WPPSI-R PIQ (r=.82) and the SON-R 5½-17 IQ (r=.66) were reasonably strong is noteworthy. This means that the examiner effect occurred systematically. We suspect that the low performances of the children with this examiner were related to the short time she had been working at the Institute for the deaf. The examiner was probably less able to make the aim of the tasks clear to the deaf children. This would also explain why the performance, especially on reasoning tasks, was so low; the nature of these tasks is less obvious than that of the tasks in the performance subtests.


7.8 PSYCHOMETRIC CHARACTERISTICS

The correlations between the subtest scores, and the distinction between performance and reasoning tests, were examined in two groups of children. The relative frequency of heterogeneous test profiles in these groups, in comparison to the standardization group, was also examined. The first group consisted of children with a general developmental delay and children with a pervasive developmental disorder. The second group consisted of children with a speech or language disorder, children with impaired hearing and deaf children.

Correlations between subtests
The mean correlation between the subtests was .51 in the first group of children, and .44 in the second. This was higher than the mean correlation of .37 for the four- and five-year-olds in the norm group. The higher correlations can be (partially) explained by the wider dispersion of the subtest scores; the mean variance of the subtest scores was 11.3 in the first group and 9.5 in the second group. In the norm population the variances of the subtest scores equal 9.0.

The correlations between the subtests are presented in table 7.8. In both groups, Mosaics had the strongest correlation with Patterns (.71 and .63). In both groups, the correlation of Analogies with Puzzles and Situations was relatively weak. In the second group the correlation of Situations with Mosaics and Categories was also weak.

The three performance tests correlated most strongly, in both groups, with the sum of the remaining subtests. In the first group, the correlation of Analogies with the total score was weakest; in the second group the correlation of all three reasoning tests with the total score was relatively weak.

Because the correlations between the subtests were stronger than in the norm group, the generalizability coefficient of the IQ score in both groups was also higher. In the first group alpha was .86, in the second group .82. In the norm group the generalizability in the comparable age range was .78.
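The generalizability coefficient reported here is coefficient alpha computed over the six subtests. A minimal sketch of that computation, applied to placeholder data rather than the study scores:

```python
import numpy as np

def coefficient_alpha(scores: np.ndarray) -> float:
    """Coefficient alpha for a children-by-subtests matrix of scaled scores."""
    k = scores.shape[1]                          # number of subtests (here 6)
    item_variances = scores.var(axis=0, ddof=1)  # variance of each subtest
    total_variance = scores.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_variances.sum() / total_variance)

# Placeholder scores: a shared ability factor plus noise, standing in for the
# six subtest scores of one of the groups.
rng = np.random.default_rng(0)
ability = rng.normal(0, 2, size=(100, 1))
scores = 10 + ability + rng.normal(0, 2, size=(100, 6))
print(round(coefficient_alpha(scores), 2))
```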

Principal Components Analysis
A PCA was carried out in both groups. The results were discussed in section 5.4 and presented in table 5.9. In the combined group of children with a general developmental delay and children with a pervasive developmental disorder, the first two factors explained 71% of the variance. The loadings of the subtests on the rotated factors were consistent with the distinction between performance and reasoning tests.
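As an illustration (not the analysis software used for the manual), the two-component solution can be reproduced from the subtest correlations of the first group given in table 7.8, followed by a standard varimax rotation:

```python
import numpy as np

# Correlation matrix of the six subtests in the first group, taken from
# table 7.8 (order: Pat, Mos, Puz, Sit, Cat, Ana).
R = np.array([
    [1.00, .71, .62, .53, .53, .48],
    [ .71, 1.00, .58, .49, .48, .48],
    [ .62, .58, 1.00, .50, .45, .39],
    [ .53, .49, .50, 1.00, .56, .42],
    [ .53, .48, .45, .56, 1.00, .47],
    [ .48, .48, .39, .42, .47, 1.00],
])

# Principal components from the eigendecomposition of the correlation matrix.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]
print("proportion of variance, two components:", round(eigvals[:2].sum() / 6, 2))

loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])   # unrotated loadings

def varimax(L: np.ndarray, max_iter: int = 100, tol: float = 1e-6) -> np.ndarray:
    """Varimax rotation of a loading matrix (standard SVD-based algorithm)."""
    p, k = L.shape
    rotation = np.eye(k)
    objective = 0.0
    for _ in range(max_iter):
        rotated = L @ rotation
        u, s, vt = np.linalg.svd(
            L.T @ (rotated ** 3 - rotated @ np.diag((rotated ** 2).sum(axis=0)) / p))
        rotation = u @ vt
        if s.sum() < objective * (1 + tol):
            break
        objective = s.sum()
    return L @ rotation

print(np.round(varimax(loadings), 2))              # rotated component loadings
```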

In the group of children with a speech, language or hearing disorder, the first two factors explained 66% of the variance. The loadings for five of the subtests corresponded to the distinction between performance and reasoning tests.

Table 7.8
Correlations Between the Subtests and Subtest-Rest Correlations

General developmental delay and Pervasive developm. disorder (N=328)

            Pat   Mos   Puz   Sit   Cat   Ana
Pat         –
Mos         .71   –
Puz         .62   .58   –
Sit         .53   .49   .50   –
Cat         .53   .48   .45   .56   –
Ana         .48   .48   .39   .42   .47   –
Subt.-Rest  .75   .71   .66   .64   .63   .57

Speech/language disorder and Hearing impaired/Deaf (N=346)

            Pat   Mos   Puz   Sit   Cat   Ana
Pat         –
Mos         .63   –
Puz         .54   .55   –
Sit         .40   .37   .46   –
Cat         .43   .42   .40   .37   –
Ana         .43   .42   .37   .34   .46   –
Subt.-Rest  .66   .66   .63   .51   .55   .54


However, the subtest Situations had its highest loading on the first (performance) factor.

Individual profile
The intra-individual differences among the subtest scores of the children in the special groups were not exceptionally large. In the standardization research the mean dispersion of the six scores was 2.0 with a standard deviation of .7. The mean for children from the special groups was 2.1 with a standard deviation of .7. The means varied from 1.9 for children with impaired hearing to 2.2 for children with a pervasive developmental disorder.


8 IMMIGRANT CHILDREN

In this chapter a study is made of the test performances of children one or both of whose parents were born outside the Netherlands. These children were tested in the standardization research (N=147), or attended a preschool playgroup (N=8) or a primary school where complementary research projects were carried out (N=54). Of these 209 children, 118 were immigrant children (both parents were born outside the Netherlands) and the remaining 91 children belonged to the mixed group (one parent born outside the Netherlands). In section 8.5 the results of the immigrant children will be compared with the results of 90 children participating in OPSTAP(JE), a program to stimulate the development of immigrant children.

8.1 THE TEST RESULTS OF IMMIGRANT CHILDREN

The test scores of the mixed and immigrant groups were compared with the scores of the native Dutch children from the standardization research. The mean ages were 4;9 years in the native Dutch group, 5;3 years in the mixed group and 5;6 years in the immigrant group. The percentage of boys in the mixed group was 47% and in the immigrant group 52%.

The mean scores are presented in table 8.1. The performances of the children with a mixed background differed only slightly from the performances of native Dutch children. The differences for the total scores were negligible, and none of the differences in the subtest scores were significant at the 5% level. However, the mean scores of the immigrant children were clearly lower than those of the native Dutch children. The mean IQ score of the immigrant children was nearly 8 points lower than that of the native Dutch children. With the exception of Analogies, all differences were significant at the 1% level.

In the group of immigrant children the differences between the mean subtest scores were slight. The biggest difference, between Mosaics (8.7) and Analogies (9.3), was not significant at the 5% level. The fact that the biggest difference in subtest scores in the mixed group also occurred between Mosaics (9.7) and Analogies (10.5) is noteworthy. The results show that the lower performances of the immigrant children were not caused or worsened specifically by the subtests Categories, Situations and Puzzles. These subtests use meaningful picture materials and might therefore have a culture specific meaning. The mean score on these three subtests was equal to the mean score on Patterns, Mosaics and Analogies. These last three subtests use non-meaningful picture materials such as geometrical forms. No differences were found between the mean scores on the Performance Scale and the Reasoning Scale in the immigrant or in the mixed group.

Table 8.1
Test Scores of Native Dutch Children, Immigrant Children and Children of Mixed Parentage

              Native Dutch        Mixed            Immigrant
              (N=969)             (N=91)           (N=118)

Score         Mean  (SD)          Mean  (SD)       Mean  (SD)

Patterns      10.1  (2.9)         10.0  (2.6)       8.9  (2.8)
Mosaics       10.1  (3.0)          9.7  (2.9)       8.7  (3.1)
Puzzles       10.1  (3.0)         10.3  (3.0)       9.2  (2.4)
Situations    10.1  (2.9)          9.8  (2.9)       9.1  (2.8)
Categories    10.2  (2.9)         10.0  (3.1)       8.8  (3.0)
Analogies     10.0  (2.9)         10.5  (3.2)       9.3  (3.2)

SON-PS       100.7 (15.2)        100.2 (14.3)      93.4 (13.6)
SON-RS       100.4 (14.8)        100.5 (15.2)      93.8 (15.6)

SON-IQ       100.7 (14.9)        100.6 (15.2)      92.8 (14.4)




8.2 RELATIONSHIP WITH THE SES LEVEL

Information about the level of education and occupation of the parents was available for most children. The SES index, calculated on the basis of these data, had a mean of 5.1 (sd=2.8) in the mixed group, a mean of 2.5 (sd=2.7) in the immigrant group, and a mean of 4.9 in the native Dutch group (sd=2.5). The SES index of the immigrant children was significantly (p<.01) lower than the SES index of the native Dutch children.

In table 8.2 the percentage of children at each SES level is presented for each group (the SES index has been limited to four categories). The distribution curve of the mixed group was slightly flatter than that of the native Dutch group; the distribution of the immigrant group was very skewed. In comparison to the native Dutch group, more than three times as many children of the immigrant group had a low SES level, whereas the number of children with a high SES level in the native Dutch group was more than three times as high as that in the immigrant group.

The mean IQ scores for each SES level are also presented in table 8.2. Within each group a clear and comparable relationship existed between SES level and IQ, and no significant interaction effect was found. When the SES level was controlled, the differences among the three groups almost disappeared and were no longer significant (F[2,1158]=2.81; p>.05). The difference of nearly eight IQ points between the immigrant and the native Dutch group decreased, after controlling for the SES level, to three points.

Table 8.2
Relationship Between Group, SES Level and IQ

                 Native Dutch             Mixed                  Immigrant
                 (N=963)                  (N=90)                 (N=117)

SES Level        Pct   Mean  (SD)         Pct   Mean  (SD)       Pct   Mean  (SD)

Low              18%   92.9 (14.9)        21%   94.3 (11.9)      61%   90.3 (13.5)
Below average    33%   98.8 (13.9)        27%   98.3 (16.0)      22%   94.6 (12.3)
Above average    31%  102.9 (13.8)        28%  102.7 (15.9)      11%   99.1 (17.8)
High             19%  108.1 (14.9)        24%  106.5 (14.2)       6%  102.9 (16.6)
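A covariance analysis of the kind described above can be sketched as follows. This is only an illustration with simulated data: the variable names, group sizes and effect sizes are loosely based on Table 8.2, not on the original data set, and statsmodels is assumed to be available.

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(1)
    group = np.repeat(["native", "mixed", "immigrant"], [963, 90, 117])
    ses_mean = {"native": 4.9, "mixed": 5.1, "immigrant": 2.5}
    ses = np.clip(rng.normal([ses_mean[g] for g in group], 2.6), 0, 9)
    iq = 93 + 1.6 * ses + rng.normal(0, 13.5, size=len(group))   # IQ rises with SES in every group

    df = pd.DataFrame({"iq": iq, "group": group, "ses": ses})
    model = smf.ols("iq ~ C(group) + ses", data=df).fit()
    print(sm.stats.anova_lm(model, typ=2))   # F test for group, adjusted for the SES covariate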

8.3 DIFFERENTIATION ACCORDING TO COUNTRY OF BIRTH

The largest immigrant groups in the Netherlands come from Surinam, the Antilles, Morocco and Turkey. These groups are most strongly represented in this research group (table 8.3). Children of parents born in Surinam, Morocco and Turkey had mean scores close to 90. The small group of children from other African and South American countries had the same mean score. The Antillean and Asian children had scores close to 100 and the small group of children from other Western countries performed above average.

For children with one parent born outside the Netherlands, the differences in mean IQ scores were slight. Only a small group with one Turkish or Moroccan parent scored clearly below average.




A total of 39 children were not born in the Netherlands. In the case of seven of these children, both parents were born in the Netherlands; these children were presumably adopted. The mean IQ score of these seven children was 4 points lower than that of the native Dutch children. Six of the children in the mixed group were born outside the Netherlands. Their mean IQ score was nearly 2 points higher than the score of the children in the mixed group who were born in the Netherlands. Of the 107 children whose parents were both born outside the Netherlands, and whose country of birth was known, 26 were also born outside the Netherlands. Their mean IQ score was more than 1 point lower than the IQ score of the immigrant children who were born in the Netherlands. This indicates that whether the child was born in the Netherlands or in another country had little effect on the test performance.

8.4 COMPARISON WITH OTHER TESTS

The mean IQ score of the 118 immigrant children in this research project with the SON-R 2½-7 was 92.8, nearly 2 points higher than the mean IQ score of 91.0 of the immigrant children participating in the standardization research of the SON-R 5½-17 (N=61). In comparison to the SON-R 5½-17, the scores of the Turkish children were higher, while the scores of the Surinam/Antillean children were lower.

Research was done with the RAKIT in different immigrant groups by Resing, Bleichrodt and Drenth (1986). The RAKIT is an intelligence test with verbal and performance tasks for children 4 years and older. In the age group of 5;8 years the mean RAKIT IQ in the Surinam/Antillean group was 89.6; in the Turkish group this was 80.0 and in the Moroccan group 80.5. Each group consisted of approximately 60 children. The mean IQ scores on the SON-R 2½-7 in the three ethnic groups were 3, 9 and 11 points higher respectively.

Using the LEM (Learning test for Ethnic Minorities; Hessels, 1993), research was done with Turkish and Moroccan children five and six years of age. The LEM was specially designed to measure learning potential and to depend as little as possible on culture specific knowledge and skills. The Turkish and Moroccan groups consisted of 120 children each. The mean standardized total scores of the Turkish and the Moroccan children were 83.5 and 84.4 respectively. This means that their mean score on the LEM was approximately 6 points lower than the mean IQ score of Turkish and Moroccan children on the SON-R 2½-7.

Table 8.3
Differentiation of Mean IQ Scores According to Country of Birth

                        Country of birth of parents                     Country of birth
                   One or both      One parent       Both parents       of child abroad
                   abroad           abroad           abroad

Country             N    Mean        N    Mean        N    Mean           N    Mean

Surinam            49    92.3       13    97.8       36    90.3           4    79.8
Antilles           22    99.3        7   102.9       15    97.6           9    93.1
Morocco            26    88.7        1    82         25    88.9           3    97.3
Turkey             26    91.0        4    86.3       22    91.8           2    93.0
Indonesia          15   102.6       12   103.6        3    98.7           –
Other Africa       14    96.1       10    99.3        4    88.0           4    99.5
Other Asia         11   101.4        4   106.0        7    98.7           6    88.7
Other S-America     8    96.8        7    97.6        1    91             4    88.3
Other Western      38   103.9       33   102.8        5   111.8           7   107.7

Total             209    96.2       91   100.6      118    92.8          39    94.2




The conclusion on the basis of these comparisons is that immigrant children get better results on the SON-R 2½-7 than on the RAKIT and the LEM. Comparisons of the SON-R 2½-7 and another test (see section 9.11), administered to the same children, indicate also that the SON-R 2½-7 is much less dependent on culture specific knowledge and skills.

8.5 THE TEST PERFORMANCES OF CHILDREN PARTICIPATING IN OPSTAP(JE)

OPSTAP is a family intervention program for immigrant families (Eldering & Vedder, 1992) and has been used in the Netherlands since 1987. It is the Dutch version of the program HIPPY (Home Intervention Programme for Preschool Youngsters; Lombard, 1981) that was developed in Israel. OPSTAP is aimed at helping mothers of immigrant children in the kindergarten age range. OPSTAPJE has been operating for a few years now and is aimed at helping mothers with children in the preschool age range. The goal of the programs is to enhance the mother's ability to stimulate the child in his or her development. This is achieved by (group) discussions, by providing materials and by supplying exercises for the child.

In 1994, research using the SON-R 2½-7 was carried out, in collaboration with Richard Cress of the Averoès Foundation, with a number of children who were participating in OPSTAP or OPSTAPJE. In general, the test was administered at the end of the two-year intervention period. Three of the four examiners (all of them female) had participated in OPSTAP(JE) as coordinator or trainee. One examiner was of Moroccan descent and one of Surinam descent. A total of 105 OPSTAP(JE) children were tested. We have limited the presentation of the results to those Surinam, Moroccan and Turkish children, whose parents were both born outside the Netherlands (N=90). A good comparison can be made between these groups and similar immigrant groups discussed previously that have not, as far as we know, participated in an intervention program.

The percentage of boys in both the OPSTAP(JE) group and the immigrant comparison group was 53%. The age varied from two to seven years and had a mean of 5;0 years (sd=1;4 years). The number of Surinam children was 33; the number of Moroccan children was 22 and the number of Turkish children was 35.

In table 8.4 the mean scores are presented of the OPSTAP(JE) children, of the immigrant children from the comparison group and of the native Dutch children from the standardization research. The mean score of the 90 OPSTAP(JE) children was 102.8. This was two points higher than the mean score of the native Dutch children. However, the difference was not significant. The mean score of the OPSTAP(JE) children was 12.5 points higher than that of the immigrant children from the comparison group. The difference according to country of birth was largest for Moroccan children and least for Surinam children. A variance analysis carried out with country of birth and participation in OPSTAP(JE) as factors, showed that neither the interaction effect nor the main effect for country of birth was significant. However, the main effect for participation in OPSTAP(JE) was highly significant (F[1,167]=33.77, p<.01).

Table 8.4
Mean IQ Scores of Surinam, Turkish and Moroccan Children Who Had Participated in the OPSTAP(JE) Project

                         OPSTAP(JE)             Comparison group
                         immigrant              immigrant                Native Dutch

Country of birth
of parents               N    Mean  (SD)        N    Mean  (SD)          N    Mean  (SD)

Surinam                 33    98.4 (17.6)      36    90.3 (15.0)         –
Morocco                 22   106.5 (11.3)      25    88.9 (10.1)         –
Turkey                  35   104.7 (11.1)      22    91.8 (14.0)         –
The Netherlands          –                      –                      969   100.7 (14.9)

Total                   90   102.8 (14.2)      83    90.3 (13.3)       969   100.7 (14.9)




The possibility exists that factors other than participation in OPSTAP(JE) contributed to these differences, such as, for instance, the SES level of the parents. A selection effect may have occurred in the decision for parents to participate in OPSTAP(JE), or when parents agreed to participate in this research. Another difference is that the test was administered at home in the OPSTAP(JE) research, and at school in most of the other research projects. The ethnic background of the examiners appeared to have had no influence. The scores of the children who were tested by the two immigrant examiners were on average two points lower than the scores of the children who were tested by the two Dutch examiners. Furthermore, the scores of the children who were tested by an examiner from their own ethnic group were no higher than those of the other children. What could, of course, have played a role is that all four examiners had a great deal of experience with immigrant children and were therefore well able to motivate and stimulate the children. In order to be able to give an unambiguous evaluation of the effect of OPSTAP(JE), research needs to be done with a pretest, post-test, control group design, with the examiner as variable to be controlled for.



9 RELATIONSHIP WITH COGNITIVE TESTS

Within the framework of the validation of the test, the relationship between the IQ scores on the SON-R 2½-7 and the performances on a large number of cognitive tests was examined. The validation measures, here referred to as criterion tests, were mostly general development and intelligence tests like the BOS 2-30, the Stutsman, the GOS 2½-4½, the LDT, the RAKIT, various versions of the Wechsler tests, the BAS, the MSCA and the TONI-2, and tests for language development and verbal intelligence like the Reynell Test and the Schlichting Test, the TvK, the PPVT-R and the PLS-3. More specific tests were also administered, including a memory test (TOMAL) and a test for visual perception (DTVP-2). In the text the tests will be described and the acronyms explained.

The administration of the SON-R 2½-7 and the criterion tests was carried out within the framework of a number of different research projects. In sections 9.1 through 9.7 the results of each research project are described. These projects were:
1. The nationwide standardization research.
2. Research in the Netherlands on pupils at second year kindergarten level (± 5-6 years) at primary schools.
3. Research in the Netherlands at OVB-schools. These are schools with a policy of educational priority in certain areas designated as low SES areas.
4. The Dutch research at schools and institutes for children with special problems and handicaps.
5. Research in Australia on non-handicapped children and on children with impaired hearing or a developmental delay.
6. Research in the United States of America on children in regular education.
7. Research in Great Britain on children without specific problems, children with learning problems and children growing up bilingually.

Table 9.1 presents the tests that were used in the different research projects. In a number of cases only some sections of the criterion test were administered. In order to be able to compare the correlations of the research projects better, they have all been corrected for the dispersion of the IQ scores of the SON-R 2½-7 (Guilford & Fruchter, 1978, p. 325). This correction is not comparable to the correction for attenuation by which correlations are systematically strengthened. When correcting for dispersion, the correlations are strengthened if the standard deviation of the SON-IQ in the research group is smaller than 15, and they are weakened if the standard deviation is larger than 15. As an example we will give a few corrected correlations for an observed correlation of .60. This becomes: .65 (sd=13); .63 (sd=14); .58 (sd=16) or .55 (sd=17).
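For readers who wish to apply the same correction, a minimal sketch is given below. It assumes the standard correction for (restriction of) range with the norm-population standard deviation of 15 as reference; this assumption reproduces the example values above, but the authors' exact computation should be taken from Guilford & Fruchter (1978).

    import math

    def correct_for_dispersion(r, sd_group, sd_ref=15.0):
        """Correct an observed correlation for the dispersion of the SON-IQ in the
        research group: strengthened if sd_group < sd_ref, weakened if it is larger."""
        k = sd_ref / sd_group
        return r * k / math.sqrt(1 + r * r * (k * k - 1))

    for sd in (13, 14, 16, 17):
        print(sd, round(correct_for_dispersion(.60, sd), 2))
    # yields .65, .63, .58 and .55, matching the examples in the text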

In section 9.8 a summary is presented of the correlations between the SON-IQ and the criterion tests that are discussed in this chapter. A distinction is made between general intelligence tests, nonverbal cognitive tests, and language and verbal intelligence tests. Approximately half of the correlations with general intelligence tests ranged from .59 to .70. With nonverbal cognitive tests they ranged from .59 to .75, and with verbal (intelligence) measures half of the correlations ranged from .45 to .54.

Section 9.9 examines whether important differences were found between the correlations of the Performance Scale and the Reasoning Scale with the criterion tests, and whether these differences were systematic. When differences were found, the Performance Scale of the SON-R 2½-7 had a relatively strong correlation with the performance part of other intelligence tests and with visual perception, whereas the Reasoning Scale had a strong correlation with the verbal part of other intelligence tests and with language comprehension.




In section 9.10 the differences between the mean scores of the SON-IQ and mean total scores of the criterion tests are presented. The problems that occur when making these comparisons are also examined. Large differences between standardized scores may occur as a result of norms becoming obsolete, or as a result of differences in populations used for standardization. If obsolescence of the norms was not taken into account, the scores on the SON-R 2½-7 were generally lower than on the other tests. If scores on the other tests were corrected for obsolescence, the mean score of the SON-IQ corresponded with the corrected American and English norms. The scores on the SON-R 2½-7 were relatively high in comparison to the corrected Dutch test scores. However, the scores corresponded well with the most recently standardized test in the Netherlands, the GOS.

Because of the amount of research described in this chapter, it may be easier for the reader to read the summarizing sections 9.8 through 9.10 first, and then the separate descriptions of each research project.

Finally, in section 9.11 a comparison is made between the relationship of the SON-R 2½-7 and the criterion tests, using a number of external variables. These are the ‘testability’ of the child, the correlation with SES level and native country of the parents, and external assessments of intelligence and language skills.

The results of the research described in this chapter can clarify the extent to which the scores on the SON-R 2½-7 are comparable with the scores on other intelligence tests, and give insight into the relationships between the nonverbal measure of intelligence provided by the SON-R 2½-7 and other aspects of cognitive development such as language skill, memory and perception. In chapter 10, the results of this correlational research are worked out in more detail together with the results of the previous chapters. In chapter 10, attention is focussed especially on the implications of the research results for the use of the test in practice.

Table 9.1
Overview of the Criterion Tests Used and the Number of Children to Whom Each Test Was Administered

                          Netherlands
Criterion            Standardiz.  Primary  ‘OVB’    Special
Test                 research     school   school   groups   Australia   USA   GB

P-SON/SON-R 5½-17       119          –        –       206        –         –     –
BOS 2-30 (BSID)          50          –        –        26        –         –     –
GOS 2½-4½/K-ABC         115          –        –         –        –        31     –
RAKIT                   165          –       73        70        –         –     –
WPPSI(-R)/WISC-R          –          –       41       112      155        75     –
LDT                       –          –       73        80        –         –     –
Stutsman                  –          –        –        42        –         –     –
MSCA (MOS)                –          –        –         –        –        26     –
BAS                       –          –        –         –        –         –    58
TONI-2                    –        153        –         –        –         –     –

DTVP-2                    –        153        –         –        –         –     –
TOMAL                     –        153        –         –        –         –     –

Reynell (TB)            558          –        –       179        –         –     –
Schlichting (ZO/WO)     558          –        –         –        –         –     –
TvK                     108          –        –        49        –         –     –
PPVT-R (Peabody)          –          –        –         –        –        29     –
PLS-3                     –          –        –         –        –        47     –



9.1 CORRELATION WITH COGNITIVE TESTS IN THE STANDARDIZATION RESEARCH

The design and execution of the standardization research of the SON-R 2½-7 was carried out in collaboration with the project group responsible for the standardization of the Reynell Test for Language Comprehension and the Schlichting Test for Language Production. Approximately half the children participating in the SON-R 2½-7 standardization research had completed the Reynell and the Schlichting Test half a year earlier. For a small number of children the interval was a year. Between the administration of the language tests and the SON-R 2½-7, a criterion test was administered to many of the children as part of the process of validating both the SON-R 2½-7 and the Reynell/Schlichting Test. As extra criterion test, the BOS, the GOS, the RAKIT or the TvK was used. In the case of many of the children who had not been tested previously with another test, either the GOS or the RAKIT was administered approximately three months after administration of the SON-R 2½-7, or the children were tested again with either the SON-R 2½-7 or the SON-R 5½-17.

Table 9.2
Characteristics of the Children to Whom a Criterion Test Was Administered in the Standardization Research

                  Retest
                  SON-R   SON-R                              Reynell-Schlichting
                  2½-7    5½-17   BOS   GOS   RAKIT    LC/SD/WD   Lexi   AM    TvK

Total              141     119     50    115    165       558       56    241   108

Age group

2;3 years           12       –     28     11      –        41       47      –     –
2;9 years           13       –     22     19      –        51        9      –     –
3;3 years           12       –      –     19      –        54        –     24     –
3;9 years            9       –      –     25      –        57        –     53     –
4;3 years           21       –      –     26     14        54        –     50    11
4;9 years           30       –      –     14     23        55        –     55    16
5;3 years            7      22      –      1     25        64        –     59    18
5;9 years           12      23      –      –     27        66        –      –    17
6;3 years            9      23      –      –     25        58        –      –    16
6;9 years            8      27      –      –     27        58        –      –    15
7;3 years            8      24      –      –     24         –        –      –    15

Sex

Boys                72      63     24     59     80       269       23    117    50
Girls               69      56     26     56     85       289       33    124    58

SES Index

Mean               5.3     4.1    4.9    5.5    4.9       4.9      4.5    5.0   4.4
(SD)              (2.7)   (2.2)  (2.7)  (2.7)  (3.0)     (2.5)    (2.3)  (2.5) (2.3)

Country of birth

Native Dutch       84%     83%    94%    90%    87%       92%      96%    93%   90%
Mixed               6%      7%     2%     6%     4%        5%       4%     4%    5%
Immigrant           9%     10%     4%     4%     9%        3%        –     3%    5%

– the age group is the age at the time of administration of the SON-R 2½-7





In table 9.2 the background of the children to whom a criterion test was administered is presented. The age groups refer to the age at which the SON-R 2½-7 was administered. The results are presented in table 9.3. The age in this table is based on the mean age at administration of the SON-R 2½-7 and the criterion test. The interval (in months) is the period between the administration of the tests. The results of the research are discussed for each test.

SON-R 2½-7
To determine the stability of the test results, the SON-R 2½-7 was administered a second time to 141 children after a delay of three to four months. The results were presented in section 5.5. They will be discussed briefly here as they may serve as basis for the assessment of the correlations of the SON-R 2½-7 with other tests.

Table 9.3
Correlations with Other Tests in the Standardization Research

                                           Scores                            Age        Interval
                                           Criterion       SON-R 2½-7        (years)    (months)
Criterion Test               N     r       Mean  (SD)      Mean  (SD)        Mean (SD)  Mean (SD)

SON-R 2½-7
 IQ-score on retest         141   .79     109.4 (14.7)    103.4 (13.7)      4.7 (1.4)   3.5 (0.7)

SON-R 5½-17
 Standard IQ                119   .76     103.6 (12.2)     98.2 (12.6)      6.4 (0.7)   3.6 (0.7)

BOS 2-30
 Mental Scale                50   .59     100.5 (17.7)    103.0 (15.9)      2.4 (0.3)   2.7 (0.7)
 Nonverbal Scale                  .53      98.6 (17.3)

GOS 2½-4½
 Cognitive DI               115   .65     104.4 (15.7)    102.9 (15.5)      3.6 (0.7)   3.2 (0.7)
 Simultaneous DI                  .63     102.8 (17.4)
 Sequential DI                    .49     105.1 (13.0)

RAKIT
 Shortened version IQ       165   .60     102.2 (15.6)    102.4 (14.6)      5.8 (1.0)   3.0 (0.7)

REYNELL/SCHLICHTING
 Mean LC, SD and WD         558   .48     100.8 (12.8)    101.4 (15.4)      4.4 (1.4)   6.3 (1.4)
 Lang. comprehension (LC)         .46     101.1 (15.0)
 Sentence Developm. (SD)          .35     100.3 (14.2)
 Word Development (WD)            .45     100.9 (14.8)

 Auditive Memory            241   .27     100.6 (14.3)    101.9 (15.8)      4.1 (0.7)   6.1 (1.0)

 Lexilist                    56   .54     102.4 (15.8)    101.7 (16.5)      2.0 (0.1)   6.9 (2.5)

TvK
 Mean of 5 subtests         108   .59       4.7 ( 1.6)    101.2 (15.4)      5.7 (1.0)   3.0 (0.8)

– the correlations have been corrected for the variance of the SON-IQ
– the age is the mean age at the time of administration of the SON-R 2½-7 and the criterion test



The age at the first administration ranged from two to seven years with a mean of 4;6 years. The correlation between the IQ scores was .79. This correlation increased slightly with age. With children up to 4;6 years (N=67), the correlation was .78 and with older children the correlation was .81 (N=74). On the basis of these retest correlations, correlations with criterion tests were not expected to exceed .80 if the period between administrations was a few months or more.

SON-R 5½-17
After a delay of at least three months, the SON-R 5½-17 was administered to 119 children 5 years and older (mean age was 6;3 years). The more difficult items of the subtests Mosaics, Categories, Analogies and Situations of the SON-R 2½-7 are very similar in content to the easier items of these subtests of the SON-R 5½-17. The subtest Puzzles does not have an equivalent in the SON-R 5½-17. Two new subtests in this test are Stories and Hidden Pictures. Both tests have a subtest Patterns; however, the subtests differ in content.

The correlation between the IQ scores of the two tests was .76. The correlation was as high with children younger than 6;6 years (N=68; r=.75) as with the older children (N=51; r=.75). As was the case with the retest of the SON-R 2½-7, there was a noticeable learning effect with the administration of the SON-R 5½-17. The mean scores were more than 5 points higher with the SON-R 5½-17.

BOS 2-30
The BOS 2-30 (Bayley Developmental Scales; Van der Meulen & Smrkovsky, 1983) is a test for the mental and motor development of children in the age range from two to thirty months. This test is the Dutch version of the Bayley Scales of Infant Development (BSID; Bayley, 1969). A developmental index is calculated for the Mental Scale and the Motor Scale with a mean of 100 and a standard deviation of 16. A nonverbal score for the Mental Scale can be determined by excluding the items with a verbal content in the scoring (Van der Meulen & Smrkovsky, 1987; Le Coultre-Martin et al., 1988).

Fifty children (24 boys and 26 girls) were tested. In the case of 47 children, both parents were born in the Netherlands. The SES level corresponded to that of the norm group. The mean age at the time of administration of the BOS was 2;3 years. The SON-R 2½-7 was administered two to four months later. The administration of the BOS was limited to the Mental Scale.

The correlation of the developmental index of the Mental Scale of the BOS with the SON-IQ was .59. The correlation of the Nonverbal Scale of the BOS with the SON-IQ was slightly lower (r=.53). On average, the children scored more than two points lower on the BOS than on the SON-R 2½-7.

GOS 2½-4½
The GOS 2½-4½ (Groningen Developmental Scales; Neutel, Van der Meulen & Lutje Spelberg, 1996) is the Dutch version, for children from 2½ to 4½ years, of the Kaufman Assessment Battery for Children (K-ABC; Kaufman & Kaufman, 1983). Two new subtests were added to the GOS (Motor Skills and Copying Figures). In contrast to the K-ABC, the GOS does not make a distinction between a Mental Scale and an Achievement Scale. The number of subtests administered is 9, 11 or 13, depending on age. The total of all subtests forms the Cognitive Scale. Furthermore, the subtests are subdivided into a Simultaneous Scale and a Sequential Scale. Three subtests from the Achievement Scale of the K-ABC have been added to the Simultaneous Scale. The subtest Arithmetic and the two new subtests are part of the Sequential Scale. The three developmental indexes have a mean of 100 and a standard deviation of 15.

The GOS was administered to 115 children (59 boys and 56 girls). The mean SES index was 5.5. In the case of 103 children, both parents were born in the Netherlands. The period between the administration of the tests was on average three months. The GOS was administered first to 64 children, and the SON-R 2½-7 was administered first to 51 children. The mean age at the time of administration of the tests was 3;7 years.

The correlation between the Cognitive Developmental Scale of the GOS and the SON-IQ was .65. The mean and the standard deviation of both tests were very similar. The correlation for three age groups, based on the age at the time of administration of the GOS, was .64 (younger than 3;2 years; N=39), .62 (age between 3;2 and 4;2 years; N=51) and .77 (older than 4;2 years; N=25).





In the entire group the correlation between the Simultaneous Scale and the SON-IQ was .63, and between the Sequential Scale and the SON-IQ .49. The dispersion of the Simultaneous Scale (sd=17.4) was significantly larger than that of the Sequential Scale (sd=13.0). When the correlations were corrected not for the standard deviation of the SON-IQ, but instead for the standard deviation of the two subscales of the GOS, the correlation of the Simultaneous Scale with the SON-IQ was .59 and the correlation of the Sequential Scale with the SON-IQ was .56.

RAKIT
The RAKIT (Revision of the Amsterdam Intelligence Test for Children; Bleichrodt, Drenth, Zaal & Resing, 1984) is a general intelligence test, developed in the Netherlands, for children in the age range four to eleven years. There are twelve subtests which tap spatial-perceptual as well as verbal abilities. In the age range six to ten years, the RAKIT IQ has a correlation of .81 with the IQ score on the WISC-R (Bleichrodt, Resing, Drenth & Zaal, 1987). In our research project the shortened version of the RAKIT, five or six subtests, depending on age, was administered. The IQ score of the shortened RAKIT has a mean of 100 and a standard deviation of 15.

Research was done with 165 children (80 boys and 85 girls). The mean SES index was 4.9. Thirteen percent of the children had one or both parents born outside the Netherlands. The RAKIT was administered first to 111 children and the SON-R 2½-7 was administered first to 54 children. The mean interval between the two administrations was three months. The age at the time of administration was on average 5;10 years.

The correlation between the SON-IQ and the shortened version RAKIT IQ was .60. The mean and the dispersion of both tests corresponded well. Three age groups were distinguished on the basis of the combination of RAKIT subtests administered. In the first group (mean age 4;8 years at the time of administration of the RAKIT; N=53) the correlation was .50. In the second group (mean age 5;8 years; N=48) the correlation was .62. In the oldest age group (mean age 6;10 years; N=64) the correlation between the SON-IQ and the RAKIT IQ was .65.

Reynell Test and Schlichting Test
The Reynell Test for Language Comprehension (Van Eldik et al., 1995) is the recently completed Dutch revision of the language comprehension section of the Reynell Developmental Language Scales (Reynell, 1985). The test provides one standardized score for receptive language development. The Schlichting Test for Language Production (Schlichting et al., 1995) is a newly developed test for expressive language development. The test provides standardized scores for Sentence Development and Word Development for children in the age range 1;8 to 6;3 years. For the age range 2;9 to 4;9 the test has an Auditive Memory section (repeating series of words). For the age of 1;9 years standardized scores are calculated for the Lexilist, a list of words and sentences from early language development completed by the parents.

The Reynell Test and the Schlichting Test were standardized on the same population. Both tests were administered in one session. The standardized scores have a mean of 100 and a standard deviation of 15.

Half a year, or in a few cases one year, after the Reynell/Schlichting Test, the SON-R 2½-7 was administered to 558 children (269 boys and 289 girls). The interval between the administration of the language tests and the SON-R 2½-7 was, on average, 6.3 months. The mean age at the time of administration of the SON-R 2½-7 was 4;7 years. The SES index had a mean of 4.9. In the case of 92% of the children both parents were born in the Netherlands.

The correlation of the SON-IQ with the Language Comprehension score on the Reynell Test was .46; the correlations with Sentence Development and Word Development of the Schlichting Test were .35 and .45 respectively. The correlation of the SON-IQ with the mean score on Language Comprehension, Sentence Development and Word Development was .48. This correlation increased with age. Depending on the age at the time of administration of the Reynell/Schlichting Test, the correlation for the one- and two-year-olds was .40 (N=153), for the three-year-olds .36 (N=124), for the four-year-olds .51 (N=119), for the five-year-olds .52 (N=124) and for the six-year-olds .72 (N=58).




The correlation between the SON-IQ and the score on the Lexilist was .54 (N=56). The age at the time of administration of the Lexilist was 1;9 years. The mean age at the time of administration of the SON-R 2½-7 was 2;4 years.

The correlation of the SON-IQ with the Auditive Memory section of the Schlichting Test was .27 (N=241). For children less than four years at the time of administration of the Schlichting Test, the correlation was .25 (N=127) and for the older children the correlation was .28 (N=114).

TvK
The TvK (Language Tests for Children; Van Bon, 1982) is a test battery consisting of ten tests for receptive and productive language development in children in the age range four to ten years, developed in the Netherlands. The TvK is an adaptation of the Illinois Test of Psycholinguistic Abilities (ITPA). During the research two receptive tests (Choice of Sentence Structure and the Choice of Vocabulary) and three productive tests (Word Form Production, Sentence Structure Production and Vocabulary Production) were administered. The scaled scores of the tests have a mean of 5 and a standard deviation of 2.

The TvK was administered to 108 children (50 boys and 58 girls). In the case of 97 children, both parents were born in the Netherlands. The SES index had a mean of 4.4. The age at the time of administration of the TvK was on average 5;6 years. The SON-R 2½-7 was administered on average three months later.

The correlations of the SON-IQ with the subtests of the TvK ranged from .39 (Choice of Sentence Structure) to .52 (Choice of Vocabulary). The correlation of the SON-IQ with the mean score on the five subtests of the TvK was .59. For the younger children (age at the time of administration of the TvK less than 5;6 years; N=53) the correlation was .50; for the older children the correlation was .68 (N=55).

9.2 CORRELATION WITH NONVERBAL TESTS IN PRIMARY EDUCATION

Within the framework of a research project carried out by psychology students, the SON-R 2½-7 was administered to pupils in the second year of kindergarten (approximately 5-6 years of age) at six primary schools (Van den Berg et al., 1994; Driesens et al., 1994; Elsjan et al., 1994). Three other nonverbal tests were also administered: the TONI-2, the nonverbal section of the TOMAL and the DTVP-2.

The TONI-2 (Brown, Sherbenou & Johnsen, 1990) is the revision of the Test of Nonverbal Intelligence (TONI; Brown, Sherbenou & Johnsen, 1982). The test has only one section and can be administered in approximately 15 minutes. The test consists of multiple choice items in which the relationship between abstract figures must be discovered. Two parallel versions of the test are available. In this research Form A was used. The TONI-2 has been standardized in the United States of America for the age range from 5 to 86 years.

The TOMAL (Test of Memory and Learning; Reynolds & Bigler, 1994) is a battery of memory tests, developed and standardized in America. The standard battery consists of five verbal and five nonverbal subtests. There are also four supplementary subtests. During this research, the administration of the TOMAL was limited to the five nonverbal subtests which, together, provide the score on the Nonverbal Memory Index. The test has been standardized for the age range from five to nine years.

The DTVP-2 (Hammill, Pearson & Voress, 1993) is the recent American revision of the Marianne Frostig Developmental Test of Visual Perception (Frostig, Lefever & Whittlesey, 1966). The original test was published in the Netherlands as the ‘Test voor Visuele Waarneming’ (Van den Akker & Van Boecop, 1976). The DTVP-2 consists of eight subtests. Besides the total score for General Visual Perception, separate total scores can be calculated for Motor-Reduced Visual Perception and Visual-Motor Integration, each of which is based on four subtests. The test has been standardized for the age range from four to eleven years.




The testing materials did not have to be adapted for the research in the Netherlands. The directions of the TOMAL and the DTVP-2 were translated. The directions of the TONI-2 are given nonverbally. American norms were used in the research. The standardized total scores have a mean of 100 and a standard deviation of 15. The norms for the TONI-2 and the TOMAL are given for each year of age and are therefore very rough for the young age groups. The standardized scores for the age in months were therefore calculated by interpolation and extrapolation.
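By way of illustration, such a month-wise standard score can be obtained with linear interpolation between the year-level norm points, extrapolating linearly beyond the outermost anchors. The anchor ages and scores in the sketch below are hypothetical; only the procedure, not the actual norm values, is shown.

    import numpy as np

    def score_for_age_in_months(age_m, anchor_m, anchor_s):
        """Linear interpolation between year-level norm anchors, with linear
        extrapolation below the first and above the last anchor."""
        anchor_m, anchor_s = np.asarray(anchor_m, float), np.asarray(anchor_s, float)
        if age_m <= anchor_m[0]:
            slope = (anchor_s[1] - anchor_s[0]) / (anchor_m[1] - anchor_m[0])
            return anchor_s[0] + slope * (age_m - anchor_m[0])
        if age_m >= anchor_m[-1]:
            slope = (anchor_s[-1] - anchor_s[-2]) / (anchor_m[-1] - anchor_m[-2])
            return anchor_s[-1] + slope * (age_m - anchor_m[-1])
        return float(np.interp(age_m, anchor_m, anchor_s))

    # Hypothetical year-level standard scores anchored at ages 5;6, 6;6 and 7;6 (in months):
    print(score_for_age_in_months(71, [66, 78, 90], [103.0, 100.0, 97.0]))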

The research was carried out on 153 children (64 boys and 89 girls). The mean age of the children was 5;10 years with a standard deviation of 5 months. The SES index had a mean of 6.6 (sd=3.0) and was clearly higher than the mean of the norm group. The percentage of native Dutch children was 86%.

All four tests were administered to the children in three sessions at school. The administration of the TONI-2 and the TOMAL was combined, the TONI-2 being administered first. The sequence of administration of the SON-R 2½-7, the TONI/TOMAL and the DTVP-2 varied. The mean interval between the administration of the SON-R 2½-7 and one of the other tests was 21 days with a standard deviation of 12 days.

The mean scores and the correlations between the SON-IQ and the total scores on the other tests are presented in table 9.4. The correlation with the IQ score on the TONI-2 was .51; the correlation with the Nonverbal Memory Index of the TOMAL was .45. The highest correlation, .73, was found with the total score on the DTVP-2. The correlation with the tasks that do not require motor skills was somewhat stronger (r=.70) than the correlation with the visual motor tasks (r=.66).

9.3 CORRELATION WITH COGNITIVE TESTS AT OVB-SCHOOLS

In the framework of research on the effect of a policy for the stimulation of children from socio-economic and cultural groups with educational delays (Bollen, 1991, 1996; Rekveld, 1994), a group of children was tested in 1991 with a number of subtests of the LDT and the RAKIT. Two years later, in 1993, the tests were administered for a second time, together with the SON-R 2½-7. After a further two years, the WISC-R and two tests for reading skills were administered to a number of these children. The first test administration, in 1991, took place at the beginning of the school year with children in the first year of kindergarten (approximately 4-5 years of age) at four different OVB-schools. These are regular primary schools which have been given educational priority because of the large number of native Dutch children with a low SES level and/or the large number of immigrant children.

Table 9.4
Correlations with Nonverbal Cognitive Tests in the Second Year of Kindergarten, 5 to 6 Years of Age (N=153)

                                                   Correlation
Test         Score                                 with SON-IQ    Mean  (SD)

SON-R 2½-7   IQ                                                  102.4 (15.8)

TONI-2       Form A                                    .51       103.5 (14.1)

TOMAL        Nonverbal Memory Index                    .45        97.5 (11.7)

DTVP-2       General Visual Perception                 .73       109.2 (14.4)
             Motor-Reduced Visual Perception           .70       100.8 (13.8)
             Visual Motor Integration                  .66       116.7 (15.7)

– the correlations have been corrected for the variance of the SON-IQ



At the time of administration of the SON-R 2½-7 and the second administration of the LDT and the RAKIT, most of the children were in first grade (approximately 6-7 years of age). The mean age at the time of administration of the SON-R 2½-7 was 6;8 years with a standard deviation of four months. The SES index of the children (34 boys and 39 girls) had a mean of 2.0 (sd=1.9). The SES level of 75% of the children was low, 16% were below average and 9% were above average or high. Approximately half the children had one or both parents born outside the Netherlands, mainly in Surinam or the Antilles.

The LDT (Leiden Diagnostic Test; Schroots & Van Alphen de Veer, 1976) is a general intelligence test for children in the age range four to eight years. The test has eight subtests, some taken from other tests. During the research the performance subtest Block Patterns and three verbal subtests (Repeating Sentences, Questions about a Story and Comprehension and Insight) were administered. The standardized subtest scores have a mean of 100 and a standard deviation of 15. Four verbal subtests from the RAKIT (Bleichrodt et al., 1984) were administered (Meaning of Words, Learning Names, Production of Ideas and Story Pictures). These are all components of the verbal learning and fluency factor. The standardized subtest scores have a mean of 15 and a standard deviation of 5. The WISC-R is the Dutch edition (Van Haasen et al., 1986) of the American test with the same name (Wechsler, 1974). The scores for the performance IQ (PIQ), the verbal IQ (VIQ) and the total IQ (FSIQ) have a mean of 100 and a standard deviation of 15. Two reading tests that had been developed by the CITO (Central Institute for Test Development) were administered in the school year 1995/96. These were the Cito Three-Minute-Test for the level of Technical Reading and the Cito Test for Textual Reading.

The subtests of the LDT and the RAKIT were administered in 1991 (N=69) and in 1993 (N=73) in one session. The period between administration of the LDT/RAKIT and the SON-R 2½-7 varied in 1993 from several days up to several weeks. The WISC-R and the Test for Technical Reading were administered at the beginning of the school year '95/'96 (N=41); the Test for Textual Reading was administered later that year (N=35).


Table 9.5
Correlations with Cognitive Tests Completed by Children at Low SES Schools Given Educational Priority (OVB-Schools)

                                                         Crit. Test        SON-R 2½-7
Year adm.  Criterion Test                   N     r      Mean  (SD)        Mean  (SD)

'91        LDT       Block patterns        69    .54     99.7 (13.7)       92.7 (15.4)
                     Mean 3 verbal tests         .44     96.3 (11.0)
           RAKIT     Mean 4 verbal tests         .61     12.7 ( 3.6)

'93        LDT       Block patterns        73    .66     95.8 (13.7)       92.0 (15.0)
                     Mean 3 verbal tests         .54     98.6 (10.8)
           RAKIT     Mean 4 verbal tests         .42     13.5 ( 3.5)

'95        WISC-R    Total IQ              41    .74     90.5 (13.1)       92.2 (14.1)
                     Performance IQ              .73     91.5 (12.8)
                     Verbal IQ                   .60     91.2 (13.6)
           CITO-test Technical reading           .38     39.0 (21.6)
                     Textual reading             .52     16.4 (16.1)

– the correlations have been corrected for the variance of the SON-IQ
– the SON-R 2½-7 was administered in 1993




The correlations of the SON-IQ with the various test scores are presented in table 9.5. The correlation with the performance subtest Block Patterns of the LDT, administered two years earlier, was .54. When administered in the same period as the SON-R 2½-7, the correlation increased to .66. The correlation with the three verbal subtests of the LDT also increased, from .44 to .54. The fact that the strong correlation of the SON-IQ with the four verbal subtests of the RAKIT decreased from .61 to .42 is noteworthy. The two subtests that had weaker correlations with the SON-R 2½-7 in 1993 (Word Meaning and Production of Ideas) also had weaker correlations with the LDT when administered in 1993.

The strongest correlation was found between the SON-IQ and the WISC-R, which was administered two years later. The correlation with the total IQ was .74; the correlations with the PIQ and the VIQ were .73 and .60 respectively. On average the SON-IQ score was slightly higher than the IQ score on the WISC-R.

The correlation of the SON-IQ with the Test for Textual Reading was .52 and the correlation with the Test for Technical Reading was .38.

9.4 CORRELATION WITH COGNITIVE TESTS IN SPECIAL GROUPS

In the framework of the validation research, the SON-R 2½-7 was administered at a number of schools and institutes for children with special problems and handicaps. Information on the children's scores on a number of cognitive tests that are frequently used in these groups was requested from the schools. The tests were the Preschool SON, the SON-R 5½-17, the BOS, the Stutsman, various versions of the Wechsler tests, the LDT and the RAKIT. Furthermore, information was requested concerning two tests for language development, the Reynell and the TvK. If a test had been administered several times, the most recent score was used.

For each criterion test, an overview is presented in table 9.6 of the age distribution of the children at the time of administration of the SON-R 2½-7, of various other background characteristics and of the specific groups the children came from. These groups are described in chapter 7. Most of the children for whom other test results were known were four to six years old at the time the SON-R 2½-7 was administered. The other test data, in the case of the younger children, came mostly from the Preschool SON and the Reynell. In general, the criterion test had been administered earlier, often many years earlier. The percentage of boys was relatively high, and the average SES level was lower than in the norm population.

The correlation of the SON-R 2½-7 with the various criterion tests is presented in table 9.7. The results for each test will be discussed separately.

Preschool SON
In the case of 188 children, the IQ scores were known on the predecessor of the SON-R 2½-7, the Preschool SON. More than half of this group were children with language/speech and hearing problems. Additionally, a large number of children with a general developmental delay and/or a pervasive developmental disorder were tested with the Preschool SON. The IQ scores of the deaf children that were based on the separate standardization for the deaf were transformed into IQ scores based on the standardization for the hearing. Data from the Preschool SON were only used in the analysis if the test had been administered in full.

The mean age at the time of administration of the Preschool SON was 3;10 years; the mean age at the time of administration of the SON-R 2½-7 was 5;3 years. The period between the administration of the tests was, on average, nearly a year and a half. In a few cases the interval was more than four years. In the case of 95% of the children, the SON-R 2½-7 was administered after the Preschool SON.

The correlation between the IQ scores on both tests was .65. This correlation increased greatly as the age at which the Preschool SON was administered increased. In the age group up to 3;5 years the correlation was .57 (N=60); in the age range 3;5 to 4;1 years the correlation was .64 (N=64) and in the age range from 4;1 onwards the correlation was .77 (N=64). The interval between the administration of the two tests may have influenced the increase in the correlations with age. In the youngest group the average interval was 22 months and in the oldest group 10 months. A relatively large difference, 13 IQ points, was found between the mean scores of the two tests. A substantial decrease in IQ scores can be expected in view of the interval of more than 20 years between the two standardizations.





Table 9.6
Characteristics of the Children in the Special Groups to Whom a Criterion Test Was Administered

                       P-SON                      WPPSI
                       SON-R                      WPPSI-R
                       5½-17    BOS   Stutsman    WISC-R    LDT   RAKIT   Reynell   TvK

Total                    206     26      42         112      80     70      179      49

Age

2 years                    –      –       2           –       –      –        2       –
3 years                   21      4       5           2       1      –       39       –
4 years                   63     11      17          16       9      8       47       6
5 years                   73      6       8          42      20     23       60      21
6 years                   41      4       9          46      39     36       31      21
7 years                    8      1       1           6      11      3        –       1

Group

Gen.Dev.Disorder          57      –      12          61      44     26       64       –
Perv.Dev.Disorder         22      –       3           8       7      4       26      12
Speech/lang.Disord.       58      9       –           4      20     24       74      28
Hearing impaired          23     13       –           –       9     16       15       9
Deaf                      46      4      27          39       –      –        –       –

Sex

Boys                     140     11      30          86      55     45      128      33
Girls                     66     15      12          26      25     25       51      16

SES Index

Mean                     4.2    3.9     4.7         4.0     3.4    3.6      3.5     3.7
(SD)                    (2.6)  (2.1)   (2.9)       (2.5)   (2.4)  (2.5)    (2.0)   (2.3)

Country of birth

Native Dutch             89%    95%     90%         94%     91%    97%      92%     96%
Mixed                     7%     5%      5%          5%      4%     2%       5%      2%
Immigrant                 4%      –      5%          1%      5%     2%       2%      2%

– the age is the age at the time of administration of the SON-R 2½-7



SON-R 5½-17
The children at one institute for the deaf were not included in the analysis of the special groups, because of the probability that the low scores of these children were the result of an examiner effect (see section 7.7). Most of these children (N=18) were tested again three years later with the SON-R 5½-17, the revision of the SON for older children. The mean age at the time of administration of the SON-R 2½-7 was 5;5 years. The mean age at the time of administration of the SON-R 5½-17 was 8;6 years. The correlation between the IQ scores was .66.

Table 9.7
Correlations with Criterion Tests in the Special Groups

                                           Scores                           Age        Interval
                                           Criterion      SON-R 2½-7        (years)    (months)
Criterion Test              N     r        Mean  (SD)     Mean  (SD)        Mean (SD)  Mean (SD)

P-SON
 IQ-score                  188   .65       97.9 (16.4)    84.8 (18.2)      4.6 (0.7)   17.0 (12.6)

SON-R 5½-17
 Standard IQ                18   .66      100.2 (20.4)    83.5 (14.5)      7.0 (0.8)   36.8 ( 7.0)

BOS 2-30
 Nonverbal scale            26   .50       95.5 (18.1)    84.1 (13.9)      3.5 (0.5)   35.4 (14.5)

STUTSMAN
 Total IQ                   42   .57      106.7 (21.6)    92.7 (18.7)      4.1 (0.6)   21.1 (15.3)

WPPSI-R
 Performance scale          19   .82      104.2 (15.6)    80.6 (13.2)      5.5 (0.8)    4.7 ( 2.8)

WPPSI
 Performance scale          20   .82      111.4 (12.6)   102.0 (11.2)      5.5 (0.5)   10.7 ( 5.8)

WPPSI
 Total IQ                   53   .60       87.9 (17.4)    83.0 (16.0)      5.5 (0.6)    8.2 ( 8.9)
 Verbal scale                    .49       87.7 (16.4)
 Performance scale               .59       90.3 (18.8)

WISC-R
 Total IQ                   20   .62       85.9 (16.7)    82.1 (14.7)      6.6 (0.4)    2.4 ( 2.7)
 Verbal scale                    .47       91.8 (17.9)
 Performance scale               .76       82.6 (14.9)

LDT
 Total IQ                   80   .58       85.0 (14.8)    81.6 (14.0)      5.9 (0.7)    8.4 ( 6.1)

RAKIT
 (Shortened version) IQ     40   .46       80.0 (16.6)    79.7 (15.0)      5.8 (0.6)    6.5 ( 5.0)

RAKIT
 Mean of 4 subtests         30   .64       14.9 ( 3.2)    93.8 (14.6)      5.9 (0.6)    8.5 ( 5.5)

REYNELL
 Language comp. A          179   .44       –1.4 ( 1.2)    83.0 (17.5)      4.9 (1.0)    5.3 ( 6.2)

TvK
 Mean of 4 subtests         49   .53        3.4 ( 1.5)    86.1 (14.3)      5.9 (0.7)    3.5 ( 2.8)

– the correlations have been corrected for the variance of the SON-IQ
– the age is the mean age at the time of administration of the SON-R 2½-7 and the criterion test



BOS 2-30
The scores of 26 children on the nonverbal developmental index of the BOS 2-30 were known (Bayley Scales of Infant Development; Van der Meulen & Smrkovsky, 1983, 1987). All the children had a language/speech or hearing disorder. The mean age at the time of administration of the BOS was 2;0 years. The administration of the SON-R 2½-7 took place between one and five years later. The period between administrations was, on average, nearly three years. The mean age at the time of administration of the SON-R 2½-7 was 4;11 years. The correlation between the nonverbal developmental index of the BOS and the SON-IQ was .50.

STUTSMAN
The Stutsman Test (Stutsman, 1931) uses toys and utensils. The tasks to be performed are different for each age group. The test was adapted for the Netherlands (Smulders, 1963). However, the old American norms were maintained.

In this investigation, the test was mainly administered to deaf children and children with a general developmental delay. The mean age at the time of administration of the Stutsman was 3;3 years and the mean age at the time of administration of the SON-R 2½-7 was 5;0 years. In 40 of the 42 cases the Stutsman was administered first. The correlation between the IQ scores on the two tests was .57. The norms of the Stutsman are obsolete; the scores have a mean that is 14 points higher than the SON-IQ.

WPPSI-R
The performance scale of the WPPSI-R (Wechsler Preschool and Primary Scale of Intelligence - Revised; Wechsler, 1989) was administered to 19 children at one institute for the deaf. This is the institute that was not taken into account in the analysis of the results of deaf children, because of an examiner effect on the administration of the SON-R 2½-7 (see section 7.7). At the time the WPPSI-R was administered it had not been translated and standardized for the Netherlands. A translation done by the institute was used, and the directions for the performance subtests were adapted for use with deaf children. The scores were based on American norms.

The SON-R 2½-7 was administered first to 8 children and the WPPSI-R was administered first to 11 children. The mean age at the time of administration of the tests was 5;6 years. The interval between the tests was, on average, 5 months. The correlation between the performance IQ of the WPPSI-R and the SON-IQ was .82.

WPPSI
A Dutch manual of the WPPSI (Wechsler, 1967), in which the American norms are used, was published in 1973 (Berger, Creuwels & Peters, 1973). In 1981 a Flemish adaptation of the test was published with Flemish norms (Stinissen & Vander Steene, 1981). The test data for the WPPSI do not always show clearly which directions and norms were used.

In the case of 20 deaf children the administration of the WPPSI was limited to the performance scale. The SON-R 2½-7 was administered first to six children and the WPPSI was administered first to 14 children. The mean age at the time of administration of the WPPSI was 5;3 years and of the SON-R 2½-7 5;8 years. The interval between the tests was, on average, 11 months. The correlation between the WPPSI PIQ and the SON-IQ was .82.

The WPPSI was administered in full to 53 children. These were nearly all children with a developmental disorder. In 70% of the cases the WPPSI was administered first. The mean ages at the time of administration of the WPPSI and the SON-R 2½-7 were 5;3 and 5;9 years respectively. The interval between administration of the tests was, on average, 8 months. The correlation with the total IQ of the WPPSI was .60. The correlations of the SON-IQ with the verbal scale and the performance scale of the WPPSI were .49 and .59 respectively.

WISC-R
In the case of 20 children, scores were available on the WISC-R (Van Haasen et al., 1986), the Dutch language version of the Wechsler Intelligence Scale for Children - Revised (Wechsler, 1974), that has been standardized for the Netherlands. This test was administered mainly to children with a general developmental delay or with a pervasive developmental disorder and to a few children with a speech or language disorder. The SON-R 2½-7 was administered first to 15 children. The mean ages at the time of administration of the SON-R 2½-7 and the WISC-R were 6;6 and 6;8 years respectively. The interval between administration was, on average, a little more than two months.





The correlation with the WISC-R total IQ was .62. With the verbal scale the correlation was .47 and with the performance scale .76. The mean score of the SON-IQ was more than 3 points lower than the WISC-R total IQ. The score on the verbal scale of the WISC-R was 9 points higher than the score on the performance scale; the mean score on the performance scale was practically the same as the SON-IQ.

Correlations with the SON-IQ were also calculated for the combined data of the WPPSI, the WPPSI-R and the WISC-R. As the norms differ, the mean scores of the tests were equated to 0 for each test combination. Subsequently the correlations were calculated for the combined group. Using this procedure, the correlation of the SON-IQ with the performance scale of the Wechsler tests could be calculated for 112 children; this was .69.

In the case of 73 children to whom the WPPSI and the WISC-R were administered in full, the correlation with the total IQ was .62. The correlations of the SON-IQ with the verbal scale and the performance scale for these children were .49 and .63 respectively.
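The pooling procedure just described (equating the mean scores to 0 within each test version before correlating) can be sketched as follows. The data frame layout, column names and values below are hypothetical; only the centring-then-correlating step is illustrated.

    import pandas as pd

    def pooled_correlation(df):
        """Centre both scores within each Wechsler version, so that differences
        between the norms drop out, then correlate the pooled, centred scores."""
        centred = df.groupby("version")[["son_iq", "piq"]].transform(lambda s: s - s.mean())
        return centred["son_iq"].corr(centred["piq"])

    # Tiny illustrative data set (values invented):
    example = pd.DataFrame({
        "version": ["WPPSI", "WPPSI", "WPPSI-R", "WPPSI-R", "WISC-R", "WISC-R"],
        "son_iq":  [95, 105, 80, 90, 85, 100],
        "piq":     [100, 112, 78, 95, 80, 104],
    })
    print(round(pooled_correlation(example), 2))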

LDT
The LDT (Leiden Diagnostic Test; Schroots & Van Alphen de Veer, 1976) consists of eight subtests which tap verbal and performance skills, and memory. The subtests are partially adapted subtests from other tests, including subtests of the WPPSI and the WISC. The test has been standardized for the Netherlands.

The LDT was administered in full to 80 children, most of whom had a general developmental delay or a speech or language disorder. In the case of 53 children, the LDT was administered first. The mean ages at the time of administration of the LDT and the SON-R 2½-7 were 5;7 and 6;1 years respectively. The average interval between the tests was 8 months.

The correlation of the SON-IQ with the LDT IQ was .58. The correlation of the SON-IQ withthe mean score on three performance subtests (Block Patterns, Folding Papers and Copy-Tapping) was .67; the correlation with the mean score on two memory tests (Vocabulary Lengthand Indicating Pictures) was .43 and the correlation with the mean score on three verbal tests(Repeating Sentences, Questions about a Story and Comprehension and Insight) was .20. In thecase of children younger than 5;6 years at the time of administration of the LDT (N=38), thecorrelation with the LDT IQ was .53.; in children older than 5;6 it was .61. The correlation withthe performance tests of the LDT increased with age from .59 to .74.

RAKIT
The administration of the RAKIT (Revision of the Amsterdam Intelligence Test for Children; Bleichrodt et al., 1984) takes so long that usually only a few subtests were administered. The RAKIT was administered to all groups except the deaf children. In the case of 54% of the children the SON-R 2½-7 was administered first.

The shortened version of the RAKIT was administered to 27 children and the test was administered in full to 13 children. In this group of 40 children, the mean ages at the time of administration of the RAKIT and the SON-R 2½-7 were 5;7 and 5;11 years respectively. The period between the administrations was, on average, a good half year. The correlation of the SON-IQ with the RAKIT IQ was .46; the mean scores were practically the same.

In the case of 30 other children, the administration of the RAKIT was limited to the first four subtests (Figure Recognition, Exclusion, Memory and Word Meaning). The mean ages at the time of administration of the RAKIT and the SON-R 2½-7 were 5;10 and 6;0 years respectively. The period between administrations was, on average, a good 8 months. The correlation between the SON-IQ and the mean standard score on the four subtests of the RAKIT was .64.


REYNELL
In these research groups, the scores on the RDLS (Reynell Developmental Language Scales; Reynell, 1977) relate to the Dutch translation by Bomers and Mugge (1985), which uses the old English norms. In most cases only the subtest Language Comprehension A was administered. The standardized scores have a mean of 0 and a standard deviation of 1.

The Reynell Test was administered to 179 children. The mean age at the time of administration of the Reynell was 4;10 years and the mean age at the time of administration of the SON-R 2½-7 was 5;0 years. The interval between tests was on average a little more than 5 months; in 52% of the cases the Reynell was administered first.

The correlation between the score on Language Comprehension and the SON-IQ was .44. In the group of children with a general developmental delay or with a pervasive developmental disorder (N=90), the correlation was .55; in the group of children with a speech or language disorder, or with impaired hearing (N=89), the correlation was .35. A distinction was made in both groups between the children who were younger than five years at the time of administration of the Reynell and the older children. The correlation in the youngest group of children with general or pervasive development problems was .63 (N=54) and in the oldest group the correlation was .46 (N=36). In the group of children with speech or language disorders, or with impaired hearing, the correlation was .24 in the youngest group (N=44) and .49 in the oldest group (N=45).

TvK
The scores of 49 children were known on at least three of the following four subtests of the TvK (Language Tests for Children; Van Bon, 1982): Word-Form Production, Choice of Sentence Structure, Choice of Vocabulary and Vocabulary Production.

The TvK was administered mainly to children with a speech or language disorder. Children with a pervasive developmental disorder and hearing impaired children were also tested. In 84% of the cases the TvK was administered after the SON-R 2½-7. The mean interval between the tests was a little more than three months. The mean age at the time of administration of the SON-R 2½-7 was 5;10 years and the mean age at the time of administration of the TvK was 6;0 years. The standardized scores on the TvK have a mean of 5 and a standard deviation of 2. The correlation of the mean standard score on the subtests of the TvK with the SON-IQ was .53.

9.5 CORRELATION WITH THE WPPSI-R IN AUSTRALIA

Comparative research on the WPPSI-R (Wechsler, 1989) and the SON-R 2½-7 was carried out in Victoria, Australia by Jo Jenkinson at Deakin University, in collaboration with Susan Roberts of the Mental Health Research Institute, Shirley Dennehy of the Advisory Council for Children with Impaired Hearing, and the University of Groningen (Brouwer, Koster & Veenstra, 1995; Jenkinson, Roberts, Dennehy & Tellegen, 1996; Tellegen, 1997). The research was done with a sample of 155 children (72 boys and 83 girls) with a mean age of 4;5 years (standard deviation was 10 months). The sample consisted of children without specific problems and handicaps (control group; N=59), children with impaired hearing, of whom 75% had a hearing loss of at least 60 dB (N=59), and children with a developmental delay (N=37).

The SON-R 2½-7 and the WPPSI-R were administered alternately; the mean interval between tests was 20 days. The SON-R 2½-7 was administered by Dutch students and the WPPSI-R by a school psychologist who was, in most cases, associated with the institute where the research was conducted. Only the performance subtests of the WPPSI-R were administered to hearing disabled children and children with a developmental delay. The entire WPPSI-R was administered to the control group.

The mean scores, and the correlations of the SON-IQ with the performance scale (PIQ), the verbal scale (VIQ) and the total score on the WPPSI-R (FSIQ), are presented in table 9.8. American norms have been used for the WPPSI-R and Dutch norms for the SON-R 2½-7. When calculated over the total group, the correlation with the PIQ was .78; within the different groups this correlation was .74 or .75. The correlation with the verbal IQ was clearly lower in the control group (r=.54). The correlation with the full scale IQ of the WPPSI-R (r=.75) was slightly higher in the control group than the correlation with the PIQ. On average, the scores on the SON-R 2½-7 were five points lower than those of the PIQ.

The mean differences between the groups were very similar for the SON-IQ and the PIQ: the difference between the hearing-disabled group and the control group was 13.3 for the SON-IQ and 10.3 for the PIQ. For the group with a developmental delay the difference was 40.5 for the SON-IQ and 38.5 for the PIQ.

Table 9.8
Correlations with the WPPSI-R in Australia

Mean and Standard Deviation

                      Entire group   Control group   Hearing impairment   Developm. delay
                      (N=155)        (N=59)          (N=59)               (N=37)

SON-IQ                94.2 (22.3)    108.9 (14.5)    95.6 (15.6)          68.4 (19.0)

WPPSI-R   PIQ         99.1 (21.8)    112.2 (13.5)    101.9 (17.2)         73.7 (17.5)
          VIQ         –              109.1 (11.1)    –                    –
          FSIQ        –              112.2 (12.4)    –                    –

Correlation with SON-IQ

                      Entire group   Control group   Hearing impairment   Developm. delay

WPPSI-R   PIQ         .78            .74             .74                  .75
          VIQ         –              .54             –                    –
          FSIQ        –              .75             –                    –

– the correlations have been corrected for the variance of the SON-IQ
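The footnote above (and the same footnote at the other tables in this chapter) refers to the correction for the variance of the SON-IQ in each research group; the exact procedure is not restated here. Purely as an illustration, and not necessarily the exact formula used in the manual, a standard correction of this kind rescales an observed correlation r to the standard deviation of the SON-IQ in the norm population (S = 15), with s the standard deviation observed in the research group:

    \[ r_c \;=\; \frac{r\,(S/s)}{\sqrt{1 - r^2 + r^2\,(S/s)^2}}, \qquad S = 15 \]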

9.6 CORRELATION WITH COGNITIVE TESTS IN WEST VIRGINIA, USA

In the USA, research on the relationship between the SON-R 2½-7 and several cognitive tests has been done by Stephen O’Keefe at the West Virginia Graduate College. Students from the University of Groningen, who administered the SON-R 2½-7 to some of the children, participated in this research (Ten Horn, 1996). The following tests were used: the WPPSI-R (Wechsler, 1989), the K-ABC (Kaufman & Kaufman, 1983), the MSCA (McCarthy Scales of Children’s Abilities; McCarthy, 1972), the PPVT-R (Peabody Picture Vocabulary Test – Revised; Dunn & Dunn, 1981) and the PLS-3 (Preschool Language Scale-3; Zimmerman, Steiner & Pond, 1992).

Most of the children were between four and five years of age, were tested at school in different places in West Virginia, and had no specific handicaps. Nearly all children were Caucasian. The distribution according to sex and age at which the SON-R 2½-7 was administered is presented per test in table 9.9. In most cases, administration of the criterion test and the SON-R 2½-7 took place shortly after each other, sometimes on the same day. The criterion test was usually administered first. The PPVT was not administered within the framework of this research. Previously acquired data were made available by the school.

In table 9.10, the mean scores on the tests and the correlations with the SON-IQ are presented. American norms were used for the American tests and Dutch norms for the SON-R 2½-7. The results are described per test.


WPPSI-R
The WPPSI-R was administered to 75 children whose mean age at the time the SON-R 2½-7 was administered was 5;1 years. The correlation of the SON-IQ with the total IQ (FSIQ) of the WPPSI-R was .59; the correlations with the performance and verbal scales were .60 and .43 respectively. The mean score on the SON-IQ was more than two points lower than the FSIQ and nearly four points lower than the PIQ.

K-ABC
The original American edition of the Kaufman Assessment Battery for Children differs in several respects from the Dutch edition for young children (the GOS 2½-4½). The simultaneous scale of the K-ABC consists of seven parts, and the sequential scale of six parts. However, the number of subtests administered depends on age. The subtests of the simultaneous and sequential scales form the mental scale. A number of subtests of the mental scale, in which no verbal abilities are required, form the nonverbal scale. The mean of all scale scores is 100 and the standard deviation is 15.

The mean age of the 31 children to whom the K-ABC was administered was 4;7 years. The SON-IQ had the highest correlation with the total Mental Score of the K-ABC, r=.66. The correlation with the Sequential Scale (r=.29) was considerably lower than the correlation with the Simultaneous Scale (r=.58). This corresponds with the results of Dutch research with the GOS 2½-4½, but with the K-ABC, as with the GOS, the distribution of scores on the Sequential Scale was considerably narrower than the distribution of scores on the Simultaneous Scale. The correlation with the Achievement Scale was .58 and the correlation with the Nonverbal Score of the mental scale was .61.

MSCA
The McCarthy Scales of Children’s Abilities, published in the Netherlands as the MOS 2½-8½ (Van der Meulen & Smrkovsky, 1986), consists of eighteen subtests. The administration was limited to the subtests of the Verbal Scale, the Perceptual Performance Scale and the Quantitative Scale, which, together, form the General Cognitive Index. The scale scores have a mean of 50 and a standard deviation of 10. The general index has a mean of 100 and a standard deviation of 16.

The test was administered to 26 children with a mean age of 4;7 years. The correlation with the General Cognitive Index of the MSCA was .61. The highest correlation was with the Perceptual Performance Scale (r=.61). The correlation with the Verbal Scale was .48 and the correlation with the Quantitative Scale was .40.

PPVT-R
The Peabody Picture Vocabulary Test requires the child to choose from four pictures the one that best represents the meaning of a word that has been presented verbally. The standard score on the test has a mean of 100 and a standard deviation of 15.

The PPVT-R scores of 29 children to whom the SON-R 2½-7 was administered were made available by the school. The mean age at the time of administration of the SON-R 2½-7 was 5;6 years. The correlation of the Peabody Standard Score with the SON-IQ was .47.

Table 9.9
Age and Sex Distribution of the Children in the American Validation Research

                            Sex             Age at time of admin. of SON-R
Criterion Test     N        Boys   Girls    3 years   4 years   5 years   6 years

WPPSI-R            75       38     37       1         28        45        1
K-ABC              31       16     15       –         31        –         –
MSCA               26       12     14       1         24        1         –
PPVT-R             29       15     14       –         3         25        1
PLS-3              47       26     21       –         47        –         –


PLS-3
The Preschool Language Scale-3 is a test for the receptive and expressive language ability of young children. Separate scores are calculated for Auditory Comprehension and Expressive Communication. Together they form the Total Language Ability Score. The three standardized scores have a mean of 100 and a standard deviation of 15.

The test was administered to 47 children. The mean age at the time of administration of the SON-R 2½-7 was 4;7 years. The correlation with the Total Score of the PLS-3 was .61. The correlation with the Receptive Language Ability (r=.59) was slightly higher than the correlation with the Expressive Language Ability (r=.56).

9.7 CORRELATION WITH THE BAS IN GREAT BRITAIN

A comparative research project on the BAS (British Ability Scales; Elliott, Murray & Pearson, 1979-82) and the SON-R 2½-7 was set up by Julie Dockrell of the Institute of Education, University of London. The research was carried out in collaboration with the University of Groningen. English students administered the BAS and Dutch students the SON-R 2½-7. The tests were administered alternately to 58 children from the first class of different primary schools, with the interval between tests varying from a few days to a few weeks. The first class corresponds to group three in Dutch primary education. The mean age was 6;3 years with a standard deviation of 3 months. The group consisted of 34 boys and 24 girls. The schools selected children belonging to one of the following three groups: the control group (children without specific problems and handicaps, N=20); the ESL group (English as a Second Language, N=22) and the LD-group (Learning Disabled, N=16).

Table 9.10
Correlations with Criterion Tests in the American Research

                                                 Scores                       Age       Interv.
                                                 Criterion      SON-R 2½-7    (years)   (days)
Criterion Test                    N     r        Mean (SD)      Mean (SD)     Mean      Mean

WPPSI-R
  Full Scale IQ                   75    .59      96.8 (13.9)    94.5 (16.6)   5.1       14
  Performance IQ                        .60      98.3 (14.9)
  Verbal IQ                             .43      96.1 (13.0)

K-ABC
  Mental Processing Composite     31    .66      97.3 (16.0)    86.1 (20.9)   4.6       16
  Simultaneous Processing               .58      96.2 (19.3)
  Sequential Processing                 .29      98.3 (13.3)
  Achievement Scale                     .58      96.0 (13.9)
  Nonverbal Scale                       .61      96.5 (15.5)

MSCA
  General Cognitive Index         26    .61      102.3 (19.3)   95.0 (19.1)   4.6       13
  Verbal Scale                          .48      50.8 (13.3)
  Perceptual-Perform. Scale             .61      52.2 (10.2)
  Quantitative Scale                    .40      49.9 (10.9)

PPVT-R
  Standard Score Equivalent       29    .47      95.7 (19.7)    95.5 (15.3)   5.5       –

PLS-3
  Total Language Score            47    .61      102.7 (19.8)   91.4 (18.3)   4.6       5
  Auditory Comprehension                .59      103.6 (18.8)
  Expressive Communication              .56      101.3 (18.6)

– the age is the age at the time of administration of the SON-R 2½-7
– the correlations have been corrected for the variance of the SON-IQ



The shortened version of the BAS was administered. This consists of four subtests (Naming Vocabulary, Digit Recall, Similarities and Matrices), supplemented by two nonverbal subtests (Block Design and Visual Recognition). In addition to the IQ score for the shortened version and the combination of six subtests, the mean scores for the three verbal tests (Naming Vocabulary, Digit Recall and Similarities) and the three nonverbal tests (Matrices, Block Design and Visual Recognition) were also calculated. The IQ scores have a mean of 100 and a standard deviation of 15; the subtest scores have a mean of 50 and a standard deviation of 10.

In table 9.11, the mean scores for the entire group and for the different subgroups are presented, together with the correlations of the scores on the BAS with the SON-IQ. The correlation with the shortened version of the BAS was .80 in the entire group. When the two nonverbal subtests were added to the shortened version of the BAS, the correlation increased to .87. The correlation with the three nonverbal tests (r=.78) was higher than with the three verbal tests (r=.71), but even the latter was high.

Within the three subgroups the correlations of the SON-IQ with the BAS IQ, based on six subtests, and with the nonverbal tests, had comparably high values. In the control group, however, the correlations of the SON-IQ with the shortened version of the BAS, and with the three verbal subtests, were clearly lower than in the other groups.

In the entire group, the IQ scores on the SON-R 2½-7 were, on average, 7 points lower than on the shortened version of the BAS. The difference in IQ scores between the control group and the ESL group was slightly less for the SON-R 2½-7 (20.8 points) than for the shortened form of the BAS (23.9 points). When the two nonverbal tests were added to the BAS IQ, the difference on the BAS between the two groups decreased to 19.5 points. The difference between the control group and the LD group was 40.8 points for the SON-IQ and 42.8 points for the shortened BAS. For the BAS IQ based on six subtests, the difference was 41.3 points.


Table 9.11
Correlations with the BAS in Great Britain

Mean and Standard Deviation

                              Entire group   Control group   English 2nd language   Learning problems
                              (N=58)         (N=20)          (N=22)                 (N=16)

SON-IQ                        83.6 (20.4)    102.7 (14.4)    81.9 (11.2)            61.9 (11.9)

BAS IQ (Shortened vers.)      90.6 (20.0)    111.5 (10.0)    87.6 (11.7)            68.7 ( 9.8)

BAS IQ (6 Subtests)           92.4 (18.8)    111.2 ( 8.7)    91.7 (10.8)            69.9 ( 8.8)
  Mean of 3 verbal tests      44.1 ( 9.4)    54.3 ( 5.6)     41.4 ( 5.6)            35.0 ( 4.2)
  Mean of 3 nonverbal tests   49.2 ( 9.8)    56.4 ( 7.0)     51.1 ( 6.9)            37.6 ( 4.8)

Correlation with the SON-IQ

                              Entire group   Control group   English 2nd language   Learning problems

BAS IQ (Shortened vers.)      .80            .56             .76                    .78

BAS IQ (6 Subtests)           .87            .83             .85                    .87
  Mean of 3 verbal tests      .71            .35             .60                    .73
  Mean of 3 nonverbal tests   .78            .69             .81                    .81

– the correlations have been corrected for the variance of the SON-IQ



9.8 OVERVIEW OF THE CORRELATIONS WITH THE CRITERION TESTS

In table 9.12 an overview is presented of the correlations in the various research projects of the intelligence and (language) development tests with the SON-IQ. A distinction has been made between:
– general intelligence measures, based on verbal as well as performance test sections,
– nonverbal measures, such as scores for performance intelligence, visual perception and nonverbal memory,
– verbal measures, such as the verbal section of the intelligence tests and more specific measures for verbal development and skills.

The 12 correlations with general intelligence measures varied from .54 to .87. The mean of the correlations was .65. Half of the correlations ranged from .59 to .70. Two correlations, with the total IQ of the WPPSI-R (r=.75) and the total IQ of the WISC-R (r=.74), and the correlation with the sum of the six subtests of the BAS (r=.87), were higher than .70.

The 21 correlations with nonverbal (intelligence) measures ranged from .45 to .83 and had a mean of .65. Fifty percent of the correlations ranged from .59 to .75. The correlations that were higher than .75 refer, in two cases, to the performance IQ of the WPPSI-R (.77 and .83), to the correlation with three performance subtests of the BAS (.78), to the correlation with the SON-R 5½-17 (.76) and to the retest with the SON-R 2½-7 (.79). Relatively low correlations were found with the nonverbal version of the BOS (.50 and .53), with the Stutsman (r=.57), the TONI-2 (r=.51) and the nonverbal memory of the TOMAL (r=.45).

The 19 correlations with measures for verbal development and verbal intelligence ranged from .20 to .71 with a mean of .48. Half of the correlations fell between .45 and .54. The verbal sections of intelligence tests (WISC-R, BAS) as well as specific language tests (TvK, PLS-3) had relatively strong correlations.

In a few studies the correlation with the total score on the criterion test could be compared with the correlation on the performance and the verbal scales (as in the case of the WPPSI-R, the WISC-R, the MSCA, the BAS and the LDT). In all these cases the correlation with the performance scale of the criterion test was clearly stronger than with the verbal scale. In the case of the WPPSI-R, the WISC-R and the MSCA the correlation with the total score was almost as high as the correlation with the performance scale of these tests. This corresponds to the findings with the SON-R 5½-17 (Tellegen, 1993). In the case of the BAS the correlation with the total score was stronger than with the performance section as a result of the strong correlation with the verbal section. The correlation with the total score in the case of the LDT, on the other hand, decreased because of the very weak correlation with the verbal section. The two tests for which a nonverbal score could be calculated by leaving out part of the test (BOS 2-30 and K-ABC) had weaker correlations with the SON-IQ.

The correlations that were obtained support the convergent and divergent validity of the SON-R 2½-7. The correlations with general intelligence tests and nonverbal cognitive tests were reasonably strong, whereas the correlations with verbal and memory tests were clearly weaker.

However, the level of the correlations with other intelligence measures was considerably lower than the reliability of the test. This means that important differences may be found between the scores on the SON-R 2½-7 and other intelligence tests. To a great extent this was the result of the young age at which the children were tested and the occasionally very long interval between the administrations of the tests, as well as of differences in the composition of the tests and in the manner of administration between the SON-R 2½-7 and the criterion tests. In general, performance on intelligence tests tends to be less stable as the age at which the children are tested decreases and as the period between the test administrations increases (Bayley, 1949). As described in sections 9.1 and 9.4, on the basis of various analyses, the correlation of the SON-IQ with the criterion tests increased greatly as the age at which the tests were administered increased. The facts that part of the research was done with children who were difficult to test, and that a shortened version of the criterion tests was often administered, are also factors contributing to the weakening of the correlations.


Table 9.12
Overview of the Correlations with the Criterion Tests

Intelligence/Development
Test                Country   Group            N     General     Nonverbal    Verbal      sec.

P-SON               NL        special groups   188               .65 IQ                   9.4
SON-R 2½-7          NL        stand.research   141               .79 IQ                   9.1
SON-R 5½-17         NL        special groups   18                .66 IQ                   9.4
SON-R 5½-17         NL        stand.research   119               .76 IQ                   9.1

Stutsman            NL        special groups   42                .57 IQ                   9.4

TONI-2              NL        prim.education   153               .51 IQ                   9.2

BOS 2-30            NL        stand.research   50    .59 MS      .53 Nonv.                9.1
BOS 2-30            NL        special groups   26                .50 Nonv.                9.4

K-ABC (GOS)         NL        stand.research   115   .65 GCI                              9.1
K-ABC               US        prim.education   31    .66 GCI     .61 Nonv.                9.6

WPPSI/WPPSI-R       NL        special groups   39                .83 PIQ                  9.4
WPPSI/WISC-R        NL        special groups   73    .62 FSIQ    .63 PIQ      .49 VIQ     9.4
WISC-R              NL        OVB-schools      41    .74 FSIQ    .73 PIQ      .60 VIQ     9.3
WPPSI-R             AU        special groups   96                .77 PIQ                  9.5
WPPSI-R             AU        prim.education   59    .75 FSIQ    .74 PIQ      .54 VIQ     9.5
WPPSI-R             US        prim.education   75    .59 FSIQ    .60 PIQ      .43 VIQ     9.6

MSCA                US        prim.education   26    .61         .61          .48         9.6

BAS (shortened)     GB        mixed group      58    .87 (6s)    .78 (3s)     .71 (3s)    9.7

LDT                 NL        OVB-schools      71                .60 BP       .49 (3s)    9.3
LDT                 NL        special groups   80    .58 IQ      .67 (3s)     .20 (3s)    9.4

RAKIT (short)       NL        stand.research   165   .60                                  9.1
RAKIT (short)       NL        special groups   70    .54                                  9.4
RAKIT               NL        OVB-schools      71    .51 (4s)                             9.3

DTVP-2              NL        prim.education   153               .73 GVP                  9.2

TOMAL               NL        prim.education   153               .45 NMI                  9.2

PPVT-R              US        prim.education   29                             .47         9.6

PLS-3               US        prim.education   47                             .61         9.6

TvK                 NL        stand.research   108                            .59 (5s)    9.1
TvK                 NL        special groups   49                             .53 (4s)    9.4

Reynell (old)       NL        special groups   179                            .44 LC      9.4
Reynell (new)       NL        stand.research   558                            .48 LC      9.1

Schlichting         NL        stand.research   558                            .35 SD      9.1
Schlichting         NL        stand.research   558                            .45 WD      9.1
Schlichting         NL        stand.research   56                             .54 Lex     9.1
Schlichting         NL        stand.research   241                            .27 AM      9.1

– the correlations have been corrected for the variance of the SON-IQ
– (3s) signifies score based on 3 subtests
– NL (Netherlands); GB (Great Britain); US (United States of America); AU (Australia)
– sec: the section in which the research has been described



In order to illustrate the occasionally very large discrepancies between the scores on the SON-R 2½-7 and tests that correspond greatly in content, we shall make a further comparison of the differences in scores between the PIQ of the WPPSI-R and the SON-IQ. The comparison is based on the results of the 155 children who were tested in Australia and the 75 children who were tested in West Virginia with the WPPSI-R. In these research projects the interval between the two test administrations was generally limited to a few weeks. In table 9.13 the frequency distribution of the absolute differences between the PIQ and the SON-IQ is presented. These scores were also calculated after first correcting for the difference in means, so that possible discrepancies in standardization of the tests do not play a role; five points were deducted from the PIQ for this.
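The tabulation in table 9.13 can be illustrated with a small sketch (hypothetical names, not code from the manual): it computes the proportion of children per 10-point band of absolute difference, with or without first removing the 5-point mean difference from the PIQ.

    import numpy as np

    def difference_distribution(son_iq, piq, mean_correction=0.0,
                                bins=(0, 10, 20, 30, 40, 50, 60)):
        # absolute SON-IQ vs PIQ differences, optionally after removing the mean difference
        diff = np.abs(np.asarray(son_iq, dtype=float)
                      - (np.asarray(piq, dtype=float) - mean_correction))
        counts, _ = np.histogram(diff, bins=bins)
        return counts / len(diff)          # proportions per 10-point band

    # difference_distribution(son, piq)                       -> the 'no correction' row
    # difference_distribution(son, piq, mean_correction=5.0)  -> the 'correction mean' row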

After correcting for the means, the differences in scores for two thirds of the children were slight (less than 10 points). For a quarter of the children, the differences ranged from 10 to 19 points. In the case of 9% of the children the differences were larger than 20 points and for four of these children the differences were quite extreme, i.e. 30 points or more. The scores of these four children are presented in the second part of table 9.13.

Two children, a boy with impaired hearing tested in Australia and a girl tested in West Virginia, scored substantially lower on the SON-R 2½-7 than on the performance section of the WPPSI-R. In the latter case the performance on the SON-R 2½-7 was possibly influenced negatively by the fact that the child had been tested on the WPPSI-R earlier that day. Two girls, both tested in West Virginia, scored substantially higher on the SON-R 2½-7 than on the WPPSI-R. Neither of these girls functioned well socially and both were difficult to test.

These extreme cases, in which a child performed far below his or her potential on one of the tests, had a strong negative influence on the correlations. In the Australian research the correlation increased from .78 to .80 if the deviating subject was left out of the calculation. In the American research the correlation increased from .60 to .74 if the three children with deviating scores were left out.

The examples show that extreme ‘underperforming’ can occur with the SON-R 2½-7 as well as with the WPPSI-R. This certainly also applies to other intelligence tests. In chapter 10 the significance of this for diagnostic work with young children is examined.

Table 9.13
Difference in Scores between SON-IQ and PIQ of the WPPSI-R (N=230)

Frequency Distribution of the Absolute Difference in Scores

                    0 – 9    10 – 19   20 – 29   30 – 39   40 – 49   50 – 59

No correction       54%      34%       10%       0.4%      0.9%      0.4%
Correction mean     66%      25%       7%        0.4%      1.3%      0%

Children with Difference Score > 30

     Sex    Age    Country   SON-IQ   PIQ          Sex    Age    Country   SON-IQ   PIQ

A    boy    4;4    Aust.     79       130     C    girl   4;11   US        124      86
B    girl   4;8    US        68       110     D    girl   5;1    US        113      71


9.9 DIFFERENCE IN CORRELATIONS BETWEEN THE PERFORMANCE SCALE AND THE REASONING SCALE

In order to evaluate the validity of the distinction between the Performance Scale (SON-PS) and the Reasoning Scale (SON-RS) of the SON-R 2½-7, we examined whether consistent differences were found in the strength of the correlation of these scales with the other tests. The comparison was limited to samples of at least 50 persons. As we are comparing correlations of two scales within the same research group and not correlations between different research groups, the correlations were not corrected for the variance of the test scores.

Table 9.14 presents the correlations of the Performance and the Reasoning Scales with the criterion tests for which a difference of .10 or more was found between the correlations with the total score, or a subscale. These correlations are printed in bold type. The country where the research was carried out, the specific group of children, and the section in which the research is described, are also shown in the table. The correlations with the LDT at the OVB-schools, where the LDT was administered twice, were calculated as the mean of both correlations. In the Australian research with the WPPSI-R, the children with impaired hearing were combined with the children with learning problems when calculating the correlations.


Table 9.14
Correlations of the Performance Scale and the Reasoning Scale with Criterion Tests, for Cases in which the Difference Between Correlations Was Greater Than .10 (printed in bold)

                                              r
Criterion Test                   N      PS      RS      Diff.    Country   Group            sec.

WPPSI/     FSIQ                  73     .54     .58     –.04     NL        special groups   9.4
WISC-R     PIQ                          .59     .53     .06
           VIQ                          .39     .50     –.11

WPPSI-R    FSIQ                  75     .59     .53     .06      US        prim.education   9.6
           PIQ                          .64     .51     .13
           VIQ                          .42     .42     .00

WPPSI-R    FSIQ                  59     .67     .62     .05      AU        control group    9.5
           PIQ                          .77     .53     .24
           VIQ                          .38     .53     –.15

WPPSI-R    PIQ                   96     .90     .74     .16      AU        special groups   9.5

BAS        IQ Shortened vers.    58     .77     .86     –.09     GB        entire group     9.7
           3 Nonverbal subt.            .82     .79     .03
           3 Verbal subtests            .68     .82     –.14

LDT        Total IQ              80     .47     .56     –.09     NL        special groups   9.4
           3 Performance subt.          .66     .49     .17
           2 Memory tests               .30     .47     –.17
           3 Verbal subtests            .07     .32     –.25

LDT        Block Patterns        71     .62     .45     .17      NL        OVB schools      9.3
           3 Verbal subtests            .39     .47     –.08

RAKIT      IQ Shortened vers.    165    .58     .46     .12      NL        norm group       9.1

RAKIT      IQ Shortened vers.    70     .42     .52     –.10     NL        special groups   9.4

DTVP-2     GVP Total score       153    .78     .52     .26      NL        prim.education   9.2

Reynell    Language compreh.     179    .36     .54     –.18     NL        special groups   9.4

– codes for the countries: NL (The Netherlands); GB (Great Britain); US (United States of America); AU (Australia)



The Performance Scale of the SON-R 2½-7 clearly had a stronger correlation than the Reasoning Scale with:
– the performance scale of the Wechsler tests,
– the performance subtests of the LDT,
– the DTVP-2, the test for visual perception.

The Reasoning Scale of the SON-R 2½-7 clearly had a stronger correlation than the Performance Scale with:
– the verbal scale of the Wechsler tests,
– the verbal subtests of the BAS,
– the verbal subtests and the memory tests of the LDT,
– the Reynell Test for Language Comprehension.

The results in two research projects with the shortened version of the RAKIT were contradictory. In the case of the DTVP-2, the large difference in correlations was caused mainly by the subtests of the scale for Visual Motor Integration; the difference here was .34. The difference between the correlations with the scale for Motor Reduced Visual Perception was .14. In the standardization research the difference in correlations with the Reynell Test for Language Comprehension and the Schlichting Test for Language Production was slight. The difference in two research projects with the TvK had a mean of -.08.

The results support the distinction that was made on the basis of the analysis of the internal structure of the test (section 5.4). They indicate that two aspects of general intelligence are represented in the SON-R 2½-7: on the one hand the performance and perceptual tasks, related to spatial understanding and visual-motor skills, and on the other hand the tasks that require abstract and concrete reasoning. These latter tasks have a stronger relationship with verbal intelligence and language skills. Because of this, the SON-R 2½-7 is more versatile than a nonverbal intelligence test that is limited to specific performance tasks.

9.10 DIFFERENCES IN MEAN SCORES ON THE TESTS

An important factor in interpreting and comparing test scores is the degree of comparability of the norms of the different tests. This can be particularly problematic if a test is used in a country that is not the country where the standardization was done. However, norms that are obtained in the same country are also not always easily comparable. This could be because the norm populations are defined differently (including/excluding children from special education; including/excluding immigrant children). It could also be the manner in which the scores of children who complete none or only a part of the test are handled in the standardization. Also, ‘floor’ and ‘ceiling’ effects that often occur in the extreme age ranges can complicate making comparisons.

The size of the norm group influences the accuracy of the norms. Further, the model that is chosen for the transformation of the raw scores into scaled scores determines the norms. If norms are presented for a relatively wide age range, systematic distortions occur for the children whose age does not correspond to the middle of the range. Furthermore, the question arises whether the norms are applicable when only part of a test was administered (for instance a shortened version, or only the performance sections). In this case, the manner of administration differs from the manner in which the norm data were gathered.

An important, and very difficult, problem for diagnostics is obsolescence of test norms. The performance on intelligence tests by children of the same age increases by about 3 points every 10 years in Western countries (Lynn & Hampson, 1986; Flynn, 1987). However, this can differ from country to country and from test to test. As a result of a general improvement in performance, the norms will become ‘stricter’ for a new test, and the scores will be lower than on tests standardized some time ago. A similar effect was observed in the Netherlands during the revision of the WISC-R (Harinck & Schoorl, 1987) and the SON-R 5½-17 (Snijders, Tellegen & Laros, 1989). In an American comparison of the WISC-III with the WISC-R (Wechsler, 1991), and of the WPPSI-R with the WPPSI (Wechsler, 1989), the increase for the FSIQ was 3.4 points per 10 years (averaged over both tests); for the PIQ this was 4.3 points and for the VIQ 1.9 points.

In table 9.15 the mean scores on the SON-R 2½-7 and the most important criterion tests are presented. When possible, the results of different research groups were combined (the section in which the research is described is referred to in the table). Neither criterion tests that were administered to fewer than 50 children, nor specific verbal tests, are shown in this table. Furthermore, a distinction was made between criterion tests that were scored according to Dutch, American and English norms.

The differences between the mean scores were also corrected for the interval between the publication of the criterion test and the publication of the SON-R 2½-7 (1996). Unfortunately, most test manuals do not give any information about the period in which the norm data were gathered. If the interval between gathering the norm data and the publication of the test was known to be much longer than three years, this was taken into account (in the case of the GOS the interval was six years). In the absence of reliable data about the obsolescence of the norms in relation to country and test, the strength of the correction was based on the aforementioned American results of the WPPSI-R and the WISC-III. For each year between the publication of the criterion test and the SON-R 2½-7, .34 point was deducted from the mean scores for general intelligence measures and .43 point for performance and nonverbal measures.


Table 9.15
Comparison Between the Mean Test Scores of the SON-R 2½-7 and the Criterion Tests

Dutch Norms                              Mean                          Difference
                                                                       without/with
Criterion Test                 N     SON-R    Crit.    Year   Type     correction       section

P-SON     Total IQ             188   84.9     97.9     ’75    p        –13.0    –4.0    [9.4]

BOS       Mental Scale         50    103.0    100.5    ’83    g        2.4      6.8     [9.1]
          Nonverbal Scale      76    96.5     97.5     ’83    p        –1.0     4.6     [9.1/9.4]

GOS       Gen. Cogn. Index     115   102.9    104.4    ’93    g        –1.5     –.5     [9.1]

RAKIT     Short.version IQ     205   98.0     97.9     ’84    g        0.0      4.1     [9.1/9.4]

LDT       Total IQ             80    81.6     85.0     ’76    g        –3.3     3.5     [9.4]
          Block patterns       71    92.3     97.8     ’76    p        –5.5     3.1     [9.3]

WISC-R    Total IQ             61    88.9     89.0     ’86    g        –.1      3.3     [9.3/9.4]
          Performance IQ       61    88.9     88.6     ’86    p        .3       4.6     [9.3/9.4]

American Norms                           Mean                          Difference
                                                                       without/with
Criterion Test                 N     SON-R    Crit.    Year   Type     correction       section

TONI-2    IQ Form A            153   102.4    103.5    ’90    p        –1.2     1.4     [9.2]

DTVP-2    Gen. Vis. Perc.      153   102.4    109.2    ’93    p        –6.8     –5.5    [9.2]

TOMAL     Nonv. Mem. Ind.      153   102.4    97.5     ’94    p        4.9      5.7     [9.2]

WPPSI-R   Total IQ             134   100.8    103.6    ’89    g        –2.7     –.3     [9.5/9.6]
          Performance IQ       230   94.3     98.8     ’89    p        –4.6     –1.6    [9.5/9.6]

English Norms                            Mean                          Difference
                                                                       without/with
Criterion Test                 N     SON-R    Crit.    Year   Type     correction       section

BAS       Short.version IQ     58    83.6     90.6     ’79    g        –7.1     –1.3    [9.7]

– year: year of publication of the manual
– type: g = general, p = performance/nonverbal


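The year-of-publication correction used in table 9.15 can be written out as a small sketch (hypothetical function name; the rates of .34 and .43 points per year are those stated above). Applied to the BAS, for example, it reproduces the corrected difference reported in the table.

    def corrected_difference(mean_son, mean_crit, pub_year_crit,
                             pub_year_son=1996, kind="g"):
        # kind: "g" = general intelligence measure, "p" = performance/nonverbal measure
        rate = 0.34 if kind == "g" else 0.43           # IQ points deducted per year
        correction = rate * (pub_year_son - pub_year_crit)
        return (mean_son - mean_crit) + correction     # positive: SON-R 2½-7 higher after correction

    # BAS shortened version (a general measure published in 1979; means from table 9.15):
    # corrected_difference(83.6, 90.6, 1979) gives about -1.2, close to the -1.3 in the table.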

When no correction was performed, the differences in means between the SON-R 2½-7 and the criterion tests were slight for the tests that were standardized in the Netherlands after 1980. The scores on the SON-R 2½-7, however, were considerably lower than scores on the Preschool SON (published in 1975) and also clearly lower than the scores on the LDT (1976). After correction for the year of publication, the scores on the SON-R 2½-7 were, in general, 3 to 4 points higher than scores on the other tests that were standardized in the Netherlands. However, even after correction, the scores on the SON-R 2½-7 were 4 points lower than the scores on the Preschool SON. This supports the impression gained from practical experience that the norms of the Preschool SON were much too ‘easy’. The reason for the relatively large difference with the mean scores on the BOS may be the fact that both tests were administered at two years of age. A ‘ceiling’ effect occurs on the BOS at this age and a ‘floor’ effect occurs on the SON-R 2½-7.

The fact that the scores on the SON-R 2½-7, after correction, were generally higher than on the other tests could mean that the increase in the intelligence scores of Dutch children in the last ten to fifteen years is less than we have assumed on the basis of the American data. It could also mean that a number of children from the special groups and the immigrant group, with whom part of this research was carried out, profited more from the specific characteristics of the SON-R 2½-7, such as the nonverbal character and the feedback. When comparing the SON-R 2½-7 with the most recently standardized test, the GOS, which was administered to 115 children during the standardization research, little difference was found in the mean scores.

When comparing tests using American norms, the scores on the SON-R 2½-7 were lower than the scores on the American tests (with the exception of the TOMAL). However, after correction, no differences were found, on average, with the different tests. The difference with the total score on the WPPSI-R was minimal. With the PIQ the difference was -1.6 and with the IQ score on the TONI-2 the difference was 1.4. However, a large negative difference occurred with the DTVP-2 and an equally large positive difference occurred with the TOMAL.

When the English norms for the BAS were used, a large difference, 7 points, was found. After correction for obsolescence of the norms, this difference practically disappeared.

These results indicate that a strong similarity exists in the development of the (nonverbal) intelligence of children in the Netherlands, the United States of America and Great Britain, and that the Dutch age norms of the SON-R 2½-7 can be used in Western countries for a broad assessment of intelligence. However, standardizations conducted on a national level remain preferable, in order to arrive at more precise norms at the subtest level, and at a better determination of the dispersion and the form of the score distributions.

9.11 COMPARISONS IN RELATION TO EXTERNAL CRITERIA

Analysis of the relationship between the scores on the SON-R 2½-7 and the criterion tests shows the existence of a reasonable correspondence with general and nonverbal intelligence tests. In order to increase insight into the aspects on which the SON-R 2½-7 differs from other tests, a number of comparisons were made between the SON-R 2½-7 and other tests, in relation to external criteria. The external criteria were the assessment by the examiner of the ‘testability’ of the child, background information such as the SES level and native country of the parents, and the assessment by teachers and institutional staff of intelligence and language development.


In the comparisons, the correlations of the SON-R 2½-7 and of a number of criterion tests with other variables were calculated. The comparison between the SON and the other test was always based on the same group of children. As these correlations were examined within a group, and not between groups, they were not corrected for the variance of the SON-IQ.

Evaluation of testability
As with the SON-R 2½-7, the children who completed the GOS 2½-4½ or the RAKIT in the framework of the standardization research were evaluated, after the test, by the examiner on motivation, concentration and understanding of the directions. In table 9.16 the number of times the children were given the evaluation ‘good’ with relation to these aspects is presented for the children who were evaluated on the SON and the GOS (N=107), and for the children who were evaluated on the SON and the RAKIT.

The children were more frequently evaluated as being well motivated and well concentrated during the administration of the SON-R 2½-7 than during the administration of the GOS and the RAKIT. The difference in percentages varied from 10% to 18%. The evaluation of comprehension of directions was also more often positive with the SON-R 2½-7; the difference with both other tests was about 6%.

The percentages were lower in the comparison of the SON and the GOS than in the comparison of the SON and the RAKIT. This was the result of the younger ages at which the SON-GOS combination was administered.

The results are an indication that the attractiveness and variety of the testing materials of the SON-R 2½-7, the opportunity for the child to be active, the help and feedback given, the limits on the administration of difficult items, the absence of the necessity to talk, and the extensive directions have been successful in allowing the children to do the test in the best possible circumstances.


Table 9.16
Comparisons between Tests of the Evaluation of the Subject’s Testability

Percentage of the children with evaluation ‘good’

              N      Motivation   Concentration   Compr. directions

SON-R 2½-7    107    79%          77%             79%
GOS 2½-4½     107    64%          62%             74%

Difference           15%          15%             5%

Percentage of the children with evaluation ‘good’

              N      Motivation   Concentration   Compr. directions

SON-R 2½-7    169    91%          85%             88%
RAKIT         169    81%          67%             81%

Difference           10%          18%             7%

Background variables
The correlations of a number of criterion tests with the SES index, and with the distinction between native Dutch children and children of immigrant origin, have been compared with the correlations of the SON-R 2½-7 with these variables. The analyses were always carried out in the same group. Due to missing values, small differences in numbers occur in the correlations with the SES index and native country; in table 9.17 the mean number is shown. Country of origin was dichotomised to form a native Dutch group (children whose parents were both born in the Netherlands) versus a group of children one or both of whose parents were born abroad. A positive correlation means that the native Dutch children scored higher on the test. The comparisons were limited to the standardization research and the research at primary schools. In table 9.17 the column headed ‘difference’ shows the difference between the correlation of the SON-R 2½-7 and the correlation of the criterion test with the variable. A positive difference means that the SON-R 2½-7 had a stronger correlation with the background variable.
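Because the background variable is dichotomous, the ‘Dutch/Immigrant’ correlations in table 9.17 can be read as point-biserial coefficients computed from a dummy-coded variable; a minimal sketch (hypothetical names, not code from the manual):

    import numpy as np

    def background_correlation(iq_scores, both_parents_dutch):
        # both_parents_dutch: 1 if both parents were born in the Netherlands, otherwise 0
        dummy = np.asarray(both_parents_dutch, dtype=float)
        return np.corrcoef(np.asarray(iq_scores, dtype=float), dummy)[0, 1]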

Most of the comparisons indicated that the SON-R 2½-7 correlated less strongly with the SES level of the parents than the other tests did. In nine of the thirteen comparisons involving absolute differences of .05 or more, the differences were negative. The correlation of the SON-R 2½-7 with the SES index was considerably weaker (by .10 or more) than that of the GOS, the total IQ on the WISC-R, the DTVP (visual perception), the verbal subtests of the RAKIT, the verbal scale of the WISC-R and the TvK (language test). On the other hand, the correlation of the SON-R 2½-7 with the SES index was considerably higher (by .10 or more) than those of the BOS, the TOMAL (nonverbal memory) and the performance scale of the WISC-R.

Nearly all comparisons showed that the differences in performance between the native Dutch and the immigrant children were smaller on the SON-R 2½-7 than on the other tests. This was particularly so for the BOS, the GOS and the DTVP, for the total score on the WISC-R and the verbal scale of the WISC-R, and for the language tests (the Reynell/Schlichting Test and the TvK). Only on the TONI were the differences between the native Dutch and the immigrant children clearly smaller than on the SON-R 2½-7.

Table 9.17
Comparisons Between Tests in Relation to Socioeconomic and Ethnic Background

                                       Correlation with              Correlation with
                                       SES index                     Dutch/Immigrant

                                       Crit.   SON-R                 Crit.   SON-R
Standardization Research        N      test    2½-7    Diff.         test    2½-7    Diff.

SON-R 5½-17                     118    .21     .24     .04           –.11    –.17    –.06
BOS 2-30                        50     .12     .23     .11           .30     .03     –.27
GOS 2½-4½                       115    .54     .39     –.15          .17     .06     –.11
RAKIT (Short.version)           168    .48     .43     –.05          .16     .16     .00
REYNELL/SCHLICHTING
  Mean LC, SD and WD            557    .39     .34     –.05          .16     .04     –.12
TvK (Mean of 5 subt.)           108    .52     .40     –.11          .23     .05     –.18

2nd Year Schooling                     Crit.   SON-R                 Crit.   SON-R
(5-6 years old)                 N      test    2½-7    Diff.         test    2½-7    Diff.

TONI-2                          141    .39     .48     .09           .06     .24     .18
TOMAL                           141    .32     .48     .16           .16     .24     .07
DTVP-2                          141    .59     .48     –.11          .36     .24     –.12

                                       Crit.   SON-R                 Crit.   SON-R
OVB-Schools                     N      test    2½-7    Diff.         test    2½-7    Diff.

LDT Block Patterns              65     .47     .49     .03           .06     .01     –.05
LDT (Verbal tests)              65     .57     .49     –.07          .07     .01     –.06
RAKIT (Verbal tests)            65     .59     .49     –.10          .07     .01     –.06
WISC-R
  Total IQ                      40     .49     .38     –.11          .27     .13     –.14
  Verbal Scale                         .56     .38     –.18          .26     .13     –.13
  Performance Scale                    .28     .38     .10           .22     .13     –.09



The differences between the SON-R 2½-7 and the SON-R 5½-17 were slight for both background variables.

These comparisons demonstrate that the performance on the SON-R 2½-7 is less dependent on social and cultural differences than the performance on tests that (partially) require verbal knowledge and skills, like general and verbal intelligence tests and language tests.

Evaluation of intelligence and language skills
Most of the school-aged children in the standardization research were evaluated by the teacher as to ‘intelligence’ and ‘language development’. A large number of children from the special groups were also evaluated on these aspects by institute staff members. The correlations between these evaluations and performances on the SON-R 2½-7 were discussed in sections 6.9 and 7.6. Here we will examine the extent to which the correlations of other tests with the evaluation of intelligence and language development deviated from those of the SON-R 2½-7. In table 9.18 the correlations of the SON-R 2½-7, and of the criterion tests, with the evaluation of intelligence and language development are presented. The results on the performance scales of the WPPSI and the WPPSI-R when the verbal scale was not administered are combined, as are the results from complete administrations of the WPPSI and the WISC-R. The same was done in the case of the RAKIT, where either the shortened form or the four subtests were administered. In these cases the correlations were first calculated for each subgroup and subsequently combined by weighting them in proportion to the number of persons in each subgroup.
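The combination rule described here amounts to a weighted mean of the subgroup correlations, with weights proportional to subgroup size; a minimal sketch (hypothetical names, not code from the manual):

    def combine_subgroup_correlations(correlations, group_sizes):
        # e.g. one correlation per administration variant, weighted by the number of children
        total = sum(group_sizes)
        return sum(r * n for r, n in zip(correlations, group_sizes)) / total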

Six of the correlations with the evaluation of intelligence showed an absolute difference larger than .10. In four of these cases, the SON-R 2½-7 had a stronger correlation with the evaluation of intelligence. This was the case with two language tests and the Stutsman. In the special groups, the SON-R 2½-7 correlated more strongly with the evaluation of intelligence than the RAKIT, but in the standardization research the association between the RAKIT and the evaluation of intelligence was stronger. Also, the LDT had a higher correlation with the evaluation of intelligence. The differences with the performance section of the LDT and the Wechsler were slight. The SON-R 2½-7 had the same correlation as the SON-R 5½-17 with the evaluation.

In general, the SON-R 2½-7 correlated more weakly with the evaluation of language development than did the other tests. The six correlations with a difference greater than .10 all refer to the total scores and to the verbal and memory sections of the RAKIT, LDT and WPPSI/WISC-R. In the special groups, however, the SON-R 2½-7 had a stronger correlation than the RAKIT with the evaluation of language skills.

The fact that, in the standardization research as well as in the special groups, the correlation of the SON-R 2½-7 with the evaluation of language development deviated little from the correlations of the specific language tests, like the Reynell/Schlichting and the TvK, with this evaluation, is noteworthy.

The comparisons with the other tests demonstrate that the SON-R 2½-7 correlates adequately with the evaluations of intelligence and language development. The SON-R 2½-7 correlated more strongly with the evaluation of intelligence than did language tests, while, surprisingly, the correlation with the evaluation of language development was approximately the same as that of the language tests. The correlations with the evaluation of intelligence and language development were similar to those of the performance scale of other tests. Correlations with the evaluation of intelligence tended to be a bit weaker than for general intelligence tests; correlations with the evaluation of language development were clearly weaker.


Table 9.18
Comparisons Between Tests in Relation to Evaluation of Intelligence and Language Skills

                                       Correlation with              Correlation with
                                       evaluation of                 evaluation of
                                       intelligence                  language developm.

                                       Crit.   SON-R                 Crit.   SON-R
Standardization Research        N      test    2½-7    Diff.         test    2½-7    Diff.

SON-R 5½-17                     116    .47     .47     .00           .41     .34     –.07
RAKIT (Short.version)           158    .62     .42     –.20          .58     .43     –.14
REYNELL/SCHLICHTING
  Mean LC, SD and WD            285    .50     .47     –.03          .54     .49     –.05
TvK (5 subtests)                95     .56     .62     .06           .51     .56     .05

                                       Crit.   SON-R                 Crit.   SON-R
Special Groups                  N      test    2½-7    Diff.         test    2½-7    Diff.

P-SON                           152    .75     .74     .00           .33     .35     .02
STUTSMAN                        40     .61     .79     .18           .64     .70     .06
WPPSI/WPPSI-R
  Performance Scale             39     .45     .49     .03           .41     .35     –.06
WPPSI/WISC-R
  Total IQ                      68     .74     .65     –.09          .74     .55     –.19
  Verbal Scale                         .63     .65     .01           .73     .55     –.18
  Performance Scale                    .70     .65     –.05          .62     .55     –.06
LDT
  Total IQ                      77     .69     .53     –.16          .49     .23     –.26
  Performance subtests                 .50     .53     .03           .19     .23     .05
  Memory subtests                      .55     .53     –.02          .39     .23     –.15
  Verbal subtests                      .52     .53     .01           .52     .23     –.29
RAKIT
  Shortened version or
  mean of 4 subtests            69     .49     .60     .11           .27     .41     .14
REYNELL
  Language compreh. A           141    .55     .75     .21           .29     .36     .08
TvK
  Mean of 4 subtests            49     .45     .72     .27           .29     .27     –.02


10 IMPLICATIONS OF THE RESEARCH FOR CLINICAL SITUATIONS

In the previous chapters a detailed description was given of the results of the research carried out with the SON-R 2½-7 to date. A summary of important results is presented here. In the summary, the following questions will be answered:
– have the objectives of the revision been realized,
– does the test provide a valid measurement of intelligence,
– for whom is the test suitable,
– how should the results be interpreted.

10.1 THE OBJECTIVES OF THE REVISION

The most important objectives in the revision of the Preschool SON were:
– actualizing and improving the testing materials,
– clarifying the directions,
– determining accurate and differentiated norms,
– increasing the reliability and generalizability,
– limiting the duration of the test by an adaptive procedure,
– realizing a good correspondence with the SON-R 5½-17.

Testing materials
New testing materials were developed, and existing material was completely renewed. The number of items almost doubled and the number of subtests was increased from five, as in the Preschool SON, to six. Our experience with the test suggests that the materials are attractive for children and that the drawings and directions are clear. The storage system has been greatly improved and the materials are very manageable and durable.

Directions
The description of the directions is much more detailed than in the previous version of the test. This requires a greater effort from the examiner in learning how to administer the test. However, it prevents the examiner from giving a personal interpretation of the directions, which would result in the test not being administered in a standardized manner. The directions leave sufficient room to adapt the administration to specific characteristics of the child.

The directions show clearly how to provide feedback and help. This is important as the way in which directions are given differs from that of most intelligence tests. In comparison with the Preschool SON, feedback and help are offered more consistently and are described in more detail.

Norms
In the Preschool SON, age norms were given only for the total score on the test. The SON-R 2½-7 has norms at the level of the subtests, the scale scores (SON-PS and SON-RS), and the total score (SON-IQ). Furthermore, the general norms are based on a large sample of 1124 children, instead of 500 children as with the Preschool SON. The statistical fitting procedure used with the SON-R 2½-7 increases the accuracy of the norms still further. Weighting the sample with respect to a number of variables related to intelligence (SES level, mother’s country of birth, and sex) prevents differences between age groups, with regard to these variables, from influencing the accuracy of the norms.

The age range of the norms has been extended, for practical purposes, from 2;6 to 8;0 years. The norms are rather precisely differentiated according to age. In the norm tables monthly norm groups are distinguished, whereas the computer program bases the norms on the exact age.

Differentiated norms are very important for testing young children. For each age group in the standardization research, the change in the IQ score that would result from using the norms for children who were one month older was determined (table 10.1). For the two and three-year-olds the difference was approximately 3 IQ points, for the four and five-year-olds it was 2 IQ points and for the older children 1 IQ point. If three-monthly norm tables had been used (as is the case with the WPPSI-R), then the administration of the test one day earlier or later could result in a difference of 9 IQ points for a child on the border between two age groups. The systematic deviation (upwards or downwards) on the borderline between two age groups of the tables is then 4 to 5 points. In the Preschool SON, with age groups of half a year, these deviations were even larger. By using monthly age groups, the systematic deviations for the youngest children are at most 1 or 2 IQ points with the SON-R 2½-7.
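The 9-point and the 4 to 5-point figures follow from the roughly 3 points per month that applies to the youngest children: with three-month norm bands, a child at a band border is about 1.5 months away from the midpoint of either adjoining band. Stated as a brief illustration:

    \[ 3\ \tfrac{\text{points}}{\text{month}} \times 3\ \text{months} \approx 9\ \text{points}, \qquad 3\ \tfrac{\text{points}}{\text{month}} \times 1.5\ \text{months} \approx 4\text{ to }5\ \text{points} \]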

The most precise results, certainly for the youngest age groups, are obtained with the computer program. Furthermore, the program makes an accurate calculation of the scaled test results possible when the test has not been administered in full. Finally, the program can calculate the reference age for the total scores. This can only be approximated using the tables.

Table 10.1
Mean Change in IQ Score Over a Period of One Month

Age         Diff.     Age         Diff.     Age         Diff.

2;3 years   3.9       4;3 years   2.3       6;3 years   1.4
2;9 years   3.3       4;9 years   2.0       6;9 years   1.1
3;3 years   2.9       5;3 years   1.8       7;3 years   .9
3;9 years   2.6       5;9 years   1.6

Reliability and generalizability
An important objective in the revision of the test was to improve the reliability and generalizability. The mean reliability of the test increased from .82 to .90, and the generalizability from .64 to .78. All age groups showed an improvement but the more extreme ages showed the greatest improvement.

The objective for the revision was a reliability of .90 and a generalizability of .80. From three years onwards the results correspond closely to these values. For the two-year-olds however, the reliability (.86) and the generalizability (.71) are lower than the goal we had set; for the seven-year-olds the reliability (.92) and the generalizability (.82) are higher.

When calculating the reliability of the subtests, the problem arises that two types of interdependence occur. This occurs on the one hand because of the adaptive procedure (which leads to an overestimation of the reliability), and on the other hand because of the feedback (which leads to an underestimation of the reliability). Such interdependence makes the calculation of the reliability of the subtests and the total score less accurate. However, the mean value of .90 for the reliability seems to be a realistic estimate. Lower reliability for the SON-R 2½-7 than for the SON-R 5½-17 (.93) was consistent with expectations based on the younger age at the time of administration of the SON-R 2½-7, the smaller number of subtests, and the shorter duration of the test administration.

However, the reliability of .90 is clearly higher than the generalizability of the total score of the test (.78). This is also self-evident. If the reliability of the SON-R 2½-7 were the same as the generalizability, this would imply that the subtests have no specific reliable variance and that a uniform level of ability determines the performance on all test items. Research with the SON-R 5½-17, however, has shown that the proportion of specific reliable variance of the subtests actually increases as the age of the children being examined decreases.

Adaptive procedure
The adaptive procedure, in which the entry and discontinuation rules are applied, was developed to limit the duration of the administration of the test, and to improve the motivation of the children. The administration of 'childish' items, ones that are much too easy, as well as the administration of items that are too difficult, has a demotivating effect. Especially for children who are uncertain and often feel that they are failing, being confronted with tasks that are above their level can be very frustrating. Because the administration of a subtest is discontinued after a maximum of three mistakes, it was often possible to administer the test to children who were otherwise difficult or impossible to test.

The mean duration of administration, in the different groups of children who have been examined, was less than one hour. For very young children the duration of administration was much shorter and for older children the duration of administration was somewhat related to the level of their ability: children who performed relatively well completed more items and this took more time.

Correspondence with the SON-R 5½-17
In the first two versions of the SON tests no distinction was made between a test for the younger and a test for the older children. This distinction was first made in the construction of the Preschool SON and the SSON. However, there were large differences between the Preschool SON and the SSON in both content and manner of administration (see section 1.2). One objective during the construction of the SON-R 2½-7 was to achieve a good correspondence with the SON-R 5½-17, which was published in 1988. A strong similarity in content now exists between the difficult items of the subtests Mosaics, Categories, Analogies and Situations of the SON-R 2½-7 and the easy items of the corresponding subtests of the SON-R 5½-17. In both tests an adaptive procedure is used and feedback is given. In the case of the SON-R 5½-17, however, feedback is limited to indicating whether a solution is correct or incorrect. Both tests also use highly differentiated norms that can be calculated with a computer program.

Strong similarities exist between the materials and the procedures used on the two tests. The correlation between the SON-R 2½-7 and the SON-R 5½-17, with an interval of three to four months between tests, was considerable (r=.76) and not much lower than the retest correlation of the SON-R 2½-7 for children from 4;6 years onwards (r=.81). Sattler (1992) considers an overlap in the content of tests such as the WPPSI-R and the WISC-III not very desirable, because the tests can no longer be administered within a short period as independent tests. The reason for the overlap of the age norms in the SON tests, however, is not to make retests within a short period possible, but to offer a choice that is optimal with respect to the age, skills and specific problems of the child.

10.2 THE VALIDITY OF THE TEST

The question as to the validity of the test is primarily a question of whether the SON-R 2½-7 is a good and usable measure of intelligence. This question is especially important because the test's nonverbal character limits the manner in which intelligence can be measured. However, the question is also difficult to answer because intelligence is not an accurately defined and demarcated concept. The research carried out does not, therefore, supply an unambiguous answer as to 'the' validity. It does, however, provide more insight into the positive aspects of the test and into the limitations that must be taken into account in the interpretation of the results. In the discussion of validity we make distinctions between the contents of the test, the relationship with other indicators of intelligence, and the relationship with other cognitive and non-cognitive variables.


Contents of the test
The subtests of the SON-R 2½-7 are all aimed at solving problems which require spatial insight and the ability to reason abstractly and concretely. Performance depends less on acquired knowledge than on the ability to discover methods and rules, and to apply these to new and gradually more complex situations and materials. In this way the SON-R 2½-7 corresponds to the definitions of intelligence as problem solving ability and ability to learn, and so emphasizes 'fluid intelligence' rather than 'crystallized intelligence' (Cattell, 1971). However, this does not mean that experience gained by the children will not influence their ability to solve the problems.

The performance on the test improves greatly with age. In the range 2;3 to 7;3 years, 86% of the variance in the raw total score was explained by age. This means that the SON-R 2½-7 is primarily a developmental test that registers large differences in cognitive development. This also means that highly differentiated norms are required, and that high demands are made on the test as the age variance is not relevant for the reliability and validity of the scaled scores.

The mean of the correlations between the scaled subtest scores of the SON-R 2½-7 was .36. The correlations increased with age. The relatively low level of the correlations emphasizes the importance of basing an evaluation of intelligence on a variety of test items. The common variance of the subtests is determined mainly by one factor. Furthermore, a distinction can be made between the more spatial, visual-motor, performance tests (Mosaics, Puzzles and Patterns) and the tests aimed at concrete and abstract reasoning (Categories, Analogies and Situations). The correct solutions to these last tests are reasoned out and selected, whereas the solutions of the performance tests are constructed.
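
The connection between this mean correlation and the generalizability of the total score can be illustrated as follows. Assuming, purely for illustration, that the six subtests are treated as interchangeable samples from one domain of similar tasks, a Spearman-Brown composite based on the mean inter-subtest correlation of .36 comes close to the reported generalizability of .78.

```python
# Sketch (our illustration, not the manual's own derivation): generalizability of the
# total score from the mean inter-subtest correlation, via the composite formula.

def composite_generalizability(mean_r: float, n_subtests: int) -> float:
    return n_subtests * mean_r / (1 + (n_subtests - 1) * mean_r)

print(round(composite_generalizability(0.36, 6), 2))  # ~0.77, close to the reported .78
```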

Congruent validity
The relationship with other indicators of intelligence was investigated by examining the correlations with evaluations by other persons, and with performance on other intelligence tests.

The correlation between the SON-IQ and primary school teachers' evaluations of the children's intelligence was .46. For children from special education programs and at medical pre-school daycare centers, the SON-IQ had a correlation of .66 with the evaluation of the cognitive development given with the referral to the school in question. The evaluation of intelligence, generally given after the administration of the test for the children from these groups, had a correlation of .68 with the SON-IQ. For the children with a language/speech/hearing disorder, the correlation between the SON-IQ and the subsequent evaluation of intelligence was .61. The fact that the correlation within mainstream primary education was lower than within the special groups is not surprising. In the first years of primary education, cognitive development is not studied systematically. However, children in the special groups are given an extensive psychological examination at the time of admission, and subsequently, intelligence tests are administered at regular intervals to follow their development.

In comparison with general, partially verbal intelligence tests, the correlations between the SON-R 2½-7 and the evaluation of intelligence are slightly lower. In comparison with the performance section of such tests and the Stutsman, the correlations are practically similar or higher.

The mean correlation of the SON-R 2½-7 with general intelligence tests and with nonverbal tests was .65. Approximately half the correlations with general intelligence tests lay between .59 and .70 and approximately half the correlations with nonverbal (intelligence) tests lay between .59 and .75.

Correlations higher than .70 were found with the total score on the WPPSI-R (.75), the WISC-R (.74) and the shortened version of the BAS (.87); with the performance section of the WPPSI(-R) and WISC-R (.73 and .83) and the BAS (.78); with the DTVP-2 (.73) and with the SON-R 5½-17 (.76). These results indicate that a reasonably strong correlation existed between the score on the SON-R 2½-7 and a large number of very diverse (nonverbal) intelligence tests. However, they also indicate that the child's performance on the SON-R 2½-7 can differ greatly from his or her performance on another test. These differences can be much larger than may be expected solely on the basis of the reliability of the tests. Four different causes can be indicated:
– differences in content and procedure between the tests,
– fluctuations in performance,
– stable changes,
– limitations of the research.

Differences in content
Large differences in content can exist between the SON-R 2½-7 and other intelligence tests. Specifically verbal subtests are absent in the SON, as are memory tests and tests in which a series of actions must be imitated, such as the sequential subtests of the K-ABC. Further, the subtests of the SON-R 2½-7 do not have tempo characteristics, whereby simple tasks must be completed as quickly as possible. In addition to these differences in the content of the items, there are differences in the manner of administration that may influence the results: for instance, the help and feedback given during the administration of the SON-R 2½-7 and the limited number of mistakes before the test is discontinued.

Fluctuations in performance
Fluctuations in the performances of (young) children can also be an important cause of the difference in scores. The retest correlation of the SON-R 2½-7, with an interval of three to four months, was .79. This was clearly lower than the reliability of the test (.90), which is based on the internal consistency. The idea that large stable changes occurring in this short period account for this difference is not plausible. The retest correlation may have been influenced slightly negatively by the fact that the learning effect that occurs with a retest is not the same for each child. The study of differences in scores between the performance scale of the WPPSI-R and the SON-IQ shows that large differences in performance may occur that cannot be explained solely on the basis of content, stable changes, or errors of measurement. The relationship between the performance on the different subtests of the test was less strong, and the correlations of the SON-IQ with other test scores were lower for the younger children. Problems with concentration and motivation during the administration of the test occurred more often at younger ages. In the age range for which the SON-R 2½-7 is intended (and especially for the youngest children), fluctuations in performance that are difficult to predict will have to be taken into account. All kinds of factors can influence performance: how much at ease a child is with a specific examiner, something that happened on the day of the test administration, feelings of anticipation, physical condition, like tiredness or beginning influenza, etc. Fluctuations in performance, not related to characteristics of the subtests, can also occur during the test administration. This could be due to factors like tiredness during the course of the test administration, physical discomfort, or increasing motivation as the child begins to feel more at ease in the test situation.

Stable changes
More stable changes in ability may lead to differences in scores when a relatively large interval occurs between the administration of the tests. The rate at which children develop is not the same for everyone and will fluctuate. Various factors influence the cognitive development of children. Large changes in the circumstances in which a child grows up, and important events, may slow development down or, alternatively, remove impediments to development. In various correlational research projects with the SON-R 2½-7, the interval between the administration of the tests was more than one year and differences in rate of development definitely affected part of the correlations negatively.

Limitations of the research
In the different phases of the research – administering the tests, scoring, calculating the age, determining the scaled scores, recording and processing the data – mistakes can also be made that influence the results. One example is the switching of subjects when matching the data. This happened during the comparative research of the SON-R 2½-7 and the WPPSI-R (Tellegen, 1997). In large scale research such mistakes can be, and are, made. Also, knowing the specific conditions under which each test was administered, and evaluating whether each administration occurred according to the standard directions becomes more difficult. This problem occurred, for instance, with the test results supplied by the special schools. The extent to which the standard test administration procedure may have had to be adapted to the specific problems of different children is not known for these results.

The inaccuracy of norm tables can also be a source of differences between scores. However, in the case of tests with very broad norm tables (half-yearly or yearly), this was corrected for during this research. When comparing the test scores, poor correspondence between norms because of obsolete norms or because the norm groups are not comparable can lead to large differences between scores. However, this generally has no effect on the correlations.

During this research, the administration of other tests was, for practical purposes, sometimes limited to a shortened version, or, in connection with handicaps of the children, to the nonverbal or performance section. This often limited the reliability and validity of the criterion tests and also the strength of the correlations.

Conclusions
The limitations of the reliability of the test, the specific characteristics of the contents of the SON-R 2½-7, and the instability of the test performances of young children seem to us to be the most important causes of the differences in scores between the SON-R 2½-7 and other (nonverbal) intelligence tests. These factors, which lead to lower correlations, play a smaller part when children are older. When the influence of stable changes and the limitations of the research are taken into account, it is realistic to take a value of approximately .70 for the correlation of the SON-R 2½-7 with other intelligence tests as a point of departure, if the interval between the administration of the tests is not longer than one year.

Based on this evaluation of the correlation with other intelligence tests, and the data on the reliability (based on internal consistency) and the stability, the variance of the SON-IQ can be roughly described as follows (see figure 10.1):
– 10% measurement error variance,
– 10% reliable unstable variance,
– 10% reliable test-specific variance,
– 70% stable reliable variance that is generalizable to other tests.
The last component, the variance that the SON-R 2½-7 has in common with other intelligence tests administered at a different time, is the most relevant for the evaluation of intelligence, and the value .70 can be seen as an indication of the validity. However, for very young children, the validity will be lower due to the lower reliability of the test and greater instability; in older children the validity will, in accordance, be higher. The proportion of test-specific variance will, of course, also depend on the extent to which the criterion tests correspond in content and procedure with the SON-R 2½-7. The validity in this case is based on the correlation with a (nonverbal) intelligence test administered at another time. However, if we could correlate the scores on the SON-R 2½-7 with the 'ideal' score based on a large number of other tests, administered at different times within one year, then the correlation would equal √.70 and the validity coefficient would be .84.
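
This decomposition can be reproduced directly from the three coefficients mentioned above; the subtraction scheme in the sketch below is our reading of the text, not a formal psychometric model.

```python
# Sketch of the variance decomposition in figure 10.1, derived from the reliability
# (.90), the stability (.80) and the correlation of about .70 with other tests.

reliability, stability, valid = 0.90, 0.80, 0.70

error_variance    = 1 - reliability          # ~0.10
unstable_variance = reliability - stability  # ~0.10 (reliable but not stable over months)
specific_variance = stability - valid        # ~0.10 (stable but specific to this test)
valid_variance    = valid                    # 0.70 (shared with other intelligence tests)

validity_coefficient = valid ** 0.5          # correlation with an 'ideal' composite score
print([round(x, 2) for x in (error_variance, unstable_variance,
                             specific_variance, valid_variance, validity_coefficient)])
# [0.1, 0.1, 0.1, 0.7, 0.84]
```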

Construct Validity
The objective during the development of the first SON test was to overcome the one-sidedness of the existing performance tests, and to incorporate tasks related to abstract and concrete reasoning in the nonverbal test. The aim of the SON-test was, and is, to measure general intelligence as precisely as possible, within the limitations of a nonverbal administration. The results of the factor analyses on the subtests of the SON-R 2½-7, carried out with very diverse groups of children, in the Netherlands as well as abroad, support the distinction between three performance tests (Mosaics, Puzzles and Patterns) and three reasoning tests (Categories, Analogies and Situations). This is a relative distinction, as the largest part of the common variance of the subtests can be reduced to one general factor.

The significance of the two scales is confirmed by the correlations with other tests. The Performance Scale had a relatively strong correlation with the performance scale of the Wechsler tests, the performance section of the LDT and with the DTVP-2. The Reasoning Scale had a relatively strong correlation with the verbal section of the Wechsler tests, the verbal section of the LDT and the BAS, and with the Reynell Test for Language Comprehension. These results show that a broader domain of intelligence is measured by the SON-R 2½-7 than by tests that consist exclusively of a performance section.

Memory
The SON-R 2½-7, like the SON-R 5½-17, has no specific memory tests. The correlations of the SON-IQ with two memory tests of the LDT (.43), the TOMAL (.45), the sequential development index of the K-ABC/GOS (.29 and .49), and with auditory memory in the Schlichting Test (.27), were moderately positive. These relatively low correlations argue strongly in favor of examining intelligence and memory separately. Incorporating a few memory subtests in an intelligence test is too restricted a basis for a valid assessment of memory. In addition, the interpretation of the intelligence score becomes more difficult because memory is a separate factor.

Visual perception
The correlation of .73 with the DTVP-2, a test for visual perception, shows that visual perception is strongly represented in the SON-R 2½-7. The Performance Scale in particular was strongly related to the DTVP-2. Perception, in this context, is not 'passively' seeing, but comprises structuring, evaluating and comparing visual information.

Motor skills
The subtests of the Performance Scale of the SON-R 2½-7 in particular require visual-motor skills. In primary education, the correlation of the SON-IQ with the teacher's evaluation of motor development was low (.24). However, for children with a developmental delay or with a language/speech/hearing disorder, these correlations were higher (.46 and .32).

Figure 10.1
The Components of the Variance of the SON-R 2½-7 IQ Score
[Figure: a bar divided into variance of measurement error (10%), unstable variance (10%), test-specific variance (10%) and stable generalizable variance (70%); brackets mark the reliability (.90), the stability (.80) and the proportion of valid variance (.70).]

Verbal skills
Knowing the relationship between the SON-R 2½-7 and verbal intelligence and language skills is important if we are to be able to judge to what extent the domain of intelligence is restricted by the nonverbal character of the test. However, verbal intelligence is not a clearly defined concept. On the basis of factor-analytical research, a distinction is made by Kaufman (1975) in the verbal scale of the WISC-R between the factors 'Verbal Comprehension' and 'Freedom from Distractibility'. Further, the factor 'Verbal Comprehension' includes quite diverse subtests, for instance, Similarities, a verbal subtest of abstract reasoning, and Vocabulary, a test tapping verbal knowledge. The skills required for the subtest Similarities belong to the intelligence domain that the SON-R 2½-7 is intended to measure. The performance on a subtest like Vocabulary is so dependent on the circumstances in which a child grows up that a nonverbal alternative for this test is not feasible for the SON-R 2½-7. On the K-ABC, subtests which clearly tap verbal knowledge are scored separately and are not included in the calculation of the mental development index.

The fact that a precise distinction between intelligence, verbal intelligence and language skills cannot be made is shown by the correlations of the SON-IQ with evaluations of intelligence and language skills. In the case of children in primary education, the correlation with the evaluation of intelligence was .46 and the correlation with the evaluation of language development was only slightly lower, .44. However, clear differences in the correlations with these evaluations were found for children with a developmental delay (.68 versus .48) and for children with a language/speech/hearing disorder (.61 versus .31).

The correlations of the SON-IQ with the verbal section of intelligence tests, and with tests of language skills and language development, were in the order of .50. Taking into account the fact that the SON-R 2½-7 can be administered completely without using any language, these correlations are considerable. The Reasoning Scale of the SON-R 2½-7 contributed most to these correlations.

Socio-economic differences
The SON-IQ had approximately the same association with the SES level of the parents as other (nonverbal) intelligence tests. However, the correlation with SES level was less strong than for language tests and the verbal section of general intelligence tests. In comparison with most other tests, smaller differences were found between immigrant and native Dutch children when using the SON-R 2½-7.

Conclusions
These results support the conclusion that the concept of intelligence measured by the SON-R 2½-7 corresponds broadly with what is considered to be general intelligence. The SON-R 2½-7 emphasises visual-motor and perceptual skills, spatial insight, and the ability to reason abstractly and concretely. This corresponds with the factors 'Fluid Intelligence' and 'Broad Visual Perception' of Carroll's classification (1993). Memory, knowledge, and language skills have an indirect association with performance, but the measurement is not based on these skills. The test is less dependent on socio-economic factors than are verbal tests, and can best be defined as a nonverbal, general intelligence test with an emphasis on 'fluid intelligence' and 'visual perception'.

10.3 THE TARGET GROUPS

Over the years the SON test has developed from an intelligence test for deaf children to a general nonverbal intelligence test that is especially suitable for children with communicative handicaps, for example, children with language/speech/hearing disorders, autistic children, and children raised with a different language or bilingually. The test is also highly suitable for children who are difficult to test, who have learning problems, or a developmental delay. As reference ages were also calculated in the standardization, the test can be used for older mentally deficient or mentally handicapped children and adults. The test is less suited to certain categories of children, e.g., children with visual handicaps and children with serious motor handicaps.


Communicative handicaps
The research with native Dutch children who were deaf but not multiply handicapped found that the mean IQ score (97.9) deviated only slightly from the score of the hearing population. As with the SON-R 5½-17, the lower scores in the group of deaf children related only to Categories and Analogies, the subtests for abstract reasoning.

The mean score of the children with a language/speech and/or hearing disorder was approximately 90. However, this group of children cannot be compared very well with the deaf children. On one hand, the group included children with multiple handicaps. On the other hand, children with a language/speech/hearing disorder who were functioning well in regular education were not included in the research group.

Administration of the SON-R 2½-7 appears to be quite possible for children with communicative handicaps. Cooperation and comprehension of the directions were judged to be 'good' by the examiner for 80%-90% of these children. Motivation was judged to be 'good' in approximately 70%. Problems with concentration were most frequently mentioned, with about 40% of the children rated as moderate or fluctuating. The test could be administered in full to practically all the children.

In the case of the above-mentioned children a nonverbal test is necessary for a valid evaluation of the level of intelligence, as the delay in verbal development may result completely or partially from this handicap, and bear little relation to other aspects of cognitive development. In this group the SON-IQ correlated clearly with the evaluation of intelligence (r=.61), whereas the correlation with the evaluation of language development was much lower (r=.31).

Developmental delay/disorder
The research on children with a developmental delay and developmental disorders was carried out with children at schools for special education with a pre-school department, at medical pre-school daycare centers, and with children with pervasive developmental disorders. With these children, multiple social, emotional and behavioral problems often occur, as well as delays in cognitive, verbal and motor development.

The mean SON-IQ for this group was approximately 80. A considerable delay was found on all subtests. Large differences in scores were found within the group. Approximately 10% of the children had a score close to 50, and slightly more than 10% had a score higher than 100. Performance on the test corresponded strongly with the diagnostic evaluation of cognitive development at the time of admittance to the school/institute (r=.66), and with the evaluation of intelligence that was made later by other professionals involved with the children (r=.68).

The children in this group were more difficult to test. Motivation, concentration, cooperation or comprehension of the directions were more often rated as moderate or fluctuating by the examiner than in the group of children with communicative handicaps.

Children who are difficult to test
In an ideal testing situation the children are well motivated to complete the test, they comprehend the directions, and they work concentratedly until they have finished the tasks. In practice this may be different. In the case of (young) children, motivation and concentration cannot be expected to be present beforehand. The testing situation, the materials and the interaction with the examiner must be such that the child becomes interested, and the course of the administration should be structured in such a way that the interest is held.

The comparison of examiners' evaluations of the children's testability, after administration of the GOS 2½-4½, the RAKIT and the SON-R 2½-7, showed that the children were better motivated and concentrated during the administration of the SON-R 2½-7. They also understood the directions better than with the two other, partially verbal, tests. A number of different characteristics of the SON-R 2½-7 probably played a role here. The nonverbal character of the test, whereby the child may, but does not have to talk, is attractive for children who are shy or guarded toward adults. The help and feedback offered encourages the child to complete the tasks. This lessens the feelings of failure, helps clarify the objectives of the tasks, and leads to a more natural interaction between examiner and child. Strictly limiting the number of items completed incorrectly also prevents the child from quickly becoming demotivated. Furthermore, the SON-R 2½-7 has very varied test items, with which the child is constantly and actively involved.

These qualities make the SON-R 2½-7 attractive for use with children who do not have a specific communicative handicap, but whose social, emotional and behavioral problems may interfere with the administration of a more traditional intelligence test.

The mentally handicapped
In the case of the mentally handicapped, the administration and interpretation of intelligence tests are difficult, because a large discrepancy exists between their chronological age and the level at which they function. When a test that is appropriate for their age is administered, the scaled scores will often have the lowest value, making further differentiation according to level impossible. However, when a test that corresponds to the level of the subject is chosen, for example a test for young children, a reference age can be calculated, but not an IQ score or any other standard score scaled for age. The administration of a test appropriate to the level of the subject will often be preferred. The manner of administering the tasks, and the level of difficulty of the tasks, will then correspond well to the abilities of the subject, and the administration will be more motivating than the administration of a test at too high a level.

When they are no longer in a period of rapid development, the reference ages of mentally handicapped subjects can be compared. However, if the research is carried out on mentally handicapped children who are still developing, the comparison of the reference age must be limited to persons of approximately the same age.

Using the SON-R 2½-7 and the SON-R 5½-17, research was carried out, on a limited scale, with children and adults who were mentally handicapped (Wijnands, 1997). The correlation between the reference age on the SON tests and the reference age based on various other tests, including different versions of the Wechsler tests, the BOS 2-30 and the MSCA (n=26), was .79.

The positive points of the SON-R 2½-7, mentioned above in relation to children who are difficult to test, appear to be very important as well when testing persons with a mental handicap. These people often have a great fear of failure, and the help and feedback, and the discontinuation after a limited number of mistakes, contribute to their motivation and enjoyment in completing the test.

Immigrant children
Testing immigrant children with traditional intelligence tests can lead to an underestimation of their cognitive potential. This occurs because no account is taken of the fact that lack of knowledge of, and skill in, the language of the examiner does not necessarily indicate that the verbal capacities of these children are lower. A lower level of performance on the verbal section can, but does not necessarily, indicate a lower level of intelligence. The performance on the performance section of these tests can also be 'biased' because the directions are usually given verbally. Correlational comparisons between the SON-R 2½-7 and a number of other tests showed that, in most cases, the differences between native Dutch and immigrant children were smaller when the SON-R 2½-7 was used. On the SON-R 2½-7, the difference in IQ scores between native Dutch and immigrant children was 7.5 IQ points, half a standard deviation. Children with one parent born outside the Netherlands scored as high as native Dutch children. Turkish and Moroccan children scored approximately 10 points higher on the SON-R 2½-7 than comparable groups tested with the RAKIT, and 6 points higher than comparable groups tested with the LEM.

The delay that was found in the group of immigrant children was comparable to the delay found in native Dutch children with parents of the same SES level. Research on immigrant children who participated in OPSTAP(JE) showed that, after a two-year stimulation program, these children performed at the mean level of native Dutch children. However, selection for participation in OPSTAP(JE) and/or the research may have contributed to these relatively good performances.


There were no indications that the contents of the pictures in the different subtests caused extra problems for the children with a different cultural background. We assume that depicting children with a non-western appearance contributed to making the SON-R 2½-7 test materials recognizable to these children.

All results indicate that the test can be used effectively with immigrant children. Of course, an evaluation of these children's language skills, and of the extent of their knowledge of the Dutch language, can also be important. However, this must not be confused with intelligence, nor should language skills directly influence the evaluation of intelligence.

Visual handicaps
The SON-R 2½-7 has a strong visual orientation. All subtests use pictures. When vision is greatly impaired and not compensated by glasses or other means, use of the SON-R 2½-7 must be strongly discouraged. Adapted tests are available for these children (Dekker, 1987). Slight limitations of vision will probably not be a problem. The pictures in the test are large and clear and do not require the ability to discriminate small visual differences.

Motor handicaps
Various tasks of the SON-R 2½-7 require motor skills and eye-hand coordination. During the construction of the test an effort was made to minimize the influence of this on the evaluation of performance. In the subtest Patterns wide criteria for the evaluation of the drawings are used. In the subtests Puzzles and Mosaics, frames are used to make it easier for young and poorly coordinated children to perform the tasks well. Furthermore, the time limits, in as far as they are applied, are broad and speed is not scored. In the case of children with more serious motor handicaps, the possibility that these handicaps may influence performance negatively should be considered. In chapter 11, possibilities for adapting the administration procedure to the child's level of motor skill are discussed.

Use of the test in other countries
The nonverbal character of the test, and the availability of the manual in different languages, mean that the test can easily be used in other countries. Research in Australia, Great Britain and the United States of America has shown that the testing materials are very usable in these countries. A problem occurred now and then with one or two items, for example, the example item of the subtest Situations in which a rabbit in a cage is being fed. Having a rabbit as a pet in Australia, unlike the Netherlands, is unusual. However, such small problems with the testing materials will influence the validity of the test in these countries slightly, or not at all. If the test is used in countries and cultures that differ greatly from the Netherlands, or more generally, from Western countries, one should check whether the testing materials are sufficiently recognizable or need adapting.

Comparison of the scores on the SON-R 2½-7 according to Dutch norms, and the scores on other tests according to English and American norms, has shown that the mean scores differ only slightly when the period between the different standardizations is taken into account. For countries that have a comparable socio-economic level to the Netherlands, the Dutch norms of the SON-R 2½-7 can be used to evaluate intelligence. However, national norms remain preferable to improve the standardization for the country in question. One must keep in mind, however, that many of the national norms currently in use probably produce greater distortions than the Dutch norms for the SON-R 2½-7. At least for the time being, these are up to date.

Ages
The SON-R 2½-7 was originally intended for the age range 2½ to 7 years. However, the norms of the tests were constructed for the age range 2 to 8 years. In the following section we will show how the test can be used for a number of different age groups. The question when the SON-R 5½-17 should be preferred to the SON-R 2½-7 will also be discussed here.


2;0 – 2;5 years
The test is used experimentally. At this age considerable floor effects occur and reliability and generalizability are low. The motivation and concentration of children at this age are often insufficient to allow completion of the test. The test can be diagnostically interesting when a child has a high score on a test with strong ceiling effects in this age group, for instance, the BOS 2-30.

2;6 – 2;11 years
The test is usable in this age group. Moderate floor effects occur. Reliability and generalizability are reasonable. However, difficulty coping with the test situation can be a problem, especially for children with specific problems and handicaps.

3;0 – 5;5 years
In this age group the test can be used to good effect. Floor or ceiling effects rarely occur. Reliability and generalizability are good.

5;6 – 5;11 years
The SON-R 5½-17 has also been standardized for this age group. However, the SON-R 2½-7 is more suitable because various subtests of the SON-R 5½-17 have a strong floor effect at this age.

6;0 – 6;11 years
Both the SON-R 2½-7 and the SON-R 5½-17 are highly suitable for this age group. The SON-R 2½-7 has slight ceiling effects. The use of the SON-R 5½-17 is preferable when examining (highly) gifted children. For children with a cognitive delay and/or handicaps, and for children who are difficult to test, use of the SON-R 2½-7 is preferable.

7;0 – 7;11 years
In general, use of the SON-R 5½-17 is preferable in this age group. The reliability and generalizability of the SON-R 2½-7 are good, but ceiling effects clearly occur. This may not be a problem for children with a below-average level. Administration of the SON-R 2½-7 can be attractive in this age group for children with handicaps, children with a cognitive developmental delay, and for children who are difficult to test. These children may profit from the help offered during the test and from the easy level of the first items.

From 8;0 years onwards
From this age on, no norms for the standard scores exist for the SON-R 2½-7. The SON-R 5½-17 has been standardized to the age of 17;0 years. For children 8 years and older, the SON-R 2½-7 can be interesting when the level is so low that the administration of a test that corresponds to the abilities of the subject is more suitable. For these children, accurate determination of the reference age can be more informative than an extremely low IQ score.

10.4 THE INTERPRETATION OF THE SCORES

The most important function of the administration of a test like the SON-R 2½-7 is to provide information about the level of the cognitive development of a child for diagnostic purposes, for advice and assistance, and for an (interim) evaluation of the effect of treatment programs and interventions. This means that the results of the test may have great consequences for the home situation of the child and for his or her development. The effects can also be far-reaching for the parents, and advice given or decisions taken in the process of diagnosis can have financial consequences.

In general, administration of the test will not take place in isolation, but within a framework of discussions with, and observation of, child and parent(s), and of information from schools or family doctors. An intelligence test is also frequently administered in combination with other developmental tests. The administration of the test will often take place as part of a cycle in which formulating questions and gathering relevant information are alternated (Kievit & Tak, 1996).

The SON-R 2½-7 supplies information about performance on different levels (subtests, scale scores and total scores), and in different ways (reference age, deviation scores, observations). In the following section, the value of this information for the diagnostic process, and the risks that exist when a single result on one test is interpreted as 'the' level of intelligence, will be discussed.

Level of scores
The objective of the SON-R 2½-7 is to give an impression of the general intelligence level of the child. Diverse subtests are used not to determine differences in performance among the subtests, but rather because the influence of the specific characteristics of the subtests on the total score decreases when the test is made up of several subtests. The accuracy of the IQ score is not primarily judged by the reliability of the test, but by its generalizability. All the variance that is specific to the subtests is considered irrelevant for the generalizability. The SON-IQ, with the 80% probability interval that is based on the generalizability, should, in our opinion, be the basis for the evaluation of the test results.

Subtest scores
The differences between the scores on the subtests have the lowest reliability and stability. The retest research shows that differences between subtest scores are also unstable. When the differences between subtests are relatively large, the chance is greater that the order of the differences is largely maintained. If one wants to interpret the differences between the subtest scores further, one must therefore first determine whether the differences are relatively large. This can be done by the computer program.

Although conclusions should not be drawn on the basis of the results at the subtest level, evaluating the differences between subtest scores in relation to other information available about the child, or to impressions gained during the administration of the test, may be worthwhile. Such an evaluation may allow specific ideas to be developed about the child's strengths and weaknesses, which can subsequently be examined further. The explorative use of the subtest data can be of value when the intertest differences are sufficiently large.

Scale scores
The possibilities for using the scores on the Performance Scale and the Reasoning Scale are greater than the possibilities of the subtest scores, but are still more limited than the possibilities for the score on the SON-IQ. The scale scores are more reliable than the subtest scores, with a mean of .85, and a retest stability of .72. An important difference with respect to the subtest scores is that the scale scores are based on several subtests. This means that generalizable statements can be made on the basis of these scores. However, the correlation between the two scale scores is rather high (.56), which means that the reliability of the difference between the two scores is limited to .65. The stability of the difference score is even lower, i.e., .46. Before interpreting differences between the two scores, one should certainly determine whether the difference is significant. Both the norm tables and the computer program supply information on this. General statements, for example about a possible difference between the development of performance and reasoning ability, can be made only if the probability intervals of the two scores do not overlap. This information is supplied when using the computer program.
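
The reported value of .65 is close to what the standard formula for the reliability of a difference score gives; the sketch below assumes, for simplicity, that both scales have the mean reliability of .85 mentioned above.

```python
# Sketch: reliability of the difference between the Performance and Reasoning scale
# scores, using the standard formula for the difference of two equally variable scores.
# Assumption: both scales have the mean reliability of .85.

def difference_reliability(rel_x: float, rel_y: float, r_xy: float) -> float:
    return (rel_x + rel_y - 2 * r_xy) / (2 - 2 * r_xy)

print(round(difference_reliability(0.85, 0.85, 0.56), 2))  # ~0.66, close to the reported .65
```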

The diagnostic possibilities of the two scale scores need to be studied in further detail. For the time being, this information should be used exploratively.

IQ score
The SON-IQ, the scaled and standardized total score of the SON-R 2½-7, is the most usable, generalizable, reliable, and stable result of the test. Combined with the 80% probability interval, the SON-IQ gives a good indication of the level of intelligence of the child. The categories as shown in table 10.2 can be used to give a rough definition of the test result. The first column is neutral and descriptive: this shows whether the child's performance on the test was high, low or average. This classification is also used by the DTVP-2 and, with slightly different limits, by the WPPSI-R. The two other classifications give a description of the level of intelligence of the child in qualitative terms, related to the IQ score.

Table 10.2
Classification of IQ Scores and Intelligence Levels

IQ         Description     %      IQ         Description (1)      IQ         Description (2)
>130       Very high       2%     >130       Highly gifted        >129       Very superior
121–130    High            7%     121–130    Gifted               120–129    Superior
111–120    Above aver.    16%     111–120    Above average        110–119    High average
90–110     Average        50%     90–110     Average              90–109     Average
80– 89     Below aver.    16%     80– 89     Less gifted          80– 89     Low average
70– 79     Low             7%     60– 79     Learning probl.      70– 79     Borderline
<70        Very low        2%     <60        Learning disorder    <70        Mentally deficient

– (1) classification by Struiksma and Geelhoed (1996)
– (2) classification used with the Wechsler scales (Sattler, 1992)
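
For users who process results automatically, the neutral descriptive classification in the first column of table 10.2 can be written as a simple lookup. This is merely a convenience sketch; it is not part of the SON-R computer program.

```python
# Sketch: the neutral, descriptive classification from the first column of table 10.2.

def describe_iq(iq: int) -> str:
    if iq > 130:  return "Very high"
    if iq >= 121: return "High"
    if iq >= 111: return "Above average"
    if iq >= 90:  return "Average"
    if iq >= 80:  return "Below average"
    if iq >= 70:  return "Low"
    return "Very low"

print(describe_iq(104))  # "Average"
```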

The reference age
At the level of subtests, scale scores and total test results, the results on the test can also be presented as a reference age. As for the standard scores, the reference age based on the total test result is the most reliable, stable and generalizable age score, and can therefore best be used for the evaluation of the results.

For children and adults older than 8 years, the reference age is the only standardized information available. For children up to 8 years old, the reference age can be used as additional information for the SON-IQ.

Opinions differ as to the usefulness of reference ages (also called test ages or mental ages). The reference age reflects an absolute level of performance, as is demonstrated by the fact that the reference age can be estimated fairly accurately from the sum of the raw scores. Contrary to the standard scores that represent relative levels of performance within an age group, the calculation of the reference age does not depend on the age of the child. However, the interpretation of the reference age should be done in connection with the age at the time of the test administration. An identical reference age of 4;3 years means that a child of 4;1 years has performed as well on the test as a child of 5;2 years, but the psychological significance of the test result is completely different for the two children.

An IQ score can be expected to be more or less stable during the course of a child's development. However, this is not the case for the reference age; as long as the child is developing, the reference age will increase. As the development progresses much faster at a younger than at an older age, the discrepancy between the reference age and the age at the time of administration of the test will constantly increase for a specific child. For an IQ of 80, the discrepancy in months is much smaller at the age of 4;0 years than at the age of 6;0 years. Furthermore, given a fixed age, the discrepancy is much smaller with an IQ of 80 than with an IQ of 120. This means that reference ages, or discrepancies between reference age and age at the time of administration of the test, are often difficult to compare and do not lend themselves very well for statistical analysis. Another disadvantage of using the reference age is that, in contrast to the IQ score, no probability interval is offered to indicate how (in)accurate the statement about the reference age is.

Despite these limitations, the reference ages can certainly be useful. The reference age represents, in a very concrete way, how the child functions during the test, and this information can be instructive when reporting the results, for instance, to the parents. Furthermore, the reference age provides information about the level of tasks that the child comprehends, and this can be used to show at which level learning materials or training can be given. One can, naturally, not only depend on the reference age. When a 7-year-old child has a reference age of 3;5, this child is in a completely different situation, and has completely different learning abilities, from a 3-year-old child with a reference age of 3;5 years. For that matter, a 3-year-old with an IQ of 80 is of course not 'equal' to a 7-year-old with an IQ of 80.

The reference age can best be described as 'the child's performance on the test corresponds to the mean performance of children of .. years old'. This is better than saying 'the child functions at the level of a .. year-old'. The latter formulation suggests, unjustly, that the complete cognitive or mental level of the child is described by the test.

Evaluation by the examiner
During the research, children were evaluated on aspects of motivation, concentration, cooperation and comprehension of the directions, after the test had been administered. Especially in the case of young children and children with developmental delays and/or disorders, problems occurred regularly in these areas. These groups often performed less well, or even badly, on the test. Children who received negative evaluations but were able to complete the test were not excluded from the description and analysis of the results. The information from such an evaluation is important for the diagnosis. If one has the impression that the child was not well motivated and concentrated, the question arises whether the test result provides a valid indication of the intelligence. An important point here is whether problems occurred by chance during this test administration, or whether they are characteristic for the child and occur in many situations. The question may then arise whether treating the motivation and concentration problems will, in the long run, lead to better test and learning performances.

When a child performs badly on a test and the evaluation of the various motivational and concentration aspects is positive, one can have more confidence that the child has really shown his or her ability level, and that the low score is not the result of the fact that he or she is difficult to test.

The four evaluation categories used in the research and printed on the record form can be taken as a point of departure for the observation. Because the children are so actively involved with the SON-R 2½-7, and because of the extensive interaction between the examiner and the child, the test offers many opportunities for observation, and we expect users of the test to make use of this.

Generalization of the test result
The SON-IQ shows how well the child has performed on the test. Based on the first description in table 10.2, this performance can be classified as ranging from 'very high' to 'very low'. Classification becomes more difficult when one wants to take the limitations of the test into account, and to make a general statement about the intelligence level based on the performance.

Generalizing across subtests
The 80% probability interval that is always given with the IQ scores allows for two limitations: namely, the unreliability of the test and the fact that part of the reliable variance is specific for each subtest. The interval indicates where the IQ score is expected to lie if a large number of comparable subtests were administered. This score would be almost completely reliable, and the influence of specific characteristics of the subtests would then be negligible. This interval has a width of about 18 points. Most descriptive categories of the IQ score in table 10.2 have a width of 10 points. This means that the 80% interval of the IQ score embraces either two categories, or one category and part of both of the adjacent categories.
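
The width of about 18 points can be reconstructed from the generalizability coefficient, assuming the interval is formed in the usual way around the observed score with an IQ standard deviation of 15 (our reconstruction, not the manual's normative procedure).

```python
# Sketch (our reconstruction): width of the 80% probability interval from the
# generalizability coefficient and an IQ standard deviation of 15.

import math

generalizability = 0.78
se = 15 * math.sqrt(1 - generalizability)  # standard error, ~7.0 IQ points
z80 = 1.28                                 # normal deviate for a central 80% interval
print(round(2 * z80 * se, 1))              # interval width, ~18 IQ points
```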

Generalizing across time
The 80% interval of the IQ score takes no account of the stability of the test over a period of several months. However, this is important for the evaluation of intelligence. The retest correlation of the SON-R 2½-7 in a heterogeneous age group is as high as the generalizability coefficient. Whether this is valid for all age groups is not known. However, it means that the 80% probability interval can also be interpreted in another way, namely as the expected interval for the hypothetical IQ score if we could administer these six subtests many times with an intervening period of several months. In this interpretation, allowance has been made, in the 80% interval, for the reliability as well as the instability, but no longer for the specific variance of the subtests.

Generalizing across tests and time
The 80% interval of the SON-IQ, with which (approximately) two of the three limitations of a single administration of the test can be taken into account in two different ways, is in many practical situations not precise enough to make important decisions. If all three aspects are taken into consideration – unreliability, instability and test-specific characteristics – an assessment of the level of intelligence, based on the test result, can be made with even less certainty. The real danger of drawing completely incorrect conclusions based on a single test result for young children is demonstrated by the comparison of the scores on the SON-IQ with the PIQ of the WPPSI-R (see section 9.8). In the case of four of the 230 children, a difference of around 40 points occurred. In two cases the child had a low score on the SON-R 2½-7, and in two cases on the WPPSI-R. If the evaluation is to be used to make important decisions, with far-reaching consequences for the child and his or her surroundings, the administration of a single intelligence test is unlikely to be sufficient. The risk that a distorted idea of the intelligence will be formed, due to a combination of unreliability, fluctuations in the performance and specific characteristics of the test, is too great.

Administration of several tests
Based on the research on the congruent validity of the SON-R 2½-7, the variance of the test has been described as follows in section 10.2 (see figure 10.1):
– measurement error variance (10%)
– unstable variance (10%)
– test-specific variance (10%)
– valid generalizable variance (70%)
The proportion of valid generalizable variance is based on correlations of approximately .70 with other (nonverbal) intelligence tests. If we assume that there are other intelligence tests, with similar variance compositions, and with correlations of .70 with each other and with the SON-R 2½-7, then the composition of the variance of the mean score when two or three different tests are administered can be calculated (see table 10.3). The assumption here is that the interval between the test administrations is between several weeks and several months.

When two tests are administered, the share of the undesired sources of variance is reduced by 40%. The proportion of valid variance increases from 70% to 82%. When three tests are administered, the share of the undesired sources of variance is reduced by 60%; the proportion of valid variance now becomes 88%. For young children, the share of undesired variance is larger than for the older children. Therefore, in the last part of table 10.3, an estimate has been made of the components of the variance when three tests are administered to children from 2 to 4 years of age, and when two tests are administered to children from 5 to 7 years of age. This leads to an estimate of 85% valid variance for both groups. The reliability of the mean score is .95 and the stability is .90.

Table 10.3
Composition of the Variance When Several Tests Are Administered

                           SON-R 2½-7   Average of    Average of    2-4 years: 3 tests
                                        two tests     three tests   5-7 years: 2 tests

Variance of meas. error        10%           6%            4%              5%
Unstable variance              10%           6%            4%              5%
Specific variance              10%           6%            4%              5%

Valid variance                 70%          82%           88%             85%
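As an illustration of how the figures in table 10.3 arise, the composition can be reproduced with a short calculation. The sketch below is a minimal illustration and not part of the test procedure; it assumes that the tests share only the valid variance and that the error, unstable and test-specific components are independent across tests (the function name is ours).

```python
def variance_of_mean(n_tests, valid=0.70, error=0.10, unstable=0.10, specific=0.10):
    """Composition of the variance of the mean of n_tests comparable tests.

    Illustrative assumption: the valid variance is shared by all tests,
    while the error, unstable and specific components are independent,
    so their contribution to the mean shrinks by a factor n_tests.
    """
    undesired = (error + unstable + specific) / n_tests
    total = valid + undesired
    return {
        "measurement error": error / n_tests / total,
        "unstable": unstable / n_tests / total,
        "specific": specific / n_tests / total,
        "valid": valid / total,
    }

# Reproduces table 10.3: one test -> 70% valid, two tests -> 82%, three tests -> 88%.
for n in (1, 2, 3):
    print(n, {name: round(share, 2) for name, share in variance_of_mean(n).items()})
```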

When the scores on two or three tests are averaged, the dispersion becomes narrower. Table 10.4 shows how to correct the mean score for this. This correction is based on a standard deviation of the mean score of 13.6.
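A hedged sketch of this correction: if the mean of the IQ scores has a standard deviation of 13.6, as stated above, and is rescaled to the usual standard deviation of 15 around a mean of 100, the values in table 10.4 follow from a simple linear transformation. The formula below is our reconstruction from that statement, not a quotation of the norm tables.

```python
def rescale_mean_iq(mean_iq, sd_of_mean=13.6, population_mean=100, population_sd=15):
    """Rescale the mean of two or three IQ scores back to a standard deviation of 15.

    Minimal sketch of the correction behind table 10.4; the linear formula is
    an assumption inferred from the standard deviation of 13.6 mentioned above.
    """
    return round(population_mean + (mean_iq - population_mean) * population_sd / sd_of_mean)

# Examples matching table 10.4: 80 -> 78, 100 -> 100, 125 -> 128.
print(rescale_mean_iq(80), rescale_mean_iq(100), rescale_mean_iq(125))
```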

To calculate the mean of the scores, the norms of the different tests must be comparable. For a number of reasons, including obsolescence of the test norms, this is often not the case. This problem was discussed in section 9.10. In table 10.5 the expected obsolescence of the norms of the SON-R 2½-7, based on an estimate of obsolescence of one IQ point per 3 years, is presented. This means that, in theory, 3 years after the standardization 1 IQ point should be subtracted from the IQ score (a small computational sketch of this rule follows the table). The same holds true for other intelligence tests.

Table 10.5
Obsolescence of the Norms of the SON-IQ

Year of administration      Year of administration      Year of administration
1996 - 1998: 1 point        2002 - 2004: 3 points       2008 - 2010: 5 points
1999 - 2001: 2 points       2005 - 2007: 4 points       2011 - 2013: 6 points

– The obsolescence has been calculated from 1993/'94, the year in which the standardization was carried out. An obsolescence rate of 1 IQ point per three years was used.
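For convenience, the rule behind table 10.5 can also be written as a small function. This is only a sketch of the rule described above (one IQ point per three full years since the 1993/'94 standardization); the exact rounding is an assumption chosen to match the intervals in the table.

```python
def obsolescence_correction(year_of_administration, standardization_year=1993):
    """IQ points to subtract for obsolescence of the SON-IQ norms (cf. table 10.5).

    Sketch of the stated rule: one point per three full years since the
    1993/'94 standardization; administrations before 1996 get no correction.
    """
    years_elapsed = year_of_administration - standardization_year
    return max(0, years_elapsed // 3)

# 1996-1998 -> 1 point, 1999-2001 -> 2 points, ..., 2011-2013 -> 6 points.
print([obsolescence_correction(year) for year in (1996, 1999, 2004, 2013)])
```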

An improved estimate of the level of intelligence can also be gained by administering the test again. This also leads to higher reliability and improved stability. However, a retest within a short period brings with it the problem of learning effects. Further, the specific characteristics of the test will still influence the mean score. Administering a different test in combination with the SON-R 2½-7 is a much more attractive alternative, because this reduces the influence of various undesired sources of variance at the same time. Naturally, the alternative test must be suitable for the target group. For various groups of children with whom the SON is used, only nonverbal tests, or the performance section of more general tests, will be considered as alternatives. Diversity in materials and method of testing is, as far as possible, desirable. Furthermore, having different examiners administer the tests is recommended.

Because the SON-R 2½-7 differs so clearly from many other tests in the method of administration and the materials used, it is very suitable to be administered as an extra test for children to whom a (partially) verbal intelligence test can also be administered.

An IQ score based on two test administrations, and for young children preferably on three test administrations, can be interpreted with much more confidence as the level of intelligence. Such scores can be evaluated qualitatively according to the descriptions presented in the second and third parts of table 10.2.

Table 10.4
Correction of Mean IQ Score Based on Administration of Two or Three Tests

Mean   IQ      Mean   IQ      Mean   IQ      Mean   IQ      Mean   IQ      Mean   IQ      Mean   IQ
  50   45        65   61        80   78        95   94       110  111       125  128       140  144
  55   50        70   67        85   83       100  100       115  117       130  133       145  150
  60   56        75   72        90   89       105  106       120  122       135  139       150  155

– both the mean IQ score (Mean) and the newly standardized IQ score (IQ) are presented

Other information
The question whether another test besides the SON-R 2½-7 should be administered depends primarily on the consequences of an incorrect evaluation of the child. If these are not very serious, and if the evaluation can easily be revised, a relatively large margin of uncertainty is acceptable. The risk of an incorrect evaluation will also decrease if the result of the test can be interpreted in combination with information from the parents, teachers and others concerned with the child. The observations of the examiner may also give an indication of the desirability of administering an extra test.

Manner of administration
A condition for the validity of the test result is that the test is administered in the correct manner and according to the directions. Experience in administering tests is very important in this respect, as is experience in interacting with young children and, if relevant, with children with specific problems or handicaps. The administration of the test does not necessarily have to be done by a psychologist. However, the interpretation and recording of the results remain the domain of qualified experts.

An important aspect of the SON-R 2½-7 is its friendly approach to children and the interaction between examiner and child. This makes completing the test enjoyable, for both child and examiner. However, this also means that the examiner is closely involved in the administration and hence that the risks of examiner effects are greater. With one exception, systematic examiner effects were restricted to a few IQ points during our research. To reduce the risk of such effects, it is advisable, in addition to closely following the directions, to be present on some occasions when someone else administers the test and to allow someone else to be present when administering the test oneself. If possible, comparing the results of different examiners now and then is also useful.

10.5 CONCLUSIONS

The research has shown that the SON-R 2½-7 is a valid, reliable intelligence test that can be used to good effect with children with problems and handicaps in language development and communication, with children with a foreign-language or bilingual background, with children with a developmental delay and developmental disorders, and with mentally deficient children and adults. When more traditional, verbal intelligence tests are used with these groups, the evaluation of intelligence can be distorted by the language skills of the child. The test results of the deaf children who were not multiply handicapped, and whose performance was almost equal to that of the children in the norm group, demonstrate the importance of a nonverbal test administration. The same holds true for the performance of the immigrant children. This was much better than the performance on the traditional tests, and was comparable to the results of the children in the norm group with a similar SES level.

Young children, in particular young children with a problem or handicap, are still frequently difficult to test. In addition to the nonverbal character of the test, which allows, but does not require, the child to speak, a child-oriented test situation is established by the help given by the examiner, the attractiveness of the materials and the manner in which the child is actively involved. Comparisons with two other tests showed that the children were more motivated and concentrated better with the SON-R 2½-7 and that, according to the examiners, they understood the directions better.

The interaction between examiner and child offers extra opportunities for observation. However, the manner of administering the test does require the examiner to be thoroughly prepared and to follow the directions.

The scores on the test correlated strongly with various evaluations of intelligence. The performance of children with a developmental delay (in the Netherlands and in Australia) and learning problems (in Great Britain) was low, as was expected. The correlations with other nonverbal intelligence tests were reasonable. However, due to differences in content between the tests and to fluctuations in the performances, the correlations were lower than would be considered possible on the basis of the reliability of the test. The score on the test gives an indication of the intelligence of the child; the score is not 'the' level of intelligence. When decisions with far-reaching consequences have to be made, the diagnosis should be based on the administration of two or three intelligence tests.

The IQ score, for which the reference age can also be determined, is of prime importance for the interpretation of the test results. The distinction between a Performance Scale and a Reasoning Scale was supported by the Principal Components Analysis and by the patterns of correlations with other tests. This is important because it, in turn, supports the multifaceted nature of the concept of intelligence as it is measured with the SON-R 2½-7; however, the reliability of the difference between the two scale scores is relatively low and of less practical importance. The norms for the test scores are based on the exact age of the child, thus avoiding systematic distortions in the presentation of the results, and probability intervals are presented, allowing the user to take the uncertainty about the results into account.

The difficulties which arise when testing young children, and the great diversity of problems and handicaps of young children for which psychological assessments are requested, make it extremely important that a number of well-constructed, standardized and validated intelligence tests are available. The SON-R 2½-7 complies with these criteria.


11 GENERAL DIRECTIONS

In this chapter the general characteristics of the procedure for the test administration and scoring are presented. In chapter 12 the directions for each separate subtest will be described.

11.1 PREPARATION

Before the test is administered for the first time, the examiner should become familiar with the materials, the directions and the scoring of the items. We strongly advise trying out the test a number of times before using it. In our experience, administration of the test is not difficult. In order to administer the test correctly the examiner must have a good command of the directions so that he or she does not need to consult the manual during the administration. Learning to administer the test is facilitated by observing a test administration or watching a video recording of it.

If attention is not continually focused on the child, he or she can easily be distracted and lose interest in the test. Specific characteristics of the administration of each subtest are described on the record form so that these are always immediately available during the administration of the test.

Valid test administration with young children, certainly when they have problems and handicaps, requires a high level of expertise from the examiner. Experience in testing children is essential. When a child has specific problems or handicaps, experience in interacting with these children is desirable in order to be able to communicate easily with the child, and to deal with any problems that may arise. Administration of the test is not restricted to psychologists and (ortho)educationalists; experience in testing of, and interaction with, young children is of paramount importance. However, interpreting and reporting on the test results remains the prerogative of experts.

The directions should be followed as closely as possible. Deviating from the directions may influence the test results. In general, sufficient latitude is allowed in the directions for adapting to the comprehension and skills of the individual child. Because of specific problems of a child, e.g., motor handicaps, adapting the administration of the test may be necessary. This will be discussed in section 11.6.

Set-up
The examiner sits at a table opposite the child. The table should not be too broad; otherwise, the examiner cannot easily help the child. The height of the table and chair should be adjusted to the child. The child should be able to sit comfortably, and to see easily what is on the table and what the examiner does. Preferably, the examiner should sit so that the light falls on his or her face.

Only the material the child needs at that moment should be on the table. The child works on a large anthracite-colored mat. The mat stops the material sliding around, makes it easier for the child to pick up items, and supplies a uniform background. The record form and the material needed by the examiner are placed on another table, preferably outside the reach of the child.

[Figure: schematic of the test set-up. The child and the examiner sit opposite each other at the table; the test materials lie between them on the mat, with 'left' and 'right' marked from the examiner's position and 'top' and 'bottom' from the child's position; the record form and the storage box are placed at the examiner's side.]

In the directions, 'left' and 'right' are given from the examiner's position, whereas 'top' and 'bottom' are given from the child's perspective. We have chosen this top-bottom perspective because referring to the top of the test booklet as the bottom, when it is lying the other way round from the examiner's perspective, may be confusing.

The test booklets are presented so that the title page is facing the child. The page numbers and the numbers on the cards are always legible from the examiner's perspective. When studying the directions, this test situation should be taken into account.

The examiner should always be sure that the child's view of the material is not blocked while presenting materials, giving directions, or correcting answers. The examiner should consider right- or left-handedness when placing materials on the table or giving them to the child.

Introduction
Before starting the test, the examiner should allow time for the child to get used to the setting. The child should not have the impression that he or she has to achieve, but that he or she will be playing with different materials.

The time needed to administer the SON-R 2½-7 varies from 45 to 75 minutes. The entire test should preferably be administered in one sitting. The examiner can allow a short break between subtests now and then, so that the child can have a drink or go to the bathroom.

11.2 DIRECTIONS AND FEEDBACK

Verbal and nonverbal directions
The SON-R 2½-7 can be administered with and without the use of spoken language. The verbal and nonverbal directions are always printed in columns next to each other. The sentences printed in small capitals (CAPITALS) in the left-hand column represent the spoken text. The italicized text (italic) in the right-hand column represents the nonverbal directions.

We have tried to make both types of directions equivalent. Therefore it is important that no extra verbal information be given. When using verbal directions one should limit oneself to the text in capitals and give no further explanation. Naming the pictures, or the shape and color of the blocks used in Analogies, for example, is not allowed. When using nonverbal directions, one should be careful not to add any extra information in gestures or facial expressions. The directions may be repeated when the child does not understand what he or she is expected to do.

The nonverbal directions are used for children who have problems understanding spoken language. In most cases, a combination of the two is used, in which the nonverbal directions are accompanied by parts of the verbal directions, depending on the child's capability to understand verbal directions.


When using nonverbal directions, the gesture for 'together' is often used. This should be done in the following manner: move both hands together (slowly) as if to catch a large ball.

Help and feedback
After each item the examiner tells the child whether a solution is correct or incorrect. When a child has made a mistake or when he or she is not able to complete the item, the examiner helps and corrects the solution, while trying to actively involve the child. The purpose of feedback is not to show the child what he or she cannot do, but to show him or her how to do the item correctly. However, the item is only scored as being correct if the child has completed it independently.

In principle, the examiner corrects an incorrectly completed item while involving the child. The mistake does not have to be corrected when the subtest is discontinued on the basis of the rules for discontinuation.

The manner in which feedback should be given is described at the end of each part in the subtest directions. In broad outline the feedback consists of the following reactions:

Following a correct solution:
YES, THAT'S GOOD, or YES, THAT'S RIGHT, or GOOD, or use a similar phrase.    Nod affirmatively: yes, or use a similar gesture.

When a child has made a mistake:
NO, IT'S NOT QUITE RIGHT.    Make a questioning gesture. Shake head: no.

The examiner points to the picture, block, puzzle, or card. The examiner corrects the mistake, when possible with the child.

LOOK, IF WE DO IT LIKE THIS, IT'S BETTER.    Point to the correction and nod affirmatively: yes.

The examiner tries to involve the child in actively correcting the mistakes by letting him or her perform the last activity. The examiner does not explain why the child's answer was wrong.

When the child does not react despite encouragement:
The examiner completes the item while trying to actively involve the child in the solution.

Extended and short directions
In the first part of the subtests no separate examples are shown, but an example is included in the administration of the item itself. That is why extended directions are given for the first items in each subtest. When the purpose is clear, the examiner can make do with short directions and gradually shorten them to:

NOW THIS ONE, or NEXT ONE Nod encouragingly.

When a child does not comprehend the directions, these may be repeated.

The second part of each subtest, with the exception of Patterns, is preceded by an example. This example is always completed when the child reaches the items of the second part.

Every time the child has given an answer to an item, one must ascertain whether the child has finished.

ARE YOU READY? or THAT’S IT? or READY? Make a questioning gesture.


The child may immediately correct his or her answer. In such a case the examiner should ask what the final answer is. Make sure that the child does not take the question whether he or she is ready, or whether the final answer has been given, as expressing doubt about the correctness of the answer. Varying the questions might be advisable (for example: 'show me the correct picture again', or 'which picture matches this best').

Withhold feedback until the answer is complete (this is very important when more than one choice must be made).

Sometimes a child comments on slight differences in color between the testing material and the pictures in the test booklets, or about the space remaining in the 'frame' of Mosaics. Reassure the child and tell him or her that it does not matter.

11.3 SCORING THE ITEMS

All items completed by the child are scored as being correct (1) or incorrect (0). An item is only 'correct' if it has been completed by the child independently and correctly. A time limit is used in the second part of some of the subtests. When this is the case, items must be completed within the time limit in order to be scored as being correct. In the case of older children, items at the beginning of the subtest are, on the basis of the entry procedure (see section 11.4), not presented and are scored as '+'. These items are scored as correct for the total score of the subtest. When a child refuses to do an item, this is indicated by '–' and scored as being incorrect.

Time limits
In part II of the performance subtests (Mosaics, Puzzles and Patterns) a maximum amount of time is allowed per item. The examiner uses a stopwatch for these items. The time limit is 2½ minutes. Experience has shown that items are hardly ever completed correctly after this amount of time has passed. The examiner may stop earlier when the child clearly cannot finish the item successfully. When the child is almost finished after 2½ minutes, the examiner allows the child to finish the item.

The following situations can arise:
– When the child is finished before the time is up, the examiner scores the item as being either correct (1) or incorrect (0).
– When it is clear before the time is up that the child will not succeed, the examiner can offer help. The item is then scored as being incorrect (0).
– When the child is not finished and the time limit has been reached, the examiner can help. The item is scored as being incorrect (0).
– When the time limit has been reached and the child can finish the item independently in a short time, the child is allowed to do so. The item is scored as being incorrect (0).

Refusal
When the child does not wish to continue halfway through an item, the examiner encourages the child to go on. When this has no effect, the examiner offers help. The item is then scored as being incorrect (0).

When the child refuses to do an item in advance, or even to begin with a subtest, and encouragement does not help, the examiner completes the item and tries to involve the child. The item is then scored as being a refusal (–). The child is then encouraged to complete the next item. The administration of the subtest is discontinued when two consecutive items have been refused. This subtest cannot be used for the evaluation of the test performances or for the calculation of the IQ score.

When the child does not want to continue, a break may be called for, and in this extreme situation changing the sequence of administration of the subtests may be considered. If, for instance, the child does not want to continue doing Analogies, Patterns can be administered first, followed by Situations. Patterns, during which the child draws, is more attractive to do for some children than Situations, during which the child must make choices.

11.4 THE ADAPTIVE PROCEDURE

During the administration of the SON-R 2½-7 an adaptive procedure is used that aims at limiting the administration to the items that correspond to the level of the child. Using the entry procedure, based on age and level, items that would in all likelihood have been solved correctly are not presented. Children appear to become demotivated and uninterested when they have to complete too many items below their level. The discontinuation rule precludes the child having to try to solve too many items above his or her level. Items that are too difficult for the child are frustrating and may easily lead to a refusal to continue, or to behavior indicating that the child does not care what he or she does and pays no more attention. Besides these aspects of motivation, the adaptive procedure aims at limiting the duration of the test.

Entry procedure
The first item of a subtest to be completed depends on the age and level of the child. Based on age and class in primary education, the following rule holds (see also the sketch below):

Entry-item 1: children of 2 or 3 years who have no school experience.
Entry-item 3: children of 4 and 5 years who are in their first or second year of school.
Entry-item 5: children of 6 years or older who are in their third or higher year of school.

When a discrepancy exists between the age of the child and the level in primary education, the entry level corresponding to the lower level is chosen. A six-year-old who is still in his second year of school will begin with entry-item 3. Children of 2 and 3 years always begin with entry-item 1.

When a child is suspected of having a substantial cognitive developmental delay, the entry level can be adapted. When the examiner has the impression that a five-year-old functions at the level of a three-year-old child, he or she will begin with entry-item 1.

However, when a child is suspected of having only a slight developmental delay (roughly corresponding to an IQ of 85 to 100), beginning at a lower level than is suggested on the basis of age and level at school is not necessary or desirable. When the child has a fear of failure or is difficult to test for another reason, beginning at a lower level may be wise.
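The entry rule can be summarized in a small helper. This is an illustrative sketch only: the function name and arguments are hypothetical, and the adaptation for a suspected substantial developmental delay remains a judgment of the examiner.

```python
def entry_item(age_in_years, school_year=0):
    """Entry item according to the rule in section 11.4 (illustrative sketch).

    school_year: 0 = no school experience, 1 or 2 = first or second year,
    3 or more = third or higher year of primary education. When age and
    school year point to different entry levels, the lower level is used.
    """
    level_by_age = 1 if age_in_years <= 3 else 3 if age_in_years <= 5 else 5
    level_by_school = 1 if school_year == 0 else 3 if school_year <= 2 else 5
    return min(level_by_age, level_by_school)

# A six-year-old who is still in the second year of school starts with entry-item 3.
print(entry_item(6, school_year=2))  # -> 3
```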

In the subtest directions, the administration procedure is always described starting with item 1. At the end of the description of part I of each subtest, changes in the directions due to beginning with entry-item 3 or 5 are described.

The skipped items are scored as '+' on the record form. In the calculation of the subtest score these items are reckoned as being correct.

Rules for discontinuation
The following discontinuation rule holds for all subtests:

A) A SUBTEST IS DISCONTINUED WHEN A TOTAL OF THREE INCORRECT ANSWERS HAS BEEN GIVEN.

PAY ATTENTION: To reach the criterion of three incorrect answers it is not necessary that the mistakes be consecutive.

In addition to A, the following discontinuation rule applies to part II of the performance subtests (Mosaics, Puzzles and Patterns):


B) ADMINISTRATION OF THE SUBTESTS MOSAICS, PUZZLES AND PATTERNS IS ALSO DISCONTINUED WHEN TWO CONSECUTIVE ITEMS HAVE BEEN SCORED AS BEING INCORRECT IN PART II OF THESE SUBTESTS.

PAY ATTENTION: The time limit is also in effect for these items.

PAY ATTENTION: The number of mistakes includes the items completed incorrectly (score '0') as well as the items that were refused (score '–').

Examples for discontinuation rule A:
The subtest is discontinued when a total of three items have been scored as being incorrect. The entry procedure does not affect the discontinuation rule.

The meaning of the scores is:

+  Item skipped (based on the entry procedure; scored as being correct for the total score),
1  Item correct (completed entirely, independently and within the time limit when in effect),
0  Item incorrect (completed incorrectly, not independently, incompletely or not within the time limit),
–  Item refused (scored as being incorrect for the total score).

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 score

Mos 1 0 0 1 0 A 2

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 score

Cat 1 1 1 0 1 1 0 A 1 0 6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 score

Puz + + + + 1 1 A 1 1 1 1 1 0 1 1 1 3

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 score

Cat + + 1 1 1 0 – A 1 0 6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 score

Ana + + 1 1 1 0 1 1 0 0 A 7

1 2 3 4 5 6 7 8 9 10 11 12 13 14 score

Sit + + 1 1 1 0 A 0 1 0 6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 score

Pat + + 1 1 1 0 1 1 1 1 0 1 0 10

Examples for discontinuation rule B:
When two consecutive items of part II in the subtests Mosaics, Puzzles and Patterns have been scored as being incorrect, the subtest is also discontinued.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 score

Mos + + + + 1 1 A 1 1 1 0 0 9

1 2 3 4 5 6 7 8 9 10 11 12 13 14 score

Puz + + + + 1 1 A 1 1 1 1 0 – 10

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 score

Pat 1 1 1 1 1 1 1 1 1 1 0 0 10

A special situation: starting and going back to previous items
The criteria for the entry-item and the construction of the various subtests are such that going back to previous items should only happen occasionally. When this does occur, the following rules, described separately for entry-item 3 and entry-item 5, apply.

Entry-item 3
A child is 4 or 5 years old and starts the subtest with item 3. When either item 3 or item 4 is incorrectly solved, one goes straight back to item 1 and presents both items 1 and 2. When the criterion for discontinuation has not been reached, one continues with the more difficult items until the criterion has been reached.

When one has started the subtest with item 3, and the situation occurs that one has to go back to item 1, then the following subtests are always started with item 1, and the entry procedure is abandoned.

A number of examples: the sequence in which the items have been presented is printed under the item scores.

Item 3, the first item to be completed, has been solved incorrectly. One goes straight back to item 1.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 score

Puz 1 1 1 0 1 1 A 0 0 5

3 4 1 2 5 6 7 8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 score

Ana 1 1 0 0 0 A 2

2 3 1 4 5

1 2 3 4 5 6 7 8 9 10 11 12 13 14 score

Sit 0 0 0 A 0

2 3 1

Entry-item 5
Children of 6 years and older start with item 5. When either item 5 or item 6 is scored as being incorrect, one goes straight back to item 3 and does item 3 as well as item 4.

When items 3 and 4 are both scored as being correct, one goes on to the more difficult items until the discontinuation criterion has been reached.

When either item 3 or 4, or both items 3 and 4, are scored as being incorrect, one goes back to item 1 and does item 1 as well as item 2. When the discontinuation criterion has been met as calculated from item 1 on, the score is calculated on the basis of the item at which the discontinuation criterion has been reached. When the discontinuation criterion has not yet been met, one goes on to the more difficult items.

When the situation occurs that one has to go back in a subtest, the subsequent subtests are started at the lowest entry level reached, i.e., at entry-item 3 or at entry-item 1.

Item 6 has been completed incorrectly, so items 3 and 4 are administered. Then one continues with item 7 until a total of three items have been completed incorrectly.

Item 5 has been completed incorrectly, so items 3 and 4 are administered. Because item 3 is completed incorrectly, items 1 and 2 are administered. The total number of mistakes is still less than three, so one continues with item 6 until three mistakes have been made.

After completing item 5 and subsequently items 3 and 4, three mistakes have been made. However, one does go back to item 1.

Pay attention: The discontinuation criterion was reached at item 4 of Analogies. Item 5, which was completed first, is no longer counted for the score.

11.5 THE SUBTEST SCORE

The score on the subtest equals the number of items completed correctly (1), plus the number of items that were skipped at the beginning (+). However, the subtest score is easier to calculate by taking the number of the last item that was administered and deducting the number of mistakes (0) and refusals (–). The scores of a six-year-old child on the different subtests are shown below.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 score

Mos + + 1 1 1 0 A 1 0 0 6

3 4 1 2 5 6 7

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 score

Cat 1 1 0 1 0 1 1 A 0 5

4 5 2 3 1 6 7 8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 score

Puz 1 0 0 0 0 A 1

4 5 2 3 1

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 score

Ana 1 0 0 0 1 0 A 1

5 6 3 4 1 2

Various aspects of the adaptive procedure are also illustrated on this record form. If this is not yet entirely clear, sections 11.3 and 11.4 should be studied anew. The scores that are calculated here are the raw subtest scores. Using the norm tables or the computer program, they may be transformed into the scaled standard scores.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 score

Moz + + + + 1 1 A 1 0 0 7

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 score

Cat + + + + 1 1 1 A 0 – – –

1 2 3 4 5 6 7 8 9 10 11 12 13 14 score

Puz + + + + 1 1 A 1 0 0 7

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 score

Ana + + 1 1 1 0 1 0 0 A 6

1 2 3 4 5 6 7 8 9 10 11 12 13 14 score

Sit + + 1 1 1 1 A 1 1 – 0 0 8

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 score

Pat + + 1 1 0 0 1 0 5

The subtest Categories has not been scored in this example because the child refused to complete two consecutive items. This subtest is not used when calculating the IQ score.
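To summarize sections 11.3 to 11.5, the scoring and discontinuation logic can be expressed as a short routine. This is an illustrative sketch only, assuming item scores are recorded in administration order as '+', '1', '0' or '-'; it is not part of the official scoring procedure, which uses the record form and the norm tables.

```python
def evaluate_subtest(scores, performance_subtest=False, part_two_start=None):
    """Raw subtest score and discontinuation check (illustrative sketch).

    scores: item scores in administration order, each '+', '1', '0' or '-'.
    Rule A: discontinue after three scores counted as incorrect ('0' or '-').
    Rule B (Mosaics, Puzzles and Patterns only): also discontinue after two
    consecutive incorrect items in part II (items from part_two_start on).
    Two consecutive refusals ('-') invalidate the subtest (returns None).
    """
    mistakes = administered = raw_score = 0
    consecutive_wrong_part2 = consecutive_refusals = 0
    for index, score in enumerate(scores, start=1):
        administered = index
        if score in ("+", "1"):
            raw_score += 1
            consecutive_wrong_part2 = consecutive_refusals = 0
            continue
        mistakes += 1
        consecutive_refusals = consecutive_refusals + 1 if score == "-" else 0
        if consecutive_refusals == 2:
            return None  # subtest not used for the IQ score
        in_part_two = part_two_start is not None and index >= part_two_start
        consecutive_wrong_part2 = consecutive_wrong_part2 + 1 if in_part_two else 0
        if mistakes == 3 or (performance_subtest and consecutive_wrong_part2 == 2):
            break
    # Shortcut from section 11.5: last item administered minus mistakes and refusals.
    assert raw_score == administered - mistakes
    return raw_score

# Example: skipped items ('+') count as correct; the third mistake ends the subtest.
print(evaluate_subtest(["+", "+", "1", "1", "1", "0", "1", "1", "0", "0"]))  # -> 7
```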

11.6 ADAPTING THE DIRECTIONS

The directions of the SON-R 2½-7 allow great flexibility, which makes it possible to adapt the test to the communicative skills and the age or cognitive level of the child. However, within this broad framework, standardization of the method of administration is required and deviation is undesirable. Undoubtedly other methods of administration are possible and may lead to a better performance by the child. However, the standardization research was not carried out that way, and a test result obtained in different circumstances cannot be interpreted using the norm tables.

Even so, special problems or handicaps may occur which would make rigorous application of the directions undesirable, because the result of the test would then not be indicative of the child's cognitive skills. This may be the case if a child has a motor handicap. In all sections of the test the child is expected to actively do something, and when he or she is not physically able to do so, the test result will not be valid.

Small-scale research has been conducted into the usefulness of the test for children with a motor handicap (Van de Beek, 1995). This research has demonstrated that Patterns cannot be administered correctly. Picking up and handling the other materials can also be problematic. This can be obviated, for example, by always offering the cards one by one during the subtests Situations and Categories, by putting the blocks on the table instead of having the subject take them out of the box when doing Mosaics, by adding an extra non-slip layer under the mat, or by allowing the child to give the examiner directions (this does assume good verbal skills). Problems in handling the materials also make adapting the time limits desirable, and possibly administering the test over a period of a few days. However, our experience is limited and the diversity of motor handicaps is so large that giving set rules for administering the test to these children is difficult. The examiner will have to discover what the limiting factors are and whether, and in which manner, these can be compensated for. When one works mainly with motor-handicapped children, administering the test a few times to children who are not handicapped is advisable. This way one can get a clear idea of the problems that occur during the administration that are specific to the handicap.

Adapting the manner of administration of the test can also be desirable when children are very fidgety or find it hard to focus on the test (for example, autistic children). In such a case, sitting at a corner of the table may be preferable to sitting opposite the child, as one can then draw the child's attention by touching him or her.

When one deviates from the standard directions during the administration of the test, this should be mentioned on the record form, so that others can take this into account when interpreting the results.


REFERENCES

Akker, J. van den & Boecop, A. van (1976). Test voor visuele waarneming van Marianne Frostig. Handleiding. Amsterdam: Swets & Zeitlinger.
Alexander, P.A., Willson, V.L., White, C.S., Fuqua, J.D., Clark, G.D., Wilson, A.F. & Kulowich, J.M. (1989). Development of analogical reasoning in 4- and 5-year-old children. Cognitive Development, 4, 65-88.
APA (1987). Diagnostic and Statistical Manual of Mental Disorders, DSM III-R. Washington: American Psychiatric Association.
Bayley, N. (1949). Consistency and variability in the growth of intelligence from birth to eighteen years. Journal of Genetic Psychology, 75, 165-196.
Bayley, N. (1969). Manual for the Bayley Scales of Infant Development. New York: The Psychological Corporation.
Beek, C. van de (1995). De toepasbaarheid van de SON-R 2½-7 bij kinderen met een motorische handicap. RU Groningen: intern verslag.
Berg, W. van den, Heide, L. van der, Kamminga, J., Meeder, S. & Paredes, M.G. de (1994). Slim gezien! Een vergelijking tussen de SON-R 2½-7 (intelligentietest) en de DTVP-2 (visuele perceptietest). RU Groningen: intern verslag.
Berge, J.M.F. ten & Kiers, H.A.L. (1991). A numerical approach to the approximate and the exact minimum rank of a covariance matrix. Psychometrika, 56, 309-315.
Berge, J.M.F. ten & Zegers, F.E. (1978). A series of lower bounds to the reliability of a test. Psychometrika, 4, 575-579.
Berger, H.J.Chr., Creuwels, J.M.P. & Peters, H.F.M. (1973). Nederlandse handleiding bij het gebruik van Wechsler's intelligentie-schaal voor kleuters, de W.P.P.S.I. Amsterdam: Swets & Zeitlinger.
Bleichrodt, N., Drenth, P.J.D., Zaal, J.N. & Resing, W.C.M. (1984). RAKIT Revisie Amsterdamse Kinder Intelligentie Test. Instructie, normen, psychometrische gegevens. Lisse: Swets & Zeitlinger.
Bleichrodt, N., Resing, W.C.M., Drenth, P.J.D. & Zaal, J.N. (1987). Intelligentie-meting bij kinderen. Lisse: Swets & Zeitlinger.
Bollen, N. (1991). Cognitief aanvangsniveau jongste kleuters basisonderwijs. OVG Groningen: intern verslag.
Bollen, N. (1996). De cognitieve ontwikkeling van kleuter tot achtjarige in het basisonderwijs. OVG Groningen: intern verslag.
Bomers, A.J.A.M. & Mugge, A.M. (1985). Reynell Taalontwikkelingstest: Nederlandse instructie. Nijmegen: Berkhout.
Bon, W.H.J. van (1982). TvK Taaltests voor Kinderen. Handleiding. Lisse: Swets & Zeitlinger.
Bracken, B.A. & McCallum, R.S. (1998). UNIT Universal Nonverbal Intelligence Test. Itasca, IL: Riverside Publishing.
Brouwer, A., Koster, M. & Veenstra, B. (1995). Validation of the Snijders-Oomen test (SON-R 2½-7) for Dutch and Australian children with disabilities. RU Groningen: intern verslag.
Brown, L., Sherbenou, R.J. & Johnsen, S.K. (1982). Test of Nonverbal Intelligence. Austin, TX: Pro-Ed.
Brown, L., Sherbenou, R.J. & Johnsen, S.K. (1990). TONI-2 Test of Nonverbal Intelligence. Examiner's manual. Second Edition. Austin, TX: Pro-Ed.
Carroll, J.B. (1993). Human cognitive abilities. A survey of factor-analytic studies. Cambridge: Cambridge University Press.


Cattell, R.B. (1971). Abilities; their structure, growth, and action. Boston: Houghton Mifflin.
CBS (1993). Centraal bureau voor de statistiek: Statistisch Jaarboek 1993. 's-Gravenhage: SDU/uitgeverij.
CBS (1994). Centraal bureau voor de statistiek: de leefsituatie van de nederlandse bevolking 1993, kerncijfers. 's-Gravenhage: SDU/uitgeverij.
Coultre-Martin, J.P. le, Wijnberg-Williams, B.J., Meulen, B.F. van der & Smrkovsky, M. (1988). BOS 2-30. Normen voor kinderen met een vermoede hoorstoornis of met een spraak- of taalstoornis. Tijdschrift voor Orthopedagogiek, 27, 75-84.
Cronbach, L.J. (1951). Coefficient alpha and the internal structure of tests. Psychometrika, 16, 297-334.
Cronbach, L.J., Schönemann, P. & McKie, D. (1965). Alpha coefficients for stratified parallel tests. Educational and Psychological Measurement, 25, 291-312.
Dekker, R. (1987). Intelligentie van visueel gehandicapte kinderen in de leeftijd van 6 tot 15 jaar. Amsterdam: VU Uitgeverij.
Drenth, P.J.D. (1966). De psychologische test. Deventer: Van Loghum Slaterus.
Driesens, N., Horn, J. ten, Paro, I., Schoemaker, M. & Swartberg, D. (1994). De mogelijke samenhang tussen twee niet-verbale intelligentietests: SON-R 2½-7 en de TONI-2. RU Groningen: intern verslag.
Dunn, L.M. & Dunn, L.M. (1981). PPVT Peabody Picture Vocabulary Test – Revised. Manual for Forms L and M. Circle Pines, MN: American Guidance Service.
Eldering, L. & Vedder, P. (1992). OPSTAP: een opstap naar meer schoolsucces? Amsterdam/Lisse: Swets & Zeitlinger.
Eldik, M.C.M. van, Schlichting, J.E.P.T., Lutje Spelberg, H.C., Meulen, Sj. van der & Meulen, B.F. van der (1995). Reynell Test voor Taalbegrip. Handleiding. Nijmegen: Berkhout.
Elliott, C.D., Murray, D.J. & Pearson, L.S. (1979-82). British ability scales: Manuals. Windsor: National Foundation for Educational Research.
Elsjan, M., Kooi, M. van de, Kuiper, M., Raaijmakers, M. & Wensink, J. (1994). SON-R 2½-7 en TOMAL: samenhang tussen een niet-verbale intelligentietest en een geheugentest. RU Groningen: intern verslag.
Evers, A., Vliet-Mulder, J.C. van & Laak, J. ter (1992). Documentatie van Tests en Testresearch in Nederland. Assen: Van Gorcum.
Flynn, J.R. (1987). Massive IQ Gains in 14 Nations: What IQ tests Really Measure. Psychological Bulletin, 2, 171-191.
Frostig, M., Lefever, D.W. & Whittlesey, J.R.B. (1966). Administration and scoring manual for the Marianne Frostig Developmental Test of Visual Perception. Palo Alto, CA: Consulting Psychologists Press.
Goswami, U. (1991). Analogical reasoning: what develops? A review of research and theory. Child Development, 62, 1-22.
Guilford, J.P. & Fruchter, B. (1978). Fundamental statistics in psychology and education (6th ed.). New York: McGraw-Hill.
Guttman, L. (1945). A basis for analyzing test-retest reliability. Psychometrika, 10, 255-282.
Haan, N. de & Tellegen, P.J. (1986). De herziening van de schriftelijke taaltest voor doven. RU Groningen: intern verslag.
Haasen, P.P. van, Bruyn, E.E.J. de, Pijl, Y.J., Poortinga, Y.H., Lutje Spelberg, H.C., Steene, G. vander, Coetsier, P., Spoelders-Claes & Stinissen, J. (1986). WISC-R, Wechsler Intelligence Scale for Children – Revised. Nederlandstalige uitgave. Lisse: Swets & Zeitlinger.
Hambleton, R.K. & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston, MA: Kluwer-Nijhoff.
Hammill, D.D., Pearson, N.A. & Voress, J.K. (1993). DTVP-2 Developmental Test of Visual Perception. Examiner's manual. Second Edition. Austin, TX: Pro-Ed.
Hammill, D.D., Pearson, N.A. & Wiederholt, J.D. (1996). CTONI Comprehensive Test of Nonverbal Intelligence. Examiner's Manual. Austin, TX: Pro-Ed.
Harinck, F. & Schoorl, P. (1987). Wast vernieuwde WISC-R werkelijk witter? Kind en adolescent, 3, 109-118.


Harris, S.H. (1982). An evaluation of the Snijders-Oomen Nonverbal Intelligence Scale for Young Children. Journal of Pediatric Psychology, 7, 3, 239-251.
Hessels, M.G.P. (1993). Leertest voor Etnische Minderheden. Theoretische en Empirische Verantwoording. Rotterdam: RISBO.
Hofstee, W.K.B. (1990). Toepasbaarheid van psychologische tests bij allochtonen. Rapport van de testscreeningscommissie ingesteld door het LBR in overleg met het NIP. Utrecht: Landelijk Bureau Racismebestrijding.
Hofstee, W.K.B. & Tellegen, P.J. (1991). SON 2½-7, subsidie-aanvraag NWO 560-267-033. Groningen: RUG Persoonlijkheids- en Onderwijspsychologie.
Horn, J. ten (1996). Amerikaanse validering van de Snijders-Oomen niet-verbale intelligentietest voor jonge kinderen, de SON-R 2½-7. RU Groningen: intern verslag.
Jenkinson, J., Roberts, S., Dennehy, S. & Tellegen, P. (1996). Validation of the Snijders-Oomen Nonverbal Intelligence Test – Revised 2½-7 for Australian Children with Disabilities. Journal of Psychoeducational Assessment, 14, 276-286.
Kaufman, A.S. (1975). Factor Analysis of the WISC-R at 11 age levels between 6½ and 16½ years. Journal of Consulting and Clinical Psychology, 43, 135-147.
Kaufman, A.S. & Kaufman, N.L. (1983). K-ABC Kaufman Assessment Battery for Children. Interpretive Manual. Circle Pines, MN: American Guidance Service.
Kiers, H.A.L. (1990). SCA: een programma voor simultane component analyse. Groningen: IEC, ProGamma.
Kiers, H.A.L. & ten Berge, J.M.F. (1989). Alternating least squares algorithms for simultaneous components analysis with equal weight matrices in two or more populations. Psychometrika, 54, 467-473.
Kievit, Th. & Tak, J.A. (1996). De praktijk van de hulpverlening en het gebruik van de regulatieve cyclus. In: Kievit, Th., Wit, J. de, Groenendaal, J.H.A. & Tak, J.A. (eds.), Handboek psychodiagnostiek voor de hulpverlening aan kinderen. Utrecht: De Tijdstroom.
Laros, J.A. & Tellegen, P.J. (1991). Construction and validation of the SON-R 5½-17, the Snijders-Oomen non-verbal intelligence test. Groningen: Wolters-Noordhoff.
Lienert, G.A. (1961). Testaufbau und Testanalyse. Weinheim: Verlag Julius Beltz.
Lombard, A.D. (1981). Success begins at Home. Educational Foundations of Pre-schoolers. Massachusetts, Toronto: Lexington Books.
Lord, F.M. (1980). Applications of item response theory to practical testing problems. Hillsdale, NJ: Lawrence Erlbaum.
Lord, F.M. & Novick, M.R. (1968). Statistical Theories of Mental Test Scores. Reading, MA: Addison-Wesley Publishing Company.
Lutje Spelberg, H.C. & Van der Meulen, Sj. (1990). Het meten van taalbegrip en taalproductie, subsidie-aanvraag NWO 560-256-040. Groningen: RUG afd. Orthopedagogiek.
Lynn, R. (1994). Sex differences in intelligence and brain size: a paradox resolved. Personality and Individual Differences, 17, 2, 257-271.
Lynn, R. & Hampson, S. (1986). The rise of national intelligence: evidence from Britain, Japan and the U.S.A. Personality and Individual Differences, 1, 23-32.
McCarthy, D. (1972). Manual for the McCarthy Scales of Children's Abilities. San Antonio: The Psychological Corporation.
Meulen, B.F. van der & Smrkovsky, M. (1983). BOS 2-30 Bayley Ontwikkelingsschalen. Handleiding. Lisse: Swets & Zeitlinger.
Meulen, B.F. van der & Smrkovsky, M. (1986). MOS 2½-8½, McCarthy Ontwikkelingsschalen. Handleiding. Lisse: Swets & Zeitlinger.
Meulen, B.F. van der & Smrkovsky, M. (1987). BOS 2-30 Bayley Ontwikkelingsschalen. Handleiding bij de niet-verbale versie. Lisse: Swets & Zeitlinger.
Millsap, R.E. & Meredith, W.M. (1988). Component analysis in cross-sectional and longitudinal data. Psychometrika, 53, 123-134.
Mislevy, R.J. & Bock, R.D. (1990). BILOG 3: Item Analysis and Test Scoring with Binary Logistic Models. Mooresville, IN: Scientific Software.
Neutel, R.J., Meulen, B.F. van der & Lutje Spelberg, H.C. (1996). GOS 2½-4½, Groningse OntwikkelingsSchalen. Handleiding. Lisse: Swets & Zeitlinger.


Nunnally, J.C. (1978). Psychometric Theory (2nd ed.). New York: McGraw-Hill.
Nunnally, J.C. & Bernstein, I.H. (1994). Psychometric Theory (3rd ed.). New York: McGraw-Hill.
Raven, J.C. (1962). Coloured Progressive Matrices. London: Lewis.
Rekveld, I. (1994). De cognitieve ontwikkeling van kleuters in het basisonderwijs. OVG Groningen: intern verslag.
Resing, W.C.M., Bleichrodt, N. & Drenth, P.J.D. (1986). Het gebruik van de RAKIT bij allochtoon etnische groepen. Nederlands Tijdschrift voor de Psychologie, 41, 179-188.
Reynell, J.K. (1977). Reynell Developmental Language Scales. Windsor: NFER-Nelson.
Reynell, J.K. (1985). Reynell Developmental Language Scales, second revision. Windsor: NFER-Nelson.
Reynolds, C.R. & Bigler, E.D. (1994). TOMAL Test of Memory and Learning. Examiner's manual. Austin, TX: Pro-Ed.
Roelandt, Th., Roijen, J.H.M. & Veenman, J. (1992). Minderheden in Nederland: statistisch vademecum 1992. 's-Gravenhage: SDU/uitgeverij.
Sattler, J.M. (1992). Assessment of Children. Revised and Updated Third Edition. San Diego, CA: J.M. Sattler, Publisher, Inc.
Schlichting, J.E.P.T., Eldik, M.C.M. van, Lutje Spelberg, H.C., Meulen, Sj. van der & Meulen, B.F. van der (1995). Schlichting Test voor Taalproduktie. Handleiding. Nijmegen: Berkhout.
Schroots, J.J.F. & Alphen de Veer, R.J. van (1976). LDT Leidse Diagnostische Test, deel 1 Handleiding. Amsterdam: Swets & Zeitlinger.
Sijtsma, K. (1993). Kaf en koren onder Nederlandse tests. De Psycholoog, 28, 12, 502-503.
Smulders, F.J.H. (1963). STUTSMAN intelligentietest voor kleuters. Nederlandstalige bewerking. Nijmegen: Berkhout.
Snijders-Oomen, N. (1943). Intelligentieonderzoek van doofstomme kinderen. Nijmegen: Berkhout.
Snijders, J.Th. & Snijders-Oomen, N. (1958) eerste editie, (1970) tweede editie. Snijders-Oomen niet-verbale intelligentieschaal SON-'58. Groningen: Wolters-Noordhoff.
Snijders, J.Th. & Snijders-Oomen, N. (1976). Snijders-Oomen Non-verbal Intelligence Scale, SON 2½-7. Groningen: Tjeenk Willink BV.
Snijders, J.Th., Tellegen, P.J. & Laros, J.A. (1989). Snijders-Oomen non-verbal intelligence test, SON-R 5½-17. Manual and research report. Groningen: Wolters-Noordhoff.
Snippe, M.D. (1996). Prestaties van kinderen met autisme en aan autisme verwante stoornissen op de SON-R 2½-7. RU Groningen: intern verslag.
SPSS Inc. (1990). SPSS/PC+ 4.0 Advanced Statistics. Chicago, Illinois: SPSS Inc.
Starren, J. (1975). SSON 7-17. De ontwikkeling van een nieuwe versie van de SON voor 7-17 jarigen. Verantwoording en handleiding. Groningen: Wolters-Noordhoff.
Stinissen, J. & Steene, G. vander (1981). WPPSI Wechsler Preschool and Primary Scale of Intelligence. Handleiding bij de Vlaamse aanpassing. Lisse: Swets & Zeitlinger.
Struiksma, A.J.C. & Geelhoed, J.W. (1996). Intelligentieonderzoek. In: Kievit, Th., Wit, J. de, Groenendaal, J.H.A. & Tak, J.A. (eds.), Handboek psychodiagnostiek voor de hulpverlening aan kinderen. Utrecht: De Tijdstroom.
Stutsman, R. (1931). Mental measurement of preschool children. Yonkers-on-Hudson, NY: World Book.
Tellegen, P.J. (1993). A nonverbal alternative to the Wechsler Scales: The Snijders-Oomen Nonverbal Intelligence Tests. In First Annual South Padre Island International Interdisciplinary Conference on Cognitive Assessment of Children and Youth in School and Clinical Settings, A Compendium of Proceedings. Fort Worth, TX: CyberSpace Publishing Corporation.
Tellegen, P. (1997). An Addition and Correction to the Jenkinson et al. (1996) Australian SON-R 2½-7 Validation Study. Journal of Psychoeducational Assessment, 15, 67-69.
Tellegen, P.J. & Laros, J.A. (1993a). The Snijders-Oomen Nonverbal Intelligence Tests: General Intelligence Tests or Tests for Learning Potential? In: Hamers, J.H.M., Sijtsma, K. & Ruijssenaars, A.J.J.M. (eds.), Learning Potential Assessment: Theoretical, Methodological and Practical Issues. Amsterdam/Lisse: Swets & Zeitlinger.
Tellegen, P.J. & Laros, J.A. (1993b). The Construction and Validation of a Nonverbal Test of Intelligence: The Revision of the Snijders-Oomen Tests. European Journal of Psychological Assessment, Vol. 9, 2, 147-157.


Tellegen, P.J., Winkel, M. & Wijnberg-Williams, B.J. (1997). Snijders-Oomen Nonverbal Intelligence Test SON-R 2½-7. Manual. Lisse: Swets & Zeitlinger.
Tellegen, P.J., Wijnberg, B.J., Laros, J.A. & Winkel, M. (1992). Evaluatie van de SON 2½-7 ten behoeve van de revisie. RU Groningen: intern verslag.
Verhelst, N.D. & Glas, C.A.W. (1995). Dynamic Generalizations of the Rasch Model. In: Fischer, G.H. & Molenaar, I.W. (eds.), Rasch Models: Foundations, Recent Developments, and Applications. New York: Springer-Verlag.
Warm, T.A. (1989). Weighted Likelihood Estimation of Ability in Item Response Theory. Psychometrika, 54, 3, 427-450.
Wechsler, D. (1967). Manual for the Wechsler Preschool and Primary Scale of Intelligence. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1974). Manual for the Wechsler Intelligence Scale for Children – Revised. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1989). WPPSI-R, Wechsler Preschool and Primary Scale of Intelligence – Revised. Manual. San Antonio, TX: The Psychological Corporation.
Wechsler, D. (1991). WISC-III Manual. San Antonio, TX: The Psychological Corporation.
Westerlaak, J.M. van, Kropman, J.A. & Collaris, J.W.M. (1975). Beroepenklapper. Nijmegen: Instituut voor Toegepaste Sociologie (ITS).
Wijnands, A. (1997). De SON-R tests: verkennend onderzoek van de SON-R tests bij kinderen en volwassenen met een verstandelijke handicap. RU Groningen: intern verslag.
Zimmerman, I.L., Steiner, V.G. & Pond, R.E. (1992). PLS-3 Preschool Language Scale-3. Examiner's Manual. San Antonio, TX: The Psychological Corporation.
Zimowski, M.F., Muraki, E., Mislevy, R.J. & Bock, R.D. (1994). BIMAIN 2, Multiple-group IRT Analysis and Test Maintenance for Binary Items. Chicago, IL: Scientific Software International.