
Rasch analysis of a Spanish language-screening parent survey


Research in Developmental Disabilities 35 (2014) 646–656


Mark Guiberson a,*, Barbara L. Rodriguez b

a University of Wyoming, WY 82071-2000, USA
b University of New Mexico, NM 87131, USA

ARTICLE INFO

Article history:
Received 5 October 2013
Received in revised form 17 December 2013
Accepted 26 December 2013
Available online 20 January 2014

Keywords:
Parent survey
Preschool
Spanish-speaking
Rasch analysis

ABSTRACT

The purpose of this study was to evaluate and refine items from a parent survey designed to screen the language skills of Spanish-speaking preschoolers. This investigation applied Rasch modeling to systematically evaluate and identify items that demonstrated favorable qualities.

A set of 124 parent survey items was administered to 107 Spanish-speaking parents of preschool age children. Parents completed survey items intended to provide a global measure of preschool language abilities. Rasch analyses of the survey items were conducted using WINSTEPS.

Results indicated that 59 items, all vocabulary items, fit the Rasch model. Sufficient unidimensionality was obtained, with the model accounting for 58% of the variance. Item difficulty estimates ranged from −7.43 to 4.12, with a shortage of items at both the lower ability level and at the higher ability level. Analyses of pruned and remaining items identified the type of items that may be most useful for a refined item bank. These results will inform the development of new items for a Spanish language-screening parent survey for preschool age children.

© 2014 Elsevier Ltd. All rights reserved.

The Hispanic population is the fastest growing minority group in the United States. Between the years 2000 and 2010, the Hispanic population grew by 43%, with young children representing the largest portion of this growth (U.S. Census Bureau, 2010). The Census Bureau estimates that in the coming years the Latino school-age population will increase from 11 million to 28 million and this number will continue to grow as Latino children account for almost 26% of the nation’s population under five years of age. A large segment of these children are Spanish-speaking; approximately 79% of English language learners (ELLs) are from Spanish-speaking backgrounds (Payán & Nettles, 2007; U.S. Census, 2010).

The increasing presence of young Spanish-speaking children in the U.S., combined with the lack of available tools, presents the need to develop language-screening measures to identify children within this population who are at risk for language impairment (LI). Indeed, many federally funded preschool programs require that children be screened for the risk of language impairment and other disabilities (U.S. Department of Education, 2012; U.S. Department of Health & Human Services, 2001). In addition to the paucity of language screening measures available in Spanish, there is a concomitant shortage of Spanish-English bilingual speech-language pathologists (SLPs) to complete screening and assessment activities (American Speech-Language-Hearing Association, 2010a). The shortage of bilingual providers creates heavy workloads for bilingual SLPs, which makes accessing bilingual SLPs even more difficult (American Speech-Language-Hearing Association, 2010b; Guiberson & Atkins, 2012). To complicate matters, early interventionists, including SLPs, report that they lack confidence when screening linguistically diverse children and that the lack of easy-to-use, reliable, and valid screening tools is a major obstacle to serving these children and families (Guiberson & Atkins, 2012; U.S. Office of Special Education Programs, 2007).

* Corresponding author. Tel.: +1 307 766 3985.

E-mail address: [email protected] (M. Guiberson).

0891-4222/$ – see front matter © 2014 Elsevier Ltd. All rights reserved.

http://dx.doi.org/10.1016/j.ridd.2013.12.011


The use of parent surveys to screen the language skills of young Spanish-speakers may provide a solution to some of the challenges associated with accurate identification of children at risk for LI. A monolingual English-speaking SLP could use a Spanish parent survey as a first step in the screening process. This would help determine which children should be referred for a second level screening with a Spanish-speaking SLP and which children should be passed. Ideally this will decrease unnecessary referrals to bilingual SLPs and lead to accurate identification of children who have language difficulties. Finally, Spanish parent surveys could also provide a way for families to be actively involved in the screening process as recommended by federal guidelines (Guiberson, 2008; Guiberson & Banerjee, 2012; IDEA, 1997, 2004; U.S. Department of Health & Human Services, 2006).

1. Language-screening surveys for use with Spanish-speaking preschoolers

Efficient and effective screening measures are needed in Spanish to differentiate children who are at risk for LI from children who have typical language development (TD). There are several parent surveys that have been used with Spanish-speaking preschool age children (Guiberson, 2008; Paradis, Emmerzael, & Duncan, 2010; Squires, Twombly, Bricker, & Potter, 2009). These tools are reviewed in the following section in order to describe the classification accuracy and usefulness of each tool.

1.1. Alberta language development questionnaire

The Alberta language development questionnaire (ALDeQ) is a parent questionnaire developed to screen the language skills of preschool age children of linguistic minority backgrounds in Canada (Paradis et al., 2010). Many of the ALDeQ items were based on Spanish parent survey questions used in an earlier study of school age children (Restrepo, 1998), but unique items were also created. The ALDeQ was also developed to be non-first language specific, with the goal of using a standard set of questions with parents from a variety of linguistic backgrounds. The questionnaire consists of four sections: early milestones, current first language abilities, behavior patterns/activity preferences, and family history. Parents are asked to describe their child’s skills using rating scales; this format was selected over yes/no questions so that the tool could capture detailed developmental information about emergent skills.

The developers of the ALDeQ completed a norming study that included 168 children and parents from nine different language backgrounds, including Spanish (Paradis et al., 2010). Language impaired and typically developing group comparisons revealed significant group differences across subtests. However, discriminant analyses revealed inadequate classification accuracy (with poor sensitivity described but no coefficients reported). The authors of the ALDeQ caution that given the tool’s poor sensitivity, the ALDeQ should be used qualitatively in combination with other sources of information. These results indicated that when used alone, the ALDeQ is inadequate to screen the language skills of Spanish-speaking preschoolers.

1.2. Spanish ages and stages questionnaires

The ages and stages questionnaires (ASQ; Squires et al., 2009) are a set of age-specific questionnaires that can be used to screen preschool age children. The questionnaires are used to screen five developmental areas including communication. Parents rate a child’s development with respect to specific age-appropriate behaviors sampled from these domains by answering each question with a 0 (not yet), 5 (sometimes), or 10 (yes). Scores are then used to classify children as either pass (i.e., scores above the empirically derived cutoff score in all domains) or at risk (i.e., on or below the cutoff score in any domain). The English ASQ has demonstrated favorable psychometric properties including strong concurrent validity, test-retest reliability, and adequate sensitivity and specificity (Hamilton, 2006; Skellern, Rogers, & O’Callaghan, 2001; Squires et al., 2009).
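The scoring rule described above can be illustrated with a minimal sketch. The domain names, item counts, and cutoff values below are hypothetical placeholders chosen for illustration; they are not the published ASQ cutoffs.

```python
# Points assigned to each parent response, as described in the text.
RESPONSE_POINTS = {"not yet": 0, "sometimes": 5, "yes": 10}

def domain_score(answers):
    """Sum the points for one domain's answers."""
    return sum(RESPONSE_POINTS[a] for a in answers)

def classify(domain_scores, cutoffs):
    """Return 'at risk' if any domain score is on or below its cutoff, else 'pass'."""
    at_risk = any(domain_scores[d] <= cutoffs[d] for d in cutoffs)
    return "at risk" if at_risk else "pass"

# Hypothetical cutoffs and answers (illustrative only).
cutoffs = {"communication": 30, "gross motor": 35}
scores = {
    "communication": domain_score(["yes", "sometimes", "not yet", "sometimes", "yes", "not yet"]),
    "gross motor": domain_score(["yes"] * 6),
}
print(scores, classify(scores, cutoffs))
```

In this made-up case the communication score equals its cutoff, so the child is classified as at risk even though the other domain is well above its cutoff.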

The ASQ has been translated or adapted for use with Spanish-speaking families. Two studies have evaluated the classification accuracy of the Spanish ASQ communication subscale (Guiberson & Rodriguez, 2010; Guiberson, Rodriguez, & Dale, 2011). The first study included preschool age Spanish-speaking children (N = 48). Significant concurrent validity (r = .56, p < .01) was observed with the Spanish Preschool Language Scale, 4th edition (SPLS-4), but sensitivity was inadequate (.59). Similar findings were revealed in the second study with toddler age children: significant concurrent validity with the SPLS-4 (r = .71, p < .01) but inadequate sensitivity (.56). Results from these two studies indicate that the ASQ has inadequate classification accuracy when used with Spanish-speaking preschool age children, and therefore should not be used for screening the language skills of these children.

1.3. Pilot Inventario-III

The Pilot Inventario-III (Pilot INV-III; Guiberson, 2008) is a Spanish version of the MacArthur-Bates Communicative Development Inventory-III (Fenson et al., 2007). This tool is meant to measure global language ability in preschool age children. It includes multiple components, including a vocabulary checklist, a sentence usage section, and a language usage section. The vocabulary checklist is not meant to be a comprehensive inventory of all of the words that a child uses; rather it is a list of words that capture a range of vocabulary observed in preschool age children. The sentence usage section presents 12 sentence pairs that convey the same message but differ in terms of grammatical complexity. Parents are instructed to read the sentence pairs and rate which sentence sounds more like the way their child speaks. The language usage section includes 12 yes/no questions about the child’s semantic and syntactic development.

In a pilot study, the Pilot INV-III demonstrated significant concurrent validity with other language measures (r = .57, p < .05; Guiberson, 2008), and in a classification accuracy study that included forty-eight children with and without expressive language delays the Pilot INV-III demonstrated favorable psychometric characteristics (Guiberson & Rodriguez, 2010). All three subscales had satisfactory inter-item reliability (Vocabulary = .92, Sentences = .95, Usage = .94), which suggests that the Pilot INV-III subscales measure linguistic skills in a uniform or consistent way. Concurrent validity between the Pilot INV-III and a standardized language measure revealed a positive and significant relationship (r = .62, p < .01, r² = .38), as well as adequate classification accuracy (LR+ = 4.25, 95% CI [1.88, 9.58], and LR− = 0.22, 95% CI [0.09, 0.55]). These qualities indicate that the Pilot INV-III may provide some useful information suggestive of the presence or absence of disorder; however, based on the classification accuracy measures, further refinement of this tool is needed in order to make a definitive screening determination. Analyses that evaluate each survey item individually, such as item response theory or Rasch modeling, are needed to refine the Pilot INV-III and to identify questions that are most helpful in screening for risk of LI in Spanish-speaking populations.
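For readers less familiar with likelihood ratios, the following minimal sketch shows the standard way positive and negative likelihood ratios are derived from sensitivity and specificity. The input values are hypothetical and are not the sensitivity or specificity of the Pilot INV-III.

```python
def likelihood_ratios(sensitivity, specificity):
    """Standard definitions: LR+ = sens / (1 - spec); LR- = (1 - sens) / spec."""
    lr_pos = sensitivity / (1.0 - specificity)
    lr_neg = (1.0 - sensitivity) / specificity
    return lr_pos, lr_neg

# Hypothetical screening tool with sensitivity and specificity of .80.
lr_pos, lr_neg = likelihood_ratios(0.80, 0.80)
print(f"LR+ = {lr_pos:.2f}, LR- = {lr_neg:.2f}")
```

Larger LR+ values and smaller LR− values indicate that a positive or negative screening result shifts the probability of disorder more strongly.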

2. Applying the Rasch modeling framework

Rasch modeling is a statistical modeling approach used to analyze test or survey items that are meant to measure a latent construct (e.g., language ability). A comprehensive tutorial of Rasch modeling is beyond the scope of this article, but numerous resources are available for review (Baylor et al., 2011; Bond & Fox, 2007; DeVellis, 2003; Henard, 2000). Rasch modeling uses item difficulty estimates to explain the relationship between individual items, the latent trait, and the probability of a given individual’s performance on an item. Item difficulty estimates (b) and person ability estimates (θ) are both interpreted on a logit scale, and their relationship to each test item can be explained by a monotonically increasing function called the item characteristic curve (ICC; Bond & Fox, 2007; DeVellis, 2003).
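As an illustration of the item characteristic curve, the sketch below evaluates the standard dichotomous Rasch model, in which the probability of endorsing an item depends only on the difference between person ability and item difficulty. The two difficulty values are the extremes reported in this study; the person abilities are hypothetical, and the function name is chosen for illustration.

```python
import math

def rasch_probability(theta, b):
    """Probability of endorsing a dichotomous item, given ability theta and difficulty b (logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

# Easiest and hardest items reported in this study (difficulty in logits).
difficulties = {"vaso/glass": -7.43, "peculiar/peculiar": 4.12}

for theta in (-4.0, 0.0, 4.0):  # hypothetical person abilities
    for item, b in difficulties.items():
        p = rasch_probability(theta, b)
        print(f"theta={theta:+.1f}  {item:<18} P={p:.3f}")
```

When ability equals difficulty the probability is .50, and the curve rises monotonically as ability increases, which is the property the ICC depicts.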

There are several fundamental assumptions when applying Rasch modeling (Henard, 2000). First, item discrimination is assumed to be constant across test/survey items. Unidimensionality is a second assumption, which requires that only one ability or latent trait be measured by the various items that make up the test/survey. Local independence is the third assumption; this assumption requires an examinee’s responses to test items to be statistically independent of each other (this does not imply that items would not be intercorrelated though).

There are several benefits to Rasch modeling (Henard, 2000). Theoretically, the estimate of a participant’s ability is independent of the sample of test items (item free); and the estimate of item difficulty is independent of the sample used for item calibration (person free). Another benefit to Rasch modeling is that precision or fit statistics for items and persons are provided, allowing for model calibration. This approach also determines how items fit within the tool in terms of item difficulty and determines if the items on the tool are capturing the range of abilities desired. The process of pruning unnecessary items and developing new items refines a tool’s item bank, so that ultimately all items are useful.

3. Current study

In the current study we applied Rasch modeling to analyze items from the Pilot INV-III obtained from a sample of parents of Spanish-speaking preschool age children who represented a range of language abilities. The latent construct for the Pilot INV-III is language ability in Spanish-speaking preschoolers. The construct of language ability encompasses both typical and impaired language ability. Thus, knowledge from this study will be used in two ways: (a) to assist in identifying items that indicate when children are typically developing or at risk for LI, and (b) to inform the development of a refined Spanish parent survey item bank. The long-term goal of this research is to create and refine an item bank of parent survey items that can be used to screen the language skills of Spanish-speaking preschoolers. As the first step of analyses, we evaluated construct validity of the parent survey items. Next, we examined fit statistics. During this iterative model calibration process, persons and items that did not fit the model were pruned and excluded from subsequent analyses. With the remaining items, unidimensionality of the survey items was evaluated to ensure that the remaining survey items were measuring one latent construct. As a final step, estimated item difficulty, item characteristic curves (ICCs), and a person-item map were examined.

4. Method

4.1. Participants

One hundred and seven Spanish-speaking children between 36 and 71 months of age (M = 47.4, SD = 10.60) and their parents participated in this study. The sample included 63 boys and 44 girls. Forty-eight children and families also participated in an earlier study describing the classification accuracy of the Pilot INV-III (Guiberson & Rodriguez, 2010). The research team recruited families from two regional Head Start programs and state funded early childhood and healthcare programs located in two states. Only parents and children that met the inclusion criteria of speaking Spanish 80% of the time or more, based upon parent intake interview and report, were included in the study. Children also met the following inclusionary criteria: (a) normal hearing; (b) no known neurological impairment; and (c) lack of severe phonological impairment. Given that the primary purpose of this study was to evaluate items from a Spanish parent survey of language ability with the goal of identifying items that may assist in screening this population, the researchers recruited Spanish-speaking children with a range of language abilities, including children with language scores in the LI range. Thirty children had Spanish PLS-4 expressive language scores that were in the LI range (standard scores of ≤75; M = 71.77, SD = 9.34) while seventy-seven children had Spanish PLS-4 scores above this point (M = 101.10, SD = 8.10).

4.2. Instruments

4.2.1. Pilot Inventario-III

The Pilot Inventario-III (Pilot INV-III) (Guiberson, 2008; Guiberson & Rodriguez, 2010) is a Spanish version of the MacArthur-Bates Communicative Development Inventory-III (CDI-III) (Fenson et al., 2007). The Pilot INV-III was created with the approval of the developer of the CDI-III (P. Dale, personal communication, March 12, 2008). However, the Pilot INV-III is not part of the MacArthur-Bates CDI tools, nor is it commercially available. The Pilot INV-III is meant to measure global language skills during the preschool years, and includes a vocabulary checklist (n = 100), sentence usage questions (n = 12), and language usage questions (n = 12). In an earlier study, the Pilot INV-III had satisfactory inter-item reliability, significant concurrent validity with a standardized language measure, and adequate classification accuracy measures (Guiberson & Rodriguez, 2010).

4.2.2. Spanish preschool language scales-4

The Spanish version of the preschool language scales-fourth edition (SPLS-4) is an assessment developed from a Spanish-language model (Zimmerman, Steiner, & Pond, 2002). The SPLS-4 was not part of the Rasch analyses; however, the expressive subtest was used as an external source that established that children who were included in the study exhibited a range of language abilities. The range of SPLS-4 scores for the sample is reported in the Participants section. The SPLS-4 was standardized on 1,188 Spanish-speaking children living in the United States from monolingual Spanish or mostly Spanish bilingual households. A SPLS-4 normative study (n = 575) indicated strong test-retest reliability (.77–.86) and split-half internal consistency (.80–.90). In a concurrent validity study (n = 140), the expressive section of the SPLS-4 demonstrated acceptable classification accuracy (sensitivity = .92–.91, specificity = .77–.61).

4.3. Procedures

Prior to data collection, the researchers determined that the sample would be comprised of at least 100 families; this is the recommended sample size for a Rasch model (Linacre, 2011). General recruitment efforts included sending home flyers through Head Start and participating in health and literacy fairs and family night events hosted by these programs. Interested families who met the inclusionary criteria were presented with an informed consent form in Spanish that had been approved by a university institutional review board.

Consenting families who met the criteria for the study were given or mailed the Pilot INV-III and scheduled a study visit. Research assistants contacted families two days prior to scheduled study visits to remind them to bring the completed Pilot INV-III to their scheduled visit. All study visits were conducted within two weeks of initial contact, and occurred in either a parent education room or a child assessment room at the collaborating preschool centers. During the study visits, parents (either mothers only or mothers and fathers) accompanied their children while a Spanish bilingual SLP administered the SPLS-4. All testing was completed in Spanish because the children were Spanish speakers.

5. Data analysis

The WINSTEPS (Linacre, 2011) program was used for the Rasch analyses of Pilot INV-III items. Rasch models provide provisional estimates of item difficulty and person ability, and compare expected responses based on these estimates. The WINSTEPS program also provides fit statistics for each person and item in the form of weighted information (infit) and outlier-sensitive (outfit) mean square (MSQ) values (for a review, see Bond & Fox, 2007).

To meet the goal of this study, a series of analyses were conducted. As a first step, construct validity was examined. Next, fit statistics were inspected in order to ensure that persons and items fit the model. Third, a unidimensionality analysis was completed. Finally, item difficulty estimates and item characteristic curves were inspected. These steps assisted in identifying items that the model indicated tap into the construct of language ability, and ultimately in refining an item bank of survey items.

6. Results

6.1. Step 1. Examine construct validity

The goal of this study was to identify items from a parent survey that tap into the language abilities of Spanish-speaking preschoolers. One hundred and twenty-four items that collectively demonstrated adequate classification accuracy (Guiberson & Rodriguez, 2010) were administered to a sample of individuals from the target population, Spanish-speaking preschoolers living in the United States. As a first step, construct validity was examined. Within the context of tool development, construct validity describes whether items actually measure the underlying construct that they are intended to measure (Bryant, 2000). Point-measure correlations are Pearson product-moment correlations based on measures from the Rasch model, which correlate individual item response values and the corresponding person ability estimates. This correlation informs whether the responses to each item align with the ability estimates of the persons. Point-measure correlations range from −1 to 1, and in general should be noticeably positive, ≥.3 (Bond & Fox, 2007; Linacre, 2011). Point-measure correlations can be used to identify problematic items that do not appear to map onto the test’s latent construct, in this case language abilities in Spanish-speaking preschoolers. Point-measure correlation values obtained for the 124 survey items were all positive and varied from .31 to .73, providing a preliminary indication of adequate construct validity.
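The point-measure correlation for a single item can be sketched as an ordinary Pearson correlation between the 0/1 responses to that item and the person ability estimates. The data below are hypothetical; in the study these values were produced by WINSTEPS rather than computed by hand.

```python
import math

def pearson(x, y):
    """Pearson product-moment correlation between two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical person ability estimates (logits) and one item's 0/1 responses.
person_measures = [-2.0, -1.0, 0.0, 1.0, 2.0, 3.0]
item_responses = [0, 0, 0, 1, 1, 1]

r = pearson(item_responses, person_measures)
print(f"point-measure correlation = {r:.2f}")
```

An item endorsed mainly by higher-ability persons, as in this made-up example, yields a clearly positive correlation; an item endorsed mainly by lower-ability persons would yield a negative one and would be a candidate for review.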

6.2. Step 2. Examine fit statistics

Having described the construct validity of the Pilot INV-III, initial model calibration was completed (Bond & Fox, 2007; Linacre, 2011). As a first step, the fit statistics of person ability estimates were examined. Outfit is examined first as it is a measure of unexpected outlying observations, and at the person level outfit measures indicate whether a series of responses are inconsistent with the Rasch model. Unexpected patterns of responses or outlying observations are quantified by the infit measures. Outfit and infit MSQ values >2.0 are concerning, and indicate when persons should be pruned from the data set (Bond & Fox, 2007; Linacre, 2011). Of the 107 participants, eight had outfit values >2.0 (ranging from 2.02 to 6.35), while infit values for all other participants were in the acceptable range. These eight cases included five children with SPLS-4 scores in the LI range and three with scores in the typical range. Unexpected response analyses also confirmed that, according to the model, these eight participants had responses that were unusual and inconsistent based on item difficulty estimates and person ability estimates. These eight participants were pruned from the data set and excluded from all subsequent analyses.

The next step of calibration involved examining item fit statistics. This is an iterative process during which items that do not fit the Rasch model are pruned and a new model is run with the remaining items until a model with desirable fit statistics is achieved. For each iteration, all items with infit or outfit MSQ values of ≥2.0 were excluded, and the analysis was repeated until all remaining items obtained infit and outfit MSQ values of <2.0. A total of 65 items were pruned from the original set of 124 items; deleted items included all of the language usage items (n = 12) and sentence items (n = 12) as well as a sizeable number of vocabulary items (n = 41). Fifty-nine vocabulary items remained in the item bank. These items and their fit statistics are presented in Table 1.
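To make the fit criterion concrete, the sketch below computes infit and outfit mean squares for dichotomous items using their standard definitions and flags items that reach the 2.0 cutoff. It is a single pass with fixed, hypothetical ability and difficulty values; it does not re-estimate the model after pruning as the iterative WINSTEPS procedure does, and the function and item names are illustrative.

```python
import math

def rasch_probability(theta, b):
    """Expected probability of endorsing an item under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, thetas, b):
    """Return (infit, outfit) mean squares for one dichotomous item."""
    sq_residuals, variances, z_squared = [], [], []
    for x, theta in zip(responses, thetas):
        p = rasch_probability(theta, b)
        w = p * (1.0 - p)                      # model variance of the response
        sq_residuals.append((x - p) ** 2)
        variances.append(w)
        z_squared.append((x - p) ** 2 / w)     # squared standardized residual
    infit = sum(sq_residuals) / sum(variances)  # information-weighted mean square
    outfit = sum(z_squared) / len(z_squared)    # unweighted, outlier-sensitive mean square
    return infit, outfit

def flag_misfits(data, thetas, difficulties, cutoff=2.0):
    """Names of items whose infit or outfit mean square is at or above the cutoff."""
    flagged = []
    for name, responses in data.items():
        infit, outfit = item_fit(responses, thetas, difficulties[name])
        if infit >= cutoff or outfit >= cutoff:
            flagged.append(name)
    return flagged

# Hypothetical abilities, difficulties, and responses for two made-up items.
thetas = [-2.0, -1.0, 0.0, 1.0, 2.0]
difficulties = {"consistent": 0.0, "erratic": 0.0}
data = {"consistent": [0, 0, 0, 1, 1], "erratic": [1, 1, 0, 0, 0]}

print(flag_misfits(data, thetas, difficulties))
```

The "erratic" item is endorsed only by the lowest-ability persons, which contradicts the model's expectations and inflates both mean squares; the "consistent" item follows the expected ordering and is retained.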

6.3. Step 3. Unidimensionality analysis

In order to describe construct dimensionality and to establish whether or not the remaining items represented the same underlying construct, two analyses were completed. Point-measure correlations were completed with the 59 items that fit the model to evaluate how closely item scores were correlated with total scores; this is an indicator of unidimensionality. All of the items had point-measure correlations ranging from .30 to .75 (see Table 1), indicating that the remaining items demonstrated unidimensionality. Next, a Rasch principal components analysis of the residuals was calculated with the 59 items. The Rasch model accounted for 58% of the variance, and the principal contrast of the residuals accounted for 2% of the unexplained variance. These coefficients also indicated sufficient unidimensionality.

6.4. Step 4. Item difficulty estimates and item characteristic curves

With the remaining 59 items, item difficulty estimates were calculated and ICCs were plotted. This combination of inspecting model coefficients as well as graphical representation of the data is common in Rasch modeling (Linacre, 2011). Item difficulty estimates are presented in Table 1. ICCs for all 59 items are presented in Fig. 1. Roughly half of the items had item difficulty estimates >0 and half had item difficulty estimates <0. In order to demonstrate items of varying difficulty, the ICCs for four items of varying item difficulty estimates were plotted and are presented in Fig. 2. These ICCs illustrate how words of differing item difficulty estimates fit the model. For example, vaso/glass had an item difficulty estimate of −7.43 (an easier item) and is represented by the far left curve in Fig. 2, while peculiar/peculiar had an item difficulty estimate of 4.12 (a more difficult item) and is represented by the far right curve.

As a final step of viewing the item difficulty estimates of the remaining items, an item person map was generated. The item person map presents a visual display of the range of the latent trait that the instrument measures. The item person map displays the model results along a scale of item difficulty, with both items and persons in the sample. When an item bank is intended for use with a wide range of individuals, the items should be spread across a range of ability levels. When there are no nearby items to a given ability estimate, the item bank has less precision. Fig. 3 presents the person item map for the item bank. Based on visual inspection, very few items were seen at the higher ability levels and a sparse number of items at the lower ability level were observed.
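The idea behind an item person map can be sketched in plain text by binning persons and items on the same logit scale. The item difficulties below are a handful of values reported in this study; the person measures are hypothetical, and the layout is only an illustration, not the WINSTEPS output shown in Fig. 3.

```python
import math
from collections import Counter

def wright_map(person_measures, item_difficulties, lo=-8, hi=5):
    """Return text rows, one per logit bin, with persons (X) on the left and items (#) on the right."""
    persons = Counter(math.floor(m) for m in person_measures)
    items = Counter(math.floor(d) for d in item_difficulties)
    rows = []
    for logit in range(hi, lo - 1, -1):  # highest logit at the top
        rows.append(f"{logit:+3d} | {'X' * persons[logit]:<10} | {'#' * items[logit]}")
    return rows

# A few item difficulties reported in this study, and hypothetical person measures.
item_difficulties = [4.12, 3.18, 2.54, 0.06, -0.03, -1.72, -7.43]
person_measures = [-3.4, -2.1, -0.5, 0.2, 0.8, 1.5, 2.2, 4.6]

for row in wright_map(person_measures, item_difficulties):
    print(row)
```

Rows that contain persons but no items correspond to ability levels where the item bank offers little measurement precision, which is the kind of gap the visual inspection above is meant to reveal.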


Table 1

Rasch item statistics for the 59 Pilot INV-III items that fit the model.

| Survey item | Item difficulty | Item error | Infit MnSq | Infit ZSTD | Outfit MnSq | Outfit ZSTD | Point-measure correlation |
|---|---|---|---|---|---|---|---|
| peculiar/peculiar | 4.12 | 0.43 | 0.98 | 0 | 1.52 | 0.8 | 0.6 |
| podría/might | 3.18 | 0.37 | 0.78 | -1.1 | 0.38 | -0.9 | 0.69 |
| material/material | 2.54 | 0.34 | 0.99 | 0 | 0.75 | -0.1 | 0.67 |
| profundo/deep | 2.54 | 0.34 | 1.17 | 0.9 | 0.77 | -0.1 | 0.65 |
| promesa/promise | 2.31 | 0.34 | 0.91 | -0.4 | 0.55 | -0.5 | 0.7 |
| usted mismo/yourself | 2.31 | 0.34 | 1.01 | 0.1 | 0.91 | 0.1 | 0.67 |
| odiar/hate | 2.2 | 0.33 | 1.04 | 0.3 | 1.04 | 0.3 | 0.67 |
| acerca/about | 1.98 | 0.33 | 1.08 | 0.5 | 0.68 | -0.3 | 0.68 |
| cerco/fence | 1.77 | 0.32 | 0.83 | -1 | 0.67 | -0.4 | 0.71 |
| apresurar/hurry | 1.77 | 0.32 | 1.03 | 0.2 | 0.7 | -0.3 | 0.69 |
| semana/week | 1.77 | 0.32 | 0.97 | -0.1 | 0.84 | -0.1 | 0.69 |
| cada uno/each | 1.77 | 0.32 | 0.76 | -1.4 | 0.51 | -0.7 | 0.73 |
| entre/between | 1.67 | 0.32 | 1.04 | 0.3 | 1.22 | 0.5 | 0.67 |
| estampilla/stamp | 1.27 | 0.31 | 1.31 | 1.8 | 1.22 | 0.5 | 0.65 |
| acera/sidewalk | 1.27 | 0.31 | 0.75 | -1.6 | 0.45 | -0.9 | 0.74 |
| idea/idea | 1.27 | 0.31 | 1.36 | 2 | 1.01 | 0.2 | 0.65 |
| encima de/on top of | 1.27 | 0.31 | 0.92 | -0.4 | 0.91 | 0.1 | 0.7 |
| medir/measure | 1.17 | 0.31 | 0.85 | -0.9 | 0.61 | -0.5 | 0.73 |
| entonces/then | 1.17 | 0.31 | 0.86 | -0.8 | 0.51 | -0.8 | 0.73 |
| diferente/different | 0.98 | 0.31 | 0.97 | -0.1 | 0.72 | -0.3 | 0.71 |
| antes/before | 0.8 | 0.3 | 0.79 | -1.4 | 0.58 | -0.6 | 0.74 |
| enfermera/nurse | 0.7 | 0.3 | 0.99 | 0 | 0.77 | -0.2 | 0.71 |
| aburrido/bored | 0.7 | 0.3 | 0.67 | -2.4 | 0.37 | -1.2 | 0.76 |
| olvidar/forget | 0.61 | 0.3 | 1.3 | 1.8 | 1.35 | 0.7 | 0.65 |
| vainilla/vanilla | 0.52 | 0.3 | 1.2 | 1.3 | 1.51 | 0.9 | 0.66 |
| vegetales/vegetables | 0.52 | 0.3 | 1.16 | 1 | 1.51 | 0.9 | 0.67 |
| asegurar/fasten | 0.52 | 0.3 | 0.72 | -2 | 0.78 | -0.2 | 0.74 |
| accidente/accident | 0.43 | 0.3 | 1.2 | 1.3 | 0.88 | 0 | 0.68 |
| patina/skate | 0.43 | 0.3 | 0.75 | -1.8 | 0.42 | -1 | 0.75 |
| vacío/empty | 0.24 | 0.3 | 0.68 | -2.3 | 0.42 | -1 | 0.76 |
| codo/elbow | 0.15 | 0.3 | 1.38 | 2.2 | 1.92 | 1.4 | 0.62 |
| pensar/think | 0.15 | 0.3 | 0.72 | -2 | 0.45 | -1 | 0.75 |
| ayer/yesterday | 0.06 | 0.3 | 1.15 | 1 | 1.29 | 0.7 | 0.67 |
| esos/they | 0.06 | 0.3 | 1.36 | 2.1 | 1.29 | 0.6 | 0.64 |
| ninguno/none | 0.06 | 0.3 | 0.75 | -1.8 | 0.81 | -0.1 | 0.74 |
| frente/front | -0.03 | 0.3 | 1.02 | 0.2 | 0.81 | -0.2 | 0.7 |
| mitad/half | -0.03 | 0.3 | 0.83 | -1.1 | 0.5 | -0.9 | 0.74 |
| largo/long | -0.03 | 0.3 | 1.06 | 0.5 | 0.69 | -0.4 | 0.7 |
| muebles/furniture | -0.22 | 0.31 | 0.69 | -2.2 | 0.62 | -0.6 | 0.75 |
| perdido/lost | -0.22 | 0.31 | 0.84 | -1 | 0.59 | -0.6 | 0.73 |
| necesito/need to | -0.31 | 0.31 | 1.2 | 1.3 | 1.26 | 0.6 | 0.66 |
| manguera/hose | -0.4 | 0.31 | 0.92 | -0.4 | 1.19 | 0.5 | 0.71 |
| familia/family | -0.59 | 0.31 | 1.04 | 0.3 | 1.52 | 1 | 0.68 |
| lejos/away | -0.79 | 0.31 | 0.85 | -0.9 | 0.91 | 0 | 0.72 |
| círculo/circle | -1.29 | 0.32 | 0.95 | -0.2 | 0.93 | 0.1 | 0.69 |
| negro/black | -1.29 | 0.32 | 1.24 | 1.3 | 1.1 | 0.4 | 0.65 |
| enojado/angry | -1.5 | 0.33 | 1.01 | 0.1 | 0.69 | -0.3 | 0.68 |
| escalera/ladder | -1.61 | 0.33 | 0.94 | -0.2 | 0.53 | -0.6 | 0.69 |
| cacahuete/peanut | -1.72 | 0.33 | 1.67 | 3 | 1.42 | 0.8 | 0.57 |
| salsa/sauce | -1.72 | 0.33 | 1.14 | 0.8 | 1.03 | 0.3 | 0.65 |
| agarrar/catch | -1.83 | 0.34 | 1.03 | 0.2 | 0.8 | -0.1 | 0.66 |
| nadie/nobody | -2.07 | 0.34 | 1.11 | 0.6 | 0.93 | 0.1 | 0.64 |
| salir/leave | -2.19 | 0.35 | 0.8 | -1 | 0.49 | -0.6 | 0.69 |
| arriba/above | -2.19 | 0.35 | 1.11 | 0.6 | 0.77 | -0.1 | 0.64 |
| sofá/sofa | -3.92 | 0.42 | 0.91 | -0.3 | 0.41 | -0.6 | 0.57 |
| fútbol/football | -4.1 | 0.43 | 1.55 | 2 | 1.84 | 1.1 | 0.44 |
| uña/fingernail | -4.29 | 0.44 | 0.73 | -1.1 | 0.23 | -1 | 0.56 |
| sal/salt | -4.49 | 0.45 | 1.33 | 1.3 | 0.77 | 0 | 0.46 |
| vaso/glass | -7.43 | 1.04 | 1.02 | 0.3 | 0.16 | -1.4 | 0.31 |

Note: Item difficulty = item difficulty estimate; MnSq = mean squared residuals; ZSTD = standardized z values.


Fig. 1. Item characteristic curves for all 59 items.

Fig. 2. Item characteristic curves for four items.


7. Discussion

The goal of this study was to apply Rasch analyses to a Spanish language-screening parent survey in order to identify useful items and, eventually, to develop additional items that will assist in screening Spanish-speaking children at risk for LI. Our initial analyses of point-measure correlations for all 124 Pilot INV-III items yielded noticeably positive values, indicating that the tool had adequate construct validity. Person fit statistics were calculated and revealed that 8 participants had poor fit and outlying response profiles. Data from these 8 participants were excluded from subsequent analyses. Item fit statistics were then calculated, and misfitting items were removed, in repeated rounds until all remaining items had infit and outfit MnSq statistics <2.0. The remaining item bank included 59 items that fit the model based on item difficulty and person ability estimates. Sixty-five items were pruned from the original 124-item survey. The remaining items demonstrated adequate unidimensionality, and the Rasch model accounted for 58% of the variance.
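The infit and outfit mean squares used as the retention criterion can be illustrated from their conventional definitions: outfit is the unweighted mean of squared standardized residuals, and infit is the sum of squared residuals divided by the sum of the model variances. The following is a minimal sketch under those definitions. The response data and ability values are hypothetical, the function names are illustrative, and the exact computations performed by WINSTEPS are not reproduced here. Only the <2.0 cutoff comes from the text.

```python
import math

def rasch_probability(theta, b):
    """Dichotomous Rasch model probability of credit (theta and b in logits)."""
    return 1.0 / (1.0 + math.exp(-(theta - b)))

def item_fit(responses, abilities, difficulty):
    """Infit and outfit mean squares for one dichotomous item.

    responses: 0/1 answers, one per person; abilities: person estimates in logits.
    """
    squared_residuals, variances = [], []
    for x, theta in zip(responses, abilities):
        p = rasch_probability(theta, difficulty)
        squared_residuals.append((x - p) ** 2)
        variances.append(p * (1.0 - p))
    # Outfit: unweighted mean of squared standardized residuals.
    outfit = sum(r / w for r, w in zip(squared_residuals, variances)) / len(responses)
    # Infit: information-weighted mean square.
    infit = sum(squared_residuals) / sum(variances)
    return infit, outfit

# Hypothetical data for illustration only (not from the study sample).
abilities = [-2.0, -1.0, 0.0, 1.0, 2.0]
responses = [0, 0, 1, 1, 1]
infit, outfit = item_fit(responses, abilities, difficulty=0.0)
print(f"infit MnSq = {infit:.2f}, outfit MnSq = {outfit:.2f}")
keep_item = infit < 2.0 and outfit < 2.0  # retention cutoff described in the text
```

Because outfit is unweighted, it is more sensitive than infit to unexpected responses from persons whose ability is far from the item's difficulty.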


Fig. 3. Person-item map for the 59 remaining survey items.

Note. The items are listed on the right in hierarchical order, from most difficult (at the top of the map) to least difficult (at the bottom of the map). Child ability estimates are presented on the left, from the highest ability level at the top to the lowest ability level at the bottom. M = the mean, S = one standard deviation, T = two standard deviations, X = 1 child. This figure was generated using WINSTEPS 3.72 (Linacre, 2011).


7.1. Inspection of pruned items

To better understand the types of questions that did not fit the model, we inspected the pruned items. Based on fit statistics, a total of 65 items were pruned from the original set of 124 items. All of the language usage items and the sentence items were pruned. The question format of these items was different from the checklist format of the vocabulary items. There may have been multiple problems with the sentence items. First, the instructions are complex:

En cada número marque la oración que MÁS se parece a lo que su hijo(a) dice en este momento. Si su hijo(a) está diciendo oraciones más complicadas que las oraciones que se dan aquí, por favor marque la segunda oración (p. 2; Guiberson & Rodríguez, 2010)./For each pair of sentences below, mark the one that sounds MOST like the way your child talks at the moment. If your child is saying sentences even more complicated than the two provided, mark the second one (p. 2; Fenson et al., 2007).

These directions are complex for two reasons. First, they require parents to appraise each item and then predict which sentence hypothetically would sound more like their child, if he/she were to produce such a sentence. Second, parents may have been confused by the change in instructions for the sentence items (of the two exemplars, check only one item) as compared to the vocabulary checklist instructions (check every item that your child says). If parents were following the format and instructions of the vocabulary checklist, they may have answered the sentence items based on their recollection of their child producing each of the twelve sentence pairs. A third issue may be that culturally specific parenting interaction styles or developmental priorities influenced how parents responded to these items, which were meant to tap into grammaticality. While there is a great deal of variability within cultural groups, it is generally believed that Latino parents may follow a more interdependent parenting style (Greenfield et al., 2006; Kayser & Guiberson, 2008). The interdependent style may result in parents supplementing or completing communication for children when messages are incomplete or inaccurate in some way. This means that parents who follow an interdependent parenting style are likely to focus more on the content of a message than on its form. This may in part explain why the twelve items that focused on grammaticality did not fit the model.

The twelve items that comprise the language usage section were also pruned. The format of these items differed from that of the other 112 items on the survey in that they were asked in a yes/no format. The language usage items were meant to tap into children’s use of language, but seemed to also include other constructs. For example, half of the items appeared to tap into pre-academic concept development including quantity, spatial, and other mathematical concepts. Several of the items focused on vocabulary use, including items that inquired about the use of question words (qué/what, dónde/where, por qué/why, cómo/how), and the use of a conjunction (porque/because). There also was an item in this section that inquired about the child’s metalinguistic ability (¿Le ha preguntado su hijo(a) lo que significa cierta palabra?/Does your child ever ask what a particular word means?) and one that appeared to be more related to logic than language skill:

¿Puede su hijo(a) contestar preguntas tales como: ‘¿Qué haces cuando tienes hambre? y ¿Qué haces cuando tienes sueño?’ con respuestas apropiadas tales como: ‘conseguir comida’, ‘ir a dormir’ y/o ‘tomar una siesta’? (p. 2; Guiberson & Rodríguez, 2010)./Can your child answer questions such as “what do you do when you are hungry?” and “what do you do when you are tired?” with appropriate responses such as “get food,” “go to sleep,” and/or “take a nap?” (p. 2; Fenson et al., 2007).

In terms of item construct, it is not clear how closely these twelve items relate to one another, despite their strong inter-item reliability. The items were interrelated at some level, but did not fit the model, which accounts for estimates of item difficulty and person ability. The different item format and the broad and varied content of these items together may have influenced how parents responded and may, in part, explain why these items did not fit the model.

Forty-one of the vocabulary items were pruned. Of these items, 29 (70%) were nouns, two were pronouns (sus/their, ellos/they), and two were verbs (estornudar/sneeze, estaba/were). Sixteen of the pruned items were also on the Inventario-II (Jackson-Maldonado et al., 2003), a Spanish parent survey intended for infants and toddlers. Inventario-II lexical norms for 30-month-old children (N = 65) indicated that, on average, 55% of 30-month-old children produced these sixteen words, and some items were produced by as many as 85% of 30-month-old children. The goal of the Pilot INV-III was to target a range of linguistic abilities in preschool age children (3–5 years of age). However, given that children 30 months of age frequently produce these words, these items may be too easy or basic for a screening tool intended for preschool age children. Item bias also may have been an issue for several of the pruned vocabulary items (e.g., microscopio/microscope, venado/reindeer, computadora/computer, campamento/camping, salto mortal/somersault). These items may not have been appropriate given the cultural and experiential backgrounds of the children in the current sample.

7.2. Refined item bank

Upon visual inspection of the item-person map, it became evident that very few items fell at the higher ability level estimates and that there were not enough items at the lower ability levels. This lack of variability in items across ability levels is an item-targeting problem. More items are needed that target both of these ability levels, particularly the lower ability estimates, given that the tool is for screening purposes and will be used with children with a wide range of linguistic abilities. To further explore the quality of items that targeted these different ability levels, we visually inspected the 10 items that had the highest item difficulty estimates and the 10 items that had the lowest item difficulty estimates (see Table 1). The 10 items that were estimated to be most difficult represented diverse word classes (three nouns, one pronoun, three adjectives, three verbs, and one preposition). These words appeared to be more conceptually difficult, including words that denoted conditionality (podría/might), complex concepts (profundo/deep, acerca/about) and abstract concepts (promesa/promise, odiar/hate).

Over half of the 10 items that had the lowest item difficulty estimates were nouns, many of which were common items (sofá, sillón/sofa, couch; vaso/glass; pelota/ball). The two verbs included in the ten items with the lowest item difficulty were general all-purpose verbs (agarrar/catch, salir/leave), which are thought to be acquired and used by children with lower levels of linguistic ability, both English-speaking and Spanish-speaking (Rice & Bode, 1993; Sanz Torrent, 2002; Simon-Cereijido & Gutierrez-Clellen, 2007). The locative preposition arriba/above was one of the easiest words. Studies of English-speaking preschool children have shown that children’s acquisition of prepositions increases gradually with age, with many older preschool age children demonstrating consistent use of locative prepositions (Brown, 1973; Nicoladis, Cornell, & Gates, 2008; Rice, 1999). Indeed, a study of morphological development in Spanish-speaking children indicated that locative prepositions are acquired in early childhood (Kvaal, Shipstead-Cox, Nevitt, Hodson, & Launer, 1988). In summary, visual inspection of the 10 highest and lowest items informed what types of additional items may be useful to add to the item bank of a Spanish parent survey that screens the language abilities of preschoolers.


7.3. Limitations

There are limitations to this study that should be considered. First, this study included a relatively small sample size for an IRT approach. A larger sample size would have allowed for a two-parameter logistic model or a polytomous model (Baylor et al., 2011; Linacre, 2011). Unlike Rasch models, these IRT models can provide very useful item level statistics, including an item’s capacity to discriminate and the likelihood of false positives. Another limitation is that the Pilot INV-III is a translation of the English CDI-III. Translated tools can unknowingly introduce bias when used with a language or cultural group for which they were not designed (Banerjee & Guiberson, 2012; National Research Council, 2008). Four forms of potential bias may have been introduced. Cultural bias can occur when a tool requires an individual to engage in an activity that is unfamiliar, inappropriate, or uncommon in his/her home culture. Some questions, especially questions with obvious answers (e.g., what do you do when you are hungry?), may be uncommon in the cultures of some of the families and may have introduced cultural bias to the survey. Construct bias occurs when an item does not accurately capture the construct that is being evaluated. Construct bias may have been introduced by using the translated sentences. If the items had been developed from a Spanish first language model, different morphosyntactic items specific to Spanish (such as clitics or verb inflections) may have been used. Method bias can occur when an individual is not familiar with the materials, procedure, and conventions used in the evaluation. As mentioned earlier, the format used for the sentences and language usage sections was different from the checklist format of the other items. The format of these items may have been confusing or unfamiliar, and may have resulted in method bias. Finally, item bias occurs when characteristics other than those being measured by the tool change the probability that a person will be given credit for an item. Item bias may have been an issue for some of the survey items. As mentioned earlier, several of the words on the original Pilot INV-III may not have been appropriate given the cultural and experiential backgrounds of the children in the current sample.
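To illustrate what a two-parameter logistic (2PL) model adds to the Rasch model, the sketch below uses the standard 2PL item response function, in which a discrimination parameter a scales the distance between ability theta and difficulty b. Setting a = 1 recovers the Rasch form. The parameter values are hypothetical and the function name is illustrative. No 2PL model was fit in this study.

```python
import math

def two_pl_probability(theta: float, a: float, b: float) -> float:
    """2PL item response function: a = discrimination, b = difficulty (logits)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))

# Hypothetical discrimination values; a = 1.0 reduces to the Rasch model.
for a in (0.5, 1.0, 2.0):
    low = two_pl_probability(-1.0, a, 0.0)
    high = two_pl_probability(1.0, a, 0.0)
    print(f"a = {a}: P(theta=-1) = {low:.3f}, P(theta=+1) = {high:.3f}")
```

A larger a produces a steeper curve, so the item separates children just below and just above its difficulty more sharply. Estimating this extra parameter per item is one reason such models call for larger samples.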

8. Conclusions

Based on Rasch modeling, we began the iterative process of refining an item bank for a Spanish parent survey intended to measure language ability and to screen preschool age children for risk of LI. The Rasch analysis provided measures of construct validity and dimensionality, which assisted in ensuring that the included items map onto the construct of language ability in preschool age children. Through item fit statistics, items that did not fit the model were systematically removed until a body of 59 survey items remained. Item difficulty estimates and a person-item map were inspected and provided useful information that will be used to develop new questions with similar characteristics, as well as questions at ability levels that the remaining items did not appear to target adequately. Overall, the current findings provide a first step towards refining an item bank that will eventually result in a parent survey tool used to screen the language skills of Spanish-speaking preschoolers.

References

American Speech-Language-Hearing Association. (2010a). ASHA members and others who provide bilingual and Spanish-language services for year end 2009. Retrieved from www.asha.org/uploadedFiles/Demographic-Profile-Bilingual-Spanish-Service-Members.pdf.

American Speech-Language-Hearing Association. (2010b). 2010 schools survey: SLP workforce/work conditions. Retrieved from http://www.asha.org/uploadedFiles/Schools10Workforce.pdf.

Baylor, C., Hula, W., Donovan, N. J., Doyle, P. J., Kendall, D., & Yorkston, K. (2011). Introduction to item response theory and Rasch models for speech-language pathologists. American Journal of Speech-Language Pathology, 20, 243–259.

Bond, T. G., & Fox, C. M. (2007). Applying the Rasch model: Fundamental measurement in the human sciences (2nd ed.). London: Erlbaum.

Brown, R. W. (1973). A first language: The early stages. Cambridge, MA: Harvard University Press.

Bryant, F. (2000). Assessing validity. In L. Grimm & P. Yarnold (Eds.), Reading and understanding more multivariate statistics (pp. 99–145). Washington, DC: American Psychological Association.

DeVellis, R. F. (2003). Scale development: Theory and application. Thousand Oaks, CA: Sage.

Fenson, L., Marchman, V. A., Thal, D., Dale, P. S., Reznick, J. S., & Bates, E. (2007). MacArthur-Bates communicative development inventories: Users guide and technical manual. Baltimore, MD: Brookes.

Greenfield, P. M., Trumbull, E., Keller, H., Rothstein-Fisch, C., Suzuki, L. K., & Quiroz, B. (2006). Cultural conceptions of learning and development. In P. A. Alexander & P. H. Winne (Eds.), Handbook of educational psychology (2nd ed., pp. 675–692). Mahwah, NJ: Erlbaum.

Guiberson, M. (2008). Concurrent validity of a parent survey measuring communication skills of Spanish speaking preschoolers with and without delayed language. Perspectives on Communication Disorders in Culturally and Linguistically Diverse Populations, 15, 64–72.

Guiberson, M., & Atkins, J. (2012). Speech-language pathologists’ preparation, practices, and perspectives on serving culturally and linguistically diverse children. Communication Disorders Quarterly, 33, 169–180.

Guiberson, M., & Banerjee, R. (2012). Using questionnaires to screen young dual language learners for language disorders. 14th Young exceptional children, monograph: Supporting young children who are dual language learners with or at-risk for disabilities, Missoula, MT: Council for Exceptional Children Division for Early Childhood.

Guiberson, M., & Rodriguez, B. (2010). Measurement properties and classification accuracy of two Spanish parent surveys of language development for preschool age children. American Journal of Speech-Language Pathology, 19, 225–237.

Guiberson, M., Rodriguez, B., & Dale, P. (2011). Classification accuracy of brief parent report measures of language development in Spanish-speaking toddlers. Language, Speech, and Hearing Services in Schools, 42, 536–549.

Hamilton, S. (2006). Screening for developmental delay: Reliable, easy-to-use tools. The Journal of Family Practice, 55, 415–422.

Henard, D. H. (2000). Item response theory. In L. Grimm & P. Yarnold (Eds.), Reading and understanding more multivariate statistics (pp. 67–97). Washington, DC: American Psychological Association.

Individuals with Disabilities Education Act of 1997, Pub. L. No. 101–336 (1997).

Individuals with Disabilities Education Act of 2004, Pub. L. No. 108–446, § 118 Stat. 2647 (2004).

Jackson-Maldonado, D., Thal, D. J., Fenson, L., Marchman, V. A., Newton, T., & Conboy, B. (2003). MacArthur inventarios del desarrollo de habilidades comunicativas: User’s guide and technical manual. Baltimore, MD: Brookes.


Kayser, H., & Guiberson, M. (2008). Hispanic family and child socialization. In H. Kayser (Ed.), Educating Latino preschool children (pp. 47–60). San Diego, CA: Plural.

Kvaal, J. T., Shipstead-Cox, N., Nevitt, S. G., Hodson, B. W., & Launer, P. B. (1988). The acquisition of 10 Spanish morphemes by Spanish speaking children. Language, Speech, and Hearing Services in Schools, 19, 384–394.

Linacre, J. M. (2011). A user’s guide to Winsteps Ministep [Computer software manual]. Retrieved October 11, 2011, from http://winsteps.com/winman/index.htm?table3_2.htm.

Nicoladis, E., Cornell, E. H., & Gates, M. (2008). Developing spatial localization abilities and children’s interpretation of where. Journal of Child Language, 35, 269–289.

Paradis, J., Emmerzael, K., & Duncan, T. S. (2010). Assessment of English language learners: Using parent report on first language development. Journal of Communication Disorders, 43, 474–497.

Payán, R. M., & Nettles, M. T. (2007). Current state of English-language learners in the U.S. K-12 student population. Education Testing Service, News from the ETS Policy Information Center, 16(2). Retrieved from www.ets.org/Media/Research/pdf/PIC-PNV16N2.pdf.

Restrepo, A. (1998). Identifiers of predominately Spanish-speaking children with language impairment. Journal of Speech, Language and Hearing Research, 41, 1398–1411.

Rice, S. (1999). Patterns of acquisition in the emerging mental lexicon: The case of to and for in English. Brain and Language, 68, 268–276.

Rice, M. L., & Bode, J. (1993). GAPs in the verb lexicons of children with specific language impairment. First Language, 13, 113–131.

Sanz Torrent, M. (2002). Los verbos en niños con trastorno de lenguaje [Verb use in children with language impairment]. Revista de Logopedia, Foniatría y Audiología, 22, 100–110.

Simon-Cereijido, G., & Gutierrez-Clellen, V. F. (2007). Spontaneous language markers of Spanish language impairment. Applied Psycholinguistics, 28, 317–339.

Skellern, C., Rogers, Y., & O’Callaghan, M. J. (2001). A parent-completed developmental questionnaire: Follow up of ex-premature infants. Child Health, 37, 125–129.

Squires, J., Twombly, E., Bricker, D., & Potter, L. (2009). Ages and stages questionnaire user’s guide (3rd ed.). Baltimore, MD: Brookes.

U.S. Census Bureau. (2010). Hispanic heritage month 2010: September 15–October 15. Washington, DC: US Census Bureau.

U.S. Department of Education. (2012). Early reading first program description and goals. Retrieved from http://www2.ed.gov/programs/earlyreading/index.html.

U.S. Department of Health and Human Services. (2001). Screening and assessment in head start. Head Start Bulletin, 70.

U.S. Department of Health and Human Services, Agency for Healthcare Research and Quality, U.S. Preventive Services Task Force. (2006). Screening for speech and language delay in preschool children. www.ahrq.gov/clinic/uspstf/uspschdv.htm.

U.S. Office of Special Education Programs. (2007). Study of personnel needs in special education. http://ferdig.coe.ufl.edu/spense/.