38
DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale Educational Assessments: I. NELS:88 Mathematics Achievement. INSTITUTION Center for Research on the Context of Secondary School Teaching. SPONS AGENCY National Science Foundation, Washington, D.C.; Office of Educational Research and Improvement (ED), Washington, DC. PUB DATE 6 Jun S4 CONTRACT G0087CO235; RED-9253068 NOTE 39p.; Version of a paper presented at the Annual Meeting of the American Educational Research Association (New Orleans, LA, April 4-8, 1994). PUB TYPE Reports Evaluative/Feasibility (142) Speeches /Conference Papers (150) EDRS PRICE MF01/PCO2 Plus Postage. DESCRIPTORS Achievement Tests; *Educational Assessment; Ethnic Groups; Factor Analysis; Grade 8; Grade 10; Longitudinal Studies; *Mathematics Achievement; National Surveys; Racial Differences; Regression' (Statistics); *Scores; Secondary Education; Sex Differences; Socioeconomic Status; Student Attitudes; Testing Programs; Test Items; Test Use; *Test Validity; Thinking Skills IDENTIFIERS *Large Scale Assessment; *National Education Longitudinal Study 1988 ABSTRACT This study demonstrates that the validity and usefulness of mathematics achievement tests can be improved by defining psychologically meaningful subscores that yield differential relations with student. teacher, and school variables. The National Education Longitudinal Study of 1988 (NELS:88) 8th- and 10th-grade math tests were subjected to full information item factor analysis. Math knowledge and math reasoning factors were distinguished at both grade levels. Regression analyses showed that student attitudes, instructional variables, course, and program experiences related more to knowledge, whereas gender, socioeconomic status, and sone ethnic differences related more to reasoning. Teacher emphasis on higher-order thinking, student use of home computers, and early experience with advanced mathematics courses related to both dimensions. It is recommended that national educational surveys .use multidimensional achievement scores, not total scores alone. One figure and eight tables illustrate the analysis. (Contains 35 references.) (Author/SLD) *********************************************************************** Reproductions supplied by EDRS are the best that can be made from the original document. ***********************************************************************

ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

  • Upload
    others

  • View
    2

  • Download
    0

Embed Size (px)

Citation preview

Page 1: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

DOCUMENT RESUME

ED 376 198 TM 022 305

AUTHOR Kupermintz, Haggai; And OthersTITLE Enhancing the Validity and Usefulness of Large-Scale

Educational Assessments: I. NELS:88 MathematicsAchievement.

INSTITUTION Center for Research on the Context of SecondarySchool Teaching.

SPONS AGENCY National Science Foundation, Washington, D.C.; Officeof Educational Research and Improvement (ED),Washington, DC.

PUB DATE 6 Jun S4CONTRACT G0087CO235; RED-9253068NOTE 39p.; Version of a paper presented at the Annual

Meeting of the American Educational ResearchAssociation (New Orleans, LA, April 4-8, 1994).

PUB TYPE Reports Evaluative/Feasibility (142)Speeches /Conference Papers (150)

EDRS PRICE MF01/PCO2 Plus Postage.DESCRIPTORS Achievement Tests; *Educational Assessment; Ethnic

Groups; Factor Analysis; Grade 8; Grade 10;Longitudinal Studies; *Mathematics Achievement;National Surveys; Racial Differences; Regression'(Statistics); *Scores; Secondary Education; SexDifferences; Socioeconomic Status; Student Attitudes;Testing Programs; Test Items; Test Use; *TestValidity; Thinking Skills

IDENTIFIERS *Large Scale Assessment; *National EducationLongitudinal Study 1988

ABSTRACTThis study demonstrates that the validity and

usefulness of mathematics achievement tests can be improved bydefining psychologically meaningful subscores that yield differentialrelations with student. teacher, and school variables. The NationalEducation Longitudinal Study of 1988 (NELS:88) 8th- and 10th-grademath tests were subjected to full information item factor analysis.Math knowledge and math reasoning factors were distinguished at bothgrade levels. Regression analyses showed that student attitudes,instructional variables, course, and program experiences related moreto knowledge, whereas gender, socioeconomic status, and sone ethnicdifferences related more to reasoning. Teacher emphasis onhigher-order thinking, student use of home computers, and earlyexperience with advanced mathematics courses related to bothdimensions. It is recommended that national educational surveys .usemultidimensional achievement scores, not total scores alone. Onefigure and eight tables illustrate the analysis. (Contains 35references.) (Author/SLD)

***********************************************************************

Reproductions supplied by EDRS are the best that can be madefrom the original document.

***********************************************************************

Page 2: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

CRCCenter for Research on the Context of Secondary Teaching

School of Education, CERAS Building, Stanford University, Stanford, CA 94305-3084

U.S. DEPARTMENT OS nix...nomOnce of Educational Reearch and Improvement

ECU RONAL RESOURCES INFORMATIONCENTER (ERIC)

This document has boon repfoduced asreceived from the person or ceconastronoriginating It.

O Minor changes have been mad* to improvereproducZion Quaid),

Pants of view or opinions statist:1in this docu-ment do riot necessarily represent ofhc.alOERI ooltion Or who,

Enhancing the Validity and Usefulnessof Large-Scale Educational Assessments:

I. NELS:88 Mathematics Achievement

Haggai Kupermintz, Michelle M. Ennis,Laura S. Hamilton, Joan E. Talbert,

Richard E. Snow

Stanford UniversityJune 6, 1994

r3)

2O

BEST COPY AVAILABLE

Page 3: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Enhancing the Validity and Usefulnessof Large-Scale Educational Assessments:

I. NELS:88 Mathematics Achievement

Haggai Kupermintz, Michele M. Ennis,Laura S. Hamilton, Joan E. Talbert,

Richard E. Snow

Stanford UniversityJune 6, 1994

This paper is a revision of a paper presented at theAmerican Educational Research Association meeting, New Orleans,April 4-8, 1994. This research was begun with support from theU.S. Department of Education (OERI Grant No. G0087CO235). Thepresent analysis and report was supported by the National ScienceFoundation (Grant No. RED-9253068). The views and conclusionspresented here are those of the authors and should not beinterpreted as necessarily representing the official policies,either expressed or implied, of the U.S. Department of Education,the National Science Foundation, or the U.S. Government.

The work reported here was a team effort; the authors shareequally the responsibility for the research and its conclusions.Requests for reprints should be sent to Professor Richard E.Snow, School of Education, Stanford University, Stanford, CA94305.

3

Page 4: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Enhancing the Validity and Usefulness of

Large-Scale Educational Assessments:

I. NELS:88 Mathematics Achievement

Haggai Kupermintz

Michele M. Ennis

Laura S. Hamilton

Joan E. Talbert

Richard E. Snow

School of Education

Stanford University

Stanford, CA 94305

June 6, 1994

Submitted for Publication

Do not quote without permission

Page 5: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Validity of Large-scale Assessments I

Abstract

This study demonstrates that the validity and usefulness of mathematicsachievement tests can be improved by defining psychologically meaningfulsubscores that yield differential relations with student, teacher, and schoolvariables. The NELS:88 8th and 10th grade math tests were subjected to fullinformation item factor analysis. Math knowledge and math reasoning factorswere distinguished at both grade levels. Regression analyses showed thatstudent attitudes, instructional variables, course, and program experiencesrelated more to knowledge, whereas gender, SES, and some ethnic differencesrelated more to reasoning. Teacher emphasis on higher-order thinking,student use of home computers, and early experience with 'advanced mathcourses related to both dimensions. It is recommended that nationaleducational surveys use multidimensional achievement scores, not total scoresalone.

5

Page 6: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Enhancing the Validity and Usefulness ofLarge-Scale Educational Assessments:I. NELS:88 Mathematics Achievement

This study is one of a series examining the construct validity ofmathematics and science achievement tests used in national survey researchon school and teaching effects on educational outcomes. Our purpose was todetermine whether or not psychologically meaningful subscores could bedistinguished within these tests that might show differential patterns ofrelationship with educational variables. If so, then the usefulness of suchsurveys for informing educational policy and practice might be significantlyenhanced.

Background

There is mounting federal and state commitment to nationaleducational achievement tests as a way to monitor and promote the success ofU.S. schools. Also, national educational surveys including achievement testshave been used increasingly in recent decades to estimate the effects ofmyriad societal and school variables on educational outcomes for the purposeof informing educational policy. Principal examples are the research on socialinequalities in education (Coleman et. al., 1966; Jencks et. al., 1972) and onpublic versus private school effectiveness (Coleman, Hoffer, & Kilgore, 1982;Coleman & Hoffer, 1987; Chubb & Moe, 1990). As state-by-statecomparisons using the National Assessment of Educational Progress (NAEP)are instituted (Glaser & Linn, 1992, 1993) and as one or another proposednational assessment system is developed, we can expect substantial andregularized influence of assessment data on state as well as federaleducational policy. Also, as policy makers focus on different parts of theeducation problem, evaluations will need to address readiness to learn,opportunity to learn, gender and ethnic differences, subject matter differences,and a host of other special issues. Furthermore, the new goals for educationnot only specify higher standards of achievement, they emphasize deepunderstanding and higher-order reasoning as central educational outcomes.Assessments used to study these issues involve new psychologicalinterpretations and thus depend on new construct validity arguments.

The survey methods and measuring instruments used in this work havebeen criticized, as have some of the interpretations derived from them(Alexander & Pallas, 1983; Cain & Goldberger, 1983; Haertel, 1988; Haertel,James, & Les, in, 1987; Witte, 1990). Standardized achievement tests, ofcourse, have been criticized across a broad front beyond their use in survey orpolicy research (e.g., Gifford, 1989a, 1989b), and there are many suggestions

6

Page 7: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

for improvement. Many of these proposals derive in one way or another fromconsidering the cognitive psychology of educational achievemeat alongside thepsychometrics of achievement assessment (see, e.g., Frederiksen, Glaser,Lesgold, & Shafto, 1990; Frederiksen, Mislevy, & Bejar, 1993; Snow &Lohman, 1989). There has also been a move to bring cognitive analysis tobear in the improvement of survey questionnaires and methods (Jabine, Straf,Tanur, & Torangeau, 1984). In both initiatives, the goal is to build a bridgebetween cognitive science and measurement science for the benefit ofeducational evaluation policy and practice.

The present research attempts to contribute to that bridge. Itaddresses the problem of how to improve the interpretation and use of testsand surveys that assess high school students' achievement in the diversesubject areas and classroom contexts of U.S. secondary education. Itconcentrates on some existing tests and questionnaires from the NationalEducational Longitudinal Study of 1988 (NELS:88), although the presentapproach could be used in building new kinds of instruments, or inreanalyzing old measures and data as well.

A schematic view of the NELS:88 survey. NELS:88 is the latest ofthree national longitudinal surveys conducted by the U.S. Department ofEducation and, compared with its predecessors, is especially designed tomeasure instructional practices and cognitive outcomes in four core subjectareas. It began in Spring 1988 with a national survey and testing of 8th gradestudents. For details on design and initial analysis, see Horn, Hafner, andOwings (1992). The first follow-up of these students was conducted at 10thgrade in Spring 1990, with a second follow-up at 12th grade in Spring 1992.Extensive student, parent, teacher, and school questionnaires wereadministered.

Figure 1 gives a schematic framework showing the main categories ofvariables available in the NELS:88 data structure and identifies with arrowsthe relationships studied and reported in the present paper. We areconcerned here only with the analyses of mathematics tests into subscores at8th and 10th grades and their prediction from student and teaching variablesat these grades. A following paper provides the parallel analyses of thescience tests. Data and analyses for 12th grade math and science will beadded as our research continues. The project combines analyses of nationalNELS:88 data with our own small-scale studies of the same tests andquestionnaires. Technical reports showing our exploratory large-scale andsmall scale work on the 8th grade math and science tests are available (seeEnris, Kerkhoven, & Snow, 1993; Snow & Ennis, in press). Another report,on technical issues and comparative methodology, is forthcoming (Kupermintz

2

Page 8: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

& Snow, in preparation). Further reports on our analyses in other regions ofthe data structure of Figure 1 are also planned.

Figure 1 here

The NELS:88 tests. Rock, Pollack, Owings, and Hafner (1990)provided a detailed psychometric report for the NELS:88 base year testbattery. In brief summary, they produced four multiple-choice tests, coveringreading, history, math, and science, to fit into 1-1/2 hours of testing time andyet be sufficiently reliable to justify IRT scoring. The tests would thus allowadaptive testing at 10th and 12th grades, vertical scaling to study individualstudent gain across the three testings, and cross-sectional trend comparisonswith the 10th-to-12th grade gains obtained in 1980-82 in the High School andBeyond study. The tests were also shown to be relatively unspeeded and freeof gender and ethnic bias. In addition, the test developers paid attention tothe need for educational and psychological diagnosis, as far as was possiblewithin practical limits. They formed content testlets to allow subscores forspecific content areas within subject-matter domains and, for reading andmath, they designed proficiency level scores to provide somewhat richercognitive interpretation than is usually available from standardized tests.

However, the test design also required that three forms of the 10thgrade math test be administered, for students who scored in the low 25%(Form L), middle 50% (Form M), or high 25% (Form H) of the 8th gradedistribution. Only 20 items were common to the 8th grade and all three 10thgrade forms. Table 1 provides the math item identification numbers used inthe 8th grade form and in each 10th grade form, keyed to the master itemnumbers we use to report our analyses. A verbal description of each item isalso provided, reproduced here from Rock, etal. (1990); we are prohibited bygovernment rules from providing more detailed item descriptions. The 10thgrade reading comprehension test was also designed and administered in twoforms for two levels of 8th grade performance. However, for our math itemanalysis we used only the 8th grade total reading IRT scale score as areference variable.

Table 1 here

3

8

Page 9: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

It is not our intent here to criticize the NELS:88 psychometric work.Indeed, we think it an excellent example of how to use modern measurementtheory and practice to make high quality, functional assessment instrumentsthat are useful within the purposes and constraints of study designs such asthose represented by NELS:88.

Nonetheless, it does seem worthwhile also to move outside theboundaries of conventional psychometric theory and practice to see whethermore elaborated, richer cognitive interpretations might be gotten from thetests, and also the questionnaires. NELS:88 cost millions of dollars toconduct; millions more will be invested in data analysis. We thinkpsychological interpretations of student achievement scores in these analysescan and should go beyond the typically vague, molar constructs of "amount ofscience knowledge possessed" or "level of mathematical ability reached."Traditionally, such statements offer the only interpretation one can attach tototal scores from conventional achievement tests. In current cognitiveinstructional psychology, by contrast, "science knowledge" and "mathematicalability" are highly differentiated theoretical constructs. To whatever extentpossible, these differentiations ought to be represented explicitly ineducational assessments. The NELS:88 content and proficiency levelsubscores developed by Rock, et.al. (1990) are useful steps toward moredetailed construct interpretations; the present research tests whether or not itis possible to go still farther in this direction.

Overview of Project Methodology

Emphasis on construct validity argument. Most achievement tests arestill evaluated for validity only on content sampling considerations -- what thetest measures is simply represented by the categories and tasks identified inthe test specification tables, along with evidence that the test is withoutsignificant content bias. But validity theory has progressed far beyond thesimple operationalism of a generation ago (see, e.g., Cronbach, 1988, 1989;Messick, 1989). Test users are entitled to expect evidence justifyingrecommended interpretations and ruling out rival alternatives. And theseinterpretations always involve hypothesized cognitive processes and structures,not just content distinctions. Indeed, most achievement tests are built usingtest specification tables that explicitly include process as well as contentdistinctions, though these distinctions are almost never validated empirically.

Some might argue that the prime issue, at least for educational testssuch as NAEP, is indeed content sampling not psychological constructs.NAEP tests are simply designed to show the proportions of population groups

4

Page 10: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

who do or do not respond correctly to given problems, and the results areoften reported at the item level, with the item itself in view. Even herehowever, as soon as one moves to interpretations about proficiency levels, orrelations with other person or school characteristics, or explanations forchanges in trend lines across years, psychological constructs enter thediscussion. Furthermore, when tests such as those of NELS:88 becomecriteria for psychological and sociological hypothesis testing and modelling inthe service of policy decisions, interpretations rest primarily on constructvalidity arguments.

Our approach thus puts the validity standard foremost. We use twokinds of studies in tandem: large-scale statistical analyses based on the itemlevel test and questionnaire data in the national sample; and small-scaleinterview studies of the same .tests and questionnaires using local studentsamples.

Large-scale studies. In support of the validity concern, a secondemphasis reflects Tukey's (1969) view of data analysis as detective work. Inconventional psychometrics, one usually chooses a strong measurement model,such as IRT, checks that the data can be made to fit it, and then proceedswith application. Similarly, a particular statistical model is often fit withoutregard for alternatives. But interest should also attach to features of the datathat the chosen model leaves out. It is not usually appreciated that all modelsthrow some kinds of information away; there are always tradeoffs. Therefore,our approach in the large-scale studies uses alternative statistical analyses tosee if different methods lead to similar or different interpretations. We studyvarious item statistics, different kinds of item intercorrelations andscatterplots, different component and factor analyses of items, differentrotations of axes, and different hierarchical modelling techniques. Nonmetricmultidimensional scaling adds a useful alternative to the metric methods. Ourmain aim is to test whether meaningfully distinguishable and interpretablesubscores are possible within each test. If so, then the achievement constructrepresented by the test has been substantially elaborated.

Small-scale studies. The small-scale studies also emphasize detectivework with multiple methods to reach understandable and usable subscores.Here we obtain small samples of local students, administer the tests andquestionnaires to them under standard conditions similar to those used in themaional study, and then interview them individually about their responses toeach item and their knowledge and attitudes in each subject-matter domain.Retrospective reports about thought processes during the test are obtained.Detailed coding schemes are used to score these interviews. Also includedfor the science domain is a computer-based depth interview technique. In the

5

u

Page 11: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

plan for future studies are additional think-aloud, teach-back, and otherperformance tasks designed to assess student ability, domain knowledge, andthinking style. We can administer reference tests to locate students onnational norms, and choose student participants to represent differentcategories of expertise and experience in the subject matter domains.Computing subscores for each student using formulae derived from the large-scale analyses allows individual profiles on the large-scale referencedimensions to be compared. The evidence on items and students from thesmall-scale studies is also used to corroborate or elaborate the large-scaleanalyses. There is thus a two-way street; the large-scale analyses can helpdirect the small-scale work and vice versa.

Given space limitations, the present report focuses on what weconsider the best large-scale results. Details of supporting analyses of bothlarge-scale and small-scale studies are merely summarized. Also, of course,validation is a continuing process. The derived subscores for each test andquestionnaire are provisional; their meaning will be elaborated as multipleanalyses continue across the 8th, 10th, and 12th grade data. The aim of allthese methods is to reach richer substantive descriptions of what each itemmight represent psychologically for different individuals.

Samples. The NELS:88 base year sample consisted of 24,600 8thgraders from 1052 schools. For some large-scale analyses, we used randomstudent subsamples to allow for cross validation. Analyses of test item datawere conducted initially using one-sixteenth random samples. Givencomparable results in different samples, these analyses were then merged toform approximate quarter-samples and then half-samples of the 8th and 10thgrade data. This allowed us to bring teacher questionnaire variables into theanalyses in various combinations. In the NELS:88 sampling design, onlycertain pairs of teachers were surveyed for each student; no student had bothmath and science teachers reporting within a grade, and some students whohad a math or a science teacher reporting in 8th grade may not have had asame-subject teacher reporting in 10th grade (see Horn, Hafner, & Owings,1992). There was also substantial student attrition between grades. Thus,sample size varies across analyses. For example, one original subsample wechose for preliminary analyses consisted of 8th graders who had both mathand English teachers reporting; it contained 6022 cases. This number wasreduced to 5823 8th graders for test item dimensional analyses due to missingscores, reduced further to 4059 cases for analyses in which both 8th and 10thgrade math scores were required, and reduced still further to 3044 cases forwhom a math teacher reported at 10th grade. Since about half of the originalsubsample was lost due to these restrictions, we constructed subsamples thatmaximized our test and teacher data in the subject domains. For math, our

6

Page 12: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

student sample included all students for whom cognitive tests and mathteacher responses were available at 8th and 10th grade; it consisted of 5460cases. Attrition between grades is not random (see Ingels, et al., 1992), so wecan expect substantive differences in results for analyses on differentsubsamples.

The samples for small-scale studies noted in this report included 50 8thand 10th graders. They were recruited as paid volunteers from local schools.

Scoring. For the present analyses, the NELS:88 achievement testswere scored at the item level simply as right versus wrong. The mathproficiency level scores were also used in some analyses (Rock, et al., 1990).The IRT total score for Reading Comprehension was used to representgeneral verbal comprehension and reading ability. The IRT total scores formath and science were used for comparison purposes. Some of the NELS:88teacher and student questionnaire items were rescored for our purposes usingour own analyses of national data.

Analyses and Results

Our report of analyses and results is organized as follows. We firstreport the dimensional analyses that lead us to define psychologically differentdimensions and subscores of mathematics achievement. Next we relate thesesubscores to the categories of the test specification table originally used toselect items. We then form categories of predictor variables representingstudent, course, program, and instructional variables for use in regressionmodels. Finally, we present regression results for main effects of thesevariables on the 10th grade math achievement subscores, as well as on mathtotal scores.

Dimensional analysis. In previous exploratory work with the 8th gradedata alone, five component subscales were identified for the mathematics test.We used conventional factor and principal component analysis on bothPearson and tetrachoric item correlation matrices, as well as nonmetricmultidimensional scaling for this purpose. A small-scale study using local. 8thgraders also provided interviews concerning approaches and reactions toparticular items. The five-component solution with varimax rotation seemedmost satisfactory based on eigenvalues, the number of salient loadings oneach dimension, and their interpretability using all available information. Aunidimensional model could also be rejected using Stout's (1987) test. Theupper panel of Table 2 shows these 8th grade mathematics subscales. Detailsof the analysis are given by Ennis, Kerkhoven, and Snow (1993).

7

.1 4

Page 13: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Table 2 here

Although justifiable on both statistical and substantive grounds in the8th grade data, these distinctions were regarded as provisional. The last twocomponents were small; the last, particularly, was of doubtful reliability. Andit is well-known that item-level correlational analyses of this sort can bedistorted by distributional anomalies. When the test data for the 10th gradeyear became available, it was possible to analyze both grade levels in paralleland to include a cross validation by comparing factor loadings obtained inrandom subsamples. We also investigated full-information factor analysis(Bock, Gibbons, & Muraki, 1988) as a new and improved approach to thisproblem. This new raethod is now implemented in the TESTFACT computerprogram (Wilson, Woods, & Gibbons, 1991). It is based on specifying amulti-dimensional item response model for the test items. Thurstonian factorstructure is employed for modelling the latent ability dimensions that affectthe probability of correct responses. Combining the mpdels for abilitystructure and item response provides a strong tool for estimating itemloadings on distinct abilities. The statistical analysis of the test data is basedon maximum likelihood estimates and iterative computation, where aprincipal factor analysis of the tetrachoric correlation matrix providesreasonable starting points for the iterative process. The procedure allows forcombining information from different test forms, omitted responses, andadjustment for guessing. Factorial solutions can be rotated using orthogonalor oblique techniques. An explicit Chi-square statistic for improvement in fitis used to determine the number of statistically significant dimensions. Oncea factor structure is decided upon, a Bayes estimate (average of the posteriorability distribution) generates scores on each ability dimension.

Along with exploratory use of conventional factor analyses applied tothe 20 math items common to 8th grade and 10th grade test forms, and to thelow, middle, and high 10th grade test forms separately, we also applied thefull information factor method to the 8th grade and 10th grade test items, andto the combined data set for both grades. Since mathematical abilities wereexpected to be correlated, promax rotation was employed. Results provedhighly satisfactory. We were able to obtain factor solutions that apparentlyprovided the same two major dimensions in both grades. At the 8th gradelevel, the full-information procedure again identified the inferential reasoningfactor shown in Table 2 and a knowledge-computation factor that combinedthe advanced and basic facts knowledge and computation factors of Table 2.It also isolated the specific counterexample reasoning items as a factor. Atthe 10th grade level, the full-information procedure again provided the

8

1.3

Page 14: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

inferential reasoning and knowledge-computation factors, along with a smallfactor different from that obtained at the 8th grade level. With the 8th and10th grade levels analyzed together, the number of dimensions was decidedusing change in Chi-squared statistics for the fitted models. This indicatedsignificant results for adding a second (Chi-squared change = 739.34, df =57) and a third factor (Chi-squared change = 155.38, df = 56).

The lower panel of Table 2 defines the two major factors that appearcommon to both grade levels regardless of test form. Table 3 gives the factorloadings for each item at the two levels. In both years, items that were highlyloaded on the first factor, symbolized as MR, required mainly inferentialreasoning. Typically, no direct computation was called for in answering theseitems; rather, the correct solution was derived more from a logical argument.Although some knowledge of formal mathematical facts facilitated thesolution, it alone was not sufficient to arrive at the correct answer. Most ofthe items in this factor introduced some kind of a scenario, not justmathematical expressions. On the other hand, items that were highly loadedon the knowledge-computation factor, symbolized here as MK, requiredmainly a straightforward computation or the use of mathematical knowledge.Most of these items were composed of a mathematical expression for which aone-step solution was possible. New items that were added to the 10th gradetest had the same characteristics as 8th grade test items in each factor; of the19 common items in the 10th grade reasoning factor, 16 were also classifiedas reasoning items in the 8th grade. The correlation between the two factorscores (over persons) was .72 and .75 at 8th and 10th grade, respectively. Thecorrelation (over items) of the reasoning factor loadings from the 10th gradewith the 8th grade reasoning factor loadings was .69; the correlation was -.74with the 8th grade knowledge-computation factor. New items added to theknowledge-computation dimension appeared to be in the form of one-stepsolution mathematical expressions; 11 of the 18 items in this factor wereclassified in the 8th grade test as knowledge-computation. The correlation of10th grade factor loadings was .68 with the 8th grade knowledge-computationfactor and -.73 with the 8th grade reasoning factor.

Table 3 here

A noted, the full information factor analysis indica:L;(1 the existence ofa third factor in each grade (X and Y in Table 3). In the 8th grade, thissmall third factor was not easily identified as a distinct mathematical ability;in the earlier exploratory work, it was interpreted as counterexamplereasoning (MC4 in Table 2). The three defining items were the only

9

1.4

Page 15: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

comparison items for which the correct answer was "the relationship cannotbe determined from the information given", so item format may also be thekey here. Respondents, when answering incorrectly, also tended to prefer oneparticular option (resulting in high values for the guessing parameters). Inthe 10th grade, the third factor seemed to capture a technical aspect offunctional notation that was present in both of the two highest loading items.These items were also loaded substantially on the knowledge computationfactor.

For present purposes, we have not retained these third factors, or thedistinction between basic and more advanced knowledge and computationthat seemed apparent in the 8th grade exploratory analyses. This does notmean that such distinctions can never be important. For example,counterexample or none-of-the-above reasoning could be usefullydistinguished if represented by a more substantial range of items. Also, inother work on gender differences in the NELS:88 data, the subscore for basicfacts computation favored females over males, on average, whereas anadvanced knowledge subscore did not (see Snow & Ennis, in press).Furthermore, factors that appear at one grade level and not another, or itemfactor loadings that change between levels, should not be discounted. Forexample, variance arising from one source, e.g. reasoning, at one grade levelcould be replaced by variance from another source, e.g. knowledge, at a latergrade level due to the effects of intervening instruction. We adopt thecommon two-factor representation as most expeditious for present purposeswhile recognizing that some finer distinctions may prove more useful for otherpurposes.

The common two-factor solution for the math tests at both 8th and10th grade levels permitted computation of separate factor scores at eachlevel expressed on a common psychometric scale, despite the involvement ofdifferent items at each level. This in turn allowed us to examine somealternative measures of achievement gain or growth. If one assumes that likefactors at the two grade levels are indeed the same ability dimension, thensimple gains or residual gains might be computed separately for knowledgeand reasoning, or a three-parameter growth model might be considered (withparameters reflecting average baseline, average growth, and differentialgrowth in knowledge versus reasoning). Clearly, a two-dimensionalrepresentation of learning gain would be valuable theoretically, even thoughthe growth model approach is extremely limited when only two points in timeare available. When 12th grade data can be added, however, the growthmodel approach may prove uniquely useful.

10

1.5

Page 16: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

For the present, we think it preferable to consider like factors acrossgrade levels as similar but not vertically equated dimensions. Regressionanalyses can simply treat 8th grade factors as predictors of 10th grade factors,with no special status in the spectrum of other predictors. No assumption of8th to 10th grade equivalence is then needed. (see Kupermintz & Snow,forthcoming, for technical discussion of this issue, as well as comparative andcross validational analyses).

Our subscale interpretations were aided considerably by small-scaleinterview studies of local 8th and 10th graders concerning their approachesand reactions in trying to solve particular items. Their responses for mathitems included: guessing, eliminating some alternatives then guessing,computation, estimation, reasoning the item out, and immediately knowingthe correct response. Apparent sources of incorrect responses were alsoidentified, including: computational errors, carelessness, lack of knowledge,incorrect knowledge, failure to reason, and incorrect reasoning. Instances inwhich examinees arrived at the correct answer using incorrect reasoning orknowledge were also noted. It did appear that MR items called more onreasoning strategies and produced errors due to incorrect reasoning as well aslack of knowledge. On the other hand, MK items more often involvedcomputation or estimation strategies, with errors more often arising fromcomputational mistakes as well as faulty knowledge (see Ennis, Kerkhoven, &Snow, 1993).

In addition to assisting with the interpretation of our math subscores,the interviews identified some problem items. Of particular concern areseveral items for which the correct answer can be obtained using incorrectreasoning or knowledge. Examples are given in Ennis, Kerkhoven, & Snow(1993). We decided to leave these in the large-scale analyses, but to keeptrack of them in further analyses and interpretation. Of the four items thatmight be questioned on this basis, none was crucial to defining a factor ateither grade level. Three of the four had low loadings on all factors at one orboth levels. Thus it seems that the results were not negatively influenced byretaining these items.

Relation to test specifications and proficiency scores. As noted earlier,the NELS:88 math test was developed using a typical process-by-content testspecification table. Separate content scores were computed. Examinees werealso assigned to proficiency levels according to their performance on threefour-item subsets. An important question concerns how our proposed mathsubscores relate to these proficiency levels and to the process and contentdimensions of the test specification table.

11

Page 17: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Table 4 shows the items as given in the original 8th grade specificationtable (see Rock, et al., 1990) and indicates the subscore to which we assignedeach item. Items that mark one of the three proficiency levels are alsodesignated (1), (2), or (3). Our subscores do not represent any of the originalprocess or content distinctions cleanly. Although there is a preponderance ofMK items in the skills-knowledge row and of MR items in the understanding-comprehension and problem solving rows, each subscore spans both processand content categories. Comparing subscores and proficiency levels, eachproficiency level appears to be a mixture of knowledge and reasoning at the8th grade level. We do not yet have a test specification table for the 10thgrade items. In the 10th grade factor analysis, however, Proficiency Level 1items load on the MK or the Y factor, Proficiency Level 2 items all load onMK, and all but one Proficiency Level 3 item load on MK. Also, many of theproficiency items show relatively low factor loadings at 8th grade, as thoughthe factor solution did not represent them well. It would appear that thepsychology of the proficiency levels is not well understood and may changeacross grades, with reasoning variance tending to become knowledge variance.

Table 4 here

Student, course, program, and instructional variables as predictors. Inaddition to the student achievement factors from 8th grade, several othercategories of variables at one or both grade levels provided predictors ofachievement factors at 10th grade. As shown in Figure 1, 8th grade studentreading ability was one index of general prior achievement. Gender,ethnicity, and socio-economic status indices were obvious additions. Further,the student survey included questions on learning opportunities outside ofschool, such as visits to museums, computer availability, parental help withhomework, and amount of TV watching, as well as formal courses taken inschool during present and past years. In addition to courses taken, we haveincluded an index for academic program, contrasting advanced, academic, andgeneral-vocational tracks. A separate index for those students in programs forthe gifted and talented was also included.

Both the 8th and 10th grade student questionnaires also containedlocus of control and self esteem scales, as well as items concerned withanxiety about asking questions in class and attitude toward different subjectareas. Our own factor analysis of the 8th grade national sample data (notreproduced here) produced component scores for positive and negativestatements about self esteem and a separate score reflecting attributions of

12

Page 18: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

success to luck versus hard work (called here Pos Self 8, Neg Self 8, and Luck8).

The variable labeled "positive self-esteem" represents a factor reflectingpositively worded statements (such as "I am a person of worth"); the "negativeself-esteem" factor represents negatively-worded statements (such as "at tirries,I think I am no good at all"). We also produced separate scales for mathanxiety and positive attitude toward math (called Math Am( and Math Att 8).We expect that these student affective indicators will be important criterionme: cures along with the cognitive subscores in subsequent longitudinalanalyses; however, we have used them only as predictors in the present work.

From the 10th grade teacher questionnaire, we have used teacherreport on student absence and track level. We also subjected teacher reportsof instructional practices to a series of principal component analyses withvarimax rotation (not reproduced here) to identify distinct instructionaltreatment dimensions. These are defined in Table 5. Of particular interesthere was the teaching dimension called "emphasis on higher-order thinking",including emphasis on conceptual structure and problem solving. In thepresent work and in related studies (e.g., Raudenbush, Rowan, & Cheong,1993; Talbert & DeAngelis, forthcoming) we have investigated this variable asan indicator of teaching for understanding as opposed to memorization andcomputation. As a complement to this index, we also included as a predictorvariable student report of teacher emphasis on understanding from the 10thgrade student questionnaire.

Table 5 here

Regression analyses for main effects. To explore the degree to whichpredictors of academic success might be differentially important for differentcomponents of math achievement, regression analyses were carried out at thesubscore level as well as for the total math IRT scores.

Four regression models were computed for each of the two major mathsubscores in 10th grade. Prior achievements represented by 8th grade mathand reading scores were entered at the first step in each model. The studentmodel then included student SES, gender, ethnicity, absenteeism, and the 8thgrade affective factors. A second model examined course and programvariables. A third included all the teacher and instructional factors. Finally,a fourth model included indicators of opportunity to learn outside of school.

13

18

Page 19: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

All models were fitted by ordinary least squares on random halves of thesample and compared. The present report shows combined results only forthe total sample. is, fixed alpha level of .01 (corresponding to a t-ratio ofabout 2.5 in this sample) was used to designate statistical significance. Allvariables (except dummy indicators) were standardized prior to model fitting.

Table 6 here

Table 6 presents the regression results for the MK factor as dependentvariable. As expected, 10th grade math knowledge was predicted by 8th gradereading and math reasoning as well as math knowledge. Higher achievementon MK was also associated with higher SES and Asian ethnicity. It isinteresting to note that no gender or other ethnicity effects were found forthis factor. As for the affective variables, higher math knowledge wasassociated with a positive attitude toward math and, to a lesser extent, withemphasizing hard work over luck and reporting a less negative self esteem.There was also a negative effect due to student absence.

Strong effects were found for the course and program indicators.Students who scored higher on math knowledge were more likely to havetaken algebra (I or H) and geometry courses, while lower achievement wasstrongly related to having taken general math courses. Students in theadvanced track scored somewhat higher on math knowledge, whereas studentsin the general-vocational track scored much lower, compared with students inthe academic track. Also important was having taken algebra in 8th grade.

The best instructional treatment predictor of math knowledge wasteacher emphasis on higher order thinking. Student report of teacheremphasis on understanding was also significant. Higher student scores onmath knowledge were also related to teacher reports of more use oftraditional instruction, less use of individualized instruction, and more timeassigned to homework. A negative effect was associated with emphasis onmath applications.

Finally, students who reported visiting science museums and having acomputer at home for their educational use showed higher math knowledgescores. Students with lower scores received more help from parents onhomework.

Table 7 here

14

19

Page 20: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Table 7 presents the regression results for the MR factor. Again asexpected, 8th grade reading and math knowledge as well as math reasoningpredicted 10th grade math reasoning. Note however that prior mathreasoning appears to be a stronger predictor of later math knowledge thanprior math knowledge is of later math reasoning. Also as before, studentswith higher SES performed better on reasoning. Large gender and ethnicityeffects were found; males and non-black students showed higher mathreasoning scores on average. No affective variables showed significantrelation to reasoning. These patterns stand in contrast to the findings for mathknowledge.

Course effects were less pronounced on math reasoning compared withmath knowledge. Again, general math courses were associated with lowerscores, whereas Algebra II and Geometry courses were associated with higherscores. It is also noteworthy that taking Algebra I was associated with lowermath reasoning scores; this effect became significant in the overall model andis analyzed further below. No significant track effects were found.

As with math knowledge, student achievement on math reasoning wasassociated with teacher reports of more emphasis on higher order thinkingand less emphasis on individualized instruction, though both effects appearedto be weaker. Here also student reports of teacher emphasis onunderstanding were not significant.

Having a computer in the home for educational use appeared to havea marked relation to reasoning. Again, parent help with homework showed anegative relation. Museum visits had little relation to reasoning.

The next stage of analysis consisted of fitting two overall regressionmodels, for MK and MR separately, to include all of 'he significant predictorsfrom the previous analyses. Table 8 presents these overall models for the10th grade MK and MR factor scores, along with comparable results for thetotal math IRT score.

Table 8 here

For the knowledge factor, after taking other significant factors intoaccount, the student variables reflecting SES, Asian ethnicity, positive selfesteem, and emphasis on hard work over luck fell out of statistical

15

:20

Page 21: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

significance. The coefficient for Black ethnicity became significantly negative.The effects of museum visits and home computers were reduced. All othereffects previously noted remained significant in the overall model.

The overall model for the reasoning factor was consistent with thepartial models. The exception, previously noted, was that taking Algebra I inthe 9th or 10th grade became a statistically significant negative predictor.Further analysis revealed that this result arose from a quadratic pattern in thedistribution of Algebra I course taking along the two math achievementscales; the likelihood of having taken that course 'n 9th or 10th grade wasconsiderably higher in the middle achievement region in comparison to theupper and lower regions. However, since lower MR scores in particular weremore associated with Algebra I than higher MR scores, on average, negativerelation appeared for MR, not for MK.

It is also worth noting that when comparing the magnitude of thecoefficients between the overall and partial models, almost no changes wereobserved for MR. On the other hand, overall model coefficients for MKweregenerally much smaller when compared with the coefficients from the partialmodels.

Table 8 also shows that using the total math IRT score as a criterionoften seems to average the effects found for the two math factors studiedseparately, and important effects are thus missed. The gender effect wassignificant on reasoning, nonsignificant but opposite in sign on knowledge.Yet the total score analysis taken alone would have dismissed genderdifferences as unimportant in general. Also, the Black ethnicity effect wasmuch stronger on reasoning than knowledge. On the other hand, theknowledge factor showed stronger effects for student math attitude, and forall significant course and instructional treatment variables. Student report ofteacher emphasis on understanding was important only for the knowledgescore, not for reasoning, yet the total score analysis would support a generalconclusion. Track showed no relation to reasoning, whereas home computeravailability was associated more strongly with reasoning. These differencesdemonstrate our point that total score analyses may misconstrue some effectsand miss some effects entirely; psychological construct interpretations andpolicy considerations may both be helped by differentiating total scores intopsychologically and educationally meaningful subscores.

Discussion and Conclusions

16

Page 22: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

The analyses reported herein examine only a few of the manyimportant questions that can be addressed with the NELS:88 data. And theresults in hand so far must be considered provisional. Our plans for furtheranalyses include studies of a variety of student and instructional treatmentinteraction hypotheses and much more detailed analysis of gender, ethnic, andother personal effects. Nonetheless, we believe our research to date supportsseveral important conclusions and implications. These provide guidelines forfurther NELS:88 data analyses. They also suggest a new approach to futuresurvey research on educational achievement, with particular reference toreadiness and opportunity to learn, and the evaluation of educationalprograms.

A first conclusion is that the NELS:88 mathematics test ismultidimensional and should be treated as such. At least two dimensions,representing separate scores for knowledge and reasoning, can bedistinguished at both 8th and 10th grade levels. These two dimensions differpsychologically and thus statistically in their relations to other student,instructional, and school characteristics. They should therefore bedistinguished in research seeking to improve our understanding of studentreadiness to learn, of teacher, course, and program effects on opportunity tolearn, and of effective instructional design in general.

A second set of conclusions derives from this distinction. Mathknowledge and reasoning, which can be distinguished in the laboratory, canalso be distinguished in survey-level multiple choice tests. Of course, thelabels "knowledge" versus "reasoning" offer only a simplistic shorthand. Bothare complex constructs intimately related in cognitive learning andperformance. But it is reasonable to think of these two aspects ofmathematical achievement as having different weight in influencing studentperformance on particular tasks. Some tasks emphasize the application ofconcepts and computational skill. Although reasoning may be involved indeciding what is applicable and what to do when, differences in studentsuccess or failure on an item arise mainly from the presence or absence of therelevant declarative , id procedural knowledge. Other tasks emphasizeperceptive analysis and sequencing of steps to find solutions to problemsembedded in scenes. Knowledge of concepts and computation is involved, buthere student success or failure arises more from the ability to decontextualizethe key mathematical aspects of a problem and interrelate them in a system.The contrast in the NELS:88 math test may be akin to the more generaldistinction between crystallized and fluid intelligence (see, e.g., Carroll, 1993;Snow, 1982). It is known that many other math achievement tests displaythese two aspects of ability. Although growth in crystallized knowledge andskill and in fluid analytic reasoning in mathematics are both promoted by

17

02

Page 23: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

educational experiences, it is to be expected that they will be differentiallyaffected by specific instructional practices and learning opportunities.

Our results suggest some of these differentials. Average differencesfavoring males occur on reasoning but not on knowledge. Averagedifferences favoring Asian-Americans occur on knowledge, but not onreasoning, while average differences reflecting African-American disadvantageare more pronounced on reasoning than on knowledge. Research aiming tounderstand such differences will be aided by knowing where to look morespecifically, and thus perhaps what to lock at in the spectrum of readiness andopportunity to learn variables. Student affective variables such as positiveattitude toward math seem more relevant to knowledge than to reasoning.Instructional, course, and program variables also show more pronouncedrelation to knowledge than to reasoning, as does time spent on homework.On the other hand, home computing and the advantages of SES in generalseem more strongly related to reasoning. All these patterns seem consistentwith the hypothesis that crystallized knowledge growth is more influenced byformal educational factors and by personal factors such as attitude,homework, and attendance, whereas growth in fluid reasoning ability is morea function of informal learning experience promoted by home and familyadvantages, as well as school advantages, over the long haul.

With respect to opportunity-to-learn objectives, it is clear that takingAlgebra early, and taking advanced courses by 10th grade, are positive factorsin promoting both knowledge and reasoning development. Severalinstructional factors, such as emphasis on traditional instruction andhomework, also promote knowledge growth specifically. But beyond coursetaking patterns, only teacher emphasis on higher order thinking and parentalprovision of home computers for educational use seem associated with bothknowledge and reasoning development in mathematics.

The notion of opportunity to learn should consider the cognitivedimensions of student learning and differences in the instructionalenvironments conducive to developing different kinds of cognitive aptitudes.Indeed, the thrust of current reforms in math education aim precisely toenhance U.S. students' math reasoning aptitude, which our analysis suggests isnot strongly related to in-school learning opportunities at present. Large-scaleassessments of student learning and educational progress should certainly aimto represent the cognitive and educational distinctions being made bycognitive psychologists, math educators, and by the nation's education goals.

A final conclusion is a reemphasis and recommendation that furtherresearch aimed at theory or policy not use total math IRT scores as a lone

18

Page 24: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

criterion. We believe our analysis shows that mathemati::s achievement ismultidimensional and that much is to be gained by recognizing this fact.Research that relies on total score criteria misrepresents some importantissues and misses others altogether.

19

9 4

Page 25: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

References

Alexander, K. L. & Pallas, A. M. (1983). Private schools and public policy:New evidence on cognitive achievement in public and private schools.Sociology of Education, 56, 170-181.

Bock, D., Gibbons, R., & Muraki, E (1988). Full-information item factoranalysis. Applied Psychological Measurement, 12, 261-280.

Ca In, G. C. & Goldberger, A. S. (1983). Public and private schools revisited.Sociology of Education, _56., 208-218.

Carroll, J. B. (1993) Human Cognitive Abilities New York: CambridgeUniversity Press.

Chubb, J. E. & Moe, T. M. (1990) Politics, markets, and America's schools.Washington, D.C.: The Brookings Institution.

Coleman, J. S. et.al. (1966) Equality of educational opportunity. Washington,D.C.: U.S. Government Printing Office.

Coleman, J. S., & Hoffer, T. (1987). Public and private high schools: Theimpact of communities. New York: Basic Books.

Coleman, J. S., Hoffer, T., & Kilgore, S. D. (1982). High school achievement:Public, private, and Catholic schools compared. New York: BasicBooks.

Cronbach, L. J. (1988). Five perspectives on validity argument. In H. Wainer& H. I. Braun (Eds.), Test validity (pp. 3-17). Hillsdale, NJ: LawrenceErlbaum Associates.

Cronbach, L. J. (1989). Construct validation after thirty years. In R. L. Linn(Ed.), Intelligence: Measurement, theory. and public policy (pp. 147-171). Urbana and Chicago: University of Illinois Press.

Ennis, M., Kerkhoven, J.I.M., & Snow, R.E. (1993) Enhancing the validityand usefulness of large-scale educational assessments. Report P 93-151, Center for Research on the Context of Secondary SchoolTeaching, Stanford University, Stanford, CA, January.

20

Page 26: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Frederiksen, N., Glaser, R., Lesgold, A., & Shafto, M. (Eds.) (1990).Diagnostic monitoring of skill and knowledge acquisition. Hillsdale,NJ: Lawrence Erlbaum Associates.

Frederiksen, N., Mislevy, R., & Bejar, I. (Eds.) (1993). Test theory for a newgeneration of tests. Hillsdale, NJ: Lawrence Erlbaum Associates.

Gifford, B.R. (Ed.)(1989a). Test policy and test performance: Education,language. and culture. Boston: Kluwer.

Gifford, B.R. (Ed.)(1989b). Test policy and the politics of opportunityallocation: The workplace and the law. Boston: Kluwer.

Glaser, R. & Linn, R. (1992). Assessing student achievement in the states.Stanford, CA: National Academy of Education.

Glaser, R. & Linn, R. (1993) The trial state assessment: Prospects andrealities. Stanford, CA: National Academy of Education.

Haertel, E. H. (1988). A technical review of the National Assessment ofEducational Progress: Introduction and overview. InternationalJournal of Educational Research, 12, 673-677.

Haertel, E. H., James, T., & Levin, H. M. (Eds.) (1987). Comparing publicand private schools. Vol. 2: School achievement. New York: FalmerPress.

Horn, L., Hafner, A., & Owings, J. (1992). A profile of American eighth-grade mathematics and science instruction. National Center forEducational Statistics NCES 92-486. Washington, D.C.: U.S.Department of Education.

Inge ls, S.J., Scott, L.A., Lindmark, J.T., Frankel, M.R., Myers, S.L., & Wu, S.(1992). NELS:88 First follow-up drop-out component data file user'sguide. National Center for Educational Statistics NCES 92-083.Washington, DC: U.S. Department of Education.

Jabine, T.B., Straf, M.L., Tanur, J.M., & Tourangeau, R. (1984). Cognitiveaspects of survey methodology: Building a bridge between disciplines.Wash, DC: National Academy Press.

Jencks, C., Smith, M., Acland, H., Bane, M. J., Cohen, D., Gintis, H., Heynes,B., & Michelson, S. (1972). Inequality. New York: Basic Books.

21

Page 27: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Kupermintz, H. & Snow, R.E. (in preparation). Methodological issues inresearch on longitudinal educational achievement: The example ofNELS:88. Center for Research on the Context of Teaching, StanfordUniversity.

Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement(3rd ed., pp. 13-103). New York: Macmillan.

Raudenbush, S.W., Rowan, B., & Cheong Y.F. (1993) Higher-orderinstructional goals in secondary schools: Class, teacher, and schoolinfluences. American Educational Research Journal, 30 523-553.

Rock, D.A., Pollack, J.M., Owings, J., & Hafner, A. (1990). Psychometricreport for the NELS:88 base year test battery. National Center forEducational Statistics NCES 91467. Washington, D.C.: U.S.Department of Education.

Snow, R.E. (1982). Education and intelligence. In R.J. Sternberg (Ed.),Handbook of Human Intelligence. New York: Cambridge UniversityPress.

Snow, R.E. and Ennis, M. (in press) Correlates of high mathematical abilityin a national sample of eighth graders. In C.P. Benbow and D.Lubinski (Eds) From psychometrics to giftedness. Baltimore, MD:Johns Hopkins University Press.

Snow, R. E. & Lohman, D. F. (1989). Implications of cognitive psychologyfor education& measurement. In R. L. Linn (Ed.), Educationalmeasurement (3rd ed., pp. 262-331). New York: Macmillan.

Stout, W. (1987). A nonparametric approach for assessing latent traitunidimensionality. Psychometrika, 51, 589-617.

Talbert, J.E. & DeAngelis, K, (forthcoming) School context effects onteaching for understanding in high school mathematics. Center forResearch on the Context of Teaching, Stanford University, Stanford,CA.

Tukey, J.W. (1969). Analyzing data: Sanctification or detective work.American Psychologist .24, 83-91.

22

Page 28: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Wilson, D., Wood, R., & Gibbons, R.D. (1991). TESTFACT: Testscoring, item statistics, and item factor analysis. Chicago, IL:Scientific Software Inc.

Witte, J. F. (1990). Understanding high school achievement: After a decadeof resear. do we have any confident policy recommendations? Paperpresented to the American Political Science Association, SanFrancisco.

23 .

28

Page 29: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Figure Caption

Fig 1. A schematic representation of categories of variables in the NELS:88survey indicating the main relationships studied in the present report.

24

29

Page 30: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Table 1

NELS:88 Mathematics Items and Descriptions

Master hem 10th Grade 10th Grade 10th GradeNumber 8th Grade Form L Form M Form H Description

MO1

M02M03M04M05M06M07M08M09M10M11M12M13M14M15M16M17M18M19M20M21M22M23M24M25M26M27M28M29M30M31M32M33M34M35M36M37M38M39M40

1

2345678

9

1

23

4

1

234

56789

1

2345678

1 0 5 1 0 91 1 1 0 1 1 1012 1 1 12 1113 1 2 1 3 1 21 4 1 3 1 4 1315 1 4 141 6 1 5 1 5

17 1 6 1618 1 7 1 7 1519 18 1820 21 1921 22 1622 2323 24 2024 25 21 1725 2626 27 22 1827 2828 29 23 1929 30 24 2030 31 25 2131 32 2632 3333 27 2234 34 28 2335 35 29 2436 36 30 2537 31 2638 37 32 2739 38 33 2840 39 34 29

Compare 2 algebraic expressions, given values of variablesCompare 2 numbers read from a graphRead two numbers from a graph and perform an operation with themCompare two algebraic expressions, given a relationshipPerform an arithmetic operation and compare result with a numberDetermine coordinates of points on a graph, perform an operationCompare two algebraic expressionsPerform an arithmetic operation, compare result with a numberPerform an arithmetic operation, compare result with a numberCompare statements about locations on two number linesCompare length of line segments illustrated in a diagramCompare expressions involving multiplication and division of integersCom "pare an integer with an expression using division of decimalsCompare expressions, given information containing exponentsCompare expressions, requiring solution of simple equationsCompare two quantities of money expressed differentlyCompare two simple arithmetic expressions involving divisionCompare two simple arithmetic expressions involving divisionCompare two simple arithmetic expressions involving multiplicationSet up a simple equation that is the solution of a word problemEstimate a probability that is the solution of a word problemDetermine the greatest of 4 decimal numbersDetermine the smallest of 4 fractions in a word problemChoose verbal description of a problem that doesn't match diagramDetermine the length of a line segment in a diagramEvaluate a relationship given statements about the variablesFind an algebraic expression odd or even given fact about variablesSolve a word problem requiring logical inferenceSolve a word problem whose answer is an algebraic expressionSolve a word problem using multiplication or factoringChoose which decimal number is between two other numbersChoose points on a number line that include a specified decimalEstimate a number using a percentage indicated in a diagramSolve a simple algebraic equationEvaluate statements inferred from a word problem with a fractionChoose which expression is different from a specified percentageSolve a word problem requiring logical inferenceEvaluate statements referring to area and diagonal of a diagramSupply number that completes an algebraic equation correctlySimplify an algebraic expression

M41 6 Perform an arithmetic operation, compare result with numberM42 7 Compare two numbers containing exponentsM43 8 Compare two numbers involving multiplication and division of fractionsM44 9 Compare two expressions involving addition and multiplication of integersM45 1 9 Compare two expressions involving addition and subtraction of a variableM46 20 Perform an arithmetic operation involving decimals, compare with numberM47 40 Identify parallel line segmentsM48 35 34 Determine distance between points in a diagramM49 36 36 Solve a long division problemM50 37 37 Determine length of side of figure given areaM51 38 38 Determine least odd integer from set of expressionsM52 39 39 Solve an algebraic inequalityM53 40 40 Determine which of a set of expressions represents a positive numberM54 30 Compute a factorialM55 31 Solve a word problem involving area and dimensionsM56 32 Determine highest score given lowest score, mean, and rangeM57 33 Solve an equation involving function notationM58 35 Solve an equation involving function notation and exponents

Source: Rock, Pollack, Owings, & Hefner (1990)

30BEST COPY AVAILABLE

Page 31: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Table 2

Proposed Math Subscales and Interpretations forPreliminary 8th Grade and Revised 8th and 10th Grade Analyses

Components from Preliminary 8th Grade Analysis

MC1: Advanced Knowledge - Computation (items 5, 8, 9, 11, 12, 13, 14, 18, 34, 39, 40)

Computational items that require knowledge of roots, exponents, negative numbers, algebra,multiplication or division of decimals.

MC2: Inferential Reasoning (items 21, 22, 23, 25, 27, 31, 35, 37)"If--then" items that require examinee to draw conclusions about possible outcomes, given a particularscenario, and probability items; these items do not require (or invite) computation.

MC3: Basic Facts - computation (items 15, 16, 17, 19, 20)Items that require basic mathematics knowledge, with answers readily computed by adding, subtracting,multiplying, or dividing whole numbers.

MC4: Counterexample Reasoning (items 4, 7, 10)Items that invite examinees to devise their own concrete examples (a typical item involves thecomparison of two unspecified real numbers) to eliminate alternatives.

MCS: Multiple Steps with Figures (items 3, 33, 38)Items that require interpretation of graphical or figural information as well as several computational

steps.

Factors from Revised 8th and 10th Grade Analysis

MR: Mathematical Inferential ReasoningItems requiring inferential reasoning as in MC2 above, usually involving a scenario not just mathexpressions. Minimal computation needed. Multiple steps required. Mathematical knowledge necessarybut not sufficient.

MK: Mathematical Knowledge and ComputationItems requiring computation and knowledge as in MC1 and MO above, usually involving solution ofa mathematical expression in one or two steps. Minimal inferential reasoning required.

Page 32: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Table 3

Factor Loadings From Full Information Factor Aralysis of NELS:88 Math Itemsin 8th and 10th Grades, After Promax Rotation (N=4059)

ath Grade ;0th GradeFactor mg mE E WI HE I

M54 99* -32 031455 94* -27 15M25 69* -09 08 81* 04 -14M37 57* -06 18 81* 01 -06M22 92* -12 -07 80* 03 -03M56 78* 01 -04M50 75* 11 08M38 16 25* 10 72* 12 -06M48 68* 04 01M31 70* 02 03 66* -00 '14

M36 . 39* 10 29 59* 21 -03M21 58* 02 -05 57* 20 -13M52 56* 19 01M35 58* 05 -05 50* 09 09M27 48* 14 23 48* 34 11M28 40* 05 17 48* 12 13M10 17 -10 71* 45* 36 -01M26 36* 27 16 45* 31 -00M51 44* 38 04M53 43* 22 08M33 36* 00 06 41* 15 02M30 26* 16 14 39* 11 141406 18 26 32* 37* 35 091424 38* 20 -01 36* 16 11M29 34* 17 22 36* 26 09M47 36* -02 221402 40* 06 20 34* 31 -04M49 29* 20 26M32 48* 07 08 32* 04 09M45 30 85* 361442 11 72* 08M15 -06 84* -11 11 70* -03M44 05 70* 101441 22 68* 161405 -05 52* 34 11 67* 12M19 08 63* -09 09 62* 23M13 26 36* 14 13 61* 00M12 12 38* 22 12 61* 031428 36* 15 30 38 60* -15M14 28 39* 11 26 59* -03M39 20 37* 35 34 57* -02M11 30* 29 13 28 57* -101409 14 43* 28 24 53* 00MO1 10 31 38* 00 53* 291407 11 -05 75* 36 52* -12M34 11 51* 19 14 52* 15M43 08 49* 411440 -18 60* 40 15 46* 201404 01 06 73* 42 44* -011403 23 36* 09 27 43* 021408 36* 28 13 19 42* 10M46 05 42* 40M17 31* 17 01 14 26* 08M57 14 42 72*M58 01 39 53*M20 39* 26 -18 21 04 45*M23 54* -02 -09 28 -10 33*M16 16 49* -17 07 20 28*

Egtta.Decimal points omitted* indicates highest loading for each item

32

BEST COPY AVAILABLE

Page 33: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Table 4

NELS:88 8th Grade Math Items with Process. Content. Subscale. and Proficiency Level Designations

Content

Process Arithmetic Algebra Geometry Data /Probability

MK 3

Advancedto_pics

X 6Skills /knowledge

MC 5 (2)MR 8

MK 9

M< 12

MC 13 (2)MK 16 (1)MR 17 (1)MR 18 (2)MK 19 (1)MR 22

X 1

NK 15MK 34MK 40 (3)

MR 25

Understanding /comprehension

X 10

MR 20 (1)MR 31

MR 32MR 33MR 36 (3)

X 4

X 7

NI< 14 (2)MR 26MR 27MR 29MK 39 (3)

MR 11 (3)MR 37MK 38

MR 2

MR 21

MR 24

Problem solving MR 23MR 28MR 30

MR 35

Page 34: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Table 5Instructional F w t I - . -

Mathematics Teacher Ouestionnaire

Factor and

Item Identifiera Item Description

TRADITIONAL INSTRUCTIONF1T2 18D Use of oral question responseF1T2 18A Use of lectureF1T2 12A Use of textbooksF1T2 16A Time spent instructing whole class

ADMINISTRATIVE TASKSF1T2 16FF1T2 16DF1T2 16EF1T2 16B

DISCUSSIONF1T2 18EF1T2 18C11T2 18H

Time spent on administrative tasksTime spent maintaining orderTime spent administering test/quizzesTime spent instructing small groups

Use of student-led discussionsUse of whole-group discussionUse of oral reports

INDIVIDUALIZATIONF1T2 18G Use of written assignmentsF1T2 18F Use of working in small groupsF1T2 16C Time spent instructing individuals

MATERIALS/AUDIO-VISUALSF1T2 18B Use of filmF1T2 12CF1T2 12BF1T2 16G

TEACHER CONTROLF1T2 17CF1T2 17EF1T2 17DF1T2 17BF1T2 17A

Use of audio-visual materialsUse of other reading materialsTime spent conducting lab periods

Control over teaching techniquesControl over amount of homeworkControl over discipliningControl over content taughtControl over texts/materials

EMPHASIS ON MATH APPLICATIONSF1T2M19F Emphasis on importance of mathF1T2M19K Emphasis on math in businessF1T2M19I Emphasis on math in scienceF1T2M19D Emphasis on interest in mathF1T2M19L Emphasis on q's about math

EMPHASIS ON HIGHER ORDER THINKINGF1T2M19J Emphasis on math conceptsF1T2M19A Emphasis on logical structureF1T2M19B Emphasis on nature of proofF1T2M19G Emphasis on problem solution

EMPHASIS ON KNOWLEDGE /'COMPUTATIONF1T2M19C Emphasis on memorizing factsF1T2M19H Emphasis on speedy computation

Note.

a Item identifiers are variable names on NELS:88 public data release

34

Page 35: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

a

Table 6

Regression Results for 10th Grade Mathematical Knowledge (N460)

VARIABLE STUDENT COURSE/PRG INSTRUCT OUTSIDE

INTERCEPT 42 15 39 35

PRIOR ACHEVEMENTREADINGS 16 15 16 18MATH KNOWLEDGE 8 41 37 40 43MATH REASONINGS 34 31 " 35 36'STUDENTABSENT -09GENDER -05SES 07ASIAN 11BLACK -02HISPANIC 04MATH ANX -01LUCK 8 -04NEG SELF 8 -03POS SELF 8 -01MATH ATT 8 06

COURSE/PROGRAMADVANCED TRACK 06GENERAL TRACK -08GIFTED PRG 05GENERAL MATH TAKEN -34ALGEBRA I TAKEN 18ALGEBRA II TAKEN 17GEOMETRY TAKEN 19ALGEBRA IN 8TH 09

INSTRUCTIONALTRADITIONAL INSTRUCT 06 *ADMINISTRATIVE TASKS -03DISCUSSION 00INDIVIDUALIZATION -07MATERIALSAV -01TEACHER CONTROL -02EMPH. MATH APPUCATIONS -05HIGHER-ORDER THINKING 13KNOWLEDGE/COMPUTATION -02UNDERSTANDING 05HOMEWORK DISCUSS 02TIME ON HOMEWORK 05

OUTSIDESCIENCE MUSEUMS 08HOURS TV -00COMPUTER IN HOME 08COMPUTER CLASS 02HELP WITH HOMEWORK -03

R-SOUARED 63 66 64 62

NoteDecimal points omittedp <.01

Page 36: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

Table 7

Regression Results for 10jh Grade Mathematical Reasoning (N=5460)

VARIABLE STUDENT COURSE/PRG INSTRUCT OUTSIDE

INTERCEPT

PRIOR ACHIEVEMENT

42 38 45 40

READING 8 16 15 15` 16MATH KNOWLEDGE 8 28 24 27 29 *MATH REASONING 8 47 50 . 51 51 *

STUDENTABSENT -05GENDER 13aS 08ASIAN 03BLACK -20HISPANIC -04MATH ANX -01LUCK 8 -01NEG SELF 8 -03POSSELF 8 -00MATH ATT 8 02

COURSE/PROGRAMADVANCED TRACK 02GENERAL TRACK 00GIFTED PRG 06GENERAL MATH TAKEN -15ALGEBRA I TAKEN -05ALGEBRA II TAKEN 11GEOMETRY TAKEN 12ALGEBRA IN 8TH 06 *

INSTRUCTIONALTRADITIONAL INSTRUCT 03ADMINISTRATIVE TASKS -01DISCUSSION 00INDIVIDUALIZATION -05MATERIALSAV -01TEACHER CONTROL -01EMPH. MATH APPUCATIONS -01HIGHER-ORDER THINKING 06KNOWLEDGE/COMPUTATION -02UNDERSTANDING 01HOMEWORK DISCUSS 00TIME ON HOMEWORK 02

OUTSIDESCIENCE MUSEUMS 03HOURS TV -01COMPUTER IN HOME 15COMPUTER CLASS 04HELP WITH HOMEWORK -03

R-SQUARED 66 65 65 65

.11.912Decimal points omitted*p<.01

Page 37: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

r

Table 8

Regression Results for Overall Models for 10th Grade Mathematical Knowledge. Mathematical Reasoning. andTotal Math IRT Score (N=5460)

VARIABLE MK MR TOTAL IRT

INTERCEPT 19 33 12

PRIOR ACHIEVEMENTREADINGS 13 14 ' 13MATH KNOWLEDGE 8 33 21 24 'MATH REASONING 8 29 44 33

STUDENTABSENT -06 -04 -05GEN:ER -03 12 * 01SES 03 06 03 'ASIAN 06 -00 00BLACK -08 -24 -14HISPANIC 01 -06 -01LUCK 8 -02 -01 -01NEG SELF 8 -01 -01 -01MATH ATT 8 04 01 03

COURSE/PROGRAMADVANCED TRACK 04 02 02GENERAL TRACK -06 02 -04GENERAL MATH TAKEN -29 -14 -22ALGEBRA I TAKEN 12 -07 06 'ALGEBRA II TAKEN 17 11 12GEOMETRY TAKEN 10 06 08 *

ALGEBRA IN 8TH 08 07 08

INSTRUCTIONALTRADITIONAL INSTRUCT 04 02 03INDIVIDUALIZATION -05 -04 -05EMPH. MATH APPLICATIONS -03 * -02 -03HIGHER-ORDER THINKING 08 05 06KNOWLEDGE/C OMPUTATION -02 -01 -01UNDERSTANDING 04 01 03TIME ON HOMEWORK 03 * 01 02

OUTSIDESCIENCE MUSEUMS 02 -01 00COMPUTER IN HOME 06 09 07HELP WITH HOMEWORK -03 -04 -03

R-SQUARED 68 67 75

NoteDecimal points omittedp<.01

Page 38: ED 376 198 AUTHOR TITLE · DOCUMENT RESUME ED 376 198 TM 022 305 AUTHOR Kupermintz, Haggai; And Others TITLE Enhancing the Validity and Usefulness of Large-Scale. Educational Assessments:

8th

GR

AD

E10

th G

RA

DE

12th

GR

AD

E

Tea

cher

Tea

cher

Tea

cher

Tea

cher

Tea

cher

Bac

kgro

und

Bac

kgro

und

Con

text

Bac

kgro

und

Con

text

Inst

ruct

iona

lP

ract

ices

[Cou

rse

Pro

gram

Cho

ices

Stu

dent

Ach

ieve

men

tF

acto

rs

Stu

dent

Per

sona

lF

acto

rs

Inst

ruct

iona

lP

ract

ices

Cou

rse

Pro

gram

Cho

ices

T Stu

dent

Ach

ieve

men

tF

acto

rs

[Stu

dent

Per

sona

!F

acto

rs

Stu

dent

Stu

dent

Stu

dent

Stu

dent

Rea

ding

Out

side

Rea

ding

Out

side

Abi

lity

Lear

ning

Abi

lity

Lear

ning

Inst

ruct

iona

lP

ract

ices

Cou

rse

Pro

gram

Cho

ices

Stu

dent

Ach

ieve

men

tF

acto

rs

Stu

dent

Per

sona

lF

acto

rs

Stu

dent

Rea

ding

Abi

lity

Stu

dent

Out

side

Lear

ning

OT

HE

R S

TU

DE

NT

, PA

RE

NT

AL,

AN

D H

OM

E B

AC

KG

RO

UN

D V

AR

IAB

LES

3839