
Instructor’s Resource Manual and Test Bank

for

Measurement and Assessment in Teaching Eleventh Edition

M. David Miller

University of Florida

Robert L. Linn

Professor Emeritus, University of Colorado–Boulder

Norman E. Gronlund

Late of University of Illinois at Urbana–Champaign

Prepared by

Michael Poulin

Boston Columbus Indianapolis New York San Francisco Upper Saddle River

Amsterdam Cape Town Dubai London Madrid Milan Munich Paris Montreal Toronto

Delhi Mexico City Sao Paulo Sydney Hong Kong Seoul Singapore Taipei Tokyo


______________________________________________________________________________

Copyright © 2013, 2009, 2005 by Pearson Education, Inc. All rights reserved. Manufactured in the United States of

America. This publication is protected by Copyright, and permission should be obtained from the publisher prior to

any prohibited reproduction, storage in a retrieval system, or transmission in any form or by any means, electronic,

mechanical, photocopying, recording, or likewise. To obtain permission(s) to use material from this work, please

submit a written request to Pearson Education, Inc., Permissions Department, One Lake Street, Upper Saddle River,

New Jersey 07458, or you may fax your request to 201-236-3290.

Instructors of classes using Miller, Linn, and Gronlund’s Measurement and Assessment in Teaching, 11e, may

reproduce material from the instructor's resource manual and test bank for classroom use.

10 9 8 7 6 5 4 3 2 1

ISBN-10: 0-13-297795-8

ISBN-13: 978-0-13-297795-1


PREFACE

This manual is intended for use with the 11th edition of MEASUREMENT AND

ASSESSMENT IN TEACHING. It consists of (1) descriptions of student activities that might be

used to enhance learning, and (2) a set of test items for each chapter of the book. The majority of

test items are multiple-choice but there are also some other selection-type items and short-answer

problems. The test items for each chapter have been revised to reflect changes in the 11th edition

of the textbook, and many new items have been added.

The student projects contained in this manual are useful for developing skill in the construction

and selection of tests and other assessment procedures. These types of activities also help

students appreciate some of the difficulties involved in developing objectives, constructing

specifications for tests and assessments, constructing test items and performance-based

assessment tasks, and selecting published tests. Such projects are typically considered an

important part of a measurement course, both from the standpoint of enhancing student learning

and as a useful means of evaluating how well students are able to apply the concepts and

procedures learned in the course.

The test bank items are arranged by chapter and are intended only as a beginning pool of items to

aid in constructing classroom tests. The items are geared closely to each chapter in the textbook

and thus have more knowledge items than would ordinarily be used in a typical classroom test.

The best procedure is to supplement these items with selection-type and supply-type items of

your own as well as performance tasks requiring more extended responses. By supplementing the

items in this way it should be possible to give more emphasis to testing and assessment issues

that cut across chapters and to measuring the understanding of concepts and issues discussed in

class.


CONTENTS

Page

STUDENT PROJECTS……………………………………………………………….... 1

1. Educational Testing and Assessment: Context, Issues, and Trends………..………... 4

2. The Role of Measurement and Assessment in Teaching………………………….… 15

3. Instructional Goals and Objectives: Foundation for Assessment…………………… 26

4. Validity……………………………………………………………………......…..… 38

5. Reliability and Other Desired Characteristics………………………………..……… 49

6. Planning Classroom Tests and Assessments………………………………………… 61

7. Constructing Objective Test Items: Simple Forms……………………………..….... 73

8. Constructing Objective Test Items: Multiple-Choice Forms………………………... 85

9. Measuring Complex Achievement: The Interpretive Exercise…………………... 96

10. Measuring Complex Achievement: Essay Questions……………………………. 107

11. Measuring Complex Achievement: Performance-Based Assessments………….. 118

12. Portfolios………………………………………………………………………… 129

13. Assessment Procedures: Observational Techniques, Peer Appraisal, and

Self-Report……………………………………………………………...……. 139

14. Assembling, Administering, and Appraising Classroom Tests and Assessments.. 149

15. Grading and Reporting…………………………………………………………… 159

16. Achievement Tests…………………………………………………………….… 170

17. Aptitude Tests ……………………………………………………………………. 181

18. Test Selection, Administration, and Use…………………………………………. 192

19. Interpreting Test Scores and Norms……………………………...……………… 204

Appendix A – Elementary Statistics…………………………………………….…… 215


STUDENT PROJECTS

During a one-semester course it is usually possible to have students develop a complete test and

assessment construction project and critically review one or more standardized tests or

assessments in their teaching fields. If the course is being offered for a shorter period of time, or

as part of another course, it may be desirable to use a series of shorter projects.

Complete Test and Assessment Construction Project

Each student is asked to select some course, or unit of work within a course, and develop a test

and assessment construction project that includes the following:

1. A list of 5 to 15 important learning outcomes to be assessed.

2. A list of subject-matter topics to be covered in the instruction.

3. A set of specifications for the test items and assessment tasks as described in Chapter 6.

4. A 40-item test using a combination of selection-type and short-answer, supply-type

items which includes: (a) complete directions, (b) test items that are appropriate for

the specific learning outcomes being measured, and (c) a scoring key. Each test item

should be keyed to a specific learning outcome.

5. Four extended-response assessment tasks using either the essay question format

discussed in Chapter 10 or the performance-based task approach described in Chapter

11. The assessment tasks should include complete directions, including specification

of any special resources (e.g., equipment, books) available to students and a scoring

guide. Each task should include a brief description of the learning outcomes the task

is intended to measure and why those outcomes would be difficult or impossible to

measure with items like those used in the 40-item test.

6. A bibliography of books and other source materials used in completing the project.

This project is fairly time-consuming, but it takes the student through the major steps of

constructing tests and assessments that are emphasized in the textbook. Since the steps in the

project closely parallel the sequence of the chapters in the textbook, it is possible for students to

start on it early in the semester and to work on each phase of it as it is discussed in class.

The above project can be reduced in scope by reducing the number of objectives, the number of

test items, or the number of performance assessment tasks.


Brief Test and Assessment Construction Projects

1. Have students select a chapter in the textbook and do the following:

a. State the learning outcomes stressed in the chapter.

b. Construct 10 objective test items and one essay question or performance-

based assessment task.

c. Indicate the learning outcome measured by each item and task.

2. Have students construct one multiple-choice item to measure each of the following

learning outcomes:

a. Knowledge of a specific term.

b. Knowledge of a specific test.

c. Knowledge of a method or procedure.

d. Understanding of a fact, principle, or procedure.

e. Ability to apply a fact, principle, or procedure.

3. Construct an interpretive exercise for each of the following.

a. A paragraph of written material.

b. Some type of pictorial material.

4. Construct a performance-based assessment task that could be completed in one class

period and that measures the ability to apply critical course concepts in a realistic

setting.

Portfolio Construction Project

Have students construct guidelines for a portfolio intended to display progress to parents during

the school year. Allow students to choose the subject area and grade for the portfolio. The

guidelines should specify:

a. The purpose of the portfolio.

b. Who will have access to the portfolio.

c. The number and types of entries students are expected to include.

d. The role of collaboration in developing portfolio entries.

e. The inclusion of self-evaluations of the entries.

f. The evaluation criteria to be employed.

Critical Evaluation of Published Tests

Have each student critically evaluate one or more of the following tests, using Chapter 18 of the

textbook.

1. Achievement test battery.

2. Achievement test in a specific content area.

3. Reading test (readiness, diagnostic, or survey type).

4. Scholastic aptitude test or multiaptitude test.

5. Test in a special area (art, music, creativity).


Item Analysis Project

Provide students with responses to a set of items (possibly one of your own tests) and have them

conduct an item analysis and interpret the results. Access to an easy-to-use item analysis package

for a personal computer would facilitate this project as well as show students how to use such

software.
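If no packaged software is available, the core of an item analysis is easy to script. The sketch below is a minimal, stdlib-only Python illustration (not taken from the textbook): the `responses` matrix is invented sample data, and the 27% upper/lower grouping is one common convention for computing a discrimination index.

```python
# Hypothetical scored responses for 8 students on 5 items (1 = correct, 0 = incorrect).
responses = [
    [1, 1, 1, 0, 1],
    [1, 1, 0, 1, 1],
    [1, 0, 1, 0, 1],
    [1, 1, 0, 0, 0],
    [0, 1, 0, 0, 1],
    [1, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]

def item_analysis(scores, group_fraction=0.27):
    """Return a list of (difficulty, discrimination) pairs, one per item.

    Difficulty is the proportion of all students answering correctly.
    Discrimination is upper-group minus lower-group difficulty, where the
    groups are formed from total test scores.
    """
    n_students = len(scores)
    n_items = len(scores[0])
    totals = [sum(row) for row in scores]
    # Rank students from highest to lowest total score.
    order = sorted(range(n_students), key=lambda i: totals[i], reverse=True)
    k = max(1, round(n_students * group_fraction))
    upper, lower = order[:k], order[-k:]
    results = []
    for j in range(n_items):
        p = sum(scores[i][j] for i in range(n_students)) / n_students
        p_upper = sum(scores[i][j] for i in upper) / k
        p_lower = sum(scores[i][j] for i in lower) / k
        results.append((p, p_upper - p_lower))
    return results

for j, (p, d) in enumerate(item_analysis(responses), start=1):
    print(f"Item {j}: difficulty={p:.2f}, discrimination={d:+.2f}")
```

Students can substitute the scored responses from one of your own tests for `responses`; items with extreme difficulty values or near-zero (or negative) discrimination are the ones most worth interpreting and discussing in class.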

Construction of Rating Scales and Checklists

Have students select an appropriate area and construct either a rating scale or a checklist.


Chapter 1 Educational Testing and Assessment: Context, Issues, and Trends

Exercise 1-A

HISTORY OF TEST-BASED REFORM

LEARNING GOAL: Identifies trends in the use of tests in educational reform efforts.

Directions: Indicate whether the following statements about the use of tests in reform efforts

during the past forty years are true (T) or false (F) by circling the appropriate letter.

T F 1. The current emphasis on accountability in education has resulted in increased

school testing.

T F 2. The rapid growth of minimum-competency testing requirements in the

1970s and early 1980s was stimulated by the widely held belief that high

school graduates often lacked essential skills.

T F 3. There is public support for the use of test results to compare schools

academically.

T F 4. Concerns that accountability leads to teaching to the test have contributed to calls

for increased reliance on performance-based assessments.

T F 5. Content standards specify the minimum score required to pass a test.

LEARNING GOAL: Distinguishes between the purposes and characteristics of content standards

and performance standards.

Directions: List the defining features of content standards and of performance standards.

Distinguish between and describe the primary purposes of these two types of standards.

Content Standards:

Performance Standards:

Note: Answers will vary


Exercise 1-B

PERFORMANCE ASSESSMENTS

LEARNING GOAL: Identifies characteristics of and rationales for the use of performance

assessments.

Directions: Indicate whether measurement specialists would agree (A) or disagree (D) with

each of the following statements concerning performance assessments by circling the appropriate

letter.

A D 1. The belief that testing and assessment shapes instruction has led to

increased emphasis on performance assessments.

A D 2. The best way to achieve authentic assessment in the classroom is through

performance assessments.

A D 3. Many proponents of performance assessments accept the idea that “what

you test is what you get.”

A D 4. Tasks requiring extended responses have been the target of most criticisms

of testing and assessment.

A D 5. Anything that can be measured by a performance assessment task could

also be measured by a multiple-choice test.

LEARNING GOAL: Identifies advantages and disadvantages of performance-based

assessments.

Directions: List some of the major advantages and disadvantages of performance-based

assessments.

Advantages:

Disadvantages:

Note: Answers will vary


Exercise 1-C

NATIONAL AND INTERNATIONAL ASSESSMENT

LEARNING GOAL: Identifies characteristics and limitations of national and international

assessments.

Directions: Indicate whether the following statements about national and international

assessment are true (T) or false (F) by circling the appropriate letter.

T F 1. The National Assessment of Educational Progress (NAEP) enables schools to

compare the performance of their students to the nation as a whole.

T F 2. NAEP provides a means of monitoring trends in student achievement over

25 years.

T F 3. In addition to national results, NAEP now provides results for state-by-state

comparisons on a voluntary basis.

T F 4. NAEP collects achievement data for students by both age and grade level.

T F 5. Comparisons of nations based on international assessments are as trustworthy as

comparisons of regions of the country based on NAEP results.

T F 6. Comparability of results in international assessments is assured by translating

assessments to the languages spoken in different countries.

LEARNING GOAL: Identifies factors at the national level that may influence the role and

nature of testing and assessment in the future.

Directions: Briefly describe actions of the federal government that are likely to influence testing

and assessment in the future.

Note: Answers may vary.


Exercise 1-D

CURRENT TRENDS IN EDUCATIONAL MEASUREMENT

LEARNING GOAL: Identifies factors related to current trends in testing and assessment.

Directions: Indicate whether measurement specialists would agree (A) or disagree (D) with

each of the following statements concerning current trends in testing and assessment by circling

the appropriate letter.

A D 1. Computers are especially useful for adaptive testing.

A D 2. Computer-administered simulations of problems enable the measurement of

complex skills not readily measured by paper-and-pencil tests.

A D 3. Despite concern about the quality of school programs, there has been a demand

for less testing and assessment.

A D 4. Computerized adaptive tests increase efficiency by reducing the number of items

that need to be administered to achieve reliable measurement for a given test

taker.

A D 5. The focus on the consequences of testing and assessment has decreased in recent

years.

A D 6. The computer provides the potential to present simulations that can measure the

processes that students use to solve problems.

LEARNING GOAL: Describes advantages and limitations of expanded uses of computer-based

tests and assessments.

Directions: Briefly describe some of the important advantages and major limitations of expanded

uses of computer-based tests and assessments.

Advantages:

Limitations:

Note: Answers may vary


Exercise 1-E

CONCERNS AND ISSUES IN TESTING AND ASSESSMENT

LEARNING GOAL: Identifies factors related to concerns and issues in testing and assessment.

Directions: Indicate whether test specialists would agree (A) or disagree (D) with each of the

following statements describing concerns and issues in testing and assessment by circling the

appropriate letter.

A D 1. Many of the criticisms of testing are the result of misinterpretation and misuse of

test scores.

A D 2. A common misinterpretation of scores on tests and assessments is to

assume they measure more than they do.

A D 3. Testing can only benefit students.

A D 4. If a particular group of students receives lower scores on a test, it means the test is

biased against members of that group.

A D 5. It is good practice to post scores on standardized tests so that students in a class

can see how their performance compares to that of their peers.

A D 6. Test anxiety may lower the performance of some students.

LEARNING GOAL: Lists the possible effects of students and parents examining school testing

and assessment results.

Directions: List the advantages and disadvantages of the legal requirement that students and

parents must be provided with access to school testing and assessment records.

Advantages:

Disadvantages:

Note: Answers may vary


Answers to Student Exercises

      1-A   1-B   1-C   1-D   1-E

 1.    T     A     F     A     A

 2.    T     D     T     A     A

 3.    T     A     T     D     D

 4.    T     D     T     D     D

 5.    F     D     F     D     D

 6.                F     A     D


Chapter 1

Educational Testing and Assessment: Context, Issues, and Trends

1. Externally mandated testing and assessment programs are often appealing to policy makers

because they

A. are popular with teachers.

B. are written by teachers in the child’s given school or school system.

C. indicate whether a given school or school district is effective.

D. indicate high or low teacher quality.

2. Content standards are intended to specify which of the following?

A. instructional approaches to use in teaching specific content

B. the curriculum for all subjects and grade levels

C. what students are expected to learn in a subject or course

D. lists of curriculum materials that should be used in lessons

3. Accountability programs for educational reform have put pressure on schools to make which

of the following decisions?

A. abolish the use of published tests and assessments for test preparation

B. reduce the number of classroom aides working in schools

C. increase the use of a variety of tests and assessments to prepare students

D. offer financial incentives such as scholarships for high-performing students

4. When externally mandated tests are used to measure current student achievement and

progress, the tests are being used as

A. a barometer

B. a lever

C. a method of formative assessment

D. a process to test teacher efficiency and quality

5. Which of the following best summarizes the findings in the report “A Nation at Risk”?

A. children tended to be tested too much, especially in later grades

B. tests should be administered beginning in the upper elementary or middle school

grades

C. children in the USA scored better than students in most European countries but

lower than most students in Asian countries

D. the quality of American education was mediocre compared with other countries

6. One negative influence of the pressures of accountability on schools is that it encourages

teachers to

A. show students how to make educated guesses on difficult multiple choice test

questions.

B. put less emphasis on important instructional topics not on the test.

C. stress the importance of test scores on students’ overall academic career.

D. organize into grade level teams in order to co-teach curriculum.


7. Which of the following would likely be a possible danger of the accountability movement on

the local school program?

A. a narrowing of objectives

B. a neglect for basic skills

C. an expansion of the curriculum

D. an overemphasis on performance objectives

8. Which of the following events followed shortly after the publication of “A Nation at Risk”?

A. Many teachers and administrators were fired and/or transferred to other positions.

B. All 50 states introduced some form of educational reform.

C. No Child Left Behind was enacted.

D. One standardized test was adopted for use by all 50 states.

9. Which of the following summarizes the main difference between content standards and

performance standards?

A. Content standards define what will be learned while performance standards define

how things will be learned.

B. Content standards define how things will be learned while performance standards

define what will be learned.

C. Content standards measure student effort while performance standards measure

the quality of the student performance.

D. Content standards are gender specific while performance standards are specific to

certain minority groups.

10. Computerized testing can increase the efficiency of testing by incorporating

A. adaptive testing procedures.

B. conventional test layout and formats.

C. informal teacher-made tests.

D. more essay questions.

11. Test critics have focused much of their attention on which of the following?

A. how essay tests are administered

B. how math and science tests are scored

C. the use of multiple-choice items

D. printing tests in other languages for nonnative English-speaking students

12. Abolishing all published tests would most likely yield which of the following results?

A. quicker administrative staffing decisions

B. less effective educational decisions

C. a more objective assessment of accountability programs

D. more opportunities for individuals to succeed on merit


13. Misuse of published tests probably can best be prevented by more careful

A. administration.

B. interpretation.

C. scoring.

D. collation.

14. Which of the following would serve as a particularly well-founded criticism of standardized

tests?

A. they are used to evaluate teachers rather than children’s achievement levels

B. they measure only limited characteristics of an individual

C. they require excessive time to administer

D. they result in an overemphasis on complex reasoning skills

15. Critics of externally mandated tests argue that these tests cause anxiety for children. Which

of the following arguments might a proponent of externally mandated tests likely counter with?

A. Moderate test anxiety can lead to student motivation to learn and do well on tests.

B. Students with test anxiety tend to score well on these tests because they are awarded

extra time.

C. Giving students positive rewards for doing well on tests negates most test anxiety.

D. Test anxiety may be present for older students but is virtually nonexistent for

younger students.

16. Which of the following is of particular concern regarding the interpretation of students’ test

scores?

A. that students were given adequate time to take the test

B. that the test did not contain any open-response questions

C. that the test was administered in the morning

D. that the test results do not lead to stereotyping or labeling students

17. Mr. Johnson has told Billy that he “can do better” while admitting to Monica that she is

probably doing “as well as can be expected.” Which of the following acts is Mr. Johnson

likely guilty of?

A. alienating parents

B. reinforcing a self-fulfilling prophecy

C. relying too much on test results

D. not taking into account that some tests may contain gender bias

18. Ms. Smith is using an assessment to gauge how well James is learning day-to-day class

material and to devise educational programs designed to help students learn the

classroom material better. Which of the following types of assessment would be most

beneficial to Ms. Smith?

A. externally mandated

B. summative

C. formative

D. arbitrary


19. Briefly explain, in 4–6 sentences, the mandates of No Child Left Behind and how it relates to

the inclusion or exclusion of testing children with disabilities.


Chapter 1: Answer Key

1. C

2. C

3. C

4. A

5. D

6. B

7. A

8. B

9. A

10. A

11. C

12. B

13. B

14. B

15. A

16. D

17. B

18. C

19. According to NCLB, all students, regardless of disability, must demonstrate proficiency in

mastering learning goals. Only students with the most severe disabilities may obtain a waiver

from this requirement, and such waivers must be granted at the district level. However, in

achieving state proficiency standards, accommodations must be made for students with

disabilities. The requirement for such accommodations states that a student's disability cannot

be an impediment to, or the cause of, his or her inability to demonstrate competency in

learning goals. An example would be allowing a student with learning disabilities extra time

to take state-mandated tests.


Chapter 2 The Role of Measurement and Assessment in Teaching

Exercise 2-A

PRINCIPLES AND PROCEDURES OF CLASSROOM ASSESSMENT

LEARNING GOAL: Distinguishes between sound and unsound principles and procedures.

Directions: Indicate whether each of the following statements represents a sound (S) or

unsound (U) principle or procedure of classroom assessment by circling the appropriate letter to

the left of the statement.

S U 1. The first step in measuring classroom learning is to decide on the type of

test to use.

S U 2. Classroom assessment should be based on objective data only.

S U 3. The type of classroom assessment used should be determined by the

performance to be measured.

S U 4. Effective classroom assessment requires the use of a variety of assessment

techniques.

S U 5. Assessment techniques should replace teacher observation and judgment.

S U 6. Error of measurement must always be considered during the interpretation

of assessment results.

LEARNING GOAL: States the meaning of test, measurement, and assessment.

Directions: In your own words, state the meaning of each of the following terms.

Test:

Measurement:

Assessment:

Note: Answers will vary.


Exercise 2-B

CLASSROOM ASSESSMENT AND

THE INSTRUCTIONAL PROCESS

LEARNING GOAL: Identifies how classroom assessment functions in the instructional process.

Directions: Indicate whether the textbook authors would agree (A) or disagree (D) with each of

the following statements by circling the letter to the left of the statement.

A D 1. The main purpose of classroom assessment is to improve student learning.

A D 2. The first step in both teaching and assessment is to determine the intended

student learning outcomes.

A D 3. Classroom assessments should not be given until the end of instruction.

A D 4. Instructional objectives should aid in selecting the types of assessment

instruments to use.

A D 5. Assessment results should be used primarily for assigning grades.

LEARNING GOAL: Describes the role of instructional objectives.

Directions: Describe the role of instructional objectives in classroom assessment.

Note: Answers will vary.


Exercise 2-C

MEANING OF PLACEMENT, FORMATIVE, DIAGNOSTIC, AND SUMMATIVE

ASSESSMENT

LEARNING GOAL: Classifies examples of classroom assessment procedures.

Directions: For each of the following descriptions, indicate which type of assessment is

represented by circling the appropriate letter using the following key.

KEY P = Placement F = Formative

D = Diagnostic S = Summative

P F D S 1. An achievement test is used to certify student mastery.

P F D S 2. Students are given a ten-item test to determine their learning progress.

P F D S 3. A teacher observes the process used by a student solving arithmetic

problems.

P F D S 4. Algebra students take an arithmetic test on the first day of class.

P F D S 5. Course grades are assigned.

P F D S 6. An assessment is given at the beginning of a new unit.

LEARNING GOAL: States examples of types of assessment procedures.

Directions: For each of the following types of assessment state one specific example that

illustrates its use in some subject area.

Placement:

Formative:

Diagnostic:

Summative:

Note: Answers will vary.


Exercise 2-D

MEANING OF CRITERION-REFERENCED

AND NORM-REFERENCED INTERPRETATIONS

LEARNING GOAL: Distinguishes between examples of each type of interpretation.

Directions: Indicate whether each of the following statements represents a criterion-referenced

(C) interpretation or a norm-referenced (N) interpretation by circling the appropriate letter.

C N 1. Erik obtained the highest score on the reading test.

C N 2. Carlos can identify all of the parts of a sentence.

C N 3. Connie can type 60 words per minute.

C N 4. John earned an average score on an arithmetic test.

C N 5. Tonia defined only 20 percent of the science terms.

C N 6. Maria set up her laboratory equipment faster than anyone else.

LEARNING GOAL: Writes statements representing each type of interpretation.

Directions: Write three statements that represent criterion-referenced interpretations and three

statements that represent norm-referenced interpretations.

Criterion-referenced interpretations:

Norm-referenced interpretations:

Note: Answers will vary.


Exercise 2-E

MEANING OF CONTRASTING TEST TYPES

LEARNING GOAL: Distinguishes between contrasting test types.

Directions: For each of the following test descriptions indicate which test type is represented by

circling the letter to the left of each description using the following key.

KEY A = Informal C = Mastery E = Speed G = Objective I = Verbal

B = Standardized D = Survey F = Power H = Subjective J = Performance

A B 1. A test using national norms for interpretation.

C D 2. A test used to measure many skills with just a few items for each skill.

E F 3. A test with many items, most relatively simple.

G H 4. A test on which different scorers obtain the same results.

I J 5. A test requiring students to set up laboratory equipment.

LEARNING GOAL: Describes a test representing a given test type.

Directions: In the spaces below, write a brief description of a specific test representing each of

the test types.

Survey test:

Mastery test:

Power test:

Objective test:

Note: Answers will vary.


Answers to Student Exercises

      2-A   2-B   2-C   2-D   2-E

 1.    U     A     S     N     B

 2.    U     A     F     C     D

 3.    S     D     D     C     E

 4.    S     A     P     N     G

 5.    U     D     S     C     J

 6.    S           P     N


Chapter 2

The Role of Measurement and Assessment in Teaching

1. Classroom assessment of students should primarily focus on which of the following?

A. behavior

B. grading

C. learning

D. feedback

2. Which of the following terms is the most limited?

A. Assessment

B. Measurement

C. Testing

D. Quantitative description

3. Which of the following forms of assessment is the most effective way to determine whether

students are making satisfactory progress?

A. diagnostic

B. formative

C. norm-referenced

D. summative

4. Measurement always involves which of the following?

A. numbers

B. testing

C. performance

D. value judgments

5. When teachers use tests and assessments in the classroom, the highest priority should be

given to which of the following factors?

A. assigning course grades

B. improving instruction

C. maintaining adequate school records

D. reporting student progress to parents

6. Which of the following is one of the most important issues to consider when selecting an

assessment technique?

A. accuracy

B. convenience

C. objectivity

D. relevance


7. The first step in measuring student achievement is to determine the

A. date of testing

B. difficulty of the test

C. current student averages

D. method of assessment

8. Measures of maximum performance most likely would include which of the following?

A. mid-term tests

B. attitude scales

C. student journals

D. personality measures

9. Which of the following would be evaluated by using a measure of typical performance?

A. Arithmetic computation

B. Arithmetic problem solving

C. Writing a friendly letter

D. Reading comprehension

10. Which of the following methods of assessment would most likely be given at the beginning

of instruction?

A. Contextual

B. Formative

C. Diagnostic

D. Summative

11. Formative assessment is used primarily for which of the following purposes?

A. grading students

B. monitoring student progress

C. placing students in groups

D. selecting students for awards

12. Summative assessments are most appropriate for which of the following?

A. determining the extent to which instructional goals have been achieved

B. diagnosing student strengths and weaknesses

C. measuring entry learning skills in students from various backgrounds

D. measuring progress during learning

13. Assessments may be classified as norm-referenced and criterion-referenced on the basis of

the types of

A. directions used.

B. interpretations to be made.

C. learning outcomes measured.

D. test items used.


14. Which of the following types of assessments would most likely be norm-referenced?

A. Diagnostic

B. Mastery goal attainment

C. Readiness

D. College entrance exam

15. Which of the following factors is likely to differ when constructing norm-referenced and

criterion-referenced tests?

A. Arrangement of items

B. Item difficulty

C. Types of items

D. Relevance to objectives

16. Which of the following is most likely to be used in a criterion-referenced interpretation?

A. Average score in a group

B. Highest score in a group

C. Percentage correct score out of 20

D. Percentile score of 80

17. Norm-referenced and criterion-referenced tests are best viewed as

A. standardized tests.

B. similar tests with similar intents.

C. two different types of tests.

D. valid measures of student learning.

18. Which of the following best represents an untimed test that has items arranged in

increasing order of difficulty?

A. Diagnostic test.

B. Standardized test.

C. Performance assessment.

D. Power test.

19. Which of the following tests is likely to measure many skills with only a few items for each

skill?

A. Diagnostic test

B. Mastery test

C. Performance assessment

D. Survey test

20. A test is referred to as objective when it meets which of the following criteria?

A. Different scorers obtain the same results

B. It is constructed using a variety of question types

C. It measures a clearly defined set of standards

D. There is a standard procedure for interpreting the results


21. Which of the following would best describe most teacher-made tests?

A. Informal, power.

B. Formal, speed.

C. Standardized, power.

D. Standardized, speed.

22. Which of the following represents a norm-referenced interpretation?

A. Henry wrote over 500 words for each of his essay questions.

B. Jane defined 70 percent of the items correctly.

C. Bruce’s score was near the top of the class.

D. Emily completed 30 of the 40 math problems correctly.

23. Mr. Rich is a new teacher and is concerned about his class understanding the material being

taught. Which of the following types of assessment would best monitor his instruction?

A. A pretest at the beginning of the class.

B. Frequent class quizzes.

C. Standardized achievement tests.

D. Aptitude tests.


Chapter 2: Answer Key

1. C

2. C

3. B

4. A

5. B

6. D

7. D

8. A

9. C

10. C

11. B

12. A

13. B

14. D

15. B

16. C

17. C

18. D

19. D

20. A

21. A

22. C

23. B


Chapter 3 Instructional Goals and Objectives: Foundation for Assessment

Exercise 3-A

INSTRUCTIONAL OBJECTIVES AS LEARNING GOALS

LEARNING GOAL: Distinguishes between statements of learning process and learning

outcomes.

Directions: Indicate whether each of the following features describes a learning process (P) or a

learning outcome (O).

P O 1. Learns the meaning of terms.

P O 2. Develops a more favorable attitude toward reading.

P O 3. Demonstrates concern for the environment.

P O 4. Locates a position on a map.

P O 5. Practices interpreting charts and graphs.

P O 6. Describes the value of good study habits.

LEARNING GOAL: Writes well-stated outcomes.

Directions: In the spaces below (1) rewrite as learning outcomes each of the statements at the

top of the page that were classified as learning processes, and (2) write three general statements

of learning outcomes for a course or subject area.

1. Learning outcomes rewritten from process statements.

2. Three general learning outcomes for a course.

Note: Answers will vary.


Exercise 3-B

DOMAINS OF THE TAXONOMY (COGNITIVE, AFFECTIVE, PSYCHOMOTOR)

LEARNING GOAL: Identifies examples of instructional objectives belonging to each

taxonomy domain.

Directions: Indicate the taxonomy domain to which each of the following general instructional

objectives belongs by circling the appropriate letter.

KEY A = Affective

C = Cognitive

P = Psychomotor

A C P 1. Understands basic concepts.

A C P 2. Appreciates the contributions of scientists.

A C P 3. Evaluates a book.

A C P 4. Operates a slide projector.

A C P 5. Writes smoothly and legibly.

A C P 6. Demonstrates an interest in science.

LEARNING GOAL: Writes general instructional objectives that fit each taxonomy domain.

Directions: Write two general instructional objectives for each of the following domains of the

taxonomy.

Cognitive objectives:

Affective objectives:

Psychomotor objectives:

Note: Answers will vary.


Exercise 3-C

SELECTING APPROPRIATE INSTRUCTIONAL OBJECTIVES

LEARNING GOAL: Distinguishes between sound and unsound criteria for selecting

instructional objectives.

Directions: Indicate whether each of the following statements is a sound (S) or unsound (U)

criterion for selecting instructional objectives, by circling the appropriate letter.

S U 1. Instructional objectives should be limited to those learning outcomes that can be

measured objectively.

S U 2. Instructional objectives should be in alignment with the goals of the school.

S U 3. Instructional objectives should be concerned primarily with knowledge of facts.

S U 4. Instructional objectives should be selected in terms of their feasibility.

S U 5. Instructional objectives should specify the intended learning outcomes.

LEARNING GOAL: Describes the importance of selecting appropriate instructional objectives.

Directions: In your own words, describe the importance of carefully selecting instructional

objectives.

Note: Answers will vary.


Exercise 3-D

STATING GENERAL INSTRUCTIONAL OBJECTIVES

LEARNING GOAL: Distinguishes between well-stated and poorly stated general instructional

objectives.

Directions: For each of the following pairs of objectives, indicate the one that is best stated as a

general instructional objective by circling the letter of your answer (A or B).

1. A Reads supplementary references.

B Sees the importance of reading.

2. A Is aware of the value of money.

B Comprehends oral directions.

3. A Shows students how to make accurate computations.

B Judges the adequacy of an experiment.

4. A Demonstrates proficiency in laboratory skills.

B Gains minimum proficiency in mathematics.

5. A Studies weather maps.

B Constructs weather maps.

6. A Is familiar with the use of the library.

B Locates references in the library.

LEARNING GOAL: Rewrites poorly stated objectives as well-stated general instructional

objectives.

Directions: Rewrite as well-stated general instructional objectives each of the six poorly stated

objectives at the top of the page.

1.

2.

3.

4.

5.

6.

Note: Answers will vary.


Exercise 3-E

STATING SPECIFIC LEARNING OUTCOMES

LEARNING GOAL: Distinguishes between performance and non-performance statements of

specific learning outcomes.

Directions: For each of the following pairs of specific learning outcomes, indicate the one that is

stated in performance terms.

1. A States the principle.

B Realizes the value of the principle.

2. A Increases ability to read.

B Selects the main thought in a passage.

3. A Learns facts about current events.

B Relates facts in explaining current events.

4. A Distinguishes facts from opinions.

B Is aware that opinions should not be stated as facts.

5. A Grasps the meaning of terms when used in context.

B Defines the terms in his or her own words.

6. A Identifies the value of a given point on a graph.

B Determines the trend shown in a graph.

LEARNING GOAL: States specific learning outcomes in performance terms.

Directions: Write three specific learning outcomes, in performance terms, for each of the

following general instructional objectives.

Knows basic terms.

Demonstrates good study habits.

Interprets a weather map.

Note: Answers will vary.


Answers to Student Exercises

3-A 3-B 3-C 3-D 3-E

1. O 1. C 1. U 1. A 1. A

2. P 2. A 2. S 2. B 2. B

3. P 3. C 3. U 3. B 3. B

4. O 4. P 4. S 4. A 4. B

5. P 5. P 5. S 5. B 5. B

6. O 6. A 6. B 6. B


Chapter 3

Instructional Goals and Objectives: Foundation for Assessment

1. For measurement purposes, instructional goals and objectives should be stated in terms of the

A. instructional process.

B. learning process.

C. subject-matter content to be covered.

D. types of learning outcomes expected.

2. Content standards, such as those developed by the National Council of Teachers of

Mathematics, provide which of the following for teachers?

A. a general framework for developing curriculum specifications

B. comprehensive instructional materials for classroom use

C. detailed curriculum specifications

D. specifications of standards of performance for students

3. Which of the following is the best example of student learning?

A. Mary increases her speed in reading.

B. John interprets weather maps to predict next week’s weather.

C. Sara memorizes her weekly spelling words.

D. Charles practices playing his guitar for music class.

4. Recent research on learning and cognitive development has led to an increased emphasis on

which of the following?

A. basic skills instruction

B. learning hierarchies of sequential skills of increasing complexity

C. use of drill-and-practice learning activities

D. students constructing meaning from problem solving

5. When specifying learning outcomes to be used in the development of classroom tests and

assessments, teachers need to guard against an overemphasis on

A. application of principles.

B. higher level thinking skills.

C. easy to measure factual knowledge.

D. problem solving skills.

6. Which of the following factors should be considered first by teachers when assigning weight

to each learning outcome in an assessment?

A. the complexity of the objective

B. the emphasis given on popular standardized tests

C. the instructional time devoted to the topic

D. the number of times it will be assessed


7. Which of the following questions would best satisfy the criteria for selecting instructional

objectives?

A. Can they be quickly and easily assessed?

B. Are they representative of all disciplines?

C. Can they all be assessed on an essay exam?

D. Are they attainable by the students to be taught?

8. Which of the following would indicate the lowest level of learning for a student?

A. applies a principle to a specific situation.

B. explains a principle in his or her own words.

C. gives a textbook definition of a principle.

D. states an example of a principle.

9. Which of the following indicates the most specific learning outcome?

A. educational goal

B. developmental objective

C. standardized test goal

D. behaviorally stated objective

10. Which of the following is an example of an “application” behavior?

A. solving a math problem

B. reciting a poem

C. underlining a sentence

D. enjoying classical music

11. Which of the following is best stated as a general instructional objective?

A. applies learning, as needed

B. demonstrates how to use laboratory equipment

C. gains skill in reading

D. possesses ability to use reference materials

12. Which of the following is best stated as a specific learning outcome?

A. can tell when conclusions lack validity

B. develops ability to evaluate conclusions

C. knows when conclusions are valid

D. states valid conclusions

13. Which of the following is an example of a performance term?

A. appreciates

B. outlines

C. realizes

D. thinks


14. Which of the following is stated in performance terms?

A. explains the value of a hypothesis

B. increases his or her ability to recognize hypotheses

C. realizes the importance of testing hypotheses

D. sees the difference between a fact and a hypothesis

15. Which of the following best describes how teachers should address unanticipated learning

outcomes?

A. ignore them

B. include them in instruction as they occur

C. note them for future use, but do not assess them

D. document them as evidence of poor planning for the current term

16. The three major domains of the Taxonomy of Educational Objectives are

A. affective, cognitive, psychomotor.

B. attitude, knowledge, performance.

C. knowledge, understanding, application.

D. competency, attitudes, skills.

17. Which of the following provides the best example of a student showing analysis of material?

A. circles an answer

B. outlines a chapter

C. defines vocabulary terms

D. recites multiplication tables

18–25. Following is a list of statements that a teacher compiled to clarify what is meant by

understanding principles. If the statement is properly stated in performance terms, circle

P. If it is not properly stated in performance terms, circle N.

Key P = Performance.

N = Not performance.

P N 18. Makes a prediction using the principle studied.

P N 19. Describes situations in which the principle is applicable.

P N 20. Realizes the essential features of the principle.

P N 21. Is familiar with the uses of the principle.

P N 22. Explains the principle in his or her own words.

P N 23. States tenable hypotheses based on the principle.

P N 24. Identifies misapplications of the principle.

P N 25. Develops complete understanding of the principle.


26. Which of the following is considered a behavioral verb?

A. think

B. choose

C. appreciate

D. understand

27–29. Identify the correct cognitive taxonomic level for each behavior.

K = Knowledge

U = Understanding

Ap = Application

An = Analysis

S = Synthesis

E = Evaluation

27. Critique an artistic product based on sound objective principles.

28. Outline the major themes of a play.

29. Recite a poem.

30–32. Identify the correct domain from the Taxonomy of Educational Objectives.

C = Cognitive

A = Affective

P = Psychomotor

30. Choosing to go to an opera instead of a rock concert.

31. Hitting a tennis ball over the net.

32. Correctly solving an algebra problem.

33. List and describe the three domains of the Taxonomy of Educational Objectives. Give an

example of the behavior that might be appropriately included in each domain.

34. Describe a behavioral instructional objective. Why is it important that instructional

objectives include behavioral rather than nonbehavioral verbs? Give three examples of a

behavioral verb and three examples of a nonbehavioral verb.


Chapter 3: Answer Key

1. D

2. A

3. B

4. D

5. C

6. C

7. D

8. C

9. D

10. A

11. B

12. D

13. B

14. A

15. B

16. A

17. B

18. P

19. P

20. N

21. N

22. P

23. P

24. P

25. N

26. B

27. E

28. An

29. K

30. A

31. P

32. C

33. The three domains of the Taxonomy of Educational Objectives are Cognitive, Affective, and

Psychomotor. The cognitive domain includes factual and intellectual knowledge similar to

most content taught in school. The affective domain includes areas such as values, interests

and motivation. The psychomotor domain includes skills related to fine motor and gross

motor movements and skills. An example of a skill under the cognitive domain would be

correctly solving a set of math problems. A skill included under the affective domain would

be an interest in stamp collecting as a hobby. An example of a psychomotor skill would be

riding a bicycle.


34. A behavioral instructional objective is one in which the student must perform a quantitatively

measurable task. Such a task is measurable if two or more observers can agree that the

behavior did or did not take place. If the goal or required task is not stated behaviorally, it is

difficult to judge whether the content has been mastered and the objective achieved.

Examples of behavioral verbs are list, solve, and outline. Examples of nonbehavioral verbs

are know, understand, and appreciate.


Chapter 4 Validity

Exercise 4-A

VALIDITY AND RELATED CONCEPTS

LEARNING GOAL: Identifies the nature of validity.

Directions: Indicate which of the following statements concerning validity are correct (C) and

which are incorrect (I) by circling the appropriate letter.

C I 1. A test is by definition valid if it is consistent.

C I 2. Validity is a matter of degree (e.g., high, low).

C I 3. Validity is a general quality that applies to various uses of assessment results.

C I 4. Validity is a unitary concept.

C I 5. An objective test is by definition valid.

C I 6. Validity may be described by the correlation of assessment scores with a

criterion measure.

LEARNING GOAL: Distinguishes among validity, reliability, and usability.

Directions: Briefly describe the key feature of each concept.

Validity:

Reliability:

Usability:

Note: Answers will vary


Exercise 4-B

MAJOR VALIDITY CONSIDERATIONS

LEARNING GOAL: Identifies characteristics of the major validity considerations.

Directions: For each of the following statements, indicate which major validity consideration is

being described by circling the appropriate letter using the following key.

KEY A = content

B = construct

C = test-criterion relationships

D = consequences

A B C D 1. Can be expressed by an expectancy table.

A B C D 2. Infers a trait from observable behavior.

A B C D 3. Evaluates what happens when assessment results are used.

A B C D 4. Its correlation can range from –1.00 to +1.00

A B C D 5. Emphasizes the representativeness of the sample of tasks.

A B C D 6. Involves use of a table of specifications.

LEARNING GOAL: Writes an example that illustrates each of the four major validity

considerations.

Directions: Briefly describe an example of evidence that would be relevant for each major

consideration.

Content

Construct

Test-Criterion Relationships:

Consequences:

Note: Answers may vary.


Exercise 4-C

MEANING OF CORRELATION

LEARNING GOAL: Interprets correlation coefficients and the effects of various conditions on

them.

Directions: In each of the following pairs of statements, select the statement that indicates the

greater degree of relationship and circle the letter of your answer (A or B). Assume other

things are equal.

1. A A correlation coefficient of .60.

B A correlation coefficient of .10.

2. A A predictive validity coefficient.

B A concurrent validity coefficient.

3. A A predictive validity coefficient of .70.

B A predictive validity coefficient of .80.

4. A A correlation between test scores and a criterion measure obtained one week later.

B A correlation between test scores and a criterion measure obtained one year later.

5. A The concurrent validity of a test for the academically gifted standardized on

academically gifted students.

B The concurrent validity of a test for the academically gifted standardized on

students taken as a cross section of any given school.

LEARNING GOAL: Lists factors influencing a correlation coefficient.

Directions: List three factors that will cause correlation coefficients to be small.

Note: Answers will vary.


Exercise 4-D

EXPECTANCY TABLE

LEARNING GOAL: Interprets an expectancy table.

Directions: In the expectancy table below, the row for each score level shows the percentage of

students who earned a grade of A, B, C, D, or F. Review the table and answer the question

following it (“Chances” means chances in 100).

Percentage of Students

Score F D C B A Total

__________________________________________

115–134 0 12 20 26 42 100

95–114 10 18 18 24 30 100

75–94 32 26 18 18 6 100

__________________________________________

_______ 1. If Sara had a score of 120, what are her chances of obtaining a grade of A?

_______ 2. If Bob had a score of 113, what are his chances of obtaining a failing grade?

_______ 3. If Tanya had a score of 90, what are her chances of obtaining a grade of C or

higher?

_______ 4. How many times greater are Sara's chances than Tanya's of obtaining a grade of A?

_______ 5. What score levels provide the best prediction?

_______ 6. What score levels provide the weakest prediction?

LEARNING GOAL: Describes the advantages, limitations, and cautions in using expectancy

tables.

Directions: In the appropriate spaces below, describe the advantages, limitations, and cautions in

using expectancy tables.

Advantages:

Limitations:

Cautions in Interpreting:

Note: Answers will vary.
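For instructors who want to verify the keyed chances, the expectancy table above can be read mechanically: find the score band, then sum the percentages for the grades of interest. The sketch below is purely illustrative (the `chances` helper is our own naming, not from the text); the table values are copied from Exercise 4-D.

```python
# Expectancy table from Exercise 4-D: each row maps a score band to the
# percentage of students earning each grade ("chances" means chances in 100).
table = {
    (115, 134): {"F": 0, "D": 12, "C": 20, "B": 26, "A": 42},
    (95, 114):  {"F": 10, "D": 18, "C": 18, "B": 24, "A": 30},
    (75, 94):   {"F": 32, "D": 26, "C": 18, "B": 18, "A": 6},
}

def chances(score, grades):
    """Chances in 100 of earning any of the listed grades at this score."""
    for (low, high), row in table.items():
        if low <= score <= high:
            return sum(row[g] for g in grades)
    raise ValueError("score outside the table")

print(chances(120, ["A"]))            # Sara: 42
print(chances(113, ["F"]))            # Bob: 10
print(chances(90, ["C", "B", "A"]))   # Tanya, C or higher: 42
```

Sara's chances of an A (42 in 100) are seven times Tanya's (6 in 100), which is the basis of the keyed answer to item 4.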


Exercise 4-E

FACTORS AND CONDITIONS INFLUENCING VALIDITY

LEARNING GOAL: Identifies the influence of assessment practices on validity.

Directions: Indicate what influence each of the following assessment practices is most likely to

have on validity by circling the appropriate letter to the left of each statement, using the

following key.

KEY R = Raise validity; L = Lower validity.

R L 1. Increase item difficulty by using more complex sentence structure.

R L 2. Increase the number of items measuring each specific skill from five to ten.

R L 3. Replace multiple-choice items with short-answer items for measuring the ability

to define terms.

R L 4. Replace multiple-choice items by laboratory performance tasks for measuring

ability to conduct experiments.

R L 5. Use selection-type items instead of supply-type items to measure spelling ability.

R L 6. Use an essay test to measure factual knowledge of historical events.

LEARNING GOAL: Lists factors that lower assessment validity.

Directions: In the space provided below, list as many factors as you can think of that might

lower the validity of a classroom assessment.

Note: Answers will vary.


Answers to Student Exercises

4-A 4-B 4-C 4-D 4-E

1. I 1. C 1. A 1. 42 1. L

2. C 2. B 2. B 2. 10 2. R

3. I 3. D 3. B 3. 42 3. R

4. I 4. C 4. A 4. 7 4. R

5. I 5. A 5. A 5. 115–134 5. L

6. C 6. A 6. 95–114 6. L


Chapter 4

Validity

1. The term validity, as used in testing and assessment, refers to which of the following?

A. interpretation of the results

B. items or tasks in the test or assessment

C. sets of scores

D. setting learning standards

2. The current concept of validity is best described as

A. a collection of test items.

B. test scores over time.

C. a statistical concept.

D. a unitary concept.

3. All the following relationships between validity and reliability are possible EXCEPT which

of the following?

A. high validity and low reliability.

B. high validity and high reliability.

C. low validity and low reliability.

D. low validity and high reliability.

4. Which of the following data sources would a teacher likely examine to obtain evidence of

validity based on content considerations?

A. frequency distribution

B. correlation coefficient size

C. description of criterion used

D. table of specifications

5. If a test is valid for one group of individuals it most likely means that it

A. is valid for all groups of individuals

B. may not be valid for other groups of individuals

C. by definition possesses strong content validity

D. holds strong construct validity

6. If a teacher wants to generalize from the sample of items in a test to the larger domain of

achievement that the sample represents, the teacher is concerned with

A. a concurrent validation study.

B. content considerations.

C. criterion-related evidence of validity.

D. evidence of face validity.


7. When evaluating a standardized achievement test, the most important validity consideration

is the

A. construct claim made by the publisher.

B. content covered by the test.

C. variety of questions offered on the test.

D. predictive relationship with a criterion.

8. Which of the following would likely be a validity consideration inferred from observable

behavior?

A. Construct

B. Content

C. Criterion

D. Consequence

9. Criterion-related validity considerations typically include which of the following?

A. correlations

B. cut-off scores

C. psychological traits

D. construct variables

10. Which of the following is an example of concurrent validity?

A. two sets of behaviors occurring at once

B. updating classroom behaviors

C. known high science achievers scoring high on a biology achievement test

D. the setting of a standard of performance that is expected to be reached

11. Criterion-related evidence of validity can best be obtained by examining which of the

following?

A. reliability coefficient

B. expectancy table

C. table of specifications

D. test sample

12. Interpreting a student's chances of success in college can be most effectively done through

the use of which of the following?

A. construct validity

B. reliability

C. expectancy tables

D. test blueprints

13. Which of the following is an example of criterion-related validity?

A. permanence

B. construct

C. predictive

D. content


Below are the names of four major considerations in the evaluation of the validity of particular

interpretations and uses of assessment results. For each statement indicate which consideration is

of primary importance and circle the appropriate letter (A = Content, B =

Criterion relationships, C = Construct, D = Consequences).

A B C D 14. A test to substitute for a more complex measure.

A B C D 15. A test of science principles.

A B C D 16. An assessment of musical abilities.

A B C D 17. A test of school achievement.

A B C D 18. A proposed math test correlates with a highly valid math test.

A B C D 19. Test results used to place children into reading groups.

A B C D 20. A test to select college students.

A B C D 21. A test used to determine grade-to-grade promotion.

Indicate what influence each of the following assessment practices is most likely to have on

validity by circling the appropriate letter (R = raise validity, L = lower validity).

R L 22. Increasing the number of test items used.

R L 23. Including irrelevant difficulty in test items.

R L 24. Changing the administration rules on a standardized test.

R L 25. Increasing the reading level of test questions.

R L 26. Using a variety of assessment procedures.

R L 27. Telling students how extended responses will be scored.

R L 28. Placing the most difficult items at the beginning of the test.

29. Define each of the four types of validity.

30. What is a construct? Give an example. How might one go about assessing the construct

validity of a test?

31. What is correlation? Give an example. What is the statistic by which it is expressed? What

are the outside ranges of this statistic?


Chapter 4: Answer Key

1. A

2. D

3. A

4. D

5. B

6. B

7. B

8. A

9. A

10. C

11. B

12. C

13. C

14. B

15. A

16. C

17. A

18. B

19. D

20. B

21. A

22. R

23. L

24. L

25. L

26. R

27. R

28. L

29. The four types of validity described in the text are: content, construct, criterion-related and

consequence. Content validity refers to the idea that a test should assess a representative

sample of all of the content presented to students. Construct validity refers to the idea that a

test measures a hypothetical concept or trait that is inferred from observable behavior.

Criterion-related validity refers to the idea that test scores adequately relate to, or predict, a

criterion measure of the performance they are designed to reflect. Consequence validity refers to

the adequacy of the decisions and interpretations that are made from test results.


30. Construct validity refers to the testing of a hypothetical trait or set of traits that a person

exhibiting a set of observable behaviors is thought to possess. Examples of constructs include

giftedness, self-esteem and reading comprehension skill. Construct validity is usually

assessed by giving the test in question to a group of individuals who are believed by experts

to hold high levels of that concept. For example, a test of giftedness might be given to a

group of individuals judged to be highly gifted. They should score high on that test in order

to demonstrate the test’s construct validity.

31. Correlation is the degree of relatedness between two variables or events. That is, two events

are correlated when a change in one variable or event is accompanied by an expected change

in the second variable or event. An example of two events being correlated is that as people

exercise more they tend to drink more fluids. Correlation is expressed by the Pearson

product-moment correlation coefficient (r). The outside limits of this statistic are –1.00 to +1.00.
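As a purely illustrative supplement, the Pearson product-moment coefficient described in this answer can be computed directly from paired scores. The function name and the exercise-and-fluids data below are hypothetical, chosen to echo the example in the answer.

```python
# Pearson product-moment correlation coefficient (r): covariance of the two
# variables divided by the product of their standard deviations.  The result
# always falls between -1.00 and +1.00.

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

hours_exercised = [1, 2, 3, 4, 5]    # hypothetical data
cups_of_fluid   = [3, 5, 4, 7, 8]    # tends to rise with exercise
print(round(pearson_r(hours_exercised, cups_of_fluid), 2))
```

A perfectly linear positive relationship yields r = +1.00, a perfectly inverse one yields r = –1.00, and unrelated variables yield values near zero.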


Chapter 5 Reliability and Other Desired Characteristics

Exercise 5-A

COMPARISON OF VALIDITY AND RELIABILITY

LEARNING GOAL: Identifies similarities and differences between validity and reliability.

Directions: Indicate whether each of the following statements is characteristic of validity (V),

reliability (R), or both (B), by circling the appropriate letter to the left of the statement.

V R B 1. Can be expressed by an expectancy table or regression equation.

V R B 2. Refers to the consistency of a measurement.

V R B 3. Is often based on a comparison with an external criterion.

V R B 4. May be used to predict future behaviors.

V R B 5. Compares performance on two halves of an assessment.

V R B 6. Contributes to more effective classroom teaching.

LEARNING GOAL: Explains the relationship between validity and reliability.

Directions: In the appropriate spaces below, briefly explain each of the following statements.

1. If assessment results are highly valid, they will also be highly reliable.

2. If assessment results are highly reliable, they may or may not be valid.

3. In selecting an assessment, validity has priority over reliability.

Note: Answers will vary.


Exercise 5-B

METHODS FOR DETERMINING RELIABILITY

LEARNING GOAL: Distinguishes among the methods for determining reliability.

Directions: For each of the following statements, indicate which method of determining

reliability is being described by circling the appropriate letter. Use the following key.

KEY: A = Test-retest, same form; B = Equivalent form,

C = Equivalent form, test-retest; D = Coefficient alpha; E = Split half

A B C D E 1. Provides an inflated reliability coefficient for a speeded test.

A B C D E 2. Would probably be wise to use if the same test is to be administered twice

to the same students.

A B C D E 3. Pretest, posttest.

A B C D E 4. Do two versions of my test measure identical content?

A B C D E 5. Correlation coefficient must be adjusted with the Spearman-Brown

formula.

A B C D E 6. Student scores should be consistent when they are given the same test

twice.

LEARNING GOAL: Summarizes the procedure for obtaining various types of reliability

coefficients.

Directions: Briefly describe the procedure for obtaining each type of reliability coefficient.

Test-retest:

Equivalent forms:

Split-half:

Interrater agreement:

Note: Answers will vary.
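The Spearman-Brown adjustment mentioned for the split-half method steps a half-test correlation up to an estimate of full-length reliability: r_full = 2r_half / (1 + r_half). The sketch below is illustrative only; the half-test correlation of .60 is a hypothetical value, not taken from the text.

```python
# Spearman-Brown prophecy formula.  The correlation between two half-tests
# understates the reliability of the full-length test, so it is stepped up.
# General form: reliability of a test lengthened by length_factor.

def spearman_brown(r_half, length_factor=2):
    return length_factor * r_half / (1 + (length_factor - 1) * r_half)

r_half = 0.60                              # hypothetical odd-even correlation
print(round(spearman_brown(r_half), 2))    # 2 * 0.60 / 1.60 = 0.75
```

Note that the adjusted coefficient is always higher than the half-test correlation, which is why split-half coefficients must be corrected before they are compared with test-retest or equivalent-forms coefficients.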


Exercise 5-C

RELATING RELIABILITY TO THE USE OF ASSESSMENT RESULTS

LEARNING GOAL: Selects the reliability method that is most relevant to a particular use of

assessment results.

Directions: For each of the following objectives, select the reliability method that is most

relevant by circling the appropriate letter using the following key.

KEY: T = Test-retest, E = Equivalent form, S = Split-half, I = Interrater consistency.

T E S I 1. Determining whether test scores on school records are still dependable.

T E S I 2. Selecting an achievement test to measure growth over one school year

(pre-test, post-test)

T E S I 3. Determining whether two versions of a test measure the same content.

T E S I 4. Evaluating the adequacy of judgmental scoring of performances on

a complex task.

T E S I 5. Seeking support for the adequacy of the sample of test items.

T E S I 6. Determining whether an informal classroom assessment has

internal consistency.

LEARNING GOAL: Justifies the selection of a reliability method for a particular test use.

Directions: For each of statements 1 through 6 above, write a sentence or two to justify why you

think the selected reliability method would provide the most relevant information for that

particular use.

1.

2.

3.

4.

5.

6.

Note: Answers will vary.


Exercise 5-D

RELIABILITY COEFFICIENT AND STANDARD ERROR OF MEASUREMENT

LEARNING GOAL: Identifies the similarities and differences between the two basic methods of

expressing reliability.

Directions: Indicate whether each of the following statements is more characteristic of the

reliability coefficient (R), the standard error of measurement (E), or both (B) by circling the

appropriate letter to the left of the statement.

R E B 1. Indicates the degree to which a set of scores contains error.

R E B 2. Is high when the range of scores is low.

R E B 3. Cannot be computed without the other.

R E B 4. Useful in selecting a test for a particular grade.

R E B 5. Increases as the spread of scores increases.

R E B 6. Would be zero if the test were perfectly reliable.

LEARNING GOAL: Describes the use of standard error in interpreting test scores.

Directions: In the appropriate spaces below describe how confidence bands (error bands) are

used to interpret each of the following.

1. An individual test score.

2. The difference between two test scores.

Note: Answers will vary.


Exercise 5-E

FACTORS INFLUENCING RELIABILITY AND INTER-RATER CONSISTENCY

LEARNING GOAL: Identifies the influence of assessment practices on reliability.

Directions: Indicate whether each of the following practices is most likely to raise (R), lower

(L), or have no effect (N) on reliability.

R L N 1. Add more items like those in the test.

R L N 2. Remove ambiguous tasks from the assessment.

R L N 3. Add five items that everyone answers correctly.

R L N 4. Replace a multiple-choice test with an essay test.

R L N 5. Modify the assessment tasks to obtain a wide spread of scores.

R L N 6. Replace a 10-item multiple-choice quiz with a 10-item true-false quiz.

LEARNING GOAL: Computes and interprets inter-rater consistency expressed as the percent of

exact agreement.

Directions: Using the information from the following table, compute the percent exact agreement

for the scores provided by two independent raters.

________________________________________________________________________
                          Scores Assigned by Rater 1
               _________________________________________________
               Score      1      2      3      4    Row Total
               _________________________________________________
Scores           4        0      1      4     12       17
Assigned         3        1      5     17      6       29
by               2        6     18      7      1       32
Rater 2          1       16      5      1      0       22
               _________________________________________________
Column Total             23     29     29     19      100
________________________________________________________________________

Percent exact agreement =

Briefly interpret the results:

Note: Answers will vary.
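Percent exact agreement is the sum of the diagonal cells, where both raters assigned the same score, divided by the total number of ratings. A minimal Python sketch using the cell counts from the table above:

```python
# Cell counts keyed by (rater 2 score, rater 1 score), from the table above.
counts = {
    (4, 1): 0,  (4, 2): 1,  (4, 3): 4,  (4, 4): 12,
    (3, 1): 1,  (3, 2): 5,  (3, 3): 17, (3, 4): 6,
    (2, 1): 6,  (2, 2): 18, (2, 3): 7,  (2, 4): 1,
    (1, 1): 16, (1, 2): 5,  (1, 3): 1,  (1, 4): 0,
}
total = sum(counts.values())                                   # 100 ratings
exact = sum(n for (r2, r1), n in counts.items() if r2 == r1)   # diagonal cells
percent_exact_agreement = 100 * exact / total                  # 63.0 percent
```

Here the two raters agreed exactly on 63 of the 100 performances, so percent exact agreement is 63%; interpretations of that figure will vary.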


Answers to Student Exercises

5-A 5-B 5-C 5-D 5-E

1. V 1. D 1. T 1. B 1. R

2. R 2. B 2. T 2. B 2. R

3. V 3. A 3. E 3. E 3. N

4. V 4. B 4. I 4. E 4. N

5. R 5. D 5. E 5. E 5. I

6. B 6. A 6. S 6. E 6. N


Chapter 5

Reliability and Other Desired Characteristics

1. The term reliability is closest in meaning to which of the following terms?

A. consistency

B. objectivity

C. practicality

D. validity

2. The term reliability, as used in testing and assessment, refers to which of the following?

A. accuracy of test construction

B. method of test interpretation

C. test or assessment results

D. method of test construction

3. A set of test scores may be classified as which of the following?

A. inconsistent and accurate

B. unreliable and valid

C. inaccurate and valid

D. reliable and invalid

4. Reliability can be best determined by

A. analyzing the overall assessment plan.

B. correlating assessment scores.

C. comparing test scores to a criterion.

D. comparing the errors examiners make.

5. Which of the following types of reliability provides a measure of internal consistency?

A. test blueprint analysis method

B. equivalent-forms with no time interval

C. Kuder-Richardson method

D. test-retest with a one-month interval

6. Which of the following methods of determining reliability is most likely to provide the

smallest reliability coefficient?

A. administer Form A—two-week interval—administer Form B.

B. administer Form A—three-month interval—administer Form B.

C. administer Form A—no time interval—administer Form B.

D. administer Form A and Form B on the same day and apply the Kuder-Richardson

formula.

7. Which of the following methods of estimating reliability is easiest to obtain?

A. equivalent-forms (with a time interval)

B. equivalent-forms (without a time interval)

C. split-half method

D. test-retest method


8. Which of the following types of reliability provides an inflated reliability coefficient for a

speeded test?

A. equivalent-forms (with a time interval)

B. equivalent-forms (without a time interval)

C. split-half method

D. test-retest method

9. Which of the following methods of estimating reliability takes into account the greatest

number of types of consistency in scores?

A. test-retest (immediate)

B. test-retest (time interval)

C. equivalent-forms (immediate)

D. equivalent-forms (time interval)

10. Which of the following types of reliability is the best to assess when students take a pretest

and posttest?

A. equivalent-forms method

B. Kuder-Richardson method

C. split-half method

D. interrater reliability method

11. The split-half method provides a measure of

A. equivalence.

B. internal consistency.

C. stability.

D. external reliability.

12. The standard error of measurement refers to the error involved in which of the following?

A. any assessment

B. computing the standard deviation

C. setting mastery standards

D. standardized testing

13. The standard error of measurement is especially useful for which of the following

calculations?

A. comparing the reliability of different tests

B. converting raw scores to standard scores

C. estimating validity coefficients

D. interpreting individual test scores

14. Whenever the reliability coefficient is low, the standard error of measurement will be:

A. zero.

B. low.

C. high.

D. unchanged.


15. A reliability coefficient (r) and a standard error of measurement (SEM) were computed on a

test with 10 items. What would happen to those statistics if the test were increased to 40

items?

A. r would decrease, SEM would increase.

B. r would decrease, SEM would stay the same.

C. r would increase, SEM would decrease.

D. r would increase, SEM would stay the same.

16. As the reliability coefficient (r) increases the SEM

A. increases.

B. decreases.

C. stays the same.

D. increases at first then decreases to zero.

17. If a test has a standard deviation of 8 and an equivalent-form reliability of .75, the

standard error of measurement is

A. 2.

B. 3.

C. 4.

D. 6.

18. The standard error of measurement tends to be smallest for which of the following scores?

A. average

B. extremely high

C. extremely low

D. high and low

19. A set of test scores is most likely to provide a small reliability coefficient when there is a

large

A. number of test items.

B. number of test scores.

C. spread of scores.

D. standard error of measurement.

20. Which of the following test characteristics would remain most constant over different

groups?

A. reliability coefficient

B. standard error of measurement

C. standard deviation

D. validity coefficient


21. The reliability of a criterion-referenced, performance mastery test focuses on which of the

following?

A. decision consistency

B. spread of scores

C. stability of test scores

D. standard error of measurement

22. Which of the following would be the best procedure to use for improving the reliability of a

classroom test?

A. making items easier

B. making items more difficult

C. increasing the number of items or tasks

D. using more extended-response tasks

23. An achievement test is most useful if it is

A. easy to score.

B. comprised of different question types.

C. reliable and valid.

D. given in essay form.

24. An SEM for a given test is 3. A student gets a score of 75 on the test. What are the limits of

the confidence band on this student’s score?

A. 69–71

B. 72–78

C. 79–82

D. 80–86

25. As SEM increases, what happens to the confidence band?

A. It widens.

B. It narrows.

C. It stays the same.

D. It automatically goes to zero.

26. As reliability increases, what happens to the confidence band?

A. It widens.

B. It narrows.

C. It stays the same.

D. It automatically goes to zero.

27. Two raters are asked to score an essay test. A total of 100 students took the test. The raters

scored the test and gave the same grade on 40 of the tests. What was their interrater reliability?

28. What does interrater reliability measure? In what types of cases or assessments is it

important?

29. Describe the factors that influence reliability. Provide an example of each.


30. What factors should be taken into account when deciding how high reliability should be?


Chapter 5: Answer Key

1. A

2. C

3. D

4. B

5. C

6. B

7. C

8. C

9. D

10. A

11. B

12. A

13. D

14. C

15. C

16. B

17. C

18. A

19. D

20. B

21. A

22. C

23. C

24. B

25. A

26. B

27. 40%

28. Interrater reliability is the degree of agreement between two raters that a given behavior has

taken place. It is of particular importance in performance or behavioral assessment situations.

29. Factors that influence reliability are number of assessments performed, the spread in the

distribution of scores, the objectivity of the questions in the assessment, and the method of

estimating reliability.

30. Factors influencing how high a reliability coefficient must be for a given assessment

include the importance of the decisions to be made and whether the decision is final,

irreversible, unconfirmable by other results, concerned with individuals rather than

groups, and lasting in its consequences.
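The computations behind items 17, 24, and 27 above can be verified with a short sketch (Python; the function names are illustrative, not from the text):

```python
import math

def standard_error_of_measurement(sd, reliability):
    """SEM = standard deviation times the square root of (1 - reliability)."""
    return sd * math.sqrt(1 - reliability)

def confidence_band(score, sem):
    """Band of one SEM on either side of an observed score."""
    return (score - sem, score + sem)

sem_item_17 = standard_error_of_measurement(8, 0.75)  # 8 * sqrt(.25) = 4.0
band_item_24 = confidence_band(75, 3)                 # (72, 78)
agreement_item_27 = 100 * 40 / 100                    # 40.0 percent exact agreement
```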


Chapter 6 Planning Classroom Tests and Assessments

Exercise 6-A

TYPES AND USES OF CLASSROOM TESTS AND ASSESSMENTS

LEARNING GOAL: Relates the type of assessment to the information needed.

Directions: For each of the following questions, indicate which type of assessment provides

the most useful information by circling the appropriate letter to the left of the question.

KEY P = Placement

F = Formative

S = Summative

P F S 1. Are students making satisfactory progress in learning to make

connections among major mathematical concepts?

P F S 2. What types of errors are students making in learning grammar?

P F S 3. Should Carman enroll in an advanced mathematics course?

P F S 4. Is Michael ready for instruction on the new unit?

P F S 5. What final grade should Lizanne receive in the science course?

P F S 6. How do my students rank in achievement?

LEARNING GOAL: States whether a criterion-referenced or norm-referenced test is more useful

for a particular use and justifies the choice.

Directions: For each of the questions 1–6 above, (1) state whether a criterion-referenced test or a

norm-referenced test would provide more useful information, and (2) explain, in a sentence or

two, why you think that test type would be more useful.

1.

2.

3.

4.

5.

6.

Note: Answers will vary.


Exercise 6-B

SPECIFICATIONS FOR CLASSROOM TESTS AND ASSESSMENTS

LEARNING GOAL: Identifies the procedures involved in preparing specifications for classroom

tests and assessments.

Directions: For each of the following statements, determine whether the procedure is a desirable

(D) or undesirable (U) practice when preparing specifications for tests and assessments. Circle

the appropriate letter to the left of the statement.

D U 1. Start by identifying the intended learning outcomes.

D U 2. Limit the specifications to those outcomes that can be measured objectively.

D U 3. Consider the instructional emphasis when specifying the sample of items and

tasks.

D U 4. Increase the relative weighting of topics by including more items on those topics.

D U 5. Use a table of specifications for summative tests only.

D U 6. Consider the purpose of testing when determining item difficulty.

LEARNING GOAL: Explains the importance and nature of using a table of specifications.

Directions: Briefly explain each of the following statements in the space that follows it.

1. Well-defined specifications contribute to validity.

2. Well-defined specifications contribute to interpretability of the results.

3. Tables of specifications may differ for end of unit and end of course assessments.

Note: Answers will vary.


Exercise 6-C

USE OF OBJECTIVE ITEMS AND PERFORMANCE ASSESSMENT TASKS

LEARNING GOAL: Identifies whether objective items or performance assessment tasks are

more appropriate for a given condition.

Directions: For each of the following conditions, determine whether objective (O) items or

performance (P) assessment tasks would be more appropriate. Circle the correct letter to the left

of each statement.

O P 1. A broad sampling of learning outcomes is desired.

O P 2. The need is to measure ability to organize.

O P 3. Probably offers the highest interrater reliability.

O P 4. Time available for scoring is short.

O P 5. The need is to measure knowledge of important facts and major concepts covered

throughout the semester.

O P 6. The need is to measure learning at the synthesis level.

LEARNING GOAL: States whether objective items or performance tasks are more useful for

measuring a particular instructional objective and justifies the choice.

Directions: For each of the following general instructional objectives, (1) state whether

objective items or performance tasks would be more appropriate, and (2) explain, in a sentence

or two, why you think that approach would be more appropriate.

1. Knows specific facts.

2. Interprets a weather map.

3. Evaluates a plan for an experiment.

Note: Answers will vary.


Exercise 6-D

SELECTING SPECIFIC OBJECTIVE-TYPE ITEMS FOR CLASSROOM TESTS

LEARNING GOAL: Identifies the most relevant objective-type items for a given specific

learning outcome.

Directions: Indicate the type of objective test item that is most appropriate for measuring each of

the specific learning outcomes listed below by circling the appropriate letter to the left of the

outcome, using the following key.

KEY A = Short answer, B = True-false, C = Matching, D = Multiple-choice.

A B C D 1. Links inventors and their inventions.

A B C D 2. Distinguishes between correct and incorrect statements.

A B C D 3. Recalls chemical formulas.

A B C D 4. Identifies the correct date for a historical event.

A B C D 5. Reduces fractions to lowest terms.

A B C D 6. Selects the best reason for an action.

LEARNING GOAL: States specific learning outcomes that can be measured most effectively by

each item type.

Directions: For each of the following types of objective test items, state two specific learning

outcomes that can be measured most effectively by that item type.

Short-answer:

True-false:

Matching:

Multiple-choice:

Note: Answers will vary.


Exercise 6-E

PREPARING CLASSROOM TESTS AND ASSESSMENTS

LEARNING GOAL: Distinguishes between sound and unsound procedures for constructing

classroom tests and assessments.

Directions: Indicate whether each of the procedures listed below is sound (S) or unsound (U) in

the construction of classroom tests and assessments by circling the appropriate letter to the left of

the statement.

S U 1. Using a table of specifications in test preparation.

S U 2. Writing more test items and assessment tasks than needed.

S U 3. Including a large number of items and tasks for each interpretation.

S U 4. Writing items on the day before testing.

S U 5. Including some clues on items to aid struggling learners.

S U 6. Putting items and tasks aside for a while before reviewing them.

LEARNING GOAL: Describes the role of item difficulty in preparing classroom tests.

Directions: Describe what is appropriate item difficulty for tests that are designed for each

particular type of interpretation and explain why they differ.

Norm-referenced interpretation:

Criterion-referenced interpretation:

Why they differ:

Note: Answers will vary.


Answers to Student Exercises

6-A 6-B 6-C 6-D 6-E

1. F 1. D 1. O 1. C 1. S

2. F 2. D 2. P 2. B 2. S

3. P 3. D 3. O 3. A 3. S

4. P 4. D 4. O 4. D 4. U

5. S 5. U 5. O 5. A 5. U

6. S 6. D 6. P 6. D 6. S


Chapter 6

Planning Classroom Tests and Assessments

1. Which of the following is the first consideration in planning for a classroom test or

assessment?

A. Should it be criterion-referenced or norm-referenced?

B. Should it be objective or performance-based?

C. What will the results be used for?

D. What content should be covered?

2. Which of the following types of assessment should be used to evaluate student progress in

learning a unit on multiplication?

A. Diagnostic

B. Formative

C. Placement

D. Summative

3. A pretest is a type of

A. diagnostic test.

B. formative test.

C. placement test.

D. summative test.

4. To certify student accomplishment or assign final grades it would be best to use which of the

following?

A. diagnostic assessment

B. formative assessment

C. readiness assessment

D. summative assessment

5. Using a table of specifications will most likely improve a test’s

A. objectivity.

B. practicality.

C. reliability.

D. validity.

6. Failure to use proper specifications for tests and assessments will most likely result in an

overemphasis on which of the following?

A. difficult material

B. state-mandated curriculum

C. factual knowledge

D. writing ability


7. The distribution of items and tasks in a table of specifications should reflect the relative

A. importance of the objectives.

B. objectivity of measurement.

C. practicality of measurement.

D. timeliness of the topic.

8. If a classroom test measures recall of factual information only, it is apt to lack which of the

following?

A. interpretability

B. reliability

C. usability

D. validity

9. Constructing a test with items from a single cell of a two-way (content by objective) table of

specifications is likely to

A. decrease validity and decrease reliability.

B. decrease validity and increase reliability.

C. increase validity and decrease reliability.

D. increase validity and increase reliability.

10. A table of specifications is also referred to as which of the following?

A. a one-way chart

B. a scatter plot

C. a test blueprint

D. a frequency distribution

11. The weight assigned to each instructional objective in a table of specifications should be

determined by which of the following factors?

A. whether it comes first or last in the order of instruction.

B. the instructional time devoted to it.

C. whether or not it is represented on state standardized tests.

D. the time required to respond.

12. The two major classes of essay questions are

A. extended-response and restricted-response.

B. short answer and selection.

C. short answer and supply.

D. supply and selection.

13. One advantage of the supply items over performance-based tasks is that they

A. can be used to measure complex outcomes.

B. have a more desirable influence on student learning.

C. require only paper and pencil.

D. require less time to prepare.


14. Which characteristic of a test or assessment is apt to be increased following the

principle, “Use the type of test item or assessment task that measures a learning

outcome most directly”?

A. Objectivity.

B. Practicality.

C. Reliability.

D. Validity.

15. To measure recall of important historical dates it would be best to use which of the following

types of items?

A. matching

B. multiple-choice

C. short-answer

D. true-false

16. To measure integration and application of critical concepts it would be best to use which of

the following question types?

A. extended-response performance tasks

B. multiple-choice items

C. restricted-response essays

D. true-false items

17. Which of the following types of objective test item is likely to be most appropriate for

measuring the following objective: “Distinguishes between fact and opinion”?

A. Matching

B. Multiple-choice

C. Short-answer

D. True-false

18. One advantage of the multiple-choice item over other selection type items is that they

A. eliminate guessing.

B. are easier to construct.

C. are easier to score.

D. provide clues to misunderstandings.

19. For a criterion-referenced interpretation, the difficulty should be determined by which of the

following factors?

A. length of the test

B. nature of the learning tasks

C. spread of the scores

D. type of items used


20. The most desirable way to increase the difficulty of a classroom test is to do which of the

following?

A. introduce irrelevant difficulty into the items

B. include more higher-level learning outcomes

C. include more obscure instructional content

D. use a longer test and shorter time limits

21. Removing clues to the answer from test items is most likely to improve the test’s

A. objectivity.

B. practicality.

C. reliability.

D. validity.

22. Using specifications when constructing test items and assessment tasks is most likely to

improve the

A. objectivity.

B. reliability.

C. sampling.

D. standardization.

23. Substituting multiple-choice items for extended-performance tasks is most likely to increase

which of the following?

A. objectivity

B. practicality

C. reliability

D. the number of items in the test

24. The answer to an item in a classroom test should be one that is

A. plausible under specific circumstances.

B. agreed upon by experts.

C. stated somewhere in the textbook glossary.

D. mentioned in the classroom.

25. The reading level of a test item should be at which of the following levels?

A. higher than the reading level of the student.

B. lower than the reading level of the student.

C. the same as the reading level of the student.

D. one level higher than the average of the class.

26. The most basic principle in selecting the type of test items and assessment tasks is to select

item types that are

A. the most direct measure of the intended learning outcome.

B. easy to construct.

C. the most challenging for students to answer in a short time period.

D. liked best by students.


27. What are the two types of tables of specifications? How are they different? When would you

use each?

28. Discuss the difference between selection-type and supply-type items. What are two

advantages and disadvantages of each?

29. Identify a type of learning outcome where you would prefer to use matching items and

explain why.


Chapter 6: Answer Key

1. C

2. B

3. C

4. D

5. D

6. C

7. A

8. D

9. B

10. C

11. B

12. A

13. C

14. D

15. C

16. A

17. D

18. D

19. B

20. B

21. D

22. C

23. D

24. B

25. B

26. A

27. The two types of table of specifications are a two-way table and a one-way table. A two-way

table has the content topics and/or subtopics down the y-axis of the table and the taxonomic

levels of the objectives (from knowledge through evaluation) across the x-axis. The one-way

table has the content or topics running down the y-axis and the number of items running

across the x-axis. The two-way table will be the best type to use in most cases.
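As an illustration of answer 27, a two-way table of specifications can be represented as content topics crossed with taxonomic levels; the topics and item counts below are hypothetical:

```python
# Hypothetical two-way table of specifications for a 25-item mathematics unit test:
# rows are content topics, columns are taxonomic levels, and each cell gives the
# number of items allotted to that topic-level combination.
spec = {
    "Fractions": {"Knowledge": 4, "Comprehension": 3, "Application": 3},
    "Decimals":  {"Knowledge": 3, "Comprehension": 3, "Application": 2},
    "Percents":  {"Knowledge": 2, "Comprehension": 2, "Application": 3},
}
items_per_topic = {topic: sum(cells.values()) for topic, cells in spec.items()}
total_items = sum(items_per_topic.values())  # 25 items in all
```

The row and column totals make the relative weighting of each topic and taxonomic level explicit, which is the point of using the table when building the test.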

28. Selection items are those in which the correct answer is given to a student along with a

number of incorrect answers. The student is then asked to select the correct answer. Supply

items are those in which no answers are provided to the student and the student is to supply

the answer on his/her own, from memory. One advantage of selection items is that more

of them can be included on a test than supply items, and thus sampling of content is better. A

second advantage is that they are more objective to score. A disadvantage of selection items

is that they do not measure higher order learning outcomes and are not very “real-world”

(i.e., most tasks in life are not multiple choice). Advantages to supply items are that they are

more “real world” and that they measure higher-order learning outcomes.

29. Matching items work well when you wish to have students understand the relationship

between two facts or events. An example of a good use of matching questions would be in a

social studies lesson where the names of major explorers are in one list and their discoveries

in another list, and the student task is to match the explorers to their discoveries.


Chapter 7 Constructing Objective Test Items: Simple Forms

Exercise 7-A

CHARACTERISTICS OF SHORT-ANSWER, TRUE-FALSE, AND MATCHING ITEMS

LEARNING GOAL: Distinguishes among the characteristics of different item types.

Directions: Indicate which type of objective test item best fits each of the characteristics listed

below by circling the appropriate letter, using the following key.

KEY: S = Short answer, T = True-false, M = Matching.

S T M 1. Is classified as a supply-type item.

S T M 2. Most effective when relationships are involved.

S T M 3. Is most influenced by guessing.

S T M 4. Is most difficult to score.

S T M 5. Directions are most difficult to write for this type.

S T M 6. Correct answer may be obtained on the basis of misinformation.

LEARNING GOAL: States advantages and limitations of each item type.

Directions: For each of the following types of objective test items, state one advantage and one

limitation.

Short-answer item

Advantage:

Limitation:

True-false item

Advantage:

Limitation:

Matching item

Advantage:

Limitation:

Note: Answers will vary.


Exercise 7-B

EVALUATING AND IMPROVING SHORT-ANSWER ITEMS

LEARNING GOAL: Identifies common faults in short-answer items.

Directions: Indicate the type of fault, if any, in each of the following short-answer items by

circling the appropriate letter, using the following key.

KEY A = Has no faults, B = Has more than one correct answer,

C = Contains clue to the answer

A B C 1. John Glenn first orbited the earth in_______.

A B C 2. In what year did Burgoyne surrender at Saratoga? _______.

A B C 3. The United Nations Building is located in the City of _______ _______.

A B C 4. An animal that eats only plants is classified as _______.

A B C 5. Test specifications can be indicated by a table of_______.

A B C 6. Abraham Lincoln was born in _______.

LEARNING GOAL: Improves defective short-answer items.

Directions: Rewrite as well-constructed short-answer items each of the faulty items in 1–6

above. If an item has no faults, write no faults in the space.

1.

2.

3.

4.

5.

6.

Note: Answers will vary.


Exercise 7-C

EVALUATING AND IMPROVING TRUE-FALSE ITEMS

LEARNING GOAL: Identifies common faults in true-false items.

Directions: Indicate the type of fault, if any, in each of the following true-false items by circling

the appropriate letter, using the following key.

KEY A = Has no faults, B = Is ambiguous, C = Contains a clue to the answer,

D = Opinion statement (not true or false).

A B C D 1. Camping is fun for the entire family.

A B C D 2. A parasite may provide a useful function.

A B C D 3. The best place to study is in a quiet room.

A B C D 4. A nickel is larger than a dime.

A B C D 5. Abraham Lincoln was born in Kentucky.

A B C D 6. True-false statements should never include the word always.

LEARNING GOAL: Improves defective true-false items.

Directions: Rewrite as well-constructed true-false items each of the faulty items in 1 to 6 above.

If an item has no faults, write no faults in the space.

1.

2.

3.

4.

5.

6.

Note: Answers will vary.


Exercise 7-D

EVALUATING AND IMPROVING MATCHING ITEMS

LEARNING GOAL: Identifies common faults in a matching exercise.

Directions: Indicate the specific faults in the following matching exercise by circling the

appropriate letter below the exercise (Y = yes, N = no).

Directions: Match the items in the two columns.

Column I Column II

____1. Wind vane A. Used to measure temperature

____2. Tornado B. Water vapor in the air

____3. Humidity C. Violent storm

____4. Thermometer D. Used to measure wind direction

Y N 1. Directions are inadequate.

Y N 2. Columns are inappropriately placed.

Y N 3. Clues are provided in the answers.

Y N 4. Both lists are the same length.

Y N 5. Order of responses is improper.

Y N 6. Matching exercise lacks homogeneity.

LEARNING GOAL: Improves a defective matching exercise.

Directions: In the space below, rewrite the matching exercise at the top of the page. You may (1)

add material, (2) delete material, or (3) rework it into more than one exercise, but you should

cover the same type of material.

Note: Answers will vary.


Exercise 7-E

CONSTRUCTING SHORT-ANSWER, TRUE-FALSE, AND MATCHING ITEMS

LEARNING GOAL: Constructs sample test items that are relevant to stated learning outcomes.

Directions: In the spaces provided, construct (1) two short-answer items (one in question form

and one in incomplete statement form), (2) four true-false items, and (3) one four-item matching

exercise. State the specific learning outcome for each item or set of items.

Short-answer item (question form)

Outcome:

Item:

Short-answer item (incomplete statement form)

Outcome:

Item:

True-false items

Outcome:

Items:

1.

2.

3.

4.


Matching exercise

Outcome:

Directions:

Column I Column II

____1.

____2.

____3.

____4.

____5.

Note: Answers will vary.


Answers to Student Exercises

7-A 7-B 7-C 7-D

1. S 1. B 1. D 1. Y

2. M 2. A 2. B 2. N

3. T 3. C 3. D 3. Y

4. S 4. A 4. B 4. Y

5. M 5. C 5. A 5. Y

6. T 6. B 6. D 6. N


Chapter 7

Constructing Objective Test Items: Simple Forms

1. Which of the following item types is classified as a supply-type item?

A. Matching

B. Short answer

C. True-false

2. Which of the following item types would provide the highest score based on guessing alone?

A. Matching

B. Short answer

C. True-false

3. Which of the following item types would be most effective for measuring the ability to

distinguish between factual statements and opinion statements?

A. Matching

B. Short answer

C. True-false

4. Which of the following item types is least useful for diagnosing learning difficulties?

A. Matching

B. Short answer

C. True-false

5. Which of the following item types is least able to assess students' higher-order thinking skills?

A. True-false

B. Short answer

C. Essay

6. Short-answer test items are clearly superior to matching or true-false items in measuring

A. the ability to distinguish between fact and opinion.

B. the ability to interpret data.

C. computational skill.

D. knowledge of terms.

7. The main shortcoming in using short-answer items is the difficulty for educators to do which

of the following?

A. construct them

B. make them challenging

C. score them

D. interpret the results


8. Which of the following is the most well-written short answer item?

A. Cleveland may be found on ______.

B. A person first landed on the moon in ___________.

C. The author of Silas Marner is _________ _________.

D. The United Nations building is in ________ _______.

9. Which of the following is the most well-written short-answer item?

A. A test that is ________ is not necessarily _____ or ________.

B. A test that is ________ is not ___________ valid or ________.

C. A test that is reliable is not ___________ _____ or ________.

D. A test that is reliable is not necessarily _____ or useful.

10. In correcting a short-answer test to measure students’ knowledge of specific principles, a

teacher deducted one point from the total score for each misspelled word. How would this

procedure affect test results?

A. Lowers reliability.

B. Lowers validity.

C. Raises validity.

D. Raises both reliability and validity.

11. In large-scale mathematics testing programs, many of the advantages of supply-type items

can be maintained by using which of the following item types?

A. grid-in

B. matching

C. essay

D. true-false

12. Which of the following is a limitation of short-answer items?

A. they are not provided with teacher’s edition textbooks

B. they are time consuming to score

C. they reduce the chance of guessing the correct answer

D. they don’t provide an opportunity to distinguish between fact and opinion

13. “All whales are mammals because they are large.” Asking students to mark the previous

statement true or false would be considered poor testing practice because

A. it cannot be classified as true or false.

B. misinformation could lead to the correct answer.

C. the statement is too vague.

D. the word “all” provides a clue to the answer.

14. Which of the following is the most well-written true-false item?

A. A barometer may be useful in predicting weather.

B. All barometers give precise measures of air pressure.

C. A rising barometer forecasts fair weather.

D. The barometer is the most useful weather instrument.

15. Absolute terms like all or none that provide clues in true-false statements are known as

A. exaggeration clues.

B. grammatical inconsistencies.

C. specific determiners.

D. verbal associations.

16. A true-false test can be made more reliable if a teacher does which of the following?

A. closely relate the items to the learning outcomes to be measured

B. increase the number of items

C. increase the difficulty of the items

D. instruct students to answer every item

17. Which of the following true-false statements has a specific determiner?

A. Alaska has both oil and mineral deposits.

B. Alaska is cold in the winter.

C. Alaska is increasing in population.

D. Alaska is never hot in the summer.

18. Matching items are most useful for measuring learning outcomes at which of the following

levels?

A. application

B. interpretation

C. knowledge

D. synthesis

19. The matching item is a modified form of the

A. multiple-choice item.

B. short-answer item.

C. true-false item.

D. fill-in item.

20. One difficulty in constructing matching items is finding material that requires students to

A. analyze.

B. explain.

C. interpret.

D. relate.

21. A difficulty in constructing matching items is finding material that is

A. homogeneous.

B. interesting.

C. related to the teaching objectives.

D. unquestionably true.

22. Guessing on a matching item can be reduced by doing which of the following?

A. making the responses longer than the premises

B. making the responses more homogeneous

C. using an equal number of premises and responses

D. using responses more than once

23. Excessive use of matching items will most likely result in overemphasis on which of the

following?

A. complex learning

B. reasoning ability

C. rote learning

D. synthesis outcomes

24. Which of the following types of items has the greatest chance of measuring irrelevant

material?

A. Matching

B. Short answer

C. True-false

D. Fill-in

25. One way to cut down on the assessment of irrelevant information in matching items is to

write them first as true-false questions.

A. True.

B. False.

26. One problem with true-false items is that when a false item is correct, it does not measure

whether the student actually knows the information that makes the item true.

A. Agree.

B. Disagree.

27. In which of the following types of questions would interrater reliability probably be an

issue?

A. Short-answer.

B. Multiple choice.

C. True-false.

D. Fill-in.

28. Discuss why a teacher would likely not deduct points for spelling when scoring a short-answer question.

29. Explain what is wrong with the following true-false question: “It takes more skill to write an opera than a rock and roll song.”

Chapter 7: Answer Key

1. B

2. C

3. C

4. C

5. A

6. C

7. C

8. C

9. D

10. B

11. A

12. B

13. B

14. C

15. C

16. B

17. D

18. C

19. A

20. D

21. A

22. D

23. C

24. A

25. B

26. A

27. A

28. It reduces validity.

29. It measures an opinion.

Chapter 8 Constructing Objective Test Items: Multiple-Choice Forms

Exercise 8-A

CHARACTERISTICS OF MULTIPLE-CHOICE ITEMS

LEARNING GOAL: Identifies the advantages and limitations of multiple-choice items in

comparison to other item types.

Directions: The following statements compare multiple-choice (MC) items to other item types

with regard to some specific characteristic or use. Indicate whether test specialists would agree

(A) or disagree (D) with each statement by circling the appropriate letter.

A D 1. MC items avoid the possible ambiguity of the short-answer item.

A D 2. MC items are easier to construct than true-false items.

A D 3. MC items have less need for homogeneous material than the

matching exercise.

A D 4. MC items can be scored more reliably than short-answer items.

A D 5. MC items can measure all learning outcomes effectively.

A D 6. MC items have higher reliability per item than true-false items.

LEARNING GOAL: Lists the characteristics of an effective multiple-choice item.

Directions: List the important characteristics of each of the following parts of a multiple-choice

item.

Item Stem:

Correct Answer:

Distracters:

Note: Answers will vary.

Exercise 8-B

EVALUATING STEMS OF MULTIPLE-CHOICE ITEMS

LEARNING GOAL: Distinguishes between effective and ineffective stems for multiple-choice

items.

Directions: For each of the following pairs, indicate which element would make the most

effective stem for a multiple-choice item by circling its letter (A or B).

1. A Why did the cost of energy rise so rapidly in the 1970s?

B Which one of the following statements is true about energy?

2. A Achievement tests should

B Achievement tests are useful for

3. A A whale is a

B Whales are classified as

4. A Aluminum, which is finding many new uses, is made from

B Aluminum is made from

5. A The man who first explored Lake Michigan was

B The Frenchman who first explored Lake Michigan was

6. A Which of the following illustrates what is meant by the word climate?

B Which of the following does not illustrate what is meant by the word climate?

LEARNING GOAL: Describes the faults in ineffective stems for multiple-choice items.

Directions: For each of the ineffective stems in 1–6 above, briefly describe the type of fault it

contains.

1.

2.

3.

4.

5.

6.

Note: Answers will vary.

Exercise 8-C

EVALUATING ALTERNATIVES USED IN MULTIPLE-CHOICE ITEMS

LEARNING GOAL: Distinguishes between effective and ineffective use of alternatives in

multiple-choice items.

Directions: Each of the following multiple-choice item stems has two sets of alternatives.

Indicate which set would make the most effective alternatives for the items by circling the letter

(A or B). The items are kept simple and the alternatives are placed across the page to save space.

1. A United States astronaut flew to the moon in

A (1) 1967 (2) 1968 (3) 1969 (4) 1970

B (1) a spaceship (2) 1969 (3) 1979 (4) 1989

2. Who was the 33rd President of the United States?

A (1) Lincoln (2) Bush (3) Bush (4) Truman

B (1) Roosevelt (2) Truman (3) Eisenhower (4) Kennedy

3. Which of the following represents what is meant by the term reforestation?

A (1) Cutting (2) Replanting (3) Spraying (4) Surveying

B (1) Recutting (2) Replanting (3) Spraying (4) Resurveying

4. Which of the following best describes observable student performance?

A (1) Constructs (2) Fears (3) Realizes (4) Thinks

B (1) Constructs (2) Fears (3) Realizes (4) None of these

LEARNING GOAL: Describes the faults in ineffective sets of alternatives.

Directions: For each of the ineffective sets of alternatives in 1 to 4 above, briefly describe the

type of fault it contains.

1.

2.

3.

4.

Note: Answers will vary.

Exercise 8-D

EVALUATING AND IMPROVING MULTIPLE-CHOICE ITEMS

LEARNING GOAL: Identifies common faults in multiple-choice items.

Directions: Indicate the major type of fault, if any, in each of the following multiple-choice

items by circling the appropriate letter, using the following key. The alternatives are placed

across the pages to save space.

KEY: A = No fault B = Stem is inadequate

C = Contains inappropriate distracters D = Contains a clue to the answer

A B C D 1. Reliability (a) means consistency, (b) is the same as objectivity,

(c) refers to usability, (d) is a synonym for interpretability.

A B C D 2. The characteristic that is most desired in test results is

(a) consistency, (b) reliability, (c) stability, (d) validity.

A B C D 3. If a test is lengthened, its reliability will

(a) decrease, (b) increase, (c) stay the same, (d) none of these.

A B C D 4. A method of determining reliability that requires correlating scores

from two halves of a test is called (a) equivalent forms, (b) Kuder-

Richardson method, (c) split-half method, (d) test-retest method.

LEARNING GOAL: Improves defective multiple-choice items.

Directions: Rewrite as well-constructed multiple-choice items each of the faulty items in 1–4

above. If an item has no faults, write no faults in the space.

1.

2.

3.

4.

Note: Answers will vary.

Exercise 8-E

CONSTRUCTING MULTIPLE-CHOICE ITEMS

LEARNING GOAL: Constructs sample multiple-choice items that are relevant to stated learning

outcomes.

Directions: In a subject area you have studied or plan to teach, state the desired learning outcome

and construct one multiple-choice item for each of the general instructional objectives listed

below.

Understands basic terms:

Outcome:

Item:

Understands specific facts:

Outcome:

Item:

Understands principles (or facts):

Outcome:

Item:

Applies principles or facts

Outcome:

Item:

Note: Answers will vary.

Answers to Student Exercises

8-A 8-B 8-C 8-D

1. A 1. A 1. A 1. B

2. D 2. B 2. B 2. C

3. A 3. B 3. B 3. C

4. A 4. B 4. B 4. D

5. D 5. B

6. A 6. A

Chapter 8

Constructing Objective Test Items: Multiple-Choice Forms

1. The problem presented in a multiple-choice item should be clear after reading which of the

following?

A. item stem

B. the correct answer

C. the distracters

D. item stem and all the alternatives

2. A distracter in a multiple-choice item refers to which of the following?

A. an alternative that is plausible yet clearly incorrect

B. alternatives that may be correct under certain circumstances

C. negatively stated alternatives

D. any alternative that diverts the reader’s attention away from the item stem

3. An incorrect alternative can be made more plausible by

A. avoiding textbook language.

B. making it grammatically inconsistent with the stem.

C. making it shorter than the others.

D. using common errors made by students.

4. One advantage of multiple-choice items over true-false items is that they reduce the

A. chance for cheating.

B. difficulty of machine scoring.

C. influence of guessing on the score.

D. time needed in test preparation.

5. Which of the following describes one advantage of multiple-choice items over matching

items?

A. for multiple choice items, clearly stated objectives are not needed

B. multiple choice items are more adaptable to different types of outcomes

C. multiple choice items require less testing time

D. there is less need to correct for guessing on multiple choice items

6. One advantage of multiple-choice items over short-answer items is that multiple choice items

A. encourage students to study harder.

B. measure computation skill more effectively than short-answer items.

C. provide more freedom of response.

D. provide a more objective measure of achievement.

7. A 50-item multiple-choice test would provide more reliable test scores than a 50-item true-

false test because

A. a bigger spread of scores is obtained.

B. greater care is needed during item writing.

C. the items can be written to include correct answers.

D. the scoring is more objective.

8. Which of the following learning outcomes is most likely to require the best-answer type of

multiple-choice item?

A. Can justify methods and procedures

B. Can identify proper grammar usage

C. Can distinguish between fact and opinion

D. Can understand specific facts

9. Which of the following learning outcomes is most likely to require the best-answer type of

multiple-choice item?

A. Distinguishes between fact and opinion.

B. Identifies the dates of historical events.

C. Understands specific historical facts.

D. Selects the reason a historical event occurred.

10. Inexperienced item writers will produce more effective multiple-choice items if they start

with which of the following?

A. best-answer items

B. a list of possible alternatives

C. the stem in question form

D. the entire test blueprint

11. A multiple-choice item that measures at the understanding level must include which of the

following?

A. a table of specifications

B. a situation that is new to the students

C. at least two plausible distracters

D. introductory material that requires a high level of reading ability

12. A major drawback of multiple-choice items in which more than one alternative may be marked correct is that they create difficulty in

A. administering the test.

B. constructing the test items.

C. developing directions for the test.

D. scoring the test.

13. The reliability of a multiple-choice test will tend to increase with an increase in which of the following?

A. difficulty of items

B. number of learning outcomes measured

C. number of alternatives in each item

D. diversity in the group being tested

14. Which of the following provides the best stem for a multiple-choice item?

A. Penicillin is

B. Penicillin was discovered by

C. Penicillin, which has many uses in medicine, was discovered by

D. Which of the following scientists discovered penicillin?

15. Which of the following provides the best stem for a multiple-choice item?

A. Which of the following did not contribute to the depression?

B. One major factor that contributed to the depression is

C. The depression was

D. The depression was caused by

16. What is wrong with the stem of the following multiple-choice question? “Which of the

following states is the largest state in the United States?”

A. Largest can be measured either geographically or by population.

B. It measures opinion rather than fact.

C. It measures only a lower order skill.

D. It should be posed as a statement instead of a question.

17. Which of the following sets of alternatives would be best for a multiple-choice item about a

battle in the Civil War?

A. Davis, Grant, Lincoln, none of the above.

B. Lincoln, Mason-Dixon Line, Sherman, Vicksburg.

C. Grant, Jackson, Lee, Sherman.

D. Jefferson, Lincoln, Roosevelt, Washington.

18. Which of the following sets of alternatives is best for the following multiple-choice item:

“The perimeter of a rectangle 4 inches long and 2 inches wide is ______”?

A. 6 inches, 8 inches, 12 inches, 16 inches

B. 2 inches, 12 inches, 24 inches, 36 inches

C. 11 inches, 12 inches, 13 inches, 22 inches

D. 2 inches, 3 inches, 17 inches, 18 inches

19. In order to measure application with multiple-choice items, the problem situations should be

A. described in complex terms.

B. new to the students.

C. the same as those solved in class.

D. able to measure factual knowledge.

20. When using the “best-answer” type of multiple-choice item, which of the following should

be included?

A. a stem that consists of at least three sentences

B. the alternative “none of the above”

C. more than seven alternatives

D. alternatives of equal length

21. Ambiguity can best be reduced in multiple-choice items by

A. avoiding the use of “best-answer” items.

B. having another teacher review the items.

C. keeping the length of the alternatives equal.

D. using no more than four alternatives.

22. Multiple-choice tests are particularly well suited for measuring analysis, synthesis and

evaluation.

Agree

Disagree

23. List three advantages that multiple-choice items have over true-false items.

24. If one test had 50 multiple-choice items and an equivalent test had 50 true-false items,

which test would probably have higher reliability? Why? What is probably the only way

that a true-false test might possess higher reliability than a multiple-choice test?

Chapter 8: Answer Key

1. A

2. A

3. D

4. C

5. B

6. D

7. A

8. A

9. D

10. C

11. B

12. D

13. C

14. D

15. B

16. A

17. C

18. A

19. B

20. D

21. B

22. Disagree

23. Multiple-choice items are less susceptible to guessing. They require that the correct answer be selected, not that an incorrect statement simply be recognized, and they can measure skills higher than simple facts.

24. All things being equal, a multiple-choice test will have a higher reliability than an equivalent

true-false test because of the high probability of getting a true-false item correct as a result of

guessing. One way to increase the reliability of true-false tests is to increase the number of

items on the test. Thus, a true-false test with many more items than a corresponding multiple-

choice test might possess higher reliability.
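The reasoning in this answer (chance success on true-false items, and the effect of lengthening a test) can be illustrated numerically. The following Python sketch is an instructor-side illustration, not material from the test bank; it uses simple expected-value arithmetic and the standard Spearman-Brown prophecy formula.

```python
def expected_chance_score(n_items: int, n_options: int) -> float:
    """Expected number of items answered correctly by blind guessing alone."""
    return n_items * (1.0 / n_options)

def spearman_brown(reliability: float, length_factor: float) -> float:
    """Predicted reliability when a test is lengthened by length_factor
    (Spearman-Brown prophecy formula)."""
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# On a 50-item test, guessing alone yields about 25 correct on true-false
# items (2 options) but only about 12.5 on 4-option multiple-choice items,
# which is why true-false scores carry more guessing noise.
print(expected_chance_score(50, 2))   # 25.0
print(expected_chance_score(50, 4))   # 12.5

# Doubling the length of a true-false test with reliability .60 raises the
# predicted reliability to .75, showing how a much longer true-false test
# could rival a multiple-choice test.
print(round(spearman_brown(0.60, 2.0), 2))  # 0.75
```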

Chapter 9 Measuring Complex Achievement: The Interpretive Exercise

Exercise 9-A

CHARACTERISTICS OF INTERPRETIVE EXERCISES

LEARNING GOAL: Identifies the advantages and limitations of interpretive exercises in

comparison to other item types.

Directions: The following statements compare the interpretive exercise (IE) to other item types with regard to some specific characteristic or use. Indicate whether test specialists would agree (A) or disagree (D) with each statement by circling the appropriate letter.

A D 1. The IE is more difficult to construct than other item types.

A D 2. The IE can be designed to measure more complex learning outcomes than the

single-objective item.

A D 3. The IE provides a more reliable measure of complex learning outcomes than the

essay test.

A D 4. The IE is more effective than the essay test for measuring the ability to

organize ideas.

A D 5. The IE is one of the most effective item types to use with poor readers.

A D 6. The IE measures knowledge of specific facts more effectively than other item

types.

LEARNING GOAL: Lists the characteristics of an effective interpretive exercise.

Directions: List the important characteristics of each of the following parts of an interpretive

exercise.

Introductory material:

Related test items:

Note: Answers will vary.

Exercise 9-B

EVALUATING AND IMPROVING THE INTERPRETIVE EXERCISE

LEARNING GOAL: Identifies the common faults in an interpretive exercise.

Directions: Indicate the specific faults in the following interpretive exercise by circling the appropriate letter (Y = yes, N = no) by each fault.

INTERPRETIVE EXERCISE

Directions: Read the paragraph and mark your answers.

Some teachers falsely believe that multiple-choice items are limited to the measurement of

simple learning outcomes because they depend on the recognition of the answer rather than the

recall of it. However, complex outcomes can be measured, and the selection of the correct

answer is not based on the mere recognition of a previously learned answer. It involves the use of

higher mental processes to arrive at a solution and then the correct answer is selected from

among the alternatives presented. This is the reason multiple-choice items are also called

selection-type items rather than recognition-type items. It makes clear that the answer is

selected by the mental process involved and is not limited to recognition.

T F 1. Some teachers think multiple-choice items measure only at the

recognition level.

T F 2. All selection-type items are recognition-type items.

T F 3. Multiple-choice items are also called selection-type items.

T F 4. Selection-type items include multiple-choice, true-false, and matching.

Items for evaluating the interpretive exercise:

Y N 1. Directions are adequate.

Y N 2. Some items measure simple reading skill only.

Y N 3. Some items measure extraneous material.

Y N 4. Some of the items can be answered without reading the paragraph.

LEARNING GOAL: Improves a defective interpretive exercise.

Directions: Rewrite the directions for the above interpretive exercise and write one true-false

item that calls for interpretation of the material.

Note: Answers will vary.

Exercise 9-C

CONSTRUCTING INTERPRETIVE EXERCISES

LEARNING GOAL: Constructs sample exercise for interpreting a paragraph.

Directions: Construct an interpretive exercise that measures the ability to interpret a paragraph

of written material. Include complete directions, the paragraph, and at least two multiple-choice

items.

Note: Directions will vary.

Exercise 9-D

CONSTRUCTING INTERPRETIVE EXERCISES

LEARNING GOAL: Constructs sample exercise for interpreting pictorial material.

Directions: Construct an interpretive exercise that measures the ability to interpret a picture or

cartoon. Include complete directions, the pictorial material, and two objective items of any

type.

Note: Directions will vary.

Exercise 9-E

CONSTRUCTING INTERPRETIVE EXERCISES

LEARNING GOAL: Constructs sample exercise for interpreting a table, chart, or graph.

Directions: Construct an interpretive exercise that measures the ability to interpret a table,

chart, or graph. Include complete directions, the pictorial material, and two objective items of

any type.

Note: Directions will vary.

Answers to Student Exercises

9-A 9-B

1. A 1. N

2. A 2. Y

3. A 3. Y

4. D 4. Y

5. D

6. D

Chapter 9

Measuring Complex Achievement: The Interpretive Exercise

1. The interpretive item is probably most effective for measuring which of the following?

A. a broad range of factual knowledge

B. higher-order thinking skills

C. the ability to organize ideas

D. the ability to present relevant arguments

2. Using interpretive exercises will typically result in more effective

A. interpretive writing skills.

B. measurement of complex learning outcomes.

C. motivation to learn factual information.

D. sampling of course content.

3. The introductory material used in an interpretive exercise should include which of the

following qualities?

A. it should be based on pictorial material rather than written material

B. it should be complex and difficult enough that only the better students will answer

correctly

C. it should be in harmony with the instructional objectives

D. it should be selected from the material that the student had studied during the

course

4. The type of interpretive exercise to use should be determined by which of the following?

A. the amount of reading required

B. the intended learning outcomes

C. scoring method to be used

D. types of test items to be written

5. One advantage of the interpretive exercise over the single multiple-choice item is that the

interpretive exercise

A. can measure more complex outcomes.

B. is easier to construct.

C. is easier to score.

D. measures factual information more effectively.

6. One advantage of the interpretive exercise over performance-based assessment tasks is that

the interpretive exercise

A. measures more important outcomes.

B. places less emphasis on reading.

C. provides a more structured task.

D. prevents students from cheating.

7. Which of the following “enabling skills” would be most likely to lower the validity of an interpretive exercise?

A. imitating

B. reading

C. thinking

D. writing

8. Requiring students to recall an excessive amount of factual information in order to solve an

interpretive exercise can result in lower

A. objectivity.

B. practicality.

C. reliability.

D. validity.

9. Which of the following is a common error made by test designers in constructing interpretive

exercises?

A. including introductory material that is new to students

B. including material that requires a low level of reading skill

C. including items that can be answered on the basis of general knowledge

D. including too many items for each section of the assessment

10. Which of the following types of test items are most commonly used with interpretive

exercises?

A. alternative-response and key-type

B. key-type and multiple-choice

C. matching and alternative-response

D. matching and multiple-choice

11. The use of interpretive exercises to measure complex learning outcomes reduces the

influence of which of the following?

A. irrelevant factual information

B. students guessing answers

C. reading skills

D. thinking skills

12. The introductory material in the interpretive exercise is most effective if it

A. comes directly from the textbook.

B. is new to the students.

C. places a high demand on reading skill.

D. was thoroughly covered in class discussion.

13. The interpretive exercise is especially useful for educators because it assists in measuring the

ability to

A. detect invalid inferences.

B. express original ideas.

C. recall information.

D. use grammar and spelling skills.

14. The test items in an interpretive exercise are most effective if they include which of the

following qualities?

A. they are answered directly from the introductory material

B. they can be answered without any additional information

C. they contain only two alternatives

D. they require more than the recall of information to answer

15. When testing very young students, it is best to use material that is

A. familiar.

B. humorous.

C. visual.

D. interesting.

16. When compared to essay questions, interpretive exercises possess

A. greater validity.

B. lower standard deviation.

C. higher interrater reliability.

D. easier construction.

17. A simple method for checking the adequacy of an interpretive exercise is to do which of the

following?

A. attempt to answer the questions without the introductory material

B. read the exercise to students and obtain their feedback on it

C. read and answer the item to yourself, just as a student would

D. score the item after students have taken it and then review their answers

18. Which of the following factors makes it difficult to use interpretive exercises on a test?

A. administering them

B. constructing them

C. scoring them

D. relating them to learning outcomes

19. The interpretive exercise is least useful for measuring the ability to

A. appraise the plan for an experiment.

B. evaluate the adequacy of an experiment.

C. produce a product.

D. recognize the validity of conclusions.

20. If the higher achieving students in class can answer the questions on an interpretive exercise

without looking at the introductory material but the lower achieving students can’t, the

exercise is likely

A. well constructed and valid.

B. well constructed and invalid.

C. poorly constructed and valid.

D. poorly constructed and invalid.

21. Which of the following would be considered a sound principle for constructing an

interpretive exercise based on a reading passage?

A. offering students a minimum of six alternatives to choose from

B. using introductory materials that students have seen before

C. choosing a passage that is relatively brief

D. incorporating data from a table that is one page in length

22. Keeping the introductory materials on an interpretive exercise relatively brief would assist

students who may be experiencing issues of:

A. excessive page turning.

B. short-term memory.

C. long-term memory.

D. English grammar.

23. Which of the following would likely be a legitimate use of the interpretive exercise?

A. answering reading comprehension questions

B. evaluating a work of art

C. judging a musical composition

D. describing a movie theme

24. Relative to the students, introductory materials in a science interpretive exercise should probably be written at a reading level that is

A. higher than the class average

B. higher than grade level

C. lower than grade level

D. right at grade level

25. How are interpretive exercises, true-false items, and matching items alike? How are they

different? How are interpretive exercises and performance items alike? How are they

different?

26. What types of errors of interpretation might a social studies interpretive exercise

possess if it contains a relatively long passage that requires high-level reading? Be

as specific as possible.

Chapter 9: Answer Key

1. B

2. B

3. C

4. B

5. A

6. C

7. B

8. D

9. C

10. B

11. A

12. B

13. A

14. D

15. C

16. C

17. A

18. B

19. C

20. D

21. C

22. B

23. A

24. C

25. Interpretive, true-false, and matching exercises are similar in that they are objective and of the selection variety. They are different in that while true-false and matching items are usually used to measure lower-order knowledge and factual material, interpretive exercises may be used for higher-order learning such as understanding and application. Interpretive exercises and performance items are alike in that they can both measure higher forms of learning. They are different in that interpretive exercises are of the selection variety and performance items are of the supply variety.

26. Errors of interpretation would result from the teacher not knowing whether the student answered the questions incorrectly because he or she did not know the social studies material or because the reading demands were too great. Likewise, an involved passage might have taxed the student’s attention or short-term memory rather than revealing that the student had not learned the social studies material.

Chapter 10 Measuring Complex Achievement: Essay Questions

Exercise 10-A

CHARACTERISTICS OF ESSAY QUESTIONS

LEARNING GOAL: Identifies the advantages and limitations of essay questions in comparison

to objective items.

Directions: The following statements compare essay questions to objective items with regard to

some specific characteristic or use. Indicate whether test specialists would agree (A) or disagree

(D) with the statement by circling the appropriate letter.

A D 1. Essay questions are more efficient than objective items for measuring

knowledge of facts.

A D 2. Essay questions are more subject to bluffing than are objective items.

A D 3. Essay questions are preferred when a teacher is measuring the student’s

ability to organize.

A D 4. Essay questions measure a more limited sampling of content than

objective questions in a given amount of testing time.

A D 5. Essay questions provide more reliable scores than do objective items.

A D 6. Essay questions can measure complex learning outcomes that are

difficult to measure by other means.

LEARNING GOAL: Lists the characteristics of an effective essay question.

Directions: List the important characteristics of each type of essay question.

Restricted-response question:

Extended-response question:

Note: Answers will vary.

Exercise 10-B

EVALUATING AND IMPROVING ESSAY QUESTIONS

LEARNING GOAL: Describes faults in essay questions and rewrites them as effective items.

Directions: Describe the faults in each of the following sample essay questions and rewrite each

question so that it meets the criteria for an effective essay item.

1. Why are essay questions better than objective items?

Faults:

Rewrite item:

2. List the rules from your textbook for constructing essay questions.

Faults:

Rewrite item:

3. How do you feel about using essay questions?

Faults:

Rewrite item:

4. Write on one of the following: (1) constructing essay questions, (2) scoring essay

questions, (3) using essay questions to improve learning.

Faults:

Rewrite item:

Note: Answers will vary.

Exercise 10-C

CONSTRUCTING RESTRICTED-RESPONSE ESSAY QUESTIONS

LEARNING GOAL: Constructs sample restricted-response essay questions.

Directions: Construct one restricted-response essay question for each of the types of thought

questions listed.

1. Comparing two things.

2. Justifying an idea or action.

3. Classifying things or ideas.

4. Applying a fact or principle.

Note: Answers will vary.

110

Exercise 10-D

CONSTRUCTING EXTENDED-RESPONSE ESSAY QUESTIONS

LEARNING GOAL: Constructs sample extended-response essay questions.

Directions: Construct one extended-response essay question for each of the types of thought

questions listed. For each question, describe the scoring procedures to be used and the elements

included in the scoring.

1. Synthesis: Production of a plan for doing something (e.g., experiment), for constructing

something (e.g., graph, table, dress), or for taking some social action (e.g., preventing

pollution).

Question:

Scoring procedure:

2. Evaluation: Judging the value of something (e.g., a proposal, book, poem, teaching

method, research study) using definite criteria.

Question:

Scoring procedure:

Note: Answers will vary.

111

Exercise 10-E

SCORING ESSAY QUESTIONS

LEARNING GOAL: Distinguishes between good and bad practices in scoring essay questions.

Directions: Indicate whether each of the following statements describes a good (G) practice or a

bad (B) practice in scoring essay questions by circling the appropriate letter.

G B 1. Use a model answer for scoring restricted-response questions.

G B 2. Evaluate all answers on a student's paper before doing the next paper.

G B 3. Review a student's scores on earlier tests before reading the answers.

G B 4. Score content and writing skills separately.

G B 5. Use the rating method for scoring extended-response questions.

G B 6. Lower the score one point for each misspelled word.

LEARNING GOAL: Prepares a list of points for scoring essay questions.

Directions: List five do's and five don'ts to serve as a guide for scoring essay tests.

DO:

1.

2.

3.

4.

5.

DON'T:

1.

2.

3.

4.

5.

Note: Answers will vary.

112

Answers to Student Exercises

10-A: 1. D   2. A   3. A   4. A   5. D   6. A

10-E: 1. G   2. B   3. B   4. G   5. G   6. B

113

Chapter 10

Measuring Complex Achievement: Essay Questions

1. The use of some essay questions in a classroom test will probably improve the assessment’s

A. objectivity.

B. practicality.

C. reliability.

D. validity.

2. For which of the following types of learning outcomes is the essay item most useful?

A. Application

B. Comprehension

C. Knowledge

D. Synthesis

3. Which of the following is useful for improving essay testing?

A. providing unlimited time for student responses

B. permitting students to choose among optional questions

C. determining the scoring procedures in advance

D. scoring the answers while looking at students’ names

4. Which of the following characteristics is shared by both objective tests and essay tests?

A. Both are efficient for measuring knowledge of specific facts.

B. Both are useful in both formative and summative assessment.

C. Both provide for extensive sampling of content.

D. The reliability of scoring is high for both.

5. One major problem in using essay questions to evaluate learning is that they are difficult to

A. administer.

B. construct.

C. interpret.

D. score.

6. Essay questions should be used in achievement tests when

A. a wide sampling of material is desired.

B. knowledge of factual information is stressed.

C. little time is available for scoring.

D. organizing and integrating ideas is important.

7. Essay questions are more appropriate than multiple-choice items when the learning outcome

calls for which of the following?

A. development of an argument

B. identification of concepts

C. interpretation of data

D. recognition of relationships

114

8. Essay questions are more appropriate than objective items when measuring the ability

to do which of the following?

A. identify the importance of information

B. integrate information

C. interpret information

D. recall information

9. Which of the following is a serious limitation of an essay test?

A. difficulty of construction

B. limited sampling

C. lack of validity

D. susceptibility to cheating

10. An extended-response essay question is better than a restricted-response question if

A. administration time is limited.

B. complex learning outcomes are being assessed.

C. questions cannot be phrased clearly.

D. reliable scoring is of special importance.

11. A restricted-response essay question is better than an extended-response question if

A. creativity is desired in the response.

B. scoring is to be done by the rating method.

C. the task involves a global approach to problem solving.

D. specific information needs to be supplied by the student.

12. For which of the following learning outcomes would objective items be better than essay

questions?

A. Identifying the meaning of concepts.

B. Relating concepts to form a theory.

C. Synthesizing the arguments in favor of a proposal.

D. Using concepts in solving problems.

13. Which of the following is considered a sound essay grading procedure?

A. assigning points to restricted-response answers

B. using the rating method for extended-response questions

C. using a separate score for spelling errors

D. grading each student’s complete paper before doing the next one

14. Deducting points for neatness on essay responses will have the greatest influence on which of

the following?

A. objectivity

B. reliability

C. validity

D. writing ability

115

15. Which of the following is a desirable practice for scoring restricted-response essay

questions?

A. Give extra credit for brief answers.

B. Lower the score if bluffing is detected.

C. Prepare a model answer in advance.

D. Use the rating method of scoring.

16. Which of the following test construction procedures is most likely to result in valid responses

to extended-response essay questions?

A. Clearly indicate the nature of the desired answer.

B. Set brief time limits for each question to restrict “bluffing.”

C. Write questions that can be answered in a few sentences.

D. Write questions that are limited to the recall of factual information, but cover a

broad range of topics.

17. Which of the following is a desirable practice for grading essay answers?

A. Grade content and spelling separately.

B. Mark an answer with a zero, if bluffing is detected.

C. Read the essays of the highest performing students first to establish the scoring

standard.

D. Score a student's answer in light of what is known about his or her past

achievement.

18. In order to assess students’ ability to express themselves creatively in writing, it would be

best to use which of the following?

A. an objective test of writing skill

B. extended-response essay questions

C. restricted-response essay questions

D. themes and other writing assignments

19. Presenting students with several essay questions and permitting them to choose any two of

them to respond to is an undesirable testing practice for which of the following reasons?

A. it encourages students to write more on each question

B. it is difficult for students to adequately prepare for the test

C. the basis of comparing students is undermined

D. students do not have an opportunity to demonstrate all that they have learned

20. Which of the following would most likely improve the scoring of essay answers on a test?

A. having a second competent scorer grade the papers

B. including errors in grammar in the total score

C. looking at the student's name before reading each paper

D. reading answers of students who typically are high achieving first

116

21. Discuss three differences between restricted-response and extended-response essay

questions.

22. Discuss three differences between analytic and holistic scoring.

117

Chapter 10: Answer Key

1. D

2. D

3. C

4. B

5. D

6. D

7. A

8. B

9. B

10. B

11. D

12. A

13. B

14. C

15. C

16. A

17. A

18. B

19. C

20. A

21. Restricted-response questions closely circumscribe the task and the manner of

answering that students should use in their essay answers. They restrict students from

answering the question in a tangential or irrelevant fashion and are easier to score. Extended-

response essays give students much more latitude in framing their answers. They are

more general than restricted-response questions and provide fewer instructions.

22. Holistic scoring involves reading the essay in its entirety and giving it an overall, or holistic,

score based on that reading. Analytic scoring requires that the essay be read and scored in

sections, with the section subtotals added before an overall essay grade is given. Analytic

scoring usually results in more reliable scoring.

118

Chapter 11 Measuring Complex Achievement: Performance-Based Assessments

Exercise 11-A

CHARACTERISTICS OF PERFORMANCE-BASED ASSESSMENT TASKS

LEARNING GOAL: Identifies the advantages and limitations of performance-based assessment

tasks.

Directions: The following statements compare performance-based assessment (PBA) tasks to

objective items with regard to some specific characteristic or use. Indicate whether test

specialists would agree (A) or disagree (D) with the statement by circling the appropriate letter.

A D 1. PBA tasks are more likely to be used for higher-level learning objectives.

A D 2. PBA tasks provide a better means of assessing the breadth of a student's

knowledge.

A D 3. PBA tasks are preferred when teachers are measuring the process that a student

uses to solve a problem.

A D 4. PBA tasks measure a more limited sampling of behavior.

A D 5. PBA tasks are more suitable for measuring the ability to solve ill-structured

problems.

A D 6. PBA tasks provide more reliable scores.

LEARNING GOAL: Lists the characteristics of an effective performance-based assessment task.

Directions: List the important characteristics of each type of performance-based assessment task.

Restricted-response task:

Extended-response task:

Note: Answers will vary.

119

Exercise 11-B

CONSTRUCTING RESTRICTED-RESPONSE PERFORMANCE-BASED

ASSESSMENT TASKS

LEARNING GOAL: Constructs simple restricted-response performance-based assessment tasks.

Directions: Construct two restricted-response performance-based assessment tasks for a grade

and subject matter of your choice. Include a description of the directions to students and the

criteria to be used in judging their performances.

1. Directions:

Task:

Scoring criteria:

2. Directions:

Task:

Scoring criteria:

Note: Answers will vary.

120

Exercise 11-C

CONSTRUCTING EXTENDED-RESPONSE PERFORMANCE-BASED ASSESSMENT

TASKS

LEARNING GOAL: Constructs an extended-response performance-based assessment task.

Directions: Construct an extended-response performance-based assessment task involving

problem solving for a grade and subject matter of your choice. The task should require students

to decide on an approach to solving the problem, identify or gather relevant information, and

integrate that information to produce a product. Include a description of the directions to students

and the criteria to be used in judging their performances.

Directions:

Task:

Scoring:

Note: Answers will vary.

121

Exercise 11-D

RATING SCALES

LEARNING GOAL: Distinguishes between desirable and undesirable practices in using rating

scales.

Directions: Indicate whether each of the following statements describes a desirable (D) practice

or an undesirable (U) practice in using rating scales with performance-based assessments by

circling the appropriate letter.

D U 1. The descriptive graphic rating scale should be favored over the

numerical scale.

D U 2. In rating performance, derive the characteristics to be rated from

the list of learning objectives.

D U 3. Use at least ten points on each scale to be rated.

D U 4. Use holistic rating procedures to provide students with diagnostic

feedback.

D U 5. Separate ratings of secondary characteristics, such as neatness,

from ratings of accomplishment of primary learning objectives.

D U 6. Communicate the criteria to be used in judging performances to

students.

LEARNING GOAL: Constructs items for a rating scale.

Directions: Prepare two items for a descriptive graphic rating scale to be used in assessing some

type of student performance or some product produced by the student. Do not use sample items

from your textbook.

Note: Answers will vary.

122

Exercise 11-E

CHECKLISTS

LEARNING GOAL: Distinguishes between desirable and undesirable practices in using

checklists.

Directions: Indicate whether each of the following statements describes a desirable (D) practice

or an undesirable (U) practice in using checklists by circling the appropriate letter.

D U 1. Use a checklist wherever frequency of occurrence is an important element in the

assessment.

D U 2. When assessing performance with a checklist, include in that checklist both

desired actions and common errors.

D U 3. Use a checklist for assessing some products.

D U 4. Use a checklist to determine if steps in performance were completed in proper

order.

D U 5. Use a checklist for assessing process but not for assessing student products.

D U 6. Avoid the use of checklists in assessing process.

LEARNING GOAL: Constructs a performance checklist.

Directions: Prepare a brief checklist for some simple performance-based assessment task.

Include directions describing how to respond.

Note: Answers will vary.

123

Answers to Student Exercises

11-A: 1. A   2. A   3. A   4. A   5. A   6. D

11-D: 1. U   2. D   3. U   4. U   5. D   6. D

11-E: 1. U   2. D   3. D   4. D   5. U   6. U

124

Chapter 11

Measuring Complex Achievement: Performance-Based Assessments

1. For which of the following types of learning outcomes are performance-based assessment

tasks most useful?

A. Comprehension of concepts

B. Distinguishing fact from opinion

C. Knowledge of appropriate procedures

D. Problem solving

2. Performance-based assessments are more effective than multiple-choice items in measuring

which of the following?

A. the ability to formulate problems

B. the ability to recognize faulty procedures

C. reliability of scoring

D. understanding of concepts

3. Which of the following validity considerations has led to the strongest argument for the

increased use of performance-based assessments?

A. Content

B. Test-criterion relationship

C. Construct

D. Consequences

4. An advantage of performance-based assessments over objective tests is that they can be used

to evaluate which of the following?

A. the attitudes of students

B. both process and product

C. reading skills

D. strengths and weaknesses

5. An advantage of performance-based assessments over objective tests is that they

A. are easier to construct.

B. are easier to score.

C. can better communicate instructional goals requiring complex problem solving.

D. can provide coverage of a broader array of instructional objectives in a given period

of time.

6. Which of the following is a limitation of performance-based assessments?

A. alignment to state standards

B. ability to measure higher-order learning

C. lengthy administration time

D. the ability to assess all learners accurately

125

7. A major advantage of developing criteria for judging performance prior to task

administration is that the criteria can

A. help students understand what is expected.

B. increase students’ appreciation of the task.

C. reduce the scoring burden for teachers.

D. restrict the range of student performances.

8. Restricted-response performance-based assessments are better than extended-response

assessments under which of the following conditions?

A. when the problems are unstructured and allow for multiple solutions

B. when the problems call for originality

C. when the problems require the integration of information from several sources

D. when the problems are structured so that model performances can be constructed

9. Extended-response performance-based assessments are better than restricted-response

assessments under which of the following conditions?

A. when the administration time is limited

B. if a broad sampling of the domain of content is desired

C. when a measure of the ability to gather and integrate information is needed

D. if reliable scoring is of special importance

10. Which of the following would be the best justification for the relatively large amount of time

required to respond to many performance-based assessment tasks?

A. students and parents like them

B. multiple scores can be derived from a single task

C. performance on one task generalizes well to performance on other tasks

D. the tasks can provide students with valuable learning opportunities

11. The dependence of task performance on skills that are irrelevant to the intended purpose of

the assessment tasks (e.g., reading skill for some mathematics tasks) will have the biggest

negative influence on which of the following factors?

A. variety of student performances

B. reliability

C. consistency in scoring

D. validity

12. The most reliable grading of task performances is likely to result when a teacher does which

of the following?

A. grades all performances on one task before going to the next one

B. looks at a student's name before grading the performance

C. starts with performances of the lower performing students

D. uses the rating method of scoring for all tasks

126

13. The points on a rating scale will be least ambiguous when using which of the following

scales?

A. constant alternative

B. descriptive graphic

C. evaluative

D. numerical

14. The most objective information is obtained from an assessment when rating which of the

following?

A. adjustment

B. attitudes

C. overt behavior

D. personality traits

15. Rating the performances of students who are perceived to be most able higher than

comparable performance of students perceived to be less able is an example of which of the

following errors?

A. central tendency error

B. halo effect

C. mathematical-logical error

D. severity error

16. A teacher who rates all performances lower than they are rated by other teachers

demonstrates an example of which of the following errors?

A. central tendency error

B. halo effect

C. mathematical-logical error

D. severity error

17. Which of the following statements best describes an instance of a logical rating error?

A. A rater gives a lower rating to a student who has obtained low scores on previous

tests and assessments than to the same performance by other students.

B. A rater gives low scores to the performance of the class “clown” because of the

rater’s belief that someone who acts so silly in class could not perform well.

C. A rater rates all performances as about average.

D. A rater uses only the high end of the scale in rating all performances.

18. When rating the products of student performances, it is best for educators to do which of the

following?

A. grade all the products produced by a given student before moving on to the next

student

B. identify the student so that background knowledge can be considered

C. include judgments of neatness of the product in the overall rating

D. rate performances on one task for all students before rating performance on

another task

127

19. Analytic scoring is better than holistic scoring when an educator is trying to

A. increase the efficiency of scoring.

B. provide diagnostic feedback to students.

C. remove sources of unreliability.

D. increase overall validity.

20. A checklist should be used in cases where the judgment is based on which of the following?

A. a matter of degree

B. a total impression

C. present or absent decisions

D. ambiguous criteria

21. Describe the advantages and limitations for using performance-based assessments.

22. Describe the types of rating scales used in performance assessment. Contrast a rating scale

with a checklist.

128

Chapter 11: Answer Key

1. D

2. A

3. D

4. B

5. C

6. C

7. A

8. D

9. C

10. D

11. D

12. A

13. B

14. C

15. B

16. D

17. B

18. D

19. B

20. C

21. A major advantage of performance assessments is that they can clearly communicate

instructional goals that involve complex performances in natural settings in and outside of

school. A second advantage of performance assessments is that they can measure complex

learning outcomes that cannot be measured by other means. A third advantage of

performance assessments is that they provide a means of assessing process or procedure as

well as the product that results from performing a task. Finally, a fourth advantage of

performance assessments is that they implement approaches that are suggested by modern

learning theory. Regarding limitations, the most commonly cited limitation of performance

assessments is the unreliability of ratings of performances across teachers or across time for

the same teacher. A second limitation of performance assessments is that they are time-

consuming.

22. The type of rating scale most often used in performance assessments is the numerical rating

scale. With this type of rating scale, the rater checks or circles a number to indicate the

degree to which a characteristic is present. Another type of scale is the graphic rating

scale. The distinguishing feature of the

graphic rating scale is that a horizontal line follows each characteristic. The rating is made by

placing a check on the line. A set of categories identifies specific positions along the line, but

the rater is free to check between these points. Checklists differ from rating scales in that

checklists are of the yes-no variety and thus have only two possible choices.

129

Chapter 12 Portfolios

Exercise 12-A

PURPOSES

LEARNING GOAL: Identifies and distinguishes among the major purposes of portfolios.

Directions: The poles of four dimensions distinguishing the purposes of portfolios are listed

below. Describe the ways in which the purposes of portfolios at the ends of each continuum

differ.

1. a. Instruction

b. Assessment

2. a. Current accomplishments

b. Progress

3. a. Showcase

b. Documentation

4. a. Finished

b. Working

Note: Answers will vary.

130

Exercise 12-B

STRENGTHS AND WEAKNESSES

LEARNING GOAL: Identifies major strengths and weaknesses of using portfolios of student

work for particular purposes.

Directions: Identify four strengths and three weaknesses of portfolios when used for purposes of

assessment.

Strengths:

1.

2.

3.

4.

Weaknesses:

1.

2.

3.

Note: Answers will vary.

131

Exercise 12-C

GUIDELINES FOR PORTFOLIO ENTRIES

LEARNING GOAL: Constructs sample guidelines for entries in a portfolio designed for a

specified assessment purpose.

Directions: Assuming that a portfolio is intended for assessing a student’s progress in writing

during the school year, construct sample guidelines for entries in a portfolio dealing with each of

the four issues below.

Guidelines of Uses of Portfolio:

Guidelines Regarding Access to Portfolio:

Guidelines on Portfolio Construction and Entries:

Guidelines on the Criteria for Evaluation of the Portfolio:

Note: Answers will vary.

132

Exercise 12-D

EVALUATION CRITERIA

LEARNING GOAL: Identifies characteristics of effective criteria for evaluating student

portfolios.

Directions: Indicate whether measurement specialists would agree (A) or disagree (D) with

each of the following statements concerning portfolio evaluation criteria by circling the

appropriate letter.

A D 1. Fairness is enhanced by clear specifications of evaluation criteria.

A D 2. Reliability of scores assigned to portfolios is enhanced by using holistic evaluation

criteria for the portfolio as a whole rather than criteria for individual entries.

A D 3. It is desirable for students to include self-evaluations of their work with their

portfolio entries.

A D 4. Analytic criteria are more useful for summative evaluations than for formative

evaluations of portfolio entries.

A D 5. Evaluation criteria should be communicated to students in the guidelines provided to

students for constructing their portfolios.

A D 6. Evaluation criteria, while necessary for portfolios used for assessment purposes, are

not needed for portfolios used for instructional purposes.

LEARNING GOAL: Constructs evaluation criteria for entries in a portfolio used to display best

works.

Directions: Construct a set of evaluation criteria to be used for a portfolio designed to be a

showcase of a student’s best work in a subject area of interest to you.

Note: Answers will vary.

133

Exercise 12-E

PORTFOLIO CONSTRUCTION

LEARNING GOAL: Constructs a showcase portfolio for a class.

Directions: For a class of your choice, construct a showcase portfolio to demonstrate your best

work in that subject area.

Note: Answers will vary.

134

Answers to Student Exercises

12-D

1. A

2. D

3. A

4. D

5. A

6. D

135

Chapter 12

Portfolios

1. Student portfolios are distinguished from file folders of work in that portfolios are

characterized as _________ collections of student work.

A. artistic

B. comprehensive

C. graded

D. purposeful

2. One of the strengths of portfolios that makes them appealing to many teachers is

A. the efficiency and time savings they provide to teachers.

B. the ease with which they can be integrated with instruction.

C. their high reliability.

D. their uniformity of work for purposes of grading.

3. A potential weakness in using portfolios for purposes of student assessment is that they

A. are used in parent conferences.

B. frequently include only “best work” entries.

C. lack standardization needed for comparability.

D. often include student self-evaluations of their work.

4. Portfolios of student work can be especially useful in parent-teacher conferences because

they provide parents with which of the following?

A. a complete record of student work

B. concrete examples of student accomplishments

C. grades on each entry in the portfolio

D. reliable scores that are easily understood

5. Portfolios involving student collaboration on entries are most readily justified when

portfolios are used for purposes of

A. assessment.

B. grading.

C. instruction.

D. job applications.

6. Which of the following is a major obstacle to the effective use of portfolios?

A. they are labor intensive

B. they frequently involve student collaboration

C. they include only examples of a student’s best work

D. they are unpopular with students

136

7. Which of the following is a common misperception of portfolios?

A. they can be used for communication with parents

B. they consist of a haphazard collection of student work

C. they include student self-evaluations of their work

D. they require a clear specification of purpose

8. Comparability of the selections included in a portfolio is of greatest concern when portfolios

are used for which of the following?

A. assignment of course grades

B. communication with parents

C. feedback to students

D. instructional purposes

9. The types of work that are appropriate to include in a portfolio should be

A. completely open to allow for student creativity.

B. determined solely by the student.

C. specified in the portfolio guidelines.

D. the same for all types of portfolios.

10. Analytic scoring criteria for individual portfolio entries are most useful for which of the

following purposes?

A. formative evaluation

B. parent teacher conferences

C. student self analysis

D. summative evaluation

11. Which of the following goals is least likely to be effectively achieved using portfolio

assessments?

A. communication with parents

B. demonstration of progress in achievement

C. student self-reflection concerning performance

D. the assessment of factual knowledge

12. The assignment of summative grades to students based on portfolio assessments is best done

with the use of which of the following?

A. analytic evaluation criteria

B. holistic evaluation criteria

C. peer evaluations

D. student self evaluations

13. The inclusion of collaborative assessment tasks in portfolios is particularly useful when

portfolios are used primarily for purposes of

A. assessment

B. communication with parents

C. grading

D. instruction

137

14. Reliability in rating portfolios is enhanced by which of the following?

A. clearly specified evaluation criteria

B. collaborative assessment tasks

C. student freedom to decide on the types of work to include

D. the inclusion of drafts and peer comments as well as final copy

15. Rescoring of portfolios by persons other than a student’s teacher is most likely to be needed

when portfolios are used for which of the following?

A. assigning grades to students

B. formative evaluation of student work

C. interrater reliability

D. reporting achievement to parents

16. Many of the rubric rules for grading performance assessments also apply to grading

portfolios.

A. True

B. False

17. It is best to evaluate portfolios on their appearance.

A. Agree

B. Disagree

18. Which of the following is an advantage of communicating portfolios results with parents?

A. it allows parents to have input into classroom curriculum

B. it usually results in parents liking the teacher better

C. it is used as evidence to retain a given student

D. it gives parents insight into what goes on in classrooms

19. Which of the following is a particular issue that educators need to consider when portfolios

are used for group projects?

A. Will the work be best work or work in progress?

B. Will students receive an individual grade or a group grade?

C. Should spelling and grammar count?

D. Are all of the children reading at grade level?

20. Which of the following words best describes portfolios?

A. accidental

B. systematic

C. occasional

D. voluntary

21. Describe the major advantages and limitations of portfolios.

22. Why are portfolios useful tools in communicating student progress to parents?

138

Chapter 12: Answer Key

1. D

2. B

3. C

4. B

5. C

6. A

7. B

8. A

9. C

10. A

11. D

12. B

13. D

14. A

15. C

16. A

17. A

18. D

19. B

20. B

21. Advantages to portfolios include: They are useful in integrating with student instruction; they

give students an opportunity to show what they can do; they encourage students to become

reflective learners; and they help students take responsibility for setting goals and evaluating

their progress. Other advantages: they provide teachers and students with opportunities to

collaborate and reflect on student progress; they are an effective way of communicating with

parents by showing concrete examples of student work and demonstrations of progress; they

provide a mechanism for student-centered and student-directed conferences with parents; and

they give parents concrete examples of a student’s development over time as well as current

skills. Among their limitations are that they are difficult to score, they often suffer from poor

interrater reliability, they are more difficult to construct than they at first appear, and it is

difficult to convert portfolio assessment to summative grades.

22. Portfolios provide an excellent means of communicating with parents. The products and

student self-reflections can provide parents with a window into the classroom. This gives them a

more intimate basis for seeing aspects of their children’s experiences in school. Portfolios

can also be used as a vehicle for student-directed conferences involving students, parents, and

teachers. The specifics of the portfolio provide a framework for meaningful three-way

discussions of the student’s achievements, progress, and areas to work on next. Parents’

comments on the specific entries and overall portfolio can also contribute to and become part

of the portfolios.

139

Chapter 13

Assessment Procedures: Observational Techniques, Peer Appraisal, and Self-Report

Exercise 13-A

ANECDOTAL RECORDS

LEARNING GOAL: Distinguishes between desirable and undesirable practices in using

anecdotal records.

Directions: Indicate whether each of the following statements describes a desirable (D) practice

or an undesirable (U) practice in using anecdotal records by circling the appropriate letter.

D U 1. Confine observations to areas that can be verified by objective testing.

D U 2. Keep factual descriptions of incidents and interpretations of them separate.

D U 3. Limit each anecdote to a single incident.

D U 4. Limit each anecdote to the behavior of only one student.

D U 5. Wait until after school hours to record the observed incidents.

D U 6. Record both positive and negative incidents.

LEARNING GOAL: Writes an anecdotal record on an incident.

Directions: Briefly observe some aspect of student performance (e.g., speaking, playing a game)

and write an anecdotal record of the incident.

Note: Answers will vary.


Exercise 13-B

USE OF PEER APPRAISAL AND SELF-REPORT TECHNIQUES

LEARNING GOAL: Selects the most appropriate technique for a particular use.

Directions: Indicate which technique is most appropriate for each of the uses listed below by

circling the appropriate letter. Use the key below.

KEY G = “Guess who” technique, S = Sociometric technique,

A = Attitude scale, I = Interest inventory

G S A I 1. To analyze the social structure of a group.

G S A I 2. To aid in selecting reading material for a poor reader.

G S A I 3. To determine the reputation a student holds among his or her classmates.

G S A I 4. To see how accurately students rate peers’ talents and abilities.

G S A I 5. To determine how well a particular student is accepted by his or her

classmates.

G S A I 6. To aid students in career planning.

LEARNING GOAL: States the advantages and disadvantages of peer appraisal and self-report

techniques.

Directions: Briefly state one advantage and one disadvantage of each of the following

techniques.

Peer appraisal:

Self report:

Note: Answers will vary.


Exercise 13-C

GUESS WHO TECHNIQUE

LEARNING GOAL: Distinguishes between desirable and undesirable practices in using the

guess who technique.

Directions: Indicate whether each of the following statements describes a desirable (D) practice

or an undesirable (U) practice in using the guess who technique by circling the appropriate

letter.

D U 1. Use only clearly favorable behavior descriptions.

D U 2. Have students write as many names as they wish for each behavior description.

D U 3. Permit students to name a person for more than one behavior description.

D U 4. Have students respond by using first name and initial of last name.

D U 5. Use the “guess who” technique for evaluating personal and social development

only.

D U 6. Score the responses by counting the number of nominations a student

receives on each behavior description.

LEARNING GOAL: Constructs items for a “guess who” form.

Directions: List six statements that could be used in a “guess who” form for evaluating students’

“study and work habits.”

Note: Answers will vary.


Exercise 13-D

SOCIOMETRIC TECHNIQUE

LEARNING GOAL: Distinguishes between desirable and undesirable practices in using the

sociometric technique.

Directions: Indicate whether each of the following statements describes a desirable (D) practice

or an undesirable (U) practice in using the sociometric technique by circling the appropriate

letter.

D U 1. Students should identify themselves on sociometric assessment instruments.

D U 2. The situations used in sociometric choosing should be ones in which all students are

equally free to participate.

D U 3. Students should be told to state their first choice only, in order to simplify the

tabulation of results.

D U 4. The plotted sociogram should show the social position of each student and the social

pattern of the group.

D U 5. Each student should be shown, in an individual conference, his or her place on the

sociogram.

D U 6. Sociometric choices should be used to assess the influence of school practices on

students' social relations.

LEARNING GOAL: Constructs items for a sociometric form.

Directions: List three choice situations to be used on a sociometric form. The situations should

be suitable for the grade level at which they will be used. Indicate the grade level. Do not use any

sample items from your textbook.

Note: Answers will vary.


Exercise 13-E

ATTITUDE MEASUREMENT

LEARNING GOAL: Distinguishes between desirable and undesirable practices in using a

Likert-type attitude scale.

Directions: Indicate whether each of the following statements describes a desirable (D) practice

or an undesirable (U) practice in using a Likert-type attitude scale by circling the appropriate

letter.

D U 1. Use only clearly favorable and unfavorable attitude statements.

D U 2. Have students write statements for use in the attitude scale.

D U 3. Use seven or more scale choices for each question.

D U 4. Have students respond by indicating how strongly they agree or disagree.

D U 5. Use a group of judges to obtain scoring weights.

D U 6. Have students put their names on the attitude scale.

LEARNING GOAL: Constructs a Likert-type attitude scale.

Directions: List six statements that could be used to measure student attitudes toward testing

according to a Likert-type scale. Include a place to respond and the scoring weights for each

item.

Note: Answers will vary.


Answers to Student Exercises

13-A 13-B 13-C 13-D 13-E

1. U 1. S 1. U 1. U 1. D

2. D 2. I 2. D 2. D 2. U

3. D 3. G 3. D 3. U 3. U

4. U 4. G 4. D 4. D 4. D

5. U 5. S 5. U 5. U 5. D

6. D 6. I 6. D 6. D 6. U


Chapter 13

Assessment Procedures: Observational Techniques, Peer Appraisal, and Self-Report

1. Anecdotal records can be made most useful by observing a student under which of the

following conditions?

A. at the same time each day

B. for the same amount of time for each observation

C. in various situations

D. under standardized conditions

2. Which of the following is the most serious limitation in the use of anecdotal records in the

classroom?

A. the lack of opportunities to observe

B. the lack of student cooperation

C. possible bias in the observations

D. the need for several observers

3. When writing anecdotal records, educators should try to include an objective record of

students’

A. attitudes.

B. motivation.

C. unique behavior.

D. values.

4. Anecdotal records are best for obtaining information about which of the following student

skills?

A. mathematical

B. social

C. science

D. writing

5. The value of anecdotal records can be improved by recording student behavior

A. after school, when a complete record can be written.

B. in positive terms only.

C. occurring in a variety of situations.

D. on cards instead of a note pad.

6. Which of the following methods is best for highlighting evidence of exceptional or atypical

student behavior on the playground?

A. Anecdotal record

B. Checklist

C. Peer appraisal

D. Self-appraisal


7. Peer appraisal methods are especially useful in which of the following areas?

A. attitudes

B. interests

C. performance skills

D. social skills

8. Self-report techniques are most useful when

A. frank responses are given.

B. the items are in question form.

C. they are scored objectively.

D. they are interpreted by counselors.

9. Students make nominations to fit behavior descriptions when using which of the following

techniques?

A. guess who

B. paired-comparison

C. sociometric

D. self-report

10. The results of the guess who technique should be interpreted as evidence of how students are

A. treated by others.

B. viewed by others.

C. feeling about school activities.

D. feeling about themselves.

11. The self-report technique is likely to provide valid evidence when used to assess students'

A. achievement.

B. interests.

C. personal adjustment.

D. personality traits.

12. Which of the following best describes an attitude scale?

A. an objective test

B. a peer-appraisal method

C. a projective test

D. a self-report method

13. A Likert-type attitude scale should include which of the following characteristics?

A. clearly favorable and unfavorable statements

B. questions at progressing difficulty levels

C. questions measuring personal abilities and skills

D. statements covering negative attitudes


14. Which of the following statements would be best for a Likert-type attitude scale?

A. Reading helps you study better.

B. Reading is exciting.

C. Reading is one of the basic skills.

D. Some students like to read.

15. How many scale choices should a Likert-type scale typically offer?

A. 1–2

B. 2–3

C. 3–5

D. 7 or more

16. Which of the following is assumed in a student self-assessment?

A. objectivity

B. telling the truth

C. writing ability

D. long-term memory

17. A student's response to each set of three items on a Kuder General Interest Survey indicates

which of the following results?

A. disliking of each item

B. liking, indifference, or disliking of each item

C. liking of each set of 3 items in comparison to other sets

D. ranking of items in each set

18. The routine use of personality inventories in the school has declined primarily because of

which of the following factors?

A. difficulty of scoring

B. invasion of privacy issues

C. unreliability of the scores

D. wider use of projective techniques

19. It takes special training to appropriately administer and score projective personality

assessments.

True

False

20. Projective personality assessments are usually scored with a Likert scale.

Agree

Disagree

21. Describe the advantages and limitations for using anecdotal records.

22. Describe the characteristics of a Likert scale. For what purpose might it be used?


Chapter 13: Answer Key

1. C

2. C

3. C

4. B

5. C

6. A

7. D

8. A

9. A

10. B

11. B

12. D

13. A

14. B

15. C

16. B

17. D

18. B

19. True

20. Disagree

21. Probably the most important advantage of anecdotal records is that they depict actual

behavior in natural situations. Anecdotal records also allow for descriptions of the most

characteristic behavior of a student, and they facilitate gathering evidence on events that are

exceptional but significant. Anecdotal records can also be used with very young students and

with students who have limited basic communication skills. One limitation of anecdotal

records is the amount of time required to maintain an adequate system of records. Another

serious limitation is the difficulty of being objective when observing and reporting student

behavior. A third difficulty is obtaining an adequate sample of behavior, which can

negatively impact validity.

22. A Likert scale is a self-report method giving clearly favorable or unfavorable attitude

statements; it asks the students to respond to each statement. Most Likert scales use a five-

point system: strongly agree (SA), agree (A), undecided (U), disagree (D), and strongly

disagree (SD). Likert scales are most commonly used to assess attitudes.
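The five-point weighting described above can be made concrete with a short scoring sketch. The statements and responses below are hypothetical illustrations (not from the manual); reverse-weighting unfavorable statements follows the usual Likert convention:

```python
# Score a Likert-type attitude scale: SA/A/U/D/SD weighted 5-1 for
# favorable statements, with the weights reversed for unfavorable ones.
WEIGHTS = {"SA": 5, "A": 4, "U": 3, "D": 2, "SD": 1}

def score_item(response, favorable=True):
    w = WEIGHTS[response]
    return w if favorable else 6 - w  # reverse the weight for unfavorable items

# Hypothetical four-item scale: two favorable, two unfavorable statements.
responses = [("SA", True), ("D", False), ("A", True), ("SD", False)]
total = sum(score_item(r, fav) for r, fav in responses)
print(total)  # 5 + 4 + 4 + 5 = 18
```

Summing the item weights gives a total attitude score; higher totals indicate a more favorable attitude.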


Chapter 14

Assembling, Administering, and Appraising Classroom Tests and Assessments

Exercise 14-A

REVIEWING AND ARRANGING ITEMS AND TASKS IN CLASSROOM TESTS AND

ASSESSMENTS

LEARNING GOAL: Distinguishes between good and bad practices in reviewing and arranging

test items and assessment tasks.

Directions: Indicate whether each of the following statements describes a good (G) practice or a

bad (B) practice in reviewing items and tasks and arranging them in classroom tests and

assessments by circling the appropriate letter.

G B 1. Have another teacher review the items and tasks for defects.

G B 2. Recheck relevance to the specific learning outcome when reviewing an item or

task.

G B 3. During item and task review, remove any racial or sexual stereotyping.

G B 4. Group items by type (e.g., multiple-choice, true-false).

G B 5. Intersperse true-false items among multiple-choice items.

G B 6. Put easy items last to maintain student motivation.

LEARNING GOAL: Prepares a list of points for reviewing and arranging test items and

assessment tasks.

Directions: Make a list of four do’s and four don’ts to serve as a guide for reviewing and

arranging test items and assessment tasks.

DO

1.

2.

3.

4.

DON'T

1.

2.

3.

4.

Note: Answers will vary.


Exercise 14-B

PREPARING TEST DIRECTIONS

LEARNING GOAL: Prepares sample directions for a classroom test.

Directions: Prepare a complete set of directions for a test in a specific subject. Assume that the

general directions for the test as a whole and the specific directions for each item type are all for

the same test.

General directions:

Short-answer items:

True-false items:

Multiple-choice items:

Matching items:

Note: Answers will vary.


Exercise 14-C

ADMINISTERING AND SCORING CLASSROOM TESTS AND ASSESSMENTS

LEARNING GOAL: Distinguishes between good and bad practice in administering and scoring

classroom tests and assessments.

Directions: Indicate whether each of the following statements describes a good (G) practice or a

bad (B) practice in administering and scoring classroom tests and assessments by circling the

appropriate letter.

G B 1. Students are told whether there is a correction for guessing.

G B 2. Students answer every item, so their scores are corrected for guessing.

G B 3. Students are told repeatedly how important this test is to their grade.

G B 4. Students are told to skip items that seem too difficult and come back to them later.

G B 5. The teacher explains the meaning of an ambiguous question to the student who

asked about it.

G B 6. An objective test is scored by counting important items 1 point and very

important items 2 points.

LEARNING GOAL: Describes and illustrates the use of the correction-for-guessing formula.

Directions: Describe when the correction-for-guessing formula should and should not be used

for classroom tests and compare the corrected scores for the given data.

Use the correction formula when:

Do not use the correction formula when:

Compute the corrected scores on an eight-item, true-false test for the student responses shown

below:

(R = Right, W = Wrong, O = Omit)

                                          Corrected
          1   2   3   4   5   6   7   8     Score
Bob       R   R   R   R   R   R   O   O   _________
Sara      R   R   R   R   R   W   R   W   _________
Terry     R   R   R   W   O   R   W   W   _________
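The corrected scores can be checked with a short sketch. It applies the usual correction formula, corrected = R - W/(k - 1), where k is the number of choices per item (k = 2 for true-false items, so the correction reduces to R - W):

```python
# Correction-for-guessing: corrected = R - W/(k - 1), where k is the number
# of choices per item (k = 2 for true-false). Omits count as neither right
# nor wrong and are not penalized.
def corrected_score(answers, k=2):
    r = answers.count("R")
    w = answers.count("W")
    return r - w / (k - 1)

students = {
    "Bob":   ["R", "R", "R", "R", "R", "R", "O", "O"],
    "Sara":  ["R", "R", "R", "R", "R", "W", "R", "W"],
    "Terry": ["R", "R", "R", "W", "O", "R", "W", "W"],
}
for name, answers in students.items():
    print(name, corrected_score(answers))  # Bob 6.0, Sara 4.0, Terry 1.0
```

Because omits are not penalized, Bob's corrected score is unaffected by his two omitted items.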


Exercise 14-D

APPLICATION OF ITEM ANALYSIS PRINCIPLES TO PERFORMANCE-BASED

ASSESSMENT TASKS

LEARNING GOAL: Applies and interprets item analysis principles with performance-based

assessment tasks.

Directions: A set of eight performance-based assessment tasks was administered to a group of

30 students. Each task was scored on a five-point scale. The total score for the assessment was

the sum of the eight task scores. Total scores for the assessment and scores on the last task of the

assessment are listed below. The scores are listed in order of the total score. Use these data to

analyze the discriminating power of the last task by comparing the performance of the upper and

lower groups of ten students.

Student          1   2   3   4   5   6   7   8   9  10
Total score     36  35  35  34  33  33  33  32  32  32
Item 8 score     5   5   4   5   4   4   4   3   5   3

Student         11  12  13  14  15  16  17  18  19  20
Total score     31  30  30  30  29  29  29  29  28  28
Item 8 score     4   4   3   3   4   3   3   3   3   2

Student         21  22  23  24  25  26  27  28  29  30
Total score     27  27  26  25  23  23  21  20  18  16
Item 8 score     4   3   3   3   3   2   3   2   1   1

Construct an analysis table for Item 8.


Answers to Student Exercises

14-A        14-C
1. G        1. G
2. G        2. B
3. G        3. G
4. G        4. G
5. B        5. G
6. B        6. B

14-D

Score 1 2 3 4 5

Upper 0 0 2 4 4

Lower 2 2 5 1 0
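The 14-D tabulation can also be generated mechanically; a minimal sketch, with the Item 8 scores transcribed from the exercise:

```python
from collections import Counter

# Item 8 scores for the upper and lower ten students (from Exercise 14-D);
# tally how often each rating 1-5 occurs in each group.
upper = [5, 5, 4, 5, 4, 4, 4, 3, 5, 3]
lower = [4, 3, 3, 3, 3, 2, 3, 2, 1, 1]

for label, scores in (("Upper", upper), ("Lower", lower)):
    counts = Counter(scores)
    print(label, [counts.get(s, 0) for s in range(1, 6)])
# Upper [0, 0, 2, 4, 4]
# Lower [2, 2, 5, 1, 0]

# Higher ratings cluster in the upper group (mean 4.2 vs. 2.5), so the
# task discriminates positively.
print(sum(upper) / 10, sum(lower) / 10)  # 4.2 2.5
```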


Chapter 14

Assembling, Administering, and Appraising Classroom Tests and Assessments

1. A fellow teacher’s review of test items and assessment tasks is helpful in

A. identifying the objectives measured.

B. improving clarity.

C. improving difficulty.

D. relating items and tasks to instruction.

2. When arranging items in a test, it is best to ensure which of the following?

A. item types are mixed within each section

B. essay questions are placed last

C. difficult items are placed first

D. items are placed randomly

3. If test directions instruct students to “answer every item,” it is not recommended that

educators

A. compute item difficulty.

B. compute item discrimination.

C. correct for guessing.

D. determine split-half reliability.

4. Which of the following is a desirable procedure for reducing student cheating on a test?

A. Correct the test for guessing

B. Do not permit questions during testing

C. Have students turn in all scratch paper

D. Allow students to use class notes

5. The correction-for-guessing formula assumes that student guesses are based on which of the

following?

A. blind choosing

B. incorrect information

C. partial information

D. testwiseness

6. A multiple-choice test contains 100 items, each having a correct answer and three distracters.

Which of the following would be the corrected score for 70 correct and 24 incorrect on the

test using the correction-for-guessing formula?

A. 58

B. 60

C. 62

D. 64


7. The effect of guessing on scores can best be reduced on a multiple-choice test by increasing

which of the following?

A. complexity of the items

B. number of alternatives

C. objectivity of the items

D. use of interpretive exercises

8. A test item has positive discriminating power when answered correctly by

A. all students.

B. more high-scoring students.

C. more low-scoring students.

D. average students.

9. On a test of 50 students, if 25 students answered an item correctly, the item difficulty is

A. 25%.

B. 50%.

C. 75%.

D. 100%.

10. If item analysis data showed that an item was answered correctly by 8 out of 10 students in

the upper group and 6 out of 10 students in the lower group, the difficulty of the test item is

A. 20%.

B. 60%.

C. 70%.

D. 80%.

11. Low discriminating power is acceptable, but only if the item

A. has a 50% level of difficulty.

B. has only three alternatives.

C. is closely related to other items.

D. measures a unique learning outcome.

12. Which of the following is a major limitation of using item analysis procedures with the

typical classroom test?

A. the complexity of the computations

B. the difficulty of the interpretations

C. the small number of students

D. the time needed to collect the data

13. If 8 out of 10 students in the upper group and 2 out of 10 students in the lower group answer

an item correctly, then the difficulty and discriminating power of the item would be

A. 50%, .60.

B. 50%, .80.

C. 60%, .60.

D. 60%, .80.
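Items 9, 10, and 13 all use the standard upper-lower formulas: difficulty is the proportion of students answering correctly across both groups, and discriminating power is the difference between the groups divided by the group size. A quick check for items 10 and 13, assuming ten students per group:

```python
# Upper-lower item analysis: difficulty p = (Ru + Rl) / (nu + nl) and
# discriminating power D = (Ru - Rl) / nu, with n students per group.
def item_stats(right_upper, right_lower, n=10):
    p = (right_upper + right_lower) / (2 * n)  # difficulty (proportion correct)
    d = (right_upper - right_lower) / n        # discrimination index
    return p, d

print(item_stats(8, 6))  # item 10: (0.7, 0.2) -> difficulty 70%
print(item_stats(8, 2))  # item 13: (0.5, 0.6) -> difficulty 50%, D = .60
```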


14. A distracter in a multiple-choice item is judged good if it attracts more students who have

A. cheated on the test.

B. obtained high scores.

C. obtained low scores.

D. marked the items carelessly.

15. The 10 students with the highest scores on a set of 8 assessment tasks had the following

distribution of scores on one of the assessment tasks that was scored on a 3-point scale: 1 (1

student), 2 (5 students), 3 (4 students). Which of the following distributions of scores for the

10 lowest scoring students indicates that the task discriminated negatively?

A. 1 (3 students), 2 (4 students), 3 (3 students)

B. 1 (5 students), 2 (5 students), 3 (0 students)

C. 1 (9 students), 2 (1 student), 3 (0 students)

D. 1 (0 students), 2 (3 students), 3 (7 students)

16. If a test is composed of items that all have high discrimination indexes (based on the total test

score), the test is said to also have

A. content relevance.

B. difficulty.

C. high reliability.

D. high predictive power.

17. Item discriminating power should typically not be interpreted as item validity because

A. item analysis usually is based on a partial sample.

B. item analysis usually uses an internal criterion.

C. item validity is also based on item difficulty.

D. defects in test items lower validity.

18. In a 100-item criterion-referenced test, how many items should have zero difficulty at the end

of instruction?

A. none

B. 10

C. 50

D. 100


19. An item analysis yields the following results for a multiple-choice item (alternative A is the

correct answer):

Number of students in the upper and lower scoring groups on the total test choosing each

alternative.

Alternative     A*   B   C   D   Omit   Total
Upper Group      3   1   5   1    0      10
Lower Group      6   2   0   1    1      10

Which one of the following statements is NOT justified by these results?

A. The item should be reviewed for possible miskeying.

B. The item has negative discrimination.

C. The item has a difficulty of 45%.

D. The item is invalid.

20. It is probably a sound motivational procedure for the teacher to announce that an upcoming

test is high stakes for students.

True

False

21. One good method for discouraging cheating is to proctor an exam by moving around the

room.

Agree

Disagree

22. If paper costs are a factor, educators can reduce the print size of test questions.

True

False

23. It is good practice to not break up a test item and continue it on the next page.

Agree

Disagree

24. It is good practice to have a colleague check test items for bias.

True

False


Chapter 14: Answer Key

1. B

2. B

3. C

4. C

5. A

6. C

7. B

8. B

9. B

10. C

11. D

12. C

13. A

14. C

15. D

16. C

17. B

18. A

19. D

20. True

21. Agree

22. False

23. Agree

24. True


Chapter 15 Grading and Reporting

Exercise 15-A

TYPES OF MARKING AND REPORTING SYSTEMS

LEARNING GOAL: Distinguishes among the characteristics of different types of marking and

reporting systems.

Directions: Indicate which type of marking and reporting system best fits each statement listed

below by circling the appropriate letter, using the following key.

KEY: A = traditional letter grade (A, B, C, D, F), B = Two-letter grade (pass, fail),

C = Checklist of objectives, D = Parent-teacher conference.

A B C D 1. Provides for two-way reporting.

A B C D 2. Provides most useful learning guide to student.

A B C D 3. Provides least information concerning learning.

A B C D 4. Most preferred by college admissions officers.

A B C D 5. May be too complex to be understood by parents.

A B C D 6. Most widely used method of reporting in high school.

LEARNING GOAL: Lists the advantages and disadvantages of the traditional (A, B, C, D, F)

marking system.

Directions: List the advantages and disadvantages of using the traditional (A, B, C, D, F)

marking system as the sole method of reporting student progress.

Advantages:

Disadvantages:

Note: Answers will vary.


Exercise 15-B

ASSIGNING RELATIVE LETTER GRADES

LEARNING GOAL: Distinguishes between desirable and undesirable practices in assigning

relative letter grades.

Directions: Indicate whether each of the following statements describes a desirable (D) practice

or an undesirable (U) practice in assigning relative letter grades by circling the appropriate

letter.

D U 1. The grades should reflect the learning outcomes specified for the course.

D U 2. To give test scores equal weight in a composite score, the scores should be

simply added together.

D U 3. If you decide to assign different weights to some scores, the weighting should be

based on the maximum possible score on the test.

D U 4. Grades should be lowered for tardiness or misbehavior.

D U 5. Grading typically should be based on the normal curve.

D U 6. Pass-fail decisions should be based on an absolute standard of achievement.

LEARNING GOAL: Assigns weights in obtaining composite scores for grading purposes.

Directions: Following is a list of types of information a teacher would like to include in

assigning a final grade to each student. If the teacher wants to count each type of information

one-fourth of the final grade, what weight should be given to each type of information?

Type of Information        Range of Scores    Weight to Be Used
Midsemester examination    30 to 50           _________________
Term project               5 to 10            _________________
Performance assessments    15 to 25           _________________
Final examination          20 to 100          _________________

Note: Answers will vary.
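Answers will vary, but item 11 of this chapter's test notes that weighting should reflect the spread of scores on each component, not the maximum possible score. One simple approach, using the score range as a rough stand-in for variability (an assumption for illustration), is to scale every component's range up to the largest one:

```python
# Give each component equal weight by equalizing the spread of scores:
# multiply each set of scores so all ranges match the largest range (80 here).
components = {
    "Midsemester examination": (30, 50),
    "Term project": (5, 10),
    "Performance assessments": (15, 25),
    "Final examination": (20, 100),
}
ranges = {name: hi - lo for name, (lo, hi) in components.items()}
largest = max(ranges.values())  # 80, from the final examination
weights = {name: largest // r for name, r in ranges.items()}
print(weights)  # {'Midsemester examination': 4, 'Term project': 16, ...}
```

Under this scheme the weights are 4, 16, 8, and 1, so each component's spread contributes equally (80 points) to the composite.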


Exercise 15-C

ASSIGNING ABSOLUTE GRADES

LEARNING GOAL: Distinguishes between desirable and undesirable practices in assigning

absolute grades.

Directions: Indicate whether each of the following statements describes a desirable (D) practice

or an undesirable (U) practice in assigning absolute letter grades by circling the appropriate

letter.

D U 1. Absolute grades should be used with mastery learning.

D U 2. Clearly defined domains of learning tasks should provide the basis for

grading.

D U 3. If all students pass a test, a harder test should be given before grades are

assigned.

D U 4. The distribution of grades to be assigned should be predetermined and

explained.

D U 5. Criterion-referenced grades may be assigned as pass/fail.

D U 6. When you are using absolute grading, the standard for passing should be

predetermined.

LEARNING GOAL: Lists guidelines for effective grading.

Directions: List five important guidelines for effective grading.

1.

2.

3.

4.

5.

Note: Answers will vary.


Exercise 15-D

PARENT-TEACHER CONFERENCE

LEARNING GOAL: Distinguishes between desirable and undesirable practices in conducting a

parent-teacher conference.

Directions: Indicate whether each of the following statements describes a desirable (D) practice

or an undesirable (U) practice in conducting parent-teacher conferences by circling the

appropriate letter.

D U 1. Before the conference, assemble a portfolio of specific information about and

examples of the student's learning progress.

D U 2. Present examples of the student’s work to parents.

D U 3. Begin the conference by describing the student’s learning difficulties.

D U 4. Make clear to parents that, as a teacher, you know what is best for the

student's learning and development.

D U 5. In the concluding phase, review your conference notes with the parents.

D U 6. End the conference with a positive comment about the student.

LEARNING GOAL: Lists questions that might be asked of parents during the conference.

Directions: Write a list of questions that you could ask parents during parent-teacher conferences

that might help you better understand students’ problems regarding learning and development.

Note: Answers will vary.


Exercise 15-E

REPORTING RESULTS OF PUBLISHED TESTS TO PARENTS

LEARNING GOAL: Distinguishes between desirable and undesirable practices in reporting

results of published tests to parents.

Directions: Indicate whether each of the following statements describes a desirable (D) practice

or an undesirable (U) practice in reporting results of published tests to parents by circling the

appropriate letter.

D U 1. Describe what the test measures in brief, understandable terms.

D U 2. Make clear the distinction between percentile rank and percentage-correct scores.

D U 3. Use grade-equivalent scores to indicate the grade at which the student can

perform.

D U 4. Give explanations without using jargon whenever possible.

D U 5. Describe a difference between two test scores as a “real difference” only after the

error of measurement is considered.

D U 6. Explain how the test results will be used only if the parent asks.

LEARNING GOAL: Identifies and corrects common errors in reporting standardized test results.

Directions: Indicate what is wrong with each of the following statements and rewrite each one so

that it provides an accurate report.

1. “Derek’s percentile rank of 70 in spelling means he can spell 70 percent of the words in

the test.”

2. “Marie’s stanine score of 6 in reading indicates she is performing below average in

reading.”

3. “Erik’s grade-equivalent scores of 5.4 in reading and 6.2 in math indicate that his

performance in math is superior to his performance in reading.”

Note: Answers will vary.


Answers to Student Exercises

15-A 15-B 15-C 15-D 15-E

1. D 1. U 1. D 1. D 1. D

2. C 2. D 2. D 2. D 2. D

3. B 3. U 3. D 3. D 3. D

4. A 4. U 4. U 4. D 4. D

5. C 5. U 5. D 5. D 5. D

6. A 6. D 6. D 6. U 6. U


Chapter 15

Grading and Reporting

1. The main purpose of a marking and reporting system should be to accomplish which of the

following?

A. improve student learning

B. inform parents about students’ school progress

C. maintain effective school records

D. provide evidence of achievement for colleges and employers

2. A serious limitation of reporting progress with a single letter grade only is that letter grades

A. are disliked by administrators.

B. are difficult to average.

C. include too many different elements.

D. tend to be limited to achievement.

3. When a letter grade (A, B, C, D, F) is used to report student progress, the grade should be

based on which of the following?

A. achievement

B. effort

C. attitude

D. behavior

4. A school’s marking and reporting system should be based on which of the following?

A. estimates of students’ learning ability

B. fixed percentages of grades

C. instructional objectives

D. a normal curve

5. An effective grading and reporting system is based on which of the following?

A. adequate assessment of students

B. estimates of each student's learning potential

C. the normal curve

D. the use of at least five test scores

6. Which of the following methods is most useful for overcoming learning difficulties?

A. Checklist of objectives

B. Informal letter to parents

C. Pass-fail system

D. Single letter grade


7. Assigning grades that represent a pure measure of achievement is most feasible with which

of the following systems?

A. multiple marking

B. pass-fail

C. satisfactory-unsatisfactory

D. single letter grade

8. Which of the following is a major disadvantage of using the pass-fail system?

A. it tends to lower the grade-point average

B. it doesn’t assess gradations of learning

C. students take courses they are unable to pass

D. teachers find grading more difficult

9. One advantage of the pass-fail grading system in elective courses is that it

A. encourages students to explore new areas of study.

B. helps students improve their grade-point average.

C. makes criterion-referenced grading possible.

D. motivates students to study harder.

10. Which of the following is the most serious limitation of the traditional letter grade as a means

of reporting student progress?

A. colleges prefer more detailed reports

B. schools have not agreed upon a common set of letters

C. they are limited to academic learning outcomes

D. they lack common meaning from one teacher to another

11. When weighting sets of test scores to obtain a composite score for assigning grades, the

weighting should be based on which of the following?

A. average score on each test

B. right score on each test

C. number of items on each test

D. spread of scores on each test

12. Absolute grading would require information concerning a student’s

A. growth in achievement.

B. level of performance.

C. performance in relation to learning ability.

D. rank in the group.

13. Relative grading involves comparing a student’s achievement to which of the following?

A. a set of norms

B. the student’s learning ability

C. the student’s past performance

D. the achievement levels of the other students


14. Mastery learning would most likely require which of the following types of grading?

A. absolute

B. relative

C. curved

D. stratified

15. The distribution of letter grades (A, B, C, D, F) to be assigned in relative grading should be

determined by which of the following?

A. school aides and parents

B. class averages

C. teacher and administrator agreement

D. the percentages in a normal distribution

16. One advantage of a parent-teacher conference as a reporting method is

A. its ease of use.

B. its flexibility.

C. the systematic record it provides.

D. the time it saves in preparing written reports.

17. During parent-teacher conferences, which of the following actions should be avoided?

A. beginning by discussing the student’s weaknesses

B. interruptions by the parent during the teacher’s report

C. listening to parents’ complaints about school

D. telling the parents anything negative about the child

18. Near the end of a parent-teacher conference, it is most important for the teacher to

A. clarify the student’s shortcomings.

B. plan how the conference can be ended on time.

C. summarize and plan a course of action.

D. tell the parents what they are expected to do next.

19. When reporting standardized test results to parents, the explanation should be

A. complete and detailed.

B. kept separate from other assessment information.

C. presented in simple terms.

D. repeated for clarity.

20. Which of the following is the most useful score for reporting standardized test results to

parents?

A. NCE score

B. percentile rank

C. raw score

D. T-score


21. Scores such as T-scores are based on variability.

True

False

22. Sending letters home to the parent or guardian as a report card assumes that the adult is able

to understand the vocabulary on the report.

Agree

Disagree

23. It is appropriate to collapse achievement and effort into a single letter grade.

True

False

24. Record keeping of student grades may be streamlined by using computer spreadsheets.

Agree

Disagree

25. List and discuss three major limitations to letter grading systems.

26. Under what circumstances are letters to parents useful in regards to reporting student grades?

Should this system be the sole method of reporting grades? Why or why not?


Chapter 15: Answer Key

1. A

2. C

3. A

4. C

5. A

6. A

7. A

8. B

9. A

10. D

11. D

12. B

13. D

14. A

15. C

16. B

17. A

18. C

19. C

20. B

21. True

22. Agree

23. True

24. Agree

25. There are three major limitations to traditional letter grades. First, they typically represent a

combination of achievement, effort, work habits, and good behavior. Second, the proportion

of students assigned each letter grade varies from teacher to teacher. Third, they do not

indicate a student’s specific strengths and weaknesses in learning. In order to reduce the

effects of these limitations, grades for effort should be eliminated. If such effort scores are to

be recorded, they should exist as a separate grade from letter grades for achievement. Next, a

school- or district-wide standard should exist for what constitutes a given letter

grade. Finally, letter grades should be accompanied by another grading system, such as

checklists, that outlines students’ strengths and weaknesses.

26. Some schools have turned to the use of letters to provide for greater flexibility in reporting

student progress to parents (or guardians). Letters make it possible to report on the unique

strengths, weaknesses, and learning needs of each student and to suggest specific plans for

improvement. In addition, the letter/report can include as much detail as needed to make

clear the student’s progress in all areas of development. However, these letters should not be

the sole method of grading and, if used, should be combined with other methods such as letter

grades.


Chapter 16 Achievement Tests

Exercise 16-A

STANDARDIZED ACHIEVEMENT TESTS VERSUS INFORMAL CLASSROOM

TESTS

LEARNING GOAL: Identifies the comparative advantages of standardized and informal

classroom tests for measuring student achievement.

Directions: Indicate whether each of the following statements best describes a standardized

achievement test (S) or an informal classroom test (C) by circling the appropriate letter.

S C 1. Likely to be more relevant to a teacher's instructional objectives.

S C 2. Likely to provide more reliable test scores.

S C 3. Technical quality of test items is consistently high.

S C 4. Most useful in formative assessment.

S C 5. Typically provides the larger spread of scores.

S C 6. Best for use in rapidly changing content areas.

LEARNING GOAL: States a major advantage and limitation of standardized achievement tests.

Directions: Briefly state one major advantage and one major limitation of standardized

achievement tests.

Advantage:

Limitation:

Note: Answers will vary.


Exercise 16-B

USE OF PUBLISHED ACHIEVEMENT TEST BATTERIES

LEARNING GOAL: Selects the most appropriate type of achievement test battery for a

particular use.

Directions: Indicate which type of test is most useful for each of the following testing purposes

by circling the appropriate letter using the following key.

KEY S = Survey achievement test battery

D = Diagnostic achievement test battery

S D 1. To compare schools on basic skill development.

S D 2. To describe the specific skills a student has yet to learn in reading.

S D 3. To measure achievement in science and social studies.

S D 4. To detect specific weaknesses in adding fractions.

S D 5. To determine how a fifth-grade class compares to other fifth grade classes in

reading.

S D 6. To determine mastery of particular language skills.

LEARNING GOAL: States a major advantage and limitation of achievement test batteries.

Directions: Briefly state one major advantage and one major limitation of achievement test

batteries of the survey type.

Advantage:

Limitation:

Note: Answers will vary.


Exercise 16-C

COMPARISON OF READING READINESS TESTS AND READING SURVEY TESTS

LEARNING GOAL: Identifies the functions measured by different types of reading tests.

Directions: Indicate whether each of the functions listed below is measured by reading readiness

tests (R), by reading survey tests (S), by both (B), or by neither (N), by circling the appropriate

letter.

R S B N 1. Auditory discrimination.

R S B N 2. Comprehension of the meaning of words.

R S B N 3. Ability to draw inferences.

R S B N 4. Ability to read maps.

R S B N 5. Rate of reading.

R S B N 6. Attitude toward reading.

LEARNING GOAL: Compares reading tests.

Directions: Compare two reading survey tests (or readiness tests) for a particular grade level. (1)

Briefly describe how the two tests differ, and (2) indicate which one you would prefer to use for

a particular purpose and why.

Note: Answers will vary.


Exercise 16-D

COMPARISON OF STANDARDIZED AND CUSTOMIZED ACHIEVEMENT TESTS

LEARNING GOAL: Distinguishes among the characteristics of different types of achievement

tests.

Directions: Indicate which type of test best fits each feature listed below by circling the

appropriate letter, using the following key.

KEY: S = Standardized achievement tests

C = Customized achievement tests

S C 1. Are most useful to the classroom teacher.

S C 2. Have the greatest need for adequate norms.

S C 3. Most adaptable to changing conditions.

S C 4. Most likely to have some content the students have not studied.

S C 5. Best for making criterion-referenced interpretations.

S C 6. Likely to provide the most valid measure of local instructional objectives.

LEARNING GOAL: Describes the procedure for producing customized achievement tests.

Directions: List and briefly describe the procedural steps to follow in producing locally prepared

customized achievement tests.

Note: Answers will vary.


Exercise 16-E

SELECTING PUBLISHED ACHIEVEMENT TESTS

LEARNING GOAL: Selects the type of test that is most appropriate for a particular purpose.

Directions: For each of the following statements, indicate which type of test would be used by

circling the appropriate letter using the following key.

KEY: A = Achievement Test Battery, B = Separate Test of Content,

C = Customized Achievement Test, D = Individual Achievement Test

A B C D 1. To test a student’s mastery of classroom objectives.

A B C D 2. To test a student who has a learning disability.

A B C D 3. To compare a student's performance in reading and mathematics.

A B C D 4. To test students at the end of each unit.

A B C D 5. To give a science test to a child who has difficulty in reading.

A B C D 6. To measure student progress from one grade level to the next.

LEARNING GOAL: Compares the usefulness of customized tests and standardized tests.

Directions: State one advantage and one disadvantage of using a customized test instead of a

standardized test to measure student achievement.

Advantage:

Disadvantage:

Note: Answers will vary.


Answers to Student Exercises

16-A: 1. C, 2. S, 3. S, 4. C, 5. S, 6. C

16-B: 1. S, 2. D, 3. S, 4. D, 5. S, 6. S

16-C: 1. R, 2. R, 3. S, 4. S, 5. S, 6. N

16-D: 1. C, 2. S, 3. C, 4. S, 5. C, 6. C

16-E: 1. C, 2. D, 3. A, 4. D, 5. D, 6. A


Chapter 16

Achievement Tests

1. Which of the following should be the first consideration when selecting a published

achievement test?

A. cost

B. interpretability

C. reliability

D. validity

2. Which of the following factors should be given the most weight in selecting a published

achievement test battery?

A. Cost of administration and scoring.

B. Equivalence of the various forms.

C. Relevance to local objectives.

D. Reliability of the tests and subtests.

3. The validity of a published test used to measure student achievement at the end of a specific

school science course can best be determined by

A. curriculum experts in science.

B. test experts with a science background.

C. the teacher of the course.

D. the test publisher.

4. A standardized achievement test differs most from a teacher-made objective test in which of

the following areas?

A. Arrangement of question types.

B. Known reliability of items.

C. Objectivity of scoring.

D. Level of difficulty.

5. One advantage of a teacher-made test over a standardized achievement test is that the

teacher-made test has greater

A. interpretability.

B. objectivity.

C. relevance.

D. reliability.

6. One advantage of a standardized test over a teacher-made test is that the standardized test has

greater

A. known technical quality.

B. flexibility.

C. overall objectivity.

D. relevance.


7. The validity of a standardized achievement test to be selected for classroom use can best be

determined by examining the test’s

A. directions.

B. items.

C. norms.

D. reliability.

8. The main advantage of using an achievement test battery, instead of a series of separate

achievement tests covering the same areas, is that the subtests in the achievement test battery

have

A. comparable norms.

B. higher reliability.

C. higher validity.

D. more items.

9. Which of the following should be determined first when evaluating a standardized

achievement test battery?

A. how reliability is reported for the tests

B. the content that the tests measure

C. whether or not comparable forms are available

D. whether the norms are adequate

10. An essential characteristic of a test battery is that each test can be interpreted in terms of the

same

A. content.

B. norm group.

C. objectives.

D. type of item.

11. A standardized achievement test is best used for which of the following purposes?

A. assigning grades

B. comparing achievement in several schools

C. evaluating a school's objectives

D. tabulating the achievements of each student in a classroom

12. Achievement test batteries are used less often at the high school level than at the elementary school level for which of the following reasons?

A. course content varies more at the high school level

B. high school courses are more difficult

C. teachers have greater test construction skills at the high school level

D. teachers in high school assign more homework


13. When making criterion-referenced interpretations of standardized achievement test results,

educators should pay special attention to the number of items included in

A. each item cluster.

B. each subtest.

C. the total test.

D. the criterion used for the standard.

14. A survey achievement test battery would be least useful for determining a student's

A. achievement in different areas.

B. progress from year to year.

C. relative level of performance.

D. specific learning weaknesses.

15. One advantage of a diagnostic achievement test over a survey battery is that a diagnostic

achievement test includes

A. better norms.

B. clearer directions.

C. more items.

D. simpler scoring.

16. A reading readiness test is best used to identify which of the following?

A. prerequisite skills in students

B. children who are nonreaders

C. students with visual defects

D. academically gifted children

17. Reading readiness tests place major emphasis on which of the following skills?

A. finger dexterity and motor skills

B. eye movements and motivation

C. recognition and discrimination

D. social and emotional adjustment

18. Achievement tests of the survey type are not very effective for diagnosing learning problems

because of inadequate

A. sampling.

B. selection of norm groups.

C. standardization.

D. preparation.

19. One advantage of customized achievement tests over standardized tests is that customized

achievement tests

A. are more readily adapted.

B. are based on more adequate norms.

C. contain more adequate rules for administration.

D. provide more reliable test scores.


20. Which of the following is likely to produce the most valid measure of classroom learning?

A. Achievement battery.

B. Locally prepared customized test.

C. Publisher-prepared customized test.

D. Single-content standardized achievement test.

21. An individual achievement test is usually used with children who have disabilities.

True

False

22. When giving a standardized achievement test in science, a teacher should be aware of any

gaps between the reading level of the test and the students’ reading abilities.

Agree

Disagree

23. One problem in using a customized test bank is the issue of low reliability.

True

False

24. Standardized tests usually leave rules of test administration up to the teacher.

Agree

Disagree

25. Discuss the major differences between a standardized test battery and a single-content standardized test. Discuss two advantages and two limitations of each.

26. In what types of situations would a standardized test battery, a single-content standardized

test, a diagnostic teacher-made test, and a standardized individual achievement test be used?


Chapter 16: Answer Key

1. D

2. C

3. C

4. B

5. C

6. A

7. B

8. A

9. B

10. B

11. B

12. A

13. A

14. D

15. C

16. A

17. C

18. A

19. A

20. D

21. True

22. Agree

23. True

24. Disagree

25. Standardized test batteries measure several curricular areas with the same battery. For

example, a battery might measure reading, mathematics, written expression, and spelling. A

single-content standardized test measures only one content area (e.g., reading). An advantage

of the test battery is that a teacher does not have to purchase separate tests for each content

area tested. Another advantage is that, within a battery, the norm group is comparable across

the curricular areas assessed. A disadvantage is that, because a battery asks fewer questions

in each content area, it may be less reliable and diagnostic than the single-content test.

Conversely, norm groups are not comparable across single-content tests, and hence

meaningful comparisons cannot be made across content areas. However, single-content

tests are more useful for diagnostic purposes.

26. A standardized test battery is probably best when comparable results are needed across

subject areas for given schools or school districts within a state. Single-content tests are best

when more diagnostic information is needed and when teachers wish to see whether specific

content has been mastered in a given content area. A diagnostic teacher-made test is useful

for identifying the specific errors students make on classroom learning tasks. Individual

standardized achievement tests are usually used to test the achievement of poor readers or

students with disabilities.


Chapter 17 Aptitude Tests

Exercise 17-A

COMPARISON OF APTITUDE AND ACHIEVEMENT TESTS

LEARNING GOAL: Identifies the similarities and differences in the characteristics of aptitude

and achievement tests.

Directions: Indicate whether each of the following statements is characteristic of an aptitude (P)

test, an achievement (C) test, or both (B) types of tests by circling the appropriate letter.

P C B 1. Measures learned ability.

P C B 2. Useful in predicting future achievement.

P C B 3. Content-related evidence of validity is emphasized.

P C B 4. Criterion-related evidence of validity is emphasized.

P C B 5. Can be used in grades from kindergarten through grade 12.

P C B 6. Emphasizes reasoning abilities.

LEARNING GOAL: Lists the major differences between aptitude and achievement tests.

Directions: List the major differences between aptitude tests and achievement tests.

Note: Answers will vary.


Exercise 17-B

GROUP TESTS OF LEARNING ABILITY

LEARNING GOAL: Identifies the types of scores provided by selected group tests.

Directions: Indicate the types of scores provided by each of the group tests listed below by

circling the appropriate letter using the following key:

KEY: A = single score, B = verbal and quantitative scores only,

C = verbal, nonverbal and total scores only,

D = verbal, quantitative, and nonverbal scores, E = more than three scores.

A B C D E 1. Cognitive Abilities Test.

A B C D E 2. Differential Aptitude Tests.

A B C D E 3. Matrix Analogies Test.

A B C D E 4. Otis-Lennon School Ability Test.

LEARNING GOAL: States advantages and disadvantages of using different types of learning

ability tests.

Directions: Briefly state one advantage and one disadvantage of each type of group test of

learning ability.

Single score.

Separate scores (verbal, nonverbal, quantitative).

Note: Answers will vary.


Exercise 17-C

INDIVIDUAL TESTS

LEARNING GOAL: Identifies the similarities and differences in the characteristics of individual

tests.

Directions: Indicate whether each of the following statements is characteristic of the Stanford-

Binet Intelligence Scale (S), the Wechsler Intelligence Scales-Revised (W), or both (B).

S W B 1. Uses a variety of item types.

S W B 2. Items are arranged by subtest.

S W B 3. Includes a vocabulary test.

S W B 4. Provides separate verbal and performance IQs.

S W B 5. Scores are reported in Standard Age Scores.

S W B 6. Provides total scores and scores on subtests.

LEARNING GOAL: List conditions that might lower scores on tests of learning abilities.

Directions: List five conditions that might lower a student's score on a test of learning ability.

1.

2.

3.

4.

5.

Note: Answers will vary.


Exercise 17-D

DIFFERENTIAL APTITUDE TESTING

LEARNING GOAL: Identifies the characteristics of the Differential Aptitude Tests (DAT).

Directions: Indicate whether each of the following statements is characteristic of the Differential

Aptitude Tests (DAT) by circling yes (if it is) and no (if it is not).

Yes No 1. The DAT would be classified as a test battery.

Yes No 2. The intercorrelations between subtests on the DAT are high (average about .90).

Yes No 3. Some of the DAT subtests measure abilities like those measured by group

scholastic aptitude tests.

Yes No 4. The DAT profile indicates scores in terms of percentile rank.

Yes No 5. The DAT can be administered “adaptively.”

Yes No 6. The eight tests on the DAT are speed tests.

LEARNING GOAL: States a major advantage and limitation of the Differential Aptitude Tests.

Directions: Briefly state one major advantage and one major limitation of using the Differential

Aptitude Tests instead of a series of separate tests from different publishers.

Advantage of DAT:

Limitation of DAT:

Note: Answers will vary.


Exercise 17-E

SELECTING APPROPRIATE TESTS

LEARNING GOAL: Selects the type of test that is most appropriate for a particular use.

Directions: For each of the following purposes indicate which type of test should be used by

circling the appropriate letter using the following key:

KEY: G = Group test of learning ability, I = Individual test of learning ability,

D = Differential aptitude tests.

G I D 1. To test a preschool child.

G I D 2. To test a fourth-grade student who is unable to speak.

G I D 3. To test a sixth-grade student who has a severe learning disability.

G I D 4. To assist a tenth-grade student with career planning.

G I D 5. To aid in forming learning groups within the classroom.

G I D 6. To aid in planning an individual program for students with severe learning

disabilities.

LEARNING GOAL: Compares the usefulness of culture-fair tests and conventional tests of

learning ability.

Directions: State one advantage and one disadvantage of using a culture-fair test instead of a

conventional learning ability test for testing students from disadvantaged homes.

Advantage:

Disadvantage:

Note: Answers will vary.


Answers to Student Exercises

17-A: 1. B, 2. B, 3. C, 4. P, 5. B, 6. P

17-B: 1. D, 2. E, 3. A, 4. C

17-C: 1. B, 2. B, 3. B, 4. W, 5. S, 6. B

17-D: 1. Y, 2. N, 3. Y, 4. Y, 5. Y, 6. N

17-E: 1. I, 2. I, 3. I, 4. G, 5. D, 6. D


Chapter 17

Aptitude Tests

1. Tests of learning ability differ from published achievement tests in that tests of learning

ability

A. are useful in predicting future achievement.

B. depend less on specific school learning.

C. measure school objectives more effectively.

D. provide norms for score interpretation.

2. One advantage of a learning ability test over an achievement test for predicting achievement

is that a learning ability test

A. can be used before instruction has been given.

B. provides more reliable scores.

C. measures a broader range of course content.

D. measures only innate learning potential.

3. In the spectrum of ability tests, which of the following test types would be most different

from the content-oriented achievement test?

A. a nonverbal test

B. a school-oriented aptitude test

C. a test of general educational development

D. a verbal ability test

4. Scholastic aptitude tests are best interpreted as measures of which of the following?

A. fixed learning capacity

B. mastery of the school’s curriculum

C. present learning ability

D. recent exposure to course content

5. Which of the following tests provides the widest array of scores?

A. Cognitive Abilities Test.

B. Differential Aptitude Tests.

C. Otis-Lennon School Ability Test.

D. School and College Ability Tests.

6. Standard age scores (SAS), used on some group and individual tests, have a mean of

A. 10.

B. 16.

C. 50.

D. 100.


7. Which of the following is an advantage of a learning ability test with verbal and nonverbal

scores?

A. a check on the poor reader is provided.

B. differential prediction is made possible.

C. scoring is made easier.

D. test administration is simplified.

8. Scholastic aptitude tests used for purposes like college admission should be interpreted as

measures of

A. inherited ability.

B. innate learning ability.

C. potential for future development.

D. present developed ability.

9. The fourth edition of the Stanford-Binet Intelligence Scale is arranged by

A. age levels.

B. cognitive areas and subtests.

C. spiral omnibus pattern.

D. verbal and performance tests.

10. The Stanford-Binet differs from the WISC-R in that the Stanford-Binet uses

A. individual administration.

B. standard age scores.

C. separate subtests.

D. verbal and performance tests.

11. In comparison to a group test of learning ability, the Stanford-Binet provides more

A. objective results.

B. observational information.

C. restriction on test responses.

D. use of standard scores.

12. The Stanford-Binet yields scores with a standard deviation of 16. In a normally distributed

population of ten-year-old children approximately two-thirds of the cases will fall between

A. 68 and 100.

B. 68 and 132.

C. 84 and 116.

D. 100 and 132.

13. The Wechsler Intelligence Scales differ from the Stanford-Binet in that the Wechsler

Intelligence Scales are

A. arranged by age levels rather than subtests.

B. suitable for group administration.

C. made up of multiple subtests.

D. evaluated with separate verbal and performance scale scores.


14. If a student’s standard age score drops from 90 in the fifth grade to 85 in the sixth grade, the

score difference is most likely due to which of the following?

A. inadequate learning opportunities in sixth grade

B. lack of motivation

C. some type of emotional problem

D. the errors of measurement

15. Carl, a third grade student, received a standard age score of 66 on a group test of scholastic

aptitude. Based on this score, his teacher should recommend that Carl be

A. continued in the third grade.

B. given an individual ability test.

C. moved back to the second grade.

D. placed in a class for academically gifted students.

16. Culture-fair testing typically uses materials that are

A. common in many cultures.

B. free of cultural influences.

C. indicators of innate abilities.

D. most familiar to members of minority groups.

17. The Differential Aptitude Tests can be used to compare students' scores in different aptitude

areas because the subtests are

A. highly intercorrelated.

B. speeded tests.

C. standardized on the same group.

D. valid and reliable.

18. If students ask whether they can improve their aptitudes as measured by the DAT, which of the

following would be the most appropriate response for an educator?

A. All of these aptitudes can be readily modified.

B. Aptitudes are fixed traits that cannot be modified.

C. Aptitudes are seldom modified by training.

D. Some of these aptitudes can be improved more readily than others.

19. One advantage of the computerized adaptive edition of a test such as the Differential

Aptitude Tests over the paper-and-pencil version is that

A. a profile of scores can be obtained.

B. speed of response is used in scoring.

C. students can complete more items.

D. test results can be obtained sooner.

20. Many tests that profess to be culturally fair are nonverbal or pictorial.

True

False


21. Most learning aptitude tests given to children with disabilities are group tests.

Agree

Disagree

22. Aptitude tests measure potential for learning while achievement tests measure learned

material.

True

False

23. The Stanford-Binet and the Wechsler Scales may be given by any teacher who has read the

manual.

Agree

Disagree

24. Most if not all students eventually are administered an individual test of learning ability.

True

False

25. Discuss two advantages and two disadvantages of group and individual learning ability tests.

26. Discuss some of the issues and concerns that culture-fair tests of learning ability try to address.


Chapter 17: Answer Key

1. B

2. A

3. A

4. C

5. B

6. D

7. A

8. D

9. A

10. B

11. B

12. C

13. D

14. D

15. B

16. A

17. C

18. D

19. D

20. True

21. Disagree

22. False

23. Disagree

24. False

25. Group tests possess time and economic efficiency. They can be given to a number of students

at once. As such, they are usually given more often to average students than to those

suspected of having a disability. Group tests can usually be administered by any individual

familiar with the administration and scoring procedures. They are less diagnostic than

individual tests and are more dependent on student reading ability. Individual tests are given

in a one-on-one setting; hence they are more expensive and time-consuming to give. These

tests, however, allow for more follow-up questions and behavioral observations than group

tests. They depend less on reading than group tests. However, they usually can be

administered and scored only by a licensed school psychologist.

26. The assumption behind culture-fair tests is that not all cultures view intelligence or

intelligent behavior the same way. What might be considered intelligent behavior in one

culture may not be viewed that way in another. Also, cultures differ in the complexity of

their languages and in how members interpret concepts. Thus culture-fair tests try as much as

possible to incorporate universal or near-universal concepts that occur in most cultures. They

try to mitigate the effects of language by asking questions that measure nonverbal attributes.


Chapter 18 Test Selection, Administration, and Use

Exercise 18-A

SOURCES OF INFORMATION ON PUBLISHED TESTS

LEARNING GOAL: Identifies the most useful source of information for a given situation.

Directions: Below is a list of four sources of information concerning published tests. For each of

the statements following the list, indicate the source of information that should be consulted first

by circling the appropriate letter.

KEY A = Mental Measurements Yearbooks, B = Professional journals,

C = Test manual, D = Test publisher's catalog.

A B C D 1. To find information about cost of tests and scoring.

A B C D 2. To obtain critical reviews of a published test.

A B C D 3. To find out how a particular published test was constructed.

A B C D 4. To locate the most recent research studies using a particular test.

A B C D 5. To determine if any tests of study skills have been published.

A B C D 6. To determine the type of norms used in a published test.

LEARNING GOAL: Summarizes the purpose and content of the Standards for Educational and

Psychological Testing.

Directions: Briefly describe the purpose and content of the Standards for Educational and

Psychological Testing.

Purpose:

Content:

Note: Answers will vary.


Exercise 18-B

EVALUATING AN ACHIEVEMENT TEST

LEARNING GOAL: Evaluates a test using a test evaluation form.

Directions: Select an achievement test at the grade level of your choice and obtain a copy of the

test, the manual, and other accessory material; your instructor can help you with this. Study the

test materials, consult the reviews in the latest Mental Measurements Yearbook (MMY), and

write your evaluation using the following test evaluation form. Be brief and include only the

most essential information.

TEST EVALUATION FORM

Test title _____________________ Author(s)_____________________________

Publisher ______________________ Copyright date(s) _____________________

Purpose of test ________________________________________________________

For grades (ages)_______________ Forms _________________________________

Scores available _______________ Method of scoring _____________________

Administration time_____________ Time(s) of parts ______________________

Validity (cite manual pages) __________. Summarize evidence below.

Content considerations:

Test-criterion relationships:

Construct considerations:

Evidence regarding consequences of use:


TEST EVALUATION FORM, CONTINUED

Reliability (cite manual page numbers) _______. Summarize data below.

Age or grade _______________

Type of reliability _______________

Number tested _______________

Range of reliabilities (Total test) _______________

Range of reliabilities (Part scores) _______________

Standard errors of measurement ____________ _____________

Norms (cite manual page numbers) ___________. Summarize data below.

Type (e.g., percentile rank):

Groups (size, age, or grade):

Separate norms (e.g., type of district):

Criterion-referenced interpretation

Describe (if available):

Practical features

Ease of administration:

Ease of scoring:

Ease of interpretation:

Adequacy of manual and materials:

Comments of reviewers (See MMY)

Summary Evaluation

Advantages:

Limitations:

Note: Answers will vary.


Exercise 18-C

EVALUATING AN ABILITY TEST

LEARNING GOAL: Evaluates a test using test evaluation form.

Directions: Select an aptitude test at the grade level of your choice and obtain a copy of the test,

the manual, and other accessory material; your instructor can help you with this. Study the test

materials, consult the reviews in the latest Mental Measurements Yearbook (MMY), and write

your evaluation using the following test evaluation form. Be brief and include only the most

essential information.

TEST EVALUATION FORM

Test title _____________________ Author(s)_____________________________

Publisher ______________________ Copyright date(s) _____________________

Purpose of test ________________________________________________________

For grades (ages)_______________ Forms _________________________________

Scores available _______________ Method of scoring _____________________

Administration time_____________ Time(s) of parts ______________________

Validity (cite manual pages) __________. Summarize evidence below.

Content considerations:

Test-criterion relationships:

Construct considerations:

Evidence regarding consequences of use:


TEST EVALUATION FORM, CONTINUED

Reliability (cite manual page numbers) _______. Summarize data below.

Age or        Type of          Range of reliabilities      Number      Range of reliabilities
Grade         reliability      (Total test)                Tested      (Part scores)

Standard errors of measurement ____________ _____________

Norms (cite manual page numbers) ___________. Summarize data below.

Type (e.g., percentile rank):

Groups (size, age, or grade):

Separate norms (e.g., type of district):

Criterion-referenced interpretation

Describe (if available):

Practical features

Ease of administration:

Ease of scoring:

Ease of interpretation:

Adequacy of manual and materials:

Comments of reviewers (See MMY)

Summary Evaluation

Advantages:

Limitations:

Note: Answers will vary.


Exercise 18-D

ADMINISTERING PUBLISHED TESTS

LEARNING GOAL: Distinguishes between good and bad practices in administering published

tests.

Directions: Indicate whether each of the following statements describes a good (G) practice or a

bad (B) practice in administering a published test by circling the appropriate letter.

G B 1. Read the directions word for word.

G B 2. Give students extra time if there was an interruption during testing.

G B 3. Walk around the room and point out to students where they made careless errors.

G B 4. Tell students what to do about guessing if the directions failed to address it.

G B 5. If asked about a particular item, tell the student: “I’m sorry but I cannot help you.

Do the best you can.”

G B 6. Record any unusual student behavior during testing.

LEARNING GOAL: Describes a procedure for improving students’ test-taking skills.

Directions: Briefly describe an ethical procedure that a classroom teacher might follow for

improving students’ test-taking skills.

Note: Directions will vary.


Exercise 18-E

USES OF PUBLISHED TESTS

LEARNING GOAL: Distinguishes between correct and incorrect statements concerning uses of

published tests.

Directions: Indicate whether test specialists would agree (A) or disagree (D) with each of the

following statements concerning test use by circling the appropriate letter.

A D 1. Published achievement tests are most useful in the areas of basic skills.

A D 2. Published achievement tests need not match the instructional objectives of

the school.

A D 3. The best index of underachievement is a relatively large difference between the

scores of learning ability tests and achievement tests.

A D 4. Norm-referenced achievement tests are especially useful for

individualizing instruction.

A D 5. Course grades are more valid when based on scores from published

achievement tests.

A D 6. No important educational decision should be based on the scores of

published tests alone.

LEARNING GOAL: Lists misuses of published tests.

Directions: List as many ways as you can think of that published test results might be misused.

Use brief concise statements.

Note: Answers will vary.


Answers to Student Exercises

18-A 18-D 18-E

1. D 1. G 1. A

2. A 2. B 2. D

3. C 3. B 3. D

4. B 4. B 4. D

5. A 5. G 5. D

6. C 6. G 6. A


Chapter 18

Test Selection, Administration, and Use

1. Tests in Print should be consulted by educators when they are seeking

A. a comprehensive list of published tests.

B. newly created test blueprints.

C. test reviews.

D. validity and reliability data.

2. The Mental Measurements Yearbooks are best known for which of the following contents?

A. annual reports on standardized assessments

B. test descriptions

C. test reviews

D. well organized technical information

3. Which of the following publications would be most appropriate for educators to consult for

descriptions of the latest editions of a test?

A. Tests in Print

B. Test critique electronic files

C. Mental Measurements Yearbooks

D. Test publishers’ catalogues

4. Which of the following publications would be most appropriate for educators to consult for

information that would be most helpful in evaluating a test manual?

A. Standards for Educational and Psychological Testing

B. test critique electronic files

C. Tests in Print

D. test publishers’ catalogues

5. Which of the following publications would be most appropriate for educators to consult for

information that provides guidance about the responsibilities of test developers and test users

for informing test takers about tests?

A. Code of Fair Testing Practices

B. Mental Measurements Yearbooks

C. Test Critiques

D. test publishers’ technical manuals

6. Which of the following can most adequately be determined from the description of an

achievement test in a test publisher's catalogue?

A. readability

B. reliability

C. usability

D. validity


7. Which of the following is the first consideration in selecting published achievement tests?

A. Availability of comparable forms

B. Cost of the tests

C. Ease of administration

D. Relevance to local objectives

8. To obtain information concerning the issue of whether a published achievement test is valid

for students, it is best for educators to

A. compare the items to local curriculum goals.

B. examine item analysis data in the test manual.

C. examine reliability data in the test manual.

D. make a local test-retest study of the scores.

9. In order for educators to obtain information concerning the validity of a test of learning

ability, it would be best to first examine the test manuals’

A. item analysis data.

B. predictive studies.

C. reliability data.

D. standardization studies.

10. Changing the instructions when administering a standardized achievement test will probably

have the greatest influence on the test scores'

A. interpretability.

B. objectivity.

C. reliability.

D. relevance to local objectives.

11. Publishers attempt to reduce the influence of test-taking skills by doing which of the

following?

A. not providing alternative test forms

B. providing practice tests

C. using multiple-choice items

D. using special scoring formulas

12. Published achievement tests are probably most useful to classroom teachers for which of the

following reasons?

A. diagnosing strengths and weaknesses

B. evaluating teaching

C. grading students

D. reporting to parents


13. Measuring educational progress over several grade levels with published tests is most

feasible when measuring which of the following?

A. basic skills

B. critical thinking skills

C. science

D. social studies

14. Published achievement tests are least useful in which of the following situations?

A. curriculum planning

B. grade assignment

C. grouping students

D. monitoring educational progress

15. Standardized achievement tests are inadequate for evaluating teaching effectiveness because

they typically have

A. inadequate norms.

B. inappropriate item difficulty.

C. low reliability.

D. low relevance to local objectives.

16. A standardized published achievement test is one useful tool in diagnosing learning

disabilities.

True

False

17. It is probably best not to report the results of published tests to parents because they do not

have the necessary statistical background.

Agree

Disagree

18. Using published test results to retain teachers or give merit raises is a fair and equitable

method.

True

False

19. One reason that teachers teach to the tests is that they feel they are being evaluated on the

basis of their students’ test results.

Agree

Disagree

20. Discuss some things that a teacher should ensure when administering a published test so as

not to invalidate the results or their interpretability.

21. What are three legitimate uses of published test information? What are three inappropriate

uses?


Chapter 18: Answer Key

1. A

2. C

3. D

4. A

5. A

6. C

7. D

8. A

9. B

10. A

11. B

12. A

13. A

14. B

15. D

16. True

17. Disagree

18. False

19. Agree

20. Teachers can make sure that they are not invalidating the test by following the administration

rules. Perhaps the most important thing they can do is shift their thinking from their role as

teacher to their role as test administrator. Other things they can do include trying to motivate

students to do their best, strictly following all test administration rules, keeping accurate time,

and not giving students extra time. Still other strategies include recording any significant events

during test administration that may influence test results, and collecting all test materials

promptly at the end of the test.

21. Perhaps the best use of published test results is in instructional planning. Knowing how

students did in a class, school, or school system can help educators make instructional

modifications for the future. A second good use of test results is in reporting the results to

parents in conferences. This information helps reinforce the teacher’s message about the

student reaching his or her learning goals. Finally, test results can be helpful in diagnosing or

qualifying children for special education services or in identifying a learning disability.

However, no published test should be used alone to accomplish any of these goals.

Inappropriate uses of published tests include using test results to give a student grades for a

semester or marking period, using tests to evaluate teaching effectiveness, or assigning

students to a remedial track or retaining them in a grade for the following academic year.


Chapter 19 Interpreting Test Scores and Norms

Exercise 19-A

USES OF CRITERION-REFERENCED AND NORM-REFERENCED INTERPRETATIONS

LEARNING GOAL: Relates test interpretation to the type of information needed.

Directions: For each of the following questions, indicate whether a criterion-referenced (C) or

a norm-referenced (N) interpretation would be more useful by circling the appropriate letter.

C N 1. How does a student’s test performance compare to that of other students in

the same grade?

C N 2. What type of remedial work would be most helpful for a student struggling

academically?

C N 3. Has a student met state-mandated learning goals?

C N 4. Which students’ test performances exceed those of 90 percent of their

classmates?

C N 5. Which students have achieved mastery of computational skills?

C N 6. How does student test performance in our school compare with that of other

schools?

LEARNING GOAL: Describes the cautions to keep in mind when making criterion-referenced

interpretations of tests designed for norm-referenced use.

Directions: List and briefly describe several factors to consider when using criterion-referenced

interpretations with norm-referenced survey tests.

Note: Answers will vary.


Exercise 19-B

NATURE OF DERIVED SCORES

LEARNING GOAL: Distinguishes among the characteristics of different types of derived scores.

Directions: Indicate which type of derived score is described by each statement listed below by

circling the appropriate letter. Use the following key.

KEY: G = grade equivalent scores, P = percentile rank, S = standard scores.

G P S 1. Provides units that are based on the average score earned in different groups.

G P S 2. Provides units that are systematically unequal.

G P S 3. Provides units that are most nearly equal.

G P S 4. Provides units that are most meaningful when interpreted with reference to

normal curves.

G P S 5. Provides units that are most meaningful at the elementary school level.

G P S 6. Provides units that are easily interpreted and typically compare students

with their own age group.

LEARNING GOAL: States the advantages and limitations of derived scores.

Directions: Briefly state one advantage and one disadvantage of percentile ranks and standard

scores.

Percentile Ranks

Advantage:

Disadvantage:

Standard Scores

Advantage:

Disadvantage:

Note: Answers will vary.


Exercise 19-C

GRADE EQUIVALENT SCORES

LEARNING GOAL: Distinguishes between appropriate and inappropriate interpretations of

grade equivalent scores.

Directions: Indicate whether each of the following interpretations of grade equivalent (GE)

scores are appropriate (A) or inappropriate (I) by circling the appropriate letter.

A I 1. A student who obtained a GE score of 3.1 in the spring of grade 4 would be

expected to obtain a GE score of 4.1 in the spring of grade 5.

A I 2. A student in grade 4 who obtained a GE score of 4.7 in April scored higher

than about half the students in the grade 4 norm group.

A I 3. A student with a GE score of 5.3 in reading and a GE score of 6.1 in math is

performing better in math than in reading.

A I 4. A student who has GE scores in all subjects that are more than 1.5 above grade

placement should probably be skipped to the next grade.

A I 5. A GE score of 11.0 for a grade 6 student indicates that the student did

exceptionally well on the 6th grade content, but not that he or she could do

grade 11 work.

A I 6. One-year GE gains from 3.0 to 4.5 for one student and from 5.0 to 6.5 for another

indicate equivalent amounts of progress.

LEARNING GOAL: State the advantages and limitations of grade-equivalent scores.

Directions: Briefly state some of the major advantages and major limitations of grade equivalent

scores.

Advantages:

Limitations:

Note: Answers will vary.


Exercise 19-D

RELATIONSHIP OF DIFFERENT SCORING SYSTEMS

LEARNING GOAL: Converts scores from one scoring system to others.

Directions: Complete the following table by converting the given scores into comparable derived

scores in the other scoring systems. Round your answers to the nearest whole number, except for

z-scores where one decimal place should be reported. Assume that all score distributions are

normal and based on a common reference group. The first row of scores, for a given z-score of

1.0, has been completed to illustrate the procedure. Try to complete the exercise without looking

at the table in your book.

________________________________________________________________________
                          Standard Age
                          Score                             Percentile
 z-Score      T-Score     (SD = 16)          Stanine        Rank
________________________________________________________________________
  1.0           60           116                7             84
 –1.0          ____          ____             ____           ____
  0.5          ____          ____             ____           ____
 –0.5          ____          ____             ____           ____
 ____           35           ____             ____           ____
 ____           70           ____             ____           ____
 ____          ____           100             ____           ____
 ____          ____           124             ____           ____
  0.7          ____          ____             ____           ____
 –1.3          ____          ____             ____           ____
________________________________________________________________________
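For instructors who want a computational check on this exercise, every derived score in the table is a transformation of the z-score. The sketch below (not part of the original exercise; the function name convert is illustrative) uses the standard scales the table assumes — T-scores with mean 50 and SD 10, standard age scores with mean 100 and SD 16, stanines with mean 5 and SD 2 limited to 1–9, and percentile ranks read from the normal-curve area:

```python
# Illustrative sketch for Exercise 19-D score conversions.
from statistics import NormalDist

def convert(z):
    t = round(50 + 10 * z)                      # T-score: mean 50, SD 10
    sas = round(100 + 16 * z)                   # standard age score: mean 100, SD 16
    stanine = max(1, min(9, round(5 + 2 * z)))  # stanine: mean 5, SD 2, limited to 1-9
    pr = round(NormalDist().cdf(z) * 100)       # percentile rank from normal-curve area
    return t, sas, stanine, pr

print(convert(1.0))   # (60, 116, 7, 84) -- matches the completed first row
```

Rows that give a T-score or standard age score can be handled the same way after converting back to z, e.g., z = (T − 50) / 10.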

LEARNING GOAL: Explains the value of using score bands on test profiles.

Directions: Explain why it is desirable to plot scores on a test profile as score bands instead of

specific score points.

Note: Answers will vary.


Exercise 19-E

INTERPRETATIONS OF SCORES ON PUBLISHED TESTS

LEARNING GOAL: Distinguishes between appropriate and inappropriate interpretations of

scores on published tests.

Directions: Indicate whether test specialists would agree (A) or disagree (D) with each of the

following statements about interpretations of scores on published tests by circling the appropriate

letter.

A D 1. Percentile ranks of tests of the same subject can be used interchangeably.

A D 2. The percentile rank score does not require an assumption of a normal

distribution.

A D 3. Information about a student's previous educational experiences and

language background is used in interpreting test scores.

A D 4. A band that extends one standard error of measurement above and below a

student's observed score helps guard against overly precise interpretations.

A D 5. Because the units on a grade equivalent scale are approximately equal, the

difference between 4.0 and 5.0 can be treated as equivalent to that between

8.0 and 9.0.

A D 6. Before making an important decision based on a test score, the interpretation

should be verified by other evidence.

LEARNING GOAL: States the advantages and limitations of using local norms.

Directions: Briefly state the advantages and limitations of using local norms to interpret test

performance.

Advantages:

Limitations:

Note: Answers will vary.


Answers to Student Exercises

19-A 19-B 19-C 19-E

1. N 1. G 1. I 1. D

2. C 2. P 2. A 2. A

3. N 3. S 3. I 3. A

4. N 4. S 4. I 4. A

5. C 5. G 5. A 5. D

6. N 6. P 6. I 6. A


Chapter 19

Interpreting Test Scores and Norms

1. A student in grade 6 earned a raw score of 50 on an achievement test. Which of the following

interpretations is most justified?

A. The grade equivalent score is 6.0.

B. The percentage-correct score is 50.

C. The percentile score is 50.

D. The score cannot be interpreted without more information.

2. Brent earned a score of 91 on a 100-item achievement test. Which of the following

interpretations is most justified?

A. He is in the top half of the group.

B. His percentile score is 91.

C. His stanine score is 8.

D. His percentage-correct score is 91.

3. Which of the following best illustrates a criterion-referenced interpretation?

A. Comparing test performance with that of others.

B. Comparing performance on two different tests.

C. Describing the nature of the individual's performance.

D. Evaluating the correlation of test scores with a criterion.

4. When making criterion-referenced interpretations of standardized tests, educators must be

sure that

A. the total test is reliable.

B. the norms are adequate.

C. there are enough items for each interpretation.

D. there are provisions for using percentile scores.

5. Which of the following best describes test norms?

A. actual performance of representative groups

B. desired performance based on expert judgment

C. standards set by testing selected groups

D. test scores that have been normalized

6. At the end of grade 5, Erik has a grade-equivalent score of 6.5 in reading. When he is at the

end of grade 6 his score will most likely be

A. 7.0

B. 7.5

C. greater than 7.5

D. less than 6.0


7. Grade equivalent scores are least useful for

A. comparing performance on two tests.

B. describing growth in achievement.

C. interpreting scores to students.

D. reporting to parents.

8. A percentile score of 50 on an achievement test indicates that

A. half the norm group had lower scores.

B. half the items were marked correctly.

C. the raw score is at least 50.

D. this person failed the test.

9. Which of the following is a disadvantage of percentile ranks?

A. they are difficult to prepare

B. they are difficult to interpret

C. they depend on the number of items in the test

D. they have unequal units

10. The z-score serves as the basis for which of the following?

A. percentile score

B. standard age score

C. average score

D. formative score

11. Which of the following is true about T-scores?

A. They are criterion-referenced.

B. They possess an absolute zero.

C. They are based on the z-score.

D. They are based on percentiles.

12. One advantage of T-scores over z-scores is that they

A. always have the same standard deviation.

B. are more easily converted to percentile ranks.

C. can be added and subtracted.

D. include only positive scores.

13. If a student ranks 5th in a class of 50, the percentile rank would be

A. 5.

B. 10.

C. 45.

D. 90.


14. In a normal distribution, which of the following scores equals a percentile rank of 16?

A. Standard Age Score of 66.

B. Stanine of 2.

C. T-score of 40.

D. z-score of –1.6.

15. The smallest number of percentile ranks falls between which of the following ranges of T-scores?

A. 30 and 35

B. 40 and 45

C. 50 and 55

D. 55 and 60

16. In a normal distribution a percentile rank of 84 would be the equivalent to a stanine of

A. 6.

B. 7.

C. 8.

D. 9.

17. The “score bands” used on test profiles indicate which of the following?

A. score intercorrelations

B. score objectivity

C. score reliability

D. score validity

18. Other things being equal, which of the following represents the highest level of achievement?

A. Normal-curve equivalent = 80

B. Percentile rank = 84

C. T-score = 59

D. z-score = .8

19. In a normal distribution, which of the following represents the lowest level of achievement?

A. Normal-curve equivalent = 40

B. Percentile rank = 40

C. Stanine = 5

D. T-score = 40

20. Which of the following statements about stanine scores is most accurate?

A. It is possible to have a stanine of 0.

B. Stanines are based on a nine-point scale.

C. Stanines are more precise than other types of standard scores.

D. The stanine of 105 corresponds closely to the mean of the normal curve.

21. Percentages and percentiles are interchangeable statistics.

Agree

Disagree


22. Norm relevancy should be left up to the test publisher.

True

False

23. It is legitimate for the test consumer to ask questions about the recency of test norms.

Agree

Disagree

24. The normal curve has the particular property of being symmetrical.

True

False

25. List and describe the three attributes that good test norms should possess.

26. Describe z-scores and T-scores. How are they different? What is their relationship to the

normal curve?


Chapter 19: Answer Key

1. D

2. D

3. C

4. C

5. A

6. C

7. A

8. A

9. D

10. B

11. C

12. D

13. D

14. C

15. A

16. B

17. C

18. A

19. D

20. B

21. Disagree

22. False

23. Agree

24. True

25. Good norms should be relevant, representative and up to date (recent). Relevancy is the

degree of agreement between the test norm group and the attributes of the test takers.

Relevancy of norm groups needs to be decided by teachers and other educational

professionals before adopting a test with a given group of students. Representativeness of

norms reflects the notion of a random sample in the test authors selecting the norm group for

the test. This is extremely difficult and expensive, however, so we must usually settle for

something less. At a minimum, we should demand that all significant subgroups of the

population be adequately represented. Norm recency means that the norms are up to date and

not outdated. Test norms ten or more years old are probably obsolete. Whenever a test is put

out in a new edition, it should probably also contain up-to-date norms.


Appendix A Elementary Statistics

Exercise A-1

MEASURES OF CENTRAL TENDENCY

LEARNING GOAL: Distinguishes among measures of central tendency.

Directions: For each of the following statements, indicate which measure of central tendency is

being used by circling the appropriate letter using the following key.

KEY: A = Mean, B = Median, C = Mode.

A B C 1. It is the most frequent score in a set of scores.

A B C 2. It accounts for the numerical value of each score.

A B C 3. It is always an actual score.

A B C 4. It is always equal to the 50th percentile.

A B C 5. It is determined by dividing the sum of a set of scores by the number of

scores.

A B C 6. It would not change if an extremely high score earned by a single

individual was deleted from the set.

LEARNING GOAL: Selects the measure of central tendency that is most appropriate for a

particular use.

Directions: For each of the following statements, indicate whether the mean (A) or the median

(B) is most appropriate by circling the letter.

A B 1. To use with the quartile deviation.

A B 2. To use with the standard deviation.

A B 3. To divide a set of scores into two equal halves.

A B 4. To compute a set of standard scores.

A B 5. To limit the influence of a single score of 85 when all the other scores range

between 35 and 65.

A B 6. To report the most widely used measure of central tendency.


Exercise A-2

MEASURES OF VARIABILITY

LEARNING GOAL: Distinguishes among the measures of variability.

Directions: For each of the following statements, indicate which measure of variability is being

described by circling the appropriate letter using the following key.

KEY: A = Standard deviation, B = Quartile deviation, C = Range.

A B C 1. It is based on the highest and lowest scores only.

A B C 2. It is half the distance between the 25th and 75th percentiles.

A B C 3. It is also called the semi-interquartile range.

A B C 4. It accounts for the numerical value of each score.

A B C 5. It is influenced least by adding one extremely low score.

A B C 6. It can be used to identify the range of the middle 68 percent of scores in a

normal distribution.

LEARNING GOAL: Selects the measure of variability that is appropriate for a particular use.

Directions: For each of the following statements, indicate which measure of variability would be

most appropriate by circling the letter using the following key.

KEY: A = Standard deviation, B = Quartile deviation, C = Range.

A B C 1. To obtain the most stable measure of variability.

A B C 2. To obtain the simplest and quickest estimate of variability.

A B C 3. To obtain the range of the middle 50 percent of a set of scores.

A B C 4. To compute the amount of error in test scores.

A B C 5. To use with a small set of scores that includes one extremely high score.

A B C 6. To compute a set of T-scores.


Exercise A-3

CONSTRUCTING GRAPHS AND COMPUTING MEASURES OF CENTRAL

TENDENCY AND VARIABILITY

LEARNING GOAL: Constructs graphical representations of scores and computes measures of

central tendency and variability.

Directions: Use the following set of scores to: (1) construct a frequency polygon with a class

interval of 3, (2) construct a stem-and-leaf diagram, and (3) compute the mean, median, range,

and standard deviation.

Student Score

_______________

A 60

B 58

C 56

D 54

E 53

F 52

G 48

H 47

I 45

J 42

K 40

L 39

M 37

N 35

O 32

P 29

Q 28

R 20

S 15

T 10

Median =

Range =

Mean =

Standard deviation =
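As a quick check on the hand computations in this exercise, the same four statistics can be obtained in a few lines of Python (a sketch, not part of the exercise). Note that pstdev uses the population formula (dividing by N), which yields exactly 14 for these scores:

```python
# Descriptive statistics for the Exercise A-3 score set.
from statistics import mean, median, pstdev

scores = [60, 58, 56, 54, 53, 52, 48, 47, 45, 42,
          40, 39, 37, 35, 32, 29, 28, 20, 15, 10]

print(mean(scores))               # mean: 40
print(median(scores))             # median: 41 (average of the 10th and 11th scores)
print(max(scores) - min(scores))  # range: 50
print(pstdev(scores))             # population standard deviation: 14.0
```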


Exercise A-4

CORRELATION COEFFICIENT AND REGRESSION

LEARNING GOAL: Identifies the characteristics of the product-moment correlation coefficient.

Directions: Indicate whether each of the following features describes the product-moment

correlation coefficient by circling Yes (if it does) and No (if it does not).

Yes No 1. During the computation, it accounts for the numerical value of each score.

Yes No 2. It is easy to compute without the aid of a calculator.

Yes No 3. It can be used to compute estimates of test reliability.

Yes No 4. It can be used to evaluate test-criterion relationships.

Yes No 5. The degree of relationship is shown by the plus and minus signs.

Yes No 6. It can be used to indicate the cause/effect relations between measurement

variables.

LEARNING GOAL: Uses the regression equation to obtain predicted criterion scores from test

scores.

Directions: The regression equation for predicting a criterion measure, Y, from a test score, X,

is: predicted Y = –1.0 + .4X. Find the predicted criterion score for the following three students.

Carlos: Test score = 20

Predicted criterion score =

Kim: Test score = 15

Predicted criterion score =

Sam: Test score = 10

Predicted criterion score =
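Each prediction follows directly from substituting the student's test score into the regression equation. A short check (a sketch, not part of the exercise; the function name predict is illustrative):

```python
# Regression equation from Exercise A-4: predicted Y = -1.0 + 0.4X
def predict(x):
    return -1.0 + 0.4 * x

for name, score in [("Carlos", 20), ("Kim", 15), ("Sam", 10)]:
    print(f"{name}: predicted criterion score = {predict(score):g}")
```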


Exercise A-5

COMPUTING THE PRODUCT-MOMENT CORRELATION COEFFICIENT

LEARNING GOAL: Computes the product-moment correlation coefficient.

Directions: Compute the product-moment correlation for the pairs of scores in the following

table.

____________________________________________

Student      X      Y      X²      Y²      XY

____________________________________________

A 15 16

B 18 15

C 12 8

D 13 11

E 19 17

F 10 9

G 14 13

H 11 5

I 17 17

J 11 9
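The deviation-score form of the product-moment correlation, r = Σxy / √(Σx² · Σy²), can be checked with a short script (a sketch for instructors, not part of the exercise):

```python
# Pearson product-moment correlation for the Exercise A-5 data.
x = [15, 18, 12, 13, 19, 10, 14, 11, 17, 11]
y = [16, 15, 8, 11, 17, 9, 13, 5, 17, 9]

mx, my = sum(x) / len(x), sum(y) / len(y)              # means of X and Y
sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))   # sum of cross-products
sxx = sum((a - mx) ** 2 for a in x)                    # sum of squared X deviations
syy = sum((b - my) ** 2 for b in y)                    # sum of squared Y deviations

r = sxy / (sxx * syy) ** 0.5
print(round(r, 2))
```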


Answers to Student Exercises

A-1 (Top) 1. C 2. A 3. C 4. B 5. A 6. C

(Bottom) 1. B 2. A 3. B 4. A 5. B 6. A

A-2 (Top) 1. C 2. B 3. B 4. A 5. B 6. A

(Bottom) 1. A 2. C 3. B 4. A 5. B 6. A

A-3 STEM LEAF

6 0

5 23468

4 02578

3 2579

2 089

1 05

Median = 41

Range = 50

Mean = 40

Standard deviation = 14

A-4 1. Y 2. N 3. Y 4. Y 5. N 6. N

Predicted criterion scores: Carlos, 7; Kim, 5; Sam, 3.

A-5 Product-moment correlation = .89.