Classroom Assessment A Practical Guide for Educators by Craig A. Mertler Chapter 3 Characteristics of Assessments

Introduction

The quality of educational decisions is only as good as the information that leads to them.

If inappropriate information is collected, or if it is collected without precision, the decisions that follow will logically be inaccurate.

Two key characteristics of all assessments are validity and reliability.

What Is Validity?

Validity: the degree to which evidence and theory support the interpretations of test scores entailed by proposed uses of tests.

• does not emphasize the results themselves, but rather how the results are used (validity deals with the decisions that follow the interpretation of test results, i.e., appropriate and inappropriate uses of assessment results)

• the most fundamental consideration when developing and evaluating tests and other assessments

• example: the SMART

What Is Validity?

Validity (continued)

• Three important points about validity:

    • the concept of validity applies to the ways teachers interpret and use assessment results

    • assessment results have different degrees of validity, depending on purposes

    • judgments about validity should be made only after examination of several types of validity evidence

What Is Validity?

Sources of Validity Evidence

• validity is an abstract concept; it cannot be directly observed, so evidence must be gathered in support of it

• content evidence of validity

    • focuses on the extent to which the content addressed by assessment items adequately samples the larger domain of performance

    • most important type of evidence for teachers

    • relevance: do the items emphasize what has been taught?

    • representativeness: how well do the items represent the total content area?

    • two-way tables (“content” x “taxonomic level”) can assist teachers in gathering this evidence
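One way to picture such a two-way table is as a count of assessment items in each content-by-level cell. The sketch below is illustrative only: the content areas, the taxonomic levels, and the item list are hypothetical and do not come from the source.

```python
from collections import Counter

# Hypothetical test blueprint: each item is tagged (content area, taxonomic level).
ITEMS = [
    ("fractions", "knowledge"),
    ("fractions", "application"),
    ("fractions", "application"),
    ("decimals", "knowledge"),
    ("decimals", "comprehension"),
    ("percents", "comprehension"),
    ("percents", "application"),
    ("percents", "application"),
]
LEVELS = ["knowledge", "comprehension", "application"]


def build_table(items, levels):
    """Return {content area: [item count per level]} for a content x level table."""
    counts = Counter(items)
    contents = sorted({content for content, _ in items})
    return {c: [counts[(c, lvl)] for lvl in levels] for c in contents}


def print_table(table, levels):
    """Print the two-way table with a total column for each content area."""
    header = f"{'content':<12}" + "".join(f"{lvl:>15}" for lvl in levels)
    print(header + f"{'total':>8}")
    for content, row in table.items():
        cells = "".join(f"{n:>15}" for n in row)
        print(f"{content:<12}" + cells + f"{sum(row):>8}")


table = build_table(ITEMS, LEVELS)
print_table(table, LEVELS)
```

Thin or empty cells suggest content that was taught but barely sampled (a representativeness question), while items that fall outside what was taught raise a relevance question.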

What Is Validity?

Sources of Validity Evidence (continued)

• criterion evidence of validity

    • focuses on the extent to which scores resulting from an assessment are related to another similar, well-established assessment (the criterion)

    • predominantly a concern for standardized tests

    • Predictive evidence of validity: form of criterion-related evidence where the criterion is measured sometime in the future.

    • Concurrent evidence of validity: form of criterion-related evidence where the criterion is measured at the same time or consists of some measure available at the same time.

What Is Validity?

Sources of Validity Evidence (continued)

• construct evidence of validity

    • focuses on the degree to which there is a fit between the hypothetical construct (unobservable human trait) being measured and the responses actually supplied by students

    • typically a concern for standardized tests

    • sometimes viewed as an “umbrella” for all sources of validity evidence

What Is Validity?

Sources of Validity Evidence (continued)

• face evidence of validity

    • not considered a formal source of evidence

    • informal measure of the extent to which the users or takers of tests believe that the test results are valid

    • often plays an important role in terms of student (and teacher) motivation

What Is Validity?

Establishing Validity of Quantitative Assessments

• for classroom assessments: five guiding questions

    content evidence:

    • Does my assessment procedure emphasize what I have taught?

    • Do my assessment tasks accurately represent outcomes specified in my school’s, district’s, or state’s curriculum guide?

    • Is the content in my assessment procedure important and worth learning?

    face evidence:

    • Do students perceive that the problems or tasks on my assessment emphasize the concepts and other material that I have taught?

    • Do students generally believe that the assessment measures the appropriate behaviors, skills, or characteristics as they were taught?

What Is Validity?

Establishing Validity of Quantitative Assessments (continued)

• for standardized assessments: evidence of validity is based on statistical analysis (especially for criterion-related evidence)

• Correlation coefficient (r): statistical measure that indicates the extent to which scores on one assessment agree with scores on the other; ranges from -1.00 to +1.00; when used as validity evidence, it is known as a validity coefficient.
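As a rough illustration of a validity coefficient, the sketch below computes the Pearson correlation between scores on a new assessment and scores on an established criterion. It assumes the coefficient meant here is Pearson's r, which the source does not state; the score lists and variable names are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two lists of the same length (at least 2 scores)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squared deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return sxy / sqrt(sxx * syy)


# Hypothetical scores for the same five students on two assessments.
new_test = [55, 62, 70, 78, 85]
criterion = [58, 60, 74, 75, 90]

validity_coefficient = pearson_r(new_test, criterion)
print(f"validity coefficient r = {validity_coefficient:.2f}")
```

Values near +1.00 indicate that students who score high on one assessment also tend to score high on the other; values near 0 indicate little agreement.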

What Is Validity?

Establishing Validity of Qualitative Assessments

• for informal classroom assessments: five questions

    • Have I limited my observations to concrete behaviors, as opposed to more global impressions of students?

    • Have I observed/noted the specific behavior a sufficient number of times in order to draw definitive conclusions?

    • Have I observed/noted the behavior in different settings or situations?

    • Have I based my conclusions only on the information that I have gathered?

    • Are there plausible, alternative explanations for the given behavior?

representativeness of observations

nature of inferences

What Is Reliability?

Reliability: the consistency of measures when the testing procedure is repeated on a population of individuals or groups.

• validity = accuracy; reliability = consistency

• also speaks to scores and their interpretation and use, not to the assessment itself

• scores, and their consistency, are affected by error

• error can result from student illness, content assessed but not taught, etc.

• random errors affect consistency; systematic errors affect validity
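The distinction in the last bullet can be illustrated with a small, deterministic sketch. Everything in it is hypothetical: the "true" scores, the constant 5-point inflation standing in for systematic error, and the hand-picked deviations standing in for random error. Pearson's r is used as the consistency measure, which is an assumption here.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length, non-constant score lists."""
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


# Hypothetical "true" achievement of six students.
true_scores = [62, 70, 75, 81, 88, 93]

# Systematic error: both administrations inflate every score by the same 5 points.
biased_first = [s + 5 for s in true_scores]
biased_second = [s + 5 for s in true_scores]

# Random error: hand-picked deviations that differ between administrations.
noise_first = [4, -6, 5, -3, 6, -5]
noise_second = [-5, 6, -4, 5, -6, 3]
noisy_first = [s + e for s, e in zip(true_scores, noise_first)]
noisy_second = [s + e for s, e in zip(true_scores, noise_second)]

r_biased = pearson_r(biased_first, biased_second)
r_noisy = pearson_r(noisy_first, noisy_second)
mean_bias = sum(b - t for b, t in zip(biased_first, true_scores)) / len(true_scores)

print(f"systematic error: r = {r_biased:.2f}, average bias = {mean_bias:+.1f} points")
print(f"random error:     r = {r_noisy:.2f}")
```

In this toy data the inflated scores are perfectly consistent across administrations yet uniformly wrong, while the scores with random deviations lose consistency.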

What Is Reliability?

Establishing Reliability of Quantitative Assessments

• established by correlating test results with themselves or with other forms of the test (anticipate that high scores on one form of the assessment are associated with high scores on the other)

• Reliability coefficient (r): a correlation coefficient representing measures of reliability.


Establishing Reliability of Quantitative Assessments (continued)

• Test-retest method: estimates reliability over time; results in a coefficient of stability.

    • procedure is not realistic for classroom use

• Alternate-forms and equivalent-forms methods: administration of tests with different items, or with the same items rearranged; results in an alternate-forms coefficient.

    • again, procedure is not realistic for classroom use

What Is Reliability?

Establishing Reliability of Quantitative Assessments (continued)

• Internal consistency methods: estimate of reliability with only one administration; determines how well items correlate with one another.

    • Split-half method: divides the test into two comparable halves.

    • KR-21 (Kuder-Richardson formula 21) method: reflects all possible split-half combinations.

    • Cronbach’s alpha method: similar to KR-21, but for items with different point values.
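The three internal consistency estimates can be sketched from a single administration as follows. This is a minimal illustration under stated assumptions, not a procedure taken from the source: the item scores are hypothetical, the split is odd versus even items, the Spearman-Brown step-up (not mentioned in the source) is applied to the half-test correlation, and population variances are used in the textbook KR-21 and alpha formulas.

```python
from math import sqrt
from statistics import mean, pvariance

# Hypothetical item scores: rows are students, columns are items (1 = correct).
SCORES = [
    [1, 1, 1, 1, 1, 0],
    [1, 1, 1, 0, 1, 1],
    [1, 0, 1, 1, 0, 0],
    [1, 1, 0, 0, 1, 0],
    [0, 1, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0],
]


def pearson_r(x, y):
    """Pearson correlation between two equal-length, non-constant score lists."""
    mean_x, mean_y = mean(x), mean(y)
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def split_half(scores):
    """Correlate odd-item and even-item half scores, then step up to full length."""
    odd_half = [sum(row[0::2]) for row in scores]
    even_half = [sum(row[1::2]) for row in scores]
    r_half = pearson_r(odd_half, even_half)
    return 2 * r_half / (1 + r_half)  # Spearman-Brown correction (assumed step)


def kr21(scores):
    """KR-21: needs only the item count, mean, and variance of total scores."""
    k = len(scores[0])
    totals = [sum(row) for row in scores]
    m, var = mean(totals), pvariance(totals)
    return (k / (k - 1)) * (1 - m * (k - m) / (k * var))


def cronbach_alpha(scores):
    """Cronbach's alpha: compares summed item variances with total-score variance."""
    k = len(scores[0])
    item_vars = [pvariance(col) for col in zip(*scores)]
    total_var = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


print(f"split-half (Spearman-Brown): {split_half(SCORES):.3f}")
print(f"KR-21:                       {kr21(SCORES):.3f}")
print(f"Cronbach's alpha:            {cronbach_alpha(SCORES):.3f}")
```

KR-21 applies to items scored right or wrong and treats them as similar in difficulty, whereas alpha also works when items carry different point values; that is why the two can differ on the same data.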


Establishing Reliability of Qualitative Assessments

• Interrater consistency: calculation of percent agreement between two or more raters of student performance.
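For two raters, percent agreement can be sketched as the share of students who receive the same rating from both. The rubric scores and names below are hypothetical, and exact matches are assumed to be what counts as agreement.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of students given exactly the same rating by both raters."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must rate the same, non-empty set of students")
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100.0 * matches / len(rater_a)


# Hypothetical rubric scores (1-4) given by two raters to the same eight students.
rater_a = [4, 3, 3, 2, 4, 1, 2, 3]
rater_b = [4, 3, 2, 2, 4, 1, 3, 3]

agreement = percent_agreement(rater_a, rater_b)
print(f"percent agreement: {agreement:.1f}%")
```

With more than two raters, one option is to compute this figure for each pair of raters and report the average.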

The Relationship Between Validity and Reliability

Validity is the more important feature.

Reliability is a prerequisite to validity (in other words, if items accurately assess a domain of content, the scores will also be consistent).

Assessment results may be reliable (i.e., consistent) but not valid (i.e., accurate).

The Relationship Between Validity and Reliability

Valid test results are also reliable, but reliable test results are not necessarily valid.

[Figure: four dot-pattern panels: (a) lacks validity and reliability; (b) fair validity and fair reliability; (c) good reliability but lacks validity; (d) good validity and good reliability]

Teacher Responsibilities Related to Validity and Reliability

Ensuring the validity and reliability of classroom assessments is a primary responsibility of teachers.

Refer to both The Standards for Teacher Competence in the Educational Assessment of Students and The Code of Fair Testing Practices in Education.
