1/26/2017
CONFIDENTIAL & PROPRIETARY Copyright © 2016 Pearson Education, Inc. or its affiliates. All rights reserved.
March 1, 2017
Lions, Tigers & Validity, Oh My!
Lions, and tigers, and bears, oh, my!
Fundamentals We Will Cover
What is validity and where does it come from?
Is there more than one type of validity?
How do you know if you've achieved validity?
What types of security concerns impact validity in a testing program?
What is validity & where does it come from?
Two questions underlie the concept of validity
1. Does the test measure what it was designed to measure?
2. Are the interpretations drawn from the test legitimate and justifiable?
1. Does the test measure what it was designed to measure?
Why are we testing?
What is the purpose of the test?
What about the test takers needs to be measured to achieve this purpose?
Purpose of testing?
Substitution for other measures (e.g., psychiatrist)
• Depression inventory
Prediction of future behavior (e.g., college entrance)
• Success in the first year of college
Reflection of test-taker performance on content domain (e.g., licensure/certification)
• Assessment of minimal competence
1. Does the test measure what it was designed to measure?
For licensure and certification examinations:
The purpose of testing is to establish whether the candidate has the minimal competence needed for professional practice.
And…
The test is designed to measure the knowledge, skills and abilities (KSAs) needed to successfully perform the tasks one would perform on the job.
These are usually determined through a job analysis study.
2. Are the interpretations drawn from the test legitimate & justifiable?
Purpose → Interpretation
Substitution for other measures (e.g., depression inventory) → The degree to which an individual exhibits the measured personality trait(s)
Prediction of future behaviour (e.g., success at college) → Whether or not the test-taker is likely to persist after their first year of college
Reflection of test-taker performance on content domain (e.g., assessment of minimal competence) → Whether or not the test-taker is competent to practice a profession
Two questions underlie validity
1. Does the test measure what it was designed to measure?
2. Are the interpretations drawn from the test legitimate and justifiable?
Validity is established to the extent that you can answer Yes to these questions
What I’ve NOT said…
“The test is valid”
• It is not the test itself that is valid; it is the interpretations of the test scores that are deemed to be valid or not.
This is an important distinction because…
The validity of this interpretation is derived not just from the test items but from all factors that go into arriving at that score interpretation:
Is the test blueprint an adequate reflection of the job and the tasks individuals would be expected to perform on the job?
Does the content of the test items reflect current, competent practice?
Has the standard for competent practice (i.e., the cut score) been appropriately set?
Is there more than one type of validity?
Validating a Test
Process of gathering evidence to justify the interpretations you want to make from the test scores
In the case of a licensure or certification exam, the question is:
Do the test scores reflect the knowledge, skills, and abilities needed to practice the profession competently?
Evaluating Validity
Types of evidence to gather when validating a test:
Content validity
Construct validity
Criterion-related validity
Content Validity
The test is valid to the extent that it is a representative sample of the behavior domain. Not a random sample!
Includes only relevant topics
Content validity is not evaluated after the test is developed and administered; it is built into the exam from the beginning, starting with the decision about what should be tested and how that decision is made.
Construct Validity
Answers the question: What trait or characteristic is measured by the test?
Real interest is the underlying theory.
Accumulation of evidence (not just one empirical relationship)
Interpretation of test scores: People who pass the test are “minimally competent.”
Not predicting success within the profession per se – only answering the question “Is this person minimally competent?”
Criterion-Related Validity
Typically involves a decision-making process:
Should I use the test to select students for my university?
Should I use the test to select salespeople for my company?
Should I use the test to replace a psychiatric interview?
The test results are valid to the extent that they improve the effectiveness of the decision.
Criterion-Related Validity
Predictive
• Test is used to predict behaviour
• Real interest is the criterion behaviour
• Example: University admissions test is used to predict success at university
• Criterion: First-year grades
Concurrent
• Test data are collected at the same time as the external behaviour
• Real interest is in using the test as a substitute for something else
• Example: Test measuring depression used as a substitute for formal diagnosis by a psychiatrist
• Criterion: Psychiatrist's diagnosis
Criterion-Related Validity
Cautions:
1. Validity evidence is only as good as the criterion.
2. Validity is determined for a group of test-takers; errors will be made for individuals.
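Criterion-related validity is typically summarised as a validity coefficient: the correlation between test scores and the criterion measure. A minimal sketch of the predictive case (the admissions scores and grades below are invented for illustration, not data from this presentation):

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical data: admissions-test scores and the criterion
# (first-year grade point average) for eight test-takers.
test_scores = [520, 610, 450, 700, 580, 640, 490, 560]
first_year_gpa = [2.8, 3.4, 2.5, 3.9, 3.1, 3.6, 2.6, 3.0]

r = pearson_r(test_scores, first_year_gpa)
print(f"validity coefficient r = {r:.2f}")
```

The coefficient is only as meaningful as the criterion itself (caution 1 above): if first-year grades are an unreliable measure of "success at university", a high r overstates the evidence.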
How do you know if you’ve achieved validity?
Content validity: Types of evidence
1. Are the test items linked to a test plan that has been developed using the participation of subject matter experts (SMEs)?
2. Have SMEs been involved in the item writing and review process?
3. Are the correct answers to test items validated by learning materials or a well-recognised reference, such as a textbook?
4. Have items been written using formatting and style guidelines?
5. Is an item bias and sensitivity review a component of the item development process?
6. Is there a review process for accepting items into the operational pool which involves reviewing the statistical properties of the items?
7. Is there a plan for the periodic review of operational items to ensure they still reflect current practice?
Tests are “imperfect measures of constructs because they either leave out something that should be included…or else include something that should be left out, or both”
~Samuel Messick, 1989, p. 34
Threats to validity
Construct-irrelevant variance
Some test scores are either inflated or suppressed due to extraneous variables that the test was not designed to assess (what you actually measure extends beyond what you want to measure).
Construct underrepresentation
Content coverage is not adequate to generalise test-taker performance to the construct in question (what you want to measure extends beyond what you actually measure).
Construct & Criterion-Related Validity: Types of evidence
Internal structure: rxx
• Internally consistent? Stable over time?
Correlations with other measures
• High correlations with other measures of the same construct
• Low correlations with measures of different constructs
Criterion-related validity studies
• What does the test predict?
• Group differences
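Internal-consistency evidence (rxx) is often summarised with Cronbach's alpha. A minimal sketch, using invented 0/1 item responses rather than any data from this presentation:

```python
def cronbach_alpha(item_scores):
    """Cronbach's alpha for a list of items, where each item is a list of
    scores (one per test-taker, in the same order for every item)."""
    k = len(item_scores)            # number of items
    n = len(item_scores[0])         # number of test-takers

    def variance(xs):               # population variance
        m = sum(xs) / len(xs)
        return sum((x - m) ** 2 for x in xs) / len(xs)

    item_var = sum(variance(item) for item in item_scores)
    totals = [sum(item[i] for item in item_scores) for i in range(n)]
    # alpha = k/(k-1) * (1 - sum of item variances / total-score variance)
    return (k / (k - 1)) * (1 - item_var / variance(totals))

# Hypothetical right/wrong (1/0) responses: 4 items, 6 test-takers
items = [
    [1, 1, 0, 1, 1, 0],
    [1, 1, 0, 1, 0, 0],
    [1, 0, 0, 1, 1, 0],
    [1, 1, 1, 1, 1, 0],
]
print(f"alpha = {cronbach_alpha(items):.2f}")  # → alpha = 0.82
```

Alpha speaks to internal structure only; a high alpha does not by itself show that the items cover the construct (see construct underrepresentation above).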
What types of security concerns impact validity in a testing program?
Cheating
Acting dishonestly or unfairly in order to gain an advantage. In testing, that means claiming knowledge, skills or abilities that the test taker does not really have.
Cheating impacts validity because if someone cheats, their score report is not a true reflection of the knowledge, skills and abilities they have.
Examples: Test-taker collaboration, using hidden notes, etc.
Prior knowledge of test content
Prior knowledge of test content is gained through item leakage.
Prior knowledge impacts validity because the test-taker's score report may not be an accurate reflection of their knowledge, skills and abilities.
Examples: Brain-dump sites, proctor collaboration, test-taker collaboration, program design flaws (same form over and over again, shallow item banks)
Differences in administration procedures
These differences impact validity because the score result may not be valid or reliable (consistent).
Examples: Allowing some test-takers to have more time than others, not checking IDs at one site but checking at another (allowing proxy-testing to occur), differences in administration sites (noisy vs. quiet, dim vs. bright, etc.)
Summary
The validity chain

Purpose of the test defined → Test blueprint developed → Test content written according to the test blueprint → Standard setting conducted to set cut score → Testing occurs → Scoring methods applied

Leads to: Candidate's observed performance on the test (test score)

Interpreted as: Whether or not the candidate has the KSAs required of competent practice
The interpretative argument is valid to the extent the inferences and assumptions made along the path are plausible, as determined by well-documented evidence gained through logical
and/or empirical validation.
QUESTIONS?