CHAPTER 9: Reliability Coefficient for Criterion-Referenced Tests


Reliability Coefficients for Criterion-Referenced Tests

Criterion: What we intend to measure (DV)

Norm-Referenced: As in intelligence tests. Ex. We compare the examinee's score with their norm (normative IQ) or deviation IQ.

Criterion-Referenced: As in achievement tests, we want to know if the examinee has achieved a particular domain (math, psych, or a particular behavior).

Reliability Coefficients for Criterion-Referenced Tests

• Reliability coefficients for criterion-referenced tests are used for 2 different purposes:

• 1. Domain Score Estimation, or

• 2. Mastery Allocations

1. Domain Score Estimation

• We use the same type of calculation to determine the reliability coefficient as we did before. The reliability coefficient for Domain Score Estimation of the data in Table 9.1 is the same as for Table 7.1.

• Ex. First we do an ANOVA to find the mean squares (MS within, or MS persons, and MS residual), then use Hoyt's Method to calculate the reliability coefficient.

Reliability Coefficients for Criterion-Referenced Tests

Hoyt's (1941) Method:

MS persons = MS within
MS items = MS between
MS residual has its own calculation; it is not equal to MS total.
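A minimal Python sketch of Hoyt's method, assuming a made-up 5-person × 4-item 0/1 score matrix (not the Table 9.1 data):

```python
import numpy as np

# Hypothetical 5-person x 4-item score matrix (1 = correct, 0 = incorrect).
scores = np.array([
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
    [1, 1, 1, 1],
], dtype=float)

n_persons, n_items = scores.shape
grand_mean = scores.mean()

# Two-way ANOVA sums of squares (persons x items, one observation per cell).
ss_persons = n_items * ((scores.mean(axis=1) - grand_mean) ** 2).sum()
ss_items = n_persons * ((scores.mean(axis=0) - grand_mean) ** 2).sum()
ss_total = ((scores - grand_mean) ** 2).sum()
ss_residual = ss_total - ss_persons - ss_items  # residual is NOT MS total

ms_persons = ss_persons / (n_persons - 1)
ms_residual = ss_residual / ((n_persons - 1) * (n_items - 1))

# Hoyt's reliability: person variance not attributable to residual error.
hoyt_reliability = (ms_persons - ms_residual) / ms_persons
```

For this toy matrix, MS persons = 0.325, MS residual = 0.225, and the coefficient comes out to about 0.31; real domain-score data would of course give different values.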


1. Domain Score Estimation

• The Domain Score for an examinee is the same as the Observed Score (X) in Classical Theory. It is the proportion of the items in a specific domain that the examinee can answer correctly.

• Ex. Your score of 85 on Test Construction has a D.S. of 85.
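As a minimal sketch (the 40-item test below is hypothetical), the domain score is just the proportion correct, and the "85" in the example is its percentage form:

```python
# Domain score = proportion of the domain's items answered correctly.
# Hypothetical 40-item test; 34 correct gives the 85 from the example.
items_correct = 34
items_total = 40

domain_score = items_correct / items_total   # proportion form, 0.85
percent_score = 100 * domain_score           # percentage form, the "85"
```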


Reliability Coefficients for Criterion-Referenced Tests

• *Decision Consistency

It is about the consistency of your decision. Decision Consistency concerns the extent to which the same decisions are made from different sets of measurements. Consistency of decisions is based on two different forms of a test (parallel forms), or on two administrations of the same test (test-retest).

A high reliability coefficient (p) indicates that there is consistency in examinees' scores.
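A minimal sketch of decision consistency across two parallel forms, using invented percent-correct scores and an arbitrary 0.70 cut score:

```python
# Hypothetical percent-correct scores for 6 examinees on two parallel forms.
form_a = [0.82, 0.65, 0.90, 0.55, 0.71, 0.68]
form_b = [0.78, 0.72, 0.88, 0.58, 0.66, 0.74]
cut_score = 0.70  # arbitrary mastery cut score

# A decision is consistent if both forms place the examinee on the same
# side of the cut score (master on both, or non-master on both).
decisions_a = [score >= cut_score for score in form_a]
decisions_b = [score >= cut_score for score in form_b]
p_consistency = sum(a == b for a, b in zip(decisions_a, decisions_b)) / len(form_a)
```

Here only 3 of 6 examinees get the same mastery decision on both forms (p = 0.5), illustrating how scores clustered near the cut score drag consistency down, one of the factors listed below.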


Reliability Coefficients for Criterion-Referenced Tests

*Factors Affecting Decision Consistency

• 1. Test length

• 2. Location of the cut-score in the score distributions

• 3. Test score generalizability

• 4. Similarity of the score distributions for the two forms

Mastery Allocation

• *2. Mastery Allocation:

Involves comparing the percent-correct score to an arbitrarily established cut score. If the percent-correct score is equal to or greater than the cut score, the examinee has mastered that domain.

Mastery Allocation

Mastering a domain is called Mastery Allocation.

Ex. The EPPP exam cut score in Florida is 70%. If you scored 70% or greater on this exam, then you have mastered the psychology domain: you get your psychologist license and you can call yourself a psychologist.
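The mastery rule itself is a one-line comparison; this sketch assumes the 70% cut score from the EPPP example:

```python
def has_mastered(percent_correct, cut_score=70.0):
    """Mastery allocation: master if percent-correct >= the cut score."""
    return percent_correct >= cut_score

# A hypothetical examinee scoring 72.5% clears the 70% cut score.
decision = has_mastered(72.5)
```

Note the "equal or greater" part of the rule: a score exactly at the cut score counts as mastery.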


UNIT III: VALIDITY

CHAP 10: INTRODUCTION TO VALIDITY

CHAP 11: STATISTICAL PROCEDURES FOR PREDICTION AND CLASSIFICATION

CHAP 12: BIAS IN SELECTION

CHAP 13: FACTOR ANALYSIS

• Validity: Validity refers to the degree to which a test measures what it is intended to measure. It is about the quality (accuracy/trueness) of a test.

• *Characteristics of Validity:

• 1. Result

• 2. Context

• 3. Coefficient

*Characteristics of Validity:

• 1. Result

• Validity refers to the results of a test, not to the test itself.

• Ex. If you are taking a statistics test, you want to know that the resulting score is valid to measure your knowledge of statistics.

INTRODUCTION TO VALIDITY

• 2. Context

• The validity of the resulting score (statistics) must be interpreted within the context in which the test occurs (statistics).

INTRODUCTION TO VALIDITY

• 3. Coefficient

• Just like the reliability coefficient, the validity coefficient also has degrees of variability, from low to high.

• p = 0 to 1

• Ex. The validity of last year's Test Construction Exam: p = 0.90

Validity

• Validity has been described as 'the agreement between a test score and the quality it is believed to measure' (Kaplan and Saccuzzo, 2001). In other words, it measures the gap between what a test actually measures and what it is intended to measure.

Validity

• This gap can be caused by two particular circumstances:

• (a) the design of the test is insufficient for the intended purpose (ex. using essays for older examinees), and (b) the test is used in a context or fashion which was not intended in the design (ex. changing questions to multiple choice for math).

External & Internal Validity

• External Validity: External validity addresses the ability to generalize your study to other people and other situations. Ex. Correlational studies: the association between stress and depression.

External & Internal Validity

• Internal Validity: Internal validity addresses the "true" causes of the outcomes that you observed in your study. Strong internal validity means that you not only have reliable measures of your independent and dependent variables, but also a strong justification that causally links your independent variables to your dependent variables. (Ex. Experimental studies: the effect of stress on heart attack.)

*Major Types of Validity: the 3 Cs

Type | What it addresses | Examples
Content | The items | Stats items
Criterion | How well a test estimates/predicts a performance | A teacher's math test and the researcher's test (FCAT); EPPP; GRE
Construct | Tests a non-observable construct or trait | Your depression test or clinical interview (underlying constructs, i.e., sleeping, eating, hopelessness) & BDI-2 score

*Face Validity

• Face validity means that the test appears to be valid. This is validated using common-sense rules, for example:

• a mathematical test should include some numerical elements.

Face Validity

• 1. 3+5=
• 2. 12-10=
• 3. 8-5=
• 4. 25-16=
• 5. 13+3-8=

• Multiple Choice; please select the best answer.
• 6. Judy had 10 pennies. She lost 2. How many pennies does she have left?
A. 2  B. 8  C. 10  D. 12

Face Validity

• A test can appear to be invalid but actually be perfectly valid, for example where correlations between unrelated items and the desired items have been found.

• Ex. Successful pilots in WW2 were found to very often have had an active childhood interest in flying model planes (the association between flying model planes and successful WW2 pilots).

Face Validity

• A test that does not have face validity may be rejected by test-takers (if they have that option) and by people who are choosing the test to use from amongst a set of options.

Types of Validity

• 1. *Content Validity

• Measures the knowledge of the content domain that it was designed to measure.

• Ex. If the content domain is statistics, the test should measure statistical knowledge, not English, math, or psychology, etc.

1. *Content Validity

• Instruction: Multiple Choice; please select the best answer (structured framework).

• 6. Judy had 10 pennies. She lost 2. How many pennies does she have left?
A. 2  B. 8  C. 10  D. 12

• The red part is called the "Performance Domain" or Domain Characteristic, which deals with your knowledge of the domain.

• The yellow part is called the "Matching Item."

1. Content Validity

• A test has content validity if it sufficiently covers the area that it is intended to cover. This is particularly important in ability or attainment/achievement tests that validate skills or knowledge in a particular domain.

• *Content Under-Representation occurs when important areas are missed. *Construct-Irrelevant Variation occurs when irrelevant factors contaminate the test.

1. Content Validity

• *Content Validity has 4 Steps

• 1. Defining the performance domain of interest

• 2. Selecting a panel of qualified experts in the content domain

• 3. Providing a structured framework (instruction) for the process of matching the item (question) to the performance domain (answers)

• 4. Collecting and summarizing the data from the matching process

1. Content Validity

• 1. Defining the performance domain of interest

• Ex. Ask yourself: what am I trying to measure? Psych, stats, English?

1. Content Validity

• 2. Selecting a panel of qualified experts in the content domain

• Ex. Select expert statisticians to review your stats questions. Another ex.: qualifying exam questions.

1. Content Validity

• 3. Providing a structured framework (instruction) for the process of matching the item (question) to the performance domain (answers)

• Ex. Go back 4 slides and see Question #3.

1. Content Validity

• 4. Collecting and summarizing the data from the matching process

Select and collect a sample of these relevant questions (items).

1. Content Validity

• *Practical Considerations in Content Validity

• *Content validity requires the following 4 decisions (questions):

• 1. Should objectives be weighted to reflect their importance?

1. Content Validity

• 2. How should the item-matching task be structured?

• 3. What aspect of the item should be examined?

• 4. How should results be summarized?

1. Content Validity1. Content Validity• 1. Should objective be weighted to 1. Should objective be weighted to

reflect their importance?reflect their importance?

In Content Validity we should rate In Content Validity we should rate the the importance of objectives. importance of objectives. The The designer of the test should provide designer of the test should provide a scale such as a a scale such as a “rubric”“rubric” for for measuring the measuring the objectivesobjectives in a test. in a test. This also helps you to measure the This also helps you to measure the inter-inter-rater reliability rater reliability of a test more accurately.of a test more accurately.

4444

1. Content Validity1. Content Validity• 2. How should the 2. How should the item-matching item-matching task task

be structured?be structured?

Katz (1958) Katz (1958) suggested that the suggested that the expert expert reviewers reviewers should read the item and should read the item and identify the identify the correct/bestcorrect/best response. response.

Hambleton (1980) Hambleton (1980) idea was that the idea was that the expertsexperts should should rate the degree of rate the degree of matchingmatching to a specific objective by to a specific objective by using a using a 5 point scale5 point scale

poor fit 1____2____3____4____5 excellent fitpoor fit 1____2____3____4____5 excellent fit

4545

1. Content Validity1. Content Validity• 3. What aspect of item should 3. What aspect of item should

be examined?be examined?• We should have a We should have a clear clear

description description of of itemitem and and domaindomain to consider the to consider the matching matching item(s) item(s) to a to a performance performance domain or domain domain or domain characteristics.characteristics.

Ex. Go back to Question # 6Ex. Go back to Question # 6

4646

1. Content Validity1. Content Validity• 4. How should results be summarized4. How should results be summarized There are 5 ways: read p. 221 There are 5 ways: read p. 221 1. Percentage of items matched to

objectivesobjectives 2. Percentage of items matched to

objectives with high “importance” ratingobjectives with high “importance” rating 3. Correlation between the importance

weighting of objectives and the number of objectives and the number of items items measuring those objectives

4. Index of item-objective congruenceitem-objective congruence 5. Percentage of objectives not assessedobjectives not assessed

by any of the items on the test
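Summary methods 1 and 5 can be sketched with hypothetical matching data (the objective names and importance ratings below are made up):

```python
# Hypothetical expert matching: the objective (if any) each of 8 items
# was matched to, plus each objective's 1-5 importance rating.
item_to_objective = ["obj1", "obj1", "obj2", None, "obj3", "obj2", None, "obj1"]
importance = {"obj1": 5, "obj2": 3, "obj3": 4, "obj4": 5}

# Method 1: percentage of items matched to any objective.
matched = [obj for obj in item_to_objective if obj is not None]
pct_items_matched = len(matched) / len(item_to_objective)

# Method 5: percentage of objectives not assessed by any item.
unassessed = [obj for obj in importance if obj not in matched]
pct_objectives_unassessed = len(unassessed) / len(importance)
```

In this toy data, 75% of the items match some objective, but one objective in four (the high-importance obj4) is never assessed, the under-representation problem flagged earlier.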


2. Criterion-Related Validity

• *Criterion-Related Validity is a measure of the extent to which a test is related to some criterion, or how well a test estimates/predicts a performance.

• Ex. The SAT would be a predictor of college performance; the GRE, graduate performance; the EPPP, psychologist performance; and the Driver License Test, basic traffic signs and signals and/or driving performance.

2. Criterion-Related Validity

• Criterion-Related Validity is concerned with how well a test either estimates current performance (Concurrent Validity) or predicts future performance (Predictive Validity). Ex. EPPP Exam

Ex. of Concurrent and Predictive Validity

• Researchers want to know if 6th-grade students' math scores are valid. They give students a test designed to measure mathematical aptitude for 6th graders.

• They then compare and correlate these scores with the test scores already held by the teachers (midterm scores): r.

Ex. of Concurrent and Predictive Validity

• They evaluate the accuracy of their test and decide whether it measures what it is supposed to. The key element is that the two methods were compared at about the same time (concurrent), or only a few days apart.

Ex. of Concurrent and Predictive Validity

• However, if the researchers had measured the mathematical aptitude, implemented a new educational program, and then retested the students after six months, this would be predictive validity.

2. Criterion-Related Validity

• Concurrent validity is measured by comparing two tests done at the same time, for example a written test and a hands-on exercise that seek to assess the same criterion. This can be used to limit criterion errors. Ex. For a diagnosis of depression: a clinical interview and the BDI-II.

2. Criterion-Related Validity

• Predictive validity, by contrast, compares success in the test with actual success in the future job. The test is then adjusted over time to improve its validity.

• Ex. The EPPP exam and psychologist performance

2. Criterion-Related Validity

• Criterion-related validity is like construct validity, but relates the test to some external criterion, such as particular aspects of the job.

• There are dangers with the external criterion being selected based on its convenience rather than being a full representation of the job. Ex. An air traffic control test may use a limited set of scenarios.

2. Criterion-Related Validity

• *The general design of a criterion-related validity study has the following 5 steps (p. 224):

1. Identify a suitable criterion behavior (depression) and a method for measuring it (your depression test).
2. Identify an appropriate sample of examinees (depressed patients) representative of those for whom the test will ultimately be used.
3. Administer the test and keep a record of each examinee's score.
4. When the criterion data are available, obtain a measure of performance on the criterion for each examinee (1. mild, 2. moderate, 3. severe).
5. Determine the strength of the relationship between test scores and criterion performance. Ex. The relationship between the teacher's math scores and the researcher's math scores (the researcher determines the criterion performance): r = ?
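Step 5 can be sketched in Python; the two score lists are invented for illustration, with the Pearson r playing the role of the validity coefficient:

```python
import numpy as np

# Hypothetical data for step 5: a new math test vs. the teachers'
# midterm scores (the criterion) for the same 6 students.
new_test = np.array([88, 72, 95, 60, 80, 67], dtype=float)
criterion = np.array([85, 70, 92, 65, 78, 70], dtype=float)

# Pearson r between test scores and criterion performance.
r = np.corrcoef(new_test, criterion)[0, 1]
```

With these made-up scores r comes out close to 1, which would be read as strong criterion-related validity evidence; real test-criterion correlations are usually far more modest.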


3. Construct Validity

• 3. *Construct Validity

• A test has construct validity if it accurately measures a theoretical, non-observable construct or trait (i.e., intelligence, motivation, depression, anxiety, stats, biology, etc.). Ex. The relationship between the clinical interview (symptoms/characteristics of depression, which is the underlying construct) and the scores on the BDI-II (mild, moderate, severe).

3. Construct Validity

• *Construct-Irrelevant Variation occurs when irrelevant factors contaminate the test.

3. Construct Validity

• Underlying many tests is a construct, or theory, that is being assessed.

• Ex. There are a number of tests/constructs for describing intelligence (spatial ability, verbal reasoning, processing speed, etc.), which the test will individually assess.

3. Construct Validity

• Constructs can be about causes, about effects, and about the cause-effect relationship.

• If the construct is not valid, then the test on which it is based will not be valid.

• Ex. There have been historical constructs holding that intelligence is based on the size and shape of the skull (phrenology).

3. Construct Validity: *Measurements

• *Multitrait-Multimethod Matrix: Campbell and Fiske (1959) described this approach as "concerned with the adequacy of tests as measures of a construct/trait." With this technique the researcher must think of two or more ways (methods) to measure the construct/trait of interest.

3. Construct Validity: *Measurements

• (1. True-False, 2. Forced Choice, and 3. Incomplete Sentences are methods) and (A. sex-guilt, B. hostility-guilt, and C. morality-conscience are traits, or constructs). Using one sample of subjects, measurements are obtained by the same or different methods. Compare the correlation between the two measurements and identify one of the 3 types:

3. Construct Validity

• 1. *Reliability Coefficients: Using the same measurement method for the same trait; it's like test-retest reliability (you use the same trait and method twice). Ideally this should be a high r. See Table 10.2.

• 2. *Convergent Validity Coefficient: Using a different measurement method but the same trait (it's like parallel-forms reliability, i.e., form A and form B; ideally a high r). The 2 measurement methods, or the 2 variables, converge (come together), and this is called the Convergent Validity Coefficient. See Table 10.2.

3. Construct Validity (Construct = Trait)

• *3. Divergent or Discriminant Validity Coefficient (2 different kinds):

• A. Correlations between measures of different constructs (traits) using the same measurement method (Heterotrait-Monomethod Coefficient).

• Or, B. using different measurement methods for different constructs (traits) (Heterotrait-Heteromethod Coefficient).

• Ideally there is low or no relationship between the variables; they diverge (come apart), and this is called the Divergent Validity Coefficient.

• See Table 10.2.
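The coefficient types can be illustrated with a toy correlation table; the trait/method labels and r values below are hypothetical, chosen only to show the expected high-convergent, low-divergent pattern:

```python
# Hypothetical MTMM correlations: trait sex-guilt vs. hostility-guilt,
# measured by true-false (tf) and forced-choice (fc) methods.
corr = {
    ("sex_guilt_tf", "sex_guilt_fc"): 0.71,        # same trait, different method
    ("sex_guilt_tf", "hostility_guilt_tf"): 0.25,  # different trait, same method
    ("sex_guilt_tf", "hostility_guilt_fc"): 0.12,  # different trait, different method
}

convergent = corr[("sex_guilt_tf", "sex_guilt_fc")]            # monotrait-heteromethod
heterotrait_monomethod = corr[("sex_guilt_tf", "hostility_guilt_tf")]
heterotrait_heteromethod = corr[("sex_guilt_tf", "hostility_guilt_fc")]

# Construct-validity evidence: convergent r high, divergent r's low,
# with shared method inflating the monomethod divergent coefficient.
shows_construct_validity = (
    convergent > heterotrait_monomethod > heterotrait_heteromethod
)
```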


3. Construct Validity

• Factor Analysis

• Exploratory and Confirmatory

• Factor Analysis is another way to measure the validity of a test. It is about data reduction.

• Ex. Raymond Cattell, in his 16 PF, reduced 4,500 personality-related questions into 187 questions and 16 related variables, or factors.

Warmth (A)
  Low: Impersonal, distant, cool, reserved, detached, formal, aloof
  High: Warm, outgoing, attentive to others, kindly, easy-going, participating, likes people

Reasoning (B)
  Low: Concrete thinking, lower general mental capacity, less intelligent, unable to handle abstract problems
  High: Abstract-thinking, more intelligent, bright, higher general mental capacity, fast learner

Emotional Stability (C)
  Low: Reactive emotionally, changeable, affected by feelings, emotionally less stable, easily upset
  High: Emotionally stable, adaptive, mature, faces reality calmly

Dominance (E)
  Low: Deferential, cooperative, avoids conflict, submissive, humble, obedient, easily led, docile, accommodating
  High: Dominant, forceful, assertive, aggressive, competitive, stubborn, bossy

Liveliness (F)
  Low: Serious, restrained, prudent, taciturn, introspective, silent
  High: Lively, animated, spontaneous, enthusiastic, happy-go-lucky, cheerful, expressive, impulsive

Rule-Consciousness (G)
  Low: Expedient, nonconforming, disregards rules, self-indulgent
  High: Rule-conscious, dutiful, conscientious, conforming, moralistic, staid, rule-bound

Social Boldness (H)
  Low: Shy, threat-sensitive, timid, hesitant, intimidated
  High: Socially bold, venturesome, thick-skinned, uninhibited

Sensitivity (I)
  Low: Utilitarian, objective, unsentimental, tough-minded, self-reliant, no-nonsense, rough
  High: Sensitive, aesthetic, sentimental, tender-minded, intuitive, refined

Vigilance (L)
  Low: Trusting, unsuspecting, accepting, unconditional, easy
  High: Vigilant, suspicious, skeptical, distrustful, oppositional

Abstractedness (M)
  Low: Grounded, practical, prosaic, solution-oriented, steady, conventional
  High: Abstract, imaginative, absent-minded, impractical, absorbed in ideas

Privateness (N)
  Low: Forthright, genuine, artless, open, guileless, naive, unpretentious, involved
  High: Private, discreet, nondisclosing, shrewd, polished, worldly, astute, diplomatic

Apprehension (O)
  Low: Self-assured, unworried, complacent, secure, free of guilt, confident, self-satisfied
  High: Apprehensive, self-doubting, worried, guilt-prone, insecure, worrying, self-blaming

Openness to Change (Q1)
  Low: Traditional, attached to familiar, conservative, respecting traditional ideas
  High: Open to change, experimental, liberal, analytical, critical, free-thinking, flexible

Self-Reliance (Q2)
  Low: Group-oriented, affiliative, a joiner and follower, dependent
  High: Self-reliant, solitary, resourceful, individualistic, self-sufficient

Perfectionism (Q3)
  Low: Tolerates disorder, unexacting, flexible, undisciplined, lax, self-conflict, impulsive, careless of social rules, uncontrolled
  High: Perfectionistic, organized, compulsive, self-disciplined, socially precise, exacting will power, control, self-sentimental

Tension (Q4)
  Low: Relaxed, placid, tranquil, torpid, patient, composed, low drive
  High: Tense, high-energy, impatient, driven, frustrated, overwrought, time-driven

Primary Factors and Descriptors in Cattell's 16 Personality Factor Model (adapted from Conn & Rieke, 1994)

69

Raymond Cattell's 16 Personality Factors

70
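The data-reduction idea behind factor analysis can be illustrated with a toy correlation pattern: items that tap the same underlying factor correlate highly with each other and only weakly with items from other factors. The scores below are fabricated for illustration; a real analysis would use dedicated EFA/CFA software.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation: r = SP / sqrt(SSx * SSy)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sp = sum((a - mx) * (b - my) for a, b in zip(x, y))
    ssx = sum((a - mx) ** 2 for a in x)
    ssy = sum((b - my) ** 2 for b in y)
    return sp / sqrt(ssx * ssy)

# Hypothetical scores for 5 people on 4 items:
# items 1-2 tap a "verbal" factor, items 3-4 a "quantitative" factor.
item1 = [10, 12, 8, 15, 11]
item2 = [11, 13, 9, 16, 12]        # moves together with item1
item3 = [5, 14, 9, 7, 12]
item4 = [6, 15, 10, 8, 13]         # moves together with item3

within = pearson_r(item1, item2)   # same factor -> high r
between = pearson_r(item1, item3)  # different factors -> near-zero r
print(f"within-factor r = {within:.2f}, between-factor r = {between:.2f}")
```

A factor-analysis program would collapse these four items into two factors; Cattell performed the same reduction on a much larger scale, from 4,500 questions down to 16 factors.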

• 3. Construct Validity

*Construct validity has the following 4 steps (same as research hypotheses and testing):

1. Formulate one or more hypotheses (state your hypothesis), e.g., stress and depression.

2. Select (or develop) a measurement instrument.

3. Gather empirical data to test your hypotheses (collect your data and calculate the statistics).

4. Determine if the data are consistent with the hypotheses (do your stats and make a decision).

71

Validity Coefficient

• The validity coefficient is calculated as a correlation between the two items (variables) being compared, very typically success in the test as compared with success in the job.

• A validity of 0.6 and above is considered high, which suggests that very few tests give strong indications of job performance.
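The slide above can be made concrete with a small computation: the validity coefficient is simply the Pearson correlation between test scores and the criterion. The data here are hypothetical, and the 0.6 cut-off is the rule of thumb quoted above.

```python
from math import sqrt

def validity_coefficient(test_scores, job_performance):
    """Pearson r between predictor (test) and criterion (job success)."""
    mx = sum(test_scores) / len(test_scores)
    my = sum(job_performance) / len(job_performance)
    sp = sum((x - mx) * (y - my) for x, y in zip(test_scores, job_performance))
    ssx = sum((x - mx) ** 2 for x in test_scores)
    ssy = sum((y - my) ** 2 for y in job_performance)
    return sp / sqrt(ssx * ssy)

# Hypothetical data: selection-test scores and later job-performance ratings
test = [50, 60, 70, 80, 90]
job  = [52, 58, 75, 76, 95]

r = validity_coefficient(test, job)
print(f"validity coefficient r = {r:.3f}")
print("high" if r >= 0.6 else "low")  # 0.6 and above counts as high per the slide
```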

72

73

*Validity Coefficients for True Scores

• The validity coefficient is like the reliability and generalizability coefficients.

• rXY = SP/√(SSX·SSY)   (Pearson correlation coefficient)

• ρXtYt = ρXY/√(ρXX′·ρYY′)   This formula is sometimes called the correction for attenuation because it gives a validity coefficient that is corrected for errors of measurement in the predictor (X) and the criterion (Y).

• ρXtYt = validity coefficient for true scores

• ρXY = observed correlation between X and Y = 0.5

• ρXX′ = reliability coefficient of the predictor X = 0.6

• ρYY′ = reliability coefficient of the criterion Y = 0.5

• Plugging in: ρXtYt = 0.5/√(0.6 × 0.5) = 0.5/√0.30 ≈ 0.91
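The slide's numbers can be run through the correction-for-attenuation formula directly; this minimal sketch uses only the values given above.

```python
from math import sqrt

def corrected_validity(r_xy, r_xx, r_yy):
    """Correction for attenuation: rho_XtYt = rho_XY / sqrt(rho_XX' * rho_YY')."""
    return r_xy / sqrt(r_xx * r_yy)

# Values from the slide
r_xy = 0.5   # observed correlation between predictor X and criterion Y
r_xx = 0.6   # reliability of the predictor X
r_yy = 0.5   # reliability of the criterion Y

rho_true = corrected_validity(r_xy, r_xx, r_yy)
print(f"validity coefficient for true scores = {rho_true:.3f}")  # about 0.913
```

The corrected coefficient (about 0.91) exceeds the observed 0.5 because measurement error in both variables attenuates the observed correlation.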

74

*The Relationship between Reliability and Validity

• If a test is unreliable (ρR = 0), it cannot be valid. If a test is reliable (ρR = .6), that does not mean it is valid. However, if data are valid, they must be reliable; therefore, if a psychological test is valid (ρV = .90), it is also reliable.

75

*The Relationship between Reliability and Validity

Mathematically, ρV ≤ √ρR

This means the criterion-related validity coefficient cannot exceed the square root of the predictor reliability coefficient.

Ex. If the reliability coefficient ρR = .81, then the validity coefficient ρV ≤ √.81, which is ≤ .90.
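The bound ρV ≤ √ρR from the slide can be checked numerically with the example values; the claimed validity below is a hypothetical figure for illustration.

```python
from math import sqrt

r_reliability = 0.81                 # predictor reliability from the slide's example
max_validity = sqrt(r_reliability)   # upper bound on the validity coefficient
print(f"validity cannot exceed {max_validity:.2f}")  # 0.90

# A hypothetical claimed validity, checked against the bound:
claimed_validity = 0.85
print("possible" if claimed_validity <= max_validity else "impossible")
```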

76

*The Relationship between Reliability and Validity

If someone who is 200 pounds steps on a scale 10 times and gets different readings of 15, 250, 95, 140, etc., the scale is not reliable. If the scale consistently reads "150", then it is reliable, but not valid. If it reads "200" each time, then the measurement is both reliable and valid. This is what is meant by the statement, "Reliability is necessary but not sufficient for validity." A test cannot be valid and not reliable.
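The scale analogy can be expressed as a check on repeated readings: low spread among readings means reliable, and a mean near the true weight means valid. The numeric tolerances below are arbitrary choices for illustration, not standard values.

```python
from statistics import mean, pstdev

def assess(readings, true_value, spread_tol=5, bias_tol=5):
    """Reliable = readings agree with each other; valid = reliable AND centered
    on the truth. Tolerances are illustrative assumptions."""
    reliable = pstdev(readings) <= spread_tol
    valid = reliable and abs(mean(readings) - true_value) <= bias_tol
    return reliable, valid

true_weight = 200
print(assess([15, 250, 95, 140], true_weight))  # (False, False): not reliable
print(assess([150] * 10, true_weight))          # (True, False): reliable, not valid
print(assess([200] * 10, true_weight))          # (True, True): reliable and valid
```

Note that `valid` is defined as `reliable and ...`: this encodes the slide's point that reliability is necessary but not sufficient for validity.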

Relationship between Reliability and Validity

• If data are valid, they must be reliable. If people receive very different scores on a test every time they take it, the test is not likely to predict anything. However, if a test is reliable, that does not mean that it is valid. For example, we can measure strength of grip very reliably, but that does not make it a valid measure of intelligence or even of mechanical ability. Reliability is a necessary, but not sufficient, condition for validity.

77