27
Reliability and Validity


Criteria of Measurement Quality

How do we judge the relative success (or failure) in measuring various concepts?

Reliability – consistency of measurement
Validity – confidence in measures and design

Reliability and Validity

Reliability focuses on measurement. Validity also extends to:

Precision in the design of the study – the ability to isolate causal agents while controlling other factors (internal validity)

The ability to generalize from unique and idiosyncratic settings, procedures, and participants to other populations and conditions (external validity)

Reliability

Consistency of measurement:
Reproducibility over time
Consistency between different coders/observers
Consistency among multiple indicators

Estimates of reliability: statistical coefficients that tell us how consistently we measured something

Measurement Validity

Are we really measuring the concept we defined? Is it a valid way to measure the concept?

Many different approaches to validation, with judgmental as well as empirical aspects

Key to Reliability and Validity

Concept explication: thorough meaning analysis

Conceptual definition: defining what a concept means
Operational definition: spelling out how we are going to measure the concept

Four Aspects of Reliability:

1. Stability
2. Reproducibility
3. Homogeneity
4. Accuracy

1. Stability

Consistency across time: repeating a measure at a later time to examine its consistency
Compare time 1 and time 2

2. Reproducibility

Consistency between observers

Equivalent application of the measuring device: do observers reach the same conclusion?
If we don't get the same results, what are we measuring?
Lack of reliability can compromise validity
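Agreement between observers is usually quantified with a chance-corrected coefficient. As a minimal sketch (the coding data below are invented for illustration), Cohen's kappa for two coders can be computed in a few lines of Python:

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Chance-corrected agreement between two coders' category labels."""
    n = len(coder_a)
    # Observed agreement: share of items both coders labeled identically.
    p_o = sum(x == y for x, y in zip(coder_a, coder_b)) / n
    # Expected chance agreement, from each coder's marginal frequencies.
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical example: two coders classify ten news stories.
a = ["pos", "pos", "neg", "neu", "pos", "neg", "neg", "neu", "pos", "neg"]
b = ["pos", "pos", "neg", "neu", "pos", "neg", "pos", "neu", "pos", "neg"]
print(f"kappa = {cohens_kappa(a, b):.2f}")
```

Kappa of 1 means perfect agreement and 0 means agreement no better than chance; a common rule of thumb treats values above roughly .7 as acceptable for content-analysis coding.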

3. Homogeneity

Consistency between different measures of the same concept
Different items used to tap a given concept show similar results – e.g., open-ended and closed-ended questions

4. Accuracy

Lack of mistakes in measurement

Increased by clear, well-defined procedures that reduce the complications that lead to errors

Observers must have sufficient:
Training
Motivation
Concentration

Increasing Reliability

General:
Train coders/interviewers/lab personnel
More careful concept explication (definitions)
Specification of procedures/rules
Reduce subjectivity (room for interpretation)

Survey measurement:
Increase the number of items in the scale
Weed out bad items from the "item pool"

Content analysis coding:
Improve the definitions of content categories
Eliminate bad coders

Indicators of Reliability

Test-retest: make measurements more than once and see if they yield the same result

Split-half: if you have multiple measures of a concept, split the items into two scales, which should then be correlated

Cronbach's alpha or mean item-total correlation
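These indicators are straightforward to compute by hand. Below is a minimal Python sketch, using a made-up four-item agreement scale rated 1-5 by five respondents, of Cronbach's alpha and a split-half estimate with the Spearman-Brown correction:

```python
from math import sqrt
from statistics import mean, pvariance

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

def cronbach_alpha(items):
    """items: one list of scores per item, aligned by respondent."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]  # each respondent's scale total
    return (k / (k - 1)) * (1 - sum(pvariance(i) for i in items) / pvariance(totals))

def split_half(items):
    """Correlate odd- and even-numbered half-scales, then apply the
    Spearman-Brown correction for full-scale length."""
    odd = [sum(s) for s in zip(*items[0::2])]
    even = [sum(s) for s in zip(*items[1::2])]
    r = pearson_r(odd, even)
    return 2 * r / (1 + r)

# Hypothetical 4-item agreement scale (1-5), five respondents.
items = [
    [4, 5, 3, 5, 2],
    [4, 4, 3, 5, 1],
    [5, 5, 2, 4, 2],
    [4, 5, 3, 4, 1],
]
print(f"Cronbach's alpha: {cronbach_alpha(items):.2f}")
print(f"Split-half (Spearman-Brown): {split_half(items):.2f}")
```

Both coefficients run from 0 to 1 in well-behaved data, and higher values mean more consistent measurement of the underlying concept.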

Reliability and Validity

Reliability is a necessary condition for validity: if a measure is not reliable, it cannot be valid

Reliability is NOT a sufficient condition for validity: if a measure is reliable, it may not necessarily be valid

Example: a bathroom scale with old springs

Not Reliable or Valid
Reliable but not Valid
Reliable and Valid
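The bathroom-scale example can be made concrete with a quick sketch: reliability shows up as low spread across repeated weighings, validity as low bias relative to the true weight. The readings below are invented for illustration:

```python
from statistics import mean, pstdev

TRUE_WEIGHT = 70.0  # kg -- the quantity we are trying to measure

# Invented repeated weighings of the same person on three scales.
scales = {
    "not reliable, not valid": [62.1, 79.4, 66.8, 75.2, 58.9],
    "reliable but not valid": [75.1, 75.0, 75.2, 74.9, 75.0],  # worn springs: consistent but biased
    "reliable and valid": [70.1, 69.9, 70.0, 70.2, 69.8],
}

for name, readings in scales.items():
    spread = pstdev(readings)                 # low spread -> reliable
    bias = abs(mean(readings) - TRUE_WEIGHT)  # low bias   -> valid
    print(f"{name}: spread = {spread:.2f} kg, bias = {bias:.2f} kg")
```

The worn-springs scale reads nearly the same number every time (reliable) but is consistently 5 kg off (not valid), which is exactly why reliability alone is not sufficient for validity.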

Types of Validity

1. Face validity
2. Content validity
3. Pragmatic (criterion) validity
   A. Concurrent validity
   B. Predictive validity
4. Construct validity
   A. Testing of hypotheses
   B. Convergent validity
   C. Discriminant validity

Face Validity

Subjective judgment of experts about "what's there": do the measures make sense?

Compare each item to the conceptual definition: does it represent the concept in question? If not, it should be dropped

Is the measure valid "on its face"?

Content Validity

Subjective judgment of experts about "what is not there"

Start with the conceptual definition of each dimension: is it represented by indicators at the operational level? Are some dimensions over- or underrepresented?

If the current indicators are insufficient, develop and add more indicators

Example – civic participation questions:
Did you vote in the last election?
Do you belong to any civic groups?
Have you ever attended a city council meeting?
What about "protest participation" or "online organizing"?

Pragmatic Validity

Empirical evidence used to test validity: compare the measure to other indicators

1. Concurrent validity
Does a measure predict a simultaneous criterion?
Validating a new measure by comparing it to an existing measure
E.g., does a new intelligence test correlate with an established test?

2. Predictive validity
Does a measure predict a future criterion?
E.g., SAT scores: do they predict college GPA?
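Pragmatic validation boils down to correlating the measure with its criterion. A minimal Python sketch, using invented test scores and GPAs, computes the Pearson correlation that would be reported as evidence of predictive validity:

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between measure and criterion."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x) *
                      sum((b - my) ** 2 for b in y))

# Invented data: admission test scores and later first-year college GPA.
sat = [1100, 1250, 1380, 1010, 1490, 1200]
gpa = [2.8, 3.1, 3.5, 2.5, 3.8, 3.0]
print(f"predictive validity coefficient: r = {pearson_r(sat, gpa):.2f}")
```

For concurrent validity the computation is identical; the only difference is that the criterion (e.g., an established intelligence test) is measured at the same time rather than in the future.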

Construct Validity

Encompasses other elements of validity

Do the measurements:
A. Represent all dimensions of the concept?
B. Distinguish the concept from other similar concepts?

Tied to the meaning analysis of the concept, which specifies the dimensions and indicators to be tested

Assessing construct validity:
A. Testing hypotheses
B. Convergent validity
C. Discriminant validity

A. Testing Hypotheses

When measurements are put into practice: are theoretically derived hypotheses supported by observations?

If not, there is a problem with:
A. Theory
B. Research design (internal validity)
C. Measurement (construct validity?)

In seeking to examine construct validity, examine the theoretical linkages of the concept to others
We must identify antecedents and consequences:
What leads to the concept?
What are the effects of the concept?

B. Convergent Validity

Measuring a concept with different methods: if the different methods yield the same results, then convergent validity is supported

E.g., survey items measuring participation:
Voting
Donating money to candidates
Signing petitions
Writing letters to the editor
Civic group memberships
Volunteer activities

C. Discriminant (Divergent) Validity

Measuring a concept to discriminate that concept from other closely related concepts
E.g., measuring maternalism and paternalism as distinct concepts

Dimensions of Validity for Research Design

Internal:
Validity of the research design
Validity of sampling, measurement, and procedures

External:
Given the research design, how valid are the inferences made from the conclusions?
What are the implications for the real world?

Internal and External Validity in Experimental Design

Internal validity:
Did the experimental treatment make a difference? Or is there an internal design flaw that invalidates the results?

External validity:
Are the results generalizable? To what populations? To what situations?

Without internal validity, there is no external validity