Chapter 5

Measuring Variables and Sampling

Today: Begin Exam 2 material (Chapters 5, 6, 4)
  Scales of measurement
  Psychometric properties
    Reliability
    Validity
Tuesday: Finish Chapter 5, discuss Exam 1

Roadmap

We have:
  A research question
  An idea for a research design
  A hypothesis
But how do we measure what we’re interested in?

Zoom out: where are we?

We study variables and need to measure them accurately
4 scales of measurement:
  Nominal
  Ordinal
  Interval
  Ratio

Scales of Measurement

Symbols classify or categorize into GROUPS or TYPES
Name, Categorize, Classify
Caution: when numbers are used, they only indicate group membership, not amount or order
Examples: gender, marital status, experimental condition

Nominal Scale

A rank-order scale of measurement
Examples: order of finish, letter grade in class, social class (low, med., high)
Allows you to determine which person is higher or lower, but not how much higher or lower
Can’t make direct comparisons of the distance between ranks

Ordinal Scale

Rank ordering PLUS equal intervals of distance between adjacent numbers
Examples: Celsius and Fahrenheit temperature, IQ scores, year
Now you can make comparisons of differences between scores
Equal distances but no absolute zero point

Interval Scale

Rank ordering, equal intervals PLUS an absolute zero point
Absolute zero = absence of the variable
Examples: Kelvin temperature, income, weight, height, response time

Ratio Scale
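To make the interval vs. ratio distinction concrete, here is a small sketch using the temperature examples from the slides. The two temperatures are illustrative values chosen for this example, not data from the course.

```python
# Why ratios are meaningful on a ratio scale (Kelvin) but not on an
# interval scale (Celsius). The temperatures are illustrative values.

def celsius_to_kelvin(celsius):
    """Convert Celsius (interval scale) to Kelvin (ratio scale)."""
    return celsius + 273.15

cool_c, warm_c = 10.0, 20.0

# Differences are meaningful on both scales and are the same size.
diff_c = warm_c - cool_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cool_c)

# The Celsius ratio suggests "twice as hot", which is misleading
# because 0 degrees Celsius is not the absence of temperature.
naive_ratio = warm_c / cool_c

# The Kelvin ratio is interpretable because 0 K is an absolute zero.
true_ratio = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cool_c)

print(f"Difference: {diff_c:.1f} C, {diff_k:.1f} K")
print(f"Celsius ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio: {true_ratio:.3f}")
```

The difference is 10 degrees on either scale, but only the Kelvin ratio (a few percent warmer, not double) supports a "how many times as much" statement.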

Reliability: Consistency/stability of scores
Validity: Are you measuring what you are trying to measure?
Ideally, we want:
  Measures that are reliable
  Inferences that are valid
Reliability is necessary but not sufficient for validity

Psychometric properties

Think about a Target

4 primary types:
  Test-Retest Reliability
  Equivalent-Forms Reliability
  Internal Consistency Reliability
  Interrater Reliability
Indicate level of reliability with a reliability coefficient
  A correlation; should be positive and strong (> .70)

Measuring Reliability

Refers to consistency over time
Same measure administered twice (with a time interval between)

Test-Retest
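A reliability coefficient for test-retest is a correlation between the two administrations. The sketch below computes a Pearson correlation by hand. The scores in `time1` and `time2` are hypothetical, and the function name `pearson_r` is just a label chosen for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 scores")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products of deviations from each mean.
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / math.sqrt(ss_x * ss_y)

# Hypothetical scores for the same five people, measured twice.
time1 = [10, 12, 15, 18, 20]
time2 = [11, 12, 14, 19, 21]

r = pearson_r(time1, time2)
print(f"Test-retest reliability: r = {r:.2f}")
print("Meets the > .70 guideline" if r > 0.70 else "Below the > .70 guideline")
```

The same calculation applies whenever the slides call for a correlation between two sets of scores, such as two forms of a test or two raters.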

Equivalent forms: two versions of the same measure
Administer both to the same group of people
Problem: hard to develop equivalent measures
Example: SAT, GRE

Equivalent-Forms Reliability

Consistency with which test items measure a single construct
More items increase reliability, but we use as few items as possible
  Why?

Internal Consistency
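One way to see how adding items raises reliability is the Spearman-Brown formula. It is not on the slides and is brought in here only as an illustration. It assumes the added items are comparable in quality to the existing ones, and the starting reliability of .60 is a made-up value.

```python
def spearman_brown(reliability, length_factor):
    """Predicted reliability when a test is lengthened by length_factor.

    Assumes the added items are parallel to the existing ones.
    """
    if length_factor <= 0:
        raise ValueError("length_factor must be positive")
    return (length_factor * reliability) / (1 + (length_factor - 1) * reliability)

# Hypothetical starting point: a short scale with reliability .60.
start = 0.60
for factor in (1, 2, 3, 4):
    predicted = spearman_brown(start, factor)
    print(f"{factor}x as many items -> predicted reliability {predicted:.2f}")
```

Each added block of items buys a smaller gain than the one before it, which is one consideration when weighing a longer scale against the time it asks of participants.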

I feel sad
I feel down
I feel depressed
I feel miserable
I feel awful

Example: Internal Consistency

I feel hungry
I feel happy
I have green eyes
Big Bird is scary
I like turtles

http://www.youtube.com/watch?v=CMNry4PE93Y

Example: Internal Consistency

Measured using coefficient alpha (α), a.k.a. Cronbach’s alpha
Should be .7 or higher
  High values suggest the items are measuring the same construct
If your scale measures more than 1 thing, each construct gets its own coefficient α

Internal Consistency
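Coefficient alpha can be computed from the item variances and the variance of the total score: α = (k / (k - 1)) × (1 - sum of item variances / variance of totals), where k is the number of items. The sketch below applies that formula to hypothetical 1-to-5 ratings on five items like the "I feel sad" set. The data and the function name `cronbach_alpha` are assumptions made for this example.

```python
from statistics import variance

def cronbach_alpha(item_scores):
    """Coefficient alpha for rows of respondents and columns of items."""
    k = len(item_scores[0])
    if k < 2 or len(item_scores) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    # Transpose so each tuple holds every respondent's score on one item.
    columns = list(zip(*item_scores))
    item_variances = [variance(column) for column in columns]
    # Variance of each respondent's total score across all items.
    total_variance = variance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)

# Hypothetical ratings: 6 respondents (rows) by 5 items (columns).
responses = [
    [4, 4, 5, 4, 4],
    [2, 2, 1, 2, 2],
    [3, 3, 3, 4, 3],
    [5, 4, 5, 5, 5],
    [1, 2, 1, 1, 2],
    [3, 3, 4, 3, 3],
]

alpha = cronbach_alpha(responses)
print(f"Coefficient alpha = {alpha:.2f}")
```

If a scale has more than one construct, the function would be run once per construct, using only that construct's item columns.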

Interrater reliability: consistency of ratings made by different judges
Examples:
  GRE writing section
  Expressive writing studies
Correlation between ratings should be strong/positive

Interrater Reliability

Percentage of times different observers (raters) agree
Easy to calculate and understand

Interobserver Agreement
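Interobserver agreement is simply the share of observations on which two raters give the same code. A minimal sketch follows. The "on-task"/"off-task" codes and both observers' records are hypothetical.

```python
def percent_agreement(rater_a, rater_b):
    """Percentage of observations on which two raters gave the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty, equal-length lists of codes")
    agreements = sum(a == b for a, b in zip(rater_a, rater_b))
    return 100 * agreements / len(rater_a)

# Hypothetical codes from two observers watching the same 10 intervals.
observer_1 = ["on-task", "off-task", "on-task", "on-task", "off-task",
              "on-task", "on-task", "off-task", "on-task", "on-task"]
observer_2 = ["on-task", "off-task", "on-task", "off-task", "off-task",
              "on-task", "on-task", "on-task", "on-task", "on-task"]

agreement = percent_agreement(observer_1, observer_2)
print(f"Interobserver agreement: {agreement:.0f}%")  # 8 of 10 match: 80%
```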

Accuracy of inferences or interpretations made on the basis of scores
Example: measuring schizophrenia, or love
  We can’t directly observe it!
  It’s the accuracy of the interpretation from the test

Validity

Construct
Operationalization
Important to consider:
  Does your operationalization truly reflect what you’re measuring?
Validation
  Never-ending process

Validity

Content validity: judgment of the degree to which items adequately represent a construct’s domain
  Do items appear to represent the thing you’re trying to measure? (face validity)
  Does your measure exclude any important parts of what you’re trying to measure?
  Does your test measure something besides what you wanted? (i.e., include irrelevant items)

Obtaining Validity: Based on Content

Some constructs are multidimensional and need measures that address all dimensions
Homogeneity: degree to which a set of items measures a single construct
  Item-to-total correlation
  Coefficient alpha

Obtaining Validity: Based on Internal Structure
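An item-to-total correlation checks whether each item moves with the rest of the scale. The sketch below uses the "corrected" version, which correlates each item with the sum of the other items so that an item is not correlated with itself. That choice, the function names, and the ratings (three mood items plus one unrelated item) are all assumptions made for this example.

```python
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences of scores."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("scores must vary for a correlation to be defined")
    return cross / math.sqrt(ss_x * ss_y)

def item_total_correlations(item_scores):
    """Correlate each item with the sum of the remaining items."""
    columns = list(zip(*item_scores))
    results = []
    for index, column in enumerate(columns):
        # Total of all other items, leaving out the item being checked.
        rest_totals = [sum(row) - row[index] for row in item_scores]
        results.append(pearson_r(column, rest_totals))
    return results

# Hypothetical ratings: 6 respondents by 4 items; item 4 is off-topic.
responses = [
    [4, 4, 5, 2],
    [2, 2, 1, 3],
    [3, 3, 3, 5],
    [5, 4, 5, 1],
    [1, 2, 1, 4],
    [3, 3, 4, 3],
]

correlations = item_total_correlations(responses)
for number, r in enumerate(correlations, start=1):
    print(f"Item {number}: item-to-total r = {r:.2f}")
```

In this made-up data the fourth item correlates negatively with the rest of the scale, which is the kind of pattern that flags an item as not belonging to the same construct.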

Criterion-related validity: degree to which scores predict or relate to an already established test
Two types of criterion validity:
  Predictive: using your measure to predict future performance
  Concurrent: using your measure to predict current performance on the same construct, or a related one

Obtaining Validity: Based on Relations to Other Variables

Convergent validity: relationship between your measure and other measures of that same construct
Discriminant validity: evidence that scores from your measure are NOT similar to scores of tests on different constructs

Obtaining Validity: Based on Relations to Other Variables

Reliability and validity info apply to the measure of interest in the reported sample
  Situation-specific, not broad
Standardized tests: norming group
  If you want to use a test with a group not represented in the norming group, be cautious
Report R & V for your own sample, and be wary of articles that make blanket statements about a measure’s R & V

Appropriate Use of Reliability and Validity Info
