
Session 5. Designing SoTL Studies: Validity and Practicality in Quantitative Research Studies

Virginia Commonwealth University

ADLT 673 – Teaching as Scholarship in Medical Education

Kelly Lockeman, PhD

“Everything that can be counted does not necessarily count; everything that counts cannot necessarily be counted.”

- Albert Einstein

“Measure what is measurable, and make measurable what is not so.”

- Galileo Galilei

Quantitative Research

Experimental:
• Single subject
• True experimental
• Quasi-experimental

Non-Experimental:
• Descriptive
• Comparative
• Correlational
• Ex post facto
• Causal-comparative

Do you have a good hypothesis?

• Is it stated in declarative form?
• Is it consistent with known facts, previous research, and theory?
• Does it state the expected relationship between two or more variables?
• Is it testable?
• Is it clear?
• Is it concise?

Variables: Building Blocks for Research

Variable: A concept or characteristic that can take on different values or be divided into categories.


Defining Variables

Conceptual Definitions: use words and concepts to describe the variable.

Operational Definitions: indicate how the concept is measured or manipulated.

• How would you define each of these variables conceptually?
• How would you operationalize them in a research study?

Gender, Race, Social Class, Leadership Style, Achievement, Efficacy, Depression

Part I: Validity

Testing Your Hypothesis

Independent Variable (IV): The predictor or cause. In experimental studies, the researcher controls or manipulates the independent variable (the treatment or intervention).

Dependent Variable (DV): The outcome or effect that the researcher measures (e.g., knowledge, skills or attitudes).

[Diagram: variables A and B labeled Independent; variable C labeled Dependent]

Questions of Validity

Sample Hypothesis: The intervention (IV) improves students' performance on the assessment (DV).

Possible problems related to causality:

• The assessment was not measured well.
• The intervention was not manipulated well.
• Something other than the intervention caused the change in the assessment.

Construct vs. Internal Validity

Construct Validity:
• Am I measuring what I think I am measuring?
• Am I implementing what I think I am implementing?

Internal Validity:
• Did the treatment cause the outcome?

Characteristics of Validity

• It is the inference that is valid or invalid, not the measure.
• An instrument can be valid for one use but not another.
• Validity is a matter of degree.
• Validity involves an overall evaluative judgment based on evidence.

Evaluating Validity

• A study does not have absolute validity or absolutely no validity.
• The level of validity relates to the confidence in the conclusions.
• Construct and internal validity are measured on a continuum.
• Construct validity does not imply internal validity (and vice versa).
• When a hypothesis is supported, it does not necessarily mean that the study has either construct or internal validity.

What is meant by “construct”?

• A concept, model, or schematic idea
• A construct is the global notion of the measure, such as:
▪ Student motivation
▪ Intelligence
▪ Student learning
▪ Student anxiety

The specific method of measuring a construct is called the operational definition.

For any construct, researchers can choose many possible operational definitions.

What is a proxy measure?

Proxy: approximates the real thing

Example:
• What is “productivity”? (Operational definition)
• How do we measure productivity? (Proxy measures)

Common measures of productivity:
▪ Work output
▪ Time and face time at work
▪ Absences

Common data collection methods:
▪ Observation
▪ Record review
▪ Self report

Optimize Construct Validity

Measure learning directly (clear operational definitions; learning is not the same as enjoyment or perceived learning).

Measure student learning through student learning objectives (ensure these are aligned with assessments).

Use established scales to measure student attitudes and personality (don’t reinvent the wheel; see Tests in Print).

Optimize Construct Validity, cont.

Know how to score the measure (establish this before data collection; know what is reasonable; use rubrics and rater training; check inter-rater reliability, IRR).

Determine whether to use graded or ungraded measures (pros and cons of both).

Minimize participant and researcher expectancies.
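
One common way to check inter-rater reliability (IRR) when two raters score the same student work with a rubric is Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is only an illustration: the function name cohens_kappa and the pass/fail ratings are invented for this example, and other IRR statistics may suit a given rubric better.

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who each assigned one category per item."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must score the same non-empty set of items.")
    n = len(rater_a)
    # Proportion of items on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's category frequencies.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category for every item
    return (observed - expected) / (1 - expected)

# Hypothetical rubric scores from two trained raters on ten assignments.
rater_1 = ["pass", "pass", "fail", "pass", "fail", "fail", "pass", "pass", "fail", "pass"]
rater_2 = ["pass", "pass", "fail", "fail", "fail", "fail", "pass", "pass", "pass", "pass"]
print(round(cohens_kappa(rater_1, rater_2), 2))
```

Computing this on a pilot set of assignments before data collection shows whether the rubric and rater training need more work.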

Optimize Construct Validity, cont.

Determine whether to use multiple operational definitions (can use multiple measures).

Use a retention measure to investigate long-term effects (but treat long-term results with caution, since other influences may be at work).

Good Differences between Conditions Improve Construct Validity

• The treatment (intervention) needs to be manipulated well to ensure construct validity.
• The only difference between conditions should be the treatment.
• Other variables that are different between conditions are confounds.
• To determine construct validity, treatments need specific operational definitions.
• Anything that can affect the results and cause a difference between students in treatment and control conditions needs to be documented.

Potential problems in using different sections of a class

• Construct validity of the treatment is questionable in any design that compares one section of a class with another.
• Classes are a social space, and the students and instructors are interdependent.
• Students can ask different questions.
• The class may have a different “tone”.
• Splitting a class into two groups can minimize this concern; if students in a split class can be randomly assigned to a condition, internal validity will increase.
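
Random assignment within a split class can be done with a few lines of code. The sketch below is a minimal illustration: the function name randomly_assign, the roster IDs, and the seed value are all hypothetical.

```python
import random

def randomly_assign(student_ids, seed=None):
    """Shuffle the roster and split it into two conditions of near-equal size."""
    rng = random.Random(seed)      # a fixed seed makes the assignment reproducible
    shuffled = list(student_ids)   # copy so the caller's roster is left untouched
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {"control": shuffled[:half], "treatment": shuffled[half:]}

# Hypothetical roster of 11 students in one class.
roster = [f"S{i:02d}" for i in range(1, 12)]
groups = randomly_assign(roster, seed=673)
for condition, members in groups.items():
    print(condition, members)
```

Recording the seed alongside the roster lets the assignment be documented and reproduced later.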

Different Types of Comparison in Research Design

Between Participants
• How it works: Students in one condition are compared to students in another condition (control vs. treatment; multiple treatments possible)
• Strengths: No carryover effects from multiple treatments; no instrumentation or testing effects from multiple assessments
• Weaknesses: Selection bias without random assignment; many differences if groups are separate (e.g., two separate classes); lower statistical power
• Improve internal validity by: Random assignment; adding covariates

Within Participants: Multiple Treatments
• How it works: All students are in both control and treatment conditions
• Strengths: No selection bias; greater statistical power
• Weaknesses: Instrumentation and testing effects; carryover effects
• Improve internal validity by: Counterbalancing

Within Participants: Multiple Measures
• How it works: Students receive both pre-test (control) and post-test (treatment)
• Strengths: No selection bias; greater statistical power
• Weaknesses: Instrumentation and testing effects; other confounds that occur between assessments
• Improve internal validity by: Increasing the number of assessments; adding a separate no-treatment control condition; using alternative measures for assessment

External Validity

Can the sample used in the study generalize to other groups or populations?

• Generally, it is impossible in classroom studies to get a sample that will generalize to all students.
• The researcher should report demographic characteristics.

How realistic is the situation?

• In a classroom, if the treatment works, external validity is higher.

Part II: Practicality

Common Problems in SoTL Research

• Trying to measure everything
• Small number of students = low statistical power
• Only a single class; limits type of design
• Difficulties in random assignment
• Difficulties in determining whether the treatment is potent enough to have an effect (relates to power)
• Conducting an ethical study in a classroom or training situation
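
A rough calculation shows how a small number of students limits statistical power. The sketch below uses a normal approximation for a two-sided comparison of two group means. The function name approx_power_two_groups, the effect size of 0.5, and the group sizes are assumptions chosen for illustration. An exact calculation would use the noncentral t distribution, so treat these values as approximate, especially for very small groups.

```python
from math import sqrt
from statistics import NormalDist

def approx_power_two_groups(effect_size_d, n_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-group comparison of means."""
    z = NormalDist()
    z_crit = z.inv_cdf(1 - alpha / 2)              # critical value for the two-sided test
    shift = effect_size_d * sqrt(n_per_group / 2)  # expected test statistic if the effect is real
    # Probability that the statistic lands in either rejection region.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)

# Hypothetical medium effect (d = 0.5) at several class sizes per condition.
for n in (10, 20, 40, 64, 100):
    print(n, round(approx_power_two_groups(0.5, n), 2))
```

Running the loop before designing a study makes it clear whether a single class is large enough to detect the effect the treatment is expected to produce.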

Simple Correlation

Don’t Use:
• Want to make statement about causality
• Have low number of students

Use:
• Have single group of students that cannot be divided
• Have only one session in which to collect data

Additional Options:
• Correlate many variables at the same time
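
For the simple correlation design, the core calculation is a Pearson correlation between two variables measured on the same students. The sketch below is illustrative only: the function name pearson_r and the study-hours and exam-score values are invented. A strong correlation here would still not support a statement about causality.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of numbers."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("Need two lists of the same length with at least 2 values.")
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("Correlation is undefined when a variable is constant.")
    return sxy / sqrt(sxx * syy)

# Hypothetical data collected in a single session from one group of students.
study_hours = [2, 4, 5, 7, 9]
exam_scores = [60, 65, 70, 80, 85]
print(round(pearson_r(study_hours, exam_scores), 2))
```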

One-Group, Post Test Only

Don’t Use:
• Want to make statement about causality
• Want to make comparison to another group

Use:
• Desired focus is on describing treatment and not assessment
• Cannot have pre-test or control group
• Want single group of students that cannot be divided

Two-Group, Post-Test Only

Don’t Use:
• Have low number of students
• Groups are very different
• Have different assessments for each condition

Use:
• Concerned about carryover effects
• Concerned about testing and instrumentation effects
• Have multiple groups
• Have only one session to collect data

Additional Options:
• Use random assignment to improve internal validity
• Add post-test to assess long-term change
• Add additional conditions
• Use covariates to improve internal validity and power
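
In a two-group, post-test only design, the difference between conditions is often summarized as a standardized effect size. The sketch below computes Cohen's d with a pooled standard deviation. The function name cohens_d and the post-test scores are hypothetical, and d describes only the size of the difference, not whether the treatment caused it.

```python
from math import sqrt
from statistics import mean, variance

def cohens_d(control, treatment):
    """Standardized mean difference (treatment minus control), pooled SD."""
    n_c, n_t = len(control), len(treatment)
    if n_c < 2 or n_t < 2:
        raise ValueError("Each group needs at least 2 scores.")
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_c - 1) * variance(control) + (n_t - 1) * variance(treatment)) / (n_c + n_t - 2)
    if pooled_var == 0:
        raise ValueError("Effect size is undefined when all scores are identical.")
    return (mean(treatment) - mean(control)) / sqrt(pooled_var)

# Hypothetical post-test scores from two conditions.
control_scores = [70, 72, 68, 75, 71]
treatment_scores = [78, 74, 80, 77, 76]
print(round(cohens_d(control_scores, treatment_scores), 2))
```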

One Group, Pre-test, Post-test

Don’t Use:
• Items other than treatment occur between assessments
• First assessment affects second
• Students likely to change between assessments with no treatment

Use:
• Have low number of students
• Have single group that cannot be divided
• Cannot have control condition

Additional Options:
• Add post-test to assess long-term change
• Use alternative measures to minimize testing and instrumentation effects
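
For a one-group, pre-test/post-test design, the usual summary is each student's gain score and a paired t statistic. The sketch below is a minimal illustration. The function name paired_gain_summary and the scores are invented. A large t value shows that scores changed, but it cannot rule out testing effects or other events that occurred between assessments.

```python
from math import sqrt
from statistics import mean, stdev

def paired_gain_summary(pre, post):
    """Mean gain and paired t statistic for matched pre-test and post-test scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("Need matched pre and post scores for at least 2 students.")
    gains = [after - before for before, after in zip(pre, post)]
    n = len(gains)
    sd_gain = stdev(gains)
    if sd_gain == 0:
        raise ValueError("The t statistic is undefined when every gain is identical.")
    mean_gain = mean(gains)
    # Compare t against a t distribution with n - 1 degrees of freedom.
    return {"n": n, "mean_gain": mean_gain, "sd_gain": sd_gain,
            "t": mean_gain / (sd_gain / sqrt(n)), "df": n - 1}

# Hypothetical scores for six students, listed in the same order in both lists.
pre_scores = [55, 60, 62, 70, 48, 66]
post_scores = [61, 64, 70, 75, 50, 73]
print(paired_gain_summary(pre_scores, post_scores))
```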

Two-Group, Pre-test/Post-test

Don’t Use:
• Have single group of students that cannot be divided

Use:
• Have multiple groups

Additional Options:
• Use random assignment to improve internal validity
• Add post-test to assess long-term change
• Use alternative measures to minimize testing and instrumentation effects
• Add additional conditions
• Use covariates to improve internal validity and power

Within Participants Design

Don’t Use:
• Early treatments affect later treatments
• Early assessments affect later assessments

Use:
• Have single group that cannot be divided

Additional Options:
• Add additional treatments
• Counterbalance conditions to improve internal validity
• Include pre-test to assess students before any treatment
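
Counterbalancing in a within-participants design means that different students receive the treatments in different orders, so order effects are spread evenly across conditions. The sketch below assigns every possible order about equally often. The function name counterbalance, the treatment labels "A" and "B", and the roster are hypothetical.

```python
import random
from itertools import permutations

def counterbalance(student_ids, treatments, seed=None):
    """Give each student a treatment order, cycling through all possible orders."""
    orders = list(permutations(treatments))  # every possible sequence of the treatments
    rng = random.Random(seed)
    shuffled = list(student_ids)
    rng.shuffle(shuffled)                    # randomize which student gets which order
    return {student: orders[i % len(orders)] for i, student in enumerate(shuffled)}

# Hypothetical class of 8 students receiving two treatments.
roster = [f"S{i}" for i in range(1, 9)]
schedule = counterbalance(roster, ("A", "B"), seed=5)
for student in roster:
    print(student, schedule[student])
```

With more than three or four treatments the number of possible orders grows quickly, so a subset of orders (for example, a Latin square) is a common alternative to full counterbalancing.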

Crossover Design

Don’t Use:
• First assessment, by itself, affects second
• Have single group of students that cannot be divided

Use:
• [...] students
• Have multiple groups

Additional Options:
• Include pre-test to assess before treatment
• Add post-test to examine long-term change
• Use random assignment to improve internal validity
• Use alternative measures to minimize testing and instrumentation effects

Interrupted Time-Series Design

Don’t Use:
• Have only one session to collect data
• Early assessments affect later assessments

Use:
• Have single group that cannot be divided
• Want to determine long-term effects

Additional Options:
• Add control condition to improve internal validity
• Add additional treatment condition, with treatment at different time to improve internal validity

More Complex Designs

• Use multiple treatments to investigate interactions (Interactions)
• Use moderators to determine when treatment has effect (concept of aptitude-treatment interaction, ATI)
• Use mediators to investigate how treatment has effect (Mixed Method?)

Remember!

• Each design has advantages and disadvantages.
• Often, there is no clear right way, although some designs will be better than others.
• There is no single ideal study that eliminates all potential problems and all alternative hypotheses.
• One study cannot answer all of your questions!