
To Explain or To Predict?
Explanatory vs. Predictive Modeling in Scientific Research

Galit Shmueli

Georgetown University, October 30, 2009

The path to discovery

Predict

Explain

What are "explaining" and "predicting"?

Statistical modeling in social science research

Purpose: test causal theory ("explain")
Association-based statistical models

Prediction nearly absent

Whether statisticians like it or not, in the social sciences association-based statistical models are used for testing causal theory.

Justification: a strong underlying theoretical model provides the causality.

Lesson #1:

Definition: Explanatory Model

A statistical model used for testing causal theory

(“proper” or not)

Definition: Predictive Model

An empirical model used for predicting new records/scenarios

Multi-page sections with theoretical justifications of each hypothesis

Concept operationalization

4 pages of such tables

Statistical model (here: path analysis)
[Path diagram of constructs: Anger, Economic stability, Trust, Well-being, Poverty]

“Statistical” conclusions

Research conclusions

Lesson #2

In the social sciences, empirical analysis is mainly used for testing causal theory. Empirical prediction is considered un-academic.

Some statisticians share this view: "The two goals in analyzing data... I prefer to describe as 'management' and 'science'. Management seeks profit... Science seeks truth."

Parzen, Statistical Science 2001

Prediction in the Information Systems literature

Predictive goal stated? Predictive power assessed?

“Examples of [predictive] theory in IS do not come readily to hand, suggesting that they are not common” Gregor, MISQ 2006

1072 articles, of which 52 are empirical with predictive claims

Breakdown of the 52 “predictive” articles

To Predict / To Explain
To Explain: test causal theory (utility)
To Predict: relevance, new theory, predictability

Scientific use of empirical models

Why Predict?

Why are statistical explanatory models different from predictive models?

Theory vs. its manifestation


“The goal of finding models that are predictively accurate differs from the goal of finding models that are true.”

Given the research environment in the social sciences, two critically important points are:

1. Explanatory power and predictive accuracy cannot be inferred from one another.

2. The “best” explanatory model is (nearly) never the “best” predictive model, and vice versa.

Point #1

Explanatory Power ≠ Predictive Power
Cannot infer one from the other

What is R²? In-sample vs. out-of-sample evaluation

Performance Evaluation
Explanatory (in-sample): goodness-of-fit, R², p-values, interpretation. Danger: type I, II errors.
Predictive (out-of-sample): prediction accuracy, costs, run time. Danger: over-fitting.
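To make the in-sample vs. out-of-sample contrast concrete, here is a minimal sketch (invented data and names, not from the slides): it fits a regression on training data, then reports explanatory-style in-sample R² alongside predictive out-of-sample RMSE on a holdout set.

```python
# Minimal sketch (invented data): in-sample R^2 rewards fit to the data at hand,
# while out-of-sample RMSE on a holdout set measures predictive accuracy.
import numpy as np

rng = np.random.default_rng(42)
x = rng.normal(size=120)
y = 3.0 * x + rng.normal(scale=1.5, size=120)

x_tr, x_ho, y_tr, y_ho = x[:80], x[80:], y[:80], y[80:]

slope, intercept = np.polyfit(x_tr, y_tr, 1)   # fit on training data only
fit_tr = slope * x_tr + intercept
pred_ho = slope * x_ho + intercept

r2_in = 1 - np.sum((y_tr - fit_tr) ** 2) / np.sum((y_tr - y_tr.mean()) ** 2)
rmse_out = np.sqrt(np.mean((y_ho - pred_ho) ** 2))

print(f"in-sample R^2 (explanatory-style): {r2_in:.3f}")
print(f"out-of-sample RMSE (predictive):   {rmse_out:.3f}")
```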

Suggestion for social scientists:

Report predictive accuracy in addition to explanatory power

[Scatter plot: models plotted by explanatory power vs. predictive power; the best explanatory model and the best predictive model are different models]

Point #2

Predict ≠ Explain


“We should mention that not all data features were found to be useful. For example, we tried to benefit from an extensive set of attributes describing each of the movies in the dataset. Those attributes certainly carry a significant signal and can explain some of the user behavior. However, we concluded that they could not help at all for improving the accuracy of well tuned collaborative filtering models.”

Bell et al., 2008

Predict ≠ Explain

The FDA considers two products bioequivalent if the 90% CI of the relative mean of the generic to brand formulation is within 80%-125%.

“We are planning to… develop predictive models for bioavailability and bioequivalence”

Lester M. Crawford, Acting Commissioner of Food & Drugs, 2005

Let’s dig in

Explanatory goal:

minimize model bias

Predictive goal:

minimize MSE (model bias + sampling variance)

What is Optimized?

Prediction $\mathrm{MSE} = \mathrm{Var}(Y) + \mathrm{bias}^2 + \mathrm{Var}(\hat f(x))$
$\mathrm{Var}(Y)$ = uncontrollable
$\mathrm{bias}^2$ = model misspecification
$\mathrm{Var}(\hat f(x))$ = estimation (sampling) variance
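For completeness, a standard derivation of this decomposition (not spelled out on the slide): writing $Y = f(x) + \varepsilon$ with $E[\varepsilon] = 0$, $\mathrm{Var}(\varepsilon) = \sigma^2$, and $\varepsilon$ independent of $\hat f(x)$,

$$
E\big[(Y - \hat f(x))^2\big]
= \underbrace{\sigma^2}_{\mathrm{Var}(Y)}
+ \underbrace{\big(f(x) - E[\hat f(x)]\big)^2}_{\mathrm{bias}^2}
+ \underbrace{\mathrm{Var}\big(\hat f(x)\big)}_{\text{sampling variance}}.
$$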

Linear Regression Example

True model: $f(x) = \beta_1 x_1 + \beta_2 x_2$
Estimated model: $\hat f(x) = \hat\beta_1 x_1 + \hat\beta_2 x_2$
$\mathrm{MSE}_1 = E\big[(Y - \hat f(x))^2\big] = \sigma^2 + 0 + \mathrm{Var}(\hat\beta_1 x_1 + \hat\beta_2 x_2)$

Underspecified model: $f^*(x) = \gamma_1 x_1$
Estimated model: $\hat f^*(x) = \hat\gamma_1 x_1$
$\mathrm{MSE}_2 = E\big[(Y - \hat f^*(x))^2\big] = \sigma^2 + \beta_2^2\,(x_2 - A x_1)^2 + \mathrm{Var}(\hat\gamma_1 x_1)$, where $A = (x_1' x_1)^{-1} x_1' x_2$

MSE₂ < MSE₁ when:
σ² large
|β₂| small
corr(x₁, x₂) high
limited range of the x's
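A minimal simulation sketch of this result (parameter values are invented for illustration, not from the slides): with large noise, small |β₂|, highly correlated predictors, and a small sample, the underspecified model often achieves lower out-of-sample MSE than the correctly specified one.

```python
# Minimal sketch (assumed parameters): the underspecified model y ~ x1 can
# out-predict the true model y ~ x1 + x2 on new data when noise is large,
# |beta2| is small, and x1, x2 are highly correlated.
import numpy as np

rng = np.random.default_rng(0)
n, sigma, b1, b2, rho = 20, 5.0, 2.0, 0.2, 0.95
mse_full, mse_under = [], []

for _ in range(2000):
    # correlated predictors, small training sample
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)
    y = b1 * x1 + b2 * x2 + sigma * rng.normal(size=n)

    # fresh test data from the same process
    x1t = rng.normal(size=n)
    x2t = rho * x1t + np.sqrt(1 - rho**2) * rng.normal(size=n)
    yt = b1 * x1t + b2 * x2t + sigma * rng.normal(size=n)

    # OLS fits: correctly specified vs. underspecified (x1 only)
    Xf = np.column_stack([x1, x2])
    bf = np.linalg.lstsq(Xf, y, rcond=None)[0]
    g1 = np.linalg.lstsq(x1[:, None], y, rcond=None)[0]

    mse_full.append(np.mean((yt - np.column_stack([x1t, x2t]) @ bf) ** 2))
    mse_under.append(np.mean((yt - x1t * g1[0]) ** 2))

print(f"full model MSE:  {np.mean(mse_full):.3f}")
print(f"under model MSE: {np.mean(mse_under):.3f}")  # often lower here
```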

Two statistical modeling paths

[Photo: China's Diverging Paths, by Clark Smith]

Goal Definition

Design & Collection

Data Preparation

EDA

Variables? Methods?
Evaluation, Validation & Model Selection

Model Use & Reporting

Study design & data collection

Hierarchical data

Observational or experiment?

Primary or secondary data?

Instrument (reliability + validity vs. measurement accuracy)

How much data?

How to sample?


Data preparation

reduced-feature models

missing values

partitioning

outliers

PCA, SVD

trends

Interactive visualization

summary stats, plots

Which variables?

Multicollinearity?

theory, associations, ex-post availability

A, B, A*B?

ensembles, PLS

ridge regression

variance vs. bias (see the ridge sketch after this list)

PCR

Methods / Models

Blackbox / interpretable
Mapping to theory

boosting
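On the variance vs. bias point behind methods like ridge regression and PCR, here is a minimal sketch (invented data; the `ridge` helper is hypothetical, not a library call): the closed-form ridge estimator $(X'X + \lambda I)^{-1} X'y$ shrinks coefficients as $\lambda$ grows, accepting some bias to reduce sampling variance under multicollinearity.

```python
# Minimal sketch (invented data): ridge regression shrinks coefficients under
# multicollinearity, trading a little bias for a large reduction in variance.
import numpy as np

rng = np.random.default_rng(7)
n, rho = 40, 0.98
x1 = rng.normal(size=n)
x2 = rho * x1 + np.sqrt(1 - rho**2) * rng.normal(size=n)  # nearly collinear
X = np.column_stack([x1, x2])
y = 1.0 * x1 + 1.0 * x2 + rng.normal(scale=2.0, size=n)

def ridge(X, y, lam):
    # closed-form ridge estimator: (X'X + lam*I)^-1 X'y
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

for lam in (0.0, 1.0, 10.0):   # lam = 0.0 is plain OLS
    print(f"lambda={lam:5.1f}  coefficients={ridge(X, y, lam)}")
```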

Evaluation, Validation & Model Selection

Training data → Empirical model → Holdout data

Predictive power

Over-fitting analysis

Theoretical model

Empirical model

Data

Validation
Model fit ≠ Explanatory power

Inference

Model Use

Test causal theory (utility)
Predictions: relevance, new theory, predictability

Predictive performance

Over-fitting analysis

Null hypothesis

Naïve/baseline
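A minimal sketch of the naïve/baseline comparison (invented data, not from the slides): just as explanatory conclusions are judged against a null hypothesis, predictive performance is judged against a naïve predictor, here the training-set mean.

```python
# Minimal sketch (invented data): benchmark a model's holdout accuracy against
# a naive baseline that always predicts the training mean.
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 1.5 * x + rng.normal(scale=2.0, size=200)

x_tr, x_ho, y_tr, y_ho = x[:150], x[150:], y[:150], y[150:]

# model: simple OLS slope + intercept; baseline: training mean
slope, intercept = np.polyfit(x_tr, y_tr, 1)
rmse_model = np.sqrt(np.mean((y_ho - (slope * x_ho + intercept)) ** 2))
rmse_naive = np.sqrt(np.mean((y_ho - y_tr.mean()) ** 2))

print(f"model RMSE: {rmse_model:.2f}")
print(f"naive RMSE: {rmse_naive:.2f}")  # the model should beat this bar
```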


How does all this impact research in the (social) sciences?

Three Current Problems

“While the value of scientific prediction… is beyond question… the inexact sciences [do not] have…the use of predictive expertise well in hand.”

Helmer & Rescher, 1959

Distinction blurred

Inappropriate modeling/assessment

Prediction underappreciated

Why?

What can be done?

Statisticians should acknowledge the difference and teach it!

It’s time for Change

To Predict

To Explain
