The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.
Differentiating between statistical significance and substantive importance
Jane E. Miller, PhD


Overview

• Substantive significance defined
• Quick review of statistics
  – What questions can they answer?
  – What questions can’t they answer?
• How to implement a balanced presentation of multivariate results. Both
  – Statistical significance
  – Substantive importance


Objective of most research papers

• Few people who write about multivariate analysis are focused solely on statistical mechanics such as developing new computer algorithms or formal statistical tests.
  – Some statisticians and methodologists will have those interests.
• Most of us are interested in studying some relationship among social science or health concepts.
  – Test a hypothesis, derived from theory or previous empirical studies.
  – Inferential statistics are a necessary tool for hypothesis testing in quantitative research.


What is substantive significance?

• Substantive significance of an association between two variables.
  – “So what?”
  – “How much does it matter?”
• Real-world relevance to topic
• In various disciplines, substantive significance =
  – “clinically…
  – “economically…
  – “educationally…
  – …meaningful” variation.


Example: BMI & mortality

• Body mass index (BMI) shows a statistically significant positive association with mortality.

• But is that gradient substantively significant?
  – Is it worth designing an intervention to decrease BMI as a way of decreasing mortality?


Key criteria for assessing substantive significance

• Is the association causal?
  – Will changing the hypothesized cause lead to change in the purported effect?
  – Will weight loss (reduced BMI) yield lower mortality?
• Is the effect big enough to matter?
  – Is the excess mortality among overweight or obese persons large enough to justify a program?
• Can the hypothesized cause be changed?
  – Is BMI malleable?


Example prose

• “For every hour a boy played a video game, he read just two minutes less than a boy who didn’t play video games. Notably, non-gaming boys didn’t read much at all either, spending only eight minutes a day with a book.”
• From a NYT summary of Cummings and Vandewater, 2007. “Relation of Adolescent Video Game Play to Time Spent in Other Activities,” Archives of Pediatrics and Adolescent Medicine.


Quick review of statistical significance testing


Start with a hypothesis

• In the gaming example, the authors hypothesized that the more time adolescents spent on video games, the less time they spent on homework.
  – So far, description is purely in terms of the concepts under study.
  – No statistical jargon, yet…
• To formalize this for statistical testing
  – Homework time = dependent variable (Y)
  – Gaming time = independent variable (Xi)
  – Ha = gaming time is negatively associated with homework time.
    • In other words, Xi is inversely associated with Y


Contrast it against the null hypothesis

• The assumption of “no difference between groups” is called the null hypothesis (H0).

• In the study on effects of gaming on homework time
  – H0: homework time among gamers = homework time among non-gamers
    OR
  – homework time among gamers - homework time among non-gamers = 0
  – In words, the null hypothesis states that there is no difference in the amount of time spent on homework by gamers versus non-gamers.


What question does inferential statistics answer?

• “How likely would it be to obtain a difference at least as large as that observed between groups in the sample if in fact there is no difference between groups in the population?”
• The p-value answers that question: it is the probability of a difference at least as large as the one observed if the null hypothesis is true. Rejecting H0 only when p<.05 limits the chance of falsely rejecting a true null hypothesis to 5%.
  – Conventional level of “statistical significance”: p<.05
  – Strictly speaking, p<.05 (two-tailed) tells us that for a large sample such as that used in the gaming study (N~1,400), the absolute value of the estimated coefficient on time spent gaming is at least 1.96 times its standard error.
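The link between the 1.96 cutoff and p<.05 can be checked with a short sketch. It assumes the large-sample normal approximation; the standard error below is a hypothetical value chosen for illustration, not one reported by the gaming study, and the function name is illustrative.

```python
import math

def two_sided_p(coefficient: float, standard_error: float) -> float:
    """Two-sided p-value for H0: coefficient = 0 (large-sample normal approximation)."""
    z = abs(coefficient / standard_error)
    # For a standard normal Z, P(|Z| >= z) equals erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))

# Hypothetical inputs: the 2-minute coefficient echoes the reading example,
# but the standard error is made up for illustration.
coefficient = -2.0
standard_error = 0.9
p = two_sided_p(coefficient, standard_error)
print(f"z = {coefficient / standard_error:.2f}, p = {p:.3f}")
```

A coefficient exactly 1.96 standard errors from zero gives p of about .05, which is where the conventional cutoff comes from. The calculation says nothing about whether the coefficient is large enough to matter.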


What questions DOESN’T it answer?

• Whether the relationship is
  – Causal
    • Association ≠ causation
  – In the expected direction
    • The difference could be statistically significant but in the opposite of the hypothesized direction.
  – Big enough to matter in the real-world context
    • Each hour spent gaming reduced reading time by 2 minutes. Is that enough to induce genuine concern from parents or teachers?
  – Malleable
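One way to judge "big enough to matter" is to set the coefficient against a baseline level of the outcome. The sketch below uses the two figures quoted from the NYT summary (2 minutes less reading per hour of gaming, 8 minutes a day among non-gaming boys). Treating the non-gamers' average as the baseline is an assumption made here for illustration, and the function name is hypothetical.

```python
def relative_change(effect_per_unit: float, baseline: float, units: float = 1.0) -> float:
    """Change in the outcome for `units` of the predictor, as a share of a baseline value."""
    return effect_per_unit * units / baseline

effect = -2.0    # minutes of reading per hour of gaming (quoted summary)
baseline = 8.0   # minutes per day read by non-gaming boys (quoted summary)
share = relative_change(effect, baseline)
print(f"{effect:+.0f} minutes per hour of gaming is {share:+.0%} of the {baseline:.0f}-minute baseline")
```

Reporting the absolute difference alongside the relative one lets readers see both that the difference is only a couple of minutes and that the baseline itself is low.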


Conclusion: Don’t stop at “p<.05”!

• “p<.05” answers only part of what we want to know about our research question.
  – It is a necessary but not sufficient part of statistical analysis.
• Also need to consider questions about
  – Substantive significance
    • Direction
    • Size
  – Causality
    • Non-causal associations should not be used to inform policy or program changes.
    • Confounding or spurious associations should be ruled out.
      – Often why we estimate a multivariate model.


Substantive significance overlooked

• Many statistics textbooks show how to assess and present statistical significance.

• Few if any show how to assess and present substantive significance.


Balance presentation of statistical and substantive significance

• How to include both:
  – Inferential statistics for formal hypothesis testing.
  – Interpretation of substantive significance of findings in the context of the specific research question.
    • Critical for policy-makers and others not formally trained in statistics.


Principles for presenting results

• Name the specific variables. Avoid
  – Writing about “my dependent variable” or “the coefficient.”
  – Using acronyms from your database
• Report numbers in tables.
  – Complete set of coefficients, standard errors, goodness-of-fit statistics.
• Interpret numbers in text.
  – Incorporate units and categories for variables into the prose description.


What to report for coefficients

• Direction (AKA “sign”)
  – For categorical independent variables (IV), which category has higher value of the dependent variable (DV)?
  – For continuous IVs, is the trend in the DV up, down, or level?
• Magnitude
  – How big is the difference in the DV across values of the IV?
• Statistical significance


Gender as a predictor of birth weight

• Poor: “Boys weigh significantly more at birth than girls.”
  – Concepts and direction but not magnitude.
  – Statistical significance is ambiguous: Is the term “significant” intended in the statistical sense or to describe a large difference?
• Slightly better: “Gender is associated with a difference of 116.1 grams in birth weight (p<.01).”
  – Concepts, magnitude, and statistical significance but not direction: Was birth weight higher for boys or for girls?
• Best: “At birth, boys weigh on average 116 grams more than girls (p<.01).”
  – Concepts, reference category, direction, magnitude, and statistical significance.
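The elements of the "best" sentence can be assembled mechanically, which is one way to check that none is missing. The helper below is a hypothetical sketch, not part of the book's materials. The exact p-value passed in is an assumed value, since the slide reports only p<.01.

```python
def describe_difference(group: str, reference: str, outcome: str,
                        difference: float, unit: str, p_value: float) -> str:
    """Write one sentence naming both groups, the direction, the size, and the significance level."""
    direction = "more" if difference > 0 else "less"
    # Report the smallest conventional threshold the p-value falls under.
    for threshold in (0.001, 0.01, 0.05):
        if p_value < threshold:
            significance = f"p<{str(threshold).lstrip('0')}"
            break
    else:
        significance = "not statistically significant at p<.05"
    return (f"{group} {outcome} on average {abs(difference):.0f} {unit} "
            f"{direction} than {reference} ({significance}).")

# 116.1 grams comes from the slide; p_value=0.004 is an assumed value consistent with p<.01.
print(describe_difference("At birth, boys", "girls", "weigh", 116.1, "grams", 0.004))
```

Because the sign of the difference sets the wording, the sentence cannot report a magnitude without also stating which group is higher.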


Substantive issues for coefficients on continuous predictors

• A β on a continuous independent variable measures the change in the dependent variable for a 1-unit increase in that independent variable.
  – For some variables, a 1-unit increase is too small to be substantively meaningful.
    • E.g., a $1 increase in annual per capita income in the US today.
  – For other variables, a 1-unit increase is too big to be plausible.
    • E.g., a 1-unit increase in a variable measured as a proportion.
• “The Goldilocks problem”
• Need to look at distribution of values in your data.
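A minimal sketch of choosing a contrast from the data follows. It assumes the independent variable enters the model linearly, so the predicted change for any contrast is β times the size of the contrast. The data, the coefficient, and the use of the interquartile range as the contrast are all hypothetical choices made for illustration.

```python
import statistics

def effect_for_contrast(beta_per_unit: float, contrast: float) -> float:
    """Predicted change in the DV for a `contrast`-unit increase in the IV,
    assuming the IV enters the model linearly."""
    return beta_per_unit * contrast

# Hypothetical values of an IV measured as a proportion (not from the slides).
observed = [0.05, 0.08, 0.10, 0.12, 0.15, 0.18, 0.22, 0.25, 0.30]

# Interquartile range: one data-driven choice of a plausible contrast.
q1, _median, q3 = statistics.quantiles(observed, n=4)
iqr = q3 - q1

beta = 40.0  # hypothetical change in the DV per 1.0 increase in the proportion
print(f"Per 1-unit increase (implausible for a proportion): {effect_for_contrast(beta, 1.0):.1f}")
print(f"Per interquartile-range increase of {iqr:.2f}: {effect_for_contrast(beta, iqr):.1f}")
```

The same function covers the "too small" case: a coefficient per $1 of income can be multiplied by 1,000 to describe a $1,000 difference.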


Solutions to Goldilocks problems

• In those cases, assessing whether a coefficient is “big” or “small” requires a contrast of a different size than one unit.

• Important for comparing coefficients across variables.

• See related podcasts on the Goldilocks problem.
  • Identifying a Goldilocks problem
  • Solutions:
    – Defining variables
    – Specifying models
    – Interpreting results


Substantive significance in the discussion

• Place findings back in the broader perspective of the original research question.

• Do they correspond to your hypothesis in terms of
  – Direction (sign) of the effect?
  – Size?
  – Was the effect size attenuated when potential confounders or mediators were introduced into the model?
• What is the evidence for a causal relationship?
  – If not causal, what explains the association?
  – If causal, what are the implications for policy, programs, etc.?
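Attenuation is often summarized as the percentage by which the coefficient on the key predictor shrinks between two nested models. The sketch below shows that arithmetic with hypothetical coefficients. It is a descriptive summary only, not a formal test of confounding or mediation.

```python
def percent_attenuation(beta_unadjusted: float, beta_adjusted: float) -> float:
    """Percentage by which a coefficient shrinks toward zero after adding covariates."""
    return 100.0 * (beta_unadjusted - beta_adjusted) / beta_unadjusted

# Hypothetical coefficients on the key predictor from two nested models.
beta_bivariate = -5.0      # model with the key predictor only
beta_multivariate = -2.0   # model adding potential confounders or mediators
print(f"Attenuation: {percent_attenuation(beta_bivariate, beta_multivariate):.0f}%")
```

A negative result would mean the coefficient grew in absolute size once the other variables were added, which calls for a different interpretation than attenuation.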


Substantive issues from gaming study

• “But the meaning of the finding [that girls who are gamers spend less time than non-gamers on homework] is not clear, as high-academic achievers often spend less time on homework as well.”
  – Places the finding in broader context by discussing other correlates of homework time.
• “Although only a small % of girls played video games, our findings suggest that gaming may have different social implications for boys than for girls.”
  – Raises the question of selection effects: which girls play video games, and do their other characteristics affect how they spend their time?


Relate findings to previous studies

• Are your findings consistent with the published literature on the subject in terms of statistical significance, sign, and approximate size?
• If not, why not?
  – Different sample (place, time, subgroup)
  – Different data source or study design
  – Different model specification
    • Included potential confounders not previously analyzed.
    • Tested for possible mediating effects of 1+ factors.


Statistical significance in the discussion

• Describe in words, not #s.
  – No detailed standard errors, p-values, or test statistics.
• Focus on the purpose of the statistical tests
  – Did the main variable of interest increase proportion of variance explained by the model?
  – Did some other variable “explain” the association between your key variable and the outcome?


Summary

• Emphasize the substantive issues behind the statistical analyses.
  – Design the specification to match topic and data.
  – Choose plausible, relevant numeric contrasts.
• Aim for a balanced presentation of statistical significance and substantive importance.
  – Use prose to ask and answer research question.
  – Use tables to report comprehensive, detailed statistics.
  – Use charts if needed to convey complex patterns.


Suggested resources

• Chapter 3 (Statistical significance, substantive significance, and causality) in
  – Miller, J.E., 2004. The Chicago Guide to Writing about Numbers
    OR
  – Miller, J.E., 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd edition. (“WAMA II”)
• Miller, J.E. and Y.V. Rodgers, 2008. “Economic Importance and Statistical Significance: Guidelines for Communicating Empirical Research.” Feminist Economics. 14(2):117-149.
• Chapter 10 (Goldilocks problem) in WAMA II.


Suggested online resources

• Podcasts on
  – Comparing two numbers or series
  – Reporting coefficients from OLS and logit models
  – Defining the Goldilocks problem
  – Resolving the Goldilocks problem: Presenting results


Suggested practice exercises

• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Questions #2 and #4 from the problem set for chapter 3
  – Suggested course extensions for chapter 3
    • “Reviewing” exercises #1–4
    • “Writing and revising” exercises #1 and #2


Contact information

Jane E. Miller, [email protected]

Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html