The Chicago Guide to Writing about Multivariate Analysis, 2nd edition.

Resolving the Goldilocks problem: Variables and measurement

Jane E. Miller, PhD

Overview

• Identifying criteria for choosing fitting contrasts for each variable
• Understanding conceptual and contextual aspects of your variables
• Becoming familiar with the distributions of your variables
• Transforming variables
• Describing your variables in the methods section

Criteria for choosing pertinent-sized contrasts for each of your variables

• Theoretical criteria
• Empirical criteria
• Measurement issues

Theoretical criteria for choosing fitting contrasts

• Theoretical criteria relate to how a concept is measured and compared in the literature or in its real-world context.

• Examples:
  – Multiples of the poverty level that correspond with program eligibility criteria for that place and time.
  – Multiples of standard deviations of weight-for-height, based on international child growth standards.
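The poverty-level criterion can be sketched in Python as an income-to-poverty ratio. This is a minimal illustration that assumes a single threshold per household size; the names `HYPOTHETICAL_THRESHOLDS` and `income_to_poverty_ratio` and all dollar amounts are hypothetical placeholders, not official poverty guidelines.

```python
# Hypothetical poverty thresholds by household size (placeholder values,
# NOT official figures for any year or country).
HYPOTHETICAL_THRESHOLDS = {1: 12_000, 2: 16_000, 3: 20_000, 4: 24_000}


def income_to_poverty_ratio(income: float, household_size: int) -> float:
    """Express annual family income as a multiple of the applicable threshold."""
    return income / HYPOTHETICAL_THRESHOLDS[household_size]


# A family of three with $36,000 of income is at 1.8 times the threshold,
# so a one-unit contrast in the ratio equals one full poverty threshold.
print(income_to_poverty_ratio(36_000, 3))  # 1.8
```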

Identifying theoretical criteria for your topic

• Start by reading the literature to identify which criteria pertain to each of your:
  – Independent variables (IVs)
  – Dependent variables (DVs)

• Also identify real-world factors pertaining to your variables, e.g.,
  – Physical properties (e.g., freezing point of water)
  – Clinically meaningful contrasts
  – Socially relevant contrasts

Empirical criteria for choosing fitting contrasts

• Based on the observed distribution of values in your data.

• Examples:
  – Multiples of standard deviations
    • Comparing values at the mean and at ±1 standard deviation from the mean of the IV
  – Interquartile range
    • Comparing values at the 25th and 75th percentiles of the IV
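The two empirical contrasts above can be computed with Python's standard `statistics` module. This sketch assumes a plain list of observed values and uses the sample standard deviation (`statistics.stdev`); `statistics.quantiles` uses its default "exclusive" interpolation method, so percentiles may differ slightly from those reported by other software. The function name and data are hypothetical.

```python
import statistics


def empirical_contrasts(values):
    """Candidate comparison values based on the observed distribution of an IV."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)                  # sample standard deviation
    q1, _, q3 = statistics.quantiles(values, n=4)  # 25th, 50th, 75th percentiles
    return {
        "mean - 1 SD": mean - sd,
        "mean": mean,
        "mean + 1 SD": mean + sd,
        "25th percentile": q1,
        "75th percentile": q3,
    }


x_values = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical observed values of an IV
for label, value in empirical_contrasts(x_values).items():
    print(f"{label}: {value:.2f}")
```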

When to use empirical criteria

• Best used if theoretical criteria are not available for your topic.

• Or possibly to compare with other studies that have used the same criteria.

Measurement issues and choice of contrast size

• For some variables, a one-unit contrast is too small to be measured accurately.

• Examples:
  – It is difficult for most individuals to recall their annual income accurately to the nearest dollar.
  – It is difficult to measure blood pressure to the nearest 1 mm Hg (millimeter of mercury).

• In such situations, use a larger contrast.
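For a predictor that enters a linear model as a single untransformed term, the estimated difference for a larger contrast is the per-unit coefficient multiplied by the size of the contrast (β × Δ); for a logistic model, the corresponding odds ratio is exp(β × Δ). A minimal sketch, with a hypothetical coefficient and hypothetical function names:

```python
import math


def effect_for_contrast(beta_per_unit: float, contrast_size: float) -> float:
    """Estimated change in the outcome for a contrast of `contrast_size` units.

    Assumes the predictor enters a linear model as a single untransformed term
    (no logs, polynomials, or interactions).
    """
    return beta_per_unit * contrast_size


def odds_ratio_for_contrast(beta_per_unit: float, contrast_size: float) -> float:
    """Odds ratio for the same contrast in a logistic model: exp(beta * delta)."""
    return math.exp(beta_per_unit * contrast_size)


# Hypothetical coefficient per 1 mm Hg of blood pressure, reported instead
# for a 10 mm Hg contrast, which is closer to what can be measured reliably.
beta_per_mmhg = 0.15
print(effect_for_contrast(beta_per_mmhg, 10))
```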

Getting to know your variables

Understanding the context

• Become familiar with the range of values that make sense for each of your variables, given:
  – When, where, and to whom the data pertain.

• E.g., pertinent values for family income will differ:
  – Now versus 200 years ago.
  – In the US versus in a developing country today.
  – For a low-income sample of the US versus the entire population.

Understanding conceptual attributes of your measures

• Become familiar with the ranges of values that make sense for each of your variables.
  – A birth weight of 9,999 grams is too high:
    • ≈22 lb., which is about the weight of an average 12-month-old!
  – In this case, problems arose from ignoring the:
    • System of measurement (metric, not British)
    • Units
    • Real-world meaning of the number
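A quick unit conversion is often enough to expose an implausible value. This sketch uses the exact definition of the international pound (453.59237 grams); the function name is hypothetical.

```python
GRAMS_PER_POUND = 453.59237  # exact definition of the international pound


def grams_to_pounds(grams: float) -> float:
    """Convert a weight in grams to pounds."""
    return grams / GRAMS_PER_POUND


# The suspicious birth weight, translated into familiar units
print(round(grams_to_pounds(9_999), 1))  # 22.0
```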

Identifying the valid theoretical range of values

• Different types of measures have different valid ranges:
  – Proportions must fall between 0.0 and 1.0.
  – Temperature in degrees Fahrenheit can be either positive or negative, but temperature in kelvins (K) cannot be negative.
  – The number of children in a family has a narrower theoretical range than annual family income does.

• Identify the pertinent limits for each of your variables.
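Once the theoretical limits are identified, they can be checked mechanically. A minimal sketch, assuming each variable's limits are stored in a dictionary; the variable names and the `out_of_range` helper are hypothetical, and only the limits stated above are encoded.

```python
# Hypothetical variable names with their theoretical limits (inclusive).
# None means the range is unbounded on that side.
VALID_RANGES = {
    "proportion": (0.0, 1.0),
    "temperature_k": (0.0, None),  # kelvins cannot be negative
    "n_children": (0, None),       # a count cannot be negative
}


def out_of_range(name, values):
    """Return the values that fall outside the theoretical range for `name`."""
    low, high = VALID_RANGES[name]
    return [
        v for v in values
        if (low is not None and v < low) or (high is not None and v > high)
    ]


print(out_of_range("proportion", [0.25, 1.3, -0.1, 0.95]))  # [1.3, -0.1]
print(out_of_range("n_children", [0, 2, -1, 11]))          # [-1]
```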

Examining the range of observed values

• Examine the distributions of the variables in your data set to become familiar with their:
  – Units
  – Range
  – Distribution of values
  – Categories
    • Of nominal variables
    • Of ordinal versions of continuous variables
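A minimal sketch of such a first look, using only the Python standard library: quartiles and extremes for continuous variables and category frequencies for nominal or ordinal ones. The function names and example data are hypothetical; `statistics.quantiles` uses its default "exclusive" method.

```python
import statistics
from collections import Counter


def describe_continuous(values):
    """Observed range and quartiles of a continuous variable."""
    q1, median, q3 = statistics.quantiles(values, n=4)
    return {"n": len(values), "min": min(values), "q1": q1,
            "median": median, "q3": q3, "max": max(values)}


def describe_categorical(values):
    """Frequency of each category of a nominal or ordinal variable."""
    return Counter(values)


print(describe_continuous([2, 4, 4, 4, 5, 5, 7, 9]))
print(describe_categorical(["urban", "rural", "urban", "suburban"]))
```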

Identifying variables for which a 1-unit contrast is not suitable

• Based on your theoretical, contextual, and empirical investigations of each variable in your model, identify those for which:
  – A one-unit contrast is too big
    • E.g., those with low values or a very narrow range
  – A one-unit contrast is too small
    • E.g., those with very high values or a wide range
  – A one-unit contrast is just right

• See the podcast on defining the Goldilocks problem.
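Judging whether a one-unit contrast fits is ultimately a substantive decision, but a crude empirical screen can help flag candidates for closer review. The sketch below compares one unit with the interquartile range (IQR); the function name and both cutoffs are hypothetical, arbitrary choices for illustration, not established rules.

```python
import statistics


def one_unit_screen(values, too_big_if_above=1.0, too_small_if_below=0.01):
    """Hypothetical first-pass screen: compare a one-unit contrast with the IQR.

    The two cutoffs are arbitrary illustrative defaults, not established rules;
    theory and context should make the final call.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    if iqr == 0:
        return "inspect manually"   # e.g., most values are identical
    share_of_iqr = 1 / iqr          # fraction of the IQR spanned by one unit
    if share_of_iqr > too_big_if_above:
        return "too big"
    if share_of_iqr < too_small_if_below:
        return "too small"
    return "just right"


print(one_unit_screen([0.1, 0.2, 0.3, 0.4, 0.5]))                 # too big
print(one_unit_screen([20_000, 35_000, 50_000, 80_000, 120_000]))  # too small
print(one_unit_screen([0, 1, 2, 2, 3, 4]))                         # just right
```

The screen says nothing about theoretical or measurement criteria, so treat its output as a prompt for judgment rather than a verdict.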

Defining variables to address the Goldilocks problem

• Many Goldilocks issues can be addressed by modifying one or more variables before specifying the multivariate model:
  – Rescaling
  – Using a different level of aggregation
  – Creating a categorical version of a continuous variable

Transforming your variables

• These transformations can:
  – Make a one-unit increase in Xᵢ align better with the research question.
  – Shift the scale of the βs to be more consistent across the set of variables in the model.

• For any of these approaches, retain the original variable and create a new variable with the transformed version.
  – Never overwrite the original data!
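One way to follow the never-overwrite rule when data are held as a list of records: build new records that carry both the original field and the transformed one. The helper name and field names are hypothetical.

```python
def add_transformed(records, source, new_name, transform):
    """Return copies of `records` with a new field; the original field is untouched."""
    return [{**row, new_name: transform(row[source])} for row in records]


data = [{"income": 42_000}, {"income": 87_500}]
analysis = add_transformed(data, "income", "income_10k", lambda x: x / 10_000)
print(analysis)  # [{'income': 42000, 'income_10k': 4.2}, {'income': 87500, 'income_10k': 8.75}]
print(data)      # original records unchanged
```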

Rescaling your variables

• For some research questions, a simple change of scale can help make a one-unit contrast in the independent variable align better with the research question.

• For example, working with:
  – annual income in $10,000s instead of $1s.
  – ozone concentration in parts per thousand instead of parts per million.

Rescaling and the decimal system

• Rescaling variables involves dividing or multiplying the original variable by some value.
• Often that value is a power of ten, e.g.,
  – Multiply by 1,000
  – Divide by 100

• Although changing the scale of a variable by an order of magnitude or two is mathematically convenient, it is also arbitrary and in many cases unrelated to the topic or data under study.

  – E.g., increments of 10 or 100 days don’t correspond to common usage as well as increments of 7, 30, or 365 days.
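Converting a duration in days to weeks, months, or years is a division by 7, 30, or 365, and the coefficient on a linearly modeled duration scales by the same factor. A minimal sketch, assuming 30-day months and 365-day years as approximations; the names are hypothetical.

```python
# 30-day months and 365-day years are approximations
DAYS_PER_UNIT = {"weeks": 7, "months": 30, "years": 365}


def rescale_days(days: float, unit: str) -> float:
    """Convert a duration in days to weeks, months, or years."""
    return days / DAYS_PER_UNIT[unit]


def rescale_beta_per_day(beta_per_day: float, unit: str) -> float:
    """Coefficient per week/month/year implied by a linear coefficient per day."""
    return beta_per_day * DAYS_PER_UNIT[unit]


print(rescale_days(14, "weeks"))   # 2.0
print(rescale_days(730, "years"))  # 2.0
```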

Changing the level of aggregation

• An alternative way to make the scale of variables fit better with a one-unit increase is to change the level of aggregation.
  – If a one-unit change in the original variable is too small, shift to a lower level of aggregation, e.g.,
    • weekly income instead of annual income;
    • population at the county instead of the state level.
  – If a one-unit change is too large, shift to a higher level of aggregation, e.g.,
    • cost per dozen instead of per piece.
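Both kinds of shift are simple arithmetic when the finer and coarser units are fixed multiples of each other. A minimal sketch, assuming 52 weeks per year as an approximation; the function names are hypothetical. (Shifting population from the state to the county level, by contrast, requires county-level data rather than a conversion factor.)

```python
WEEKS_PER_YEAR = 52    # approximation: a calendar year is slightly longer
PIECES_PER_DOZEN = 12


def weekly_from_annual(annual_income: float) -> float:
    """Lower level of aggregation: income per week instead of per year."""
    return annual_income / WEEKS_PER_YEAR


def cost_per_dozen(cost_per_piece: float) -> float:
    """Higher level of aggregation: cost per dozen instead of per piece."""
    return cost_per_piece * PIECES_PER_DOZEN


print(weekly_from_annual(52_000))  # 1000.0
print(cost_per_dozen(0.25))        # 3.0
```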

Creating a categorical version of a continuous variable

• For topics for which standard ranges or cutoffs are commonly used, consider creating a categorical version of a continuous variable, e.g.,
  – Age ranges that relate to developmental, economic, social, or health phenomena
    • 0–17 years (children), 18–64 years, 65+ years
  – Clinically meaningful ranges of blood pressure
    • <120 mm Hg; 120–139 mm Hg; 140+ mm Hg
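Cutoff-based categories like these can be assigned with Python's `bisect` module, treating each cutoff as the inclusive lower bound of the next category. A minimal sketch; the helper name is hypothetical, and the cutoffs are the ones listed above.

```python
from bisect import bisect_right


def categorize(value, cutoffs, labels):
    """Assign `value` to a category; each cutoff starts the next category."""
    return labels[bisect_right(cutoffs, value)]


AGE_CUTOFFS = [18, 65]
AGE_LABELS = ["0–17 years", "18–64 years", "65+ years"]
BP_CUTOFFS = [120, 140]
BP_LABELS = ["<120 mm Hg", "120–139 mm Hg", "140+ mm Hg"]

print(categorize(17, AGE_CUTOFFS, AGE_LABELS))  # 0–17 years
print(categorize(65, AGE_CUTOFFS, AGE_LABELS))  # 65+ years
print(categorize(128, BP_CUTOFFS, BP_LABELS))   # 120–139 mm Hg
```

With these cutoffs, a non-integer reading such as 139.5 mm Hg falls in the 120–139 category, because only values of 140 or more reach the top category.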

Describing exploratory work in your methods section

• In the methods section, describe the behind-the-scenes work you did to address Goldilocks issues.

• Explain the reasons for those transformations given your research question and data, drawing on your:
  – Exploratory analysis of the distributions of the variables in your data set.
  – Background reading on commonly used cutoffs or calculations for the variables you are using.

Defining newly created variables in your methods section

• If you transformed variables or created categorical versions of continuous variables:
  – Report units and levels of aggregation for all transformed variables, e.g.,
    • Income in $10,000s.
    • Log of income (in $1s).
  – Specify the cutoffs used to define categories, e.g.,
    • Ranges of BMI used to define overweight or obesity.
    • Poverty thresholds (multiples of the Federal Poverty Level) for different years or household compositions.
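A minimal sketch of documenting transformed variables as you create them, so their units and definitions are ready to report. It assumes a natural log (the base is a choice you should state in the methods section) and that all incomes are positive; the function name, variable names, and codebook wording are hypothetical.

```python
import math


def log_income(income_dollars: float) -> float:
    """Natural log of annual income in $1s (income must be positive)."""
    if income_dollars <= 0:
        raise ValueError("the log is undefined for zero or negative income")
    return math.log(income_dollars)


# Hypothetical codebook: one entry per newly created variable, worded so it
# can be adapted for the methods section.
CODEBOOK = {
    "income_10k": "Annual family income in $10,000s (income in $1s divided by 10,000).",
    "log_income": "Natural log of annual family income in $1s.",
    "bp_cat": "Blood pressure category: <120, 120–139, or 140+ mm Hg.",
}

for name, definition in CODEBOOK.items():
    print(f"{name}: {definition}")
```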

Summary

• Transforming one or more of your variables before specifying your multivariate model can:
  – Make a one-unit increase in each independent variable align better with the research question.
  – Shift the scale of the βs to be more consistent across independent variables in the model.

• In your methods section, describe:
  – Exploratory data analysis to become familiar with observed values and distributions of each variable in your model.
  – The calculations and criteria used to create new variables.
  – Citations for those criteria and calculations.

Suggested resources

• Miller, J. E. 2013. The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Chapter 10, on the Goldilocks problem
  – Chapter 4, on types of variables, units, and distribution
  – Chapter 7, on choosing effective examples
  – Chapter 13, on the data and methods section

Suggested online resources

• Podcasts on:
  – Defining the Goldilocks problem
  – Resolving the Goldilocks problem using
    • Model specification
    • Effective ways of presenting results

Suggested practice problems

• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Problem sets for
    • chapter 7, question #6
    • chapter 10, questions #1 through 5

Suggested extensions

• Study guide to The Chicago Guide to Writing about Multivariate Analysis, 2nd Edition.
  – Suggested course extensions for
    • chapter 4
      – “Reviewing” questions #1 and 3.
    • chapter 10
      – “Reviewing” exercises #1 and 2.
      – “Applying statistics and writing” questions #1, 2, 3, and 5.
      – “Revising” questions #1, 2, 3, and 9.
    • chapter 13, “Writing” exercises #3 and 4.
  – “Getting to know your variables” assignment

Contact information

Jane E. Miller

Online materials available at http://press.uchicago.edu/books/miller/multivariate/index.html