Relationships Can Be Deceiving Statistics lecture 5

Goals for Lecture 5

Recognize when correlation can be misleading

Realize reasons why two variables may be related, without cause-and-effect

Understand non-statistical considerations that can help establish a causal relationship

Thought Question 1

For each of these, is the correlation higher or lower than it would have been without the outlier?

Thought Question 2

There is a strong correlation in Lisbon between weekly sales of hot chestnuts and weekly sales of facial tissues. Does this mean that chestnuts cause people to sneeze?

Thought Question 3

Research has found that countries with higher average fat intake tend to have higher breast cancer rates. Does this provide evidence that dietary fat is a contributing cause of breast cancer?

Problems with Correlations

Outliers can inflate or deflate correlations

Groups combined inappropriately may mask relationships

With Outliers

Without Outliers

[Figure: scatterplots of Hours Worked vs. Annual Earnings; x-axis: hours (0 to 60), y-axis: earnings (0 to 90000); correlations shown: r = +.53 and r = +.39]
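
To make the effect of a single outlier concrete, here is a minimal sketch in Python. The data points are invented for illustration and are not the hours and earnings data from the plot; `pearson_r` is a small helper written for this example.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data, chosen only so the arithmetic is easy to follow.
x = [1, 2, 3, 4, 5]
y = [2, 1, 4, 3, 5]

r_base = pearson_r(x, y)
# One extra point far out ALONG the trend inflates the correlation.
r_inflated = pearson_r(x + [10], y + [10])
# One extra point far OFF the trend deflates it (here it even flips the sign).
r_deflated = pearson_r(x + [10], y + [0])

print(f"no outlier:            r = {r_base:+.2f}")      # r = +0.80
print(f"outlier on the trend:  r = {r_inflated:+.2f}")  # r = +0.96
print(f"outlier off the trend: r = {r_deflated:+.2f}")  # r = -0.32
```

A single added point moves r from +0.80 to either +0.96 or -0.32, which is why it helps to look at the scatterplot before trusting the number.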

Combining Groups Can Deceive

Class correlation of weight to height: r = .69

Men’s correlation of weight to height: r = .58

Women’s correlation of weight to height: r = .21
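
The same pattern, a pooled correlation larger than the correlation inside either group, can be reproduced with a small sketch. The numbers are invented and are not the class's height and weight data; the two groups simply share the same within-group pattern but sit at different averages.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Invented data: group B is group A shifted up by 4 on both variables.
group_a_x, group_a_y = [1, 2, 3, 4], [2, 1, 4, 3]
group_b_x, group_b_y = [5, 6, 7, 8], [6, 5, 8, 7]

r_a = pearson_r(group_a_x, group_a_y)
r_b = pearson_r(group_b_x, group_b_y)
# Pooling the groups adds the between-group difference to the relationship.
r_pooled = pearson_r(group_a_x + group_b_x, group_a_y + group_b_y)

print(f"group A: r = {r_a:.2f}")       # r = 0.60
print(f"group B: r = {r_b:.2f}")       # r = 0.60
print(f"pooled:  r = {r_pooled:.2f}")  # r = 0.90
```

The pooled value is higher because the difference between the group averages lines up with the within-group trend. If the group averages were arranged against the trend instead, pooling could mask the relationship.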

Combining groups

More combining groups

Remember!

Correlation does not imply causation.

(Churches and liquor stores, shoe size and reading ability)

Correlation of variables

When considering relationships between measurement variables, there are two kinds:

Explanatory (or independent) variable: The variable that attempts to explain or is purported to cause (at least partially) differences in the…

Response (or dependent or outcome) variable

Often, chronology is a guide to distinguishing them (examples: baldness and heart attacks, poverty and test scores)

Some reasons why two variables could be related

The explanatory variable is the direct cause of the response variable

Example: pollen counts and percent of population suffering allergies, intercourse and babies

Some reasons two variables could be related

The response variable actually is causing a change in the explanatory variable

Example: hotel occupancy and advertising spending, divorce and alcohol abuse

Some reasons two variables could be related

The explanatory variable is a contributing -- but not sole -- cause

Example: birth complications and violence, gun in home and homicide, hours studied and grade, diet and cancer

Some reasons two variables could be related

Confounding variables may exist

Example: happiness and heart disease, traffic deaths and speed limits

Some reasons two variables could be related

Both variables may result from a common cause

Example: SAT score and GPA, hot chocolate and tissues, storks and babies, fire losses and firefighters, WWII fighter opposition and bombing accuracy
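
A common cause can be simulated directly. In the sketch below, which uses invented data, a hypothetical `coldness` variable drives both hot chocolate sales and tissue sales, and neither affects the other. The partial correlation formula is the standard one for removing the linear effect of a third variable.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the run is repeatable
n = 2000

# Assumed model: both sales figures depend on coldness plus independent noise.
coldness = [rng.gauss(0, 1) for _ in range(n)]
cocoa = [c + rng.gauss(0, 1) for c in coldness]
tissues = [c + rng.gauss(0, 1) for c in coldness]

r_ct = pearson_r(cocoa, tissues)
r_cz = pearson_r(cocoa, coldness)
r_tz = pearson_r(tissues, coldness)

# Partial correlation of cocoa and tissues, holding coldness fixed.
partial = (r_ct - r_cz * r_tz) / sqrt((1 - r_cz ** 2) * (1 - r_tz ** 2))

print(f"cocoa vs. tissues:            r = {r_ct:+.2f}")
print(f"same, controlling for cold:   r = {partial:+.2f}")
```

Under these assumptions the raw correlation comes out clearly positive, while the partial correlation is close to zero: once the common cause is accounted for, nothing is left of the relationship.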

Some reasons two variables could be related

Both variables are changing over time

Example: divorces and drug offenses, divorces and suicides
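
Two series that both trend over time will correlate strongly even when they are otherwise unrelated. The sketch below uses invented series (not divorce or crime data) and compares the correlation of the raw levels with the correlation of the period-to-period changes; looking at changes is one common way to take the shared trend out.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def changes(series):
    """Differences between consecutive values of a series."""
    return [later - earlier for earlier, later in zip(series, series[1:])]

rng = random.Random(1)  # fixed seed so the run is repeatable
n = 400                 # number of time periods (arbitrary)

# Assumed model: each series is its own upward trend plus independent noise.
series_a = [1.0 * t + rng.gauss(0, 5) for t in range(n)]
series_b = [2.0 * t + rng.gauss(0, 10) for t in range(n)]

r_levels = pearson_r(series_a, series_b)
r_changes = pearson_r(changes(series_a), changes(series_b))

print(f"levels:  r = {r_levels:+.2f}")
print(f"changes: r = {r_changes:+.2f}")
```

The levels correlate almost perfectly because both rise with time, while the changes show essentially no relationship.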

Some reasons two variables could be related

The association may be nothing more than coincidence

Example: clusters of disease, brain cancer from cell phones

So how can we confirm causation?

The only way to confirm is with a designed experiment. But non-statistical evidence of a possible connection may include:

A reasonable explanation of cause and effect.

A connection that happens under varying conditions.

Potential confounding variables ruled out.
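
Why a designed experiment helps can be illustrated with a small simulation. This is a sketch under invented assumptions: a hypothetical `health` variable affects the outcome, the treatment has no effect at all, and in the observational setting healthier people are more likely to choose the treatment.

```python
import random

def mean(values):
    return sum(values) / len(values)

def difference_in_means(treated_flags, outcomes):
    """Mean outcome of the treated group minus mean outcome of the rest."""
    treated = [y for t, y in zip(treated_flags, outcomes) if t]
    control = [y for t, y in zip(treated_flags, outcomes) if not t]
    return mean(treated) - mean(control)

rng = random.Random(2)  # fixed seed so the run is repeatable
n = 5000

# Assumed model: outcome depends on health only; the treatment does nothing.
health = [rng.gauss(0, 1) for _ in range(n)]
outcome = [h + rng.gauss(0, 1) for h in health]

# Observational study: healthier people tend to select the treatment.
self_selected = [h + rng.gauss(0, 1) > 0 for h in health]
# Designed experiment: a coin flip decides who gets the treatment.
randomized = [rng.random() < 0.5 for _ in range(n)]

obs_diff = difference_in_means(self_selected, outcome)
exp_diff = difference_in_means(randomized, outcome)

print(f"self-selected treatment: difference = {obs_diff:+.2f}")
print(f"randomized treatment:    difference = {exp_diff:+.2f}")
```

With self-selection the treated group looks much better off even though the treatment is inert, because health confounds the comparison. Random assignment balances health across the two groups, so the difference shrinks toward zero.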

Why?

Orchestra conductors tend to live long lives.

Fewer accidents after speed limits were lowered in 1973 due to the oil embargo.

In the week before the 1994 Northridge earthquake, 149 were admitted for heart attacks. In the week after there were 201.

QUESTIONS?