CORRELATION VS. CAUSATION 4.2


CAUTIONS ABOUT CORRELATION AND REGRESSION

Correlation and regression describe ONLY linear relationships.

r and the least-squares line are NOT resistant: extreme values and influential points can have a large effect.

Plot your scatterplot FIRST!
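Both cautions can be checked with a few lines of arithmetic. The sketch below uses made-up numbers (not data from these slides) and a hand-written helper, `pearson_r`, that is just the textbook formula for r. The first data set is a perfect curve that r misses entirely; the second shows one extreme point turning a weak r into a strong one.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Textbook correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up data: y depends perfectly on x, but along a curve (y = x^2).
curve_x = [-3, -2, -1, 0, 1, 2, 3]
curve_y = [x ** 2 for x in curve_x]
r_curved = pearson_r(curve_x, curve_y)  # r is 0 even though y is fully determined by x

# Made-up data: a weak, scattered relationship...
base_x = [1, 2, 3, 4, 5]
base_y = [3, 1, 4, 2, 3]
r_without = pearson_r(base_x, base_y)

# ...and the same data with one extreme point added at (20, 20).
r_with = pearson_r(base_x + [20], base_y + [20])

print(f"curved data:        r = {r_curved:.3f}")
print(f"without the outlier: r = {r_without:.3f}")
print(f"with the outlier:    r = {r_with:.3f}")
```

Neither problem shows up in r alone, but both are obvious on a scatterplot.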


EXTRAPOLATION

Extrapolation: predicting y for x values outside the range of your data.

You SHOULD stay within the domain of your data, or very close to it.

Predictions outside your domain are often VERY inaccurate.

The following is the least-squares regression equation obtained for a young child’s height in feet (y) versus her age in years (x). Assuming the girl lives to be 52, predict her height at this ripe old age.

$\hat{y} = 2.338375 + 0.1495x$

Predicted height at age 52: $\hat{y} = 2.338375 + 0.1495(52) \approx 10.1$, about 10 feet tall.

Age (yrs)   Height (ft)
3           2.795
4           2.925
4.25        2.9575
4.5         3.0225
4.75        3.055
5           3.0875

Obviously, people don’t keep growing at the same rate for their whole lives…

Just remember to be careful when extrapolating!!
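The height example can be reproduced from the six (age, height) pairs in the table. The sketch below refits the least-squares line by hand (the helper name `least_squares` is ours, not from the slides) and compares a prediction inside the observed ages, 3 to 5 years, with the prediction at age 52.

```python
# Age (years) and height (feet) from the table above.
ages = [3, 4, 4.25, 4.5, 4.75, 5]
heights = [2.795, 2.925, 2.9575, 3.0225, 3.055, 3.0875]

def least_squares(xs, ys):
    """Return (intercept, slope) of the least-squares line y = a + b*x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

intercept, slope = least_squares(ages, heights)

# Inside the observed age range: a sensible prediction.
height_at_4 = intercept + slope * 4

# Far outside the observed age range: the line keeps climbing.
height_at_52 = intercept + slope * 52

print(f"fitted line: y-hat = {intercept:.4f} + {slope:.4f} x")
print(f"age 4:  {height_at_4:.2f} ft")
print(f"age 52: {height_at_52:.2f} ft")  # roughly 10 ft, clearly unrealistic
```

The line is a fine summary of ages 3 to 5. Nothing in the data says the same growth rate holds for the next 47 years.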


LURKING VARIABLES

Lurking variable: a variable not in your study that can (and probably does) affect the interpretation of the relationship between your two measured variables.

Often accounts for the “left over” variation, the part that r^2 does not explain.

May be hidden.

Can make a relationship look “strong” or “weak” when it really isn’t.

Dangerous to data and interpretations.

What do I do about them?

Try to identify them BEFORE the study

Talk about their possible effects in your interpretations

Use a residual plot with time as your x to try to identify potential effects


SHOULD I USE AVERAGED DATA?

Averaged data is okay, BUT

It shouldn’t really be used to predict or interpret for INDIVIDUALS

Correlations based on averaged data are often too high when applied to individuals

Averaged data should be used to make predictions about averages
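A small made-up data set shows why correlations from averages run high. In this sketch (all numbers invented for illustration), five individuals share each x value and their y values scatter around it. Averaging within each x value wipes out that individual scatter, so r computed from the averages is far stronger than r computed from the individuals.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Textbook correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Made-up individuals: at each x level, five y values scatter around x.
levels = [1, 2, 3, 4, 5]
scatter = [-3, -1, 0, 1, 3]
individual_x = [x for x in levels for _ in scatter]
individual_y = [x + d for x in levels for d in scatter]

# Averaged data: one mean y per x level.
average_y = [sum(x + d for d in scatter) / len(scatter) for x in levels]

r_individuals = pearson_r(individual_x, individual_y)
r_averages = pearson_r(levels, average_y)

print(f"r from individuals: {r_individuals:.3f}")
print(f"r from averages:    {r_averages:.3f}")
```

The averages fall on a perfect line here, while the individuals are only moderately correlated, so a prediction for one individual is much less certain than the averaged r suggests.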

So What Do I Need to Do?

Pay attention to the WHOLE situation:

Look at the data (contextually)

Look for possible lurking variables

Make sure to DOUBLE CHECK any contextual inferences you make!!


CAUSATION

r and r^2, our regression statistics, describe an association between two variables.

But does this association mean that the explanatory variable CAUSES the response variable?

An obvious example of why it may not comes from a true study that found the association described below:

An actual study performed over a one-year span found a statistically strong relationship between the number of ice cream cones sold in a month and the number of homicides in the same month.

While there appeared to be a statistical association between these two variables, we know that it would be incorrect to say that the number of ice cream cones sold CAUSES the number of homicides.

This is where a LURKING variable comes into play…


CAUSATION (VISUALLY)

Below are three diagrams of different situations, and their underlying variables, that can explain an association.

[Figure: three diagrams. Causation (variables x and y); Common Response (x, y, and a lurking variable z); Confounding (x, y, and z). Dotted lines = association; arrow = causal relationship.]

Causation doesn’t mean there aren’t other factors that affect the result, just that the response is directly caused by the explanatory variable.


CAUSATION (DIRECT)

Let’s look at situations where direct causation occurs.

A study performed on a number of lab rats found an association between the number of ounces of battery acid eaten and the thickness of the stomach lining.

While there is a direct causal link between the ounces of battery acid eaten and the thickness of the rat’s stomach lining, this is a situation that you can’t generalize to all cases, i.e., the effect might not be the same for humans.

A study recorded the heights of young males (between the ages of 12 and 15) and their fathers. The study found an association between the two heights, with an r^2 of about 25%.

There is a direct causal relationship between the height of a father and that of his son, through heredity. Direct causation is possible even with a low r^2; here it just says that the father’s height explains only about 25% of the variation in the son’s height.


COMMON RESPONSE (LURKING VARIABLE)

Let’s look at situations where there is a “lurking” variable.

Earlier we found a fairly good association between the number of TVs that a person owns and their life expectancy.

While this study may show an association between the two, we know that there are many other “lurking” variables that can affect both life expectancy and the number of TVs you own… (DISCUSSION!!)

An actual study performed over a one-year span found a statistically strong relationship between the number of ice cream cones sold in a month and the number of homicides committed in the same month.

While this study provided evidence of an association between ice cream and homicides, both are probably affected by a lurking variable such as heat/temperature, i.e., when people are hot they eat ice cream, and when they are hot they are CRANKY.
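The ice cream explanation can be sketched with invented numbers. In the toy data below (NOT the study’s data; every value and name is hypothetical), temperature drives both cone sales and homicides, and the two have no effect on each other. Overall they are strongly correlated, but among months with the same temperature the correlation vanishes.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Textbook correlation coefficient r for two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical months: (temperature, cones sold, homicides).
# Both outcomes depend on temperature plus their own unrelated wobble.
temperatures = [10, 15, 20, 25, 30]
wobbles = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
months = [
    (t, 100 + 10 * t + 10 * w_cones, t + 2 * w_homicides)
    for t in temperatures
    for (w_cones, w_homicides) in wobbles
]

cones = [m[1] for m in months]
homicides = [m[2] for m in months]
r_overall = pearson_r(cones, homicides)

# Hold the lurking variable fixed: look only at months with the same temperature.
r_within = {}
for t in temperatures:
    same_temp = [m for m in months if m[0] == t]
    r_within[t] = pearson_r([m[1] for m in same_temp], [m[2] for m in same_temp])

print(f"all months together: r = {r_overall:.3f}")
for t, r in r_within.items():
    print(f"temperature {t}: r = {r:.3f}")
```

The strong overall r comes entirely from temperature moving both variables together, which is the common response pattern.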

The MORAL: Association doesn’t mean CAUSATION


CONFOUNDING

Two variables are “confounded” when you can’t tell which variable is affecting the response.

Mr. Arnold and Mr. Reed have been selected to compare the effectiveness of two well-known laundry detergents, PRIDE and NONE. Each takes his detergent home, washes his clothes, and then brings them to a panel of judges. PRIDE is found to be the better detergent because Mr. Reed’s clothes are cleaner.

While we can say that the detergent had an effect on the cleanliness of their clothes, other factors could equally have affected the outcome: washer quality, water quality, laundry cycle, etc. When we can’t tell whether the “lurking” variables or the explanatory variable had the effect, the variables are CONFOUNDED.

The MORAL: Association doesn’t mean CAUSATION


SO WHEN CAN I SAY CAUSE?

Remember, even HIGH correlation doesn’t mean CAUSATION

When can I say it?

If you do an EXPERIMENT and control lurking variables, OR if you can show a strong association consistently over repeated studies, then you can say the magic word!!!

Cause

“Man, I look good!!”


MORAL OF THE STORY

Correlation and association don’t mean CAUSATION

Really examine the CONTEXT of your data

Don’t just look at the numbers

“Numbers tell you everything!! I love numbers!!”

“Don’t listen to that geek! You’d better look at the CONTEXT, not just the numbers.”


HOMEWORK

#38-45