Chapter 7 Scatterplots, Association, and Correlation

Source: homepage.stat.uiowa.edu/~rdecook/stat1010/notes/Chapter7...

Scatterplots & Correlation

  Here, we see a positive relationship between a bear’s age and its neck diameter.

  As a bear gets older, it tends to have a larger neck.

Scatterplots & Correlation

  Statistics is about … variation.

  Recognize, quantify, and try to explain variation.

  Variation in neck measurements can be explained, at least in part, by the age of the bear.

  Older bear → larger neck

Positive Association

[Figure: Cell phone usage per 100 people (vertical axis, 0 to 150) vs. percent of country between 15 and 64 yrs-old (horizontal axis, 50 to 85)]

  Data from 2008.

  These variables have a positive correlation…

  A country with a larger percentage of people between 15 and 64 tends to have more cell phone users.

Negative Association

  Outside temperature and amount of natural gas used.

  These variables have a negative correlation…

  On days with higher temperatures, less natural gas tends to be used.

  Higher temperature → less gas used

[Figure: scatterplot of Gas (vertical axis, 0 to 10) vs. Temp (horizontal axis, -5.0 to 15.0)]

Scatterplots & Correlation

  When the two variables of interest are continuous variables, we can plot their relationship with a scatterplot (or scatter diagram).

  A scatterplot gives you a quick look at the general relationship between the variables.

  Each observation provides one point on the plot.

[Figure: scatterplot of Gas (vertical axis, 0 to 10) vs. Temp (horizontal axis, -5.0 to 15.0)]

  Response variable – plotted on the vertical axis.
    Also called the dependent variable.

  Explanatory variable – plotted on the horizontal axis.
    Used to try to explain variation in the response variable.
    Also called the independent variable.

[Figure: scatterplot of Highway MPG (vertical axis, 20 to 50) vs. Engine HorsePower (horizontal axis, 50 to 300)]

  HWY-mpg is the response variable.

  Engine HPW is the explanatory variable.

  Here, we use Engine HPW to explain the variability in HWY-mpg.

Correlation and Association

  When describing relationships, we use the terms correlation and association interchangeably. If variables are correlated, we say they are associated.

Definition: A correlation exists between two variables when higher values of one variable consistently go with higher values of another variable, or when higher values of one variable consistently go with lower values of another variable.

Positive Association (correlation)

  Above-average values of Age are associated with above-average values of Neck Measure (age-high goes with neck-high).

  Below-average values of Age are associated with below-average values of Neck Measure (age-low goes with neck-low).

Negative Association (correlation)

  Below-average values of Engine HPW are associated with above-average values of HWY-mpg (HPW-low goes with MPG-high).

  Above-average values of Engine HPW are associated with below-average values of HWY-mpg (HPW-high goes with MPG-low).

[Figure: scatterplot of Highway MPG (vertical axis, 20 to 50) vs. Engine HorsePower (horizontal axis, 50 to 300)]
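The "high goes with high" and "high goes with low" patterns can be checked numerically. The sketch below is one way to do it in Python: for each observation, multiply its distance from the mean of x by its distance from the mean of y. The function name `deviation_products` and all data values are invented for illustration; they are not the data behind the plots.

```python
from statistics import mean

def deviation_products(xs, ys):
    """Return (x - mean of x) * (y - mean of y) for each paired observation."""
    x_bar, y_bar = mean(xs), mean(ys)
    return [(x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)]

# Hypothetical values, invented for illustration only.
age_months = [10, 20, 35, 50, 80, 110]
neck_inches = [12, 15, 18, 21, 25, 28]
horsepower = [70, 100, 130, 180, 220, 280]
highway_mpg = [46, 40, 34, 29, 25, 21]

bear_products = deviation_products(age_months, neck_inches)
car_products = deviation_products(horsepower, highway_mpg)

# Same-side pairs (both above or both below average) give positive products;
# opposite-side pairs give negative products.
print("bears:", "positive" if sum(bear_products) > 0 else "negative")
print("cars: ", "positive" if sum(car_products) > 0 else "negative")
```

A positive total means that above-average values mostly pair with above-average values (and below with below). A negative total means that above-average values mostly pair with below-average values.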

Strength of Association

  Correlation applies only to quantitative (continuous) variables.

  Correlation measures the strength of linear association.

  The correlation coefficient (r) gives the direction of the linear association and quantifies the strength of the linear association between two quantitative variables.

  Correlation is a ‘unitless’ quantity (not in ‘feet’ or ‘inches’; it has no units).
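A minimal Python sketch of how r can be computed, assuming the usual sample (Pearson) formula; the slides themselves do not show a formula. The function name `pearson_r` and the bear measurements are invented for illustration. The second half re-expresses the neck measurements in centimeters to show that r does not depend on the units.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations.

    r = sum((x - x_bar) * (y - y_bar))
        / sqrt(sum((x - x_bar)^2) * sum((y - y_bar)^2))
    """
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

# Hypothetical bear measurements, invented for illustration only.
age_months = [10, 20, 35, 50, 80, 110]
neck_inches = [12, 15, 18, 21, 25, 28]

r_inches = pearson_r(age_months, neck_inches)

# Same necks measured in centimeters: the units change, r does not.
neck_cm = [inches * 2.54 for inches in neck_inches]
r_cm = pearson_r(age_months, neck_cm)

print(round(r_inches, 4), round(r_cm, 4))
```

The sign of the result gives the direction of the linear association, and its size gives the strength.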

Strength of Association

  Correlation Coefficient (r) will be between -1 and 1.

[Figure: number line for r from -1.0 to 1.0. Near -1.0: strong negative linear relationship. Near 0.0: very weak or no linear relationship. Near 1.0: strong positive linear relationship.]

r = ?

[Figure: eight example scatterplots, labeled as follows]

  r = 1: super strong
  r = 0.7: stronger (more clear)
  r = 0.3: weak (fuzzy)
  r = 0.0: none
  r = -0.3: weak (fuzzy)
  r = -0.7: stronger (more clear)
  r = -1: super strong
  Non-linear pattern: r not meaningful
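The extreme cases can be reproduced with a few made-up points. The sketch below, assuming the usual sample (Pearson) formula for r, shows that points lying exactly on a rising or falling line give r = 1 or r = -1, while a perfect but non-linear (U-shaped) pattern gives r = 0 even though the relationship is very clear. The helper name `pearson_r` and the data are invented for illustration.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-3, -2, -1, 0, 1, 2, 3]

r_up = pearson_r(xs, [2 * x + 1 for x in xs])     # points exactly on a rising line
r_down = pearson_r(xs, [-2 * x + 1 for x in xs])  # points exactly on a falling line
r_curve = pearson_r(xs, [x * x for x in xs])      # points exactly on a U-shaped curve

print(r_up, r_down, r_curve)
```

This is why r is described as not meaningful for the non-linear panel: a value near 0 only says there is no linear relationship, not that there is no relationship.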

Things to look for in a scatterplot

  1. Direction of association
     Positive or negative.

  2. Form of association
     Linear, curved, clustered, scattered (no relationship).

  3. Strength of association
     How closely the points follow a clear form.

  4. Outliers
     A point that lies outside of the general pattern.

[Figure: Nicotine Content vs. Tar Content. Nicotine (mg) on the vertical axis (0.0 to 2.0), Tar (mg) on the horizontal axis (0 to 25)]

Example

Direction _____________

Form _____________

Strength ___________

Outliers? ___________

Association vs. Causation

 The existence of an association does not equate to causation.

  To imply that a change in one variable causes a change in another is a very strong statement – use ‘association’ for our relationships in this class.

Correlation Cautions

  Don’t confuse correlation with causation.
    There is a strong positive correlation between shoe size and intelligence.

 Beware of lurking variables.

Beware of lurking variables

  Lurking variable – a hidden variable that stands behind a relationship and affects the other two variables.

[Figure: fire damage (dollars $) on the vertical axis vs. number of firefighters at scene on the horizontal axis, with the lurking variable marked as "Size of fire?"]

Association vs. Causation

  Increasing the size of the fire will cause greater damage.

  Increasing the number of firefighters at the fire will not cause greater damage, but we do tend to see more firefighters at larger fires.

  Correlation does NOT imply causality.
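The firefighter example can be imitated with a small simulation. In the Python sketch below, every number is invented: fire size is drawn at random, and both the number of firefighters and the damage are generated from fire size plus independent noise. Neither variable is computed from the other, yet they come out strongly correlated. The helper `pearson_r` assumes the usual sample (Pearson) formula.

```python
import random
from math import sqrt

def pearson_r(xs, ys):
    """Sample correlation coefficient r for paired observations."""
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Lurking variable: size of each fire (hypothetical units).
fire_size = [rng.uniform(1, 10) for _ in range(200)]

# Independent noise for each variable.
ff_noise = [rng.gauss(0, 1) for _ in fire_size]
dmg_noise = [rng.gauss(0, 500) for _ in fire_size]

# Both variables depend on fire size only, never on each other.
firefighters = [2 * s + e for s, e in zip(fire_size, ff_noise)]
damage = [1000 * s + e for s, e in zip(fire_size, dmg_noise)]

r_observed = pearson_r(firefighters, damage)  # strong, driven by fire size
r_leftover = pearson_r(ff_noise, dmg_noise)   # what remains once size is removed

print(round(r_observed, 2), round(r_leftover, 2))
```

In this made-up setting, the strong correlation between firefighters and damage comes entirely from fire size. Once the part explained by fire size is taken out, little association is left.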