Chapter 14: Describing Relationships (Scatterplots and ...azimmer/Lect14_ch14.pdf · Chapter 14: Describing Relationships (Scatterplots and Correlation) Aaron Zimmerman STAT 220 -

Reminders/Comments

• Thanks for the quick feedback

• I’ll try to put HW up on Saturday and I’ll email you

• Final project will be assigned in the last week of class

• You’ll have that week to do it

• Participation is awarded for attendance, vocal and group participation in lecture and quiz section, and discussion board

• HW due and quiz to take tomorrow

Warm Up

In the mid-19th century, Francis Galton wanted to know whether fathers pass some physical characteristics on to their sons. To investigate this question, he traveled around Britain and recorded the heights of 1000 firstborn sons and the heights of their fathers. He found that tall fathers tend to have tall sons, and short fathers tend to have short sons.

• What are the explanatory and response variables?

• Does this study give evidence of a causal relationship? Why or why not?

Chapter 14: Describing Relationships

(Scatterplots and Correlation)

Aaron Zimmerman

STAT 220 - Summer 2014

Department of Statistics

University of Washington - Seattle

Motivating Example

• Francis Galton’s study is an example of a multivariate study

  ◦ Most studies examine data on more than one variable

• Our strategy for multivariate problems is the same

  ◦ 1) First plot the data, then add numerical summaries

  ◦ 2) Look for overall patterns and deviations from those patterns

  ◦ 3) When the overall pattern is quite regular, there is sometimes a way to describe it concisely

Motivating Example

• The most common graphical way to describe the relation between two quantitative variables is a scatterplot

• The explanatory variable (if there is one) goes on the x-axis

• The response variable (if there is one) goes on the y-axis

Motivating Example

• As with univariate graphs, the first step is to look for the overall pattern and for striking deviations from that pattern

  ◦ You can describe the overall pattern of a scatterplot by the direction, form, and strength of the relationship

  ◦ An important kind of deviation is an outlier, an individual value that falls outside the overall pattern of the relationship

Motivating Example

• The direction of a scatterplot is described by the association between the variables

  ◦ Two variables are positively associated when above-average values of one tend to accompany above-average values of the other, and below-average values also tend to occur together. The scatterplot slopes upward as we move from left to right

• Heights of fathers are positively associated with heights of sons
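
One rough way to check the definition of positive association numerically is to count how often both values in a pair fall on the same side of their averages. This is only a minimal sketch: the heights below are made-up illustrative values (not Galton's data), and `same_side_fraction` is a hypothetical helper name, not something from the course.

```python
from statistics import mean

# Hypothetical heights in inches (illustrative values, NOT Galton's data)
fathers = [64, 66, 67, 68, 69, 70, 71, 73]
sons = [66, 65, 68, 69, 68, 71, 72, 72]


def same_side_fraction(x, y):
    """Fraction of pairs where x and y are both above or both below their means."""
    mx, my = mean(x), mean(y)
    # A positive product means the two deviations have the same sign
    same = sum(1 for a, b in zip(x, y) if (a - mx) * (b - my) > 0)
    return same / len(x)


frac = same_side_fraction(fathers, sons)
print(f"Pairs on the same side of both averages: {frac:.0%}")
```

A fraction well above one half is consistent with positive association, and a fraction well below one half with negative association. It is a crude check, not a replacement for plotting the data.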

Motivating Example

• The direction of a scatterplot is described by the association between the variables

  ◦ Two variables are negatively associated when above-average values of one tend to accompany below-average values of the other, and vice versa. The scatterplot slopes downward as we move from left to right

Key idea #1: If there is strong association between two variables, knowing one helps a lot in predicting the other. But when there is a weak association, information about one variable does not help much in guessing the other.

Motivating Example

• The form of a scatterplot can be linear or curved

  ◦ The relationship between heights of fathers and heights of sons is roughly linear

• The strength of a relationship is determined by how closely the points follow a clear form

  ◦ The relationship between heights of fathers and heights of sons is moderately strong

Scatterplot Practice

• Was the average midterm score around 25, 50, or 75?

• Was the SD of the midterm scores around 5, 10, or 20?

• Was the SD of the final scores around 5, 10, or 20?

• Which exam was harder?

• Was there more spread in the midterm or final scores?

• T/F: There was a strong positive association between midterm and final scores

Motivating Example

• There are a few outliers in Galton’s scatterplot

  ◦ Some short fathers had very tall sons

  ◦ Some relatively tall fathers had very short sons

• For correlation and regression (the next two topics), it will be important to note whether there are many outliers in the same direction

• It doesn’t look like there are any more outliers in one direction than another

Correlation

• When the relationship between two variables is roughly linear, the direction and strength of the relationship can be described numerically by the correlation r

• Positive r indicates positive association between the variables, while negative r indicates negative association

• The correlation r always falls between -1 and 1

  ◦ Values of r near zero indicate a very weak straight-line relationship

  ◦ The strength of the relationship increases as r moves away from 0 towards either -1 or 1

  ◦ Values of r close to -1 or 1 indicate that the points are almost in a straight line
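
The slides do not show how r is computed, so the following is a sketch using the standard definition of the sample correlation: standardize each variable, multiply the standardized values pair by pair, and divide the sum by n - 1. The function name `correlation` and the height data are assumptions made for illustration (the heights are not Galton's data).

```python
from statistics import mean, stdev


def correlation(x, y):
    """Sample correlation r: sum of products of standardized values over (n - 1)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)  # sample standard deviations (divide by n - 1)
    n = len(x)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (n - 1)


# Hypothetical heights in inches (illustrative values, NOT Galton's data)
fathers = [64, 66, 67, 68, 69, 70, 71, 73]
sons = [66, 65, 68, 69, 68, 71, 72, 72]

r = correlation(fathers, sons)
print(f"r = {r:.2f}")

# Points lying exactly on a line give r = 1 (upward) or r = -1 (downward),
# up to floating-point rounding
r_up = correlation([1, 2, 3, 4], [2, 4, 6, 8])
r_down = correlation([1, 2, 3, 4], [8, 6, 4, 2])
```

Python 3.10 and later also provide `statistics.correlation`, which computes the same quantity; the version above is written out only to make the steps visible.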

Correlation

• The correlation between heights of fathers and heights of sons must be positive because the variables are positively associated

• The exact value is r = 0.52

• Again, this indicates a moderately strong, positive linear relationship between the variables

Four other reminders about correlation

• The correlation between two variables does not change when we change the units of measurement

• Correlation ignores the distinction between explanatory and response variables

• The correlation is strongly affected by a few outliers

• Correlation only measures linear relationships

  ◦ Association can be used to describe any relationship

  ◦ Correlation only measures the direction and strength of linear relationships
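
The four reminders above can be checked numerically. This is a minimal sketch on made-up data: the `correlation` helper uses the standard sample-correlation formula, and every number below is hypothetical, chosen only to make each property visible.

```python
from statistics import mean, stdev


def correlation(x, y):
    """Sample correlation r: sum of products of standardized values over (n - 1)."""
    mx, my = mean(x), mean(y)
    sx, sy = stdev(x), stdev(y)
    return sum(((a - mx) / sx) * ((b - my) / sy) for a, b in zip(x, y)) / (len(x) - 1)


# Hypothetical, nearly linear data
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2.1, 2.9, 4.2, 4.8, 6.1, 6.8, 8.2, 8.9]
r = correlation(x, y)

# 1) Changing units (e.g. inches to centimeters) leaves r unchanged
r_units = correlation([2.54 * a for a in x], y)

# 2) Swapping explanatory and response variables leaves r unchanged
r_swapped = correlation(y, x)

# 3) A single outlier far from the pattern can change r a lot
r_outlier = correlation(x + [9], y + [0.0])

# 4) A perfect but curved (symmetric) relationship can still give r = 0
x_sym = [-3, -2, -1, 0, 1, 2, 3]
y_curve = [a * a for a in x_sym]
r_curve = correlation(x_sym, y_curve)

print(f"original r      = {r:.3f}")
print(f"after rescaling = {r_units:.3f}")
print(f"after swapping  = {r_swapped:.3f}")
print(f"with an outlier = {r_outlier:.3f}")
print(f"curved pattern  = {r_curve:.3f}")
```

The last case is why a scatterplot should come first: the curved data are perfectly predictable from x, yet r reports no linear relationship.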

United Nations Data Example

• There is a great deal of interest in identifying variables that are associated with the average life expectancy in countries around the world

• The following three examples will examine three potential explanatory variables

  ◦ The percent of the population who attends primary school

  ◦ The log per capita GDP of the country

  ◦ The annual population growth in the country

• In each example, the response variable is the average life expectancy in the country

Example #1

• Can we use correlation to describe the relationship?

• How would you describe the correlation’s direction?

• How would you describe the correlation’s strength?

• What is your estimate of the value of the correlation?

Example #1

• The association between primary school attendance and life expectancy is positive, but the form is curved

• So, it is not appropriate to use correlation as a numerical summary of this relationship

Example #2

• Can we use correlation to describe the relationship?

• How would you describe the correlation’s direction?

• How would you describe the correlation’s strength?

• What is your estimate of the value of the correlation?

Example #2

• We can use correlation; it looks more or less linear

• I’d say the correlation is strong and positive

• The correlation is around 0.8

Example #3

• Can we use correlation to describe the relationship?

• How would you describe the correlation’s direction?

• How would you describe the correlation’s strength?

• What is your estimate of the value of the correlation?

Example #3

• We can use correlation; it looks more or less linear

• I’d say the correlation is moderate and negative

• The correlation is around -0.7

Describe the correlation

For social scientists

• Sociologists at the University of Virginia (Oishi et al. 2011) investigated the association between the level of income inequality in the U.S. each year since 1972 and the level of “happiness” in that year. They calculated the Gini coefficient (a measure of income inequality) in each year, and compared these values with average survey responses in each year to the question, “how happy are you?”

For other scientists

• River ecologists at the American Museum of Natural History investigated the association between temperature and oxygen levels in river water. They measured the water temperature (in degrees Celsius) at 500 locations, and compared these values to the number of mg/L of dissolved oxygen in the water at each location.

Describe the correlation

For social scientists

• Weak negative correlation

• r = −0.32

For other scientists

• Strong negative correlation

• r = −0.86

Homework

• Read Chapter 14

• Do problems 14.6 (explain why), 14.7, 14.13, 14.17, 14.19, 14.22, 14.27 (also report the correlation between the variables)
