
Introduction to Statistics Introduction to Statistics Correlation Chapter 15 April 23-28, 2009 Classes #27-28


Page 1: Introduction to Statistics Introduction to Statistics Correlation Chapter 15 April 23-28, 2009 Classes #27-28

Introduction to Statistics

Correlation
Chapter 15

April 23-28, 2009
Classes #27-28

Page 2:

Correlation

A statistical technique that is used to measure and describe a relationship between two variables
– For example:
    GPA and TDs scored
    Statistics exam scores and amount of time spent studying

Page 3:

Notation

A correlation requires two scores for each individual
– One score from each of the two variables
– They are normally identified as X and Y

Page 4:

Three characteristics of X and Y are being measured…

The direction of the relationship
– Positive or negative
The form of the relationship
– Usually linear form
The strength or consistency of the relationship
– A perfect correlation is 1.00 in magnitude (+1.00 or -1.00); no consistency would be 0.00
– Therefore, the numerical value of a correlation measures the degree of relationship between two variables on a scale from 0.00 to 1.00, and its sign gives the direction.

Page 5:

Assumptions

There are 3 main assumptions…

– 1. The dependent and independent variables are normally distributed. We can check this by looking at the histograms for the two variables
– 2. The relationship between X and Y is linear. We can check this by looking at the scattergram
– 3. The relationship is homoscedastic. We can check homoscedasticity by looking at the scattergram and observing that the data points form a “roughly symmetrical, cigar-shaped pattern” about the regression line.

If the above 3 assumptions have been met, then we can use correlation and test r for significance

Page 6:

Pearson r

The most commonly used correlation
Measures the degree of straight-line relationship
Computation:

$$r = \frac{SP}{\sqrt{(SS_X)(SS_Y)}}$$

Page 7:

Example 1

         X     Y      X²       Y²      XY
        30   160     900   25,600   4,800
        38   180   1,444   32,400   6,840
        52   180   2,704   32,400   9,360
        90   210   8,100   44,100  18,900
        95   240   9,025   57,600  22,800
Σ      305   970  22,173  192,100  62,700
      (ΣX)  (ΣY)   (ΣX²)    (ΣY²)   (ΣXY)
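The column totals in the Example 1 table can be reproduced with a few lines of Python. This is only a minimal sketch; the variable names (`xs`, `ys`, `rows`, `sums`) are illustrative and do not come from the original slides.

```python
# Example 1 scores from the slide (X and Y for five individuals).
xs = [30, 38, 52, 90, 95]
ys = [160, 180, 180, 210, 240]

# Build the computational table row by row: X, Y, X^2, Y^2, XY.
rows = [(x, y, x * x, y * y, x * y) for x, y in zip(xs, ys)]

# Column totals: sum X, sum Y, sum X^2, sum Y^2, sum XY.
sums = tuple(sum(col) for col in zip(*rows))

for row in rows:
    print(row)
print("sums:", sums)
```

The totals feed directly into the SS_X, SS_Y, and SP calculations that follow.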

Page 8:

Example 1

$$SS_X = \sum X^2 - \frac{(\sum X)^2}{n} = 22{,}173 - \frac{305^2}{5} = 22{,}173 - \frac{93{,}025}{5} = 22{,}173 - 18{,}605 = 3{,}568$$

$$SS_Y = \sum Y^2 - \frac{(\sum Y)^2}{n} = 192{,}100 - \frac{970^2}{5} = 192{,}100 - \frac{940{,}900}{5} = 192{,}100 - 188{,}180 = 3{,}920$$

Page 9:

Example 1

$$SP = \sum XY - \frac{(\sum X)(\sum Y)}{n} = 62{,}700 - \frac{(305)(970)}{5} = 62{,}700 - \frac{295{,}850}{5} = 62{,}700 - 59{,}170 = 3{,}530$$

Page 10:

Example 1

$$r = \frac{SP}{\sqrt{(SS_X)(SS_Y)}} = \frac{3{,}530}{\sqrt{(3{,}568)(3{,}920)}} = \frac{3{,}530}{\sqrt{13{,}986{,}560}} = \frac{3{,}530}{3{,}739.861} = .944$$
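As a check on the arithmetic in Example 1, the same chain (SS_X, SS_Y, SP, then r) can be sketched in Python. The function name `pearson_r` is a hypothetical helper introduced here for illustration; it assumes the computational formulas shown on the preceding pages.

```python
import math

def pearson_r(xs, ys):
    """Return (r, SS_X, SS_Y, SP) using the computational formulas."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    ss_x = sum(x * x for x in xs) - sum_x ** 2 / n
    ss_y = sum(y * y for y in ys) - sum_y ** 2 / n
    sp = sum(x * y for x, y in zip(xs, ys)) - sum_x * sum_y / n
    return sp / math.sqrt(ss_x * ss_y), ss_x, ss_y, sp

r, ss_x, ss_y, sp = pearson_r([30, 38, 52, 90, 95], [160, 180, 180, 210, 240])
print(ss_x, ss_y, sp)   # 3568.0 3920.0 3530.0
print(round(r, 3))      # 0.944
```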

Page 11:

Coefficient of Determination (r²)

The value r² is called the coefficient of determination because it measures the proportion of variability in one variable that can be determined from the relationship with the other variable
– For example:
    A correlation of r = .42 (or r = -.42) means that r² = .176, so about 18% of the variability in the Y scores can be predicted from the relationship with the X scores
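A short Python sketch can make the point that the sign of r plays no role in r². The loop below is illustrative only and simply squares the two example correlations.

```python
# Squaring removes the sign, so r = .42 and r = -.42 give the same r^2.
for r in (0.42, -0.42):
    r_squared = r ** 2
    print(f"r = {r:+.2f}  r^2 = {r_squared:.4f}  ({r_squared:.1%} of variability)")
```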

Page 12:

Coefficient of Determination (r²) and Interpret:

$$r^2 = (.944)^2 = .891$$

The coefficient of determination is r² = .891. Education, by itself, explains 89.1% of the variation in voter turnout.

Page 13:

Example 2

A researcher predicts that there is a high correlation between years of education and voter turnout
– She chooses Alamosa, Boston, Chicago, Detroit, and NYC to test her theory

Page 14:

Example 2

The scores on each variable are displayed in table format:
– Y = % Turnout
– X = Years of Education

City        X     Y
Alamosa  11.9    55
Boston   12.1    60
Chicago  12.7    65
Detroit  12.8    68
NYC      13.0    70

Page 15:

Scatterplot

The relationship between X and Y is linear.

Page 16:

Make a Computational Table

   X     Y     X²     Y²     XY
11.9    55
12.1    60
12.7    65
12.8    68
13.0    70
∑X =    ∑Y =    ∑X² =    ∑Y² =    ∑XY =
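One way to fill in the computational table for Example 2 is sketched below in Python. The variable names are illustrative and the layout is only an assumption about how the completed table might look; the scores are the ones given for the five cities.

```python
cities = ["Alamosa", "Boston", "Chicago", "Detroit", "NYC"]
xs = [11.9, 12.1, 12.7, 12.8, 13.0]   # X = years of education
ys = [55, 60, 65, 68, 70]             # Y = % turnout

# One row per city: X, Y, X^2, Y^2, XY.
print(f"{'City':<8}{'X':>6}{'Y':>5}{'X^2':>9}{'Y^2':>7}{'XY':>8}")
for city, x, y in zip(cities, xs, ys):
    print(f"{city:<8}{x:>6.1f}{y:>5d}{x * x:>9.2f}{y * y:>7d}{x * y:>8.1f}")

# Column totals needed for Pearson's r.
sum_x = sum(xs)
sum_y = sum(ys)
sum_x2 = sum(x * x for x in xs)
sum_y2 = sum(y * y for y in ys)
sum_xy = sum(x * y for x, y in zip(xs, ys))
print(f"{'Sum':<8}{sum_x:>6.1f}{sum_y:>5d}{sum_x2:>9.2f}{sum_y2:>7d}{sum_xy:>8.1f}")
```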

Page 17:

Find Pearson’s r and Interpret:

$$r = \frac{N\sum XY - (\sum X)(\sum Y)}{\sqrt{[N\sum X^2 - (\sum X)^2][N\sum Y^2 - (\sum Y)^2]}}$$
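The raw-score formula for Pearson's r (N times the sum of cross-products minus the product of the sums, over the square root of the two bracketed terms) can be sketched in Python as follows. The helper name `pearson_r_raw` is hypothetical and used only for illustration; the data are the Example 2 scores.

```python
import math

def pearson_r_raw(xs, ys):
    """Pearson r from the raw-score formula using N, sums, and sums of squares."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = math.sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
    return numerator / denominator

# Example 2 data: years of education (X) and % turnout (Y).
r = pearson_r_raw([11.9, 12.1, 12.7, 12.8, 13.0], [55, 60, 65, 68, 70])
print(round(r, 3))
```

This form is algebraically equivalent to r = SP / √(SS_X · SS_Y): multiplying the numerator and denominator of that ratio by N gives the expression above, so either version can be used with the same table of sums.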

Page 18:

Pearson’s r

Had the correlation between % college educated and turnout been r = .32:
– This relationship would have been positive and weak to moderate.

Had the correlation between % college educated and turnout been r = -.12:
– This relationship would have been negative and weak.

Page 19:

Find the Coefficient of Determination (r²) and Interpret:

Page 20:

Hypothesis Testing with Pearson

We can have a two-tailed hypothesis:
H₀: ρ = 0.0
H₁: ρ ≠ 0.0

We can have a one-tailed hypothesis:
H₀: ρ = 0.0
H₁: ρ < 0.0 (or ρ > 0.0)

Note that ρ (rho) is the population parameter, while r is the sample statistic

Page 21:

Find r_critical

See Table B.6 (page 537)
– You need to know the alpha level
– You need to know the sample size
– See that we always will use: df = n-2

Page 22:

Find r_calculated

See previous slides for formulas

Page 23:

Make your decision…

If |r_calculated| < r_critical, then retain H₀

If |r_calculated| > r_critical, then reject H₀
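The decision rule can be sketched in Python as below. The function `decide` is a hypothetical helper, and the critical value 0.878 is an assumption: it is the commonly tabled value for df = 3 at alpha = .05, two-tailed, and should be checked against Table B.6 before use. The sample r is the one from Example 1.

```python
def decide(r_calculated, r_critical):
    """Compare the magnitude of the sample r with the tabled critical value."""
    if abs(r_calculated) > r_critical:
        return "Reject H0"
    return "Retain H0"

# Example 1: n = 5 pairs of scores, so df = n - 2 = 3.
n = 5
df = n - 2
# Assumed critical value for df = 3, two-tailed, alpha = .05 (verify in the table).
r_critical = 0.878
print(df, decide(0.944, r_critical))   # 3 Reject H0
```

Taking the absolute value lets the same comparison serve for negative correlations in a two-tailed test; for a one-tailed test the sign of r must also match the predicted direction.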

Page 24:

Always include a brief summary of your results:

Was it positive or negative?
Was it significant?
Explain the correlation
Explain the variation
– Coefficient of Determination (r²)

Page 25:

Credits

http://campus.houghton.edu/orgs/psychology/stat15b.ppt#267,2,Review
http://publish.uwo.ca/~pakvis/Interval.ppt#276,17,Practical Example using Healey P. 418 Problem 15.1