8 Correlations [Session 8]

  • 8/3/2019 8 Correlations [Session 8]

    Co-relation Analysis

    Session 10

    Introduction

    Is there an association between two or more variables? If yes, what is the form and degree of that relationship?

    Is the relationship strong or significant enough to be useful in arriving at a desirable conclusion?

    Can the relationship be used for predictive purposes, that is, to predict the most likely value of a dependent variable corresponding to a given value of the independent variable or variables?

    Definition

    Correlation exists between two variables when one of them is related to the other in some way.

    Assumptions

    1) The sample of paired data (x, y) is a random sample.

    2) The pairs of (x, y) data have a bivariate normal distribution.

    Methods of Correlation Analysis

    In this chapter, the following methods of finding the correlation coefficient between two variables x and y are discussed:

    Scatter Diagram method

    Karl Pearson's Coefficient of Correlation method

    Spearman's Rank Correlation method

    Method of Least-squares

    Figure: the coefficient of correlation represents the strength of the association between two variables on a scale from −1.00 to +1.00. Reference points: −1.00 perfect negative correlation, −0.50 moderate negative correlation, 0 no correlation, +0.50 moderate positive correlation, +1.00 perfect positive correlation. Strong correlations lie between ±0.50 and ±1.00; weak correlations lie between 0 and ±0.50.

    Definition

    A scatterplot (or scatter diagram) is a graph in which the paired (x, y) sample data are plotted with a horizontal x axis and a vertical y axis. Each individual (x, y) pair is plotted as a single point.

    Scatter Diagram of Paired Data

    Scatter Plots (x on the horizontal axis, y on the vertical axis)

    Positive Linear Correlation: (a) Positive, (b) Strong positive, (c) Perfect positive

    Negative Linear Correlation: (d) Negative, (e) Strong negative, (f) Perfect negative

    No Linear Correlation: (g) No correlation, (h) Nonlinear correlation

    Karl Pearson's Correlation Coefficient

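    Definition: Karl Pearson's Correlation Coefficient r [For Classified data]

    $$ r = \frac{\sum f\,d_x d_y - \dfrac{(\sum f\,d_x)(\sum f\,d_y)}{N}}{(SD_x)(SD_y)} $$

    $$ SD_x = \sqrt{\sum f\,d_x^2 - \frac{(\sum f\,d_x)^2}{N}} $$

    $$ SD_y = \sqrt{\sum f\,d_y^2 - \frac{(\sum f\,d_y)^2}{N}} $$

    $$ d_x = X_i - A, \qquad d_y = Y_i - A $$

    where A is the assumed mean of the respective variable and N is the total frequency.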

    Example 1

    Find the coefficient of correlation between height (X) and weight (Y) from the following data. Also, obtain the two regression lines.

    Height (X)   61   65   68   62   60
    Weight (Y)   62   55   70   60   53

    Answer 1

    r = 0.65

    X − 63.2 = 0.32 (Y − 60)

    Y − 60 = 1.33 (X − 63.2)
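    The figures in Answer 1 can be checked with a short script. This is a minimal sketch in plain Python, not the method used on the slides; the function name `pearson_summary` and the variable names are illustrative assumptions.

```python
import math

# Data from Example 1: height (X) and weight (Y) of five individuals.
heights = [61, 65, 68, 62, 60]
weights = [62, 55, 70, 60, 53]


def pearson_summary(x, y):
    """Return both means, Pearson's r, and the two regression coefficients."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    r = s_xy / math.sqrt(s_xx * s_yy)
    b_xy = s_xy / s_yy  # slope of the regression line of X on Y
    b_yx = s_xy / s_xx  # slope of the regression line of Y on X
    return mean_x, mean_y, r, b_xy, b_yx


mean_x, mean_y, r, b_xy, b_yx = pearson_summary(heights, weights)
print(f"r = {r:.2f}")
print(f"X - {mean_x:.1f} = {b_xy:.2f}(Y - {mean_y:.0f})")
print(f"Y - {mean_y:.0f} = {b_yx:.2f}(X - {mean_x:.1f})")
```

    Both regression lines pass through the point of means (63.2, 60), which is why each line is written as a deviation from those means.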

    Example 2

    Given the two regression lines

    4x − 5y + 33 = 0

    20x − 9y − 107 = 0

    and the variance of x being 9, calculate

    1) Mean x and Mean y

    2) Correlation Coefficient of x and y

    3) SD of y

    Answer 2

    Mean X = 13, Mean Y = 17

    r = 0.6

    Variance of y = 16, so SD of y = 4
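    One way to reach the values in Answer 2 is sketched below. It relies on three standard facts: both regression lines pass through the point of means, the product of the two regression coefficients equals r², and b_yx = r·SD_y/SD_x. Which line is the regression of y on x is an assumption made in the code (checked by the product not exceeding 1); all names are illustrative.

```python
import math

# Regression lines from Example 2 in the form a*x + b*y + c = 0.
LINE_1 = (4.0, -5.0, 33.0)     # 4x - 5y + 33 = 0
LINE_2 = (20.0, -9.0, -107.0)  # 20x - 9y - 107 = 0
VAR_X = 9.0                    # given variance of x


def intersection(line_a, line_b):
    """Solve the two linear equations with Cramer's rule."""
    a1, b1, c1 = line_a
    a2, b2, c2 = line_b
    det = a1 * b2 - a2 * b1
    x = (b1 * c2 - b2 * c1) / det
    y = (a2 * c1 - a1 * c2) / det
    return x, y


# Both regression lines pass through (mean x, mean y).
mean_x, mean_y = intersection(LINE_1, LINE_2)

# Assumption: LINE_1 is the regression of y on x, LINE_2 that of x on y.
b_yx = -LINE_1[0] / LINE_1[1]  # y = (4x + 33) / 5   -> 0.8
b_xy = -LINE_2[1] / LINE_2[0]  # x = (9y + 107) / 20 -> 0.45

# The product of the coefficients equals r^2, so it cannot exceed 1.
# The opposite assignment of the two lines would give a product above 1.
assert 0 <= b_yx * b_xy <= 1

# r carries the common sign of the two regression coefficients.
r = math.copysign(math.sqrt(b_yx * b_xy), b_yx)

# b_yx = r * sd_y / sd_x, so sd_y = b_yx * sd_x / r.
sd_x = math.sqrt(VAR_X)
sd_y = b_yx * sd_x / r

print(f"mean x = {mean_x:.0f}, mean y = {mean_y:.0f}")
print(f"r = {r:.1f}, SD of y = {sd_y:.0f}, variance of y = {sd_y ** 2:.0f}")
```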

    Rounding the Linear Correlation Coefficient r

    Round to three decimal places.

    Use a calculator or computer if possible.

    Properties of the Linear Correlation Coefficient r

    1. −1 ≤ r ≤ 1

    2. The value of r does not change if all values of either variable are converted to a different scale.

    3. The value of r is not affected by the choice of x and y. Interchange x and y and the value of r will not change.

    4. r measures the strength of a linear relationship.
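    Properties 1 to 3 can be illustrated numerically. The sketch below reuses the height and weight data from Example 1; the rescaling factor 2.54 and offset 10 are arbitrary illustrative choices, and the helper `pearson_r` is an assumed name. The scale property as shown holds for a positive scale factor; a negative factor would flip the sign of r.

```python
import math


def pearson_r(x, y):
    """Pearson's linear correlation coefficient for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


heights = [61, 65, 68, 62, 60]  # X from Example 1
weights = [62, 55, 70, 60, 53]  # Y from Example 1

r = pearson_r(heights, weights)

# Property 2: a change of scale (arbitrary positive factor and offset).
rescaled_heights = [2.54 * h + 10 for h in heights]
r_rescaled = pearson_r(rescaled_heights, weights)

# Property 3: interchanging x and y.
r_swapped = pearson_r(weights, heights)

print(f"original: {r:.3f}, rescaled: {r_rescaled:.3f}, swapped: {r_swapped:.3f}")
```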

    Interpreting the Linear Correlation Coefficient

    r = +1: Perfect Positive Correlation

    r = −1: Perfect Negative Correlation

    r = 0: No Correlation (the variables are uncorrelated)

    Standard Error (S.E.) = $(1 - r^2)/\sqrt{N}$, where N = number of pairs of observations

    Probable Error = 0.6745 × S.E.
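    As a small worked illustration, the standard error and probable error can be computed as below. The sketch assumes the usual textbook form S.E. = (1 − r²)/√N and uses the values r = 0.65 and N = 5 from Example 1; the function names are illustrative.

```python
import math


def standard_error(r, n):
    """Standard error of r, assuming S.E. = (1 - r^2) / sqrt(N)."""
    return (1 - r ** 2) / math.sqrt(n)


def probable_error(r, n):
    """Probable error of r: 0.6745 times the standard error."""
    return 0.6745 * standard_error(r, n)


r, n = 0.65, 5  # values taken from Example 1
se = standard_error(r, n)
pe = probable_error(r, n)
print(f"S.E. = {se:.3f}, P.E. = {pe:.3f}")
```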

    Spearman's Rank Correlation Coefficient

    This method is applied to measure the association between two variables when only ordinal (or rank) data are available. In other words, this method is applied in a situation in which a quantitative measure of certain qualitative factors such as judgment, brand personalities, TV programmes, leadership, colour, or taste cannot be fixed, but individual observations can be arranged in a definite order (also called rank). The ranking is decided by using a set of ordinal rank numbers, with 1 for the individual observation ranked first either in terms of quantity or quality, and n for the individual observation ranked last in a group of n pairs of observations.

    Mathematically, Spearman's rank correlation coefficient is defined as:

    $$ R = 1 - \frac{6 \sum d^2}{n(n^2 - 1)} $$

    where R = rank correlation coefficient
    R1 = rank of observations with respect to first variable
    R2 = rank of observations with respect to second variable
    d = R1 − R2, difference in a pair of ranks
    n = number of paired observations or individuals being ranked

    The number 6 is placed in the formula as a scaling device; it ensures that the possible range of R is from −1 to 1. While using this method we may come across three types of cases.

    Advantages

    This method is easy to understand and its application is simpler than Pearson's method.

    This method is useful for correlation analysis when variables are expressed in qualitative terms like beauty, intelligence, honesty, efficiency, and so on.

    This method is appropriate to measure the association between two variables if the data type is at least ordinal scaled (ranked).

    The sample data of values of two variables is converted into ranks either in ascending order or descending order for calculating the degree of correlation between two variables.

    Disadvantages

    Values of both variables are assumed to be normally distributed and to describe a linear relationship rather than a non-linear relationship.

    A large computational time is required when the number of pairs of values of two variables exceeds 30.

    This method cannot be applied to measure the association between two variables in grouped data.

    Rank Order Correlation

    Hits   Rank   HR   Rank    D    D²
      1     10     3     8     2     4
      2      9     4     7     2     4
      3      8     5     6     2     4
      4      7     1    10    −3     9
      5      6     7     4     2     4
      6      5     6     5     0     0
      7      4     2     9    −5    25
      8      3    10     1     2     4
      9      2     9     2     0     0
     10      1     8     3    −2     4

    $$
    \begin{aligned}
    \rho &= 1 - \frac{6 \sum D^2}{N(N^2 - 1)}, \qquad \sum D^2 = 58,\; N = 10 \\
         &= 1 - \frac{6(58)}{10(10^2 - 1)} \\
         &= 1 - \frac{348}{10(100 - 1)} \\
         &= 1 - \frac{348}{990} \\
         &= 1 - 0.352 \\
         &= 0.648
    \end{aligned}
    $$
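    The rank-order calculation can be reproduced in a few lines. This is a minimal sketch of the formula ρ = 1 − 6ΣD²/(N(N² − 1)) applied to the two rank columns of the table; it assumes there are no tied ranks, and the function name `spearman_rho` is illustrative.

```python
# Rank columns from the rank-order table.
hits_rank = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
hr_rank = [8, 7, 6, 10, 4, 5, 9, 1, 2, 3]


def spearman_rho(rank_1, rank_2):
    """Spearman's rho from two lists of untied ranks."""
    n = len(rank_1)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_1, rank_2))
    return 1 - 6 * sum_d2 / (n * (n ** 2 - 1))


# Sum of squared rank differences and the resulting coefficient.
sum_d2 = sum((a - b) ** 2 for a, b in zip(hits_rank, hr_rank))
rho = spearman_rho(hits_rank, hr_rank)
print(f"sum of D^2 = {sum_d2}, rho = {rho:.3f}")
```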

    Pearson's r

    Hits (x)   HR (y)   xy
       1         3       3
       2         4       8
       3         5      15
       4         1       4
       5         7      35
       6         6      36
       7         2      14
       8        10      80
       9         9      81
      10         8      80

    Σx/n = 5.5, Σy/n = 5.5, Σxy/n = 35.6

    $$ r = \frac{\sum xy/n - (\sum x/n)(\sum y/n)}{(SD_x)(SD_y)} $$

    With the population standard deviations (dividing by n), $SD_x = SD_y = \sqrt{8.25} \approx 2.87$:

    $$ r = \frac{35.6 - (5.5)(5.5)}{(2.87)(2.87)} = \frac{35.6 - 30.25}{8.25} = \frac{5.35}{8.25} = 0.648 $$
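    The same formula can be evaluated directly on the Hits and HR columns. The sketch below is an illustration that uses population standard deviations (dividing by n) throughout, so numerator and denominator are on the same footing; worked this way the result is about 0.648. Because both columns are permutations of 1 to 10 with no ties, this agrees with the rank-order coefficient. All names are illustrative.

```python
import math

hits = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
hr = [3, 4, 5, 1, 7, 6, 2, 10, 9, 8]


def population_sd(values):
    """Standard deviation with divisor n (not n - 1)."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / n)


n = len(hits)
mean_x = sum(hits) / n
mean_y = sum(hr) / n
mean_xy = sum(x * y for x, y in zip(hits, hr)) / n

# r = (mean of xy - product of means) / (SD_x * SD_y)
r = (mean_xy - mean_x * mean_y) / (population_sd(hits) * population_sd(hr))
print(f"mean xy = {mean_xy:.1f}, r = {r:.3f}")
```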

    Example 3: Compute the correlation coefficient.

    Age of       Age of wives
    husbands     15-25  25-35  35-45  45-55  55-65  65-75  Total
    15-25          1      1      -      -      -      -      2
    25-35          2     12      1      -      -      -     15
    35-45          -      4     10      1      -      -     15
    45-55          -      -      3      6      1      -     10
    55-65          -      -      -      2      4      2      8
    65-75          -      -      -      -      1      2      3
    Total          3     17     14      9      6      4     53

    Sol. 3

    X is the column variable with deviation dx, and Y is the row variable with deviation dy. Each cell shows the frequency f, with f·dx·dy in parentheses.

    Y \ X    dy    15-25    25-35    35-45    45-55    55-65    65-75      f    f·dy   f·dy²   f·dx·dy
    dx               −2       −1        0       +1       +2       +3
    15-25    −2    1 (4)    1 (2)      -        -        -        -        2     −4      8       6
    25-35    −1    2 (4)   12 (12)   1 (0)      -        -        -       15    −15     15      16
    35-45     0      -      4 (0)   10 (0)    1 (0)      -        -       15      0      0       0
    45-55    +1      -        -      3 (0)    6 (6)    1 (2)      -       10    +10     10       8
    55-65    +2      -        -        -      2 (4)    4 (16)   2 (12)     8    +16     32      32
    65-75    +3      -        -        -        -      1 (6)    2 (18)     3     +9     27      24
    f                3       17       14        9        6        4       53    +16     92      86
    f·dx            −6      −17        0        9       12       12      +10
    f·dx²           12       17        0        9       24       36       98
    f·dx·dy          8       14        0       10       24       30       86

    r = 0.907
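    The totals in the solution table and the final value of r can be reproduced from the frequency table alone. This is a sketch of the classified-data formula, r = [Σf·dx·dy − (Σf·dx)(Σf·dy)/N] / √([Σf·dx² − (Σf·dx)²/N][Σf·dy² − (Σf·dy)²/N]), using the step deviations −2 to +3 measured from the 35-45 class. The variable names are illustrative.

```python
import math

# Bivariate frequency table from Example 3.
# Rows follow the Y classes, columns the X classes (15-25, ..., 65-75).
freq = [
    [1, 1, 0, 0, 0, 0],
    [2, 12, 1, 0, 0, 0],
    [0, 4, 10, 1, 0, 0],
    [0, 0, 3, 6, 1, 0],
    [0, 0, 0, 2, 4, 2],
    [0, 0, 0, 0, 1, 2],
]
dev = [-2, -1, 0, 1, 2, 3]  # step deviations, taking class 35-45 as origin

# Every (row index, column index) pair in the table.
cells = [(i, j) for i in range(6) for j in range(6)]

n = sum(freq[i][j] for i, j in cells)
sum_fdx = sum(freq[i][j] * dev[j] for i, j in cells)
sum_fdy = sum(freq[i][j] * dev[i] for i, j in cells)
sum_fdx2 = sum(freq[i][j] * dev[j] ** 2 for i, j in cells)
sum_fdy2 = sum(freq[i][j] * dev[i] ** 2 for i, j in cells)
sum_fdxdy = sum(freq[i][j] * dev[i] * dev[j] for i, j in cells)

numerator = sum_fdxdy - sum_fdx * sum_fdy / n
sd_x = math.sqrt(sum_fdx2 - sum_fdx ** 2 / n)
sd_y = math.sqrt(sum_fdy2 - sum_fdy ** 2 / n)
r = numerator / (sd_x * sd_y)

print(n, sum_fdx, sum_fdy, sum_fdx2, sum_fdy2, sum_fdxdy)
print(f"r = {r:.3f}")
```

    Step deviations in units of the class width can be used in place of the raw deviations because r is unchanged by a change of scale or origin.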

    Data from the Garbage Project

    x Plastic (lb)   0.27   1.41   2.19   2.83   2.19   1.81   0.85   3.05
    y Household      2      3      3      6      4      2      1      5

    Is there a significant linear correlation?

    r = 0.842

    R² = 0.71
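    The two values above can be recomputed from the eight data pairs. The slides do not say which significance test is intended, so the last step is an assumption: it uses the common t test for a correlation coefficient, t = r·√((n − 2)/(1 − r²)), compared with the two-tailed 5% critical value of Student's t for 6 degrees of freedom (2.447, from standard tables). All names are illustrative.

```python
import math

plastic = [0.27, 1.41, 2.19, 2.83, 2.19, 1.81, 0.85, 3.05]  # x, in lb
household = [2, 3, 3, 6, 4, 2, 1, 5]                        # y


def pearson_r(x, y):
    """Pearson's linear correlation coefficient for paired samples."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    s_xx = sum((a - mean_x) ** 2 for a in x)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    return s_xy / math.sqrt(s_xx * s_yy)


n = len(plastic)
r = pearson_r(plastic, household)
r_squared = r ** 2

# Assumed test: t statistic for H0 "no linear correlation", n - 2 df.
t_stat = r * math.sqrt((n - 2) / (1 - r ** 2))
T_CRITICAL = 2.447  # two-tailed, alpha = 0.05, 6 degrees of freedom
significant = abs(t_stat) > T_CRITICAL

print(f"r = {r:.3f}, r^2 = {r_squared:.2f}")
print(f"t = {t_stat:.2f}, significant at 5%: {significant}")
```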

    Correlation Analysis Vs. Regression Analysis

    Correlation means the relationship between two or more variables; correlation analysis measures the direction and degree of their linear relationship.

    Regression analysis aims at establishing the functional relationship.

    Correlation does not imply causation

    "Correlation does not imply causation" is a phrase used in the sciences and statistics to emphasize that correlation between two variables does not imply there is a cause-and-effect relationship between the two. Its converse, "correlation proves causation", is a logical fallacy by which two events that occur together are claimed to have a cause-and-effect relationship. For example,

    A occurs in correlation with B.

    Therefore, A causes B.

    This is a logical fallacy because there are at least four other possibilities:

    1. B may be the cause of A, or

    2. some unknown third factor is actually the cause of the relationship between A and B, or

    3. the "relationship" is so complex it can be labeled coincidental (i.e., two events occurring at the same time that have no simple relationship to each other besides the fact that they are occurring at the same time).

    4. B may be the cause of A at the same time as A is the cause of B (contradicting that the only relationship between A and B is that A causes B). This describes a self-reinforcing system.