Module 7

UNIT 07 CORRELATION

INTRODUCTION

So far we have studied the characteristics of only one variable, such as weights, prices or sales. This type of study is called univariate analysis. If some relationship exists between two variables and we study it, the statistical analysis of such data is called bivariate analysis.

Determining the existence and extent of the relationship between two phenomena is one of the most important objectives of statistics. Furthermore, the existence of a relationship between two or more variables enables us to predict future values.

To carry out our analysis effectively, it often becomes necessary to observe and study the relationship existing between two measurable variables, such as price and demand, or yield and rainfall.

The analysis of the relationship existing between two different variables is known as correlation. Correlation is thus a statistical technique which measures and analyses the degree of relationship existing between two measurable variables. In other words, the term correlation indicates a relationship between variables in which changes in the value of one variable are accompanied by changes in the value of the other.

DEFINITION:

According to L. R. Connor, “If two or more quantities vary in sympathy, so that movements in one tend to be accompanied by corresponding movements in the other, then they are said to be correlated”.

According to Croxton and Cowden, “When the relationship is of a quantitative nature, the appropriate statistical tool for discovering and measuring the relationship and expressing it in a brief formula is known as correlation.”

According to A. M. Tuttle, “Correlation is an analysis of the co-variation between two or more variables.”

Thus the association of any two variates is known as correlation. Correlation expresses the relationship or interdependence of two sets of variables upon each other in such a way that changes in the value of one variable are in sympathy with changes in the other. The coefficient of correlation is the numerical measure of the degree of relationship between two variables.

CAUSE AND EFFECT

Correlation may indicate the presence of a cause-and-effect relationship between the two distributions.

For example, when we say that there is a relationship between price and demand, it means that price is the cause and demand is the effect. In other words, as price increases the quantity demanded decreases, and vice versa.

It is generally assumed that when two variables are correlated, some cause-and-effect relationship exists between them. However, two variables may be found to be statistically correlated even though in practice they are not related at all. For example, there cannot be a cause-and-effect relationship between rainfall and the percentage of passes in an examination, even if a correlation is found between them. Such correlation is called SPURIOUS CORRELATION, and it arises due to chance.

Usefulness of Correlation

Correlation is useful in the physical and social sciences. The following are its important uses.

1. Correlation is very useful to economists for studying the relationship between variables such as price and quantity demanded. It helps businessmen to estimate costs, sales, prices and other related variables.

2. Where variables show some kind of relationship, correlation analysis helps in measuring the degree of that relationship, for example between supply and demand.

3. The relationship between variables can be verified and tested for significance with the help of correlation analysis.

4. The coefficient of correlation is a relative measure, so we can compare relationships between variables which are expressed in different units.

5. The sampling error can also be calculated.

6. Correlation is the basis for the concepts of regression and ratio of variation.


Types of Correlation

Correlation is classified into the following types.

1. Positive and Negative
2. Simple and Multiple
3. Partial and Total
4. Linear and Non-Linear

1. Positive and Negative

The direction of variation of the variables determines whether correlation is positive or negative.

Correlation is said to be positive when the values of the two variables move in the same direction, so that an increase in the values of one variable is associated with an increase in the values of the other, and a decrease in the values of one variable is associated with a decrease in the values of the other.

Correlation is said to be negative when an increase (or decrease) in the values of one variable is associated with a decrease (or increase) in the values of the other, so that the values move in opposite directions.

2. Simple and Multiple

When we study only two variables, the relationship is described as simple correlation.

In multiple correlation we study more than two variables simultaneously, for example the relationship between the price, demand and supply of a commodity.

3. Partial and Total

The study of two variables while excluding some other variables is called partial correlation. For example, we study price and demand, eliminating the supply side. In total correlation all the factors are taken into account.

4. Linear and Non-Linear

Correlation is said to be linear if the amount of change in one variable tends to bear a constant ratio to the amount of change in the other variable. If the ratio of change between two variables is uniform, then there is linear correlation between them.

Correlation is said to be non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other related variable.

Methods of studying correlation

The commonly used methods for studying the correlation between two variables are:

1. Graphic method
a) Scatter diagram
b) Simple graph

2. Mathematical method
Karl Pearson's coefficient of correlation.

1. a) Scatter Diagram:

This is the simplest way of studying the correlation between two distributions: the values are plotted on a chart known as a scatter diagram. In this method, the given data are plotted on graph paper in the form of dots. The x values are plotted on the horizontal axis and the y values on the vertical axis. The scatter of the resulting dots shows the type of correlation.

The following diagrams illustrate the degree and direction of the relationship.

[Scatter diagrams: Diagram 1, positive correlation; Diagram 2, negative correlation; Diagram 3, no correlation]

Diagram 1 indicates positive correlation, as it shows that the values of the two variables move in the same direction.

Diagram 2 indicates negative correlation, as the values of the two variables move in opposite directions.

Diagram 3 indicates no correlation.


1. b) Simple Graph:

In this method, two different curves, one representing the values of x and the other representing the values of y, are drawn on graph paper. If the two curves run parallel to each other in the same direction, either upward or downward, then there exists positive correlation. On the other hand, if the two curves run in opposite directions, then the correlation is negative.

The above methods of studying correlation help us only to form an approximate idea. It is not possible to determine the exact size of the correlation with their help. The numerical value of correlation is obtained by applying the method suggested by Prof. Karl Pearson.

2. Mathematical Method: Karl Pearson's coefficient of correlation

Karl Pearson, a great British biometrician and statistician, propounded the formula for calculating the coefficient of correlation. The formula is based on the arithmetic mean and the standard deviation, and it is the most widely used. The formula indicates whether the correlation is positive or negative. The answer lies between +1 and -1. Zero represents the absence of correlation. The coefficient is denoted by 'r', which is the symbol of the degree of correlation, and is obtained by using the following formulae.

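1. When deviations are taken from the actual mean

\[ r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \times \sum d_y^2}} \]

d_x = deviations of the x values from the mean of x, i.e. (x - \bar{x})
d_y = deviations of the y values from the mean of y, i.e. (y - \bar{y})

2. When deviations are taken from an assumed mean

\[ r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n \sum d_x^2 - \left(\sum d_x\right)^2} \times \sqrt{n \sum d_y^2 - \left(\sum d_y\right)^2}} \]

\sum d_x d_y = sum of the products of the deviations of the x and y series
\sum d_x = sum of deviations taken from the assumed mean of the x series
\sum d_y = sum of deviations taken from the assumed mean of the y series
\sum d_x^2 = sum of squares of the deviations of the x series
\sum d_y^2 = sum of squares of the deviations of the y series
n = number of pairs of observations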
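The two formulae can be sketched in Python as follows. This is only an illustrative sketch: the function names `pearson_actual_mean` and `pearson_assumed_mean` are made up for this example, and the data are those of Illustration 01 in this unit. Both methods should give the same value of r whatever assumed means are chosen.

```python
from math import sqrt

def pearson_actual_mean(xs, ys):
    """r from deviations taken from the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

def pearson_assumed_mean(xs, ys, ax, ay):
    """r from deviations taken from assumed means ax and ay."""
    n = len(xs)
    dx = [x - ax for x in xs]
    dy = [y - ay for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    sum_dx2 = sum(a * a for a in dx)
    sum_dy2 = sum(b * b for b in dy)
    numerator = n * sum_dxdy - sum_dx * sum_dy
    denominator = sqrt(n * sum_dx2 - sum_dx ** 2) * sqrt(n * sum_dy2 - sum_dy ** 2)
    return numerator / denominator

# Data of Illustration 01
x = [57, 59, 62, 63, 64, 65, 55, 58, 57]
y = [113, 117, 126, 126, 130, 129, 111, 116, 112]

r_actual = pearson_actual_mean(x, y)
r_assumed = pearson_assumed_mean(x, y, ax=62, ay=126)  # arbitrary assumed means
print(round(r_actual, 4), round(r_assumed, 4))
```

The assumed means 62 and 126 are arbitrary choices; any other values leave r unchanged, which is why the assumed mean method is convenient when the actual means are fractional.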

Calculation of coefficient of correlation

Problems on ungrouped data (individual series)

I. Taking deviations from the actual mean

ILLUSTRATION = 01
Calculate the coefficient of correlation from the following data.

X    57   59   62   63   64   65   55   58   57
Y   113  117  126  126  130  129  111  116  112

SOLUTION:
Steps:
1. Calculate the actual means of the x and y series.
2. Take the deviations from the actual means and square them.
3. Find the products of the deviations and use the formula

\[ r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \times \sum d_y^2}} \]

Calculation of actual means:

\[ \bar{x} = \frac{\sum x}{n} = \frac{540}{9} = 60, \qquad \bar{y} = \frac{\sum y}{n} = \frac{1080}{9} = 120 \]

    x   dx = x - 60    dx²      y   dy = y - 120    dy²   dxdy
   57       -3           9    113       -7           49     21
   59       -1           1    117       -3            9      3
   62        2           4    126        6           36     12
   63        3           9    126        6           36     18
   64        4          16    130       10          100     40
   65        5          25    129        9           81     45
   55       -5          25    111       -9           81     45
   58       -2           4    116       -4           16      8
   57       -3           9    112       -8           64     24
  540        0         102   1080        0          472    216   (totals)

\[ r = \frac{216}{\sqrt{102 \times 472}} = \frac{216}{\sqrt{48144}} = \frac{216}{219.42} = 0.9844 \]

PROBABLE ERROR:

The probable error is used to assess the reliability or significance of the value of Karl Pearson's coefficient of correlation.

According to Horace Secrist, “The probable error of the coefficient of correlation is an amount which, if added to or subtracted from the mean correlation coefficient, produces amounts within which the chances are even that a coefficient of correlation from a series selected at random will fall.” The formula for calculating the probable error is

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} \]

where r is the coefficient of correlation and n is the number of pairs of observations.

Functions of probable error:
1. If the value of r is less than the probable error, the value of r is not at all significant.
2. If the value of r is more than six times the probable error, the value of r is significant (r > 6 PE).
3. If the probable error is less than 0.3, the correlation should not be considered at all.
4. If the probable error is small, correlation definitely exists.

Example: Given r = 0.9844 and n = 9.

SOLUTION

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (0.9844)^2}{\sqrt{9}} = 0.6745 \times \frac{1 - 0.9690}{3} = \frac{0.6745 \times 0.0310}{3} \approx 0.00697 \]

Conclusion: The P.E. is very small and r is far greater than 6 P.E. (about 0.0418), which means there exists a high degree of positive correlation.
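The probable error calculation above can be sketched in Python as follows. The function names `probable_error` and `interpret` are hypothetical, chosen only for this example. The source states what to conclude when r < PE and when r > 6 PE; the label returned for values in between is an assumption of this sketch.

```python
from math import sqrt

def probable_error(r, n):
    """PE = 0.6745 * (1 - r^2) / sqrt(n)."""
    return 0.6745 * (1 - r * r) / sqrt(n)

def interpret(r, n):
    """Apply rules 1 and 2 above to the magnitude of r."""
    pe = probable_error(r, n)
    if abs(r) < pe:
        return "not significant"
    if abs(r) > 6 * pe:
        return "significant"
    # Assumption: the text gives no rule for PE <= |r| <= 6 PE.
    return "no definite conclusion"

pe = probable_error(0.9844, 9)
print(round(pe, 5), interpret(0.9844, 9))
```

Using the absolute value of r lets the same rules be applied to negative coefficients.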

ILLUSTRATION = 02
Find Karl Pearson's coefficient of correlation from the following data. Also calculate the probable error.

Wages in Rs.     100  101  102  102  100   99   97   98   96   95
Cost of living    98   99   99   97   95   92   95   94   90   91

SOLUTION

\[ \bar{x} = \frac{\sum x}{n} = \frac{990}{10} = 99, \qquad \bar{y} = \frac{\sum y}{n} = \frac{950}{10} = 95 \]

  Wages x   dx = x - 99    dx²   Cost of living y   dy = y - 95    dy²   dxdy
    100          1           1          98               3           9      3
    101          2           4          99               4          16      8
    102          3           9          99               4          16     12
    102          3           9          97               2           4      6
    100          1           1          95               0           0      0
     99          0           0          92              -3           9      0
     97         -2           4          95               0           0      0
     98         -1           1          94              -1           1      1
     96         -3           9          90              -5          25     15
     95         -4          16          91              -4          16     16
    990          0          54         950               0          96     61   (totals)

\[ r = \frac{\sum d_x d_y}{\sqrt{\sum d_x^2 \times \sum d_y^2}} = \frac{61}{\sqrt{54 \times 96}} = \frac{61}{\sqrt{5184}} = \frac{61}{72} = 0.8472 \]

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (0.8472)^2}{\sqrt{10}} = \frac{0.6745 \times (1 - 0.7177)}{3.162} = \frac{0.1904}{3.162} = 0.0602 \]

Since r > 6 PE (6 x 0.0602 = 0.3612), a high degree of positive correlation is present.

II. ASSUMED MEAN METHOD

When deviations are taken from an assumed mean

ILLUSTRATION = 03
Calculate the coefficient of correlation for the following data. Also calculate the probable error.

Price x    42  38  42  45  42  44  40  46  44  40
Demand y   26  40  29  27  30  27  35  25  26  30

SOLUTION

Steps:
1. Select any value as the assumed mean from the x and y series.
2. Take the deviations from the assumed means.
3. Square the deviations and find the products of the deviations.
4. Use the formula.

Assumed means: 42 for x and 27 for y.

    x   dx = x - 42    dx²     y   dy = y - 27    dy²   dxdy
   42        0           0    26       -1           1      0
   38       -4          16    40       13         169    -52
   42        0           0    29        2           4      0
   45        3           9    27        0           0      0
   42        0           0    30        3           9      0
   44        2           4    27        0           0      0
   40       -2           4    35        8          64    -16
   46        4          16    25       -2           4     -8
   44        2           4    26       -1           1     -2
   40       -2           4    30        3           9     -6
Total        3          57             25         261    -84

\[ r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n \sum d_x^2 - \left(\sum d_x\right)^2} \times \sqrt{n \sum d_y^2 - \left(\sum d_y\right)^2}} \]

\[ r = \frac{(-84 \times 10) - (3 \times 25)}{\sqrt{57 \times 10 - (3)^2} \times \sqrt{261 \times 10 - (25)^2}} = \frac{-840 - 75}{\sqrt{570 - 9} \times \sqrt{2610 - 625}} = \frac{-915}{\sqrt{561 \times 1985}} = \frac{-915}{1055.27} = -0.8671 \]

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (-0.8671)^2}{\sqrt{10}} = \frac{0.6745 \times (1 - 0.7518)}{3.162} = \frac{0.6745 \times 0.2482}{3.162} = \frac{0.1674}{3.162} = 0.0529 \]

ILLUSTRATION = 04
Calculate the coefficient of correlation between the marks obtained by ten students in accountancy and statistics.

Student          1   2   3   4   5   6   7   8   9  10
Accountancy x   45  70  65  30  90  40  50  75  85  60
Statistics y    35  90  70  40  95  40  60  80  80  50


SOLUTION

Steps:
1. Select any value as the assumed mean from the x and y series.
2. Take the deviations from the assumed means.
3. Square the deviations and find the products of the deviations.
4. Use the formula.

Assumed means: 90 for x and 95 for y.

Student    x   dx = x - 90     dx²     y   dy = y - 95     dy²    dxdy
   1      45      -45         2025    35      -60         3600    2700
   2      70      -20          400    90       -5           25     100
   3      65      -25          625    70      -25          625     625
   4      30      -60         3600    40      -55         3025    3300
   5      90        0            0    95        0            0       0
   6      40      -50         2500    40      -55         3025    2750
   7      50      -40         1600    60      -35         1225    1400
   8      75      -15          225    80      -15          225     225
   9      85       -5           25    80      -15          225      75
  10      60      -30          900    50      -45         2025    1350
Total            -290        11900           -310        14000   12525

\[ r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n \sum d_x^2 - \left(\sum d_x\right)^2} \times \sqrt{n \sum d_y^2 - \left(\sum d_y\right)^2}} \]

\[ r = \frac{(12525 \times 10) - (-290 \times -310)}{\sqrt{11900 \times 10 - (-290)^2} \times \sqrt{14000 \times 10 - (-310)^2}} = \frac{125250 - 89900}{\sqrt{119000 - 84100} \times \sqrt{140000 - 96100}} = \frac{35350}{\sqrt{34900 \times 43900}} = \frac{35350}{39142.2} = 0.9031 \]

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (0.9031)^2}{\sqrt{10}} = \frac{0.6745 \times (1 - 0.8156)}{3.162} = \frac{0.1244}{3.162} = 0.0393 \]

ILLUSTRATION = 05
Calculate Karl Pearson's coefficient of correlation between x and y. Also calculate the PE.

X   58  43  41  39  43  46  43  45  41  47  45  44
Y   11  27  31  42  30  28  28  20  19  20  32  30

SOLUTION

Assumed means: 45 for x and 27 for y. There are n = 12 pairs of observations.

    x   dx = x - 45    dx²     y   dy = y - 27    dy²   dxdy
   58       13         169    11      -16         256   -208
   43       -2           4    27        0           0      0
   41       -4          16    31        4          16    -16
   39       -6          36    42       15         225    -90
   43       -2           4    30        3           9     -6
   46        1           1    28        1           1      1
   43       -2           4    28        1           1     -2
   45        0           0    20       -7          49      0
   41       -4          16    19       -8          64     32
   47        2           4    20       -7          49    -14
   45        0           0    32        5          25      0
   44       -1           1    30        3           9     -3
Total       -5         255             -6         704   -306

\[ r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n \sum d_x^2 - \left(\sum d_x\right)^2} \times \sqrt{n \sum d_y^2 - \left(\sum d_y\right)^2}} \]

\[ r = \frac{(-306 \times 12) - (-5 \times -6)}{\sqrt{255 \times 12 - (-5)^2} \times \sqrt{704 \times 12 - (-6)^2}} = \frac{-3672 - 30}{\sqrt{3060 - 25} \times \sqrt{8448 - 36}} = \frac{-3702}{55.09 \times 91.72} = \frac{-3702}{5052.8} = -0.7327 \]

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (-0.7327)^2}{\sqrt{12}} = \frac{0.6745 \times (1 - 0.5368)}{3.464} = \frac{0.3124}{3.464} = 0.0902 \]

ILLUSTRATION = 06
Calculate Karl Pearson's coefficient of correlation between the ages of husbands and wives. Also calculate the PE.

Age of husband (x)   20  25  30  35  40  45  50  55  60  65  70
Age of wife (y)      17  24  28  32  35  38  42  51  56  60  62

SOLUTION

Since r is independent of change of origin and scale, we take dx = (x - 45)/5 and dy = y - 35.

    x   dx = (x - 45)/5    dx²     y   dy = y - 35    dy²   dxdy
   20        -5             25    17      -18         324     90
   25        -4             16    24      -11         121     44
   30        -3              9    28       -7          49     21
   35        -2              4    32       -3           9      6
   40        -1              1    35        0           0      0
   45         0              0    38        3           9      0
   50         1              1    42        7          49      7
   55         2              4    51       16         256     32
   60         3              9    56       21         441     63
   65         4             16    60       25         625    100
   70         5             25    62       27         729    135
n = 11        0            110             60        2612    498   (totals)

\[ r = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{\sqrt{n \sum d_x^2 - \left(\sum d_x\right)^2} \times \sqrt{n \sum d_y^2 - \left(\sum d_y\right)^2}} \]

\[ r = \frac{(498 \times 11) - (0 \times 60)}{\sqrt{110 \times 11 - (0)^2} \times \sqrt{2612 \times 11 - (60)^2}} = \frac{5478 - 0}{\sqrt{1210 - 0} \times \sqrt{28732 - 3600}} = \frac{5478}{\sqrt{1210 \times 25132}} = \frac{5478}{5514.5} = 0.9934 \]

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{n}} = 0.6745 \times \frac{1 - (0.9934)^2}{\sqrt{11}} = \frac{0.6745 \times 0.0132}{3.3166} = 0.0027 \]

Since r > 6 PE, the result is significant.
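The statement that r is independent of change of origin and scale can be checked numerically. The following is a minimal sketch, with a hypothetical helper `pearson_r`, using the ages as tabulated in the solution above: the coefficient computed from the raw ages equals the one computed from the coded deviations (x - 45)/5 and y - 35.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the actual means."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sum_dxdy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sum_dx2 = sum((x - mean_x) ** 2 for x in xs)
    sum_dy2 = sum((y - mean_y) ** 2 for y in ys)
    return sum_dxdy / sqrt(sum_dx2 * sum_dy2)

husband = [20, 25, 30, 35, 40, 45, 50, 55, 60, 65, 70]
wife = [17, 24, 28, 32, 35, 38, 42, 51, 56, 60, 62]

# Change of origin and scale for x, change of origin only for y
coded_x = [(x - 45) / 5 for x in husband]
coded_y = [y - 35 for y in wife]

r_raw = pearson_r(husband, wife)
r_coded = pearson_r(coded_x, coded_y)
print(round(r_raw, 4), round(r_coded, 4))
```

The invariance holds for any positive scale factor; dividing by a negative number would reverse the sign of r.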

CALCULATION OF COEFFICIENT OF CORRELATION IN A BIVARIATE FREQUENCY DISTRIBUTION

When the number of observations is very large, the data are classified into a two-way frequency distribution. The class intervals for y are placed in the column headings and those for x in the stubs. The formula for calculating the coefficient of correlation is

\[ r = \frac{N \sum f d_x d_y - \sum f d_x \sum f d_y}{\sqrt{N \sum f d_x^2 - \left(\sum f d_x\right)^2} \times \sqrt{N \sum f d_y^2 - \left(\sum f d_y\right)^2}} \]

where N is the total frequency.

Steps
1. Find the mid-points of the various classes for the x and y variables.
2. Take the step deviations of the x variable (dx) and of the y variable (dy).
3. Multiply dx, dy and the respective frequency of each cell, and note the figure obtained in the left-hand corner of each cell.
4. Sum all the values so calculated to get the total, i.e. \sum f d_x d_y.
5. Find \sum f d_x and \sum f d_x^2, taking the deviations from the assumed mean.
6. Find \sum f d_y and \sum f d_y^2, taking the deviations from the assumed mean.
7. Write the formula and substitute the values.
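The steps above can be sketched in Python as follows. This is only an illustration: the function name `grouped_pearson` and its parameters are hypothetical, the rows of `freq` are taken to be the x classes and the columns the y classes, and the data are those of Illustration 07 in this unit.

```python
from math import sqrt

def grouped_pearson(freq, x_mids, y_mids, ax, cx, ay, cy):
    """r for a two-way frequency table using step deviations.

    freq[i][j] is the frequency of x class i and y class j.
    ax, ay are assumed means; cx, cy are class widths.
    """
    dx = [(m - ax) / cx for m in x_mids]          # step deviations of x
    dy = [(m - ay) / cy for m in y_mids]          # step deviations of y
    row_totals = [sum(row) for row in freq]       # f for each x class
    col_totals = [sum(col) for col in zip(*freq)] # f for each y class
    n = sum(row_totals)

    sum_fdx = sum(f * d for f, d in zip(row_totals, dx))
    sum_fdx2 = sum(f * d * d for f, d in zip(row_totals, dx))
    sum_fdy = sum(f * d for f, d in zip(col_totals, dy))
    sum_fdy2 = sum(f * d * d for f, d in zip(col_totals, dy))
    sum_fdxdy = sum(freq[i][j] * dx[i] * dy[j]
                    for i in range(len(dx)) for j in range(len(dy)))

    numerator = n * sum_fdxdy - sum_fdx * sum_fdy
    denominator = (sqrt(n * sum_fdx2 - sum_fdx ** 2)
                   * sqrt(n * sum_fdy2 - sum_fdy ** 2))
    return numerator / denominator

# Data of Illustration 07 (empty cells entered as 0)
freq = [
    [5, 9, 3, 0, 0],
    [0, 10, 25, 2, 0],
    [0, 1, 12, 2, 0],
    [0, 0, 4, 16, 5],
    [0, 0, 0, 4, 2],
]
r = grouped_pearson(freq, x_mids=[20, 30, 40, 50, 60], y_mids=[25, 35, 45, 55, 65],
                    ax=40, cx=10, ay=45, cy=10)
print(round(r, 4))
```

Dividing by the class widths only rescales the deviations, so the result is the same as if the raw mid-points had been used.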

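ILLUSTRATION = 07

Calculate the coefficient of correlation between the marks obtained by a batch of 100 students in accountancy and statistics, as given below.

Marks in                Marks in Accountancy (y)
Statistics (x)    20-30   30-40   40-50   50-60   60-70   Total
15-25               5       9       3       -       -       17
25-35               -      10      25       2       -       37
35-45               -       1      12       2       -       15
45-55               -       -       4      16       5       25
55-65               -       -       -       4       2        6
Total               5      20      44      24       7      100

Solution:

For x: A = 40, C = 10, so dx = (x - 40)/10. For y: A = 45, C = 10, so dy = (y - 45)/10.

In the table below each cell shows the frequency f, with the product f·dx·dy in brackets.

                  y:     20-30     30-40     40-50     50-60     60-70
                  Mid y:    25        35        45        55        65
x       Mid x  dx \ dy:     -2        -1         0         1         2      f    fdx   fdx²  fdxdy
15-25     20     -2      5 (20)    9 (18)    3 (0)       -         -       17    -34     68     38
25-35     30     -1        -      10 (10)   25 (0)     2 (-2)      -       37    -37     37      8
35-45     40      0        -       1 (0)    12 (0)     2 (0)       -       15      0      0      0
45-55     50      1        -         -       4 (0)    16 (16)    5 (10)    25     25     25     26
55-65     60      2        -         -         -       4 (8)     2 (8)      6     12     24     16
Total f                     5        20        44        24         7     100    -34    154     88
fdy                       -10       -20         0        24        14       8
fdy²                       20        20         0        24        28      92
fdxdy                      20        28         0        22        18      88

N = 100, \sum f d_x = -34, \sum f d_x^2 = 154, \sum f d_y = 8, \sum f d_y^2 = 92, \sum f d_x d_y = 88.

\[ r = \frac{N \sum f d_x d_y - \sum f d_x \sum f d_y}{\sqrt{N \sum f d_x^2 - \left(\sum f d_x\right)^2} \times \sqrt{N \sum f d_y^2 - \left(\sum f d_y\right)^2}} \]

\[ r = \frac{(88 \times 100) - (-34 \times 8)}{\sqrt{154 \times 100 - (-34)^2} \times \sqrt{92 \times 100 - (8)^2}} = \frac{8800 + 272}{\sqrt{15400 - 1156} \times \sqrt{9200 - 64}} = \frac{9072}{\sqrt{14244} \times \sqrt{9136}} = \frac{9072}{119.35 \times 95.58} = \frac{9072}{11407.6} = 0.7953 \]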

ILLUSTRATION = 08

Calculate the coefficient of correlation between the ages of husbands and the ages of wives in the following bivariate frequency distribution. Also find its probable error and comment on the result.

Age of                  Age of wives (y)
husbands (x)     10-20   20-30   30-40   40-50   50-60   Total
15-25              6       3       -       -       -        9
25-35              3      16      10       -       -       29
35-45              -      10      15       7       -       32
45-55              -       -       7      10       4       21
55-65              -       -       -       4       5        9
Total              9      29      32      21       9      100

SOLUTION

dx = (x - 40)/10, dy = (y - 35)/10

For x: A = 40, C = 10. For y: A = 35, C = 10. Each cell shows the frequency f, with the product f·dx·dy in brackets.

                  y:     10-20     20-30     30-40     40-50     50-60
                  MV:       15        25        35        45        55
x        MV    dx \ dy:     -2        -1         0         1         2      f    fdx   fdx²  fdxdy
15-25     20     -2      6 (24)    3 (6)       -         -         -        9    -18     36     30
25-35     30     -1      3 (6)    16 (16)   10 (0)       -         -       29    -29     29     22
35-45     40      0        -      10 (0)    15 (0)     7 (0)       -       32      0      0      0
45-55     50      1        -         -       7 (0)    10 (10)    4 (8)     21     21     21     18
55-65     60      2        -         -         -       4 (8)     5 (20)     9     18     36     28
Total f                     9        29        32        21         9     100     -8    122     98
fdy                       -18       -29         0        21        18      -8
fdy²                       36        29         0        21        36     122
fdxdy                      30        22         0        18        28      98

\[ r = \frac{N \sum f d_x d_y - \sum f d_x \sum f d_y}{\sqrt{N \sum f d_x^2 - \left(\sum f d_x\right)^2} \times \sqrt{N \sum f d_y^2 - \left(\sum f d_y\right)^2}} \]

\[ r = \frac{(98 \times 100) - (-8 \times -8)}{\sqrt{122 \times 100 - (-8)^2} \times \sqrt{122 \times 100 - (-8)^2}} = \frac{9800 - 64}{\sqrt{12136} \times \sqrt{12136}} = \frac{9736}{12136} = 0.8022 \]

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - (0.8022)^2}{\sqrt{100}} = \frac{0.6745 \times (1 - 0.6435)}{10} = \frac{0.6745 \times 0.3565}{10} = \frac{0.2404}{10} = 0.02404 \]

6 x 0.02404 = 0.14424. Since r > 6 PE, the correlation is significant.

ILLUSTRATION = 09
From the following table calculate Karl Pearson's coefficient of correlation between the marks obtained in Accountancy and Statistics. Also calculate the value of the probable error.

Marks in                 Marks in Statistics
Accountancy     50-59   60-69   70-79   80-89   90-99   Total
Below 60          -       6       7       6       -       19
60-64.9           5       8      10       4       5       32
65-69.9           8       6       8       6       1       29
70-74.9           7      12      15      10       5       49
75-79.9          10       8      12       3       4       37
80-84.9           5       4      13       5       5       32
85 & above        -       6      10       6       -       22
Total            35      50      75      40      20      220

(KU BBM)

SOLUTION

Let x represent marks in Accountancy and y represent marks in Statistics.

dx = (x - 72.45)/5, dy = (y - 74.5)/10

For x: A = 72.45, C = 5. For y: A = 74.5, C = 10. Each cell shows the frequency f, with the product f·dx·dy in brackets.

                      y:     50-59     60-69     70-79     80-89     90-99
                      MV:     54.5      64.5      74.5      84.5      94.5
x            MV    dx \ dy:     -2        -1         0         1         2      f    fdx   fdx²  fdxdy
Below 60    57.45    -3        -       6 (18)    7 (0)     6 (-18)     -       19    -57    171      0
60-64.9     62.45    -2      5 (20)    8 (16)   10 (0)     4 (-8)    5 (-20)   32    -64    128      8
65-69.9     67.45    -1      8 (16)    6 (6)     8 (0)     6 (-6)    1 (-2)    29    -29     29     14
70-74.9     72.45     0      7 (0)    12 (0)    15 (0)    10 (0)     5 (0)     49      0      0      0
75-79.9     77.45     1     10 (-20)   8 (-8)   12 (0)     3 (3)     4 (8)     37     37     37    -17
80-84.9     82.45     2      5 (-20)   4 (-8)   13 (0)     5 (10)    5 (20)    32     64    128      2
85 & above  87.45     3        -       6 (-18)  10 (0)     6 (18)      -       22     66    198      0
Total f                        35        50        75        40        20     220     17    691      7
fdy                           -70       -50         0        40        40     -40
fdy²                          140        50         0        40        80     310
fdxdy                          -4         6         0        -1         6       7

\[ r = \frac{N \sum f d_x d_y - \sum f d_x \sum f d_y}{\sqrt{N \sum f d_x^2 - \left(\sum f d_x\right)^2} \times \sqrt{N \sum f d_y^2 - \left(\sum f d_y\right)^2}} \]

\[ r = \frac{(7 \times 220) - (17 \times -40)}{\sqrt{691 \times 220 - (17)^2} \times \sqrt{310 \times 220 - (-40)^2}} = \frac{1540 + 680}{\sqrt{152020 - 289} \times \sqrt{68200 - 1600}} = \frac{2220}{\sqrt{151731} \times \sqrt{66600}} = \frac{2220}{389.53 \times 258.07} = \frac{2220}{100525} = 0.02208 \]

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - (0.02208)^2}{\sqrt{220}} = \frac{0.6745 \times (1 - 0.00049)}{14.83} = \frac{0.6742}{14.83} = 0.0455 \]

Since r < PE, the correlation is not significant.

ILLUSTRATION = 10

Calculate from the following data:
a. The value of Karl Pearson's coefficient of correlation between salary in Rs and age in years.
b. Its probable error, and interpret the result.

Salary in Rs               Age in years
                 25   30   35   40   45   50   55   Total
Under 3000        -    -    2    4    2    3    2     13
3000-4999         -    -    5    6    6    2    3     22
5000-6999         1    6    8   10    4    1    4     34
7000-8999         8    8    6    8    5    5    3     43
9000-10999        3    5    6    5    3    4    -     26
11000-12999       -    5    4    3    -    -    -     12
Total            12   24   31   36   20   15   12    150


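SOLUTION

dx = (x - 5999.5)/2000, dy = (y - 40)/5

Each cell shows the frequency f, with the product f·dx·dy in brackets.

                          y:       25        30        35        40        45        50        55
x               MV     dx \ dy:    -3        -2        -1         0         1         2         3      f    fdx   fdx²  fdxdy
1000-2999      1999.5    -2         -         -       2 (4)     4 (0)     2 (-4)    3 (-12)   2 (-12)   13    -26     52    -24
3000-4999      3999.5    -1         -         -       5 (5)     6 (0)     6 (-6)    2 (-4)    3 (-9)    22    -22     22    -14
5000-6999      5999.5     0       1 (0)     6 (0)     8 (0)    10 (0)     4 (0)     1 (0)     4 (0)     34      0      0      0
7000-8999      7999.5     1       8 (-24)   8 (-16)   6 (-6)    8 (0)     5 (5)     5 (10)    3 (9)     43     43     43    -22
9000-10999     9999.5     2       3 (-18)   5 (-20)   6 (-12)   5 (0)     3 (6)     4 (16)      -       26     52    104    -28
11000-12999   11999.5     3         -       5 (-30)   4 (-12)   3 (0)       -         -         -       12     36    108    -42
Total f                            12        24        31        36        20        15        12      150     83    329   -130
fdy                               -36       -48       -31         0        20        30        36      -29
fdy²                              108        96        31         0        20        60       108      423
fdxdy                             -42       -66       -21         0         1        10       -12     -130

\[ r = \frac{N \sum f d_x d_y - \sum f d_x \sum f d_y}{\sqrt{N \sum f d_x^2 - \left(\sum f d_x\right)^2} \times \sqrt{N \sum f d_y^2 - \left(\sum f d_y\right)^2}} \]

\[ r = \frac{(-130 \times 150) - (83 \times -29)}{\sqrt{329 \times 150 - (83)^2} \times \sqrt{423 \times 150 - (-29)^2}} = \frac{-19500 + 2407}{\sqrt{49350 - 6889} \times \sqrt{63450 - 841}} = \frac{-17093}{\sqrt{42461} \times \sqrt{62609}} = \frac{-17093}{206.06 \times 250.22} = \frac{-17093}{51560.2} = -0.3315 \]

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - (-0.3315)^2}{\sqrt{150}} = \frac{0.6745 \times (1 - 0.1099)}{12.247} = \frac{0.6745 \times 0.8901}{12.247} = 0.0490 \]

There exists a negative correlation between salary and age.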

ILLUSTRATION = 11
Calculate from the following data the value of Karl Pearson's coefficient of correlation between sales revenue and advertisement expenditure. Also calculate its probable error and interpret the result.

Sales revenue in      Advertisement expenditure in 000 of Rs
lakhs of Rs       25-30   20-25   15-20   10-15   5-10   Total
Under 125           -       2       5       3      -       10
125-174.9           5       6      10       3      3       27
175-224.9           4       4      20       4      4       36
225-274.9           6       4       9       2      2       23
275 & above         -       2       1       -      1        4
Total              15      18      45      12     10      100

SOLUTION

dx = (x - 199.95)/50, dy = (y - 17.5)/5

For x: A = 199.95, C = 50. For y: A = 17.5, C = 5. Each cell shows the frequency f, with the product f·dx·dy in brackets.

                        y:     25-30     20-25     15-20     10-15      5-10
                        MV:     27.5      22.5      17.5      12.5       7.5
x              MV    dx \ dy:      2         1         0        -1        -2      f    fdx   fdx²  fdxdy
75-124.9      99.95    -2         -       2 (-4)    5 (0)     3 (6)       -       10    -20     40      2
125-174.9    149.95    -1       5 (-10)   6 (-6)   10 (0)     3 (3)     3 (6)     27    -27     27     -7
175-224.9    199.95     0       4 (0)     4 (0)    20 (0)     4 (0)     4 (0)     36      0      0      0
225-274.9    249.95     1       6 (12)    4 (4)     9 (0)     2 (-2)    2 (-4)    23     23     23     10
275-324.9    299.95     2         -       2 (4)     1 (0)       -       1 (-4)     4      8     16      0
Total f                          15        18        45        12        10      100    -16    106      5
fdy                              30        18         0       -12       -20       16
fdy²                             60        18         0        12        40      130
fdxdy                             2        -2         0         7        -2        5

\[ r = \frac{N \sum f d_x d_y - \sum f d_x \sum f d_y}{\sqrt{N \sum f d_x^2 - \left(\sum f d_x\right)^2} \times \sqrt{N \sum f d_y^2 - \left(\sum f d_y\right)^2}} \]

\[ r = \frac{(5 \times 100) - (-16 \times 16)}{\sqrt{106 \times 100 - (-16)^2} \times \sqrt{130 \times 100 - (16)^2}} = \frac{500 + 256}{\sqrt{10600 - 256} \times \sqrt{13000 - 256}} = \frac{756}{\sqrt{10344} \times \sqrt{12744}} = \frac{756}{101.70 \times 112.89} = \frac{756}{11481.4} = 0.0658 \]

\[ PE = 0.6745 \times \frac{1 - r^2}{\sqrt{N}} = 0.6745 \times \frac{1 - (0.0658)^2}{\sqrt{100}} = \frac{0.6745 \times (1 - 0.0043)}{10} = \frac{0.6716}{10} = 0.0672 \]

Since r < PE, there is no significant correlation.

TERMINAL QUESTIONS (5, 10 & 15 MARKS)

1. What is meant by correlation? What is it intended to measure?
2. What is a scatter diagram? How does it help us in studying correlation?
3. Briefly explain:
a. Positive and negative correlation
b. Linear and non-linear correlation
4. How do you interpret the value of the coefficient of correlation?
5. What is probable error? State its uses.

PRACTICAL PROBLEMS

6. Calculate the value of the coefficient of correlation between price and supply. What is the probable error?

Price     8  10  15  17  20  22  24  25
Supply   25  30  32  35  37  40  42  45

[Answer: r = 0.98, P.E. = 0.009]

7. Compute Karl Pearson's coefficient of correlation between per capita national income and per capita consumer expenditure from the data given below.

Per capita national income        249  251  248  252  258  269  271  272  280  275
Per capita consumer expenditure   237  238  236  240  245  255  254  252  258  251

[Answer: r = 0.9675, PE = 0.01387]

8. Calculate the coefficient of correlation from the following data, and calculate its probable error.

X   30  60  30  66  72  24  18  12  42  06
Y   06  36  12  48  30  06  24  36  30  12

[Answer: r = 0.575, PE = 0.14277]

9. Calculate Karl Pearson's coefficient of correlation between advertisement cost and sales as per the data given.

Advertisement cost in 000 of Rs   39  65  62  90  82  75  25  98  36  78
Sales in lakhs of Rs              47  53  58  86  62  68  60  91  51  84

[Answer: r = 0.7804, PE = 0.08345]

10. Calculate Karl Pearson's coefficient of correlation from the data given.

X   368  384  385  361  347  384  395  403  400  385
Y    22   21   24   20   22   26   26   29   28   27

[Answer: r = 0.79]

11. Compute the coefficient of correlation between dividends and prices of securities as given below.

Security prices            Annual dividends in Rs
in Rs            6-8   8-10   10-12   12-14   14-16   16-18   Total
130-140           -     -       1       3       4       2      10
120-130           -     1       3       3       3       1      11
110-120           -     2       3       2       -       -       7
100-110           -     2       3       2       -       -       7
90-100            2     2       1       1       -       -       6
80-90             3     1       1       -       -       -       5
70-80             2     1       -       -       -       -       3
Total             7     8      11      12       9       3      50

[Answer: r = 0.71, PE = 0.0473]

12. Calculate Karl Pearson's coefficient of correlation from the following bivariate frequency distribution, and also calculate the probable error.

Age of husbands             Age of wives in years
in years          23-30   30-37   37-44   44-51   51-58   Total
18-25               9       3       -       -       -       12
25-32               -      20      10       4       -       34
32-39               -       -      12       5       3       20
39-46               -       -       8       7       5       20
46-53               -       -       -      10       4       14
Total               9      23      30      26      12      100

[Answer: r = 0.596, PE = 0.04349]

13. Calculate Karl Pearson's coefficient of correlation between income and food expenditure. Also calculate the P.E.

Food expenditure                Family income in Rs
in percentage      200-300   300-400   400-500   500-600   600-700   Total
10-15                 -         -         -         3         7        10
15-20                 -         4         9         4         3        20
20-25                 7         6        12         5         -        30
25-30                 3        10        19         8         -        40
Total                10        20        40        20        10       100

[Answer: r = -0.44]

14. Calculate the coefficient of correlation from the following data. Also calculate the probable error.

x \ y         44.5-49.5   49.5-54.5   54.5-59.5   59.5-64.5   64.5-69.5   Total
54.5-59.5         3           4           2           -           -          9
59.5-64.5         4           8           8           2           -         22
64.5-69.5         -           7          12           8           4         31
69.5-74.5         -           3           8           8           5         24
74.5-79.5         -           -           3           5           6         14
Total             7          22          33          23          15        100

[Answer: r = 0.60734, P.E. = 0.04256]

15. The following table gives the number of students having different heights and weights. Find the coefficient of correlation and the probable error.

Height in                  Weight in pounds
inches        80-90   90-100   100-110   110-120   120-130   Total
50-55           1        3        7         5         2        18
55-60           2        4       10         7         4        27
60-65           1        5       12        10         7        35
65-70           -        3        8         6         3        20
Total           4       15       37        28        16       100

[Answer: r = 0.0945, PE = 0.0668]