
Module 8


Unit-8 REGRESSION ANALYSIS

INTRODUCTION

So far we have studied correlation analysis, which measures the direction and strength of the relationship between two variables. Once the correlation between two variables has been established, one may be interested in estimating the value of one variable from the value of the other. The statistical method that lets us estimate or predict the unknown value of one variable from the known value of another variable is called regression.

Regression follows correlation: once the relationship between the two variables is established, regression analysis proceeds to the estimation of probable values.

Sir Francis Galton, a British biometrician, introduced the concept of regression in 1877 while studying the correlation between the heights of sons and their fathers. He concluded: “Tall fathers tend to have tall sons and short fathers short sons. The average height of the sons of a group of tall fathers is less than that of the fathers, while the average height of the sons of a group of short fathers is greater than that of the fathers.”

This means that the coming generations of tall or short parents tend to step back towards the average height of the population. Nowadays statisticians prefer to use the term regression in the sense of estimation, and it is an important statistical tool in economics and business.

Meaning

Regression means returning or stepping back to the average value. In statistics, the term regression simply means the average relationship between variables. With the help of a regression technique we can predict or estimate the value of the dependent variable from given values of the related independent variable.

Regression studies the nature of the relationship in order to estimate the most probable values. It establishes a functional relationship between the independent and dependent variables.

Definition

According to Blair, “Regression is the measure of the average relationship between two or more variables in terms of the original units of the data.”

According to Taro Yamane, “One of the most frequently used techniques in economics and business research, to find a relation between two or more variables that are related causally, is regression analysis.”

According to Wallis and Roberts, “It is often more important to find out what the relation actually is, in order to estimate or predict one variable; and the statistical technique appropriate in such a case is called regression analysis.”

USES OF REGRESSION ANALYSIS

Regression analysis is of great practical use, even more than correlation analysis. Some of its uses are the following:

1. Regression analysis helps in establishing a functional relationship between two or more variables. Once this is established, it can be used for various advanced analytical purposes.
2. With the use of electronic machines and computers, the tedium of calculating regression equations, particularly those expressing multiple and non-linear relationships, has been reduced a great deal.
3. Since most problems of economic analysis are based on cause and effect relationships, regression analysis is a highly valuable tool in economic and business research.
4. Regression analysis is very useful for prediction. Once a functional relationship is known, the value of the dependent variable can be predicted from a given value of the independent variable.

CORRELATION AND REGRESSION

Both techniques are directed towards a common purpose, establishing the degree and direction of the relationship between two or more variables, but their methods differ. The choice of one or the other depends on the purpose. In spite of certain similarities, there are some basic differences between the two approaches, which are summarized below:


CORRELATION

1. Correlation literally means related or sympathetic movement between variables.
2. There is a sort of interdependence, which is mutual.
3. There is no cause and effect relationship. It only shows the existence of some association in the movement of variables.
4. The correlation may be spurious if the sympathetic movement is on account of the influence of an outside variable which has no relevance.
5. It is a relative measure showing association between variables.
6. It is used only for testing and verification of the relationship. It renders only limited information.
7. It is not very useful for further mathematical treatment.

REGRESSION

1. Regression literally means return to the normal, that is, to the average relationship.
2. It establishes a functional relationship, which is mathematical, showing the dependence of one variable on the other.
3. It may have a cause and effect relationship.
4. It is a mathematical relationship, which should be interpreted suitably.
5. It is an absolute measure of relationship.
6. Besides verification, it can also be used for estimation and prediction. It renders more comprehensive information.
7. It is very useful for further mathematical treatment.

METHODS OF REGRESSION ANALYSIS

There are two methods:

1. Graphic method (not included in the syllabus)
2. Algebraic method

The algebraic method for simple linear regression can be broadly divided into the following:

A. Regression lines
B. Regression equations
C. Regression coefficients

A. REGRESSION LINES

In graphical terms, a regression line is a straight line fitted to the data by the method of least squares. It indicates the best probable mean value of one variable corresponding to a given value of the other. Since a regression line is the line of best fit for estimating one particular variable, it cannot be used conversely. Therefore two regression lines are always constructed for the relationship between two variables X and Y: one shows the regression of X upon Y and the other shows the regression of Y upon X.

The regression line of X on Y gives the most probable values of X for any given value of Y. In the same manner, the regression line of Y on X gives the most probable values of Y for any given value of X.

REGRESSION EQUATIONS

A regression equation is the algebraic expression of a regression line. The algebraic treatment covers the regression equations and the regression coefficients. As there are two regression lines, there are two regression equations for the two variables X and Y: the regression equation of X on Y and the regression equation of Y on X.

I. Regression equation of X on Y

$$(X - \bar{X}) = r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y})$$

II. Regression equation of Y on X

$$(Y - \bar{Y}) = r \frac{\sigma_y}{\sigma_x} (X - \bar{X})$$

Here $\bar{X}$ and $\bar{Y}$ are the means, $\sigma_x$ and $\sigma_y$ the standard deviations of the two series, and $r$ the coefficient of correlation.

Application of Regression Equations when all required values are given

ILLUSTRATION = 01

From the following results, obtain the two regression equations and estimate the yield of crops when the rainfall is 29 cm and the rainfall when the yield is 600 kg.

| | Y: Yield (in kg) | X: Rainfall (in cm) |
|---|---|---|
| Mean | 508.4 | 26.7 |
| S.D. | 36.8 | 4.6 |

Coefficient of correlation between yield and rainfall = 0.52

Solution: To estimate the yield of crops, we use the regression equation of Y on X.

$$
\begin{aligned}
(Y - \bar{Y}) &= r \frac{\sigma_y}{\sigma_x} (X - \bar{X}) \\
Y - 508.4 &= 0.52 \times \frac{36.8}{4.6} (X - 26.7) \\
Y - 508.4 &= 4.16 (X - 26.7) \\
Y - 508.4 &= 4.16X - 111.072 \\
Y &= 4.16X + 397.328
\end{aligned}
$$

When X = 29: Y = 4.16 × 29 + 397.328 = 120.64 + 397.328 = 517.968 kg.

Similarly, to estimate the rainfall, we use the regression equation of X on Y.

$$
\begin{aligned}
(X - \bar{X}) &= r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y}) \\
X - 26.7 &= 0.52 \times \frac{4.6}{36.8} (Y - 508.4) \\
X - 26.7 &= 0.065 (Y - 508.4) \\
X - 26.7 &= 0.065Y - 33.046 \\
X &= 0.065Y - 6.346
\end{aligned}
$$

When Y = 600 kg: X = 0.065 × 600 - 6.346 = 39 - 6.346 = 32.654 cm.
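The calculation in Illustration 01 can be sketched in a few lines of Python. This is only a minimal sketch: the function name `regression_lines` and its return layout are illustrative choices, not part of the module, and it assumes the means, standard deviations and r are already known.

```python
def regression_lines(mean_x, mean_y, sd_x, sd_y, r):
    """Return (b_yx, a_yx, b_xy, a_xy) such that
    Y = b_yx * X + a_yx  is the regression of Y on X, and
    X = b_xy * Y + a_xy  is the regression of X on Y."""
    b_yx = r * sd_y / sd_x            # slope of Y on X
    b_xy = r * sd_x / sd_y            # slope of X on Y
    a_yx = mean_y - b_yx * mean_x     # both lines pass through the means
    a_xy = mean_x - b_xy * mean_y
    return b_yx, a_yx, b_xy, a_xy


# Illustration 01: X = rainfall (cm), Y = yield (kg)
b_yx, a_yx, b_xy, a_xy = regression_lines(26.7, 508.4, 4.6, 36.8, 0.52)
yield_at_29 = b_yx * 29 + a_yx     # estimated yield when rainfall is 29 cm
rain_at_600 = b_xy * 600 + a_xy    # estimated rainfall when yield is 600 kg
print(round(yield_at_29, 3), round(rain_at_600, 3))
```

The same function applies to every illustration in which the means, standard deviations (or the square roots of the variances) and r are given directly.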


ILLUSTRATION = 02

Find the regression equation showing the regression of capacity utilization on production from the following data.

| | Average | Standard Deviation |
|---|---|---|
| Production (in lakh units) | 35.6 | 10.5 |
| Capacity utilization (in percentage) | 84.8 | 8.5 |

Coefficient of correlation = 0.62

Estimate the production when the capacity utilization is 70%.

SOLUTION: Let production and capacity utilization be denoted by X and Y respectively. Then we are given $\bar{X} = 35.6$, $\bar{Y} = 84.8$, $\sigma_x = 10.5$, $\sigma_y = 8.5$, $r = 0.62$.

To estimate production we have to use the X on Y regression equation.

$$
\begin{aligned}
(X - \bar{X}) &= r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y}) \\
X - 35.6 &= 0.62 \times \frac{10.5}{8.5} (Y - 84.8) \\
X - 35.6 &= 0.7658 (Y - 84.8) \\
X - 35.6 &= 0.7658Y - 64.94 \\
X &= 0.7658Y - 29.34
\end{aligned}
$$

When Y = 70%: X = 0.7658 × 70 - 29.34 = 53.606 - 29.34 = 24.266 lakh units.

ILLUSTRATION = 03

Karl Pearson's coefficient of correlation between the ages of brothers and sisters in a community was found to be 0.8. The average of the brothers' ages was 25 years and that of the sisters' ages was 22 years. Their standard deviations were 4 and 5 respectively. Find

a. The expected age of the brother when the sister's age is 12 years.
b. The expected age of the sister when the brother's age is 33 years.

Solution:

| | Brother (X) | Sister (Y) |
|---|---|---|
| Mean age | 25 years | 22 years |
| Standard deviation | 4 | 5 |

Coefficient of correlation = 0.8

To estimate the brother's age, we have to use the X on Y regression equation. X = ? when Y = 12.

$$
\begin{aligned}
(X - \bar{X}) &= r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y}) \\
X - 25 &= 0.8 \times \frac{4}{5} (Y - 22) \\
X - 25 &= 0.64 (Y - 22) \\
X - 25 &= 0.64Y - 14.08 \\
X &= 0.64Y + 10.92
\end{aligned}
$$

When Y = 12: X = 0.64 × 12 + 10.92 = 18.6 years, the brother's age.

To estimate the sister's age, we have to use the Y on X regression equation. Y = ? when X = 33 years.

$$
\begin{aligned}
(Y - \bar{Y}) &= r \frac{\sigma_y}{\sigma_x} (X - \bar{X}) \\
Y - 22 &= 0.8 \times \frac{5}{4} (X - 25) \\
Y - 22 &= 1.0 (X - 25) \\
Y &= X - 25 + 22 \\
Y &= X - 3
\end{aligned}
$$

When X = 33: Y = 33 - 3 = 30 years, the sister's age.

ILLUSTRATION = 04

Given the following data, estimate

1. The value of Y when X = 70
2. The value of X when Y = 90

| | X-Series | Y-Series |
|---|---|---|
| Mean | 18 | 100 |
| Standard deviation | 14 | 20 |

Coefficient of correlation = 0.8

SOLUTION

I. Y = ? when X = 70. Use the Y on X regression equation.

$$
\begin{aligned}
(Y - \bar{Y}) &= r \frac{\sigma_y}{\sigma_x} (X - \bar{X}) \\
Y - 100 &= 0.8 \times \frac{20}{14} (X - 18) \\
Y - 100 &= 1.143 (X - 18) \\
Y - 100 &= 1.143X - 20.574 \\
Y &= 1.143X + 79.426
\end{aligned}
$$

When X = 70: Y = 1.143 × 70 + 79.426 = 80.01 + 79.426 = 159.436.

II. X = ? when Y = 90. Use the X on Y regression equation.

$$
\begin{aligned}
(X - \bar{X}) &= r \frac{\sigma_x}{\sigma_y} (Y - \bar{Y}) \\
X - 18 &= 0.8 \times \frac{14}{20} (Y - 100) \\
X - 18 &= 0.56 (Y - 100) \\
X - 18 &= 0.56Y - 56 \\
X &= 0.56Y - 38
\end{aligned}
$$

When Y = 90: X = 0.56 × 90 - 38 = 50.4 - 38 = 12.4.

ILLUSTRATION = 05

To study the relationship between expenditure on accommodation (X) and expenditure on food (Y), an enquiry into 50 families gave the following result:

$\sum X = 8500$, $\sum Y = 9600$, $\sigma_x = 60$, $\sigma_y = 20$, $r = 0.60$

Estimate the expenditure on food when expenditure on accommodation is Rs. 200.

SOLUTION

$$\bar{X} = \frac{\sum X}{n} = \frac{8500}{50} = 170, \qquad \bar{Y} = \frac{\sum Y}{n} = \frac{9600}{50} = 192$$

To estimate expenditure on food, we should use the Y on X regression equation.

$$
\begin{aligned}
(Y - \bar{Y}) &= r \frac{\sigma_y}{\sigma_x} (X - \bar{X}) \\
Y - 192 &= 0.6 \times \frac{20}{60} (X - 170) \\
Y - 192 &= 0.2 (X - 170) \\
Y - 192 &= 0.2X - 34 \\
Y &= 0.2X + 158
\end{aligned}
$$

When X = 200: Y = 0.2 × 200 + 158 = 40 + 158 = Rs. 198.

Rs. 198 is required to be spent on food.

ILLUSTRATION = 06

Obtain the two regression equations from the following:

| | X-Series | Y-Series |
|---|---|---|
| Mean | 20 | 25 |
| Variance | 4 | 9 |

Coefficient of correlation = 0.75

SOLUTION

Since the standard deviation is the square root of the variance, $\sigma_x = \sqrt{4} = 2$ and $\sigma_y = \sqrt{9} = 3$.

X on Y regression equation ($b_{xy}$ = regression coefficient of X on Y):

$$
\begin{aligned}
b_{xy} &= r \frac{\sigma_x}{\sigma_y} = 0.75 \times \frac{2}{3} = 0.5 \\
(X - \bar{X}) &= b_{xy} (Y - \bar{Y}) \\
X - 20 &= 0.5 (Y - 25) \\
X - 20 &= 0.5Y - 12.5 \\
X &= 0.5Y + 7.5
\end{aligned}
$$

Y on X regression equation ($b_{yx}$ = regression coefficient of Y on X):

$$
\begin{aligned}
b_{yx} &= r \frac{\sigma_y}{\sigma_x} = 0.75 \times \frac{3}{2} = 1.125 \\
(Y - \bar{Y}) &= b_{yx} (X - \bar{X}) \\
Y - 25 &= 1.125 (X - 20) \\
Y - 25 &= 1.125X - 22.5 \\
Y &= 1.125X + 2.5
\end{aligned}
$$

ILLUSTRATION = 07

You are given the following data.

| | X-Series | Y-Series |
|---|---|---|
| Mean | 47 | 96 |
| Variance | 64 | 81 |

Coefficient of correlation = 0.36

Calculate Y when X is 50, and X when Y is 88.

SOLUTION

$\sigma_x = \sqrt{64} = 8$ and $\sigma_y = \sqrt{81} = 9$.

X on Y regression equation:

$$
\begin{aligned}
b_{xy} &= r \frac{\sigma_x}{\sigma_y} = 0.36 \times \frac{8}{9} = 0.32 \\
X - 47 &= 0.32 (Y - 96) \\
X - 47 &= 0.32Y - 30.72 \\
X &= 0.32Y + 16.28
\end{aligned}
$$

When Y = 88: X = 0.32 × 88 + 16.28 = 28.16 + 16.28 = 44.44.

Y on X regression equation:

$$
\begin{aligned}
b_{yx} &= r \frac{\sigma_y}{\sigma_x} = 0.36 \times \frac{9}{8} = 0.405 \\
Y - 96 &= 0.405 (X - 47) \\
Y - 96 &= 0.405X - 19.035 \\
Y &= 0.405X + 76.965
\end{aligned}
$$

When X = 50: Y = 0.405 × 50 + 76.965 = 20.25 + 76.965 = 97.215.

ILLUSTRATION = 08

The following results for heights and weights of 100 men were calculated.

| | Mean | Standard Deviation |
|---|---|---|
| Weights | 150 lbs | 20 lbs |
| Heights | 68" | 2.5" |

Coefficient of correlation = 0.60

Find an estimate of

1. The weight of a man whose height is 5' (5' = 60")
2. The height of a man whose weight is 200 lbs

SOLUTION

Let X = weight and Y = height.

X on Y regression equation:

$$
\begin{aligned}
(X - \bar{X}) &= b_{xy} (Y - \bar{Y}) \\
X - 150 &= 0.6 \times \frac{20}{2.5} (Y - 68) \\
X - 150 &= 4.8 (Y - 68) \\
X - 150 &= 4.8Y - 326.4 \\
X &= 4.8Y - 176.4
\end{aligned}
$$

When Y = 60": X = 4.8 × 60 - 176.4 = 288 - 176.4 = 111.6 lbs.

Y on X regression equation:

$$
\begin{aligned}
(Y - \bar{Y}) &= b_{yx} (X - \bar{X}) \\
Y - 68 &= 0.6 \times \frac{2.5}{20} (X - 150) \\
Y - 68 &= 0.075 (X - 150) \\
Y - 68 &= 0.075X - 11.25 \\
Y &= 0.075X + 56.75
\end{aligned}
$$

When X = 200 lbs: Y = 0.075 × 200 + 56.75 = 15 + 56.75 = 71.75".

REGRESSION COEFFICIENTS

The regression coefficient is denoted by ‘b’. There are two regression equations and therefore two regression coefficients. A regression coefficient measures the change in one series corresponding to a unit change in the other series.

The regression coefficient of X on Y,

$$b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{n \sum d_y^2 - (\sum d_y)^2},$$

gives the value by which the X-variable changes for a unit change in the value of the Y-variable.

Similarly, the regression coefficient of Y on X,

$$b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{n \sum d_x^2 - (\sum d_x)^2},$$

refers to the value by which the Y-variable changes for a unit change in the X-variable. Here $d_x$ and $d_y$ are the deviations of X and Y from their assumed means.

These two coefficients measure the change in the dependent variable corresponding to a unit change in the independent variable. They also help in the direct calculation of the coefficient of correlation. The square root of the product of the two regression coefficients gives the value of the coefficient of correlation:

$$b_{xy} \times b_{yx} = r \frac{\sigma_x}{\sigma_y} \times r \frac{\sigma_y}{\sigma_x} = r^2, \qquad r = \pm\sqrt{b_{xy} \times b_{yx}}$$

The sign of r is the common sign of the two regression coefficients.
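The relation between the two regression coefficients and r can be sketched as a small Python helper. The function name `correlation_from_coefficients` and the two consistency checks are assumptions added for illustration; the module itself only states that r is the square root of the product of the coefficients.

```python
import math


def correlation_from_coefficients(b_xy, b_yx):
    """Return r = +/- sqrt(b_xy * b_yx), carrying the common sign
    of the two regression coefficients."""
    product = b_xy * b_yx
    if product < 0:
        # Both coefficients equal r times a positive ratio of standard
        # deviations, so they can never have opposite signs.
        raise ValueError("regression coefficients must have the same sign")
    if product > 1:
        # product = r**2, which cannot exceed 1
        raise ValueError("product of regression coefficients cannot exceed 1")
    r = math.sqrt(product)
    return -r if (b_xy < 0 or b_yx < 0) else r


# Coefficients 0.5 and 1.125 (Illustration 06) should give back r = 0.75
print(correlation_from_coefficients(0.5, 1.125))
```

With two negative coefficients the helper returns a negative r, which matches the rule that r takes the sign of the regression coefficients.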


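CALCULATION OF REGRESSION COEFFICIENTS AND MAKING ESTIMATION OF UNKNOWN VALUE

INDIVIDUAL SERIES

When actual data is given and deviations are taken from an assumed mean

ILLUSTRATION = 09

From the data given below find out:

a. Regression coefficients
b. Regression equations
c. Estimate the age when B.P. is 130
d. Estimate the B.P. when age is 50 years
e. Find the coefficient of correlation through the regression coefficients.

| Age | 56 | 42 | 72 | 36 | 63 | 47 | 55 | 49 | 38 | 42 | 68 | 60 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| B.P. | 147 | 125 | 160 | 118 | 149 | 128 | 150 | 145 | 115 | 140 | 152 | 155 |

SOLUTION

| Age X | dx = X - 47 | dx² | B.P. Y | dy = Y - 128 | dy² | dxdy |
|---|---|---|---|---|---|---|
| 56 | 9 | 81 | 147 | 19 | 361 | 171 |
| 42 | -5 | 25 | 125 | -3 | 9 | 15 |
| 72 | 25 | 625 | 160 | 32 | 1024 | 800 |
| 36 | -11 | 121 | 118 | -10 | 100 | 110 |
| 63 | 16 | 256 | 149 | 21 | 441 | 336 |
| 47 | 0 | 0 | 128 | 0 | 0 | 0 |
| 55 | 8 | 64 | 150 | 22 | 484 | 176 |
| 49 | 2 | 4 | 145 | 17 | 289 | 34 |
| 38 | -9 | 81 | 115 | -13 | 169 | 117 |
| 42 | -5 | 25 | 140 | 12 | 144 | -60 |
| 68 | 21 | 441 | 152 | 24 | 576 | 504 |
| 60 | 13 | 169 | 155 | 27 | 729 | 351 |
| N = 12 | ∑dx = 64 | ∑dx² = 1892 | | ∑dy = 148 | ∑dy² = 4326 | ∑dxdy = 2554 |

$$\bar{X} = A + \frac{\sum d_x}{N} \times C = 47 + \frac{64}{12} \times 1 = 52.33, \qquad \bar{Y} = A + \frac{\sum d_y}{N} \times C = 128 + \frac{148}{12} \times 1 = 128 + 12.33 = 140.33$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{n \sum d_y^2 - (\sum d_y)^2} = \frac{2554 \times 12 - 64 \times 148}{4326 \times 12 - (148)^2} = \frac{30648 - 9472}{51912 - 21904} = \frac{21176}{30008} = 0.7057$$

Regression coefficient of Y on X:

$$b_{yx} = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{n \sum d_x^2 - (\sum d_x)^2} = \frac{2554 \times 12 - 64 \times 148}{1892 \times 12 - (64)^2} = \frac{21176}{22704 - 4096} = \frac{21176}{18608} = 1.138$$

X on Y regression equation:

$$
\begin{aligned}
(X - \bar{X}) &= b_{xy} (Y - \bar{Y}) \\
X - 52.33 &= 0.7057 (Y - 140.33) \\
X - 52.33 &= 0.7057Y - 99.031 \\
X &= 0.7057Y - 46.701
\end{aligned}
$$

Estimation of age (X) when B.P. (Y) is 130: X = 0.7057 × 130 - 46.701 = 91.741 - 46.701 = 45.04 years.

Y on X regression equation:

$$
\begin{aligned}
(Y - \bar{Y}) &= b_{yx} (X - \bar{X}) \\
Y - 140.33 &= 1.138 (X - 52.33) \\
Y - 140.33 &= 1.138X - 59.55 \\
Y &= 1.138X + 80.78
\end{aligned}
$$

Estimation of B.P. (Y) when age (X) is 50 years: Y = 1.138 × 50 + 80.78 = 56.9 + 80.78 = 137.68.

Coefficient of correlation $= \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.7057 \times 1.138} = 0.896$

ILLUSTRATION = 10

From the following data, obtain the two regression equations. Also calculate the coefficient of correlation based on the regression coefficients.

| Sales: X | 91 | 97 | 108 | 121 | 67 | 124 | 51 | 73 | 111 | 57 |
|---|---|---|---|---|---|---|---|---|---|---|
| Purchases: Y | 71 | 75 | 69 | 97 | 70 | 91 | 39 | 61 | 80 | 47 |

SOLUTION

| X | dx = X - 67 | dx² | Y | dy = Y - 70 | dy² | dxdy |
|---|---|---|---|---|---|---|
| 91 | 24 | 576 | 71 | 1 | 1 | 24 |
| 97 | 30 | 900 | 75 | 5 | 25 | 150 |
| 108 | 41 | 1681 | 69 | -1 | 1 | -41 |
| 121 | 54 | 2916 | 97 | 27 | 729 | 1458 |
| 67 | 0 | 0 | 70 | 0 | 0 | 0 |
| 124 | 57 | 3249 | 91 | 21 | 441 | 1197 |
| 51 | -16 | 256 | 39 | -31 | 961 | 496 |
| 73 | 6 | 36 | 61 | -9 | 81 | -54 |
| 111 | 44 | 1936 | 80 | 10 | 100 | 440 |
| 57 | -10 | 100 | 47 | -23 | 529 | 230 |
| N = 10 | ∑dx = 230 | ∑dx² = 11650 | | ∑dy = 0 | ∑dy² = 2868 | ∑dxdy = 3900 |

$$\bar{X} = A + \frac{\sum d_x}{N} \times C = 67 + \frac{230}{10} \times 1 = 90, \qquad \bar{Y} = A + \frac{\sum d_y}{N} \times C = 70 + \frac{0}{10} \times 1 = 70$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{n \sum d_y^2 - (\sum d_y)^2} = \frac{3900 \times 10 - (230 \times 0)}{2868 \times 10 - (0)^2} = \frac{39000}{28680} = 1.360$$

Regression coefficient of Y on X:

$$b_{yx} = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{n \sum d_x^2 - (\sum d_x)^2} = \frac{3900 \times 10 - (230 \times 0)}{11650 \times 10 - (230)^2} = \frac{39000}{116500 - 52900} = \frac{39000}{63600} = 0.6132$$

X on Y regression equation:

$$
\begin{aligned}
(X - \bar{X}) &= b_{xy} (Y - \bar{Y}) \\
X - 90 &= 1.360 (Y - 70) \\
X - 90 &= 1.360Y - 95.2 \\
X &= 1.360Y - 5.2
\end{aligned}
$$

Y on X regression equation:

$$
\begin{aligned}
(Y - \bar{Y}) &= b_{yx} (X - \bar{X}) \\
Y - 70 &= 0.6132 (X - 90) \\
Y - 70 &= 0.6132X - 55.188 \\
Y &= 0.6132X + 14.812
\end{aligned}
$$

Coefficient of correlation $= \sqrt{b_{xy} \times b_{yx}} = \sqrt{1.360 \times 0.6132} = 0.913$

ILLUSTRATION = 11

The following data relate to the ages of husbands and wives. Obtain the two regression equations and estimate the most likely age of the husband when the age of the wife is 25 years.

| Ages of husbands | 25 | 28 | 30 | 32 | 35 | 36 | 38 | 39 | 42 | 55 |
|---|---|---|---|---|---|---|---|---|---|---|
| Ages of wives | 20 | 26 | 29 | 30 | 25 | 18 | 26 | 35 | 35 | 46 |

SOLUTION

| X | dx = X - 36 | dx² | Y | dy = Y - 29 | dy² | dxdy |
|---|---|---|---|---|---|---|
| 25 | -11 | 121 | 20 | -9 | 81 | 99 |
| 28 | -8 | 64 | 26 | -3 | 9 | 24 |
| 30 | -6 | 36 | 29 | 0 | 0 | 0 |
| 32 | -4 | 16 | 30 | 1 | 1 | -4 |
| 35 | -1 | 1 | 25 | -4 | 16 | 4 |
| 36 | 0 | 0 | 18 | -11 | 121 | 0 |
| 38 | 2 | 4 | 26 | -3 | 9 | -6 |
| 39 | 3 | 9 | 35 | 6 | 36 | 18 |
| 42 | 6 | 36 | 35 | 6 | 36 | 36 |
| 55 | 19 | 361 | 46 | 17 | 289 | 323 |
| N = 10 | ∑dx = 0 | ∑dx² = 648 | | ∑dy = 0 | ∑dy² = 598 | ∑dxdy = 494 |

$$\bar{X} = A + \frac{\sum d_x}{N} \times C = 36 + \frac{0}{10} \times 1 = 36, \qquad \bar{Y} = A + \frac{\sum d_y}{N} \times C = 29 + \frac{0}{10} \times 1 = 29$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{n \sum d_y^2 - (\sum d_y)^2} = \frac{494 \times 10 - (0 \times 0)}{598 \times 10 - (0)^2} = \frac{4940}{5980} = 0.8261$$

Regression coefficient of Y on X:

$$b_{yx} = \frac{n \sum d_x d_y - \sum d_x \sum d_y}{n \sum d_x^2 - (\sum d_x)^2} = \frac{494 \times 10 - (0 \times 0)}{648 \times 10 - (0)^2} = \frac{4940}{6480} = 0.7623$$

X on Y regression equation:

$$
\begin{aligned}
(X - \bar{X}) &= b_{xy} (Y - \bar{Y}) \\
X - 36 &= 0.8261 (Y - 29) \\
X - 36 &= 0.8261Y - 23.9569 \\
X &= 0.8261Y + 12.0431
\end{aligned}
$$

If the wife's age (Y) is 25: X = 0.8261 × 25 + 12.0431 = 20.6525 + 12.0431 = 32.6956. The husband's age is 32.6956 years.

Y on X regression equation:

$$
\begin{aligned}
(Y - \bar{Y}) &= b_{yx} (X - \bar{X}) \\
Y - 29 &= 0.7623 (X - 36) \\
Y - 29 &= 0.7623X - 27.4428 \\
Y &= 0.7623X + 1.5572
\end{aligned}
$$

Coefficient of correlation $r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.8261 \times 0.7623} = 0.7935$

ILLUSTRATION = 12

A panel of two judges P and Q graded dramatic performances by independently awarding marks as follows.

| Performance | 1 | 2 | 3 | 4 | 5 | 6 | 7 |
|---|---|---|---|---|---|---|---|
| Marks by ‘P’ | 46 | 42 | 44 | 40 | 43 | 41 | 45 |
| Marks by ‘Q’ | 40 | 38 | 36 | 35 | 39 | 37 | 41 |

The eighth performance, which judge Q could not attend, was awarded 37 marks by judge P. If judge Q had also been present, how many marks could he be expected to have awarded to the eighth performance?

SOLUTION

Let the marks awarded by judge P be represented by X and those awarded by judge Q by Y. We have to find the value of Y when X = 37. This can be done by finding the regression equation of Y on X.

Computation of the regression equation of Y on X

| X | dx = X - 43 | dx² | Y | dy = Y - 38 | dy² | dxdy |
|---|---|---|---|---|---|---|
| 46 | 3 | 9 | 40 | 2 | 4 | 6 |
| 42 | -1 | 1 | 38 | 0 | 0 | 0 |
| 44 | 1 | 1 | 36 | -2 | 4 | -2 |
| 40 | -3 | 9 | 35 | -3 | 9 | 9 |
| 43 | 0 | 0 | 39 | 1 | 1 | 0 |
| 41 | -2 | 4 | 37 | -1 | 1 | 2 |
| 45 | 2 | 4 | 41 | 3 | 9 | 6 |
| N = 7 | ∑dx = 0 | ∑dx² = 28 | | ∑dy = 0 | ∑dy² = 28 | ∑dxdy = 21 |

$$\bar{X} = A + \frac{\sum d_x}{N} \times C = 43 + \frac{0}{7} \times 1 = 43, \qquad \bar{Y} = A + \frac{\sum d_y}{N} \times C = 38 + \frac{0}{7} \times 1 = 38$$

Regression equation of Y on X:

$$
\begin{aligned}
b_{yx} &= \frac{n \sum d_x d_y - \sum d_x \sum d_y}{n \sum d_x^2 - (\sum d_x)^2} = \frac{21 \times 7 - 0}{28 \times 7 - 0} = \frac{147}{196} = 0.75 \\
(Y - \bar{Y}) &= b_{yx} (X - \bar{X}) \\
Y - 38 &= 0.75 (X - 43) \\
Y - 38 &= 0.75X - 32.25 \\
Y &= 0.75X + 5.75
\end{aligned}
$$

When X = 37: Y = 0.75 × 37 + 5.75 = 27.75 + 5.75 = 33.5.

If judge Q had been present, he would have awarded 33.5 marks.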
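The assumed-mean calculation used for the individual series above can be sketched in Python as follows. The function name `shortcut_regression` and the dictionary it returns are illustrative assumptions; the arithmetic follows the formulas for bxy and byx given in this unit, with the class interval C taken as 1.

```python
def shortcut_regression(xs, ys, assumed_x, assumed_y):
    """Regression coefficients and means of an individual series,
    using deviations dx, dy taken from assumed means."""
    n = len(xs)
    dx = [x - assumed_x for x in xs]
    dy = [y - assumed_y for y in ys]
    sum_dx, sum_dy = sum(dx), sum(dy)
    sum_dx2 = sum(d * d for d in dx)
    sum_dy2 = sum(d * d for d in dy)
    sum_dxdy = sum(a * b for a, b in zip(dx, dy))
    numerator = n * sum_dxdy - sum_dx * sum_dy
    return {
        "mean_x": assumed_x + sum_dx / n,
        "mean_y": assumed_y + sum_dy / n,
        "b_xy": numerator / (n * sum_dy2 - sum_dy ** 2),  # X on Y
        "b_yx": numerator / (n * sum_dx2 - sum_dx ** 2),  # Y on X
    }


# Marks by judge P (X) and judge Q (Y), assumed means 43 and 38
p_marks = [46, 42, 44, 40, 43, 41, 45]
q_marks = [40, 38, 36, 35, 39, 37, 41]
res = shortcut_regression(p_marks, q_marks, 43, 38)

# Estimate of Q's marks when P awards 37, from the Y on X equation
q_estimate = res["mean_y"] + res["b_yx"] * (37 - res["mean_x"])
print(res["b_yx"], q_estimate)
```

Because the result does not depend on which assumed means are chosen, the same function can be used to check a hand-computed table: a mistake in any dx² or dxdy entry shows up as a different coefficient.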


REGRESSION EQUATION IN A BIVARIATE GROUPED FREQUENCY DISTRIBUTION

The procedure is the same as the one followed in the case of individual series. The modified formulas are as under.

Regression coefficient of X on Y:

$$b_{xy} = r \frac{\sigma_x}{\sigma_y} = \frac{N \sum f d_x d_y - \sum f d_x \sum f d_y}{N \sum f d_y^2 - (\sum f d_y)^2} \times \frac{C \text{ of } x}{C \text{ of } y}$$

Regression coefficient of Y on X:

$$b_{yx} = r \frac{\sigma_y}{\sigma_x} = \frac{N \sum f d_x d_y - \sum f d_x \sum f d_y}{N \sum f d_x^2 - (\sum f d_x)^2} \times \frac{C \text{ of } y}{C \text{ of } x}$$

Coefficient of correlation $= \sqrt{b_{xy} \times b_{yx}}$

Here C of x and C of y are the class intervals of X and Y, and $d_x$, $d_y$ are step deviations of the mid-values from the assumed means.

ILLUSTRATION = 13

The following table gives the ages of husbands and wives for 50 newly married couples. Find the two regression lines. Also estimate A) the age of the husband when the wife is 20 and B) the age of the wife when the husband is 30.

| Age of wives \ Age of husbands | 20-25 | 25-30 | 30-35 | Total |
|---|---|---|---|---|
| 16-20 | 9 | 14 | - | 23 |
| 20-24 | 6 | 11 | 3 | 20 |
| 24-28 | - | - | 7 | 7 |
| Total | 15 | 25 | 10 | 50 |

SOLUTION

The class interval for the age of husband (X) is 5 and the class interval for the age of wife (Y) is 4. For X: A = 27.5, C = 5. For Y: A = 22, C = 4.

$$d_x = \frac{X - 27.5}{5}, \qquad d_y = \frac{Y - 22}{4}$$

Rows (age of wife, Y):

| Class | MV | dy | f | fdy | fdy² | fdxdy |
|---|---|---|---|---|---|---|
| 16-20 | 18 | -1 | 23 | -23 | 23 | 9 |
| 20-24 | 22 | 0 | 20 | 0 | 0 | 0 |
| 24-28 | 26 | 1 | 7 | 7 | 7 | 7 |
| Total | | | N = 50 | ∑fdy = -16 | ∑fdy² = 30 | ∑fdxdy = 16 |

Columns (age of husband, X):

| Class | MV | dx | f | fdx | fdx² | fdxdy |
|---|---|---|---|---|---|---|
| 20-25 | 22.5 | -1 | 15 | -15 | 15 | 9 |
| 25-30 | 27.5 | 0 | 25 | 0 | 0 | 0 |
| 30-35 | 32.5 | 1 | 10 | 10 | 10 | 7 |
| Total | | | N = 50 | ∑fdx = -5 | ∑fdx² = 25 | ∑fdxdy = 16 |

$$\bar{X} = A + \frac{\sum f d_x}{N} \times C = 27.5 + \frac{-5}{50} \times 5 = 27, \qquad \bar{Y} = A + \frac{\sum f d_y}{N} \times C = 22 + \frac{-16}{50} \times 4 = 22 - 1.28 = 20.72$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{16 \times 50 - (-5 \times -16)}{30 \times 50 - (-16)^2} \times \frac{5}{4} = \frac{800 - 80}{1500 - 256} \times \frac{5}{4} = \frac{720}{1244} \times \frac{5}{4} = \frac{3600}{4976} = 0.723$$

X on Y regression equation:

$$
\begin{aligned}
(X - \bar{X}) &= b_{xy} (Y - \bar{Y}) \\
X - 27 &= 0.723 (Y - 20.72) \\
X - 27 &= 0.723Y - 14.98 \\
X &= 0.723Y + 12.02
\end{aligned}
$$

Estimate of the husband's age when Y = 20: X = 0.723 × 20 + 12.02 = 26.48 years.

Regression coefficient of Y on X:

$$b_{yx} = \frac{16 \times 50 - (-5 \times -16)}{25 \times 50 - (-5)^2} \times \frac{4}{5} = \frac{800 - 80}{1250 - 25} \times \frac{4}{5} = \frac{720}{1225} \times \frac{4}{5} = \frac{2880}{6125} = 0.47$$

Y on X regression equation:

$$
\begin{aligned}
(Y - \bar{Y}) &= b_{yx} (X - \bar{X}) \\
Y - 20.72 &= 0.47 (X - 27) \\
Y - 20.72 &= 0.47X - 12.69 \\
Y &= 0.47X + 8.03
\end{aligned}
$$

Estimate of the wife's age when X = 30: Y = 0.47 × 30 + 8.03 = 14.10 + 8.03 = 22.13 years.

$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.723 \times 0.47} = 0.5829$

ILLUSTRATION = 14

The following are the marks obtained by 132 students in Test X and Test Y. Calculate

a) The regression coefficients
b) The two regression equations
c) The coefficient of correlation

| Y \ X | 30-40 | 40-50 | 50-60 | 60-70 | 70-80 | Total |
|---|---|---|---|---|---|---|
| 20-30 | 2 | 5 | 3 | - | - | 10 |
| 30-40 | 1 | 8 | 12 | 6 | - | 27 |
| 40-50 | - | 5 | 22 | 14 | 1 | 42 |
| 50-60 | - | 2 | 16 | 9 | 2 | 29 |
| 60-70 | - | 1 | 8 | 6 | 1 | 16 |
| 70-80 | - | - | 2 | 4 | 2 | 8 |
| Total | 3 | 21 | 63 | 39 | 6 | 132 |

SOLUTION

For X: A = 55, C = 10. For Y: A = 45, C = 10.

Rows (Test Y):

| Class | MV | dy | f | fdy | fdy² | fdxdy |
|---|---|---|---|---|---|---|
| 20-30 | 25 | -2 | 10 | -20 | 40 | 18 |
| 30-40 | 35 | -1 | 27 | -27 | 27 | 4 |
| 40-50 | 45 | 0 | 42 | 0 | 0 | 0 |
| 50-60 | 55 | 1 | 29 | 29 | 29 | 11 |
| 60-70 | 65 | 2 | 16 | 32 | 64 | 14 |
| 70-80 | 75 | 3 | 8 | 24 | 72 | 24 |
| Total | | | N = 132 | ∑fdy = 38 | ∑fdy² = 232 | ∑fdxdy = 71 |

Columns (Test X):

| Class | MV | dx | f | fdx | fdx² | fdxdy |
|---|---|---|---|---|---|---|
| 30-40 | 35 | -2 | 3 | -6 | 12 | 10 |
| 40-50 | 45 | -1 | 21 | -21 | 21 | 14 |
| 50-60 | 55 | 0 | 63 | 0 | 0 | 0 |
| 60-70 | 65 | 1 | 39 | 39 | 39 | 27 |
| 70-80 | 75 | 2 | 6 | 12 | 24 | 20 |
| Total | | | N = 132 | ∑fdx = 24 | ∑fdx² = 96 | ∑fdxdy = 71 |

$$\bar{X} = A + \frac{\sum f d_x}{N} \times C = 55 + \frac{24}{132} \times 10 = 55 + 1.82 = 56.82, \qquad \bar{Y} = A + \frac{\sum f d_y}{N} \times C = 45 + \frac{38}{132} \times 10 = 45 + 2.878 = 47.878$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{71 \times 132 - (24 \times 38)}{232 \times 132 - (38)^2} \times \frac{10}{10} = \frac{9372 - 912}{30624 - 1444} = \frac{8460}{29180} = 0.29$$

X on Y regression equation:

$$
\begin{aligned}
(X - \bar{X}) &= b_{xy} (Y - \bar{Y}) \\
X - 56.82 &= 0.29 (Y - 47.88) \\
X - 56.82 &= 0.29Y - 13.8852 \\
X &= 0.29Y + 42.93
\end{aligned}
$$

Regression coefficient of Y on X:

$$b_{yx} = \frac{71 \times 132 - (24 \times 38)}{96 \times 132 - (24)^2} \times \frac{10}{10} = \frac{8460}{12672 - 576} = \frac{8460}{12096} = 0.70$$

Y on X regression equation:

$$
\begin{aligned}
(Y - \bar{Y}) &= b_{yx} (X - \bar{X}) \\
Y - 47.88 &= 0.7 (X - 56.82) \\
Y - 47.88 &= 0.7X - 39.774 \\
Y &= 0.7X + 8.11
\end{aligned}
$$

Coefficient of correlation $= \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.29 \times 0.7} = 0.450$
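The grouped-frequency procedure above can be sketched in Python as well. The function `grouped_regression`, its argument order, and the convention that `freq[i][j]` holds the frequency of Y class i and X class j are assumptions made for this sketch; the formulas are the step-deviation formulas for bxy and byx stated in this section.

```python
def grouped_regression(freq, x_mids, y_mids, a_x, a_y, c_x, c_y):
    """Means and regression coefficients of a bivariate frequency table.
    freq[i][j] = frequency of the cell in Y class i and X class j;
    a_x, a_y are assumed means, c_x, c_y are class intervals."""
    dx = [(m - a_x) / c_x for m in x_mids]   # step deviations of X
    dy = [(m - a_y) / c_y for m in y_mids]   # step deviations of Y
    cells = [(f, dx[j], dy[i])
             for i, row in enumerate(freq) for j, f in enumerate(row)]
    n = sum(f for f, _, _ in cells)
    s_fdx = sum(f * u for f, u, _ in cells)
    s_fdy = sum(f * v for f, _, v in cells)
    s_fdx2 = sum(f * u * u for f, u, _ in cells)
    s_fdy2 = sum(f * v * v for f, _, v in cells)
    s_fdxdy = sum(f * u * v for f, u, v in cells)
    numerator = n * s_fdxdy - s_fdx * s_fdy
    b_xy = numerator / (n * s_fdy2 - s_fdy ** 2) * c_x / c_y
    b_yx = numerator / (n * s_fdx2 - s_fdx ** 2) * c_y / c_x
    mean_x = a_x + s_fdx / n * c_x
    mean_y = a_y + s_fdy / n * c_y
    return mean_x, mean_y, b_xy, b_yx


# Ages of 50 couples: rows are wives' classes (Y), columns husbands' (X)
table = [[9, 14, 0],
         [6, 11, 3],
         [0, 0, 7]]
mean_x, mean_y, b_xy, b_yx = grouped_regression(
    table, [22.5, 27.5, 32.5], [18, 22, 26], 27.5, 22, 5, 4)
print(round(mean_x, 2), round(mean_y, 2), round(b_xy, 3), round(b_yx, 3))
```

Empty cells of the table (shown as "-") are entered as 0, so they contribute nothing to any of the sums.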

ILLUSTRATION = 15

Following is the distribution of students according to their height and weight.

| Height in inches (X) \ Weight in lbs (Y) | 90-100 | 100-110 | 110-120 | 120-130 | TOTAL |
|---|---|---|---|---|---|
| 50-55 | 4 | 7 | 5 | 2 | 18 |
| 55-60 | 6 | 10 | 7 | 4 | 27 |
| 60-65 | 6 | 12 | 10 | 7 | 35 |
| 65-70 | 3 | 8 | 6 | 3 | 20 |
| TOTAL | 19 | 37 | 28 | 16 | 100 |

From the above,

a) Estimate the weight when height is 63 inches
b) Estimate the height when weight is 115 lbs
c) Calculate the coefficient of correlation

SOLUTION: Let X be height in inches and Y be weight in lbs. For X: A = 62.5, C = 5. For Y: A = 115, C = 10.

Rows (height, X):

| Class | MV | dx | f | fdx | fdx² | fdxdy |
|---|---|---|---|---|---|---|
| 50-55 | 52.5 | -2 | 18 | -36 | 72 | 26 |
| 55-60 | 57.5 | -1 | 27 | -27 | 27 | 18 |
| 60-65 | 62.5 | 0 | 35 | 0 | 0 | 0 |
| 65-70 | 67.5 | 1 | 20 | 20 | 20 | -11 |
| Total | | | N = 100 | ∑fdx = -43 | ∑fdx² = 119 | ∑fdxdy = 33 |

Columns (weight, Y):

| Class | MV | dy | f | fdy | fdy² | fdxdy |
|---|---|---|---|---|---|---|
| 90-100 | 95 | -2 | 19 | -38 | 76 | 22 |
| 100-110 | 105 | -1 | 37 | -37 | 37 | 16 |
| 110-120 | 115 | 0 | 28 | 0 | 0 | 0 |
| 120-130 | 125 | 1 | 16 | 16 | 16 | -5 |
| Total | | | N = 100 | ∑fdy = -59 | ∑fdy² = 129 | ∑fdxdy = 33 |

$$\bar{X} = A + \frac{\sum f d_x}{N} \times C = 62.5 + \frac{-43}{100} \times 5 = 62.5 - \frac{215}{100} = 60.35$$

$$\bar{Y} = A + \frac{\sum f d_y}{N} \times C = 115 + \frac{-59}{100} \times 10 = 115 - \frac{590}{100} = 109.1$$

Regression coefficient of X on Y:

$$b_{xy} = \frac{33 \times 100 - (-43 \times -59)}{129 \times 100 - (-59)^2} \times \frac{5}{10} = \frac{3300 - 2537}{12900 - 3481} \times 0.5 = \frac{763}{9419} \times 0.5 = \frac{381.5}{9419} = 0.0405$$

X on Y regression equation:

$$
\begin{aligned}
(X - \bar{X}) &= b_{xy} (Y - \bar{Y}) \\
X - 60.35 &= 0.0405 (Y - 109.1) \\
X - 60.35 &= 0.0405Y - 4.41855 \\
X &= 0.0405Y + 55.93145
\end{aligned}
$$

Estimation of height (X) when weight (Y) is 115 lbs: X = 0.0405 × 115 + 55.93145 = 4.6575 + 55.93145 = 60.6 inches.

Regression coefficient of Y on X:

$$b_{yx} = \frac{33 \times 100 - (-43 \times -59)}{119 \times 100 - (-43)^2} \times \frac{10}{5} = \frac{3300 - 2537}{11900 - 1849} \times 2 = \frac{763}{10051} \times 2 = 0.1518$$

Y on X regression equation:

$$
\begin{aligned}
(Y - \bar{Y}) &= b_{yx} (X - \bar{X}) \\
Y - 109.1 &= 0.1518 (X - 60.35) \\
Y - 109.1 &= 0.1518X - 9.16113 \\
Y &= 0.1518X + 99.93887
\end{aligned}
$$

Estimation of weight (Y) when height (X) is 63 inches: Y = 0.1518 × 63 + 99.93887 = 9.5634 + 99.93887 = 109.5 lbs.

$r = \sqrt{b_{xy} \times b_{yx}} = \sqrt{0.0405 \times 0.1518} = 0.0784$

ILLUSTRATION = 16

From the following data find:

a) The most probable value of Y when X is 60
b) The most probable value of X when Y is 40
c) The coefficient of correlation

$\bar{X} = 53.2$, $\bar{Y} = 27.9$, $b_{yx} = -1.5$ and $b_{xy} = -0.2$

SOLUTION

X on Y regression equation:

$$
\begin{aligned}
(X - \bar{X}) &= b_{xy} (Y - \bar{Y}) \\
X - 53.2 &= -0.2 (Y - 27.9) \\
X - 53.2 &= -0.2Y + 5.58 \\
X &= -0.2Y + 58.78
\end{aligned}
$$

If Y is 40: X = -0.2 × 40 + 58.78 = 50.78.

Y on X regression equation:

$$
\begin{aligned}
(Y - \bar{Y}) &= b_{yx} (X - \bar{X}) \\
Y - 27.9 &= -1.5 (X - 53.2) \\
Y - 27.9 &= -1.5X + 79.8 \\
Y &= -1.5X + 107.7
\end{aligned}
$$

If X is 60: Y = -1.5 × 60 + 107.7 = -90 + 107.7 = 17.7.

Since both regression coefficients are negative, the coefficient of correlation is also negative:

$r = -\sqrt{b_{xy} \times b_{yx}} = -\sqrt{1.5 \times 0.2} = -0.5477$

THEORETICAL QUESTIONS (5, 10 & 15 Marks)

1. What is meant by regression? How is this concept useful in business forecasting?
2. Distinguish clearly between correlation and regression analysis.
3. What is regression analysis? State its uses.
4. Define regression and explain its importance.
5. Briefly explain:
a. Regression line
b. Regression equation
c. Regression coefficient

PRACTICAL PROBLEMS

6. Given the following data, calculate

a. The expected value of Y when X = 60
b. The expected value of X when Y = 120

| | X | Y |
|---|---|---|
| Mean | 65 | 120 |
| SD | 5 | 10 |

Coefficient of correlation = 0.6

[Answers: X = 65, Y = 114]

PROBLEM = 07

Given the following data, estimate the marks in Mathematics for a student who has secured 60 marks in English.

Arithmetic average of marks in Mathematics = 80
Arithmetic average of marks in English = 50
SD of marks in Mathematics = 15
SD of marks in English = 10
Coefficient of correlation = 0.4

[Answer: 86]

PROBLEM = 08

Find the most likely price in Bangalore corresponding to the price of Rs. 70 at Mysore from the following data.

Average price at Mysore = Rs. 65
Average price at Bangalore = Rs. 67
SD of price at Mysore = Rs. 2.5
SD of price at Bangalore = Rs. 3.5

The coefficient of correlation between the prices of the commodity in the two cities is 0.8. Also estimate the price at Mysore corresponding to the price of Rs. 50 at Bangalore.

[Answer: 72.6 and 55.3]

PROBLEM = 09

You are given the following data.

| | X | Y |
|---|---|---|
| Mean | 36 | 85 |
| S.D. | 11 | 8 |

Coefficient of correlation = 0.66

1. Find the two regression equations
2. Estimate the value of X when Y = 75

[Answer: X = 26.92 when Y = 75]

PROBLEM = 10

The following are the marks in Statistics (X) and Mathematics (Y) of ten students.

| X | 56 | 55 | 58 | 58 | 57 | 56 | 60 | 64 | 69 | 57 |
|---|---|---|---|---|---|---|---|---|---|---|
| Y | 68 | 67 | 67 | 70 | 65 | 68 | 70 | 66 | 68 | 66 |

Calculate the coefficient of correlation based on bxy and byx. Also estimate the marks in Mathematics of a student who scores 62 marks in Statistics.

[Answer: r = 0.78, bxy = 0.0294, Y = 67.59]

PROBLEM NO: 11

From the following data, obtain both the regression equations and estimate the demand (Y) if the price (X) is 75.

| Price (X) | 60 | 63 | 66 | 69 | 72 | 78 | 81 | 90 | 96 | 99 |
|---|---|---|---|---|---|---|---|---|---|---|
| Demand (Y) | 85 | 87 | 84 | 80 | 82 | 79 | 78 | 73 | 70 | 72 |

PROBLEM NO: 12

From the data given below, find

a. The two regression equations
b. The coefficient of correlation between the marks in Economics and Statistics
c. The most likely marks in Statistics when marks in Economics are 30

| Marks in Economics X | 25 | 28 | 35 | 32 | 31 | 36 | 39 | 38 | 34 | 32 |
|---|---|---|---|---|---|---|---|---|---|---|
| Marks in Statistics Y | 43 | 46 | 49 | 41 | 36 | 32 | 31 | 30 | 33 | 39 |

[Ans: X = 40.892 - 1.234Y, Y = 59.248 - 0.664X, r = 0.394, Y = 39]

PROBLEM = 13

The following data relate to price and demand of a commodity.

a) Estimate demand when price is Rs. 30
b) Estimate price when demand is 65 units
c) Calculate the coefficient of correlation

| Demand in units | 20 | 22 | 25 | 23 | 18 | 16 | 14 | 17 | 21 | 19 |
|---|---|---|---|---|---|---|---|---|---|---|
| Price in Rs | 50 | 45 | 38 | 42 | 55 | 58 | 59 | 54 | 49 | 57 |

[Answer: a) 29.6 b) 13.21 c) r = -0.94]

PROBLEM = 14

The following table shows the frequency distribution of couples classified according to their ages.

a) Obtain the two regression coefficients.
b) Estimate the age of the husband when the wife's age is 28 years.
c) Calculate the coefficient of correlation.

| Wife's age in years (Y) \ Husband's age in years (X) | 20-25 | 25-30 | 30-35 | 35-40 | TOTAL |
|---|---|---|---|---|---|
| 15-20 | 20 | 10 | 3 | 2 | 35 |
| 20-25 | 4 | 18 | 6 | 4 | 32 |
| 25-30 | - | 5 | 11 | - | 16 |
| 30-35 | - | - | 2 | - | 2 |
| 35-40 | - | - | - | 5 | 5 |
| TOTAL | 24 | 33 | 22 | 11 | 90 |

[Answers: r = 0.612, X = 22.5, Y = 28.6, b = 31.7, byx = 0.558]

PROBLEM = 15

From the following data,

a) Estimate X when Y = 30
b) Estimate Y when X = 20

| X / Y | 5-15 | 15-25 | 25-35 | 35-45 | TOTAL |
|---|---|---|---|---|---|
| 0-10 | 1 | 1 | - | - | 2 |
| 10-20 | 3 | 6 | 5 | 1 | 15 |
| 20-30 | 1 | 8 | 9 | 2 | 20 |
| 30-40 | - | 3 | 9 | 3 | 15 |
| 40-50 | - | - | 4 | 4 | 8 |
| TOTAL | 5 | 18 | 27 | 10 | 60 |

[Answer: a) 28.7 b) 22.31]

PROBLEM NO = 16

From the following data, calculate

a) Regression coefficients
b) Coefficient of correlation based on bxy and byx

| Y / X | 30-35 | 35-40 | 40-45 | 45-50 | TOTAL |
|---|---|---|---|---|---|
| 25-30 | 20 | 10 | 3 | 2 | 35 |
| 30-35 | 4 | 28 | 6 | 4 | 42 |
| 35-40 | - | 5 | 11 | - | 16 |
| 40-45 | - | - | 2 | - | 2 |
| 45-50 | - | - | - | 5 | 5 |
| TOTAL | 24 | 43 | 22 | 11 | 100 |

[Answer: mean of X = 32.5, mean of Y = 38.5, bxy = 0.6744, byx = 0.5576, r = 0.6132]

PROBLEM = 17

Calculate the two regression coefficients. Estimate the value of X when Y = 49. Also calculate the coefficient of correlation based on bxy and byx.

| X | 43 | 44 | 46 | 40 | 44 | 42 | 45 | 42 | 38 | 40 | 42 | 57 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Y | 29 | 31 | 19 | 18 | 19 | 27 | 27 | 29 | 41 | 30 | 26 | 10 |

[Answer: X = 64.8, Y = ?, bxy = -0.44, byx = -1.2198, r = -0.732]

PROBLEM = 18

From the following bivariate table calculate

a) The two regression coefficients
b) The coefficient of correlation based on bxy and byx

| X / Y | 59.5 | 79.5 | 99.5 | 119.5 | 139.5 | 159.5 | 179.5 | TOTAL |
|---|---|---|---|---|---|---|---|---|
| 2.25 | 3 | 4 | 3 | 6 | 2 | 1 | 1 | 20 |
| 7.25 | 2 | 3 | 5 | 10 | 3 | 1 | 1 | 25 |
| 12.25 | 5 | 4 | 6 | 11 | 5 | 3 | 3 | 37 |
| 17.25 | 10 | 11 | 12 | 15 | 12 | 15 | 10 | 85 |
| 22.25 | 4 | 2 | 3 | 10 | 7 | 5 | 6 | 37 |
| 27.25 | 1 | 1 | 2 | 8 | 8 | 5 | 4 | 29 |
| 32.25 | 1 | 1 | 1 | 10 | 5 | 4 | 5 | 27 |
| TOTAL | 26 | 26 | 32 | 70 | 42 | 34 | 30 | 260 |

[Answer: mean of X = 17.80, mean of Y = 122.42, bxy = 0.05, byx = 1.06, r = 0.230]