
Department of Quantitative Methods & Information Systems

Introduction to Business Statistics QM 220

Chapter 14

Dr. Mohammad Zainal

Simple linear regression

Simple Linear Regression Model

This chapter considers the relationship between two variables in two ways:

1. By using regression analysis.
2. By computing the correlation coefficient.

By using the regression model, we can evaluate the magnitude of change in one variable due to a certain change in another variable.

For example, an economist can estimate the amount of change in food expenditure due to a certain change in the income of a household by using the regression model.

QM-220, M. Zainal 2


A sociologist may want to estimate the increase in the crime rate due to a particular increase in the unemployment rate.

Besides answering these questions, a regression model also helps predict the value of one variable for a given value of another variable.

For example, by using the regression line, we can predict the (approximate) food expenditure of a household with a given income.


The correlation coefficient, on the other hand, simply tells us how strongly two variables are related.

It does not provide any information about the size of the change in one variable as a result of a certain change in the other variable.

For example, the correlation coefficient tells us how strongly income and food expenditure or crime rate and unemployment rate are related.


Simple Regression

Let us return to the example of an economist investigating the relationship between food expenditure and income. What factors or variables does a household consider when deciding how much money it should spend on food every week or every month?

Certainly, income of the household is a factor.

Is it the only factor?


Many other variables also affect food expenditure, such as:

Assets owned

Size of the household

Preferences and tastes

Any special dietary needs

These variables are called independent or explanatory variables because they all vary independently, and they explain the variation in food expenditures among different households.


In other words, these variables explain why different households spend different amounts of money on food.

Food expenditure is called the dependent variable because it depends on the independent variables.

Studying the effect of two or more independent variables on a dependent variable using regression analysis is called multiple regression.

If we choose only one (usually the most important) independent variable and study the effect of that single variable on a dependent variable, it is called a simple regression.


A regression model is a mathematical equation that describes the relationship between two or more variables.

A simple regression includes only two variables: one independent and one dependent.

Note that whether it is a simple or a multiple regression analysis, it always includes one and only one dependent variable.

It is the number of independent variables that changes in simple and multiple regressions.


Linear Regression

The relationship between two variables in a regression analysis is expressed by a mathematical equation called a regression equation or model.

A regression equation, when plotted, may assume one of many possible shapes, including a straight line.

A regression equation that gives a straight-line relationship between two variables is called a linear regression model; otherwise, the model is called a nonlinear regression model.


The two diagrams below show a linear and a nonlinear relationship between the dependent variable food expenditure and the independent variable income.


Algebra Review

In an algebra class we learn that the equation of a line can be written in the form

\[ y = a + bx \]

where b is the slope of the line and a is the y-intercept.

This form is known as the slope–intercept form for the equation of a line.

Once we know the slope–intercept form of the equation, we can use the y-intercept and slope of the line to graph the line.


Example: Given the equation 2x + 3y = 4, use the slope-intercept form to graph the related line.

Solution:

Step 1. We solve for y to get the slope-intercept form as follows:

\[ y = -\frac{2}{3}x + \frac{4}{3} \]

Step 2. The coefficient of x is the slope, b = −2/3. The constant is the y-intercept, a = 4/3.


Step 3. To graph the line, we first locate the y-intercept at (0, 4/3) and then use the slope to get a second point by starting at the y-intercept and going 3 units to the right, followed by 2 units down.

[Figure: graph of the line y = −(2/3)x + 4/3, with the y-intercept and the slope, or coefficient of x (rate of change), labeled.]
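The three steps can also be checked numerically. The Python sketch below is an illustration added for this purpose, not part of the original example; the function name `slope_intercept` is hypothetical. It uses exact fractions to solve px + qy = r for y and then finds the second point used for graphing.

```python
from fractions import Fraction

def slope_intercept(p, q, r):
    """Rewrite p*x + q*y = r as y = a + b*x and return (a, b).

    Assumes q != 0, i.e. the line is not vertical.
    """
    b = Fraction(-p, q)   # slope: coefficient of x after solving for y
    a = Fraction(r, q)    # y-intercept: constant term
    return a, b

a, b = slope_intercept(2, 3, 4)       # the line 2x + 3y = 4
# Second point: start at (0, a) and move 3 units right; y changes by 3*b.
second_point = (3, a + 3 * b)

print(f"y-intercept a = {a}, slope b = {b}")
print(f"second point = ({second_point[0]}, {second_point[1]})")
```

Moving 3 units right changes y by 3b = −2, which is the "2 units down" of Step 3.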


Example: Given the following graph of a line, find the slope and y-intercept. Then write the equation of the line in the slope-intercept form y = a + bx.

Solution

Step 1. The line crosses the y axis at (0, 3).

Step 2. The slope of the line is 1/2.

Step 3. The equation in slope-intercept form is

\[ y = \frac{1}{2}x + 3 \]


Simple Linear Regression Analysis

In a regression model, the independent variable is usually denoted by x and the dependent variable is usually denoted by y.

The x variable, with its coefficient, is written on the right side of the = sign, whereas the y variable is written on the left side of the = sign.

The y-intercept (constant term) is denoted by A, and the coefficient of the x variable (slope) is denoted by B.


The simple linear regression model is written as

\[ y = A + Bx \qquad (1) \]

where y is the dependent variable, A is the constant term or y-intercept, B is the coefficient of x or slope, and x is the independent variable.


Model 1 is called a deterministic model. It gives an exact relationship between x and y.

This model simply states that y is determined exactly by x and for a given value of x there is one and only one (unique) value of y.

However, in many cases the relationship between variables is not exact.

For instance, if y is food expenditure and x is income, then model 1 would state that food expenditure is determined by income only and that all households with the same income spend the same amount on food.


But as mentioned earlier, food expenditure is determined by many variables, only one of which is included in model 1.

In reality, different households with the same income spend different amounts of money on food because of the differences in the sizes of the household, the assets they own, and their preferences and tastes.

To take these variables into consideration and to make our model complete, we add another term to the right side of model 1.

This term is called the random error term. It is denoted by ε (Greek letter epsilon). Adding it turns the deterministic model 1 into a probabilistic (statistical) model:

\[ y = A + Bx + \varepsilon \qquad (2) \]
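The difference between models 1 and 2 can be seen in a small simulation. The sketch below is illustrative only: the values of A, B, and the error spread are hypothetical, and drawing the error from a normal distribution is an assumption of the sketch rather than something stated at this point in the chapter.

```python
import random

# Hypothetical population parameters, chosen only for illustration.
A, B = 1.5, 0.25        # y-intercept and slope (hundreds of dollars)
SIGMA = 1.0             # assumed spread of the random error term

def deterministic(x):
    """Model 1: y is determined exactly by x."""
    return A + B * x

def probabilistic(x, rng):
    """Model 2: the same line plus a random error term epsilon."""
    epsilon = rng.gauss(0.0, SIGMA)
    return A + B * x + epsilon

rng = random.Random(220)   # fixed seed so the run is repeatable
income = 35                # two households, both with income $3500

same = [deterministic(income) for _ in range(2)]
different = [probabilistic(income, rng) for _ in range(2)]

print("model 1:", same)        # identical values
print("model 2:", different)   # values differ because of epsilon
```

Under model 1 both households spend exactly the same amount; under model 2 their expenditures differ even though their incomes are equal.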


The random error term ε is included in the model to represent the following two phenomena.

1. Missing or omitted variables:

The random error term ε is included to capture the effect of all those missing or omitted variables that have not been included in the model.

2. Random variation:

Human behavior is unpredictable. A household may have many parties during one month and spend more than usual on food during that month. The variation in food expenditure for such reasons may be called random variation.

In model 2, A and B are the population parameters.

The regression line obtained for model 2 by using the population data is called the population regression line.

The values of A and B in the population regression line are called the true values of the y-intercept and slope.

As we know, population data are difficult to obtain. As a result, we almost always use sample data to estimate model 2.

The values of the y-intercept and slope calculated from sample data on x and y are called the estimated values of A and B and are denoted by a and b.


Using a and b, we write the estimated regression model as

\[ \hat{y} = a + bx \qquad (3) \]

where ŷ (read as y hat) is the estimated or predicted value of y for a given value of x.

Equation 3 is called the estimated regression model; it gives the regression of y on x.

Scatter Diagram

Suppose we take a sample of seven households from a low to moderate income neighborhood and collect information on their incomes and food expenditures for the past month.


The information obtained (in hundreds of dollars) is given in the table below.

Income   Food expenditure
35       9
49       15
21       7
39       11
15       5
28       8
25       9

Each pair consists of one observation on income and a second on food expenditure. For example, the first household's income for the past month was $3500 and its food expenditure was $900.

By plotting all seven pairs of values, we obtain a scatter diagram or scatterplot.


The following figure gives the scatter diagram for the data of the previous table.

Each dot in this diagram represents one household.

A scatter diagram is helpful in detecting a relationship between two variables.


By looking at the scatter diagram, we can observe that there exists a strong linear relationship between food expenditure and income.

If a straight line is drawn through the points, the points will be scattered closely around the line.

In fact, we can draw many straight lines that pass through the points.

Each line will give different values for a and b of model 3.


In regression analysis, we try to find a line that best fits the points in the scatter diagram.

Such a line provides the best possible description of the relationship between the dependent and independent variables.

The least squares method, discussed in the next section, gives such a line.

The line obtained by using the least squares method is called the least squares regression line.


Least Squares Line

The value of y obtained for a member from the survey is called the observed or actual value of y.

As mentioned earlier, the value of y, denoted by ŷ, obtained for a given x by using the regression line is called the predicted value of y.

The random error ε denotes the difference between the actual value of y and the predicted value of y for population data.


For example, for a given household, ε is the difference between what this household actually spent on food during the past month and what is predicted using the population regression line.

The ε is also called the residual because it measures the surplus (positive or negative) of actual food expenditure over what is predicted by using the regression model.

If we estimate model 2 by using sample data, the difference between the actual y and the predicted y based on this estimation cannot be denoted by ε. The random error for the sample regression model is denoted by e.


Thus, e is an estimator of ε. If we estimate model 2 using sample data, then the value of e is given by

e = actual food expenditure − predicted food expenditure = y − ŷ

e is the vertical distance between the actual position of a household and the point on the regression line.


The value of an error is positive if the point that gives the actual food expenditure is above the regression line and negative if it is below the regression line.

The sum of these errors is always zero.

In other words, the sum of the actual food expenditures for seven households included in the sample will be the same as the sum of the food expenditures predicted from the regression model.

\[ \sum e = \sum (y - \hat{y}) = 0 \]
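This property can be checked on the seven-household sample. The sketch below is an illustration, assuming the data and the rounded estimates a = 1.1414 and b = .2642 from the chapter's worked example; because those estimates are rounded, the sum comes out close to zero rather than exactly zero.

```python
# Seven-household sample (hundreds of dollars): income x, food expenditure y.
x = [35, 49, 21, 39, 15, 28, 25]
y = [9, 15, 7, 11, 5, 8, 9]

# Rounded estimates from the chapter's worked example.
a, b = 1.1414, 0.2642

predicted = [a + b * xi for xi in x]
errors = [yi - yhat for yi, yhat in zip(y, predicted)]

for xi, yi, e in zip(x, y, errors):
    position = "above" if e > 0 else "below"
    print(f"x={xi:2d}  y={yi:2d}  e={e:+.4f}  ({position} the line)")

# The total is zero apart from rounding in a and b.
print(f"sum of errors = {sum(errors):+.4f}")
print(f"sum of y = {sum(y)}, sum of predicted y = {sum(predicted):.4f}")
```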


To find the line that best fits the scatter of points, we cannot minimize the sum of errors, because positive and negative errors cancel each other out.

Instead, we minimize the error sum of squares, denoted by SSE, which is obtained by adding the squares of errors.

\[ SSE = \sum e^2 = \sum (y - \hat{y})^2 \]

The values of a and b that give the minimum SSE are called the least squares estimates of A and B, and they are

\[ b = \frac{SS_{xy}}{SS_{xx}} \quad \text{and} \quad a = \bar{y} - b\bar{x} \]

where

\[ SS_{xy} = \sum xy - \frac{(\sum x)(\sum y)}{n} \quad \text{and} \quad SS_{xx} = \sum x^2 - \frac{(\sum x)^2}{n} \]
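These formulas translate directly into code. The following is a minimal sketch, not a reference implementation; the function name `least_squares` is hypothetical, and the test points are made up so that they lie exactly on the line y = 3 + 0.5x from the algebra review.

```python
def least_squares(x, y):
    """Return (a, b) for the line y-hat = a + b*x using SSxy and SSxx."""
    n = len(x)
    sum_x, sum_y = sum(x), sum(y)
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_x2 = sum(xi * xi for xi in x)

    ss_xy = sum_xy - sum_x * sum_y / n
    ss_xx = sum_x2 - sum_x ** 2 / n   # zero if all x values are equal

    b = ss_xy / ss_xx
    a = sum_y / n - b * sum_x / n     # a = y-bar - b * x-bar
    return a, b

# Sanity check: points that lie exactly on y = 3 + 0.5x.
a, b = least_squares([0, 2, 4, 6], [3, 4, 5, 6])
print(a, b)
```

When every point lies on a straight line, the least squares estimates recover that line's intercept and slope.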


The least squares regression line

\[ \hat{y} = a + bx \]

is called the regression of y on x.

The equation above is for estimating a sample regression line.

If we have access to a population data set, we can find the population regression line by using the same formulas with a little adaptation: we replace a by A, b by B, and n by N in all these formulas, and use the values of Σx, Σy, Σxy, and Σx² calculated for population data to make the required computations.


The population regression line is written as

\[ \mu_{y|x} = A + Bx \]

where μy|x is read as “the mean value of y for a given x.”

When plotted on a graph, the points on this population regression line give the average values of y for the corresponding values of x.

These average values of y are denoted by μy|x.


Example: Find the least squares regression line for the data on incomes and food expenditures on the seven households given in the following table. Use income as an independent variable and food expenditure as a dependent variable.


Solution

To find a and b, the following steps are performed.

Income x   Expenditure y   xy     x²
35         9               315    1225
49         15              735    2401
21         7               147    441
39         11              429    1521
15         5               75     225
28         8               224    784
25         9               225    625
Σx = 212   Σy = 64         Σxy = 2150   Σx² = 7222

\[ \bar{x} = 212/7 = 30.2857, \qquad \bar{y} = 64/7 = 9.1429 \]

\[ SS_{xy} = 2150 - \frac{(212)(64)}{7} = 211.7143 \]

\[ SS_{xx} = 7222 - \frac{(212)^2}{7} = 801.4286 \]

\[ b = \frac{211.7143}{801.4286} = .2642 \]

\[ a = 9.1429 - (.2642)(30.2857) = 1.1414 \]

Thus, the estimated regression model is

\[ \hat{y} = 1.1414 + .2642x \]
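The arithmetic of this solution can be reproduced in a few lines. The sketch below is an illustration using the seven data pairs from the table; the variable names are my own. It also shows that the intercept 1.1414 depends on rounding b to four decimals before computing a, an observation added here rather than taken from the slides.

```python
x = [35, 49, 21, 39, 15, 28, 25]   # income, hundreds of dollars
y = [9, 15, 7, 11, 5, 8, 9]        # food expenditure, hundreds of dollars
n = len(x)

# Column totals of the table.
sum_x, sum_y = sum(x), sum(y)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))
sum_x2 = sum(xi ** 2 for xi in x)

ss_xy = sum_xy - sum_x * sum_y / n
ss_xx = sum_x2 - sum_x ** 2 / n
b = ss_xy / ss_xx

x_bar, y_bar = sum_x / n, sum_y / n
a_full = y_bar - b * x_bar                 # b kept at full precision
a_slide = y_bar - round(b, 4) * x_bar      # b rounded first, as in the solution

print(sum_x, sum_y, sum_xy, sum_x2)
print(f"SSxy = {ss_xy:.4f}, SSxx = {ss_xx:.4f}, b = {b:.4f}")
print(f"a = {a_slide:.4f} with rounded b, {a_full:.4f} without rounding")
```

Carrying b at full precision gives an intercept of about 1.1422 instead of 1.1414; the difference is a rounding effect and does not change the interpretation.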


Using this estimated regression model, we can find the predicted value of y for any specific value of x.

Suppose we randomly select a household whose monthly income is $3500 so that x = 35.

The predicted value of food expenditure for this household is

ŷ = 1.1414 + (.2642)(35) = $10.3884 hundred = $1,038.84

Based on our regression line, we predict that a household with a monthly income of $3500 is expected to spend $1038.84 per month on food.


This value of ŷ can also be interpreted as a point estimator of the mean value of y for x = 35.

We can state that, on average, all households with a monthly income of $3500 spend about $1038.84 per month on food.

But in our data on seven households, there is one household whose income is $3500. The actual food expenditure for that household is $900.

The difference between the actual and predicted values gives the error of prediction. Thus, the error of prediction for this household is

e = y − ŷ = 9.00 − 10.3884 = −1.3884 hundred = −$138.84


The error of prediction is −$138.84.

The negative error indicates that the predicted value of y is greater than the actual value of y.

Thus, if we use the regression model, this household's food expenditure is overestimated by $138.84.
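The prediction and its error can be reproduced as follows. This is a small illustrative sketch using the rounded estimates from the example; the function name `predict` is hypothetical.

```python
A_HAT, B_HAT = 1.1414, 0.2642   # estimates from the example above

def predict(x):
    """Predicted food expenditure (hundreds of dollars) for income x."""
    return A_HAT + B_HAT * x

y_actual = 9.00                  # household with x = 35 actually spent $900
y_hat = predict(35)
error = y_actual - y_hat         # e = y - y-hat

print(f"predicted: {y_hat:.4f} hundred = ${y_hat * 100:,.2f}")
print(f"error:     {error:.4f} hundred = ${error * 100:,.2f}")
```

Because both variables are measured in hundreds of dollars, the results are multiplied by 100 to express them in dollars.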


Interpretation of a and b

How do we interpret a = 1.1414 and b = .2642 obtained in the previous example?

Interpretation of a

Consider a household with zero income. Using the estimated regression line, we get the predicted value of y for x = 0 as $114.14.

We can state that a household with no income is expected to spend $114.14 per month on food.

We can also state that the point estimate of the average monthly food expenditure for all households with zero income is $114.14.


We should be very careful when making this interpretation of a.

In our sample of seven households, the incomes vary from a minimum of $1500 to a maximum of $4900.

Hence, our regression line is valid only for the values of x between 15 and 49.

If we predict y for a value of x outside this range, the prediction usually will not hold true.

Interpretation of b

The value of b in a regression model gives the change in y due to a change of one unit in x.


By using the regression equation obtained in the example, we see:

When x = 30, ŷ = 1.1414 + .2642(30) = 9.0674

When x = 31, ŷ = 1.1414 + .2642(31) = 9.3316

Hence, when x increased by one unit, from 30 to 31, ŷ increased by 9.3316 − 9.0674 = .2642, which is the value of b.

Because our unit of measurement is hundreds of dollars, we can state that, on average, a $100 increase in income will result in a $26.42 increase in food expenditure.

We can also state that, on average, a $1 increase in income of a household will increase the food expenditure by $.2642.


When b is positive, an increase in x will lead to an increase in y and a decrease in x will lead to a decrease in y.

When b is positive, the movements in x and y are in the same direction. Such a relationship between x and y is called a positive linear relationship.

When b is negative, an increase in x will lead to a decrease in y and a decrease in x will cause an increase in y. The changes in x and y in this case are in opposite directions.

Such a relationship between x and y is called a negative linear relationship.


Remember: b is computed as b = SSxy/SSxx. The value of SSxx is always positive and that of SSxy can be positive or negative.

Hence, the sign of b depends on the sign of SSxy. If SSxy is positive, then b will be positive, and if SSxy is negative, then b will be negative.


Assumptions of the Regression Model

Like any other theory, linear regression analysis is also based on certain assumptions.

Consider the population regression model

\[ y = A + Bx + \varepsilon \qquad (4) \]

Four assumptions are made about this model.

These assumptions are made about the population regression model and not about the sample regression model.


Assumption 1: The random error term ε has a mean equal to zero for each x.

Assumption 2: The errors associated with different observations are independent.

Assumption 3: For any given x, the distribution of errors is normal.

Assumption 4: The distribution of population errors for each x has the same (constant) standard deviation, which is denoted by σε .


A Note on the Use of Simple Linear Regression

We should apply linear regression with caution.

When we use simple linear regression, we assume that the relationship between two variables is described by a straight line.

In the real world, the relationship between variables may not be linear.

Before we use a simple linear regression, it is better to construct a scatter diagram and look at the plot of the data points. We should estimate a linear regression model only if the scatter diagram indicates such a relationship.


Standard Deviation of Random Errors

When we consider incomes and food expenditures, all households with the same income are expected to spend different amounts on food.

Consequently, the random error ε will assume different values for these households.

The standard deviation σε measures the spread of these errors around the population regression line.

The standard deviation of errors tells us how widely the errors and, hence, the values of y are spread for a given x.


In the figure below, the points on the vertical line through x = 20 give the monthly food expenditures for all households with a monthly income of $2000.

The distance of each dot from the point on the regression line gives the value of the corresponding error.

The standard deviation of errors $\sigma_\epsilon$ measures the spread of such points around the population regression line. The same is true for x = 35 or any other value of x.

$\sigma_\epsilon$ denotes the standard deviation of errors for the population. Usually, however, $\sigma_\epsilon$ is unknown. In such cases, it is estimated by $s_e$, which is the standard deviation of errors for the sample data.

The degrees of freedom for a simple linear regression model are

$$df = n - 2$$

and $s_e$ is calculated as

$$s_e = \sqrt{\frac{SS_{yy} - b\,SS_{xy}}{n - 2}} \quad \text{where} \quad SS_{yy} = \sum y^2 - \frac{\left(\sum y\right)^2}{n}$$


Example: Compute the standard deviation of errors $s_e$ for the data on monthly incomes and food expenditures of the seven households given in the previous example.

Income x     Expenditure y     y²
35           9                 81
49           15                225
21           7                 49
39           11                121
15           5                 25
28           8                 64
25           9                 81
Σx = 212     Σy = 64           Σy² = 646

$$SS_{yy} = 646 - \frac{(64)^2}{7} = 60.8571$$

$$s_e = \sqrt{\frac{60.8571 - .2642(211.7143)}{7 - 2}} = .9922$$
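The calculation above can be checked with a few lines of code. The following is a minimal Python sketch, not part of the original slides; the function names `sum_of_squares` and `std_dev_of_errors` are illustrative choices. It assumes the slope is obtained as b = SSxy / SSxx and keeps b unrounded, so the printed value can differ from the hand calculation (which uses b = .2642) in the last decimal places.

```python
import math

# Data from the seven-household example: income x and food expenditure y.
incomes = [35, 49, 21, 39, 15, 28, 25]
expenditures = [9, 15, 7, 11, 5, 8, 9]


def sum_of_squares(u, v):
    """Return SS_uv = sum(u*v) - sum(u)*sum(v)/n."""
    n = len(u)
    return sum(a * b for a, b in zip(u, v)) - sum(u) * sum(v) / n


def std_dev_of_errors(x, y):
    """Return s_e = sqrt((SS_yy - b*SS_xy) / (n - 2))."""
    ss_xx = sum_of_squares(x, x)
    ss_yy = sum_of_squares(y, y)
    ss_xy = sum_of_squares(x, y)
    b = ss_xy / ss_xx  # least squares slope, kept unrounded (assumption)
    return math.sqrt((ss_yy - b * ss_xy) / (len(x) - 2))


s_e = std_dev_of_errors(incomes, expenditures)
print(f"s_e = {s_e:.4f}")
```

Passing the same list twice to `sum_of_squares` gives SSxx or SSyy, so one helper covers all three sums of squares.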


Coefficient of Determination

We may ask the question: How good is the regression model?

In other words: How well does the independent variable explain the dependent variable in the regression model?

The coefficient of determination is one concept that answers this question.

To find the coefficient of determination, we need to find the following:

The total sum of squares, denoted by SST, which is the same as $SS_{yy}$.

The error sum of squares, denoted by SSE, which is given by

$$SSE = \sum (y - \hat{y})^2$$

The regression sum of squares, denoted by SSR, which can be found using

$$SSR = \sum (\hat{y} - \bar{y})^2 = SST - SSE$$

The coefficient of determination, denoted by $r^2$, represents the proportion of SST that is explained by the use of the regression model. The computational formula for $r^2$ is

$$r^2 = \frac{SSR}{SST} = \frac{SST - SSE}{SST} = \frac{b\,SS_{xy}}{SS_{yy}}$$


SSR is the portion of SST that is explained by the use of the regression model.

SSE is the portion of SST that is not explained by the use of the regression model.

The coefficient of determination calculated for population data is denoted by ρ² (ρ is the Greek letter rho) and the one calculated for sample data is denoted by r².

The coefficient of determination gives the proportion of SST that is explained by the use of the regression model.

The value of the coefficient of determination always lies in the range zero to one.


Example: For the data on the monthly incomes and food expenditures of seven households, calculate the coefficient of determination.

Solution

b = .2642, $SS_{xy}$ = 211.7143, and $SS_{yy}$ = 60.8571

Hence,

$$r^2 = \frac{b\,SS_{xy}}{SS_{yy}} = \frac{(.2642)(211.7143)}{60.8571} = .92$$
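To see how SST splits into an explained and an unexplained part, the three sums of squares can be computed directly from the fitted values. The sketch below is an illustration only; the helper names `fit_line` and `sums_of_squares` are hypothetical, and it assumes the usual least squares estimates b = SSxy / SSxx and a = ȳ − b x̄ for the sample regression line ŷ = a + bx.

```python
# Data from the seven-household example: income x and food expenditure y.
incomes = [35, 49, 21, 39, 15, 28, 25]
expenditures = [9, 15, 7, 11, 5, 8, 9]


def fit_line(x, y):
    """Return the least squares intercept a and slope b."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    ss_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    ss_xx = sum((xi - x_bar) ** 2 for xi in x)
    b = ss_xy / ss_xx
    return y_bar - b * x_bar, b


def sums_of_squares(x, y):
    """Return (SST, SSE, SSR) for the least squares line."""
    a, b = fit_line(x, y)
    y_bar = sum(y) / len(y)
    y_hat = [a + b * xi for xi in x]  # fitted values
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    ssr = sum((yh - y_bar) ** 2 for yh in y_hat)
    return sst, sse, ssr


sst, sse, ssr = sums_of_squares(incomes, expenditures)
r_squared = ssr / sst
print(f"SST = {sst:.4f}, SSE = {sse:.4f}, SSR = {ssr:.4f}")
print(f"r^2 = {r_squared:.2f}")
```

Computing SSR from the fitted values rather than as SST − SSE makes the identity SST = SSR + SSE something the script can confirm instead of assume.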


We can state that 92% of the total variation in food expenditures of households occurs because of the variation in their incomes, and the remaining 8% is due to randomness and other variables.

Usually, the higher the value of r², the better the regression model.

This is so because if r² is larger, a greater portion of the total errors is explained by the included independent variable and a smaller portion of errors is attributed to other variables and randomness.


Inferences About B

This section is concerned with estimation and tests of hypotheses about the population regression slope B.

We can also make confidence intervals and test hypotheses about the y-intercept A of the population regression line.

However, making inferences about A is beyond the scope of this text.

Sampling Distribution of b

When we find a regression line, our main target is to find the true value of the slope B (from the population) but, in almost all cases, the regression line is estimated using sample data.

So, based on the sample regression line, inferences are made about the population regression line.

The slope b of a sample regression line is a point estimator of the slope B of the population regression line.

The different sample regression lines estimated for different samples taken from the same population will give different values of b.

Thus, b is a random variable, and it possesses a probability distribution that is more commonly called its sampling distribution.

Because of the assumption of normally distributed random errors, the sampling distribution of b is normal. Its mean $\mu_b$ and standard deviation $\sigma_b$ are given by

$$\mu_b = B \quad \text{and} \quad \sigma_b = \frac{\sigma_\epsilon}{\sqrt{SS_{xx}}}$$

If $\sigma_b$ is not known, $\sigma_b$ is replaced by $s_b$:

$$s_b = \frac{s_e}{\sqrt{SS_{xx}}}$$

If $\sigma_b$ is known, the normal distribution can be used to make inferences about B. If $\sigma_b$ is not known, the normal distribution is replaced by the t distribution to make inferences about B.

Estimation of B

The (1 − α)100% confidence interval for B is given by

$$b \pm t\,s_b$$

where the value of t is obtained from the t distribution table for α/2 area in the right tail of the t distribution and n − 2 degrees of freedom.

Example: Construct a 95% confidence interval for B for the data on incomes and food expenditures of seven households.

Solution:

From earlier calculations, we know

n = 7, b = .2642, $SS_{xx}$ = 801.4286, and $s_e$ = .9922

$$s_b = \frac{.9922}{\sqrt{801.4286}} = .0350$$

The confidence level is 95%, so

df = 7 − 2 = 5

α/2 = (1 − .95) / 2 = .025

From the t distribution table, t = 2.571. The 95% confidence interval for B is

$$b \pm t\,s_b = .2642 \pm 2.571(.0350) = .17 \text{ to } .35$$
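The interval can be reproduced from the summary values alone. This is a small Python sketch under stated assumptions: the function name `slope_confidence_interval` is made up for illustration, and the t value is typed in from the t distribution table rather than computed.

```python
import math


def slope_confidence_interval(b, s_e, ss_xx, t_crit):
    """Return (s_b, lower, upper) for the interval b ± t * s_b."""
    s_b = s_e / math.sqrt(ss_xx)
    margin = t_crit * s_b
    return s_b, b - margin, b + margin


# Values from the example; t = 2.571 is the table value for
# df = 5 and .025 area in the right tail.
s_b, lower, upper = slope_confidence_interval(
    b=0.2642, s_e=0.9922, ss_xx=801.4286, t_crit=2.571
)
print(f"s_b = {s_b:.4f}, 95% CI for B: {lower:.2f} to {upper:.2f}")
```

Keeping the critical value as an argument means the same function serves any confidence level, as long as the matching table value is supplied.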

Hypothesis Testing About B

Testing a hypothesis about B when the null hypothesis is B = 0 (that is, the slope of the regression line is zero) is equivalent to testing that x does not determine y and that the regression line is of no use in predicting y for a given x.

However, we should remember that we are testing for a linear relationship between x and y. It is possible that x may determine y nonlinearly. Hence, a nonlinear relationship may exist between x and y.

The test statistic is

$$t = \frac{b - B_o}{s_b}$$

where $B_o$ is the value of B under the null hypothesis.

To test the hypothesis that x does not determine y linearly, we will test the null hypothesis that the slope of the regression line is zero; that is, B = 0.

The alternative hypothesis can be:

(1) x determines y; that is, B ≠ 0;

(2) x determines y positively; that is, B > 0;

(3) x determines y negatively; that is, B < 0.

The procedure used to make a hypothesis test about B is similar to the one used in earlier chapters. It involves the same five steps.

Example: Test at the 1% significance level whether the slope of the regression line for the example on incomes and food expenditures of seven households is positive.

Solution

From earlier calculations, we know

n = 7, b = .2642, $s_b$ = .0350

Step 1. State the null and alternative hypotheses.

$H_o$: B = 0 (The slope is zero)

$H_1$: B > 0 (The slope is positive)

Step 2. Select the distribution to use.

Here, $\sigma_\epsilon$ is not known. Hence, we will use the t distribution to make the test about B.

Step 3. Determine the rejection and nonrejection regions.

The significance level is .01. The > sign in the alternative hypothesis indicates that the test is right-tailed. Therefore, the area in the right tail of the t distribution is α = .01, df = n − 2 = 7 − 2 = 5, and the critical value of t is 3.365.

Step 4. Calculate the value of the test statistic.

The value of the test statistic t for b is calculated as follows:

$$t = \frac{.2642 - 0}{.0350} = 7.549$$

Step 5. Make a decision.

The value of the test statistic t = 7.549 is greater than the critical value of t = 3.365, and it falls in the rejection region. Hence, we reject the null hypothesis and conclude that x (income) determines y (food expenditure) positively.
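Steps 4 and 5 amount to one division and one comparison. The following Python sketch is illustrative only; `slope_t_statistic` and `right_tailed_decision` are hypothetical helper names, and the critical value 3.365 is taken from the example rather than computed.

```python
def slope_t_statistic(b, s_b, b_null=0.0):
    """Return t = (b - B) / s_b, with B taken from the null hypothesis."""
    return (b - b_null) / s_b


def right_tailed_decision(t_value, t_crit):
    """Reject H0 in a right-tailed test when t exceeds the critical value."""
    return "reject H0" if t_value > t_crit else "do not reject H0"


# Example values: b = .2642, s_b = .0350; critical t = 3.365
# (df = 5, .01 area in the right tail).
t_value = slope_t_statistic(0.2642, 0.0350)
decision = right_tailed_decision(t_value, 3.365)
print(f"t = {t_value:.3f}: {decision}")
```

For a left-tailed or two-tailed alternative the comparison in `right_tailed_decision` would need to change accordingly; only the right-tailed case from the example is sketched here.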


Linear Correlation

In this section we will study the meaning and calculation of the linear correlation coefficient and the procedure to conduct a test of hypothesis about it.

Linear Correlation Coefficient

Another measure of the relationship between two variables is the correlation coefficient.

The linear correlation coefficient is a measure of the strength of the linear association between two variables.

It measures how closely the points in a scatter diagram are spread around the regression line.

The correlation coefficient calculated for the population data is denoted by ρ (Greek letter rho) and the one calculated for sample data is denoted by r.

Note that the square of the correlation coefficient is equal to the coefficient of determination.

The value of the correlation coefficient always lies in the range −1 to 1; that is,

$$-1 \le \rho \le 1 \quad \text{and} \quad -1 \le r \le 1$$

The simple linear correlation coefficient, denoted by r (for samples), is calculated as

$$r = \frac{SS_{xy}}{\sqrt{SS_{xx}\,SS_{yy}}}$$

Because both $SS_{xx}$ and $SS_{yy}$ are always positive, the sign of the correlation coefficient r depends on the sign of $SS_{xy}$.

If $SS_{xy}$ is positive, then r will be positive, and if $SS_{xy}$ is negative, then r will be negative.

Also, r and b, calculated for the same sample, will always have the same sign. That is, both r and b are either positive or negative.

This is so because both r and b provide information about the relationship between x and y. Likewise, the corresponding population parameters ρ and B will always have the same sign.

[Figure: scatter diagrams illustrating a perfect positive linear relationship, a perfect negative linear relationship, and no linear relationship]

Example: Calculate the correlation coefficient for the example on incomes and food expenditures of seven households.

Solution

$SS_{xy}$ = 211.7143, $SS_{xx}$ = 801.4286, and $SS_{yy}$ = 60.8571

$$r = \frac{211.7143}{\sqrt{(801.4286)(60.8571)}} = .96$$

Note: r² = (.96)² = .92
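The same arithmetic in code, together with a check of the sign relationship between r and b described earlier. This is a hedged Python sketch; the function name `correlation` is an illustrative choice, and the slope is assumed to be b = SSxy / SSxx.

```python
import math


def correlation(ss_xy, ss_xx, ss_yy):
    """Return r = SS_xy / sqrt(SS_xx * SS_yy)."""
    return ss_xy / math.sqrt(ss_xx * ss_yy)


ss_xy, ss_xx, ss_yy = 211.7143, 801.4286, 60.8571  # from the example
r = correlation(ss_xy, ss_xx, ss_yy)
b = ss_xy / ss_xx  # slope; takes the sign of SS_xy, and so does r

print(f"r = {r:.2f}, r^2 = {r * r:.2f}, b = {b:.4f}")
```

Because both r and b have SSxy in the numerator and a positive denominator, they can never disagree in sign.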


Hypothesis Testing About the Linear Correlation Coefficient

This section describes how to perform a test of hypothesis about the population correlation coefficient ρ using the sample correlation coefficient r.

We can use the t distribution to make this test.

However, to use the t distribution, both variables should be normally distributed.

Usually (although not always), the null hypothesis is that the linear correlation coefficient between the two variables is zero, that is ρ = 0.

The alternative hypothesis can be one of the following:

(1) the linear correlation coefficient between the two variables is less than zero, that is ρ < 0;

(2) the linear correlation coefficient between the two variables is greater than zero, that is ρ > 0;

(3) the linear correlation coefficient between the two variables is not equal to zero, that is ρ ≠ 0.

The value of the test statistic t is calculated as

$$t = r\sqrt{\frac{n - 2}{1 - r^2}}$$

Here n − 2 are the degrees of freedom.

Example: Using the 1% level of significance and the data from the food expenditures and monthly income example, test whether the linear correlation coefficient between incomes and food expenditures is positive. Assume that the populations of both variables are normally distributed.

Solution

From earlier calculations, we know

n = 7, r = .96

Step 1. State the null and alternative hypotheses.

$H_o$: ρ = 0 (The linear correlation coefficient is zero)

$H_1$: ρ > 0 (The linear correlation coefficient is positive)

Step 2. Select the distribution to use.

The population distributions for both variables are normally distributed. Hence, we can use the t distribution to perform this test about the linear correlation coefficient.

Step 3. Determine the rejection and nonrejection regions.

The significance level is .01. The > sign in the alternative hypothesis indicates that the test is right-tailed. Therefore, the area in the right tail of the t distribution is α = .01, df = n − 2 = 7 − 2 = 5, and the critical value of t is 3.365.

Step 4. Calculate the value of the test statistic.

The value of the test statistic t for r is calculated as follows:

$$t = .96\sqrt{\frac{7 - 2}{1 - (.96)^2}} = 7.667$$

Step 5. Make a decision.

The value of the test statistic t = 7.667 is greater than the critical value of t = 3.365, and it falls in the rejection region. Hence, we reject the null hypothesis and conclude there is a positive linear relationship between the two variables.
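The test statistic for ρ can be evaluated the same way. The sketch below is an illustration, with `correlation_t_statistic` as a hypothetical function name and the critical value again copied from the example.

```python
import math


def correlation_t_statistic(r, n):
    """Return t = r * sqrt((n - 2) / (1 - r**2)), with n - 2 df."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Example values: n = 7, r = .96; critical t = 3.365
# (df = 5, .01 area in the right tail).
t_value = correlation_t_statistic(0.96, 7)
reject_null = t_value > 3.365  # right-tailed test of H0: rho = 0
print(f"t = {t_value:.3f}, reject H0: {reject_null}")
```

In simple linear regression this statistic is algebraically the same as b / s_b, so the gap between 7.667 here and 7.549 in the test about B comes only from rounding r and s_b before dividing.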

Regression Analysis: A Complete Example

Example: A random sample of eight drivers insured with a company and having similar auto insurance policies was selected. The following table lists their driving experiences (in years) and monthly auto insurance premiums.

Driving Experience (years)     Monthly Auto Insurance Premium
5                              $64
2                              87
12                             50
9                              71
15                             44
6                              56
25                             42
16                             60

a. Does the insurance premium depend on the driving experience or does the driving experience depend on the insurance premium? Do you expect a positive or a negative relationship between these two variables?

b. Compute $SS_{xx}$, $SS_{yy}$, and $SS_{xy}$.

c. Find the least squares regression line by choosing appropriate dependent and independent variables based on your answer in part a.

d. Interpret the meaning of the values of a and b calculated in part c.

e. Plot the scatter diagram and the regression line.

f. Calculate r and r² and explain what they mean.

g. Predict the monthly auto insurance premium for a driver with 10 years of driving experience.

h. Compute the standard deviation of errors.

i. Construct a 90% confidence interval for B.

j. Test at the 5% significance level whether B is negative.

k. Using α = .05, test whether ρ is different from zero.
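As a way to check hand calculations for the numerical parts, the quantities in parts b, c, f, g, and h can be computed with the formulas from this chapter. The following Python sketch is not a worked solution from the slides; the function name `regression_summary` is hypothetical, and it assumes, as an answer to part a, that the premium is the dependent variable y and driving experience is the independent variable x. Parts i, j, and k are left out because they need values from the t distribution table.

```python
import math

# Driving experience in years (x) and monthly premium in dollars (y).
experience = [5, 2, 12, 9, 15, 6, 25, 16]
premium = [64, 87, 50, 71, 44, 56, 42, 60]


def regression_summary(x, y):
    """Return the quantities asked for in parts b, c, f, and h."""
    n = len(x)
    ss_xx = sum(v * v for v in x) - sum(x) ** 2 / n
    ss_yy = sum(v * v for v in y) - sum(y) ** 2 / n
    ss_xy = sum(u * v for u, v in zip(x, y)) - sum(x) * sum(y) / n
    b = ss_xy / ss_xx                    # slope
    a = sum(y) / n - b * sum(x) / n      # intercept: y_bar - b * x_bar
    r = ss_xy / math.sqrt(ss_xx * ss_yy)
    s_e = math.sqrt((ss_yy - b * ss_xy) / (n - 2))
    return {"SSxx": ss_xx, "SSyy": ss_yy, "SSxy": ss_xy,
            "a": a, "b": b, "r": r, "r2": r * r, "se": s_e}


summary = regression_summary(experience, premium)
for name, value in summary.items():
    print(f"{name} = {value:.4f}")

# Part g: predicted premium for 10 years of driving experience.
predicted = summary["a"] + summary["b"] * 10
print(f"predicted premium at x = 10: {predicted:.2f}")
```

The interpretation questions (parts a, d, e, and the meaning of r and r² in part f) still have to be answered in words; the script only supplies the numbers they refer to.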
