7/30/2019 Busn210ch14 Statistics Series
Linear Regression #1: Scatter Diagram: Relationship Between 2 Variables?
Linear Regression #2: Scatter Plot with Trendline & X and Y Mean Lines
Linear Regression #1: Scatter Diagram: Relationship Between 2 Variables?
Plotting two variables: Don't use a Line Chart, use a Scatter Chart.
Plotting a point on the chart that graphs the relationship between two variables: Move along the x axis a given amount and then along the y axis a certain amount.
Independent, Predictor Variable = x
Dependent, Predicted Variable = y
Use a Scatter Diagram with proper x and y axis labels to see if there is a relationship between two variables.
Direct, Positive Relationship: As x increases, y increases
Indirect, Negative Relationship: As x increases, y decreases
No relationship: no pattern can be seen
Add a Trendline with linear equation and coefficient of determination (goodness of fit: of the total variation, how much can the model explain?)
Example 1:
Independent Variable Dependent Variable
Predictor Variable Predicted Variable
Sample Point x y
No. Time Studying (hours) Score on Test
1 3 49
2 11 87
3 2 50
4 13 89
5 8 84
6 12 79
7 13 100
8 4 57
9 7 64
10 14 98
11 7 81
12 7 68
13 14 88
14 4 45
15 4 52
16 5 15
17 12 72
18 16 97
19 12 89
20 14 87
21 2 48
22 12 92
23 11 89
24 6 52
25 11 84
26 14 94
27 10 79
28 6 59
29 10 66
30 11 97
[Chart, Example 1: scatter plot of Score on Test (y axis) vs. Time Studying (hours) (x axis)]
Example 2:
Independent Variable Dependent Variable
Predictor Variable Predicted Variable
Sample Point x y
No. Temperature (F) Sales Chicken Soup
1 86 $3,300
2 40 $8,200
3 41 $8,900
4 78 $3,100
5 71 $4,020
6 91 $1,950
7 70 $2,500
8 37 $6,500
9 65 $6,210
10 42 $5,250
11 53 $7,200
12 83 $2,750
13 63 $7,150
14 36 $7,900
15 43 $6,210
Example 3:
Independent Variable Dependent Variable
Predictor Variable Predicted Variable
Sample Point x y
No. Temperature (F) Sales Ice Cream
1 91 $7,113
2 45 $2,044
3 46 $1,108
4 83 $7,093
5 76 $3,902
6 96 $6,676
7 75 $5,403
8 42 $886
9 70 $4,740
10 47 $2,637
11 58 $3,150
[Chart, Example 2: scatter plot of Sales Chicken Soup vs. Temperature (F), with linear trendline y = -100.56x + 11436, R^2 = 0.7193]
[Chart, Example 3: scatter plot of Sales Ice Cream vs. Temperature (F), with linear trendline y = 112x - 3354.1, R^2 = 0.9056]
Example 4:
Independent Variable Dependent Variable
Predictor Variable Predicted Variable
Sample Point x y
No. Years Using Excel Expert Level (Rating 1 - 10)
1 3 5
2 8 1
3 6 9
4 11 5
5 20 3
6 7 4
7 9 10
8 3 6
9 19 10
10 2 1
11 16 2
12 12 7
13 1 6
[Chart, Example 4: scatter plot of Expert Level (Rating 1 - 10) vs. Years Using Excel, with linear trendline y = 0.0436x + 4.9156, R^2 = 0.0078]
[Chart, Example 1: linear trendline y = 4.2914x + 34.362, R^2 = 0.7266]
Linear Regression #2: Scatter Plot with Trendline & X and Y Mean Lines
1. Create a Scatter Plot with a Trendline and X and Y Mean Lines to divide the chart into four quadrants in order to further define the pattern and relationship between the two variables.
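The four-quadrant idea above can be sketched in a few lines of Python. This is only an illustration, not part of the workbook: the function name `quadrant_counts` and the quadrant labels are assumptions, and the data is the Example 2 table (temperature and chicken soup sales).

```python
# Example 2 data from the worksheet: temperature (F) and chicken soup sales ($).
temps = [86, 40, 41, 78, 71, 91, 70, 37, 65, 42, 53, 83, 63, 36, 43]
sales = [3300, 8200, 8900, 3100, 4020, 1950, 2500, 6500, 6210, 5250,
         7200, 2750, 7150, 7900, 6210]


def quadrant_counts(xs, ys):
    """Count points in each quadrant formed by the Xbar and Ybar mean lines.

    Assumption for this sketch: a point lying exactly on a mean line is
    counted as "lower" or "left" (no such point occurs in Example 2).
    """
    x_bar = sum(xs) / len(xs)
    y_bar = sum(ys) / len(ys)
    counts = {"upper_right": 0, "upper_left": 0, "lower_left": 0, "lower_right": 0}
    for x, y in zip(xs, ys):
        vertical = "upper" if y > y_bar else "lower"
        horizontal = "right" if x > x_bar else "left"
        counts[f"{vertical}_{horizontal}"] += 1
    return counts


counts = quadrant_counts(temps, sales)
print(counts)
```

When most points fall in the upper-left and lower-right quadrants, as they do for Example 2, the relationship is indirect (negative); points concentrated in the lower-left and upper-right quadrants indicate a direct (positive) relationship.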
Example 2:
Mean: 59.93333333 $5,409
Xbar line plotted from (59.93333, 0) to (59.93333, 10000); Ybar line plotted from (0, $5,409) to (100, $5,409)
Sample Point x y
No. Temperature (F) Sales Chicken Soup
1 86 $3,300
2 40 $8,200
3 41 $8,900
4 78 $3,100
5 71 $4,020
6 91 $1,950
7 70 $2,500
8 37 $6,500
9 65 $6,210
10 42 $5,250
11 53 $7,200
12 83 $2,750
13 63 $7,150
14 36 $7,900
15 43 $6,210
Example 3:
Mean: 66.27272727 $4,068
Xbar line plotted from (66.27273, 0) to (66.27273, 8000); Ybar line plotted from (0, $4,068) to (120, $4,068)
Sample Point x y
No. Temperature (F) Sales Ice Cream
1 91 $7,113
2 45 $2,044
3 46 $1,108
4 83 $7,093
5 76 $3,902
6 96 $6,676
7 75 $5,403
8 42 $886
9 70 $4,740
10 47 $2,637
11 58 $3,150
[Chart, Example 3: scatter plot of Sales Ice Cream vs. Temperature (F), with Xbar and Ybar ($4,068) mean lines and linear trendline y = 112x - 3354.1, R^2 = 0.9056]
[Chart, Example 2: scatter plot of Sales Chicken Soup vs. Temperature (F), with Xbar and Ybar ($5,409) mean lines and linear trendline y = -100.56x + 11436, R^2 = 0.7193]
Linear Regression #3: Coefficient of Correlation: Strength & Direction of Relationship
Calculate the Sample Covariance long hand to get a measure of the strength of the linear relationship.
Use the Scatter Plot with Trendline & X and Y Mean Lines to see why covariance makes sense.
Calculate the Sample Covariance using the Excel function COVARIANCE.S.
Measure Strength and Direction of Relationship with the Coefficient of Correlation.
Calculate the Coefficient of Correlation long hand to get a measure of the strength and direction of the linear relationship. This number will vary from -1 to 0 to +1 (minus one to zero to positive one) and will indicate a perfect indirect (negative) relationship when it is minus one, no relationship when it is zero, and a perfect direct relationship when it is positive one.
Reasonable positive number = Direct, Positive Relationship: As x increases, y increases
Reasonable negative number = Indirect, Negative Relationship: As x increases, y decreases
Number close to zero = No relationship: no pattern can be seen
See three charts to help visualize the three correlation situations.
Calculate the Coefficient of Correlation with the Excel functions CORREL and PEARSON.
Calculate the Sample Standard Deviation long hand to see that it is related to the Coefficient of Correlation and other Linear Regression calculations.
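The long-hand steps listed above (deviations from the means, their products, division by n - 1, then division by the two standard deviations) can be sketched in Python. This is a minimal illustration using the Example 2 data; the function names are assumptions made for this sketch and are not the Excel functions themselves.

```python
from math import sqrt

# Example 2 data from the worksheet: temperature (F) and chicken soup sales ($).
temps = [86, 40, 41, 78, 71, 91, 70, 37, 65, 42, 53, 83, 63, 36, 43]
sales = [3300, 8200, 8900, 3100, 4020, 1950, 2500, 6500, 6210, 5250,
         7200, 2750, 7150, 7900, 6210]


def sample_covariance(xs, ys):
    """Sum of (x - Xbar) * (y - Ybar), divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_sd(values):
    """Square root of the sum of squared deviations divided by n - 1."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """Sample covariance divided by the product of the two sample SDs."""
    return sample_covariance(xs, ys) / (sample_sd(xs) * sample_sd(ys))


cov_xy = sample_covariance(temps, sales)
r_xy = correlation(temps, sales)
print(cov_xy, r_xy)
```

The results should agree with the worksheet values for Example 2 (sample covariance about -37874.33, coefficient of correlation about -0.8481). The covariance carries the units of both variables, while the correlation is unit free.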
Example 2:
Mean: 59.93333 $5,409
Count 15
n - 1 14
Sample Point x y (x Deviation) (y Deviation) (x Deviation)^2 (y Deviation)^2
No. Temperature (F) Sales Chicken Soup (x - Xbar) (y - Ybar) (x - Xbar)^2 (y - Ybar)^2
1 86 $3,300 26.0666667 -2109.33333 679.4711111 4449287.111
2 40 $8,200 -19.9333333 2790.666667 397.3377778 7787820.444
3 41 $8,900 -18.9333333 3490.666667 358.4711111 12184753.78
4 78 $3,100 18.0666667 -2309.33333 326.4044444 5333020.444
5 71 $4,020 11.0666667 -1389.33333 122.4711111 1930247.111
6 91 $1,950 31.0666667 -3459.33333 965.1377778 11966987.11
7 70 $2,500 10.0666667 -2909.33333 101.3377778 8464220.444
8 37 $6,500 -22.9333333 1090.666667 525.9377778 1189553.778
9 65 $6,210 5.06666667 800.6666667 25.67111111 641067.1111
10 42 $5,250 -17.9333333 -159.333333 321.6044444 25387.11111
11 53 $7,200 -6.93333333 1790.666667 48.07111111 3206487.111
12 83 $2,750 23.0666667 -2659.33333 532.0711111 7072053.778
13 63 $7,150 3.06666667 1740.666667 9.404444444 3029920.444
14 36 $7,900 -23.9333333 2490.666667 572.8044444 6203420.444
15 43 $6,210 -16.9333333 800.6666667 286.7377778 641067.1111
Sum of Deviations 0.00 0.00
SUM Deviations^2 ====================>> 5272.933333 74125293.33
SUM Mult. Deviations =============================================>>
[Chart, Example 2: scatter plot of Sales Chicken Soup vs. Temperature (F), with linear trendline y = -100.56x + 11436, R^2 = 0.7193]
Sample SD x 19.40716608 19.40716608
Sample SD y 2301.013648 2301.013648
Sample Covariance -37874.3333 -37874.33333 -37874.33333
Coefficient of Correlation -0.84813245 -0.84813245 -0.84813245
Example 3:
Sample Point x y
No. Temperature (F) Sales Ice Cream
1 91 $7,113
2 45 $2,044
3 46 $1,108
4 83 $7,093
5 76 $3,902
6 96 $6,676
7 75 $5,403
8 42 $886
9 70 $4,740
10 47 $2,637
11 58 $3,150
Mean: 66.27273 $4,068
Sample Covariance 43143.69
Coefficient of Correlation 0.951608 Strength and Direction of the relationship
r^2 0.905558201 Coefficient of Determination = R^2 = "Goodness of fit for our line"
[Chart, Example 3: scatter plot of Sales Ice Cream vs. Temperature (F), with Xbar and Ybar mean lines and linear trendline y = 112x - 3354.1, R^2 = 0.9056]
Example 4:
Sample Point x y
No. Years Using Excel Expert Level (Rating 1 - 10)
1 3 5
2 8 1
3 6 9
4 11 5
5 20 3
6 7 4
7 9 10
8 3 6
9 19 10
10 2 1
11 16 2
12 12 7
13 1 6
Mean: 9 5.307692308
Sample Covariance
Coefficient of Correlation 0.088518
r^2 0.007835
[Chart, Example 4: scatter plot of Expert Level (Rating 1 - 10) vs. Years Using Excel, with linear trendline y = 0.0436x + 4.9156, R^2 = 0.0078]
Example 2, (x Deviation)*(y Deviation) column, (x - Xbar)*(y - Ybar), for sample points 1 to 15:
1 -54983.28889
2 -55627.28889
3 -66089.95556
4 -41721.95556
5 -15375.28889
6 -107469.9556
7 -29287.28889
8 -25012.62222
9 4056.711111
10 2857.377778
11 -12415.28889
12 -61341.95556
13 5338.044444
14 -59609.95556
15 -13557.95556
SUM Mult. Deviations -530240.6667
Coefficient of Correlation = Measures the strength and direction of a linear relationship. Does not have a problem with units. Ranges from -1 to 0 to +1. -1 = perfect indirect (negative) relationship (as x increases, y decreases). 0 = no relationship. +1 = perfect direct (positive) relationship (as x increases, y increases). Used for linear relationships only.
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
Sample Standard Deviation = Spread in the data. How fairly does the mean represent the data points?
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Sample Covariance = Measures the strength of the linear relationship between 2 variables, but has a problem with units. Note: See the 4 Quadrant Example for why this measure makes sense.
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Correlation is not causation
Linear Regression #4: Calculate Slope & Y-Intercept, Create Estimated Equation and Use It To Make Predictions
The formula for slope is derived from the expression min SUM(y observed value - y predicted value)^2 using differential calculus. See text page 667.
Calculate Slope and Y-Intercept for the Regression Line long hand.
Calculate Slope using the SLOPE function.
Calculate the Y-Intercept using the INTERCEPT function.
Slope = Rise Over Run = For every one unit of x, how far does y move?
Y-intercept = y value where x = zero = point at which the line crosses the y-axis.
Use the slope and y-intercept to create the estimated simple linear regression equation (line or model).
From sample data, the slope and y-intercept are point estimates for the population parameters for slope and y-intercept.
Use the estimated simple linear regression line to make predictions.
Be careful when making predictions with the estimated simple linear regression equation (line or model) when the x values are outside the range of the sample data. Why? Because the data may show a linear relationship over the range of sample data, but may show some other relationship outside that sampled range.
See how to use the FORECAST function to make predictions.
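The long-hand slope, y-intercept and prediction steps above can be sketched in Python with the Example 2 data. The helper names `least_squares_line`, `predict` and `in_sample_range` are hypothetical names chosen for this sketch; they stand in for the roles that SLOPE, INTERCEPT and FORECAST play in the workbook and are not their implementations.

```python
# Example 2 data from the worksheet: temperature (F) and chicken soup sales ($).
temps = [86, 40, 41, 78, 71, 91, 70, 37, 65, 42, 53, 83, 63, 36, 43]
sales = [3300, 8200, 8900, 3100, 4020, 1950, 2500, 6500, 6210, 5250,
         7200, 2750, 7150, 7900, 6210]


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the estimated simple linear regression line."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sum_products = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sum_x_squares = sum((x - x_bar) ** 2 for x in xs)
    slope = sum_products / sum_x_squares   # b1
    intercept = y_bar - slope * x_bar      # b0 = Ybar - b1 * Xbar
    return slope, intercept


def predict(x, slope, intercept):
    """Predicted y = b0 + b1 * x."""
    return intercept + slope * x


def in_sample_range(x, xs):
    """True when x lies inside the sampled x range, where the line is trusted."""
    return min(xs) <= x <= max(xs)


slope, intercept = least_squares_line(temps, sales)
predicted = predict(71, slope, intercept)
print(slope, intercept, predicted, in_sample_range(71, temps))
```

For x = 71 the prediction should match the worksheet value of about $4,296.48. A temperature such as 120 falls outside the sampled range of 36 to 91, so `in_sample_range` returns False and the prediction should be treated with caution.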
Example 2:
Mean: 59.93333 $5,409
Count 15
n - 1 14
Sample Point x y (x Deviation) (y Deviation) (x Deviation)^2
No. Temperature (F) Sales Chicken Soup (x - Xbar) (y - Ybar) (x - Xbar)^2
1 86 $3,300 26.06666667 -2109.33333 679.4711111
2 40 $8,200 -19.93333333 2790.666667 397.3377778
3 41 $8,900 -18.93333333 3490.666667 358.4711111
4 78 $3,100 18.06666667 -2309.33333 326.4044444
5 71 $4,020 11.06666667 -1389.33333 122.4711111
6 91 $1,950 31.06666667 -3459.33333 965.1377778
7 70 $2,500 10.06666667 -2909.33333 101.3377778
8 37 $6,500 -22.93333333 1090.666667 525.9377778
9 65 $6,210 5.066666667 800.6666667 25.67111111
10 42 $5,250 -17.93333333 -159.333333 321.6044444
11 53 $7,200 -6.933333333 1790.666667 48.07111111
12 83 $2,750 23.06666667 -2659.33333 532.0711111
13 63 $7,150 3.066666667 1740.666667 9.404444444
14 36 $7,900 -23.93333333 2490.666667 572.8044444
15 43 $6,210 -16.93333333 800.6666667 286.7377778
Sum of Deviations 0.00 0.00
SUM Deviations^2 ====================>> 5272.933333
[Chart, Example 2: scatter plot of Sales Chicken Soup vs. Temperature (F), with linear trendline y = -100.56x + 11436, R^2 = 0.7193]
SUM Mult. Deviations ===========================================>> -530240.6667
Sample SD x 19.40716608 19.40716608
Sample SD y 2301.013648 2301.013648
Sample Covariance -37874.3333 -37874.33333
Coefficient of Correlation -0.84813245 -0.84813245
Slope -100.558955 -100.5589552
Y-Intercept $11,436.17 11436.16671
x-value to make prediction 71
Predicted y $4,296.48 4296.480896
Equation to Predict y Predicted = $11436.17 - $100.56*x
y Predicted = 11436.17 + -100.56*x
Example 3:
Sample Point x y
No. Temperature (F) Sales Ice Cream
1 91 $7,113
2 45 $2,044
3 46 $1,108
4 83 $7,093
5 76 $3,902
6 96 $6,676
7 75 $5,403
8 42 $886
9 70 $4,740
10 47 $2,637
11 58 $3,150
Mean: 66.27273 $4,068
Sample Covariance 43143.69
Coefficient of Correlation 0.951608 Strength and Direction of the relationship (-1 to 0 to +1)
r^2 0.905558 Coefficient of Determination = R^2 = "Goodness of fit for our line" (Number between 0 and 1)
Slope 111.9981 for every one unit of x, how far does y move?
Y Intercept -3354.05 Point at which estimated regression line crosses y-axis
x 85
Predicted y 6165.782
Check: 6165.782
[Chart, Example 3: scatter plot of Sales Ice Cream vs. Temperature (F), with Xbar and Ybar mean lines and linear trendline y = 112x - 3354.1, R^2 = 0.9056]
Example 2, remaining columns for sample points 1 to 15:
(y Deviation)^2 (x Deviation)*(y Deviation)
(y - Ybar)^2 (x - Xbar)*(y - Ybar)
4449287.111 -54983.28889
7787820.444 -55627.28889
12184753.78 -66089.95556
5333020.444 -41721.95556
1930247.111 -15375.28889
11966987.11 -107469.9556
8464220.444 -29287.28889
1189553.778 -25012.62222
641067.1111 4056.711111
25387.11111 2857.377778
3206487.111 -12415.28889
7072053.778 -61341.95556
3029920.444 5338.044444
6203420.444 -59609.95556
641067.1111 -13557.95556
SUM (y Deviation)^2 74125293.33
Coefficient of Correlation = Measures the strength and direction of a linear relationship. Does not have a problem with units. Ranges from -1 to 0 to +1. -1 = perfect indirect (negative) relationship (as x increases, y decreases). 0 = no relationship. +1 = perfect direct (positive) relationship (as x increases, y increases). Used for linear relationships only.
$r_{xy} = \frac{s_{xy}}{s_x s_y}$
Correlation is not causation.
Sample Standard Deviation = Spread in the data. How fairly does the mean represent the data points?
$s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}}$
Sample Covariance = Measures the strength of the linear relationship between 2 variables, but has a problem with units. Note: See the 4 Quadrant Example for why this measure makes sense.
$s_{xy} = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$
Estimated Simple Linear Regression Equation
$\hat{y}_i = b_0 + b_1 x_i$
Model based on the proof that minimizes:
Least Squares Criterion:
$\min \sum (y_i - \hat{y}_i)^2$
or $\min \sum (y_i - b_0 - b_1 x_i)^2$
In order to get the formulas for b0 and b1:
Slope of Line (for every 1 unit of x, how much does y move?)
$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$
Y-Intercept (at what point does the line cross the y-axis?)
$b_0 = \bar{y} - b_1 \bar{x}$
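The notes say the slope formula comes from the least squares criterion "using differential calculus" without showing the steps. The following is a standard textbook-style sketch of that derivation, offered as an aid and not as the derivation printed in the course text; the symbol S for the sum being minimized is introduced here for convenience.

```latex
% Least squares criterion: choose b_0 and b_1 to minimize
S(b_0, b_1) = \sum_{i=1}^{n} (y_i - b_0 - b_1 x_i)^2

% Set the partial derivative with respect to b_0 to zero:
\frac{\partial S}{\partial b_0} = -2 \sum (y_i - b_0 - b_1 x_i) = 0
\quad\Longrightarrow\quad b_0 = \bar{y} - b_1 \bar{x}

% Set the partial derivative with respect to b_1 to zero:
\frac{\partial S}{\partial b_1} = -2 \sum x_i (y_i - b_0 - b_1 x_i) = 0

% Substitute b_0 = \bar{y} - b_1 \bar{x}:
\sum x_i (y_i - \bar{y}) - b_1 \sum x_i (x_i - \bar{x}) = 0

% Because \sum (y_i - \bar{y}) = 0 and \sum (x_i - \bar{x}) = 0,
% x_i may be replaced by (x_i - \bar{x}) in both sums:
b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}
```

For Example 2 this gives b1 = -530240.6667 / 5272.933333, about -100.559, which matches the slope shown in the worksheet.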
Linear Regression #5: Coefficient of Determination: Goodness of Fit = SSR/SST
Calculate Total Sum Of Squares (Total Y Deviations Squared) = SST = How well observations cluster around Y Bar (Y Mean Plotted Line) = Total squared deviations of y observed and Mean of Y (Ybar).
Calculate Sum of Squares Due To Error = SSE = How well observations cluster around the estimated simple linear regression equation = sum of squares of deviations between y observed and y predicted = measure of variation that is not explained by the estimated simple linear regression equation (line or model).
Calculate Sum of Squares Due To Regression = SSR = SST - SSE = sum of squares of deviations between y predicted and Mean of Y (Ybar).
The relationship between SST, SSR and SSE is: SST = SSR + SSE. When there is no error, the predicted values and the observed values would all lie on the regression line and therefore SSE would equal zero. In this case SST = SSR + 0 and SSR/SST = 1, which means perfect "goodness of fit". This means that the Coefficient of Determination will always be a number between 0 and 1. 0 = "no goodness of fit" and 1 = "perfect goodness of fit".
SSR/SST = Coefficient of Determination = R Squared = r^2
Use the RSQ function to calculate the Coefficient of Determination.
Use the Coefficient of Correlation squared to calculate the Coefficient of Determination.
The Coefficient of Determination can be used for linear and non-linear relationships. This is compared to the Coefficient of Correlation, which can only be used for linear relationships.
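The SST, SSE and SSR calculations described above can be sketched in Python for the Example 2 data. The function names are assumptions made for this illustration; the line is fitted with the same long-hand slope and intercept formulas used earlier in the workbook.

```python
# Example 2 data from the worksheet: temperature (F) and chicken soup sales ($).
temps = [86, 40, 41, 78, 71, 91, 70, 37, 65, 42, 53, 83, 63, 36, 43]
sales = [3300, 8200, 8900, 3100, 4020, 1950, 2500, 6500, 6210, 5250,
         7200, 2750, 7150, 7900, 6210]


def fit_line(xs, ys):
    """Least squares slope and intercept, computed long hand."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


def sums_of_squares(xs, ys):
    """Return (SST, SSE, SSR) for the fitted line."""
    slope, intercept = fit_line(xs, ys)
    y_bar = sum(ys) / len(ys)
    predicted = [intercept + slope * x for x in xs]
    sst = sum((y - y_bar) ** 2 for y in ys)                  # total variation
    sse = sum((y - p) ** 2 for y, p in zip(ys, predicted))   # not explained by model
    ssr = sum((p - y_bar) ** 2 for p in predicted)           # explained by model
    return sst, sse, ssr


sst, sse, ssr = sums_of_squares(temps, sales)
r_squared = ssr / sst
print(sst, sse, ssr, r_squared)
```

SST should match the worksheet's sum of squared y deviations (74125293.33), and SSR/SST should match the trendline's R^2 of about 0.7193, which is also the square of the coefficient of correlation (-0.84813245).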
Mean: Xbar 59.93333, Ybar $5,409
Slope -100.559
Intercept 11436.17
Sample Point x y Predicted y Residual Residual^2 (part of total variation not explained by model)
No. Temperature (F) Sales Chicken Soup Predicted y (y Observed - y Predicted) (y Observed - y Predicted)^2
1 86 $3,300
2 40 $8,200
3 41 $8,900
4 78 $3,100
5 71 $4,020
6 91 $1,950
7 70 $2,500
8 37 $6,500
9 65 $6,210
10 42 $5,250
11 53 $7,200
12 83 $2,750
13 63 $7,150
14 36 $7,900
15 43 $6,210
SSE
SSR
[Chart, Example 2: scatter plot of Sales Chicken Soup vs. Temperature (F), with linear trendline y = -100.56x + 11436, R^2 = 0.7193. Legend: Sales Chicken Soup; Xbar; Ybar; Observation 3 Total Variation (y3 - Ybar); Residual (y3 - y Predicted); Explained Part of Total Variation (y Predicted - Ybar); Linear (Sales Chicken Soup)]
SSR + SSE = SST
Coefficient of Determination = r^2 = Measure of goodness of fit = SSR/SST: goodness of fit of model to observed values (number between 0 and 1)
Check:
Coefficient of Correlation: Strength and Direction (number between -1 and 1)
r^2 = SSR/SST
Proportion of the variability in the dependent variable y that is explained by the estimated regression equation
How well does the estimated regression line fit the data?
Measure of the goodness of fit for the estimated regression line
A number between 0 and +1
Can be used for nonlinear relationships as well as linear.
How closely are the observations grouped about the least squares line? 1 = perfectly. 0 = Not at all.
Example 3:
Sample Point x y
No. Temperature (F) Sales Ice Cream
1 91 $7,113
2 45 $2,044
3 46 $1,108
4 83 $7,093
5 76 $3,902
6 96 $6,676
7 75 $5,403
8 42 $886
9 70 $4,740
10 47 $2,637
11 58 $3,150
Mean: 66.27273 4068.363636
Slope 111.9981 for every one unit of x, how far does y move?
[Chart, Example 3: scatter plot of Sales Ice Cream vs. Temperature (F), with Xbar and Ybar mean lines and linear trendline y = 112x - 3354.1, R^2 = 0.9056]
Y Intercept -3354.05 Point at which estimated regression line crosses y-axis
x 75
Predicted y 5045.801
Coefficient of Correlation 0.951608 Strength and Direction of the relationship (-1 to 0 to +1)
r^2 Coefficient of Determination = R^2 = "Goodness of fit for our line" (Number between 0 and 1)
Proportion of the variability in the dependent variable y that is explained by the estimated regression equation
How well does the estimated regression line fit the data?
Can be used for nonlinear relationships as well as linear.
Linear Regression #5: Coefficient of Determination: Goodness of Fit = SSR/SST
Calculate Total Sum Of Squares (Total Y Deviations Squared) = SST = How well observations cluster around Y Bar (Y Mean Plotted Line) = Total squared deviations of y observed and Mean of Y (Ybar)
Calculate Sum of Squares Due To Error = SSE = How well observations cluster around estimated simple linear regression equation = sum of squares of deviations between y observed and y predicted = measure of variation that is not explained by the estimated simple linear regression equation (line or model).
Calculate Sum of Squares Due To Regression = SSR = SST - SSE = sum of squares of deviations between y predicted and Mean of Y (Ybar)
Relationship between SST, SSR and SSE is: SST = SSR + SSE. When there is no error, the predicted values and the observed values would all lie on the regression line and therefore SSE would equal zero. In this case SST = SSR + 0 and SSR/SST = 1, which means perfect "goodness of fit". This means that the Coefficient of Determination will always be a number between 0 and 1. 0 = "no goodness of fit" and 1 = "perfect goodness of fit".
SSR/SST = Coefficient of Determination = R Squared = r^2
Use RSQ function to calculate Coefficient of Determination
Use Coefficient of Correlation Squared to calculate Coefficient of Determination
Coefficient of Determination can be used for linear and non-linear relationships. This is compared to Coefficient of Correlation, which can only be used for linear relationships.
Xbar Ybar
Mean 59.93333 $5,409
Slope -100.559
Intercept 11436.17
Residual^2 = Part of Total Variation Not explained by model
Sample Point No. | x = Temperature (F) | y = Sales Chicken Soup | Predicted y | Residual = (y Observed - y Predicted) | Residual^2 = (y Observed - y Predicted)^2
1 86 $3,300 2788.096569 511.9034314 262045.123
2 40 $8,200 7413.808506 786.1914937 618097.0647
3 41 $8,900 7313.249551 1586.750449 2517776.987
4 78 $3,100 3592.56821 -492.56821 242623.4415
5 71 $4,020 4296.480896 -276.4808961 76441.68594
6 91 $1,950 2285.301793 -335.3017928 112427.2923
7 70 $2,500 4397.039851 -1897.039851 3598760.197
8 37 $6,500 7715.485372 -1215.485372 1477404.689
9 65 $6,210 4899.834627 1310.165373 1716533.304
10 42 $5,250 7212.690596 -1962.690596 3852154.376
11 53 $7,200 6106.542089 1093.457911 1195650.203
12 83 $2,750 3089.773434 -339.7734341 115445.9865
13 63 $7,150 5100.952537 2049.047463 4198595.504
14 36 $7,900 7816.044327 83.955673 7048.555028
15 43 $6,210 7112.131641 -902.1316408 813841.4974
SSE 20804845.91
SSR 53320447.43
SSR + SSE = SST 74125293.33
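The Predicted y, Residual and SSE columns of the chicken soup table can be reproduced outside the spreadsheet. The following is a minimal sketch in plain Python, assuming the x and y values are copied from the table as shown; the variable names are illustrative and are not part of the workbook.

```python
# Temperature (F) and Sales Chicken Soup values copied from the table.
temps = [86, 40, 41, 78, 71, 91, 70, 37, 65, 42, 53, 83, 63, 36, 43]
sales = [3300, 8200, 8900, 3100, 4020, 1950, 2500, 6500, 6210, 5250,
         7200, 2750, 7150, 7900, 6210]

n = len(temps)
x_bar = sum(temps) / n   # Xbar
y_bar = sum(sales) / n   # Ybar

# Least squares slope and intercept (what Excel's SLOPE and INTERCEPT return).
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, sales))
s_xx = sum((x - x_bar) ** 2 for x in temps)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Predicted y, residual (y Observed - y Predicted) and squared residual per row.
predicted = [intercept + slope * x for x in temps]
residuals = [y - p for y, p in zip(sales, predicted)]
sse = sum(r ** 2 for r in residuals)  # Sum of Squares Due To Error

print(f"slope={slope:.3f} intercept={intercept:.2f} SSE={sse:.2f}")
```

Because the line is fitted by least squares, the residuals sum to approximately zero, which is a quick way to check the Residual column.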
[Chart: Sales Chicken Soup vs. Temperature. Series: Sales Chicken Soup, Xbar, Ybar, Observation 3 Total Variation (y3 - Ybar), Residual, Explained Part of Total Variation, Linear (Sales Chicken Soup)]
Coefficient of Determination = r^2 = Measure of goodness of fit = r^2 = SSR/SST
Check:
Coefficient of Correlation
r^2 = SSR/SST
Proportion of the variability in the dependent variable y that is explained by the estimated regression equation
How well does the estimated regression line fit the data?
Measure of the goodness of fit for the estimated regression line
A number between 0 and +1
Can be used for nonlinear relationships as well as linear.
How closely are the observations grouped about the least squares line? 1 = perfectly. 0 = Not at all.
Example 3:
Sample Point No. | x = Temperature (F) | y = Sales Ice Cream
1 91 $7,113
2 45 $2,044
3 46 $1,108
4 83 $7,093
5 76 $3,902
6 96 $6,676
7 75 $5,403
8 42 $886
9 70 $4,740
10 47 $2,637
11 58 $3,150
Mean: 66.27273 4068.363636
Xbar line (chart series): x = 66.27272727, from y = 0 to y = 8000
Slope 111.9981 for every one unit of x, how far does y move?
Y Intercept -3354.05 Point at which estimated regression line crosses y-axis
x 75
Predicted y 5045.801
Coefficient of Correlation 0.951608 Strength and Direction of the relationship (-1 to 0 to +1)
r^2 0.905558 Coefficient of Determination = R^2 = "Goodness of fit for our line" (Number between 0 and 1)
Proportion of the variability in the dependent variable y that is explained by the estimated regression equation
How well does the estimated regression line fit the data?
Can be used for nonlinear relationships as well as linear.
[Chart: Sales Ice Cream vs. Temperature (F). Series: Sales Ice Cream, Xbar. Trendline: y = 112x - 3354.1, R^2 = 0.9056]
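The Example 3 figures (slope, intercept, predicted y at x = 75, coefficient of correlation, r^2) can be checked with a short script. This is a sketch in plain Python, assuming the eleven ice cream data points are copied as listed; in the workbook these values would come from functions such as SLOPE, INTERCEPT, CORREL and RSQ, and the variable names below are illustrative only.

```python
import math

# Temperature (F) and Sales Ice Cream values copied from Example 3.
temps = [91, 45, 46, 83, 76, 96, 75, 42, 70, 47, 58]
sales = [7113, 2044, 1108, 7093, 3902, 6676, 5403, 886, 4740, 2637, 3150]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(sales) / n

s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, sales))
s_xx = sum((x - x_bar) ** 2 for x in temps)
s_yy = sum((y - y_bar) ** 2 for y in sales)

slope = s_xy / s_xx                   # change in y for one unit of x
intercept = y_bar - slope * x_bar     # where the line crosses the y-axis
predicted_y = intercept + slope * 75  # predicted sales at 75 F

r = s_xy / math.sqrt(s_xx * s_yy)     # coefficient of correlation
r_squared = r ** 2                    # coefficient of determination

print(f"slope={slope:.4f} intercept={intercept:.2f}")
print(f"predicted y at x=75: {predicted_y:.3f}")
print(f"r={r:.6f} r^2={r_squared:.6f}")
```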
Part of Total Variation Explained by Model = (Predicted y - Ybar)^2 | (y Deviations)^2 = (y Observed - Ybar)^2
6870882.177 4449287.111
4017920.719 7787820.444
3624896.965 12184753.78
3300635.513 5333020.444
1238440.547 1930247.111
9759573.066 11966987.11
1024738.094 8464220.444
5318337.225 1189553.778
259588.9316 641067.1111
3252097.417 25387.11111
486100.0492 3206487.111
5380358.126 7072053.778
95098.71525 3029920.444
5792257.807 6203420.444
2899522.076 641067.1111
SST = Total Variation 74125293.33
SSR 53320447.43
SSR + SSE = SST 74125293.33
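ith residual = observed value - predicted value = represents error in using ŷ_i to predict y_i = (y_i - ŷ_i)
Sum Of Squares Due To Error (in model) = How well observations cluster around the Estimated Line = SSE = Not Explained Part of SST
SSE = Σ(y_i - ŷ_i)^2
Total Sum Of Squares (Deviation from Mean) = How well the observations cluster around the Ybar Line = SST
SST = Σ(y_i - ȳ)^2
Sum Of Squares Due To Regression (Predicted y minus Y bar) = Explained Part of SST = SSR
SSR = Σ(ŷ_i - ȳ)^2
Relationship between three:
SST = SSR + SSE
If there is no deviation between the observed values and the model values, SSE = 0 and SSR = SST, thus:
SSR/SST = 1 = perfect Prediction.
Coefficient of Determination = How well does the estimated regression line fit the data? = Measure of the goodness of fit for the estimated regression line = A number between 0 and +1. Can be used for nonlinear relationships as well as linear.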
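The relationship SST = SSR + SSE and the three routes to the Coefficient of Determination can be verified numerically. The sketch below uses plain Python on the chicken soup data, assuming the values are copied from the table; the names are illustrative, and squaring the correlation stands in for the workbook's "Coefficient of Correlation Squared" check.

```python
import math

# Temperature (F) and Sales Chicken Soup values copied from the table.
temps = [86, 40, 41, 78, 71, 91, 70, 37, 65, 42, 53, 83, 63, 36, 43]
sales = [3300, 8200, 8900, 3100, 4020, 1950, 2500, 6500, 6210, 5250,
         7200, 2750, 7150, 7900, 6210]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, sales))
s_xx = sum((x - x_bar) ** 2 for x in temps)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar
predicted = [intercept + slope * x for x in temps]

# The three sums of squares.
sst = sum((y - y_bar) ** 2 for y in sales)                 # total variation
ssr = sum((p - y_bar) ** 2 for p in predicted)             # explained by model
sse = sum((y - p) ** 2 for y, p in zip(sales, predicted))  # not explained

# Coefficient of determination computed three ways.
r_squared = ssr / sst
r_squared_from_sse = 1 - sse / sst
r = s_xy / math.sqrt(s_xx * sst)  # coefficient of correlation
r_squared_from_r = r ** 2

print(f"SST={sst:.2f} SSR={ssr:.2f} SSE={sse:.2f}")
print(f"r^2={r_squared:.9f} r={r:.8f}")
```

All three r^2 values agree for a least squares line, while r keeps the sign of the slope (negative here), which r^2 alone does not show.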
Trendline (Sales Chicken Soup chart): y = -100.56x + 11436, R^2 = 0.7193
Coefficient of Determination = r^2 = SSR/SST: 0.719328653 Goodness of fit of model to observed values (number between 0 and 1)
Check: 0.719328653
Coefficient of Correlation: -0.84813245 Strength and Direction (number between -1 and 1)
r^2 = SSR/SST: 0.719328653
x Ybar
0 $4,068
120 $4,068
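r_xy^2 = r^2 = SSR/SST
= The percentage of total sum of squares that can be explained by using the estimated regression equation = Proportion of the variability in the dependent variable y that is explained by the estimated regression equation. The closer r^2 is to 1, the more closely the observations are grouped about the least squares line.
Using r^2 only, we can draw no conclusion about whether the relationship between x and y is statistically significant. Such conclusions must be based on considerations that involve sample size and properties of the appropriate sampling distributions of the least squares estimators.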
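One common way to bring sample size into the picture is a t test on the slope. The sketch below is an illustration only and is not part of the workbook: it computes the t statistic for the Example 3 ice cream data in plain Python and compares it with a critical value of 2.262, which is an assumed two-tailed 5% value for n - 2 = 9 degrees of freedom taken from a standard t table.

```python
import math

# Temperature (F) and Sales Ice Cream values copied from Example 3.
temps = [91, 45, 46, 83, 76, 96, 75, 42, 70, 47, 58]
sales = [7113, 2044, 1108, 7093, 3902, 6676, 5403, 886, 4740, 2637, 3150]

n = len(temps)
x_bar = sum(temps) / n
y_bar = sum(sales) / n
s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(temps, sales))
s_xx = sum((x - x_bar) ** 2 for x in temps)
slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

# Standard error of the estimate uses n - 2 degrees of freedom.
sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(temps, sales))
std_error = math.sqrt(sse / (n - 2))

# Standard error of the slope, then the t statistic for H0: slope = 0.
slope_std_error = std_error / math.sqrt(s_xx)
t_stat = slope / slope_std_error

# Assumption: two-tailed critical value, alpha = 0.05, df = 9 (t table).
T_CRITICAL = 2.262
significant = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.2f}, df = {n - 2}, significant at 5%: {significant}")
```

The same r^2 with a much smaller sample would give a smaller t statistic, which is why r^2 alone does not settle significance.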
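Chart helper series (Sales Chicken Soup chart), each drawn as a line between two points (x, y):
Xbar line: y from 1 to 10000
Ybar line: (0, $5,409) to (100, $5,409)
Observation 3 Total Variation (y3 - Ybar): (41, $8,900) to (41, $5,409)
Observation 3 Residual (y3 Observed - y3 Predicted): (42, $8,900) to (42, 7313.249551)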
Explained Part of Total Variation (Y Predicted - Ybar): (42, 7313.25) to (42, $5,409)