ESTIMATED SLOPE

b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = \frac{\sum (x_i - \bar{x})\,y_i}{\sum (x_i - \bar{x})^2} = \sum k_i y_i, \qquad k_i = \frac{x_i - \bar{x}}{\sum (x_j - \bar{x})^2}

Reason: inspecting the numerator of b_1,

\sum (x_i - \bar{x})(y_i - \bar{y}) = \sum (x_i - \bar{x})\,y_i - \bar{y}\sum (x_i - \bar{x}) = \sum (x_i - \bar{x})\,y_i

because \sum (x_i - \bar{x}) = 0.
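As a numerical check (a Python sketch, not part of the original notes), the slope formula can be applied to the Toluca Company data listed later in these notes; the results match the Excel output shown in Example #3.

```python
# Least-squares slope and intercept from first principles, using the
# Toluca Company data (lot size x, work hours y) from later in these notes.
x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160,
     252, 389, 113, 435, 420, 212, 268, 377, 421, 273, 468,
     244, 342, 323]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# b1 = sum((xi - x_bar)(yi - y_bar)) / sum((xi - x_bar)^2)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
print(round(b1, 4), round(b0, 4))  # → 3.5702 62.3659
```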
SAMPLING DISTRIBUTION

Under the "Normal Error Regression Model":

Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \in N(0, \sigma^2)

The sampling distribution of the estimated slope b_1 is Normal with Mean and Variance:

E(b_1) = \beta_1, \qquad \sigma^2(b_1) = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}
The sampling distribution of the estimated slope, b1, is “normal” because b1 is a linear combination of the observations yi and the distribution of each observation is normal under the “normal error regression model”.
b_1 = \frac{\sum (x_i - \bar{x})\,y_i}{\sum (x_i - \bar{x})^2} = \sum k_i y_i, \qquad k_i = \frac{x_i - \bar{x}}{\sum (x_j - \bar{x})^2}

We can show that:

\sum k_i = 0, \qquad \sum k_i x_i = 1, \qquad \sum k_i^2 = \frac{1}{\sum (x_i - \bar{x})^2}

Indeed,

\sum k_i = \frac{\sum (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} = 0

\sum k_i x_i = \frac{\sum (x_i - \bar{x})\,x_i}{\sum (x_i - \bar{x})^2} = 1, \quad \text{since } \sum (x_i - \bar{x})\,x_i = \sum (x_i - \bar{x})(x_i - \bar{x}) + \bar{x}\sum (x_i - \bar{x}) = \sum (x_i - \bar{x})^2
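The three identities for the weights k_i can be verified numerically; this Python sketch (not part of the notes) uses the Toluca lot sizes from later in these notes.

```python
# Numerical check of the identities sum(ki) = 0, sum(ki*xi) = 1,
# and sum(ki^2) = 1 / sum((xi - x_bar)^2) for ki = (xi - x_bar) / Sxx.
x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
k = [(xi - x_bar) / sxx for xi in x]

sum_k = sum(k)                                   # should be 0
sum_kx = sum(ki * xi for ki, xi in zip(k, x))    # should be 1
sum_k2 = sum(ki ** 2 for ki in k)                # should be 1 / sxx
print(sum_k, sum_kx, sum_k2 - 1 / sxx)
```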
Next step:
b1 is an UNBIASED ESTIMATOR

b_1 = \sum k_i y_i, \qquad k_i = \frac{x_i - \bar{x}}{\sum (x_j - \bar{x})^2}

E(b_1) = \sum k_i E(y_i) = \sum k_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum k_i + \beta_1 \sum k_i x_i = \beta_1

We have:

\sum k_i = 0, \qquad \sum k_i x_i = 1, \qquad \sum k_i^2 = \frac{1}{\sum (x_i - \bar{x})^2}
VARIANCE & STANDARD ERROR

b_1 = \sum k_i y_i

Var(b_1) = \sum k_i^2\,Var(y_i) = \sigma^2 \sum k_i^2 = \frac{\sigma^2}{\sum (x_i - \bar{x})^2}

Estimating \sigma^2 by MSE:

s^2(b_1) = \frac{MSE}{\sum (x_i - \bar{x})^2}, \qquad SE(b_1) = s(b_1) = \sqrt{\frac{MSE}{\sum (x_i - \bar{x})^2}}
Design Implication: The larger the sum of squares of X, the more precise the estimate of the Slope.
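A Python sketch (not part of the notes) of the standard-error formula, applied to the Toluca Company data from later in these notes; the result matches the Excel standard error shown there for "X Variable 1".

```python
# SE(b1) = sqrt(MSE / sum((xi - x_bar)^2)) on the Toluca Company data.
x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160,
     252, 389, 113, 435, 420, 212, 268, 377, 421, 273, 468,
     244, 342, 323]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar

# MSE = SSE / (n - 2): two parameters (intercept, slope) are estimated
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)
se_b1 = (mse / sxx) ** 0.5
print(round(se_b1, 4))  # → 0.347
```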
MORE ON SAMPLING DISTRIBUTION

\frac{b_1 - \beta_1}{s(b_1)} = \frac{b_1 - \beta_1}{\sigma(b_1)} \div \frac{s(b_1)}{\sigma(b_1)}

The first factor is distributed as N(0, 1); the second involves (n-2)\,MSE/\sigma^2, which is \chi^2-distributed with df = n - 2.

Theorem: \frac{b_1 - \beta_1}{s(b_1)} is distributed as "t" with (n - 2) degrees of freedom.
CONFIDENCE INTERVALS

Theorem: \frac{b_1 - \beta_1}{s(b_1)} is distributed as "t" with (n - 2) degrees of freedom.

A 100(1 - \alpha)% Confidence Interval for \beta_1 is:

b_1 \pm t(1 - \alpha/2;\, n - 2)\, s(b_1)

where t(1 - \alpha/2; n - 2) is the 100(1 - \alpha/2) percentile of the "t" distribution with (n - 2) degrees of freedom.
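The interval can be computed directly; this Python sketch (not part of the notes) uses the Toluca Company data from later in these notes, taking the quantile t(0.975; 23) = 2.0687 from Example #3 rather than computing it.

```python
# 95% confidence interval b1 ± t(0.975; n-2) * s(b1), Toluca data.
x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160,
     252, 389, 113, 435, 420, 212, 268, 377, 421, 273, 468,
     244, 342, 323]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
se_b1 = (sse / (n - 2) / sxx) ** 0.5

t_crit = 2.0687  # t(0.975; 23), taken as given from Example #3
lo, hi = b1 - t_crit * se_b1, b1 + t_crit * se_b1
print(round(lo, 3), round(hi, 3))  # → 2.852 4.288
```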
THE TEST FOR INDEPENDENCE

The Mean Response: E(Y \mid X = x) = \beta_0 + \beta_1 x

H_0: \beta_1 = 0

t = \frac{b_1}{s(b_1)}, a "t" test at (n - 2) degrees of freedom,

which is identical to the test using "r":

t = \frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}

Theorem: \frac{b_1 - \beta_1}{s(b_1)} is distributed as "t" with (n - 2) degrees of freedom.
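The equivalence of the slope-based and correlation-based statistics can be checked numerically; a Python sketch (not part of the notes) on the Toluca Company data from later in these notes:

```python
import math

# t = b1 / s(b1) equals r * sqrt(n-2) / sqrt(1 - r^2); Toluca data.
x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160,
     252, 389, 113, 435, 420, 212, 268, 377, 421, 273, 468,
     244, 342, 323]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
se_b1 = (sse / (n - 2) / sxx) ** 0.5
t_slope = b1 / se_b1

r = sxy / math.sqrt(sxx * syy)
t_r = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
print(round(t_slope, 2), round(t_r, 2))  # → 10.29 10.29
```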
STATISTICAL POWER

The "power", which is (1 – probability of Type II error), of the t-test for independence can be obtained, if needed, from Appendix B5 of the text.

H_0: Slope = 0 \quad vs. \quad H_A: Slope = \beta_1
STATISTICAL POWER

H_0: \beta_1 = 0

t = \frac{b_1}{s(b_1)}, \qquad df = n - 2

Power = \Pr\big(|t| > t(1 - \alpha/2;\, n - 2) \,\big|\, \delta\big)

where \delta = \frac{|\beta_1|}{\sigma(b_1)} is the noncentrality measure, a standardized measure of how far the true value of the slope is from zero (H_A).
More details on pages 50-51 of your textbook
For practical applications, you are unlikely to have to deal with this issue of statistical power. When and if you do, specifying an alternative hypothesis is not easy.
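Though rarely needed in practice, the power calculation can be sketched by simulation. This Python sketch (not part of the notes) simulates the normal error regression model at a hypothetical true slope and counts how often the t-test at alpha = 0.05 rejects H0; the design (the Toluca x values), beta0 = 62, beta1 = 1.0, and sigma = 35 are illustrative assumptions, not values from the notes.

```python
import random

# Monte Carlo power of the t-test for H0: beta1 = 0 (illustrative setup).
random.seed(1)
x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
n = len(x)
x_bar = sum(x) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
beta0, beta1, sigma = 62.0, 1.0, 35.0  # hypothetical true parameters
t_crit = 2.0687                        # t(0.975; 23), from Example #3

reject = 0
n_sim = 2000
for _ in range(n_sim):
    y = [beta0 + beta1 * xi + random.gauss(0, sigma) for xi in x]
    y_bar = sum(y) / n
    b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
    se_b1 = (sse / (n - 2) / sxx) ** 0.5
    if abs(b1 / se_b1) > t_crit:
        reject += 1

power = reject / n_sim
print(power)  # close to 1 here: delta = beta1/sigma(b1) is about 4
```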
Excel: REGRESSION LINE

Steps: (a) Click on the new chart (scatter diagram) to make it active; (b) click Chart (on the top-row menu); (c) a box appears that lets you choose "Add Trendline".
Excel: ANALYSIS

(1) Click Tools, then (2) Data Analysis; among the functions available, choose Regression. A box appears; use the cursor to fill in the ranges of Y and the X's. The results include all items needed: regression estimates of the coefficients, their standard errors, and their 95% confidence intervals. And much more.
BASIC "SAS" PROGRAM

data tc;
  input x y;
  label x = 'Lot Size'
        y = 'Work Hrs';
  cards;
80 399
30 121
50 221
90 376
70 361
60 224
120 546
80 352
100 353
50 157
40 160
70 252
90 389
20 113
110 435
100 420
30 212
50 268
90 377
110 421
30 273
90 468
40 244
80 342
70 323
;
proc REG data = tc;
  model y = x;
  plot y*x;
run;

(Toluca Company: the data lines go between "cards;" and the closing semicolon.)
EXAMPLE #1: Birth weight data

x (oz)  y (%)
112     63
111     66
107     72
119     52
 92     75
 80     118
 81     120
 84     114
118     42
106     72
103     90
 94     91

              Coefficients  Standard Error  t Stat     P-value      Lower 95%     Upper 95%
Intercept     255.971912    19.04537379     13.44011   9.99506E-08  213.536175    298.4076492
X Variable 1  -1.7370861    0.187689258     -9.255117  3.21622E-06  -2.155283843  -1.31888839
t = -1.7371 / .1877 = -9.255

95% C.I. = -1.7371 ± (2.2281)(.1877) = (-2.155, -1.319)
EXAMPLE #2: Age and SBP

Age (x)  SBP (y)
42       130
46       115
42       148
71       100
80       156
74       162
70       151
80       156
85       162
72       158
64       155
81       160
41       125
61       150
75       165

              Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%
Intercept     99.9585145    19.25516927     5.191256  0.000174  58.3602504  141.556779
X Variable 1  0.70490069    0.286078656     2.46401   0.028454  0.08686533  1.32293605

t = .7049 / .2861 = 2.464

95% C.I. = .7049 ± (2.1604)(.2861) = (.087, 1.323)
EXAMPLE #3: Toluca Company Data (Description on page 19 of Text)

LotSize  WorkHours
80       399
30       121
50       221
90       376
70       361
60       224
120      546
80       352
100      353
50       157
40       160
70       252
90       389
20       113
110      435
100      420
30       212
50       268
90       377
110      421
30       273
90       468
40       244
80       342
70       323

              Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%
Intercept     62.3658586    26.17743389     2.382428  0.025851  8.21371106  116.518006
X Variable 1  3.57020202    0.346972157     10.28959  4.45E-10  2.85243543  4.28796861

t = 3.5702 / .3470 = 10.290

95% C.I. = 3.5702 ± (2.0687)(.3470) = (2.852, 4.288)
UNBIASED LINEAR ESTIMATORS
We consider the family of all unbiased linear estimators and prove that b1 is a member of this family with minimum variance:
Consider a general linear estimator (b_1 = \sum k_i y_i is one such estimator):

\hat{\beta}_1 = \sum c_i y_i

E(\hat{\beta}_1) = \sum c_i E(y_i) = \sum c_i (\beta_0 + \beta_1 x_i) = \beta_0 \sum c_i + \beta_1 \sum c_i x_i

Conditions for unbiasedness:

\sum c_i = 0, \qquad \sum c_i x_i = 1
b1 satisfies these two conditions with ci = ki
Write c_i = k_i + d_i. We want to prove that \sum k_i d_i = 0:

\sum k_i d_i = \sum k_i (c_i - k_i) = \sum c_i k_i - \sum k_i^2
= \frac{\sum c_i (x_i - \bar{x})}{\sum (x_i - \bar{x})^2} - \frac{1}{\sum (x_i - \bar{x})^2}
= \frac{\sum c_i x_i - \bar{x}\sum c_i}{\sum (x_i - \bar{x})^2} - \frac{1}{\sum (x_i - \bar{x})^2}
= \frac{1 - \bar{x}(0)}{\sum (x_i - \bar{x})^2} - \frac{1}{\sum (x_i - \bar{x})^2} = 0
\sigma^2(\hat{\beta}_1) = \sum c_i^2\,\sigma^2(y_i) = \sigma^2 \sum c_i^2 = \sigma^2 \sum (k_i + d_i)^2
= \sigma^2 \left[ \sum k_i^2 + \sum d_i^2 + 2\sum k_i d_i \right]
= \sigma^2 \sum k_i^2 + \sigma^2 \sum d_i^2 + 2\sigma^2(0)
= \sigma^2(b_1) + \sigma^2 \sum d_i^2
\geq \sigma^2(b_1)
b1 is the member of the family with minimum variance.
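The variance comparison can be illustrated numerically. This Python sketch (not part of the notes) pits b1 against one concrete competitor: a "two-point" linear unbiased estimator (an illustrative choice of c_i, not from the notes) that puts all its weight on the observations at the largest and smallest x, using the Toluca x values from later in these notes.

```python
# Gauss-Markov illustration: a competing linear unbiased estimator
# has larger variance than b1 = sum(ki * yi).
x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
x_bar = sum(x) / len(x)
sxx = sum((xi - x_bar) ** 2 for xi in x)
k = [(xi - x_bar) / sxx for xi in x]

# Competitor: weight 1/(x_max - x_min) on the largest-x observation,
# the negative of that on the smallest-x one, zero elsewhere.
i_hi, i_lo = x.index(max(x)), x.index(min(x))
c = [0.0] * len(x)
c[i_hi] = 1.0 / (x[i_hi] - x[i_lo])
c[i_lo] = -1.0 / (x[i_hi] - x[i_lo])

# Both satisfy sum(ci) = 0 and sum(ci * xi) = 1 (unbiasedness).
# Variances are sigma^2 * sum(weights^2); the common sigma^2 cancels.
var_b1 = sum(ki ** 2 for ki in k)   # = 1 / sum((xi - x_bar)^2)
var_alt = sum(ci ** 2 for ci in c)
print(var_alt > var_b1)  # → True
```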
SAMPLING DISTRIBUTION

Under the "Normal Error Regression Model":

Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \in N(0, \sigma^2)

The sampling distribution of the estimated intercept b_0 is Normal with Mean and Variance:

E(b_0) = \beta_0, \qquad \sigma^2(b_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right]
The sampling distribution of b0 is “normal” because b0, like b1, is a linear combination of the observations yi and the distribution of each observation is normal under the “normal error regression model”:
b_0 = \bar{y} - b_1 \bar{x}

E(b_0) = E(\bar{y}) - \bar{x}\,E(b_1)
= \frac{1}{n}\sum E(y_i) - \bar{x}\beta_1
= \frac{1}{n}\sum (\beta_0 + \beta_1 x_i) - \beta_1 \bar{x}
= \beta_0 + \beta_1 \bar{x} - \beta_1 \bar{x}
= \beta_0
b0 is an UNBIASED ESTIMATOR
b_0 = \bar{y} - b_1 \bar{x}

Var(b_0) = Var(\bar{y}) + \bar{x}^2\,Var(b_1) \quad (\bar{y} \text{ and } b_1 \text{ are uncorrelated})
= \frac{\sigma^2}{n} + \bar{x}^2 \frac{\sigma^2}{\sum (x_i - \bar{x})^2}
= \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right]
VARIANCE & STANDARD ERROR
\sigma^2(b_0) = \sigma^2 \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right]

s^2(b_0) = MSE \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right]

SE(b_0) = \sqrt{MSE \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right]}
DESIGN IMPLICATION

SE(b_1) = \sqrt{\frac{MSE}{\sum (x_i - \bar{x})^2}}

SE(b_0) = \sqrt{MSE \left[ \frac{1}{n} + \frac{\bar{x}^2}{\sum (x_i - \bar{x})^2} \right]}
These Standard Errors, for given n, are affected by the spacing of the X’s levels in the data. The larger the sum of squares of X, the more precise the estimates of the Slope and the Intercept.
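The intercept's standard error can be checked the same way; a Python sketch (not part of the notes) on the Toluca Company data from later in these notes, matching the Excel intercept standard error.

```python
# SE(b0) = sqrt(MSE * (1/n + x_bar^2 / sum((xi - x_bar)^2))), Toluca data.
x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160,
     252, 389, 113, 435, 420, 212, 268, 377, 421, 273, 468,
     244, 342, 323]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
mse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2)

se_b0 = (mse * (1 / n + x_bar ** 2 / sxx)) ** 0.5
print(round(se_b0, 4))  # → 26.1774
```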
MORE ON SAMPLING DISTRIBUTION

\frac{b_0 - \beta_0}{s(b_0)} = \frac{b_0 - \beta_0}{\sigma(b_0)} \div \frac{s(b_0)}{\sigma(b_0)}

The first factor is distributed as N(0, 1); the second involves (n-2)\,MSE/\sigma^2, which is \chi^2-distributed with df = n - 2.

Theorem: \frac{b_0 - \beta_0}{s(b_0)} is distributed as "t" with (n - 2) degrees of freedom.
CONFIDENCE INTERVALS

Theorem: \frac{b_0 - \beta_0}{s(b_0)} is distributed as "t" with (n - 2) degrees of freedom.

A 100(1 - \alpha)% Confidence Interval for \beta_0 is:

b_0 \pm t(1 - \alpha/2;\, n - 2)\, s(b_0)

where t(1 - \alpha/2; n - 2) is the 100(1 - \alpha/2) percentile of the "t" distribution with (n - 2) degrees of freedom.
EXAMPLE #1: Birth weight data (intercept)

              Coefficients  Standard Error  t Stat     P-value      Lower 95%     Upper 95%
Intercept     255.971912    19.04537379     13.44011   9.99506E-08  213.536175    298.4076492
X Variable 1  -1.7370861    0.187689258     -9.255117  3.21622E-06  -2.155283843  -1.31888839

t = 255.9719 / 19.0454 = 13.440

95% C.I. = 255.9719 ± (2.2281)(19.0454) = (213.536, 298.408)
EXAMPLE #2: Age and SBP (intercept)

              Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%
Intercept     99.9585145    19.25516927     5.191256  0.000174  58.3602504  141.556779
X Variable 1  0.70490069    0.286078656     2.46401   0.028454  0.08686533  1.32293605

t = 99.9585 / 19.2552 = 5.191

95% C.I. = 99.9585 ± (2.1604)(19.2552) = (58.360, 141.557)
EXAMPLE #3: Toluca Company Data (intercept)

              Coefficients  Standard Error  t Stat    P-value   Lower 95%   Upper 95%
Intercept     62.3658586    26.17743389     2.382428  0.025851  8.21371106  116.518006
X Variable 1  3.57020202    0.346972157     10.28959  4.45E-10  2.85243543  4.28796861

t = 62.3659 / 26.1774 = 2.382

95% C.I. = 62.3659 ± (2.0687)(26.1774) = (8.218, 116.518)
DEPARTURE FROM NORMALITY

Normal Error Regression Model:

Y = \beta_0 + \beta_1 x + \varepsilon, \qquad \varepsilon \in N(0, \sigma^2)
If the probability distributions of Y are not exactly normal but do not depart seriously from normality, the sampling distributions of b0 and b1 would still be approximately normal, with very little effect on the level of significance of the t-test for independence and the coverage of the confidence intervals. Even if the probability distributions of Y are far from normal, the effects are still minimal provided that the sample sizes are sufficiently large; i.e., the sampling distributions of b0 and b1 are asymptotically normal.
Sometimes it is known, a priori, that the true intercept is zero; the regression function is linear but the line goes through the origin (0,0):
Regression through the origin:

Y = \beta_1 x + \varepsilon, \qquad \varepsilon \in N(0, \sigma^2)

The Mean Response: E(Y \mid X = x) = \beta_1 x
RESULTS

Let b_1 be the estimate of the slope \beta_1; "the sum of squared errors" becomes Q = \sum (y_i - b_1 x_i)^2. The new "Least Squares Estimate" is:

b_1 = \frac{\sum x_i y_i}{\sum x_i^2}, \qquad \hat{Y} = b_1 x
All inferences are still drawn through the use of the “t” distribution but with (n-1) degrees of freedom; for example:
MSE = \frac{\sum e_i^2}{n - 1}, \qquad s^2(b_1) = \frac{MSE}{\sum x_i^2}
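A Python sketch of regression through the origin (not part of the notes; the five (x, y) pairs are made-up illustrative numbers):

```python
# Regression through the origin: b1 = sum(x*y) / sum(x^2),
# with MSE on (n - 1) degrees of freedom.
x = [1, 2, 3, 4, 5]                 # illustrative made-up data
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n = len(x)

b1 = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)
resid = [yi - b1 * xi for xi, yi in zip(x, y)]
mse = sum(e ** 2 for e in resid) / (n - 1)   # n - 1: only one parameter
s2_b1 = mse / sum(xi ** 2 for xi in x)
print(round(b1, 3))  # → 1.998
```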
Besides “Least Squares”, parameters can be estimated using the method of “Maximum Likelihood”; results are called “MLE” – maximum likelihood estimators/estimates.
Density Function for Y:

f(y) = \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{1}{2\sigma^2}(y - \beta_0 - \beta_1 x)^2 \right]

Likelihood Function:

L = \prod_{i=1}^{n} \frac{1}{(2\pi\sigma^2)^{1/2}} \exp\left[ -\frac{1}{2\sigma^2}(y_i - \beta_0 - \beta_1 x_i)^2 \right]
= \frac{1}{(2\pi\sigma^2)^{n/2}} \exp\left[ -\frac{1}{2\sigma^2}\sum_{i=1}^{n}(y_i - \beta_0 - \beta_1 x_i)^2 \right]
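At any fixed sigma^2, the log-likelihood is a decreasing function of the sum of squared errors, so it is maximized exactly at the least-squares estimates. This Python sketch (not part of the notes; the perturbation grid is an illustrative choice) checks this on the Toluca data from earlier in these notes.

```python
import math

# Log-likelihood at the least-squares (b0, b1) beats nearby values.
x = [80, 30, 50, 90, 70, 60, 120, 80, 100, 50, 40, 70, 90,
     20, 110, 100, 30, 50, 90, 110, 30, 90, 40, 80, 70]
y = [399, 121, 221, 376, 361, 224, 546, 352, 353, 157, 160,
     252, 389, 113, 435, 420, 212, 268, 377, 421, 273, 468,
     244, 342, 323]
n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
b0 = y_bar - b1 * x_bar
sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
sigma2 = sse / n   # the MLE of sigma^2 divides by n, not n - 2

def loglik(beta0, beta1):
    q = sum((yi - beta0 - beta1 * xi) ** 2 for xi, yi in zip(x, y))
    return -n / 2 * math.log(2 * math.pi * sigma2) - q / (2 * sigma2)

best = loglik(b0, b1)
worse = [loglik(b0 + d0, b1 + d1)
         for d0 in (-5.0, 0.0, 5.0) for d1 in (-0.5, 0.0, 0.5)
         if (d0, d1) != (0.0, 0.0)]
print(all(w < best for w in worse))  # → True
```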
SLOPE & INTERCEPT
• The maximum likelihood estimates of the Intercept and the Slope are identical to the Least Squares estimates.
• Their variances, obtained from the inverse of Fisher's Information Matrix, are also identical.
• As least squares estimates, the estimated Intercept and Slope have the usual properties: (1) they are unbiased, and (2) they have minimum variance in the class of all linear unbiased estimators.
• In addition, as MLEs for the normal error regression model: (3) they are consistent and (4) they are sufficient.
Readings & Exercises
• Readings: A thorough reading of the text's sections 2.1-2.3 (pp. 40-51) is highly recommended.
• Exercises: The following exercises are good for practice, all from chapter 2 of the text: 2.1-2.6.
• Due as Homework: 2.4 and 2.6.
Due As Homework
#6.1 Refer to dataset "Infants", with X = Gestational Weeks and Y = Birth Weight:
a) Obtain the 95% confidence interval for the slope and interpret your result. Does your confidence interval include zero? What would be your conclusion about the possible linear relationship between X and Y?
b) Use the t-statistic to test whether or not a linear association exists between X and Y.
c) Does your conclusion in part (b) agree with your conclusion in part (a)? Which result, (a) or (b), tells you more about the strength of the relationship between X and Y?
#6.2 Answer the 3 questions of Exercise 6.1 using dataset “Vital Capacity” with X = Age and Y = (100)(Vital Capacity).