Uploaded by leon-long
Example
x 1 2 3 4 5 6 7 8
y 6 7 5 2 9 6 7 6
We wish to check for a non-zero correlation between x and y.
We know that, if the true correlation is zero,
$$\frac{r\sqrt{n-2}}{\sqrt{1-r^2}} \sim t_{n-2}.$$
Let the true correlation coefficient be ρ. Then test the hypotheses:
H0: ρ = 0
H1: ρ ≠ 0
It has already been shown that r = 0.1458. Thus, with n = 8,
$$t = \frac{r\sqrt{n-2}}{\sqrt{1-r^2}} = \frac{0.1458\sqrt{6}}{\sqrt{1-0.02126}} = 0.3610.$$
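The cut-off points of the t distribution with 6 degrees of freedom, leaving 2.5% in the top and bottom tails, are ±2.447.
(Figure: t distribution with 6 degrees of freedom, cut-offs marked at -2.447 and 2.447.)
Since the t value of 0.3610 lies between these cut-offs, H0 is not rejected.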
There is no evidence of a non-zero correlation between x and y.
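As a cross-check of the arithmetic above, here is a minimal Python sketch (standard library only) that computes r and the t statistic directly from the data. The standard library has no t quantile function, so the cut-off 2.447 is copied from the text; the helper name correlation_t is our own, not from the slides.

```python
import math

# Data from the example above
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [6, 7, 5, 2, 9, 6, 7, 6]

T_CRIT = 2.447  # 2.5% point of t with 6 df, copied from the text


def correlation_t(x, y):
    """Hypothetical helper: return (r, t), where
    t = r * sqrt(n - 2) / sqrt(1 - r**2) tests H0: rho = 0."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    r = sxy / math.sqrt(sxx * syy)
    t = r * math.sqrt(n - 2) / math.sqrt(1 - r ** 2)
    return r, t


r, t = correlation_t(x, y)
print(f"r = {r:.4f}, t = {t:.4f}")
print("reject H0" if abs(t) > T_CRIT else "do not reject H0")
```

Running it prints r = 0.1458, t = 0.3610 and "do not reject H0", matching the hand calculation.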
Similarly, we can check whether the slope b is significantly different from 0.
The estimated value of b is 0.1190.
Now carry out a hypothesis test.
H0: b = 0
H1: b ≠ 0
The standard error of the estimate $\hat b$ is
$$\operatorname{se}(\hat b) = \left(\hat\sigma^2 / S_{xx}\right)^{1/2}.$$
This is calculated in R as 0.3298.
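The test statistic is
$$t = \frac{\hat b - b}{\left(\hat\sigma^2 / S_{xx}\right)^{1/2}},$$
which under H0: b = 0 has a t distribution with n - 2 = 6 degrees of freedom.
This calculates as (0.1190 - 0)/0.3298 = 0.3608. (Without intermediate rounding the value is 0.3610, identical to the t statistic for the correlation: in simple linear regression the two tests are equivalent.)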
Again, t tables with 6 degrees of freedom give cut-off points of ±2.447 for 2.5% in each tail.
Since the test statistic t = 0.3608 lies between -2.447 and 2.447, we do not reject the null hypothesis H0.
There is no evidence at the 5% level of a non-zero value of b.
To confirm this, the 95% CI for b is
0.1190 ± 2.447 × 0.3298 = (-0.688, 0.926).
Notice that this interval includes zero.
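The slope test and interval can be reproduced the same way. This sketch assumes the usual simple linear regression estimates, σ̂² = RSS/(n - 2) and se(b̂) = (σ̂²/Sxx)^{1/2}; slope_inference is a hypothetical helper name, and t_crit is the 2.447 quoted in the text. Because nothing is rounded along the way, it reports t = 0.3610 rather than 0.3608.

```python
import math

# Data from the correlation example
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [6, 7, 5, 2, 9, 6, 7, 6]


def slope_inference(x, y, t_crit):
    """Hypothetical helper: least-squares slope b, se(b) = (sigma_hat^2 / Sxx)^(1/2),
    the t statistic for H0: b = 0, and the interval b +/- t_crit * se(b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    rss = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    sigma2_hat = rss / (n - 2)  # error variance estimate on n - 2 df
    se_b = math.sqrt(sigma2_hat / sxx)
    t = (b - 0) / se_b
    return b, se_b, t, (b - t_crit * se_b, b + t_crit * se_b)


b, se_b, t, ci = slope_inference(x, y, t_crit=2.447)
print(f"b = {b:.4f}, se(b) = {se_b:.4f}, t = {t:.4f}")
print(f"95% CI for b: ({ci[0]:.3f}, {ci[1]:.3f})")
```

The output is b = 0.1190, se(b) = 0.3298, t = 0.3610 and the interval (-0.688, 0.926).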
Confidence Intervals for Variance
We quoted earlier that
$$\frac{(n-2)\hat\sigma^2}{\sigma^2} \sim \chi^2_{n-2}.$$
This can be used to obtain a confidence interval for σ².
Recall the earlier example:
y 3.5 3.2 3.0 2.9 4.0 2.5 2.3
x 3.1 3.4 3.0 3.2 3.9 2.8 2.2
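Estimate of the error variance σ²:
$$\hat\sigma^2 = SS_{RES}/(n-2) = 0.39418/5 = 0.07884.$$
Now
$$\frac{5\hat\sigma^2}{\sigma^2} \sim \chi^2_5,$$
and the $\chi^2_5$ distribution has cut-off 0.8312 for the "bottom" 2.5% and 12.83 for the "top" 2.5%.
Inverting 0.8312 < 5σ̂²/σ² < 12.83 gives the 95% CI for σ² as (5 × 0.07884/12.83, 5 × 0.07884/0.8312), i.e. (0.031, 0.474).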
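A Python sketch of the same interval, assuming the simple least-squares line fitted earlier. The two χ²₅ points are copied from the text rather than computed (the standard library has no chi-squared quantile function), and the helper names are hypothetical.

```python
# Data from the earlier example (n = 7)
x = [3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2]
y = [3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3]


def residual_sum_of_squares(x, y):
    """Hypothetical helper: RSS of the least-squares line y = a + b x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))


def variance_ci(rss, df, chi2_lower, chi2_upper):
    """Hypothetical helper: CI for sigma^2 from df * sigma_hat^2 / sigma^2 ~ chi^2_df.
    chi2_lower and chi2_upper are the 2.5% and 97.5% points of chi^2_df."""
    sigma2_hat = rss / df
    return sigma2_hat, (df * sigma2_hat / chi2_upper, df * sigma2_hat / chi2_lower)


rss = residual_sum_of_squares(x, y)
# Chi-squared points for 5 df, as quoted in the text
sigma2_hat, (lower, upper) = variance_ci(rss, df=len(x) - 2,
                                         chi2_lower=0.8312, chi2_upper=12.83)
print(f"RSS = {rss:.4f}, sigma2_hat = {sigma2_hat:.5f}")
print(f"95% CI for sigma^2: ({lower:.3f}, {upper:.3f})")
```

Note that the upper chi-squared point gives the lower end of the interval, because σ² sits in the denominator of the pivotal quantity.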
Trees Example
More than one variable
The residual plot suggests that the linear model is satisfactory, and the R squared value is already quite high. Nevertheless, physical arguments lead us to force the line to pass through the origin.
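The R squared value is higher now, but the residual plot is not so random. (For a model without an intercept, R calculates R squared about zero rather than about the mean of the response, so the two values are not directly comparable.)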
We might now ask whether we can find a model that includes both explanatory variables, height and girth. Physical considerations suggest that we should explore the very simple model
Volume = b1 × height × (girth)² + ε
This is essentially the formula for the volume of a cylinder, πr²h, with the constants absorbed into b1.
So the fitted equation is:
Volume = 0.002108 × height × (girth)²
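For a model with no intercept, minimising Σ(Vᵢ - b1 zᵢ)² with zᵢ = heightᵢ × girthᵢ² gives b1 = Σ zᵢVᵢ / Σ zᵢ². The sketch below spells this out under that assumption; the trees data themselves are not reproduced here, and the helper names are hypothetical. (In R a fit of this form can be written lm(Volume ~ I(Height * Girth^2) - 1) on the built-in trees data frame; we have not checked that this is exactly how 0.002108 was obtained.)

```python
def cylinder_term(height, girth):
    """Hypothetical helper: the single predictor z = height * girth**2."""
    return [h * g ** 2 for h, g in zip(height, girth)]


def fit_through_origin(z, v):
    """Least-squares b1 for v = b1 * z + error with no intercept.
    Setting d/db1 of sum((v - b1*z)**2) to zero gives b1 = sum(z*v) / sum(z*z)."""
    szz = sum(zi * zi for zi in z)
    szv = sum(zi * vi for zi, vi in zip(z, v))
    return szv / szz


# Usage, given equal-length lists height, girth, volume:
#   b1 = fit_through_origin(cylinder_term(height, girth), volume)
```

There is only one parameter, so the normal equation is a single scalar equation and no matrix algebra is needed.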
The residuals are considerably smaller than those from any of the previous models considered. Further graphical analysis fails to reveal any further obvious dependence on either of the explanatory variables, girth or height.
Further analysis also shows that inclusion of a constant term in the model does not significantly improve the fit. Model 4 is thus the most satisfactory of those models considered for the data.
However, this is regression “through the origin”, so it may be more satisfactory to rewrite Model 4 as
$$\frac{\text{volume}}{\text{height} \times (\text{girth})^2} = b_1 + \varepsilon$$
so that b1 can then just be regarded as the mean of the observations of volume/(height × (girth)²); recall that ε is assumed to have location measure (here mean) 0.
Compare this mean with the value 0.002108 found earlier.
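The two estimates of b1 need not agree exactly. The no-intercept least-squares slope Σ zV / Σ z² (with z = height × girth²) equals a mean of the ratios V/z weighted by z², whereas the rewritten model uses the plain, unweighted mean. A small sketch of both, with hypothetical function names:

```python
def mean_of_ratios(z, v):
    """Hypothetical helper: b1 estimated as the plain mean of v/z, as in the
    rewritten model v/z = b1 + error (error assumed to have mean 0)."""
    ratios = [vi / zi for zi, vi in zip(z, v)]
    return sum(ratios) / len(ratios)


def weighted_mean_of_ratios(z, v):
    """Hypothetical helper: mean of v/z weighted by z**2. Algebraically this is
    sum(z*v) / sum(z*z), the no-intercept least-squares slope."""
    weights = [zi ** 2 for zi in z]
    weighted = sum(w * vi / zi for w, zi, vi in zip(weights, z, v))
    return weighted / sum(weights)
```

With the z² weights, large trees dominate the least-squares estimate, while the plain mean gives every tree equal weight; this is one reason the two numbers can differ.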
Practical Question 2
y    x1   x2
3.5  3.1  30
3.2  3.4  25
3.0  3.0  20
2.9  3.2  30
4.0  3.9  40
2.5  2.8  25
2.3  2.2  30
Fitting y on x1 and x2 by least squares gives y = -0.2138 + 0.8984x1 + 0.01745x2 + e.
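These coefficients can be checked by solving the normal equations X'Xβ = X'y directly. This standard-library sketch uses Gaussian elimination with partial pivoting, which is a swapped-in technique: R's lm works through a QR decomposition instead. The helper names solve and ols are our own.

```python
# Data from the table above
y = [3.5, 3.2, 3.0, 2.9, 4.0, 2.5, 2.3]
x1 = [3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2]
x2 = [30, 25, 20, 30, 40, 25, 30]


def solve(a, b):
    """Hypothetical helper: solve a @ beta = b by Gaussian elimination
    with partial pivoting (works on an augmented copy of a)."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    beta = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        s = sum(m[r][c] * beta[c] for c in range(r + 1, n))
        beta[r] = (m[r][n] - s) / m[r][r]
    return beta


def ols(y, *xs):
    """Hypothetical helper: coefficients [b0, b1, ...] with an intercept,
    from the normal equations X'X beta = X'y."""
    rows = [[1.0, *vals] for vals in zip(*xs)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)


b0, b1, b2 = ols(y, x1, x2)
print(f"y = {b0:.4f} + {b1:.4f} x1 + {b2:.5f} x2")
```

Forming X'X explicitly is fine for a small, well-conditioned problem like this one; QR is preferred in general because it avoids squaring the condition number.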
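Use > plot(multregress) to obtain the diagnostic plots for this fit. Then add an extra observation (y = 12, x1 = 20, x2 = 100) and refit: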
> ynew=c(y,12)
> x1new=c(x1,20)
> x2new=c(x2,100)
> multregressnew=lm(ynew~x1new+x2new)
The added point has a very large influence on the fitted model.
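One way to quantify why the added point is so influential is its leverage (hat value), which depends only on the explanatory variables. For a model with an intercept and two predictors, hᵢᵢ = 1/n + dᵢ'S⁻¹dᵢ, where dᵢ is the centred (x1, x2) vector of observation i and S is the centred cross-product matrix. A sketch with a hypothetical helper name; in R, hatvalues(multregressnew) should give the corresponding numbers.

```python
def leverage(x1, x2, i):
    """Hypothetical helper: hat value h_ii of observation i for y ~ 1 + x1 + x2,
    using h_ii = 1/n + d' S^{-1} d with d the centred (x1, x2) vector of
    observation i and S the 2x2 centred cross-product matrix."""
    n = len(x1)
    m1, m2 = sum(x1) / n, sum(x2) / n
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    det = s11 * s22 - s12 ** 2
    d1, d2 = x1[i] - m1, x2[i] - m2
    # d' S^{-1} d written out with the explicit 2x2 inverse
    quad = (s22 * d1 ** 2 - 2 * s12 * d1 * d2 + s11 * d2 ** 2) / det
    return 1 / n + quad


x1 = [3.1, 3.4, 3.0, 3.2, 3.9, 2.8, 2.2]
x2 = [30, 25, 20, 30, 40, 25, 30]
# The extra point from the R code above; its y value plays no part in leverage
x1new = x1 + [20]
x2new = x2 + [100]

h = [leverage(x1new, x2new, i) for i in range(len(x1new))]
print("leverages:", [round(v, 3) for v in h])
print("rule-of-thumb cut-off 2p/n:", 2 * 3 / len(x1new))
```

The leverages sum to p = 3, the number of coefficients, and the added point's value is close to 1, far above the common 2p/n rule of thumb. Leverage measures only potential influence; actual influence also depends on the residual, which Cook's distance in plot(multregressnew) combines with leverage. The same function can be applied to the second example's point (x1 = 10, x2 = 50).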
Second Example
> ynew=c(y,40)
> x1new=c(x1,10)
> x2new=c(x2,50)
> multregressnew=lm(ynew~x1new+x2new)