
Assignment 1, due Sept. 19 2014

1. Find the equation of the line which passes through the points (1,1) and (3,5). (4 marks)

• slope = rise/run, so slope = (5-1)/(3-1) = 2

• The y-intercept is −1: moving from the point (1,1) back to x = 0 is a decrease of 1 in the horizontal direction, which gives a decrease of 2 in the vertical direction, from 1 to −1.

• Equivalently, the equation of the line is $y = \beta_0 + 2x$, so substituting the first point gives $1 = \beta_0 + 2(1)$, and solving gives $\beta_0 = -1$.

• The equation of the line is y = −1 + 2x
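The arithmetic above can be checked with a few lines of R. This is only a sketch; the variable names are chosen here for illustration.

```r
# The two points from question 1
x <- c(1, 3)
y <- c(1, 5)

slope <- diff(y) / diff(x)        # rise / run = (5 - 1) / (3 - 1)
intercept <- y[1] - slope * x[1]  # solve 1 = b0 + slope * 1 for b0

c(intercept = intercept, slope = slope)
```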

2. Suppose you are given three data points (1,4), (2,6) and (3,7) and the line y = 2 + 3x. Give the three residuals and their sum of squares. (4 marks)

• The points on the line corresponding to the given x values are shown below, as are the residuals obtained by subtraction, and their squares.

    x    2 + 3x    e = y − (2 + 3x)    e²
    1       5             −1            1
    2       8             −2            4
    3      11             −4           16

• The sum of squares of the residuals is 21.
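The same table can be reproduced in R as a check. The sketch below uses illustrative variable names and the line y = 2 + 3x given in the question.

```r
# Data points and the given (not fitted) line y = 2 + 3x
x <- c(1, 2, 3)
y <- c(4, 6, 7)

on_line <- 2 + 3 * x   # points on the line at each x
e <- y - on_line       # residuals

data.frame(x = x, on_line = on_line, e = e, e2 = e^2)
sum(e^2)               # sum of squared residuals
```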

3. A data set gives the summaries: $n = 20$, $\sum x_i y_i = 100$, $\sum x_i = 20$ and $\sum y_i = 10$. Suppose that the response $y$ is temperature in degrees Celsius.

(a) What is $S_{xy}$, the sum of corrected cross products? (2 marks)

• $S_{xy} = \sum x_i y_i - \sum x_i \sum y_i / n = 100 - 20(10)/20 = 90$

(b) If the response was converted to temperature in degrees Fahrenheit, so that $y' = 32 + 1.8y$, what is $\sum y'_i$? (2 marks)

• $\sum y'_i = \sum (32 + 1.8y_i) = 20(32) + 1.8\sum y_i = 640 + 1.8(10) = 658$
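Parts (a) and (b) use only the given summaries, so they can be checked directly in R. A minimal sketch, with variable names chosen here for illustration:

```r
# Summaries given in question 3
n      <- 20
sum_xy <- 100
sum_x  <- 20
sum_y  <- 10

# (a) corrected cross products from the raw sums
Sxy <- sum_xy - sum_x * sum_y / n

# (b) sum of the Fahrenheit responses: each of the n terms adds 32
sum_y_f <- n * 32 + 1.8 * sum_y

c(Sxy = Sxy, sum_y_f = sum_y_f)
```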

(c) If the response was converted to temperature in degrees Fahrenheit, so that $y' = 32 + 1.8y$, what is the sum of corrected cross products $S_{xy'}$? (2 marks)

https://www.coursehero.com/file/10709401/sol114/

This study resource was shared via CourseHero.com

• From (b), dividing the sum by $n$ gives $\bar{y}' = 32 + 1.8\bar{y}$.

• Therefore $y'_i - \bar{y}' = 1.8(y_i - \bar{y})$.

• So $S_{xy'} = \sum (x_i - \bar{x})(y'_i - \bar{y}') = \sum (x_i - \bar{x})\,1.8(y_i - \bar{y}) = 1.8S_{xy} = 1.8(90) = 162$.
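The scaling argument does not depend on the particular data. The sketch below illustrates it on simulated values; the data and the helper `sxy` are hypothetical and are not the data behind the summaries in the question.

```r
# Hypothetical data, used only to illustrate the scaling property
set.seed(1)
x <- rnorm(20)
y <- rnorm(20, mean = 15, sd = 5)   # pretend these are degrees Celsius
y_f <- 32 + 1.8 * y                 # converted to Fahrenheit

# Corrected cross products (helper defined here for illustration)
sxy <- function(u, v) sum((u - mean(u)) * (v - mean(v)))

# The shift by 32 cancels and the factor 1.8 carries through
sxy(x, y_f) / sxy(x, y)
```

The printed ratio should be 1.8 up to rounding, matching $S_{xy'} = 1.8S_{xy}$.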

4. In a simple linear regression, the sum of squares function is (6 marks)

$$S(\beta_0, \beta_1) = 3500 - 700\beta_0 - 740\beta_1 + 100\beta_0\beta_1 + 50\beta_0^2 + 54\beta_1^2.$$

Find the least squares values for $\beta_0$ and $\beta_1$.

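• Using calculus, the partial derivatives are

$$\frac{\partial S}{\partial \beta_0} = -700 + 100\beta_1 + 100\beta_0$$

and

$$\frac{\partial S}{\partial \beta_1} = -740 + 100\beta_0 + 108\beta_1.$$

• Setting both derivatives to zero and rearranging, after division by 100 and 4 respectively, gives the two equations

$$\beta_0 + \beta_1 = 7$$

and

$$25\beta_0 + 27\beta_1 = 185.$$

• Substituting $\beta_0 = 7 - \beta_1$ from the first equation into the second gives

$$25(7 - \beta_1) + 27\beta_1 = 185,$$

or $2\beta_1 = 185 - 175$, or $\beta_1 = 10/2 = 5$.

• Substituting this back into the first equation gives

$$\beta_0 = 7 - 5 = 2.$$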
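The two equations obtained by setting the derivatives to zero form a linear system, so the hand solution can be checked with `solve()`. This is a sketch; the names `A`, `b`, `beta` and `S` are chosen here for illustration.

```r
# Normal equations from the partial derivatives:
#   100*b0 + 100*b1 = 700
#   100*b0 + 108*b1 = 740
A <- matrix(c(100, 100,
              100, 108), nrow = 2, byrow = TRUE)
b <- c(700, 740)

beta <- solve(A, b)   # c(b0, b1)
beta

# The sum of squares function from the question, evaluated at the solution
S <- function(b0, b1) {
  3500 - 700 * b0 - 740 * b1 + 100 * b0 * b1 + 50 * b0^2 + 54 * b1^2
}
S(beta[1], beta[2])
```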

• This can also be solved by completing the square (twice).


• Considering $S$ as a function of $\beta_0$ allows one to write

$$S = 50(\beta_0 - M)^2 + D$$

where

$$M = \frac{700 - 100\beta_1}{100} = 7 - \beta_1$$

and

$$D = -50M^2 + 3500 - 740\beta_1 + 54\beta_1^2.$$

• It follows that, whatever the choice of $\beta_1$, the best choice of $\beta_0$ is $\beta_0 = 7 - \beta_1$.

• Substituting this in $S$ gives $S$ as a function of $\beta_1$ only, and

$$S = D = -50(7 - \beta_1)^2 + 3500 - 740\beta_1 + 54\beta_1^2.$$

• Centering this quadratic gives

$$S = 4(\beta_1 - m)^2 + d$$

where $m$ can be determined by equating the linear terms in the centered and uncentered versions:

$$-4(2)m = (2)7(50) - 740,$$

or $m = 5$.

• This is the best choice for $\beta_1$, and substituting above gives $\beta_0 = 7 - 5 = 2$.
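The completed square can be checked numerically. In the sketch below, the constant 950 is not stated in the solution; it is the value of $d$ obtained here by expanding the quadratic, and the grid of $\beta_1$ values is arbitrary.

```r
# Sum of squares function from question 4
S <- function(b0, b1) {
  3500 - 700 * b0 - 740 * b1 + 100 * b0 * b1 + 50 * b0^2 + 54 * b1^2
}

b1 <- seq(0, 10, by = 0.5)           # arbitrary grid of slope values
profile  <- S(7 - b1, b1)            # S with b0 set to its best value 7 - b1
centered <- 4 * (b1 - 5)^2 + 950     # centered form with m = 5, d = 950 (derived here)

all.equal(profile, centered)         # the two forms agree on the grid
b1[which.min(profile)]               # grid value of b1 minimizing S
```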

5. A random sample of 13 elementary school students is selected, and each student is measured on a creativity score (x) using a well-defined testing instrument and on a task score (y) using a new instrument. The task score is the mean time taken to perform several hand-eye coordination tasks. The data are shown in the table below.

Use R to do the following questions. Make sure your output is integrated into your responses. (Cut and paste as necessary.)

(a) Plot Tasks versus Creativity and comment on the form and strength of the association. Be sure to label the axes. (4 marks)


    STUDENT   CREATIVITY(X)   TASKS(Y)
    AE             28            4.5
    FR             35            3.9
    HT             37            3.9
    IO             50            6.1
    DP             69            4.3
    YR             84            8.8
    QD             40            2.1
    SW             65            5.5
    DF             29            5.7
    ER             42            3.0
    RR             51            7.1
    TG             45            7.3
    EF             31            3.3

[Figure: scatterplot of tasks (vertical axis) versus creativity (horizontal axis)]

• There is a weak positive association between tasks and creativity scores.
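A labelled version of the plot might be produced as follows. This is a sketch: the vectors are typed in from the data table, and the axis labels and title are wording chosen here, not taken from the solution.

```r
# Data typed in from the table (13 students)
creativity <- c(28, 35, 37, 50, 69, 84, 40, 65, 29, 42, 51, 45, 31)
tasks      <- c(4.5, 3.9, 3.9, 6.1, 4.3, 8.8, 2.1, 5.5, 5.7, 3.0, 7.1, 7.3, 3.3)

# Scatterplot with explicit axis labels
plot(creativity, tasks,
     xlab = "Creativity score (x)",
     ylab = "Task score (y)",
     main = "Tasks versus Creativity")
```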

(b) Calculate the summaries $S_{xx}$, $S_{xy}$, $S_{yy}$, $\bar{X}$ and $\bar{Y}$. (8 marks)

• A single R function, creat.ass, was written for the entire question; it is printed below.

• The relevant parts of its output are inserted in each part.

> creat.ass


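function(creativity, tasks){
  # Scatterplot on the current graphics device
  plot(creativity,tasks)
  # Corrected sums of squares and cross products (each a 1 x 1 matrix)
  ssxx=crossprod(creativity-mean(creativity))
  ssxy=crossprod(creativity-mean(creativity),tasks-mean(tasks))
  ssyy=crossprod(tasks-mean(tasks))
  xbar=mean(creativity)
  ybar=mean(tasks)
  # Correlation coefficient and least squares estimates
  r = ssxy/sqrt(ssxx*ssyy)
  b1hat=ssxy/ssxx
  b0hat=ybar-b1hat*xbar
  # Scatterplot with the fitted line, written to the PostScript file "tcplot"
  postscript("tcplot",horizontal=FALSE)
  plot(creativity,tasks)
  abline(b0hat,b1hat)
  # Fitted values and residuals; drop() turns the 1 x 1 matrices into
  # scalars so that they recycle over the creativity vector
  yhat = drop(b0hat)+drop(b1hat)*creativity
  e=tasks-yhat
  esum=sum(e)
  rex=cor(e,creativity)
  dev.off()
  # Residual plot with a horizontal line at zero, written to "res.creat"
  postscript("res.creat",horizontal=FALSE)
  plot(creativity,e)
  abline(0,0)
  dev.off()
  # Residual, total and regression sums of squares from the summaries
  ssres=ssyy-ssxy^2/ssxx
  tss=ssyy
  ssreg=ssxy^2/ssxx
  R2=ssreg/tss
  return(list(ssxx=ssxx,ssxy=ssxy,ssyy=ssyy,ybar=ybar,xbar=xbar,r=r,
    b0hat=b0hat,b1hat=b1hat,e=e,esum=esum,rex=rex,ssres=ssres,tss=tss,
    ssreg=ssreg,R2=R2))
}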

$ssxx
         [,1]
[1,] 3463.077

$ssxy
         [,1]
[1,] 220.0923


$ssyy
         [,1]
[1,] 44.53077

$ybar
[1] 5.038462

$xbar
[1] 46.61538
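As a cross-check on these summaries, the same quantities can be computed from raw sums, in the style of question 3(a). This is a sketch with the data typed in from the table; it is not the program used in the solution.

```r
# Data typed in from the table (13 students)
creativity <- c(28, 35, 37, 50, 69, 84, 40, 65, 29, 42, 51, 45, 31)
tasks      <- c(4.5, 3.9, 3.9, 6.1, 4.3, 8.8, 2.1, 5.5, 5.7, 3.0, 7.1, 7.3, 3.3)
n <- length(creativity)

# Corrected sums via the computing formulas
Sxx <- sum(creativity^2)        - sum(creativity)^2 / n
Sxy <- sum(creativity * tasks)  - sum(creativity) * sum(tasks) / n
Syy <- sum(tasks^2)             - sum(tasks)^2 / n

c(Sxx = Sxx, Sxy = Sxy, Syy = Syy,
  xbar = mean(creativity), ybar = mean(tasks))
```

Because `sum()` returns plain numbers, these print as ordinary values rather than the 1 x 1 matrices that `crossprod()` produces.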

(c) Use these data summaries to calculate the correlation coefficient. Does the value agree with your visual assessment in (a)? (2 marks)

• The correlation of 0.56 confirms the weak positive association in the plot.

$r
          [,1]
[1,] 0.5604588

(d) Use these summaries to calculate the least squares values for the intercept and slope. (2 marks)

$b0hat
         [,1]
[1,] 2.075869

$b1hat
           [,1]
[1,] 0.06355398
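The correlation in (c) and the estimates in (d) can be compared with R's built-in `cor()` and `lm()`. This sketch is an independent check with the data typed in from the table, not part of the original program.

```r
# Data typed in from the table (13 students)
creativity <- c(28, 35, 37, 50, 69, 84, 40, 65, 29, 42, 51, 45, 31)
tasks      <- c(4.5, 3.9, 3.9, 6.1, 4.3, 8.8, 2.1, 5.5, 5.7, 3.0, 7.1, 7.3, 3.3)

r_check <- cor(creativity, tasks)   # should match $r
fit <- lm(tasks ~ creativity)       # least squares fit

r_check
coef(fit)                           # intercept and slope, to compare with $b0hat and $b1hat
```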

(e) Add the least squares line to the plot in (a). (1 mark)

• See above: the program adds the line with abline(b0hat, b1hat).

(f) Obtain the residuals, $e_i = y_i - \hat{y}_i$. Calculate their sample mean to verify it is zero, and their correlation with X to verify it is also zero. (3 marks)

$e
 [1]  0.6446202 -0.4002577 -0.5273656  0.8464327 -2.1610928  1.3855975
 [7] -2.5180275 -0.7068769  1.7810662 -1.7451355  1.7828787  2.3642026
[13] -0.7460418


$esum
[1] 8.881784e-16

$rex
[1] -2.420007e-16

• The sum of the residuals (and hence their mean) and their correlation with creativity are essentially zero; the tiny nonzero values are floating-point rounding error.
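The two residual properties can also be verified with `lm()` and `resid()`. This is a sketch with the data typed in from the table; the tolerance `1e-12` is an arbitrary choice for "essentially zero".

```r
# Data typed in from the table (13 students)
creativity <- c(28, 35, 37, 50, 69, 84, 40, 65, 29, 42, 51, 45, 31)
tasks      <- c(4.5, 3.9, 3.9, 6.1, 4.3, 8.8, 2.1, 5.5, 5.7, 3.0, 7.1, 7.3, 3.3)

fit <- lm(tasks ~ creativity)
e <- resid(fit)                     # residuals y_i - yhat_i

mean(e)                             # zero up to rounding error
cor(e, creativity)                  # zero up to rounding error
abs(mean(e)) < 1e-12 && abs(cor(e, creativity)) < 1e-12
```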

(g) Plot the residuals versus X. Do the residuals look random? (2 marks)

[Figure: residuals e (vertical axis) versus creativity (horizontal axis), with a horizontal reference line at zero]

• the residuals look random

(h) Obtain the residual, regression and total sums of squares, using the data summaries. (6 marks)

$ssres
         [,1]
[1,] 30.54303

$tss
         [,1]
[1,] 44.53077

$ssreg
         [,1]
[1,] 13.98774

(i) What is the value of the coefficient of determination, i.e. what proportion of the variation in Tasks is explained by Creativity? (2 marks)

$r2
          [,1]
[1,] 0.3141141
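The decomposition in (h) and the ratio in (i) can be cross-checked against `anova()` and `summary()` for a fitted `lm` object. A sketch with the data typed in from the table; the variable names are chosen here for illustration.

```r
# Data typed in from the table (13 students)
creativity <- c(28, 35, 37, 50, 69, 84, 40, 65, 29, 42, 51, 45, 31)
tasks      <- c(4.5, 3.9, 3.9, 6.1, 4.3, 8.8, 2.1, 5.5, 5.7, 3.0, 7.1, 7.3, 3.3)

fit <- lm(tasks ~ creativity)
tab <- anova(fit)                          # analysis of variance table

ss_reg <- tab["creativity", "Sum Sq"]      # regression sum of squares
ss_res <- tab["Residuals", "Sum Sq"]       # residual sum of squares
tss    <- ss_reg + ss_res                  # total sum of squares

c(ss_res = ss_res, tss = tss, ss_reg = ss_reg)
ss_reg / tss                               # coefficient of determination
summary(fit)$r.squared                     # same value as reported by lm
```

Both routes give about 0.314, so roughly 31% of the variation in Tasks is explained by Creativity.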
