Lecture 3 HSPM J716

Efficiency in an estimator

• Efficiency = low bias and low variance

• Unbiased with high variance – not very useful

• Biased with low variance – worthless

A no-variance, reliable estimator?

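• The 0 estimator: it always reports 0, so it has no variance, but it is biased unless the true value happens to be 0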
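A small simulation can make the bias/variance comparison concrete. This is a sketch under assumed values: a hypothetical population with mean 10 and standard deviation 3, samples of 20, and three estimators of the mean. These are the sample mean, a single observation (unbiased but noisy), and the 0 estimator (no variance, but biased).

```python
import random
import statistics

TRUE_MEAN = 10.0   # hypothetical population mean
SD = 3.0           # hypothetical population standard deviation
N = 20             # observations per sample
TRIALS = 5000      # number of repeated samples

def sample_mean(data):
    return sum(data) / len(data)

def first_observation(data):
    # Unbiased, but uses only one point, so it has high variance
    return data[0]

def zero_estimator(data):
    # Ignores the data: no variance at all, but biased unless the truth is 0
    return 0.0

def bias_and_variance(estimator, seed=1):
    """Apply an estimator to many simulated samples; return its bias and variance."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(TRIALS):
        data = [rng.gauss(TRUE_MEAN, SD) for _ in range(N)]
        estimates.append(estimator(data))
    bias = statistics.fmean(estimates) - TRUE_MEAN
    variance = statistics.pvariance(estimates)
    return bias, variance

for est in (sample_mean, first_observation, zero_estimator):
    bias, var = bias_and_variance(est)
    print(f"{est.__name__:18s} bias={bias:7.3f} variance={var:7.3f}")
```

The 0 estimator comes out perfectly reliable (variance 0) and consistently wrong (bias of -10). The single observation is right on average but scattered. The sample mean has both low bias and low variance.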

Eyeball vs. Least squares for assignment 1

• http://hspm.sph.sc.edu/COURSES/J716/demos/StudentLines/StudentLines.html

Hypothesis testing – parallels among the coin toss, card trick, and assignment 1A experiments

• A statistic calculated from our data

• A critical value for that statistic, calculated theoretically based on a hypothesis about how the data were generated

• If our statistic were greater than the critical value, we would reject the hypothesis.

Hypothesis testing – all about calculating the probability of what you got and drawing an inference

• With the coin toss experiment
  – A statistic calculated from our data
    • Counted how many tails came up
  – A critical value for that statistic, calculated theoretically based on the hypothesis that the coin was fair
    • 5 consecutive results that are all the same
  – When our statistic was as big as the critical value, we rejected the hypothesis
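The critical value of 5 can be checked with a little arithmetic. This sketch assumes the statistic is a run of consecutive tails from a fair coin and that the cutoff is a probability below 0.05. It finds the shortest run that is that unlikely.

```python
def prob_run_of_tails(k):
    """Probability that k tosses of a fair coin all come up tails."""
    return 0.5 ** k

def critical_run_length(alpha=0.05):
    """Smallest run length whose probability under a fair coin is below alpha."""
    k = 1
    while prob_run_of_tails(k) >= alpha:
        k += 1
    return k

for k in range(1, 7):
    print(k, prob_run_of_tails(k))
print("critical value:", critical_run_length())
```

Four tails in a row has probability 0.0625, which is not below 0.05. Five in a row has probability 0.03125, so the hypothesis of a fair coin is rejected once the run reaches 5.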

Hypothesis testing – all about calculating the probability of what you got and drawing an inference

• With the card experiment
  – A statistic calculated from our data
    • Counted how many times I guessed the card correctly
  – A critical value for that statistic, calculated theoretically based on the hypothesis that any of the 52 cards could come up
    • Even one right guess has a probability less than 0.05, so the critical value is 1.
  – When our statistic was as big as the critical value, we rejected the hypothesis
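The "even one right guess" claim can be checked directly. This sketch assumes each guess is an independent 1-in-52 shot under the hypothesis of pure guessing. The helper name `prob_at_least_one_hit` is hypothetical.

```python
def prob_at_least_one_hit(guesses, deck_size=52):
    """Chance of at least one correct guess if every guess is a pure 1-in-52 shot."""
    return 1 - (1 - 1 / deck_size) ** guesses

p_one = prob_at_least_one_hit(1)
print(f"P(one correct guess) = {p_one:.4f}")   # about 0.0192
print("below 0.05?", p_one < 0.05)
print(f"P(at least one hit in 3 guesses) = {prob_at_least_one_hit(3):.4f}")
```

With a single guess the probability is 1/52, about 0.019, which is well under 0.05. With three independent guesses, the chance of at least one hit rises to about 0.057. The critical value therefore depends on how many guesses were made.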

T statistic hypothesis tests calculate a probability and draw an inference

• With the assignment 1A spreadsheet
  – A statistic calculated from our data
    • The estimated coefficient divided by its standard error
  – A critical value for that statistic, calculated theoretically based on the hypothesis that the true line’s slope is 0
    • 2.571 (the two-tailed 5% value of t with 5 degrees of freedom)
  – When our statistic is greater than the critical value (in absolute value), we reject the hypothesis
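The steps above can be sketched in code. The data below are hypothetical, not the assignment's. There are 7 points, so the degrees of freedom are 7 - 2 = 5, and 2.571 is the matching critical value. The function name `slope_t_statistic` is an illustrative choice.

```python
import math

T_CRITICAL = 2.571  # two-tailed 0.05 value of t with 5 degrees of freedom

def slope_t_statistic(x, y):
    """Least squares slope, its standard error, and t = slope / standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]
    s_squared = sum(r * r for r in residuals) / (n - 2)  # estimate of the error variance
    se_slope = math.sqrt(s_squared / sxx)
    return slope, se_slope, slope / se_slope

# Hypothetical data: 7 points, so degrees of freedom = 7 - 2 = 5
x = [1, 2, 3, 4, 5, 6, 7]
y = [2, 4, 5, 4, 7, 8, 12]
slope, se, t = slope_t_statistic(x, y)
print(f"slope={slope:.4f} se={se:.4f} t={t:.3f}")
print("reject slope = 0?", abs(t) > T_CRITICAL)
```

For these made-up points, the slope is 10/7 and t is about 5.7. That exceeds 2.571, so the hypothesis of a zero slope would be rejected.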

Not rejecting a false hypothesis: Type II error in assignment 1A part 2
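A Type II error happens when the true slope is not 0 but the t test fails to reject slope = 0. This sketch estimates how often that happens by simulation. All values are hypothetical: 7 X values, a true intercept of 3, normal errors with standard deviation 1, and the 2.571 critical value for 5 degrees of freedom.

```python
import math
import random

T_CRITICAL = 2.571          # two-tailed 0.05 critical t, 5 degrees of freedom
X = [1, 2, 3, 4, 5, 6, 7]   # hypothetical design with 7 points

def slope_t(x, y):
    """t statistic for the least squares slope."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope / math.sqrt(sse / (n - 2) / sxx)

def type_ii_rate(true_slope, error_sd=1.0, trials=2000, seed=7):
    """Share of simulated studies that fail to reject slope = 0 although the true slope is not 0."""
    rng = random.Random(seed)
    misses = 0
    for _ in range(trials):
        y = [3.0 + true_slope * xi + rng.gauss(0, error_sd) for xi in X]
        if abs(slope_t(X, y)) <= T_CRITICAL:
            misses += 1
    return misses / trials

for b in (0.1, 0.3, 0.6, 1.0):
    print(f"true slope {b}: Type II error rate about {type_ii_rate(b):.2f}")
```

With only 7 points, a small true slope is usually missed. Type II errors become rare only when the slope is large relative to the noise.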

How the assumptions apply to the eyeball line and the least squares line

Assumption 1 is that there is a true line and that what you see differs from the true line because of random errors up or down for each point.

• Eyeball line: It's why you drew a line through the points, instead of using a curve or a wiggly line that goes from one point to the next.

• Least squares: It’s why you built a spreadsheet that calculates the slope and intercept of a line.

Assumption 2 is that the errors have an expected value of 0.

• Eyeball line: It's why you try to draw the line through the middle of the points, rather than off to one side or tilting differently.

• Least squares: The average of the residuals is 0.

• (The residuals are your estimates of the errors.)
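Assumptions 1 and 2 can be acted out in a few lines of code. This sketch assumes a hypothetical true line, Y = 50 + 0.8X, and normal errors with mean 0. It generates data from that line, fits least squares, and compares the average error with the average residual.

```python
import random

def fit_line(x, y):
    """Least squares intercept and slope."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - slope * x_bar, slope

# Assumption 1: a hypothetical true line plus a random error for each point.
# Assumption 2: those errors are drawn with expected value 0.
TRUE_ALPHA, TRUE_BETA = 50.0, 0.8
rng = random.Random(42)
x = [100, 200, 300, 400, 500, 600, 700]
errors = [rng.gauss(0, 20) for _ in x]
y = [TRUE_ALPHA + TRUE_BETA * xi + e for xi, e in zip(x, errors)]

intercept, slope = fit_line(x, y)
residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

print(f"estimated line: {intercept:.2f} + {slope:.4f} X   (true line: 50 + 0.8 X)")
print(f"average error    = {sum(errors) / len(errors):.3f}")
print(f"average residual = {sum(residuals) / len(residuals):.2e}")  # 0 up to rounding
```

The errors in any one sample average only roughly 0, which is one reason the fitted line misses the true line. The residuals average exactly 0 (up to rounding) whenever the fitted line includes an intercept.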

Assumption 3 is that the errors all have the same variance.

• Eyeball line: It's why you don't favor one point over another in drawing the line.

• Least squares: The spreadsheet’s sum and average rows are simple sums and averages. No data row gets a different weight from another.
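To see what "no data row gets a different weight" means, compare ordinary least squares with weighted least squares. Weighted least squares is a different technique and is shown here only as a contrast. With all weights equal, it gives exactly the ordinary least squares line. The data and the function name `weighted_fit_line` are hypothetical.

```python
def weighted_fit_line(x, y, w):
    """Weighted least squares line: each row's squared residual is multiplied by w[i]."""
    total = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / total
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxx = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, x))
    sxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

x = [1, 2, 3, 4, 5, 6, 7]           # hypothetical data
y = [2, 4, 5, 4, 7, 8, 12]

equal = weighted_fit_line(x, y, [1] * 7)             # every row counts the same
favor_last = weighted_fit_line(x, y, [1] * 6 + [5])  # last row counted 5 times
print("equal weights:        intercept=%.4f slope=%.4f" % equal)
print("last row weighted 5x: intercept=%.4f slope=%.4f" % favor_last)
```

Giving extra weight to one row pulls the line toward that point. That is appropriate only if that point's error really has a smaller variance, which Assumption 3 rules out.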

Assumption 4 is that the errors are independent, not correlated with each other.

• Eyeball line: It's why you predict for X=800 using a point on the line.

• Least squares: It's why you predict for X=800 with 800 × slope + intercept.
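The prediction itself is just arithmetic. This sketch uses hypothetical coefficients in place of the assignment's spreadsheet results. Under independent errors, the residuals of the other points carry no information about the new point's error, so the point on the line is the prediction.

```python
def predict(intercept, slope, x_new):
    # With independent errors, nothing is added for the neighbors' residuals;
    # the prediction is the point on the fitted line at x_new.
    return slope * x_new + intercept

# Hypothetical coefficients standing in for the spreadsheet's estimates
intercept, slope = 50.0, 0.75
print(predict(intercept, slope, 800))   # 0.75 * 800 + 50 = 650.0
```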

Confidence interval for a coefficient

• Coefficient ± its standard error × t from the table

• Is there a 95% probability that the true coefficient is in the 95% confidence interval?
  – If you do a lot of studies, you can expect that, for 95% of them, the true coefficient will be in the 95% confidence interval.

• If 0 is in the 95% confidence interval, then the coefficient is not significantly different from 0 at the 5% level.
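The "lot of studies" interpretation can be checked by simulation. This sketch uses hypothetical values throughout: a true slope of 0.5, 7 X values, normal errors, and the t value 2.571 for 5 degrees of freedom. It builds a coefficient ± standard error × t interval for each simulated study and counts how often the true slope falls inside.

```python
import math
import random

T_CRITICAL = 2.571                 # t table value: two-tailed 95%, 5 degrees of freedom
X = [1, 2, 3, 4, 5, 6, 7]          # hypothetical design with 7 points
TRUE_ALPHA, TRUE_BETA = 3.0, 0.5   # hypothetical true line

def slope_and_se(x, y):
    """Least squares slope and its standard error."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    return slope, math.sqrt(sse / (n - 2) / sxx)

def confidence_interval(slope, se, t=T_CRITICAL):
    # Coefficient plus or minus its standard error times t from the table
    return slope - t * se, slope + t * se

def coverage(trials=4000, seed=3):
    """Fraction of simulated studies whose 95% interval contains the true slope."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        y = [TRUE_ALPHA + TRUE_BETA * xi + rng.gauss(0, 1) for xi in X]
        low, high = confidence_interval(*slope_and_se(X, y))
        hits += (low <= TRUE_BETA <= high)
    return hits / trials

print(f"coverage over many studies: {coverage():.3f}")  # close to 0.95
```

Any single interval either contains the true slope or it does not. The 95% describes the long-run share of intervals that do.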

Assignment 2

• All regression results are the same

• Graphs differ

• Need a reason to use or doubt the least squares prediction

• The reason is in the form of rejecting one or more of the assumptions

Durbin-Watson statistic

• Serial correlation
  – Finds significant pattern for clinic 2

$$DW = \frac{\sum_{i=2}^{N} (u_i - u_{i-1})^2}{\sum_{i=1}^{N} u_i^2}$$
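The Durbin-Watson formula translates directly into code. The residual lists below are made up to show the two extremes. A value near 2 suggests no serial correlation. Values well below 2 point to positive serial correlation (residuals in runs of the same sign), and values well above 2 point to negative serial correlation (alternating signs). The formal test compares DW with tabulated bounds, which this sketch does not include.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squared residuals."""
    numerator = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    denominator = sum(u * u for u in residuals)
    return numerator / denominator

print(durbin_watson([1, -1, 1, -1, 1, -1]))   # alternating signs: far above 2
print(durbin_watson([3, 2, 1, -1, -2, -3]))   # long runs of one sign: far below 2
```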

Confidence interval for prediction

• The hyperbolic outline: the interval is narrowest at the mean of X and widens as X moves away from it
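The hyperbolic shape comes from the (x0 - mean of X)² term in the interval's half-width. This sketch assumes the interval is for a single new observation of Y at x0. Dropping the `1 +` inside the square root would give the narrower interval for the mean of Y at x0. The data are hypothetical, and 2.571 is the t value for 7 points (5 degrees of freedom).

```python
import math

T_CRITICAL = 2.571                 # two-tailed 95% t, 5 degrees of freedom
x = [1, 2, 3, 4, 5, 6, 7]          # hypothetical data
y = [2, 4, 5, 4, 7, 8, 12]

n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n
sxx = sum((xi - x_bar) ** 2 for xi in x)
slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
intercept = y_bar - slope * x_bar
s = math.sqrt(sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))

def prediction_interval(x0):
    """95% interval for a single new Y at x0; narrowest at the mean of X."""
    y_hat = intercept + slope * x0
    half_width = T_CRITICAL * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat + half_width

for x0 in (0, 2, 4, 6, 8, 10):
    low, high = prediction_interval(x0)
    print(f"x0={x0:2d}: [{low:6.2f}, {high:6.2f}]  width={high - low:.2f}")
```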

Formal outlier test?

• Use the confidence interval of prediction
  – With and without the suspect point?

• How do you predict when your data have an outlier?
  – Totally ignoring it seems wrong.
  – So does letting it sway your results too much.
  – Investigate and use judgment.
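The slide poses this as a question, so the following is only one informal reading of the idea. Fit the line without the suspect point, then check whether that point falls outside the 95% prediction interval at its X. The data, the suspect points, and the function names are hypothetical. The t value 2.571 fits only because 7 points remain after the suspect is set aside. Even when this check flags a point, it is a prompt to investigate, not a rule for deleting data.

```python
import math

T_CRITICAL = 2.571  # two-tailed 95% t with 5 df: the 7 points kept after setting the suspect aside

def fit(x, y):
    """Least squares fit plus the quantities the prediction interval needs."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    s = math.sqrt(sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y)) / (n - 2))
    return intercept, slope, s, x_bar, sxx, n

def outside_prediction_interval(x, y, suspect_x, suspect_y, t=T_CRITICAL):
    """Fit without the suspect point; report whether it lies outside the 95% prediction interval."""
    intercept, slope, s, x_bar, sxx, n = fit(x, y)
    y_hat = intercept + slope * suspect_x
    half_width = t * s * math.sqrt(1 + 1 / n + (suspect_x - x_bar) ** 2 / sxx)
    return abs(suspect_y - y_hat) > half_width

x = [1, 2, 3, 4, 5, 6, 7]       # hypothetical data, suspect point excluded
y = [2, 4, 5, 4, 7, 8, 12]
print(outside_prediction_interval(x, y, 8, 40))   # True: far above the band
print(outside_prediction_interval(x, y, 8, 12))   # False: inside the band
```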

Multiple regression

• 3 or more dimensions

• 2 or more X variables

• Y = α + βX + γZ + error

• Y = α + β1X1 + β2X2 + … + βpXp + error

Fitting a plane in 3D space

• Linear assumption
  – Now a flat plane
  – The effect of a change in X1 on Y is the same at all levels of X1 and X2 and any other X variables.

• Residuals are vertical distances from the plane to the data points floating in space.
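Here is a sketch of fitting the plane Y = a + bX + cZ by least squares. It solves the two normal equations written in deviations from the means. The test points are hypothetical and lie exactly on Y = 1 + 2X + 3Z, so the fit should recover those coefficients and leave residuals (vertical distances to the plane) of essentially 0.

```python
def fit_plane(x, z, y):
    """Least squares plane Y = a + b*X + c*Z using deviations from the means."""
    n = len(y)
    xb, zb, yb = sum(x) / n, sum(z) / n, sum(y) / n
    dx = [xi - xb for xi in x]
    dz = [zi - zb for zi in z]
    dy = [yi - yb for yi in y]
    sxx = sum(d * d for d in dx)
    szz = sum(d * d for d in dz)
    sxz = sum(p * q for p, q in zip(dx, dz))
    sxy = sum(p * q for p, q in zip(dx, dy))
    szy = sum(p * q for p, q in zip(dz, dy))
    det = sxx * szz - sxz ** 2   # zero if X and Z are perfectly collinear
    b = (sxy * szz - szy * sxz) / det
    c = (szy * sxx - sxy * sxz) / det
    a = yb - b * xb - c * zb
    return a, b, c

def plane_residuals(x, z, y, coefs):
    # Vertical distances from each data point to the fitted plane
    a, b, c = coefs
    return [yi - (a + b * xi + c * zi) for xi, zi, yi in zip(x, z, y)]

# Hypothetical points lying exactly on Y = 1 + 2X + 3Z
x = [0, 1, 2, 3, 4, 1]
z = [1, 0, 2, 1, 3, 3]
y = [1 + 2 * xi + 3 * zi for xi, zi in zip(x, z)]
coefs = fit_plane(x, z, y)
print(coefs)                            # approximately (1.0, 2.0, 3.0)
print(plane_residuals(x, z, y, coefs))  # all approximately 0
```

Because the fitted surface is a flat plane, the estimated effect b of X is the same at every level of X and Z. That is the linear assumption from the slide.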

Multiple regression

• Separating effects
  – Example from literature
  – Example from handout

β interpretation

• In Y = α + βX + γZ + error:

• β is the effect on Y of changing X by 1, holding Z constant.

• When X is one unit bigger than you would predict it to be from Z, we expect Y to be β more than you would predict it to be from Z.
  – Those predictions are based on linear relationships.
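The second interpretation can be verified numerically, a result known as the Frisch-Waugh-Lovell theorem. Take the part of X that Z does not predict and the part of Y that Z does not predict. The simple regression slope of one on the other equals β from the full fit. The data and function names below are hypothetical.

```python
def simple_slope(u, v):
    """Least squares slope from regressing v on u."""
    n = len(u)
    ub, vb = sum(u) / n, sum(v) / n
    return sum((p - ub) * (q - vb) for p, q in zip(u, v)) / sum((p - ub) ** 2 for p in u)

def residuals_on_z(v, z):
    """What is left of v after removing its linear prediction from z."""
    n = len(z)
    slope = simple_slope(z, v)
    intercept = sum(v) / n - slope * sum(z) / n
    return [vi - (intercept + slope * zi) for vi, zi in zip(v, z)]

def beta_from_plane(x, z, y):
    """Coefficient on X in the least squares fit of Y = a + bX + cZ."""
    n = len(y)
    xb, zb, yb = sum(x) / n, sum(z) / n, sum(y) / n
    sxx = sum((p - xb) ** 2 for p in x)
    szz = sum((q - zb) ** 2 for q in z)
    sxz = sum((p - xb) * (q - zb) for p, q in zip(x, z))
    sxy = sum((p - xb) * (r - yb) for p, r in zip(x, y))
    szy = sum((q - zb) * (r - yb) for q, r in zip(z, y))
    return (sxy * szz - szy * sxz) / (sxx * szz - sxz ** 2)

# Hypothetical data in which X and Z move together
x = [2, 4, 3, 6, 5, 8, 7]
z = [1, 2, 2, 3, 4, 4, 5]
y = [5, 9, 7, 14, 12, 18, 15]

x_surprise = residuals_on_z(x, z)   # how far X is from what Z predicts
y_surprise = residuals_on_z(y, z)   # how far Y is from what Z predicts
b_partial = simple_slope(x_surprise, y_surprise)
b_plane = beta_from_plane(x, z, y)
print(b_partial, b_plane)           # the two agree up to rounding
```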

β-hat formula
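For the two-variable model Y = α + βX + γZ + error, one standard way to write the least squares estimate of β uses sums of deviations from the means. Here x_i = X_i - X̄, z_i = Z_i - Z̄, and y_i = Y_i - Ȳ. This notation is an assumption and may differ from the slide's.

$$\hat\beta = \frac{\left(\sum x_i y_i\right)\left(\sum z_i^2\right) - \left(\sum z_i y_i\right)\left(\sum x_i z_i\right)}{\left(\sum x_i^2\right)\left(\sum z_i^2\right) - \left(\sum x_i z_i\right)^2}$$

When X and Z are uncorrelated in the sample (Σ x_i z_i = 0), this reduces to the simple regression slope Σ x_i y_i / Σ x_i².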

LS

• Spreadsheet as front end

• Word processor as back end

• Interpretation of results
  – Coefficients
  – Standard errors
  – T-statistics
  – P-values

• Prediction
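The p-value reported next to a t-statistic is the probability of a t at least that far from 0 (in either direction) if the true coefficient were 0. This sketch computes it using only the Python standard library. It integrates the Student's t density numerically with Simpson's rule. This is a stand-in for illustration, not how LS or any particular package computes it; a statistics library would normally be used.

```python
import math

def t_density(t, df):
    """Student's t probability density with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)

def two_sided_p_value(t_stat, df, steps=2000):
    """P(|T| >= |t_stat|), via Simpson's rule on the density from 0 to |t_stat|."""
    b = abs(t_stat)
    h = b / steps                       # steps must be even for Simpson's rule
    total = t_density(0, df) + t_density(b, df)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * t_density(k * h, df)
    central_half = total * h / 3        # area between 0 and |t_stat|
    return 1 - 2 * central_half

for t_stat in (1.0, 2.571, 4.0):
    print(f"t={t_stat}: p={two_sided_p_value(t_stat, 5):.4f}")   # 5 degrees of freedom
```

At the critical value 2.571 with 5 degrees of freedom, the p-value comes out close to 0.05. A larger t gives a smaller p-value.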