ECON 497: Economic Research and Forecasting
Spring 2005 Bellas
Midterm
Name: ________________
You have three hours and twenty minutes to complete this exam.
Answer all questions, and explain your answers. Fifty points total,
points per part indicated in parentheses.
1. Linear regression involves estimating a linear relationship
between one or more independent or explanatory variables and a
dependent variable. Imagine that such a relationship has been
estimated between the price of a car in thousands of dollars (Pi),
the interior space in cubic feet (Si) and a dummy variable
indicating whether it has four wheel drive (Di):
The estimated equation is:
Pi = 8.3 + 0.1Si + 2.3Di
A. Calculate the predicted price for a car with interior space of
100 cubic feet that does not have four wheel drive. (1)
8.3 + 0.1*100 + 2.3*0 = 18.3 or $18,300
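The arithmetic can be checked with a short Python sketch (the function name and units are illustrative, not part of the exam):

```python
# Predicted price from the estimated equation P = 8.3 + 0.1*S + 2.3*D,
# with price measured in thousands of dollars.
def predicted_price(space_cuft, four_wheel_drive):
    return 8.3 + 0.1 * space_cuft + 2.3 * four_wheel_drive

print(predicted_price(100, 0))  # about 18.3, i.e. $18,300
```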
B. What is the interpretation of the coefficient on the four wheel
drive dummy variable? (1)
Other things being the same, a car with four wheel drive will be
priced 2300 dollars more than a car that doesn’t have four wheel
drive.
C. As you’re presenting these results to a hostile crowd, a heckler
in the crowd asks you if you really believe that a car that has no
interior space (Si=0) and doesn’t have four wheel drive (Di=0)
would sell for $8,300. How do you respond? (1)
Zero interior space is probably outside the range of the sample of
cars on which the model is based, so the model's prediction for a
car with zero interior space is probably invalid; the intercept by
itself is an extrapolation, not a meaningful price.
2. Dummy variables take the value of 0 or 1 and allow qualitative
factors to be represented in linear regression. In addition,
interactive or slope dummies allow the effects of a second variable
to vary from one qualitative group to another. For purposes of this
question, imagine that the annual wage (in $1,000) of a person with
a bachelor's degree (Wi) is estimated as a function of their age
(Ai) and whether or not they took a course in economics (Ei):
i.   Wi = β0 + β1*Ai
ii.  Wi = β0 + β1*Ai + β2*Ei
iii. Wi = β0 + β1*Ai + β2*Ei + β3*Ai*Ei
Regression results showed positive values for β0 in all three
models.
A. Imagine that you were to graph the predicted wage against age
based on the results of the first model (model i.). Show what this
would look like. (1)
B. Imagine that you were to graph the predicted wage against age
based on the results of the third model (model iii.). Show what
this would look like, being clear to specify the predictions for
the economists and non-economists. (1)
C. What would it mean if the estimated interaction coefficient β3
were positive and significant? (1)
This would mean that as they age, economists’ wages increase at a
faster rate than do non-economists’ wages.
D. Imagine that the estimated coefficients in the second model
(model ii.) were
Wi = 12.0 + 1.5*Ai + 3.2*Ei
Calculate what the estimated coefficients would be if the economics
dummy were replaced with a “didn’t take economics” dummy. (1)
Wi = 15.2 + 1.5*Ai - 3.2*DTEi
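As a quick check, a Python sketch (coefficients taken from the exam, not re-estimated) confirms that the two parameterizations give identical predictions:

```python
# Wage equations from part D; E = took economics, DTE = 1 - E.
def wage_econ_dummy(age, econ):
    return 12.0 + 1.5 * age + 3.2 * econ

def wage_no_econ_dummy(age, dte):
    return 15.2 + 1.5 * age - 3.2 * dte

for age in (25, 40, 60):
    for econ in (0, 1):
        assert abs(wage_econ_dummy(age, econ)
                   - wage_no_econ_dummy(age, 1 - econ)) < 1e-9
print("identical predictions")
```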
3. What would an economics class be without assumptions? This is
especially true in an econometrics class because the basic
regression model, conversationally known as ordinary least squares
(OLS to its friends) relies on seven classical assumptions. If
these assumptions are satisfied, OLS is the best linear unbiased
estimator (BLUE) that can possibly exist. Without them, it is
not.
A. One assumption is that the error term has constant variance.
What is the eight-syllable term given to violation of this
assumption? (1)
Heteroskedasticity
B. Another assumption is that no explanatory variable is a linear
function of any other explanatory variable(s). What is the
eight-syllable term given to the violation of this assumption?
(1)
Multicollinearity
C. How do the above violations bias coefficient estimates?
(2)
Neither one biases the coefficient estimates; both violations
affect the variances (standard errors) of the estimates rather
than their expected values.
4. One of the classical assumptions is that the model is correctly
specified, meaning that all relevant explanatory variables are
included. Of course, you can’t include all relevant explanatory
variables; there’s always something missing. In question #1, the
example was given of estimating car prices as a function of
interior space and a four wheel drive dummy variable. Imagine that
some car manufacturers are seen as being cooler than others, but
coolness isn’t something that can be quantified, so it is left out
of the equation. How would the estimated coefficient on interior
space be biased if cooler auto manufacturers tended to make smaller
cars? (2)
Coolness would have a positive impact on a car’s price, and
coolness and interior space (size) are negatively correlated. So,
if coolness is excluded, the estimated coefficient on size is
biased downward (a negative bias).
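This direction of bias can be illustrated with a small simulation (all numbers below are made up for illustration):

```python
# Omitted-variable bias: coolness raises price, coolness and size are
# negatively correlated, so regressing price on size alone biases the
# size coefficient downward from its true value of 0.1.
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
coolness = rng.normal(size=n)
size = 100 - 5 * coolness + rng.normal(scale=3, size=n)  # cooler -> smaller
price = 8.3 + 0.1 * size + 2.3 * coolness + rng.normal(scale=0.5, size=n)

slope_short = np.polyfit(size, price, 1)[0]  # coolness omitted
print(slope_short)  # well below 0.1 (negative here)
```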
5. One linear regression hypothesis test that all regression
packages do is an F-test of the explanatory power of the
model.
A. What is the null hypothesis of this test? (1)
The null hypothesis is that all of the slope coefficients are
zero.
B. If you get a p-value (known in SPSS as a SIG. value) of 0.038
for this F-test, what does this imply about the explanatory power
of your model? (1)
This is a small p-value, which means that the null hypothesis
should be rejected in favor of the alternative. That is, at least
one of the slope coefficients is not zero.
6. As nice as the F-test is, the thing that most folks are really
interested in is the t-test of significance of the estimated
coefficients.
A. What is the null hypothesis of this test? (2)
The null hypothesis is that the coefficient in question is
zero.
B. If you get a p-value of 0.237 for this t-test, what does this
imply about the estimated coefficient on the variable in question?
(2)
This implies that the estimated coefficient is not significantly
different from zero.
C. If you get an estimated coefficient of 0.038 and an associated
p-value of 0.237 for this t-test and someone asked you your best
guess about the value of the coefficient on that variable, what
value would you tell them? (2)
Your best guess as to the value is the estimated coefficient of
0.038, even if this is not significantly different from zero.
7. Here is some totally fake SPSS output. Calculate the correct
values for the blanks. If you can’t calculate a value, make your
best guess and justify it.
ANOVA
[The ANOVA and coefficients tables did not survive conversion; the
values referenced in the answers below come from that output.]
A. (2) 2800 + 1200 = 4000
B. (2) This can’t be calculated from the output given, but because
the R2 is so large (see part F) the Sig. value is very small,
probably 0.000.
C. (2) t = B/SE = 7.00/3.50 = 2.00
D. (2) t = B/SE, so 1.00 = 3.00/SE, which gives SE = 3.00
E. (2) The t-value is 6.00, so the Sig. value is probably
0.000.
F. Calculate the R2 for this regression. (2) 2800/4000 =
0.70.
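The arithmetic behind these blanks can be laid out in a few lines of Python (values are the ones quoted in the answers above):

```python
# Filling in the blanks from the fake ANOVA/coefficients output.
reg_ss, resid_ss = 2800.0, 1200.0
total_ss = reg_ss + resid_ss      # part A: 4000.0
r_squared = reg_ss / total_ss     # part F: 0.70
t_c = 7.00 / 3.50                 # part C: t = B/SE = 2.00
se_d = 3.00 / 1.00                # part D: SE = B/t = 3.00
print(total_ss, r_squared, t_c, se_d)
```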
8. Among my favorite things about the Studenmund text are the four
criteria for determining whether an explanatory variable should be
added to a regression. Consider the following output from a
regression (this is actual data) of the price of a house on its
size in square feet and the number of bathrooms. You might also
consider adding the size of the lot on which the house sits. Here
are the regression results without and with lot size.
Model Summary
[The Model Summary tables and the regression without LOT did not
survive conversion; the coefficients for the regression including
LOT are below.]

Coefficients:
             B          Std. Error   Beta    t       Sig.
(Constant)   -14944.2   34193.198            -.437   .663
SQFT         178.794    26.432       .635    6.764   .000
BATHS        -15255.3   23840.001    -.059   -.640   .523
LOT          8.873      4.250        .131    2.088   .038
Discuss whether or not lot size should be included in the
regression based on Studenmund’s four criteria. (2)
Theory: Land value is part of a house’s value, so it should be
included.
Adj. R2: This increases when lot size is added, so it should be
included.
t-test: The estimated coefficient on LOT is positive and
significant (Sig. = .038), so include it.
Bias: The estimated coefficients on SQFT and BATHS don’t change
much when LOT is added, so by this criterion it need not be
included.
Overall, I would say that it should be included.
9. Imagine that you’re regressing the number of packs of cigarettes
consumed annually (Ci) on the price of a pack of cigarettes in
dollars (Pi). Offer an interpretation of the coefficient on price
from each of the following models.
A. Ci = β0 – β1*Pi (2)
If the price increases by one unit (one dollar), the quantity
consumed will fall by β1 units (packs).
B. LN(Ci) = β0 – β1*Pi (2)
If the price increases by one unit, the quantity consumed will
fall by approximately 100 × β1 percent.
C. LN(Ci) = β0 – β1*LN(Pi) (2)
Given the minus sign in the model, β1 is the magnitude of the
price elasticity of demand: a one percent increase in price
reduces the quantity consumed by β1 percent.
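The log-log interpretation in part C can be checked by simulation (all numbers assumed for illustration):

```python
# Generate demand with a known constant elasticity, then recover it by
# regressing LN(packs) on LN(price).
import numpy as np

rng = np.random.default_rng(1)
price = rng.uniform(2.0, 8.0, size=5_000)
true_elasticity = -0.4
packs = (200 * price ** true_elasticity
         * np.exp(rng.normal(scale=0.1, size=5_000)))

slope = np.polyfit(np.log(price), np.log(packs), 1)[0]
print(slope)  # close to -0.4
```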
10. I get some sick pleasure out of watching people worry about
multicollinearity.
A. What options are available for detection of this
multicollinearity? (2)
You can check whether you have good overall explanatory power (a
high R2 and a significant F-statistic) but few or no significant
estimated coefficients.
You can look at correlation coefficients among the explanatory
variables.
You can calculate variance inflation factors (VIFs) when you do a
regression.
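The VIF option can be sketched by hand in Python (the helper below is illustrative, not SPSS output): regress each explanatory variable on the others and compute 1/(1 - R2).

```python
import numpy as np

def vifs(X):
    """Variance inflation factor for each column of X (no constant)."""
    n, k = X.shape
    out = []
    for j in range(k):
        others = np.delete(X, j, axis=1)
        Z = np.column_stack([np.ones(n), others])   # add a constant
        beta, *_ = np.linalg.lstsq(Z, X[:, j], rcond=None)
        resid = X[:, j] - Z @ beta
        r2 = 1 - resid.var() / X[:, j].var()
        out.append(1.0 / (1.0 - r2))
    return out

rng = np.random.default_rng(2)
x1 = rng.normal(size=500)
x2 = x1 + rng.normal(scale=0.1, size=500)   # nearly collinear with x1
x3 = rng.normal(size=500)
print(vifs(np.column_stack([x1, x2, x3])))  # large, large, near 1
```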
B. In one word, what should you do to address this problem in your
regression? (1)
Nothing.
Most potential solutions to multicollinearity are worse than the
multicollinearity itself. Excluding an explanatory variable, for
example, would introduce omitted variable bias, whereas
multicollinearity does not bias coefficient estimates.
11. Heteroskedasticity is sometimes a problem in regression
analysis.
A. Draw a scatterplot, being careful to label the axes correctly,
that demonstrates heteroskedasticity. (2)
B. What are the consequences of heteroskedasticity? (2)
Estimated coefficients are not biased but the standard errors will
be artificially small, so that estimated coefficients may appear to
be significant when they aren’t really significant.
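A small simulation (illustrative numbers only) shows the consequence described above: with errors whose variance grows with x, the usual OLS formula understates the true sampling variability of the slope.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 200, 2_000
slopes, reported_ses = [], []
for _ in range(reps):
    x = rng.uniform(0, 10, size=n)
    e = rng.normal(scale=0.1 * x ** 2)   # heteroskedastic errors
    y = 1.0 + 0.5 * x + e
    xd = x - x.mean()
    b = (xd @ y) / (xd @ xd)             # OLS slope
    resid = y - y.mean() - b * xd
    s2 = (resid @ resid) / (n - 2)
    slopes.append(b)
    reported_ses.append(np.sqrt(s2 / (xd @ xd)))  # usual OLS SE formula

print(np.std(slopes), np.mean(reported_ses))  # true spread > reported SE
```

A scatterplot of any one simulated (x, y) sample would also show the fan shape asked for in part A.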