Ec 22613 test

7/27/2019 Ec 22613 test

EC226

THE UNIVERSITY OF WARWICK

Summer Examinations 2012/2013

Econometrics 1

Time Allowed: 3 Hours, plus 15 minutes reading time during which notes may be made (on the question paper) BUT NO ANSWERS MAY BE BEGUN.

Answer ALL EIGHT questions in Section A and ANY THREE questions from Section B. Section A carries 52 marks in total. Each of the questions in Section B is worth 16 marks.

Answer Section A questions in one booklet and Section B questions in a separate booklet.

Statistical Tables and a Formula Sheet are provided. Approved pocket calculators are allowed.

Read carefully the instructions on the answer book provided and make sure that the particulars required are entered on each answer book.

Section A

1. A model for BMI was estimated by OLS on 1832 individuals aged over 50 (robust standard errors reported in brackets).

$$\ln(BMI_i) = \underset{(0.055)}{1.006} + \underset{(0.265)}{0.665}\,\ln(age_i) + \underset{(0.032)}{0.086}\,[\ln(age_i)]^2 + \underset{(0.011)}{0.048}\,M_i + e_i \qquad (1)$$

RSS = 8.874, TSS = 9.556

BMI = Body Mass Index.
age = Age in years.
M = 1 if individual is male.

(a) At the 5% level, calculate the power of the test that the coefficient on M is equal to zero, against a two-sided alternative, given that the true value of the coefficient is 0.020.

(2 marks)

(b) At the 1% significance level, test the joint significance of the above coefficients.

(2 marks)

(c) At the 5% significance level, test that the elasticity of BMI with respect to age is zero for an individual aged 60, given that the covariance between the coefficients on $\ln(age_i)$ and $[\ln(age_i)]^2$ is -0.006.

(2.5 marks)

4. Consider the following model:

$$y_i = \beta_1 x_{1i} + \beta_2 x_{2i} + u_i, \qquad i = 1, \dots, n \qquad (1)$$

Discuss the consequences of estimating equation (1) by OLS, in terms of the OLS estimators, $b_1$ and $b_2$, and the standard errors of these estimators, in each of the following circumstances:

(a) $x_{2i}$ is divided by 100.

(1.5 marks)

(b) $V(u_i) = \begin{cases} \sigma_1^2, & i = 1, \dots, n_1 \\ \sigma_2^2, & i = n_1 + 1, \dots, n \end{cases}$

(1.5 marks)

(c) $y_i$ is measured with error.

(1.5 marks)

(d) $x_{2i} = 3x_{1i}$.

(2 marks)

5. Answer the following questions:

a) Describe one assumption underlying Ordinary Least Squares that we commonly rely on which, if violated, yields either biased or inconsistent parameter estimates. Be sure to explain why your answer is correct.

(3.5 marks)

b) What two assumptions must be satisfied for a variable to be an appropriate instrument in an Instrumental Variables estimation?

(3 marks)

6. Economists are very interested in the impact macroeconomic shocks have on time series data. The figure below is a graph of the monthly Brazilian inflation rate from January 1974 to June 1993 (234 observations).

The data look like they may have a unit root, so we run a Dickey-Fuller test with 3 lags and get the following output:

Augmented Dickey-Fuller test for unit root        Number of obs = 230

                      -------- Interpolated Dickey-Fuller --------
            Test       1% Critical    5% Critical    10% Critical
          Statistic       Value          Value           Value
Z(t)       -5.451        -3.997         -3.433          -3.133

MacKinnon approximate p-value for Z(t) = 0.0000

a) Based on the results above, what do you conclude about the presence of a unit root in the Brazilian inflation rate at a 5% significance level?

(2 marks)

b) Notice in the figure that the inflation rate drops sharply at several points in the late 1980s and early 1990s. This reflects five government "shock plans" introduced with the explicit goal of reducing the (hyper-)inflation that was then occurring. Do you think your previous unit root test is appropriate given the presence of these shock plans? Why or why not?

(2 marks)

c) Suppose (as is the case) you knew exactly in which months the government shock plans were in effect. How might you test for a unit root in the Brazilian inflation rate for those months in which the shock plans weren't in effect?

(2.5 marks)

7. Answer both of the following questions:

a) Why might it be important to include a time trend or seasonal dummies in an econometric model? In particular, what could go wrong if you were to fail to include them?

(3.5 marks)

b) Provide an empirical example not used in class to illustrate how failing to include a time trend or seasonal dummies could cause the problem you described in your answer above (you can make it up as long as it makes good economic sense).

(3 marks)

8. A major topic within industrial organization is the study of the boundaries of the firm. Why are some products exchanged in markets while others are "traded" within firms? And what decides the type of firm, be it a corporation, a partnership, or some other organizational structure?

Government regulation can sometimes play a role. There are two main types of firms in insurance markets: mutual insurance companies and stock insurance companies. Mutual insurance companies are (mutually) owned by their policyholders, while stock insurance companies are owned by their shareholders. One interesting development in the US insurance market is that mutual companies' share of the market has fallen from something like 75% as recently as 1940 to something like 15% today.

To analyze this question, a researcher collected data on 881 new firm formations between 1900 and 1949. Some key summary statistics for his data are below:

Variable        |  Obs       Mean   Std. Dev.   Min   Max
----------------+-----------------------------------------
mutual          |  881   .2338252   .4235027     0     1
mutual_favored  |  881   .4971623    .500276     0     1
reorg           |  881   .2587968   .4382226     0     1
reorg_x_favored |  881   .1169126   .3214986     0     1
const           |  881          1          0     1     1

He hypothesized that, in general, firms preferred to be stock insurance companies, but were sometimes discouraged from doing that if the initial assets that had to be put up when the firm was founded were higher for stock companies than for mutual companies. To explore the importance of any such favoritism, he specified the following Dummy Dependent Variable models:

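$$mutual_{it} = \beta_0 + \beta_1\, mutual\_favored_{it} + \beta_2\, reorg_{it} + u_{it}$$

$$mutual_{it} = \beta_0 + \beta_1\, mutual\_favored_{it} + \beta_2\, reorg_{it} + \beta_3\, reorg\_x\_favored_{it} + u_{it}$$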

where $mutual_{it}$ measured the latent benefit to company i in year t from forming a mutual insurance company instead of a stock insurance company, $mutual\_favored_{it}$ was a dummy variable equal to 1 if it took less capital to start an insurance company as a mutual company relative to a stock company, $reorg_{it}$ was a dummy variable equal to one if the new company was simply a corporate re-organization of an existing insurance company, and $reorg\_x\_favored_{it}$ was the interaction of $mutual\_favored_{it}$ and $reorg_{it}$.



He estimated both of these specifications using a Logit model and obtained the following results. In the table, the key estimated coefficients are reported with their standard errors in parentheses underneath.

a) What is the marginal effect of a mutual company being favored on the probability a new company was a mutual company in the results in column (1)? Since whether a mutual company is favored is a dummy variable, for full marks use the discrete method for calculating marginal effects.

(4 marks)

b) What is the marginal effect of a mutual company being favored on the probability a new company that was reorganizing was a mutual company using the results in column (2)? (NOTE: If you do not have enough time to calculate the marginal effect, you can earn partial credit by saying how you would calculate it.)

(2.5 marks)

Section B

9. Using data from the US, a researcher estimates by OLS a model for happiness, measured on a scale of 1-6, for a sample of 241 full-time working males.

$$\ln(h_i) = \underset{(0.17)}{0.86} + \underset{(0.065)}{0.115}\,\ln(w_i) - \underset{(0.024)}{0.056}\,\ln(age_i) + \underset{(0.002)}{0.0072}\,[\ln(age_i)]^2 + \underset{(0.031)}{0.056}\,m_i - \underset{(0.005)}{0.012}\,nc_i \qquad (1)$$

RSS = 5.826, TSS = 6.927

where: h = Happiness, w = Gross monthly pay ($000), age = Age, m = 1 if married (or cohabiting), 0 otherwise, nc = Number of children.

(a) Find the age level at which happiness is a minimum.

(2 marks)

(b) The researcher reports diagnostic test results for the Jarque-Bera normality test of 7.56 and a Breusch-Godfrey test of serial correlation of lag 1 of 2.11. Explain what you conclude.

(3 marks)

(c) Including a variable for $[\ln(w_i)]^2$ in (1) above yielded a t-ratio on the new variable of 1.365. Calculate the RSS for this new model along with the $R^2$ and $\bar{R}^2$ for the new model.

(3 marks)

(d) The researcher re-estimated model (1) with 10 new observations added to the original dataset, but with the addition of 10 new dummy variables, one for each of these last 10 people. The RSS for this model was 5.826. Undertaking an F-test of this model compared to that in (1) yielded an F-statistic of 0. Explain what test the researcher has undertaken. What conclusion do you draw from the result?

(3 marks)

(e) Trying to remember the lectures from the module EC226 and the issue with collinearity, the researcher dropped the dummy variable for the last individual and obtained an RSS of 5.889, which, when compared to model (1), yielded an F-statistic of 2.530. What should you now conclude?

(2 marks)

(f) A friend of the researcher explains that pay is likely to be endogenous, but that this problem could be resolved by the fact that the different states in the US have different income tax rates. Explain the logic behind the friend's observation.

(3 marks)

10. On 4th June 2012 a random sample of 394 petrol stations was surveyed in Wales and the price of standard unleaded petrol recorded. The following specification has been estimated by OLS to try to explain variation in petrol prices:

0 1 2 3 4 5ln( ) ln( ) ln( )i i i i i i ipetrol d npet den ue A (1)

RSS = 5.292.

Where: petrol = Petrol price in £, d = Distance in kilometres to the nearest petrol station owned by a different company, npet = Petrol price charged by the nearest petrol station owned by a different company, den = Population density per square kilometre measured at the petrol station, ue = Unemployment rate (measured in %) in the district around the petrol station, and A = 1 if station located on an A-road (or motorway), 0 otherwise.

(a) Interpret the coefficients $\beta_4$ and $\beta_5$.

(2 marks)

(b) Ten dummy variables were added to equation (1), based on 11 regions of Wales, where Cardiff (located in South Wales) was one region and excluded from the model as the default. The RSS from this new equation was 5.001. At the 5% significance level, test this model against model (1), being careful to outline the null and alternative hypotheses.

(3 marks)

(c) Four dummy variables were added to equation (1), for: West Wales (made up of 2 regions), North Wales (made up of 3 regions), East Wales (made up of 2 regions) and Swansea (also located in South Wales and defined as a region). The RSS from this new equation was 5.113. At the 5% significance level, test this model against that in (b), being careful to outline the null and alternative hypotheses.

(4 marks)

(d) Estimating the model in (b) separately for the 96 petrol stations based in Cardiff and Swansea and the 298 petrol stations based in the rest of Wales yielded RSS of 1.223 and 3.699, respectively. At the 5% significance level, test this model against the model outlined in (b), being careful to outline the null and alternative hypotheses.

(4 marks)

(e) Looking at the model in (1), briefly outline why you might be worried about the reliability of the OLS results.

(3 marks)

11. Consider the following model for the performance of secondary schools in each of 228 local authorities, averaged over a five-year period (2004-2008) and measured as the percentage of all their students gaining 5+ A-C grades in GCSE exams:

$$\ln(P_i) = \beta_1 + \beta_2 \ln N_i + \beta_3 \ln Y_i + \beta_4 \ln(F_i) + \beta_5 S_i + \varepsilon_i \qquad (1)$$

where P is the percentage of students with 5+ A-C GCSE results averaged over the period 2004-2008, N is the average family income level in each local authority in 2003, Y is the average local authority expenditure per pupil over the period 2004-2008, F is the proportion of the local authority's children getting free school meals over the period 2004-2008 and S is the proportion of the children in the local authority whose 2nd language is English in 2006.

The results from estimating this equation using Ordinary Least Squares (with robust standard errors in parentheses) are:

$$\ln(P_i) = 2.64 + \underset{(0.032)}{0.067}\,\ln N_i + \underset{(0.062)}{0.215}\,\ln Y_i + \underset{(0.044)}{0.118}\,\ln(F_i) + \underset{(0.082)}{0.0153}\,S_i + e_i \qquad (2)$$

RSS=10.85, ESS= 3.91.

(a) Explain why we may want to estimate the model with robust standard errors. Discuss the consequences and risks of using and not using robust standard errors.

(3 marks)

(b) Some authors have suggested using local authority density as an instrument for average local authority expenditure per pupil. Discuss whether or not such a variable might be a valid instrument.

(3 marks)

(c) Explain how you undertake the J-test for instrument exogeneity. Why can the J-test not be conducted to test the appropriateness of the instrument local authority density?

(3 marks)

(d) The OLS coefficient estimate [standard error] on the local authority expenditure (ln(Y)) variable from estimating (1) is 0.215 [0.062]. The corresponding IV coefficient estimate [standard error] on this variable, from using two instruments (of which density is one), is 0.124 [0.102], with an instrument relevance test of 4.81 and a J-test of 6.62. What do you conclude?

(4 marks)

(e) In 2006, Local Authorities in which more than 5.2% of the students of secondary school age had a first language other than English were given an extra 10% to spend per pupil from central Government funds. Briefly lay out a model which you could estimate which attempts to identify the effect of local authority expenditure per pupil on the percentage of students with 5+ GCSE results.

(3 marks)

12. This question addresses the economic returns to migration, a key issue facing many potential migrants seeking a better life in another country. To address this question, several authors recently collected detailed historical data on the migration of men from Norway to the United States during the age of mass migration (1850-1913), during which long-distance migration happened at an exceptionally high rate. Like many economic historians, they argued that understanding the returns to migration in this period provides insights about the incentives to migrate today.

They focused on migration of men between Norway and the US largely for data reasons. Norway has digitized census data from 1865 and 1900 and the authors collected US census records on Norwegian-born men living in the US in 1900 using the genealogy website Ancestry.com. They then matched men by name and age from their birth families in Norway in 1865 to the labor market in either Norway or the US in 1900. Clever!

The basic regression they wished to run was a simple earnings equation:

$$\ln wage_i = \beta_0 + \beta_1\, age_i + \beta_2\, agesq_i + \beta_3\, US_i + u_i$$

where $\ln wage_i$ is the log earnings of individual i (given as the mean year-1900 earnings of i's occupation in either Norway or the US), $age_i$ and $agesq_i$ are the age and its square of individual i in 1900, and $US_i$ is a dummy variable equal to one if individual i migrated to the US by 1900. Summary statistics for the key variables in the analysis are below.

For reasons that will become clear later in the question, the number of men in i's family is important and is included in the summary statistics.

Variable      |  Obs       Mean   Std. Dev.        Min        Max
--------------+---------------------------------------------------
wage          | 2655   431.7593   224.9274   90.23226       1500
lnwage        | 2655   5.946887   .4900244   4.502387   7.313221
age           | 2655   43.72618   3.573321         38         50
agesq         | 2655   1924.742   313.7008       1444       2500
US            | 2655   .1227872   .3282546          0          1
men_in_family | 2655   2.165348   .4245713          2          4




They began with a simple OLS regression. Column (1) below reports the results. In the table, the key estimated coefficients are reported with their standard errors in parentheses underneath.

a) Suppose for the moment that whether someone migrated to the US or not was random. What does the coefficient on the US dummy, $\beta_3$, imply about the returns to migration in this sample of Norwegian men?

(3 marks)

Of course, migration is not random: men choose to migrate. The authors decide to try to control for the non-random migration process by including household effects. Let i continue to index individuals, but now let j index households. Of course, there can be multiple men per household (e.g. brothers), some of whom migrated to the US and some of whom didn't. Here is their new econometric model:

$$\ln wage_{ij} = \beta_0 + \beta_1\, age_{ij} + \beta_2\, agesq_{ij} + \beta_3\, US_{ij} + \sum_{j=2}^{J} \beta_{4j}\, d_j + u_{ij}$$

where each of the variables is defined as above (but now includes a j subscript) and $d_j$ is a dummy variable for each of the J households in the data (with associated parameter $\beta_{4j}$). There are 1,259 households for the 2,655 men, thus 1,258 $\beta_4$'s.



b) It turns out this regression is essentially a fixed-effects regression. Suppose you wanted to estimate $\beta_1$, $\beta_2$, and $\beta_3$ without cluttering up the results by estimating all the $\beta_4$'s. How would you do that? In particular, how would you transform the data so that you didn't have to include those 1,000+ household dummy variables? (If you find it difficult to describe what you would do using words, feel free to simply define new variables and write down the new regression equation and how you would estimate it.)

(4 marks)

c) Note in the sample statistics above that there are at least two men in each family in the estimation dataset. Why is this necessary?

(3 marks)

Column (2) above reports the estimates of $\beta_1$, $\beta_2$, and $\beta_3$ for the fixed-effects regression.

d) Compare the estimated returns to migration ($\beta_3$) in each specification.

i) What effect does including household fixed effects in your regression have on the estimated returns to migration?

(2 marks)

ii) Let's call the unobserved heterogeneity, $d_j$, household ability. What does your answer to part (i) imply about the correlation between $US_{ij}$ and this unobserved household ability? In particular, are migrants drawn from households that have relatively high or relatively low unobserved household ability? For full marks, use the bias formula to justify your answer.

(4 marks)

13. Economists are very interested in whether economic development promotes the development of democratic institutions. Each of the 34 existing OECD countries is democratic, while many poor parts of the world are non-democratic. A keystone of both political economics and development is that higher per-capita income causes democracy.

Whether or not that is true is a difficult question, however. To explore it, a group of researchers recently collected a dataset on measures of democracy and income per capita across a large number of countries. This question analyzes the data for the Ivory Coast (Cote d'Ivoire). Democracy is measured by the Freedom House Political Rights Index in which a country scores highly if political rights come closest to the ideals suggested by a checklist of questions (e.g. are there free and fair elections, etc.). Income per capita is measured by GDP per capita. The authors have annual data from 1972 to 2000 for the Ivory Coast on these measures.

a) As we often do with time-series data, we worry that one or both of these series has a unit root. Which type of Unit Root test (Model A, B, or C) would you use for these data? Why?

(2 marks)



It turns out that you can reject the hypothesis that the data have a unit root for both series. You next begin to explore the relationship between democracy (democracy_score) and income per capita (gdp_percap), running the following regressions:

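$$democracy\_score_t = \beta_0 + \beta_1\, log\_gdp\_percap_{t-1} + u_t$$

$$democracy\_score_t = \beta_0 + \beta_1\, log\_gdp\_percap_{t-1} + \beta_3\, log\_gdp\_percap_{t-3} + u_t$$

$$democracy\_score_t = \beta_0 + \beta_1\, log\_gdp\_percap_{t-1} + \beta_3\, log\_gdp\_percap_{t-3} + \gamma\, democracy\_score_{t-1} + u_t$$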

Summary statistics for the important variables are:

Variable        | Obs       Mean   Std. Dev.        Min        Max
----------------+--------------------------------------------------
democracy_score |  29   .1839081   .0516557   .1666667   .3333333
log_gdp_percap  |  29   7.743131   .1511998   7.531815   8.022284

The regression results from these specifications are in Columns (1)-(3) below. In the table, the key estimated coefficients are reported with their standard errors in parentheses underneath.


b) Unfortunately, we haven't done any checking yet for serial correlation. How would you do this for the model in column (1)? Briefly describe the steps you would undertake, being sure to write out any regression equations you use in the process as well as your null and alternative hypotheses.

(4 marks)

c) At the bottom of each column in the row labelled "BG" is the test statistic for the Breusch-Godfrey test with a one-year lag length. Test the null hypothesis of no serial correlation for your column-(1) results.

(3 marks)

d) What is the long-run impact of a change in the log GDP per capita on democracy in your column (3) estimates?

(3 marks)

You'll note that all the coefficients in column (3) are individually significant at the 5% level, suggesting we've captured some important dynamics between democracy and income per capita in the Ivory Coast.

e) Carefully look again at the raw data in the figures above. Do you think the dynamic behavior we've identified is spurious (i.e. a statistical accident) or legitimate? Why or why not?

(4 marks)

EC 226 Econometrics 1: Formulas

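Two Variable Linear Regression Model: $Y_i = \alpha + \beta X_i + \varepsilon_i$

Least Squares estimates: $b = \sum (X_t - \bar{X})(Y_t - \bar{Y}) \big/ \sum (X_t - \bar{X})^2$, $\quad a = \bar{Y} - b\bar{X}$

Estimation of the error variance: $s_e^2 = SSE/(n-2) = \{\sum (Y_t - \bar{Y})^2 - b^2 \sum (X_t - \bar{X})^2\}/(n-2)$

Tests on regression slope coefficient: $H_0: \beta = \beta_0$, $\quad t = (b - \beta_0)/s_b$, where $s_b = s_e \big/ \sqrt{\sum (X_t - \bar{X})^2}$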

Standard Error of Prediction of $Y_{n+1}$ given $X_{n+1}$: $se(Y_0 \mid X_0) = s_e \left\{ 1 + \frac{1}{n} + \frac{(X_0 - \bar{X})^2}{\sum (X_t - \bar{X})^2} \right\}^{1/2}$

Multiple Regression Model: $Y_t = \alpha + \beta_1 X_{1,t} + \beta_2 X_{2,t} + \dots + \beta_k X_{k,t} + \varepsilon_t$

Estimation of the error variance: $s_e^2 = \frac{RSS}{n-k-1} = \frac{\sum_{i=1}^{n} e_i^2}{n-k-1}$

R-squared: $R^2 = 1 - \frac{RSS}{TSS}$; R-bar-squared: $\bar{R}^2 = 1 - \frac{RSS/(n-k-1)}{TSS/(n-1)}$
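
As a rough illustration of the two goodness-of-fit formulas above, the following Python sketch computes R-squared and R-bar-squared from RSS, TSS, n and k. The function names and the input numbers are hypothetical, chosen only to keep the arithmetic easy to follow; they are not taken from any question in this paper.

```python
def r_squared(rss: float, tss: float) -> float:
    """R-squared: 1 - RSS/TSS."""
    return 1.0 - rss / tss


def r_bar_squared(rss: float, tss: float, n: int, k: int) -> float:
    """R-bar-squared for a model with k slope coefficients plus an intercept."""
    return 1.0 - (rss / (n - k - 1)) / (tss / (n - 1))


# Hypothetical inputs, for illustration only.
rss, tss, n, k = 40.0, 100.0, 51, 2
print(round(r_squared(rss, tss), 4))         # 0.6
print(round(r_bar_squared(rss, tss, n, k), 4))  # 0.5833
```

Because the degrees-of-freedom correction penalises extra regressors, R-bar-squared is never larger than R-squared for the same model.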

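Tests on regression coefficients:

(i) Single coefficient is equal to some hypothesised value:

$H_0: \beta_i = \beta_i^0,\ i = 1, 2, \dots, k$, $\quad t = (b_i - \beta_i^0)/s_{b_i} \sim t_{n-k-1}$

(ii) All slope coefficients are equal to zero:

$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$, $\quad F = \frac{R^2}{1 - R^2} \cdot \frac{n-k-1}{k} \sim F_{k,\,n-k-1}$

(iii) A sub-set of coefficients are equal to zero:

$H_0: \beta_1 = \beta_2 = \dots = \beta_j = 0$, $\quad F = \frac{SSE^* - SSE}{SSE} \cdot \frac{n-k-1}{j} \sim F_{j,\,n-k-1}$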

(iv) The coefficients from the first sub-period are equal to those from the second sub-period:

$H_0: \alpha^1 = \alpha^2,\ \beta_1^1 = \beta_1^2,\ \dots,\ \beta_k^1 = \beta_k^2$, $\quad F = \frac{[RSS_R - (RSS_1 + RSS_2)]/(k+1)}{(RSS_1 + RSS_2)/[n - 2(k+1)]} \sim F_{k+1,\,n-2(k+1)}$
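
The F-statistics in (iii) and (iv) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: SSE* is read as the sum of squared errors from the restricted model, RSS_R as the RSS from the pooled (restricted) regression, and all function names and input numbers are hypothetical. Critical values would still come from the Statistical Tables.

```python
def f_subset(sse_restricted: float, sse_unrestricted: float,
             n: int, k: int, j: int) -> float:
    """Test (iii): j zero restrictions; the unrestricted model has k slopes."""
    return ((sse_restricted - sse_unrestricted) / sse_unrestricted) * ((n - k - 1) / j)


def f_subperiods(rss_pooled: float, rss_1: float, rss_2: float,
                 n: int, k: int) -> float:
    """Test (iv): equality of all k + 1 coefficients across two sub-periods."""
    rss_split = rss_1 + rss_2
    numerator = (rss_pooled - rss_split) / (k + 1)
    denominator = rss_split / (n - 2 * (k + 1))
    return numerator / denominator


# Hypothetical inputs, for illustration only.
print(f_subset(120.0, 100.0, n=106, k=5, j=2))
print(f_subperiods(130.0, 40.0, 60.0, n=110, k=4))
```

In both cases the statistic is large when imposing the restrictions raises the residual sum of squares by a lot relative to the unrestricted fit.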

Durbin-Watson Test statistic: $d = \frac{\sum_{i=2}^{n} (\hat{u}_i - \hat{u}_{i-1})^2}{\sum_{i=1}^{n} \hat{u}_i^2}$
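
For illustration, the Durbin-Watson statistic above can be computed from a list of residuals as in this short Python sketch. The function name and the two residual series are hypothetical and are chosen only to show the extreme cases.

```python
def durbin_watson(residuals: list[float]) -> float:
    """Sum of squared first differences of the residuals over their sum of squares."""
    numerator = sum(
        (residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals))
    )
    denominator = sum(u * u for u in residuals)
    return numerator / denominator


# Hypothetical residuals, for illustration only.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # alternating signs: 3.0
print(durbin_watson([0.5, 0.5, 0.5, 0.5]))    # identical residuals: 0.0
```

Values near 2 are consistent with no first-order serial correlation; values near 0 point to positive and values near 4 to negative serial correlation.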

Durbin's h-statistic: $h = r \sqrt{\frac{n}{1 - n s_c^2}}$, $\quad r \approx 1 - \frac{d}{2}$, $\quad s_c^2 =$ Estimate of variance of coefficient on lagged dependent variable

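Unit Root Tests:

Augmented Dickey-Fuller: Null hypothesis $H_0: \gamma = 0$, Alternative hypothesis $H_1: \gamma < 0$

Model A: $\Delta y_t = \gamma y_{t-1} + \sum_{j=1}^{q} \delta_j \Delta y_{t-j} + \varepsilon_t$

Model B: $\Delta y_t = \alpha + \gamma y_{t-1} + \sum_{j=1}^{q} \delta_j \Delta y_{t-j} + \varepsilon_t$

Model C: $\Delta y_t = \alpha + \beta t + \gamma y_{t-1} + \sum_{j=1}^{q} \delta_j \Delta y_{t-j} + \varepsilon_t$

Test statistic: $\tau = \hat{\gamma} / se(\hat{\gamma}) \sim$ MacKinnon critical values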

Error Correction Model (ECM):

0 1 1 1 1 1

t t t t t t x x y s u

Limited Dependent Variable Model

Linear Probability Model: $\Pr[Y_i = 1] = X_i\beta$

Logit Model: $\Pr[Y_i = 1] = F(X_i\beta) = \Lambda(X_i\beta) = \frac{\exp(X_i\beta)}{1 + \exp(X_i\beta)}$

Probit Model: $\Pr[Y_i = 1] = F(X_i\beta) = \int_{-\infty}^{X_i\beta} (2\pi)^{-1/2} \exp(-z^2/2)\,dz = \Phi(X_i\beta)$

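Interpreting coefficients: $\frac{\partial E(Y_i)}{\partial X_{ji}} = \frac{\partial F(X_i\beta)}{\partial X_{ji}} = f(X_i\beta)\,\beta_j$,

where $f(X_i\beta) = \frac{\partial F(X_i\beta)}{\partial (X_i\beta)}$ and $f(X_i\beta)$ is the probability density function.

For logit model: $f(X_i\beta) = \lambda(X_i\beta) = \Lambda(X_i\beta)[1 - \Lambda(X_i\beta)]$

For probit model: $f(X_i\beta) = \phi(X_i\beta) = (2\pi)^{-1/2} \exp(-z^2/2)$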
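
The logit formulas above can be illustrated with a small Python sketch that contrasts the derivative-based effect $f(X_i\beta)\,\beta_j$ with the discrete change in probability when a dummy regressor switches from 0 to 1. The function names, the index value and the coefficient are all hypothetical; nothing here uses estimates from this paper.

```python
import math


def logit_cdf(index: float) -> float:
    """Logistic CDF, algebraically equal to exp(index) / (1 + exp(index))."""
    return 1.0 / (1.0 + math.exp(-index))


def logit_pdf(index: float) -> float:
    """Logistic density: cdf * (1 - cdf)."""
    p = logit_cdf(index)
    return p * (1.0 - p)


def derivative_effect(index: float, coefficient: float) -> float:
    """Marginal effect of a continuous regressor: density times coefficient."""
    return logit_pdf(index) * coefficient


def discrete_effect(index_with_dummy_off: float, dummy_coefficient: float) -> float:
    """Change in probability when a dummy regressor switches from 0 to 1."""
    return logit_cdf(index_with_dummy_off + dummy_coefficient) - logit_cdf(index_with_dummy_off)


# Hypothetical index (all other terms) and dummy coefficient, for illustration only.
print(round(derivative_effect(0.0, 1.0), 4))  # 0.25
print(round(discrete_effect(0.0, 1.0), 4))    # 0.2311
```

The two numbers differ because the derivative is a local approximation, while the discrete method evaluates the probability at both values of the dummy.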
