27
1 Introduction to Econometrics The statistical analysis of economic (and related) data STATS301 Info Session (2015/2016) Titulaire: Christopher Bruffaerts Assistant: Lorenzo Ricci

Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

  • Upload
    others

  • View
    6

  • Download
    0

Embed Size (px)

Citation preview

Page 1: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

1

Introduction to Econometrics The statistical analysis of economic (and related) data

STATS301

Info Session (2015/2016)

Titulaire: Christopher Bruffaerts

Assistant: Lorenzo Ricci

Page 2: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

2

Contacts Titulaire : Christopher Bruffaerts

E-mail : [email protected]

Assistant : Lorenzo Ricci

Office: H.4.152

Office Hour: Monday 14h-16h (or by appointment)

E-mail : [email protected]

Etudiants-Assistants (Help for the exercices sessions):

Lina Frank

Hélène Seynaeve please use the email wisely

Page 3: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

3

Organisation of the Course

STAT-S-301: Introduction to Econometrics

5 ECTS

3 parts:

• Theory (Christopher Bruffaerts)

• Theoretical exercises (Lorenzo Ricci)

• Practical work on computer (Lorenzo Ricci)

Website : http://metricssbsem.wordpress.com/ Other material in : http://beta.respublicae.be (managed by students!)

Page 4: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

4

Organisation of the Course - Schedule 1 - Theory

Friday 8-10 See Gehol

Saturday (2 times) 9-12 See Gehol

2 - Theoretical exercises (All Group) Thursday 10-12 See Gehol

The goal of “theoretical exercises” is to help to solve the exercises at the end of each chapter. You will find the solution of all exercises on Respublicae (student Forum). For any question regarding exercises and computer session, please contact Lorenzo Ricci.

Page 5: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

5

Date Subject (theory & exercises) September 18 Friday Introduction and Review of Statistics

25 Friday Review of Statistics

26 Saturday Linear Regression October 1 Thursday Review of Statistics October 2 Friday Multiple regression

3 Saturday Multiple regression/ Model evaluation

8 Thursday Linear Regression

9 Friday Nonlinear Regression

15 Thursday Multiple Regression

16 Friday Panel Data

22 Thursday Nonlinear Regression /Model evaluation

23 Friday Instrumental Variable

29 Thursday Panel Data November 5 Thursday Instrumental Variables

13 Friday Instrumental Variable

19 Thursday Instrumental Variable

26 Thursday Instrumental Variables

27 Friday Time series December 3 Thursday Time Series Regression

4 Friday Quasi-experiment/Summary

10 Thursday Quasi-experiment/Summary  

Page 6: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

6

Computer Sessions (by group):

September 28,29 STATA Introduction and Review of Statistics

October 5,6 Linear Regression/Multiple Regression

October 19,20 Nonlinear Regression and Panel Data

November 30 Instrumental Variable and Time Series

December 2 Regression

• You have to register for the Computer sessions next week

(~ next Tuesday)

• There is an announcement on monulb

• Registration will be held in the office H 3.159. (Nicole Vanderroost’s Office).

Page 7: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

7

Organisation of the Course – Final Exam

Structure of the exam:

1. multiple choice questions

2. open question on theoretical exercise

3. open questions on practical exercise

Option 1: Final exam

Option 2: Problem sets + Mid-term exam + Final exam November 6

 • We will not give you copies of the old exams • however, there is a 2010 final exam and all the mid-terms on Respublicae

Page 8: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

8

Organisation of the Course – Final grade

1) Final Exam (100% of the Grade)

2) The weighted average between i. Problem set (25%),

ii. Mid-term exam (25%), iii. Final Exam (50%).

Important: The final grade will be the best between the two options. Suggestion: Take the mid-term exam and work regularly on the personal work. Statistical evidence based on last years : On average the grade of those that take option 2 is 2 points higher

Page 9: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

9

Organisation of the Course – Problem sets

• Written Reports

• STATA is the statistical software used for this Course.

• Reports should be Team work

NO FREE RIDER WITHIN THE TEAM IS ALLOWED

- You can work in groups (maximum 4 people) - but each student must participate actively on the report - More details on the Problem set will be given during the first

computer session

Page 10: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

10

Reports and Waivers

• The exam will be at the beginning of January.

• The course is considered successful if the total score is greater than or equal to 10/20.

• Students who are repeating the year are exempted from the course if and only if they have obtained at least 10/20 before.

Page 11: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

11

References

J. H. Stock and M. W. Watson, Introduction to Econometrics (third edition), Pearson Education Limited, 2012

Chapter Subject 2-3 Review of Statistics 4-5 Linear Regression 6-7 Multiple Regression 8 Nonlinear Regression 9 Model evaluation

10 Panel Data 12 Instrumental Variable 13 Quasi-experiment 14 Time Series Regression

• I strongly recommend you to buy the book

• Study the book, do not limit your study to reading the slides!!!

Page 12: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

12

Brief Overview of the Course Economics suggests important relationships, often with policy implications, but virtually never suggests quantitative magnitudes of causal effects. Meaning of econometrics Econometrics is the application of mathematics, statistical methods, and computer science, to economic data and is described as the branch of economics that aims to give empirical content to economic relations (source : Wikipedia). Econometrics = « Economy » + « metrics »

Page 13: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

13

Some questions we might ask ourselves • What is the quantitative effect of reducing class size on

student achievement? • What is the price elasticity of cigarettes?

• What is the effect on output growth of a 1 percentage

point increase in interest rates by the ECB? • How does another year of education change earnings?

Page 14: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

14

This course is about using data to measure causal effects • Ideally, we would like an experiment • But almost always we only have observational data

o returns to education o cigarette prices o monetary policy

• Most of the course deals with difficulties arising from using observational to estimate causal effects :

o confounding effects (omitted factors) o simultaneous causality

Remember: “correlation does not imply causation”

Page 15: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

15

In this course you will: • Learn methods for estimating causal effects using

observational data

• Learn some tools that can be used for other purposes, for example forecasting using time series data

• Learn to evaluate the regression analysis of others – this

means you will be able to read/understand empirical economics papers in other econ courses

• Get some hands-on experience with regression analysis in your problem sets.

Page 16: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

16

A first example of an empirical problem Class size and educational output

Policy question:

What is the effect on test scores (or some other outcome measure) of reducing class size by one student per class? by 8 students/class?

Answer : We must use data to find out (is there any way to answer this without data?)

Page 17: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

17

The California Test Score Data Set Data: California school districts (n = 420) The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California.

Variables:

1. 5th grade test scores (Stanford-9 achievement test, combined math and reading), district average

2. Student-teacher ratio (STR) = number of students in the district divided by number of full-time equivalent teachers

Page 18: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

18

Initial look at the data: univariate

This table does not tell us anything about the relationship between test scores (TS) and the Student Teacher Ratio (STR).

Page 19: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

19

The California Test Score Data Set

Page 20: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

20

Initial look at the data: bivariate Scatterplot of test score versus student-teacher ratio

Do districts with smaller classes have higher test scores?

Page 21: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

21

The empirical strategy

We need to get some numerical evidence on whether districts with low STRs have higher test scores – but how?

1. Compare average test scores in districts with low STRs to those with high STRs (“estimation”)

2. Test the hypothesis that the mean test scores in the two types of districts are the same, against the hypothesis that they differ (“hypothesis testing”)

3. Estimate an interval for the difference in the mean test scores, high versus low STR districts (“confidence interval”)

Page 22: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

22

Initial data analysis: Compare districts with “small” (STR < 20) and “large” (STR ≥ 20) class sizes:

Class Size

Average score (Y )

Standard deviation (sY)

n

Small 657.4 19.4 238 Large 650.0 17.9 182

Δ = µsmall – µlarge = E(TS|STR < 20) – E(TS|STR ≥ 20) 1. Estimation of Δ = difference between group means 2. Test the hypothesis that Δ = 0 3. Construct a confidence interval for Δ

Page 23: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

23

1. Estimation

small largeY Y− = small

1small

1 n

iiY

n =∑ –

large

1large

1 n

iiY

n =∑

= 657.4 – 650.0 = 7.4

Is this a large difference in a real-world sense? • Standard deviation across districts = 19.1 • Difference between 60th and 75th percentiles of test score

distribution is 667.6 – 659.4 = 8.2 • This is a big enough difference to be important for school

reform discussions, for parents, or for a school committee?

Page 24: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

24

2. Hypothesis testing Formulating hypothesis

H0: µsmall – µlarge = 0 H1: µsmall – µlarge ≠ 0

Comparison of means for 2 populations (test at level α) Statistic: compute the t-statistic,

2 2 ( )s l

s l

s l s l

s s s ln n

Y Y Y YtSE Y Y

− −= =

−+

where SE( sY – lY ) is the “standard error” of sY – lY , the subscripts s and l refer to “small” and “large” STR districts,

and 2 2

1

1 ( )1

sn

s i sis

s Y Yn =

= −− ∑

Page 25: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

25

2. Hypothesis testing Compute the difference-of-means t-statistic:

Size Y sY n small 657.4 19.4 238 large 650.0 17.9 182

2 2 2 219.4 17.9238 182

657.4 650.0 7.41.83s l

s l

s l

s sn n

Y Yt − −= = =

+ + = 4.05

Distribution under H0 : t ≈ N(0,1) (for n1,n2 > 30) Decision rule : |t| > 1.96, so reject (at the 5% significance level) the null hypothesis that the two means are the same.

Page 26: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

26

3.Confidence interval A 95% confidence interval for the difference between the means is,

CI 95%(µsmall – µlarge) = [ ( sY – lY ) ± 1.96 × SE( sY – lY ) ] = [7.4 ± 1.96 × 1.83] = [3.8, 11.0]

Two equivalent statements: 1. The 95% confidence interval for Δ does not include 0 2. The hypothesis that Δ = µsmall – µlarge = 0 is rejected at the

5% level

Page 27: Introduction to Econometrics€¦ · The dataset contains data on test performance, school characteristics and student demographic backgrounds for school districts in California

27

What comes next… • The mechanics of estimation, hypothesis testing, and

confidence intervals should be familiar • These concepts extend directly to regression and its

variants • Before turning to regression, however, we will review

some of the underlying theory of estimation, hypothesis testing, and confidence intervals: o why do these procedures work, and why use these

rather than others? o we will review the intellectual foundations of statistics

and econometrics