
ECN 3202

APPLIED ECONOMETRICS


Semester 2, 2015/2016

1. Introduction

Mr. Sydney Armstrong

Lecturer 1

The University of Guyana

Welcome to one of the most important courses for economists in their entire career!...

Why?

It relates the abstract economics, business and marketing concepts you learnt in various classes to practice, using real world data …

Methods learned here are heavily used in:

Private sector (banks, investment and consulting firms, mining and manufacturing companies, etc.)

Government agencies and central banks

International financial institutions (IMF, The World Bank, IFC, etc.)

This is not a “rocket science” course…

Yet, sometimes you may find some material difficult to grasp …

This is normal:

“No pain, no gain!”

“If there is a will—there is a way!”


What is Econometrics?

In a broad sense:

Econometrics deals with measurement of economic phenomena … and why is “measurement” important? Why should we care?

“… You can’t manage it, if you can’t measure it! …”

attributed to Peter Drucker


In a more narrow sense:

Econometrics deals with measurement of economic phenomena (and making related predictions) using statistical methods.

… so in the first lecture we will briefly refresh the knowledge you have gained from statistics courses …


Week 1

INTRODUCTION:

Review of Statistical Tools Needed to Understand Econometrics

Reviewing the recommended reading from ECN 2101, ECN 2201 and ECN 1203 is advised.


What is Statistics?

The word statistics is used to refer to:

- a set of numbers obtained by applying a formula to a sample (data);

- the branch of mathematics concerned with extracting useful information from a set of numbers.

Statistics involves three main areas:

estimation (e.g., of a relationship between some variables, etc.)

hypothesis testing (e.g., about the significance of a relationship)

prediction (of various economic variables of interest)

Random Variables

A random variable (or ‘r.v.’) is a variable whose values are determined by chance. A priori, such variables cannot be predicted perfectly, but only with some probability.

Usually:

upper case letters (e.g., 𝑋,𝑌,…) are used to denote r.v.s;

lower case letters (e.g., 𝑥,𝑦,…) are used to denote fixed values a r.v. can take;

lower case letters with subscripts (e.g., $x_i, y_i, \dots$) are used to denote data (sample) points or observations, which are usually considered random (unless assumed otherwise).


There are two types of random variables:

a discrete r.v. can assume only a countable number of different values (e.g., 𝑋 = number of items sold; 𝑋 = number of votes received; 𝑋 = 1 if a game is won and 0 otherwise, etc.);

a continuous r.v. can assume all real values in some interval(s) (e.g., 𝑋 = price; 𝑋 = temperature; 𝑋 = weight, etc.).


Continuous Random Variables, Distribution and Density

To characterize the distribution of a continuous r.v., one can use the cumulative distribution function (cdf), denoted 𝐹(𝑥) and defined as the probability that r.v. 𝑋 takes a value at or below 𝑥, i.e.,

𝐹(𝑥)=𝑃(𝑋≤𝑥)

If 𝐹(𝑥) happens to be differentiable, then one can also characterize the distribution of r.v. 𝑋 using

𝑓(𝑥)=𝐹′(𝑥)=𝑑𝐹(𝑥)/𝑑𝑥

called the probability density function (pdf), or density, of r.v. 𝑋.

Continuous Random Variables, Distribution and Density, cont.

Importantly, note that if the pdf exists, then (from calculus) we have

$$F(x) = \int_{-\infty}^{x} f(t)\,dt$$

i.e., the cdf of 𝑋 at point 𝑥 is the integral of the pdf of 𝑋 up to point 𝑥. In other words, the cdf of 𝑋 at point 𝑥 is the area under the graph of the pdf of 𝑋 all the way up to point 𝑥.


Moreover, we always have

$$f(x) \ge 0 \quad\text{and}\quad \int_{-\infty}^{\infty} f(x)\,dx = 1$$

While the cdf and the pdf are equivalent characterizations of a probability distribution, the cdf exists for any r.v., whereas the pdf exists only for continuous r.v.s, and not even for all of them (only if 𝐹(𝑥) is differentiable).
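To see the relation between the cdf and the pdf numerically, here is a minimal Python sketch (standard library only). It assumes a Standard Normal r.v. purely as an example, replaces the lower limit −∞ by −10 (an assumption that is harmless here because practically no probability lies below it), and approximates the integral of the pdf with the trapezoidal rule. The helper name `cdf_by_integration` is hypothetical.

```python
from statistics import NormalDist


def cdf_by_integration(dist, x, lower=-10.0, steps=10_000):
    """Approximate F(x), the integral of the pdf from -infinity to x, with the trapezoidal rule.

    Assumption: the infinite lower limit is replaced by the finite value `lower`,
    which only works when almost no probability mass lies below it.
    """
    h = (x - lower) / steps
    total = 0.5 * (dist.pdf(lower) + dist.pdf(x))
    for i in range(1, steps):
        total += dist.pdf(lower + i * h)
    return total * h


z = NormalDist(mu=0.0, sigma=1.0)          # Standard Normal, used only as an example
approx = cdf_by_integration(z, 1.0)        # area under the pdf up to x = 1
exact = z.cdf(1.0)                         # cdf evaluated directly
total_area = cdf_by_integration(z, 10.0)   # area under (practically) the whole pdf

print(f"{approx:.6f} {exact:.6f} {total_area:.6f}")
```

The integrated value should match the directly computed cdf to several decimals, and integrating over (practically) the whole real line should give an area close to 1, the total probability.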


Example of cdf and pdf

The cdf 𝐹(𝑥) gives a %-measure of the area that lies below the graph of the pdf, 𝑓(𝑥), to the left of a point 𝑥, relative to the entire area under 𝑓(𝑥).

In the Figure: 𝑥=−15 and 𝐹(−15)=0.2 (i.e., 20%).


Continuous Random Variables, Distribution and Density, cont.

If we let 𝑓(𝑥) denote the pdf of the (continuous) r.v. 𝑋, then the mean and the variance of r.v. 𝑋 are, respectively, defined as

$$\mu = E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx$$

and

$$\sigma^2 = Var(X) = E\left[(X-\mu)^2\right] = \int_{-\infty}^{\infty} (x-\mu)^2 f(x)\,dx$$

Note: For discrete variables, the integral is replaced with a sum and the density is replaced with the probability function.

The mean of r.v. 𝑋 is a measure of the center of the distribution of 𝑋, while the variance is a measure of the dispersion, or variation, of 𝑋.

It is more convenient to describe the dispersion of r.v. 𝑋 in the same units of measurement as 𝑋, by using the standard deviation of 𝑋:

$$\sigma = sd(X) = \sqrt{Var(X)}$$
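For a discrete r.v. the integrals become sums. The small sketch below assumes a fair six-sided die as a hypothetical example and computes its mean, variance and standard deviation exactly with Python's `fractions` module.

```python
import math
from fractions import Fraction

# Hypothetical example: a fair six-sided die, P(X = k) = 1/6 for k = 1, ..., 6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

mean = sum(x * p for x, p in pmf.items())                     # E(X) = sum of x * P(X = x)
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())   # Var(X) = sum of (x - mu)^2 * P(X = x)
std_dev = math.sqrt(float(variance))                          # sd(X) = sqrt(Var(X)), same units as X

print(mean, variance, round(std_dev, 4))  # 7/2 35/12 1.7078
```

Using exact fractions keeps the mean (7/2) and variance (35/12) free of rounding error; only the square root is computed in floating point.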


Laws or Rules of Expectation and Variance of a r.v.

Let 𝑎, 𝑏 and 𝑐 be constants and 𝑋 and 𝑌 be some random variables (r.v.s); then the following important results must be remembered:

• 𝐸(𝑎)=𝑎, but 𝑉𝑎𝑟(𝑎)=0

• 𝐸(𝑏𝑋)=𝑏𝐸(𝑋), but 𝑉𝑎𝑟(𝑏𝑋)=𝑏²𝑉𝑎𝑟(𝑋)

• 𝐸(𝑎𝑋+𝑏𝑌+𝑐)=𝑎𝐸(𝑋)+𝑏𝐸(𝑌)+𝑐, but 𝑉𝑎𝑟(𝑎𝑋+𝑏𝑌+𝑐)=𝑎²𝑉𝑎𝑟(𝑋)+𝑏²𝑉𝑎𝑟(𝑌)+2𝑎𝑏𝐶𝑜𝑣(𝑋,𝑌)

where 𝐶𝑜𝑣(𝑋,𝑌), called the covariance between 𝑋 and 𝑌, is a measure of statistical linear dependency between 𝑋 and 𝑌, defined as

$$Cov(X,Y) = E\left[(X-E(X))(Y-E(Y))\right] = E(XY) - E(X)E(Y)$$
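Because the same algebra applies to sample averages, these rules can be checked on simulated data. The sketch below is an illustration under assumed values (𝑎=2, 𝑏=−3, 𝑐=5, and a hypothetical 𝑌 built to be correlated with 𝑋). The sample moments use divisor 𝑁, so both identities hold exactly, up to floating-point rounding.

```python
import random


def mean(v):
    return sum(v) / len(v)


def cov(u, v):
    """Sample covariance with divisor n; cov(u, u) is then the sample variance."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)


random.seed(1)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
# Hypothetical Y that depends on X, so that Cov(X, Y) is not zero.
ys = [0.5 * x + random.gauss(0.0, 1.0) for x in xs]

a, b, c = 2.0, -3.0, 5.0
w = [a * x + b * y + c for x, y in zip(xs, ys)]

# E(aX + bY + c) = aE(X) + bE(Y) + c
mean_lhs, mean_rhs = mean(w), a * mean(xs) + b * mean(ys) + c
# Var(aX + bY + c) = a^2 Var(X) + b^2 Var(Y) + 2ab Cov(X, Y)
var_lhs = cov(w, w)
var_rhs = a**2 * cov(xs, xs) + b**2 * cov(ys, ys) + 2 * a * b * cov(xs, ys)

print(f"{mean_lhs:.6f} {mean_rhs:.6f}")
print(f"{var_lhs:.6f} {var_rhs:.6f}")
```

Dropping the `2 * a * b * cov(xs, ys)` term would make the two variance lines disagree, which shows why the covariance matters when 𝑋 and 𝑌 are dependent.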


Correlation between Random Variables

A more intuitive measure of linear dependency is obtained when the covariance is divided by the standard deviations of both r.v.s, in which case we get the coefficient of correlation:

$$\rho_{XY} = \frac{Cov(X,Y)}{\sqrt{Var(X)}\,\sqrt{Var(Y)}}$$

which always lies between −1 and 1. The closer 𝜌𝑋𝑌 is to 1 (−1), the stronger the positive (negative) correlation, where 1 (−1) represents perfect positive (negative) correlation and zero represents no correlation at all (i.e., no statistical linear dependency).

Note that if 𝑋 and 𝑌 are independent r.v.s, then 𝐸(𝑋𝑌)=𝐸(𝑋)𝐸(𝑌) ⇒ 𝐶𝑜𝑣(𝑋,𝑌)=0 ⇒ 𝜌𝑋𝑌=0. The converse does not hold in general: zero correlation rules out only linear dependency, not every form of dependency.

The assumption of independence of some variables is often used in econometric studies, so it is very important to understand it well.
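A short Python sketch of the correlation coefficient computed from sample moments. The simulated data are hypothetical: one series is generated independently of 𝑋, so its sample correlation with 𝑋 should be near (but, because of sampling noise, not exactly) zero, while the other is an exact linear function of 𝑋 with a negative slope, which gives −1.

```python
import math
import random


def correlation(u, v):
    """Sample correlation: covariance divided by the product of the standard deviations."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    cov = sum((a - mu) * (b - mv) for a, b in zip(u, v)) / n
    sd_u = math.sqrt(sum((a - mu) ** 2 for a in u) / n)
    sd_v = math.sqrt(sum((b - mv) ** 2 for b in v) / n)
    return cov / (sd_u * sd_v)


random.seed(42)
xs = [random.gauss(0.0, 1.0) for _ in range(5000)]
independent = [random.gauss(0.0, 1.0) for _ in range(5000)]  # generated independently of xs
linear = [3.0 - 2.0 * x for x in xs]                          # exact negative linear relation

rho_indep = correlation(xs, independent)
rho_linear = correlation(xs, linear)
print(f"{rho_indep:.3f} {rho_linear:.3f}")  # first near 0, second -1
```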


Dependent and Independent Random Variables

Intuitively, any two random variables (e.g., 𝑋 and 𝑌) are called independent if the probability of any realization of one of them is not affected by (or does not depend on) the realization of the other one.

If, in addition to being independent, random variables also have exactly the same distribution, then they are called

iid = “independently and identically distributed”

In this class we will start by studying the iid case and then study extensions to heterogeneously distributed r.v.s and dependently distributed r.v.s.


Example – Normal or Gaussian Distribution (most common!)

If 𝑋 has a Normal distribution, denoted 𝑋~ℕ(𝜇,𝜎²), with 𝜇=𝐸(𝑋) and 𝜎²=𝑉𝑎𝑟(𝑋), then its pdf (density) at a point 𝑥 is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$$

A special case is the Standard Normal r.v.: if 𝜇=0 and 𝜎²=1, then the pdf is

$$f(z) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{z^2}{2}\right)$$

Note that the probability that a Standard Normal r.v. falls into the interval [−1.96, 1.96] is equal to 0.95 (i.e., 95%).

Formally, this is stated as:

$$P(-1.96 \le Z \le 1.96) = 0.95$$

This expression and the critical values [−1.96, 1.96] are often used for confidence intervals and hypothesis tests; they are obtained via integration or from ‘Normal tables’ (Table 1, Appendix E).
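The 0.95 probability and the 1.96 critical value can also be reproduced without tables. This sketch uses Python's `statistics.NormalDist` (standard library, Python 3.8+). It also writes the Normal pdf out from its formula, as the hypothetical helper `normal_pdf`, to confirm that the two agree at an arbitrary test point.

```python
import math
from statistics import NormalDist


def normal_pdf(x, mu=0.0, sigma2=1.0):
    """Normal density at x with mean mu and variance sigma2, written out from the formula."""
    return math.exp(-((x - mu) ** 2) / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)


Z = NormalDist()                          # Standard Normal: mu = 0, sigma = 1
coverage = Z.cdf(1.96) - Z.cdf(-1.96)     # P(-1.96 <= Z <= 1.96)
critical = Z.inv_cdf(0.975)               # value leaving 2.5% in the upper tail

print(f"P(-1.96 <= Z <= 1.96) = {coverage:.4f}, upper 2.5% critical value = {critical:.3f}")

# The hand-written pdf agrees with the library one, e.g. for N(1, 4) at x = 0.5.
print(math.isclose(normal_pdf(0.5, mu=1.0, sigma2=4.0), NormalDist(1.0, 2.0).pdf(0.5)))
```

Note that `NormalDist` takes the standard deviation (𝜎), not the variance, as its second argument, so ℕ(1, 4) is `NormalDist(1.0, 2.0)`.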


Example – Normal Distribution

[Figure: Normal probability density functions with means μ and 𝜎=1]

[Figure: Normal density functions with mean 0 and different variances (σ)]

Example – Chi-square Distribution

If $Z_1,\dots,Z_d$ are iid Standard Normal, denoted $Z_1,\dots,Z_d \sim iid\,\mathbb{N}(0,1)$, then the sum of their squares has a Chi-square distribution with 𝑑 degrees of freedom, i.e.,

$$J = \sum_{i=1}^{d} Z_i^2 \sim \chi^2_{(d)}$$

Note: the ‘degrees of freedom’ 𝑑 is the only parameter of this distribution.

Also note that

𝐸(𝐽)=𝑑

𝑉𝑎𝑟(𝐽)=2𝑑
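The definition of 𝐽 can be checked by simulation. This Monte Carlo sketch sums 𝑑 squared Standard Normal draws and compares the simulated mean and variance with 𝑑 and 2𝑑. The choice 𝑑=5, the seed and the number of draws are arbitrary assumptions.

```python
import random


def chi_square_draw(d, rng):
    """One draw of J = Z_1^2 + ... + Z_d^2, with Z_i iid Standard Normal."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))


rng = random.Random(123)   # fixed seed so the run is reproducible
d = 5                      # degrees of freedom (arbitrary choice)
draws = [chi_square_draw(d, rng) for _ in range(50_000)]

m = sum(draws) / len(draws)                                # simulated E(J), theory: d
v = sum((j - m) ** 2 for j in draws) / (len(draws) - 1)   # simulated Var(J), theory: 2d
print(round(m, 2), round(v, 2))
```

The simulated values will not equal 5 and 10 exactly; they fluctuate around them, and the fluctuation shrinks as the number of draws grows.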

Example: Student’s t-distribution (William Sealy Gosset)

If 𝑍 ~ ℕ(0,1) and 𝐽 ~ $\chi^2_{(d)}$ are independent r.v.s, then

$$t = \frac{Z}{\sqrt{J/d}} \sim t_{(d)}$$

where $t_{(d)}$ is (Student’s) t-distribution with 𝑑 degrees of freedom. This distribution is:

very similar to the Standard Normal;

its mean is also zero (for 𝑑>1);

but 𝑉𝑎𝑟(𝑡)=𝑑/(𝑑−2)>1 for 𝑑>2,

and so it has ‘thicker’ tails!

So, it generates more ‘outliers’.

It converges to the Standard Normal as 𝑑 increases:

same as ℕ(0,1) for 𝑑→∞.

Probabilities and critical values can be found in ‘Student’s 𝑡-distribution tables’ according to 𝑑 (Table 2, Appendix E).
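In the same spirit, a simulation sketch can build 𝑡 from its definition and show the thicker tails. It assumes 𝑑=10, so the theoretical variance is 10/8 = 1.25, and estimates 𝑉𝑎𝑟(𝑡) together with the share of draws beyond ±1.96, which for the Standard Normal would be 5%.

```python
import math
import random


def t_draw(d, rng):
    """One draw of t = Z / sqrt(J / d), with Z ~ N(0, 1) independent of J ~ chi-square(d)."""
    z = rng.gauss(0.0, 1.0)
    j = sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))
    return z / math.sqrt(j / d)


rng = random.Random(7)
d = 10
draws = [t_draw(d, rng) for _ in range(40_000)]

# The theoretical mean is 0, so the average of t^2 estimates Var(t) = d / (d - 2).
var_t = sum(t * t for t in draws) / len(draws)
# Share of draws outside [-1.96, 1.96]; it would be about 5% for the Standard Normal.
tail_share = sum(abs(t) > 1.96 for t in draws) / len(draws)

print(round(var_t, 3), d / (d - 2), round(tail_share, 3))
```

Re-running with a larger 𝑑 should bring both the variance and the tail share closer to the Standard Normal values (1 and 5%).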


If $J_1 \sim \chi^2_{(d_1)}$ and $J_2 \sim \chi^2_{(d_2)}$ are independent r.v.s, then

$$F = \frac{J_1/d_1}{J_2/d_2} \sim F_{(d_1,d_2)}$$

where $F_{(d_1,d_2)}$ is the F-distribution with degrees of freedom $(d_1,d_2)$.

Note: the ‘degrees of freedom’ are the only parameters of this distribution.

Probabilities and critical values can be found in the 𝐹-tables, according to $(d_1,d_2)$ (Table 4, Appendix E).

A Statistic (an Estimator)

A formula applied to a sample from a population is called a statistic.

A statistic (call it $\hat{\theta}$) is used to estimate some true parameter of a population (call it 𝜃), and so it is often called an ‘estimator’.

Simple examples of estimators for parameters of a population from a sample of observations $x_1,x_2,\dots,x_i,\dots,x_N$:

the true mean 𝐸(𝑋) can be estimated with the sample mean:

$$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$$

the true variance 𝑉𝑎𝑟(𝑋) can be estimated with the sample variance:

$$\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i-\bar{x})^2 \qquad\text{or}\qquad s^2 = \frac{1}{N-1}\sum_{i=1}^{N}(x_i-\bar{x})^2$$

Both $\hat{\sigma}^2$ and 𝑠² are good estimators of the true (and unknown) 𝜎².

𝑠² is usually preferred for small samples, because it is unbiased…
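Computing $\bar{x}$, $\hat{\sigma}^2$ and 𝑠² for a small hypothetical sample makes the difference between the two variance divisors concrete. Python's `statistics.pvariance` (divisor 𝑁) and `statistics.variance` (divisor 𝑁−1) are used as a cross-check.

```python
import statistics

# Hypothetical sample of N = 5 observations.
x = [2.0, 4.0, 4.0, 5.0, 10.0]
N = len(x)

x_bar = sum(x) / N                          # sample mean
ss = sum((xi - x_bar) ** 2 for xi in x)     # sum of squared deviations from the mean
sigma2_hat = ss / N                         # divisor N
s2 = ss / (N - 1)                           # divisor N - 1

print(x_bar, sigma2_hat, s2)                # 5.0 7.2 9.0
# The standard library computes the same two quantities:
print(statistics.pvariance(x), statistics.variance(x))
```

With only five observations the two estimates differ noticeably (7.2 vs 9.0); the gap shrinks as 𝑁 grows, since (𝑁−1)/𝑁 → 1.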


Properties of an Estimator

An estimator is also considered a random variable, because it depends on the randomness of the sample it is applied to.

Several estimators can be used for estimating the same thing!

How to choose the best one?

Use the one with better properties!

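Desirable properties of an estimator $\hat{\theta}$ include:

Unbiasedness: $E(\hat{\theta}) = \theta$ (i.e., the mean of the estimator of 𝜃 is 𝜃)

Efficiency: $Var(\hat{\theta}) \le Var(\tilde{\theta})$ (i.e., it has a smaller variance than any other (unbiased) estimator $\tilde{\theta}$ of the same parameter 𝜃)

Consistency: $\hat{\theta} \xrightarrow{p} \theta$ as 𝑁→∞, i.e., $\hat{\theta}$ ‘converges in probability’ to the truth, 𝜃, when 𝑁→∞:

$$\lim_{N\to\infty} P\left(|\hat{\theta}-\theta| > \varepsilon\right) = 0 \quad \text{for any } \varepsilon > 0$$

Intuitively: as the sample size (𝑁) increases, $\hat{\theta}$ gets practically close to 𝜃.

Asymptotic Normality: $\hat{\theta} \overset{a}{\sim} \mathbb{N}\left(E(\hat{\theta}), Var(\hat{\theta})\right)$, or, equivalently, $\frac{\hat{\theta}-E(\hat{\theta})}{\sqrt{Var(\hat{\theta})}} \overset{a}{\sim} \mathbb{N}(0,1)$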
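Unbiasedness can be illustrated by Monte Carlo: draw many samples, apply the estimator to each, and average the results. The sketch below assumes Normal data with 𝜎²=4 and a small sample size of 𝑁=5 (all values hypothetical). The averaged 𝑠² should be close to 4, while the averaged $\hat{\sigma}^2$ should be close to (𝑁−1)/𝑁·𝜎² = 3.2, i.e., biased downwards.

```python
import random

rng = random.Random(2016)
mu, sigma2, N, reps = 10.0, 4.0, 5, 40_000   # hypothetical population and sample size

avg_sigma2_hat = 0.0   # running average of the divisor-N estimator
avg_s2 = 0.0           # running average of the divisor-(N-1) estimator
for _ in range(reps):
    x = [rng.gauss(mu, sigma2 ** 0.5) for _ in range(N)]
    x_bar = sum(x) / N
    ss = sum((xi - x_bar) ** 2 for xi in x)
    avg_sigma2_hat += ss / N / reps
    avg_s2 += ss / (N - 1) / reps

# Theory: E(s^2) = sigma2 = 4.0, while E(sigma2_hat) = (N - 1) / N * sigma2 = 3.2
print(round(avg_s2, 2), round(avg_sigma2_hat, 2))
```

Increasing `N` should make the bias of $\hat{\sigma}^2$ shrink, which is consistent with both estimators being consistent.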

Confidence Intervals (CIs)

A confidence interval (CI), or interval estimate, is a range of values (e.g., denoted by [$c_l$, $c_u$]) that, with a chosen probability (e.g., 1−𝛼, for a small 𝛼), covers the true (unknown) parameter of interest (𝜃) if the experiment is repeated many times. Or, formally:

$$P(c_l \le \theta \le c_u) = 1-\alpha$$

where $c_l$ and $c_u$ are called the lower and upper bounds of the CI, respectively; they depend on the chosen 𝛼, called the level of significance.

To obtain a CI (i.e., values [$c_l$, $c_u$]) for a true parameter of interest (𝜃), we need to use information about its estimator ($\hat{\theta}$), e.g.:

the sampling distribution of $\hat{\theta}$ (but it is rarely known!), or

the asymptotic distribution of $\hat{\theta}$ (often known, but not always!), or

a bootstrap approximation of the sampling distribution of $\hat{\theta}$:

an advanced method that improves on the above ones; useful for very complicated cases (see more in advanced classes!…)

Example: 95% CI for population mean from Normal r.v.s

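If $x_1,\dots,x_N \sim iid\,\mathbb{N}(\mu,\sigma^2)$, then

$$\bar{x} \sim \mathbb{N}\left(\mu, \frac{\sigma^2}{N}\right), \quad\text{i.e.,}\quad \frac{\sqrt{N}(\bar{x}-\mu)}{\sqrt{\sigma^2}} \sim \mathbb{N}(0,1)$$

From the tables of the Standard Normal with 𝛼=0.05: $-c_l = c_u = 1.96$, i.e.,

$$P\left(-1.96 \le \frac{\sqrt{N}(\bar{x}-\mu)}{\sqrt{\sigma^2}} \le 1.96\right) = 0.95$$

That is, the interval estimator for the population mean 𝜇 is:

$$\left[\bar{x} - 1.96\sqrt{\sigma^2/N},\ \bar{x} + 1.96\sqrt{\sigma^2/N}\right]$$

Note: Usually, √𝜎² is unknown! Just use its estimator √𝑠² or $\sqrt{\hat{\sigma}^2}$ …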

If $x_1,\dots,x_N$ are not normally distributed, then we can still use these same expressions as approximations (relying on the CLT), which are typically good even for 𝑁≈30, but the larger 𝑁, the better!
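A minimal sketch of the interval $\bar{x} \pm 1.96\sqrt{s^2/N}$, using the sample standard deviation in place of the unknown 𝜎. The coverage check repeats the "experiment" many times with hypothetical Normal data (mean 10, 𝜎=2, 𝑁=40). The share of intervals covering the true mean should be close to, and slightly below, 95%, because the Normal critical value is used instead of the 𝑡 one.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def mean_ci(sample, alpha=0.05):
    """Approximate (1 - alpha) CI for the population mean: x_bar -/+ z * s / sqrt(N).

    Uses the Standard Normal critical value with the sample standard
    deviation s in place of the unknown sigma (the large-N approximation).
    """
    n = len(sample)
    z = NormalDist().inv_cdf(1 - alpha / 2)      # about 1.96 for alpha = 0.05
    half_width = z * stdev(sample) / math.sqrt(n)
    x_bar = mean(sample)
    return x_bar - half_width, x_bar + half_width


# Repeat the "experiment" many times with hypothetical N(10, 4) data and
# count how often the interval covers the true mean mu = 10.
rng = random.Random(35)
mu, reps, n = 10.0, 4000, 40
covered = 0
for _ in range(reps):
    lo, hi = mean_ci([rng.gauss(mu, 2.0) for _ in range(n)])
    covered += lo <= mu <= hi
coverage = covered / reps
print(round(coverage, 3))   # should be close to 0.95
```

Any single interval either covers 𝜇 or it does not; the 95% refers to the long-run share of intervals that do, which is exactly what the loop counts.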


Hypothesis Testing: Logistics Summary

A statistical hypothesis is a statement (claim) about the value(s) of population parameter(s). The following steps are involved:

1. Formally state the ‘null hypothesis’, the one we test (denoted 𝐻0).

2. Formally state the ‘alternative hypothesis’, the one we would favor if 𝐻0 is rejected (denoted 𝐻1).

3. Select a significance level (𝛼) or, equivalently, a confidence level 1−𝛼.

4. Identify an appropriate test statistic and find critical values from its sampling or asymptotic distribution (assuming 𝐻0 is true).

5. Make a decision:

Reject 𝐻0 if the test statistic takes a value that is deemed unlikely, i.e., beyond the critical values implied by the sampling (or asymptotic) distribution of the test statistic; do not reject 𝐻0 otherwise.

Note: If 𝐻0 is not rejected, then it is wrong to say “𝐻0 is accepted”; instead, conclude that we are unable to reject 𝐻0, conditional on the given sample, the employed econometric model, and the chosen 𝛼.

For example,

𝐻0 may be false but close to the truth, and the test is unable to reject it due to a small 𝑁;

for a different 𝛼 or a larger sample, 𝐻0 might still be rejected…

Example: Two-sided (two-tail) test for Mean: z-test and t-test

Let’s formulate our Null and Alternative hypotheses:

𝐻0:𝐸(𝑋)=𝜇=𝑐 vs. 𝐻1: 𝜇≠𝑐

We know that a very good estimator of 𝐸(𝑋) is $\bar{x}$ (it is BLUE!).

If the sample is small and it is reasonable to assume that r.v. 𝑋 is normally distributed, then we can apply Result 6, and so, if 𝐻0 is true, then 𝑡 has Student’s 𝑡-distribution with 𝑚=𝑁−1 degrees of freedom:

$$t = \frac{\sqrt{N}(\bar{x}-c)}{\sqrt{s^2}} \sim t_{(N-1)}$$

If 𝑁 is large, we can rely on the CLT (Result 8), and then, if 𝐻0 is true:

$$z = \frac{\sqrt{N}(\bar{x}-c)}{\sqrt{s^2}} \overset{a}{\sim} \mathbb{N}(0,1)$$

So, we use 𝑡 or 𝑧 as test statistics and compare their values for the particular data to the critical values (e.g., from tables) of these statistics…
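A sketch of the test statistic and the large-sample two-sided decision rule follows. Only the 𝑧 version is coded, because Python's standard library has no Student's 𝑡 quantile function; for a small-sample 𝑡-test, the critical value $t_{1-\alpha/2,N-1}$ would come from Table 2 (or, e.g., `scipy.stats.t.ppf(1 - alpha / 2, N - 1)` if SciPy is available). The helper names and the simulated sample are hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def mean_test_statistic(sample, c):
    """Test statistic for H0: mu = c, computed as sqrt(N) * (x_bar - c) / s."""
    return math.sqrt(len(sample)) * (mean(sample) - c) / stdev(sample)


def two_sided_z_test(sample, c, alpha=0.05):
    """Large-N z-test: reject H0 if the statistic falls outside [-z_crit, z_crit]."""
    stat = mean_test_statistic(sample, c)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return stat, z_crit, abs(stat) > z_crit


# Hypothetical large sample whose true mean (0.3) differs from c = 0.
rng = random.Random(3202)
sample = [rng.gauss(0.3, 1.0) for _ in range(400)]
stat, z_crit, reject = two_sided_z_test(sample, c=0.0)
print(f"z = {stat:.2f}, critical value = {z_crit:.2f}, reject H0: {reject}")
```

Comparing `abs(stat)` with `z_crit` is the same as checking whether the statistic lies outside [−1.96, 1.96] when 𝛼=0.05.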


Two-sided (two-tail) test for Mean: z-test and t-test

We consider 𝐻0 unlikely to be true (and reject 𝐻0) if the test statistic (𝑡, 𝑧, etc.) takes a value sufficiently far out in the tails of its distribution.

Reject 𝐻0 (and thus accept 𝐻1) if the calculated value of our test statistic 𝑡 is less than $t_{\alpha/2,m}$ or greater than $t_{1-\alpha/2,m}$, i.e., if $t \notin [t_{\alpha/2,m},\ t_{1-\alpha/2,m}]$

(Note: here, 𝑚=𝑁−1=degrees of freedom.)

Do not reject 𝐻0 otherwise, i.e., if $t \in [t_{\alpha/2,m},\ t_{1-\alpha/2,m}]$

= “unable to reject this 𝐻0 with the given sample, employed econometric model, and chosen 𝛼 …” (i.e., it is incorrect to say that 𝐻0 is true or that we accept 𝐻0!).

A similar approach is used for other test statistics, but with their own distributions:

• For the 𝑧-test, we use the Normal distribution to get critical values …

• For tests about variances, we use the Chi-square and F-distributions, etc.

One-sided (one-tail) vs. Two-sided Test: t-test and z-test

When the ‘direction’ of the alternative hypothesis is known a priori, e.g., 𝜇>𝑐 or 𝜇<𝑐, then one should use the one-sided test…

… because the entire 𝛼 is allocated to one side:

If 𝐻1: 𝜇>𝑐, we reject 𝐻0 if $t > t_{1-\alpha,m}$, i.e., if 𝑡 takes a value too far out in the right 𝛼-tail of its distribution.

If 𝐻1: 𝜇<𝑐, we reject 𝐻0 if $t < t_{\alpha,m}$, i.e., if 𝑡 takes a value too far out in the left 𝛼-tail of its distribution (the Standard Normal in the case of the 𝑧-test), and so we then consider 𝐻0 unlikely to be true.

(Note: here, 𝑚=𝑁−1=degrees of freedom.)

The p-value

The p-value of a test is the probability of getting a test statistic value that is at least ‘as extreme’ as the observed value, assuming 𝐻0 is true.

Equivalently, the p-value is the smallest value of 𝛼 for which 𝐻0 can be rejected.

The exact calculation of the p-value depends on how 𝐻1 is formulated:

Example: Suppose that 𝐻0: 𝜇=𝑐, and computing $t = \sqrt{N}(\bar{x}-c)/\sqrt{s^2}$ gave some value $t_0$.

If 𝐻1: 𝜇>𝑐, then p-value $= P(t \ge t_0 \mid H_0 \text{ is true})$ = the probability of observing the statistic to the right of $t_0$, assuming that 𝐻0 is true.

If 𝐻1: 𝜇<𝑐, then p-value $= P(t \le t_0 \mid H_0 \text{ is true})$ = the probability of observing the statistic to the left of $t_0$, assuming that 𝐻0 is true.

If 𝐻1: 𝜇≠𝑐, then p-value $= P(t \ge |t_0| \mid H_0 \text{ is true}) + P(t \le -|t_0| \mid H_0 \text{ is true})$ = the probability of observing the statistic to the right of $|t_0|$, plus the probability of observing it to the left of $-|t_0|$, assuming that 𝐻0 is true.

The p-value (cont.)

And why should we care about the p-value?!

Because it is much easier to apply in practice:

Instead of comparing a test statistic to a critical value from some distribution (table), we can simply compare the p-value (reported by most statistical packages) to the level of significance (𝛼) …

Specifically, the p-value rule says:

Reject 𝐻0 when p-value ≤ 𝜶, and do not reject otherwise.

So, you just need to get the appropriate p-value (depending on the type of 𝐻1!) from the statistical software you use!...
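The three p-value formulas and the p-value rule fit in a few lines. This sketch assumes a large-sample 𝑧-test, so Standard Normal probabilities are used, and a hypothetical observed statistic of 2.1; the helper name `p_values` is also hypothetical.

```python
from statistics import NormalDist


def p_values(z0):
    """p-values of an observed statistic z0 under H0, using the Standard Normal (z-test)."""
    Z = NormalDist()
    return {
        "H1: mu > c": 1 - Z.cdf(z0),                 # P(Z >= z0)
        "H1: mu < c": Z.cdf(z0),                     # P(Z <= z0)
        "H1: mu != c": 2 * (1 - Z.cdf(abs(z0))),     # P(Z >= |z0|) + P(Z <= -|z0|)
    }


alpha = 0.05
pv = p_values(2.1)   # hypothetical observed statistic
for h1, p in pv.items():
    decision = "reject H0" if p <= alpha else "do not reject H0"
    print(f"{h1}: p-value = {p:.4f} -> {decision}")
```

The same statistic leads to rejection under 𝐻1: 𝜇>𝑐 and 𝐻1: 𝜇≠𝑐 but not under 𝐻1: 𝜇<𝑐, which is why the type of 𝐻1 matters when reading a p-value.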

Important Remarks about Hypothesis Tests

A statistical test cannot prove that a statistical hypothesis is true!

If 𝐻0 is not rejected, all that the statistical test says is that the ‘information contained in the data is compatible with 𝐻0 and the other assumptions made’ or, more simply, that there is ‘no empirical evidence to reject 𝐻0’.

In fact, there is an entire confidence interval of hypothesized values for which 𝐻0 would not be rejected!

But there is only one true value.

If 𝐻0 is rejected, the statistical test also does not prove that 𝐻0 is wrong:

it only says that 𝐻0 is unlikely! …

… but it also tells how unlikely!

it gives the (approximate) probability of making the mistake of rejecting 𝐻0 when it is true: the selected significance level of the test, 𝛼 (‘type I error’).

there is also a probability of not rejecting 𝐻0 when it is wrong (‘type II error’);

the power of the test = 1 − probability of type II error:

the larger the sample, the higher the power of the test;

the closer the true value is to the hypothesized one, the lower the power;

the lower the selected 𝛼, the higher the probability of type II error …
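The type I error rate and the power can both be estimated by simulation: generate many samples, run the test on each, and count rejections. The sketch below assumes Normal data with 𝜎=1, a two-sided 𝑧-test at 𝛼=0.05, and hypothetical true means. With 𝐻0 true, the rejection rate should be near 𝛼; with a false 𝐻0, it should rise with the sample size. The helper `rejection_rate` is hypothetical.

```python
import math
import random
from statistics import NormalDist


def rejection_rate(true_mu, c, n, alpha=0.05, reps=3000, seed=0):
    """Share of simulated N(true_mu, 1) samples in which the two-sided z-test rejects H0: mu = c."""
    rng = random.Random(seed)
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    rejections = 0
    for _ in range(reps):
        x = [rng.gauss(true_mu, 1.0) for _ in range(n)]
        x_bar = sum(x) / n
        s = math.sqrt(sum((xi - x_bar) ** 2 for xi in x) / (n - 1))
        if abs(math.sqrt(n) * (x_bar - c) / s) > crit:
            rejections += 1
    return rejections / reps


size = rejection_rate(true_mu=0.0, c=0.0, n=50)          # H0 true: type I error rate, near alpha
power_small = rejection_rate(true_mu=0.3, c=0.0, n=20)   # H0 false, small sample
power_large = rejection_rate(true_mu=0.3, c=0.0, n=100)  # H0 false, larger sample
print(size, power_small, power_large)
```

Moving `true_mu` closer to `c` in the last two calls should lower the estimated power, in line with the remark about the true value being close to the hypothesized one.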

Important Remarks about Hypothesis Tests (cont.)

Thus, rejecting 𝐻0 (and therefore accepting 𝐻1) with a statistical test is a much stronger conclusion than not rejecting 𝐻0, and we know the approximate probability of incorrectly rejecting it: our choice of 𝛼.

So, the strategy researchers often use is to state 𝐻0 such that, if the theory they want to test is correct, then 𝐻0 should be rejected…

Example 1: “Presumption of Innocence” in a Justice System (Court)

The prosecutor needs to provide overwhelming evidence (‘beyond reasonable doubt’) to reject 𝐻0 ‘innocence’, i.e., such that the probability of a ‘type I’ error is very small.

This is because convicting an ‘innocent’ seems a more severe error than not convicting a ‘non-innocent’ due to lack of evidence (type II error).

Example 2: Testing whether a bridge would sustain a pressure of a certain level.

How would you formulate 𝐻0?

APPENDIX


The Econometric Approach

Applied econometric analysis typically involves these steps:

1. Identifying a research problem (e.g., what will happen if I increase the price of my product by 10%, ceteris paribus)

2. Studying previous works on this problem

3. Formulating an economic model that adequately captures the essence of the problem

4. Specifying an appropriate econometric (statistical) model

5. Obtaining and understanding the data

6. Tuning the model to be compatible with the data at hand

7. Estimating the unknown parameters

8. Performing model diagnostics and statistical hypothesis tests

9. Drawing implications/conclusions, and

10. Identifying directions for further research.

KEY FACTORS OF SUCCESS FOR THIS COURSE

Important Advice: Avoid the “snowball effect”!

One missed topic may turn into an avalanche! …

The following strategy should help you succeed in Quizzes and Exams:

Attend (and stay awake at!:) the lectures:

Take your own notes on the printed lecture notes …

Identify unclear issues from the lectures and try to find clarifications:

from the textbook, from the lecturer, or from the tutors.

Read the textbook very carefully and compare it with the lecture notes …

Remember: Lectures complement the textbook (rather than substitute for it!) by providing guidance and by clarifying and expediting the reading and understanding of the material …

Attend (and stay awake at) the practical sessions (tutorials):

Digest the lecture & textbook material before coming to tutorials…

Try the tutorial exercises by yourself before coming to tutorials…

Ask the tutors for help whenever something is unclear…

Acknowledgement

Slides for this and the following lectures are prepared by Sydney Armstrong for ECN3103 at the University of Guyana (UG), to accompany the textbook Principles of Econometrics, 4th edition, by Hill, Griffiths and Lim (John Wiley & Sons, Inc. 2011). All rights reserved.

In preparation of these slides, the author used various sources; in particular, he used (with permission) material from the lecture slides of Professor Chris O’Donnell (UQ) and Professor Valentin Zelenyuk (UQ), as well as some figures from the ‘instructor companion site’ of John Wiley & Sons, Inc. for the textbook of Hill, Griffiths and Lim, Principles of Econometrics, 3rd and 4th editions.
