ECN 3202
APPLIED ECONOMETRICS
1
Semester 2, 2015/2016
1. Introduction
Mr. Sydney Armstrong
Lecturer
The University of Guyana
Welcome to one of the most important courses for economists in their entire career!...
Why?
It relates the abstract economics, business and marketing concepts you learnt in various classes to practice, using real world data …
Methods learned here are heavily used in:
Private sector (banks, investment and consulting firms, mining and manufacturing companies, etc.)
Government agencies and central banks
International financial institutions (IMF, The World Bank, IFC, etc.)
This is not a “rocket science” course…
Yet, sometimes you may find some material difficult to grasp …
This is normal:
“No pain, no gain!”
“If there is a will—there is a way!”
What is Econometrics?
In a broad sense:
Econometrics deals with measurement of economic
phenomena … and why is “measurement” important? Why
should we care?
“… You can’t manage it, if you can’t measure it! …”
attributed to Peter Drucker
In a more narrow sense:
Econometrics deals with measurement of economic
phenomena (and making related predictions) using
statistical methods.
… so in the first lecture we will briefly refresh the knowledge
you’ve got from courses about statistics …
Week 1
INTRODUCTION:
Review of Statistical Tools needed to understand
Econometrics
Reviewing the recommended reading from ECN 2101, ECN 2201 and ECN
1203 is advised.
What is Statistics?
The word statistics is used to refer to:
- a set of numbers obtained by applying a formula to a sample
(data);
- the branch of mathematics concerned with the extraction of useful
information from a set of numbers.
Statistics involves three main areas:
- estimation (e.g., of a relationship between some variables);
- hypothesis testing (e.g., about the significance of a relationship);
- prediction (of various economic variables of interest).
Random Variables
A random variable (or ‘r.v.’) is a variable whose values are
determined by chance. A priori, such variables cannot be
predicted perfectly, but only with a probability.
Usually:
- upper case letters (e.g., 𝑋,𝑌,…) are used to denote r.v.s;
- lower case letters (e.g., 𝑥,𝑦,…) are used to denote fixed
values a r.v. can take;
- lower case letters with subscripts (e.g., 𝑥𝑖,𝑦𝑖,…) are used to
denote data (sample) points or observations, which are
usually considered as random (unless assumed otherwise).
There are two types of random variables:
a discrete r.v. can assume only a countable number of
different values (e.g., 𝑋 = number of items sold; 𝑋 = number of
votes received, 𝑋 = 1 if win a game, 0 if otherwise, etc.).
a continuous r.v. can assume all the real values in some
interval(s). E.g., 𝑋 = price; 𝑋 = temperature; 𝑋 = weight, etc.
Continuous Random Variables, Distribution and Density
To characterize the distribution of a continuous r.v. one can use the
cumulative distribution function (cdf), denoted by 𝐹(𝑥) and
defined as the probability that a r.v. 𝑋 takes a value less than or equal to 𝑥,
i.e.,
𝐹(𝑥)=𝑃(𝑋≤𝑥)
If 𝐹(𝑥) happens to be differentiable, then one can also
characterize the distribution of r.v. 𝑋 using
𝑓(𝑥)=𝐹′(𝑥)=𝑑𝐹(𝑥)/𝑑𝑥
called the probability density function (pdf) or density of r.v. 𝑋.
Continuous Random Variables, Distribution and
Density, cont.
Importantly, note that if the pdf exists then we have (from
calculus)
$F(x) = \int_{-\infty}^{x} f(u)\,du$
i.e., the cdf of 𝑋 at point 𝑥 is the integral of the pdf of 𝑋 up to point 𝑥.
In other words, the cdf of 𝑋 at point 𝑥 is the area under the graph
of the pdf of 𝑋 all the way up to point 𝑥.
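The "area under the pdf" reading of the cdf can be checked numerically. The sketch below is only an illustration: it assumes the Standard Normal density as the example pdf, uses a simple trapezoid rule, and truncates the lower limit at -10 (the function names are hypothetical).

```python
import math

def std_normal_pdf(x):
    """Density of the Standard Normal distribution at x (example pdf)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def cdf_by_integration(x, lower=-10.0, steps=20000):
    """Approximate F(x) as the area under the pdf from `lower` up to x (trapezoid rule)."""
    if x <= lower:
        return 0.0
    h = (x - lower) / steps
    area = 0.5 * (std_normal_pdf(lower) + std_normal_pdf(x))
    for k in range(1, steps):
        area += std_normal_pdf(lower + k * h)
    return area * h

def cdf_closed_form(x):
    """Standard Normal cdf written with the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

# Compare the numerical area with the closed-form cdf at a few points.
for point in (-1.0, 0.0, 1.96):
    print(point, round(cdf_by_integration(point), 4), round(cdf_closed_form(point), 4))
```

The two columns should agree to several decimal places, which is the integral relationship between cdf and pdf in numerical form.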
Moreover, we always have
$\int_{-\infty}^{\infty} f(x)\,dx = 1$
While both the cdf and the pdf characterize a probability
distribution, the cdf exists for any r.v., whereas the pdf exists only
for continuous r.v.s, and not for all of them (only if 𝐹(𝑥) is differentiable).
Example of cdf and pdf
The cdf 𝐹(𝑥) gives us a %-measure of the area which lies below the
graph of the pdf, 𝑓(𝑥), to the left of a point 𝑥, relative to the entire area
under 𝑓(𝑥).
(In the figure: 𝑥=−15 and 𝐹(−15)=0.2=20%.)
Continuous Random Variables, Distribution and
Density, cont.
If we let 𝑓(𝑥) denote the pdf of the (continuous) r.v. 𝑋, then
the mean and the variance of r.v. 𝑋 are, respectively, defined
as
$E(X) = \int_{-\infty}^{\infty} x\, f(x)\,dx$
and
$Var(X) = E[(X - E(X))^2] = \int_{-\infty}^{\infty} (x - E(X))^2 f(x)\,dx$
Note: For discrete variables, the integral is replaced with the
sum and the density is replaced with the probability function.
The mean of a r.v. 𝑋 is a measure of the center of the distribution of 𝑋,
while the variance is a measure of the dispersion or variation of
𝑋.
It is more convenient to describe the dispersion of r.v. 𝑋 in the same
units of measurement as 𝑋, by using the standard deviation of
𝑋:
$\sigma_X = \sqrt{Var(X)}$
Laws or Rules of Expectation and Variance of a r.v.
Let 𝑎, 𝑏 and 𝑐 be constants and 𝑋 and 𝑌 be some random
variables (r.v.s); then the following important results must be
remembered:
•𝐸(𝑎)=𝑎 but 𝑉𝑎𝑟(𝑎)=0
•𝐸(𝑏𝑋)=𝑏𝐸(𝑋) but 𝑉𝑎𝑟(𝑏𝑋)=𝑏²𝑉𝑎𝑟(𝑋)
•𝐸(𝑎𝑋+𝑏𝑌+𝑐)=𝑎𝐸(𝑋)+𝑏𝐸(𝑌)+𝑐, but
𝑉𝑎𝑟(𝑎𝑋+𝑏𝑌+𝑐)=𝑎²𝑉𝑎𝑟(𝑋)+𝑏²𝑉𝑎𝑟(𝑌)+2𝑎𝑏𝐶𝑜𝑣(𝑋,𝑌)
where 𝐶𝑜𝑣(𝑋,𝑌), called the covariance between 𝑋 and 𝑌, is a
measure of statistical linear dependency between 𝑋 and 𝑌,
defined as
$Cov(X,Y) = E[(X - E(X))(Y - E(Y))] = E(XY) - E(X)E(Y)$
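The rules for the mean and variance of a linear combination also hold for sample moments, which makes them easy to verify on simulated data. The sketch below is an illustration only: the data are simulated with arbitrary assumed constants, and the helpers `mean`, `cov` and `var` are hypothetical names that use the 1/N convention.

```python
import random

def mean(values):
    return sum(values) / len(values)

def cov(xs, ys):
    """Covariance with the 1/N convention: average of (x - mean_x)(y - mean_y)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def var(xs):
    """Variance is the covariance of a variable with itself."""
    return cov(xs, xs)

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
ys = [0.5 * x + random.gauss(0.0, 1.0) for x in xs]  # correlated with xs by construction
a, b, c = 2.0, -3.0, 7.0  # arbitrary constants chosen for the illustration

combo = [a * x + b * y + c for x, y in zip(xs, ys)]
lhs = var(combo)
rhs = a**2 * var(xs) + b**2 * var(ys) + 2 * a * b * cov(xs, ys)
print(lhs, rhs)  # the two numbers agree up to rounding error
```

Note that the constant `c` shifts the mean but drops out of the variance, and that ignoring the covariance term would give the wrong answer here because `xs` and `ys` are correlated.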
Correlation between Random Variables
A more intuitive measure of linear dependency is obtained when
the covariance is divided by the standard deviations of both r.v.s, in
which case we get a measure or coefficient of correlation:
$\rho_{XY} = \frac{Cov(X,Y)}{\sqrt{Var(X)}\,\sqrt{Var(Y)}}$
which lies between -1 and 1. The closer 𝜌𝑋𝑌 is to 1 (-1)
the higher the positive (negative) correlation is, where 1 (-1)
represents perfect positive (negative) correlation and zero
represents no correlation at all (i.e., no stat. linear dependency).
Note that if 𝑋 and 𝑌 are independent r.v.s, then 𝐸(𝑋𝑌)=𝐸(𝑋)𝐸(𝑌)
⇒ 𝐶𝑜𝑣(𝑋,𝑌)=0 ⇒ 𝜌𝑋𝑌=0
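A small numerical sketch may help with reading the correlation coefficient. It uses a made-up five-point data set and a hypothetical helper `correlation` (1/N convention throughout). The last case is a reminder that correlation measures only linear dependency: 𝑌=𝑋² is fully determined by 𝑋, yet the correlation is zero for these symmetric values.

```python
import math

def mean(values):
    return sum(values) / len(values)

def correlation(xs, ys):
    """Covariance divided by the product of the two standard deviations."""
    mx, my = mean(xs), mean(ys)
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    sd_x = math.sqrt(mean([(x - mx) ** 2 for x in xs]))
    sd_y = math.sqrt(mean([(y - my) ** 2 for y in ys]))
    return cov / (sd_x * sd_y)

xs = [-2, -1, 0, 1, 2]           # illustrative values, symmetric around zero
linear = [3 * x + 1 for x in xs]  # exact increasing linear function of x
inverse = [-x for x in xs]        # exact decreasing linear function of x
squares = [x ** 2 for x in xs]    # depends on x, but not linearly

print(correlation(xs, linear))   # close to +1: perfect positive correlation
print(correlation(xs, inverse))  # close to -1: perfect negative correlation
print(correlation(xs, squares))  # zero correlation despite clear dependence
```

So independence implies zero correlation, as stated above, but zero correlation alone does not establish independence.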
The assumption of independence of some variables is often used in
econometric studies, so it is very important to understand it
well.
Dependent and Independent Random Variables
Intuitively, any two random variables (e.g., 𝑋 and 𝑌) are called
independent if the probability of any realization of one of them
is not affected by (or does not depend on) realization of the other
one.
If, in addition to being independent, random variables also
have exactly the same distribution, then they are called
iid = “independently and identically distributed”
In this class we will start by studying the iid case and then
study extensions to heterogeneously distributed r.v.s and
dependently distributed r.v.s
Example – Normal or Gaussian Distribution (most
common!)
If 𝑋 has a Normal distribution, denoted as 𝑋~ℕ(𝜇,𝜎²), with
𝜇=𝐸(𝑋) and 𝜎²=𝑉𝑎𝑟(𝑋), then its pdf (density) at a point 𝑥 is
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
A special case is the Standard Normal r.v.: if 𝜇=0 and 𝜎²=1,
then the pdf is
$\phi(x) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right)$
Note that the probability that a St. Normal r.v. falls into the interval
[−1.96,1.96] is approximately equal to 0.95 (i.e., 95%).
Formally, this is stated:
$P(-1.96 \le Z \le 1.96) = 0.95$, where $Z \sim N(0,1)$
This expression and the critical values −1.96 and 1.96 are often
used for confidence intervals and hypothesis tests, and are
obtained
via integration or from ‘Normal tables’ (Table 1, Appendix E).
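Instead of a printed table, the same numbers can be obtained in software. The sketch below assumes Python 3.8 or later, whose standard library provides `statistics.NormalDist`; it is one convenient option, not the only way to look these values up.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # Standard Normal r.v.

# Probability of falling inside [-1.96, 1.96] = F(1.96) - F(-1.96)
prob_inside = z.cdf(1.96) - z.cdf(-1.96)

# Critical value that leaves 2.5% in the upper tail (what a Normal table gives)
upper_critical = z.inv_cdf(0.975)

print(round(prob_inside, 4))     # 0.95
print(round(upper_critical, 2))  # 1.96
```

The `cdf` call plays the role of the integration mentioned above, and `inv_cdf` plays the role of reading a critical value off the table.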
Example – Normal Distribution
Normal Probability Density Functions with Means μ and 𝜎=1
Example – Normal Distribution
Normal Density Functions with Mean 0 and different
Standard Deviations (σ)
Example – Chi-square Distribution
If 𝑍1,…,𝑍𝑑 are iid St. Normal, denoted 𝑍1,…,𝑍𝑑~𝑖𝑖𝑑 ℕ(0,1),
then the sum of their squares has a Chi-square distribution
with 𝑑 degrees of freedom, i.e.,
$J = \sum_{i=1}^{d} Z_i^2 \sim \chi^2_{(d)}$
Note: ‘degrees of freedom’ 𝑑 is the
only parameter of this distribution.
Also note that
𝐸(𝐽)=𝑑
𝑉𝑎𝑟(𝐽)=2𝑑
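The construction of a Chi-square r.v. as a sum of squared St. Normal draws can be simulated directly. This is a rough Monte Carlo sketch: the choice 𝑑=5, the number of replications and the seed are arbitrary assumptions, and the simulated moments will only be close to, not equal to, 𝑑 and 2𝑑.

```python
import random
from statistics import fmean, pvariance

def chi_square_draw(d, rng):
    """One draw of J: the sum of d squared independent Standard Normal draws."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(d))

rng = random.Random(12345)  # fixed seed so the run is reproducible
d = 5                       # degrees of freedom (assumed for the illustration)
draws = [chi_square_draw(d, rng) for _ in range(20000)]

sim_mean = fmean(draws)     # should be close to d
sim_var = pvariance(draws)  # should be close to 2d
print(round(sim_mean, 2), round(sim_var, 2))
```

All draws are non-negative, which is one quick way to see that the Chi-square density lives on the positive half-line only.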
Example: Student’s t-distribution (William Sealy Gosset)
If 𝑍 ~ ℕ(0,1) and 𝐽 ~ 𝜒²(𝑑) and they are independent r.v.s, then
$t = \frac{Z}{\sqrt{J/d}} \sim t_{(d)}$
where 𝑡(𝑑) is (Student’s) t distribution with 𝑑 degrees of freedom.
- very similar to St. Normal
- mean is also zero
- but 𝑉𝑎𝑟(𝑡)=𝑑/(𝑑−2)>1, for 𝑑>2
- and so it has ‘thicker’ tails,
- so it generates more ‘outliers’
- converges to St. Normal as 𝑑 increases
- same as ℕ(0,1) for 𝑑→∞
Probabilities and critical values can be found in
‘Student’s 𝑡-distribution tables’ according to 𝑑 (Table 2,
Appendix E)
If 𝐽1 ~ 𝜒²(𝑑1) and 𝐽2 ~ 𝜒²(𝑑2) are independent r.v.s, then
$F = \frac{J_1/d_1}{J_2/d_2} \sim F_{(d_1,d_2)}$
where 𝐹(𝑑1,𝑑2) is the F-distribution with degrees of freedom (𝑑1,𝑑2).
Note: the ‘degrees of freedom’ are the only parameters of this
distribution.
Probabilities and critical values can be found in the
𝐹-tables, according to (𝑑1,𝑑2)
(Table 4, Appendix E)
A Statistic (an Estimator)
A formula applied to a sample from a population is called a
statistic.
A statistic (call it 𝜃̂) is used to estimate some true parameter of
a population (call it 𝜃), and so it is often called an ‘estimator’.
Simple examples of estimators for parameters of a population
from a sample of observations 𝑥1,𝑥2,…,𝑥𝑖,…,𝑥𝑁:
- the true mean 𝐸(𝑋) can be estimated with the sample mean:
$\bar{x} = \frac{1}{N}\sum_{i=1}^{N} x_i$
- the true variance 𝑉𝑎𝑟(𝑋) can be estimated with the sample variance:
$s^2 = \frac{1}{N-1}\sum_{i=1}^{N} (x_i - \bar{x})^2$ or $\hat{\sigma}^2 = \frac{1}{N}\sum_{i=1}^{N} (x_i - \bar{x})^2$
Both 𝜎̂² and 𝑠² are good estimators of the true (and unknown)
𝜎².
𝑠² is usually preferred for small samples…
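The sample mean and the two variance estimators can be computed by hand or with the Python standard library. The five observations below are made up for the illustration; `statistics.variance` divides by 𝑁−1 (this corresponds to 𝑠²) and `statistics.pvariance` divides by 𝑁.

```python
from statistics import mean, variance, pvariance

sample = [4.0, 7.0, 5.0, 9.0, 5.0]  # hypothetical observations x_1, ..., x_N

x_bar = mean(sample)             # sample mean: 30 / 5 = 6.0
s2 = variance(sample)            # divides by N - 1: 16 / 4 = 4.0
sigma2_hat = pvariance(sample)   # divides by N:     16 / 5 = 3.2

print(x_bar, s2, sigma2_hat)
```

The two variance estimates differ only by the factor (𝑁−1)/𝑁, so the gap between them shrinks as the sample grows.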
Properties of an Estimator
An estimator is also considered a random variable,
because it depends on the randomness of the samples it is
applied to.
Several estimators can be used for estimating the same
thing!
How to choose the best one?
Use the one with better properties!
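Desirable properties of an estimator include:
Unbiasedness: $E(\hat{\theta}) = \theta$ (i.e., the mean of the estimator for 𝜃 is 𝜃)
Efficiency: $Var(\hat{\theta}) \le Var(\tilde{\theta})$ (i.e., 𝜃̂ has a variance
no larger than that of any other estimator 𝜃̃ for the same parameter 𝜃)
Consistency: $\hat{\theta} \xrightarrow{p} \theta$ (as 𝑁→∞)
i.e., 𝜃̂ ‘converges in probability’ to the truth, 𝜃, when 𝑁→∞.
Intuitively: as the sample size (𝑁) increases, 𝜃̂ gets practically close
to 𝜃.
Asymptotic Normality: $\hat{\theta} \overset{a}{\sim} N(E(\hat{\theta}), Var(\hat{\theta}))$
or
$\frac{\hat{\theta} - E(\hat{\theta})}{\sqrt{Var(\hat{\theta})}} \overset{a}{\sim} N(0,1)$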
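Unbiasedness is a statement about the average of an estimator over many samples, so it can be illustrated by simulation. The sketch below is a rough Monte Carlo experiment with assumed settings (𝑁=5, true variance 4, 5000 replications, fixed seed). It compares the variance estimator that divides by 𝑁−1 with the one that divides by 𝑁.

```python
import random
from statistics import fmean, variance, pvariance

rng = random.Random(2016)          # fixed seed for reproducibility
N, true_var, reps = 5, 4.0, 5000   # assumed settings for the experiment

s2_values, sigma2_hat_values = [], []
for _ in range(reps):
    # Draw a fresh sample of size N from a Normal with mean 0 and sd 2
    sample = [rng.gauss(0.0, 2.0) for _ in range(N)]
    s2_values.append(variance(sample))           # divides by N - 1
    sigma2_hat_values.append(pvariance(sample))  # divides by N

avg_s2 = fmean(s2_values)                  # close to 4.0: unbiased
avg_sigma2_hat = fmean(sigma2_hat_values)  # close to 3.2 = (N-1)/N * 4.0: biased downward
print(round(avg_s2, 2), round(avg_sigma2_hat, 2))
```

The downward bias of the divide-by-𝑁 estimator equals 𝜎²/𝑁, which vanishes as 𝑁 grows; this is why it is still consistent even though it is biased in small samples.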
Confidence Intervals (CIs)
A confidence interval (CI), or interval estimate, is a range of values (e.g., denoted by [𝑐𝑙,𝑐𝑢]) that, with a chosen probability (e.g., 1−𝛼, for a small 𝛼), covers the true (unknown) parameter of interest (𝜃) if one repeats the experiment many times. Or, formally:
𝑃(𝑐𝑙≤𝜃≤𝑐𝑢)=1−𝛼
where 𝑐𝑙,𝑐𝑢 are called the lower and upper bounds of the CI, respectively, and depend on the chosen 𝛼, called the level of significance.
To obtain a CI (i.e., values [𝑐𝑙,𝑐𝑢]) for a true parameter of interest (𝜃), we need to use information about its estimator (𝜃̂), e.g.:
- the sampling distribution of 𝜃̂ (but it is rarely known!), or
- the asymptotic distribution of 𝜃̂ (often known but not always!), or
- a bootstrap approximation of the sampling distribution of 𝜃̂:
an advanced method that improves on the above ones; useful for very complicated cases (see more in advanced classes!…)
Example: 95% CI for population mean from Normal r.v.s
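If 𝑥1,…,𝑥𝑁~𝑖𝑖𝑑 ℕ(𝜇,𝜎²) then
$\bar{x} \sim N(\mu, \sigma^2/N)$, so that $\frac{\bar{x} - \mu}{\sqrt{\sigma^2/N}} \sim N(0,1)$
From tables of St. Normal and 𝛼=0.05: −𝑐𝑙=𝑐𝑢=1.96, i.e.,
$P\left(-1.96 \le \frac{\bar{x} - \mu}{\sqrt{\sigma^2/N}} \le 1.96\right) = 0.95$
That is, the interval estimator for the population mean 𝜇 is:
$\left[\bar{x} - 1.96\sqrt{\sigma^2/N},\ \bar{x} + 1.96\sqrt{\sigma^2/N}\right]$
Note: Usually, √𝜎² is unknown! Just use its estimator √𝑠²
or √𝜎̂² …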
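The interval for the mean is simple arithmetic once the sample mean and a variance estimate are available. The sketch below uses a hypothetical helper `normal_ci_for_mean`, plugs in 𝑠² for the unknown variance, and takes the critical value from the Standard Normal distribution. The tiny data set is made up and only shows the mechanics; with so few observations, critical values from the 𝑡-distribution would be the more careful choice.

```python
import math
from statistics import NormalDist, mean, variance

def normal_ci_for_mean(sample, alpha=0.05):
    """(1 - alpha) CI for the mean: x_bar -/+ c_u * sqrt(s^2 / N), c_u from N(0,1)."""
    n = len(sample)
    x_bar = mean(sample)
    std_error = math.sqrt(variance(sample) / n)     # s^2 replaces the unknown variance
    c_u = NormalDist().inv_cdf(1.0 - alpha / 2.0)   # about 1.96 when alpha = 0.05
    return x_bar - c_u * std_error, x_bar + c_u * std_error

sample = [4.0, 7.0, 5.0, 9.0, 5.0]  # hypothetical observations
lower, upper = normal_ci_for_mean(sample)
print(round(lower, 2), round(upper, 2))
```

For this sample the mean is 6 and 𝑠²=4, so the interval is 6 ± 1.96·√(4/5), roughly from 4.25 to 7.75.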
If 𝑥1,…,𝑥𝑁 are not normally distributed then we can still use
these same expressions as approximations (relying on the CLT).
This gives a good approximation even for 𝑁≈30, but the larger 𝑁 the
better!
Hypothesis Testing: Logistics Summary
A statistical hypothesis is a statement (claim) about the value(s) of
population parameter(s). The following steps are involved:
1.Formally state the ‘null hypothesis’—the one we test
(denoted as 𝐻0)
2.Formally state the ‘alternative hypothesis’—the one we
would favor if 𝐻0 is rejected (denoted as 𝐻1)
3.Select a significance level (𝛼) or confidence level 1−𝛼.
4.Identify an appropriate test-statistic and find critical
values from its sampling or asymptotic distribution (assuming
𝐻0 is true).
5.Make a decision:
Reject 𝐻0 if the test-statistic takes a value that is deemed
unlikely, i.e., when it is beyond the critical values implied by the
sampling (or asymptotic) distribution of the test-statistic; and
do not reject otherwise.
Note: If 𝐻0 is not rejected, it is wrong to say “𝐻0 is
accepted”; rather, conclude that we are unable to reject 𝐻0 given
the sample, the employed econometric model, and the chosen 𝛼.
For example,
- 𝐻0 may be close to the truth, but the test is unable to reject it due to
small 𝑁
- It might be that for a different 𝛼 or a larger sample, 𝐻0 can still
be rejected…
Example: Two-sided (two-tail) test for Mean: z-test and t-
test
Let’s formulate our Null and Alternative hypotheses:
𝐻0:𝐸(𝑋)=𝜇=𝑐 vs. 𝐻1: 𝜇≠𝑐
We know that a very good estimator of 𝐸(𝑋) is the sample mean 𝑥̄ (it’s BLUE!).
If the sample is small but it is safe to assume that r.v. 𝑋 is
normally distributed, then we can apply Result 6, and so, if 𝐻0
is true, then 𝑡 has Student’s 𝑡-distribution with 𝑚=𝑁−1 degrees
of freedom:
$t = \frac{\bar{x} - c}{\sqrt{s^2/N}} \sim t_{(N-1)}$
If 𝑁 is large, we can rely on the CLT (Result 8), and then, if 𝐻0 is
true:
$z = \frac{\bar{x} - c}{\sqrt{s^2/N}} \overset{a}{\sim} N(0,1)$
So, we use 𝑡 or 𝑧 as test statistics and compare their values for
particular data to the critical values (e.g., from tables) of these
statistics…
Two-sided (two-tail) test for Mean: Z-test and t-test
We consider 𝐻0 as unlikely to be true (and reject 𝐻0) if the
test statistic (𝑡, 𝑧, etc.) takes a value sufficiently far out in the
tails of its distribution.
Reject 𝐻0 (and thus accept 𝐻1) if the calculated value of our test statistic 𝑡 is
less than 𝑡𝛼/2,𝑚 or bigger than 𝑡1−𝛼/2,𝑚,
i.e., if 𝑡∉[𝑡α/2,𝑚, 𝑡1−α/2,𝑚]
(Note: here, 𝑚=𝑁−1=degrees of freedom).
Do not reject 𝐻0 otherwise,
i.e., if 𝑡∈[𝑡α/2,𝑚, 𝑡1−α/2,𝑚]
= “unable to reject this 𝐻0 with the given sample, employed
econometric model, and chosen 𝛼 …” (i.e., it is incorrect to say 𝐻0
is true or that we accept 𝐻0!).
A similar approach is used for other test statistics, but with
their own distributions:
•For the 𝑧-test we use the Normal distribution to get critical
values …
•For tests about variances we use Chi-square and F-distributions, etc.
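The reject / do-not-reject rule can be written as a few lines of code. The sketch below implements the large-sample 𝑧-test version only, because the Python standard library has no 𝑡-distribution; the helper name `two_sided_z_test` and the 40 made-up observations are assumptions for the illustration.

```python
import math
from statistics import NormalDist, mean, variance

def two_sided_z_test(sample, c, alpha=0.05):
    """Two-sided z-test of H0: mu = c, using N(0,1) critical values (large N)."""
    n = len(sample)
    stat = (mean(sample) - c) / math.sqrt(variance(sample) / n)
    upper = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # z_{1 - alpha/2}
    lower = -upper                                   # z_{alpha/2}, by symmetry
    reject = stat < lower or stat > upper            # outside the critical values
    return stat, (lower, upper), reject

sample = [4.0, 7.0, 5.0, 9.0, 5.0] * 8  # 40 hypothetical observations, mean 6

for c in (6.0, 5.0):
    stat, critical, reject = two_sided_z_test(sample, c)
    decision = "reject H0" if reject else "unable to reject H0"
    print(f"H0: mu = {c}: z = {stat:.2f}, decision: {decision}")
```

With these data the test cannot reject 𝜇=6 (the statistic is zero) but rejects 𝜇=5, since the statistic then falls beyond the upper critical value.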
One-sided (one-tail) vs. Two-sided Test: t-test and z-test
When the ‘direction’ of the alternative hypothesis is known a priori,
e.g., 𝜇>𝑐 or 𝜇<𝑐, then one should use the one-sided test…
… because the entire 𝛼 is allocated to one side:
If 𝐻1:𝜇>𝑐 we reject 𝐻0 if 𝑡>𝑡1−𝛼,𝑚, i.e., if 𝑡 takes a value
too far out in the right 𝛼-tail of the distribution.
If 𝐻1:𝜇<𝑐 we reject 𝐻0 if 𝑡<𝑡𝛼,𝑚, i.e., if 𝑡 takes a value too far
out in the left 𝛼-tail of the distribution, and so we then consider
𝐻0 as unlikely to be true.
(Note: here, 𝑚=𝑁−1=degrees of freedom).
The p-value
The p-value of a test is the probability of getting a test
statistic value that is at least ‘as extreme’ as the observed value,
assuming 𝐻0 is true.
That is, the p-value is the smallest value of 𝛼 for which 𝐻0 can be
rejected.
Exact calculation of the p-value depends on how 𝐻1 is formulated:
Example: Suppose that 𝐻0:𝜇=𝑐 and computing $t = \sqrt{N}\,(\bar{x} - c)/\sqrt{s^2}$ gave
some value 𝑡0.
If 𝐻1:𝜇>𝑐, then p-value = 𝑃(𝑡≥𝑡0 | 𝐻0 is true)
= probability of observing the statistic to the right of 𝑡0,
assuming that 𝐻0 is true.
If 𝐻1:𝜇<𝑐, then p-value = 𝑃(𝑡≤𝑡0 | 𝐻0 is true)
= probability of observing the statistic to the left of 𝑡0, assuming
that 𝐻0 is true.
If 𝐻1:𝜇≠𝑐, then p-value = 𝑃(𝑡≥|𝑡0| | 𝐻0 is true)
+ 𝑃(𝑡≤−|𝑡0| | 𝐻0 is true)
= probability of observing the statistic to the right of |𝑡0|, plus the
probability of
observing the statistic to the left of −|𝑡0|, assuming that 𝐻0 is true.
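The three p-value formulas differ only in which tail(s) are counted. The sketch below assumes the test statistic is (approximately) Standard Normal under 𝐻0, i.e., the large-𝑁 𝑧-test case; for a small-sample 𝑡-test the same logic applies with the 𝑡(𝑚) cdf instead. The function name and the labels for the alternatives are hypothetical.

```python
from statistics import NormalDist

def p_value(t0, alternative):
    """p-value of an observed statistic t0 under a N(0,1) null distribution."""
    z = NormalDist()
    if alternative == "greater":      # H1: mu > c, right tail
        return 1.0 - z.cdf(t0)
    if alternative == "less":         # H1: mu < c, left tail
        return z.cdf(t0)
    if alternative == "two-sided":    # H1: mu != c, both tails beyond |t0|
        return 2.0 * (1.0 - z.cdf(abs(t0)))
    raise ValueError("alternative must be 'greater', 'less' or 'two-sided'")

t0 = 1.96  # an assumed observed value of the test statistic
for alt in ("greater", "less", "two-sided"):
    print(alt, round(p_value(t0, alt), 4))
```

For 𝑡0=1.96 the two-sided p-value is about 0.05, the right-tail p-value is about 0.025, and the left-tail p-value is about 0.975, so the same data can lead to different decisions depending on how 𝐻1 is formulated.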
The p-value (cont.)
And why should we care about the p-value?!
Because it is much easier to apply in practice:
Instead of comparing a test statistic to a critical value from
some distribution (table), we can simply compare the p-value
(reported by most stat. packages) to the level of significance (𝛼)
…
Specifically, the p-value rule says:
Reject 𝐻0 when p-value ≤ 𝛼 and do not reject otherwise.
So, you just need to get the appropriate p-value (depending on the
type of 𝐻1!)
from the statistical software you use!...
Important Remarks about Hypotheses Tests
A statistical test cannot prove that a statistical hypothesis is
true!
If 𝐻0 is not rejected, all that the statistical test says is that the
‘information contained in the data is compatible with 𝐻0 and
other assumptions made’ or, simpler, that there is ‘no empirical
evidence to reject 𝐻0’.
In fact, there is an entire confidence interval of values for which 𝐻0
is not rejected!
But the truth is always one.
If 𝐻0 is rejected, the statistical test also does not prove that 𝐻0
is wrong:
- it only says that it is unlikely! …
- … but it also tells how unlikely!
- it gives the (approx.) probability of making a mistake by
rejecting 𝐻0 when it is true: the selected significance level of
the test, 𝛼 (‘type I error’).
- there is also a probability of not rejecting 𝐻0 when it is wrong
(‘type II error’),
- the power of the test = 1 − probability of type II error:
  - the larger the sample, the higher the power of the test
  - the closer the true value is to the hypothesized one, the lower the power
  - the lower the selected 𝛼, the higher the prob. of type II error …
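The three statements about power can be checked with a short calculation. The sketch below makes simplifying assumptions: a two-sided 𝑧-test of 𝐻0:𝜇=𝑐, Normal data with a known standard deviation, and arbitrary illustrative numbers. Under these assumptions the test statistic is Normal with mean (𝜇−𝑐)/(𝜎/√𝑁) and unit variance, which gives the rejection probability in closed form.

```python
import math
from statistics import NormalDist

def power_two_sided_z(true_mean, c, sigma, n, alpha=0.05):
    """P(reject H0: mu = c) for a two-sided z-test when the true mean is `true_mean`."""
    z = NormalDist()
    crit = z.inv_cdf(1.0 - alpha / 2.0)
    shift = (true_mean - c) / (sigma / math.sqrt(n))  # mean of the test statistic
    # Reject if the statistic is above +crit or below -crit
    return (1.0 - z.cdf(crit - shift)) + z.cdf(-crit - shift)

base = power_two_sided_z(true_mean=0.5, c=0.0, sigma=2.0, n=25)
larger_sample = power_two_sided_z(true_mean=0.5, c=0.0, sigma=2.0, n=100)
closer_truth = power_two_sided_z(true_mean=0.1, c=0.0, sigma=2.0, n=25)
smaller_alpha = power_two_sided_z(true_mean=0.5, c=0.0, sigma=2.0, n=25, alpha=0.01)

print(round(base, 3), round(larger_sample, 3), round(closer_truth, 3), round(smaller_alpha, 3))
```

The second number should exceed the first (a larger sample raises power), while the third and fourth should fall below it (a true value closer to 𝑐, or a smaller 𝛼, lowers power and so raises the probability of a type II error). When the true mean equals 𝑐, the function returns 𝛼 itself, the probability of a type I error.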
Important Remarks about Hypotheses Tests
Thus, rejecting 𝐻0 (and therefore accepting 𝐻1) with a
statistical test is a
much stronger conclusion than not rejecting 𝐻0, because we know the
approx. probability of incorrectly rejecting it: our choice of 𝛼.
So, the strategy researchers often use is to state 𝐻0 such that,
if the theory they want to test is correct, then 𝐻0 is to be
rejected…
Example 1: “Presumption of Innocence” in a Justice
System (Court)
The prosecutor needs to provide overwhelming evidence
(‘beyond reasonable doubt’) to reject 𝐻0 ‘innocence’, i.e.,
such that the probability of a ‘type I’ error is very small.
This is because the error of convicting an ‘innocent’ seems
a more severe error than not convicting a ‘non-innocent’ due
to lack of evidence (type II error).
Example 2: Testing if a bridge would sustain a pressure of
a certain level.
How would you formulate 𝐻0?
APPENDIX
The Econometric Approach
Applied econometric analysis typically involves these steps:
1.Identifying a research problem (e.g., what will happen if I
increase the price of my product by 10%, ceteris paribus)
2.Studying previous works on this problem
3.Formulating an economic model that adequately captures the
essence of the problem
4.Specifying an appropriate econometric (statistical) model
5.Obtaining and understanding the data
6.Tuning the model to be compatible with the data at hand
7.Estimating the unknown parameters
8.Performing model diagnostics and statistical hypotheses tests
9.Drawing implications/conclusions and
10.Identifying directions for further research.
KEY FACTORS OF SUCCESS FOR THIS COURSE
Important Advice: Avoid the “snowball effect”!
One missed topic may turn into an avalanche! …
The following strategy should help you succeed in
Quizzes and Exams:
Attend (and stay awake at!:) the lectures:
Take your own notes on the printed lecture notes …
Identify unclear issues from lectures and try to find
clarifications:
From the textbook; from the lecturer; from tutors.
Read the textbook very carefully and compare it with the lecture
notes …
Remember: Lectures complement the textbook (rather than
substitute for it!) by providing guidance, clarifying the material, and
expediting its reading and understanding …
Attend (and stay awake at) the practical sessions
(tutorials):
Digest the lecture & textbook material before coming to
tutorials…
Try tutorial exercises by yourself before coming to
tutorials…
Ask tutors for help whenever something is unclear…
Acknowledgement
Slides for this and the following lectures are prepared by Sydney
Armstrong for ECN3103 at the University of Guyana (UG), to
accompany the textbook Principles of Econometrics, 4th edition,
by Hill, Griffiths and Lim (John Wiley & Sons, Inc. 2011). All
rights reserved.
In preparing these slides the author used various sources,
and in particular he used (with permission) material from the
lecture slides of Professor Chris O’Donnell (UQ) and Professor
Valentin Zelenyuk (UQ) as well as some figures from the ‘instructor
companion site’ of John Wiley & Sons, Inc. for the textbook of
Hill, Griffiths and Lim, Principles of Econometrics, 3rd and 4th
editions.