Chapter 13
Asymptotic Theory and Stochastic Regressors
In regression analysis, the explanatory variables are usually assumed to be non-stochastic, i.e., fixed in repeated samples. Such an assumption is appropriate for experiments conducted inside laboratories, where the experimenter can control the values of the explanatory variables and obtain repeated observations on the study variable for fixed values of the explanatory variables. In practice, this assumption may not always be satisfied. Sometimes the explanatory variables in a given model are the study variable in another model, so the study variable depends on explanatory variables that are stochastic in nature. Under such situations, the statistical inferences drawn from the linear regression model under the assumption of fixed explanatory variables may not remain valid.
We now assume that the explanatory variables are stochastic but uncorrelated with the disturbance term. If they are correlated, the issue is addressed through instrumental variable estimation. Such a situation arises, for example, in measurement error models.
Stochastic regressors model
Consider the linear regression model
$$ y = X\beta + \varepsilon $$
where $X$ is an $(n \times k)$ matrix of $n$ observations on $k$ explanatory variables $X_1, X_2, \ldots, X_k$ which are stochastic in nature, $y$ is an $(n \times 1)$ vector of $n$ observations on the study variable, $\beta$ is a $(k \times 1)$ vector of regression coefficients and $\varepsilon$ is the $(n \times 1)$ vector of disturbances. Under the assumption $E(\varepsilon) = 0$, $V(\varepsilon) = \sigma^2 I$, the distribution of $\varepsilon_i$ conditional on $x_i'$ satisfies these properties for all values of $X$, where $x_i'$ denotes the $i$-th row of $X$. This is demonstrated as follows:
Let $p(\varepsilon_i \mid x_i')$ be the conditional probability density function of $\varepsilon_i$ given $x_i'$, and let $p(\varepsilon_i)$ be the unconditional probability density function of $\varepsilon_i$. When $x_i'$ and $\varepsilon_i$ are independent, $p(\varepsilon_i \mid x_i') = p(\varepsilon_i)$, and then
$$ E(\varepsilon_i \mid x_i') = \int \varepsilon_i \, p(\varepsilon_i \mid x_i') \, d\varepsilon_i = \int \varepsilon_i \, p(\varepsilon_i) \, d\varepsilon_i = E(\varepsilon_i) = 0 $$
$$ E(\varepsilon_i^2 \mid x_i') = \int \varepsilon_i^2 \, p(\varepsilon_i \mid x_i') \, d\varepsilon_i = \int \varepsilon_i^2 \, p(\varepsilon_i) \, d\varepsilon_i = E(\varepsilon_i^2) = \sigma^2. $$
Least squares estimation of parameters
The additional assumption that the explanatory variables are stochastic poses no problem in the ordinary least squares estimation of $\beta$ and $\sigma^2$. The OLSE of $\beta$ is obtained by minimizing $(y - X\beta)'(y - X\beta)$ with respect to $\beta$ as
$$ b = (X'X)^{-1}X'y $$
and the estimator of $\sigma^2$ is obtained as
$$ s^2 = \frac{1}{n-k}\,(y - Xb)'(y - Xb). $$
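As a quick numerical sketch (not part of the original notes), both estimators can be computed directly; the design, true coefficients and error scale below are assumptions chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative data: X is drawn at random, so the regressors are stochastic.
# The dimensions, true beta and error scale are assumed values.
n, k = 200, 3
X = rng.normal(size=(n, k))
beta = np.array([1.0, -2.0, 0.5])
y = X @ beta + rng.normal(scale=1.5, size=n)

# OLSE b = (X'X)^{-1} X'y
b = np.linalg.solve(X.T @ X, X.T @ y)

# s^2 = (y - Xb)'(y - Xb) / (n - k)
e = y - X @ b
s2 = e @ e / (n - k)
```

Solving the normal equations with `np.linalg.solve` avoids forming the explicit inverse $(X'X)^{-1}$, which is numerically preferable.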
Maximum likelihood estimation of parameters:
Assuming $\varepsilon \sim N(0, \sigma^2 I)$ in the model $y = X\beta + \varepsilon$, with $X$ stochastic and independent of $\varepsilon$, the joint probability density function of $y$ and $X$ can be derived from the joint probability density function of $\varepsilon$ and $X$ as follows:
$$ f(\varepsilon, X) = f(\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n, x_1', x_2', \ldots, x_n') = f(\varepsilon)\, f(X) = \left( \prod_{i=1}^{n} f(\varepsilon_i) \right) \left( \prod_{i=1}^{n} f(x_i') \right) $$
$$ = \left( \prod_{i=1}^{n} f(y_i \mid x_i') \right) \left( \prod_{i=1}^{n} f(x_i') \right) = \prod_{i=1}^{n} f(y_i, x_i') = f(y_1, \ldots, y_n, x_1', \ldots, x_n') = f(y, X). $$
This implies that the maximum likelihood estimators of $\beta$ and $\sigma^2$ will be based on
$$ \prod_{i=1}^{n} f(y_i \mid x_i') = \prod_{i=1}^{n} f(\varepsilon_i), $$
so they will be the same as those based on the assumption that the $\varepsilon_i$'s, $i = 1, 2, \ldots, n$, are distributed as $N(0, \sigma^2)$. So the maximum likelihood estimators of $\beta$ and $\sigma^2$ when the explanatory variables are stochastic are obtained as
$$ \tilde{\beta} = (X'X)^{-1}X'y, \qquad \tilde{\sigma}^2 = \frac{1}{n}\,(y - X\tilde{\beta})'(y - X\tilde{\beta}). $$
Alternative approach for deriving the maximum likelihood estimates
Alternatively, the maximum likelihood estimators of $\beta$ and $\sigma^2$ can also be derived using the joint probability density function of $y$ and $X$.

Note: In this section the vector $x'$ denotes a $[1 \times (k-1)]$ vector which excludes the intercept term.
Let $x_i'$, $i = 1, 2, \ldots, n$, be from a multivariate normal distribution with mean vector $\mu_x$ and covariance matrix $\Sigma_{xx}$, i.e., $x_i' \sim N(\mu_x, \Sigma_{xx})$, and let the joint distribution of $y_i$ and $x_i'$ be
$$ \begin{pmatrix} y_i \\ x_i \end{pmatrix} \sim N\left[ \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \begin{pmatrix} \sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix} \right]. $$
Let the linear regression model be
$$ y_i = \beta_0 + x_i' \beta_1 + \varepsilon_i, \quad i = 1, 2, \ldots, n, $$
where $x_i'$ is a $[1 \times (k-1)]$ vector of observations on the random vector $x$, $\beta_0$ is the intercept term and $\beta_1$ is the $[(k-1) \times 1]$ vector of regression coefficients. Further, $\varepsilon_i$ is the disturbance term with $\varepsilon_i \sim N(0, \sigma^2)$ and is independent of $x'$.
Suppose
$$ \begin{pmatrix} y \\ x \end{pmatrix} \sim N\left[ \begin{pmatrix} \mu_y \\ \mu_x \end{pmatrix}, \begin{pmatrix} \sigma_{yy} & \Sigma_{yx} \\ \Sigma_{xy} & \Sigma_{xx} \end{pmatrix} \right]. $$
The joint probability density function of $(y, x')$ is
$$ f(y, x') = \frac{1}{(2\pi)^{k/2} \left| \Sigma \right|^{1/2}} \exp\left[ -\frac{1}{2} \begin{pmatrix} y - \mu_y \\ x - \mu_x \end{pmatrix}' \Sigma^{-1} \begin{pmatrix} y - \mu_y \\ x - \mu_x \end{pmatrix} \right]. $$
Now using the following result, we find $\Sigma^{-1}$:

Result: Let $A$ be a nonsingular matrix which is partitioned suitably as
$$ A = \begin{pmatrix} B & C \\ D & E \end{pmatrix}, $$
where $E$ and $F = B - C E^{-1} D$ are nonsingular matrices. Then
$$ A^{-1} = \begin{pmatrix} F^{-1} & -F^{-1} C E^{-1} \\ -E^{-1} D F^{-1} & E^{-1} + E^{-1} D F^{-1} C E^{-1} \end{pmatrix}. $$
Note that $A A^{-1} = A^{-1} A = I$.
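This partitioned-inverse result is easy to verify numerically. The sketch below (an illustration, using an arbitrary assumed 4 × 4 matrix made symmetric and diagonally dominated so that $A$, $E$ and $F$ are all nonsingular) assembles the block formula and can be compared with a direct inverse:

```python
import numpy as np

rng = np.random.default_rng(1)

# An assumed nonsingular test matrix A, partitioned with B of size 1x1.
M = rng.normal(size=(4, 4))
A = M @ M.T + 4 * np.eye(4)

B, C = A[:1, :1], A[:1, 1:]
D, E = A[1:, :1], A[1:, 1:]

Einv = np.linalg.inv(E)
F = B - C @ Einv @ D          # Schur complement of E in A
Finv = np.linalg.inv(F)

# Assemble A^{-1} block by block according to the stated result.
top = np.hstack([Finv, -Finv @ C @ Einv])
bottom = np.hstack([-Einv @ D @ Finv, Einv + Einv @ D @ Finv @ C @ Einv])
Ainv_blocks = np.vstack([top, bottom])
```

The assembled matrix agrees with `np.linalg.inv(A)` up to floating-point error.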
Thus
$$ \Sigma^{-1} = \begin{pmatrix} \dfrac{1}{\sigma^2} & -\dfrac{1}{\sigma^2}\,\Sigma_{yx}\Sigma_{xx}^{-1} \\ -\dfrac{1}{\sigma^2}\,\Sigma_{xx}^{-1}\Sigma_{xy} & \Sigma_{xx}^{-1} + \dfrac{1}{\sigma^2}\,\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yx}\Sigma_{xx}^{-1} \end{pmatrix}, $$
where $\sigma^2 = \sigma_{yy} - \Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}$.
Then
$$ f(y, x') = \frac{1}{(2\pi)^{k/2} \left| \Sigma \right|^{1/2}} \exp\left[ -\frac{1}{2\sigma^2}\left\{ y - \mu_y - \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x) \right\}^2 - \frac{1}{2}\,(x - \mu_x)' \Sigma_{xx}^{-1} (x - \mu_x) \right]. $$
The marginal distribution of $x'$ is obtained by integrating $f(y, x')$ over $y$, and the resulting distribution is a $(k-1)$-variate multivariate normal distribution:
$$ g(x') = \frac{1}{(2\pi)^{(k-1)/2} \left| \Sigma_{xx} \right|^{1/2}} \exp\left[ -\frac{1}{2}\,(x - \mu_x)' \Sigma_{xx}^{-1} (x - \mu_x) \right]. $$
The conditional probability density function of $y$ given $x'$ is
$$ f(y \mid x') = \frac{f(y, x')}{g(x')} = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left[ -\frac{1}{2\sigma^2}\left\{ y - \mu_y - \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x) \right\}^2 \right], $$
which is the probability density function of a normal distribution with
· conditional mean $E(y \mid x') = \mu_y + \Sigma_{yx}\Sigma_{xx}^{-1}(x - \mu_x)$ and
· conditional variance $Var(y \mid x') = \sigma_{yy}\left( 1 - \rho^2 \right)$,
where $\rho^2 = \dfrac{\Sigma_{yx}\Sigma_{xx}^{-1}\Sigma_{xy}}{\sigma_{yy}}$ and $\rho$ is the population multiple correlation coefficient.
In the model $y = \beta_0 + x'\beta_1 + \varepsilon$, the conditional mean is
$$ E(y \mid x') = E(\beta_0 + x'\beta_1 + \varepsilon \mid x') = \beta_0 + x'\beta_1. $$
Comparing this conditional mean with the conditional mean of the normal distribution, we obtain the relationships for $\beta_0$ and $\beta_1$ as follows:
$$ \beta_1 = \Sigma_{xx}^{-1}\Sigma_{xy}, \qquad \beta_0 = \mu_y - \beta_1'\mu_x. $$
The likelihood function of $(y, x')$ based on a sample of size $n$ is
$$ L = \frac{1}{(2\pi)^{nk/2} \left| \Sigma \right|^{n/2}} \exp\left[ -\frac{1}{2} \sum_{i=1}^{n} \begin{pmatrix} y_i - \mu_y \\ x_i - \mu_x \end{pmatrix}' \Sigma^{-1} \begin{pmatrix} y_i - \mu_y \\ x_i - \mu_x \end{pmatrix} \right]. $$
Maximizing the log-likelihood function with respect to $\mu_y$, $\mu_x$, $\Sigma_{xx}$ and $\Sigma_{xy}$, the maximum likelihood estimates of the respective parameters are obtained as
$$ \tilde{\mu}_y = \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i $$
$$ \tilde{\mu}_x = \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i = (\bar{x}_2, \bar{x}_3, \ldots, \bar{x}_k)' $$
$$ \tilde{\Sigma}_{xx} = S_{xx} = \frac{1}{n}\left( \sum_{i=1}^{n} x_i x_i' - n\,\bar{x}\bar{x}' \right) $$
$$ \tilde{\Sigma}_{xy} = S_{xy} = \frac{1}{n}\left( \sum_{i=1}^{n} x_i y_i - n\,\bar{x}\bar{y} \right) $$
where $x_i' = (x_{i2}, x_{i3}, \ldots, x_{ik})$, $S_{xx}$ is a $[(k-1) \times (k-1)]$ matrix with elements $\frac{1}{n}\sum_{t}(x_{ti} - \bar{x}_i)(x_{tj} - \bar{x}_j)$, and $S_{xy}$ is a $[(k-1) \times 1]$ vector with elements $\frac{1}{n}\sum_{t}(x_{ti} - \bar{x}_i)(y_t - \bar{y})$.
Based on these estimates, the maximum likelihood estimators of $\beta_1$ and $\beta_0$ are obtained as
$$ \tilde{\beta}_1 = S_{xx}^{-1} S_{xy}, \qquad \tilde{\beta}_0 = \bar{y} - \tilde{\beta}_1' \bar{x}, $$
so that
$$ \tilde{\beta} = \begin{pmatrix} \tilde{\beta}_0 \\ \tilde{\beta}_1 \end{pmatrix} = (X'X)^{-1}X'y. $$
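The agreement between the moment-based estimators and $(X'X)^{-1}X'y$ can be checked numerically. In this sketch (not from the original notes) the data-generating values are assumptions, and $S_{xx}$, $S_{xy}$ are built from centered sums exactly as defined above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Assumed data-generating process with (k-1) = 2 stochastic regressors.
n = 500
x = rng.normal(size=(n, 2))
y = 1.0 + x @ np.array([0.7, -0.3]) + rng.normal(size=n)

xbar, ybar = x.mean(axis=0), y.mean()
xc, yc = x - xbar, y - ybar
S_xx = xc.T @ xc / n          # (k-1) x (k-1) moment matrix
S_xy = xc.T @ yc / n          # (k-1) x 1 moment vector

beta1 = np.linalg.solve(S_xx, S_xy)   # beta1 = S_xx^{-1} S_xy
beta0 = ybar - beta1 @ xbar           # beta0 = ybar - beta1' xbar

# OLS on X = [1, x], i.e. (X'X)^{-1} X'y with an intercept column.
X = np.column_stack([np.ones(n), x])
b_ols = np.linalg.solve(X.T @ X, X.T @ y)
```

The two routes give identical coefficients, since the $1/n$ factors cancel in $S_{xx}^{-1}S_{xy}$.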
Properties of the least squares estimator:
The estimation error of the OLSE $b = (X'X)^{-1}X'y$ of $\beta$ is
$$ b - \beta = (X'X)^{-1}X'y - \beta = (X'X)^{-1}X'(X\beta + \varepsilon) - \beta = (X'X)^{-1}X'\varepsilon. $$
Then, assuming that $E\left[ (X'X)^{-1}X' \right]$ exists, we have
$$ E(b - \beta) = E\left[ (X'X)^{-1}X'\varepsilon \right] = E\left[ (X'X)^{-1}X' \right] E(\varepsilon) = 0 $$
because $(X'X)^{-1}X'$ and $\varepsilon$ are independent. So $b$ is an unbiased estimator of $\beta$.
The covariance matrix of $b$ is obtained as
$$ V(b) = E\left[ (b - \beta)(b - \beta)' \right] = E\left[ (X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \right] $$
$$ = E\left\{ E\left[ (X'X)^{-1}X'\varepsilon\varepsilon'X(X'X)^{-1} \mid X \right] \right\} = E\left[ (X'X)^{-1}X'\,E(\varepsilon\varepsilon')\,X(X'X)^{-1} \right] $$
$$ = \sigma^2 E\left[ (X'X)^{-1}X'X(X'X)^{-1} \right] = \sigma^2 E\left[ (X'X)^{-1} \right]. $$
Thus the covariance matrix involves a mathematical expectation. The unknown $\sigma^2$ can be estimated by
$$ \hat{\sigma}^2 = \frac{e'e}{n-k} = \frac{(y - Xb)'(y - Xb)}{n-k} $$
where $e = y - Xb$ is the residual, and
$$ E\left( \hat{\sigma}^2 \right) = E\left\{ E\left[ \hat{\sigma}^2 \mid X \right] \right\} = E\left\{ E\left[ \frac{e'e}{n-k} \,\Big|\, X \right] \right\} = E\left( \sigma^2 \right) = \sigma^2. $$
Note that the OLSE $b = (X'X)^{-1}X'y$ involves the stochastic matrix $X$ and the stochastic vector $y$, so $b$ is not a linear estimator. It is also no longer the best linear unbiased estimator of $\beta$, as it is in the case when $X$ is non-stochastic. The estimator of $\sigma^2$, being conditional on given $X$, is an efficient estimator.
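A small Monte Carlo sketch (not part of the original notes) can illustrate $E(b) = \beta$ when $X$ is redrawn in every replication and is independent of the disturbances. The sample size, replication count and true parameters below are assumptions:

```python
import numpy as np

rng = np.random.default_rng(3)

# Assumed simulation settings.
n, k, reps = 50, 2, 2000
beta = np.array([2.0, -1.0])

draws = np.empty((reps, k))
for r in range(reps):
    X = rng.normal(size=(n, k))      # fresh stochastic X in each replication
    eps = rng.normal(size=n)         # disturbances independent of X
    y = X @ beta + eps
    draws[r] = np.linalg.solve(X.T @ X, X.T @ y)

# Average estimation error over replications; should be near zero.
bias = draws.mean(axis=0) - beta
```

The average of the replicated estimates sits close to the true $\beta$, consistent with unbiasedness under stochastic regressors independent of $\varepsilon$.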
Asymptotic theory:
The asymptotic properties of an estimator concern its behaviour when the sample size $n$ grows large.
For the need and understanding of asymptotic theory, we consider an example. Consider the simple linear regression model with one explanatory variable and $n$ observations:
$$ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad E(\varepsilon_i) = 0, \ Var(\varepsilon_i) = \sigma^2, \quad i = 1, 2, \ldots, n. $$
The OLSE of $\beta_1$ is
$$ b_1 = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2} $$
and its variance is
$$ Var(b_1) = \frac{\sigma^2}{\sum_{i=1}^{n}(x_i - \bar{x})^2}. $$
If the sample size grows large, then the variance of $b_1$ gets smaller. The shrinkage in variance implies that as the sample size $n$ increases, the probability density of the OLSE $b_1$ collapses around its mean, because $Var(b_1)$ tends to zero.
Let there be three OLSEs $b_1$, $b_2$ and $b_3$, based on sample sizes $n_1$, $n_2$ and $n_3$ respectively, such that $n_1 < n_2 < n_3$, say. If $c$ and $\delta$ are some arbitrarily chosen small positive constants, then the probability that the value of $b$ lies within the interval $\beta \pm c$ can be made greater than $(1 - \delta)$ for a large value of $n$. This property is the consistency of $b$, which ensures that if the sample is very large, we can be confident with high probability that $b$ will yield an estimate close to $\beta$.
Probability in limit
Let $\hat{\beta}_n$ be an estimator of $\beta$ based on a sample of size $n$, and let $\gamma$ be any small positive constant. Then for large $n$, the requirement that $\hat{\beta}_n$ takes values with probability almost one in an arbitrarily small neighbourhood of the true parameter value $\beta$ is
$$ \lim_{n \to \infty} P\left[ \left| \hat{\beta}_n - \beta \right| < \gamma \right] = 1, $$
which is denoted as
$$ \operatorname{plim} \hat{\beta}_n = \beta, $$
and it is said that $\hat{\beta}_n$ converges to $\beta$ in probability. The estimator $\hat{\beta}_n$ is then said to be a consistent estimator of $\beta$.
A sufficient but not necessary condition for $\hat{\beta}_n$ to be a consistent estimator of $\beta$ is that
$$ \lim_{n \to \infty} E\left[ \hat{\beta}_n \right] = \beta \quad \text{and} \quad \lim_{n \to \infty} Var\left[ \hat{\beta}_n \right] = 0. $$
Consistency of estimators
Now we look at the consistency of the estimators of $\beta$ and $\sigma^2$.
(i) Consistency of b
Under the assumption that $\lim_{n \to \infty} \left( \dfrac{X'X}{n} \right) = \Delta$ exists as a nonstochastic and nonsingular matrix (with finite elements), we have
$$ \lim_{n \to \infty} V(b) = \sigma^2 \lim_{n \to \infty} \frac{1}{n} \left( \frac{X'X}{n} \right)^{-1} = \sigma^2 \lim_{n \to \infty} \frac{1}{n}\, \Delta^{-1} = 0. $$
This implies that the OLSE converges to $\beta$ in quadratic mean. Thus the OLSE is a consistent estimator of $\beta$. This also holds true for the maximum likelihood estimator.
The same conclusion can also be proved using the concept of convergence in probability. The consistency of the OLSE can be obtained under the weaker assumption that
$$ \operatorname{plim} \left( \frac{X'X}{n} \right) = \Delta^* $$
exists and is a nonsingular and nonstochastic matrix, and that
$$ \operatorname{plim} \left( \frac{X'\varepsilon}{n} \right) = 0. $$
Since
$$ b - \beta = (X'X)^{-1}X'\varepsilon = \left( \frac{X'X}{n} \right)^{-1} \frac{X'\varepsilon}{n}, $$
we have
$$ \operatorname{plim}(b - \beta) = \operatorname{plim} \left( \frac{X'X}{n} \right)^{-1} \operatorname{plim} \left( \frac{X'\varepsilon}{n} \right) = \Delta^{*-1} \cdot 0 = 0. $$
Thus $b$ is a consistent estimator of $\beta$. The same is true for the maximum likelihood estimator.
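Consistency can be illustrated (a sketch not in the original notes) by computing the OLSE on samples of increasing size; the design and the true $\beta$ below are assumed values:

```python
import numpy as np

# Assumed true coefficient vector for illustration.
beta = np.array([1.0, 0.5])

def olse(n, seed):
    # Draw a fresh sample of size n and return the OLSE (X'X)^{-1} X'y.
    r = np.random.default_rng(seed)
    X = r.normal(size=(n, 2))
    y = X @ beta + r.normal(size=n)
    return np.linalg.solve(X.T @ X, X.T @ y)

# Estimation error ||b - beta|| shrinks as the sample size grows.
errors = {n: np.linalg.norm(olse(n, seed=n) - beta) for n in (100, 10_000, 1_000_000)}
```

With a million observations the estimate is essentially on top of the true value, mirroring $\operatorname{plim} b = \beta$.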
(ii) Consistency of s²
Now we look at the consistency of $s^2$ as an estimator of $\sigma^2$. We have
$$ s^2 = \frac{e'e}{n-k} = \frac{1}{n-k}\,\varepsilon' H \varepsilon = \frac{1}{n-k}\,\varepsilon'\left[ I - X(X'X)^{-1}X' \right]\varepsilon $$
$$ = \left( 1 - \frac{k}{n} \right)^{-1} \left[ \frac{\varepsilon'\varepsilon}{n} - \frac{\varepsilon'X}{n}\left( \frac{X'X}{n} \right)^{-1} \frac{X'\varepsilon}{n} \right]. $$
Note that $\dfrac{\varepsilon'\varepsilon}{n} = \dfrac{1}{n}\sum_{i=1}^{n}\varepsilon_i^2$, and $\{\varepsilon_i^2,\ i = 1, 2, \ldots, n\}$ is a sequence of independently and identically distributed random variables with mean $\sigma^2$. Using the law of large numbers,
$$ \operatorname{plim} \left( \frac{\varepsilon'\varepsilon}{n} \right) = \sigma^2 $$
$$ \operatorname{plim} \left[ \frac{\varepsilon'X}{n}\left( \frac{X'X}{n} \right)^{-1} \frac{X'\varepsilon}{n} \right] = \operatorname{plim} \left( \frac{\varepsilon'X}{n} \right) \operatorname{plim} \left( \frac{X'X}{n} \right)^{-1} \operatorname{plim} \left( \frac{X'\varepsilon}{n} \right) = 0 \cdot \Delta^{*-1} \cdot 0 = 0 $$
$$ \Rightarrow \operatorname{plim}(s^2) = (1 - 0)^{-1}\left( \sigma^2 - 0 \right) = \sigma^2. $$
Thus $s^2$ is a consistent estimator of $\sigma^2$. The same holds true for the maximum likelihood estimator.
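A similar sketch (again an illustration, with an assumed true $\sigma^2$ and design) shows $s^2$ concentrating around $\sigma^2$ as $n$ grows:

```python
import numpy as np

# Assumed true parameters for illustration.
sigma2, k = 4.0, 3
beta = np.array([1.0, -1.0, 2.0])

def s2(n, seed):
    # Draw a sample of size n and return s^2 = e'e / (n - k).
    r = np.random.default_rng(seed)
    X = r.normal(size=(n, k))
    y = X @ beta + r.normal(scale=np.sqrt(sigma2), size=n)
    e = y - X @ np.linalg.solve(X.T @ X, X.T @ y)
    return e @ e / (n - k)

# s^2 concentrates around sigma^2 = 4 as n grows.
estimates = {n: s2(n, seed=n) for n in (50, 5_000, 500_000)}
```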
Asymptotic distributions:
Suppose we have a sequence of random variables $\{\alpha_n\}$ with a corresponding sequence of cumulative distribution functions $\{F_n\}$, and a random variable $\alpha$ with cumulative distribution function $F$. Then $\alpha_n$ converges in distribution to $\alpha$ if $F_n$ converges to $F$ pointwise. In this case, $F$ is called the asymptotic distribution of $\alpha_n$. Note that since convergence in probability implies convergence in distribution,
$$ \operatorname{plim} \alpha_n = \alpha \ \Rightarrow\ \alpha_n \xrightarrow{D} \alpha $$
($\alpha_n$ tends to $\alpha$ in distribution), i.e., the asymptotic distribution of $\alpha_n$ is $F$, which is the distribution of $\alpha$.
Note that
· $E(\alpha)$: mean of the asymptotic distribution
· $Var(\alpha)$: variance of the asymptotic distribution
· $\lim_{n \to \infty} E(\alpha_n)$: asymptotic mean
· $\lim_{n \to \infty} E\left[ \alpha_n - \lim_{n \to \infty} E(\alpha_n) \right]^2$: asymptotic variance.
Asymptotic distribution of sample mean and least squares estimation
Let $\alpha_n = \bar{Y}_n = \dfrac{1}{n}\sum_{i=1}^{n} Y_i$ be the sample mean based on a sample of size $n$. Since the sample mean is a consistent estimator of the population mean $\bar{Y}$, we have $\operatorname{plim} \bar{Y}_n = \bar{Y}$, which is a constant. Thus the asymptotic distribution of $\bar{Y}_n$ is the distribution of a constant. This is not a regular distribution, as all the probability mass is concentrated at one point. Thus as the sample size increases, the distribution of $\bar{Y}_n$ collapses.
Suppose we consider only one third of the observations in the sample and find the sample mean as
$$ \bar{Y}_n^* = \frac{3}{n}\sum_{i=1}^{n/3} Y_i. $$
Then
$$ E\left( \bar{Y}_n^* \right) = \bar{Y} $$
and
$$ Var\left( \bar{Y}_n^* \right) = \frac{9}{n^2}\sum_{i=1}^{n/3} Var(Y_i) = \frac{9}{n^2} \cdot \frac{n}{3}\,\sigma^2 = \frac{3\sigma^2}{n} \to 0 \ \text{as}\ n \to \infty. $$
Thus $\operatorname{plim} \bar{Y}_n^* = \bar{Y}$, and $\bar{Y}_n^*$ has the same degenerate asymptotic distribution as $\bar{Y}_n$. Since $Var\left( \bar{Y}_n^* \right) > Var\left( \bar{Y}_n \right)$, $\bar{Y}_n$ is preferred over $\bar{Y}_n^*$. Now we observe the asymptotic behaviour of $\bar{Y}_n$ and $\bar{Y}_n^*$.
Consider the sequences of random variables $\{\alpha_n\}$ and $\{\alpha_n^*\}$ with
$$ \alpha_n = \sqrt{n}\left( \bar{Y}_n - \bar{Y} \right), \qquad \alpha_n^* = \sqrt{n}\left( \bar{Y}_n^* - \bar{Y} \right). $$
Then for all $n$, we have
$$ E(\alpha_n) = \sqrt{n}\, E\left( \bar{Y}_n - \bar{Y} \right) = 0, \qquad E(\alpha_n^*) = \sqrt{n}\, E\left( \bar{Y}_n^* - \bar{Y} \right) = 0 $$
$$ Var(\alpha_n) = n\, E\left( \bar{Y}_n - \bar{Y} \right)^2 = n \cdot \frac{\sigma^2}{n} = \sigma^2, \qquad Var(\alpha_n^*) = n\, E\left( \bar{Y}_n^* - \bar{Y} \right)^2 = n \cdot \frac{3\sigma^2}{n} = 3\sigma^2. $$
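These variances are easy to check by simulation (a sketch not in the original notes; the normal population mean and variance below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)

# Assumed population and simulation settings.
n, reps, sigma2 = 300, 10_000, 1.0
Y = rng.normal(loc=5.0, scale=np.sqrt(sigma2), size=(reps, n))

ybar = Y.mean(axis=1)                     # full-sample mean, Var ~ sigma^2/n
ybar_star = Y[:, : n // 3].mean(axis=1)   # one-third-sample mean, Var ~ 3 sigma^2/n

# The variance ratio should be close to 3.
ratio = ybar_star.var() / ybar.var()
```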
Assuming the population to be normal, the asymptotic distribution of
· $\sqrt{n}\left( \bar{Y}_n - \bar{Y} \right)$ is $N\left( 0, \sigma^2 \right)$,
· $\sqrt{n}\left( \bar{Y}_n^* - \bar{Y} \right)$ is $N\left( 0, 3\sigma^2 \right)$.
So again $\bar{Y}_n$ is preferable over $\bar{Y}_n^*$. The central limit theorem can be used to show that $\alpha_n$ will have an asymptotically normal distribution even if the population is not normally distributed.
Also, since
$$ \sqrt{n}\left( \bar{Y}_n - \bar{Y} \right) \sim N\left( 0, \sigma^2 \right) \ \Rightarrow\ Z_n = \frac{\sqrt{n}\left( \bar{Y}_n - \bar{Y} \right)}{\sigma} \sim N(0, 1), $$
and this statement holds true for the finite-sample as well as the asymptotic distribution.
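The central limit theorem claim can be illustrated with a deliberately non-normal population (a sketch not in the original notes; the exponential population, with mean and standard deviation both equal to 1, is an assumed choice):

```python
import numpy as np

rng = np.random.default_rng(8)

# Assumed population: exponential with mean 1 and variance 1 (so sigma = 1).
n, reps = 1_000, 5_000
mu, sigma = 1.0, 1.0

samples = rng.exponential(scale=1.0, size=(reps, n))
# Standardized sample means: approximately N(0, 1) despite the skewed population.
Z = np.sqrt(n) * (samples.mean(axis=1) - mu) / sigma
```

The standardized means have mean near 0 and standard deviation near 1, even though individual observations are heavily skewed.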
Consider the ordinary least squares estimator $b = (X'X)^{-1}X'y$ of $\beta$ in the linear regression model $y = X\beta + \varepsilon$. If $X$ is nonstochastic, then the finite-sample covariance matrix of $b$ is
$$ V(b) = \sigma^2 (X'X)^{-1}. $$
The asymptotic covariance matrix of $b$, under the assumption that $\lim_{n \to \infty} \dfrac{X'X}{n} = \Sigma_{xx}$ exists and is nonsingular, is given by
$$ \lim_{n \to \infty} \sigma^2 (X'X)^{-1} = \sigma^2 \lim_{n \to \infty} \frac{1}{n} \lim_{n \to \infty} \left( \frac{X'X}{n} \right)^{-1} = \sigma^2 \cdot 0 \cdot \Sigma_{xx}^{-1} = 0, $$
which is a null matrix.
Consider therefore the asymptotic distribution of $\sqrt{n}(b - \beta)$. Then, even if $\varepsilon$ is not necessarily normally distributed, asymptotically
$$ \sqrt{n}\,(b - \beta) \sim N\left( 0, \sigma^2 \Sigma_{xx}^{-1} \right) $$
$$ \frac{n\,(b - \beta)'\, \Sigma_{xx}\, (b - \beta)}{\sigma^2} \sim \chi^2_k. $$
If $\dfrac{X'X}{n}$ is considered as an estimator of $\Sigma_{xx}$, then
$$ \frac{n\,(b - \beta)' \left( \dfrac{X'X}{n} \right) (b - \beta)}{\sigma^2} = \frac{(b - \beta)'\, X'X\, (b - \beta)}{\sigma^2} $$
is the usual test statistic, as in the case of finite samples with $b \sim N\left( \beta, \sigma^2 (X'X)^{-1} \right)$.
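The chi-square approximation can be illustrated by simulation with non-normal disturbances (a sketch not in the original notes; the design, uniform error law and replication count are assumptions):

```python
import numpy as np

rng = np.random.default_rng(7)

# Assumed design and error law; uniform errors are non-normal with
# variance 1/12, yet W behaves like chi-square with k degrees of freedom.
n, k, reps = 400, 3, 4000
beta = np.zeros(k)
sigma2 = 1.0 / 12.0

W = np.empty(reps)
for r in range(reps):
    X = rng.normal(size=(n, k))
    eps = rng.uniform(-0.5, 0.5, size=n)      # non-normal disturbances
    b = np.linalg.solve(X.T @ X, X.T @ (X @ beta + eps))
    d = b - beta
    W[r] = d @ (X.T @ X) @ d / sigma2

mean_W = W.mean()     # a chi-square(k) variable has mean k = 3
```

The simulated mean of $W$ sits near $k = 3$, matching the mean of a $\chi^2_k$ distribution.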
Econometrics | Chapter 13 | Asymptotic Theory and Stochastic Regressors | Shalabh, IIT Kanpur