High-Dimensional Sparse Econometric Models, an Introduction Alexandre Belloni ICE, July 2011 Alexandre Belloni High-Dimensional Sparse Econometrics


High-Dimensional Sparse Econometric Models, an Introduction

Alexandre Belloni

ICE, July 2011

Outline

1. High-Dimensional Sparse Econometric Models (HDSM)
  ◮ Models
  ◮ Motivating Examples for linear/nonparametric regression

2. Estimation Methods for linear/nonparametric regression
  ◮ ℓ1-penalized estimators
    ◮ formulations
    ◮ penalty parameter
  ◮ computational methods
  ◮ post-ℓ1-penalized estimators
  ◮ summary of rates of convergence

3. Plenty of model selection applications in Econometrics
  ◮ Results for IV Estimators
  ◮ Results for Quantile Regression
  ◮ Results for Variable Length Markov Chain

High-Dimensional Sparse Models

◮ We have a large number of parameters p, potentially larger than the sample size n.

◮ The idea is that a low-dimensional submodel captures very accurately all essential features of the full-dimensional model.

◮ The key assumption is that the number of relevant parameters is smaller than the sample size.

◮ Fundamentally, a model selection problem.

◮ One question is whether sparse models make sense.
◮ Another question is how to compute such models.
◮ Another question is what guarantees we can establish.

1. High-Dimensional Sparse Econometric Model

HDSM. A response variable $y_i$ obeys

$$y_i = x_i'\beta_0 + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2), \quad i = 1, \dots, n,$$

where $x_i$ are fixed p-dimensional regressors,

$$x_i = (x_{ij},\ j = 1, \dots, p), \quad E_n[x_{ij}^2] = 1,$$

and $E_n[\cdot]$ denotes the sample average over $i = 1, \dots, n$.

p is possibly much larger than n.

The key assumption is sparsity:

$$s := \|\beta_0\|_0 = \sum_{j=1}^{p} 1\{\beta_{0j} \neq 0\} \ll n.$$

This generalizes the standard large parametric econometric model by letting the identity

$$T := \mathrm{support}(\beta_0)$$

of the relevant s regressors be unknown.
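As a small illustration of the sparsity index s and the support T defined above, the following sketch computes both for a coefficient vector. The function names and the example vector are hypothetical, chosen only for illustration, and indices are 0-based as usual in Python.

```python
def support(beta, tol=0.0):
    """Return the indices j with |beta_j| > tol (0-based), i.e. T = support(beta)."""
    return [j for j, b in enumerate(beta) if abs(b) > tol]


def l0_norm(beta, tol=0.0):
    """Return the number of non-zero coefficients, s = ||beta||_0."""
    return len(support(beta, tol))


# Hypothetical example: p = 100 coefficients, only 3 of them non-zero.
beta0 = [1.0, 0.0, -0.5, 0.0, 0.0, 2.0] + [0.0] * 94
T = support(beta0)
s = l0_norm(beta0)
print("T =", T, "s =", s)
```

In practice T is unknown, which is exactly what makes this a model selection problem.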


High p motivation

◮ transformations of basic regressors $z_i$,

$$x_i = (P_1(z_i), \dots, P_p(z_i))$$

  ◮ for example, in wage regressions, the $P_j$'s are polynomials or B-splines in education and experience.

◮ or simply a very large list of regressors,
  ◮ a list of country characteristics in cross-country growth regressions (Barro & Lee)
  ◮ housing characteristics in hedonic regressions (American Housing Survey)
  ◮ price and product characteristics at the point of purchase (scanner data, TNS).

Sparseness

◮ The key assumption is that the number of non-zero regression coefficients is smaller than the sample size:

$$s := \|\beta_0\|_0 = \sum_{j=1}^{p} 1\{\beta_{0j} \neq 0\} \ll n.$$

◮ The idea is that a low s-dimensional submodel accurately approximates the full p-dimensional model. The approximation error in fact is zero.

◮ We can easily relax this assumption and put in a non-zero approximation error.

◮ We can relax the Gaussianity of the error terms $\epsilon_i$ substantially based on self-normalized moderate deviation theory results.

High-Dimensional Sparse Econometric Model

HDSM′. For $y_i$ the response variable and $z_i$ the fixed elementary regressors,

$$y_i = g(z_i) + \epsilon_i, \quad E[\epsilon_i] = 0, \quad E[\epsilon_i^2] = \sigma^2, \quad i = 1, \dots, n.$$

Consider a p-dimensional dictionary of series terms,

$$x_i = (P_1(z_i), \dots, P_p(z_i)),$$

for example, B-splines, polynomials, and interactions, where p is possibly much larger than n.

Expand via generalized series: $g(z_i) = x_i'\beta_0 + r_i$,

◮ $x_i'\beta_0$ is the best s-dimensional parametric approximation,

$$s := \|\beta_0\|_0 = \sum_{j=1}^{p} 1\{\beta_{0j} \neq 0\} \ll n.$$

◮ $r_i$ is the approximation error.

High-Dimensional Sparse Econometric Model

HDSM′. As a result we have an approximately sparse model:

$$y_i = x_i'\beta_0 + r_i + \epsilon_i, \quad i = 1, \dots, n.$$

The parametric part $x_i'\beta_0$ is the target, with s non-zero coefficients.

We choose s to balance the approximation error with the oracle estimation error:

$$E_n[r_i^2] \propto \sigma^2 \frac{s}{n}.$$

◮ This model generalizes the exact sparse model, by letting in approximation error and non-Gaussian errors.

◮ This model also generalizes the standard series regression model by letting the identity

$$T = \mathrm{support}(\beta_0)$$

of the most important s series terms be unknown.

Example 1: Series Models of Wage Function

◮ In this example, we abstract away from the estimation questions.

◮ Consider a series expansion of the conditional expectation $E[y_i | z_i]$ of wage $y_i$ given education $z_i$.

◮ A conventional series approximation to the regression function is, for example,

$$E[y_i | z_i] = \beta_1 P_1(z_i) + \beta_2 P_2(z_i) + \beta_3 P_3(z_i) + \beta_4 P_4(z_i) + r_i,$$

where $P_1, \dots, P_4$ are low-order polynomials or splines with a few knots.

[Figure: Traditional Approximation of Expected Wage Function using Polynomials. Horizontal axis: education (8 to 20); vertical axis: wage (6.0 to 7.0).]

◮ In the figure, the true regression function $E[y_i | z_i]$ is computed using U.S. Census data.

◮ With the same number of parameters, can we find a much better series approximation?

◮ For this to be possible, some high-order series terms in the expansion

$$E[y_i | z_i] = \sum_{k=1}^{p} \beta_{0k} P_k(z_i) + r_i'$$

must have large coefficients.

◮ This indeed happens in our example, since $E[y_i | z_i]$ exhibits sharp oscillatory behavior in some regions, which low-degree polynomials fail to capture.

◮ And we find the “right” series terms by using ℓ1-penalization methods.

[Figure: Lasso Approximation of Expected Wage Function using Polynomials and Dummies. Horizontal axis: education (8 to 20); vertical axis: wage (6.0 to 7.0).]

[Figure: Traditional vs Lasso Approximation of Expected Wage Functions with Equal Number of Parameters. Horizontal axis: education (8 to 20); vertical axis: wage (6.0 to 7.0).]

Errors of Conventional and the lasso-Based Sparse Approximations

| | L2 error | L∞ error |
|---|---|---|
| Conventional Series Approximation | 0.135 | 0.290 |
| lasso-Based Series Approximation | 0.031 | 0.063 |

Example 2: Cross-Country Growth Regression

◮ Relation between growth rate and initial per capita GDP:

$$\text{GrowthRate} = \alpha_0 + \alpha_1 \log(\text{GDP}) + \epsilon$$

◮ Test the convergence hypothesis, $\alpha_1 < 0$: poor countries catch up with richer countries. This is a prediction of the classical Solow growth model.

◮ Linear regression only with log(GDP) yields:

$$\text{GrowthRate} = 0.0352 + 0.0013 \log(\text{GDP}) + \epsilon$$

◮ Test the convergence hypothesis conditional on covariates describing institutions and technological factors:

$$\text{GrowthRate} = \alpha_0 + \alpha_1 \log(\text{GDP}) + \sum_{j=1}^{p} \beta_j X_j + \epsilon$$

Example 2: Cross-Country Growth Regression

In the Barro-Lee data, we have p = 60 covariates and n = 90 observations. We need to select significant covariates among the 60.

Real GDP per capita (log) is in all models.

| Penalization parameter λ | Additional selected variables |
|---|---|
| 2.3662 | Black Market Premium (log) |
| 1.3935 | Black Market Premium (log); Political Instability |
| 0.929 | Black Market Premium (log); Political Instability; Ratio of nominal government expenditure on defense to nominal GDP; Ratio of import to GDP |

Example 2: Cross-Country Growth Regression

◮ We find a sparse model for conditioning effects using ℓ1-penalization. We re-fit the resulting model using LS.

◮ We find that the coefficient on lagged GDP is negative, and the confidence intervals, adjusted for model selection, exclude zero.

| Coefficient | Confidence Set |
|---|---|
| lagged GDP | [-0.0261, -0.0007] |

◮ Our findings fully support the conclusions reached in Barro and Lee and Barro and Sala-i-Martin.

2. Estimation via ℓ1-penalty methods

The canonical LS, which minimizes

$$\hat Q(\beta) = E_n[(y_i - x_i'\beta)^2],$$

is not consistent when $\dim(\beta) = p \geq n$.

We can try minimizing an AIC/BIC type criterion function

$$\hat Q(\beta) + \lambda \|\beta\|_0, \qquad \|\beta\|_0 = \sum_{j=1}^{p} 1\{\beta_j \neq 0\},$$

but this is not computationally feasible when p is large.

Essentially a search over too many (non-nested) models.

Technically, it is a combinatorial problem known to be NP-hard.
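To get a feel for why the search over non-nested models is infeasible, the sketch below counts candidate models: all $2^p$ subsets of regressors, or only those with at most a given number of regressors. The function name `num_models` is hypothetical; the choice p = 60 mirrors the growth regression example, and the cap of 5 regressors is an arbitrary illustration.

```python
from math import comb


def num_models(p, max_size=None):
    """Count subsets of p regressors, optionally with at most max_size members."""
    if max_size is None:
        return 2 ** p
    return sum(comb(p, k) for k in range(min(max_size, p) + 1))


# All subsets of 60 candidate covariates.
print(num_models(60))
# Subsets with at most 5 covariates: already several million least squares fits.
print(num_models(60, 5))
```

Even with a tight cap on model size, exhaustive search grows combinatorially in p, which motivates the convex relaxation discussed next.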


2. Estimation via ℓ1-penalty methods

It was required to balance and understand simultaneously

$$\underbrace{\text{computational issues}}_{\text{hard combinatorial search}} \quad \text{and} \quad \underbrace{\text{statistical issues}}_{\text{lack of global identification}}$$

Understanding that statistical aspects can help computational methods and vice-versa was crucial.

Further developments in model selection estimation will rely heavily on computational advances.

(MCMC is an example of a computational method with a profound impact on the statistics/econometrics literature.)

2. Estimation via ℓ1-penalty methods

Avoid computing $\hat Q(\beta) + \lambda \|\beta\|_0$, but get something meaningful.

A solution (Tibshirani, JRSS, 96) is to replace the ℓ0-“norm” by the closest convex function, the ℓ1-norm:

$$\|\beta\|_1 = \sum_{j=1}^{p} |\beta_j|$$

◮ the lasso estimator $\hat\beta$ then minimizes

$$\hat Q(\beta) + \lambda \|\beta\|_1$$

◮ this change makes the regularization function convex
◮ good computational methods for large scale problems
◮ many ways to combine the fit and regularization functions
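The lasso criterion above can be written down directly. The following is a minimal sketch, assuming data stored as plain Python lists (rows of `X` are the $x_i$); the function names and the tiny data set are hypothetical.

```python
def q_hat(beta, X, y):
    """Least squares fit criterion Q(beta) = E_n[(y_i - x_i' beta)^2]."""
    n = len(y)
    total = 0.0
    for x_i, y_i in zip(X, y):
        fitted = sum(x_ij * b_j for x_ij, b_j in zip(x_i, beta))
        total += (y_i - fitted) ** 2
    return total / n


def lasso_objective(beta, X, y, lam):
    """Lasso criterion Q(beta) + lam * ||beta||_1."""
    return q_hat(beta, X, y) + lam * sum(abs(b) for b in beta)


# Hypothetical data: n = 2 observations, p = 2 regressors.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [1.0, 2.0]
print(lasso_objective([1.0, 0.0], X, y, lam=0.5))
```

Both terms are convex in β, so the sum is convex, which is what makes large-scale computation tractable.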


Formulations

Idea: balance fit $\hat Q(\beta)$ and penalty $\|\beta\|_1$ using a parameter λ.

Several ways to do it:

| Formulation | Name |
|---|---|
| $\min_\beta\ \hat Q(\beta) + \lambda \|\beta\|_1$ | lasso |
| $\min_\beta\ \hat Q(\beta) : \|\beta\|_1 \leq \lambda$ | sieve |
| $\min_\beta\ \|\beta\|_1 : \hat Q(\beta) \leq \lambda$ | ℓ1-minimization method |
| $\min_\beta\ \|\beta\|_1 : \|\nabla \hat Q(\beta)\|_\infty \leq \lambda$ | Dantzig selector |
| $\min_\beta\ \sqrt{\hat Q(\beta)} + \lambda \|\beta\|_1$ | √lasso |

Penalty parameter λ

The motivation/heuristic is to choose the parameter λ so that $\beta_0$ is feasible for the problem, but not too many other points are.

sieve: ideally set $\lambda = \|\beta_0\|_1$

ℓ1-minimization: ideally set $\lambda = \sigma^2$

Dantzig selector: ideally set $\lambda = \|\nabla \hat Q(\beta_0)\|_\infty = \|2 E_n[\epsilon_i x_i]\|_\infty$

Note that lasso and √lasso are unrestricted minimizations.

Their penalties are chosen to dominate the score at the true value, respectively

$$\lambda > \|\nabla \hat Q(\beta_0)\|_\infty \quad \text{and} \quad \lambda > \|\nabla \sqrt{\hat Q}(\beta_0)\|_\infty.$$
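The score at the true value, $\|\nabla \hat Q(\beta_0)\|_\infty = \|2E_n[\epsilon_i x_i]\|_\infty$, is the quantity that the penalty must dominate. The sketch below computes it for given data; it is only computable in a simulation where $\beta_0$ is known. Function names and the small data set are hypothetical.

```python
def grad_q_hat(beta, X, y):
    """Gradient of Q(beta) = E_n[(y_i - x_i' beta)^2], i.e. -2 E_n[x_i (y_i - x_i' beta)]."""
    n, p = len(y), len(beta)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    return [-2.0 * sum(X[i][j] * resid[i] for i in range(n)) / n for j in range(p)]


def score_sup_norm(beta, X, y):
    """Sup-norm of the gradient, ||grad Q(beta)||_inf."""
    return max(abs(g) for g in grad_q_hat(beta, X, y))


# Hypothetical example where the truth beta0 and the errors are known.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
beta0 = [1.0, 0.0]
eps = [0.1, -0.2, 0.3]
y = [sum(x * b for x, b in zip(row, beta0)) + e for row, e in zip(X, eps)]

lam_min = score_sup_norm(beta0, X, y)
print(lam_min)  # any lam above this value dominates the score at beta0
```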


Heuristics for lasso via Convex Geometry

[Figure, built up over five slides. Horizontal axis: $\beta_{T^c}$ (from −5 to 5), with the true value marked at $\beta_{0T^c} = 0$. The successive slides show:

1. the fit function $\hat Q(\beta)$;
2. $\hat Q(\beta)$ together with $\hat Q(\beta_0) + \nabla \hat Q(\beta_0)'\beta_{T^c}$;
3. $\hat Q(\beta)$ together with the penalty $\lambda \|\beta\|_1$ for $\lambda = \|\nabla \hat Q(\beta_0)\|_\infty$;
4. $\hat Q(\beta)$ together with the penalty $\lambda \|\beta\|_1$ for $\lambda > \|\nabla \hat Q(\beta_0)\|_\infty$;
5. the penalized criterion $\hat Q(\beta) + \lambda \|\beta\|_1$ for $\lambda > \|\nabla \hat Q(\beta_0)\|_\infty$.]

Penalty parameters for lasso and √lasso

◮ In the case of lasso, the optimal choice of penalty level λ, guaranteeing near-oracle performance, is

$$\lambda = \sigma \cdot 2\sqrt{2 \log p / n} \ \geq_P\ \|2 E_n[\epsilon_i x_i]\|_\infty = \|\nabla \hat Q(\beta_0)\|_\infty.$$

◮ The choice relies on knowing σ, which may be a priori hard to estimate when $p \gg n$.

◮ In the case of √lasso the score is a pivotal random variable, with distribution independent of unknown parameters:

$$\left\|\nabla \sqrt{\hat Q}(\beta_0)\right\|_\infty = \frac{\|E_n[\epsilon_i x_i]\|_\infty}{\sqrt{E_n[\epsilon_i^2]}} = \frac{\|E_n[(\epsilon_i/\sigma) x_i]\|_\infty}{\sqrt{E_n[(\epsilon_i/\sigma)^2]}},$$

◮ hence for √lasso the penalty λ does not depend on σ.
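A short numerical sketch of the two points above: the lasso penalty level scales with σ, while the √lasso score is unchanged when the errors are rescaled. The function names, the simulated design, and the scaling factor 3 are illustrative assumptions.

```python
import math
import random


def lasso_penalty(sigma, n, p):
    """Penalty level lambda = 2 * sigma * sqrt(2 * log(p) / n) for lasso."""
    return 2.0 * sigma * math.sqrt(2.0 * math.log(p) / n)


def sqrt_lasso_score(X, eps):
    """Sup-norm score of sqrt-lasso: ||E_n[eps_i x_i]||_inf / sqrt(E_n[eps_i^2])."""
    n, p = len(eps), len(X[0])
    num = max(abs(sum(X[i][j] * eps[i] for i in range(n)) / n) for j in range(p))
    den = math.sqrt(sum(e * e for e in eps) / n)
    return num / den


rng = random.Random(0)
n, p = 50, 20
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
eps = [rng.gauss(0.0, 1.0) for _ in range(n)]

score_unit = sqrt_lasso_score(X, eps)
score_scaled = sqrt_lasso_score(X, [3.0 * e for e in eps])  # errors with sigma = 3
print(lasso_penalty(1.0, n, p), lasso_penalty(3.0, n, p))    # lasso penalty scales with sigma
print(score_unit, score_scaled)                              # sqrt-lasso score does not
```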


Computational Methods for lasso

Big Picture of Computational Methods

| Method | Problem Size | Comp. Complexity |
|---|---|---|
| Componentwise | extremely large | convergence |
| First-order | very large (up to 100000) | $pn \left(\frac{\log p}{\varepsilon}\right)$ |
| Interior-point | large (up to 5000) | $(p + n)^{3.5} \log\left(\frac{p + n}{\varepsilon}\right)$ |

Remark: Several implementations for these methods are available:

◮ SDPT3 or SeDuMi for interior-point methods
◮ TFOCS for first-order methods
◮ several downloadable codes for componentwise methods (easy to implement)

Computational Methods for lasso: Componentwise

Componentwise Method. A common approach to solve multivariate optimization:

(i) pick a component and fix all remaining components,

(ii) improve the objective function along the chosen component,

(iii) repeat steps (i)-(ii) until convergence is achieved.

This has been widely used when the minimization over a single component can be done very efficiently.

This is the case for lasso.

Its simple implementation also contributed to the widespread use of this approach.

Computational Methods for lasso: Componentwise

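For a current iterate β, let $\beta_{-j} = (\beta_1, \dots, \beta_{j-1}, 0, \beta_{j+1}, \dots, \beta_p)'$:

◮ if $2E_n[x_{ij}(y_i - x_i'\beta_{-j})] > \lambda$, the minimum over $\beta_j$ is

$$\beta_j = \left(2E_n[x_{ij}(y_i - x_i'\beta_{-j})] - \lambda\right) / \left(2E_n[x_{ij}^2]\right);$$

◮ if $2E_n[x_{ij}(y_i - x_i'\beta_{-j})] < -\lambda$, the minimum over $\beta_j$ is

$$\beta_j = \left(2E_n[x_{ij}(y_i - x_i'\beta_{-j})] + \lambda\right) / \left(2E_n[x_{ij}^2]\right);$$

◮ if $2|E_n[x_{ij}(y_i - x_i'\beta_{-j})]| \leq \lambda$, the minimum over $\beta_j$ is

$$\beta_j = 0.$$

These expressions follow from the first-order condition in $\beta_j$ of $\hat Q(\beta) + \lambda \|\beta\|_1$ with $\hat Q(\beta) = E_n[(y_i - x_i'\beta)^2]$.

Remark: Similar formulas can be computed for other methods like √lasso.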
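The componentwise update can be sketched as a cyclic coordinate descent. This is a minimal illustration, not a tuned implementation: it obtains each update from the first-order condition of $E_n[(y_i - x_i'\beta)^2] + \lambda|\beta_j|$ in $\beta_j$, which is a soft-thresholding step, and it runs a fixed number of sweeps rather than testing convergence. The function names and the toy data are hypothetical.

```python
def soft_threshold(a, t):
    """Return sign(a) * max(|a| - t, 0)."""
    if a > t:
        return a - t
    if a < -t:
        return a + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_sweeps=200):
    """Cyclic coordinate descent for min_b E_n[(y_i - x_i'b)^2] + lam * ||b||_1."""
    n, p = len(y), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals y_i - x_i' beta for the current beta
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            if col_sq[j] == 0.0:
                continue  # an all-zero column never enters the model
            # c = E_n[x_ij (y_i - x_i' beta_{-j})], adding back component j
            c = sum(X[i][j] * (resid[i] + X[i][j] * beta[j]) for i in range(n)) / n
            new_bj = soft_threshold(2.0 * c, lam) / (2.0 * col_sq[j])
            delta = new_bj - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                beta[j] = new_bj
    return beta


# Toy example: the second regressor is too weakly related to y and is zeroed out.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1]
beta_hat = lasso_coordinate_descent(X, y, lam=0.5)
print(beta_hat)
```

Keeping the residual vector up to date makes each coordinate update cost O(n), which is why the method scales to very large p.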


Computational Methods for lasso: first-order methods

First-order methods. The new generation of first-order methods focuses on structured convex problems that can be cast as

$$\min_w\ f(A(w) + b) + h(w) \quad \text{or} \quad \min_w\ h(w) : A(w) + b \in K,$$

where
◮ f is a smooth function
◮ h is a structured function that is possibly non-differentiable or with extended values
◮ K is a structured convex set (typically a convex cone).

Computational Methods for lasso: first-order methods

lasso is cast as

$$\min_w\ f(A(w) + b) + h(w)$$

with

$$f(\cdot) = \frac{\|\cdot\|^2}{n}, \quad h(\cdot) = \lambda \|\cdot\|_1, \quad A = X, \quad \text{and} \quad b = -Y.$$

Thus,

$$\min_\beta\ \frac{\|X\beta - Y\|^2}{n} + \lambda \|\beta\|_1.$$

Remark: A number of different first-order methods have been proposed and analyzed. We just mention briefly one.

Computational Methods for lasso: first-order methods

On every iteration, for a given $\mu > 0$ and current point $\beta^k$, the next point is

$$\beta(\beta^k) = \arg\min_\beta\ -2E_n[x_i(y_i - x_i'\beta^k)]'\beta + \frac{1}{2}\mu \|\beta - \beta^k\|^2 + \lambda \|\beta\|_1.$$

It follows that the minimization in β above is separable and can be solved by soft-thresholding. Letting

$$m_j^k := \beta_j^k + 2E_n[x_{ij}(y_i - x_i'\beta^k)]/\mu,$$

we update our iterate as

$$\beta_j(\beta^k) = \mathrm{sign}(m_j^k) \max\left\{|m_j^k| - \lambda/\mu,\ 0\right\}.$$

Remark: The additional regularization term $\frac{1}{2}\mu \|\beta - \beta^k\|^2$ brings computational stability that allows the algorithm to enjoy better computational complexity properties.
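The soft-thresholding update can be sketched in a few lines. This is a plain (non-accelerated) version of the iteration, shown for illustration only; the function names and data are hypothetical, and the choice of µ is an assumption: here µ = 1 is at least the Lipschitz constant of $\nabla \hat Q$ for the toy design, which is a standard sufficient condition for such schemes and is not stated in the slides.

```python
def soft_threshold(m, t):
    """Return sign(m) * max(|m| - t, 0)."""
    if m > t:
        return m - t
    if m < -t:
        return m + t
    return 0.0


def first_order_step(beta, X, y, lam, mu):
    """One update: gradient step on Q with step 1/mu, then soft-threshold at lam/mu."""
    n, p = len(y), len(beta)
    resid = [y[i] - sum(X[i][j] * beta[j] for j in range(p)) for i in range(n)]
    new_beta = []
    for j in range(p):
        # 2 E_n[x_ij (y_i - x_i' beta^k)] is minus the j-th gradient component of Q
        score_j = 2.0 * sum(X[i][j] * resid[i] for i in range(n)) / n
        m_j = beta[j] + score_j / mu
        new_beta.append(soft_threshold(m_j, lam / mu))
    return new_beta


# Toy example (hypothetical data); E_n[x x'] = I/2 here, so grad Q has Lipschitz constant 1.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1]
beta = [0.0, 0.0]
for _ in range(50):
    beta = first_order_step(beta, X, y, lam=0.5, mu=1.0)
print(beta)
```

Each iteration needs only matrix-vector products with X, which is why these methods reach very large problem sizes.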


Computational Methods for lasso: ipm

Interior-point methods. Interior-point method (ipm) solvers typically focus on solving conic programming problems in standard form,

$$\min_w\ c'w : Aw = b,\ w \in K. \qquad (2.1)$$

The conic constraint will be binding at the optimal solution.

These methods use a “barrier function” $F_K$ for the cone K that is

strictly convex, smooth and $F_K(w) \to \infty$ when $w \to \partial K$,

and for some $\mu > 0$ proceed to compute

$$w(\mu) = \arg\min_w\ c'w + \mu F_K(w) : Aw = b. \qquad (2.2)$$

Letting $\mu \to 0$, $w(\mu)$ converges to the solution of (2.1).

We can recover an approximate solution for (2.1) efficiently by using an approximate solution for $w(\mu_k)$ as a warm-start to approximate $w(\mu_{k+1})$, $\mu_k > \mu_{k+1}$.
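To see the barrier idea at work, consider a toy problem chosen purely for illustration (it is not from the slides): minimize $w_1 + 2w_2$ subject to $w_1 + w_2 = 1$, $w \geq 0$, whose solution is $w = (1, 0)$. With the log barrier $-\log w_1 - \log w_2$ and $w_2 = 1 - w_1$ substituted in, the barrier problem is one-dimensional and its first-order condition can be solved by bisection. The function name is hypothetical.

```python
def central_path_point(mu, iters=200):
    """Return w1(mu) minimizing (2 - w1) - mu * (log(w1) + log(1 - w1)) on (0, 1).

    This is the barrier problem for: min w1 + 2*w2  s.t.  w1 + w2 = 1, w >= 0.
    """
    lo, hi = 1e-15, 1.0 - 1e-15
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        # derivative of the barrier objective in w1; it is increasing in w1
        deriv = -1.0 - mu / mid + mu / (1.0 - mid)
        if deriv > 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


# As mu decreases, w1(mu) approaches the solution w1 = 1 of the original problem.
for mu in [1.0, 0.1, 0.01, 1e-4, 1e-6]:
    print(mu, central_path_point(mu))
```

A practical solver would follow this path with Newton steps and warm starts rather than bisection; the point here is only that $w(\mu)$ stays in the interior and moves to the constrained optimum as $\mu \to 0$.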


Computational Methods for lasso: ipm

Not all barriers are created equal. Solving the problems

$$w(\mu) = \arg\min_w\ c'w + \mu F_K(w) : Aw = b \qquad (2.3)$$

will be tractable only if $F_K$ is a special barrier (which depends on K).

Three cones play a major role in optimization theory:

◮ Non-negative orthant: $\mathbb{R}^p_+ = \{w \in \mathbb{R}^p : w_j \geq 0,\ j = 1, \dots, p\}$, $F_K(w) = -\sum_{j=1}^{p} \log(w_j)$

◮ Second-order cone: $Q^{n+1} = \{(w, t) \in \mathbb{R}^n \times \mathbb{R}_+ : t \geq \|w\|\}$, $F_K(w) = -\log(t - \|w\|)$

◮ Positive semidefinite matrices: $S^{k \times k}_+ = \{W \in \mathbb{R}^{k \times k} : W = W',\ v'Wv \geq 0 \text{ for all } v \in \mathbb{R}^k\}$, $F_K(W) = -\log \det(W)$

Computational Methods for lasso: ipm

These cones (and three others) are called self-scaled cones.

Their cartesian products are also self-scaled.

Therefore, when using ipms, it is VERY important computationally to arrive at a formulation that is not only convex but also one that relies on self-scaled cones.

It turns out that many tractable penalization methods (including ℓ1 but others as well) can be formulated with these three cones.

Computational Methods for lasso: ipm

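lasso: $\min_\beta\ \hat Q(\beta) + \lambda \|\beta\|_1$

To formulate the $\|\cdot\|_1$, let $\beta = \beta^+ - \beta^-$, where $\beta^+, \beta^- \in \mathbb{R}^p_+$.

For any vector $v \in \mathbb{R}^n$ and any scalar $t \geq 0$ we have that

$$v'v \leq t \quad \text{is equivalent to} \quad \|(v, u)\| \leq z, \quad \text{where } u := (t - 1)/2 \text{ and } z := (t + 1)/2.$$

The resulting conic program is

$$\min_{t, \beta^+, \beta^-, u, z, v}\ \frac{t}{n} + \lambda \sum_{j=1}^{p} \left(\beta_j^+ + \beta_j^-\right)$$

subject to

$$v = Y - X\beta^+ + X\beta^-, \quad t = -1 + 2z, \quad t = 1 + 2u,$$

$$(v, u, z) \in Q^{n+2}, \quad t \geq 0, \quad \beta^+ \in \mathbb{R}^p_+, \quad \beta^- \in \mathbb{R}^p_+.$$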
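The equivalence between the quadratic constraint $v'v \leq t$ and the second-order cone constraint follows from $\|v\|^2 + (t-1)^2/4 \leq (t+1)^2/4 \iff \|v\|^2 \leq t$. The sketch below checks it numerically on a few points; the function names and test points are hypothetical.

```python
import math


def quadratic_form_holds(v, t):
    """Direct check of v'v <= t."""
    return sum(x * x for x in v) <= t


def cone_form_holds(v, t):
    """Check ||(v, u)|| <= z with u = (t - 1)/2 and z = (t + 1)/2."""
    u = (t - 1.0) / 2.0
    z = (t + 1.0) / 2.0
    return math.sqrt(sum(x * x for x in v) + u * u) <= z


v = [1.0, 2.0]  # v'v = 5
for t in [4.0, 5.0, 6.0]:
    print(t, quadratic_form_holds(v, t), cone_form_holds(v, t))
```

Both checks agree at each t, including the boundary case t = 5; with floating-point inputs very close to the boundary the two comparisons could differ by rounding.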


Quick comparison on a very sparse model: $s = 5$, $\beta_{0j} = 1$ for $j = 1, \dots, s$, $\beta_{0j} = 0$ for $j > s$.

Table: Average running time over 100 simulations. Different precisions across methods.

| Design | Estimator | Componentwise | First-order | Interior-point |
|---|---|---|---|---|
| n = 100, p = 500 | lasso | 0.2173 | 10.99 | 2.545 |
| n = 100, p = 500 | √lasso | 0.3268 | 7.345 | 1.645 |
| n = 200, p = 1000 | lasso | 0.6115 | 19.84 | 14.20 |
| n = 200, p = 1000 | √lasso | 0.6448 | 19.96 | 8.291 |
| n = 400, p = 2000 | lasso | 2.625 | 84.12 | 108.9 |
| n = 400, p = 2000 | √lasso | 2.687 | 77.65 | 62.86 |

Post-Model Selection Estimator

Define the post-lasso estimator as follows:

◮ Step 1: Select the model using the lasso, $\hat T := \mathrm{support}(\hat\beta)$.

◮ Step 2: Apply ordinary least squares (LS) to the selected model $\hat T$.

Remark: We could use √lasso, Dantzig selector, sieve or ℓ1-minimization instead of lasso.
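The two steps can be sketched as follows, taking a first-stage lasso estimate as given and refitting by least squares on its support. This is a minimal illustration with a small hand-written linear solver (Gaussian elimination on the normal equations), which assumes the selected columns are linearly independent; all names and the toy inputs are hypothetical.

```python
def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [row[:] + [b_i] for row, b_i in zip(A, b)]
    for c in range(k):
        piv = max(range(c, k), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        for r in range(c + 1, k):
            f = M[r][c] / M[c][c]
            for cc in range(c, k + 1):
                M[r][cc] -= f * M[c][cc]
    x = [0.0] * k
    for r in reversed(range(k)):
        x[r] = (M[r][k] - sum(M[r][c] * x[c] for c in range(r + 1, k))) / M[r][r]
    return x


def post_lasso(X, y, beta_lasso, tol=0.0):
    """Step 1: T_hat = support(beta_lasso). Step 2: least squares of y on columns in T_hat."""
    n, p = len(y), len(beta_lasso)
    T_hat = [j for j, b in enumerate(beta_lasso) if abs(b) > tol]
    beta = [0.0] * p
    if not T_hat:
        return beta
    gram = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in T_hat] for a in T_hat]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in T_hat]
    for j, b_j in zip(T_hat, solve_linear(gram, xty)):
        beta[j] = b_j
    return beta


# Toy example: a lasso fit that shrank the first coefficient from 2.0 to 1.5.
X = [[1.0, 0.0], [0.0, 1.0]]
y = [2.0, 0.1]
beta_lasso = [1.5, 0.0]
beta_pl = post_lasso(X, y, beta_lasso)
print(beta_pl)
```

In the toy example the refit restores the first coefficient to its least squares value, removing the shrinkage introduced by the penalty.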


Post-Model Selection Estimator Geometry

[Figure: geometry of $\min_{\beta \in \mathbb{R}^p} \|\beta\|_1 : \hat Q(\beta) \leq \gamma$ in the $(\beta_1, \beta_2)$ plane, marking $\beta_0$ and $\hat\beta$.]

In this example the correct model is selected, but note the shrinkage bias. The shrinkage bias can be removed by the post-model selection estimator, i.e. applying OLS to the selected model.

Post-Model Selection Estimator Geometry

[Figure: geometry of $\min_{\beta \in \mathbb{R}^p} \|\beta\|_1 : \hat Q(\beta) \leq \gamma$ in the $(\beta_1, \beta_2)$ plane, marking $\beta_0$ and $\hat\beta$.]

In this example the correct model was not selected. The selected model contains a “junk” regressor which can impact the post-model selection estimator.

Issues on post model selection estimator

◮ lasso and other ℓ1-penalized methods will successfully “zero out” lots of irrelevant regressors.

◮ however, these procedures are not perfect at model selection: they might include “junk” or exclude some relevant regressors,

◮ and these procedures bias/shrink the non-zero coefficient estimates towards zero.

◮ post-model selection estimators are motivated by removing the bias towards zero,

◮ but results need to account for possible misspecification in the selected model.

Regularity Condition

◮ A simple sufficient condition is as follows.

Condition RSE. Take any $C > 1$. Then for all n large enough, the sC-sparse eigenvalues of the empirical design matrix $E_n[x_i x_i']$ are bounded away from zero and from above; namely,

$$0 < \kappa \leq \min_{\|\delta\|_0 \leq sC} \frac{E_n[(x_i'\delta)^2]}{\|\delta\|^2} \leq \max_{\|\delta\|_0 \leq sC} \frac{E_n[(x_i'\delta)^2]}{\|\delta\|^2} \leq \kappa' < \infty. \qquad (2.4)$$

◮ This holds under i.i.d. sampling if $E[x_i x_i']$ has eigenvalues bounded away from zero and from above, and:
  • $x_i$ has light tails (i.e., log-concave) and $s \log p = o(n)$;
  • or bounded regressors $\max_{ij} |x_{ij}| \leq K$ and $s \log^4 p = o(n)$.

Result 1: Rates for ℓ1-penalized methods

Theorem (Rates). Under condition RSE, wp → 1

$$\sqrt{E_n[(x_i'\hat\beta - x_i'\beta_0)^2]} \lesssim \sigma \sqrt{\frac{s \log(n \vee p)}{n}}$$

◮ the rate is very close to the oracle rate $\sqrt{s/n}$, obtainable when we know the “true” model T.
◮ p shows up only through a logarithmic term $\log p$.
◮ for Dantzig selector:
  ◮ Candes and Tao (Annals of Statistics 2007)
◮ for lasso:
  ◮ Meinshausen and Yu (Annals of Statistics 2009),
  ◮ Bickel, Ritov, Tsybakov (Annals of Statistics 2009)
◮ for √lasso:
  ◮ B., Chernozhukov and Wang (Biometrika, 2011)
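To see how small the price of not knowing the model is, the sketch below evaluates the two rate expressions, ignoring the unspecified constants hidden in the $\lesssim$ sign, so only the ratio is meaningful. The function names are hypothetical, and the values s = 6, n = 100, p = 500 are borrowed from the Monte Carlo design used later in these slides.

```python
import math


def l1_rate(sigma, s, n, p):
    """Rate expression sigma * sqrt(s * log(max(n, p)) / n), up to constants."""
    return sigma * math.sqrt(s * math.log(max(n, p)) / n)


def oracle_rate(sigma, s, n):
    """Oracle rate expression sigma * sqrt(s / n), up to constants."""
    return sigma * math.sqrt(s / n)


sigma, s, n, p = 1.0, 6, 100, 500
ratio = l1_rate(sigma, s, n, p) / oracle_rate(sigma, s, n)
print(l1_rate(sigma, s, n, p), oracle_rate(sigma, s, n), ratio)

# The ratio is sqrt(log(n v p)): raising p from 500 to 500000 only about
# doubles log p, so the rate expression grows by a factor of roughly 1.45.
print(l1_rate(sigma, s, n, 500000) / l1_rate(sigma, s, n, 500))
```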


Result 2: Post-Model Selection Estimator

Theorem (Rate for Post-Model Selection Estimator). Under the conditions of Result 1, wp → 1

$$\sqrt{E_n[(x_i'\hat\beta_{PL} - x_i'\beta_0)^2]} \lesssim \sigma \sqrt{\frac{s}{n} \log(n \vee p)},$$

and in some further exceptional cases the rate is faster, up to $\sigma \sqrt{s/n}$.

◮ Even though lasso does not in general perfectly select the relevant regressors, post-lasso performs at least as well.
◮ The analysis accounts for a selected model that can be misspecified.
◮ This result was first derived in the context of median regression
  ◮ B. and Chernozhukov (Annals of Statistics, 2011),
◮ and extended to least squares by
  ◮ B. and Chernozhukov (R & R Bernoulli, 2011).

Monte Carlo

◮ In this simulation we used

$$s = 6, \quad p = 500, \quad n = 100$$

$$y_i = x_i'\beta_0 + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma^2),$$

$$\beta_0 = (1, 1, 1/2, 1/3, 1/4, 1/5, 0, \dots, 0)'$$

$$x_i \sim N(0, \Sigma), \quad \Sigma_{ij} = (1/2)^{|i-j|}, \quad \sigma^2 = 1$$

◮ Ideal benchmark: Oracle estimator which runs OLS of $y_i$ on $x_{i1}, \dots, x_{i6}$. This estimator is not feasible outside Monte Carlo.
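One way to draw data from this design, sketched with the standard library only: a stationary AR(1) recursion across the columns of $x_i$ produces unit variances and correlations $(1/2)^{|j-k|}$, which matches the stated Σ. The function name, the seed, and the use of the AR(1) construction are choices made here for illustration, not taken from the slides.

```python
import math
import random


def simulate_design(n=100, p=500, rho=0.5, sigma=1.0, seed=0):
    """Draw (X, y, beta0) with x_i ~ N(0, Sigma), Sigma_jk = rho^|j-k|, y_i = x_i'beta0 + eps_i."""
    rng = random.Random(seed)
    beta0 = [1.0, 1.0, 1.0 / 2, 1.0 / 3, 1.0 / 4, 1.0 / 5] + [0.0] * (p - 6)
    innovation_sd = math.sqrt(1.0 - rho ** 2)
    X, y = [], []
    for _ in range(n):
        x = [rng.gauss(0.0, 1.0)]
        for _ in range(1, p):
            # AR(1) step keeps Var(x_j) = 1 and gives Corr(x_j, x_k) = rho^|j-k|
            x.append(rho * x[-1] + innovation_sd * rng.gauss(0.0, 1.0))
        signal = sum(x_j * b_j for x_j, b_j in zip(x[:6], beta0[:6]))
        X.append(x)
        y.append(signal + sigma * rng.gauss(0.0, 1.0))
    return X, y, beta0


X, y, beta0 = simulate_design()
print(len(X), len(X[0]), sum(1 for b in beta0 if b != 0.0))
```

The oracle benchmark then corresponds to least squares of `y` on the first six columns of `X` only.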


Monte Carlo Results: Risk

Risk: $\sqrt{E[(x_i'\hat\beta - x_i'\beta_0)^2]}$

n = 100, p = 500

[Figure: “Estimation Risk” for LASSO, Post-LASSO and Oracle; vertical axis from 0 to 0.7.]

Monte Carlo Results: Bias

Bias: $\|E[\hat\beta] - \beta_0\|_2$

n = 100, p = 500

[Figure: “Bias” for LASSO and Post-LASSO; vertical axis from 0 to 0.45.]

Applications of model selection


Gaussian Instrumental Variable

Gaussian IV model

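$$y_i = d_i'\alpha + \epsilon_i,$$

$$d_i = D(z_i) + v_i, \quad \text{(first stage)}$$

$$\begin{pmatrix} \epsilon_i \\ v_i \end{pmatrix} \sim N\left(0, \begin{pmatrix} \sigma_\epsilon^2 & \sigma_{\epsilon v} \\ \sigma_{\epsilon v} & \sigma_v^2 \end{pmatrix}\right)$$

◮ can have additional controls $w_i$ in both equations; suppressed here, see references for details

◮ want to estimate the “optimal” instrument $D(z_i) = E[d_i \mid z_i]$ using a large list of “technical instruments,”

$$x_i := (x_{i1}, \ldots, x_{ip})' := (P_1(z_i), \ldots, P_p(z_i))',$$

and we can write

$$D(z_i) = x_i'\beta_0 + r_i, \quad \text{with } \|\beta_0\|_0 = s, \quad E_n[r_i^2] \lesssim \sigma_v^2 s/n,$$

where $p$ is large, possibly much larger than $n$.

◮ use the estimated optimal instrument to estimate $\alpha$.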


2SLS with LASSO/Post-LASSO estimated Optimal IV

◮ Step 1: Estimate $\hat D(z_i) = x_i'\hat\beta$ using the LASSO or Post-LASSO estimator.

◮ Step 2: $\hat\alpha = [E_n[d_i \hat D(z_i)']]^{-1} E_n[\hat D(z_i) y_i]$
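The two steps can be sketched on simulated data. This is an illustrative example under stated assumptions rather than the authors' implementation: it assumes NumPy, a scalar endogenous variable $d_i$, standard normal technical instruments, a made-up sparse first stage with $s = 3$, and the penalty level $\lambda = 2c\sigma_v\sqrt{n}\,\Phi^{-1}(1 - \alpha/(2p))$ with $c = 1.1$, $\alpha = 0.05$ and $\sigma_v$ known. The solver `lasso_ista` is a hypothetical helper that runs plain proximal gradient descent.

```python
import numpy as np
from statistics import NormalDist


def lasso_ista(X, y, lam, n_iter=2000):
    """Proximal gradient (ISTA) for ||y - Xb||^2 + lam * ||b||_1."""
    L = 2.0 * np.linalg.norm(X, 2) ** 2          # Lipschitz constant of the gradient
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        z = b + 2.0 * X.T @ (y - X @ b) / L       # gradient step
        b = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # soft-threshold
    return b


rng = np.random.default_rng(1)
n, p, alpha_true = 200, 100, 1.0
x = rng.standard_normal((n, p))                   # technical instruments x_i
beta0 = np.zeros(p)
beta0[:3] = 1.0                                   # sparse first stage (illustrative)
cov = np.array([[1.0, 0.6], [0.6, 1.0]])          # (eps_i, v_i) jointly Gaussian
eps, v = rng.multivariate_normal([0.0, 0.0], cov, size=n).T
d = x @ beta0 + v                                 # first stage
y = alpha_true * d + eps                          # structural equation

# Step 1: LASSO selects instruments; Post-LASSO refits by OLS on the selected set.
lam = 2 * 1.1 * 1.0 * np.sqrt(n) * NormalDist().inv_cdf(1 - 0.05 / (2 * p))
support = np.flatnonzero(lasso_ista(x, d, lam))
D_hat = x[:, support] @ np.linalg.lstsq(x[:, support], d, rcond=None)[0]

# Step 2: alpha_hat = E_n[d_i D_hat_i]^{-1} E_n[D_hat_i y_i] (scalar case).
alpha_hat = np.mean(D_hat * y) / np.mean(d * D_hat)
alpha_ols = np.mean(d * y) / np.mean(d * d)       # OLS ignores the endogeneity
print(f"2SLS with Post-LASSO instrument: {alpha_hat:.3f}   OLS: {alpha_ols:.3f}")
```

Because $\epsilon_i$ and $v_i$ are correlated in this design, OLS is biased away from `alpha_true`, while the IV estimate built from the selected instruments should sit close to it.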


2SLS with LASSO-based IV

Theorem (2SLS with lasso-based IV)
Under the conditions of Result 1, and $s^2 \log^2 p = o(n)$, we have

$$(\hat\Lambda^*)^{-1/2}\sqrt{n}(\hat\alpha - \alpha) \to_d N(0, I),$$

where $\hat\Lambda^* := \hat\sigma^2\{E_n[\hat D(z_i)\hat D(z_i)']\}^{-1}$.

◮ For more results, see B., Chernozhukov, and Hansen (submitted, ArXiv 2010)

◮ The requirement $s^2 \log^2 p = o(n)$ can be relaxed to $s \log p = o(n)$ if a split IV estimator is used.

◮ The extension to non-Gaussian and heteroskedastic errors in B., Chen, Chernozhukov and Hansen (submitted, ArXiv 2010) required adjustments to the penalty function.


Application of IV: Eminent Domain

Estimate the economic consequences of government take-over of property rights from individuals

◮ $y_i$ = economic outcome in region $i$, e.g. housing price index

◮ $d_i$ = number of property take-overs decided in a court of law, by panels of 3 judges

◮ $x_i$ = demographic characteristics of judges, who are randomly assigned to panels: education, political affiliations, age, experience, etc.

◮ $f_i$ = $x_i$ + various interactions of components of $x_i$,
  ◮ a very large list, $p = p(f_i) = 344$


Application of IV: Eminent Domain, Continued.

◮ Outcome is the log of the housing price index; the endogenous variable is government take-over

◮ Can use 2 elementary instruments, suggested by real lawyers (Chen and Yeh, 2010)

◮ Can use all 344 instruments and select approximately the right set using LASSO.

Estimator           Instruments   Price Effect   Robust Std. Error
2SLS                2             .07            .032
2SLS / LASSO IVs    4             .05            .017


ℓ1-Penalized Quantile Regression

◮ a response variable $y$ and $p$-dimensional covariates $x$

◮ the $u$-th conditional quantile function of $y$ given $x$ is given by

$$x'\beta(u), \quad \beta(u) \in \mathbb{R}^p.$$

◮ the dimension $p$ is large, possibly much larger than the sample size $n$

◮ the true model $\beta(u)$ is sparse, having only $s < n$ non-zero components; we don’t know which.

◮ in population, $\beta(u)$ minimizes

$$Q_u(\beta) = E[\rho_u(y - x'\beta)],$$

where $\rho_u(t) = (u - 1\{t \le 0\})t = u t^+ + (1 - u) t^-$.
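The check function $\rho_u$ is easy to experiment with. The sketch below, in plain Python, implements it and illustrates the intercept-only case, where minimizing the average check loss over a constant recovers a sample $u$-quantile. The function names and the toy data are illustrative choices, not part of the slides.

```python
def check_loss(t, u):
    """rho_u(t) = (u - 1{t <= 0}) * t."""
    return (u - (1.0 if t <= 0 else 0.0)) * t


def check_loss_parts(t, u):
    """Equivalent form u * t^+ + (1 - u) * t^-, with t^+ = max(t, 0), t^- = max(-t, 0)."""
    return u * max(t, 0.0) + (1.0 - u) * max(-t, 0.0)


def quantile_by_check_loss(ys, u):
    """Sample value q minimizing sum_i rho_u(y_i - q); the first minimizer on ties."""
    return min(sorted(ys), key=lambda q: sum(check_loss(y - q, u) for y in ys))


ys = [1, 2, 3, 4, 5]
print(quantile_by_check_loss(ys, 0.5))    # median of the toy data
print(quantile_by_check_loss(ys, 0.25))   # lower quartile of the toy data
```

Restricting the search to the observed values is enough here because the objective is piecewise linear in `q` with kinks only at the data points.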


ℓ1-Penalized Quantile Regression

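◮ the canonical estimator that minimizes

$$\hat Q_u(\beta) = n^{-1}\sum_{i=1}^n \rho_u(y_i - x_i'\beta)$$

is not consistent when $p \ge n$.

◮ the ℓ1-penalized quantile regression estimator $\hat\beta(u)$ solves

$$\min_{\beta \in \mathbb{R}^p} \hat Q_u(\beta) + \lambda_u\|\beta\|_1.$$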
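The penalized problem above can be rewritten as a linear program, which is the standard route for computing quantile regression. The sketch below splits residuals and coefficients into positive and negative parts; the auxiliary variables $\xi_i^\pm$ and $\beta_j^\pm$ are notation introduced here for illustration and do not appear in the slides.

```latex
\begin{aligned}
\min_{\beta^+,\,\beta^-,\,\xi^+,\,\xi^-}\quad
  & \frac{1}{n}\sum_{i=1}^n \bigl(u\,\xi_i^+ + (1-u)\,\xi_i^-\bigr)
    + \lambda_u \sum_{j=1}^p \bigl(\beta_j^+ + \beta_j^-\bigr) \\
\text{s.t.}\quad
  & \xi_i^+ - \xi_i^- = y_i - x_i'(\beta^+ - \beta^-), \qquad i = 1,\dots,n, \\
  & \beta^+ \ge 0,\quad \beta^- \ge 0,\quad \xi^+ \ge 0,\quad \xi^- \ge 0 .
\end{aligned}
```

At an optimum, $\xi_i^+$ and $\xi_i^-$ are the positive and negative parts of the residual, so the first sum equals $\hat Q_u(\beta)$ with $\beta = \beta^+ - \beta^-$, and the second sum equals $\lambda_u\|\beta\|_1$.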


ℓ1-QR Rate Result

Theorem (Rates)
Under standard conditions, letting $U \subset (0, 1)$ be a compact set, with probability approaching 1,

$$\sup_{u \in U} \|\hat\beta(u) - \beta(u)\|_2 \lesssim_P \sqrt{\frac{s\log(n \vee p)}{n}}.$$

◮ very similar to the uniform rate $\sqrt{s \log n / n}$ when the support is known

◮ similar rates to lasso, but they also hold uniformly over $u \in U$

◮ although the rates are similar, the analysis is substantially different


Application ℓ1-QR: Conditional Independence

Let $X \in \mathbb{R}^m$ denote a random vector. We are interested in determining the conditional independence structure of $X$: for every pair of components $i, j$, determine if

$$X_i \perp\!\!\!\perp X_j \mid X_{-i,-j}.$$

Conditional independence between two components of a Gaussian random vector $X \sim N(0, \Sigma)$ is well understood: for $\Sigma = \mathrm{Var}(X)$, $X_i$ and $X_j$ are independent conditional on the other components if and only if $(\Sigma^{-1})_{ij} = 0$.
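The Gaussian characterization can be checked numerically. The sketch below assumes NumPy and reuses the covariance $\Sigma_{ij} = (1/2)^{|i-j|}$ from the Monte Carlo design purely as an illustration, with a small hypothetical dimension $m = 5$. Its inverse is tridiagonal, so only neighboring components are conditionally dependent.

```python
import numpy as np

rho, m = 0.5, 5
idx = np.arange(m)
Sigma = rho ** np.abs(idx[:, None] - idx[None, :])   # Sigma_ij = rho^{|i-j|}
precision = np.linalg.inv(Sigma)

# Edge (i, j) in the conditional dependence graph iff the (i, j) entry
# of the inverse covariance is nonzero (up to numerical tolerance).
edges = [(i, j) for i in range(m) for j in range(i + 1, m)
         if abs(precision[i, j]) > 1e-10]
print(edges)
```

Every pair of components is marginally correlated under this $\Sigma$, yet the graph contains only the chain of adjacent pairs, which is the distinction between marginal and conditional dependence that the slide relies on.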


We consider a non-Gaussian high-dimensional setting.

$$X_i \perp\!\!\!\perp X_j \mid X_{-i,-j}$$

if and only if

$$Q_{X_i}(u \mid X_{-i}) = Q_{X_i}(u \mid X_{-i,-j}) \quad \text{for all } u \in (0, 1).$$


Application ℓ1-QR: Conditional Independence

In order to approximate conditional independence, consider the following: we say that $X_i$ and $X_j$ are $U$-quantile uncorrelated if

$$\beta^i(u) \in \arg\min_\beta E[\rho_u(X_i - \beta'X_{-i})] \quad \text{and} \quad \beta^j(u) \in \arg\min_\beta E[\rho_u(X_j - \beta'X_{-j})]$$

are such that $\beta^i_j(u) = 0$ and $\beta^j_i(u) = 0$ for all $u \in U$.


Application ℓ1-QR: Conditional Independence

In order to approximate conditional independence, consider:
◮ $U \subset (0, 1)$, a compact set of quantile indices
◮ a class of functions $F$

We will say that

$$X_i \perp\!\!\!\perp_{U,F} X_j \mid X_{-i,-j}$$

if and only if for every $u \in U$

$$f_u^{i,j}(0, X_{-i,-j}) \in \arg\min_{f \in F} E[\rho_u(X_i - f(X_j, X_{-i,-j}))] \quad \text{and}$$

$$f_u^{j,i}(0, X_{-i,-j}) \in \arg\min_{f \in F} E[\rho_u(X_j - f(X_i, X_{-i,-j}))]$$

This approach requires:
◮ a series approximation for $F$;
◮ uniform estimation across all quantile indices $u \in U$;
◮ model selection.


Application ℓ1-QR: Conditional Independence

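Let $X$ be a vector of log returns. Draw an edge $(i, j)$ if $X_i \not\perp\!\!\!\perp X_j \mid X_{-i,-j}$.

[Figure: conditional dependence graph with nodes EWST, BTFG, AEPI, JJSF, DGSE, PLXS, CAU, RMCF, HGR, SUNW, KDE, ORCL, MSFT, TW, OSIP, FUND, LLTC, DJCO, DELL, IBM.]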


Application ℓ1-QR: Conditional Independence

Consider IBM returns first.

[Figure: conditional dependence graph on the same stock tickers, for the IBM returns.]


Application ℓ1-QR: Conditional Independence

Next consider all returns.

[Figure: conditional dependence graph on the same stock tickers, for all returns.]


Application ℓ1-QR: Conditional Independence

Next we investigate how this dependence structure is affected by “downside movement” of the market. This is particularly relevant for hedging, since downside movements of the market are when a hedge is most needed.

We quantify downside movement by conditioning on the observations with the largest 25%, 50%, 75% and 100% downside movement of the market. (The 100% case corresponds to the overall dependence.)
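One plausible reading of this conditioning step is to keep the stated fraction of observations with the lowest market return and to estimate the dependence graph on that subsample only. The sketch below, in plain Python, implements just the subsample selection. The function name, the use of a single market return series, and the toy numbers are assumptions made for illustration.

```python
import math


def downside_indices(market_returns, fraction):
    """Indices of the `fraction` of observations with the lowest market return."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must lie in (0, 1]")
    n = len(market_returns)
    k = math.ceil(fraction * n)                       # number of observations kept
    by_return = sorted(range(n), key=lambda t: market_returns[t])
    return sorted(by_return[:k])                      # restore time order


market = [0.02, -0.05, 0.01, -0.01, -0.03, 0.00, 0.04, -0.02]  # toy market returns
for fraction in (0.25, 0.50, 0.75, 1.00):
    print(fraction, downside_indices(market, fraction))
```

With `fraction = 1.0` every observation is kept, matching the remark that the 100% case corresponds to the overall dependence.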


Application ℓ1-QR: Conditional Independence

100% (No downside movement)

[Figure: conditional dependence graph on the same stock tickers, using all observations.]


Application ℓ1-QR: Conditional Independence

75%

[Figure: conditional dependence graph on the same stock tickers, using the observations with the largest 75% downside movement of the market.]


Application ℓ1-QR: Conditional Independence

50%

[Figure: conditional dependence graph on the same stock tickers, using the observations with the largest 50% downside movement of the market.]


Application ℓ1-QR: Conditional Independence

25%

[Figure: conditional dependence graph on the same stock tickers, using the observations with the largest 25% downside movement of the market.]


Application ℓ1-QR: Conditional Independence

100% vs. 25%

[Figure: two conditional dependence graphs on the same stock tickers, side by side: 100% (left) and 25% (right).]

The right graph is much sparser than the left graph. Thus, when markets are moving down, several variables become (nearly) conditionally independent. Therefore, hedging strategies that do not account for downside movements might not offer the desired protection exactly when it is needed the most.


Variable Length Markov Chains

Consider a dynamic model:

◮ dynamic discrete choice models
◮ discrete stochastic dynamic programming

The main feature we are interested in:

the history of outcomes potentially impacts the probability distribution of the next outcome


Variable Length Markov Chains

Suppose that at each time $t$ agents choose $s_t \in A = \{\text{Low}, \text{High}\}$. Typical assumption: a Markov chain of order 1 (2 conditional probabilities).

[Figure: tree with root at $t = 0$ and leaves L, H at $t = -1$.]


Variable Length Markov Chains

Suppose that at each time $t$ agents choose $s_t \in A = \{\text{Low}, \text{High}\}$. A Markov chain of order 2 requires estimating 4 conditional probabilities.

[Figure: tree with root at $t = 0$; nodes L, H at $t = -1$; leaves LL, LH, HL, HH at $t = -2$.]


Variable Length Markov Chains

Suppose that at each time $t$ agents choose $s_t \in A = \{\text{Low}, \text{High}\}$. A Markov chain of order 3 requires estimating 8 conditional probabilities.

[Figure: tree with root at $t = 0$; nodes L, H at $t = -1$; nodes LL, LH, HL, HH at $t = -2$; leaves LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH at $t = -3$.]


Variable Length Markov Chains

Suppose that at each time $t$ agents choose $s_t \in A = \{\text{Low}, \text{High}\}$. A Markov chain of order $K$ requires estimating $2^K$ conditional probabilities.

[Figure: full binary tree with root at $t = 0$; nodes L, H at $t = -1$; nodes LL, LH, HL, HH at $t = -2$; nodes LLL, LLH, LHL, LHH, HLL, HLH, HHL, HHH at $t = -3$; every branch continues down to $t = -K$.]
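The growth in the number of conditional probabilities is simply the number of distinct histories of length $K$ over the two-letter alphabet. A small sketch in plain Python follows; the names `ALPHABET` and `histories` are illustrative.

```python
from itertools import product

ALPHABET = ("L", "H")  # Low, High


def histories(order):
    """All histories of length `order`, written most recent outcome first."""
    return ["".join(h) for h in product(ALPHABET, repeat=order)]


print(histories(2))
for K in (1, 2, 3, 10, 20):
    # One conditional distribution P(next outcome | history) per history.
    print(K, len(ALPHABET) ** K)
```

Already at $K = 20$ there are more than a million histories, which is why a full $K$-order chain becomes impractical to estimate from a moderate sample.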


Variable Length Markov Chains

Issues:

◮ $K$ is unknown (potentially infinite)
◮ the full $K$-order Markov chain becomes large quickly
◮ similar histories may lead to similar transition probabilities

Motivation:
◮ bias (small model) vs. variance (large model)
◮ make the order of the Markov chain depend on the history
◮ model selection balances misspecification and the length of confidence intervals


Variable Length Markov Chains

Variable Length Markov Chain: very flexible, but we need to select the “context tree”

[Figure: context tree with root at $t = 0$; nodes L, H at $t = -1$; nodes LL, LH, HL, HH at $t = -2$; only LH is split further, into LHL and LHH at $t = -3$, and the branch below LHH continues.]
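A context tree can be stored as a set of leaves, and the relevant context for a given history is the unique leaf that is a prefix of that history. The sketch below, in plain Python, uses a hypothetical tree loosely modeled on the figure, truncated at depth 3; the transition probabilities are made up for illustration and are not estimates from any data.

```python
# Hypothetical context tree. Keys are contexts written most recent outcome first;
# values are illustrative P(next = High | context). The numbers are invented.
CONTEXT_TREE = {
    "LL": 0.20,
    "LHL": 0.35,
    "LHH": 0.55,
    "HL": 0.60,
    "HH": 0.80,
}


def find_context(history, tree=CONTEXT_TREE):
    """Return the unique context that is a prefix of `history` (most recent first)."""
    for depth in range(1, len(history) + 1):
        if history[:depth] in tree:
            return history[:depth]
    raise ValueError(f"history {history!r} is too short to reach a leaf")


def prob_high(history, tree=CONTEXT_TREE):
    """Transition probability attached to the context matching `history`."""
    return tree[find_context(history, tree)]


print(find_context("LHLHH"), prob_high("LHLHH"))   # needs three lags
print(find_context("HHLLL"), prob_high("HHLLL"))   # needs only two lags
```

This tree has 5 leaves instead of the 8 required by a full order-3 chain, which illustrates how letting the order depend on the history reduces the number of parameters.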


Variable Length Markov Chains

A penalty method based on (uniform) confidence intervals.

Uniform estimation of transition probabilities over all possible histories $x \in A_{-\infty}^{-1}$:

$$\sup_{x \in A_{-\infty}^{-1}} \|\hat P(\cdot \mid x) - P(\cdot \mid x)\|_\infty$$

(the bounds achieve the minimax rates for some classes)

These results make it possible to establish:
◮ in dynamic discrete choice models: uniform estimation of dynamic marginal effects
◮ in discrete stochastic dynamic programming: uniform estimation of the value function


Conclusion


1. ℓ1 and post-ℓ1-penalization methods
  ◮ offer a good way to control for confounding factors;
  ◮ offer a good way to select the series terms;
  ◮ are especially useful when $p \ge n$

2. ℓ1-penalized estimators achieve near-oracle rates

3. post-ℓ1-penalized estimators achieve at least near-oracle rates

4. applications of model selection in econometric estimators


References

◮ Bickel, Ritov and Tsybakov, “Simultaneous analysis of Lasso and Dantzig selector,” Annals of Statistics, 2009.

◮ Candes and Tao, “The Dantzig selector: statistical estimation when p is much larger than n,” Annals of Statistics, 2007.

◮ Meinshausen and Yu, “Lasso-type recovery of sparse representations for high-dimensional data,” Annals of Statistics, 2009.

◮ Tibshirani, “Regression shrinkage and selection via the Lasso,” J. Roy. Statist. Soc. Ser. B, 1996.


◮ Belloni and Chernozhukov, “ℓ1-Penalized Quantile Regression for High Dimensional Sparse Models,” 2007, Annals of Statistics.

◮ Belloni and Chernozhukov, “Post-ℓ1-Penalized Estimators in High-Dimensional Linear Regression Models,” ArXiv, 2009.

◮ Belloni, Chernozhukov and Wang, “√LASSO: Pivotal Recovery of Sparse Signals via Conic Programming,” Biometrika, 2011.

◮ Belloni and Chernozhukov, “High Dimensional Sparse Econometric Models: An Introduction,” Ill-Posed Inverse Problems and High-Dimensional Estimation, Springer Lecture Notes in Statistics, 2011.

◮ Belloni, Chernozhukov, and Hansen, “Estimation and Inference Methods for High-dimensional Sparse Econometric Models: a Review,” 2011, submitted.

◮ Belloni and Oliveira, “Approximate Group Context Tree: application to dynamic programming and dynamic choice models,” 2011, submitted.
