QUANTILE AND PROBABILITY CURVES WITHOUT CROSSING Victor Chernozhukov Iván Fernández-Val Alfred Galichon THE INSTITUTE FOR FISCAL STUDIES DEPARTMENT OF ECONOMICS, UCL cemmap working paper CWP10/07


QUANTILE AND PROBABILITY CURVES WITHOUT CROSSING

VICTOR CHERNOZHUKOV† IVAN FERNANDEZ-VAL§ ALFRED GALICHON‡

Abstract. The most common approach to estimating conditional quantile curves is to fit a curve,

typically linear, pointwise for each quantile. Linear functional forms, coupled with pointwise fitting, are

used for a number of reasons including parsimony of the resulting approximations and good computa-

tional properties. The resulting fits, however, may not respect a logical monotonicity requirement – that

the quantile curve be increasing as a function of probability. This paper studies the natural monotoniza-

tion of these empirical curves induced by sampling from the estimated non-monotone model, and then

taking the resulting conditional quantile curves that by construction are monotone in the probability.

This construction of monotone quantile curves may be seen as a bootstrap and also as a monotonic re-

arrangement of the original non-monotone function. It is shown that the monotonized curves are closer

to the true curves in finite samples, for any sample size. Under correct specification, the rearranged

conditional quantile curves have the same asymptotic distribution as the original non-monotone curves.

Under misspecification, however, the asymptotics of the rearranged curves may partially differ from the

asymptotics of the original non-monotone curves. An analogous procedure is developed to monotonize

the estimates of conditional distribution functions. The results are derived by establishing the compact

(Hadamard) differentiability of the monotonized quantile and probability curves with respect to the

original curves in discontinuous directions, tangentially to a set of continuous functions. In doing so,

the compact differentiability of the rearrangement-related operators is established.

Keywords: Quantile regression, Monotonicity, Rearrangement, Approximation, Functional Delta

Method, Hadamard Differentiability of Rearrangement Operators.

AMS 2000 subject classification: Primary 62J02; Secondary 62E20, 62P20

Date: First version is of April 6, 2005. The last modification was done April 27, 2007. The title

of this paper is (partially) borrowed from the work of Xuming He (1997), to whom we are grateful

for the inspiration and formulation of the problem. We would like to thank Josh Angrist, Andrew

Chesher, Phil Cross, Raymond Guiteras, Xuming He, Roger Koenker, Vadim Marmer, Ilya Molchanov,

Francesca Molinari, Whitney Newey, Steve Portnoy, Shinichi Sakata, Art Shneyerov, Alp Simsek, and

seminar participants at BU, Columbia, Cornell, Georgetown, Harvard-MIT, MIT, Northwestern, UBC,

and UCL for very useful comments that helped improve the paper.


1. Introduction and Discussion

The problem studied in this paper can be best described using linear quantile regression as the prime example (Koenker, 2005). Suppose that $x'\beta(u)$ is a linear approximation to the $u$-quantile $Q_0(u|x)$ of a real response variable $Y$, given a vector of regressors $X = x$. The typical estimation methods fit the conditional curve $x'\beta(u)$ pointwise in $u \in (0,1)$, producing an estimate $x'\hat\beta(u)$. Linear functional forms, coupled with pointwise fitting, are used for a number of reasons including parsimony of the resulting approximations and good computational properties (Portnoy and Koenker, 1997). However, a problem that might occur is that the map

$$u \mapsto x'\hat\beta(u)$$

may not be increasing in $u$, which violates the logical monotonicity requirement. Another manifestation of this issue, known as the "quantile crossing problem" (He, 1997), is that the conditional quantile curves $x \mapsto x'\hat\beta(u)$ may cross for different values of $u$.

In the analysis we shall distinguish the following two cases, each leading to the lack

of monotonicity or the crossing problem:

(1) Monotonically correct case: The population curve $u \mapsto x'\beta(u)$ is increasing in $u$, and thus satisfies the monotonicity requirement. However, the empirical curve $u \mapsto x'\hat\beta(u)$ may be non-monotone due to estimation error.

(2) Monotonically incorrect case: The population curve $u \mapsto x'\beta(u)$ is non-monotone due to imperfect approximation to the true conditional quantile function. Accordingly, the resulting empirical curve $u \mapsto x'\hat\beta(u)$ is also non-monotone due to both non-monotonicity of the population curve and estimation error.

Consider the random variable

$$Y_x := x'\hat\beta(U) \quad \text{where } U \sim U(0,1).$$

This variable can be seen as a bootstrap draw from the estimated quantile regression model, as in Koenker (1994), and has the distribution function

$$\hat F(y|x) = \int_0^1 1\{x'\hat\beta(u) \le y\}\,du. \tag{1.1}$$


Moreover, inverting the distribution function, one obtains a proper quantile function

$$\hat F^{-1}(u|x) = \inf\{y : \hat F(y|x) \ge u\}, \tag{1.2}$$

which is monotone in $u$. The rearranged quantile function $\hat F^{-1}(u|x)$ coincides with the original curve $x'\hat\beta(u)$ if the original curve is increasing in $u$, but differs from the original curve otherwise. Thus, starting with a possibly non-monotone original curve $u \mapsto x'\hat\beta(u)$, the rearrangement (1.1)-(1.2) produces a monotone quantile curve $u \mapsto \hat F^{-1}(u|x)$. In what follows, we focus our attention on the interval $(0,1)$ without loss of generality. Indeed, any closed subinterval of $(0,1)$ can be treated in the same way as the unit interval, as commented further in Section 2.
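The rearrangement (1.1)-(1.2) can be sketched numerically. The following is a minimal illustration, not the authors' implementation: it assumes the curve is evaluated on an equally spaced midpoint grid of $u$, in which case the quantile function of the values of the curve at a uniform draw from the grid is simply the sorted vector of values. The curve `original` and the name `rearrange_quantile` are hypothetical and chosen only for illustration.

```python
import numpy as np

def rearrange_quantile(curve_values):
    """Monotone rearrangement of a curve sampled on an equally spaced grid of u.

    If U is uniform on the grid, the quantile function of curve(U)
    evaluated at the grid points is the sorted vector of curve values.
    """
    return np.sort(np.asarray(curve_values, dtype=float))

# Hypothetical non-monotone fitted curve on a midpoint grid of (0, 1).
n_grid = 1000
u = (np.arange(n_grid) + 0.5) / n_grid
original = u + 0.15 * np.sin(6 * np.pi * u)  # slope turns negative on parts of (0, 1)
rearranged = rearrange_quantile(original)

is_monotone_before = bool(np.all(np.diff(original) >= 0))
is_monotone_after = bool(np.all(np.diff(rearranged) >= 0))
```

Sorting leaves an already increasing curve unchanged, which mirrors the statement that the rearranged curve coincides with the original curve when the latter is increasing.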

As mentioned above, this rearrangement mechanism has a direct relation to the quan-

tile regression bootstrap (Koenker, 1994), since the rearranged quantile curve is produced

by sampling from the estimated original quantile model. Moreover, both the mechanism

and its name have a direct relation to rearrangement maps in variational analysis and op-

erations research (e.g., Hardy, Littlewood, and Polya, 1952, and Villani, 2003). Further

important references on the rearrangement method are discussed below.

The purpose of this paper is to establish the empirical properties of the rearranged quantile curves and their distribution counterparts:

$$u \mapsto \hat F^{-1}(u|x) \quad \text{and} \quad y \mapsto \hat F(y|x),$$

under scenarios (1) and (2). The paper also characterizes certain analytical and approximation properties of the corresponding population curves:

$$u \mapsto F^{-1}(u|x) = \inf\{y : F(y|x) \ge u\}, \quad \text{and} \quad y \mapsto F(y|x) = \int_0^1 1\{x'\beta(u) \le y\}\,du.$$

The first main result of the paper establishes the improved estimation properties of the rearranged curves. We show that the rearranged curve $\hat F^{-1}(u|x)$ is closer to the true conditional quantile curve $Q_0(u|x)$ than the original curve. Formally, for each $x$, we have that for all $p \in [1,\infty]$

$$\left(\int_{\mathcal{U}} |Q_0(u|x) - \hat F^{-1}(u|x)|^p\,du\right)^{1/p} \le \left(\int_{\mathcal{U}} |Q_0(u|x) - x'\hat\beta(u)|^p\,du\right)^{1/p},$$

where the inequality is strict for $p \in (1,\infty)$ whenever $u \mapsto x'\hat\beta(u)$ is decreasing on a subset of $\mathcal{U} := (0,1)$ of positive Lebesgue measure, while $u \mapsto Q_0(u|x)$ is strictly increasing. This property is independent of the sample size, and thus continues to hold in the population, and also regardless of whether the linear quantile estimator $x'\hat\beta(u)$ estimates $Q_0(u|x)$ consistently or not, i.e. whether $Q_0(u|x) = x'\beta(u)$ or $Q_0(u|x) \ne x'\beta(u)$. In other words, the rearranged quantile curves have smaller estimation error than the original curves whenever the latter are not monotone. This is a very important property that does not depend on the way the quantile model is estimated. It also does not rely on any other specifics of the current context and is therefore applicable quite generally.
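The inequality above can be checked numerically on a grid. The sketch below is only an illustration under stated assumptions: the "true" curve `true_curve` and the non-monotone `estimate` are hypothetical, the integrals are approximated by averages over a uniform midpoint grid, and the rearrangement is computed by sorting.

```python
import numpy as np

def lp_distance(a, b, p):
    """Approximate L^p distance between two curves sampled on a uniform grid of (0, 1)."""
    diff = np.abs(np.asarray(a) - np.asarray(b))
    if np.isinf(p):
        return float(diff.max())
    return float(np.mean(diff ** p) ** (1.0 / p))

n_grid = 2000
u = (np.arange(n_grid) + 0.5) / n_grid
true_curve = u ** 2 + u                              # hypothetical Q_0(u|x), strictly increasing
estimate = true_curve + 0.2 * np.sin(8 * np.pi * u)  # hypothetical non-monotone estimate
rearranged = np.sort(estimate)                       # rearranged estimate on the grid

# For each p, store (error of rearranged curve, error of original curve).
errors = {p: (lp_distance(true_curve, rearranged, p),
              lp_distance(true_curve, estimate, p))
          for p in (1, 2, 4, np.inf)}
```

In this sketch the first entry of each pair should not exceed the second, and for $p = 2$ the reduction is strict because the hypothetical estimate is decreasing on part of the grid.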

Towards describing the essence of the rest of the results, let us fix the value of the regressor $X$ to $x$. Suppose that $\hat\beta(u)$ is an estimator for $\beta(u)$ that converges weakly to a Gaussian process $G(u)$, so that

$$\sqrt{n}\,x'(\hat\beta(u) - \beta(u)) \Rightarrow x'G(u), \tag{1.3}$$

as a stochastic process indexed by $u$ in the metric space of bounded functions $\ell^\infty(0,1)$. For sufficient conditions, see, for example, Gutenbrunner and Jureckova (1992), Portnoy (1991), and Angrist, Chernozhukov, and Fernandez-Val (2006).

The second main result of the paper is that in the monotonically correct case (1),

$$\sqrt{n}(\hat F(y|x) - F(y|x)) \Rightarrow F'(y|x)\,[x'G(F(y|x))], \tag{1.4}$$

as a stochastic process indexed by $y$ in the metric space $\ell^\infty(\mathcal{Y})$, where $\mathcal{Y}$ is the support of $Y_x$; and

$$F'(y|x) = \frac{1}{x'\beta'(u)}\bigg|_{u = F(y|x)}, \quad \text{with } \beta'(u) := \frac{\partial \beta(u)}{\partial u}.$$

Moreover, we show that

$$\sqrt{n}(\hat F^{-1}(u|x) - F^{-1}(u|x)) \Rightarrow x'G(u), \tag{1.5}$$

as a stochastic process indexed by $u$, in $\ell^\infty(0,1)$; which, remarkably, coincides with the first order asymptotics (1.3) of the original curve. This result has a convenient practical implication: if the population curve is monotone, then the empirical non-monotone curve can be rearranged to be monotonic without affecting its (first order) asymptotic properties. To derive the above results we find the functional Hadamard derivatives of $F(y|x)$ and $F^{-1}(u|x)$ with respect to perturbations of the underlying curve $x'\beta(u)$ in discontinuous directions, tangentially to the set of continuous functions, and then use the functional delta method. Establishing the Hadamard differentiability of the rearranged distribution and quantile curves in discontinuous directions is the second main theoretical result of the paper.

The third main result is that in the monotonically incorrect case

$$\sqrt{n}(\hat F(y|x) - F(y|x)) \Rightarrow \sum_{k=1}^{K(y|x)} \frac{x'G(u_k(y|x))}{|x'\beta'(u_k(y|x))|}, \tag{1.6}$$

as a stochastic process indexed by $y \in \mathcal{K}$, in $\ell^\infty(\mathcal{K})$, where $\mathcal{K}$ is an appropriate set defined in the next section. Here $u_1(y|x) < \ldots < u_{K(y|x)}(y|x)$ are solutions to the equation $y = x'\beta(u)$, assuming that $K(y|x)$ is bounded. Similarly, for the rearranged quantile curve,

$$\sqrt{n}(\hat F^{-1}(u|x) - F^{-1}(u|x)) \Rightarrow \left(\sum_{k=1}^{K(y|x)} \frac{1}{|x'\beta'(u_k(y|x))|}\right)^{-1} \sum_{k=1}^{K(y|x)} \frac{x'G(u_k(y|x))}{|x'\beta'(u_k(y|x))|}\Bigg|_{y = F^{-1}(u|x)}, \tag{1.7}$$

as a stochastic process indexed by $u \in \mathcal{K}'$, in $\ell^\infty(\mathcal{K}')$, where $\mathcal{K}'$ is an appropriate set defined in the next section.

Analogously to quantiles, most estimation methods for conditional distribution functions do not impose monotonicity, and therefore can give rise to non-monotonic empirical conditional distribution curves; see, for example, Hall, Wolff, and Yao (1999). A similar monotone rearrangement can be applied to these distribution curves by exchanging the roles played by the quantile and the probability spaces. Thus, suppose that $\hat P(y|x)$ is a candidate estimate of a conditional distribution function, which is not monotone in $y$. The rearranged monotone quantile curve associated with $\hat P(y|x)$ is

$$\hat Q(u|x) = \int_0^\infty 1\{\hat P(y|x) \le u\}\,dy - \int_{-\infty}^0 1\{\hat P(y|x) > u\}\,dy.$$

The rearranged probability curve can then be obtained as the inverse of this quantile curve, i.e.,

$$\hat F(y|x) = \inf\{u : \hat Q(u|x) \ge y\},$$

which is monotone by construction. Section 3 shows in more detail that similar improved estimation properties and an asymptotic distribution theory go through for $\hat Q$ and $\hat F$.
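The rearrangement of a distribution curve can be sketched in the same spirit. The code below is a rough illustration under assumptions that are not in the paper: the candidate curve `p_hat` is hypothetical, it is evaluated on an equally spaced midpoint grid of $y$ that covers the region of interest, and the two integrals in the quantile formula are replaced by Riemann sums. On such a grid the rearranged probability curve reduces to the sorted values of the candidate curve.

```python
import numpy as np

def rearranged_quantile_from_cdf(y_grid, p_values, u):
    """Riemann-sum version of Q(u) = int_0^inf 1{P(y) <= u} dy - int_{-inf}^0 1{P(y) > u} dy.

    Assumes y_grid is equally spaced and that no grid point equals zero.
    """
    dy = y_grid[1] - y_grid[0]
    positive = y_grid > 0
    return dy * np.sum(p_values[positive] <= u) - dy * np.sum(p_values[~positive] > u)

# Hypothetical non-monotone candidate distribution curve on a midpoint grid of [-6, 6].
y_grid = -6.0 + (np.arange(1200) + 0.5) * 0.01
p_hat = 1.0 / (1.0 + np.exp(-y_grid)) + 0.05 * np.sin(3.0 * y_grid)

# Rearranged quantile curve on a grid of probabilities.
u_grid = np.linspace(0.05, 0.95, 19)
q_rearranged = np.array([rearranged_quantile_from_cdf(y_grid, p_hat, u) for u in u_grid])

# Rearranged probability curve at the grid points of y: the sorted candidate values.
cdf_rearranged = np.sort(p_hat)
```

Both `q_rearranged` and `cdf_rearranged` are nondecreasing by construction, even though `p_hat` is not.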

The distributional results in the paper do not rely on the sampling properties of the

particular estimation method used, because they are expressed in terms of the differ-

entiability of the operator with respect to the basic estimated process. Moreover, the

results that follow are derived without imposing linearity of the functional forms. The

only conditions required are that (1) a central limit theorem like (1.3) applies to the

estimator of the curve, and (2) the population curves have some smoothness properties.

The exact nature of these population curves does not affect the validity of the results.

For example, the results hold regardless of whether the underlying model is an ordinary

or an instrumental quantile regression model.

There exist other methods to obtain monotonic fits based on quantile regression.

He (1997), for example, proposes to impose a location-scale regression model, which

naturally satisfies monotonicity. This approach is fruitful for location-scale situations,

but in numerous cases data do not satisfy the location-scale model, as discussed, for

example, in Lehmann (1974), Doksum (1974), and Koenker (2005). Koenker and Ng

(2005) develop a computational method for quantile regression that imposes the non-

crossing constraints in simultaneous fitting of quantile curves. This approach may be

fruitful in many situations, but the statistical properties of the method remain unknown.

Clearly, Koenker and Ng’s proposal is different from the rearrangement method.

The distributional results obtained in the paper can also be viewed as a functional

delta method for the rearrangement-related operators (1.1) and (1.2) that include the

inverse (quantile) operators as a special case. In this sense, they extend the previous

results by Gill and Johansen (1990), Doss and Gill (1992), and Dudley and Norvaisa

(1999) on compact differentiability of the quantile operator. The main technical difficulty

here, as well as in the quantile case, is that differentiability needs to be established in

discontinuous directions (that converge to continuous directions, i.e., tangentially to

the set of continuous functions), because the empirical perturbations of the quantile

processes are typically step functions.

Both the statistical and mathematical results of this paper complement the important

work of Dette, Neumeyer, and Pilz (2006), which applies the rearrangement operators to


kernel mean regressions, in order to obtain mean regression functions that are monotonic

in the regressors. Our results on Hadamard differentiability in discontinuous directions

are new. They complement the local expansions in smooth directions subsumed in the

proofs in Dette et al. for the case called here the monotonically correct case. In

addition, our results cover monotonically incorrect cases. The statistical problem also

differs quite substantially. The mathematical results of this paper also complement the

results on directional differentiability of L1-functionals of rearranged functions like (1.2)

by Mossino and Temam (1981). The results for L1-functionals do not imply the main

results of this paper, such as (1.4)-(1.7), but the converse is shown to be true. (See

discussion after Proposition 4 in Section 2 for more details.)

There are many potential applications of the estimation and differentiability results of

the paper to objects other than probability or quantile curves. For example, in a com-

panion work we present applications to economic demand and production functions and

to biometric growth curves, where monotonization is used to impose useful theoretical

or logical restrictions (Chernozhukov, Fernandez-Val, and Galichon, 2006a).

We organize the rest of the paper as follows. In Section 2.1 we describe some basic

analytical properties of the rearranged population curves. In Section 2.2 we derive the

functional differentiability results. In Section 2.3 we present estimation properties of

the rearranged curves and establish their limit distributions. In Section 3 we extend

the previous results to monotonize estimates of distribution curves. In Section 4.1 we

illustrate the rearrangement procedure with an empirical application, and in Section 4.2

we provide a Monte-Carlo example. In Section 5 we conclude with a summary of the

main results.

2. Rearranged Quantile Curves: Analytical and Empirical Properties

In this section the treatment of the problem is somewhat more general than in the introduction. In particular, we replace the linear functional form $x'\beta(u)$ by $Q(u|x)$. Define $Y_x := Q(U|x)$, where $U \sim \text{Uniform}(\mathcal{U})$ with $\mathcal{U} = (0,1)$. Let $F(y|x) := \int_0^1 1\{Q(u|x) \le y\}\,du$ be the distribution function of $Y_x$, and $F^{-1}(u|x) := \inf\{y : F(y|x) \ge u\}$ be the quantile function of $Y_x$.


Remark. We consider the interval $(0,1)$ without loss of generality. Indeed, suppose we are interested in a particular subinterval $(a, a+b)$ of $(0,1)$. For example, we may wish to focus estimation on a particular range of quantiles or to avoid estimation of tail quantiles. For this purpose, we define all objects conditionally on the event $U \in (a, a+b)$: $\tilde Y_x := \tilde Q(U|x) = Q(a + bU|x)$, where $U \sim U(0,1)$, $\tilde F(y|x) := \int_0^1 1\{\tilde Q(u|x) \le y\}\,du$, and $\tilde F^{-1}(u|x) := \inf\{y : \tilde F(y|x) \ge u\}$ for $u \in (0,1)$. The analysis of the paper applies to the functions $\tilde Q$ and $\tilde F^{-1}$. In order to go back to the unconditional quantities, we can use the transformations $Q(u|x) = \tilde Q((u-a)/b\,|\,x)$ for $u \in (a, a+b)$ and $F(y|x) = a + b\tilde F(y|x)$ for $y \in \{Q(u|x) : u \in (a, a+b)\}$.
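The rescaling in the remark can be illustrated with a short sketch. Everything concrete here is an assumption made for illustration: the curve `Q`, the subinterval endpoints `a` and `b`, and the grid size are hypothetical, and the rearrangement is again computed by sorting values on a uniform midpoint grid.

```python
import numpy as np

a, b = 0.1, 0.8  # hypothetical subinterval (a, a + b) = (0.1, 0.9)

def Q(u):
    # Hypothetical non-monotone curve, used only for illustration.
    return u + 0.15 * np.sin(6 * np.pi * u)

n_grid = 1000
v = (np.arange(n_grid) + 0.5) / n_grid  # grid on the rescaled unit interval
Q_tilde = Q(a + b * v)                   # conditional curve: v -> Q(a + b v)
F_tilde_inv = np.sort(Q_tilde)           # rearranged conditional quantile curve

def F_tilde(y):
    """Conditional distribution function: share of the grid with Q_tilde <= y."""
    return float(np.mean(Q_tilde <= y))

def F_back(y):
    """Map the conditional distribution back to the original probability scale."""
    return a + b * F_tilde(y)
```

With these definitions, `F_back` ranges over $[a, a+b]$ as $y$ moves across the values taken by the curve on the subinterval.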

2.1. Basic Analytical Properties. We start by developing some basic properties for

F (y|x) and F−1(u|x), the population counterparts of the rearranged distribution curve

and its inverse. We need these properties to derive various empirical properties stated

in the next section.

Recall first the following definitions from Milnor (1965): Let $g : \mathcal{U} \subset \mathbb{R} \to \mathbb{R}$ be a continuously differentiable function. A point $u \in \mathcal{U}$ is called a regular point of $g$ if the derivative of $g$ at this point does not vanish, i.e., $g'(u) \ne 0$. A point $u$ which is not a regular point is called a critical point. A value $y \in g(\mathcal{U})$ is called a regular value of $g$ if $g^{-1}(\{y\})$ contains only regular points, i.e., if $g'(u) \ne 0$ for all $u \in g^{-1}(\{y\})$. A value $y$ which is not a regular value is called a critical value.

Denote by $\mathcal{Y}_x$ the support of $Y_x$, $\mathcal{YX} := \{(y,x) : y \in \mathcal{Y}_x, x \in \mathcal{X}\}$, and $\mathcal{UX} := \mathcal{U} \times \mathcal{X}$. We assume throughout that $\mathcal{Y}_x \subset \mathcal{Y}$, which is a compact subset of $\mathbb{R}$, and that $x \in \mathcal{X}$, a compact subset of $\mathbb{R}^d$. In some applications the curves of interest are not functions of $x$, or we might be interested in a particular value $x$. In this case, the set $\mathcal{X}$ is taken to be a singleton $\mathcal{X} = \{x\}$. We make the following assumptions about $Q(u|x)$:

(a) $Q(u|x) : \mathcal{U} \times \mathcal{X} \to \mathbb{R}$ is a continuously differentiable function in both arguments,

(b) For each $x \in \mathcal{X}$, the number of elements of $\{u \in \mathcal{U} \mid Q'(u|x) = 0\}$ is finite and uniformly bounded on $x \in \mathcal{X}$.

Assumption (b) implies that, for each $x \in \mathcal{X}$, $Q'(u|x)$ is nonzero almost everywhere on $\mathcal{U}$ and can only switch sign a bounded number of times. Let $\mathcal{Y}^*_x$ be the subset of regular values of $u \mapsto Q(u|x)$ in $\mathcal{Y}_x$, and $\mathcal{YX}^* := \{(y,x) : y \in \mathcal{Y}^*_x, x \in \mathcal{X}\}$.


Proposition 1 (Basic Properties of $F(y|x)$ and $F^{-1}(u|x)$). Under assumptions (a)-(b), the functions $F(y|x)$ and $F^{-1}(u|x)$ satisfy the following properties:

1. The set of critical values, $\mathcal{Y}_x \setminus \mathcal{Y}^*_x$, is finite, and $\int_{\mathcal{Y}_x \setminus \mathcal{Y}^*_x} dF(y|x) = 0$.

2. For any $y \in \mathcal{Y}^*_x$,

$$F(y|x) = \sum_{k=1}^{K(y|x)} \text{sign}\{Q'(u_k(y|x)|x)\}\,u_k(y|x) + 1\{Q'(u_{K(y|x)}(y|x)|x) < 0\},$$

where $\{u_k(y|x), \text{ for } k = 1, \ldots, K(y|x) < \infty\}$ are the roots of $Q(u|x) = y$ in increasing order.

3. For any $y \in \mathcal{Y}^*_x$, the ordinary derivative $f(y|x) = \partial F(y|x)/\partial y$ exists and takes the form

$$f(y|x) = \sum_{k=1}^{K(y|x)} \frac{1}{|Q'(u_k(y|x)|x)|},$$

which is continuous at each $y \in \mathcal{Y}^*_x$. For any $y \in \mathcal{Y} \setminus \mathcal{Y}^*_x$, set $f(y|x) := 0$. $F(y|x)$ is absolutely continuous and strictly increasing in $y \in \mathcal{Y}_x$. Moreover, $f(y|x)$ is a Radon-Nikodym derivative of $F(y|x)$ with respect to the Lebesgue measure.

4. The quantile function $F^{-1}(u|x)$ partially coincides with $Q(u|x)$; namely

$$F^{-1}(u|x) = Q(u|x),$$

provided that $Q(u|x)$ is increasing at $u$, and the equation $Q(u|x) = y$ has a unique solution for $y = F^{-1}(u|x)$.

5. The quantile function $F^{-1}(u|x)$ is equivariant to location and scale transformations of $Q(u|x)$.

6. The quantile function $F^{-1}(u|x)$ has an ordinary continuous derivative

$$1/f(F^{-1}(u|x)|x),$$

when $F^{-1}(u|x) \in \mathcal{Y}^*_x$. This function is also a Radon-Nikodym derivative with respect to the Lebesgue measure.

7. The map $(y,x) \mapsto F(y|x)$ is continuous on $\mathcal{YX}$ and the map $(u,x) \mapsto F^{-1}(u|x)$ is continuous on $\mathcal{UX}$.


The following simple example illustrates some of these basic properties in a situation where the initial population pseudo-quantile curve is highly non-monotone. Consider the following pseudo-quantile function:

$$Q(u) = 5\left(u + \frac{1}{\pi}\sin(2\pi u)\right). \tag{2.1}$$

The left panel of Figure 1 shows that this function is non-monotone in $[0,1]$. In particular, the slope of $Q(u)$ changes sign twice, at $u = 1/3$ and $u = 2/3$. The rearranged quantile curve $F^{-1}(u)$, also plotted in this panel, is continuous and monotonically increasing.

The results 1, 2, 4 and 7 of the proposition are illustrated in the right panel of Figure 1, which plots the original and rearranged distribution curves. Here we can see that the rearranged distribution function is continuous, does not have mass points, and coincides with the original curve for values of $y$ where the original curve is one-to-one and increasing.

Figure 2 illustrates the third and sixth results of the proposition by plotting the sparsity function for $F^{-1}(u)$ and the density function of $F(y)$. The derivative of $F(y)$ in the right panel is continuous at the regular values of $Q(u)$. Similarly, the sparsity function for $F^{-1}(u)$ in the left panel is continuous at the corresponding image values (under $F(y)$).
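The formulas in results 2 and 3 of Proposition 1 can be checked numerically for the example curve (2.1). The sketch below is an illustration only: the grid size, the evaluation point `y = 2.2`, and the root finder (sign changes on a grid refined by linear interpolation) are assumptions made here, not part of the paper.

```python
import numpy as np

def Q(u):
    # Pseudo-quantile curve from example (2.1).
    return 5.0 * (u + np.sin(2 * np.pi * u) / np.pi)

def dQ(u):
    # Derivative of Q; it vanishes at u = 1/3 and u = 2/3.
    return 5.0 * (1.0 + 2.0 * np.cos(2 * np.pi * u))

def roots_of(y, grid):
    """Roots of Q(u) = y: sign changes on the grid, refined by linear interpolation."""
    g = Q(grid) - y
    idx = np.where(g[:-1] * g[1:] < 0)[0]
    return grid[idx] - g[idx] * (grid[idx + 1] - grid[idx]) / (g[idx + 1] - g[idx])

def F_direct(y, grid):
    """Direct evaluation of F(y) = int_0^1 1{Q(u) <= y} du as a grid average."""
    return float(np.mean(Q(grid) <= y))

grid = np.linspace(0.0, 1.0, 200001)
y = 2.2                      # a regular value with three roots
u_k = roots_of(y, grid)      # roots in increasing order

# Result 2: signed sum of the roots plus an indicator for a decreasing last branch.
F_formula = float(np.sum(np.sign(dQ(u_k)) * u_k) + (dQ(u_k[-1]) < 0))
# Result 3: density as the sum of reciprocal absolute slopes at the roots.
f_formula = float(np.sum(1.0 / np.abs(dQ(u_k))))
```

Comparing `F_formula` with `F_direct(y, grid)`, and `f_formula` with a finite difference of `F_direct`, gives a simple consistency check of the two closed forms at a regular value.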

2.2. Functional Derivatives. Next, we establish the main results of the paper on

Hadamard differentiability of F (y|x) and F−1(u|x) with respect to Q(u|x), tangentially

to the space of continuous functions on UX . This differentiability property is important

for deriving the asymptotic distributions of the rearranged estimates. In particular, the

property allows us to establish generic convergence results for rearranged curves based on

any initial quantile estimator, provided the initial estimator satisfies a functional central

limit theorem. The property also implies that the bootstrap is valid for performing

inference on the rearranged estimates, provided the bootstrap is valid for the initial

estimates. This result follows from the functional delta method for the bootstrap (e.g.,

Theorem 13.9 in van der Vaart, 1998).

In what follows, $\ell^\infty(\mathcal{UX})$ denotes the set of bounded and measurable functions $h : \mathcal{UX} \to \mathbb{R}$, $C(\mathcal{UX})$ denotes the set of continuous functions $h : \mathcal{UX} \to \mathbb{R}$, and $\ell^1(\mathcal{UX})$ denotes the set of measurable functions $h : \mathcal{UX} \to \mathbb{R}$ such that $\int_{\mathcal{U}} \int_{\mathcal{X}} |h(u|x)|\,du\,dx < \infty$, where $du$ and $dx$ denote the integration with respect to the Lebesgue measure on $\mathcal{U}$ and $\mathcal{X}$, respectively.

Figure 1. Left: The pseudo-quantile function $Q(u)$ and the rearranged quantile function $F^{-1}(u)$, plotted against $u$. Right: The pseudo-distribution function $Q^{-1}(y)$ and the rearranged distribution function $F(y)$, plotted against $y$.

Figure 2. Left: The density (sparsity) function $1/f(F^{-1}(u))$ of the rearranged quantile function $F^{-1}(u)$, plotted against $u$. Right: The density function $f(y)$ of the rearranged distribution function $F(y)$, plotted against $y$.

Proposition 2 (Hadamard Derivative of $F(y|x)$ with respect to $Q(u|x)$). Define $F(y|x, h_t) := \int_0^1 1\{Q(u|x) + t h_t(u|x) \le y\}\,du$. Under assumptions (a)-(b), as $t \to 0$,

$$D_{h_t}(y|x,t) = \frac{F(y|x,h_t) - F(y|x)}{t} \to D_h(y|x), \tag{2.2}$$

$$D_h(y|x) := -\sum_{k=1}^{K(y|x)} \frac{h(u_k(y|x)|x)}{|Q'(u_k(y|x)|x)|}. \tag{2.3}$$

The convergence holds uniformly in any compact subset of $\mathcal{YX}^* := \{(y,x) : y \in \mathcal{Y}^*_x, x \in \mathcal{X}\}$, for every $|h_t - h|_\infty \to 0$, where $h_t \in \ell^\infty(\mathcal{UX})$, and $h \in C(\mathcal{UX})$.

Proposition 3 (Hadamard Derivative of $F^{-1}(u|x)$ with respect to $Q(u|x)$). Under assumptions (a)-(b), as $t \to 0$,

$$\tilde D_{h_t}(u|x,t) := \frac{F^{-1}(u|x,h_t) - F^{-1}(u|x)}{t} \to \tilde D_h(u|x), \tag{2.4}$$

$$\tilde D_h(u|x) := -\frac{1}{f(F^{-1}(u|x)|x)} \cdot D_h(F^{-1}(u|x)|x). \tag{2.5}$$

The convergence holds uniformly in any compact subset of $\mathcal{UX}^* = \{(u,x) : (F^{-1}(u|x), x) \in \mathcal{YX}^*\}$, for every $|h_t - h|_\infty \to 0$, where $h_t \in \ell^\infty(\mathcal{UX})$, and $h \in C(\mathcal{UX})$.

The convergence results hold uniformly on regions that exclude the critical values of the mapping $u \mapsto Q(u|x)$. At the critical values, $Q(u|x)$ possibly changes from increasing to decreasing. Moreover, in the monotonically correct case (1), the following result is worth emphasizing:

Corollary 1 (Monotonically correct case). Suppose $u \mapsto Q(u|x)$ has $Q'(u|x) > 0$ for each $(u,x) \in \mathcal{UX}$. Then $\mathcal{YX}^* = \mathcal{YX}$ and $\mathcal{UX}^* = \mathcal{UX}$. Therefore, the convergence in Propositions 2 and 3 holds uniformly over the entire $\mathcal{YX}$ and $\mathcal{UX}$, respectively. Moreover, $\tilde D_h(u|x) = h(u|x)$, i.e., the Hadamard derivative of the rearranged quantile with respect to the original curve is the identity operator.
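The derivative formula of Proposition 2 can be compared with a finite-difference quotient. The sketch below is a numerical illustration under assumptions made here: it uses the example curve (2.1), a hypothetical smooth direction `h` that does not vary with `t`, a fixed small step `t`, and a grid-based root finder. It does not exercise the discontinuous directions that the propositions allow.

```python
import numpy as np

def Q(u):
    # Pseudo-quantile curve from example (2.1).
    return 5.0 * (u + np.sin(2 * np.pi * u) / np.pi)

def dQ(u):
    # Derivative of Q with respect to u.
    return 5.0 * (1.0 + 2.0 * np.cos(2 * np.pi * u))

def h(u):
    # Hypothetical continuous direction of perturbation.
    return 1.0 + u

grid = (np.arange(1_000_000) + 0.5) / 1_000_000
y, t = 2.2, 1e-3

# Finite-difference quotient from (2.2): (F(y; Q + t h) - F(y; Q)) / t.
F_base = np.mean(Q(grid) <= y)
F_perturbed = np.mean(Q(grid) + t * h(grid) <= y)
finite_difference = float((F_perturbed - F_base) / t)

# Closed form (2.3): minus the sum of h(u_k) / |Q'(u_k)| over the roots of Q(u) = y.
g = Q(grid) - y
idx = np.where(g[:-1] * g[1:] < 0)[0]
u_k = grid[idx] - g[idx] * (grid[idx + 1] - grid[idx]) / (g[idx + 1] - g[idx])
D_h = float(-np.sum(h(u_k) / np.abs(dQ(u_k))))

# Formula (2.5) at u = F(y): a weighted average of h over the roots.
f_y = float(np.sum(1.0 / np.abs(dQ(u_k))))
D_tilde_h = -D_h / f_y
```

When the curve has a single increasing branch, the weighted average in the last line collapses to the value of `h` at the unique root, which is the identity property stated in Corollary 1.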


The convergence is uniform over the entire domain in the monotonically correct case.

This result raises naturally the question of whether uniform convergence can be achieved

by some operation of smoothing in the monotonically incorrect case – namely integrating

either over y (or over u). The answer is indeed yes.

The following proposition calculates the Hadamard derivative of the following functionals obtained by integration:

$$(y', x) \mapsto \int_{\mathcal{Y}} 1\{y \le y'\}\,g(y|x) F(y|x)\,dy, \qquad (u', x) \mapsto \int_{\mathcal{U}} 1\{u \le u'\}\,g(u|x) F^{-1}(u|x)\,du,$$

with the restrictions on $g$ specified below. These elementary functionals are useful building blocks for various statistics, as briefly mentioned in the next section.

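Proposition 4. The following results are true with the limits being continuous on the specified domains:

1. $$\int_{\mathcal{Y}} 1\{y \le y'\}\,g(y|x) D_{h_t}(y|x,t)\,dy \to \int_{\mathcal{Y}} 1\{y \le y'\}\,g(y|x) D_h(y|x)\,dy$$

uniformly in $(y', x) \in \mathcal{YX}$, for any $g \in \ell^\infty(\mathcal{YX})$ such that $x \mapsto g(y|x)$ is continuous for a.e. $y$.

2. $$\int_{\mathcal{U}} 1\{u \le u'\}\,g(u|x) \tilde D_{h_t}(u|x,t)\,du \to \int_{\mathcal{U}} 1\{u \le u'\}\,g(u|x) \tilde D_h(u|x)\,du$$

uniformly in $(u', x) \in \mathcal{UX}$, for any $g \in \ell^1(\mathcal{UX})$ such that $x \mapsto g(u|x)$ is continuous for a.e. $u$.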

This proposition essentially is a corollary of Propositions 2 and 3. Indeed, the results

(1)-(2) follow from the fact that the pointwise convergence of Propositions 2 and 3,

coupled with the uniform integrability shown in Lemma 3 in the Appendix, permits the

interchange of limits and integrals. An alternative way of proving result (2), but not any

other result in the paper, can be based on exploiting the convexity of the functional in

(2) with respect to the underlying curve, following the approach of Mossino and Temam

(1981), and Alvino, Lions, and Trombetti (1989). Due to this limitation, we do not

pursue this approach in this paper. However, details of this approach are described

in Chernozhukov, Fernandez-Val, and Galichon (2006b) with an application to some

nonparametric estimation problems.


It is also worth emphasizing the properties of the following smoothed functionals. For a measurable function $f : \mathbb{R} \to \mathbb{R}$ define the smoothing operator as

$$Sf(y) := \int k_\delta(y - y') f(y')\,dy', \tag{2.6}$$

where $k_\delta(v) = 1\{|v| \le \delta\}/(2\delta)$ and $\delta > 0$ is a fixed bandwidth. Accordingly, the smoothed curves $SF(y|x)$ and $SF^{-1}(u|x)$ are given by

$$SF(y|x) := \int k_\delta(y - y') F(y'|x)\,dy', \qquad SF^{-1}(u|x) := \int k_\delta(u - u') F^{-1}(u'|x)\,du'.$$

Since these curves are merely formed as differences of the elementary functionals in Proposition 4, followed by a division by $2\delta$, the following corollary is immediate.

Corollary 2. We have that $SD_{h_t}(y|x,t) \to SD_h(y|x)$ uniformly in $(y,x) \in \mathcal{YX}$, and $S\tilde D_{h_t}(u|x,t) \to S\tilde D_h(u|x)$ uniformly in $(u,x) \in \mathcal{UX}$.

Note that smoothing accomplishes uniform convergence over the entire domain, which

is a good property to have from the perspective of data analysis.
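The smoothing operator (2.6) with the box kernel can be sketched as a difference of cumulative integrals. This is an illustrative discretization only: the function name `box_smooth`, the uniform midpoint grid, the bandwidth value, and the convention that the curve is zero outside the grid are all assumptions made here.

```python
import numpy as np

def box_smooth(values, grid, delta):
    """Box-kernel smoothing: average of the curve over [y - delta, y + delta].

    The cumulative Riemann sum plays the role of the elementary integrated
    functional; smoothing is a difference of two such integrals divided by
    2 * delta. The curve is treated as zero outside the grid.
    """
    step = grid[1] - grid[0]
    edges = np.concatenate(([grid[0] - step / 2], grid + step / 2))
    cumulative = np.concatenate(([0.0], np.cumsum(values) * step))
    upper = np.interp(grid + delta, edges, cumulative)
    lower = np.interp(grid - delta, edges, cumulative)
    return (upper - lower) / (2 * delta)

grid = (np.arange(1000) + 0.5) / 1000
delta = 0.05

step_curve = (grid > 0.5).astype(float)           # hypothetical curve with a jump
smoothed_step = box_smooth(step_curve, grid, delta)
smoothed_line = box_smooth(grid, grid, delta)     # a linear curve is unchanged in the interior
```

The jump in `step_curve` becomes a ramp of width `2 * delta` after smoothing, which gives a rough picture of why integrated functionals behave well even where the unsmoothed curves do not.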

2.3. Empirical Properties of $\hat F(y|x)$ and $\hat F^{-1}(u|x)$. We are now ready to state the main results for this section.

Proposition 5 (Improvement in Estimation Property Provided by Rearrangement). Suppose that $\hat Q(\cdot|\cdot)$ is an estimator (not necessarily consistent) for some true quantile curve $Q_0(\cdot|\cdot)$. Then, the rearranged curve $\hat F^{-1}(u|x)$ is closer to the true curve than $\hat Q(u|x)$ in the sense that, for each $x \in \mathcal{X}$,

$$\left(\int_{\mathcal{U}} |Q_0(u|x) - \hat F^{-1}(u|x)|^p\,du\right)^{1/p} \le \left(\int_{\mathcal{U}} |Q_0(u|x) - \hat Q(u|x)|^p\,du\right)^{1/p}, \quad p \in [1,\infty],$$

where the inequality is strict for $p \in (1,\infty)$ whenever $\hat Q(u|x)$ is decreasing on a subset of $\mathcal{U}$ of positive Lebesgue measure, while $Q_0(u|x)$ is increasing on $\mathcal{U}$.

The above property is independent of the sample size and of the way the estimate of

the curve is obtained, and thus continues to hold in the population.

This proposition establishes that the rearranged quantile curves have smaller estima-

tion error than the original curves whenever the latter are not monotone. This is a very


important property that does not depend on the way the quantile model is estimated.

It also does not rely on any other specifics and is thus applicable quite generally.

The following proposition investigates the asymptotic distributions of the rearranged curves.

Proposition 6 (Empirical Properties of $(y,x) \mapsto \hat F(y|x)$ and $(u,x) \mapsto \hat F^{-1}(u|x)$). Suppose that $\hat Q(\cdot|\cdot)$ is an estimator for $Q(\cdot|\cdot)$ that takes its values in the space of bounded measurable functions defined on $\mathcal{UX}$, and that, in $\ell^\infty(\mathcal{UX})$,

$$\sqrt{n}(\hat Q(u|x) - Q(u|x)) \Rightarrow G(u|x),$$

as a stochastic process indexed by $(u,x) \in \mathcal{UX}$, where $(u,x) \mapsto G(u|x)$ is a Gaussian process with continuous paths. Assume also that $Q(u|x)$ satisfies the basic conditions (a) and (b). Then in $\ell^\infty(K)$, where $K$ is any compact subset of $\mathcal{YX}^*$,

$$\sqrt{n}(\hat F(y|x) - F(y|x)) \Rightarrow D_G(y|x)$$

as a stochastic process indexed by $(y,x) \in \mathcal{YX}^*$; and in $\ell^\infty(\mathcal{UX}_K)$, with $\mathcal{UX}_K = \{(u,x) : (F^{-1}(u|x), x) \in K\}$,

$$\sqrt{n}(\hat F^{-1}(u|x) - F^{-1}(u|x)) \Rightarrow \tilde D_G(u|x),$$

as a stochastic process indexed by $(u,x) \in \mathcal{UX}_K$.

Corollary 3 (Monotonically correct case). Suppose $u \mapsto Q(u|x)$ has $Q'(u|x) > 0$ for each $(u, x) \in \mathcal{UX}$. Then $\mathcal{YX}^* = \mathcal{YX}$ and $\mathcal{UX}^* = \mathcal{UX}$. Accordingly, the convergence in Proposition 6 holds uniformly over the entire $\mathcal{YX}$ and $\mathcal{UX}$. Moreover, $D_G(u|x) = G(u|x)$, i.e., the rearranged quantile curves have the same first order asymptotic distribution as the original quantile curves.

Thus, in the monotonically correct case, the first order properties of the rearranged

and initial quantile estimates coincide. Hence, all the inference tools that apply to

original quantile estimates also apply to the rearranged quantile estimates. In particular,

if the bootstrap is valid for the original estimate, it is also valid for the rearranged

estimate, by the functional delta method for the bootstrap. In the empirical example

of Section 4, we exploit this useful property to construct uniform confidence bands for

the conditional quantile functions based on the rearranged quantile function estimates.


In addition to the results on quantile function estimates, Proposition 6 provides the

asymptotic properties of the distribution function estimates. The preceding remark

about the validity of bootstrap applies also to these estimates.

In the monotonically incorrect case, the large sample properties of the rearranged quantile estimates differ from those of the initial quantile estimates. Proposition 6 enables us to perform inference for rearranged curves in this case, including by the bootstrap, but only after excluding certain nonregular neighborhoods (for the distribution estimates, the neighborhood of the critical values of the map $u \mapsto Q(u|x)$, and, for the rearranged quantile estimates, the image of the latter neighborhood under $F(y|x)$).

However, if we consider the following linear functionals of the rearranged quantile and

distribution estimates:

$$(y', x) \mapsto \int_{\mathcal{Y}} 1\{y \le y'\}\, g(y|x)\, \hat F(y|x)\, dy, \qquad (u', x) \mapsto \int_{\mathcal{U}} 1\{u \le u'\}\, g(u|x)\, \hat F^{-1}(u|x)\, du,$$

then we no longer need to exclude the nonregular neighborhoods. The following propo-

sition describes the empirical properties of these functionals in large samples.

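Proposition 7 (Empirical Properties of Integrated Curves). Under the conditions of Proposition 6, the following results are true with the limits being continuous on the specified domains:

1. $$\sqrt{n} \int_{\mathcal{Y}} 1\{y \le y'\}\, g(y|x)\,(\hat F(y|x) - F(y|x))\, dy \Rightarrow \int_{\mathcal{Y}} 1\{y \le y'\}\, g(y|x)\, D_G(y|x)\, dy,$$

as a stochastic process indexed by $(y', x) \in \mathcal{YX}$, in $\ell^\infty(\mathcal{YX})$.

2. $$\sqrt{n} \int_{\mathcal{U}} 1\{u \le u'\}\, g(u|x)\,(\hat F^{-1}(u|x) - F^{-1}(u|x))\, du \Rightarrow \int_{\mathcal{U}} 1\{u \le u'\}\, g(u|x)\, D_G(u|x)\, du,$$

as a stochastic process indexed by $(u', x) \in \mathcal{UX}$, in $\ell^\infty(\mathcal{UX})$.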

The restrictions on the function g are the same as in Proposition 4.

The linear functionals defined above are useful building blocks for various statistics,

such as partial means, various moments, and Lorenz curves. For example, the conditional

Lorenz curve is

$$L(u|x) = \left(\int_{\mathcal{U}} 1\{t \le u\}\, F^{-1}(t|x)\, dt\right) \Big/ \left(\int_{\mathcal{U}} F^{-1}(t|x)\, dt\right),$$

which is a ratio of a partial mean to the mean. Hadamard differentiability of these

statistics with respect to the underlying Q(u|x) immediately follows from the Hadamard

differentiability of the elementary functionals of Proposition 7 by means of the chain

rule. Therefore, the asymptotic distribution of these statistics can be determined from

the asymptotic distribution of the linear functionals, by the functional delta method.

In particular, the validity of the bootstrap for these functionals is preserved by the

functional delta method for the bootstrap.
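As a rough illustration of how such a functional might be computed from a rearranged quantile curve, the following sketch approximates the conditional Lorenz curve at a fixed $x$ by Riemann sums. The curve `q_hat`, the grid, and the helper name `lorenz_curve` are hypothetical and serve only as an example; the curve is assumed positive.

```python
import numpy as np

def lorenz_curve(q_rearr):
    """Approximate L(u) at the right edges of m equal cells of (0, 1).

    q_rearr holds values of a (rearranged, hence increasing) quantile curve
    at the cell midpoints; the integrals are replaced by Riemann sums.
    """
    q = np.asarray(q_rearr, dtype=float)
    partial_mean = np.cumsum(q) / q.size   # approximates the partial mean up to u
    return partial_mean / partial_mean[-1] # divide by the (approximate) mean

# Hypothetical non-monotone estimate of a quantile curve at a fixed x.
m = 200
u = (np.arange(m) + 0.5) / m
q_hat = 300 + 200 * u + 15 * np.sin(10 * np.pi * u)
q_rearr = np.sort(q_hat)                   # rearrangement on the grid

L = lorenz_curve(q_rearr)
```

Because the rearranged curve is increasing, the resulting discrete Lorenz curve is convex and lies below the 45-degree line, which need not hold if the non-monotone `q_hat` is plugged in directly.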

We next consider the empirical properties of the smoothed curves obtained by applying the linear smoothing operator $S$ defined in (2.6) to $\hat F(y|x)$ and $\hat F^{-1}(u|x)$:

$$S\hat F(y|x) := \int k_\delta(y - y')\, \hat F(y'|x)\, dy', \qquad S\hat F^{-1}(u|x) := \int k_\delta(u - u')\, \hat F^{-1}(u'|x)\, du'.$$

The following corollary immediately follows from Corollary 2 and the functional delta

method.

Corollary 4 (Large Sample Properties of Smoothed Curves). Under the conditions of Proposition 6, in $\ell^\infty(\mathcal{YX})$,

$$\sqrt{n}\,(S\hat F(y|x) - SF(y|x)) \Rightarrow S[D_G(y|x)],$$

as a stochastic process indexed by $(y, x) \in \mathcal{YX}$, and in $\ell^\infty(\mathcal{UX})$,

$$\sqrt{n}\,(S\hat F^{-1}(u|x) - SF^{-1}(u|x)) \Rightarrow S[D_G(u|x)],$$

as a stochastic process indexed by $(u, x) \in \mathcal{UX}$.

Thus, inference on the smoothed rearranged estimates can be performed without

excluding nonregular neighborhoods, which is convenient for practice. Furthermore,

validity of the bootstrap for the smoothed curves follows by the functional delta method

for the bootstrap.

3. Theory of Rearranged Distribution Curves

The rearrangement method can also be applied to rearrange cumulative distribution

curves monotonically by exchanging the roles of the quantile and probability spaces.

There are several situations where one might be faced with the problem of non-

increasing empirical distribution curves. In an option pricing context, for example,


Ait-Sahalia and Duarte (2003) use market data to estimate a risk-neutral distribution.

Estimation error may cause the resulting distribution function to be non-monotonic. In

other cases the distribution curve is obtained by some inverse transformation and local

non-monotonicity comes as an artefact of the regularization technique. In other situa-

tions the particular estimation technique may not respect monotonicity (see, e.g., Hall,

Wolff, and Yao, 1999). We present an alternative solution to this problem that uses the

rearrangement method.

Here we do not present the conditional case for notational convenience. All derivations for conditional distributions, however, are exactly parallel to those presented in this section. Suppose we have $y \mapsto \hat P(y)$ as a candidate empirical probability distribution curve, which does not necessarily satisfy monotonicity, with population counterpart $P(y)$. Define the following quantile curve

$$\hat Q(u) = \int_0^{\infty} 1\{\hat P(y) < u\}\, dy - \int_{-\infty}^{0} 1\{\hat P(y) > u\}\, dy,$$

which is monotone. In what follows, we further assume that the support of $P(y)$ is $\mathcal{Y} \subset [0, +\infty)$, so that the second term drops out (otherwise it can be treated analogously to the first).

The inverse of the quantile curve is the rearranged probability curve

$$\hat F(y) = \inf\{u : \hat Q(u) \ge y\},$$

which is also monotone by construction. It should be clear at this point that the quan-

tities Q and F are exactly symmetric to F and Q in the quantile case.
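The two-step construction can be sketched on a grid as follows. This is a minimal illustration, not the authors' code: the candidate curve `p_hat`, the grids, and the helper name `rearrange_cdf` are hypothetical, the support is taken to be a bounded interval in $[0, \infty)$, and the integral is replaced by a Riemann sum. Up to the resolution of the grid in $u$, the result coincides with simply sorting the values of the candidate curve.

```python
import numpy as np

def rearrange_cdf(p_hat, y_grid, u_grid):
    """Rearranged distribution curve on a grid.

    Step 1: Q(u) = integral of 1{P_hat(y) < u} dy, via a Riemann sum over y_grid.
    Step 2: F(y) = inf{u : Q(u) >= y}, via a search over the increasing Q values.
    """
    dy = y_grid[1] - y_grid[0]
    q_of_u = np.array([dy * np.sum(p_hat < u) for u in u_grid])
    idx = np.searchsorted(q_of_u, y_grid, side="left")   # first u with Q(u) >= y
    idx = np.clip(idx, 0, u_grid.size - 1)               # guard the upper boundary
    return q_of_u, u_grid[idx]

# Hypothetical non-monotone candidate distribution curve on [0, 4].
y_grid = (np.arange(400) + 0.5) * 0.01
p_hat = np.clip(1 - np.exp(-y_grid) + 0.05 * np.sin(8 * y_grid), 0.0, 1.0)
u_grid = np.linspace(0.0, 1.0, 1001)

q_of_u, f_hat = rearrange_cdf(p_hat, y_grid, u_grid)
```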

The following improved approximation property is true for $\hat F$: let $F_0(y)$ be the true distribution function; then for all $p \in [1, \infty]$,

$$\left(\int_{\mathbb{R}} |F_0(y) - \hat F(y)|^p\, dy\right)^{1/p} \le \left(\int_{\mathbb{R}} |F_0(y) - \hat P(y)|^p\, dy\right)^{1/p},$$

where the inequality is strict for $p \in (1, \infty)$ whenever the integral on the right is finite and $y \mapsto \hat P(y)$ is decreasing on a subset of positive Lebesgue measure, while $F_0(y)$ is strictly increasing. This property is independent of the sample size, and thus continues to hold in the population.


In the monotonically correct case, that is when $P'(Q(u)) > 0$ for all $u \in [0, 1]$, if the empirical distribution curve $\hat P(y)$ satisfies

$$\sqrt{n}\,(\hat P(y) - P(y)) \Rightarrow G(y)$$

in $\ell^\infty(\mathcal{Y})$, where $G$ is a Gaussian process, then

$$\sqrt{n}\,(\hat Q(u) - Q(u)) \Rightarrow \frac{1}{P'(Q(u))}\, G(Q(u))$$

in $\ell^\infty([0, 1])$, and, in $\ell^\infty(\mathcal{Y})$,

$$\sqrt{n}\,(\hat F(y) - F(y)) \Rightarrow G(y). \tag{3.1}$$

Results paralleling those of the previous section also follow for the monotonically incorrect case. In particular, we have

$$\sqrt{n}\,(\hat Q(u) - Q(u)) \Rightarrow \sum_{k=1}^{K(u)} \frac{G(y_k(u))}{|P'(y_k(u))|}$$

in $\ell^\infty([0, 1])$, and

$$\sqrt{n}\,(\hat F(y) - F(y)) \Rightarrow \left. \left(\sum_{k=1}^{K(u)} \frac{1}{|P'(y_k(u))|}\right)^{-1} \sum_{k=1}^{K(u)} \frac{G(y_k(u))}{|P'(y_k(u))|} \,\right|_{u = P(y)} \tag{3.2}$$

in $\ell^\infty(\mathcal{Y})$, where $\{y_k(u), \text{ for } k = 1, ..., K(u)\}$ are the roots of $P(y) = u$, assuming $K(u)$ is bounded uniformly in $u$.
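As a consistency check, the limits for the monotonically incorrect case can be specialized to a single root. The short derivation below is a sketch that uses only the displays of this section and assumes $K(u) = 1$ with $P' > 0$, so that the unique root is $y_1(u) = Q(u)$ and $Q(P(y)) = y$:

```latex
\begin{align*}
\sum_{k=1}^{1} \frac{G(y_k(u))}{|P'(y_k(u))|}
  &= \frac{G(Q(u))}{P'(Q(u))}, \\
\left. \left(\frac{1}{|P'(y_1(u))|}\right)^{-1}
  \frac{G(y_1(u))}{|P'(y_1(u))|} \,\right|_{u = P(y)}
  &= \left. G(Q(u)) \right|_{u = P(y)} = G(y).
\end{align*}
```

Both expressions agree with the limits stated for the monotonically correct case, the second one being (3.1).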

4. Illustrative Examples

4.1. Empirical Example. To illustrate the practical applicability of the rearrangement

method, we consider the estimation of expenditure curves. We use the original Engel (1857) data, from 235 budget surveys of 19th century working-class Belgian households, to estimate the relationship between food expenditure and annual household income (see

Koenker, 2005). Ernst Engel originally presented these data to support the hypothesis

that food expenditure constitutes a declining share of household income (Engel’s Law).

In Figure 3, we show a scatterplot of the Engel data on food expenditure versus household income, along with quantile regression curves with the quantile indices {0.05, 0.10, ..., 0.95}. We see that the quantile regression lines become closer and cross at low values of

income. This crossing problem of the Engel curves is also evident in Figure 4, in which

we plot the quantile regression process of food expenditure as a function of the quantile

index. For low values of income, the quantile regression process is clearly non-monotone.

The rearrangement procedure fixes the non-monotonicity producing increasing quantile

functions. Moreover, the rearranged curves coincide with their quantile regression coun-

terparts for the middle values of income where there is no quantile-crossing problem.

In Figure 5, we plot simultaneous 90% confidence bands for the conditional quantile function of food expenditure for two values of income (the sample median and the 5th percentile of income). We construct the bands using both original quantile regression

curves and rearranged quantile curves based on 500 bootstrap repetitions and a grid of

quantile indices {0.10, 0.11, ..., 0.90}. We obtain the bands for the rearranged curves

assuming that the population quantile regression curves are monotonically correct, so

that the first order behavior of the rearranged curves coincides with the behavior of the

original curves. The figure shows that even for the low value of income the rearranged

bands lie within the quantile regression bands. This observation points towards the

maintained assumption of the monotonically correct case. The lack of monotonicity of the estimated quantile regression process in this case is likely to be caused by sampling error.
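The text does not spell out how the simultaneous bands are formed from the bootstrap repetitions. One common construction, shown here purely as an assumption and not as the authors' procedure, is a sup-t band: studentize the bootstrap deviations, take the maximum over the grid of quantile indices in each repetition, and use the 90% quantile of these maxima as the critical value. The estimate `q_hat` and the draws `boot` below are synthetic stand-ins for the quantile regression fit and its bootstrap re-estimates.

```python
import numpy as np

def sup_t_band(estimate, boot_draws, level=0.90):
    """Simultaneous band from bootstrap draws via the maximal studentized deviation."""
    est = np.asarray(estimate, dtype=float)
    draws = np.asarray(boot_draws, dtype=float)
    se = draws.std(axis=0, ddof=1)                        # pointwise bootstrap std. errors
    t_max = np.max(np.abs(draws - est) / se, axis=1)      # one sup-t statistic per draw
    crit = np.quantile(t_max, level)
    return est - crit * se, est + crit * se

rng = np.random.default_rng(0)
u = np.linspace(0.10, 0.90, 81)                           # grid {0.10, 0.11, ..., 0.90}
q_hat = 300 + 60 * u + 4 * np.sin(14 * np.pi * u)         # hypothetical non-monotone fit
boot = q_hat + rng.normal(scale=5.0, size=(500, u.size))  # synthetic "bootstrap" curves

lo_orig, hi_orig = sup_t_band(q_hat, boot)
# Bands for the rearranged curve: rearrange the estimate and every bootstrap curve.
lo_rearr, hi_rearr = sup_t_band(np.sort(q_hat), np.sort(boot, axis=1))
```

Rearranging each bootstrap curve before forming the band mirrors the argument in the text that bootstrap validity carries over to the rearranged estimate in the monotonically correct case.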

We find more evidence consistent with the monotonically correct case in Figure 6, in

which we plot the simultaneous confidence bands for the smoothed quantile regression

and rearranged curves. We construct the bands by bootstrapping the smoothed curves (with bandwidth equal to 0.05). The bootstrap bands are valid for the smoothed rearranged curves even in the monotonically incorrect case. The almost perfect overlapping

between the confidence bands points towards the monotonically correct case. Interest-

ingly, smoothing reduces the width of the confidence bands, but does not completely

monotonize the quantile regression curves.

4.2. Monte Carlo. We use the following Monte Carlo experiment, matching closely the

previous empirical application, to illustrate the estimation properties of the rearranged

curves in finite samples. In particular, we consider two designs based on the location-

scale shift model: Y = Z(X)′α + (Z(X)′γ)ε, where ε is independent of X, with the true

[Figure 3. Axes: Income (horizontal), Food Expenditure (vertical).]

Figure 3. The scatterplot and quantile regression fits of the Engel

food expenditure data. The plot shows a scatterplot of the Engel data

on food expenditure vs. household income for a sample of 235 19th-century working-class Belgian households. Superimposed on the plot are the

{0.05, 0.10, ..., 0.95} quantile regression curves. The range displayed corre-

sponds to values of income lower than 1500 and values of food expenditure

lower than 800.

conditional quantile function

Q0(u|X) = Z(X)′α + (Z(X)′γ)Qε(u).

Design 1 includes a constant and a regressor, namely Z(X) = (1, X); and design 2

has an additional nonlinear regressor, namely, Z(X) = (1, X, 1{X > a} · X), where

a = median(X). We select the parameters for designs 1 and 2 to match the Engel

empirical example, employing the estimation method of Koenker and Xiao (2002). For

design 1 we set α = (624.15, 0.55) and γ = (1, 0.0013); and for design 2 we set α =

[Figure 4. Panels: A. Income = 394 (1% quantile); B. Income = 452 (5% quantile); C. Income = 884 (Median); D. Income = 2533 (99% quantile). Axes: u (horizontal), Food Expenditure (vertical). Legend: Q(u), F−1(u).]

Figure 4. Quantile regression processes and rearranged quantile pro-

cesses for the Engel food expenditure data. Quantile regression estimates

are plotted with a thick gray line, whereas the rearranged estimates are

plotted in black.

(624.15, 0.55,−0.003) and γ = (1, 0.0017,−0.0003). For each design, we draw 1,000

Monte Carlo samples of size n = 235. To generate the values of the dependent variable,

we draw observations from a normal distribution with the same mean and variance as

the residuals ε = (Y −Z(X)′α)/(Z(X)′γ) of the Engel data set; and we fix the regressor

X in all the replications to the observations of income in the Engel data set.
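The data-generating step for design 1 can be sketched as follows. The parameter values for $\alpha$ and $\gamma$ are those quoted in the text; everything else is an assumption made for illustration: the Engel incomes are not loaded, so `x` is a synthetic stand-in, and the mean and standard deviation of the normal errors (`eps_mean`, `eps_sd`) are placeholders rather than the values computed from the Engel residuals.

```python
import numpy as np
from statistics import NormalDist

ALPHA = np.array([624.15, 0.55])    # design 1 location coefficients (from the text)
GAMMA = np.array([1.0, 0.0013])     # design 1 scale coefficients (from the text)

def true_quantile(u, x, eps_mean, eps_sd):
    """Q0(u|x) = Z(x)'alpha + (Z(x)'gamma) * Q_eps(u) with Z(x) = (1, x)."""
    z = np.array([1.0, x])
    q_eps = NormalDist(eps_mean, eps_sd).inv_cdf(u)
    return z @ ALPHA + (z @ GAMMA) * q_eps

def draw_sample(x, eps_mean, eps_sd, rng):
    """One Monte Carlo sample of Y from the location-scale shift model."""
    z = np.column_stack([np.ones_like(x), x])
    eps = rng.normal(eps_mean, eps_sd, size=x.size)
    return z @ ALPHA + (z @ GAMMA) * eps

rng = np.random.default_rng(0)
x = rng.uniform(400.0, 2500.0, size=235)   # hypothetical incomes, held fixed across replications
y = draw_sample(x, eps_mean=0.0, eps_sd=50.0, rng=rng)
```

In a full replication of the experiment, `x` would be replaced by the observed incomes and `draw_sample` would be called 1,000 times with the regressor held fixed.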

We use designs 1 and 2 to assess the estimation properties of the original and rear-

ranged quantile regressions under the correct and incorrect specification of the functional

[Figure 5. Panels: A. Income = 452 (5% quantile); B. Income = 884 (Median). Axes: u (horizontal), Food Expenditure (vertical). Legend: Q(u), F−1(u).]

Figure 5. Simultaneous 90% confidence bands for quantile regression

processes and rearranged quantile processes for the Engel food expenditure

data. Two different values of the income regressor are considered. The

bands for quantile regression are plotted in light gray, whereas the bands

for rearranged quantile regression are plotted in dark gray.

form. Thus, in each replication, we estimate the model

Q(u|X) = Z(X)′β(u), Z(X) = (1, X).

This gives the correct functional form for design 1, that is, Q(u|X) ≡ Q0(u|X), and an incorrect functional form for design 2, that is, Q(u|X) ≢ Q0(u|X) (due to the omission of

a nonlinear regressor). Accordingly, estimation error for design 1 arises entirely due to

sampling error, while the estimation error for design 2 arises due to both sampling error

and specification error. Regardless of the nature of the estimation error, Proposition 5

establishes that the rearranged quantile curves should be closer to the true conditional

quantiles than the original curves.

[Figure 6. Panels: A. Income = 452 (5% quantile); B. Income = 884 (Median). Axes: u (horizontal), Food Expenditure (vertical). Legend: SQ(u), SF−1(u).]

Figure 6. Simultaneous 90% confidence bands for the smoothed quan-

tile regression processes and the smoothed rearranged quantile processes

for the Engel food expenditure data. The bands for the smoothed original

curves are plotted in light gray, whereas the bands for the smoothed rear-

ranged curves are plotted in dark gray. The smoothed curves are obtained

using a bandwidth equal to 0.05.

In each replication, we fit a linear quantile regression curve Q(u|X) = X ′β(u) and

monotonize this curve to get F−1(u|X) using the rearrangement method. Table 1 reports

measures of the estimation error of the original and rearranged estimated conditional

quantile curves using different norms (p = 1, 2, 3, 4, and ∞), with the regressor fixed at

a value, X = x0, that corresponds to the 5% quantile of the regressor X (X = 452).

We select this value motivated by the empirical example. Each entry of the table gives

a Monte Carlo average of

$$L_p := \left(\int_{\mathcal{U}} |Q_0(u|x_0) - Q(u|x_0)|^p\, du\right)^{1/p},$$

for $Q(u|x_0) = x_0'\hat\beta(u)$ (the original curve) and $Q(u|x_0) = \hat F^{-1}(u|x_0)$ (the rearranged curve). We evaluate the integral using a net of indices $u$ with step size 0.01.

Both in the correctly specified case and in the misspecified case, we find that the

rearranged curves estimate the true quantile curves Q0(u|X) more accurately than the

original curves, providing a 4% to 15% reduction in the estimation/approximation error,

depending on the norm.

Table 1. Estimation Error of Original and Rearranged Curves.

        Design 1: Correct Specification     Design 2: Incorrect Specification
Norm    Original   Rearranged   Ratio       Original   Rearranged   Ratio
L1        6.79        6.61      0.96          7.33        7.02      0.95
L2        7.99        7.69      0.95          8.72        8.20      0.93
L3        8.93        8.51      0.95          9.85        9.12      0.92
L4        9.70        9.17      0.94         10.78        9.86      0.91
L∞       17.14       15.32      0.90         19.44       16.44      0.85

5. Conclusion

This paper analyzes a simple regularization procedure for estimation of conditional

quantile and distribution functions based on rearrangement operators. Starting from

a possibly non-monotone empirical curve, the procedure produces a rearranged curve

that not only satisfies the natural monotonicity requirement, but also has smaller esti-

mation error than the original curve. Asymptotic distribution theory is derived for the

rearranged curves, and the usefulness of the approach is illustrated with an empirical

example and a simulation experiment.

† Massachusetts Institute of Technology, Department of Economics and Operations

Research Center, University College London, CEMMAP, and The University of Chicago.

E-mail: [email protected]. Research support from the Castle Krob Chair, National Science

Foundation, the Sloan Foundation, and CEMMAP is gratefully acknowledged.

§ Boston University, Department of Economics. E-mail: [email protected].


‡ Harvard University, Department of Economics. E-mail: [email protected].

Research support from the Conseil General des Mines and the National Science Foun-

dation is gratefully acknowledged.


Appendix A. Proofs

A.1. Proof of Proposition 1. First, note that the distribution of Yx has no atoms,

i.e.,

Pr[Yx = y] = Pr[Q(U |x) = y] = Pr[U ∈ {u ∈ U : u is a root of Q(u|x) = y}] = 0,

since the number of roots of Q(u|x) = y is finite under (a) - (b), and U ∼ Uniform(U).

Next, by assumptions (a)-(b) the number of critical values of Q(u|x) is finite, hence

claim (1) follows.

Next, for any regular $y$, we can write $F(y|x)$ as

$$\int_0^1 1\{Q(u|x) \le y\}\, du = \sum_{k=0}^{K(y|x)-1} \int_{u_k(y|x)}^{u_{k+1}(y|x)} 1\{Q(u|x) \le y\}\, du + \int_{u_{K(y|x)}(y|x)}^{1} 1\{Q(u|x) \le y\}\, du,$$

where $u_0(y|x) := 0$ and $\{u_k(y|x), \text{ for } k = 1, ..., K(y|x) < \infty\}$ are the roots of $Q(u|x) = y$ in increasing order. Note that the sign of $Q'(u|x)$ alternates over consecutive $u_k(y|x)$, determining whether $1\{Q(u|x) \le y\} = 1$ on the interval $[u_{k-1}(y|x), u_k(y|x)]$. Hence the first term in the previous expression simplifies to $\sum_{k=0}^{K(y|x)-1} 1\{Q'(u_{k+1}(y|x)|x) \ge 0\}\,(u_{k+1}(y|x) - u_k(y|x))$; while the last term simplifies to $1\{Q'(u_{K(y|x)}(y|x)|x) \le 0\}\,(1 - u_{K(y|x)}(y|x))$. An additional simplification yields the expression given in claim (2) of the proposition.

The proof of claim (3) follows by taking the derivative of expression in claim (2), noting

that at any regular value y the number of solutions K(y|x) and sign(Q′(uk(y|x)|x)) are

locally constant; moreover,

$$u_k'(y|x) = \frac{\mathrm{sign}(Q'(u_k(y|x)|x))}{|Q'(u_k(y|x)|x)|}.$$

Combining these facts we get the expression for the derivative given in claim (3).

To show the absolute continuity of $F(y|x)$ with $f(y|x)$ being the Radon-Nikodym derivative, it suffices to show that for each $y' \in \mathcal{Y}_x$, $\int_{-\infty}^{y'} f(y|x)\, dy = \int_{-\infty}^{y'} dF(y|x)$, cf. Theorem 31.8 in Billingsley (1995). Let $V^x_t$ be the union of closed balls of radius $t$ centered on the critical points $\mathcal{Y}_x \setminus \mathcal{Y}^*_x$, and define $\mathcal{Y}^t_x = \mathcal{Y}_x \setminus V^x_t$. Then, $\int_{-\infty}^{y'} 1\{y \in \mathcal{Y}^t_x\}\, f(y|x)\, dy = \int_{-\infty}^{y'} 1\{y \in \mathcal{Y}^t_x\}\, dF(y|x)$. Since the set of critical points $\mathcal{Y}_x \setminus \mathcal{Y}^*_x$ is finite and has mass zero under $F(y|x)$, $\int_{-\infty}^{y'} 1\{y \in \mathcal{Y}^t_x\}\, dF(y|x) \uparrow \int_{-\infty}^{y'} dF(y|x)$ as $t \to 0$. Therefore, $\int_{-\infty}^{y'} 1\{y \in \mathcal{Y}^t_x\}\, f(y|x)\, dy \uparrow \int_{-\infty}^{y'} f(y|x)\, dy = \int_{-\infty}^{y'} dF(y|x)$.

Claim (4) follows by noting that at the regions where $s \mapsto Q(s|x)$ is increasing and one-to-one, we have that $F(y|x) = \int_{Q(s|x) \le y} ds = \int_{s \le Q^{-1}(y|x)} ds = Q^{-1}(y|x)$. Inverting the equation $u = F(F^{-1}(u|x)|x) = Q^{-1}(F^{-1}(u|x)|x)$ yields $F^{-1}(u|x) = Q(u|x)$.

Claim (5). We have that $Y_x = Q(U|x)$ has quantile function $F^{-1}(u|x)$. The quantile function of $\alpha + \beta Q(U|x) = \alpha + \beta Y_x$, for $\beta > 0$, is therefore $\inf\{y : \Pr(\alpha + \beta Y_x \le y) \ge u\} = \alpha + \beta F^{-1}(u|x)$.
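The equivariance in claim (5) has a simple discrete counterpart that can be checked directly. In the sketch below, which uses a hypothetical non-monotone curve `q` on an equally spaced grid, rearrangement is computed by sorting, and a positive affine transformation commutes with it.

```python
import numpy as np

u = (np.arange(500) + 0.5) / 500
q = u + 0.2 * np.sin(8 * np.pi * u)       # hypothetical non-monotone curve Q(u|x)
alpha, beta = 3.0, 2.5                    # any location shift and positive scale

lhs = np.sort(alpha + beta * q)           # rearrangement of the transformed curve
rhs = alpha + beta * np.sort(q)           # transformation of the rearranged curve
print(np.allclose(lhs, rhs))
```

The restriction to positive `beta` matters: a negative scale reverses the ordering, so the two sides would no longer agree.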

Claim (6) is immediate from claim (3).

Claim (7). The proof of continuity of $F(y|x)$ is subsumed in Step 1 of the proof of Proposition 3 (see below). Therefore, for any sequence $x_t \to x$ we have that $F(y|x_t) \to F(y|x)$ uniformly in $y$, and $F(y|x)$ is continuous. Let $u_t \to u$ and $x_t \to x$. Since $F(y|x) = u$ has a unique root $y = F^{-1}(u|x)$, the root of $F(y|x_t) = u_t$, i.e., $y_t = F^{-1}(u_t|x_t)$, converges to $y$ by a standard argument; see, e.g., van der Vaart and Wellner (1997). □

A.2. Proof of Propositions 2-7. In the proofs that follow we will repeatedly use

Lemma 1, which establishes the equivalence of continuous convergence and uniform

convergence:

Lemma 1. Let D and D′ be complete separable metric spaces, with D compact. Suppose

f : D → D′ is continuous. Then a sequence of functions fn : D → D′ converges to f

uniformly on D if and only if for any convergent sequence xn → x in D we have that

fn(xn) → f(x).

Proof of Lemma 1: See, for example, Resnick (1987), page 2. □

Proof of Proposition 2. We have that for any $\delta > 0$, there exists $\epsilon > 0$ such that for $u \in B_\epsilon(u_k(y|x))$ and for small enough $t \ge 0$

$$1\{Q(u|x) + t h_t(u|x) \le y\} \le 1\{Q(u|x) + t(h(u_k(y|x)|x) - \delta) \le y\},$$

for all $k \in \{1, ..., K(y|x)\}$; whereas for all $u \notin \cup_k B_\epsilon(u_k(y|x))$, as $t \to 0$,

$$1\{Q(u|x) + t h_t(u|x) \le y\} = 1\{Q(u|x) \le y\}.$$

Therefore,

$$\frac{\int_0^1 1\{Q(u|x) + t h_t(u|x) \le y\}\, du - \int_0^1 1\{Q(u|x) \le y\}\, du}{t} \tag{A.1}$$

$$\le \sum_{k=1}^{K(y|x)} \int_{B_\epsilon(u_k(y|x))} \frac{1\{Q(u|x) + t(h(u_k(y|x)|x) - \delta) \le y\} - 1\{Q(u|x) \le y\}}{t}\, du,$$

which by the change of variables $y' = Q(u|x)$ is equal to

$$\frac{1}{t} \sum_{k=1}^{K(y|x)} \int_{J_k \cap [y,\, y - t(h(u_k(y|x)|x) - \delta)]} \frac{1}{|Q'(Q^{-1}(y'|x)|x)|}\, dy',$$

where $J_k$ is the image of $B_\epsilon(u_k(y|x))$ under $u \mapsto Q(\cdot|x)$. The change of variables is possible because for $\epsilon$ small enough, $Q(\cdot|x)$ is one-to-one between $B_\epsilon(u_k(y|x))$ and $J_k$.

Fixing $\epsilon > 0$, for $t \to 0$, we have that $J_k \cap [y, y - t(h(u_k(y|x)|x) - \delta)] = [y, y - t(h(u_k(y|x)|x) - \delta)]$, and $|Q'(Q^{-1}(y'|x)|x)| \to |Q'(u_k(y|x)|x)|$ as $Q^{-1}(y'|x) \to u_k(y|x)$. Therefore, the right hand term in (A.1) is no greater than

$$\sum_{k=1}^{K(y|x)} \frac{-h(u_k(y|x)|x) + \delta}{|Q'(u_k(y|x)|x)|} + o(1).$$

Similarly $\sum_{k=1}^{K(y|x)} \frac{-h(u_k(y|x)|x) - \delta}{|Q'(u_k(y|x)|x)|} + o(1)$ bounds (A.1) from below. Since $\delta > 0$ can be made arbitrarily small, the result follows.

To show that the result holds uniformly in $(y, x) \in K$, a compact subset of $\mathcal{YX}^*$, we use Lemma 1. Take a sequence of $(y_t, x_t)$ in $K$ that converges to $(y, x) \in K$; then the preceding argument applies to this sequence, since (1) the function $(y, x) \mapsto -h(u_k(y|x)|x)/|Q'(u_k(y|x)|x)|$ is uniformly continuous on $K$, and (2) the function $(y, x) \mapsto K(y|x)$ is uniformly continuous on $K$. To see (2), note that $K$ excludes a neighborhood of critical points $(\mathcal{Y}_x \setminus \mathcal{Y}^*_x, x \in \mathcal{X})$, and therefore can be expressed as the union of a finite number of compact sets $(K_1, ..., K_M)$ such that the function $K(y|x)$ is constant over each of these sets, i.e., $K(y|x) = k_j$ for some integer $k_j > 0$, for all $(y, x) \in K_j$ and $j \in \{1, ..., M\}$. Likewise, (1) follows by noting that the limit expression for the derivative is continuous on each of the sets $(K_1, ..., K_M)$ by the assumed continuity of $h(u|x)$ in both arguments, continuity of $u_k(y|x)$ (implied by the Implicit Function Theorem), and the assumed continuity of $Q'(u|x)$ in both arguments. □
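The limit obtained in this proof, namely the sum of $-h(u_k)/|Q'(u_k)|$ over the roots $u_k$ of $Q(u) = y$, can be checked numerically on a simple example. The sketch below uses hypothetical choices throughout: the non-monotone curve $Q(u) = (u - 0.5)^2$, the constant direction $h \equiv 1$, the point $y = 0.09$, and a small step $t$. It compares a finite-difference quotient of $F$ with the root formula.

```python
import numpy as np

u = (np.arange(2_000_000) + 0.5) / 2_000_000   # fine grid on (0, 1)
q = (u - 0.5) ** 2                             # hypothetical non-monotone curve Q(u)
h = 1.0                                        # constant perturbation direction h(u) = 1

def F(y, t):
    """Grid approximation of the integral of 1{Q(u) + t*h(u) <= y} over (0, 1)."""
    return np.mean(q + t * h <= y)

y, t = 0.09, 1e-3
finite_diff = (F(y, t) - F(y, 0.0)) / t

# Roots of Q(u) = y are 0.5 -/+ sqrt(y); Q'(u) = 2 (u - 0.5) at each root.
roots = np.array([0.5 - np.sqrt(y), 0.5 + np.sqrt(y)])
formula = -np.sum(h / np.abs(2 * (roots - 0.5)))

print(finite_diff, formula)
```

The two numbers should agree up to an error of order `t`, since the difference quotient is evaluated at a small but nonzero step.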


Proof of Proposition 3. For a fixed x the result follows by Proposition 2, by step

1 of the proof below, and by an application of the Hadamard differentiability of the

quantile operator shown by Doss and Gill (1992). Step 2 establishes uniformity over

x ∈ X .

Step 1. Let $K$ be a compact subset of $\mathcal{YX}^*$. Let $(y_t, x_t)$ be a sequence in $K$, convergent to a point, say $(y, x)$. Then, for every such sequence, $\epsilon_t := t\|h_t\|_\infty + \|Q(\cdot|x_t) - Q(\cdot|x)\|_\infty + |y_t - y| \to 0$, and

$$|F(y_t|x_t, h_t) - F(y|x)| \le \left|\int_0^1 [1\{Q(u|x_t) + t h_t(u|x) \le y_t\} - 1\{Q(u|x) \le y\}]\, du\right| \le \left|\int_0^1 1\{|Q(u|x) - y| \le \epsilon_t\}\, du\right| \to 0, \tag{A.2}$$

where the last step follows from the absolute continuity of $y \mapsto F(y|x)$, the distribution function of $Q(U|x)$. By setting $h_t = 0$ the above argument also verifies that $F(y|x)$ is continuous in $(y, x)$. Lemma 1 implies uniform convergence of $F(y|x, h_t)$ to $F(y|x)$, which in turn implies by a standard argument (see footnote 1) the uniform convergence of quantiles $F^{-1}(u|x, h_t) \to F^{-1}(u|x)$, uniformly over $K^*$, where $K^*$ is any compact subset of $\mathcal{UX}^*$.

Step 2. We have that uniformly over $K^*$,

$$\frac{F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x, h_t)|x)}{t} = D_h(F^{-1}(u|x, h_t)|x) + o(1) = D_h(F^{-1}(u|x)|x) + o(1), \tag{A.3}$$

using Step 1, Proposition 2, and the continuity properties of $D_h(y|x)$. Further, uniformly over $K^*$, by Taylor expansion and Proposition 1, as $t \to 0$,

$$\frac{F(F^{-1}(u|x, h_t)|x) - F(F^{-1}(u|x)|x)}{t} = f(F^{-1}(u|x)|x)\, \frac{F^{-1}(u|x, h_t) - F^{-1}(u|x)}{t} + o(1), \tag{A.4}$$

and (as will be shown below)

$$\frac{F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x)|x)}{t} = o(1), \tag{A.5}$$

as $t \to 0$. Observe that the left hand side of (A.5) equals that of (A.4) plus that of (A.3). The result then follows.

1See, e.g., Lemma 1 in Chernozhukov and Fernandez-Val (2005).


It only remains to show that equation (A.5) holds uniformly in $K^*$. Note that for any right-continuous cdf $F$, we have that $u \le F(F^{-1}(u)) \le u + F(F^{-1}(u)) - F(F^{-1}(u)-)$, where $F(\cdot-)$ denotes the left limit of $F$, i.e., $F(x_0-) = \lim_{x \uparrow x_0} F(x)$. For any continuous, strictly increasing cdf $F$, we have that $F(F^{-1}(u)) = u$. Therefore, write

$$\begin{aligned}
0 &\le \frac{F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x)|x)}{t} \\
&\le \frac{u + F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x, h_t)-|x, h_t) - u}{t} \\
&\le \frac{F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x, h_t)-|x, h_t)}{t} \\
&\overset{(1)}{=} \frac{F(F^{-1}(u|x, h_t)|x, h_t) - F(F^{-1}(u|x, h_t)|x)}{t} - \frac{F(F^{-1}(u|x, h_t)-|x, h_t) - F(F^{-1}(u|x, h_t)-|x)}{t} \\
&\overset{(2)}{=} D_h(F^{-1}(u|x, h_t)|x) - D_h(F^{-1}(u|x, h_t)-|x) + o(1) = o(1),
\end{aligned}$$

as $t \to 0$, where in (1) we use that $F(F^{-1}(u|x, h_t)|x) = F(F^{-1}(u|x, h_t)-|x)$ since $F(y|x)$ is continuous and strictly increasing in $y$, and in (2) we use Proposition 2. □

The following lemma, due to Pratt (1960), will be very useful to prove Proposition 4.

Lemma 2. Let $|f_n| \le G_n$ and suppose that $f_n \to f$ and $G_n \to G$ almost everywhere. Then, if $\int G_n \to \int G$ finite, we have $\int f_n \to \int f$.

Proof of Lemma 2. See Pratt (1960). □

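Lemma 3 (Boundedness and Integrability Properties). Under the hypotheses of Propositions 2 and 3, we have that for all $(y, x) \in \mathcal{YX}$:

$$|D_{h_t}(u|x, t)| \le \|h_t\|_\infty, \tag{A.6}$$

and

$$|D_{h_t}(y|x, t)| \le \Delta(y|x, t) = \int_0^1 \frac{1\{|Q(u|x) - y| \le t\|h_t\|_\infty\}}{t}\, du, \tag{A.7}$$

where for any $x_t \to x \in \mathcal{X}$, as $t \to 0$,

$$\Delta(y|x_t, t) \to 2\|h\|_\infty f(y|x) \ \text{ for a.e. } y \in \mathcal{Y} \quad \text{and} \quad \int_{\mathcal{Y}} \Delta(y|x_t, t)\, dy \to \int_{\mathcal{Y}} 2\|h\|_\infty f(y|x)\, dy.$$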


Proof of Lemma 3. To show (A.6) note that

$$\sup_{x \in \mathcal{X},\, y \in \mathcal{Y}} |D_{h_t}(y|x, t)| \le \|h_t\|_\infty \tag{A.8}$$

immediately follows from the equivariance property noted in Claim (5) of Proposition 1.

The inequality (A.7) is trivial. That for any $x_t \to x \in \mathcal{X}$, $\Delta(y|x_t, t) \to 2\|h\|_\infty f(y|x)$ for a.e. $y \in \mathcal{Y}$ follows by applying Proposition 2 respectively with functions $h_t'(u|x) = \|h_t\|_\infty$ and $h_t'(u|x) = -\|h_t\|_\infty$ (for the case when $f(y|x) > 0$; and trivially otherwise). Similarly, that for any $y_t \to y \in \mathcal{Y}$, $\Delta(y_t|x, t) \to 2\|h\|_\infty f(y|x)$ for a.e. $x \in \mathcal{X}$ follows by Proposition 2 (for the case when $f(y|x) > 0$; and trivially otherwise).

Further, by Fubini's Theorem,

$$\int_{\mathcal{Y}} \Delta(y|x_t, t)\, dy = \int_0^1 \underbrace{\left(\int_{\mathcal{Y}} \frac{1\{|Q(u|x_t) - y| \le t\|h_t\|_\infty\}}{t}\, dy\right)}_{=:\, f_t(u)} du. \tag{A.9}$$

Note that $f_t(u) \le 2\|h_t\|_\infty$. Moreover, for almost every $u$, $f_t(u) = 2\|h_t\|_\infty$ for small enough $t$, and $2\|h_t\|_\infty$ converges to $2\|h\|_\infty$ as $t \to 0$. Then, trivially, $2\int_0^1 \|h_t\|_\infty\, du \to 2\|h\|_\infty$. By Lemma 2 the right hand side of (A.9) converges to $2\|h\|_\infty$. □

A.3. Proof of Proposition 4. Define mt(y|x, y′) := 1{y ≤ y′}g(y|x)Dht(y|x, t) and

m(y|x, y′) := 1{y ≤ y′}g(y|x)Dh(y|x). To show claim (1), we need to demostrate that

for any y′t → y′ and xt → x∫

Ymt(y|xt, y

′t)dy →

Ym(y|x, y′)dy, (A.10)

and that the limit is continuous in (x, y′). We have that |mt(y|xt, yt)| is bounded, for

some constant C, by C∆(y|xt, t) which converges a.e. and the integral of which converges

to a finite number by Lemma 3. Moreover, by Proposition 2, for almost every y we have

mt(y|xt, y′t) → m(y|x, y′). We conclude that (A.10) holds by Lemma 2.

In order to check continuity, we need to show that for any y′t → y′ and xt → x∫

Ym(y|xt, y

′t)dy →

Ym(y|x, y′)dy. (A.11)

We have that m(y|xt, y′t) → m(y|x, y′) for almost every y. Moreover, m(y|xt, yt) is

dominated by ‖g‖∞‖h‖∞f(y|xt), which converges to ‖g‖∞‖h‖∞f(y|x) for almost every

y, and, moreover, $\int_{\mathcal{Y}} \|g\|_\infty \|h\|_\infty f(y|x)\,dy$ converges to $\|g\|_\infty \|h\|_\infty$. Conclude that (A.11)

holds by Lemma 2.

To show claim (2), define $m_t(u|x,u') = 1\{u \le u'\}\,g(u|x)\,D_{h_t}(u|x)$ and $m(u|x,u') = 1\{u \le u'\}\,g(u|x)\,D_h(u|x)$. Here we need to show that for any $u'_t \to u'$ and $x_t \to x$

$$\int_{\mathcal{U}} m_t(u|x_t,u'_t)\,du \to \int_{\mathcal{U}} m(u|x,u')\,du, \tag{A.12}$$

and that the limit is continuous in (u′, x). We have that mt(u|xt, u′t) is bounded by

g(u|xt)‖ht‖∞, which converges to g(u|x)‖h‖∞ for a.e. u. Furthermore, the integral of

g(u|xt)‖ht‖∞ converges to the integral of g(u|x)‖h‖∞ by the dominated convergence

theorem. Moreover, by Proposition 2, we have that mt(u|xt, u′t) → m(u|x, u′) for almost

every u. We conclude that (A.12) holds by Lemma 2.

In order to check the continuity of the limit, we need to show that for any $u'_t \to u'$ and $x_t \to x$

$$\int_{\mathcal{U}} m(u|x_t,u'_t)\,du \to \int_{\mathcal{U}} m(u|x,u')\,du. \tag{A.13}$$

We have that m(u|xt, u′t) → m(u|x, u′) for almost every u. Moreover, for small enough

t, m(u|xt, u′t) is dominated by |g(u|xt)|‖h‖∞, which converges for almost every value of

u to |g(u|x)|‖h‖∞ as t → 0. Furthermore, the integral of |g(u|xt)|‖h‖∞ converges to the

integral of |g(u|x)|‖h‖∞ by dominated convergence theorem. We conclude that (A.13)

holds by Lemma 2. □

The following lemma will be used to prove Proposition 5:

Lemma 4. Assume that $Q(u)$ is a function mapping $\mathcal{U} := (0,1)$ to $K$, a bounded subset of $\mathbb{R}$, and that $Q_0(u)$ is a non-decreasing function mapping $\mathcal{U}$ to $K$. Think of $Q(u)$ as an approximation to $Q_0(u)$. Let $F_Q(y) = \int_{\mathcal{U}} 1\{Q(u) \le y\}\,du$ denote the distribution function of $Q(U)$ when $U \sim U(0,1)$. Let $Q^*(u) = F_Q^{-1}(u) = \inf\{y \in \mathbb{R} : F_Q(y) \ge u\}$. Then, for any $p \in [1,\infty]$,

$$\left[\int_{\mathcal{U}} |Q_0(u) - Q^*(u)|^p\,du\right]^{1/p} \le \left[\int_{\mathcal{U}} |Q_0(u) - Q(u)|^p\,du\right]^{1/p}.$$

Moreover, this inequality is strict provided (1) $p \in (1,\infty)$, (2) $Q(u)$ is decreasing on a subset of $\mathcal{U}$ that has positive Lebesgue measure, and (3) the true function $Q_0(u)$ is increasing on $\mathcal{U}$.

Proof of Lemma 4. A direct proof of this lemma is given in Proposition 1 of Chernozhukov, Fernandez-Val, Galichon (2006a). It is helpful to give a quick indirect proof of the weak inequality contained in the lemma using the following inequality due to Lorentz (1953): Let $Q$ and $G$ be two functions mapping $\mathcal{U}$ to $K$, a bounded subset of $\mathbb{R}$. Let $Q^*$ and $G^*$ denote their corresponding increasing rearrangements. Then, we have

$$\int_{\mathcal{U}} L(Q^*(u), G^*(u))\,du \le \int_{\mathcal{U}} L(Q(u), G(u))\,du,$$

for any submodular discrepancy function $L : \mathbb{R}^2 \mapsto \mathbb{R}_+$. In our case, $G(u) = Q_0(u) = G^*(u) = Q_0^*(u)$ almost everywhere. Thus, the true function is its own rearrangement. Moreover, $L(v,w) = |w - v|^p$ is submodular for $p \in [1,\infty)$. For the proof of the strict inequality, please refer to Chernozhukov, Fernandez-Val, Galichon (2006a), Proposition 1. For $p = \infty$, the inequality follows by taking the limit as $p \to \infty$. □

A.4. Proof of Proposition 5. This proposition is an immediate consequence of Lemma

4. □

A.5. Proof of Proposition 6. This proposition follows by the functional delta method (e.g. van der Vaart, 1998). Instead of restating what this method is, it takes less space to simply recall the proof in the current context.

To show the first part, consider the map $g_n(y,x|h) = \sqrt{n}\,(F(y|x, n^{-1/2}h) - F(y|x))$. The sequence of maps satisfies $g_{n'}(y,x|h_{n'}) \to D_h(y|x)$ in $\ell^\infty(K)$ for every subsequence $h_{n'} \to h$ in $\ell^\infty(\mathcal{UX}^*)$, where $h$ is continuous. It follows by the Extended Continuous Mapping Theorem that, in $\ell^\infty(K)$, $g_n(y,x|\sqrt{n}(\hat{Q}(u|x) - Q(u|x))) \Rightarrow D_G(y|x)$ as a stochastic process indexed by $(y,x)$, since $\sqrt{n}(\hat{Q}(u|x) - Q(u|x)) \Rightarrow G(u|x)$ in $\ell^\infty(K)$.

Conclude similarly for the second part. □

A.6. Proof of Proposition 7. This follows by the functional delta method, similarly

to the proof of Proposition 6. □

References

[1] Ait-Sahalia, Y. and Duarte, J. (2003),“Nonparametric option pricing under shape restrictions,”

Journal of Econometrics 116, pp. 9–47.

[2] Alvino, A., Lions, P. L. and Trombetti, G. (1989), “On Optimization Problems with Prescribed

Rearrangements,” Nonlinear Analysis 13 (2), pp. 185–220.

[3] Angrist, J., Chernozhukov, V., and I. Fernandez-Val (2006): “Quantile Regression under Misspec-

ification, with an Application to the U.S. Wage Structure,” Econometrica 74, pp. 539–563.

[4] Billingsley, P. (1995), Probability and measure. Third edition. Wiley Series in Probability and

Mathematical Statistics. A Wiley-Interscience Publication. John Wiley & Sons, Inc., New York.

[5] Chernozhukov, V., and I. Fernandez-Val (2005): “Subsampling Inference on Quantile Regression Processes,” Sankhya 67, pp. 253–276.

[6] Chernozhukov, V., Fernandez-Val, I., and A. Galichon (2006a): “Improving Estimates of Monotone

Functions by Rearrangement,” Preprint, available at arxiv.org and ssrn.com.

[7] Chernozhukov, V., Fernandez-Val, I., and A. Galichon (2006b): “An Addendum for Quantile and Probability Curves Without Crossing (Alternative Proof Directions and Explorations),” Preprint.

[8] Dette, H., Neumeyer, N., and K. Pilz (2006): “A simple Nonparametric Estimator of a Strictly

Monotone Regression Function,” Bernoulli, 12, no. 3, pp 469-490.

[9] Doksum, K. (1974): “Empirical Probability Plots and Statistical Inference for Nonlinear Models

in the Two-Sample Case,” Annals of Statistics 2, pp. 267–277.

[10] Doss, Hani; Gill, Richard D. (1992), “An elementary approach to weak convergence for quan-

tile processes, with applications to censored survival data.” Journal of the American Statistical

Association 87, no. 419, 869–877.

[11] Dudley, R. M., and R. Norvaisa (1999), Differentiability of six operators on nonsmooth functions

and p-variation. With the collaboration of Jinghua Qian. Lecture Notes in Mathematics, 1703.

Springer-Verlag, Berlin.

[12] Engel, E. (1857), “Die Produktions und Konsumptionsverhaltnisse des Konigreichs Sachsen,” Zeitschrift des Statistischen Bureaus des Koniglich Sachsischen Ministeriums des Innern, 8, pp. 1–54.

[13] Gill, R. D., and S. Johansen (1990), “A survey of product-integration with a view toward applica-

tion in survival analysis.” Annals of Statistics 18, no. 4, 1501–1555.

[14] Gutenbrunner, C., and J. Jureckova (1992): “Regression Quantile and Regression Rank Score

Process in the Linear Model and Derived Statistics,” Annals of Statistics 20, pp. 305-330.

[15] Hall, P., Wolff, R., and Yao, Q. (1999), “Methods for estimating a conditional distribution func-

tion,” Journal of the American Statistical Association 94, pp. 154–163.

[16] Hardy, G., Littlewood, J., and G. Polya (1952), Inequalities. Cambridge: Cambridge University

Press.

[17] He, X. (1997), “Quantile Curves Without Crossing,” American Statistician, 51, pp. 186–192.

[18] Koenker, R. (1994): “Confidence Intervals for Regression Quantiles,” in M.P. and M. Huskova

(eds.), Asymptotic Statistics: Proceeding of the 5th Prague Symposium on Asymptotic Statistics.

Physica-Verlag.

[19] Koenker, R. (2005), Quantile Regression. Econometric Society Monograph Series 38, Cambridge

University Press.

[20] Koenker, R., P. Ng (2005), “Inequality constrained quantile regression.” Sankhya 67, no. 2, 418–

440.

[21] Koenker, R., and Z. Xiao (2002): “Inference on the Quantile Regression Process,” Econometrica

70, no. 4, pp. 1583–1612.

[22] Lehmann, E. (1974): Nonparametrics: Statistical Methods Based on Ranks, San Francisco: Holden-

Day.

[23] Lorentz, G. G. (1953): “An Inequality for Rearrangements,” The American Mathematical Monthly

60, pp. 176–179.

[24] Milnor, J. (1965), Topology from the differential viewpoint, Princeton University Press.

[25] Mossino J. and R. Temam (1981), “Directional derivative of the increasing rearrangement mapping

and application to a queer differential equation in plasma physics,” Duke Math. J. 48 (3), 475–495.

[26] Portnoy, S. (1991), “Asymptotic behavior of regression quantiles in nonstationary, dependent

cases,” Journal of Multivariate Analysis 38 , no. 1, 100–113.

[27] Portnoy, S. , and R. Koenker (1997), “The Gaussian hare and the Laplacian tortoise: computability

of squared-error versus absolute-error estimators,” Statist. Sci. 12, no. 4, 279–300.

[28] Pratt, J.W. (1960), “On interchanging limits and integrals.” Annals of Mathematical Statistics 31,

74–77.

[29] Resnick, S. I. (1987), Extreme values, regular variation, and point processes, Applied Probability.

A Series of the Applied Probability Trust, 4. Springer-Verlag, New York.

[30] Vaart, A. van der (1998). Asymptotic statistics. Cambridge Series in Statistical and Probabilistic

Mathematics, 3.

[31] Vaart, A. van der, and J. Wellner (1996), Weak convergence and empirical processes: with appli-

cations to statistics, New York: Springer.

[32] Villani, C. (2003), Topics in Optimal Transportation, Providence: American Mathematical Society.

IMPROVING ESTIMATES OF MONOTONE FUNCTIONS BY REARRANGEMENT

Victor Chernozhukov
Iván Fernández-Val
Alfred Galichon

THE INSTITUTE FOR FISCAL STUDIES
DEPARTMENT OF ECONOMICS, UCL

cemmap working paper CWP09/07

IMPROVING ESTIMATES OF MONOTONE FUNCTIONS BY REARRANGEMENT

VICTOR CHERNOZHUKOV† IVAN FERNANDEZ-VAL§ ALFRED GALICHON‡

Abstract. Suppose that a target function f0 : Rd → R is monotonic, namely, weakly

increasing, and an original estimate f of the target function is available, which is not

weakly increasing. Many common estimation methods used in statistics produce such

estimates f . We show that these estimates can always be improved with no harm using

rearrangement techniques: The rearrangement methods, univariate and multivariate,

transform the original estimate to a monotonic estimate f∗, and the resulting estimate

is closer to the true curve f0 in common metrics than the original estimate f . We

illustrate the results with a computational example and an empirical example dealing

with age-height growth charts.

Key words. Monotone function, improved approximation, multivariate rearrange-

ment, univariate rearrangement, growth chart, quantile regression, mean regression,

series, locally linear, kernel methods

AMS Subject Classification. Primary 62G08; Secondary 46F10, 62F35, 62P10

Date: First version is of December, 2006. This version is of April 26, 2007. We would like to thank Andrew Chesher, Moshe Cohen, Emily Gallagher, Raymond Guiteras, Xuming He, Roger Koenker, Charles Manski, Costas Meghir, Ilya Molchanov, Steve Portnoy, Alp Simsek, and seminar participants at Columbia, Cornell, BU, Georgetown, MIT, MIT-Harvard, Northwestern, UBC, UCL, and UIUC for very useful comments that helped improve the paper.

† Massachusetts Institute of Technology, Department of Economics & Operations Research Center, University College London, CEMMAP, and The University of Chicago. Research support from the Castle Krob Chair, National Science Foundation, the Sloan Foundation, and CEMMAP is gratefully acknowledged.

§ Boston University, Department of Economics.

‡ Harvard University, Department of Economics. Research support from the Conseil General des Mines and the National Science Foundation is gratefully acknowledged.

1. Introduction

A common problem in statistics is to approximate an unknown monotonic function

on the basis of available samples. For example, biometric age-height charts should be

monotonic in age; econometric demand functions should be monotonic in price; and

quantile functions should be monotonic in the probability index. Suppose an original,

possibly non-monotonic, estimate is available. Then, the rearrangement operation from

variational analysis (Hardy, Littlewood, and Polya 1952, Lorentz 1953, Villani 2003)

can be used to monotonize the original estimate. The rearrangement has been shown

to be useful in producing monotonized estimates of conditional mean functions (Dette,

Neumeyer, and Pilz 2006, Dette and Pilz 2006) and various conditional quantile and

probability functions (Chernozhukov, Fernandez-Val, and Galichon (2006a, 2006b)). In

this paper, it is shown that the rearrangement of the original estimate is useful not

only for producing monotonicity, but also has the following important property: The

rearrangement always improves over the original estimate, whenever the latter is not

monotonic. Namely, the rearranged curves are always closer (often considerably closer)

to the target curve being estimated. Furthermore, this improvement property is generic,

i.e. it does not depend on the underlying specifics of the original estimate and applies

to both univariate and multivariate cases.

The paper is organized as follows. In Section 2.1, we motivate the monotonicity

issue in regression problems, and discuss common estimates/approximations of regression

functions that are not naturally monotonic. In Section 2.2, we analyze the improvements

in estimation/approximation properties that the rearranged estimates deliver. In Section

2.3, we discuss the computation of the rearrangement, using sorting and simulation. In

Section 2.4, we extend the analysis of Section 2.2 to multivariate functions. In Section 3,

we provide proofs of the main results. In Section 4, we present an empirical application to

biometric age-height charts. We show how the rearrangement monotonizes and improves

the original estimates of the conditional mean function in this example, and quantify

the improvement in a simulation example resembling the empirical one. In the same

section, we also analyze estimation of conditional quantile processes for height given age

that need to be monotonic in both age and the quantile index. We apply a multivariate

rearrangement to doubly monotonize the estimates both in age and the quantile index.

We show that the rearrangement monotonizes and improves the original estimates, and

quantify the improvement in a simulation example mimicking the empirical example. In

Section 5 we offer a summary and a conclusion.

2. Improving Approximations of Monotonic Functions

2.1. Common Estimates of Monotonic Functions. A basic problem in many ar-

eas of analysis is to approximate an unknown function f0 : Rd → R on the basis of

some available information. In statistics, the common problem is to approximate an

unknown regression function, such as the conditional mean or a conditional quantile, us-

ing an available sample. In numerical analysis, the common problem is to approximate

an intractable target function by a more tractable function on the basis of the target

function’s values at a collection of points.

Suppose we know that the target function f0 is monotonic, namely weakly increasing.

Suppose further that an original estimate f is available, which is not necessarily mono-

tonic. Many common estimation methods do indeed produce such estimates. Can these

estimates always be improved with no harm? The answer provided by this paper is yes:

the rearrangement method transforms the original estimate to a monotonic estimate

f ∗, and this estimate is in fact closer to the true curve f0 than the original estimate f

in common metrics. Furthermore, the rearrangement is computationally tractable, and

thus preserves the computational appeal of the original estimates.

Estimation methods, specifically the ones used in regression analysis, can be grouped

into global methods and local methods. An example of a global method is the series

estimator of f0 taking the form

$$f(x) = P_{k_n}(x)'\hat{b},$$

where $P_{k_n}(x)$ is a $k_n$-vector of suitable transformations of the variable $x$, such as B-splines, polynomials, and trigonometric functions. Section 4 lists specific examples in the context of an empirical example. The estimate $\hat{b}$ is obtained by solving the regression problem

$$\hat{b} = \arg\min_{b \in \mathbb{R}^{k_n}} \sum_{i=1}^{n} \rho(Y_i - P_{k_n}(X_i)'b),$$

where $(Y_i, X_i)$, $i = 1, ..., n$ denotes the data. In particular, using the square loss $\rho(w) = w^2$ produces estimates of the conditional mean of $Y_i$ given $X_i$ (Gallant 1981, Andrews 1991, Stone 1994, Newey 1997), while using the asymmetric absolute deviation loss $\rho(w) = (u - 1(w < 0))w$ produces estimates of the conditional $u$-quantile of $Y_i$ given $X_i$ (Koenker and Bassett 1978, Portnoy 1997, He and Shao 2000). Likewise, in numerical analysis “data” often consist of values $Y_i$ of a target function evaluated at a collection of mesh points $\{X_i, i = 1, ..., n\}$ and the mesh points themselves. The series estimates $x \mapsto f(x) = P_{k_n}(x)'\hat{b}$ are widely used in data analysis due to their good approximation properties and computational tractability. However, these estimates need not be naturally monotone, unless explicit constraints are added into the optimization program (for example, Matzkin (1994), Silvapulle and Sen (2005), and Koenker and Ng (2005)).

Examples of local methods include kernel and locally polynomial estimators. A kernel

estimator takes the form

$$f(x) = \arg\min_{b \in \mathbb{R}} \sum_{i=1}^{n} w_i\,\rho(Y_i - b), \qquad w_i = K\left(\frac{X_i - x}{h}\right),$$

where the loss function ρ plays the same role as above, K(u) is a standard, possibly

high-order, kernel function, and h > 0 is a vector of bandwidths (see, for example,

Wand and Jones (1995) and Ramsay and Silverman (2005)). The resulting estimate

$x \mapsto f(x)$ need not be naturally monotone. Dette, Neumeyer, and Pilz (2006) show

that the rearrangement transforms the kernel estimate into a monotonic one. We further

show here that the rearranged estimate necessarily improves upon the original estimate,

whenever the latter is not monotonic. The locally polynomial regression is a related

local method (Chaudhuri 1991, Fan and Gijbels 1996). In particular, the locally linear

estimator takes the form

$$(f(x), d(x)) = \arg\min_{b \in \mathbb{R},\, d \in \mathbb{R}} \sum_{i=1}^{n} w_i\,\rho(Y_i - b - d(X_i - x)), \qquad w_i = K\left(\frac{X_i - x}{h}\right).$$

The resulting estimate $x \mapsto f(x)$ may also be non-monotonic, unless explicit constraints

are added to the optimization problem. Section 4 illustrates the non-monotonicity of

the locally linear estimate in an empirical example.
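To make the two local estimators concrete, the following is a minimal sketch for the special case of the square loss, where the kernel estimator reduces to a weighted average and the locally linear estimator to a weighted least squares intercept. The Gaussian kernel, the bandwidth, the function names and the simulated data are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def gaussian_kernel(u):
    # Standard Gaussian kernel K(u); the choice of kernel is an assumption.
    return np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)

def kernel_mean(x, X, Y, h):
    # With square loss, argmin_b sum_i w_i (Y_i - b)^2 is the weighted average of Y_i.
    w = gaussian_kernel((X - x) / h)
    return np.sum(w * Y) / np.sum(w)

def local_linear_mean(x, X, Y, h):
    # Weighted least squares of Y on (1, X - x); the intercept is the estimate f(x).
    w = gaussian_kernel((X - x) / h)
    design = np.column_stack([np.ones_like(X), X - x])
    sw = np.sqrt(w)
    coef, *_ = np.linalg.lstsq(design * sw[:, None], Y * sw, rcond=None)
    return coef[0]

# Hypothetical noiseless linear data, used only to exercise the two functions.
rng = np.random.default_rng(0)
X = rng.uniform(0.0, 1.0, size=200)
Y = 2.0 + 3.0 * X
print(kernel_mean(0.4, X, Y, h=0.1), local_linear_mean(0.4, X, Y, h=0.1))
```

Neither function imposes monotonicity in x, which is the point made in the text: any monotonicity of the fitted curve is incidental unless constraints are added.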

In summary, there are many attractive estimation and approximation methods in sta-

tistics that do not necessarily produce monotonic estimates. These estimates do have

other attractive features though, such as good approximation properties and computa-

tional tractability. Below we show that the rearrangement operation applied to these

estimates produces (monotonic) estimates that improve the approximation properties of

the original estimates by bringing them closer to the target curve. Furthermore, the re-

arrangement is computationally tractable, and thus preserves the computational appeal

of the original estimates.

2.2. The Rearrangement and its Approximation Property: The Univariate

Case. In what follows, let X be a compact interval. Without loss of generality, it is

convenient to take this interval to be X = [0, 1]. Let f(x) be a measurable function

mapping $\mathcal{X}$ to $K$, a bounded subset of $\mathbb{R}$. Let $F_f(y) = \int_{\mathcal{X}} 1\{f(u) \le y\}\,du$ denote the distribution function of $f(X)$ when $X$ follows the uniform distribution on $[0,1]$. Let

$$f^*(x) := Q_f(x) := \inf\{y \in \mathbb{R} : F_f(y) \ge x\}$$

be the quantile function of $F_f(y)$. Thus,

$$f^*(x) := \inf\left\{ y \in \mathbb{R} : \left[\int_{\mathcal{X}} 1\{f(u) \le y\}\,du\right] \ge x \right\}.$$

This function f ∗ is called the increasing rearrangement of the function f .

Thus, the rearrangement operator simply transforms a function $f$ to its quantile function $f^*$. That is, $x \mapsto f^*(x)$ is the quantile function of the random variable $f(X)$ when $X \sim U(0,1)$. It is also convenient to think of the rearrangement as a sorting operation:

given values of the function f(x) evaluated at x in a fine enough net of equidistant

points, we simply sort the values in an increasing order. The function created in this

way is the rearrangement of f .
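The sorting view of the rearrangement can be sketched in a few lines. The grid size and the example estimate `f_hat` below are hypothetical choices made for illustration; only the sort itself reflects the operation described above.

```python
import numpy as np

def rearrange(f, num_points=1000):
    """Approximate the increasing rearrangement f* of f on [0, 1] by sorting."""
    # Midpoints of an equidistant grid on [0, 1]; the grid size is an assumption.
    grid = (np.arange(num_points) + 0.5) / num_points
    values = f(grid)
    # Sorting the values in increasing order gives f* evaluated on the same grid.
    return grid, values, np.sort(values)

# Hypothetical non-monotone estimate: an increasing trend plus a wiggle.
def f_hat(x):
    return x + 0.3 * np.sin(6 * np.pi * x)

grid, original, rearranged = rearrange(f_hat)
```

The rearranged values are a permutation of the original ones, so the distribution of function values is unchanged; only their order along the grid differs.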

The main point of this paper is the following:

Proposition 1. Let f0 : X → K be a weakly increasing measurable function in x, where

K is a bounded subset of R. This is the target function. Let f : X → K be another

measurable function, an initial estimate of the target function f0.

1. For any $p \in [1,\infty]$, the rearrangement of $f$, denoted $f^*$, weakly reduces the estimation error:

$$\left[\int_{\mathcal{X}} |f^*(x) - f_0(x)|^p\,dx\right]^{1/p} \le \left[\int_{\mathcal{X}} |f(x) - f_0(x)|^p\,dx\right]^{1/p}. \tag{2.1}$$

2. Suppose that there exist regions $\mathcal{X}_0$ and $\mathcal{X}'_0$, each of measure greater than $\delta > 0$, such that for all $x \in \mathcal{X}_0$ and $x' \in \mathcal{X}'_0$ we have that (i) $x' > x$, (ii) $f(x) > f(x') + \epsilon$, and (iii) $f_0(x') > f_0(x) + \epsilon$, for some $\epsilon > 0$. Then the gain in the quality of approximation is strict for $p \in (1,\infty)$. Namely, for any $p \in [1,\infty]$,

$$\left[\int_{\mathcal{X}} |f^*(x) - f_0(x)|^p\,dx\right]^{1/p} \le \left[\int_{\mathcal{X}} |f(x) - f_0(x)|^p\,dx - \delta\eta_p\right]^{1/p}, \tag{2.2}$$

where $\eta_p = \inf\{|v - t'|^p + |v' - t|^p - |v - t|^p - |v' - t'|^p\}$ and $\eta_p > 0$ for $p \in (1,\infty)$, with the infimum taken over all $v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$.

The first part of the proposition states the weak inequality (2.1), and the second part

states the strict inequality (2.2). For example, the inequality is strict for p ∈ (1,∞) if

the original estimate f(x) is decreasing on a subset of X having positive measure, while

the target function f0(x) is increasing on X (by increasing, we mean strictly increasing

throughout). Of course, if f0(x) is constant, then the inequality (2.1) becomes an equal-

ity, as the distribution of the rearranged function f ∗ is the same as the distribution of

the original function f , that is Ff∗ = Ff .

This proposition establishes that the rearranged estimate f ∗ has a smaller estimation

error in the Lp norm than the original estimate whenever the latter is not monotone.

This is a very useful and generally applicable property that is independent of the sample

size and of the way the original estimate f is obtained.
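A quick numerical check of the weak inequality (2.1) can be sketched as follows. The target `f0`, the perturbed estimate `f` and the grid size are hypothetical choices, and the integrals are replaced by averages over an equidistant grid, so this is an illustration rather than a proof.

```python
import numpy as np

def lp_error(values, target, p):
    # Riemann approximation of the L^p distance on an equidistant grid of [0, 1].
    return np.mean(np.abs(values - target) ** p) ** (1.0 / p)

num_points = 2000
grid = (np.arange(num_points) + 0.5) / num_points
f0 = grid ** 2                               # hypothetical increasing target
f = f0 + 0.2 * np.sin(8 * np.pi * grid)      # hypothetical non-monotone estimate
f_star = np.sort(f)                          # rearrangement by sorting

# Map each p to the pair (error of f, error of f*).
errors = {p: (lp_error(f, f0, p), lp_error(f_star, f0, p)) for p in (1, 2, 4)}
for p, (before, after) in errors.items():
    print(f"p={p}: original {before:.4f}, rearranged {after:.4f}")
```

Because the comparison only uses the grid values, it applies equally to any estimate that can be evaluated on a grid, regardless of how it was obtained.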

An indirect proof of the weak inequality (2.1) is a simple but important consequence

of the following classical inequality due to Lorentz (1953): Let q and g be two functions

mapping X to K, a bounded subset of R. Let q∗ and g∗ denote their corresponding

increasing rearrangements. Then,

$$\int_{\mathcal{X}} L(q^*(x), g^*(x), x)\,dx \le \int_{\mathcal{X}} L(q(x), g(x), x)\,dx,$$

for any submodular discrepancy function $L : \mathbb{R}^3 \mapsto \mathbb{R}$. Set q(x) = f(x), q∗(x) =

f ∗(x), g(x) = f0(x), and g∗(x) = f ∗0 (x). Now, note that in our case f ∗0 (x) = f0(x)

almost everywhere, that is, the target function is its own rearrangement. Moreover,

L(v, w, x) = |w − v|p is submodular for p ∈ [1,∞). This proves the first part of the

proposition above. For p = ∞, the first part follows by taking the limit as p →∞.
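The submodularity property invoked above is not spelled out in this passage. The following restates it in the standard form, which matches the inequality used in the direct proof in Section 3, together with a small numerical instance chosen here only for illustration.

```latex
\begin{align*}
&L(v,w) + L(v',w') \le L(v,w') + L(v',w)
  \quad \text{for all } v \le v',\ w \le w'. \\
&\text{Example with } L(v,w) = |w-v|^2,\ (v,v') = (0,1),\ (w,w') = (0,1): \\
&\underbrace{|0-0|^2 + |1-1|^2}_{=\,0} \;\le\; \underbrace{|1-0|^2 + |0-1|^2}_{=\,2}.
\end{align*}
```

In words, pairing small values with small values and large with large costs no more than the crossed pairing, which is why sorting the estimate against an increasing target cannot increase the discrepancy.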

In Section 3 we provide a proof of the strong inequality (2.2) as well as the direct proof

of the weak inequality (2.1). The direct proof illustrates how reductions of the estimation

error arise from even a partial sorting of the values of the estimate f . Moreover, the

direct proof characterizes the conditions for the strict reduction of the estimation error.

The following immediate implication of the above finite-sample result is also worth

emphasizing: The rearranged estimate f ∗ inherits the Lp rates of convergence from the

original estimates $f$. For $p \in [1,\infty]$, if $\lambda_n = [\int_{\mathcal{X}} |f_0(x) - f(x)|^p\,dx]^{1/p} = O_P(a_n)$ for some sequence of constants $a_n$, then $[\int_{\mathcal{X}} |f_0(x) - f^*(x)|^p\,dx]^{1/p} \le \lambda_n = O_P(a_n)$.

2.3. Computation of the Rearranged Estimate. One of the following methods can

be used for computing the rearrangement. Let {Xj, j = 1, ..., B} be either (1) a net of

equidistant points in [0, 1] or (2) a sample of i.i.d. draws from the uniform distribution

on [0, 1]. Then the rearranged estimate f ∗(u) at point u ∈ X can be approximately

computed as the u-quantile of the sample {f(Xj), j = 1, ..., B}. The first method is

deterministic, and the second is stochastic. Thus, for a given number of draws B, the

complexity of computing the rearranged estimate f ∗(u) in this way is equivalent to the

complexity of computing the sample u-quantile in the sample of size B.
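Both computational methods can be sketched with a sample quantile routine. The function names, the default number of evaluations and the example function are assumptions for illustration; `np.quantile` with its default settings interpolates between order statistics, so it is an approximation to the inf-based quantile used in the definition of the rearrangement.

```python
import numpy as np

def rearranged_at(f, u, num_evals=10_000, stochastic=False, seed=0):
    """Approximate f*(u) as the u-quantile of the sample {f(X_j), j = 1, ..., B}."""
    if stochastic:
        # Method (2): i.i.d. draws from the uniform distribution on [0, 1].
        x = np.random.default_rng(seed).uniform(0.0, 1.0, size=num_evals)
    else:
        # Method (1): a net of equidistant points in [0, 1] (cell midpoints).
        x = (np.arange(num_evals) + 0.5) / num_evals
    return np.quantile(f(x), u)

# Hypothetical example: f(x) = (2x - 1)^2, whose rearrangement is f*(u) = u^2.
def f_example(x):
    return (2.0 * x - 1.0) ** 2

approx_grid = rearranged_at(f_example, 0.5)
approx_draws = rearranged_at(f_example, 0.5, stochastic=True)
```

For this example both approximations should be close to 0.5^2 = 0.25, with the stochastic version subject to simulation error that shrinks as the number of draws grows.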

The number of evaluations B can depend on the problem. Suppose that the den-

sity function of the random variable f(X), when X ∼ U(0, 1), is bounded away from

zero over a neighborhood of f ∗(x). Then f ∗(x) can be computed with the accuracy

of $O_P(1/\sqrt{B})$, as $B \to \infty$, where the rate follows from the results of Knight (2002).

As shown in Chernozhukov, Fernandez-Val, and Galichon (2006a), the density of f(X),

denoted F ′f (t), exists if f(x) is continuously differentiable and the number of elements

in {x ∈ X : f ′(x) = 0} is bounded; in particular,

$$F_f'(t) = \sum_{x \in \{r \in \mathcal{X} : f(r) = t\}} \frac{1}{|f'(x)|}. \tag{2.3}$$

Thus, the density F ′f (t) is bounded away from zero if f ′(x) is bounded away from infinity.

Interestingly, the density has infinite poles at {t ∈ X : there is an x such that f ′(x) =

0 and f(x) = t}.
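As a worked instance of formula (2.3), consider the function $f(x) = (2x-1)^2$ on $[0,1]$. This function is a hypothetical example chosen for its simple algebra and does not come from the paper.

```latex
\begin{align*}
f(x) &= (2x-1)^2, \qquad f'(x) = 4(2x-1), \\
\{r \in \mathcal{X} : f(r) = t\} &= \left\{ \tfrac{1-\sqrt{t}}{2},\ \tfrac{1+\sqrt{t}}{2} \right\},
  \qquad t \in (0,1), \\
F_f'(t) &= \frac{1}{4\sqrt{t}} + \frac{1}{4\sqrt{t}} = \frac{1}{2\sqrt{t}}, \\
F_f(t) &= \Pr\{\, |2X-1| \le \sqrt{t} \,\} = \sqrt{t}, \qquad f^*(x) = x^2 .
\end{align*}
```

The density has a pole at $t = 0$, which is the value of $f$ at the single point $x = 1/2$ where $f'(x) = 0$, in line with the remark above.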

2.4. The Rearrangement and Its Approximation Property: The Multivariate

Case. In this section, we consider multivariate functions f : X d → K, where X d =

[0, 1]d and K is a bounded subset of R. The notion of monotonicity we seek to impose

on f is the following: We say that the function f is weakly increasing in x if f(x′) ≥ f(x)

whenever x′ ≥ x. The notation x′ = (x′1, ..., x′d) ≥ x = (x1, ..., xd) means that one vector

is weakly larger than the other in each of the components, that is, x′j ≥ xj for each

j = 1, ..., d. In what follows, we use the notation f(xj, x−j) to denote the dependence of

f on its j-th argument, xj, and all other arguments, x−j, that exclude xj. The notion

of monotonicity above is equivalent to the requirement that for each j in 1, ..., d the

mapping $x_j \mapsto f(x_j, x_{-j})$ is weakly increasing in $x_j$, for each $x_{-j}$ in $\mathcal{X}^{d-1}$.

Define the rearrangement operator $R_j$ and the rearranged function $f_j^*(x)$ with respect to the $j$-th argument as follows:

$$f_j^*(x) := R_j \circ f(x) := \inf\left\{ y : \left[\int_{\mathcal{X}} 1\{f(x'_j, x_{-j}) \le y\}\,dx'_j\right] \ge x_j \right\}.$$

This is the one-dimensional increasing rearrangement applied to the one-dimensional function $x_j \mapsto f(x_j, x_{-j})$, holding the other arguments $x_{-j}$ fixed. The rearrangement is

applied for every value of the other arguments x−j.

Let π = (π1, ..., πd) be an ordering, i.e. a permutation, of the integers 1, ..., d. Let us

define the π-rearrangement operator Rπ and the π-rearranged function f ∗π(x) as follows:

$$f_\pi^*(x) := R_\pi \circ f(x) := R_{\pi_1} \circ \dots \circ R_{\pi_d} \circ f(x).$$

For any ordering $\pi$, the $\pi$-rearrangement operator rearranges the function with respect to all of its arguments. As shown below, the resulting function $f_\pi^*(x)$ is weakly increasing

in x.

In general, two different orderings π and π′ of 1, ..., d can yield different rearranged

functions f ∗π(x) and f ∗π′(x). Therefore, to resolve the conflict among rearrangements

done with different orderings, we may consider averaging among them: letting Π be any

collection of distinct orderings π, we can define the average rearrangement as

$$f^*(x) := \frac{1}{|\Pi|} \sum_{\pi \in \Pi} f_\pi^*(x), \tag{2.4}$$

where |Π| denotes the number of elements in the set of orderings Π. As shown below, the

approximation error of the average rearrangement is weakly smaller than the average of

approximation errors of individual π-rearrangements.
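For a function tabulated on a regular grid, the π-rearrangement and the average rearrangement can be sketched by sorting along array axes. The function names, the zero-based axis convention, the grid and the example estimate are assumptions for illustration only.

```python
import itertools
import numpy as np

def pi_rearrange(values, ordering):
    """Apply R_{pi_1} o ... o R_{pi_d} to a function tabulated on a regular grid."""
    out = np.asarray(values, dtype=float)
    # The composition acts right to left, so the last axis in the ordering is sorted first.
    for axis in reversed(ordering):
        out = np.sort(out, axis=axis)
    return out

def average_rearrangement(values, orderings=None):
    """Average the pi-rearrangements over a collection of orderings (all by default)."""
    values = np.asarray(values, dtype=float)
    if orderings is None:
        orderings = list(itertools.permutations(range(values.ndim)))
    return np.mean([pi_rearrange(values, pi) for pi in orderings], axis=0)

# Hypothetical bivariate example on a 50 x 50 grid of cell midpoints in [0, 1]^2.
g = (np.arange(50) + 0.5) / 50
x1, x2 = np.meshgrid(g, g, indexing="ij")
f0 = x1 + x2                                                    # increasing target
f = f0 + 0.3 * np.sin(6 * np.pi * x1) * np.cos(4 * np.pi * x2)  # non-monotone estimate
f_pi = pi_rearrange(f, (0, 1))
f_avg = average_rearrangement(f)
```

Sorting along one axis after the other keeps the earlier axis sorted, so `f_pi` is weakly increasing in both arguments on the grid, and so is the average `f_avg`.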

The following proposition describes the properties of multivariate π-rearrangements:

Proposition 2. Let the target function $f_0 : \mathcal{X}^d \to K$ be weakly increasing and measurable in $x$. Let $f : \mathcal{X}^d \to K$ be a measurable function that is an initial estimate of the target function $f_0$. Let $\bar{f} : \mathcal{X}^d \to K$ be another estimate of $f_0$, which is measurable in $x$, including, for example, a rearranged $f$ with respect to some of the arguments. Then,

1. For each ordering $\pi$ of $1, ..., d$, the $\pi$-rearranged estimate $f_\pi^*(x)$ is weakly increasing in $x$. Moreover, $f^*(x)$, an average of $\pi$-rearranged estimates, is weakly increasing in $x$.

2. (a) For any $j$ in $1, ..., d$ and any $p$ in $[1,\infty]$, the rearrangement of $\bar{f}$ with respect to the $j$-th argument produces a weak reduction in the approximation error:

$$\left[\int_{\mathcal{X}^d} |\bar{f}_j^*(x) - f_0(x)|^p\,dx\right]^{1/p} \le \left[\int_{\mathcal{X}^d} |\bar{f}(x) - f_0(x)|^p\,dx\right]^{1/p}. \tag{2.5}$$

(b) Consequently, a $\pi$-rearranged estimate $f_\pi^*(x)$ of $f(x)$ weakly reduces the approximation error of the original estimate:

$$\left[\int_{\mathcal{X}^d} |f_\pi^*(x) - f_0(x)|^p\,dx\right]^{1/p} \le \left[\int_{\mathcal{X}^d} |f(x) - f_0(x)|^p\,dx\right]^{1/p}. \tag{2.6}$$

3. Suppose that $\bar{f}(x)$ and $f_0(x)$ have the following properties: there exist subsets $\mathcal{X}_j \subset \mathcal{X}$ and $\mathcal{X}'_j \subset \mathcal{X}$, each of measure $\delta > 0$, and a subset $\mathcal{X}_{-j} \subseteq \mathcal{X}^{d-1}$, of measure $\nu > 0$, such that for all $x = (x_j, x_{-j})$ and $x' = (x'_j, x_{-j})$, with $x'_j \in \mathcal{X}'_j$, $x_j \in \mathcal{X}_j$, $x_{-j} \in \mathcal{X}_{-j}$, we have that (i) $x'_j > x_j$, (ii) $\bar{f}(x) > \bar{f}(x') + \epsilon$, and (iii) $f_0(x') > f_0(x) + \epsilon$, for some $\epsilon > 0$.

(a) Then, for any $p \in [1,\infty]$,

$$\left[\int_{\mathcal{X}^d} |\bar{f}_j^*(x) - f_0(x)|^p\,dx\right]^{1/p} \le \left[\int_{\mathcal{X}^d} |\bar{f}(x) - f_0(x)|^p\,dx - \eta_p\delta\nu\right]^{1/p}, \tag{2.7}$$

where $\eta_p = \inf\{|v - t'|^p + |v' - t|^p - |v - t|^p - |v' - t'|^p\}$, and $\eta_p > 0$ for $p \in (1,\infty)$, with the infimum taken over all $v, v', t, t'$ in the set $K$ such that $v' \ge v + \epsilon$ and $t' \ge t + \epsilon$.

(b) Further, for an ordering $\pi = (\pi_1, ..., \pi_k, ..., \pi_d)$ with $\pi_k = j$, let $\bar{f}$ be a partially rearranged function, $\bar{f}(x) = R_{\pi_{k+1}} \circ \dots \circ R_{\pi_d} \circ f(x)$ (for $k = d$ we set $\bar{f}(x) = f(x)$). If the function $\bar{f}(x)$ and the target function $f_0(x)$ satisfy the condition stated above, then, for any $p \in [1,\infty]$,

$$\left[\int_{\mathcal{X}^d} |f_\pi^*(x) - f_0(x)|^p\,dx\right]^{1/p} \le \left[\int_{\mathcal{X}^d} |\bar{f}(x) - f_0(x)|^p\,dx - \eta_p\delta\nu\right]^{1/p}. \tag{2.8}$$

4. The approximation error of an average rearrangement is weakly smaller than the average approximation error of the individual $\pi$-rearrangements: For any $p \in [1,\infty]$,

$$\left[\int_{\mathcal{X}^d} |f^*(x) - f_0(x)|^p\,dx\right]^{1/p} \le \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \left[\int_{\mathcal{X}^d} |f_\pi^*(x) - f_0(x)|^p\,dx\right]^{1/p}. \tag{2.9}$$

This proposition generalizes the results of Proposition 1 to the multivariate case,

also demonstrating several features unique to the multivariate case. We see that the

π-rearranged functions are monotonic in all of the arguments. The rearrangement along

any argument improves the approximation properties of the estimate. Moreover, the

improvement is strict when the rearrangement with respect to a j-th argument is per-

formed on an estimate that is decreasing in the j-th argument, while the target function

is increasing in the same j-th argument, in the sense precisely defined in the proposition.

Moreover, averaging different π-rearrangements is better (on average) than using a single

π-rearrangement chosen at random. All other basic implications of the proposition are

similar to those discussed for the univariate case.
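As a rough numerical illustration of the proposition, the sketch below discretizes a bivariate estimate on a grid of equal-weight cells and rearranges it one argument at a time by sorting. Sorting grid values is a discrete stand-in for the quantile-function definition of the rearrangement; the helper names (`rearrange_axis`, `pi_rearrange`, `average_rearrangement`, `is_increasing`, `lp_error`) and the toy target and estimate are assumptions made for this example and do not come from the paper.

```python
from itertools import permutations

def rearrange_axis(grid, axis):
    """Sort grid values along one axis (0: down each column, 1: along each row)."""
    if axis == 1:
        return [sorted(row) for row in grid]
    sorted_cols = [sorted(col) for col in zip(*grid)]
    return [list(row) for row in zip(*sorted_cols)]

def pi_rearrange(grid, order):
    """Compose R_{order[0]} o ... o R_{order[-1]}; the last axis listed is sorted first."""
    out = [list(row) for row in grid]
    for axis in reversed(order):
        out = rearrange_axis(out, axis)
    return out

def average_rearrangement(grid):
    """Pointwise average of the pi-rearrangements over both orderings of the axes."""
    versions = [pi_rearrange(grid, order) for order in permutations((0, 1))]
    return [[sum(v[i][j] for v in versions) / len(versions)
             for j in range(len(grid[0]))] for i in range(len(grid))]

def is_increasing(grid):
    """True if the grid is weakly increasing along rows and along columns."""
    rows_ok = all(a <= b for row in grid for a, b in zip(row, row[1:]))
    cols_ok = all(a <= b for col in zip(*grid) for a, b in zip(col, col[1:]))
    return rows_ok and cols_ok

def lp_error(grid, target, p):
    """Discrete L^p distance on a grid of equal-weight cells."""
    terms = [abs(a - b) ** p for ra, rb in zip(grid, target) for a, b in zip(ra, rb)]
    return (sum(terms) / len(terms)) ** (1.0 / p)

n = 4
# Toy target, increasing in both arguments (hypothetical).
target = [[float(i + j) for j in range(n)] for i in range(n)]
# Toy non-monotone estimate: the target plus a checkerboard wiggle (hypothetical).
estimate = [[i + j + 1.5 * (-1) ** (i + j) for j in range(n)] for i in range(n)]

f_01 = pi_rearrange(estimate, (0, 1))
f_10 = pi_rearrange(estimate, (1, 0))
f_avg = average_rearrangement(estimate)
```

On this toy grid, `is_increasing` can be used to check Part 1, and comparing `lp_error(f_01, target, p)`, `lp_error(f_10, target, p)`, and `lp_error(f_avg, target, p)` with `lp_error(estimate, target, p)` gives a discrete analogue of inequalities (2.6) and (2.9).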

3. Proofs of Propositions

3.1. Proof of Proposition 1. The first part establishes the weak inequality, following

in part the strategy in Lorentz’s (1953) proof. The proof focuses directly on obtaining

the result stated in the proposition. The second part establishes the strong inequality.

Proof of Part 1. We assume at first that the functions $f(\cdot)$ and $f_0(\cdot)$ are simple functions, constant on the intervals $((s-1)/r, s/r]$, $s = 1, ..., r$. For any simple $f(\cdot)$ with $r$ steps, let $f$ denote the $r$-vector with the $s$-th element, denoted $f_s$, equal to the value of $f(\cdot)$ on the $s$-th interval. Let us define the sorting operator $S(f)$ as follows: let $\ell$ be an integer in $1, ..., r$ such that $f_\ell > f_m$ for some $m > \ell$. If no such $\ell$ exists, set $S(f) = f$. If $\ell$ exists, set $S(f)$ to be the $r$-vector with the $\ell$-th element equal to $f_m$, the $m$-th element equal to $f_\ell$, and all other elements equal to the corresponding elements of $f$. For any submodular function $L : \mathbb{R}^2 \to \mathbb{R}_+$, by $f_\ell \ge f_m$, $f_{0m} \ge f_{0\ell}$, and the definition of submodularity,

$$L(f_m, f_{0\ell}) + L(f_\ell, f_{0m}) \le L(f_\ell, f_{0\ell}) + L(f_m, f_{0m}).$$


Therefore, we conclude that

$$\int_{\mathcal{X}} L(S(f)(x), f_0(x))\,dx \le \int_{\mathcal{X}} L(f(x), f_0(x))\,dx, \tag{3.1}$$

using that we integrate simple functions.

Applying the sorting operation a sufficient finite number of times to f , we obtain a

completely sorted, that is, rearranged, vector f ∗. Thus, we can express f ∗ as a finite

composition f ∗ = S ◦ ... ◦ S(f) . By repeating the argument above, each composition

weakly reduces the approximation error. Therefore,

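$$\int_{\mathcal{X}} L(f^*(x), f_0(x))\,dx \le \int_{\mathcal{X}} L\big(\underbrace{S \circ \dots \circ S}_{\text{finite times}}(f)(x), f_0(x)\big)\,dx \le \int_{\mathcal{X}} L(f(x), f_0(x))\,dx. \tag{3.2}$$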
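The swap-by-swap argument for simple functions can be traced numerically. The following is a minimal sketch, assuming the loss $L(v,t) = |v-t|^p$ and an increasing target; the function names (`sort_step`, `loss`, `rearrange_by_swaps`) and the toy vectors are hypothetical and serve only to illustrate the sorting operator $S$.

```python
def sort_step(f):
    """One application of the sorting operator S: swap the first inverted pair, if any."""
    f = list(f)
    for l in range(len(f)):
        for m in range(l + 1, len(f)):
            if f[l] > f[m]:
                f[l], f[m] = f[m], f[l]
                return f
    return f  # no inverted pair: S(f) = f

def loss(f, f0, p=2):
    """Integral of |f - f0|^p for simple functions on r intervals of length 1/r."""
    r = len(f)
    return sum(abs(a - b) ** p for a, b in zip(f, f0)) / r

def rearrange_by_swaps(f, f0, p=2):
    """Apply S until nothing changes, recording the loss after each application."""
    current = list(f)
    path = [loss(current, f0, p)]
    while True:
        nxt = sort_step(current)
        if nxt == current:
            return current, path
        current = nxt
        path.append(loss(current, f0, p))

f0 = [1.0, 2.0, 3.0, 4.0, 5.0]   # increasing target (toy values)
f = [2.0, 1.5, 4.5, 3.0, 2.5]    # non-monotone estimate (toy values)
f_star, path = rearrange_by_swaps(f, f0)
```

Here `f_star` ends up equal to the sorted vector, and `path` records a weakly decreasing sequence of losses, mirroring the chain of inequalities in (3.2).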

Furthermore, this inequality is extended to general measurable functions $f(\cdot)$ and $f_0(\cdot)$ mapping $\mathcal{X}$ to $K$ by taking a sequence of bounded simple functions $f^{(r)}(\cdot)$ and $f_0^{(r)}(\cdot)$ converging to $f(\cdot)$ and $f_0(\cdot)$ almost everywhere as $r \to \infty$. The almost everywhere convergence of $f^{(r)}(\cdot)$ to $f(\cdot)$ implies the almost everywhere convergence of its quantile function $f^{*(r)}(\cdot)$ to the quantile function of the limit, $f^*(\cdot)$. Since inequality (3.2) holds along the sequence, the dominated convergence theorem implies that (3.2) also holds for the general case. □

Proof of Part 2. Let us first consider the case of simple functions, as defined in Part 1. We take the functions to satisfy the following hypotheses: there exist regions $\mathcal{X}_0$ and $\mathcal{X}'_0$, each of measure greater than $\delta > 0$, such that for all $x \in \mathcal{X}_0$ and $x' \in \mathcal{X}'_0$, we have that (i) $x' > x$, (ii) $f(x) > f(x') + \varepsilon$, and (iii) $f_0(x') > f_0(x) + \varepsilon$, for $\varepsilon > 0$ specified in the proposition. For any strictly submodular function $L : \mathbb{R}^2 \to \mathbb{R}_+$ we have that

$$\eta = \inf\{L(v', t) + L(v, t') - L(v, t) - L(v', t')\} > 0,$$

where the infimum is taken over all $v, v', t, t'$ in the set $K$ such that $v' \ge v + \varepsilon$ and $t' \ge t + \varepsilon$.

We can begin sorting by exchanging an element f(x), x ∈ X0, of r-vector f with an

element f(x′), x′ ∈ X ′0, of r-vector f . This induces a sorting gain of at least η times 1/r.

The total mass of points that can be sorted in this way is at least δ. We then proceed to

sort all of these points in this way, and then continue with the sorting of other points.


After the sorting is completed, the total gain from sorting is at least $\delta\eta$. That is,

$$\int_{\mathcal{X}} L(f^*(x), f_0(x))\,dx \le \int_{\mathcal{X}} L(f(x), f_0(x))\,dx - \delta\eta.$$

We then extend this inequality to the general measurable functions exactly as in the proof of Part 1. □
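For intuition about the size of the sorting gain, consider the power loss $L(v,t) = |v-t|^p$ that underlies the $L^p$ error bounds. The following worked illustration is not part of the original proof. For $p = 2$ the expression inside the infimum simplifies exactly:

$$
\begin{aligned}
(v'-t)^2 + (v-t')^2 - (v-t)^2 - (v'-t')^2
&= -2v't - 2vt' + 2vt + 2v't' \\
&= 2\,(v'-v)(t'-t) \;\ge\; 2\varepsilon^2 ,
\end{aligned}
$$

so $\eta \ge 2\varepsilon^2 > 0$ whenever $v' \ge v + \varepsilon$ and $t' \ge t + \varepsilon$. For $p = 1$, by contrast, the gain can vanish: with the illustrative values $v = 0$, $v' = 1$, $t = 5$, $t' = 6$ (assuming these lie in $K$ and $\varepsilon \le 1$), we get $|1-5| + |0-6| - |0-5| - |1-6| = 0$. This is consistent with strict positivity of the gain being claimed only for $p \in (1,\infty)$.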

3.2. Proof of Proposition 2. The proof consists of the following four parts.

Proof of Part 1. We prove the claim by induction. The claim is true for $d = 1$ by $f^*(x)$ being a quantile function. We then consider any $d \ge 2$. Suppose the claim is true in $d - 1$ dimensions. If so, then the estimate $\tilde f(x_j, x_{-j})$, obtained from the original estimate $f(x)$ after applying the rearrangement to all arguments $x_{-j}$ of $x$, except for the argument $x_j$, must be weakly increasing in $x_{-j}$ for each $x_j$. Thus, for any $x'_{-j} \ge x_{-j}$, we have that

$$\tilde f(X_j, x'_{-j}) \ge \tilde f(X_j, x_{-j}) \quad \text{for } X_j \sim U(0,1). \tag{3.3}$$

Therefore, the random variable on the left of (3.3) dominates the random variable on the right of (3.3) in the stochastic sense. Hence the quantile function of the random variable on the left dominates the quantile function of the random variable on the right, namely

$$f^*_j(x_j, x'_{-j}) \ge f^*_j(x_j, x_{-j}) \quad \text{for each } x_j \in \mathcal{X} = (0,1). \tag{3.4}$$

Moreover, for each $x_{-j}$, the function $x_j \mapsto f^*_j(x_j, x_{-j})$ is weakly increasing by virtue of being a quantile function. We conclude therefore that $x \mapsto f^*_j(x)$ is weakly increasing in all of its arguments at all points $x \in \mathcal{X}^d$. The claim of Part 1 of the Proposition now

follows by induction. □

Proof of Part 2 (a). By Proposition 1, we have that for each $x_{-j}$,

$$\int_{\mathcal{X}} \left|f^*_j(x_j, x_{-j}) - f_0(x_j, x_{-j})\right|^p dx_j \le \int_{\mathcal{X}} \left|f(x_j, x_{-j}) - f_0(x_j, x_{-j})\right|^p dx_j. \tag{3.5}$$

Now, the claim follows by integrating with respect to $x_{-j}$ and taking the $p$-th root of both sides. For $p = \infty$, the claim follows by taking the limit as $p \to \infty$. □

Proof of Part 2 (b). We first apply the inequality of Part 2(a) with the partially rearranged function $\tilde f(x) = f(x)$ in place of the estimate, then with $\tilde f(x) = R_{\pi_d} \circ f(x)$, then with $\tilde f(x) = R_{\pi_{d-1}} \circ R_{\pi_d} \circ f(x)$, and so on. In doing so, we recursively generate a sequence of weak inequalities that imply the inequality (2.6) stated in the Proposition. □


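Proof of Part 3 (a). For each $x_{-j} \in \mathcal{X}^{d-1} \setminus \mathcal{X}_{-j}$, by Part 2(a), we have the weak inequality (3.5), and for each $x_{-j} \in \mathcal{X}_{-j}$, by the inequality for the univariate case stated in Proposition 1 Part 2, we have the strong inequality

$$\int_{\mathcal{X}} \left|f^*_j(x_j, x_{-j}) - f_0(x_j, x_{-j})\right|^p dx_j \le \int_{\mathcal{X}} \left|f(x_j, x_{-j}) - f_0(x_j, x_{-j})\right|^p dx_j - \eta_p\delta, \tag{3.6}$$

where $\eta_p$ is defined in the same way as in Proposition 1. Integrating the weak inequality (3.5) over $x_{-j} \in \mathcal{X}^{d-1} \setminus \mathcal{X}_{-j}$, of measure $1 - \nu$, and the strong inequality (3.6) over $\mathcal{X}_{-j}$, of measure $\nu$, we obtain

$$\int_{\mathcal{X}^d} \left|f^*_j(x) - f_0(x)\right|^p dx \le \int_{\mathcal{X}^d} \left|f(x) - f_0(x)\right|^p dx - \eta_p\delta\nu. \tag{3.7}$$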

The claim now follows. □

Proof of Part 3 (b). As in the proof of Part 2(b), we can recursively obtain a sequence of weak inequalities describing the improvements in approximation error from rearranging sequentially with respect to the individual arguments. Moreover, at least one of the inequalities can be strengthened to be of the form stated in (3.7), from the assumption of the claim. The resulting system of inequalities yields the inequality (2.8), stated in the proposition. □

Proof of Part 4. We can write

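$$\left[\int_{\mathcal{X}^d} \left|f^*(x) - f_0(x)\right|^p dx\right]^{1/p} = \left[\int_{\mathcal{X}^d} \left|\frac{1}{|\Pi|}\sum_{\pi \in \Pi}\big(f^*_\pi(x) - f_0(x)\big)\right|^p dx\right]^{1/p} \le \frac{1}{|\Pi|}\sum_{\pi \in \Pi}\left[\int_{\mathcal{X}^d} \left|f^*_\pi(x) - f_0(x)\right|^p dx\right]^{1/p}, \tag{3.8}$$

where the last inequality follows by pulling out $1/|\Pi|$ and then applying the triangle inequality for the $L^p$ norm. □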
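Written in norm notation, the two steps of this argument can be spelled out as follows. This is only a restatement under the shorthand $g_\pi = f^*_\pi - f_0$ and $\|\cdot\|_p$ for the $L^p$ norm on $\mathcal{X}^d$, both of which are notation introduced here for illustration. Since $f_0$ does not depend on $\pi$, the average rearrangement satisfies $f^* - f_0 = |\Pi|^{-1}\sum_{\pi \in \Pi} g_\pi$, and therefore

$$
\Big\| \frac{1}{|\Pi|} \sum_{\pi \in \Pi} g_\pi \Big\|_p
= \frac{1}{|\Pi|} \Big\| \sum_{\pi \in \Pi} g_\pi \Big\|_p
\le \frac{1}{|\Pi|} \sum_{\pi \in \Pi} \| g_\pi \|_p ,
$$

where the equality uses the homogeneity of the norm and the inequality is Minkowski's inequality, valid for every $p \in [1,\infty]$.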

4. Illustrations

In this section we provide an empirical application to biometric age-height charts. We show how the rearrangement monotonizes and improves various nonparametric estimates, and then we quantify the improvement in a simulation example that mimics the empirical application.


4.1. An Empirical Illustration with Age-Height Reference Charts. Since their introduction by Quetelet in the 19th century, reference growth charts have become common tools to assess an individual's health status. These charts describe the evolution of individual anthropometric measures, such as height, weight, and body mass index, across different ages. See Cole (1988) for a classical work on the subject and Wei, Pere, Koenker, and He (2006) for a recent analysis from a quantile regression perspective and additional references.

To illustrate the properties of the rearrangement method we consider the estimation of growth charts for height. It is clear that height should naturally follow an increasing relationship with age. Our data consist of repeated cross-sectional measurements of height and age from the 2003-2004 National Health and Nutrition Examination Survey, collected by the National Center for Health Statistics. Height is measured as standing height in centimeters, and age is recorded in months and expressed in years. To avoid confounding factors that might affect the relationship between age and height, we restrict the sample to US-born white males aged two through twenty. Our final sample consists of 533 subjects almost evenly distributed across these ages.

Let $Y$ and $X$ denote height and age, respectively. Let $E[Y|X = x]$ denote the conditional expectation of $Y$ given $X = x$, and $Q_Y(u|X = x)$ denote the $u$-th quantile of $Y$ given $X = x$, where $u$ is the quantile index. The population functions of interest are (1) the conditional expectation function (CEF), (2) the conditional quantile functions (CQF) for several quantile indices (5%, median, and 95%), and (3) the entire conditional quantile process (CQP) for height given age. In the first case, the target function $x \mapsto f_0(x)$ is $x \mapsto E[Y|X = x]$; in the second case, the target function $x \mapsto f_0(x)$ is $x \mapsto Q_Y(u|X = x)$, for $u$ = 5%, 50%, and 95%; and, in the third case, the target function $(u, x) \mapsto f_0(u, x)$ is $(u, x) \mapsto Q_Y(u|X = x)$. The natural monotonicity requirements for the target functions are the following: the CEF $x \mapsto E[Y|X = x]$ and the CQF $x \mapsto Q_Y(u|X = x)$ should be increasing in age $x$, and the CQP $(u, x) \mapsto Q_Y(u|X = x)$ should be increasing in both age $x$ and the quantile index $u$.

We estimate the target functions using nonparametric ordinary least squares or quantile regression techniques and then rearrange the estimates to satisfy the monotonicity requirements. We consider (a) kernel, (b) local linear, (c) spline, (d) global polynomial, (e) Fourier, and (f) flexible Fourier methods. For the kernel method, we provide a fit


on a cell-by-cell basis, with each cell corresponding to one month. For the local linear

method, we choose a bandwidth of one year and a box kernel. For the spline method,

we use cubic B-splines with a knot sequence {3, 5, 8, 10, 11.5, 13, 14.5, 16, 18}, following

Wei, Pere, Koenker, and He (2006). For the global polynomial method, we fit a quartic

polynomial. For the Fourier method, we employ eight trigonometric terms, with four

sines and four cosines. For the flexible Fourier method, we use a quadratic polynomial

and four trigonometric terms, with two sines and two cosines. Finally, for the estimation of the conditional quantile process, we use a net of 199 quantile indices {0.005, 0.010, ..., 0.995}. In the choice of the parameters for the different methods, we select values that either have been used in previous empirical work or give rise to specifications with similar complexities for the different methods.
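To make the series specifications concrete, the sketch below builds one row of regressors for the quartic, Fourier, and flexible Fourier fits, together with the net of quantile indices. It is an illustration under stated assumptions: the text does not say how age is scaled before the trigonometric terms are formed, so the linear map of ages 2 to 20 onto $[0, 2\pi]$ in `scaled_age`, the inclusion of a constant in the Fourier basis, and all function names are hypothetical choices.

```python
import math

def scaled_age(x, lo=2.0, hi=20.0):
    """Map age in [lo, hi] to [0, 2*pi]; this rescaling is an assumption."""
    return 2.0 * math.pi * (x - lo) / (hi - lo)

def quartic_basis(x):
    """Global polynomial of degree four: 1, x, x^2, x^3, x^4."""
    return [x ** k for k in range(5)]

def fourier_basis(x, n_pairs=4):
    """Constant plus n_pairs sine/cosine pairs (eight trigonometric terms by default)."""
    z = scaled_age(x)
    terms = [1.0]
    for k in range(1, n_pairs + 1):
        terms.extend([math.sin(k * z), math.cos(k * z)])
    return terms

def flexible_fourier_basis(x):
    """Quadratic polynomial plus two sine/cosine pairs (four trigonometric terms)."""
    z = scaled_age(x)
    return [1.0, x, x * x,
            math.sin(z), math.cos(z), math.sin(2 * z), math.cos(2 * z)]

# Net of quantile indices 0.005, 0.010, ..., 0.995 with step 0.005.
quantile_net = [k / 200.0 for k in range(1, 200)]
```

With these choices the quartic, Fourier, and flexible Fourier rows have 5, 9, and 7 regressors respectively, and the net running from 0.005 to 0.995 in steps of 0.005 contains 199 indices.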

Panels A-F of Figure 1 show the original and rearranged estimates of the conditional expectation function for the different methods. All the estimated curves have trouble capturing the slowdown in the growth of height after age sixteen and yield non-monotonic curves for the highest values of age. The Fourier series have special difficulty approximating the aperiodic age-height relationship. The rearranged estimates correct the non-monotonicity of the original estimates, providing weakly increasing curves that coincide with the original estimates in the parts where the latter are monotonic. Moreover, the rearranged estimates necessarily improve upon the original estimates, since, by the theoretical results derived earlier, they are closer to the true functions than the original estimates. We quantify this improvement in the next subsection.
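On an equally spaced grid, the univariate increasing rearrangement amounts to sorting the fitted values. The sketch below applies this to a hypothetical curve that rises until age 17 and then dips, loosely mimicking the pattern described above; `toy_estimate` and its coefficients are invented for illustration and are not the paper's fits.

```python
def toy_estimate(age):
    """Hypothetical fitted curve: rises to age 17, then dips (not the paper's fit)."""
    if age <= 17.0:
        return 80.0 + 6.0 * age
    return 182.0 - 2.0 * (age - 17.0)

ages = [2.0 + 0.5 * k for k in range(37)]    # grid 2, 2.5, ..., 20
original = [toy_estimate(a) for a in ages]
rearranged = sorted(original)                # increasing rearrangement on the grid

# Ages at which the rearranged curve differs from the original one.
changed_ages = [a for a, o, r in zip(ages, original, rearranged) if o != r]
```

In this toy case the two curves coincide over the early, monotone stretch, and the changes begin at age 16.5, slightly before the dip starts, because the values taken during the dip overlap with those taken just before it.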

Figure 2 displays similar but more pronounced non-monotonicity patterns for the

estimates of the conditional quantile functions. The rearrangement again performs well

in delivering curves that improve upon the original estimates and that satisfy the natural

monotonicity requirement.

Figures 3-7 illustrate the multivariate rearrangement of the conditional quantile process (CQP) along both the age and the quantile index arguments. We plot in three dimensions the original estimate, its age rearrangement, its quantile rearrangement, and its average multivariate rearrangement (the average of the age-quantile and quantile-age rearrangements). We also plot the corresponding contour surfaces. (Here, we do not show the multivariate age-quantile and quantile-age rearrangements separately, because


they are very similar to the average multivariate rearrangement.) We see from the contour plots that, for all of the estimation methods considered, the estimated CQP is non-monotone in age and non-monotone in the quantile index at extremal values of this index. The contour plots for the estimates based on the Fourier series best illustrate the non-monotonicity problem. We see that the average multivariate rearrangement fixes the non-monotonicity problem and delivers an estimate of the CQP that is monotone in both the age and the quantile index arguments. Furthermore, by the theoretical results of the paper, the multivariate rearranged estimates necessarily improve upon the original estimates.

4.2. Monte Carlo Illustration. The following Monte Carlo experiment quantifies the improvement in the estimation/approximation properties of the rearranged estimates relative to the original estimates. The experiment closely matches the empirical application presented above.

Specifically, we consider the design where the outcome variable $Y$ equals a location function plus a disturbance $\epsilon$, $Y = Z(X)'\beta + \epsilon$, and the disturbance is independent of the regressor $X$. The vector $Z(X)$ includes a constant and a piecewise linear transformation of the regressor $X$ with three changes of slope, namely

$$Z(X) = \big(1,\; X,\; 1\{X > 5\}\cdot(X - 5),\; 1\{X > 10\}\cdot(X - 10),\; 1\{X > 15\}\cdot(X - 15)\big).$$

This design implies the conditional expectation function

$$E[Y|X] = Z(X)'\beta, \tag{4.1}$$

and the conditional quantile function

$$Q_Y(u|X) = Z(X)'\beta + Q_\epsilon(u). \tag{4.2}$$

We select the parameters of the design to match the empirical example of growth charts

in the previous subsection. Thus, we set the parameter β equal to the ordinary least

squares estimate obtained in the growth chart data, namely (71.25, 8.13, −2.72, 1.78,

−6.43). This parameter value and the location specification (4.2) imply a model for CEF

and CQP that is monotone in age over the range of 2-20. To generate the values of the

dependent variable, we draw disturbances from a normal distribution with the mean and

variance equal to the mean and variance of the estimated residuals, ε = Y − Z(X)′β,

in the growth chart data. We fix the regressor X in all of the replications to be the


observed values of age in the growth chart data set. In each replication, we estimate the CEF and CQP using the nonparametric methods described in the previous subsection.
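The design can be written down directly from (4.1) and (4.2). The sketch below uses the coefficient vector reported in the text and checks that the implied CEF is increasing over ages 2 to 20. The residual standard deviation is not reported in the text, so `SIGMA` is a placeholder, and the disturbance is taken to be mean-zero normal as an assumption; the function names are likewise hypothetical.

```python
from statistics import NormalDist

BETA = (71.25, 8.13, -2.72, 1.78, -6.43)   # coefficient vector reported in the text
SIGMA = 5.0                                 # hypothetical residual s.d.; not given in the text

def z(x):
    """Piecewise linear regressors; 1{x > c} * (x - c) equals max(x - c, 0)."""
    return (1.0, x, max(x - 5.0, 0.0), max(x - 10.0, 0.0), max(x - 15.0, 0.0))

def cef(x):
    """Conditional expectation function (4.1): Z(x)'beta."""
    return sum(b * zk for b, zk in zip(BETA, z(x)))

def cqf(u, x, sigma=SIGMA):
    """Conditional quantile function (4.2), assuming mean-zero normal disturbances."""
    return cef(x) + NormalDist(0.0, sigma).inv_cdf(u)

ages = [2.0 + 0.25 * k for k in range(73)]  # grid over ages 2 to 20
cef_values = [cef(a) for a in ages]

# Slope of the CEF on each linear segment: [2,5], [5,10], [10,15], [15,20].
segment_slopes = [8.13, 8.13 - 2.72, 8.13 - 2.72 + 1.78, 8.13 - 2.72 + 1.78 - 6.43]
```

All four segment slopes are positive, which is why this parameter value yields a CEF, and hence a location-model CQP, that is monotone in age over the range considered.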

In Table 1 we report the average Lp errors (for p = 1, 2, 3, 4 and ∞) for the original

estimates and the rearranged estimates of the CEF. We also report the relative efficiency

of the two estimates, measured as the ratio of the average error of the rearranged estimate

to the average error of the original estimate. We calculate the average Lp error as the

Monte Carlo average of

$$L^p := \left[\int_{\mathcal{X}} \left|f(x) - f_0(x)\right|^p dx\right]^{1/p},$$

where the target function f0(x) is the CEF E[Y |X = x], and the estimate f(x) denotes

either the original nonparametric estimate of the CEF or its increasing rearrangement.

For all of the methods considered, we find that the rearranged curves estimate the true

CEF more accurately than the original curves, providing a 2% to 84% reduction in the

average error, depending on the method and the norm (i.e. values of p).
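The error measure and the relative efficiency ratio can be sketched as follows. This is a toy Monte Carlo, not a replication of Table 1: the "estimate" is simply an increasing target plus independent Gaussian noise on a grid, the integral is approximated by a Riemann sum, and the names `lp_error` and `average_errors`, the grid, the noise level, and the number of replications are all assumptions.

```python
import random

def lp_error(estimate, target, p, dx):
    """Riemann-sum approximation of the L^p error; p = inf gives the sup norm."""
    diffs = [abs(a - b) for a, b in zip(estimate, target)]
    if p == float("inf"):
        return max(diffs)
    return (sum(d ** p for d in diffs) * dx) ** (1.0 / p)

def average_errors(target, noise_sd, p, reps, seed=0):
    """Monte Carlo averages of the L^p error for original and rearranged estimates."""
    rng = random.Random(seed)
    dx = 1.0 / len(target)
    total_original = total_rearranged = 0.0
    for _ in range(reps):
        estimate = [t + rng.gauss(0.0, noise_sd) for t in target]
        total_original += lp_error(estimate, target, p, dx)
        total_rearranged += lp_error(sorted(estimate), target, p, dx)
    return total_original / reps, total_rearranged / reps

target = [90.0 + 0.5 * k for k in range(181)]   # increasing toy target on a grid
ratios = {}
for p in (1, 2, 3, 4, float("inf")):
    avg_o, avg_r = average_errors(target, noise_sd=2.0, p=p, reps=200)
    ratios[p] = avg_r / avg_o                   # relative efficiency, as in the table's ratio column
```

Because sorting weakly reduces the error in every replication when the target is increasing, each entry of `ratios` is at most one; how far below one it falls depends on the noise level and on `p`.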

In Table 2 we report the average $L^p$ errors for the original estimates of the conditional quantile process and their multivariate rearrangement with respect to the age and quantile index arguments. We also report the ratio of the average error of the rearranged estimate to the average error of the original estimate. The average $L^p$ error is the Monte Carlo average of

$$L^p := \left[\int_{\mathcal{U}}\int_{\mathcal{X}} \left|f(u, x) - f_0(u, x)\right|^p dx\,du\right]^{1/p},$$

where the target function $f_0(u, x)$ is the conditional quantile process $Q_Y(u|X = x)$, and the estimate $f(u, x)$ denotes either the original nonparametric estimate of the conditional quantile process or its multivariate rearrangement. We present the results for the average multivariate rearrangement only. The age-quantile and quantile-age multivariate rearrangements give errors that are very similar to their average multivariate rearrangement, and we therefore do not report them separately. For all the methods considered, we find that the multivariate rearranged curves estimate the true CQP more accurately than the original curves, providing a 4% to 74% reduction in the approximation error, depending on the method and the norm.


In Table 3 we report the average $L^p$ error for the univariate rearrangements of the conditional quantile process along either the age argument or the quantile index argument. We also report the ratio of the average error for these rearrangements to the average error of the original estimates. For all of the methods considered, we find that these rearranged curves estimate the true CQP more accurately than the original curves, providing noticeable reductions in the estimation error. Moreover, in this case the rearrangement along the age argument is more effective in reducing the estimation error than the rearrangement along the quantile index. Furthermore, by comparing Tables 2 and 3, we also see that the multivariate rearrangement provides an improvement over the individual univariate rearrangements, yielding estimates of the CQP that are often much closer to the true process.
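The comparison between univariate and multivariate rearrangements can be mimicked on a toy quantile process. In the sketch below, rows are indexed by the quantile index u and columns by age x; the location-model target, the noise level, and the helper names (`sort_rows`, `sort_cols`, `lp`) are assumptions chosen for illustration and do not reproduce the paper's design or tables.

```python
import random
from statistics import NormalDist

def sort_rows(g):
    """Rearrange in age: sort across x within each row (fixed u)."""
    return [sorted(row) for row in g]

def sort_cols(g):
    """Rearrange in the quantile index: sort across u within each column (fixed x)."""
    return [list(r) for r in zip(*[sorted(c) for c in zip(*g)])]

def lp(g, g0, p):
    """Discrete L^p distance on a grid of equal-weight cells."""
    n = len(g) * len(g[0])
    total = sum(abs(a - b) ** p for ra, rb in zip(g, g0) for a, b in zip(ra, rb))
    return (total / n) ** (1.0 / p)

us = [k / 20.0 for k in range(1, 20)]   # quantile indices 0.05, ..., 0.95
xs = [2.0 + k for k in range(19)]       # ages 2, ..., 20
# Toy target process, increasing in both u and x (hypothetical location model).
target = [[80.0 + 5.0 * x + 6.0 * NormalDist().inv_cdf(u) for x in xs] for u in us]
rng = random.Random(1)
estimate = [[t + rng.gauss(0.0, 8.0) for t in row] for row in target]

r_x = sort_rows(estimate)               # rearranged in age only
r_u = sort_cols(estimate)               # rearranged in the quantile index only
r_ux = sort_cols(r_x)                   # age first, then quantile index
r_xu = sort_rows(r_u)                   # quantile index first, then age
r_avg = [[(a + b) / 2.0 for a, b in zip(ra, rb)] for ra, rb in zip(r_ux, r_xu)]

errors = {name: lp(g, target, 2) for name, g in [
    ("original", estimate), ("age", r_x), ("quantile", r_u),
    ("age then quantile", r_ux), ("quantile then age", r_xu), ("average", r_avg)]}
```

The theory guarantees that each univariate rearrangement weakly improves on the original, that adding the second rearrangement weakly improves on the first, and that the average is no worse than the mean of the two orderings; which single argument helps more depends on the toy design.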

5. Conclusion

Suppose that a target function is known to be weakly increasing, and we have an original estimate of this function that is not weakly increasing. Common estimation methods often produce estimates of this kind. We show that these estimates can always be improved using rearrangement techniques. The univariate and multivariate rearrangement methods transform the original estimate into a monotonic estimate. The resulting monotonic estimate is in fact closer to the target function in common metrics than the original estimate. We illustrate these theoretical results with a computational example and an empirical example, both dealing with the estimation of conditional mean and quantile functions of height given age. The rearrangement both monotonizes and improves the original non-monotone estimates. It would be interesting to determine whether this improved estimation/approximation property carries over to other methods of monotonization. We leave this extension for future research.

References

Andrews, D. W. K. (1991): “Asymptotic normality of series estimators for nonparametric and semiparametric regression models,” Econometrica, 59(2), 307–345.

Chaudhuri, P. (1991): “Nonparametric estimates of regression quantiles and their local Bahadur representation,” Ann. Statist., 19, 760–777.

Chernozhukov, V., I. Fernandez-Val, and A. Galichon (2006a): “Quantile and Probability Curves without Crossing,” MIT Working Paper, www.arxiv.org and www.ssrn.com.

Chernozhukov, V., I. Fernandez-Val, and A. Galichon (2006b): “Rearranging Edgeworth-Cornish-Fisher Expansions,” MIT Working Paper, coming to www.arxiv.org and www.ssrn.com.

Cole, T. J. (1988): “Fitting smoothed centile curves to reference data,” J Royal Stat Soc, 151, 385–418.

Dette, H., N. Neumeyer, and K. F. Pilz (2006): “A simple nonparametric estimator of a strictly monotone regression function,” Bernoulli, 12(3), 469–490.

Dette, H., and K. F. Pilz (2006): “A comparative study of monotone nonparametric kernel estimates,” J. Stat. Comput. Simul., 76(1), 41–56.

Fan, J., and I. Gijbels (1996): Local polynomial modelling and its applications, vol. 66 of Monographs on Statistics and Applied Probability. Chapman & Hall, London.

Gallant, A. R. (1981): “On the bias in flexible functional forms and an essentially unbiased form: the Fourier flexible form,” J. Econometrics, 15(2), 211–245.

Hardy, G. H., J. E. Littlewood, and G. Polya (1952): Inequalities. Cambridge University Press, 2d ed.

He, X., and Q.-M. Shao (2000): “On parameters of increasing dimensions,” J. Multivariate Anal., 73(1), 120–135.

Knight, K. (2002): “What are the limiting distributions of quantile estimators?,” in Statistical data analysis based on the L1-norm and related methods (Neuchatel, 2002), Stat. Ind. Technol., pp. 47–65. Birkhauser, Basel.

Koenker, R., and G. S. Bassett (1978): “Regression quantiles,” Econometrica, 46, 33–50.

Koenker, R., and P. Ng (2005): “Inequality constrained quantile regression,” Sankhya, 67(2), 418–440.

Lorentz, G. G. (1953): “An inequality for rearrangements,” Amer. Math. Monthly, 60, 176–179.

Matzkin, R. L. (1994): “Restrictions of economic theory in nonparametric methods,” in Handbook of econometrics, Vol. IV, vol. 2 of Handbooks in Econom., pp. 2523–2558. North-Holland, Amsterdam.

Newey, W. K. (1997): “Convergence rates and asymptotic normality for series estimators,” J. Econometrics, 79(1), 147–168.

Portnoy, S. (1997): “Local asymptotics for quantile smoothing splines,” Ann. Statist., 25(1), 414–434.

Ramsay, J. O., and B. W. Silverman (2005): Functional data analysis. Springer, New York, second edn.

Silvapulle, M. J., and P. K. Sen (2005): Constrained statistical inference, Wiley Series in Probability and Statistics. Wiley-Interscience [John Wiley & Sons], Hoboken, NJ.

Stone, C. J. (1994): “The use of polynomial splines and their tensor products in multivariate function estimation,” Ann. Statist., 22(1), 118–184, With discussion by Andreas Buja and Trevor Hastie and a rejoinder by the author.

Villani, C. (2003): Topics in optimal transportation, vol. 58 of Graduate Studies in Mathematics. American Mathematical Society, Providence, RI.

Wand, M. P., and M. C. Jones (1995): Kernel smoothing, vol. 60 of Monographs on Statistics and Applied Probability. Chapman and Hall Ltd., London.

Wei, Y., A. Pere, R. Koenker, and X. He (2006): “Quantile regression methods for reference growth charts,” Stat. Med., 25(8), 1369–1382.


Table 1. $L^p$ Estimation/Approximation Error of Original and Rearranged Estimates of the Conditional Expectation Function, for p = 1, 2, 3, 4, and ∞. Univariate Rearrangement.

| p | $L^p_O$ | $L^p_R$ | $L^p_R/L^p_O$ | $L^p_O$ | $L^p_R$ | $L^p_R/L^p_O$ |
|---|---|---|---|---|---|---|
| | A. Kernel | | | B. Local Polynomial | | |
| 1 | 3.69 | 1.33 | 0.36 | 0.79 | 0.76 | 0.96 |
| 2 | 4.80 | 1.84 | 0.38 | 1.00 | 0.96 | 0.96 |
| 3 | 5.81 | 2.46 | 0.42 | 1.17 | 1.13 | 0.96 |
| 4 | 6.72 | 3.12 | 0.46 | 1.33 | 1.28 | 0.96 |
| ∞ | 16.8 | 9.84 | 0.58 | 2.96 | 2.81 | 0.95 |
| | C. Splines | | | D. Quartic | | |
| 1 | 0.87 | 0.81 | 0.93 | 1.33 | 1.19 | 0.89 |
| 2 | 1.10 | 1.02 | 0.93 | 1.64 | 1.46 | 0.89 |
| 3 | 1.31 | 1.22 | 0.93 | 1.89 | 1.68 | 0.89 |
| 4 | 1.52 | 1.39 | 0.92 | 2.10 | 1.87 | 0.89 |
| ∞ | 3.72 | 3.19 | 0.86 | 4.38 | 3.79 | 0.87 |
| | E. Fourier | | | F. Flexible Fourier | | |
| 1 | 6.57 | 3.21 | 0.49 | 0.73 | 0.72 | 0.97 |
| 2 | 10.7 | 3.79 | 0.35 | 0.91 | 0.89 | 0.97 |
| 3 | 15.2 | 4.24 | 0.28 | 1.06 | 1.04 | 0.98 |
| 4 | 19.0 | 4.59 | 0.24 | 1.18 | 1.16 | 0.98 |
| ∞ | 48.9 | 7.79 | 0.16 | 2.44 | 2.40 | 0.98 |

Notes: The table is based on 10,000 replications. $L^p_O$ is the $L^p$ error of the original estimate. $L^p_R$ is the $L^p$ error of the rearranged estimate.


Table 2. $L^p$ Estimation/Approximation Error of Original and Rearranged Estimates of the Conditional Quantile Process, for p = 1, 2, 3, 4, and ∞. Average Multivariate Rearrangement.

| p | $L^p_O$ | $L^p_{RR}$ | $L^p_{RR}/L^p_O$ | $L^p_O$ | $L^p_{RR}$ | $L^p_{RR}/L^p_O$ |
|---|---|---|---|---|---|---|
| | A. Kernel | | | B. Local Polynomial | | |
| 1 | 5.35 | 3.13 | 0.58 | 1.21 | 1.09 | 0.91 |
| 2 | 6.97 | 4.37 | 0.63 | 1.61 | 1.46 | 0.91 |
| 3 | 8.40 | 5.49 | 0.65 | 2.03 | 1.84 | 0.91 |
| 4 | 9.72 | 6.49 | 0.67 | 2.48 | 2.24 | 0.91 |
| ∞ | 34.3 | 26.4 | 0.77 | 12.3 | 10.4 | 0.84 |
| | C. Splines | | | D. Quartic | | |
| 1 | 1.33 | 1.20 | 0.90 | 1.49 | 1.35 | 0.90 |
| 2 | 1.78 | 1.60 | 0.90 | 1.87 | 1.69 | 0.90 |
| 3 | 2.30 | 2.03 | 0.88 | 2.23 | 1.99 | 0.89 |
| 4 | 2.92 | 2.50 | 0.86 | 2.62 | 2.29 | 0.87 |
| ∞ | 16.9 | 12.1 | 0.72 | 12.6 | 8.61 | 0.68 |
| | E. Fourier | | | F. Flexible Fourier | | |
| 1 | 6.72 | 4.18 | 0.62 | 1.05 | 1.00 | 0.96 |
| 2 | 13.7 | 5.35 | 0.39 | 1.38 | 1.31 | 0.95 |
| 3 | 20.8 | 6.36 | 0.31 | 1.72 | 1.63 | 0.95 |
| 4 | 26.7 | 7.25 | 0.27 | 2.12 | 1.98 | 0.94 |
| ∞ | 84.9 | 21.9 | 0.26 | 10.9 | 9.13 | 0.84 |

Notes: The table is based on 1,000 replications. $L^p_O$ is the $L^p$ error of the original estimate. $L^p_{RR}$ is the $L^p$ error of the average multivariate rearranged estimate.


Table 3. $L^p$ Estimation/Approximation Error of Rearranged Estimates of the Conditional Quantile Process, for p = 1, 2, 3, 4, and ∞. Univariate Rearrangements.

| p | $L^p_{R_u}$ | $L^p_{R_x}$ | $L^p_{R_u}/L^p_O$ | $L^p_{R_x}/L^p_O$ | $L^p_{R_u}$ | $L^p_{R_x}$ | $L^p_{R_u}/L^p_O$ | $L^p_{R_x}/L^p_O$ |
|---|---|---|---|---|---|---|---|---|
| | A. Kernel | | | | B. Local Polynomial | | | |
| 1 | 5.35 | 3.13 | 1.00 | 0.58 | 1.20 | 1.10 | 1.00 | 0.91 |
| 2 | 6.97 | 4.37 | 1.00 | 0.63 | 1.60 | 1.47 | 1.00 | 0.91 |
| 3 | 8.40 | 5.49 | 1.00 | 0.65 | 2.01 | 1.85 | 0.99 | 0.91 |
| 4 | 9.72 | 6.49 | 1.00 | 0.67 | 2.45 | 2.26 | 0.99 | 0.91 |
| ∞ | 34.3 | 26.4 | 1.00 | 0.77 | 11.8 | 10.8 | 0.96 | 0.88 |
| | C. Splines | | | | D. Quartic | | | |
| 1 | 1.31 | 1.21 | 0.99 | 0.91 | 1.49 | 1.35 | 1.00 | 0.91 |
| 2 | 1.75 | 1.63 | 0.98 | 0.91 | 1.87 | 1.69 | 1.00 | 0.90 |
| 3 | 2.24 | 2.08 | 0.97 | 0.90 | 2.22 | 2.00 | 0.99 | 0.90 |
| 4 | 2.80 | 2.59 | 0.96 | 0.89 | 2.60 | 2.30 | 0.99 | 0.88 |
| ∞ | 14.4 | 13.9 | 0.85 | 0.82 | 11.9 | 9.11 | 0.95 | 0.72 |
| | E. Fourier | | | | F. Flexible Fourier | | | |
| 1 | 6.71 | 4.19 | 1.00 | 0.62 | 1.04 | 1.01 | 0.99 | 0.96 |
| 2 | 13.7 | 5.36 | 1.00 | 0.39 | 1.36 | 1.32 | 0.99 | 0.96 |
| 3 | 20.8 | 6.37 | 1.00 | 0.31 | 1.70 | 1.65 | 0.99 | 0.96 |
| 4 | 26.7 | 7.26 | 1.00 | 0.27 | 2.08 | 2.02 | 0.98 | 0.95 |
| ∞ | 84.9 | 22.2 | 1.00 | 0.26 | 10.0 | 9.86 | 0.92 | 0.91 |

Notes: The table is based on 1,000 replications. $L^p_O$ is the $L^p$ error of the original estimate. $L^p_{R_x}$ is the $L^p$ error of the estimate rearranged in age x. $L^p_{R_u}$ is the $L^p$ error of the estimate rearranged in the quantile index u.

[Figure 1 image omitted. Six panels, each plotting height (80 to 180) against age (5 to 20) with the original and rearranged curves: A. CEF (Kernel), B. CEF (Local Pol.), C. CEF (Splines), D. CEF (Quartic), E. CEF (Fourier), F. CEF (Flexible Fourier).]

Figure 1. Nonparametric estimates of the Conditional Expectation Function (CEF) of height given age and their increasing rearrangements. Nonparametric estimates are obtained using kernel regression (A), locally linear regression (B), cubic B-splines series (C), a fourth-degree polynomial (D), Fourier series (E), and flexible Fourier series (F).

[Figure 2 image omitted. Six panels, each plotting height (80 to 180) against age (5 to 20) with the original and rearranged curves: A. CQF: 5%, 50%, 95% (Kernel), B. CQF: 5%, 50%, 95% (Local Pol.), C. CQF: 5%, 50%, 95% (Splines), D. CQF: 5%, 50%, 95% (Quartic), E. CQF: 5%, 50%, 95% (Fourier), F. CQF: 5%, 50%, 95% (Flexible Fourier).]

Figure 2. Nonparametric estimates of the 5%, 50%, and 95% Conditional Quantile Functions (CQF) of height given age and their increasing rearrangements. Nonparametric estimates are obtained using kernel regression (A), locally linear regression (B), cubic B-splines series (C), a fourth-degree polynomial (D), Fourier series (E), and flexible Fourier series (F).

[Figure 3 image omitted. Eight panels with axes quantile, age, and height: A. CQP (Kernel), B. CQP: Contour, C. CQP: Age Rearrangement, D. CQP: Contour (R-Age), E. CQP: Quantile Rearrangement, F. CQP: Contour (R-Quantile), G. CQP: Average Quantile/Age Rearrangement, H. CQP: Contour (RR-Quantile/Age).]

Figure 3. Kernel estimates of the Conditional Quantile Process (CQP) of height given age and their increasing rearrangements. Panels C and E plot the one dimensional increasing rearrangement along the age and quantile dimension respectively; panel G shows the average multivariate rearrangement.

Page 65: Victor Chernozhukov Iván Fernández-Val Alfred Galichon · of this paper is (partially) borrowed from the work of Xuming He (1997), to whom we are grateful for the inspiration and

27

[Figure 4 panels. A: CQP (Local Pol.); B: CQP: Contour; C: CQP: Age Rearrangement; D: CQP: Contour (R−Age); E: CQP: Quantile Rearrangement; F: CQP: Contour (R−Quantile); G: CQP: Average Quantile/Age Rearrangement; H: CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]

Figure 4. Locally linear estimates of the Conditional Quantile Process (CQP) of height given age and their increasing rearrangements. Panels C and E plot the one-dimensional increasing rearrangements along the age and quantile dimensions, respectively; panel G shows the average multivariate rearrangement.


[Figure 5 panels. A: CQP (Splines); B: CQP: Contour; C: CQP: Age Rearrangement; D: CQP: Contour (R−Age); E: CQP: Quantile Rearrangement; F: CQP: Contour (R−Quantile); G: CQP: Average Quantile/Age Rearrangement; H: CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]

Figure 5. Cubic B-spline series estimates of the Conditional Quantile Process (CQP) of height given age and their increasing rearrangements. Panels C and E plot the one-dimensional increasing rearrangements along the age and quantile dimensions, respectively; panel G shows the average multivariate rearrangement.


[Figure 6 panels. A: CQP (Quartic); B: CQP: Contour; C: CQP: Age Rearrangement; D: CQP: Contour (R−Age); E: CQP: Quantile Rearrangement; F: CQP: Contour (R−Quantile); G: CQP: Average Quantile/Age Rearrangement; H: CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]

Figure 6. Quartic polynomial series estimates of the Conditional Quantile Process (CQP) of height given age and their increasing rearrangements. Panels C and E plot the one-dimensional increasing rearrangements along the age and quantile dimensions, respectively; panel G shows the average multivariate rearrangement.


[Figure 7 panels. A: CQP (Fourier); B: CQP: Contour; C: CQP: Age Rearrangement; D: CQP: Contour (R−Age); E: CQP: Quantile Rearrangement; F: CQP: Contour (R−Quantile); G: CQP: Average Quantile/Age Rearrangement; H: CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]

Figure 7. Fourier series estimates of the Conditional Quantile Process (CQP) of height given age and their increasing rearrangements. Panels C and E plot the one-dimensional increasing rearrangements along the age and quantile dimensions, respectively; panel G shows the average multivariate rearrangement.


[Figure 8 panels. A: CQP (Flexible Fourier); B: CQP: Contour; C: CQP: Age Rearrangement; D: CQP: Contour (R−Age); E: CQP: Quantile Rearrangement; F: CQP: Contour (R−Quantile); G: CQP: Average Quantile/Age Rearrangement; H: CQP: Contour (RR−Quantile/Age). Surface axes: quantile, age, height. Contour axes: quantile (0.0 to 1.0), age (5 to 20).]

Figure 8. Flexible Fourier form series estimates of the Conditional Quantile Process (CQP) of height given age and their increasing rearrangements. Panels C and E plot the one-dimensional increasing rearrangements along the age and quantile dimensions, respectively; panel G shows the average multivariate rearrangement.