Span of Control in Production Hierarchies∗

Jan Eeckhout and Roberto Pinheiro

University of Pennsylvania

February, 2008

(Preliminary Draft)

Abstract

The allocation of skills in the firm is determined by the span of control of managers. The span

of control of any given manager includes the lower skilled managers and the workers that are in the

span of control of those lower skilled managers. At each level, skills are imperfect substitutes in the

production of output and there are decreasing returns to hiring more agents with the same skill level.

In a competitive labor market with atomless firms, we find that: 1. firms have a non-degenerate skill

distribution; 2. larger firms hire disproportionately more skilled workers. As a result, large firms

have a skill distribution that is more skewed and they pay on average higher wages; 3. The presence

of a non-divisibility as in Lucas (1978) will pin down the skill level of the highest skilled manager.

When investment in skills is endogenized, we find that the equilibrium skill distribution has a long

right tail, even if ex ante all agents are identical.

1 Introduction

The firm is a distribution of skills. Workers with many different talents and skills populate a typical

firm. Even the smallest firms have occupations ranging from CEO to janitor. Most production processes

involve the collaboration of agents with different ability and skill, and typically there is a hierarchy of

decision making and execution of tasks. With those hierarchies comes a distribution of skills, and

as a consequence, a distribution of wages. We put forward a matching theory where firms operate in

a competitive labor market and where workers optimally sort in different firms. The theory aims to

characterize the firm as an equilibrium composition of skilled workers. Each firm has an endogenous

organigram that maximizes output. With this production technology, it does not only matter who your

workers are, but also what they do and what the organigram of the firm is.

∗ We are grateful to numerous colleagues for valuable discussions and comments.

Figure 1: Span of Control: h(n) versus Lucas (1978). [Axes: n (horizontal), h(n) (vertical). Curves: h(n) = a + bn^γ with intercept a, and the Lucas step function at n = 1.]

The starting point of our analysis is the Lucas (1978) span of control model. There, the firm consists

of a manager who hires workers, and her productivity is limited by her span of control. Hiring more

workers has decreasing returns, which in equilibrium determines the boundaries of the firm. Because

there are complementarities between managers and workers, higher skilled managers in equilibrium hire

more workers. The firm is fully characterized by one manager and her skill, and the equilibrium number

of workers. The key assumption is that a firm has exactly one manager. This non-divisibility seems

reasonable: to run a firm, we need to hire one manager who is full time devoted to the firm.

We further build on this model in two ways. First, while we maintain that a limited amount of time

commitment by a manager is needed, we relax the assumption that a firm is restricted to hiring exactly

one manager. Our interpretation is that a minimum scale of managerial skill is needed, but that we

can extend beyond the minimum scale. When a firm hires more managers with the same skill level, we

assume that there are decreasing returns. We model this by a function h(n) which measures the
productivity at a given skill level. In Lucas, this is a step function valued zero if you hire fewer than
one manager, and valued one if you hire one or more managers. We have a smooth concave function that

reflects the decreasing returns to hiring more workers. If the intercept of that function is negative, we

assume it to be bounded at zero: the firm can always opt out and not hire any manager of that skill

type at all. This is illustrated in Figure 1 where h(n) is a simple polynomial.

Second, we assume that each skilled agent faces the same technology within the firm, i.e. there is no

distinction between managers and workers. A higher skilled manager has span of control over the lower

skilled manager who in turn has span of control over the next lower skilled managers.

Within this framework we study how the distribution of skills and earnings differs between firms.

We will be able to establish whether larger firms hire more skilled workers and whether the distribution

of skills inside large firms is more dispersed than in small firms. While all firms hire a wide range of

skills, it is possible that large firms like GE or IBM have a very different composition of skills than

small family firms.

The firm exists because different inputs in production are needed to produce output. And while any

particular individual or individuals of similar ability may be able to provide those different inputs in

production, typically it will be optimal to allocate differently skilled agents to different jobs even if one

particular skilled agent is better at performing all other tasks. This is because the firm faces a trade-off

between allocating a more skilled worker who contributes more to output at a higher wage, and a less

productive agent who commands a lower wage. The price system, market wages for different skill levels,

determines the optimal resolution of the trade-off and therefore fixes the equilibrium allocation of skills.

The optimal solution to this trade-off fundamentally boils down to the allocation of talent according

to comparative advantage, as captured by the well known metaphor of the attorney and the secretary.

Even if the best lawyer is also the best secretary, it is quite clear that this lawyer will focus on the

task of being an attorney by employing a secretary instead of doing all the paperwork by himself, or by

hiring a secretary who is as skilled as he is himself. Even though he is the best secretary and the best

lawyer, he can earn more by running a law firm and employing a low skilled secretary at a low wage

than by hiring another expensive attorney to do the secretarial work.

This implicitly involves production hierarchies as the marginal product of a lower skilled agent is

affected by the skill-level of those higher in the hierarchy. There are many reasons why such production

hierarchies emerge. This may be due to the efficient processing of information and resolution of problems.
Garicano (2000) and Antras, Garicano and Rossi-Hansberg (2006) provide micro foundations for

why particular production functions may be optimal. The hierarchies may alternatively also be due to

O-ring type production technologies with asymmetry as in Kremer-Maskin (1996). A small mistake by

one worker in the production chain can have implications of unprecedented dimensions. One bug in

the software may lead to the malfunctioning of millions of electronic devices, or the inadequate quality

control for lead in paint can lead to a worldwide recall of a toy.

There are two main implications for the equilibrium allocation of using this technology:

1. The firm size is endogenous and consists of a non-degenerate distribution of skills.

The imperfect substitutability of workers as inputs in production implies that the size of the firm is

endogenous. For reasons of comparative advantage in different jobs, firms in equilibrium decide to hire

workers with different talent. Of course, quite a lot is known about the size distribution of firms (for

recent examples, see Luttmer (2007) and Rossi-Hansberg and Wright (2007)). The interest here is how

firm size relates to the internal distribution of skills within the firm.

2. Firms differ in their composition of talent. Firms with higher firm-specific total factor

productivity will hire more labor which is due to the complementarity between capital and labor inputs.

More interestingly, the equilibrium distributions of skills within different firms are not identical. We

show that only if the elasticity of substitution between different skills is constant and there are no

indivisibilities, will the distribution of skills within different firms be identical. We characterize the

properties of different firms for some skill distributions. Because size and firm-specific TFP are related,

we can characterize the firm's skill distribution as a function of firm size.

First we analyze the general case without any indivisibility, i.e. with h(n) positive everywhere. We

show that if the elasticity of substitution between different skills is constant, all firms will obtain the

same distribution of skills. Firms with higher capital stocks will be larger, but they will not differ in

the composition of skills. In contrast, if the elasticity of substitution is decreasing, the skill distribution

of larger firms stochastically dominates the distribution of smaller firms. The implication of the first-

order stochastic dominance in the skill distribution is that there is also stochastic dominance in the

wage distribution. We analyze a competitive equilibrium in which market wages for the same skills are

the same. Larger firms therefore hire on average more skilled workers and therefore pay on average

higher wages. This can explain a well-documented fact in the empirical labor literature, that there is an

employer-size wage premium. While this fact is typically established after controlling for observables,

it may nonetheless be determined by skill heterogeneity unobserved by the econometrician.

Second, we consider the case as in Lucas, where a minimum scale of output is needed. This is the

case where h is negative for some values. We find that in equilibrium, firms with larger capital stocks

will be larger and will find it profitable to hire proportionally more high skilled workers. This implies

that the skill distribution in large firms is skewed to the right compared to the distribution in small

firms. In addition, the highest skilled manager in the large firm will be more skilled than the CEO in a

small firm. As a result, the support of skills of the small firm is included in the support of skills of the

large firm. This is illustrated in Figure 2.

In Section 3 below, we analyze the impact of investment in skills by ex ante identical agents and show

that in equilibrium, there will be an endogenous distribution of skills. Even with no or small ex ante

heterogeneity, there can be considerable ex post inequality as this technology enhances heterogeneity.

In equilibrium, if there is scarcity of any one particular input, the returns to obtaining that skill are

high. With increasing investment costs, the ensuing distribution of skills is decreasing in type as the
returns in terms of wages must be increasing to compensate for higher investment costs. Wages can
only be increasing if there is sufficient scarcity in that particular input.

Figure 2: Stochastic Dominance of Skill Distribution in Large Firms. [Axes: skills (horizontal), density (vertical). Curves: large firm, small firm.]

One further application is Occupational Choice and span-of-control. A CEO chooses to optimally

design the hierarchy of the firm, and she herself competes on a market where she can choose, based

on equilibrium compensation schedules between a job as an employee or as a CEO. The span-of-

control of the CEO determines her productivity. We extend the technology of production hierarchies

to incorporate this feature.

2 The model

Population. Consider a population consisting of agents endowed with talent x, a one-dimensional skill

characteristic. Skills are distributed according to the distribution function F (x). The measure of agents

is normalized to one. There is a measure of capitalists, each of whom is atomless, who have the property

rights to a production process k. This can be interpreted as firm-specific total factor productivity. Let

µ(k) denote the measure of each type k.

Production. Firms produce output y using the input k and a set of workers of different skills. The

production function is given by

$$y = k\,L(n, x)$$

where n is the vector of quantities $n_i$ and x is the vector of skills $x_i$, and where

$$L(n; x) = \left[\sum_{i=1}^{N} h(n_i)\,x_i\right]^{\beta}$$

where h(·) is monotonically increasing and concave and β > 0.

It is important to note at this stage that y is a firm-level production function and that in general it

is not equal to the aggregate production function.
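The firm-level technology above can be sketched numerically. The snippet below is a minimal illustration, assuming the parametric form h(n) = a + bn^γ used later in the paper, floored at zero to reflect that the firm can opt out of a skill type; the function names and all parameter values are hypothetical.

```python
def h(n, a=0.0, b=1.0, gamma=0.5):
    """Productivity of hiring a measure n of one skill type: a + b * n**gamma.

    The value is floored at zero (assumption: the firm can always opt out).
    """
    return max(0.0, a + b * n ** gamma)


def output(k, n, x, beta=1.0, **h_params):
    """Firm-level output y = k * (sum_i h(n_i) * x_i) ** beta."""
    labor_aggregate = sum(h(n_i, **h_params) * x_i for n_i, x_i in zip(n, x))
    return k * labor_aggregate ** beta


skills = [1.0, 2.0, 4.0]                       # illustrative skill levels x_i
y_small = output(k=1.0, n=[4.0, 1.0, 0.25], x=skills)
y_double = output(k=1.0, n=[8.0, 2.0, 0.5], x=skills)
```

With γ = 1/2 and β = 1, doubling every n_i raises output by a factor of √2 rather than 2, which is the decreasing-returns property the text attributes to the concavity of h.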

For most of the paper, we consider a discrete distribution of types x. A continuous distribution of

types is analogously represented by

$$L(n; x) = \left[\int h(n(x))\,x\,dF_k(x)\right]^{\beta}$$

where Fk(x) denotes the distribution of skills in firm k. Below we derive that this is the continuous

limit of the production technology with finite skill types.

The firm’s optimization problem. Markets are competitive and the atomless firms act as price

takers. Given a vector of wages w(x) (normalize the output price to 1) firm k’s problem is given by:

$$\pi(k; w(\cdot)) = \max_{n_1,\dots,n_N}\; k\left[\sum_{i=1}^{N} h(n_i)\,x_i\right]^{\beta} - \sum_{i=1}^{N} n_i\,w(x_i)$$

A competitive equilibrium of the economy can be defined as follows:

Definition 1 In a competitive equilibrium in this economy: 1. Firms maximize profits πk; 2. workers

choose the job with the highest wage offered w(x) for a type x; 3. markets clear.

Now the main properties, as discussed in the introduction, of this production process are made

precise. 1. The marginal product of the second CEO, or the second janitor, is lower. There are
decreasing returns to hiring more of the same worker since h(n) is concave. 2. The same number of

more skilled workers are more productive than low skilled workers: h(n)xi > h(n)xj if xi > xj . 3. There

is a notion of scarcity. A shortage of a particular skill level can drive up the prices. A lower skilled

employee xj can be more productive than the high skilled employee xi if she is sufficiently scarce in

the firm: $h(n_i)x_i < h(n_j)x_j$ provided $n_i \gg n_j$. 4. Inputs in production can be either complements or
substitutes.

Figure 3: Complements and Substitutes. [Axes: γ and β, each marked at 1. Region labels: substitutes and strict concavity; complements and strict concavity; strict quasiconcavity.]

Complements and Substitutes. From the firm’s objective function, we derive

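$$\frac{\partial^2 \pi}{\partial n_i\,\partial n_j} = k\beta(\beta-1)\left[\sum_{i=1}^{N} h(n_i)\,x_i\right]^{\beta-2} h'(n_i)\,h'(n_j)\,x_i x_j$$

Notice that $\frac{\partial^2 \pi}{\partial n_i \partial n_j} > 0 \iff \beta > 1$. Therefore, β determines whether $x_i$ and $x_j$ are gross complements
or substitutes.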

Claim 2 If β > 1, inputs are complements. If β < 1 they are substitutes.

For example, let $h(n_i) = n_i^{\gamma}$; then we can summarize this in terms of the parameter values for

β ∈ R+ and γ ∈ [0, 1]. The firm’s problem is well-defined for β < 1/γ (a sufficient condition for

concavity is γβ < 1). Then the yellow area is the range of parameters where inputs in production are

complements, and the green area where they are substitutes.
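The sign condition in Claim 2 can be checked numerically. The sketch below estimates the cross-partial of profit by central finite differences for two skill types with h(n) = n^γ; the helper names, wages, and parameter values are illustrative assumptions, not part of the paper.

```python
def profit(n, x, w, k=1.0, beta=1.0, gamma=0.5):
    """Firm profit k * (sum_i n_i**gamma * x_i)**beta - sum_i n_i * w_i."""
    aggregate = sum(n_i ** gamma * x_i for n_i, x_i in zip(n, x))
    return k * aggregate ** beta - sum(n_i * w_i for n_i, w_i in zip(n, w))


def cross_partial(n, x, w, step=1e-4, **params):
    """Central finite-difference estimate of d^2 profit / (dn_1 dn_2)."""
    def shifted(d1, d2):
        return profit([n[0] + d1, n[1] + d2], x, w, **params)

    return (shifted(step, step) - shifted(step, -step)
            - shifted(-step, step) + shifted(-step, -step)) / (4 * step ** 2)


# Hypothetical hires, skills and wages for two skill types.
n, x, w = [1.0, 2.0], [1.0, 3.0], [0.5, 1.0]
complements = cross_partial(n, x, w, beta=1.5, gamma=0.5)   # beta > 1
substitutes = cross_partial(n, x, w, beta=0.8, gamma=0.5)   # beta < 1
separable = cross_partial(n, x, w, beta=1.0, gamma=0.5)     # beta = 1
```

The estimate is positive for β > 1, negative for β < 1, and zero up to rounding error for β = 1, because the wage bill is linear in n and drops out of the cross-partial.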

Elasticity of Substitution. A key characteristic of the firm’s production function is its Elasticity of

Substitution between inputs ni and nj , denoted by σ. The elasticity of substitution is defined as

$$\sigma = \frac{d \ln(n_j/n_i)}{d \ln(TRS)}$$

where $TRS = \frac{\partial y/\partial n_i}{\partial y/\partial n_j}$ is the technical rate of substitution. Then

$$\sigma = -\frac{h'(n_i)}{h''(n_i)}\,\frac{1}{n_i}.$$

Claim 3 Let $h(n_i) = n_i^{\gamma}$. Then the production function is CES (σ is constant) and L(n; x) is
homogeneous of degree one.

We can show in greater generality necessary and sufficient conditions for the production function
to be CES, namely that $h(n_i) = a + bn_i^{\gamma}$ with a, b constants. In the appendix, we prove that if σ is a
constant, we must have that $h(n_i)$ is of the form $a + bn_i^{\gamma}$, where a and b are constants, and that L(n; x)
is homothetic if and only if h(·) is of the form $a + bn_i^{\gamma}$. For the remainder of this and the next subsection,
we assume that a is non-negative.
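As a worked check of the CES case, the elasticity formula $\sigma = -h'(n)/(h''(n)\,n)$ can be evaluated directly for $h(n) = a + bn^{\gamma}$. The short derivation below assumes b > 0 and 0 < γ < 1.

```latex
\begin{align*}
h'(n)  &= b\gamma\, n^{\gamma-1}, \qquad
h''(n) = b\gamma(\gamma-1)\, n^{\gamma-2}, \\
\sigma &= -\frac{h'(n)}{h''(n)\, n}
        = -\frac{b\gamma\, n^{\gamma-1}}{b\gamma(\gamma-1)\, n^{\gamma-1}}
        = \frac{1}{1-\gamma}.
\end{align*}
```

The constants a and b drop out, so σ does not depend on n; for example, γ = 1/2 gives σ = 2.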

CES: $h(n_i) = n_i^{\gamma}$. The CES production function can be written as:

$$y = k\left[\sum_{i=1}^{N} n_i^{\gamma}\,x_i\right]^{\beta}.$$

A special case of this CES production function is the one in Kremer (1993), which is equivalent to our
model when $L = \left[\frac{1}{N}\sum_{i=1}^{N} n_i^{\gamma}\right]^{N/\gamma}$ with γ → 0.

Finally, we can show that if γβ < 1, then the firm’s objective function as defined generally above is

strictly concave. This Claim is proven in the Appendix.

2.1 The equilibrium allocation

We now explicitly derive the equilibrium in this economy. From the firm’s problem, we obtain the

F.O.C.s:

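$$k\beta\left[\sum_{i=1}^{N} n_i^{\gamma}x_i\right]^{\beta-1} b\gamma\,n_i^{\gamma-1}x_i = w(x_i), \quad \forall i \in \{1,\dots,N\}$$

Then, rearranging, we obtain:

$$\frac{n_i}{n_j} = \left(\frac{w(x_j)\,x_i}{w(x_i)\,x_j}\right)^{\frac{1}{1-\gamma}}$$

Substituting back, we obtain the demand for labor quality $x_j$ as a function of wages:

$$n_j(k) = \left(k\beta\gamma b^{2}\right)^{\frac{1}{1-\gamma\beta}}\left(\frac{x_j}{w(x_j)}\right)^{\frac{1}{1-\gamma}}\left[\sum_{i=1}^{N}\left(\frac{x_i}{w(x_i)^{\gamma}}\right)^{\frac{1}{1-\gamma}}\right]^{\frac{\beta-1}{1-\gamma\beta}}$$

Market clearing satisfies:

$$\sum_k n_j(k)\,\mu(k) = m(x_j)$$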

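where $m(x_j) = F(x_j) - F(x_{j-1})$ is the measure of worker type $x_j$. Substituting for the equilibrium
quantity of $n_j(k)$ and solving for $w(x_j)$, we obtain the equilibrium wages:

$$w(x_j) = \frac{x_j}{m(x_j)^{1-\gamma}}\left[\sum_{i=1}^{N}\left(\frac{x_i}{w(x_i)^{\gamma}}\right)^{\frac{1}{1-\gamma}}\right]^{\frac{(\beta-1)(1-\gamma)}{1-\gamma\beta}}\left[\sum_k\left(k\beta\gamma b^{2}\right)^{\frac{1}{1-\gamma\beta}}\mu(k)\right]^{1-\gamma}$$

Now, substituting the wages into the labor demand, we obtain the equilibrium allocations:

$$n_j(k) = \frac{k^{\frac{1}{1-\gamma\beta}}\,m(x_j)}{\sum_k k^{\frac{1}{1-\gamma\beta}}\,\mu(k)}.$$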

Then, looking at the total labor force of a firm with capital k, we have:

$$n(k) = \sum_{j=1}^{N} n_j(k) = \frac{k^{\frac{1}{1-\gamma\beta}}\,m}{\sum_k k^{\frac{1}{1-\gamma\beta}}\,\mu(k)}$$

where $m \equiv \sum_{j=1}^{N} m(x_j)$. This expression is strictly increasing and convex in k, and the next
Proposition therefore immediately follows.

Proposition 4 Firms with higher k have a larger labor force

Firms with higher firm-specific TFP k are larger. The productivity per worker is higher, and

therefore at common economy-wide wage rates, it is optimal for them to hire more workers. The

question remains how the skill distributions within the different firms compare.

Proposition 5 When the production function is CES, in equilibrium all firms have the same skill

distribution Fk(x) which is equal to the economy’s skill distribution F (x)

To see this, look at the fraction of quality j workers in terms of the total number of workers, and

we have:

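$$\frac{n_j(k)}{n(k)} = \frac{\dfrac{k^{\frac{1}{1-\gamma\beta}}\,m(x_j)}{\sum_k k^{\frac{1}{1-\gamma\beta}}\,\mu(k)}}{\dfrac{k^{\frac{1}{1-\gamma\beta}}\,m}{\sum_k k^{\frac{1}{1-\gamma\beta}}\,\mu(k)}} = \frac{m(x_j)}{m}$$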

for every k. Therefore, the distribution of workers inside a firm is exactly the same as the one in any

other firm and mimics the distribution in the market. Firms are different in size, given different k's,
but they look alike considering the distribution of labor qualities inside the firm. This is a consequence
of the constant elasticity of substitution assumption or, more generally, homotheticity. This assumption
imposes a lot of structure on the firm's production function.
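The CES allocation result can be illustrated with a small numerical sketch. It assumes h(n) = n^γ (so b = 1), three firm types and three skill levels with made-up measures, and checks that skill shares are identical across firms, that markets clear, and that every firm's first-order conditions imply the same wages. All names and numbers are hypothetical.

```python
def allocations(ks, mus, masses, gamma, beta):
    """Equilibrium hires n_j(k) = k**p * m(x_j) / sum_k(k**p * mu(k)), p = 1/(1 - gamma*beta)."""
    power = 1.0 / (1.0 - gamma * beta)
    denom = sum(k ** power * mu for k, mu in zip(ks, mus))
    return [[k ** power * m / denom for m in masses] for k in ks]


def foc_wages(k, n, x, gamma, beta):
    """Wages implied by firm k's first-order conditions when h(n) = n**gamma."""
    aggregate = sum(n_i ** gamma * x_i for n_i, x_i in zip(n, x))
    return [k * beta * aggregate ** (beta - 1) * gamma * n_i ** (gamma - 1) * x_i
            for n_i, x_i in zip(n, x)]


ks, mus = [1.0, 2.0, 3.0], [0.5, 0.3, 0.2]          # firm types k and measures mu(k)
skills, masses = [1.0, 2.0, 4.0], [0.6, 0.3, 0.1]   # skill levels x_j and m(x_j)
gamma, beta = 0.5, 1.2                              # gamma * beta < 1

n = allocations(ks, mus, masses, gamma, beta)
sizes = [sum(row) for row in n]                     # total labor force n(k)
shares = [[n_j / size for n_j in row] for row, size in zip(n, sizes)]
hired = [sum(n[f][j] * mus[f] for f in range(len(ks))) for j in range(len(skills))]
wages = [foc_wages(k, row, skills, gamma, beta) for k, row in zip(ks, n)]
```

Firm size rises with k, while each row of `shares` reproduces m(x_j)/m, which is the content of Propositions 4 and 5 in this parametrization.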

2.2 Different Production Hierarchies

We show here that the equilibrium distribution of labor abilities inside a firm varies with k according

to changes in σ and the measure of labor qualities in the economy. We establish the following result,

consistent with the distributional properties discussed in Figure 2.¹

Proposition 6 Let σ′ < 0. If the density of x is decreasing then:

1. Higher k firms are larger;

2. Average skills and average wages are higher in larger firms than in smaller firms;

3. The skill and wage distribution in larger firms First-Order Stochastically dominates those in small

firms.

To derive this result, we show how the elasticity of substitution and the equilibrium allocation relate.

In particular, the following result holds.

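Proposition 7 If σ is decreasing,² then higher k firms hire more of the scarce skilled workers ($m(x_1) < m(x_2)$):

$$\sigma\!\left(\frac{m(x_2)}{2}\right) < \sigma\!\left(\frac{m(x_1)}{2}\right) \;\Rightarrow\; \left.\frac{\partial\left(n_1^1/n_2^1\right)}{\partial k_1}\right|_{k_1=k_2} > 0$$

Proof. We prove this result here for β = 1. In the appendix, we provide a proof for general β. Observe
that the first-order conditions, after substituting for market clearing, imply

$$k_1 h'\!\left(n_1^1\right) = k_2 h'\!\left(m(x_1) - n_1^1\right)$$

$$k_1 h'\!\left(n_2^1\right) = k_2 h'\!\left(m(x_2) - n_2^1\right)$$

These conditions implicitly define the equilibrium allocations $n_1^1$ and $n_2^1$. Applying the implicit function
theorem, we get

$$\frac{\partial n_1^1}{\partial k_1} = -\frac{h'\!\left(n_1^1\right)}{k_1 h''\!\left(n_1^1\right) + k_2 h''\!\left(m(x_1) - n_1^1\right)} \quad\text{and}\quad \frac{\partial n_2^1}{\partial k_1} = -\frac{h'\!\left(n_2^1\right)}{k_1 h''\!\left(n_2^1\right) + k_2 h''\!\left(m(x_2) - n_2^1\right)}$$

When $k_1 = k_2$, we obtain from using the quotient rule that:

$$\left.\frac{\partial\left(n_1^1/n_2^1\right)}{\partial k_1}\right|_{k_1=k_2} = \frac{\dfrac{-h'\!\left(\frac{m(x_1)}{2}\right)}{h''\!\left(\frac{m(x_1)}{2}\right)}\cdot\dfrac{m(x_2)}{2} \;-\; \dfrac{-h'\!\left(\frac{m(x_2)}{2}\right)}{h''\!\left(\frac{m(x_2)}{2}\right)}\cdot\dfrac{m(x_1)}{2}}{2k_1\left(\dfrac{m(x_2)}{2}\right)^{2}}$$

The left-hand side is positive provided

$$\frac{-h'\!\left(\frac{m(x_1)}{2}\right)}{h''\!\left(\frac{m(x_1)}{2}\right)}\cdot\frac{1}{\frac{m(x_1)}{2}} \;>\; \frac{-h'\!\left(\frac{m(x_2)}{2}\right)}{h''\!\left(\frac{m(x_2)}{2}\right)}\cdot\frac{1}{\frac{m(x_2)}{2}}$$

Recall that the elasticity of substitution is

$$\sigma = -\frac{h'(n_i)}{h''(n_i)}\,\frac{1}{n_i},$$

and therefore

$$\left.\frac{\partial\left(n_1^1/n_2^1\right)}{\partial k_1}\right|_{k_1=k_2} > 0 \quad\text{if}\quad \sigma\!\left(\frac{m(x_2)}{2}\right) < \sigma\!\left(\frac{m(x_1)}{2}\right)$$

¹ To simplify exposition, here we will consider the case in which we have only two firms with different managerial skills
or TFP, $k_1$ and $k_2$, and β = 1. We have shown that exactly the same results hold in general, though the derivation is
somewhat more involved.

² A sufficient condition for σ decreasing is h′′′ < 0. This immediately follows from

$$\frac{d\sigma}{dn} = -\frac{\left[h''\right]^{2} n - h'\left[h'' + n h'''\right]}{\left[n h''\right]^{2}}$$

and the fact that h′ > 0, h′′ < 0.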

Considering the more scarce labor quality as the one with higher levels of education or human

capital, an economy with production hierarchies will have larger firms hiring more heavily at the top,

i.e., they will have more skilled workers. This effectively means that they have proportionally more

managerial positions compared to smaller firms. One possible interpretation is that of an increase in

the monitoring cost. In order to manage a larger hierarchy, the demands on communication skills and

span-of-control go up, leading to the hiring of more skilled types. Notice that this is due to the relative

scarcity of each type of labor. The result is driven by the elasticity of substitution of a given quality of

labor compared to others.
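The two-firm argument can be checked with a concrete technology. The sketch below assumes h(n) = log(1 + n), which is not a form used in the paper but has a decreasing elasticity σ(n) = (1 + n)/n and makes the first-order condition k1 h'(n) = k2 h'(m − n) linear in n. The function names and parameter values are illustrative.

```python
def sigma(n):
    """Elasticity of substitution -h'(n) / (h''(n) * n) for h(n) = log(1 + n)."""
    h1 = 1.0 / (1.0 + n)            # h'(n)
    h2 = -1.0 / (1.0 + n) ** 2      # h''(n)
    return -h1 / (h2 * n)


def firm1_hires(m, k1, k2):
    """Firm 1's hires n solving k1 / (1 + n) = k2 / (1 + m - n) (beta = 1, two firms)."""
    return (k1 * (1.0 + m) - k2) / (k1 + k2)


m_scarce, m_abundant = 0.5, 2.0     # m(x_1) < m(x_2): skill 1 is the scarce one
k2 = 1.0


def skill_ratio(k1):
    """Ratio of firm 1's hires of the scarce skill to its hires of the abundant skill."""
    return firm1_hires(m_scarce, k1, k2) / firm1_hires(m_abundant, k1, k2)


ratio_equal = skill_ratio(1.0)      # identical firms split each skill evenly
ratio_larger = skill_ratio(1.2)     # firm 1 has the higher k
```

Here σ(m(x_2)/2) < σ(m(x_1)/2), and raising k1 above k2 increases firm 1's ratio of scarce to abundant skills, in line with Proposition 7.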

2.3 Minimum Scale of Operation

We derive under plausible conditions that the highest skilled worker has a higher type in larger firms

than in smaller firms. This implies that the distribution of higher k firms has fat tails at the top as

long as the skill distribution has decreasing density.

Suppose there is some non-convexity in the production technology. On any given task, firms incur

a fixed cost.³ Consider the production function we used above, $h(n) = a + bn^{\gamma}$, where a < 0. A firm
will hire a type x if for that type, the equilibrium n∗ yields positive output: $h(n^*) = a + b(n^*)^{\gamma} > 0$, where
we derived n∗ earlier as:

$$n^*(k) = \frac{k^{\frac{1}{1-\gamma\beta}}\,m(x)}{\sum_k k^{\frac{1}{1-\gamma\beta}}\,\mu(k)}.$$

³ A special case of this technology is that in Antras, Garicano and Rossi-Hansberg (2006).

The firm's decision problem is therefore to choose n∗ as long as

$$h^* = a + b\left(\frac{k^{\frac{1}{1-\gamma\beta}}\,m(x)}{\sum_k k^{\frac{1}{1-\gamma\beta}}\,\mu(k)}\right)^{\gamma} > 0.$$

A firm with capital k will therefore be indifferent between hiring and not hiring provided

$$k = \left(\frac{-a}{b}\right)^{\frac{1-\gamma\beta}{\gamma}}\left[\frac{1}{m(x)}\sum_{k\in K(x)} k^{\frac{1}{1-\gamma\beta}}\,\mu(k)\right]^{1-\gamma\beta}.$$

The only caveat is of course that the summation over k is for all k actively hiring workers of type x.

K(x) denotes the set of firms actively hiring type x workers.

Proposition 8 Let the elasticity of substitution σ be constant, and let there be a fixed cost of employing
one skill type (a < 0). Then: 1. higher k firms hire more workers; 2. the support of skills hired in lower
k firms is included in the support of skills of higher k firms; 3. when the skill density is decreasing,
higher k firms hire more skilled workers.

Example. Let skills be distributed according to the Pareto with location 1 and coefficient 1. Then the
cdf is $P(x) = 1 - x^{-1}$ and the density is $p(x) = x^{-2}$ (= m(x)). Let the distribution of firms be uniform,
µ = 1 for k ∈ [0, 1]. Let $h(n) = a + n^{1/2}$, and β = 1. We have:

$$h(n) = \begin{cases} a + n^{1/2} & \text{if } n > 0 \\ 0 & \text{if } n = 0 \end{cases}$$

where a < 0. From previous calculations, we obtain:

$$n_x(k) = \frac{k^{2}x^{-2}}{\int_K k^{2}\,dk}.$$

Define $k(x) = \{k \in K \mid h(n_x(k)) = 0\}$. Therefore, there exists a threshold such that if k < k(x),
$\max\{0,\, a + n_x(k)^{1/2}\} = 0$. This implies that K = [k(x), 1]. Solving for k(x):

$$a + \left[\frac{k(x)^{2}}{x^{2}\int_{k(x)}^{1} k^{2}\,dk}\right]^{1/2} = 0,$$

and rearranging, we have:

$$3k(x)^{2} = (-ax)^{2}\left[1 - k(x)^{3}\right], \tag{1}$$

which defines k(x). From the implicit function theorem, we have:

$$\frac{dk(x)}{dx} = \frac{2a^{2}x\left[1 - k(x)^{3}\right]}{3k(x)\left[2 + a^{2}x^{2}k(x)\right]} > 0.$$

Claim 9 x → ∞ as k(x) → 1.

Proof. Assume that there is an x∗ ∈ R such that k(x∗) = 1. But then, from (1) we must have:

$$\underbrace{3k(x^*)^{2}}_{=3} - (-ax^*)^{2}\underbrace{\left[1 - k(x^*)^{3}\right]}_{=0} = 0 \;\Longrightarrow\; 3 = 0$$

which is a contradiction. Then, we cannot have k(x∗) = 1 for x∗ finite. Since $\frac{dk(x)}{dx} > 0$, ∀k(x) ∈ (0, 1),
we must have k(x) → 1 as x → ∞.

Claim 10 k (1) > 0, i.e., some firms shut down in equilibrium.

Proof. From (1), we have:

$$3k(1)^{2} = (-a)^{2}\left[1 - k(1)^{3}\right]$$

Now, observe that the LHS of this equality is strictly increasing in k(1), while the RHS is strictly
decreasing. But if k(1) = 0, we have LHS < RHS, so we must have that k(1) > 0.
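Equation (1) has no closed-form solution for k(x), but it is easy to solve numerically. The sketch below uses bisection on (0, 1); the function name is hypothetical and a = −0.5 is chosen only for illustration.

```python
def threshold_k(x, a, tol=1e-12):
    """Solve 3*k**2 = (a*x)**2 * (1 - k**3) for k in (0, 1) by bisection.

    The gap 3*k**2 - (a*x)**2 * (1 - k**3) is negative at k = 0, equals 3 at
    k = 1, and is strictly increasing in between, so the root is unique.
    """
    def gap(k):
        return 3.0 * k ** 2 - (a * x) ** 2 * (1.0 - k ** 3)

    low, high = 0.0, 1.0
    while high - low > tol:
        mid = 0.5 * (low + high)
        if gap(mid) < 0.0:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


a = -0.5                                   # illustrative fixed cost, a < 0
grid = [1.0, 2.0, 5.0, 20.0, 100.0]        # skill levels x >= 1
thresholds = [threshold_k(x, a) for x in grid]
```

On this grid the thresholds are strictly positive at x = 1, increase with x, and stay below 1, consistent with Claims 9 and 10.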

The fact that k is increasing in x of course also implies that larger firms k have higher cut-off
types for their highest skilled employee. The maximum quality x that a given k firm hires is

$$x(k) = \frac{\sqrt{3}\,k}{-a\left(1 - k^{3}\right)^{1/2}}$$

and is increasing in k. The lowest firm that has positive profits in this market:

$$x = \frac{\sqrt{3}\,k}{0.5\left(1 - k^{3}\right)^{1/2}}, \qquad k = 0.25$$

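Finally, we also verify that the demand in the right tail is in fact decreasing as x increases:

$$\frac{dn_x(k)}{dx} = \frac{d\left\{\frac{3k^{2}}{x^{2}\left[1 - k(x)^{3}\right]}\right\}}{dx} = \frac{-3k^{2}\left\{2x\left[1 - k(x)^{3}\right] - 3k(x)^{2}\,\frac{dk(x)}{dx}\,x^{2}\right\}}{x^{4}\left[1 - k(x)^{3}\right]^{2}}$$

Substituting $\frac{dk(x)}{dx}$ and rearranging, we have:

$$\frac{dn_x(k)}{dx} = \frac{-12xk^{2}}{x^{4}\left[2 + a^{2}x^{2}k(x)\right]\left[1 - k(x)^{3}\right]} < 0$$

So, the demand is strictly decreasing in x for a given k, and a cut-off rule is optimal.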

For this example, we now explicitly have the measure of skills within a firm

$$n(x \mid k) = \frac{3k^{2}}{x^{2}\left[1 - k(x)^{3}\right]}$$

where k(x) solves (1). Normalizing this measure to sum up to one, we obtain the firm's distribution of

skills. Larger firms hire more workers of all skill types, but from simple comparison of the normalized

densities, we see that the low k firms hire proportionally more low skilled workers. The high k firm’s

skill distribution is therefore heavy in the tail and skewed to the right.
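The comparison of normalized densities can be sketched numerically. The code below takes n(x | k) proportional to 1/(x²[1 − k(x)³]) on the support from x = 1 up to the firm's highest hired skill, and computes the share of a firm's workforce above a skill cutoff with a midpoint rule. The function names, the cutoff, a = −0.5, and the two firm types are all illustrative assumptions.

```python
import math


def threshold_k(x, a):
    """Bisection root of 3*k**2 = (a*x)**2 * (1 - k**3) on (0, 1)."""
    low, high = 0.0, 1.0
    for _ in range(60):
        mid = 0.5 * (low + high)
        if 3.0 * mid ** 2 < (a * x) ** 2 * (1.0 - mid ** 3):
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


def top_skill(k, a):
    """Highest skill hired by firm k: sqrt(3)*k / (-a * sqrt(1 - k**3))."""
    return math.sqrt(3.0) * k / (-a * math.sqrt(1.0 - k ** 3))


def share_above(k, a, cutoff, steps=2000):
    """Share of firm k's workforce with skill above cutoff (midpoint rule).

    The factor 3*k**2 in n(x | k) cancels in the normalization. Assumes the
    firm is active, i.e. top_skill(k, a) > 1.
    """
    def mass(lo, hi):
        if hi <= lo:
            return 0.0
        width = (hi - lo) / steps
        total = 0.0
        for i in range(steps):
            x = lo + (i + 0.5) * width
            total += width / (x ** 2 * (1.0 - threshold_k(x, a) ** 3))
        return total

    x_max = top_skill(k, a)
    return mass(max(cutoff, 1.0), x_max) / mass(1.0, x_max)


a = -0.5
share_small = share_above(0.6, a, cutoff=2.0)   # lower-k firm
share_large = share_above(0.9, a, cutoff=2.0)   # higher-k firm
```

In this parametrization the higher-k firm has both a larger top skill and a larger workforce share above the cutoff, matching the statement that its skill distribution is heavier in the tail.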

3 Applications

3.1 Investment in skills: Endogenous heterogeneity

With production hierarchies, ex ante identical agents have incentives to take on different levels of

investment. Because all skill levels are needed in production, it cannot be an equilibrium where all

agents choose to invest the same amount and obtain the same level of investment. We now show this

for the constant elasticity case with $h(n_i) = n_i^{\gamma}$. Let $c(x_j)$ be the cost associated with obtaining skill
level $x_j$.

We first find an expression for wages. From our previous calculations, we obtain:

$$w(x_j) = \frac{x_j}{m(x_j)^{1-\gamma}}\left(\sum_{i=1}^{N} m(x_i)^{\gamma}x_i\right)^{\beta-1}\left[\sum_k\left(k\beta\gamma b^{2}\right)^{\frac{1}{1-\gamma\beta}}\mu(k)\right]^{1-\gamma\beta}.$$

Notice that $w(x_j)$ depends on the ratio $\frac{x_j}{m(x_j)^{1-\gamma}}$, where $m(x_j)$ is the aggregate supply of skill j.
Then, the worker's problem is given by:

$$\max_{\{x_1,\dots,x_N\}}\; \{w(x_1) - c(x_1),\, \dots,\, w(x_N) - c(x_N)\}$$

Since in equilibrium all skills must be offered⁴ and workers are ex ante symmetric, we must have:

$$w(x_1) - c(x_1) = \dots = w(x_N) - c(x_N) = v$$

where v is a constant.

Then w (xi) is identical to the cost function up to a constant. Using this, we can calculate m (xi),

which is the supply of skill i. Observe that:

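$$v + c(x_j) = \frac{x_j}{m(x_j)^{1-\gamma}}\left(\sum_{i=1}^{N} m(x_i)^{\gamma}x_i\right)^{\beta-1}\left[\sum_k\left(k\beta\gamma b^{2}\right)^{\frac{1}{1-\gamma\beta}}\mu(k)\right]^{1-\gamma\beta}$$

and that:

$$\frac{v + c(x_j)}{v + c(x_i)} = \frac{x_j \,/\, m(x_j)^{1-\gamma}}{x_i \,/\, m(x_i)^{1-\gamma}}.$$

Then, from the expression for $v + c(x_j)$, we have:

$$v + c(x_j) = \frac{x_j}{m(x_j)^{1-\gamma}}\left(m(x_j)^{\gamma}\sum_{i=1}^{N}\left[\frac{m(x_i)}{m(x_j)}\right]^{\gamma}x_i\right)^{\beta-1}\left[\sum_k\left(k\beta\gamma b^{2}\right)^{\frac{1}{1-\gamma\beta}}\mu(k)\right]^{1-\gamma\beta}.$$

Substituting the above expression and rearranging:

$$m(x_j) = \left(\frac{x_j}{v + c(x_j)}\right)^{\frac{1}{1-\gamma}}\left(\sum_{i=1}^{N}\left(\frac{x_i}{\left[v + c(x_i)\right]^{\gamma}}\right)^{\frac{1}{1-\gamma}}\right)^{\frac{\beta-1}{1-\gamma\beta}}\left[\sum_k\left(k\beta\gamma b^{2}\right)^{\frac{1}{1-\gamma\beta}}\mu(k)\right]$$

Then:

$$m'(x_j) = \frac{1}{1-\gamma}\left(\frac{x_j}{v + c(x_j)}\right)^{\frac{\gamma}{1-\gamma}}\left(\sum_{i=1}^{N}\left(\frac{x_i}{\left[v + c(x_i)\right]^{\gamma}}\right)^{\frac{1}{1-\gamma}}\right)^{\frac{\beta-1}{1-\gamma\beta}} \times \left[\sum_k\left(k\beta\gamma b^{2}\right)^{\frac{1}{1-\gamma\beta}}\mu(k)\right]\left(\frac{v + c(x_j) - c'(x_j)\,x_j}{\left(v + c(x_j)\right)^{2}}\right)$$

⁴ With $h(n_i) = n_i^{\gamma}$ the Inada conditions hold and the marginal productivity at $n_i = 0$ is infinity. Hence all skills will be
offered in equilibrium.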

If there is no fixed cost, m′ (xj) < 0, for every xj . The density of skill types is downward sloping.

The higher the skill level, the higher the cost of obtaining those skills. As a result, wages must be higher

for higher skill level to compensate for the cost. For that to be the case, there must be fewer people in

equilibrium who invest to obtain high skill levels.⁵ Observe that the properties of the distribution here
are derived in the context of a competitive market without any externalities.⁶
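The shape of the endogenous skill measure can be illustrated under an assumed cost function. The sketch below evaluates m(x) up to its common positive factor, (x/(v + c(x)))^{1/(1−γ)}, together with the term v + c(x) − c′(x)x whose sign determines the sign of m′(x). The quadratic cost, the value of v (which is endogenous in the model), and γ are hypothetical choices made only for illustration.

```python
def relative_measure(x, v, cost, gamma):
    """Skill measure m(x) up to a common positive factor: (x / (v + c(x)))**(1/(1-gamma))."""
    return (x / (v + cost(x))) ** (1.0 / (1.0 - gamma))


def slope_sign_term(x, v, cost, marginal_cost):
    """The term v + c(x) - c'(x)*x, which has the same sign as m'(x)."""
    return v + cost(x) - marginal_cost(x) * x


def cost(x):
    """Hypothetical convex investment cost c(x) = x**2."""
    return x ** 2


def marginal_cost(x):
    """Derivative c'(x) = 2*x of the hypothetical cost."""
    return 2.0 * x


v, gamma = 0.5, 0.5                      # hypothetical net payoff and curvature
skills = [1.0, 1.5, 2.0, 3.0, 4.0]
measures = [relative_measure(x, v, cost, gamma) for x in skills]
signs = [slope_sign_term(x, v, cost, marginal_cost) for x in skills]
```

With these choices the sign term equals v − x², which is negative on the whole grid, and the relative measure falls as x rises, giving a downward sloping density of skills.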

3.2 Occupational Choice and Span-of-Control

Models of occupational choice have increasingly received attention as a way of explaining the aggregate

outcomes by means of micro-founded allocation problems. Lucas (1978) uses a matching problem where

differently skilled agents decide to become either workers or entrepreneurs. The span-of-control of the

manager determines the size of the firm, which in equilibrium generates an equilibrium distribution of

firms. In our context, we can extend Lucas' framework to allow for CEOs to run the firm, rather than
capitalists. The production technology then becomes y = xL(n,x) instead of y = kL(n,x). There will
be a wage for all types both as a worker and as a CEO, and the equilibrium allocation is determined
by the occupational choice of each type, driven by the maximum over both wages. With a distribution
of skills, CEOs of different skills will manage teams, each potentially with a different composition and
distribution. This relates to the findings in Gabaix and Landier (2008) who analyze the matching
problem of CEOs to firms with different capital stocks. Here we interpret the task of the CEO as
managing the composition and distribution of the work force.

⁵ If there is a fixed cost, then there is the possibility that $m'(x_j)$ is positive for small $x_j$'s, and there can be a skewed
unimodal distribution with a long upper tail. The intuition comes from the compensation for the fixed cost. Since these
skills are not that valuable per se (small x's), but there is this fixed cost that workers have to pay, we need to increase
wages by decreasing m(x), so we have this initial increase in m(x) as we increase x and then we start decreasing.

⁶ For a framework with spillovers from technology adoption and the ensuing endogenous heterogeneity of ex ante identical
agents, see for example Eeckhout and Jovanovic (2002).

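We derive the equilibrium condition that determines the occupational choice decision for the continuous
type distribution (the derivation of the continuous type formulation as the limit of the discrete
type case is at the end of this section). If the distance between two sequential qualities is $\Delta$, we have
$N = 1 + (\bar{x} - \underline{x})/\Delta$. Then, $m(x_i) = F(x_i) - F(x_{i-1}) = F(x_i) - F(x_i - \Delta)$. Wages and profits, after taking
the limit for $\Delta \to 0$, satisfy:

\[ w(x_i) = \frac{\gamma x_i^{\alpha} \left[\int_E x^{\frac{1}{1-\gamma}} f(x)\,dx\right]^{1-\gamma}}{f(x_i)^{1-\gamma}} \]

and

\[ \pi(x, w(\cdot)) = (1-\gamma)\, x \int_{E^c} x_i^{\gamma} \left(\frac{\gamma x x_i^{\alpha}}{w(x_i)}\right)^{\frac{\gamma}{1-\gamma}} dx_i. \]

Then, substituting $w(x_i)$, we obtain:

\[ \pi(x, w(\cdot)) = \frac{(1-\gamma)\, x^{\frac{1}{1-\gamma}}}{\left[\int_E x_j^{\frac{1}{1-\gamma}} f(x_j)\,dx_j\right]^{\gamma}} \int_{E^c} \left[x_i f(x_i)\right]^{\gamma} dx_i. \]

Then the condition to become a manager is:

\[ \frac{(1-\gamma)\, x^{\frac{1}{1-\gamma}}}{\left[\int_E x_j^{\frac{1}{1-\gamma}} f(x_j)\,dx_j\right]^{\gamma}} \int_{E^c} \left[x_i f(x_i)\right]^{\gamma} dx_i \;\ge\; \frac{\gamma x^{\alpha} \left[\int_E x^{\frac{1}{1-\gamma}} f(x)\,dx\right]^{1-\gamma}}{f(x)^{1-\gamma}} \]

Rearranging, we have:

\[ x^{\frac{1-\alpha+\alpha\gamma}{1-\gamma}} f(x)^{1-\gamma} \;\ge\; \frac{\gamma}{1-\gamma}\, \frac{\int_E x^{\frac{1}{1-\gamma}} f(x)\,dx}{\int_{E^c} \left[x_i f(x_i)\right]^{\gamma} dx_i}. \]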
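The rearrangement of the manager condition can be traced in two steps. The following is only a sketch: the symbols $A$ and $B$ are shorthand introduced here for the two integrals and do not appear elsewhere in the paper, and all terms are assumed strictly positive with $0 < \gamma < 1$.

```latex
% Shorthand (introduced for this sketch only):
%   A := \int_E x^{1/(1-\gamma)} f(x)\,dx,   B := \int_{E^c} [x_i f(x_i)]^{\gamma}\,dx_i
\begin{align*}
(1-\gamma)\,\frac{x^{\frac{1}{1-\gamma}}}{A^{\gamma}}\,B \;\ge\; \frac{\gamma x^{\alpha} A^{1-\gamma}}{f(x)^{1-\gamma}}
&\iff x^{\frac{1}{1-\gamma}-\alpha}\, f(x)^{1-\gamma} \;\ge\; \frac{\gamma}{1-\gamma}\,\frac{A^{1-\gamma}A^{\gamma}}{B} \\
&\iff x^{\frac{1-\alpha+\alpha\gamma}{1-\gamma}}\, f(x)^{1-\gamma} \;\ge\; \frac{\gamma}{1-\gamma}\,\frac{A}{B},
\end{align*}
% using \frac{1}{1-\gamma} - \alpha = \frac{1 - \alpha(1-\gamma)}{1-\gamma} = \frac{1-\alpha+\alpha\gamma}{1-\gamma}.
```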

We can also go further and introduce span-of-control at each skill level. More generally therefore,

we can use our set up and specify h as h (ni;n−i,x) instead of h (ni) : the returns to each occupation do

not only depend on the number of people of the same skill, but on the number of people of other skill

levels. As a result, there is an occupational choice decision for each job, and each occupation, driven

by the local span-of-control of that occupation. As in Lucas (1978), this partitions the set of skills into

different distributions.

This generalized production function that combines the optimal allocation of skills with an occupational
choice decision is really getting to the heart of production hierarchies. At all levels within
the firm, managers can be interpreted as having span-of-control of different degrees over workers with
different skill levels. One of the main objectives of this paper is to further elaborate the links between
production hierarchies and span-of-control and occupational choice.

4 Appendix

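Claim 11 If $\sigma$ is a constant, we must have that $h(n_i)$ is of the form $a + b n_i^{\gamma}$, where $a$ and $b$ are
constants.

Proof. Since $\sigma$ is a constant, we have that:

\[ h''(n_i) + \frac{1}{\sigma n_i}\, h'(n_i) = 0 \]

is a homogeneous second order linear differential equation. Considering $h'(n_i) = g(n_i)$ we reduce it to
a first order ODE. Solving it, we obtain:

\[ h'(n_i) = h'(n_0)\, e^{-\int_{n_0}^{n_i} \frac{1}{\sigma y}\,dy} \]

where $h'(n_0)$ is the initial condition. Taking the integral on both sides, we obtain:

\[ h(n_i) - h(n_0) = h'(n_0) \int_{n_0}^{n_i} e^{-\int_{n_0}^{z} \frac{1}{\sigma y}\,dy}\,dz \]

Then, notice that:

\[ -\int_{n_0}^{z} \frac{1}{\sigma y}\,dy = \frac{1}{\sigma}\int_{z}^{n_0} \frac{1}{y}\,dy = \frac{1}{\sigma} \ln y \,\Big|_{z}^{n_0} = \frac{1}{\sigma} \ln \frac{n_0}{z} \]

Substituting back, we have:

\[ e^{-\int_{n_0}^{z} \frac{1}{\sigma y}\,dy} = \left[e^{\ln\left(\frac{n_0}{z}\right)}\right]^{\frac{1}{\sigma}} = \left(\frac{n_0}{z}\right)^{\frac{1}{\sigma}} \]

Substituting back again, we have:

\[ h(n_i) - h(n_0) = h'(n_0) \int_{n_0}^{n_i} \left(\frac{n_0}{z}\right)^{\frac{1}{\sigma}} dz = h'(n_0)\, n_0^{\frac{1}{\sigma}} \int_{n_0}^{n_i} z^{-\frac{1}{\sigma}}\,dz \]

Solving the integral, we obtain:

\[ h(n_i) - h(n_0) = h'(n_0)\, n_0^{\frac{1}{\sigma}} \left[\frac{\sigma}{\sigma-1}\, z^{\frac{\sigma-1}{\sigma}} \,\Big|_{n_0}^{n_i}\right] \]

Then, rearranging, we have:

\[ h(n_i) = h(n_0) - \frac{\sigma}{\sigma-1}\, h'(n_0)\, n_0 + \frac{\sigma}{\sigma-1}\, h'(n_0)\, n_0^{\frac{1}{\sigma}}\, n_i^{\frac{\sigma-1}{\sigma}} \]

Therefore:

\[ h(n_i) = a + b n_i^{\gamma} \]

where:

\[ a := h(n_0) - \frac{\sigma}{\sigma-1}\, h'(n_0)\, n_0, \qquad b := \frac{\sigma}{\sigma-1}\, h'(n_0)\, n_0^{\frac{1}{\sigma}}, \qquad \gamma := \frac{\sigma-1}{\sigma}. \]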
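As a numerical sanity check of Claim 11, the sketch below evaluates $\sigma(n) = -h'(n)/(h''(n)\,n)$ for $h(n) = a + b n^{\gamma}$ and confirms that it does not vary with $n$. The parameter values and the helper names `make_h` and `sigma` are assumptions chosen for illustration; they are not taken from the paper.

```python
def make_h(a, b, gamma):
    """Return h, h', h'' for h(n) = a + b * n**gamma (the form in Claim 11)."""
    h = lambda n: a + b * n ** gamma
    h1 = lambda n: b * gamma * n ** (gamma - 1)
    h2 = lambda n: b * gamma * (gamma - 1) * n ** (gamma - 2)
    return h, h1, h2


def sigma(h1, h2, n):
    """sigma(n) = -h'(n) / (h''(n) * n), rearranged from h'' + h'/(sigma*n) = 0."""
    return -h1(n) / (h2(n) * n)


# Illustrative (hypothetical) parameter values.
sigma_target = 2.5
gamma = (sigma_target - 1) / sigma_target  # gamma := (sigma - 1) / sigma
_, h1, h2 = make_h(a=0.3, b=1.7, gamma=gamma)

sigmas = [sigma(h1, h2, n) for n in (0.1, 1.0, 5.0, 40.0)]
print(sigmas)
```

The intercept `a` drops out of both derivatives, which is why the claim leaves it unrestricted.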

Claim 12 $L(n; x)$ is homothetic if and only if $h(\cdot)$ is of the form $a + b n_i^{\gamma}$.

Proof. We know that, by definition, $L(n; x)$ is homothetic if for any $i, j \in \{1, ..., N\}$ and for any $t > 0$,
we have that:

\[ \frac{\partial L(n;x)/\partial n_i}{\partial L(n;x)/\partial n_j} = \frac{\partial L(tn;x)/\partial n_i}{\partial L(tn;x)/\partial n_j} \]

But then, we should have:

\[ \frac{h'(n_i)}{h'(n_j)} = \frac{h'(t n_i)}{h'(t n_j)} \]

Rearranging:

\[ \frac{h'(t n_j)}{h'(n_j)} = \frac{h'(t n_i)}{h'(n_i)} \]

Since this must always be satisfied, we must have:

\[ \frac{h'(t n_i)}{h'(n_i)} = c \]

where $c$ is a constant. But then, we must have:

\[ h'(t n_i) = c\, h'(n_i) \]

Since the function $f(\beta) = t^{\beta}$, with $t > 0$, is continuous and has image on $(0, \infty)$, by the intermediate value theorem
there is a real number $\gamma - 1$ such that $t^{\gamma-1} = c$. Therefore, we have:

\[ h'(t n_i) = t^{\gamma-1} h'(n_i) \]

Therefore, $h'(\cdot)$ is a homogeneous function of degree $\gamma - 1$.

Since $h'(\cdot)$ is a univariate function, it is easy to see that it must be of the form $d n_i^{\gamma-1}$, where $d$ is a
constant (note that $h'(n_i) = h'(n_i \cdot 1) = n_i^{\gamma-1} h'(1) = d n_i^{\gamma-1}$, where $d = h'(1)$). But then, we have:

\[ h(n_i) = \int h'(n_i)\,dn_i = \int d\, n_i^{\gamma-1}\,dn_i = \frac{d}{\gamma}\, n_i^{\gamma} + a \]

Define $b = \frac{d}{\gamma}$, so we have:

\[ h(n_i) = a + b n_i^{\gamma}. \]

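Claim 13 $\gamma\beta < 1$ is a sufficient condition for strict concavity of the firm's objective function, whenever
$a \ge 0$.

Proof. Notice that:

\[ \frac{\partial^2 \pi}{\partial n_i^2} = k\beta(\beta-1)\left[\sum_{j=1}^{N} (a + b n_j^{\gamma}) x_j\right]^{\beta-2} b^2\gamma^2 n_i^{2(\gamma-1)} x_i^2 + k\beta\left[\sum_{j=1}^{N} (a + b n_j^{\gamma}) x_j\right]^{\beta-1} b\gamma(\gamma-1)\, n_i^{\gamma-2} x_i. \]

Rearranging:

\[ \frac{\partial^2 \pi}{\partial n_i^2} = k\beta\left[\sum_{j=1}^{N} (a + b n_j^{\gamma}) x_j\right]^{\beta-2} b\gamma\, n_i^{\gamma-2} x_i \left\{(\beta-1)\, b n_i^{\gamma}\gamma x_i + (\gamma-1)\left[\sum_{j=1}^{N} (a + b n_j^{\gamma}) x_j\right]\right\} \]

Then, $\frac{\partial^2 \pi}{\partial n_1^2} < 0$ if we have:

\[ k\beta\left[\sum_{j=1}^{N} (a + b n_j^{\gamma}) x_j\right]^{\beta-2} b\gamma\, n_1^{\gamma-2} x_1 \left\{(\beta-1)\, b n_1^{\gamma}\gamma x_1 + (\gamma-1)\left[\sum_{j=1}^{N} (a + b n_j^{\gamma}) x_j\right]\right\} < 0 \]

Which implies:

\[ (\beta-1)\, b n_1^{\gamma}\gamma x_1 + (\gamma-1)\left[\sum_{j=1}^{N} (a + b n_j^{\gamma}) x_j\right] < 0 \]

that is, collecting the terms in $b n_1^{\gamma} x_1$:

\[ (\gamma\beta-1)\, b n_1^{\gamma} x_1 + (\gamma-1)\left[a\sum_{j=1}^{N} x_j + b\sum_{j=2}^{N} n_j^{\gamma} x_j\right] < 0 \]

From our assumption that $h'(\cdot) > 0$, we must have $b > 0$. However, initially we don't have any
assumptions on $a$. If we consider $a \ge 0$, we notice that a sufficient condition would be $\gamma\beta < 1$ (concavity
of $h(\cdot)$ already requires $\gamma < 1$). To get Inada conditions, we necessarily have $a = 0$.
If $a < 0$, then we wouldn't have strict concavity holding for all $n$.

Let's now consider the second principal minor. Then, our condition is given by:

\[ k^2\beta^2\left[\sum_{j=1}^{N} (a + b n_j^{\gamma}) x_j\right]^{2\beta-3} b^2\gamma^2\, n_1^{\gamma-2} n_2^{\gamma-2} x_1 x_2\, (\gamma-1) \left\{(\gamma\beta-1)\, b\left(n_1^{\gamma} x_1 + n_2^{\gamma} x_2\right) + (\gamma-1)\left[a\sum_{j=1}^{N} x_j + b\sum_{j=3}^{N} n_j^{\gamma} x_j\right]\right\} > 0 \]

Again, for the case in which $a \ge 0$, $\gamma\beta < 1$ is a sufficient condition, since $\gamma < 1$.

Let's now consider the third principal minor. Then, our condition is given by:

\[ k^3\beta^3\left[\sum_{j=1}^{N} (a + b n_j^{\gamma}) x_j\right]^{3\beta-4} b^3\gamma^3\, n_1^{\gamma-2} n_2^{\gamma-2} n_3^{\gamma-2} x_1 x_2 x_3\, (\gamma-1)^2 \left\{(\gamma\beta-1)\, b\left(n_1^{\gamma} x_1 + n_2^{\gamma} x_2 + n_3^{\gamma} x_3\right) + (\gamma-1)\left[a\sum_{j=1}^{N} x_j + b\sum_{j=4}^{N} n_j^{\gamma} x_j\right]\right\} < 0 \]

Then, again, for the case in which $a \ge 0$, $\gamma\beta < 1$ is a sufficient condition. We also can see the pattern
for these conditions, meaning that $\gamma\beta < 1$ is a sufficient condition for any $N$ and $a \ge 0$. Therefore,
$\gamma\beta < 1$ is a sufficient condition for strict concavity of the objective function whenever $a \ge 0$.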
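The alternating-sign pattern of the leading principal minors can be checked numerically. The sketch below builds the Hessian of $k\left[\sum_j (a + b n_j^{\gamma}) x_j\right]^{\beta}$ for $N = 3$ and evaluates its minors at one point. The wage bill is linear in $n$ and so does not affect the Hessian. All parameter values and the function names `hessian`, `det` and `leading_minors` are assumptions for illustration; a check at a single point is not a proof.

```python
def hessian(n, x, a, b, gamma, beta, k=1.0):
    """Hessian of k * [sum_j (a + b*n_j**gamma) * x_j]**beta with respect to n."""
    N = len(n)
    S = sum((a + b * n[j] ** gamma) * x[j] for j in range(N))
    # First and second derivatives of the inner sum S with respect to n_j.
    g = [b * gamma * n[j] ** (gamma - 1) * x[j] for j in range(N)]
    d = [b * gamma * (gamma - 1) * n[j] ** (gamma - 2) * x[j] for j in range(N)]
    H = [[k * beta * (beta - 1) * S ** (beta - 2) * g[i] * g[j] for j in range(N)]
         for i in range(N)]
    for i in range(N):
        H[i][i] += k * beta * S ** (beta - 1) * d[i]
    return H


def det(M):
    """Determinant by Laplace expansion along the first row (small matrices only)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** c * M[0][c] * det([row[:c] + row[c + 1:] for row in M[1:]])
               for c in range(len(M)))


def leading_minors(H):
    """Leading principal minors of H, of order 1, 2, ..., N."""
    return [det([row[:r] for row in H[:r]]) for r in range(1, len(H) + 1)]


# Hypothetical values with gamma * beta = 0.96 < 1, beta > 1 and a >= 0.
H = hessian(n=[0.5, 2.0, 7.0], x=[1.0, 1.5, 3.0], a=0.2, b=1.0, gamma=0.8, beta=1.2)
minors = leading_minors(H)
print(minors)
```

Choosing $\beta > 1$ in the example makes the point that the condition is on the product $\gamma\beta$, not on $\beta$ alone.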

Proof of Proposition 7 for general β.

Proof. Equilibrium conditions. Two firms, two skills. Endogenous variables: $n_1^1, n_2^1, n_1^2, n_2^2, w_1, w_2$, where the superscript indexes the firm and the subscript the skill.

\[ k_1\beta\left[h(n_1^1)x_1 + h(n_2^1)x_2\right]^{\beta-1} h'(n_1^1)\,x_1 = w_1 \tag{1} \]
\[ k_1\beta\left[h(n_1^1)x_1 + h(n_2^1)x_2\right]^{\beta-1} h'(n_2^1)\,x_2 = w_2 \tag{2} \]
\[ k_2\beta\left[h(n_1^2)x_1 + h(n_2^2)x_2\right]^{\beta-1} h'(n_1^2)\,x_1 = w_1 \tag{3} \]
\[ k_2\beta\left[h(n_1^2)x_1 + h(n_2^2)x_2\right]^{\beta-1} h'(n_2^2)\,x_2 = w_2 \tag{4} \]
\[ n_1^1 + n_1^2 = m(x_1) \tag{5} \]
\[ n_2^1 + n_2^2 = m(x_2) \tag{6} \]

For the general case, when $\beta \neq 1$, we can reduce the system to:

\[ k_1\left[h(n_1^1)x_1 + h(n_2^1)x_2\right]^{\beta-1} h'(n_1^1) - k_2\left[h(m(x_1)-n_1^1)x_1 + h(m(x_2)-n_2^1)x_2\right]^{\beta-1} h'(m(x_1)-n_1^1) = 0 \tag{F1} \]
\[ k_1\left[h(n_1^1)x_1 + h(n_2^1)x_2\right]^{\beta-1} h'(n_2^1) - k_2\left[h(m(x_1)-n_1^1)x_1 + h(m(x_2)-n_2^1)x_2\right]^{\beta-1} h'(m(x_2)-n_2^1) = 0 \tag{F2} \]

The main problem is that this is a non-linear, non-separable system.

From (F1)/(F2), we have:

\[ \frac{h'(n_1^1)}{h'(n_2^1)} = \frac{h'(m(x_1)-n_1^1)}{h'(m(x_2)-n_2^1)} \]

To keep the expressions below compact, write

\[ S_1 \equiv h(n_1^1)x_1 + h(n_2^1)x_2, \qquad S_2 \equiv h(m(x_1)-n_1^1)x_1 + h(m(x_2)-n_2^1)x_2. \]

Then, let's prepare ourselves for the IFT:

\[ D_kF = \begin{pmatrix} \dfrac{\partial F_1}{\partial k_1} & \dfrac{\partial F_1}{\partial k_2} \\[2ex] \dfrac{\partial F_2}{\partial k_1} & \dfrac{\partial F_2}{\partial k_2} \end{pmatrix} \]

where:

\[ \frac{\partial F_1}{\partial k_1} = S_1^{\beta-1} h'(n_1^1), \qquad \frac{\partial F_1}{\partial k_2} = -S_2^{\beta-1} h'(m(x_1)-n_1^1), \]
\[ \frac{\partial F_2}{\partial k_1} = S_1^{\beta-1} h'(n_2^1), \qquad \frac{\partial F_2}{\partial k_2} = -S_2^{\beta-1} h'(m(x_2)-n_2^1). \]

And,

\[ D_nF = \begin{pmatrix} \dfrac{\partial F_1}{\partial n_1^1} & \dfrac{\partial F_1}{\partial n_2^1} \\[2ex] \dfrac{\partial F_2}{\partial n_1^1} & \dfrac{\partial F_2}{\partial n_2^1} \end{pmatrix} \]

where:

\[ \frac{\partial F_1}{\partial n_1^1} = k_1\left\{(\beta-1) S_1^{\beta-2}\left[h'(n_1^1)\right]^2 x_1 + S_1^{\beta-1} h''(n_1^1)\right\} + k_2\left\{(\beta-1) S_2^{\beta-2}\left[h'(m(x_1)-n_1^1)\right]^2 x_1 + S_2^{\beta-1} h''(m(x_1)-n_1^1)\right\} \]
\[ \frac{\partial F_1}{\partial n_2^1} = k_1(\beta-1) S_1^{\beta-2} h'(n_1^1) h'(n_2^1)\, x_2 + k_2(\beta-1) S_2^{\beta-2} h'(m(x_1)-n_1^1)\, h'(m(x_2)-n_2^1)\, x_2 \]
\[ \frac{\partial F_2}{\partial n_1^1} = k_1(\beta-1) S_1^{\beta-2} h'(n_2^1) h'(n_1^1)\, x_1 + k_2(\beta-1) S_2^{\beta-2} h'(m(x_2)-n_2^1)\, h'(m(x_1)-n_1^1)\, x_1 \]
\[ \frac{\partial F_2}{\partial n_2^1} = k_1\left\{(\beta-1) S_1^{\beta-2}\left[h'(n_2^1)\right]^2 x_2 + S_1^{\beta-1} h''(n_2^1)\right\} + k_2\left\{(\beta-1) S_2^{\beta-2}\left[h'(m(x_2)-n_2^1)\right]^2 x_2 + S_2^{\beta-1} h''(m(x_2)-n_2^1)\right\} \]

Then, we have:

\[ \det D_nF = \frac{\partial F_1}{\partial n_1^1}\cdot\frac{\partial F_2}{\partial n_2^1} - \frac{\partial F_2}{\partial n_1^1}\cdot\frac{\partial F_1}{\partial n_2^1} \]

So, factoring out the powers of $S_1$ and $S_2$:

\[ \frac{\partial F_1}{\partial n_1^1}\cdot\frac{\partial F_2}{\partial n_2^1} = \Big[k_1 S_1^{\beta-2}\left\{(\beta-1)\left[h'(n_1^1)\right]^2 x_1 + S_1 h''(n_1^1)\right\} + k_2 S_2^{\beta-2}\left\{(\beta-1)\left[h'(m(x_1)-n_1^1)\right]^2 x_1 + S_2 h''(m(x_1)-n_1^1)\right\}\Big] \]
\[ \qquad\times \Big[k_1 S_1^{\beta-2}\left\{(\beta-1)\left[h'(n_2^1)\right]^2 x_2 + S_1 h''(n_2^1)\right\} + k_2 S_2^{\beta-2}\left\{(\beta-1)\left[h'(m(x_2)-n_2^1)\right]^2 x_2 + S_2 h''(m(x_2)-n_2^1)\right\}\Big] \]

and

\[ \frac{\partial F_2}{\partial n_1^1}\cdot\frac{\partial F_1}{\partial n_2^1} = (\beta-1)^2\left[k_1 S_1^{\beta-2} h'(n_1^1) h'(n_2^1) + k_2 S_2^{\beta-2} h'(m(x_1)-n_1^1)\, h'(m(x_2)-n_2^1)\right]^2 x_1 x_2 \]

Now, consider the symmetric equilibrium in which $k_1 = k_2 = k$, $n_1^1 = \tfrac{m(x_1)}{2}$ and $n_2^1 = \tfrac{m(x_2)}{2}$, so that $S_1 = S_2 = S \equiv h\!\left(\tfrac{m(x_1)}{2}\right)x_1 + h\!\left(\tfrac{m(x_2)}{2}\right)x_2$. Then, we have:

\[ \left.\frac{\partial F_1}{\partial n_1^1}\cdot\frac{\partial F_2}{\partial n_2^1}\right|_{k_1=k_2=k} = \left(2k S^{\beta-2}\right)^2 \left\{(\beta-1)\left[h'\!\left(\tfrac{m(x_1)}{2}\right)\right]^2 x_1 + S\, h''\!\left(\tfrac{m(x_1)}{2}\right)\right\} \left\{(\beta-1)\left[h'\!\left(\tfrac{m(x_2)}{2}\right)\right]^2 x_2 + S\, h''\!\left(\tfrac{m(x_2)}{2}\right)\right\} \]

and

\[ \left.\frac{\partial F_2}{\partial n_1^1}\cdot\frac{\partial F_1}{\partial n_2^1}\right|_{k_1=k_2=k} = \left[2k(\beta-1) S^{\beta-2}\, h'\!\left(\tfrac{m(x_1)}{2}\right) h'\!\left(\tfrac{m(x_2)}{2}\right)\right]^2 x_2 x_1 \]

Then, $\det D_nF$ becomes:

\[ \det D_nF = 4k^2 S^{2\beta-4}\left\{(\beta-1)\, S\left[h''\!\left(\tfrac{m(x_2)}{2}\right)\left[h'\!\left(\tfrac{m(x_1)}{2}\right)\right]^2 x_1 + h''\!\left(\tfrac{m(x_1)}{2}\right)\left[h'\!\left(\tfrac{m(x_2)}{2}\right)\right]^2 x_2\right] + S^2\, h''\!\left(\tfrac{m(x_1)}{2}\right) h''\!\left(\tfrac{m(x_2)}{2}\right)\right\} \]

If $\beta < 1$, this is necessarily different from zero. Otherwise, this could be zero, but the set of parameters
for which this occurs has measure zero. Then:

\[ D_n^{-1}F = \frac{1}{|\det D_nF|}\begin{pmatrix} \dfrac{\partial F_2}{\partial n_2^1} & -\dfrac{\partial F_1}{\partial n_2^1} \\[2ex] -\dfrac{\partial F_2}{\partial n_1^1} & \dfrac{\partial F_1}{\partial n_1^1} \end{pmatrix} \]

Then:

\[ \begin{pmatrix} \dfrac{\partial n_1^1}{\partial k_1} & \dfrac{\partial n_1^1}{\partial k_2} \\[2ex] \dfrac{\partial n_2^1}{\partial k_1} & \dfrac{\partial n_2^1}{\partial k_2} \end{pmatrix} = -D_n^{-1}F \cdot D_kF \]

Substituting, we have:

\[ \begin{pmatrix} \dfrac{\partial n_1^1}{\partial k_1} & \dfrac{\partial n_1^1}{\partial k_2} \\[2ex] \dfrac{\partial n_2^1}{\partial k_1} & \dfrac{\partial n_2^1}{\partial k_2} \end{pmatrix} = -\frac{1}{|\det D_nF|}\begin{pmatrix} \dfrac{\partial F_2}{\partial n_2^1} & -\dfrac{\partial F_1}{\partial n_2^1} \\[2ex] -\dfrac{\partial F_2}{\partial n_1^1} & \dfrac{\partial F_1}{\partial n_1^1} \end{pmatrix} \begin{pmatrix} \dfrac{\partial F_1}{\partial k_1} & \dfrac{\partial F_1}{\partial k_2} \\[2ex] \dfrac{\partial F_2}{\partial k_1} & \dfrac{\partial F_2}{\partial k_2} \end{pmatrix} \]

Then:

\[ \frac{\partial n_1^1}{\partial k_1} = -\frac{1}{|\det D_nF|}\left(\frac{\partial F_2}{\partial n_2^1}\cdot\frac{\partial F_1}{\partial k_1} - \frac{\partial F_1}{\partial n_2^1}\cdot\frac{\partial F_2}{\partial k_1}\right) \]

Then:

\[ \frac{\partial F_2}{\partial n_2^1}\cdot\frac{\partial F_1}{\partial k_1} = \Big[k_1 S_1^{\beta-2}\left\{(\beta-1)\left[h'(n_2^1)\right]^2 x_2 + S_1 h''(n_2^1)\right\} + k_2 S_2^{\beta-2}\left\{(\beta-1)\left[h'(m(x_2)-n_2^1)\right]^2 x_2 + S_2 h''(m(x_2)-n_2^1)\right\}\Big]\, S_1^{\beta-1} h'(n_1^1) \]

At $k_1 = k_2 = k$ and the symmetric equilibrium, we have:

\[ \left.\frac{\partial F_2}{\partial n_2^1}\cdot\frac{\partial F_1}{\partial k_1}\right|_{k_1=k_2=k} = 2k S^{2\beta-3}\left\{(\beta-1)\left[h'\!\left(\tfrac{m(x_2)}{2}\right)\right]^2 x_2 + S\, h''\!\left(\tfrac{m(x_2)}{2}\right)\right\} h'\!\left(\tfrac{m(x_1)}{2}\right) \]

and

\[ \frac{\partial F_1}{\partial n_2^1}\cdot\frac{\partial F_2}{\partial k_1} = (\beta-1)\left[k_1 S_1^{\beta-2} h'(n_1^1) h'(n_2^1) + k_2 S_2^{\beta-2} h'(m(x_1)-n_1^1)\, h'(m(x_2)-n_2^1)\right] x_2\; S_1^{\beta-1} h'(n_2^1) \]

Again, at $k_1 = k_2 = k$, we have:

\[ \left.\frac{\partial F_1}{\partial n_2^1}\cdot\frac{\partial F_2}{\partial k_1}\right|_{k_1=k_2=k} = 2k(\beta-1) S^{2\beta-3}\left[h'\!\left(\tfrac{m(x_2)}{2}\right)\right]^2 h'\!\left(\tfrac{m(x_1)}{2}\right) x_2 \]

Putting everything together at $k_1 = k_2 = k$, the terms in $(\beta-1)$ cancel and we have:

\[ \left.\frac{\partial F_2}{\partial n_2^1}\cdot\frac{\partial F_1}{\partial k_1}\right|_{k_1=k_2=k} - \left.\frac{\partial F_1}{\partial n_2^1}\cdot\frac{\partial F_2}{\partial k_1}\right|_{k_1=k_2=k} = 2k S^{2\beta-2}\, h''\!\left(\tfrac{m(x_2)}{2}\right) h'\!\left(\tfrac{m(x_1)}{2}\right) \]

Therefore:

\[ \left.\frac{\partial n_1^1}{\partial k_1}\right|_{k_1=k_2=k} = -\frac{2k S^{2\beta-2}\, h''\!\left(\tfrac{m(x_2)}{2}\right) h'\!\left(\tfrac{m(x_1)}{2}\right)}{|\det D_nF|} > 0 \]

Now, let's calculate $\frac{\partial n_2^1}{\partial k_1}$. Then, we have:

\[ \left.\frac{\partial n_2^1}{\partial k_1}\right|_{k_1=k_2=k} = -\frac{1}{|\det D_nF|}\left(\frac{\partial F_1}{\partial n_1^1}\cdot\frac{\partial F_2}{\partial k_1} - \frac{\partial F_2}{\partial n_1^1}\cdot\frac{\partial F_1}{\partial k_1}\right) \]

Then, let's substitute this step by step:

\[ \frac{\partial F_1}{\partial n_1^1}\cdot\frac{\partial F_2}{\partial k_1} = \Big[k_1 S_1^{\beta-2}\left\{(\beta-1)\left[h'(n_1^1)\right]^2 x_1 + S_1 h''(n_1^1)\right\} + k_2 S_2^{\beta-2}\left\{(\beta-1)\left[h'(m(x_1)-n_1^1)\right]^2 x_1 + S_2 h''(m(x_1)-n_1^1)\right\}\Big]\, S_1^{\beta-1} h'(n_2^1) \]

At $k_1 = k_2 = k$, we have:

\[ \left.\frac{\partial F_1}{\partial n_1^1}\cdot\frac{\partial F_2}{\partial k_1}\right|_{k_1=k_2=k} = 2k S^{2\beta-3}\left\{(\beta-1)\left[h'\!\left(\tfrac{m(x_1)}{2}\right)\right]^2 x_1 + S\, h''\!\left(\tfrac{m(x_1)}{2}\right)\right\} h'\!\left(\tfrac{m(x_2)}{2}\right) \]

and

\[ \frac{\partial F_2}{\partial n_1^1}\cdot\frac{\partial F_1}{\partial k_1} = (\beta-1)\left[k_1 S_1^{\beta-2} h'(n_2^1) h'(n_1^1) + k_2 S_2^{\beta-2} h'(m(x_2)-n_2^1)\, h'(m(x_1)-n_1^1)\right] x_1\; S_1^{\beta-1} h'(n_1^1) \]

Then, at $k_1 = k_2 = k$, we have:

\[ \left.\frac{\partial F_2}{\partial n_1^1}\cdot\frac{\partial F_1}{\partial k_1}\right|_{k_1=k_2=k} = 2k(\beta-1) S^{2\beta-3}\, h'\!\left(\tfrac{m(x_2)}{2}\right)\left[h'\!\left(\tfrac{m(x_1)}{2}\right)\right]^2 x_1 \]

Then, we have:

\[ \left.\frac{\partial F_1}{\partial n_1^1}\cdot\frac{\partial F_2}{\partial k_1}\right|_{k_1=k_2=k} - \left.\frac{\partial F_2}{\partial n_1^1}\cdot\frac{\partial F_1}{\partial k_1}\right|_{k_1=k_2=k} = 2k S^{2\beta-2}\, h''\!\left(\tfrac{m(x_1)}{2}\right) h'\!\left(\tfrac{m(x_2)}{2}\right) \]

Then:

\[ \left.\frac{\partial n_2^1}{\partial k_1}\right|_{k_1=k_2=k} = -\frac{2k S^{2\beta-2}\, h''\!\left(\tfrac{m(x_1)}{2}\right) h'\!\left(\tfrac{m(x_2)}{2}\right)}{|\det D_nF|} > 0 \]

Then:

\[ \frac{\partial\left(n_1^1/n_2^1\right)}{\partial k_1} = \frac{\dfrac{\partial n_1^1}{\partial k_1}\, n_2^1 - \dfrac{\partial n_2^1}{\partial k_1}\, n_1^1}{\left(n_2^1\right)^2} \]

\[ \left.\frac{\partial\left(n_1^1/n_2^1\right)}{\partial k_1}\right|_{k_1=k_2} = \frac{2k S^{2\beta-2}}{|\det D_nF|\left(\tfrac{m(x_2)}{2}\right)^2}\left\{-h''\!\left(\tfrac{m(x_2)}{2}\right) h'\!\left(\tfrac{m(x_1)}{2}\right)\tfrac{m(x_2)}{2} + h''\!\left(\tfrac{m(x_1)}{2}\right) h'\!\left(\tfrac{m(x_2)}{2}\right)\tfrac{m(x_1)}{2}\right\} \]

Therefore, we have $\left.\frac{\partial\left(n_1^1/n_2^1\right)}{\partial k_1}\right|_{k_1=k_2} > 0$ if

\[ -h''\!\left(\tfrac{m(x_2)}{2}\right) h'\!\left(\tfrac{m(x_1)}{2}\right)\tfrac{m(x_2)}{2} > -h''\!\left(\tfrac{m(x_1)}{2}\right) h'\!\left(\tfrac{m(x_2)}{2}\right)\tfrac{m(x_1)}{2} \]

or, equivalently,

\[ -\frac{h''\!\left(\tfrac{m(x_2)}{2}\right)}{h'\!\left(\tfrac{m(x_2)}{2}\right)}\,\frac{m(x_2)}{2} > -\frac{h''\!\left(\tfrac{m(x_1)}{2}\right)}{h'\!\left(\tfrac{m(x_1)}{2}\right)}\,\frac{m(x_1)}{2} \]

which is exactly the same condition we obtained before for the case in which β = 1.
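The determinant at the symmetric equilibrium can be cross-checked against a finite-difference Jacobian of (F1) and (F2). The sketch below does this for a hypothetical $h(n) = \sqrt{n}$ and made-up values of $x_1, x_2, m(x_1), m(x_2), \beta, k$; none of these come from the paper. The closed form coded in `det_closed_form` is $4k^2 S^{2\beta-4}\{(\beta-1) S [\cdot] + S^2 h'' h''\}$, with $S$ the common value of the bracketed sum at the symmetric point.

```python
def make_system(h, h1, x1, x2, m1, m2, beta):
    """Return F(n11, n12, k1, k2) = (F1, F2), with market clearing already imposed."""
    def F(n11, n12, k1, k2):
        S1 = h(n11) * x1 + h(n12) * x2              # firm 1 skill aggregate
        S2 = h(m1 - n11) * x1 + h(m2 - n12) * x2    # firm 2 skill aggregate
        F1 = k1 * S1 ** (beta - 1) * h1(n11) - k2 * S2 ** (beta - 1) * h1(m1 - n11)
        F2 = k1 * S1 ** (beta - 1) * h1(n12) - k2 * S2 ** (beta - 1) * h1(m2 - n12)
        return F1, F2
    return F


def det_numeric(F, n11, n12, k, eps=1e-6):
    """det D_n F by central finite differences at (n11, n12), with k1 = k2 = k."""
    col1 = [(p - q) / (2 * eps)
            for p, q in zip(F(n11 + eps, n12, k, k), F(n11 - eps, n12, k, k))]
    col2 = [(p - q) / (2 * eps)
            for p, q in zip(F(n11, n12 + eps, k, k), F(n11, n12 - eps, k, k))]
    # col1 = (dF1/dn11, dF2/dn11), col2 = (dF1/dn12, dF2/dn12)
    return col1[0] * col2[1] - col1[1] * col2[0]


def det_closed_form(h, h1, h2, x1, x2, m1, m2, beta, k):
    """Closed-form determinant at the symmetric equilibrium n11 = m1/2, n12 = m2/2."""
    u, v = m1 / 2, m2 / 2
    S = h(u) * x1 + h(v) * x2
    return 4 * k ** 2 * S ** (2 * beta - 4) * (
        (beta - 1) * S * (h2(v) * h1(u) ** 2 * x1 + h2(u) * h1(v) ** 2 * x2)
        + S ** 2 * h2(u) * h2(v))


# Hypothetical primitives: h(n) = sqrt(n) and illustrative parameter values.
h = lambda n: n ** 0.5
h1 = lambda n: 0.5 * n ** -0.5
h2 = lambda n: -0.25 * n ** -1.5
x1, x2, m1, m2, beta, k = 1.0, 2.0, 3.0, 1.0, 0.7, 1.5

F = make_system(h, h1, x1, x2, m1, m2, beta)
d_num = det_numeric(F, m1 / 2, m2 / 2, k)
d_closed = det_closed_form(h, h1, h2, x1, x2, m1, m2, beta, k)
print(d_num, d_closed)
```

With $\beta < 1$ and $h'' < 0$ both terms inside the braces are positive, so the determinant is strictly positive in this example.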

Example in which $\sigma'(n) < 0$

→ Case without Inada Conditions: $\arctan(n)$:

\[ h(n) = \arctan(n), \qquad h'(n) = \frac{1}{1+n^2} > 0 \]

but note that $h'(n) \to 1$ as $n \to 0$, and

\[ h''(n) = -\frac{2n}{(1+n^2)^2} < 0 \]

Then:

\[ \sigma = -\frac{h'(n)}{h''(n)}\,\frac{1}{n}. \]

Solving it:

\[ \sigma = -\frac{\left(\frac{1}{1+n^2}\right)}{\left(-\frac{2n}{(1+n^2)^2}\right)}\,\frac{1}{n} = \frac{1+n^2}{2n^2} \]

and

\[ \sigma' = -\frac{1}{n^3} < 0 \]

but note that:

\[ h'''(n) = -\frac{2\left(1-3n^2\right)}{(1+n^2)^3} \]

Notice that this derivative is negative until $n = \frac{1}{\sqrt{3}}$ and then becomes positive. Since we want a bounded
function, unless $\lim_{n\to\infty} h'(n)$ is not defined, we must have $\lim_{n\to\infty} h'(n) = 0$, and then we need this
long tail. If $h'''(\cdot) < 0$ this would be impossible.
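The arctan example is easy to verify numerically. The following sketch checks the closed form for $\sigma$, its monotonicity on a grid, and the sign change of $h'''$ around $n = 1/\sqrt{3}$. The grid and function names are arbitrary choices made for illustration.

```python
import math


def h1(n):
    """h'(n) for h(n) = arctan(n)."""
    return 1.0 / (1.0 + n * n)


def h2(n):
    """h''(n) for h(n) = arctan(n)."""
    return -2.0 * n / (1.0 + n * n) ** 2


def h3(n):
    """h'''(n) for h(n) = arctan(n)."""
    return -2.0 * (1.0 - 3.0 * n * n) / (1.0 + n * n) ** 3


def sigma(n):
    """sigma(n) = -h'(n) / (h''(n) * n)."""
    return -h1(n) / (h2(n) * n)


grid = [0.1 * j for j in range(1, 101)]          # n = 0.1, 0.2, ..., 10.0
sigmas = [sigma(n) for n in grid]
closed_form = [(1.0 + n * n) / (2.0 * n * n) for n in grid]
switch_point = 1.0 / math.sqrt(3.0)              # where h''' changes sign
print(sigmas[0], sigmas[-1], switch_point)
```

On this grid $\sigma$ falls monotonically toward its limit of $1/2$.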

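→ Example with Inada Conditions: $\chi^2_1$ (Chi-Square with one degree of freedom).

Pdf of Chi-square $\chi^2_k$:

\[ h'(n) = \frac{1}{2^{\frac{k}{2}}\,\Gamma\!\left(\frac{k}{2}\right)}\, n^{\frac{k}{2}-1} e^{-\frac{n}{2}}, \]

where:

\[ \Gamma(k) = \int_0^{\infty} t^{k-1} e^{-t}\,dt \]

Remark: we are considering the (cumulative) distribution function as $h(\cdot)$.

We can show that $\Gamma\!\left(\frac{1}{2}\right) = \sqrt{\pi}$. Then, the pdf of $\chi^2_1$ is:

\[ h'(n) = \frac{1}{\sqrt{2\pi}}\, n^{-\frac{1}{2}} e^{-\frac{n}{2}}. \]

Then:

\[ h''(n) = -\frac{1}{\sqrt{2\pi}}\,\frac{1}{2}\, n^{-\frac{3}{2}} e^{-\frac{n}{2}}\,(1+n) \]

Finally:

\[ \sigma = \frac{2}{1+n}. \]

Notice that: $h'(n) \to \infty$ as $n \to 0$ and $h''(\cdot) < 0$. Again, we have $h'''(\cdot) > 0$.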
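A short numerical check of the chi-square example follows. It assumes $h$ is the $\chi^2_1$ distribution function, which can be written as $\mathrm{erf}(\sqrt{n/2})$, and compares the stated $h'$ and $h''$ with central finite differences before evaluating $\sigma$. The evaluation points and step size are arbitrary choices for this sketch.

```python
import math

SQRT_2PI = math.sqrt(2.0 * math.pi)


def h(n):
    """CDF of a chi-square with one degree of freedom: erf(sqrt(n / 2))."""
    return math.erf(math.sqrt(n / 2.0))


def h1(n):
    """pdf of chi-square(1): n^(-1/2) * exp(-n/2) / sqrt(2*pi)."""
    return n ** -0.5 * math.exp(-n / 2.0) / SQRT_2PI


def h2(n):
    """Derivative of the pdf: -(1/2) * n^(-3/2) * exp(-n/2) * (1 + n) / sqrt(2*pi)."""
    return -0.5 * n ** -1.5 * math.exp(-n / 2.0) * (1.0 + n) / SQRT_2PI


def sigma(n):
    """sigma(n) = -h'(n) / (h''(n) * n)."""
    return -h1(n) / (h2(n) * n)


def central_diff(f, n, eps=1e-6):
    """Central finite-difference approximation of f'(n)."""
    return (f(n + eps) - f(n - eps)) / (2.0 * eps)


points = (0.5, 1.0, 3.0)
checks = [(central_diff(h, n) - h1(n), central_diff(h1, n) - h2(n), sigma(n)) for n in points]
print(checks)
```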

Looking at $\frac{d\left(n_1^1/n_2^1\right)}{dk_1}$

We can in fact give a general proof that $\frac{d\left(n_1^1/n_2^1\right)}{dk_1} < 0$ if we assume $h'''(\cdot) < 0$ (at least for the
simplest case in which $\beta = 1$).

Proposition 14 If $\beta = 1$ and $h'''(\cdot) < 0$, we have that $\frac{d\left(n_1^1/n_2^1\right)}{dk_1} < 0$.

Proof. First of all, remember that, simplifying the system of equilibrium conditions, we end up with
the following two conditions:

\[ k_1 h'(n_1^1) = k_2 h'(m(x_1) - n_1^1) \tag{1} \]
\[ k_1 h'(n_2^1) = k_2 h'(m(x_2) - n_2^1) \tag{2} \]

Then, rearranging eq. (1), we have:

\[ \frac{k_1}{k_2} = \frac{h'(m(x_1) - n_1^1)}{h'(n_1^1)} \]

Now, consider that we increase $m(x_1)$. Since the LHS is constant and $h''(\cdot) < 0$, we must increase $n_1^1$ to
increase the numerator and decrease the denominator. Therefore, an increase in $m(x_1)$ increases both
$m(x_1) - n_1^1$ and $n_1^1$. Now, using the IFT, we have:

\[ \frac{\partial n_1^1}{\partial k_1} = -\frac{h'(n_1^1)}{k_1 h''(n_1^1) + k_2 h''(m(x_1) - n_1^1)} \]

Similarly:

\[ \frac{\partial n_2^1}{\partial k_1} = -\frac{h'(n_2^1)}{k_1 h''(n_2^1) + k_2 h''(m(x_2) - n_2^1)} \]

Considering that $m(x_1) > m(x_2)$, using a similar argument as the one we used above, we have
$n_1^1 > n_2^1$ and $m(x_1) - n_1^1 > m(x_2) - n_2^1$ (an increase in $m$ increases $n$ but less than proportionally).
Then, using the fact that $h''(\cdot) < 0$ and $h'''(\cdot) < 0$, we have that $h'(n_1^1) < h'(n_2^1)$ and

\[ -\left[k_1 h''(n_1^1) + k_2 h''(m(x_1) - n_1^1)\right] > -\left[k_1 h''(n_2^1) + k_2 h''(m(x_2) - n_2^1)\right] \]

(since $h''(n_1^1)$ is more negative than $h''(n_2^1)$, and so on). Therefore, $\frac{\partial n_1^1}{\partial k_1} < \frac{\partial n_2^1}{\partial k_1}$. But then:

\[ \frac{\partial n_1^1}{\partial k_1}\, n_2^1 - \frac{\partial n_2^1}{\partial k_1}\, n_1^1 < 0 \]

and

\[ \frac{d\left(n_1^1/n_2^1\right)}{dk_1} = \frac{\dfrac{\partial n_1^1}{\partial k_1}\, n_2^1 - \dfrac{\partial n_2^1}{\partial k_1}\, n_1^1}{\left(n_2^1\right)^2} < 0. \]
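For $\beta = 1$ the two conditions are separable, so each can be solved by bisection. The sketch below uses $h(n) = \sin(n)$ on $(0, \pi/2)$, a purely hypothetical choice that satisfies $h' > 0$, $h'' < 0$ and $h''' < 0$ there, with made-up values $m(x_1) = 1.2$, $m(x_2) = 0.8$ and $k_2 = 1$. It compares the IFT expression for $\partial n_1^1/\partial k_1$ with a finite difference and reports the ratio $n_1^1/n_2^1$ at three values of $k_1$.

```python
import math


def solve_n(k1, k2, m, h1, iters=200):
    """Solve k1*h'(n) = k2*h'(m - n) for n in (0, m) by bisection.

    g(n) = k1*h'(n) - k2*h'(m - n) is strictly decreasing when h'' < 0,
    so the root is unique whenever g(0) > 0 > g(m).
    """
    lo, hi = 0.0, m
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if k1 * h1(mid) - k2 * h1(m - mid) > 0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Hypothetical primitives: h(n) = sin(n), so h' = cos, h'' = -sin, h''' = -cos.
h1 = math.cos
h2 = lambda n: -math.sin(n)
m1, m2, k2 = 1.2, 0.8, 1.0


def ratio(k1):
    """n_1^1 / n_2^1 at the solution of conditions (1) and (2)."""
    return solve_n(k1, k2, m1, h1) / solve_n(k1, k2, m2, h1)


# IFT derivative versus a central finite difference at k1 = 1.05.
k1, eps = 1.05, 1e-5
n11 = solve_n(k1, k2, m1, h1)
ift = -h1(n11) / (k1 * h2(n11) + k2 * h2(m1 - n11))
fd = (solve_n(k1 + eps, k2, m1, h1) - solve_n(k1 - eps, k2, m1, h1)) / (2 * eps)

ratios = [ratio(k) for k in (0.95, 1.0, 1.05)]
print(ift, fd, ratios)
```

In this example the ratio falls as $k_1$ rises, in line with the proposition. Other functional forms would need to be checked separately.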

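Now, we can obtain a broader condition. First of all, notice that:

\[ \frac{\partial n_1^1}{\partial k_1} = -\frac{h'(n_1^1)}{k_1 h''(n_1^1) + k_2 h''(m(x_1) - n_1^1)} \]

Dividing above and below by $-h'(n_1^1)$, we have:

\[ \frac{\partial n_1^1}{\partial k_1} = \frac{1}{-k_1\dfrac{h''(n_1^1)}{h'(n_1^1)} - k_2\dfrac{h''(m(x_1) - n_1^1)}{h'(n_1^1)}} \]

Since $h'(n_1^1) = \frac{k_2}{k_1}\, h'(m(x_1) - n_1^1)$, we have:

\[ \frac{\partial n_1^1}{\partial k_1} = \frac{1}{k_1\left\{-\dfrac{h''(n_1^1)}{h'(n_1^1)} - \dfrac{h''(m(x_1) - n_1^1)}{h'(m(x_1) - n_1^1)}\right\}} \]

A similar argument can be made for $\frac{\partial n_2^1}{\partial k_1}$. Then, for $\frac{\partial n_1^1}{\partial k_1}\, n_2^1 - \frac{\partial n_2^1}{\partial k_1}\, n_1^1 < 0$, we need:

\[ \frac{1}{\left\{-\dfrac{h''(n_1^1)\, n_1^1}{h'(n_1^1)} - \dfrac{h''(m(x_1) - n_1^1)\, n_1^1}{h'(m(x_1) - n_1^1)}\right\}} < \frac{1}{\left\{-\dfrac{h''(n_2^1)\, n_2^1}{h'(n_2^1)} - \dfrac{h''(m(x_2) - n_2^1)\, n_2^1}{h'(m(x_2) - n_2^1)}\right\}} \]

that is,

\[ -\frac{h''(n_2^1)\, n_2^1}{h'(n_2^1)} - \frac{h''(m(x_2) - n_2^1)\, n_2^1}{h'(m(x_2) - n_2^1)} < -\frac{h''(n_1^1)\, n_1^1}{h'(n_1^1)} - \frac{h''(m(x_1) - n_1^1)\, n_1^1}{h'(m(x_1) - n_1^1)} \]

Substituting the elasticity of substitution, we have:

\[ \frac{1}{\sigma(n_2^1)} + \frac{n_2^1}{m(x_2) - n_2^1}\cdot\frac{1}{\sigma(m(x_2) - n_2^1)} < \frac{1}{\sigma(n_1^1)} + \frac{n_1^1}{m(x_1) - n_1^1}\cdot\frac{1}{\sigma(m(x_1) - n_1^1)}. \]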

Derivation of the continuous case. We need to be careful about which assumptions we impose
on $n(x)$ for writing down the continuous case. If we rewrite the model with $\Delta$s, we are using a
partition/refinement argument, which delivers a Riemann integral (footnote 7). Based on this, we must have a
piecewise continuous $n(x)$. Consider a partition $P$ and an associated set of points $X$ in which $X_i \in I_i$,
where $I_i$ is an interval in the partition $P$. Then, $S[(P, X), f]$ is defined by:

\[ S[(P, X), f] = \sum_{i=1}^{N-1} h(n(X_i))\, X_i\, |I_i|. \]

A function $f$ is integrable if and only if:

\[ \lim_{|P| \to 0} S[(P, X), f] = \int_{\underline{x}}^{\bar{x}} h(n(x))\, x\,dx \]

for any $(P, X)$. We can show that any piecewise continuous function satisfies integrability. The continuous
case can be derived by taking the appropriate limit for $\Delta \to 0$:

\[ L(n, x) = \left[\int h(n)\, x\,dx\right]^{\beta} \]

A special case with $h$ CES:

\[ L(n, x) = \left[\sum_{i=1}^{N} n_i^{\gamma} x_i^{\alpha}\right]^{\beta} \]

then becomes in the continuous case:

\[ L(n, x) = \left[\int n(x)^{\gamma} x^{\alpha}\,dx\right]^{\beta}. \]

7 A function is Riemann integrable if it is continuous almost everywhere, i.e., it is discontinuous in at most a zero
measure set.
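The partition argument can be illustrated with a tagged Riemann sum for a piecewise continuous $n(x)$. In the sketch below, the step function `n_of_x`, the choice $h(n) = n^{\gamma}$ with $\gamma = 0.5$, and the interval $[1, 2]$ are all assumptions made for illustration. The sum uses a uniform partition with midpoints as tags and is compared with the exact integral as the mesh shrinks.

```python
def riemann_sum(h, n, x_lo, x_hi, intervals):
    """S[(P, X), f] for f(x) = h(n(x)) * x, uniform partition, midpoint tags X_i."""
    width = (x_hi - x_lo) / intervals
    total = 0.0
    for i in range(intervals):
        X_i = x_lo + (i + 0.5) * width     # tag inside the i-th interval
        total += h(n(X_i)) * X_i * width
    return total


gamma = 0.5
h = lambda n: n ** gamma
# Hypothetical piecewise continuous allocation with one jump at x = 1.5.
n_of_x = lambda x: 2.0 if x < 1.5 else 1.0

# Exact value: 2**gamma * int_1^1.5 x dx + int_1.5^2 x dx.
exact = 2.0 ** gamma * 0.625 + 0.875

errors = [abs(riemann_sum(h, n_of_x, 1.0, 2.0, N) - exact) for N in (11, 101, 1001)]
print(errors)
```

Odd interval counts are used so that the jump falls strictly inside one interval. The error then comes only from that interval and shrinks with the mesh, as the footnote on almost-everywhere continuity suggests.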


References

Antras, Pol, Luis Garicano and Esteban Rossi-Hansberg, “Offshoring in a Knowledge Economy”,
Quarterly Journal of Economics 121(1), 2006, 31-77.

Eeckhout, Jan, and Boyan Jovanovic, “Knowledge Spillovers and Inequality”, American Economic
Review 92(5), 2002, 1290-1307.

Eeckhout, Jan, and Boyan Jovanovic, “Occupational Sorting and Development”, NBER working

paper w13686, 2007.

Gabaix, Xavier, and Augustin Landier “Why has CEO Pay Increased so Much?”, Quarterly

Journal of Economics, forthcoming 2008.

Gale, David, and Lloyd Shapley, “College Admission and the Stability of Marriage”, American

Mathematical Monthly, 69, (1962), 9-15.

Garicano, Luis, “Hierarchies and the Organization of Knowledge in Production,” Journal of Political

Economy 108(5), 2000.

Kelso, Alexander and Vincent Crawford, “Job Matching, Coalition Formation, and Gross

Substitutes,” Econometrica 50, 1982.

Koopmans, T. C., and M. J. Beckmann, “Assignment Problems and the Location of Economic

Activity.” Econometrica 25, 1957, 52-76.

Kremer, Michael, “The O-Ring Theory of Economic Development”, Quarterly Journal of Economics
108(3), 1993, 551-575.

Kremer, Michael, and Eric Maskin, “Wage Inequality and Segregation by Skill”, NBER Working

Paper No. w5718, 1996

Lucas, Robert, “On the Size Distribution of Business Firms.” Bell Journal 1978.

Luttmer, Erzo, “Selection, Growth, and the Size Distribution of Firms,” Quarterly Journal of

Economics 122(3), 2007, 1103-1144.

Rossi-Hansberg, Esteban, and Mark Wright, “Establishment Size Dynamics in the Aggregate

Economy”, American Economic Review 97(5), 2007, 1639-1666.
