Granular Firms - Brandeis...

Granular Firms: Micro-Data on the Private Sector and Large-Scale Agent Models

Rob Axtell

Computational Social Science Program, Department of Computational and Data Sciences

Department of Economics

Krasnow Institute for Advanced Study

George Mason University

rax222@gmu.edu

External Professor, Santa Fe Institute

External Faculty Member, Northwestern Institute on Complexity, Northwestern University

Outline

Background (5 mins)

Mental model calibration (5 mins)

The data (5 mins)

Agent-based computational models (5 mins)

A full-scale model (5 mins)

Computational aspects (5 mins)

The Private Sector

Most output, innovation due to firms

Most workers are employees of firms

What do we know about firms?

Many ‘competing’ theories:

Most read like philosophy not science; falsifiable?

Data on universe of firms progressively available (U.S.)

There do not exist models that explain all these data

Common is the case study of individual firms

Goals:

Describe these data

Can a model be built that explains them?

U.S. Private Sector: Calibration

[Diagram: firms and labor, linked by flows of WORK and WAGES]

Firms: ~30 million

6 million have employees

Publicly-traded firms (~10 thousand)

Wal-Mart

~100 thousand/month

Labor: ~120 million

30 million in the public sector

Flow labels: ~2 million/month, ~2 million/month, ~3 million/month

Flow categories: Unemployed, Out of labor force, Job-to-job

MICRO-DATA ON ALL U.S. FIRMS FROM TAX RECORDS

➤ Firm sizes by employees (input), receipts (output), market capitalization (public), plant and equipment, depreciation,…

➤ Firm ages and, when they go out of business, firm lifetimes

➤ Firm sizes conditional on age and firm ages by size

➤ Firm productivities (output per unit of input), by size and age

➤ Firm growth rates (size changes per unit time), annually and over longer periods; growth conditional on firm size and age

➤ Firm employment by size, age, productivity, growth rates,…

➤ Employment tenure, by size, age, productivity, growth rates,…

➤ Employment networks from employer-employee matched data

The Data

[Grid of panel titles]

Firm entry, exit; Number of firms

Size dist (workers); Size dist (output); Returns to scale; Productivity dist

Age distribution; Survival vs age; Avg size vs age; Avg age vs size; Joint dist of size, age

Avg growth rate; Growth rate dist; Avg growth vs size; Avg growth vs age; Growth vs size, age; Growth var vs size

Largest firm; Employment by age; Income dist; Job tenure dist; Hiring + separation

Job to job flows; Labor flow network; LFN degree dist; LFN edge weight dist; LFN disassortativity; LFN clustering coeff

Productivity by size; Income by firm size; Firm lifetimes

Number of Firms/Avg Size

SOURCE: CENSUS

Number of Entrants…

…and Exits

Firm Sizes

“U.S. Firm Sizes are Zipf Distributed,” RL Axtell, Science, 293 (Sept 7, 2001), pp. 1818-20

$\Pr[S \ge s_i] = 1 - F(s_i) = s_i^{-\alpha}$

Average firm size ~ 20

Median ~ 3-4

Mode = 1

Source: Census

-2

~ZIPF DISTRIBUTION
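A minimal sketch of how the tail formula on this slide relates to a size density, assuming integer firm sizes starting at 1 and a tail exponent of exactly 1 (the textbook Zipf case; the slide gives no numerical estimate). The function names are illustrative only. Under those assumptions the probability of each size falls off with a log-log slope close to -2, which may be what the "-2" annotation on the plot refers to.

```python
import math

def ccdf(s: float, alpha: float = 1.0) -> float:
    """Pr[S >= s] = s**(-alpha) for s >= 1 (alpha = 1 is an assumed Zipf exponent)."""
    return s ** (-alpha)

def pmf(s: int, alpha: float = 1.0) -> float:
    """Probability that an integer-valued firm size equals s exactly."""
    return ccdf(s, alpha) - ccdf(s + 1, alpha)

def loglog_slope(s_lo: int, s_hi: int, alpha: float = 1.0) -> float:
    """Slope of log pmf against log size between two sizes."""
    rise = math.log(pmf(s_hi, alpha)) - math.log(pmf(s_lo, alpha))
    run = math.log(s_hi) - math.log(s_lo)
    return rise / run

slope = loglog_slope(100, 1000)      # close to -(alpha + 1)
mode = max(range(1, 50), key=pmf)    # most common firm size in this sketch
print(round(slope, 3), mode)
```

The mode of 1 agrees with the slide; the mean and median quoted there depend on details of the empirical distribution that this idealized form does not capture.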

Stationary U.S. Firm Sizes

Source: Census

2000-2012

Individual firms move up and down the distribution over time

Firm Sizes in France

SOURCE: GARICANO, LELARGE AND VAN REENEN, 2013

broken power law

Size of the Largest Firm

Source: Luttmer [2011]

Labor Productivity

BIG FIRMS DO NOT HAVE THE HIGHEST PRODUCTIVITY!

IMPLICATIONS: PROBLEMATICAL FOR THE NEW NEW TRADE THEORY

FIRM SIZE, S:

1 < S < 100

100 < S < 10,000

10,000 < S < 1,000,000

Firm Ages are Stationary

Source: Census

WEIBULL DISTRIBUTION

Survival Probability

Source: Census

YOUNG FIRMS HAVE A HIGHER FAILURE RATE

Firm Growth: Subbotin dist

Source: Census and SBA; Perline, Axtell and Teitelbaum [2006]

ANNUAL 5 YEARS

More variance in separations

Davis, Haltiwanger and Schuh [1996]

LESS HEAVY-TAILED OVER TIME
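For readers unfamiliar with the Subbotin family named on this slide, here is a short sketch of one common parameterization of its density (location `mu`, scale `a`, shape `b`). The parameterization and the parameter names are assumptions, not taken from the slides. With `b = 1` it reduces to the Laplace density and with `b = 2` to the Gaussian, so a shape parameter that rises with the time horizon would be one way to read "less heavy-tailed over time".

```python
import math

def subbotin_pdf(x: float, mu: float = 0.0, a: float = 1.0, b: float = 1.0) -> float:
    """Subbotin (exponential power) density in an assumed common parameterization."""
    norm = 2.0 * a * b ** (1.0 / b) * math.gamma(1.0 + 1.0 / b)
    return math.exp(-(abs((x - mu) / a) ** b) / b) / norm

def laplace_pdf(x: float, mu: float = 0.0, a: float = 1.0) -> float:
    """Laplace density, the b = 1 special case."""
    return math.exp(-abs(x - mu) / a) / (2.0 * a)

def gaussian_pdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """Gaussian density, the b = 2 special case with sigma = a."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Tail comparison at a growth rate four scale units from the center.
tail_laplace = subbotin_pdf(4.0, b=1.0)
tail_gauss = subbotin_pdf(4.0, b=2.0)
print(tail_laplace > tail_gauss)
```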

Job Tenure

Source: BLS

EXPONENTIAL DISTRIBUTION

U.S. Wages are Stationary

Source: Yakovenko

Buyer-Supplier Networks

Source: Atalay, Hortaçsu, Roberts and Syverson

Labor Flow Network (dissertation of Omar Guerrero)

Data Summary

Approximately stationary distributions of:

Firm sizes (by many measures)

Firm productivity, by size

Firm ages, survival probabilities, etc., by size

Firm growth rates, by size and age

Job tenure, by size and age, and growth…

Wages, by size and age and growth and tenure…

Networks…

Gross regularities any theory needs to hit…

Theories

Coase (1930s): Why do firms exist at all?

Why did GM buy Fisher Body (Milwaukee)?

Berle and Means (1930s): corporate structure

Business schools: case study approach

Data on public firms (1950s): stochastic growth

Organizational models (1960s and beyond)

Theories developed in the pre-micro-data era are not easily brought to bear on such data

Models

How to Create a Model Grounded in the Data?

What parts of the conventional theory of the firm

can be brought to bear on the data?

Machine learning approach?

Stochastic growth approach?

ABM approach: large numbers of interacting agents

‘Growing’ the Firm Population

Start with 120M workers, emerge:

6 million firms (with employees)

3 million job changers each month

100 thousand start-ups each month

20 thousand largest firms employ 1/2 of workers

1 firm with one million employees

What microeconomic specification can reproduce these and other empirical facts?
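The targets listed on this slide imply a few ratios that are handy as sanity checks on any model run. The sketch below only does that arithmetic; the variable names are illustrative, and the inputs are the round numbers from the slide.

```python
# Round-number targets taken from the slide.
workers = 120_000_000
firms_with_employees = 6_000_000
job_changers_per_month = 3_000_000
largest_group_firms = 20_000        # the "20 thousand largest firms"
largest_group_share = 0.5           # they employ half of all workers

# Derived quantities.
avg_firm_size = workers / firms_with_employees
monthly_job_change_share = job_changers_per_month / workers
avg_size_in_largest_group = largest_group_share * workers / largest_group_firms

print(avg_firm_size, monthly_job_change_share, avg_size_in_largest_group)
```

The first ratio matches the average firm size of about 20 quoted elsewhere in the talk.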

Methodology: Computational agents

How to realize 10^8 agents?

Model components

• Heterogeneous agents; otherwise no groups form, or all groups are identical

• Increasing returns to size/scale (team production) are needed for large firms to form

• Agents adjust their behavior to one another

• Agents are boundedly rational: the environment is too complex for full rationality

• Compensation system: rules for dividing team output

• Each agent has a social network from which it learns about jobs

Model: Team Production

Consider a group of N agents, each of whom

supplies input (‘effort’)

Total effort level:

Total output:

Each agent receives compensation

proportional to input and a share of output:

Agents have Cobb-Douglas preferences for

income (output shares) and leisure,

Agents periodically seek utility
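The formulas on this slide did not survive extraction, so the following is only a sketch of one specification that is consistent with the text. It assumes total effort is the sum of individual efforts, output is `a*E + b*E**beta` with `beta > 1` (increasing returns), output is divided in equal shares, and utility is Cobb-Douglas in income and leisure. The functional forms, the equal-share rule, and all names (`theta`, `omega`, `a`, `b`, `beta`) are assumptions, not the author's stated equations.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Agent:
    theta: float    # assumed weight on income versus leisure, in (0, 1)
    omega: float    # assumed endowment of time/effort
    effort: float   # effort currently supplied, in [0, omega]

def total_effort(team: List[Agent]) -> float:
    """Total effort level: sum of members' efforts."""
    return sum(member.effort for member in team)

def output(E: float, a: float = 1.0, b: float = 1.0, beta: float = 2.0) -> float:
    """Assumed team output with increasing returns when beta > 1."""
    return a * E + b * E ** beta

def utility(agent: Agent, team: List[Agent]) -> float:
    """Cobb-Douglas utility over an equal output share and leisure."""
    share = output(total_effort(team)) / len(team)
    leisure = agent.omega - agent.effort
    return share ** agent.theta * leisure ** (1.0 - agent.theta)

team = [Agent(theta=0.5, omega=1.0, effort=0.5) for _ in range(3)]
u = utility(team[0], team)
print(round(u, 4))
```

With `beta > 1`, doubling total effort more than doubles output, which is what gives agents a reason to work in teams at all.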

Analytical Results

Nash equilibria always exist and are unique

Agents under-supply effort at Nash equilibrium (Holmström)

Nash equilibrium is dynamically unstable for sufficiently large groups
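The under-supply result can be illustrated numerically in a simplified setting. This sketch assumes identical agents, equal output shares, output `a*E + b*E**beta`, and Cobb-Douglas utility; none of these forms are confirmed by the slides. It solves the first-order conditions for the symmetric Nash effort and for the effort a team would choose if it maximized the common utility jointly. It does not address the stability claim.

```python
def dlog_output(E, a=1.0, b=1.0, beta=2.0):
    """Derivative of ln(a*E + b*E**beta) with respect to E (assumed output form)."""
    return (a + b * beta * E ** (beta - 1.0)) / (a * E + b * E ** beta)

def root_of_decreasing(f, lo, hi, iters=100):
    """Bisection for the root of a function that decreases from + to - on (lo, hi)."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if f(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def symmetric_efforts(n, theta=0.5, omega=1.0):
    """Return (Nash effort, jointly optimal effort) for n identical agents.

    Both conditions come from differentiating
    theta*ln(output/n) + (1-theta)*ln(omega - e); they are decreasing in e
    for the default parameters (beta = 2).
    """
    def nash_foc(e):   # each agent takes the others' effort as given
        return theta * dlog_output(n * e) - (1.0 - theta) / (omega - e)

    def team_foc(e):   # all agents raise effort together
        return theta * n * dlog_output(n * e) - (1.0 - theta) / (omega - e)

    eps = 1e-9
    return (root_of_decreasing(nash_foc, eps, omega - eps),
            root_of_decreasing(team_foc, eps, omega - eps))

nash_10, team_10 = symmetric_efforts(10)
nash_2, _ = symmetric_efforts(2)
print(nash_10 < team_10, nash_10 < nash_2)
```

In this sketch the individual marginal return to effort is diluted by the group size, so Nash effort falls below the jointly optimal level and shrinks as the group grows.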

Pseudo-code

For all agents:
    Consider staying in current job; how hard to work?
    Consider a few other firms
    Consider doing a start-up
    Do option w/highest reward

For all firms:
    Produce output
    Pay employees
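The pseudo-code above can be turned into a small runnable sketch. Everything specific here is an assumption made for illustration: the output function, equal shares, a unit time endowment, effort chosen by grid search, candidate firms drawn at random rather than through a social network, and all function names. It is meant to show the control flow, not the author's implementation.

```python
import random

A, B, BETA = 1.0, 1.0, 2.0                    # assumed output parameters
EFFORT_GRID = [i / 20 for i in range(21)]     # candidate efforts in [0, 1]

def output(total_effort):
    return A * total_effort + B * total_effort ** BETA

def best_effort(theta, others_effort, team_size):
    """'How hard to work?': grid search over effort, equal output shares."""
    def u(e):
        share = output(others_effort + e) / team_size
        return share ** theta * (1.0 - e) ** (1.0 - theta)
    e = max(EFFORT_GRID, key=u)
    return e, u(e)

def step(theta, effort, firm_of, rng, n_candidates=2):
    """For all agents (random order): stay, join another firm, or start up."""
    agents = list(theta)
    rng.shuffle(agents)
    next_firm_id = max(firm_of.values()) + 1
    for i in agents:
        members = {}
        for j, f in firm_of.items():
            members.setdefault(f, []).append(j)
        current = firm_of[i]
        others = [f for f in members if f != current]
        options = ([current]
                   + rng.sample(others, min(n_candidates, len(others)))
                   + [next_firm_id])          # unused id stands for a start-up
        best = None
        for f in options:
            team = [j for j in members.get(f, []) if j != i]
            e, u = best_effort(theta[i], sum(effort[j] for j in team), len(team) + 1)
            if best is None or u > best[0]:   # ties keep the earlier option
                best = (u, f, e)
        _, firm_of[i], effort[i] = best
        if firm_of[i] == next_firm_id:
            next_firm_id += 1

def payroll(effort, firm_of):
    """For all firms: produce output and pay each employee an equal share."""
    totals, sizes = {}, {}
    for j, f in firm_of.items():
        totals[f] = totals.get(f, 0.0) + effort[j]
        sizes[f] = sizes.get(f, 0) + 1
    return {f: output(totals[f]) / sizes[f] for f in totals}

rng = random.Random(0)
n_agents = 13
theta = {i: rng.uniform(0.1, 0.9) for i in range(n_agents)}
effort = {i: 0.5 for i in range(n_agents)}
firm_of = {i: i % 5 for i in range(n_agents)}   # start with 5 firms
for _ in range(10):
    step(theta, effort, firm_of, rng)
wages = payroll(effort, firm_of)
print(len(set(firm_of.values())), "firms,", len(firm_of), "agents")
```

The number of agents is conserved by construction, while the set of firms changes as agents move and found start-ups.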

Basic Idea

[Animation over five slides: from time t to t+1, 13 agents move among 5 firms; at the end there are 5 different firms and the same 13 agents (conserved)]

Model dynamics with 1000 agents

Base Parameterization

Size of the U.S. private sector

Results

[Grid of panels with model outcomes]

Number of firms: 6 million

Average firm size: 20 workers

Size dist (workers): Zipf

Size dist (output): Zipf

Returns to scale: constant

Productivity dist: heavy-tailed

Remaining panels: Average firm age; Age distribution; Survival vs age; Avg size vs age; Avg age vs size; Joint dist of size, age; Avg growth rate; Growth rate dist; Avg growth vs size; Avg growth vs age; Growth vs size, age; Growth var vs size; Firm entry, exit; Largest firm; Employment by age; Income dist; Job tenure dist; Hiring + separation; Job to job flows; Labor flow network; LFN degree dist; LFN edge weight dist; LFN assortativity; LFN clustering coeff

Remaining outcome labels: Weibull; increasing; linearly increasing; Laplace; Subbotin; decreasing; decreasing; increasing; 1/6 law; 1 million; 100K/mo; 3 million/mo; power law; power law; complex; exponential; simultaneous; exponential; disassortative; linear inc; log(size); exponential; hierarchical; Pareto; exponential; 15 years

Realized Number of Firms

TOTAL

ENTRANTS

EXITS

avg firm size ~ 20 =

Realized Firm Size Distributions

-2

Firm Size Statistics

Realized Productivity

SMALL FIRMS

MEDIUM FIRMS

LARGE FIRMS

Realized Firm Ages

Realized Firm Survival

Realized Firm Growth

KEY (firm size bins): 8-15, 16-31, 32-63, 64-127, 128-255, 256-511, 512-1023

Realized Job Tenure

US data

Model output

Realizing 10^8 agents

Needed: 500 bytes/agent => 60 GB, 1 KB/firm => 6 GB

What doesn’t work:

hardware: vector/cluster HPC, multiple boxes, clouds;

software: MPI; Java threads; ‘big data’ languages: Hadoop, Go, Scala, Erlang, Clojure, Haskell...

What is needed:

large ‘flat’ memory space, OS to address it (Unix)

lots of processors (many cores/processor)

low level language (C/C++, OpenMP, Intel TBB, some GPUs)

2015 $US: $10K for 256 GB RAM, $25K for 1000 GB RAM

What this gets you:

compute time of 12-24 hours to remove transient

output ‘data’ directly comparable to real-world data
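The memory figures on this slide follow directly from the population sizes used throughout the talk. A quick check, assuming 1 GB = 10^9 bytes and 1 KB = 1000 bytes (the slide does not say which convention it uses):

```python
GB = 10 ** 9                      # assumed decimal gigabyte

agents = 120_000_000              # workers in the model
firms = 6_000_000                 # firms with employees
bytes_per_agent = 500
bytes_per_firm = 1_000            # 1 KB, decimal

agent_memory_gb = agents * bytes_per_agent / GB
firm_memory_gb = firms * bytes_per_firm / GB
total_gb = agent_memory_gb + firm_memory_gb

print(agent_memory_gb, firm_memory_gb, total_gb)
```

The total sits well under the 256 GB configuration priced on the slide, which is why a single large shared-memory machine is enough to hold the state.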

Fork/Join Parallelization of Agent Codes

Population of agents or firms

‘Fork’ it up into pieces to execute on a single core

Let it run 1 month

Join to compute statistics and do housekeeping

Standard paradigm in C/C++ (pthreads)
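The fork/join pattern described on this slide can be sketched in a few lines. The slide refers to C/C++ with pthreads; the version below uses Python's standard `concurrent.futures` thread pool purely to show the control flow, and Python threads would not deliver the speedups of native code. The per-agent update (a 1% wage increase) is a placeholder, and all function names are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def split(population, n_pieces):
    """Fork: cut the population into contiguous pieces, one per worker."""
    size = (len(population) + n_pieces - 1) // n_pieces
    return [population[i:i + size] for i in range(0, len(population), size)]

def run_one_month(chunk):
    """Placeholder per-piece work: update each agent and return a partial sum."""
    updated = [wage * 1.01 for wage in chunk]
    return updated, sum(updated)

def fork_join_step(population, n_workers=4):
    chunks = split(population, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        results = list(pool.map(run_one_month, chunks))   # order is preserved
    # Join: reassemble the population and combine the partial statistics.
    new_population = [x for updated, _ in results for x in updated]
    total = sum(partial for _, partial in results)
    return new_population, total / len(new_population)

population = [100.0] * 10
new_population, mean_wage = fork_join_step(population, n_workers=3)
print(len(new_population), round(mean_wage, 2))
```

Statistics and housekeeping happen only in the join phase, so the pieces never need to coordinate while they run.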

Summary

New data guides model building:

Older theories are insufficiently quantitative

We have more data than we can explain presently

Built and calibrated a full scale model of the U.S. private sector (120 million workers, 6 million firms)

Endogenous dynamics: realistic job flows + firm formation for micro reasons; no exogenous shocks

Microeconomic level is not in (Nash) equilibrium

Macro-level is approximately stationary

Possible Implications

IO courses focused on purely strategic (game-theoretic) considerations seem archaic vis-à-vis comprehensive micro-data

Could a model of the U.S. private sector serve as the basis for the production side of macroeconomics?

Can machine learning be used to develop alternative models that also fit the data?

Can such models be developed to have economic interpretations/meaning?

Why a Full-Scale Model?

Comprehensive data (administratively complete, every firm) are increasingly available…

Models at reduced scale require rescaling of the results in order to compare to the data

We have enough going on already, let’s simplify by eliminating scale effects!

Links to the reigning computational zeitgeist: ‘whole cell’ simulation, brain from ‘every last neuron,’ turbulence via CFD, climate from GCMs

Why not: Computing challenges are very real…
