89
Network Formation with Local Complements and Global Substitutes: The Case of R&D Networks Chih-Sheng Hsieh a , Michael D. K¨ onig b , Xiaodong Liu c a Department of Economics, Chinese University of Hong Kong, CUHK Shatin, Hong Kong, China. b Department of Economics, University of Zurich, Sch¨ onberggasse 1, CH-8001 Zurich, Switzerland. c Department of Economics, University of Colorado Boulder, Boulder, Colorado 80309–0256, United States. Abstract We analyze R&D collaboration networks in industries where firms are competitors in the product market. Firms’ benefits from collaborations arise by sharing knowledge about a cost-reducing technology. By forming collaborations, however, firms also change their own competitive position in the market as well as the overall market structure. We provide a general characterization of both equilibrium networks and endogenous production choices in the form of a Gibbs measure. We analyze the efficient network maximizing social welfare, and find that the efficient graph shows a strong core-periphery structure. We then estimate the model using a unique dataset on R&D collaborations in the chemicals and pharmaceuticals sector, and show that it can reproduce the empirically observed patterns. With our estimated model we are able to perform various counterfactual analyses, taking into account that the network and production can dynamically adjust to shocks or policy interventions. In particular, we analyze the impact on welfare of firm exits, mergers, and subsidies to collaboration costs. We find that the exit of Pfizer would lead to a reduction in welfare by 9.3%, a merger between Cubist Pharmaceuticals and Novartis would result in a welfare loss of 2.9%, while subsidizing an R&D collaboration between Isis Pharmaceuticals Inc. and Allergan Inc. would increase welfare by 0.8%. Our study further reveals that when ignoring the network structure, the impact of exits, mergers or subsidies is significantly underestimated. Key words: network formation, path dependence, efficiency, R&D networks, key player, mergers & acquisitions, innovation, spillovers, subsidies JEL: C63, D83, D85, L22 1. Introduction R&D partnerships have become a widespread phenomenon characterizing technological dynamics, especially in industries with rapid technological development such as, for instance, the pharma- ceutical and chemical industries [see e.g. Hagedoorn, 2002; Powell et al., 2005; Roijakkers and We would like to thank Antonio Cabrales, Matt Jackson, Ben Golub, Yves Zenou, Fabrizio Zilibotti, Alexey Kushnir, Nick Netzer, Onur ¨ Ozg¨ ur, Tim Hellmann, Andrea Montanari, Mar´ ıaS´aez-Mart´ ı, Armin Schmutzler, Filomena Garcia, Timothy Van Zandt, Martin Summer, Terence Johnson, Anton Badev, Joachim Voth, David Hemous, and seminar participants at the University of Zurich, the Austrian National Bank, the Econometric Society World Congress in Montrael in 2015, the congress of the European Economic Association in Mannheim in 2015, the 25th Stony Brook International Conference on Game Theory in 2014, the Public Economic Theory Conference in Lisbon in 2013 and Stanford University in 2012 for the helpful comments and advice. We further thank Christian Helmers for data sharing and Sebastian Ottinger for the excellent research assistance. A previous version of this paper has been circulated under the title “Dynamic R&D Networks” as Working Paper No. 109 in the Department of Economics working paper series of the University of Zurich. Michael K¨ onig acknowledges financial support from Swiss National Science Foundation through research grants PBEZP1–131169 and 100018 140266, and thanks SIEPR and the department of economics at Stanford University for their hospitality during 2010–2012. Email addresses: [email protected] (Chih-Sheng Hsieh), [email protected] (Michael D. onig), [email protected] (Xiaodong Liu) Preliminary and incomplete – please do not distribute. This version: March 25, 2016, First version: April 24, 2012

f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

  • Upload
    others

  • View
    5

  • Download
    0

Embed Size (px)

Citation preview

Page 1: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Network Formation with Local Complements and Global Substitutes: The

Case of R&D Networks

Chih-Sheng Hsieha, Michael D. Konigb, Xiaodong Liuc

aDepartment of Economics, Chinese University of Hong Kong, CUHK Shatin, Hong Kong, China.bDepartment of Economics, University of Zurich, Schonberggasse 1, CH-8001 Zurich, Switzerland.

cDepartment of Economics, University of Colorado Boulder, Boulder, Colorado 80309–0256, United States.

Abstract

We analyze R&D collaboration networks in industries where firms are competitors in the productmarket. Firms’ benefits from collaborations arise by sharing knowledge about a cost-reducingtechnology. By forming collaborations, however, firms also change their own competitive positionin the market as well as the overall market structure. We provide a general characterization ofboth equilibrium networks and endogenous production choices in the form of a Gibbs measure.We analyze the efficient network maximizing social welfare, and find that the efficient graphshows a strong core-periphery structure. We then estimate the model using a unique dataset onR&D collaborations in the chemicals and pharmaceuticals sector, and show that it can reproducethe empirically observed patterns. With our estimated model we are able to perform variouscounterfactual analyses, taking into account that the network and production can dynamicallyadjust to shocks or policy interventions. In particular, we analyze the impact on welfare offirm exits, mergers, and subsidies to collaboration costs. We find that the exit of Pfizer wouldlead to a reduction in welfare by 9.3%, a merger between Cubist Pharmaceuticals and Novartiswould result in a welfare loss of 2.9%, while subsidizing an R&D collaboration between IsisPharmaceuticals Inc. and Allergan Inc. would increase welfare by 0.8%. Our study furtherreveals that when ignoring the network structure, the impact of exits, mergers or subsidies issignificantly underestimated.

Key words: network formation, path dependence, efficiency, R&D networks, key player,mergers & acquisitions, innovation, spillovers, subsidiesJEL: C63, D83, D85, L22

1. Introduction

R&D partnerships have become a widespread phenomenon characterizing technological dynamics,

especially in industries with rapid technological development such as, for instance, the pharma-

ceutical and chemical industries [see e.g. Hagedoorn, 2002; Powell et al., 2005; Roijakkers and

We would like to thank Antonio Cabrales, Matt Jackson, Ben Golub, Yves Zenou, Fabrizio Zilibotti, AlexeyKushnir, Nick Netzer, Onur Ozgur, Tim Hellmann, Andrea Montanari, Marıa Saez-Martı, Armin Schmutzler,Filomena Garcia, Timothy Van Zandt, Martin Summer, Terence Johnson, Anton Badev, Joachim Voth, DavidHemous, and seminar participants at the University of Zurich, the Austrian National Bank, the EconometricSociety World Congress in Montrael in 2015, the congress of the European Economic Association in Mannheimin 2015, the 25th Stony Brook International Conference on Game Theory in 2014, the Public Economic TheoryConference in Lisbon in 2013 and Stanford University in 2012 for the helpful comments and advice. We furtherthank Christian Helmers for data sharing and Sebastian Ottinger for the excellent research assistance. A previousversion of this paper has been circulated under the title “Dynamic R&D Networks” as Working Paper No. 109 in theDepartment of Economics working paper series of the University of Zurich. Michael Konig acknowledges financialsupport from Swiss National Science Foundation through research grants PBEZP1–131169 and 100018 140266, andthanks SIEPR and the department of economics at Stanford University for their hospitality during 2010–2012.

Email addresses: [email protected] (Chih-Sheng Hsieh), [email protected] (Michael D.Konig), [email protected] (Xiaodong Liu)

Preliminary and incomplete – please do not distribute. This version: March 25, 2016, First version: April 24, 2012

Page 2: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Hagedoorn, 2006]. In these industries firms have become more specialized in specific domains of a

technology and they tend to combine their knowledge with that of other firms that are specialized

in different domains in order to jointly generate innovations that can help to develop new prod-

ucts or reduce their production costs [Ahuja, 2000; Belderbos et al., 2004; Powell et al., 1996].1

Such R&D collaborations often involve competing firms, as for example a strategic alliance be-

tween Pfizer and Bayer, both operating in the pharmaceuticals sector (with primary standard

industry classification code 2834) to develop treatments for obesity, type 2 diabetes and other

related disorders in the year 2006 illustrates.

Despite the increasing importance of such R&D collaborations, there exists only limited re-

search on theoretical and empirical models of these relationships which can be used for policy

analysis. This paper provides the first fully tractable and estimable model of strategic R&D

network formation with endogenous production and R&D choice, which takes into account the

two-way flow of influence from the market structure to the incentives to form R&D collaborations

and, in turn, from the formation of collaborations to the market structure [cf. D’Aspremont and

Jacquemin, 1988].

We study the incentives of firms to form R&D collaborations with other firms and the im-

plications of these alliance decisions for the overall network structure. We provide a complete

characterization of the stationary states of a dynamic process in which firms can adjust both,

quantities produced (as well as research efforts), and the R&D collaborations between them,

based on a noisy profit maximization rationale [cf. Blume, 2003; Brock and Durlauf, 2001], tak-

ing into account that the establishment of an R&D collaboration is fraught with ambiguity and

uncertainty [cf. e.g. Czarnitzki et al., 2015; Kelly et al., 2002]. Using a potential function we

show that the stationary states of this process are completely characterized by a “Gibbs measure”

[cf. Bisin et al., 2006; Grimmett, 2010]. Further, we find that for any distribution of output lev-

els, the network can be characterized as an inhomogenous random graph with a link probability

that depends on the firms’ sizes and the linking cost [cf. Bollobas et al., 2007; Soderberg, 2002].

Moreover, we show that the stochastically stable networks (in the limit of vanishing noise) are

“nested split graphs” [cf. Konig et al., 2014a; Konig et al., 2011].2 We then characterize the sta-

tionary degree distribution and compute the asymptotic network density. In particular, we find

that there exists a sharp transition between sparse and dense networks with decreasing linking

costs. We also compute the stationary output levels and show that there exists an intermediate

range of the linking cost for which multiple equilibria arise. The equilibrium selection is a path

dependent process characterized by “hysteresis” [cf. David, 1992, 2005]. Moreover, as in the case

of the network density, there exists a sharp transition from a low output to a high output equilib-

rium. It is also possible to generalize our model by introducing heterogeneous marginal costs as

well as heterogeneous spillovers from collaborations between firms stemming from differences in

their technological characteristics. In particular, we show that if firms’ technology stocks follow a

power-law distribution then the degree distribution will be power-law distributed as well [cf. Pow-

ell et al., 2005]. We then investigate the efficient network architecture and output structure that

1For example, Bernstein [1988] finds that R&D spillovers decrease the unit costs of production for a sampleof Canadian firms. Similarly Belderbos et al. [2004] find evidence for production cost reductions due to R&Dcollaborations using data on a large sample of Dutch innovating firms.

2A network is a nested split graph if the neighborhood of every node is contained in the neighborhoods of thenodes with higher degrees [see also Mahadev and Peled, 1995]. See supplementary Appendix B for further networkdefinitions and characterizations.

2

Page 3: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

maximize social welfare, and find that the efficient graph has a strong core-periphery structure.

We then bring the model to the data by analyzing a unique panel dataset of R&D collab-

orations in the chemicals and pharmaceuticals sector (standard industry classification code 28)

matched to firms’ balance sheets. The theoretical characterization of the stationary states via a

Gibbs measure allows us to estimate the model’s parameters using a double Metropolis-Hastings

Markov Chain (DMH) Monte Carlo algorithm [Hsieh and Lee, 2014; Liang, 2010], which takes

into account not only the endogeneity of the network but also the firms’ production choices.

We also use a novel adaptive exchange (AEX) algorithm [Jin et al., 2013; Liang et al., 2015] to

overcome the uncertainty of slow mixing faced by the DMH algorithm, which applies importance

sampling to prevent the “local trap problem” (i.e. the sampler can escape local maxima of the

potential function used for maximum likelihood estimation).

From our estimation results we observe that the estimated technology spillover effect signif-

icantly drops when moving from an exogenous to an endogenous network, suggesting an endo-

geneity bias due to self-selection of R&D collaborations. The joint modeling approach on both

industry production and network formation in the endogenous case allows us to capture firms’

unobserved fixed effects in both the R&D link formation (and removal) and output adjustment

processes, and therefore eliminates any endogeneity bias due to omitted variables.

We then use our estimated model to investigate the impact of exogenous shocks on the network.

In particular, we perform a “key player” analysis [cf. Ballester et al., 2006], to gauge the impact

of firm exit on the economy, when firms are connected through R&D collaborations in a network.

The exit of a firm can have either financial reasons, such as they were experienced by the American

automobile manufacturing industry during the global financial downturn, or legal reasons, such as

for Volkswagen during the recent emission-fraud scandal. In the latter case, policy makers want

to know the overall cost they impose on the economy by inflicting high pollution penalties and

criminal fines that might threaten the continued existence of a firm. Focusing on the chemicals

and pharmaceutical sectors, our results indicate that the exit of Pfizer, an American global

pharmaceutical corporation, which is among the largest pharmaceutical companies world wide,

would lead to a reduction in welfare of 9.3%. We then provide a ranking of the firms in our sample

according to their impact on welfare upon exit. The ranking shows that the most important firms

are not necessarily the ones with the highest market share, but that we need to take into account

the position of these firms in a network of R&D collaborations, and how this network dynamically

responds to shocks.

Our framework also allows us to study mergers and acquisitions, and their impact on welfare

[cf. Farrell and Shapiro, 1990; Kim and Singal, 1993]. Traditional market concentration indices are

not adequate to correctly account for the network effects of a merger on welfare. This is because

the effect of a merger on industry profits, consumer surplus and overall welfare depends not only

on the market structure, but also on the architecture of the R&D collaboration network between

firms through which R&D spillovers are channelled. By taking into account these spillovers our

results show that a merger between Cubist Pharmaceuticals, an American biopharmaceutical

company that was ranked 19th in total sales among American based biotechnology companies

in 2006, and Novartis, a Swiss multinational pharmaceutical company which has recently been

ranked number one in sales among the worldwide industry, would result in a welfare loss of 2.9%.

The dominant position of these firms in the network amplifies the effect of a merger between them.

Consequently, the welfare loss ignoring the network structure is much lower than the welfare loss

3

Page 4: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

incurred in the presence of the R&D collaboration network. Our counterfactual policy analysis

is therefore potentially important for antitrust policy makers.

Finally, we investigate the impact of subsidizing R&D collaboration costs of selected pairs of

firms. Our study indicates that subsidizing an R&D collaboration between Isis Pharmaceuticals

Inc., an American pharmaceutical company developing cholesterol-reducing drugs, and Allergan

Inc., a global pharmaceutical company, would increase welfare by 0.8%. Not taking into account

the endogeneity of the network would yield a slightly lower welfare gain. As R&D subsidies are

increasingly being used by governmental organizations to stimulate collaboration activities, our

framework could assist governmental funding agencies that typically do not take into account the

aggregate spillovers generated within a dynamic R&D network structure.

Related literature There exist a number of related studies on R&D networks in the economics

literature. Similar to our framework, Dawid and Hellmann [2014]; Goyal and Moraga-Gonzalez

[2001]; Westbrock [2010] study the formation of R&D networks in which firms can form collab-

orations to reduce their production costs. In particular, Dawid and Hellmann [2014] study a

perturbed best response dynamic process as we do here, and analyze the stochastically stable

states. However differently to the current model, they ignore the R&D investment decision, and

the cost reduction from a collaboration in these models is independent of the identity and the

characteristics of the firms involved.3

Our analysis also bears similarities with a number of other recent contributions in the literature

which analyze a similar payoff structure. In the paper by Ballester et al. [2006] the authors derive

equilibrium outcomes in a linear quadratic game where agents’ efforts are local complements in an

exogenously given network. Differently to Ballester et al. [2006], we make the network as well as

effort choices endogenous.4 Our approach is further a generalization of the endogenous network

formation mechanisms proposed in Snijders [2001] and Mele [2010a]. As in these papers we use a

potential function to characterize the stationary states, but here both, the action choices as well

as the linking decisions are fully endogenized. Moreover, differently to these authors we provide a

microfoundation (from a Cournot competition model with externalities) for the potential function.

Similarly, Cabrales et al. [2010] allow the network to be formed endogenously, but assume that

link strengths are proportional to effort levels, while we make the linking decision depending on

marginal payoffs.

Further, in a recent paper by Badev [2013] a potential function is used to analyze the formation

of networks in which agents not only form links but also make a binary choice of adopting

a certain behavior depending on the choices of their neighbors. Differently to Badev [2013]

we consider a continuum of choices, provide an explicit equilibrium characterization, use an

alternative estimation method, and apply our model to a different context. Similarly, in a recent

paper Hsieh and Lee [2013] apply a potential function to an empirical model of network formation

3Goyal and Moraga-Gonzalez [2001] present a more general setup which relaxes this assumption but theiranalysis is restricted to regular graphs and networks comprising of four firms. In this paper we take into accountgeneral equilibrium structures with an arbitrary number of firms and make no ex ante restriction on the potentialcollaboration pattern between them.

4It is straightforward to see that the results obtained in this paper can be generalized to the payoff structureintroduced in Ballester et al. [2006]. See in particular the general payoff structure considered in Equation (8). Weprovide a complete equilibrium characterization for the model introduced in Ballester et al. [2006], but allow bothagents’ actions and links to be endogenously determined.

4

Page 5: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

and effort choices. However, their potential function is based on a transferable utility function

so that linking decisions are based on maximizing aggregate payoffs, while here we consider

decentralized R&D collaboration decision making between profit maximizing firms.

Furthermore, Bimpikis et al. [2014] analyze the effect of mergers and acquisitions within and

across different industries using a similar model as we do here. However, neither does their

analysis incorporate the spillover effects from R&D collaborations, nor do they allow for these

collaborations to respond to a merger. In contrast our empirical analysis reveals that the network

structure of R&D collaborations matters, and that when ignoring the network structure the

impact of mergers is significantly underestimated.

Finally, in Konig et al. [2014b] a similar market structure is considered. In particular, the

authors investigate the impact of a subsidy per unit of R&D spent, while here we analyze subsidies

to R&D collaborations directly. Moreover, the authors characterize key firms whose exit would

have the largest impact on the output of the economy in the short run, taking the network as

given. Here we develop a long run analysis, where the network is allowed to dynamically adjust

upon the exit of a firm. To the best of your knowledge, this is the first paper to perform such a

dynamic key player analysis.

The paper is organized as follows. Section 2 introduces the firms’ profit function, the spillovers

from collaborations and the competitive environment. Section 3 defines the stochastic network

formation and quantity adjustment process and provides a complete characterization of the sta-

tionary state. Section 4 discusses several extensions of the model that allow for firm heterogeneity.

In Section 5 the welfare maximizing networks are derived. Section 6 provides information about

the data we are using and explains the estimation method. Section 7 then uses the estimated

model to analyze several counterfactual policy experiments. Finally, Section 8 concludes. All

proofs are relegated to Appendix A. Additional relevant material can be found in the supplemen-

tary appendix. In particular, supplementary Appendix B provides basic definitions and character-

izations of networks, supplementary Appendix C explains the distinction between continuous and

discrete quantity choices, supplementary Appendix D provides an equilibrium characterization

for exogenous networks, supplementary Appendix E explains in detail the extensions mentioned

in the main text, while supplementary Appendix F provides additional details of the estimation

algorithm.

2. The Model

We consider a Cournot oligopoly game in which a set N = 1, . . . , n of firms is competing in a

homogeneous product market.5 We assume that firms are not only competitors in the product

market, but they can also form pairwise collaborative agreements. These pairwise links involve

a commitment to share R&D results and thus lead to lower marginal cost of production of the

collaborating firms. The amount of this cost reduction depends on the effort the firms invest into

R&D. Given the collaboration network G ∈ Gn, where Gn denotes the set of all graphs with n

nodes, each firm sets an R&D effort level unilaterally.6 We assume that firms can only jointly

develop a cost reducing technology. Given the effort levels ei ≥ 0, marginal cost ci ≥ 0 of firm i

5Generalizations to Bertrand competition are straight forward [see Konig et al., 2014b; Westbrock, 2010].6See also Kamien et al. [1992] for a similar model of competitive RJVs in which firms unilaterally choose their

R&D effort levels.

5

Page 6: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

is given by [cf. Spence, 1984]7,8

ci(e, G) = c− αei − β

n∑

j=1

aijej , (1)

where aij = 1 if firms i and j set up a collaboration (0 otherwise) and aii = 0. The parameter

α ≥ 0 measures the relative cost reduction due to a firms’ own R&D effort while the parameter

β ≥ 0 measures the relative cost reduction due to the R&D effort of its collaboration partners.

In this model, firms are exposed to business stealing effects if their rivals increase their output

via cost reducing R&D collaborations.

Moreover, we also assume that firms incur a direct cost γ ≥ 0 for their R&D efforts and a

fixed cost ζ ≥ 0 for each R&D collaboration.9 The profit of firm i, given the R&D network G

and the quantities 0 ≤ qi ≤ q and efforts 0 ≤ ei ≤ e, is then given by

πi(q, e, G) = (pi − ci)qi − γe2i − ζdi, (2)

with vectors q = (qi)ni=1 and e = (ei)

ni=1. Inserting marginal costs from Equation (1) gives

πi(q, e, G) = piqi − cqi + αqiei + βqi

n∑

j=1

aijej − γe2i − ζdi.

The first-order condition with respect to R&D effort ei is given by ∂πi(q,e,G)∂ei

= αqi − 2γei = 0.

Solving for ei and taking into account that ei ∈ [0, e] delivers e∗i = minλqi, e, where we have

denoted by λ = α2γ .

10 This equation can be viewed as reflecting learning-by-doing effects on

R&D efforts. It reflects various empirical studies which have found that the R&D effort of a firm

is correlated with its output or size [Cohen and Klepper, 1996a,b]. We then can write marginal

costs from Equation (1) as follows11

ci(e∗(q), G) = c− λαqi − λβ

n∑

j=1

aijqj. (3)

Profits can be written as

πi(q, G) ≡ πi(q, e∗(q), G) = piqi − cqi − λαq2i + λβqi

n∑

j=1

aijqj − λ2γq2i − ζdi. (4)

Next we consider the demand for goods produced by firm i. A representative consumer

7Note that we have neglected spillovers among non-collaborating firms. For an extension incorporating thisadditional spillover channel see Konig et al. [2014b].

8This generalizes earlier studies such as the one by D’Aspremont and Jacquemin [1988] where spillovers were as-sumed to take place between all firms in the industry and no distinction between collaborating and non-collaboratingfirms was made.

9In Section 4 we discuss several extensions of the model including heterogeneous linking costs.10Konig et al. [2014b] show that with qi ∈ [0, q] we must have that 0 ≤ ei ≤ qi ≤ q, and requiring that

mini∈N ci > q(1 + β(n− 1)), implies that the best response effort level of firm i is given by e∗i = λqi.11We assume that firms always implement the optimal R&D effort level. Since the optimal R&D effort decision

only depends on a firm’s own output, a firm does not face any uncertainty when implementing this strategy. InSection 3 we will, however, introduce noise in the optimal output and collaboration decisions, since these dependon the decisions of all other firms in the industry and their characteristics, which might be harder to observe.

6

Page 7: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

maximizes [Singh and Vives, 1984]

U(I, q1, . . . , qn) = I + a

n∑

i=1

qi −1

2

n∑

i=1

q2i −b

2

n∑

i=1

j 6=i

qiqj, (5)

with the budget constraint I+∑n

i=1 qi ≤ E and endowment E. The parameter a captures the total

size of the market, whereas b ∈ (0, 1], measures the degree of substitutability between products.

In particular, b = 1 depicts a market of perfect substitutable goods, while b → 0 represents the

case of almost independent markets. The constraint is binding and the utility maximization of

the representative consumer gives the inverse demand function for firm i

pi = a− qi − b∑

j 6=i

qj. (6)

Firms face an inverse linear demand as given in Equation (6). Firm i then sets its quantity qi in

order to maximize its profit πi given by Equation (4). We also assume that there is a maximum

production capacity q such that qi ≤ q for all i ∈ N . Inserting marginal cost from Equation (3)

and inverse demand from Equation (6) we can write firm i’s profit as

πi(q, G) = (a− c)qi − (1− λα+ λ2γ)q2i − bqi∑

j 6=i

qj + λβ

n∑

j=1

aijqiqj − ζdi. (7)

In the following we will denote by η ≡ a− c, ν ≡ 1− λα+ λ2γ and ρ ≡ λβ, so that Equation (7)

becomes [cf. Ballester et al., 2006]

πi(q, G) = ηqi − νq2i︸ ︷︷ ︸own concavity

−bqin∑

j 6=i

qj

︸ ︷︷ ︸global substitutability

+ ρqi

n∑

j=1

aijqj

︸ ︷︷ ︸local complementarity

−ζdi. (8)

Firm i ∈ N sets its quantity qi and makes profit πi given by Equation (8). The corresponding

first-order conditions are given by

∂πi(q, G)

∂qi= η − 2νqi − b

n∑

j 6=i

qj + ρ

n∑

j=1

aijqj = 0. (9)

The second-order derivatives for j 6= i are given by ∂2πi(q,G)∂qj∂qi

= ∂2πi

∂qi∂qj= −b+ρaij, which is positive

if b < ρ and aij = 1, and ∂2πi

∂q2i= −2ν ≤ 0, if ν ≥ 0. Hence, the payoff function in Equation

(8) is supermodular for linked firms expressing strategic complementarity, as allied firms’ output

choices are complements to each other [cf. Topkis, 1998]. From Equation (9) we can write firm

i’s best response quantity as

qi = fi(q−i, G) ≡η

2ν− b

j∈N\i

qj +ρ

j∈Ni

qj =η

2ν− b

j /∈(Ni∪i)

qj +ρ− b

j∈Ni

qj, (10)

with the constraint that 0 ≤ qi ≤ q. Equation (10) shows that output of i is decreasing in the

output of the firms j not connected to i. Moreover, if ρ < b, then firm i’s output is also decreasing

in its neighbors’ output. However, if ρ > b, i’s output is increasing in its neighbors’ output.

7

Page 8: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

3. Industry Dynamics and Equilibrium Characterization

In the following we provide a complete equilibrium analysis of the R&D collaboration game,

assuming that firms are myopic (as e.g. in Watts [2001]). For this purpose it will be crucial

to observe that the profit function introduced in Equation (8) admits a potential function [cf.

Monderer and Shapley, 1996], which not only accounts for quantity adjustments but also for the

linking strategies.

Proposition 1. Assume that both, quantities and links can be changed according to a myopicprofit maximizing rationale of firms. Then the profit function of Equation (8) admits a potentialgame with potential function Φ: Qn × Gn → R given by

Φ(q, G) =

n∑

i=1

(ηqi − νq2i )−b

2

n∑

i=1

j 6=i

qiqj +ρ

2

n∑

i=1

n∑

j=1

aijqiqj − ζm, (11)

where m is the number of links in G.

We allow the network to be formed endogenously, based on the profit maximizing decisions of

firms with whom to collaborate, and share knowledge about a cost reducing technology. The

opportunities for change arrive as a Poisson process [cf. Sandholm, 2010], similar to Calvo models

of pricing [Calvo, 1983]. To capture the fact that the establishment of an R&D collaboration is

fraught with ambiguity and uncertainty [cf. Czarnitzki et al., 2015; Czarnitzki and Toole, 2013;

Kelly et al., 2002], we will introduce noise in this decision process. The precise definition of the

dynamics of quantity adjustment and network evolution is given in the following:

Definition 1. The evolution of the population of firms and the collaborations between them ischaracterized by a sequence of states (ωt)t∈R+ , ωt ∈ Ω = Qn×Gn, where each state ωt = (qt, Gt)consists of a vector of firms’ output levels qt ∈ Qn and a network of collaborations Gt ∈ Gn. Weassume that firms choose quantities from a bounded set Q. In a short time interval [t, t + ∆t),t ∈ R+, one of the following events happens:

Output adjustment At rate χ ≥ 0 a firm i ∈ N is selected at random and given a revisionopportunity of its current output. When firm i receives such a revision opportunity, theprobability to choose the output is governed by a multinomial logistic function with choiceset Q and parameter ϑ [cf. Anderson et al., 2001; McFadden, 1976], so that the probabilitythat we observe a switch by firm i to an output level in the infinitesimal interval [q, q + dq]is given by

P (ωt+∆t = (q ≤ qi ≤ q + dq,q−it, Gt)|ωt = (qt, Gt)) = χeϑπi(q,q−it,Gt)dq∫

Q eϑπi(q′,q−it,Gt)dq′

∆t+ o(∆t).

(12)

Link formation With rate λ ≥ 0 a pair of firms ij which is not already connected receives anopportunity to form a link. The formation of a link depends on the marginal payoff thefirms receive from the link plus an additive pairwise i.i.d. error term εij,t. The probabilitythat link ij is created is then given by12

P (ωt+∆t = (qt, Gt + ij)|ωt−1 = (q, Gt)) = λ P (πi(qt, Gt + ij)− πi(qt, Gt) + εij,t > 0∩πj(qt, Gt + ij) − πj(qt, Gt) + εij,t > 0) ∆t+ o(∆t)

= λ P (Φ(qt, Gt + ij)− Φ(qt, Gt) + εij,t > 0)∆t+ o(∆t),

12Alternatively, we can think of links being formed in a cooperative way, however with no commitment regardingthe output produced by firms (which is chosen in a non-cooperative way) as it is the case in typical models incontract theory [cf. Grossman and Hart, 1986]. Similar results can be obtained using this alternative specification.

8

Page 9: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

where we have used the fact that πi(qt, Gt+ ij)− πi(qt, Gt) = πj(qt, Gt + ij)−πj(qt, Gt) =Φ(qt, Gt + ij) − Φ(qt, Gt). Assuming that the error term εij,t is independently logisticallydistributed,13 we obtain for the creation of the link ij

P (ωt+∆t = (qt, Gt + ij)|ωt = (qt, Gt)) = λ P (−εij,t < Φ(qt, Gt + ij) − Φ(qt, Gt))∆t+ o(∆t)

= λeϑΦ(qt,Gt+ij)

eϑΦ(qt,Gt+ij) + eϑΦ(qt,Gt)∆t+ o(∆t). (13)

Link removal With rate ξ ≥ 0 a pair of connected firms i, j receives an opportunity to terminatetheir collaboration. The link is removed if at least one firm finds this profitable. Themarginal payoffs from removing the link ij are perturbed by an additive pairwise i.i.d. errorterm εij,t. The probability that the link ij is removed is then given by

P (ωt+∆t = (qt, Gt − ij)|ωt = (q, Gt)) = ξ P (πi(qt, Gt − ij) − πi(qt, Gt) + εij,t > 0∪πj(qt, Gt − ij)− πj(qt, Gt) + εij,t > 0) ∆t+ o(∆t)

= ξ P (Φ(qt, Gt − ij)− Φ(qt, Gt) + εij,t > 0)∆t+ o(∆t),

where we have used the fact that πi(qt, Gt− ij)− πi(qt, Gt) = πj(qt, Gt − ij)−πj(qt, Gt) =Φ(qt, Gt − ij)−Φ(qt, Gt). When the error term is independently logistically distributed weobtain

P (ωt+∆t = (qt, Gt − ij)|ωt = (qt, Gt)) = ξ P (−εij,t < Φ(qt, Gt − ij) − Φ(qt, Gt))∆t+ o(∆t)

= ξeϑΦ(qt,Gt−ij)

eϑΦ(qt,Gt−ij) + eϑΦ(qt,Gt)∆t+ o(∆t).

We assume that the set Q is a discretization of the bounded interval [0, q], with q <∞. The fact

that Q is a countable set allows us to use standard results for discrete state space, continuous

time Markov chains [cf. e.g. Liggett, 2010; Norris, 1998; Stroock, 2005].14 For an infinitesimally

fine discretization we then can use a continuous analogue of the standard multinomial logit

probabilistic choice framework in Equation (12) [see also Anderson et al., 2004, 2001; Ben-Akiva

and Watanatada, 1981; McFadden, 1976], where the probability of choosing an output level is

proportional to an exponential function of the profit associated with the output level.15 The

standard derivation of the logit model is based on the assumption that profits are subject to

noise from a type-I extreme value distribution [cf. Anderson et al., 1992].

Moreover, we can numerically implement the stochastic process introduced in Definition 1

using the “next reaction method” for simulating a continuous time Markov chain [cf. Anderson,

2012; Gibson and Bruck, 2000]. We will use this method throughout the paper to illustrate our

theoretical predictions for various network statistics.

Let F denote the smallest σ-algebra generated by σ (ωt : t ∈ R+). The filtration is the non-

decreasing family of sub-σ-algebras Ftt∈R+ on the measure space (Ω,F), Ω = Qn × Gn, with

the property that F0 ⊆ F1 ⊆ · · · ⊆ Ft ⊆ · · · ⊆ F . The probability space is given by the triple

13Let z be i.i. logistically distributed with mean 0 and scale parameter ϑ, i.e. Fz(x) = eϑx

1+eϑx . Consider the

random variable ε = g(z) = −z. Since g is monotonic decreasing, and z is a continuous random variable, the

distribution of ε is given by Fε(y) = 1− Fz(g−1(y)) = eϑy

1+eϑy .14In contrast, if Q is assumed to be a Borel set in R then we need to consider more general Feller processes [see

e.g. Liggett, 2010, Chap. 3]. In this case any measurable function f : Ω → R of the state variables ω ∈ Ω is aCaratheodory function since f(q, ·) is continuous for each q ∈ Qn and f(·, G) is (Gn,BG) measurable [Aliprantisand Border, 2006].

15See the supplementary Appendix C for further details.

9

Page 10: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

(Ω,F ,P), where P : F → [0, 1] is the probability measure satisfying∫Ω P(ω)dµ(ω) = 1. As we will

see below the sequence of states (ωt)t∈R+ , ωt ∈ Ω induces an irreducible and positive recurrent

(i.e. ergodic) time homogeneous Markov chain.

The one step transition probability matrix P(t) : Ω2 → [0, 1]|Ω|2 has elements which determine

the probability of a transition from a state ω ∈ Ω to a state ω′ ∈ Ω in a small time interval

[t, t + ∆t) given by P(ωt+∆t = ω′|Ft = σ(ω0,ω1, . . . ,ωt = ω)) = P(ωt+∆t = ω′|ωt = ω) =

q(ω,ω′)∆t+ o(∆t) if ω′ 6= ω, where q(ω,ω′) is the transition rate from state ω to state ω′. The

transition rate matrix (or infinitesimal generator) Q = (q(ω,ω′))ω,ω′∈Ω of the Markov chain is

given by

q(ω,ω′) =

χ eϑπi(q,q−i,G)dq∫Q eϑπi(q

′,q−i,G)dq′if ω′ = (q ≤ qi ≤ q + dq,q−i, G) and ω = (q, G),

λ eϑΦ(q,G+ij)

eϑΦ(q,G+ij)+eϑΦ(q,G) if ω′ = (q, G+ ij) and ω = (q, G),

ξ eϑΦ(q,G−ij)

eϑΦ(q,G−ij)+eϑΦ(q,G) if ω′ = (q, G− ij) and ω = (q, G),

−∑ω′ 6=ω

q(ω,ω′) if ω′ = ω,

0 otherwise,

(14)

satisfying the Chapman-Kolmogorov forward equation ddtP(t) = P(t)Q so that P(t) = I+Q∆t+

o(∆t). As the Markov chain is time homogeneous, the transition rates are independent of time.

The stationary distribution µϑ : Ω → [0, 1] is then the solution to µϑP = µϑ, or equivalently

µϑQ = 0 [cf. e.g. Norris, 1998].

With the potential function Φ(q, G) of Proposition 1 we can derive the stationary distribution

in the form of a Gibbs measure [cf. Grimmett, 2010].16

Proposition 2. The dynamic process (ωt)t∈R+ with states ωt ∈ Ω = Qn×Gn induces an ergodicMarkov chain with a unique stationary distribution µϑ : Ω → [0, 1] such that limt→∞ P(ωt =(q, G)|ω0 = (q0, G0)) = µϑ(q, G). The distribution µϑ is given by

µϑ(q, G) =eϑ(Φ(q,G)−m ln( ξ

λ))

∑G′∈Gn

∫Qn dq′eϑ(Φ(q′,G′)−m′ ln( ξ

λ)). (15)

From Proposition 2 we know that the Markov chain is ergodic, so that the Ergodic Theorem

applies [cf. e.g. Norris, 1998], stating that

P

(limt→∞

1

t

∫ t

01ωs∈Ads = µϑ(A)

)= 1, (16)

for any measurable set A ∈ Ω, so that long-run averages of sample paths converge to the invariant

distribution. Further note that the stationary distribution µϑ(q, G) in Equation (15) does not

depend on the output adjustment rate χ, nor the link adjustment rates, when the rates for link

creation and removal are the same, i.e. ξ = λ.17

16Observe that the characterization of the stationary state in Proposition 2 also holds if we were to allow fornegative values of ρ as for example in public goods games on networks [Bramoulle et al., 2014].

17A similar independence of the stationary distribution on the adjustment rates can be found in the modelsanalyzed in Mele [2010a] and Chandrasekhar and Jackson [2012].

10

Page 11: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

In the limit of vanishing noise, when ϑ→ ∞, the (stochastically stable) states in the support

of µϑ are given by [Kandori et al., 1993]

limϑ→∞

µϑ(q, G)

> 0, if Φ(q, G) ≥ Φ(q′, G′), ∀q′ ∈ Qn, G′ ∈ Gn,

= 0, otherwise.(17)

In the following we will denote by µ(q, G) ≡ limϑ→∞ µϑ(q, G).18 Moreover, we will set λ = ξ. The

stationary distribution µϑ(q, G) can be further analyzed by computing the partition function19

Zϑ =∑

G∈Gn

Qn

dqeϑΦ(q,G), (18)

so that we can write µϑ(q, G) = 1ZϑeϑΦ(q,G). We next compute the probability of observing a

network G given a specified output distribution q.20,21 This is also the probability of observing

the network G when the Markov chain is initialized at q and the output adjustment rate χ is set

to zero.

Proposition 3. The probability of observing a network G ∈ Gn, given an output distributionq ∈ Qn is determined by the conditional distribution

µϑ(G|q) = µϑ(q, G)

µϑ(q)=

n∏

i<j

eϑaij(ρqiqj−ζ)

1 + eϑ(ρqiqj−ζ), (19)

which is equivalent to the probability of observing an inhomogeneous random graph with linkprobability22

pϑ(qi, qj) =eϑ(ρqiqj−ζ)

1 + eϑ(ρqiqj−ζ). (20)

Moreover, in the limit of ϑ→ ∞, the stochastically stable network G with µ(q, G) > 0 is a nestedsplit graph in which firms i and j are linked if and only if ρqiqj > ζ.

Proposition 3 shows that the stochastically stable network (letting ϑ → ∞) is a nested split

graph, which is characterized by the fact that the neighborhood of every node is contained in

the neighborhoods of the nodes with higher degrees [cf. Konig et al., 2014a; Mahadev and Peled,

1995]. An immediate consequence is that if all firms produce at the same output level, given by

qi = q0 with q0 ∈ Q for all i = 1, . . . , n, then the stochastically stable network is the complete

18 Note also that in the limit of vanishing noise, ϑ → ∞, the Markov chain (ωt)t∈R+ is equivalent – in terms ofits stationary distribution – to a Markov chain (ω′

t)t∈R+ in which we replace the output adjustment process withone where firms choose simultaneously the Nash equilibrium output levels. See also Corollary 1 in Appendix A.

19See Park and Newman [2004] for an excellent discussion in the context of exponential random graphs.20For a discussion of inhomogeneous random graphs see Bollobas et al. [2007, 2001]; Van Der Hofstad [2009], the

“hidden variables” model studied in Boguna and Pastor-Satorras [2003], and the “β-model” analyzed in Chatterjeeet al. [2011]; Graham [2014]. Observe that the complementary problem of determining the distribution of randomvariables that depend only on their neighbors in a given network G is associated with a Markov random field, whosedistribution is a Gibbs measure (by the Hammersley-Clifford theorem), and can be decomposed into a sum over allcliques in G (see Besag [1974] and Kolaczyk [2009, Chap. 8] as well as Rue and Held [2005]).

21Proposition 3 has important implications. Numerous empirical studies have shown that the distribution ofoutput levels among firms tends to follow a power-law distribution (Zipf’s law) [Axtell, 2001; Gabaix, 1999; Growiecet al., 2008; Stanley et al., 1996]. If the output levels qi and qj in the link probability of Equation (20) are distributedaccording to a power-law, then we obtain the so called fitness model analyzed in Boguna and Pastor-Satorras [2003];Caldarelli et al. [2002]. These models also refer to random threshold graphs [Diaconis et al., 2008; Ide and Konno,2007; Ide et al., 2010]. In Appendix 4 we discuss further examples.

22Note that due to the endogenous quantities qi and qj in the link probability in Equation 20 standard techniquesto estimate inhomogeneous random graphs as for example used in Graham [2014] do not apply here.

11

Page 12: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

0 1 2 3 4 50

1

2

3

4

5

6

7

8

9

ζd

ϑ = 0.1

ϑ = 1

ϑ = 10

ϑ = 50

ϑ = 100

Figure 1: The average degree d as a function of the linking cost ζ for a fixed, homogeneous output distributionqi = 1 for all i = 1, . . . , n with χ = 0, b = 0, ρ = 2, ν = 1 and n = 10. The critical linking cost is ζ∗ = ρq20 = 2 isindicated with a vertical dashed line. Dashed lines indicate the theoretical prediction of Proposition 7.

graph Kn if ρq20 > ζ and it is given by the empty graph Kn if ρq20 < ζ. The transition from the

empty to the complete graph that occurs at ζ∗ = ρq20 is shown in Figure 1 for different values of

ϑ and n = 10 nodes. Note that nested split graphs are paramount examples of core-periphery

networks. The core-periphery structure of R&D alliance networks has also been documented

empirically in Kitsak et al. [2010] and Rosenkopf and Schilling [2007]. Our model thus provides a

theoretical explanation for why real-world R&D networks exhibit such a core-periphery structure.

Finally, note that the results of Proposition 3 also hold in the case of heterogeneous firms (when

η is replaced with ηi) as discussed in supplementary Appendix E.1.

We next compute the marginal asymptotic distribution across networks G ∈ Gn.

Proposition 4. The marginal distribution µϑ : Gn → [0, 1] across networks G ∈ Gn is given by

µϑ(G) ∼ 1

(ϑν

π

)−n2∣∣∣∣In +

b

2νB− ρ

2νA

∣∣∣∣− 1

2

eϑΦ(G,q∗),

as ϑ→ ∞, where B is an n× n matrix of ones with a zero diagonal.

The next proposition allows us to compute the stationary output levels in the limit of small

noise.

Proposition 5. Let η∗ ≡ η/(n− 1) and ν∗ ≡ ν/(n− 1). Then the average output q =∑n

i=1 qi isequal to q∗ with probability one in the large ϑ limit, where q∗ is the root of

(b+ 2ν∗)q − η∗ =ρ

2

(1 + tanh

2

(ρq2 − ζ

)))q, (21)

which has at least one solution if b + 2ν∗ > ρ. Further, from Equation (21) it follows that forϑ→ ∞ (in the stochastically stable equilibrium) we get

q∗ =

η∗

b+2ν∗−ρ , if ζ < ρ(η∗)2

(b+2ν∗)2 ,η∗

b+2ν∗−ρ ,η∗

b+2ν∗

, if ρ(η∗)2

(b+2ν∗)2< ζ < ρ(η∗)2

(b+2ν∗−ρ)2,

η∗

b+2ν∗ , if ρ(η∗)2

(b+2ν∗−ρ)2< ζ.

(22)

Note that the stationary output levels in Proposition 5 are increasing in ρ and η, and decreasing

in ζ and b (cf. Figure 2). A phase diagram illustrating the regions with a unique and with

12

Page 13: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

b q - Η

Ρ q

Η

b

Η

b- Ρ

Ζ1 Ζ2 Ζ3

0 1 2 3 4 5

-5

0

5

10

q

Η

b- Ρ

Η

Ρ

J3

J2

J1

0 20 40 60 80 100 120

5.5

6.0

6.5

7.0

q

b q - Η3

b q - Η2

b q - Η1

Ρ q

0 1 2 3 4 5-10

-5

0

5

10

15

b- ΡΗ

b

J1

J3J2

0 20 40 60 80 1000

1

2

3

4

5

6

Η

q

Figure 2: (Top left panel) The right hand side of Equation (21) for different values of ζ1 = 25, ζ2 = 10, ζ3 = 3 andb = 4, ρ = 2, η = 6.5, ν = 0 and ϑ = 10. (Top right panel) The values of q solving Equation (21) for differentvalues of ζ with b = 1.48, ρ = 0.45 and ϑ1 = 49.5, ϑ2 = 0.495, ϑ3 = 0.2475. (Bottom left panel) The right handside of Equation (21) for different values of η1 = 2.5, η2 = 6.5, η3 = 10 and b = 4, ρ = 2, ζ = 10 and ϑ = 10.(Bottom right panel) The values of q solving Equation (21) for different values of η with b = 4, ρ = 2 and ϑ1 = 10,ϑ2 = 0.26, ϑ3 = 0.2.

multiple equilibria according to Equation (21) can be seen in Figure 3. In the following we will

refer to the two possible output levels in the limit of ϑ→ ∞ as the high and the low equilibrium,

respectively.

We can also compute the marginal distribution of firms’ output levels as stated in the following

proposition.

Proposition 6. The marginal distribution µϑ : Qn → [0, 1] of the firms’ output levels q ∈ Qn isgiven by

µϑ(q) =

(2π

ϑ

)−n2

|−∆H (q∗)| 12 exp−1

2ϑ(q− q∗)⊤(−∆H (q∗))(q− q∗)

+ o

(‖q− q∗‖2

),

(23)where

(∇H (q))ii = −2ν + (n− 1)ϑρ2

4q2

(1− tanh

2

(ρq2 − ζ

))2),

while for j 6= i we have that

(∇H (q))ij = −b+ ρ

2

(1 + tanh

2

(ρq2 − ζ

)))(1 +

ϑρ

2q2(1− tanh

2

(ρq2 − ζ

)))),

and q∗ is given by the root of Equation (21). When ϑ→ ∞ we have that (∇H (q))ii = −2ν and(∇H (q))ij = −b for j 6= i. Further, for large n, the firms’ output levels become independent, andthe marginal distribution of a firm i’s output qi is given by a Gaussian distribution with density

13

Page 14: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

ρ

b

high equi l ibrium

multip l e equi l ibria

low equi l ibrium

Figure 3: A phase diagram illustrating the regions with a unique and with multiple equilibria according to Equation(21) for varying values of b ∈ 0, . . . , 0.01 and ρ ∈ 0, . . . , 0.01 with n = 100, ν = 0.5, η = 100 ϑ = 1 and ζ = 50.

function

µ(qi) = limϑ→∞

µϑ(qi) =1√2πσ2

e−(qi−q∗)2

2σ2 ,

where the asymptotic variance is given by

σ2 = limϑ→∞

− 1

ϑ∇H (q∗)−1 =

1

ϑ

2ν + (n− 2)b

4ν2 + 2(n− 2)νb− (n− 1)b2.

That is, in the limit of large ϑ, q is asymptotically normally distributed with mean q∗ and varianceσ2In.

The next proposition determines the expected number of links.

Proposition 7. The expectation of the average degree d = 1n

∑ni=1 di is given by

Eϑ(d)= (n− 1)

(1 + tanh

2

(ρ(q∗)2 − ζ

)))+

1

ϑ

((c1 + (n− 2)c2)c3 − (n− 1)c2c4)

n(c21 + (n− 2)c1c2 − (n− 1)c22), (24)

where q∗ is given by Equation (21) and

c1 ≡ −2ν + (n− 1)ϑρ2(q∗)2

4

(1− tanh

2

(ρ(q∗)2 − ζ

))2),

c2 ≡ −b+ ρ

2

(1 + tanh

2

(ρ(q∗)2 − ζ

)))(1 +

ϑρ(q∗)2

2

(1− tanh

2

(ρ(q∗)2 − ζ

)))),

c3 ≡(n− 1)ϑ2ρ2

4tanh

(1− tanh

2

(ρ(q∗)2 − ζ

))2),

c4 ≡ −ρϑ4

(1− tanh

2

(ρ(q∗)2 − ζ

))2)(

1− ϑρ(q∗)2 tanh

2

(ρ(q∗)2 − ζ

))).

Moreover, in the limit ϑ→ ∞ the expected average degree in the high equilibrium is limϑ→∞ Eϑ(d)=

n− 1, which corresponds to a complete graph, Kn, and zero in the low equilibrium, which corre-sponds to an empty graph, Kn.

Proposition 7 shows that the expected number of links is increasing in ρ, q and η, and decreasing in

ζ and b (by reducing the equilibrium quantity q). The left panel in Figure 4 shows the convergence

of the average degree to its stationary value from Proposition 7 for varying values of ϑ.

14

Page 15: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

0 10 20 30 40 500

2

4

6

8

10

12

14

t

d

100

101

102

10−3

10−2

10−1

100

d

P(d)

ϑ = 0.0

ϑ = 0.1

ϑ = 0.2

Figure 4: (Left panel) The time evolution of the average degree d with the dashed horizontal lines indicating thestationary solution from Proposition 7 for varying values of ϑ ∈ 0, 0.1, 0.2 with λ = ξ = χ = 1, η = 15, ν = 1,b = 0.5 ρ = 1 and n = 25 firms. (Right panel) The stationary degree distribution P (k) for the same parametervalues. The dashed lines indicate the solution from Proposition 8. The dashed lines indicate a Poisson, the dottedlines a binomial distribution.

Using the asymptotic independence of the firm’s output levels allows us to characterize the

degree distribution as a mixture of independent Poisson distributed random variables. This is

stated in the following proposition.

Proposition 8. In the sparse graph limit, the degree di(G) of firm i follows a mixed Poissondistribution with mixing parameter ν(q) =

∫Q p(q, q

′)µϑ(dq′), and for any 1 < m ≤ n the degreesd1(G), . . . , dm(G) are asymptotically independent. Moreover, let the empirical degree distributionbe given by P ϑ(k) = 1

n

∑ni=1 1di(G)=k, and denote by P ϑ(k) ≡ E

(P ϑ(k)

), then

P ϑ(k) = E

(e−d(q1)d(q1)

k

k!

)(1 + o(1)) ,

where the expected degree of a firm with output q is d(q) = nν(q). Further, when the probabilitymeasure µϑ concentrates on q∗ in the limit of large ϑ, we obtain

limϑ→∞

P ϑ(k) =e−d(q∗)d(q∗)k

k!(1 + o(1)) ,

where d(q∗) = np(q∗, q∗) and q∗ is the solution to Equation (21), that is, a Poisson distributionwith parameter d(q∗).

The right panel in Figure 4 shows the degree distribution for varying values of ϑ.

We next analyze two firms’ degree correlations, i.e. correlations between the degree of a

firm i and its neighbors’ degrees. Due to the independence of the degrees (cf. Proposition 8),

the average degree of neighbors of a firm with degree k can be written as knn(k) =∑n

k′=1 k′

P (d2(G) = k′ − 1| a12 = 1, d1(G) = k) [cf. Boguna and Pastor-Satorras, 2003; Hurd, 2015]. In the

case that knn(k) is an increasing function of k we speak of assortative mixing, while for knn(k)

decreasing with k we have dissortative mixing [Newman, 2002].

Proposition 9. In the sparse graph limit when ϑ becomes large the average nearest neighborconnectivity knn(k) of a firm with degree k converges to a constant 1 + d(q∗), with d(q∗) =np(q∗, q∗), where the connection probability p is given by Equation (20) and q∗ is given by Equation(21).

Proposition 9 shows that the average nearest neighbor connectivity knn(k) is asymptotically

constant, and hence neither assortative nor dissortative.

15

Page 16: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

We next turn to an analysis of three firms degree correlations. The clustering coefficient

C(k) of a firm with degree k is defined as the probability that a firm of degree k in the network

G is connected to firms which are themselves connected [cf. Watts and Strogatz, 1998], that is

C(k) = P (a23 = 1| a12 = 1, a13 = 1, d1(G) = k), where we have used the independence of the

degrees (cf. Proposition 8). The following proposition shows that the clustering coefficient,

similarly to the average nearest neighbor connectivity, is asymptotically constant.

Proposition 10. In the sparse graph limit when ϑ becomes large, the clustering coefficient C(k)of a firm with degree k converges to a constant p(q∗, q∗), where the connection probability p isgiven by Equation (20) and q∗ is given by Equation (21).

Propositions 9 and 10 illustrate that asymptotically, we do not find significant correlations in our

simple model with homogeneous firms, spillovers and linking costs. In the following section we

discuss that, by relaxing these assumptions, we can generate networks that allow for correlations

consistent with what we observe in the data.

4. Extensions

The model presented so far can be extended in a number of directions which are described in

greater detail in supplementary Appendix 4.

First, in Appendix E.1 we consider ex ante heterogeneity among firms in the variable pro-

duction cost ci ≥ 0 for i = 1, . . . , n [see also Banerjee and Duflo, 2005], expressing their different

technological and organizational capabilities. Similarly to above, we can characterize the equi-

librium states using a Gibbs measure. Moreover, the equilibrium networks have a particular

structure referred to nested split graphs, in which firms with lower marginal costs are more cen-

tral.23 Further, when the firms’ productivities are Pareto distributed then also the firms’ output

levels follow a Pareto distribution, consistent with the empirical data [cf. e.g. Gabaix, 2009; Konig

et al., 2012; Melitz et al., 2008].

Second, in supplementary Appendix E.2 we assume that firms with higher productivity incur

lower collaboration costs. One can show that a similar equilibrium characterization using a

Gibbs measure as in the previous sections is possible. Moreover, in the special case of firms’

productivities being Pareto distributed, one can show that the degree distribution also follows a

Pareto distribution,24 consistent with previous empirical studies of R&D networks [e.g. Gay and

Dousset, 2005; Powell et al., 2005].25

Third, in supplementary Appendix E.3 we assume that there are heterogeneous spillovers

between collaborating firms depending on their technology portfolios [cf. Cohen and Levinthal,

1990; Griffith et al., 2003]. For example, assume that firms can only benefit from collaborations

if they have at least one technology in common. Then one can show that our model is a general-

ization of a random intersection graph [cf. Deijfen and Kets, 2009; Newman, 2003; Singer-Cohen,

23See supplementary Appendix B for the definition of nested split graphs.24In particular, assume that the productivities s are distributed as a power-law s−γ with exponent γ. Then on

can show that the asymptotic degree distribution is also power-law distributed, P (k) ∼ k− γ

γ−1 , with exponent γ

γ−1.

25We note that also other statistics such as the clustering degree distribution can be computed. See supplementaryAppendix E.3 for further details. In particular, under the assumptions of a power-law productivity distribution,we can generate two-vertex and three-vertex degree correlations, such as a decreasing average nearest neighborconnectivity, knn(d), indicating a dissortative network, as well as a decreasing clustering degree distribution, C(d),with the degree d.

16

Page 17: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

1995], for which positive degree correlations can be obtained (assortativity).

The above extensions show that our model is capable of generating networks with properties

that can be observed in real world networks, such as power-law degree distributions and various

degree correlations, once we introduce firm heterogeneity. This counteracts general criticism

of (simple variants of) exponential random graphs, which often have difficulties in generating

networks with empirically relevant characteristics [Chandrasekhar and Jackson, 2012; Chatterjee

et al., 2013].26

5. Efficiency

Social welfare, W , is given by the sum of consumer surplus, U , and firms’ profits, Π. Consumer

surplus is given by U(q) = 12

∑ni=1 q

2i +

b2

∑ni=1

∑nj 6=i qiqj . In the special case of non-substitutable

goods, when b → 0, we obtain U(q) = 12

∑ni=1 q

2i , while in the case of perfectly substitutable

goods, when b → 1, we get U(q) = 12 (∑n

i=1 qi)2. The producer surplus is given by aggregate

profits Π(q, G) =∑n

i=1 πi(q, G). As a result, total welfare is equal to

W (q, G) = U(q) + Π(q, G) =1

2

n∑

i=1

q2i +b

2

n∑

i=1

n∑

j 6=i

qiqj +

n∑

i=1

πi(q, G)

=1

2

n∑

i=1

q2i +b

2

n∑

i=1

n∑

j 6=i

qiqj +

n∑

i=1

ηqi − νq2i − b

n∑

j 6=i

qiqj + ρqi

n∑

j=1

aijqiqj

− 2ζm.

(25)

Note that welfare W (q, G) is related to the potential Φ(q, G) of Proposition 1 as W (q, G) =12

∑ni=1 q

2i + b

2

∑ni=1

∑nj 6=i qiqj −

∑ni=1(η − νqi)qi + 2Φ(q, G). Hence, the states maximizing the

potential Φ(q, G) are not necessarily identical to the ones maximizing welfareW (q, G). The latter

are determined in the following proposition,27 and we find that the decentralized equilibrium is

efficient only when the linking costs are sufficiently low or high.

Proposition 11. The efficient network G∗ ∈ Gn and output profile q∗ ∈ Qn are given by

(q∗, G∗) =

((η∗

b+2ν∗−ρ , . . .),Kn

), if ζ < ρ(η∗)2

(b+2ν∗)2,((

η(n1−1)(b−ρ)+2ν−1 , . . . , 0, . . .

),Dn1,n−n1

), if ρ(η∗)2

(b+2ν∗)2< ζ < ρ(η∗)2

(b+2ν∗−ρ)2,((

η∗

b+2ν∗ , . . .),Kn

), if ρ(η∗)2

(b+2ν∗−ρ)2< ζ,

(26)

where η∗ = η/(n− 1), ν∗ = ν/(n− 1), Dn1,n−n1 is the dominant group architecture consisting ofa clique of size n1 in which all firms are connected and all remaining firms being isolated, and

26See Konig [2011] for an alternative growing network formation model that is also capable of reproducinga variety of statistics including a power-law degree distribution, a decreasing clustering degree distribution, anincreasing or decreasing average nearest neighbor degree distribution and a component size distribution decayingas a power-law, reproducing the distributions that can be observed in many real world networks.

27Proposition 11 characterizes the the efficient outcome in the first best solution where the social planner canset both, the production levels as well as the network of collaborations between them. A characterization of thesecond best solution, in which the planner chooses the network, but output levels are chosen in a decentralizedmanner by profit maximizing firms is studied in Konig et al. [2014b]. Moreover, an equilibrium characterizationwith an exogenously given network is provided in Appendix D.

17

Page 18: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

n1 = argmaxk∈0,...,nW (k) with

W (k) =η2k(b(k − 1) + 2ν − 1)

2((b− ρ)(k − 1) + 2ν − 1)2− ζk(k − 1), (27)

with the property that n1 is increasing with decreasing ζ, b and increasing with ρ and η.28

As we might expect, with increasing cost, the efficient network becomes more sparse. In Section

7.1 we will investigate the impact on welfare when a firm is removed from the network, and in

Section 7.2 when they are merged to become a single firm. Finally, in Section 7.3 we analyze the

welfare improving impact of a subsidy on firms’ R&D collaboration costs.

6. Empirical Implications

6.1. Data

To get a comprehensive picture of alliances we use data on interfirm R&D collaborations stemming

from two sources which have been widely used in the literature [cf. Schilling, 2009]. The first is

the Cooperative Agreements and Technology Indicators (CATI) database [cf. Hagedoorn, 2002].

The database only records agreements for which a combined innovative activity or an exchange

of technology is at least part of the agreement. Moreover, only agreements that have at least two

industrial partners are included in the database, thus agreements involving only universities or

government labs, or one company with a university or lab, are disregarded. The second is the

Thomson Securities Data Company (SDC) alliance database. SDC collects data from the U. S.

Securities and Exchange Commission (SEC) filings (and their international counterparts), trade

publications, wires, and news sources. We include only alliances from SDC which are classified

explicitly as R&D collaborations.29

We then merged the CATI database with the Thomson SDC alliance database. For the match-

ing of firms across datasets we adopted and extended the name matching algorithm developed as

part of the NBER patent data project [Atalay et al., 2011; Trajtenberg et al., 2009].30

The systematic collection of inter-firm alliances in CATI started in 1987 and ends in 2006.

As the CATI database includes only collaborations until the year 2006, we take this year as the

base year for our empirical analysis. We then construct the R&D alliance network by assuming

that an alliance lasts for 5 years [similar to e.g. Rosenkopf and Padula, 2008]. An illustration of

the observed R&D network can be seen in Figure 5.

The combined CATI-SDC database provides the names for each firm in an alliance, but does

not contain information about their output or R&D expenditures. We therefore matched the

firms’ names in the CATI-SDC database with the firms’ names in Standard & Poor’s Compustat

US and Global fundamentals databases, as well as Bureau van Dijk’s Orbis database, to obtain

information about their balance sheets and income statements [cf. e.g. Bloom et al., 2013]. For

the purpose of matching firms across databases, we employ the above mentioned name matching

algorithm. We could match roughly 25% of the firms in the alliance data. From our match

between the firms’ names in the alliance database and the firms’ names in the Compustat and

28See also Figure 12 in Appendix A.29For a comparison and summary of different alliance databases, including CATI and SDC, see Schilling [2009].30See https://sites.google.com/site/patentdataproject.

18

Page 19: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Figure 5: The largest connected component in the observed network of R&D collaborations in the year 2006. Theshade and size of a node indicates its R&D expenditures. The 5 largest firms in terms of their R&D expendituresare mentioned in the graph.

Orbis databases, we obtained a firm’s R&D expenditures, sales, employment, primary industry

codes and location. From a firm’s sales and employment we then computed its labor productivity

as sales relative to the number of employees. Moreover, in order to measure R&D effort we use

the square root transformed R&D expenditures consistent with our model (see Section 2). Some

descriptive statistics are shown in Table 1.

We further identified the patent portfolios of the firms in our dataset using the EPOWorldwide

Patent Statistical Database (PATSTAT) [Hall et al., 2001; Jaffe and Trajtenberg, 2002; Thoma

et al., 2010]. It includes bibliographic details on patents filed to 80 patent offices worldwide,

covering more than 60 million documents. Hence filings in all major countries and the the World

International Patent Office are covered. We matched the firms in our data with the assignees in

the PATSTAT databse using the above mentioned name matching algorithm. We only consider

granted patents (or successful patents), as opposed to patents applied for, as they are the main

drivers of revenue derived from R&D [Copeland and Fixler, 2012]. We obtained matches for

roughly 30% of the firms in the data. The technology classes were identified using the main

international patent classification (IPC) numbers at the 4 digit level.

In order to further focus our analysis, we restrict our sample to the sector with the largest

number of R&D collaborations, which is the “chemicals and allied products” sector with SIC

code 28. This sector also has the smallest percentage of R&D collaborations to other two digit

sectors. Sector SIC 28 contains nine sub-sectors, ranging from industrial inorganic chemicals

(SIC 281) to miscellaneous chemical products (SIC 289). The summary statistics of these sectors

and the number of R&D collaborations are shown in Table 1 and Figure 6. After dropping firms

with missing observations, sector SIC 28 has 351 firms in our sample, with a total of 141 R&D

collaborations. Among the nine sub-sectors, the “drugs development” sector (SIC 283) is the

largest one, which consists of 259 firms and 121 within sector R&D collaborations.

19

Page 20: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Table 1: Descriptive statistics.

SIC Square Root of R&D Productivity Log Patents

code # of firms mean min max mean min max mean min max

28 351 5.0654 0.1575 14.9692 1.3385 0.0002 10.1108 4.7711 0.0000 11.8014

280 19 7.7478 3.0046 14.9692 2.3405 1.4285 3.4879 7.0859 0.6931 11.8014281 8 4.7216 1.3762 8.6660 2.0951 0.8124 4.5133 6.9610 2.3026 9.9499282 22 6.0892 1.3529 13.4264 2.4637 0.1667 5.7551 6.7015 2.9957 10.3031283 259 4.6305 0.1575 14.9692 1.0326 0.0002 6.5232 4.1962 0.0000 10.8752284 12 8.7195 2.5671 14.9692 1.4869 0.6021 2.6405 7.7903 3.9890 10.9748285 5 8.3778 4.2773 14.9692 1.5160 1.2591 1.7099 8.4910 7.1325 10.3017286 8 4.5833 0.6652 9.0553 3.9443 1.1249 10.1108 3.6924 0.6931 6.6174287 8 4.9178 0.6850 14.9692 1.8069 0.0672 2.7076 3.9510 0.6931 10.6792289 10 3.7154 0.7348 6.0545 1.5494 0.0760 2.9324 5.3012 0.6931 9.8807

Note: The square root of firm’s R&D expenditure (measured by million dollars) serves as the proxy of firm’soutput. Firm’s productivity is measured by the ratio of sales to employment number. The log of the patentnumber is a control in the linking cost function.

B

i

j

50 100 150

20

40

60

80

100

120

140

160

180

Number of R&D collaborations within SIC sector 28.

280 281 282 283 284 285 286 287 289

280 0 1 2 13 0 0 0 0 0281 0 0 0 0 0 0 0 0282 1 1 0 0 0 0 0283 121 0 2 0 0 0284 0 0 0 0 0285 0 0 0 0286 0 0 0287 0 0289 0

Figure 6: (Left panel) The empirical competition matrix B. (Right panel) The number of R&D collaborationswithin sector SIC 28. Among the nine sub-sectors, the “drugs development” sector (SIC 283) is the largest one,which consists of 259 firms and 121 R&D collaborations.

20

Page 21: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

6.2. Firm Heterogeneity

To account for the firm level heterogeneity that we observe in the data, for each firm i we extend

the profit function of Equation (8) in Section 2 to accommodate heterogeneous marginal costs of

production, substitution, and heterogeneous technology spillovers (see also Section 4), so that

πi(q, G) = ηiqi −1

2q2i − b

n∑

j 6=i

bijqjqi + ρn∑

j=1

fkijaijqjqi − ζi(G), k ∈ J,M. (28)

In Equation (28), we have normalized ν in Equation (8) to 1/2. The term ηi = a − ci includes

a sector fixed effect, a, and heterogeneous marginal production costs, ci ≥ 0 (see supplementary

Appendix E.1 for further details). The substitution effect is considered within 3-digit SIC sectors.

Firm i faces substitution effects from any firm j which is in the same sector, i.e., bij = 1, and the

substitution matrix is defined by B = (bij)1≤i,j≤n with bii = 0. An illustration of the empirically

observed sectoral composition can be seen in Figure 6.

In Equation (28), we further introduced the weights fkij = fkji, k ∈ J,M, to capture heteroge-neous technology spillovers across firms, where fkij is a measure of technological proximity between

firms i and j (see supplementary Appendix E.3 for further details). Technological proximity is

measured with two alternative metrics. The first is based on Jaffe [1989]. Let Pi represents

the patent portfolio of firm i, where, for each firm i, Pi is a vector whose k-th component, Pik,

counts the number of patents firm i has in technology category k divided by the total number of

technologies attributed to the firm [see also Bloom et al., 2007]. The technological proximity of

firm i and j is then given by

fJij =P⊤

i Pj√P⊤

i Pi

√P⊤

j Pj

. (29)

We denote by FJ the (n× n) matrix with elements (fJij)1≤i,j≤n.

As an alternative measure for technological similarity we consider the Mahalanobis technology

proximity measure introduced by Bloom et al. [2013]. To construct this metric, let N be the

number of technology classes, n the number of firms, and let T be the (N × n) patent shares

matrix with elements Tji = Pji/∑n

k=1 Pki, for all 1 ≤ i ≤ n and 1 ≤ j ≤ N . Further, we

construct the (N × n) normalized patent shares matrix T with elements Tji = Tji/√∑N

k=1 T2ki,

and the (n × N) normalized patent shares matrix across firms is defined by X with elements

Xik = Tki/√∑N

i=1 T2ki. Let Ω = X⊤X. Then the (n × n) Mahalanobis technology similarity

matrix FM = (fMij )1≤i,j≤n is defined as

FM = T⊤ΩT. (30)

We then use either fJij or fMij as a measure for the potential technology spillovers between collab-

orating firms in the profit function of Equation (28).

The total cost of R&D collaborations for firm i is captured by the term ζi(G) =∑n

j=1 aij(ψij−χij), with the pairwise symmetric functions ψij = ψji, specified as ψij = γ⊤cij , and χij =

χji, specified as χij = κtij. The r-dimensional vector of dyadic-specific variables, cij , involves

measures of similarity between firms i and j in terms of sector, location, technology, research

21

Page 22: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

quality, etc. [cf. Lychagin et al., 2010; Marshall, 1890].31 The term tij =∑n

k 6=i,j aikajk counts

the number of common collaborators of firms i and j. It allows for R&D collaborations to be less

costly between companies that have mutual third-party collaborators [cf. Hanaki et al., 2010].32

The potential function Φ: Rn+ × Gn → R corresponding to Equation (28) is then given by33

Φ(q, G) =n∑

i=1

(ηiqi −

1

2q2i

)− b2

n∑

i=1

n∑

j 6=i

bijqiqj+ρ

2

n∑

i=1

n∑

j 6=i

fijaijqiqj−γ⊤

2

n∑

i=1

n∑

j 6=i

aijcij+κ

3

n∑

i=1

n∑

j 6=i

aijtij.

(31)

In vector-matrix form this is

Φ(q, G) = η⊤q− 1

2q⊤M(G)q − 1

2tr(A(Ψ⊤)) +

1

3tr(A(χ⊤)), (32)

where η = (η1, η2, · · · , ηn)⊤, Ψ = (ψij)1≤i,j≤n and χ = (χij)1≤i,j≤n. In the following, we assume

that firm i’s fixed effect ηi is captured by Xiδ, with Xi denoting a 1 × l vector characterizing

firm i’s productivity. We denote by M(G) ≡ In + bB − ρ(A Fk), where Fk is the matrix with

elements fkij for 1 ≤ i, j ≤ n, k ∈ J,M, and denotes the Hadamard elementwise matrix

product.34 The stationary distribution of the Markov process is then given by the Gibbs measure

µϑ(q, G) of Equation (15) in Proposition 2 with the potential function Φ(q, G) of Equation (31).

Note, however, that the conditional link independence shown in Proposition 3 only holds in the

absence of cyclic triangles effects, i.e. when we set κ = 0.

6.3. Exogenous Networks

As a benchmark, we first consider the case where the R&D network G is exogenously given, that

is, when the link update rates vanish (setting ξ = 0). In this exogenous cases, assuming that the

parameter ϑ is large, output levels chosen by firms will be close to the potential maximizer q∗.

We can then make the following Taylor expansion for the potential function

Φ(q, G) = Φ(q∗, G) + (q− q∗)⊤∇Φ(q∗, G) +1

2(q− q∗)⊤∆Φ(q∗, G)(q − q∗) + o(‖q− q∗‖),

where the potential function Φ(q, G) is given by Equation (32). Noting that the gradient vanishes

at q∗, i.e. ∇Φ(G,q∗) = 0, and dropping terms of the order o(‖q− q∗‖) we can write

Φ(q, G) ≈ Φ(q∗, G) +1

2(q− q∗)⊤∆Φ(q∗, G)(q − q∗).

31See also Appendix E.2 for a simple example.32Hanaki et al. [2010] argue that the existence of mutual collaborators may enhance the effectiveness of penalties

and improve the appropriability of the outcomes of joint R&D projects, and that firms can use preexisting collab-orations as conduits of information about the reliability of potential collaboration partners. The authors furtherfind empirical evidence that R&D collaborations are more likely to be undertaken between companies that havemutual third-party collaborators.

33Note that the potential function has the property that Φ(q, G + ij) − Φ(q, G) = ρfkijqiqj − γ⊤cij + 2κtij =

πi(q, G + ij) − πi(q, G) = πj(q,G + ij) − πj(q, G). A similar relationship holds for the removal of a link, G − ij,or quantity adjustment.

34Let A and B be m× n matrices. The Hadamard product of A and B is defined by [A B]ij = [A]ij [B]ij forall 1 ≤ i ≤ m, 1 ≤ j ≤ n, i.e. the Hadamard product is simply an element-wise multiplication.

22

Page 23: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Inserting into the Gibbs measure gives

µϑ(q, G) =1

ZϑeϑΦ(G,q) ≈ 1

ZϑeϑΦ(q∗,G) exp

1

2ϑ(q− q∗)⊤∆Φ(q∗, G)(q − q∗)

.

The partition function can be written as

Zϑ =

Qn

dqneϑΦ(q,G)

≈ eϑΦ(q∗,G)

Qn

dqn exp

1

2ϑ(q− q∗)⊤∆Φ(q∗, G)(q − q∗)

= eϑΦ(q∗,G)

(2π

ϑ

)n2

|−∇Φ(q∗, G)|− 12 .

Therefore, the above Laplace approximation of the Gibbs measure yields [cf. Wong, 2001, Theorem

3, p. 495]

µϑ(q, G) ≈(2π

ϑ

)−n2

|−∇Φ(q∗, G)| 12 exp−1

2ϑ(q− q∗)⊤(−∆Φ(q∗, G))(q − q∗)

.

The gradient can be written as ∇Φ(q, G) = η−M(G)q and the Hessian is given by ∆Φ(q, G) =

−M(G). The FOCs are then given by ∇Φ(q∗, G) = η − M(G)q∗ = 0 so that q∗ = M(G)−1η

when the matrix M(G) is invertible and η = Xδ.35 Hence, the conditional density of output

levels q given the (exogenous) network G can be rewritten as

µ(q|G) ≈(2π

ϑ

)−n2

|M(G)| 12 exp−1

2

(q−M(G)−1Xδ

)⊤M(G)(q −M(G)−1Xδ)

, (33)

which implies that the output q, conditional on the R&D network G, follows a Gaussian normal

density function with mean M(G)−1Xδ and variance M(G)−1.

Given the conditional density function of Equation (33), we estimate the unknown parameters,

θexo = (b, ρ, δ⊤) ∈ Θexo, by a Bayesian approach [cf. e.g. Hsieh and Lee, 2014; Smith and LeSage,

2004]. In the Bayesian posterior analysis, the joint posterior (probability) density function of θexo

can be constructed by

P (θexo|q, G) ∝ π(θexo) · µ(q|G,θexo), (34)

where π(·) represents the prior density function. We assume independence between prior dis-

tributions, which are (b, ρ) ∼ U2(O) and δ ∼ Nl(δ0,∆0). The prior for (b, ρ) is a multivariate

uniform (uninformative) distribution with the values restricted to the compact subset O of R2,

in which the matrix M(G) is positive definite.36 The prior for δ is the normal distribution, which

is a conjugate prior commonly used in the Bayesian literature [cf. e.g. Koop et al., 2007; Robert

and Casella, 2004]. In the following applications, we fix δ0 = 0 and ∆0 = 100Il to ensure prior

density is relatively flat over the range of the data.

35The existence of a unique potential maximizer q∗ is guaranteed when the output levels are bounded and thematrix M(G) is positive definite [cf. Byong-Hun, 1983; Konig et al., 2014b].

36This requirement implies that M(G) is invertible and M(G)−1 is also positive definite. M(G) is positivedefinite if and only if all of its eigenvalues are positive.

23

Page 24: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Directly drawing samples from the posterior distribution, P (θexo|q, G), would be difficult due

to high dimensionality. Therefore, we apply the Metropolis-within-Gibbs sampling procedure:

we first divide unknown parameters into blocks so that drawing from their conditional posterior

distributions is feasible. When we make draws sequentially from these conditional posterior

distributions by cycling, in the limit these draws can be treated as draws from the joint posterior

distribution. For the conditional posterior distribution of (b, ρ), which is not available in a closed

form, we use the M-H algorithm to draw from this conditional distribution. It has been shown

in Tierney [1994] and Chib and Greenberg [1996] that the combination of Markov chains is still

a Markov chain with the invariant distribution equal to the correct objective distribution.

Estimation result base on Equation (34) is reported in Table 2, serving as a benchmark in

which R&D networks are assumed exogenous. Under homogenous and heterogenous technology

spillover cases, we all find that the R&D complementarity effect (represented by ρ) is significantly

positive, while the sector substitutability effect is significantly negative. Further, in terms of firm’s

productivity, not surprisingly, we find that firms with higher productivity tend to have higher

R&D expenditures [cf. e.g. Cohen et al., 1987].

6.4. Endogenous Networks

In the following subsections, we will discuss two sets of estimation methods for our model –

one based on classical approach (Section 6.4.1) and one based on a Bayesian approach (Section

6.4.2).37

6.4.1. Classical Approach

In the absence of cyclic triangles effects, i.e. when we set κ = 0, conditional link independence

holds (see Proposition 3). The probability of observing a network G ∈ Gn, given an output

distribution q ∈ Qn, is then determined by the conditional distribution (cf. Equation (19))38

µϑ(G|q) = µϑ(q, G)

µϑ(q)=

n∏

i<j

eϑaij(ρfijqiqj−γ⊤cij)

1 + eϑ(ρfijqiqj−γ⊤cij). (35)

The marginal distribution of the firms’ output levels q ∈ Qn for large ϑ is given by (cf. Equation

(23))39

µϑ(q) ≈(2π

ϑ

)−n2

|−∆H (q∗)| 12 exp−1

2ϑ(q− q∗)⊤(−∆H (q∗))(q− q∗)

, (36)

where

(∆H (q))ii = −1 +ϑρ2

4

n∑

j 6=i

f2ijq2j

(1− tanh

2

(ρfijqiqj − γ⊤cij

))2),

37For more details about the implementation of the latter see the supplementary Appendix F.38We note that such a factorization would not be possible if the linking cost would allow for higher order network

effects captured by κ 6= 0. See also the proof of Proposition 3 in Appendix A.39We have introduced the effective Hamiltonian, H (q), implicitly defined by

∑G∈Gn e

ϑΦ(q,G) = eϑH (q). Seealso the proof of Proposition 6 in Appendix A.

24

Page 25: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

and

(∆H (q))ij = −bbij +ρfij2

(1 + tanh

2

(ρfijqiqj − γ⊤cij

)))

×(1 +

ϑρfij2

qiqj

(1− tanh

2

(ρfijqiqj − γ⊤cij

)))),

for j 6= i, and q∗ solves the following system of equations (cf. Equation (21))

qi = ηi +

n∑

j 6=i

(ρfij2

(1 + tanh

2

(ρfijqiqj − γ⊤cij

)))− bbij

)qj. (37)

Equation (35) shows that in the limit of large ϑ, q is asymptotically normally distributed

with mean q∗ and variance-covariance matrix − 1ϑ∆H (q∗)−1. It then follows that the likelihood

of the network G and the quantity profile q in the large ϑ limit can be written as

µϑ(q, G) = µϑ(G|q) · µϑ(q) ≈n∏

i<j

eϑaij(ρfijqiqj−γ⊤cij)

1 + eϑ(ρfijqiqj−γ⊤cij)

×(2π

ϑ

)−n2

|−∆H (q∗)| 12 exp−1

2ϑ(q− q∗)⊤(−∆H (q∗))(q − q∗)

, (38)

where we have inserted Equation (35) for µϑ(G|q) and Equation (36) for µϑ(q). The parameters

θendo = (b, ρ, δ⊤,γ⊤) of the model can then be obtained via maximum likelihood, by maximizing

Equation (38), and the variances from the Fisher information matrix [cf. e.g. Casella and Berger,

2001]:

I(θ) = −E

[∂2

∂θ2log µϑ(q, G)

∣∣∣∣ θ],

where the log-likelihood can be written as follows

lnµϑ(q, G) ≈ 1

2

n∑

i=1

n∑

j 6=i

aij ln

(eϑ(ρfijqiqj−γ

⊤cij)

1 + eϑ(ρfijqiqj−γ⊤cij)

)+ (1− aij) ln

(1− eϑ(ρfijqiqj−γ

⊤cij)

1 + eϑ(ρfijqiqj−γ⊤cij)

)

+n

2ln

)+

1

2ln |−∆H (q∗)| − 1

2ϑ(q− q∗)⊤(−∆H (q∗))(q− q∗), (39)

with q∗ solving Equation (37). Note that finding the parameters from maximizing the likelihood

function in Equation (38) (respectively, the log-likelihood of Equation (39)) represents an instance

of a bilevel optimization problem, which we can be solved using a nested fixed point algorithm

or a mathematical programming with equilibrium constraints (MPEC) approach [cf. Luo et al.,

1996; Su and Judd, 2012].

6.4.2. Bayesian Exchange Algorithm

When both the quantities q and the network G are endogenous, the stationary distribution

µ(q, G) is determined by Equation (15). The parameters of the model are denoted by θendo =

(b, ρ, δ⊤, γ⊤) ∈ Θendo according to the potential function in Eq. (32). This empirical model be-

longs to the family of exponential random graph models (ERGMs) or p∗ models [see Frank and

Strauss, 1986; Pattison and Wasserman, 1999]. ERGMs are notorious for the difficulty of estima-

25

Page 26: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

tion due to existence of an intractable normalizing constant in the likelihood function.40 Using

classical estimation approach such as Maximum likelihood (ML) approach, one needs to simulate

a set of auxiliary networks in order to approximate the intractable normalizing constant (MCMC-

MLE)[Badev, 2013; Geyer and Thompson, 1992]. However, performance of MCMC-MLE is influ-

enced by the choice of initial parameter value, θ(0). If θ(0) is not close to the true value, then the

method may converge to a suboptimal solution [Goldenberg et al., 2010]. In Bayesian approach,

intractable normalizing constant in the likelihood function makes the standard MCMC unfeasi-

ble. The standard M-H algorithm to update parameters from θ to θ′ depends on the acceptance

probability,

α(θ′|θ) = min

1,π(θ′)µ(q, G|θ′)T1(θ|θ′)

π(θ)µ(q, G|θ)T1(θ′|θ)

, (40)

where T1(θ′|θ) denotes the symmetric proposal density for the independent M-H draw, i.e., T1(θ

′−θ) = T1(θ − θ′). In the above acceptance probability, two normalizing terms in µ(q, G|θ′) and

µ(q, G|θ) do not cancel each other and therefore, α(θ′|θ) can not be calculated.

The exchange algorithm [Murray et al., 2006] (and a similar algorithm by Møller et al. [2006])

provides a way to bypass the evaluation of intractable normalizing constants in the M-H ac-

ceptance probability. The name “exchange” is given due to its similarity with the swapping

operation of exchange Monte Carlo [Geyer, 1991]. It is different from the conventional M-H

algorithm by having the proposal density T1(θ′|θ)µ(q′, G′|θ′), which involves simulation of auxil-

iary data (q′, G′) from the distribution µ(q′, G′|θ′). The acceptance probability of the exchange

algorithm can be written down as

α(θ′|θ,q′, G′) = min

1,π(θ′)µ(q, G|θ′)

π(θ)µ(q, G|θ) · T1(θ|θ′)µ(q′, G′|θ)

T1(θ′|θ)µ(q′, G′|θ′)

. (41)

Intractable normalizing terms in Equation (41) are canceled out and thus the acceptance prob-

ability α(θ′|θ,q′, G′) can be computed.

A problem of the exchange algorithm is that it requires a perfect sampler of G′ and q′ from

µ(·|θ′)[cf. Propp and Wilson, 1996], which is computationally impossible for most of ERGM

applications. To overcome this issue, Liang [2010] proposed the double M-H (DMH) algorithm

to replaces the perfect sampler with a finite M-H chain initialized at the observed network. In

this paper, we apply the DMH algorithm to estimate the endogenous network case. Details of

implementing the DMH algorithm is provided in supplementary Appendix F.1. Also note that for

the computational implementation of the DMH algorithm, we will have to evaluate the potential

function many times after a link has been added or removed in the network G. To do this in an

efficient way, we use a matrix perturbation result shown in supplementary Appendix F.4.

In order to further lower the computational cost of DMH, we assume that during the dynamic

network formation process, whenever a firm change its R&D collaboration links, all firms adjust

output levels immediately and thus the new output levels would be close to the profit-maximizing

40Other than the difficulty of estimation, the most basic exponential random graphs are statistically equivalentto an Erdos-Reny random graph in the limit of large n unless the model contains at least one non-trivial negativenetwork externality effect [cf. Bhamidi et al., 2011; Chatterjee et al., 2013; Mele, 2010b]. We show in Appendix 4how the introduction of various forms or firm heterogeneity leads to correlated networks with structural propertiesthat differ significantly from an Erdos-Reny random graph.

26

Page 27: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

output given by the best response q∗ = M(G)−1η but with a stochastic error. The size of error,

according to the approximation used in our exogenous case, should be determined by M(G)−1.

This assumption means that we impose two different time scales: the fast time scale of output

adjustments, and the slow time scale of link adjustments [cf. Gardiner, 2004; Khalil, 2002].41 In

other words, this means that the rate at which firms adjust their output levels is much faster than

the rate at which they adjust their links. This assumption saves firm’s infinitesimal adjustments

on output and turns it into an implicit function of the network.

Estimation Results of DMH for endogenous network formation are shown in Table 2. We

observe that the complementarity effect (captured by ρ) significantly drops when moving from

the exogenous to the endogenous network case. This finding suggests that the estimate of com-

plementarity effect in the exogenous case is subject to an endogeneity bias due to self-selection of

R&D collaborations. The joint modeling approach on q and G in the endogenous case allows us

to capture firms’ unobserved fixed effects in both of R&D link formation (and removal) process

and output adjustment processes, and therefore, common unobserved factors that contribute to

the three processes are under control. The estimate of the complementarity effect in the output

adjustment can then get rid of any endogeneity bias due to omitted variables.

Although the DMH algorithm alleviates the computation burden of perfect sampling, however,

convergence of the finite M-H chain in the DMH is not theoretically guaranteed. Therefore, the

resulting estimates are only approximately correct no matter how long the algorithm are run.

Especially, if the network distribution represented by ERGM is multimodal, the finite M-H run

may be trapped at one of the local maximums. Consequentially, the Markov chain of DMH may

mix very slowly and require unaffordable computation time for achieving convergence [Bhamidi

et al., 2011].42

6.4.3. Adaptive Exchange Algorithm

In this paper, we further employ the adaptive exchange (AEX) algorithm, which is recently

developed by Jin et al. [2013] and Liang et al. [2015], to overcome the uncertainty of slow mixing

faced by the DMH algorithm. The foundation of AEX is a MCMC sampling algorithm called

stochastic approximation Monte Carlo (SAMC) [Liang et al., 2007]. The main feature of SAMC

is that it applies importance sampling to prevent the local trap problem.43 In SAMC, the sample

space is partitioned into non-overlapping subregions. Different importance weights are assigned

41Observe that the stationary distribution µ(q, G) in Equation (15) does not depend on the parameter χ thatgoverns the speed of the output adjustment. This means that the stationary distribution µ(q, G) is independentof how quickly the output levels adjust.

42To overcome the local trap problem and thus speed up convergence during network simulation, Snijders [2002a]and Mele [2010b] proposed M-H samplers which consist of both local and non-local steps. In the local step, onlyone random edge is updated in an iteration. With a certain probability, the sampler will implement the non-localstep where multiple (or even all) edges are updated in an iteration. Although Mele [2010b] used a simulation studyto demonstrate convergence of the M-H sampler with a non-local step, the simulation result is based on a simplemodel specification with one indirect link effect and may not be extended to a more general case. Therefore, itis still questionable whether the finite M-H runs with non-local steps can always achieve convergence for generalERGMs.

43Other than SAMC, the parallel tempering algorithm [Geyer, 1991] is a global updating scheme to prevent thelocal trap problem. The parallel tempering algorithm runs N copies of the Markov chain with random initial valuesand temperatures. In the low temperature chain, draws may be trapped in the local maximum configuration withrestricted moves. The parallel tempering allows the low temperature chains to swap configurations with the hightemperature chains where global moves are more probable. A well-designed parallel tempering can be N timesmore efficient than a standard single-temperature MCMC algorithm.

27

Page 28: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

to each of subregions so that SAMC draw samples from a kind of “mixture distribution.” The

draws from this mixture distribution will yield a uniform distribution of the energy function in

our target distribution instead of being trapped by the local energy barrier. Additionally, SAMC

contains a self-adjusting mechanism to the weight of each subregion so that it can escape from

local energy minima very quickly.

In the AEX algorithm, two chains are running in parallel. In the first chain, we draw auxiliary

data (q, G) from a family of distributions µ(q, G|θ1), · · · , µ(q, G|θm) with frequencies determined

by the SAMC algorithm, where (θ1, · · · ,θm) are pre-specified parameter points. In practice,

we set m = 50 and (θ1, · · · ,θ50) are chosen from the quantiles of the DMH draws.44 In the

second chain, we implement the exchange algorithm for updating the target parameter, where an

auxiliary data q′ and G′ are re-sampled from the first chain by an importance sampling procedure.

Convergence of the AEX algorithm, i.e., q′ and G′ from the importance sampling procedure of

the AEX converges to µ(·|θ′) and the draws of θ from the target chain will converge weakly to the

posterior µ(θ|q, G), can be shown when the number of iterations of both the auxiliary and target

chains goes to infinity. Details of implementing the AEX algorithm and the proof of convergence

are provided in F.2 and F.3.

44We have also tried m = 30 and m = 100 and the results are largely similar.

28

Page 29: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Table 2: Estimation results.

Homogeneous Heterogenous (Jaffe) Heterogenous (Mahalanobis)

Exogenous Endogenous Exogenous Endogenous Exogenous Endogenous

ρ 0.1027∗∗∗ 0.0175∗∗∗ 0.1078∗∗∗ 0.1085∗∗∗ 0.3705∗∗∗ 0.0160∗∗∗ 0.3720∗∗∗ 0.3539∗∗∗ 0.1514∗∗∗ 0.0084∗∗∗ 0.1526∗∗∗ 0.1454∗∗∗

(0.0024) (0.0015) (0.0036) (0.0044) (0.0125) (0.0049) (0.0181) (0.0246) (0.0049) (0.0020) (0.0070) (0.0100)b 0.0012∗∗∗ - 0.0012∗∗∗ 0.0011∗∗∗ 0.0009∗∗∗ - 0.0009∗∗∗ 0.0010∗∗∗ 0.0009∗∗∗ - 0.0010∗∗∗ 0.0009∗∗∗

(0.0001) - (0.0002) (0.0003) (0.0001) - (0.0002) (0.0003) (0.0001) - (0.0002) (0.0003)Prod. 0.8219∗∗∗ - 0.8309∗∗∗ 0.7921∗∗∗ 0.9085∗∗∗ - 0.9036∗∗∗ 0.8965∗∗∗ 0.9061∗∗∗ - 0.8926∗∗∗ 0.8938∗∗∗

(0.0465) - (0.0509) (0.0566) (0.0457) - (0.0505) (0.0579) (0.0461) - (0.0500) (0.0581)Sector FE Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes Yes

Cost function

Constant - 10.6807∗∗∗ 92.7542∗∗∗ 89.3585∗∗∗ - 11.7707∗∗∗ 89.4861∗∗∗ 91.9237∗∗∗ - 11.7299∗∗∗ 91.3462∗∗∗ 90.0079∗∗∗

- (0.4618) (7.3171) (7.9234) - (0.5181) (7.2794) (3.1671) - (0.4992) (7.5306) (8.1606)Same Sector - -2.1720∗∗∗ -15.9644∗∗∗ -13.2568∗∗∗ - -2.3482∗∗∗ -13.0264∗∗∗ -11.6918∗∗∗ - -2.3600∗∗∗ -14.2979∗∗∗ -12.0715∗∗∗

- (0.2688) (4.0742) (4.4738) - (0.2807) (3.8763) (2.7640) - (0.2582) (3.7335) (4.5575)Same Country - -0.5712∗∗ -3.4300 -3.4221 - -0.5545∗∗ -3.4618 -3.4166 - -0.5775∗∗ -2.8818 -2.6522

- (0.1733) (2.5809) (2.9108) - (0.1680) (2.6383) (2.7218) - (0.1687) (2.6258) (3.2253)Diff-in-Prod. - -1.3247∗∗∗ -8.0760∗∗ -9.2376∗∗ - -1.2537∗∗∗ -8.0981∗ -14.4721∗∗∗ - -1.2386∗∗∗ -8.5047∗∗ -10.7669∗∗

- (0.2900) (4.0175) (4.1551) - (0.3041) (4.5732) (4.3019) - (0.3068) (4.2776) (5.8749)Diff-in-Prod. Sq - 0.3088∗∗∗ 2.1773∗∗ 2.4786∗∗ - 0.2928∗∗∗ 2.1342 4.1343∗∗∗ - 0.2914∗∗∗ 2.3086∗ 3.0454∗

- (0.0865) (1.2089) (1.2735) - (0.0878) (1.2887) (1.4933) - (0.0904) (1.2440) (1.8723)Patents - -0.0998∗∗∗ -1.4779∗∗∗ -1.2138∗∗∗ - -0.2582 -1.5730∗∗∗ -1.4504∗∗∗ - -0.2509∗∗∗ -1.6025∗∗∗ -1.4090∗∗∗

- (0.0290) (0.3783) (0.4592) - (0.0281) (0.3740) (0.3454) - (0.0275) (0.4182) (0.4714)Cyclic Effect - - - -14.5055∗∗∗ - - - -18.2168∗∗∗ - - - -18.6769∗∗∗

- - - (1.9773) - - - (2.5662) - - - (2.4141)

Note: The estimation results are based on 351 firms from the SIC-28 sector. The estimates reported are the posterior mean and posterior standard deviation (in parenthesis) from50, 000 MCMC draws. The asterisks ∗∗∗(∗∗,∗) indicate that its 99% (95%, 90%) highest posterior density range does not cover zero. Heterogeneous spillovers are captured by thetechnological proximity matrix with elements fk

ij of Equations (29) for k = J and (30) for k = M [cf. Bloom et al., 2007, 2013; Jaffe, 1989]. Productivity is measured by dividing afirm’s sales (measured by 100,000 US dollars) with the firm’s employment number. Patents is the sum of both firm i’s and firm j’s accumulated patent stocks.

29

Page 30: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

6.4.4. Model Fit

To see whether the network model that we propose fit the observed network data well or not,

we adopt a model goodness-of-fit examination following Hunter et al. [2008]. We simulate one

hundred artificial networks from our network formation model with parameters evaluated at the

posterior mean of the parameters generated from the MCMC draws. Model fitness is examined

by the similarity between simulated networks and observed networks in the distribution of four

network statistics45 – degree, edge-wise shared partner, minimum geodesic distance and average

nearest neighbor connectivity. The examination result is shown in Figure 7. We present the

distribution for the observed network by solid blue curves, distributions for simulated networks

by box plots and the 5th and 95th percentiles by black dot lines. From the figure we can observe

that simulated networks and the observed network display similar distributions on these five

statistics, which suggests that our network model and parameter estimates successfully imitate

the unobserved network generating process.

7. Policy Analysis

With our estimates from the previous section (Table 2) we are now able to perform various

counterfactual policy studies. The first, discussed in Section 7.1, studies the impact on welfare of

a firm exit from the network. The second, discussed in Section 7.2, analyzes the welfare impact

of a merger between firms in the same sector. The third policy intervention, discussed in Section

7.3, studies the welfare impact of a subsidy on the collaboration costs between pairs of firms, and

aims at identifying the pair for which the subsidy yields the highest welfare gains.

7.1. Firm Exit and Key Players

In this section we evaluate the expected welfare loss from the exit of a firm from the network.

The firm whose exit results in the highest expected welfare loss is termed the “key player”.

This counterfactual policy analysis is related to Ballester et al. [2006], who perform a key player

analysis where agents are ranked according to the reduction in aggregate output when they are

removed from the network, and Konig et al. [2014b] who do this for the reduction in welfare

similar to our setup. However, while these authors have assumed that the network is exogenously

given and does not adapt to the exit of a firm, here we can relax this assumption and allow the

network to reconfigure endogenously after the removal of a firm.

Formally, the key firm is defined as

i∗ = argmaxi∈N

W (q, G) −

G∈Gn−1

q∈Qn−1

W−i(G,q)µϑ(q, G)

, (42)

45The comparison based on the first four network statistics is proposed in Hunter et al. [2008]. The de-gree is a basic measure of an individual’s centrality in the network and its distribution consists of the valuesDEg(0)/mg , · · · , DEg(mg − 1)/mg, where DEg(k) denotes the number of individuals who share edges with ex-actly k other individuals in network g. The edge-wise shared partner contains important information of a net-work, for example, the count of triangles in a network is a function of it. Its distribution consists of valuesEPg(0)/Eg, · · · , EPg(mg − 2)/Eg , where EPg(k) denotes the number of edges whose endpoints both share edgeswith exactly k other individuals and Eg is the total number of edges in network g. The minimum geodesic distanceis a higher-order network statistics, which will not be captured by the previous three statistics. Its distributionconsists of the proportions of the possible values of geodesic distance between two nodes.

30

Page 31: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

degree

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7pr

opor

tion

of n

odes

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18

minimum geodistic distance

0

0.005

0.01

0.015

0.02

0.025

prop

ortio

n of

dya

ds

0 1 2 3 4

edge-wise shared partner

0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

prop

ortio

n of

edg

es

0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

Degree

0

1

2

3

4

5

6

7

8

9

10

Ave

. Nea

rest

Nei

ghbo

r C

onne

ctiv

ity

Figure 7: Goodness-of-fit examination.

where the probability measure µϑ(q, G) is given by Equation (15), the potential function from

Equation (31), the welfare function W (q, G) is defined in Equation (25) and W−i(G,q) denotes

the welfare function with firm i removed from the network. As the first term in Equation (42)

does not depend on i, the key player can be equivalently written as

i∗ = argmini∈N

G∈Gn−1

q∈Qn−1

W−i(G,q)µϑ(q, G).

The results for the key player analysis can be seen in Table 3. We consider a case of exogenously

fixed network, which can be interpreted as a short run analysis, as well as allowing the network

to dynamically adjust to the exit of a firm, which can be interpreted as a long run analysis. For

the key firm, Pfizer Inc., an American global pharmaceutical corporation which is among the

world’s largest pharmaceutical companies, the reduction in welfare due to exit amounts to 8.3%

when we assume that the network is fixed, and 9.3% when we allow for an adaptive network.

Thus, network adaptivity and endogeneity amplify the impact of firm exit, so that studies that

assume an exogenous network typically underestimate the effect of the exit of a firm. Moreover,

we observe that the welfare loss ignoring the network structure altogether is much lower than the

welfare loss incurred in the presence of the R&D collaboration network.

Table 3 also illustrate that the most important firms are not necessarily the ones with the high-

est market share, number of patents or degree, nor can they be identified with standard centrality

measures in the literature (such as the betweenness or eigenvector centrality; see Wasserman and

Faust [1994]). Rather, our results illustrate that in order to identify the key firms that are system-

31

Page 32: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Table 3: Key player ranking for the chemicals and allied products sector with SIC code 28.

Firm Mkt. Sh. [%]a Patents Degree ∆W [%]b ∆WF [%]c ∆WN [%]d SIC Rank

Pfizer Inc. 2.7679 7844 15 -9.3298 -8.3045 -0.8576 283 1Novartis 2.0690 18924 15 -8.8432 -8.3180 -0.7028 283 2Amgen 0.8190 1885 13 -7.2453 -6.6007 -0.8979 283 3Bayer 3.8340 133433 10 -7.2364 -6.6212 -0.7721 280 4Merck & Co. Inc. 1.2998 52847 10 -5.8206 -4.3798 -0.8091 283 5Medarex Inc. 0.0028 168 9 -5.8066 -4.4589 -0.6733 283 6Exelixis 0.0056 283 7 -4.5366 -3.6320 -0.9021 283 7Xoma 0.0017 648 7 -4.2508 -3.2874 -0.6676 283 8Takeda Pharm. Co. Ltd. 0.6444 19460 7 -4.2237 -3.2980 -1.0186 283 9Daiichi Sankyo Co. Ltd. 0.4589 585 5 -4.1757 -3.0476 -0.8889 283 10Johnson & Johnson Inc. 3.0540 1212 7 -3.8065 -3.2701 -0.8189 283 11Bristol-Myers Squibb Co. 1.0287 6308 6 -3.6630 -3.2622 -0.8864 283 12Infinity Pharm. Inc. 0.0010 44 4 -3.6384 -2.3588 -0.5220 283 13Cell Genesys Inc. 0.0001 236 5 -3.4823 -1.9466 -0.4028 283 14Clinical Data Inc. 0.0036 6 4 -3.3530 -2.1821 -0.5455 283 15Compugen Ltd. 0.0001 22 5 -3.3255 -2.3133 -0.5922 283 16Dyax Corp. 0.0007 227 6 -3.1594 -2.4439 -0.5451 283 17MorphoSys AG 0.0038 20 4 -3.0709 -2.0938 -0.5237 283 18Seattle Genetics Inc. 0.0006 28 3 -2.6998 -1.2876 -0.5150 283 19Avalon Pharm. Inc. 0.0002 27 3 -2.6693 -1.2833 -0.5503 283 20

a Market share in the primary 3-digit sector in which the firm is operating.b The relative welfare loss due to exit of a firm i is computed as ∆W =(

E[W−i(G,q)]−W (qobs, Gobs))/W (qobs, Gobs), where qobs and Gobs denote the observed R&D expendi-

tures and network, respectively.c ∆WF denotes the relative welfare loss due to exit of a firm assuming a fixed network G of R&D collaborations.d ∆WN denotes the relative welfare loss due to exit of a firm in the absence of a network G of R&D collaborations.

ically relevant, we need to consider not only the market structure but also the spillovers generated

though a network of R&D collaborations, and this network must be allowed to dynamically adjust

upon the exit of a firm.

7.2. Mergers and Acquisitions

Our framework can also be used to study alternative counterfactual scenarios, such as analyzing

mergers and acquisitions, and their impact on welfare [cf. Cho, 2013; Farrell and Shapiro, 1990;

Kim and Singal, 1993]. Market concentration indices are not adequate to correctly account for

the network effect of a merger on welfare. This is because the effect of a merger of two firms on

industry profits, consumer surplus and overall welfare depends not only on the market structure

(as for example in Bimpikis et al. [2014]), but also on the architecture of the R&D collabora-

tion network between firms. This counterfactual analysis is therefore potentially important for

antitrust policy makers.

Based on our model we can assess the impact of a merger between two firms i and j which

are competing in the same market. The merger that results (in expectation) in the greatest

reduction in welfare is termed the “key merger” (in analogy to the previous section). Formally,

the key merger is defined as

(i, j)∗ = argmaxi,j∈N

W (q, G)−

G∈Gn−1

q∈Qn−1

W i∪j(G,q)µϑ(q, G)

, (43)

where the probability measure µϑ(q, G) is given by Equation (15), the potential function from

Equation (31), the welfare function W (q, G) is defined in Equation (25) and W i∪j(G,q) denotes

32

Page 33: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

the welfare function with firms i and j being merged to a single firm k in the network G. That is,

two incident nodes, i and j in G, are merged into a new node k, where each of the edges incident

to k correspond to an edge incident to either i or j.

As the first term in Equation (43) does not depend on i or j, the key merger can be equivalently

written as

(i, j)∗ = argmini,j∈N

G∈Gn−1

q∈Qn−1

W i∪j(G,q)µϑ(q, G).

The empirical merger rankings can be found in Table 4. The ranking for a fixed network is based

on the assumption that the network does not adjust to the merger and can be interpreted as a

short run analysis, while the ranking for an endogenous network adapts to the merger and can

be interpreted as a long run analysis. The mergers resulting in the highest reduction in welfare

involve only firms in the drugs development sector (SIC 283), which is also the largest sector in

our sample. The highest loss in the endogenous network case is triggered by a merger between

Cubist Pharmaceuticals, a U.S. biopharmaceutical company that was ranked 19th in total sales

among American based biotechnology companies in 2006, and Novartis, a Swiss multinational

pharmaceutical company which has recently been ranked number one in sales among the world-

wide industry, yielding a reduction in welfare of 2.9%. The dominant position of these firms in

the network amplifies the effect of a merger between them. Consequently we observe that the

welfare loss ignoring the network structure is much lower than the welfare loss incurred in the

presence of the R&D collaboration network. However, the endogeneity of the network tends to

reduce the welfare loss of a merger between two firms. This is opposite to the key player analysis

that we have performed in the previous section. Further, the welfare loss due to a merger is

typically lower than the welfare loss due to firm exit.

7.3. R&D Collaboration Subsidy

Many governments provide R&D subsidies to foster R&D collaborations between private firms

[cf. e.g. Cohen, 1994; Czarnitzki et al., 2007]. One example is the Advanced Technology Program

(ATP) which was administered by the National Institute of Standards and Technology (NIST)

in the United States. In Europe, EUREKA is a Europe-wide network for industrial R&D. The

main aim of this EU programme is to strengthen European competitiveness in the field of R&D

by means of promoting market-driven collaborative research and technology development. In

Germany the federal government provides R&D subsidies to stimulate collaboration activities

between private organizations [Broekel and Graf, 2012]. In this section we analyze the impact of

changes in the collaboration cost due to R&D subsidies on aggregate welfare.46

We analyze a counterfactual policy that selects a specific firm-pair i, j and compensates their

collaboration costs through a subsidy, i.e. setting ψij = 0. We then can evaluate the changes in

welfare due to such subsidies using our estimated model. The pair of firms for which the subsidy

results in the largest gain in welfare is defined as

(i, j)∗ = argmaxi,j∈N

G∈Gn

q∈Qn

W (q, G|ψij = 0)µϑ(q, G). (44)

46For a theoretical and empirical analysis of R&D subsidies provided by the Finnish Funding Agency for Tech-nology and Innovation see Takalo et al. [2013a,b].

33

Page 34: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Table 4: Key merger ranking for the chemicals and allied products sector with SIC code 28.

Firm i Firm j Mkt. Sh. i [%]a Mkt. Sh. j [%] Pat. i Pat. j di dj ∆W [%]b WF [%]c ∆WN [%]d SIC i SIC j Rank

Cubist Pharm. Novartis 0.0111 2.0690 110 18924 3 15 -2.8969 -2.0660 -0.7619 283 283 1Genzyme Corp. Amgen 0.1830 0.8193 1116 1885 3 13 -2.5459 -1.3878 -0.7632 283 283 2Exelixis Johnson & Johnson Inc. 0.0056 3.0546 283 1212 7 7 -2.5268 -1.2219 -0.4474 283 283 3Bristol-Myers Squibb Co. Merck & Co. Inc. 1.0287 1.2998 6308 52847 6 10 -2.4261 -1.3795 -0.7249 283 283 4Curagen Pfizer Inc. 0.0022 2.7679 174 7844 3 15 -2.3271 -1.4521 -0.4928 283 283 5Wyeth Johnson & Johnson Inc. 1.1686 3.0546 7782 1212 2 7 -2.2562 -1.0655 -0.7009 283 283 6Schering-Plough Corp. Amgen 0.6056 0.8193 52847 1885 1 13 -2.1932 -1.5515 -0.6825 283 283 7Human Genome Sciences Inc. Daiichi Sankyo Co. Ltd. 0.0014 0.4589 898 585 2 5 -2.1223 -1.0404 -0.5280 283 283 8Allergan Inc Merck & Co. Inc. 0.1759 1.2998 1900 52847 3 10 -2.0790 -0.5998 -0.5419 283 283 9Neurocrine Biosciences Amgen 0.0022 0.8193 292 1885 1 13 -2.0708 -0.9019 -0.7423 283 283 10Eisai Amgen 0.3329 0.8193 4541 1885 1 13 -2.0107 -0.8416 -0.8827 283 283 11Nektar Therapeutics Pfizer Inc. 0.0125 2.7679 730 7844 2 15 -2.0081 -1.0424 -0.4555 283 283 12Mitsubishi Tanabe Pharm. Corp. Takeda Pharm. Co .Ltd. 0.0877 0.6444 5296 19460 1 7 -2.0009 -0.6775 -0.5120 283 283 13Chugai Pharm. Novartis 0.1610 2.0690 3944 18924 1 15 -1.9140 -1.1375 -0.9380 283 283 14Mitsubishi Tanabe Pharm. Corp. Daiichi Sankyo Co. Ltd. 0.0877 0.4589 5296 585 1 5 -1.8792 -0.3887 -0.9808 283 283 15Daiichi Sankyo Co Ltd Takeda Pharm. Co. Ltd. 0.4589 0.6444 585 19460 5 7 -1.8743 -0.7133 -0.6755 283 283 16Sepracor Amgen 0.0687 0.8193 1473 1885 1 13 -1.8571 -1.0033 -0.8965 283 283 17XenoPort Inc. Medarex Inc. 0.0006 0.0027 23 168 1 9 -1.8254 -0.4382 -0.3503 283 283 18Millennium Pharm. Johnson & Johnson Inc 0.0279 3.0546 1171 1212 4 7 -1.8243 -0.8038 -0.8367 283 283 19Cell Genesys Inc. Human Genome Sc. Inc. 0.0001 0.0014 236 898 5 2 -1.8231 -0.2929 -0.6207 283 283 20

a Market share in the primary 3-digit sector in which the firm is operating.b The relative welfare loss due to a merger of firms i and j is computed as ∆W =

(E[W i∪j(G,q)]−W (qobs, Gobs)

)/W (qobs, Gobs), where qobs and Gobs denote the observed R&D

expenditures and network, respectively.c ∆WF denotes the relative welfare loss due to a merger of firms assuming a fixed network G of R&D collaborations.d ∆WN denotes the relative welfare loss due to a merger of firms in the absence of a network G of R&D collaborations.

34

Page 35: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

where the probability measure µϑ(q, G) is given by Equation (15), the welfare function W (q, G)

is defined in Equation (25) and W (q, G|ψij = 0) denotes the welfare function with firms i and j

receiving a subsidy such that they do not incur a pair-specific collaboration cost (by setting ψij =

0). The results can be seen in Table 5. We find that a subsidy between Isis Pharmaceuticals Inc.,

a U.S. pharmaceutical company developing cholesterol-reducing drugs, and Allergan Inc., a global

pharmaceutical company, would yield a welfare gain of 0.8 %. The welfare gain is slightly weaker

when we impose an exogenous network. Our framework could be used to guide governmental

funding agencies that typically do not take into account the total spillovers generated within a

dynamic R&D network structure.

8. Conclusion

We have introduced a tractable model for the formation of R&D collaboration networks in which

firms are competitors on the product market. We show that the model can be extended along

several directions including firm heterogeneity in productivity and patent portfolios, and we find

that the model can reproduce the observed patterns in real world R&D networks. Moreover, the

model can be conveniently estimated using a novel statistical algorithm. Based on our estimates

we then perform a dynamic key player analysis, which allows us to identify the firms whose exit

would reduce welfare the most, as well as a dynamic merger analysis, taking into account that the

network and production dynamically adjust to such interventions. Due to the generality of the

payoff function we consider in Equation (8), we believe that our model – both from a theoretical

and empirical perspective – can be applied to a variety of related contexts where externalities

can be modelled in the form of an adaptive network such as peer effects in education, crime, risk

sharing, scientific co-authorship, etc. [cf. Jackson et al., 2015].

35

Page 36: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Table 5: Key subsidy ranking for the chemicals and allied products sector with SIC code 28.

Firm i Firm j Mkt. Sh. i [%]a Mkt. Sh. j [%] Pat. i Pat. j di dj ∆W [%]b ∆WF [%]c SIC i SIC j Rank

Isis Pharmaceuticals Inc. Allergan Inc. 0.0014 0.1759 2541 1900 4 3 0.7756 0.7411 283 283 1Shionogi & Co. Ltd. Pfizer Inc. 0.0986 2.7670 3082 7844 0 15 0.6938 0.7674 283 283 2Linde Rohm & Haas Co. 4.5391 1.4583 11347 29824 0 0 0.6581 0.5149 281 282 3Gedeon Richter Pharmion Corporation 0.0571 0.0137 1794 3 0 0 0.6575 0.5706 283 283 4Agennix Endo Pharm. Holdings 0.0016 0.0522 13 37 0 0 0.6483 0.5367 283 283 5Orion Asahi Kasei Corp. 0.0224 1.4710 126 9075 0 0 0.6405 0.5662 283 280 6Agennix Bayer 0.0016 3.8340 13 133433 0 10 0.6394 0.8008 283 280 7Mochida Pharm. Co. Agennix 0.0365 0.0016 575 13 0 0 0.6168 0.5822 283 283 8Merck KGaA Unilever PLC 0.4515 5.4914 68 14881 3 0 0.6116 0.6755 283 284 9bioMerieux SA Pfizer Inc. 0.0748 2.7679 351 7844 1 15 0.5977 0.9545 283 283 10Biosite Incorporated ZEON Corp. 0.0177 0.4290 182 3628 3 0 0.5973 0.7126 283 282 11Eisai Wyeth 0.3329 1.1686 4541 7782 1 2 0.5950 0.6305 283 283 12Mitsubishi Tanabe Pharma Corp. Akzo Nobel NV. 0.0877 11.7495 5296 11366 1 2 0.5864 0.7373 283 285 13Kaocorp Daiichi Sankyo Co. Ltd. 1.1678 0.4589 18213 585 0 5 0.5776 0.6603 284 283 14Mochida Pharm. Co. JSR Corp. 0.0365 0.5574 575 4153 0 0 0.5764 0.5625 283 282 15Asahi Kasei Corporation Novozymes A/S 0.2282 0.0657 9075 4829 0 1 0.5734 0.5442 289 283 16Lonza Group Ltd. Chugai Pharm. 0.1923 0.1610 20 3944 0 1 0.5725 0.5816 280 283 17NPS Pharm. Hospira Inc. 0.0027 0.1543 311 112 0 0 0.5653 0.4875 283 283 18Biosite Incorporated Mochida Pharm. Co. 0.0177 0.0365 182 575 3 0 0.5647 0.6138 283 283 19Kemira Oyj Amgen 2.0870 0.8193 894 1885 0 13 0.5636 0.7277 289 283 20

a Market share in the primary 3-digit sector in which the firm is operating.b The relative welfare gain due to subsidizing the R&D collaboration costs between firms i and j is computed as ∆W =

(E[W (q,G|ψij = 0)]−W (qobs, Gobs)

)/W (qobs, Gobs),

where qobs and Gobs denote the observed R&D expenditures and network, respectively.c ∆WF denotes the relative welfare loss due to a merger of firms assuming a fixed network G of R&D collaborations.

36

Page 37: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

References

Ahuja, G. (2000). Collaboration networks, structural holes, and innovation: A longitudinal study. AdministrativeScience Quarterly, 45:425–455.

Aliprantis, C. and Border, K. (2006). Infinite dimensional analysis: a hitchhiker’s guide. Springer Verlag.Anderson, D. F. (2012). An efficient finite difference method for parameter sensitivities of continuous time markov

chains. SIAM Journal on Numerical Analysis, 50(5):2237–2258.Anderson, S., Goeree, J., and Holt, C. (2004). Noisy directional learning and the logit equilibrium. Scandinavian

Journal of Economics, 106(3):581–602.Anderson, S. P., De Palma, A., and Thisse, J. F. (1992). Discrete choice theory of product differentiation. MIT

Press.Anderson, S. P., Goeree, J. K., and Holt, C. A. (2001). Minimum-effort coordination games: Stochastic potential

and logit equilibrium. Games and Economic Behavior, 34(2):177–199.Atalay, E., Hortacsu, A., Roberts, J., Syverson, C., 2011. Network structure of production. Proceedings of the

National Academy of Sciences 108 (13), 5199.Axtell, R. (2001). Zipf distribution of US firm sizes. Science, 293(5536):1818.Badev, A. (2013). Discrete games in endogenous networks: Theory and policy. University of Pennsylvania Working

Paper PSC 13–05.Ballester, C., Calvo-Armengol, A., and Zenou, Y. (2006). Who’s who in networks. wanted: The key player.

Econometrica, 74(5):1403–1417.Banerjee, A. and Duflo, E. (2005). Growth theory through the lens of development economics. Handbook of

Economic growth, 1:473–552.Baum, J., Cowan, R., and Jonard, N. (2009). Network-independent partner selection and the evolution of innovation

networks. UNU-MERIT working paper series; 2009-022.Belderbos, R., Carree, M. and Lokshin, B. (2004). Cooperative R&D and firm performance. Research policy,

33:1477–1492.Bellman, R. E. (1970). Introduction to matrix analysis, volume 960. SIAM.Ben-Akiva, M. and Watanatada, T. (1981). Application of a continuous spatial choice logit model. Structural

analysis of discrete data with econometric applications, pages 320–343, MIT Press Cambridge, MA.Berliant, M. and Fujita, M. (2008). Knowledge creation as a square dance on the hilbert cube. International

Economic Review, 49(4):1251–1295.Berliant, M. and Fujita, M. (2009). Dynamics of knowledge creation and transfer: The two person case. Interna-

tional Journal of Economic Theory, 5(2):155–179.Bernstein, J. I., 1988. Costs of production, intra-and interindustry R&D spillovers: Canadian evidence. Canadian

Journal of Economics, 324–347.Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. Journal of the Royal Statistical

Society. Series B (Methodological), pages 192–236.Bhamidi, S., Bresler, G., Sly, A., et al. (2011). Mixing time of exponential random graphs. The Annals of Applied

Probability, 21(6):2146–2170.Bimpikis, K., Ehsani, S., and Ilkilic, R. (2014). Cournot competition in networked markets. Mimeo, Stanford

University.Bisin, A., Horst, U., and Ozgur, O. (2006). Rational expectations equilibria of economies with local interactions.

Journal of Economic Theory, 127(1):74–116.Bloom, N., Schankerman, M., and Van Reenen, J. (2007). Identifying technology spillovers and product market

rivalry. Technical report, NBER Working Paper No. 13060.Bloom, N., Schankerman, M., Van Reenen, J., 2013. Identifying technology spillovers and product market rivalry.

Econometrica 81 (4), 1347–1393.Bloznelis, M. (2013). Degree and clustering coefficient in sparse random intersection graphs. The Annals of Applied

Probability, 23(3):1254–1289.Blume, L. (2003). How noise matters. Games and Economic Behavior, 44(2):251–271.Blundell, R., Griffith, R., and Van Reenen, J. (1995). Dynamic count data models of technological innovation. The

Economic Journal, 105(429):333–344.Boguna, M. and Pastor-Satorras, R. (2003). Class of correlated random networks with hidden variables. Physical

Review E, 68(3):036112.Bollobas, B., Janson, S., and Riordan, O. (2007). The phase transition in inhomogeneous random graphs. Random

Structures & Algorithms, 31(1):3–122.Bollobas, B., Riordan, O., Spencer, J., and Tusnady, G. (2001). The degree sequence of a scale-free random graph

process. Random Structures & Algorithms, 18(3):279–290.Bonacich, P. (1987). Power and centrality: A family of measures. American Journal of Sociology, 92(5):1170.Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge University Press.Bramoulle, Y., Kranton, R., and D’Amours, M. (2014). Strategic interaction and networks. The American Economic

Review, 104, 898–930.Britton, T., Deijfen, M., and Martin-Lof, A. (2006). Generating simple random graphs with prescribed degree

distribution. Journal of Statistical Physics, 124(6):1377–1397.Brock, W. and Durlauf, S. (2001). Discrete choice with social interactions. Review of Economic Studies, 68(2):235–

260.Broekel, T. and Graf, H. (2012). Public research intensity and the structure of german R&D networks: a comparison

of 10 technologies. Economics of Innovation and New Technology, 21(4):345–372.Brualdi, R. A., Solheid Ernie, S., 1986. On the Spectral Radius of Connected Graphs. Publications de l’ Institute

Mathematique, 53, 45–54.Byong-Hun, A., 1983. Iterative methods for linear complementarity problems with upperbounds on primary vari-

ables. Mathematical Programming 26 (3), 295–315.Cabrales, A., Calvo-Armengol, A., and Zenou, Y. (2010). Social interactions and spillovers. Games and Economic

Behavior.Caldarelli, G., Capocci, A., De Los Rios, P., and Munoz, M. A. (2002). Scale-free networks from varying vertex

intrinsic fitness. Physical Review Letters, 89(25):258702.

37

Page 38: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Calvo, G. A. (1983). Staggered prices in a utility-maximizing framework. Journal of Monetary Economics,12(3):383–398.

Calvo-Armengol, A., Patacchini, E., and Zenou, Y. (2009). Peer effects and social networks in education. Reviewof Economic Studies, 76:1239–1267.

Casella, G. and Berger, R. L. (2001). Statistical Inference. Duxbury Press.Chandrasekhar, A. and Jackson, M. (2012). Tractable and consistent random graph models. Available at SSRN

2150428.Chatterjee, S., Diaconis, P., et al. (2013). Estimating and understanding exponential random graph models. The

Annals of Statistics, 41(5):2428–2461.Chatterjee, S., Diaconis, P., and Sly, A. (2011). Random graphs with a given degree sequence. The Annals of

Applied Probability, pages 1400–1435.Chib, S. and Greenberg, E. (1996). Markov chain monte carlo simulation methods in econometrics. Econometric

theory, 12(03):409–431.Cho, S.-H. (2013). Horizontal mergers in multitier decentralized supply chains. Management Science, 60(2):356–379.Cohen, L. (1994). When can government subsidize research joint ventures? politics, economics, and limits to

technology policy. The American Economic Review, pages 159–163.Cohen, W. and Klepper, S. (1996a). A reprise of size and R & D. The Economic Journal, 106(437):925–951.Cohen, W. and Klepper, S. (1996b). Firm size and the nature of innovation within industries: the case of process

and product R&D. The Review of Economics and Statistics, 78(2):232–243.Cohen, W., Levin, R., and Mowery, D. (1987). Firm size and R & D intensity: a re-examination. The Journal of

Industrial Economics, 35(4):543–565.Cohen, W. M. and Levinthal, D. A. (1990). Absorptive capacity: A new perspective on learning and innovation.

Administrative Science Quarterly, 35(1):128–152.Copeland, A., Fixler, D., 2012. Measuring the price of research and development output. Review of Income and

Wealth 58 (1), 166–182.Corchon, L. C. and Mas-Colell, A. (1996). On the stability of best reply and gradient systems with applications to

imperfectly competitive models. Economics Letters, 51(1):59 – 65.Czarnitzki, D., Hussinger, K., and Schneider, C. (2015). R&D collaboration with uncertain intellectual property

rights. Review of Industrial Organization, 46(2):183–204.Czarnitzki, D. and Toole, A. A. (2013). The R&D investment–uncertainty relationship: Do strategic rivalry and

firm size matter? Managerial and Decision Economics, 34(1):15–28.Cvetkovic, D., Doob, M., Sachs, H., 1995. Spectra of Graphs: Theory and Applications. Johann Ambrosius Barth.Cvetkovic, D., Rowlinson, P., 1990. The largest eigenvalue of a graph: A survey. Linear and Multinilear Algebra

28 (1), 3–33.Czarnitzki, D., Ebersberger, B., and Fier, A. (2007). The relationship between R&D collaboration, subsidies and

R&D performance: empirical evidence from Finland and Germany. Journal of applied econometrics, 22(7):1347–1366.

D’Aspremont, C. and Jacquemin, A. (1988). Cooperative and noncooperative R&D in duopoly with spillovers. TheAmerican Economic Review, 78(5):1133–1137.

David, P. A. (1992). Heroes, herds and hysteresis in technological history: Thomas edison and the battle of thesystems reconsidered. Industrial and Corporate Change, 1(1):129–180.

David, P. A. (2005). Path dependence in economic processes: implications for policy analysis in dynamical systemscontexts. The evolutionary foundations of economics, pages 151–194.

Dawid, H. and Hellmann, T. (2014). The evolution of R&D networks. Journal of Economic Behavior & Organiza-tion, 105:158–172.

Deijfen, M. and Kets, W. (2009). Random intersection graphs with tunable degree distribution and clustering.Probability in the Engineering and Informational Sciences, 23(04):661–674.

Diaconis, P., Holmes, S., and Janson, S. (2008). Threshold graph limits and random threshold graphs. InternetMathematics, 5(3):267–320.

Dorogovtsev, S. N. and Mendes, J. F. (2013). Evolution of networks: From biological nets to the Internet andWWW. Oxford University Press.

Fafchamps, M. and Gubert, F. (2007). The formation of risk sharing networks. Journal of development Economics,83(2):326–350.

Farrell, J. and Shapiro, C. (1990). Horizontal mergers: an equilibrium analysis. The American Economic Review,pages 107–126.

Fort, G., Moulines, E., and Priouret, P. (2011). Convergence of adaptive and interacting Markov chain MonteCarlo algorithms. The Annals of Statistics, pages 3262–3289.

Frank, O. and Strauss, D. (1986). Markov graphs. Journal of the American Statistical Association, 81(395):832–842.Gabaix, X. (1999). Zipf’s Law For Cities: An Explanation. Quarterly Journal of Economics, 114(3):739–767.Gabaix, X. (2009). Power laws in economics and finance. Annual Review of Economics, 1(1):255–294.Gardiner, C. W. (2004). Handbook of Stochastic Methods: for Physics, Chemistry and the Natural Sciences.

Springer, Springer Series in Synergetics.Gay, B. and Dousset, B. (2005). Innovation and network structural dynamics: Study of the alliance network of a

major sector of the biotechnology industry. Research Policy, pages 1457–1475.Geyer, C. J. (1991). Markov chain Monte Carlo maximum likelihood. Interface Foundation of North America.Geyer, C. J. and Thompson, E. A. (1992). Constrained monte carlo maximum likelihood for dependent data.

Journal of the Royal Statistical Society. Series B (Methodological), pages 657–699.Gibson, M. A. and Bruck, J. (2000). Efficient exact stochastic simulation of chemical systems with many species

and many channels. The Journal of Physical Chemistry A, 104(9):1876–1889.Goldenberg, A., Zheng, A. X., Fienberg, S. E., and Airoldi, E. M. (2010). A survey of statistical network models.

Foundations and Trends in Machine Learning, 2(2), 129–233.Goyal, S. and Moraga-Gonzalez, J. L. (2001). R&D networks. RAND Journal of Economics, 32(4):686–707.Graham, B. S. (2014). An empirical model of network formation: detecting homophily when agents are heteroge-

nous. National Bureau of Economic Research Working Paper No. w20341.Graham, B. S. (2015). Methods of identification in social networks. Annual Review of Economics, 7(1):465–485.

38

Page 39: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Griffith, R., Redding, S., and Van Reenen, J. (2003). R&D and absorptive capacity: Theory and empirical evidence.Scandinavian Journal of Economics, 105(1):99–118.

Grimmett, G. (2010). Probability on graphs. Cambridge University Press.Grossman, S. J. and Hart, O. D. (1986). The costs and benefits of ownership: A theory of vertical and lateral

integration. The Journal of Political Economy, pages 691–719.Growiec, J., Pammolli, F., Riccaboni, M., and Stanley, H. (2008). On the size distribution of business firms.

Economics Letters, 98(2):207–212.Hagedoorn, J. (2002). Inter-firm R&D partnerships: an overview of major trends and patterns since 1960. Research

Policy, 31(4):477–492.Hall, B. H., Jaffe, A. B., and Trajtenberg, M. (2001). The NBER Patent Citation Data File: Lessons, Insights and

Methodological Tools. NBER Working Paper No. 8498.Hanaki, N., Nakajima, R., and Ogura, Y. (2010) The dynamics of R&D network in the IT industry. Research

policy, 39(3), 386–399.Heimeriks, K. H. and Duysters, G. (2007). Alliance capability as a mediator between experience and alliance per-

formance: an empirical investigation into the alliance capability development process*. Journal of ManagementStudies, 44(1):25–49.

Horn, R. A. and Johnson, C. R. (1990). Matrix Analysis. Cambridge University Press.Hsieh, C.-S. and Lee, L. F. (2013). Specification and Estimation of Network Formation and Network Interaction

Models with the Exponential Probability Distribution. CUHK Working Paper.Hsieh, C.-S. and Lee, L. F. (2014). A social interactions model with endogenous friendship formation and selectivity.

Journal of Applied Econometrics.Hunter, D. R., Goodreau, S. M., and Handcock, M. S. (2008). Goodness of fit of social network models. Journal

of the American Statistical Association, 103(481).Hurd, T. R. (2015). Contagion! The Spread of Systemic Risk in Financial Networks. Springer.Ide, Y. and Konno, N. (2007). Limit theorems for some statistics of a generalized threshold network model. Theory

of Biomathematics and its Applications III, 1551:81–86.Ide, Y., Konno, N., and Masuda, N. (2010). Statistical properties of a generalized threshold network model.

Methodology and Computing in Applied Probability, 12(3):361–377.Ide, Y., Konno, N., and Obata, N. (2009). Spectral properties of the threshold network model. Internet Mathematics,

6(2):173–187.Jackson, M. O., Zenou, Y., et al. (2015). Games on networks. Handbook of Game Theory with Economic Applica-

tions, 4:95–163.Jaffe, A. B. (1989). Characterizing the technological position of firms, with application to quantifying technological

opportunity and research spillovers. Research Policy, 18(2):87 – 97.Jaffe, A., Trajtenberg, M., 2002. Patents, Citations, and Innovations: A Window on the Knowledge Economy. MIT

Press.Jin, I. H., Yuan, Y., and Liang, F. (2013). Bayesian analysis for exponential random graph models using the

adaptive exchange sampler. Statistics and its interface, 6(4):559.Kamien, M. I., Muller, E., and Zang, I. (1992). Research joint ventures and R&D cartels. The American Economic

Review, 82(5):1293–1306.Kandori, M., Mailath, G., and Rob, R. (1993). Learning, mutation, and long run equilibria in games. Econometrica,

pages 29–56.Kelly, M. J., Schaan, J.-L., and Joncas, H. (2002). Managing alliance relationships: key challenges in the early

stages of collaboration. R&D Management, 32(1):11–22.Khalil, H. K. (2002). Nonlinear Systems. Prentice Hall.Kim, E. H. and Singal, V. (1993). Mergers and market power: Evidence from the airline industry. The American

Economic Review, pages 549–569.Kitsak, M., Riccaboni, M., Havlin, S., Pammolli, F., and Stanley, H. (2010). Scale-free models for the structure of

business firm networks. Physical Review E, 81:036117.Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models. Springer.Konig, M., Tessone, C., and Zenou, Y. (2014a). Nestedness in networks: A theoretical model and some applications.

Theoretical Economics, 9:695–752.Konig, M. D. (2011). The formation of networks with local spillovers and limited observability. forthcoming in

Theoretical Economics.Konig, M. D., Battiston, S., Napoletano, M., and Schweitzer, F. (2011). The efficiency and stability of R&D

networks. Games and Economic Behavior, 75:694–713.Konig, M. D., Liu, X., and Zenou, Y. (2014b). R&D networks: Theory, empirics and policy implications. CEPR

Discussion Paper No. 9872.Konig, M. D., Lorenz, J., and Zilibotti, F. (2012). Innovation vs. imitation and the evolution of productivity

distributions. CEPR Discussion Paper No. 8843.Koop, G., Poirier, D. J., and Tobias, J. L. (2007). Bayesian econometric methods. Cambridge University Press.Lee, G., Tam, N., and Yen, N. (2005). Quadratic programming and affine variational inequalities: a qualitative

study. Springer Verlag.Liang, F. (2010). A double metropolis–hastings sampler for spatial models with intractable normalizing constants.

Journal of Statistical Computation and Simulation, 80(9):1007–1022.Liang, F., Jin, I. H., Song, Q., and Liu, J. S. (2015). An adaptive exchange algorithm for sampling from distributions

with intractable normalizing constants. Journal of the American Statistical Association, (just-accepted):00–00.Liang, F., Liu, C., and Carroll, R. J. (2007). Stochastic approximation in monte carlo computation. Journal of the

American Statistical Association, 102(477):305–320.Liben-Nowell, D. and Kleinberg, J. (2007). The link-prediction problem for social networks. Journal of the American

society for information science and technology, 58(7):1019–1031.Liggett, T. (2010). Continuous time Markov processes: an introduction, volume 113. American Mathematical

Society.Luo, Z.-Q., Pang, J.-S., and Ralph, D. (1996). Mathematical programs with equilibrium constraints. Cambridge

University Press.

39

Page 40: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Lychagin, S., Pinkse, J., Slade, M. E., and Van Reenen, J. (2010). Spillovers in space: does geography matter?National Bureau of Economic Research Working Paper No. w16188.

Mahadev, N. and Peled, U. (1995). Threshold Graphs and Related Topics. North Holland.Marshall, A. (1890). Principles of Economics. 2nd Edition London: Macmillan.Mattsson, L. and Weibull, J. (2002). Probabilistic choice and procedurally bounded rationality. Games and

Economic Behavior, 41(1):61–78.McFadden, D. (1976). The mathematical theory of demand models. Behavioral Travel Demand Models, pages

305–314.Mele, A. (2010a). Segregation in Social Networks: Theory, Estimation and Policy. NET Institute Working Paper

10-16.Mele, A. (2010b). A structural model of segregation in social networks. Working Paper, Available at SSRN 1694387.Melitz, M. (2003). The impact of trade on intra-industry reallocations and aggregate industry productivity. Econo-

metrica, 71(6):1695–1725.Melitz, M., Ottaviano, G., and Maggiore, S. (2008). Market Size, Trade, and Productivity. Review of Economic

Studies, 75(1):295–316.Meyer, C. D. (2000). Matrix analysis and applied linear algebra. Society for Industrial and Applied Mathematics.Møller, J., Pettitt, A. N., Reeves, R., and Berthelsen, K. K. (2006). An efficient markov chain monte carlo method

for distributions with intractable normalising constants. Biometrika, 93(2):451–458.Monderer, D. and Shapley, L. (1996). Potential Games. Games and Economic Behavior, 14(1):124–143.Murray, I., Ghahramani, Z., and MacKay, D. J. (2006). MCMC for doubly-intractable distributions. Proceedings

of the 22nd Annual Conference on Uncertainty in Artificial Intelligence (UAI-06), Arlington, Virginia, AUAIPress.

Newman, M. (2002). Assortative mixing in networks. Physical Review Letters, 89(20):208701.Newman, M. E. (2003). Properties of highly clustered networks. Physical Review E, 68(2):026121.Nolan, J. P. (2014). Stable distributions: Models for Heavy Tailed Data.Nooteboom, B., Van Haverbeke, W., Duysters, G., Gilsing, V., and Van Den Oord, A. (2007). Optimal cognitive

distance and absorptive capacity. Research Policy, 36(7):1016–1034.Norris, J. R. (1998). Markov chains. Number 2008. Cambridge university press.Park, J. and Newman, M. (2004). Statistical mechanics of networks. Physical Review E, 70(6):66117.Pattison, P. and Wasserman, S. (1999). Logit models and logistic regressions for social networks: II. Multivariate

relations. British Journal of Mathematical and Statistical Psychology, 52:169–193.Peled, U. N., Petreschi, R., Sterbini, A., 1999. (n, e)-graphs with maximum sum of squares of degrees. Journal of

Graph Theory 31 (4), 283–295.Powell, W., Koput, K. W., and Smith-Doerr, L. (1996). Interorganizational collaboration and the locus of innova-

tion: Networks of learning in biotechnology. Administrative Science Quarterly, 41(1):116–145.Powell, W. W., White, D. R., Koput, K. W., and Owen-Smith, J. (2005). Network dynamics and field evolution:

The growth of interorganizational collaboration in the life sciences. American Journal of Sociology, 110(4):1132–1205.

Propp, J. G. and Wilson, D. B. (1996). Exact sampling with coupled markov chains and applications to statisticalmechanics. Random Structures & Algorithms, 9(1-2):223–252.

Robert, C. and Casella, G. (2004). Monte Carlo statistical methods. Springer Verlag.Roberts, G. O. and Rosenthal, J. S. (2007). Coupling and ergodicity of adaptive Markov chain Monte Carlo

algorithms. Journal of applied probability, pages 458–475.Roberts, G. O. and Tweedie, R. L. (1996). Geometric convergence and central limit theorems for multidimensional

hastings and metropolis algorithms. Biometrika, 83(1):95–110.Roijakkers, N. and Hagedoorn, J. (2006). Inter-firm R&D partnering in pharmaceutical biotechnology since 1975:

Trends, patterns, and networks. Research Policy, 35(3):431–446.Rosenkopf, L. and Padula, G. (2008). Investigating the Microstructure of Network Evolution: Alliance Formation

in the Mobile Communications Industry. Organization Science, 19(5):669.Rosenkopf, L. and Schilling, M. (2007). Comparing alliance network structure across industries: Observations and

explanations. Strategic Entrepreneurship Journal, 1:191–209.Rue, H. and Held, L. (2005). Gaussian Markov random fields: theory and applications, volume 104. Chapman &

Hall.Schilling, M., 2009. Understanding the alliance data. Strategic Management Journal 30 (3), 233–260.Sampson, R. C. (2005). Experience effects and collaborative returns in R&D alliances. Strategic Management

Journal, 26(11):1009.Sandholm, W. (2010). Population games and evolutionary dynamics. MIT Press.Rosenkopf, L., Schilling, M., 2007. Comparing alliance network structure across industries: Observations and

explanations. Strategic Entrepreneurship Journal 1, 191–209.Shalizi, C. R. and Rinaldo, A. (2013). Consistency under sampling of exponential random graph models. The

Annals of Statistics, 41(2):508–535.Singer-Cohen, K. (1995). Random intersection graphs. PhD thesis, PhD thesis, Department of Mathematical

Sciences, The Johns Hopkins University.Singh, N. and Vives, X. (1984). Price and quantity competition in a differentiated duopoly. The RAND Journal

of Economics, 15(4):546–554.Smith, T. E. and LeSage, J. P. (2004). A bayesian probit model with spatial dependencies. Advances in economet-

rics, 18:127–160.Snijders, T. (2002a). Markov chain monte carlo estimation of exponential random graph models. Journal of Social

Structure, 3(2):1–40.Snijders, T. A. (2001). The Statistical Evaluation of Social Network Dynamics. Sociological Methodology, 31(1):361–

395.Snijders, T. A. (2002b). Markov chain monte carlo estimation of exponential random graph models. Journal of

Social Structure, 3(2):1–40.Soderberg, B. (2002). General formalism for inhomogeneous random graphs. Physical Review E, 66(6):066121.Spence, M. (1984). Cost reduction, competition, and industry performance. Econometrica, pages 101–121.

40

Page 41: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Stanley, M., Amaral, L., Buldyrev, S., Havlin, S., Leschhorn, H., Maass, P., Salinger, M., and Stanley, H. (1996).Scaling behaviour in the growth of companies. Nature, 379(6568):804–806.

Stroock, D. (2005) An introduction to Markov processes. Springer Verlag. 230.Su, C.-L. and Judd, K. L. (2012). Constrained optimization approaches to estimation of structural models. Econo-

metrica, 80(5):2213–2230.Takalo, T., Tanayama, T., and Toivanen, O. (2013a). Estimating the benefits of targeted R&D subsidies. Review

of Economics and Statistics, 95(1):255–272.Takalo, T., Tanayama, T., and Toivanen, O. (2013b). Market failures and the additionality effects of public

support to private R&D: Theory and empirical implications. International Journal of Industrial Organization,31(5):634–642.

Tierney, L. (1994). Markov chains for exploring posterior distributions. the Annals of Statistics, pages 1701–1728.Topkis, D. M. (1998). Supermodularity and complementarity. Princeton University Press.Thoma, G., Torrisi, S., Gambardella, A., Guellec, D., Hall, B. H., Harhoff, D., 2010. Harmonizing and combining

large datasets-an application to firm-level patent and accounting data. National Bureau of Economic ResearchWorking Paper No. w15851.

Trajtenberg, M., Shiff, G., Melamed, R., 2009. The “names game”: Harnessing inventors, patent data for economic

research. Annals of Economics and Statistics/Annales d’Economie et de Statistique, 79–108.Van Der Hofstad, R. (2009). Random graphs and complex networks.Van Mieghem, P., Stevanovic, D., Kuipers, F., Li, C., van de Bovenkamp, R., Liu, D., and Wang, H. (2011).

Decreasing the spectral radius of a graph by link removals. Phys. Rev. E, 84:016101.Wainwright, M. J. and Jordan, M. I. (2008). Graphical models, exponential families, and variational inference.

Foundations and Trends in Machine Learning, 1(1-2):1–305.Wasserman, S. and Faust, K. (1994). Social Network Analysis: Methods and Applications. Cambridge University

Press.Watts, A. (2001). A dynamic model of network formation. Games and Economic Behavior, 34(2):331–341.Watts, D. J. and Strogatz, S. H. (1998). Collective dynamics of small-world networks. Nature, 393:440–442.Weibull, J. (1997). Evolutionary game theory. The MIT press.Westbrock, B. (2010). Natural concentration in industrial research collaboration. The RAND Journal of Economics,

41(2):351–371.Wong, R. (2001). Asymptotic approximations of integrals, volume 34. Society for Industrial Mathematics.

41

Page 42: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Appendix

A. Proofs

Proof of Proposition 1. The potential Φ(q, G) has the property that47

Φ(q, G+ ij)− Φ(q, G) = ρqiqj − ζ = πi(q, G + ij)− πi(q, G). (45)

From the properties of πi(q, G) it also follows that Φ(q′i,q−i, G)−Φ(qi,q−i, G) = φ(q′i,q−i, G)−φ(qi,q−i, G) =

πi(q′i,q−i, G)− πi(qi,q−i, G).

Proof of Proposition 2. First, note that the embedded discrete time Markov chain is irreducible andaperiodic, and thus is ergodic and has a unique stationary distribution. Hence, also the continuous timeMarkov chain is ergodic and has a unique stationary distribution.

The stationary distribution solves µϑQ = 0 with the transition rates matrix Q = (q(ω,ω′))ω,ω′∈Ω

of Equation (14). This equation is satisfied when the probability distribution µϑ(ω) satisfies the detailedbalance condition [cf. e.g. Norris, 1998]

µϑ(ω)q(ω,ω′) = µϑ(ω′)q(ω′,ω), (46)

for all ω,ω′ ∈ Ω. Observe that the detailed balance condition is trivially satisfied if ω′ and ω differ inmore than one link or more than one quantity level. Hence, we consider only the case of link creationG′ = G+ ij (and removal G′ = G− ij) or an adjustment in quantity q′i 6= qi for some i ∈ N . For the caseof link creation with a transition from ω = (q, G) to ω′ = (q, G + ij) we can write the detailed balancecondition as follows

1

eϑ(Φ(q,G)−m ln( ξλ )) eϑΦ(q,G+ij)

eϑΦ(q,G+ij) + eϑΦ(q,G)λ =

1

eϑ(Φ(q,G+ij)−(m+1) ln( ξλ )) eϑΦ(q,G)

eϑΦ(q,G) + eϑΦ(q,G+ij)ξ.

This equality is trivially satisfied. A similar argument holds for the removal of a link with a transitionfrom ω = (q, G) to ω′ = (q, G − ij) where the detailed balance condition reads

1

eϑ(Φ(q,G)−m ln( ξλ )) eϑΦ(q,G−ij)

eϑΦ(q,G−ij) + eϑΦ(q,G)ξ =

1

eϑ(Φ(q,G−ij)−(m−1) ln( ξλ )) eϑΦ(q,G)

eϑΦ(q,G) + eϑΦ(q,G−ij)λ.

For a change in the output level with a transition from ω = (qi,q−i, G) to ω′ = (q′i,q−i, G) we get for thefollowing detailed balance condition

1

eϑ(Φ(qi,q−i,G)−m ln( ξλ )) eϑπi(q

′i,q−i,G)dq∫

Q eϑπi(q′,q−i,G)dq′

χ =1

eϑ(Φ(q′i,q−i,G)−m ln( ξλ )) eϑπi(qi,q−i,G)dq∫

Q eϑπi(q′,q−i,G)dq′

χ.

This can be written as eϑ(Φ(qi,q−i,G)−Φ(q′i,q−i,G)) = eϑ(πi(qi,q−i,G)−πi(q′i,q−i,G)), which is satisfied since we

have for the potential Φ(qi,q−i, G)−Φ(q′i,q−i, G) = πi(qi,q−i, G)− πi(q′i,q−i, G). Hence, the probability

measure µϑ(ω) satisfies a detailed balance condition of Equation (46) and therefore is the stationarydistribution of the Markov chain with transition rates q(ω,ω′).

Note that in the limit of vanishing noise, ϑ→ ∞, the Markov chain (ωt)t∈R+ is equivalent – interms of its stationary distribution – to a Markov chain (ω′

t)t∈R+ in which we replace the outputadjustment process with one where firms choose simultaneously the Nash equilibrium outputlevels.

Corollary 1. Consider a Markov chain (ω′t)t∈R+ with the same probability space as in Definition 1. For

every t, let the output levels qt be given by the Nash equilibrium q∗t for the current network Gt. Further,

let the link formation and removal events be as in Definition 1. Then, in the limit of small noise, ϑ→ ∞,the Markov chains (ω′

t)t∈R+ and (ω′t)t∈R+ have the same stationary distribution.

47In the limit of fast output adjustment, and b = 0, output levels are given by the Bonacich centrality. Whenspillover effects are high, these in turn are determined by the principal eigenvector components. The marginalpayoff from forming a link ij is then determined by the product of the eigenvector components vivj , less the costζ of the link. It then can be shown that maximizing the product vivj is equivalent to maximizing the principaleigenvalue λ1 [cf. Van Mieghem et al., 2011], and this is the setup that has been analyzed in Konig et al. [2011].In particular, we then have a potential function given by Φ(G) = λ1(G)−mζ.

42

Page 43: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Proof of Corollary 1. Let µ = limϑ→∞ µϑ be the stationary distribution of the Markov chain (ωt)t≥0

of Definition 1 with state ωt = (qt, Gt) in the state space Ω = Qn × Gn in the limit of vanishing noise(ϑ → ∞). From Proposition 2 we know that the Markov chain is ergodic, so that the Ergodic Theoremapplies [cf. e.g. Norris, 1998], which states that

P

(limt→∞

1

t

∫ t

0

1ωs∈Ads = µ(A)

)= 1,

for any set A ∈ Ω. In particular, let A be the set of states in which the firms’ output levels are given bythe Nash equilibrium output q∗(G) in a network G, i.e. A = (q, G) ∈ Ω : q = q∗(G), and Ac the setof states where they don’t. Further, let τǫ be the largest time, across any output profile qt and for anynetwork G, such that P(qt+τǫ ∈ A) = 1 − ǫ for some small ǫ > 0. This means that in all but ǫ samplepaths of the Markov chain after a time τǫ the chain has reached a state in A. Next, let the rate at whichthe network changes be given by (Nτǫ)

−1, so that the expected time between two network updates is givenby Nτǫ. Then the proportion of time the Markov chain spends in states q 6= q∗(G) between two networkupdates is upper bounded by (1 − ǫ)τǫ + ǫNτǫ, and, since firms do not change their output levels anymore once they have reached the Nash equilibrium q∗(G), the time the Markov chain rests in A is lowerbounded by (1− ǫ)(N − 1)τǫ. It then follows that

µ(Ac)

µ(A)≤ (1− ǫ)τǫ + ǫNτǫ

(N − 1)τǫ→ ǫ,

as N → ∞. Since µ(A) ≤ 1 we then must have that limǫ→0 limN→∞ µ(Ac) = 0. Hence, when the rate atwhich the output levels are adjusted in the Markov chain (ωt)t≥0 is much higher than the rate at whichthe links change, then the states in Ac are not in the support of the stationary distribution µ. Finally, weobserve that the Markov chain (ω′

t)t≥0 is a copy of the Markov chain (ωt)t≥0 except for the states in Ac,and they both have the same stationary distribution.

Proof of Proposition 3. Observe that the potential can be written as

Φ(q, G) =

n∑

i=1

η − νqi −

b

2

n∑

j 6=i

qj

qi

︸ ︷︷ ︸ψ(q)

+

n∑

i=1

n∑

j=i+1

aij (ρqiqj − ζ)︸ ︷︷ ︸σij

. (47)

We then have thateϑΦ(q,G) = eϑψ(q)eϑ

∑ni<j aijσij ,

where only the second factor in the above expression is network dependent. We then can use the fact thatfor any constant, symmetric σij = σji, 1 ≤ i, j ≤ n, we can write48

G∈Gn

eϑ∑n

i<j aijσij =∏

i<j

(1 + eϑσij

)(48)

to obtain

G∈Gn

eϑΦ(q,G) = eϑψ(q)∏

i<j

(1 + eϑσij

)=

n∏

i=1

eϑ(η−νqi−b2

∑j 6=i qj)qi

i<j

(1 + eϑ(ρqiqj−ζ)

). (49)

48Note that Equation (48) requires the σij to be constant, and in particular, to be independent. In this case thesummation over all networks only needs to count the number of possible networks in which the link ij is present.In contrast, when σij depends on the other links in the network, then this simple summation formula would nolonger hold.

43

Page 44: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

We can use Equation (49) to compute the marginal distribution

µϑ(q) =1

G∈Gn

eϑΦ(q,G)

=1

Z ϑn

n∏

i=1

eϑ(η−νqi−b2

∑j 6=i qj)qi

i<j

(1 + eϑ(ρqiqj−ζ)

)

=1

eϑ∑n

i=1(η−νqi− b2

∑j 6=i qj)qie

∑i<j ln(1+eϑ(ρqiqj−ζ))

=1

eϑH (q), (50)

where we have introduced the effective Hamiltonian

H (q) ≡n∑

i=1

ηqi − νq2i +

j>i

(1

ϑln(1 + eϑ(ρqiqj−ζ)

)− bqiqj

) . (51)

With the marginal distribution from Equation (50) we can write the conditional distribution as

µϑ(G|q) = µϑ(q, G)

µϑ(q)=

eϑΦ(q,G)

∏ni=1 e

ϑ(η−νqi− b2

∑j 6=i qj)qi

∏i<j

(1 + eϑ(ρqiqj−ζ)

)

=eϑ

∑ni<j aij(ρqiqj−ζ)

∏i<j

(1 + eϑ(ρqiqj−ζ)

)

=∏

i<j

eϑaij(ρqiqj−ζ)

1 + eϑ(ρqiqj−ζ)

=∏

i<j

(eϑ(ρqiqj−ζ)

1 + eϑ(ρqiqj−ζ)

)aij (1− eϑ(ρqiqj−ζ)

1 + eϑ(ρqiqj−ζ)

)1−aij

=∏

i<j

pij(qi, qj)aij (1− pij(qi, qj))

1−aij .

Hence, we obtain the likelihood of an inhomogeneous random graph with link probability

pij(qi, qj) =eϑ(ρqiqj−ζ)

1 + eϑ(ρqiqj−ζ). (52)

We next show that the networks G in the support of the stationary distribution µϑ(q, G) in the limit ofvanishing noise ϑ→ ∞ are nested split graphs. A graph G is a nested split graph if for every node i ∈ Nthere exist a weight xi and a threshold τ such that vertices i and j are linked if and only if xi + xj ≥ τ[Mahadev and Peled, 1995]. In the following we denote by Hn ⊂ Gn the set of nested split graphs with nnodes.

In the limit ϑ → ∞ the linking probability in Equation (52) becomes limϑ→∞ pϑ(qi, qj) = 1ρqiqj>ζ,and the conditional probability of the network G can be written as

limϑ→∞

µϑ(G|q) =n∏

i<j

1aijρqiqj>ζ

11−aijρqiqj<ζ

.

Assume that G is a stochastically stable network, that is, we must have that limϑ→∞ µϑ(q, G) > 0. Since,µϑ(q, G) = µϑ(G|q)µϑ(q) this implies that limϑ→∞ µϑ(G|q) > 0. It follows that ρqiqj > ζ for all aij = 1and ρqiqj < ζ for all aij = 0. We then define the weights xi ≡ log qi, xj ≡ log qj and a thresholdτ ≡ log ζ − log ρ, and conclude that G ∈ Hn is a nested split graph.

Finally, note that the above results also hold in the case of heterogeneous firms (when η is replacedwith ηi) as discussed in Appendix E.1.

Remark 1. Note that an inhomogeneous random graph with linking probabilities pij given by Equa-tion (52) corresponds to the best variational approximation to the partition function Zϑ in the limitof large ϑ [cf. Wainwright and Jordan, 2008]. Observe that the potential function can be written asΦ(q, G) = θ⊤X(q, G), with the vector of parameters θ ≡ ϑ(η, ν, b, ρ, ζ)⊤ and the data vector X(q, G) =

44

Page 45: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

(X1(q, G), X2(q, G), X3(q, G), X4(q, G), X5(q, G))⊤, where

X1(q, G) =

n∑

i=1

qi,

X2(q, G) = −n∑

i=1

q2i ,

X3(q, G) = −1

2

n∑

i=1

j 6=i

qiqj ,

X4(q, G) =1

2

n∑

i=1

n∑

j=1

aijqiqj ,

X5(q, G) = −m. (53)

The log-likelihood associated with the Gibbs measure µϑ(G,q) can then be written as

lnµϑ(G,q) = θ⊤X− lnZϑ(θ).

Moreover, the log-partition function can be written as

lnZϑ(θ) = ln

(∫

Qn

dq∑

G∈Gn

eθ⊤X(q,G)

)

= ln

(∫

Qn

dq∑

G∈Gn

p(q, G)eθ

⊤X(q,G)

p(q, G)

)

≥∫

Qn

dq∑

G∈Gn

p(q, G) ln

(eθ

⊤X(q,G)

p(q, G)

)

=

Qn

dq∑

G∈Gn

p(q, G)θ⊤X(q, G)−∫

Qn

dq∑

G∈Gn

p(q, G) ln p(q, G)

= θ⊤Eϑ(X(q, G)) −H(p),

using Jensen’s inequality,49 where p(q, G) is a general probability measure on the measure space (Ω,F), theexpectation E

ϑ(·) is with respect to this measure, and H(p) is the entropy associated with the distributionp(q, G). For an inhomogeneous random graph with link probability pij the entropy is given by

H(p) = −n∑

i=1

n∑

j=i+1

pij ln pij −n∑

i=1

n∑

j=i+1

(1− pij) ln(1 − pij).

With ψ(q) as in Equation (47) we then can write

θ⊤Eϑ(X(q, G)) = E

ϑ(Φ(q, G)) = ψ(q) +n∑

i=1

n∑

j=i+1

ϑ(ρqiqj − ζ)pij = ψ(q) +n∑

i=1

n∑

j=i+1

σijpij ,

where we have denoted by σij ≡ ϑ(ρqiqj − ζ). The log-partition function can then be written as

lnZϑ(θ) = ψ(q) +n∑

i=1

n∑

j=i+1

σijpij −n∑

i=1

n∑

j=i+1

pij ln pij −n∑

i=1

n∑

j=i+1

(1− pij) ln(1− pij).

The linking probability pij that maximizes the log-partition function follows from the FOC

σij − ln pij − 1 + 1 + ln(1 − pij) = 0,

49Jensen’s inequality states that for any numbers xi > 0 with i ∈ I, the log of their sum satisfies ln(∑

i∈I xi

)≥

∑i∈I pi ln

(xi

pi

), where pi ≥ 0 is any probability measure over I satisfying

∑i∈I pi = 1.

45

Page 46: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

from which we get

σij = ln

(pij

1− pij

),

so that

pij =eσij

1 + eσij=

eϑ(ρqiqj−ζ)

1 + eϑ(ρqiqj−ζ).

However, this is exactly the linking probability that we have derived in Equation (52).

Remark 2. Observe that the conditional probability of a network G given a quantity profile q can bewritten as lnµϑ(G|q) =

∑ni=1

∑nj=i+1 aij ln pij + (1 − aij) ln(1 − pij) with the linking probability pij in

Equation (52), and thus has separable increments and satisfies dyadic independence. This in turn implies“projectibility” as defined in Shalizi and Rinaldo [2013]. Shalizi and Rinaldo [2013] show that projectibilityimplies strong consistency of the maximum likelihood estimator, and that the same parameters can be usedfor both the whole network and any of its sub-networks, making the estimates robust to missing data.

Proof of Proposition 4. Using a Laplace expansion around the equilibrium values q∗ (i.e. the poten-tial maximizers) we can write for large ϑ [cf. Wong, 2001, Theorem 3, p. 495]50

µϑ(G) =1

Qn

dqeϑΦ(G,q) ≈ 1

)−n2

∣∣∣∣∣

(∂2Φ(G,q)

∂qi∂qj

)

q=q∗

∣∣∣∣∣

− 12

eϑΦ(G,q∗).

Further, using the fact that the potential function can be written in vector-matrix form as Φ(q, G) =η⊤q − u⊤Au − νq⊤

(In + b

2νB− ρ2νA

)q with B = uu⊤ − In being a matrix of all ones with a zero

diagonal, we find that the Hessian of the potential is given by ∆φ(q, G) = −2ν(In + b

2νB− ρ2νA

). We

then can write

µϑ(G) =1

)−n2

(2ν)−n2

∣∣∣∣In +b

2νB− ρ

2νA

∣∣∣∣− 1

2

eϑΦ(G,q∗),

The conditional distribution is given by

µϑ(q|G) ≈ µϑ(G,q)

µϑ(G)≈(ϑ

)n2

(2ν)n2

∣∣∣∣In +b

2νB− ρ

2νA

∣∣∣∣12

eϑ(Φ(G,q)−Φ(G,q∗)).

Proof of Proposition 5. We first analyze the partition sum Zϑ in more detail. From the proof ofProposition 6 we find that it can be written as

Zϑ =

Qn

n∏

i=1

eϑ(η−νqi−b2

∑j 6=i qj)qi

i<j

(1 + eϑ(ρqiqj−ζ)

)dq

=

Qn

eϑ∑n

i=1(η−νqi− b2

∑j 6=i qj)qie

∑i<j ln(1+eϑ(ρqiqj−ζ))dq.

With the effective Hamiltonian defined by

H (q) ≡n∑

i=1

ηqi − νq2i +

j>i

(1

ϑln(1 + eϑ(ρqiqj−ζ)

)− bqiqj

) ,

so that∑G∈Gn eϑΦ(q,G) = eϑH (q) (cf. Equation (49)), we can write the partition function as

Zϑ =

Qn

eϑH (q)dq.

50 Consider the integral J(ϑ) =∫Qn g(q)e

ϑf(q)dq. Assume that (i) J(ϑ) is absolutely convergent for some

ϑ > ϑ0, (ii) f has a unique maximum at q0, and (iii) the Hessian(

∂2f

∂qi∂qj

)

q=q0

is negative definite. Then

J(ϑ) ∼(2πϑ

)n2 g(q0)

∣∣∣∣(

∂2f

∂qi∂qj

)

q=q0

∣∣∣∣− 1

2

e−ϑf(q0) for ϑ → ∞. For the case of multiple maxima and boundary

solutions see Wong [2001].

46

Page 47: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

In the following we make the Laplace approximation [cf. Wong, 2001, Theorem 3, p. 495]

Zϑ ∼(2π

ϑ

)n2

∣∣∣∣∣

(∂2H

∂qi∂qj

)

qi=q∗

∣∣∣∣∣

− 12

eϑH (q∗), (54)

as ϑ→ ∞, where q∗ = argmaxq∈[0,q]n H (q), and the Hessian is given by ∂2H

∂qi∂qjfor 1 ≤ i, j ≤ n. The free

energy is defined as Fϑ ≡ − lnZϑ, and with Equation (54) it can be written as

Fϑ ∼ −n2ln

(2π

ϑ

)+

1

2ln

∣∣∣∣∣

(∂2H

∂qi∂qj

)

qi=q∗

∣∣∣∣∣− ϑH (q∗).

Next, we have that

(n∑

i=1

qi

)=∑

G∈Gn

Qn

dq

(n∑

i=1

qi

)µϑ(q, G)

=1

G∈Gn

Qn

dq

(n∑

i=1

qi

)eϑΦ(q,G)

=1

G∈Gn

Qn

dq1

ϑ

∂ηeϑΦ(q,G)

=1

ϑ

1

∂Zϑ

∂η

=1

ϑ

∂ lnZϑ

∂η

= − 1

ϑ

∂Fϑ

∂η.

The average output is then given by

(1

n

n∑

i=1

qi

)= − 1

∂Fϑ

∂η.

From the Laplace approximation in Equation (54) we find that

∂Fϑ

∂η= −ϑ∂H (q∗)

∂η+

1

2

∂ηln

∣∣∣∣∣

(∂2H

∂qi∂qj

)

qi=q∗

∣∣∣∣∣

= −ϑ∂H (q∗)

∂η+

1

2tr

((∂2H

∂qi∂qj

)−1∂

∂η

(∂2H

∂qi∂qj

))

qi=q∗

,

where we have used Jacobi’s formula [cf. Bellman, 1970].51 From Equation (51) we find that

∂H

∂qi= η − 2νqi +

n∑

j 6=i

2

(1 + tanh

2(ρqiqj − ζ)

))− b

)qj . (55)

From Equation (55) we further have that

∂2H

∂q2i= −2ν +

ϑρ2

4

n∑

j 6=i

q2j

(1− tanh

2(ρqiqj − ζ)

)2),

51For any invertible matrix M(x) for all x, Jacobi’s formula states that ddx

|M(x)| = |M(x)| tr(M(x)−1 d

dxM(x)

),

which can be written more compactly as ddx

ln |M(x)| = tr(M(x)−1 d

dxM(x)

).

47

Page 48: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

and for j 6= i we have that

∂2H

∂qi∂qj= −b+ ρ

2

(1 + tanh

2(ρqiqj − ζ)

))(1 +

ϑρ

2qiqj

(1− tanh

2(ρqiqj − ζ)

))).

This shows that ∂∂η

(∂2

H

∂qi∂qj

)= 0, so that ∂Fϑ

∂η= −ϑ∂H (q∗)

∂η, and the expected output level is then given

by

(1

n

n∑

i=1

qi

)=

1

n

∂H (q∗)

∂η.

Using the fact that ∂H (q∗)∂η

=∑n

i=1 qi = nq∗ (see below), we then get in leading order terms for large ϑ

limϑ→∞

(1

n

n∑

i=1

qi

)= q∗.

We next compute q∗ = argmaxq∈[0,q]n H (q). The first order conditions in Equation (55) imply that

η − 2νqi =n∑

j 6=i

(b− ρ

2

(1 + tanh

2(ρqiqj − ζ)

)))qj .

This system of equations has a symmetric solution, qi = q for all i = 1, . . . , n, where

(b(n− 1) + 2ν)q − η =(n− 1)ρ

2

(1 + tanh

2

(ρq2 − ζ

)))q.

Introducing the variables η∗ ≡ η/(n− 1) and ν∗ ≡ ν/(n− 1), this can be written as

(b + 2ν∗)q − η∗ =ρ

2

(1 + tanh

2

(ρq2 − ζ

)))q. (56)

Let the RHS of Equation (56) be denoted by F (q) so that we can write it as (b+2ν∗)q− η∗ = F (q). Thenwe have that F (0) = 0, F ′(q) ≥ 0 and F (q) ∼ ρq for q → ∞. It follows that (b+2ν∗)q− η∗ = F (q) has atleast one solution when b+2ν∗ > ρ.52 Moreover, any iteration (b+2ν∗)qt+1−η∗ = F (qt) starting at q0 = 0converges to the smallest fixed point q∗ such that (b+ 2ν∗)q∗ − η∗ = F (q∗).

In the limit of ϑ→ ∞ we obtain from the FOC of the above equation that

(b + 2ν∗)q − η∗ =

ρq, if ζ < ρq2,0, if ρq2 < ζ.

This shows that the right hand side of Equation (21) has a point of discontinuity at√

ζρ(cf. Figure 2). It

then follows that, in the limit of ϑ→ ∞ (for the stochastically stable equilibrium), we have

q∗ =

ηb+2ν∗−ρ , if ζ < ρ(η∗)2

(b+2ν∗)2 ,η∗

b+2ν∗−ρ ,η∗

b+2ν∗

, if ρ(η∗)2

(b+2ν∗)2 < ζ < ρη2

(b+2ν∗−ρ)2 ,

η∗

b+2ν∗ , if ρ(η∗)2

(b+2ν∗−ρ)2 < ζ,

(57)

52Since the RHS, F (q), of Equation (56) is increasing (one can see this from taking the derivative), is zero atq = 0, i.e. F (0) = 0, and asymptotically grows linearly as ρq, it follows that when b+ 2ν∗ > ρ there must exist atleast one fixed point. This is because the LHS, (b+2ν∗)q− η∗, of Equation (56) starts below zero at q = 0 (whereit is −η∗), both LHS and RHS are increasing, and the RHS behaves asymptotically as a line with a slope smallerthan the slope b+ 2ν∗ of the LHS. Hence they must intersect at some q ≥ 0.

48

Page 49: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

which is increasing in ρ and η∗, and decreasing in ζ and b (cf. Figure 2). Next, note that

(n∑

i=1

q2i

)=∑

G∈Gn

Qn

dq

(n∑

i=1

q2i

)µϑ(q, G)

=1

G∈Gn

Qn

dq

(n∑

i=1

q2i

)eϑΦ(q,G)

=1

G∈Gn

Qn

dq1

ϑ2∂2

∂η2eϑΦ(q,G)

=1

1

ϑ2∂2Zϑ

∂η2.

We further have that

∂2 lnZϑ

∂η2=

1

∂2Zϑ

∂η2− 1

Z 2ϑ

(∂Zϑ

∂η

)2

=1

∂2Zϑ

∂η2−(∂ lnZϑ

∂η

)2

= ϑ2Eϑ

(n∑

i=1

q2i

)− ϑ2Eϑ

(n∑

i=1

qi

)2

We then get

Varϑ

(n∑

i=1

qi

)= E

ϑ

(n∑

i=1

q2i

)− E

ϑ

(n∑

i=1

qi

)2

=1

ϑ2∂2 lnZϑ

∂η2

= − 1

ϑ2∂2Fϑ

∂η2.

The variance of the mean is then given by

Varϑ

(1

n

n∑

i=1

qi

)= − 1

n2ϑ2∂2Fϑ

∂η2.

We have that∂2Fϑ

∂η2= −ϑ∂

2H (q∗)

∂η2= 0,

and we get

limϑ→∞

Varϑ

(1

n

n∑

i=1

qi

)= 0.

Note that the variance of the average output can be equal to zero only if it is equal to its expectation inall of its support. This can only happen if the average output is equal to q∗ with probability one in thelarge ϑ limit.

An illustration with the average output level from numerical simulations starting with differentinitial conditions and a comparison with the predictions of Equation (57) can be seen in Figure8.

Proof of Proposition 6. The output distribution is given by

µϑ(q) =1

G∈Gn

eϑΦ(q,G) =1

Z ϑn

eϑH (q),

49

Page 50: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

0 5 10 150

0.5

1

1.5

2

2.5

3

q

P(q)

20 40 60 80 1004

5

6

7

8

9

10

11

12

ζ

q

Figure 8: (Left panel) The stationary output distribution. The vertical dashed lines indicate the theoreticalpredictions from Equation (57). (Right panel) The average output level from numerical simulations with ϑ = 1starting with different initial conditions (indicated with different colors). The horizontal dashed lines indicate theequilibrium quantities and the vertical dashed lines the threshold cost levels from Equation (22). In the region ofthe cost ζ between the lower and upper thresholds two equilibria exist.

where the effective Hamiltonian is implicitly defined by eϑH (q) =∑

G∈Gn eϑΦ(q,G). From a Taylor expan-sion around q∗ we have that

H (q) = H (q∗) + (q− q∗)∇H (q∗) +1

2(q− q∗)⊤∆H (q∗)(q− q∗) + o

(‖q− q∗‖2

),

as ϑ → ∞, where q∗ = argmaxq∈[0,q]n H (q), the gradient is ∇H (q) =(∂H

∂qi

)i=1,...,n

, and the Hessian is

∆H (q) =(∂2

H

∂qi∂qj

)i,j=1,...,n

. As the gradient ∇H (q) vanishes at q∗, we have that

H (q) = H (q∗) +1

2(q − q∗)⊤∆H (q∗)(q− q∗) + o

(‖q− q∗‖2

).

We then can write

µϑ(q) =1

Z ϑn

eϑH (q∗) exp

−1

2ϑ(q− q∗)⊤(−∇H (q∗))(q − q∗)

+ o

(‖q− q∗‖2

).

Normalization implies that

Zϑn =

Qn

dqeH (q)

= eϑH (q∗)

Qn

dq exp

−1

2ϑ(q− q∗)⊤(−∆H (q∗))(q − q∗)

+ o

(‖q− q∗‖2

)

= eϑH (q∗)(2π)n2 |−∆H (q∗)|− 1

2 + o(‖q− q∗‖2

).

The Laplace approximation of µϑ(q) is then given by[cf. Wong, 2001, Theorem 3, p. 495]

µϑ(q) =

(2π

ϑ

)−n2

|−∆H (q∗)| 12 exp−1

2ϑ(q− q∗)⊤(−∆H (q∗))(q − q∗)

+ o

(‖q− q∗‖2

). (58)

That is, in the limit of large ϑ, q is asymptotically normally distributed with mean q∗ and variance− 1ϑ∇H (q∗)−1. The marginal distribution is then given by

µϑ(qi) =1

Z ϑn

q−i∈Qn−1

eϑH (q) ∼ 1√2πσ2

e−(qi−q∗i )2

2σ2 , (59)

50

Page 51: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

where σ2 = − 1ϑ(∇H (q∗))

−1ii . From Equation (55) we further have that

∂2H

∂q2i= −2ν +

ϑρ2

4

n∑

j 6=i

q2j

(1− tanh

2(ρqiqj − ζ)

))(1 + tanh

2(ρqiqj − ζ)

)),

and for j 6= i we have that

∂2H

∂qi∂qj= −b+ ρ

2

(1 + tanh

2(ρqiqj − ζ)

))(1 +

ϑρ

2qiqj

(1− tanh

2(ρqiqj − ζ)

))).

Imposing symmetry, qi = q for all i = 1, . . . , n, we then can write

∂2H

∂q2i

∣∣∣∣qi=q

= −2ν + (n− 1)ϑρ2

4q2(1− tanh

2

(ρq2 − ζ

)))(1 + tanh

2

(ρq2 − ζ

))),

and for j 6= i we have that

∂2H

∂qi∂qj

∣∣∣∣qi=qj=q

= −b+ ρ

2

(1 + tanh

2

(ρq2 − ζ

)))(1 +

ϑρ

2q2(1− tanh

2

(ρq2 − ζ

)))).

Using Equation (56), from which we get

ρ

2

(1 + tanh

2

(ρq2 − ζ

)))=

((n− 1)b+ 2ν)q − η

(n− 1)q,

andρ

2

(1− tanh

2(ρqiqj − ζ)

))=

((n− 1)(ρ− b)− 2ν)q + η

(n− 1)q,

we then can write

∂2H

∂q2i

∣∣∣∣qi=q

= −2ν +ϑ(((n − 1)(ρ− b)− 2ν)q + η)(((n − 1)b+ 2ν)q − η)

n− 1,

and∂2H

∂qi∂qj

∣∣∣∣qi=qj=q

= −b+ ((n− 1)b+ 2ν)q − η

(n− 1)q

(1 +

ϑq(((n − 1)(ρ− b)− 2ν)q + η)

n− 1

).

Denoting by ν∗ ≡ ν/(n− 1) and η∗ ≡ η/(n− 1) we can further write

∂2H

∂q2i

∣∣∣∣qi=q

= (n− 1) (−2ν∗ + ϑ ((ρ− b− 2ν∗)q + η∗) ((b + 2ν∗)q − η∗)) ,

and∂2H

∂qi∂qj

∣∣∣∣qi=qj=q

= −b+ (b+ 2ν∗)q − η∗

q(1 + ϑq((ρ− b− 2ν∗)q + η∗)) .

That is

∂2H

∂q2i

∣∣∣∣qi=q

= (n− 1)

(−2ν∗ + ϑq2

(ρ− b− 2ν∗ +

η∗

q

)(b+ 2ν∗ − η∗

q

)),

and∂2H

∂qi∂qj

∣∣∣∣qi=qj=q

= −b+(b + 2ν∗ − η∗

q

)(1 + ϑq2

(ρ− b− 2ν∗ +

η∗

q

)).

Recall from Proposition 5 that in the high equilibrium we have that q = η∗

b+2ν∗ . Inserting gives

∂2H

∂q2i

∣∣∣∣qi=q

= −2ν∗ = −2ν, (60)

51

Page 52: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

0 1 2 3 4 5 6 70

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

q

P(q)

Figure 9: The stationary output distribution P (q) for n = 50, η = 150, b = 0.5, ν = 10, ρ = 1 , ϑ = 0.5 and ζ = 60.Dashed lines indicate the normal distribution of Equation (59).

and∂2H

∂qi∂qj

∣∣∣∣qi=qj=q

= −b. (61)

We obtain exactly the same result when inserting the high equilibrium q∗ = η∗

b+2ν∗−ρ from Proposition 5.

Note that due to symmetry, the Hessian ∆H (q∗) with components in Equations (60) and (61) isa special case of a circulant matrix. Denoting by a the diagonal elements of ∆H (q∗) and by b theoff-diagonal elements, the determinant follows from the general formula [cf. Horn and Johnson, 1990]:

|−∆H (q∗)| =

∣∣∣∣∣∣∣∣

2ν b b . . .b 2ν b . . .b b 2ν...

.... . .

∣∣∣∣∣∣∣∣= (2ν − b)n−1(2ν + (n− 1)b),

Similarly, for a circulant matrix (by applying the Sherman-Morrison formula) we get for the inverse

−∆H (q∗)−1 =

2ν b b . . .b 2ν b . . .b b 2ν...

.... . .

−1

=1

2ν2 + 2(n− 2)νb− (n− 1)b2

2ν + (n− 2)b −b −b . . .−b 2ν + (n− 2)b −b . . .−b −b 2ν + (n− 2)b...

.... . .

.

In particular, for large n we see that the off-diagonal elements vanish. As q is asymptotically normallydistributed with mean q∗ and variance − 1

ϑ∆H (q∗)−1, this implies that, in the limit of n → ∞, the

individual firms’ output levels become independent. The diagonal entries are given by

σ2 = − 1

ϑ(∆H (q∗))−1

ii =1

ϑ

2ν + (n− 2)b

4ν2 + 2(n− 2)νb− (n− 1)b2.

Finally, in the limit of ϑ→ ∞, the output of firm i converges to a constant random variable q∗ with pointmass density δ(q − q∗).

An illustration of µϑ(q) of Equation (59) together with the simulation results for n = 50, η = 150,b = 0.5, ν = 10, ρ = 1 , ϑ = 0.5 and ζ = 60 can be seen in Figure 9. The figure shows that theanalytic prediction reproduces the simulation results fairly well even for small values of n.

52

Page 53: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Proof of Proposition 7. The expected number of links can be obtained as follows

Eϑ(m) =

G∈Gn

Qn

mµϑ(q, G)dq =1

G∈Gn

Qn

meϑΦ(q,G)︸ ︷︷ ︸

− 1ϑ

∂∂ζeϑΦ(q,G)

dq

= − 1

ϑ

1

∂Zϑ

∂ζ=

1

ϑ

∂Fϑ

∂ζ.

From the Laplace approximation in Equation (54) we find that

∂Fϑ

∂ζ= −ϑ∂H (q∗)

∂ζ+

1

2

∂ζln

∣∣∣∣∣

(∂2H

∂qi∂qj

)

qi=q∗

∣∣∣∣∣

= −ϑ∂H (q∗)

∂ζ+

1

2tr

((∂2H

∂qi∂qj

)−1∂

∂ζ

(∂2H

∂qi∂qj

))

qi=q∗

,

where we have used Jacobi’s formula [cf. e.g. Bellman, 1970]. Consequently, the expected number of linksis

Eϑ(m) = −∂H (q∗)

∂ζ+

1

2ϑtr

((∂2H

∂qi∂qj

)−1∂

∂ζ

(∂2H

∂qi∂qj

))

qi=q∗

.

Further, we have that

∂H

∂ζ= −1

2

n∑

i=1

n∑

j>i

(1 + tanh

2(ρqiqj − ζ)

)),

and in the symmetric equilibrium this is

∂H

∂ζ

∣∣∣∣qi=q

= −n(n− 1)

4

(1 + tanh

2

(ρq2 − ζ

))).

The expected number of links can then be written as

Eϑ(m) =

n(n− 1)

2

(1 + tanh

2

(ρq2 − ζ

)))+

1

2ϑtr

((∂2H

∂qi∂qj

)−1∂

∂ζ

(∂2H

∂qi∂qj

))

qi=q∗

.

Using the fact that

ρ

2

(1 + tanh

2

(ρq2 − ζ

))2)

= b + 2ν∗ − η∗

q,

where ν∗ = νn−1 and η∗ = η

n−1 , we can write

∂H

∂ζ

∣∣∣∣qi=q

= −n(n− 1)

(b + 2ν∗ − η∗

q

).

In the limit of ϑ→ ∞ in the low equilibrium, where q = η∗

b+2ν∗ and therefore η∗

q= b+ 2ν∗, we then get

∂H

∂ζ

∣∣∣∣qi=q

= 0.

In contrast, in the limit of ϑ → ∞ in the high equilibrium, where q = η∗

b+2ν∗−ρ , andη∗

q= b + 2ν∗ − ρ we

find that∂H

∂ζ

∣∣∣∣qi=q

= −n(n− 1)

2.

Moreover, from Equation (55) we further have that

∂2H

∂q2i= −2ν +

ϑρ2

4

n∑

j 6=i

q2j

(1− tanh

2(ρqiqj − ζ)

)2),

53

Page 54: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

and for j 6= i we have that

∂2H

∂qi∂qj= −b+ ρ

2

(1 + tanh

2(ρqiqj − ζ)

))(1 +

ϑρ

2qiqj

(1− tanh

2(ρqiqj − ζ)

))).

Further, the derivatives with respect to ζ are given by

∂ζ

∂2H

∂q2i=ϑ2ρ2

4

n∑

j 6=i

tanh

2(ρqiqj − ζ)

)(1− tanh

2(ρqiqj − ζ)

)2),

and for j 6= i

∂ζ

∂2H

∂qi∂qj= −ϑρ

4

(1− tanh

2(ρqiqj − ζ)

)2)(

1− ϑρqiqj tanh

2(ρqiqj − ζ)

)).

Imposing symmetry, qi = q for all i = 1, . . . , n, we then can write

∂ζ

∂2H

∂q2i

∣∣∣∣qi=q

=(n− 1)ϑ2ρ2

4tanh

2

(ρq2 − ζ

))(1− tanh

2

(ρq2 − ζ

))2),

and

∂ζ

∂2H

∂qi∂qj

∣∣∣∣qi=qj=q

= −ϑρ4

(1− tanh

2

(ρq2 − ζ

))2)(

1− ϑρq2 tanh

2

(ρq2 − ζ

))).

For a circulant matrix (by applying the Sherman-Morrison formula) we have that

a b b . . .b a b . . .b b a...

.... . .

−1

=1

a2 + (n− 2)ab− (n− 1)b2

a+ (n− 2)b −b −b . . .−b a+ (n− 2)b −b . . .−b −b a+ (n− 2)b...

.... . .

,

and

tr

c d d . . .d c d . . .d d c...

.... . .

e f f . . .f e f . . .f f e...

.... . .

= n(ce + (n− 1)df),

so that

tr

a b b . . .b a b . . .b b a...

.... . .

−1

e f f . . .f e f . . .f f e...

.... . .

=

n((a+ (n− 2)b)e− (n− 1)bf)

a2 + (n− 2)ab− (n− 1)b2.

The expected number of links can then be written as follows:

Eϑ(m) =

n(n− 1)

2

(1 + tanh

2

(ρq2 − ζ

)))+

1

n((c1 + (n− 2)c2)c3 − (n− 1)c2c4)

c21 + (n− 2)c1c2 − (n− 1)c22,

where

c1 ≡ −2ν + (n− 1)ϑρ2q2

4

(1− tanh

2

(ρq2 − ζ

))2),

c2 ≡ −b+ ρ

2

(1 + tanh

2

(ρq2 − ζ

)))(1 +

ϑρq2

2

(1− tanh

2

(ρq2 − ζ

)))),

c3 ≡ (n− 1)ϑ2ρ2

4tanh

(1− tanh

2

(ρq2 − ζ

))2),

c4 ≡ −ρϑ4

(1− tanh

2

(ρq2 − ζ

))2)(

1− ϑρq2 tanh

2

(ρq2 − ζ

))).

54

Page 55: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

We next consider the limit ϑ→ ∞. Using the fact that

(1− tanh

2

(ρq2 − ζ

))2)

=

(b+ 2ν∗ − η∗

q

)(ρ− b− 2ν∗ +

η∗

q

),

andρ

2tanh

2

(ρq2 − ζ

))= b + 2ν∗ − η∗

q− ρ

2,

where ν∗ = νn−1 and η∗ = η

n−1 , we can write

∂ζ

∂2H

∂q2i

∣∣∣∣qi=q

= (n− 1)ϑ2(b+ 2ν∗ − η∗

q

)(ρ− b− 2ν∗ +

η∗

q

)(2

ρ

(b+ 2ν∗ − η∗

q

)− 1

), (62)

and

∂ζ

∂2H

∂qi∂qj

∣∣∣∣qi=qj=q

= −ϑρ

(b+ 2ν∗ − η∗

q

)(ρ− b− 2ν∗ +

η∗

q

)(1− 2ϑq2

(b+ 2ν∗ − η∗

q− ρ

2

)). (63)

Finally, in the limit of ϑ→ ∞, for both, the low equilibrium, where q = η∗

b+2ν∗ and therefore η∗

q= b+2ν∗,

as well as the high equilibrium, where q = η∗

b+2ν∗−ρ , andη∗

q= b+2ν∗ − ρ we find from Equation (62) that

∂ζ

∂2H

∂q2i

∣∣∣∣qi=q

= 0,

and from Equation (63) we get∂

∂ζ

∂2H

∂qi∂qj

∣∣∣∣qi=qj=q

= 0.

Hence, we find that in the high equilibrium limϑ→∞ Eϑ(m) = n(n−1)

2 , while in the low equilibrium

limϑ→∞ Eϑ(m) = 0. Consequently, the expected average degree in the high equilibrium is limϑ→∞ E

ϑ(1n

∑ni=1 di

)=

n− 1, where we have a complete graph, and zero in the low equilibrium where we obtain an empty graph.

Remark 3. We note that it is also possible to compute other statistics of interest using the log-partitionfunction lnZϑ. Noting that the potential function can be written as Φ(q, G) = θ⊤X(q, G), with the vectorof parameters θ = ϑ(η, ν, b, ρ, ζ)⊤ and the data vector X(q, G) from Equation (53) we have that (see e.g.Proposition 3.1 in Wainwright and Jordan [2008])

∂ lnZϑ

∂θi= E

ϑ(Xi),

∂2 lnZϑ

∂θi∂θj= E

ϑ(XiXj)− Eϑ(Xi)E

ϑ(Xj).

From this it follows that the expected number of links is Eϑ(m) = ∂ lnZϑ

∂θ5= 1

ϑ∂Fϑ

∂ζ, the variance is Varϑ =

∂2 lnZϑ

∂θ25= 1

ϑ2∂2

∂ζ2, the average output is E

ϑ(1n

∑ni=1 qi

)= 1

n∂ lnZϑ

∂θ1= − 1

nϑ∂Fϑ

∂ηand the variance is

Varϑ(1n

∑ni=1 qi

)= 1

n2∂2 lnZϑ

∂θ21= − 1

n2ϑ2∂2

∂η2.

Proof of Proposition 8. Let each firm i have an output level q distributed identically and indepen-dently with density µϑ(q) given by Equation (59) and converging to δ(q−q∗) in the limit ϑ→ ∞. Further,let the probability of a link between a firm with output q and a firm with output q′ be given by

p(q, q′) =eϑ(ρqq

′−ζ)

1 + eϑ(ρqq′−ζ)=

g(q, q′)

1 + g(q, q′),

where we have denoted by g(q, q′) ≡ eϑ(ρqq′−ζ). We then have a special case of an inhomogeneous ran-

dom graph with linking probability p(q, q′) [cf. Boguna and Pastor-Satorras, 2003; Britton et al., 2006;

55

Page 56: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Soderberg, 2002]. The probability of observing the network G, given the output levels q is then given by

P (G|q) =n∏

i=1

n∏

j=i+1

(g(qi, qj)

1 + g(qi, qj)

)aij ( 1

1 + g(qi, qj)

)1−aij

=n∏

i=1

n∏

j=i+1

1

1 + g(qi, qj)

n∏

i=1

n∏

j=i+1

g(qi, qj)aij ,

which can be written as

P (G|q) = γ(q)

n∏

i=1

n∏

j=i+1

g(qi, qj)aij ,

with the normalizing constant

γ(q) ≡n∏

i=1

n∏

j=i+1

(1 + g(qi, qj)) .

Since∑G∈Gn P (G|q) = 1, γ(q) can also be written as

γ(q) =∑

G∈Gn

n∏

i=1

n∏

j=i+1

g(qi, qj)aij .

Next, we consider the probability generating function of the vector of degrees, (di(G))ni=1, given by

E

(n∏

i=1

xdi(G)i

∣∣∣∣∣q)

= E

n∏

i=1

n∏

j=i+1

(xixj)aij

∣∣∣∣∣∣q

(64)

=∑

G∈Gn

P (G|q)n∏

i=1

n∏

j=i+1

(xixj)aij

=1

γ(q)

G∈Gn

n∏

i=1

n∏

j=i+1

g(qi, qj)aij

n∏

i=1

n∏

j=i+1

(xixj)aij

=1

γ(q)

G∈Gn

n∏

i=1

n∏

j=i+1

(g(qi, qj)xixj)aij

=

∑G∈Gn

∏ni=1

∏nj=i+1 (g(qi, qj)xixj)

aij

∏ni=1

∏nj=i+1(1 + g(qi, qj))

=

n∏

i=1

n∏

j=i+1

1 + g(qi, qj)xixj1 + g(qi, qj)

, (65)

where we have used the fact that∑

G∈Gn

∏ni=1

∏nj=i+1 (g(qi, qj)xixj)

aij =∏ni=1

∏nj=i+1 (1 + g(qi, qj)xixj).

To compute the generating function of d1(G), we simply set xi = 1 for all i > 1. Then

E

(xd1(G)1

)= E

(E

(xd1(G)1

∣∣∣ q1))

= E

E

n∏

j=2

1 + g(q1, qj)x11 + g(q1, qj)

∣∣∣∣∣∣q1

= E

((E

(1 + g(q1, q2)x11 + g(q1, q2)

∣∣∣∣ q1))n−1

),

where we have used symmetry and the independence of q1, . . . , qn. Further, note that

1 + xy

1 + x= 1+ (y − 1)x+O

(x2).

56

Page 57: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Hence, for g(q1, q2) small in the sparse graph limit, we can write

E

(1 + g(q1, q2)x11 + g(q1, q2)

∣∣∣∣ q1)

=

Q

1 + g(q1, q2)x11 + g(q1, qj)

µϑ(dq2)

= 1 + (x1 − 1)

Q

g(q1, q2)µϑ(dq2) + o(1)

= 1 + (x1 − 1)ν(q1) + o(1),

where we have denoted by ν(q) ≡∫Q g(q, q

′)µϑ(dq′). It then follows that

E

(xd1(G)1

)= E

((1 + (x1 − 1)ν(q1))

n−1)(1 + o(1))

= E

(e(x1−1)(n−1)ν(q1)

)(1 + o(1)) ,

where we have used the fact that e(x1−1)ν(q) = 1 + (x1 − 1)ν(q) + o(1). This is the probability gener-ating function of a mixed Poisson random variable with mixing parameter ν(q) [cf. e.g. Van Der Hof-stad, 2009]. In particular, since p(q, q′) = g(q, q′) + o(1), we can write nν(q) = n

∫Q p(q, q

′)µϑ(dq′) =∑nj=1

∫Q p(q, qj)µ

ϑ(dqj) =∑n

j=1 P (a1j = 1| q1 = q) = E (d1(G)| q1 = q), which is the expected degree of a

firm with output q, and we denote it by d(q). Further, it then follows that

E

(xd1(G)1

)=

n∑

k=0

xk1P (d1(G) = k)

= E

(e(x1−1)d(q1)

)(1 + o(1))

= E

(e−d(q1)

n∑

k=0

(x1d(q1))k

k!

)(1 + o(1))

n∑

k=0

xk1E

(e−d(q1)d(q1)

k

k!

)(1 + o(1)) .

Let the empirical degree distribution be given by Pϑ(k) = 1n

∑ni=1 1di(G)=k, and denote by Pϑ(k) ≡

E(Pϑ(k)

). Then we have that

Pϑ(k) = P (d1(G) = k) = E

(e−d(q1)d(q1)

k

k!

)(1 + o(1)) .

When the probability measure µϑ concentrates on q∗, that is limϑ→∞ µϑ = δ(q − q∗), we obtain

limϑ→∞

Pϑ(k) =e−d(q

∗)d(q∗)k

k!(1 + o(1)) ,

where d(q∗) = np(q∗, q∗), that is, a Poisson distribution with parameter d(q∗). To show independence ofthe degrees, let xi = 1 for all i > m > 1 in the generating function of Equation (65). Then we find that

E

(m∏

i=1

xdi(G)i

)=

m∏

i=1

E

(e(xi−1)d(qi)

)(1 + o(1)) .

Hence, the joint generating function of (d1(G), . . . , dm(G)) factorizes into the product of generating func-tions of mixed Poisson random variables, and thus the di(G) are asymptotically independent.

Proof of Proposition 9. The average nearest neighbor degree is defined as [cf. Boguna and Pastor-Satorras, 2003]

knn(k) =n∑

k′=1

k′P (d2(G) = k′ − 1|a12 = 1, d1(G) = k) .

57

Page 58: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Using the fact that

P (d2(G) = k′ − 1|a12 = 1, d1(G) = k) =P (d2(G) = k′ − 1, d1(G) = k| a12 = 1)

P (d1(G) = k),

and denoting by P (k) = P (d1(G) = k), we obtain

knn(k) =1

P (k)

n∑

k′=1

k′P (d2(G) = k′ − 1, d1(G) = k| a12 = 1)

=1

P (k)

n∑

k′=1

k′∫

Q

dq

Q

dq′P (d2(G) = k′ − 1, d1(G) = k|a12 = 1, q1 = q, q2 = q′)

× P (q1 = q, q2 = q′| a12 = 1)

=1

P (k)

n∑

k′=1

k′∫

Q

dq

Q

dq′P (d2(G) = k′ − 1| q2 = q′)P (d1(G) = k| q1 = q)

× P (q2 = q′| a12 = 1, q1 = q)µϑ(q),

where µϑ(q) = P (q1 = q). Further, from Bayes’ theorem it follows that

P (d1(G) = k| q1 = q)µϑ(q1)

P (d1(G) = k)= P (q1 = q| d1(G) = k) ,

so that we can write

knn(k) =

n∑

k′=1

k′∫

Q

dq

Q

dq′P (d2(G) = k′ − 1| q2 = q′)P (q1 = q| d1(G) = k)P (q2 = q′| a12 = 1, q1 = q) .

Next, we have that

P (q2 = q′| a12 = 1, q1 = q) =P (a12 = 1| q1 = q, q2 = q′)µϑ(q′)

P (a12 = 1| q1 = q)

=P (a12 = 1| q1 = q, q2 = q′)µϑ(q′)∫

Q P (a12 = 1| q1 = q, q2 = q′′)µϑ(dq′′),

and we get

knn(k) =

n∑

k′=1

k′∫

Q

dq

Q

dq′P (d2(G) = k′ − 1| q2 = q′)P (q1 = q| d1(G) = k)

× P (a12 = 1| q1 = q, q2 = q′)µϑ(q′)∫Q P (a12 = 1| q1 = q, q2 = q′′)µϑ(dq′′)

.

In the following we denote by g(k|q) ≡ P (d1(G) = k| q1 = q), and let

p(q′|q) ≡ P (a12 = 1| q1 = q, q2 = q′)µϑ(q′)∫Q P (a12 = 1| q1 = q, q2 = q′′)µϑ(dq′′)

=p(q, q′)µϑ(q′)∫

Q p(q, q′′)µϑ(dq′′)

=p(q, q′)µϑ(q′)

d(q),

where we have denoted by d(q) ≡∫Q p(q, q

′′)µϑ(dq′′). With this notation we can write

P (q1 = q| d1(G) = k) =1

P (k)P (d1(G) = k| q1 = q)µϑ(q) =

g(k|q)µϑ(q)P (k)

,

58

Page 59: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

so that

knn(k) =1

P (k)

n∑

k′=1

k′∫

Q

dq

Q

dq′g(k′ − 1|q′)g(k|q)p(q′|q)µϑ(q)

= 1 +1

P (k)

n∑

k′=1

k′∫

Q

dq

Q

dq′g(k′|q′)g(k|q)p(q′|q)µϑ(q)

= 1 +1

P (k)

Q

dqg(k|q)µϑ(q)∫

Q

dq′p(q′|q)n∑

k′=1

k′g(k′|q′).

Denoting by d(q) ≡ n∫Q p(q, q

′)µϑ(dq′) =∑n

k′=1 k′g(k′|q′), and

knn(q) ≡∫

Q

dq′p(q′|q)n∑

k′=1

k′g(k′|q′) =∫

Q

dq′p(q′|q)d(q′),

we can write the average nearest neighbor connectivity in the form introduced in Boguna and Pastor-Satorras [2003], which is given by

knn(k) = 1 +1

P (k)

Q

dqg(k|q)µϑ(q)knn(q).

When the probability measure µϑ(q) concentrates on q∗, that is limϑ→∞ µϑ(q) = δ(q − q∗), we obtain

d(q) = n

Q

dq′p(q, q′)δ(q − q∗) = np(q, q∗),

knn(q) =

Q

dq′np(q′, q∗)np(q, q′)δ(q′ − q∗)

np(q, q∗)p(q′|q)δ(q − q∗) = np(q∗, q∗),

P (k) = Pois(k, np(q∗, q∗)),

g(k|q) = Pois(k, np(q, q∗)),

so that

knn(k) = 1 +1

P (k)

Q

dqg(k|q)δ(q − q∗)knn(q)

= 1 +1

P (k)g(k|q∗)knn(q∗)1 +

Pois(k, np(q∗, q∗))np(q∗, q∗)

Pois(k, np(q∗, q∗))

= 1 + np(q∗, q∗)

= 1 + d(q∗).

Thus, the average nearest neighbor degree is constant, and independent of the degree k.

Proof of Proposition 10. The clustering coefficient of firm 1 with output q1 = q is given by

C(q) ≡ P (a23 = 1|a12 = 1, a13 = 1, q1 = q)

=

Q

dq′∫

Q

dq′′P (a23 = 1|a12 = 1, a13 = 1, q1 = q, q2 = q′, q3 = q′′)

× P (q2 = q′| a12 = 1, q1 = q)P (q3 = q′′| a13 = 1, q1 = q)

=

Q

dq′∫

Q

dq′′P (a23 = 1| q2 = q′, q3 = q′′)

× P (a12 = 1| q2 = q′, q1 = q)µϑ(q′)

P (a12 = 1| q1 = q)

P (a13 = 1| q3 = q′′, q1 = q)µϑ(q′′)

P (a13 = 1| q1 = q)

=

Q

dq′∫

Q

dq′′P (a23 = 1| q2 = q′, q3 = q′′)

× P (a12 = 1| q2 = q′, q1 = q)µϑ(q′)∫Qdq′′′P (a12 = 1| q1 = q, q2 = q′′′)µϑ(q′′′)

P (a13 = 1| q3 = q′′, q1 = q)µϑ(q′′)∫Qdq′′′P (a13 = 1| q1 = q, q3 = q′′′)µϑ(q′′′)

.

59

Page 60: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

In the following we denote by

p(q′|q) ≡ P (a12 = 1| q2 = q′, q1 = q)µϑ(q′)∫Q dq

′′P (a12 = 1| q1 = q, q2 = q′′)µϑ(q′′)=

p(q, q′)µϑ(q′)∫Q dq

′′p(q, q′′)µϑ(q′′)=np(q, q′)µϑ(q′)

nd(q),

so that we can write

C(q) =

Q

dq′∫

Q

dq′′p(q′, q′′)p(q′|q)p(q′′|q).

The clustering coefficient of a firm with degree k can then be written as

C(k) = P (a23 = 1| a12 = 1, a13 = 1, d1(G) = k)

=

Q

dqP (a23 = 1| a12 = 1, a13 = 1, q1 = q)P (q1 = q| d1(G) = k)

=

Q

dqP (q1 = q| d1(G) = k) C(q)

=

Q

dqP (d1(G) = k| q1 = q)µϑ(q)

P (d1(G) = k)C(q)

=1

P (k)

Q

dqg(k|q)µϑ(q)C(q),

which is the general form we can find in Boguna and Pastor-Satorras [2003].When the probability measure µϑ concentrates on q∗, that is limϑ→∞ µϑ = δ(q − q∗), we obtain

d(q) = n

Q

dq′p(q, q′)δ(q − q∗) = np(q, q∗),

p(q′|q) = p(q, q′)δ(q′ − q∗)

p(q, q∗),

P (k) = Pois(k, np(q∗, q∗)),

so that

C(q) =

Q

dq′∫

Q

dq′′p(q′, q′′)p(q, q′)δ(q′ − q∗)

p(q, q∗)

p(q, q′′)δ(q′′ − q∗)

p(q, q∗)= p(q∗, q∗),

and

C(k) =1

P (k)

Q

dqPois(k, np(q, q∗))δ(q − q∗)p(q∗, q∗)

=Pois(k, np(q∗, q∗))

Pois(k, np(q∗, q∗))p(q∗, q∗)

= p(q∗, q∗)

=d(q∗)

n,

which is constant and independent of the degree k, and decreasing with increasing n.

Figure 10 shows the average nearest neighbor connectivity, knn(d), and the clustering degreedistribution, C(d), in the stationary state. Both distributions approach a constant in the infinitenetwork size limit.

Proof of Proposition 11. The only network dependent part in W (q, G) is the potential functionΦ(q, G). For a given vector of outputs q the network that maximizes the potential is the thresholdgraph G where each link ij ∈ G if and only if ρqiqj > ζ. Hence, we can write welfare reduced to this classof networks as follows

W (q) = ηn∑

i=1

qi +1− 2ν

2

n∑

i=1

q2i −b

2

n∑

i=1

n∑

j 6=i

qiqj +n∑

i=1

j 6=i

(ρqiqj − ζ)1ρqiqj>ζ.

Note that

1ρqiqj>ζ = limϑ→∞

1

2(1 + tanh (ϑ(ρqiqj − ζ))) .

60

Page 61: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

d

knn(d)

100

101

102

100

101

d

C(d)

100

101

102

10−3

10−2

10−1

100

Figure 10: (Left panel) The average nearest neighbor connectivity knn(d), with a dashed lines indicating d and1 + d in the stationary state. (Right panel) The clustering degree distribution C(d) in the stationary state. Bothdistributions approach a constant in the infinite network size limit.

We then can write welfare as limϑ→∞Wϑ, where

Wϑ(q) = η

n∑

i=1

qi +1− 2ν

2

n∑

i=1

q2i −b

2

n∑

i=1

n∑

j 6=i

qiqj +1

2

n∑

i=1

j 6=i

(ρqiqj − ζ) (1 + tanh (ϑ(ρqiqj − ζ))) .

The partial derivative with respect to qi is given by

∂Wϑ

∂qi= η(1−2ν)qi−b

j 6=i

qj+ρ∑

j 6=i

qj tanh (ϑ(ρqiqj − ζ))+ϑρ∑

j 6=i

qj(ρqiqj−ζ)(1− tanh (ϑ(ρqiqj − ζ))

2).

Taking the limit ϑ→ ∞ and using the fact that

limϑ→∞

tanh (ϑx) =

1, if x > 0,0, if x < 0,

andlimϑ→∞

ϑ(1− tanh(ϑx)) = 0,

we can write the FOC as53

limϑ→∞

∂Wϑ

∂qi= η + (1 − 2ν)qi − b

n∑

j 6=i

qj + ρ

n∑

j 6=i

qj1ρqiqj>ζ = 0.

We next consider a symmetric solution qi = q for all i = 1, . . . , n with the property that ρq2 > ζ. Then

η + (1− 2ν)q + (ρ− b)(n− 1)q = 0,

from which we deduce that

q =η∗

b+ 2ν∗ − ρ,

where we have denoted by η∗ = η/(n− 1) and ν∗ = ν/(n− 1). The corresponding network is a completegraph, Kn. Further, for a symmetric solution with ρq2 < ζ we must have that

η + (1− 2ν)q − b(n− 1)q = 0,

from which we deduce that

q =η∗

b+ 2ν∗.

53Note that W ϑ is a real valued function that converges pointwise and whose derivatives converge uniformly ona closed interval [0, q] so that we can exchange the derivative with the limit.

61

Page 62: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

5 10 15 20 25

4

6

8

10

12

14

16

18

n1

WHn

1L

Ζ1

Ζ2

Ζ3

5 10 15 20 25

-14

-12

-10

-8

-6

-4

-2

0

n1

Figure 11: (Left panel) The function W (n1) in Equation (67) for varying values of n1 with ζ = 0.1, b = 0.05,ν = 10, ρ = 0.01 and η = 10. (Right panel) The LHS and RHS of Equation (68) for varying values of n1 and thesame parameter values but three different values of ζ. Dashed lines indicate the W (n1) maximizing value of n1.

The corresponding network is an empty graph, Kn.Next, we consider a solution with two output levels q1 > 0 and q2 = 0 such that ρq21 > ζ.54 Assume

that the number of firms with output q1 > 0 is given by n1. The FOC are then given by

η + (1− 2ν)q1 − b(n1 − 1)q1 + ρ(n1 − 1)q1 = 0,

from which we obtainq1 =

η

(b− ρ)(n1 − 1) + 2ν − 1. (66)

Welfare can be written as

W = ηn1q1 +1

2n1(1− 2ν − (n1 − 1)b)q21 + n1(n1 − 1)(ρq21 − ζ).

Inserting the output levels q1 from Equation (66) yields

W =η2n1(b(n1 − 1) + 2ν − 1)

2((b− ρ)(n1 − 1) + 2ν − 1)2− ζn1(n1 − 1). (67)

The corresponding network is then the dominant group architecture Dn1,n−n1 consisting of a clique of sizen1 in which all firms are connected and all remaining firms being isolated, and n1 maximizes Equation(67), and satisfying

ρη2

((b− ρ)(n1 − 1) + 2ν − 1)2> ζ.

The function W (n1) is concave, and its maximum is the solution to the following equation

− ζ(2n1 − 1) =η2(ρ((b − 2ν + 1)(b(n− 1) + 2ν − 1)− b(n1 − 1) + (2ν − 1)(n+ 1)))

2((b− ρ)(n1 − 1) + 2ν − 1)3. (68)

We observe that the RHS of Equation (68) is independent of ζ, while the LHS is linear in ζ and decreasing.Hence, the intersection where the LHS and the RHS are equal is decreasing with increasing ζ, that is, theoptimal size n1 of the clique is decreasing with the linking cost ζ. This is illustrated in Figure 11. Similarcomparative statics results for ρ, η and b can be derived. The optimal size n1 maximizing Equation (67)for varying values of ζ, b, ρ and η is illustrated in Figure 12. The above results can be summarized inEquation (26).

54One can show that a solution with q2 > 0 can be ruled out.

62

Page 63: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

0.2 0.4 0.6 0.8 1.02

4

6

8

Ζ

n 1

0.00 0.02 0.04 0.06 0.08 0.10

12.5

13.0

13.5

14.0

b

n 1

0.00 0.02 0.04 0.06 0.08 0.1013

14

15

16

17

Ρ

n 1

0 2 4 6 8 100

2

4

6

8

10

12

Η

n 1

Figure 12: The optimal size n1 maximizing Equation (67) for varying values of ζ, b, ρ and η with n = 100.

63

Page 64: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Supplement to “Network Formation with Local Complements and Global

Substitutes: The Case of R&D Networks”

Chih-Sheng Hsieha, Michael D. Konigb, Xiaodong Liuc

aDepartment of Economics, Chinese University of Hong Kong, CUHK Shatin, Hong Kong, China.bDepartment of Economics, University of Zurich, Schonberggasse 1, CH-8001 Zurich, Switzerland.

cDepartment of Economics, University of Colorado Boulder, Boulder, Colorado 80309–0256, United States.

B. Definitions and Characterizations

A network (graph) G is the pair (N , E) consisting of a set of nodes (vertices) N = 1, . . . , n and aset of edges (links) E ⊂ N × N between them. A link (i, j) is incident with nodes i and j. Theneighborhood of a node i ∈ N is the set Ni = j ∈ N : (i, j) ∈ E. The degree di of a node i ∈ Ngives the number of links incident to node i. Clearly, di = |Ni|. Let N (2)

i =⋃j∈Ni

Nj\ (Ni ∪ i)denote the second-order neighbors of node i. Similarly, the k-th order neighborhood of node i is

defined recursively from N (0)i = i, N (1)

i = Ni and N (k)i =

⋃j∈N

(k−1)i

Nj\(⋃k−1

l=0 N (l)i

). A walk in G

of length k from i to j is a sequence 〈i0, i1, . . . , ik〉 of nodes such that i0 = i, ik = j, ip 6= ip+1, andip and ip+1 are (directly) linked, that is ipip+1 ∈ E, for all 0 ≤ p ≤ k − 1. Nodes i and j are said tobe indirectly linked in G if there exists a walk from i to j in G containing nodes other than i and j.A pair of nodes i and j is connected if they are either directly or indirectly linked. A node i ∈ Nis isolated in G if Ni = ∅. The network G is said to be empty (denoted by Kn) when all its nodesare isolated.

A subgraph, G′, of G is the graph of subsets of the nodes, N (G′) ⊆ N (G), and links, E(G′) ⊆E(G). A graph G is connected, if there is a path connecting every pair of nodes. Otherwise G isdisconnected. The components of a graph G are the maximally connected subgraphs. A componentis said to be minimally connected if the removal of any link makes the component disconnected.

A dominating set for a graph G = (N , E) is a subset S of N such that every node not in Sis connected to at least one member of S by a link. An independent set is a set of nodes in agraph in which no two nodes are adjacent. For example the central node in a star K1,n−1 formsa dominating set while the peripheral nodes form an independent set.

Let G = (N , E) be a graph whose distinct positive degrees are d(1) < d(2) < . . . < d(k), andlet d0 = 0 (even if no agent with degree 0 exists in G). Further, define Di = v ∈ N : dv = d(i)for i = 0, . . . , k. Then the set-valued vector D = (D0,D1, . . . ,Dk) is called the degree partition ofG. Consider a nested split graph G = (N , E) and let D = (D0,D1, . . . ,Dk) be its degree partition.Then the nodes N can be partitioned in independent sets Di, i = 1, . . . ,

⌊k2

⌋and a dominating

set⋃ki=⌊ k

2 ⌋+1 Di in the graph G′ = (N\D0, E). Moreover, the neighborhoods of the nodes are

nested. In particular, for each node v ∈ Di, Nv =⋃ij=1 Dk+1−j if i = 1, . . . ,

⌊k2

⌋if i = 1, . . . , k, while

Nv =⋃ij=1 Dk+1−j \ v if i =

⌊k2

⌋+ 1, . . . , k.

In a complete graph Kn, every node is adjacent to every other node. The graph in which nopair of nodes is adjacent is the empty graph Kn. A clique Kn′ , n′ ≤ n, is a complete subgraphof the network G. A graph is k-regular if every node i has the same number of links di = k forall i ∈ N . The complete graph Kn is (n − 1)-regular. The cycle Cn is 2-regular. In a bipartite

graph there exists a partition of the nodes in two disjoint sets V1 and V2 such that each linkconnects a node in V1 to a node in V2. V1 and V2 are independent sets with cardinalities n1 andn2, respectively. In a complete bipartite graph Kn1,n2 each node in V1 is connected to each othernode in V2. The star K1,n−1 is a complete bipartite graph in which n1 = 1 and n2 = n− 1.

The complement of a graph G is a graph G with the same nodes as G such that any two nodesof G are adjacent if and only if they are not adjacent in G. For example the complement of thecomplete graph Kn is the empty graph Kn.

Let A be the symmetric n × n adjacency matrix of the network G. The element aij ∈ 0, 1indicates if there exists a link between nodes i and j such that aij = 1 if (i, j) ∈ E and aij = 0 if

1

Page 65: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

(i, j) /∈ E. The k-th power of the adjacency matrix is related to walks of length k in the graph. Inparticular,

(Ak)ijgives the number of walks of length k from node i to node j. The eigenvalues of

the adjacency matrix A are the numbers λ1, λ2, . . . , λn such that Avi = λivi has a nonzero solutionvector vi, which is an eigenvector associated with λi for i = 1, . . . , n. Since the adjacency matrixA of an undirected graph G is real and symmetric, the eigenvalues of A are real, λi ∈ R for alli = 1, . . . , n. Moreover, if vi and vj are eigenvectors for different eigenvalues, λi 6= λj , then vi andvj are orthogonal, i.e. v⊤

i vj = 0 if i 6= j. In particular, Rn has an orthonormal basis consistingof eigenvectors of A. Since A is a real symmetric matrix, there exists an orthogonal matrix S

such that S⊤S = SS⊤ = I (that is S⊤ = S−1) and S⊤AS = D, where D is the diagonal matrix ofeigenvalues of A and the columns of S are the corresponding eigenvectors. The Perron-Frobenius

eigenvalue λPF(G) is the largest real eigenvalue of A associated with G, i.e. all eigenvalues λi ofA satisfy |λi| ≤ λPF(G) for i = 1, . . . , n and there exists an associated nonnegative eigenvectorvPF ≥ 0 such that AvPF = λPF(G)vPF. For a connected graph G the adjacency matrix A has aunique largest real eigenvalue λPF(G) and a positive associated eigenvector vPF > 0. There existsa relation between the number of walks in a graph and its eigenvalues. The number of closedwalks of length k from a node i in G to herself is given by

(Ak)iiand the total number of closed

walks of length k in G is tr(Ak)=∑n

i=1

(Ak)ii=∑ni=1 λ

ki . We further have that tr (A) = 0, tr

(A2)

gives twice the number of links in G and tr(A3)gives six times the number of triangles in G.

Finally, a nested split graph is a graph with a nested neighborhood structure such that theset of neighbors of each node is contained in the set of neighbors of each higher degree node[Cvetkovic and Rowlinson, 1990; Mahadev and Peled, 1995]. A nested split graph is characterizedby a stepwise adjacency matrix A, which is a symmetric, binary (n × n)-matrix with elements aijsatisfying the following condition: if i < j and aij = 1 then ahk = 1 whenever h < k ≤ j and h ≤ i.Both, the complete graph, Kn, as well as the star K1,n−1, are particular examples of nested splitgraphs. Nested split graphs are also the graphs which maximize the largest eigenvalue, λPF(G),[Brualdi and Solheid, 1986], and they are the ones that maximize the degree variance [Peled et al.,1999]. See e.g. Konig et al. [2014a] for further properties.

C. Continuous vs. Discrete Multinomial Logit

Consider a discretization Qs = 0,∆q, 2∆q, . . . , q of the interval Q = [0, q] into s subintervals withlength ∆q = q/s. Let the expected profit of firm i from choosing output qi ∈ Qs be given byπi(qi,q−i, G)+ εi. When the error terms εi are independently and identically type-I extreme valuedistributed with parameter ϑ we get [cf. Anderson et al., 1992]

P

(qi = argmax

q′i∈Qs

πi(qi,q−i, G) + εi)

=eϑπi(q

′i,q−i,G)

∑q′i∈Qs

eϑπi(q′i,q−i,G).

Assume that the output adjustment rate is given by χ > 0. Then

P (ωt+∆t = (qi,q−it, Gt)|ωt = (qt, Gt)) = χP

(qi = argmax

q′i∈Qs

πi(q′i,q−i, G) + ε)∆t+ o(∆t)

= χeϑπi(qi,q−it,Gt)

∑q′i∈Qs

eϑπi(q′i,q−it,Gt)∆t+ o(∆t).

2

Page 66: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

The probability to observe a transition to an output level in the set P(q) = qi ∈ Qs : q ≤ qi ≤q +∆q ⊂ Qs with q ∈ [0, q] is then given by

P (ωt+∆t = (q ≤ qi ≤ q +∆q,q−it, Gt)|ωt = (qt, Gt)) = χ

∑qi∈P(q) e

ϑπi(qi,q−it,Gt)

∑qi∈Q e

ϑπi(qi,q−it,Gt)∆t+ o(∆t)

= χ

∑qi∈P(q) e

ϑπi(qi,q−it,Gt)∆q∑

qi∈Q eϑπi(qi,q−it,Gt)∆q

∆t+ o(∆t)

= χeϑπi(q,q−it,Gt)∆q∑

qi∈Q eϑπi(qi,q−it,Gt)∆q

∆t+ o(∆t),

for some q ∈ [q, q +∆q] by the mean value theorem. In the limit of s → ∞ (and ∆q ↓ 0) we thenget for the infinitesimal interval [q, q + dq]

P (ωt+∆t = (q ≤ qi ≤ q + dq,q−it, Gt)|ωt = (qt, Gt)) = χeϑπi(q,q−it,Gt)dq∫

Qeϑπi(q′,q−it,Gt)dq′

∆t+ o(∆t),

which is exactly Equation (12). See also Anderson et al. [2004, 2001]; Ben-Akiva and Watanatada[1981]; McFadden [1976] for further discussion.

D. Equilibrium Analysis for Exogenous Networks

In this section we consider the case of λ = ξ = 0, when the network is exogenously given. Let usfirst assume that ζ = 0, such that firms do not incur a fixed cost for an R&D collaboration. Thenthe best response of firm i can be written as follows (see also Equation (10))

qi = min (q,max (0, fi(q−i, G))) . (69)

Observe that if fi(q−i, G) < 0, then the business stealing effects become so large, that firm i’s bestresponse is to leave the market (qi = 0). If ζ > 0 then firm i’s best response is

qi =

min (q, fi(q−i, G)) , if πi(fi(q−i, G), G) > 0,

0, otherwise.(70)

The set of networks G for which qi > 0 is starkly reduced if ζ > 0, and if ζ becomes large enough,no firm will operate at positive quantity.

We now analyze the best response dynamics of quantities in a fixed network G where firmsadjust their output levels optimally, given the output levels of all other firms in the industry[Corchon and Mas-Colell, 1996; Weibull, 1997]. We also assume that an interior equilibriumexists. This dynamics is given by

dqidt

= fi(q−i, G)− qi =η

2ν− b

n∑

j 6=i

qj +ρ

n∑

j=1

aijqj − qi, (71)

with some appropriate initial conditions q(0) ≥ 0. The equilibrium quantities for a given networkG can be obtained as the fixed points of the best response dynamics. In vector-matrix notationthe dynamics can be written as

dq

dt=

η

2νu− 1

((2ν − b)In + buu⊤ − ρA

)q. (72)

This is an inhomogeneous linear first-order ordinary differential equation with constant coeffi-cients. Let us denote by U = uu⊤ and introduce the matrix

Q ≡ In +b

2ν − bU− ρ

2ν − bA. (73)

3

Page 67: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

The solution of Equation (72) is stable if an only if all eigenvalues of Q have a positive real part.If a stable solution exists and if Q is invertible, then the steady state q∗ = limt→∞ q(t) is given by

q∗ =η

2ν − bQ−1u, (74)

and the solution trajectory is given by

q(t) = q∗ + e−Qt(q(0)− q∗). (75)

If q(0) = 0 then we can writeq(t) =

η

2ν − b

(1− e−Qt

)Q−1u. (76)

We have that Q = In− ρ2ν−bA, and λi(Q) = 1− ρ

2ν−bλi(A). This implies that the stability conditionλmin(Q) < 0 is equivalent to λmax(A) > 2ν−b

ρ.

Observe that the profit function introduced in Equation (8) admits a potential game with acorresponding potential function [cf. Monderer and Shapley, 1996]. This is stated in the followingproposition.

Proposition 12. For a given network G ∈ Gn, the profit function of Equation (8) admits a potentialgame with potential function φ(q,G) : Rn+ × Gn → R given by

φ(q, G) =n∑

i=1

(ηqi − νq2i )−b

2

n∑

i=1

j 6=i

qiqj +ρ

2

n∑

i=1

n∑

j=1

aijqiqj . (77)

The existence of a potential function given in Equation (90) allows us to state a condition for aNash equilibrium in a static network. For a given network G and no fixed linking costs, ζ = 0, aNash equilibrium of our game solves the following constrained optimization problem [cf. Byong-Hun, 1983; Sandholm, 2010]

maxq∈R

n+

φ(q, G)

s.t. ∀i = 1, . . . , n

∂φ

∂qi= 0 and qi > 0, or

∂φ

∂qi≤ 0 and qi = 0. (78)

We can write the potential function φ(q, G) in vector-matrix notation as follows

φ(q, G) = ηu⊤q− 1

2q⊤ ((2ν − b)In + bU− ρA)︸ ︷︷ ︸

=(2ν−b)Q

q. (79)

The Hessian of the potential is given by ∆φ(q, G) =(∂2φ(q,G)∂qi∂qj

)i,j∈N

= −(2ν − b)Q. If 2ν > b and

the matrix Q is positive definite then ∆φ(q, G) < 0 is negative definite.55 The matrix Q is positivedefinite if and only if the matrix B = In − ρ

2ν−bA is positive definite. If B is positive definite thenits inverse B−1 exists and is positive definite. B−1 exists if and only if the following eigenvaluecondition is satisfied

ρ

2ν − b<

1

λmax(A), (80)

where λmax(A) is the largest (real) eigenvalue of the (real and symmetric) adjacency matrix A. Ifthe inequality in (80) is satisfied, then the maximization of φ(q, G) as stated in (78), for a givenG, is a linear-quadratic programming problem [Boyd and Vandenberghe, 2004; Byong-Hun, 1983;

55The n× n matrix Q is positive definite if and only if for all q ∈ Rn+ we have that q⊤Qq > 0. If Q is positive

definite, then all its eigenvalues are positive.

4

Page 68: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Lee et al., 2005], where φ(q, G) is a concave function of q, and this optimization problem has aunique solution.

In the following we assume that the inequality in (80) is satisfied. Then we can obtain firms’equilibrium quantities and profits as follows:

Proposition 13. Denote by ϕ = ρ2ν−b = αβ

2(2−b)γ−α2 and consider a network G ∈ Gn satisfying ϕ <

1/λmax(q).

(i) If ζ = 0, equilibrium output and profit are given by

qi =nη

2ν + b(‖b(G,ϕ)‖ − 1)bi(G,ϕ) (81)

and

πi =νη2

(2ν + b(‖b(G,ϕ)− 1‖))2 b2i (G,ϕ), (82)

for all i = 1, . . . , n.

(ii) If ζ > 0 and πi > ζdi for all i = 1, . . . , n in Equation (82) then equilibrium quantities are given byEquation (81) and equilibrium profits are given by Equation (82) less the cost of collaboration ζdi.

Observe that, in the limit ϕ ↑ λ−1max, the normalized Bonacich centrality converges to the eigen-

vector centrality v, where qv = λmaxv. This implies that

limϕ↑λ−1

max

qi =η

bvi (83)

and

limϕ↑λ−1

max

πi =η2ν

b2v2i − ζdi, (84)

for all i = 1, . . . , n.Note that if the eigenvalue condition (80) is not satisfied, then corner solutions must be

considered.56

In Figure D.1 we give an example of the evolution of q(t) for the star network K1,n−1 withn = 3. The stationary state q∗ for values of ϕ ≤ λmax(q)

−1 is correctly described by Equation(81). Interestingly, this solution is also correct for values of ϕ > λmax(q)

−1 (see bottom left panelin Figure D.1), unless the equilibrium quantities from Equation (81) grow without bound (seebottom right panel in Figure D.1), and qi(t) grows to its capacity constraint q.

E. Extensions

E.1. Heterogeneous Marginal Production Costs

We consider ex ante heterogeneity among firms in the variable cost ci ≥ 0 [see also Banerjeeand Duflo, 2005], expressing their different technological and organizational capabilities.57 Themarginal cost of production of firm i ∈ N is then given by

ci(e, G) = ci − αei − βn∑

j=1

aijej , (85)

where ei ∈ [0, ei] and α, β ∈ [0, 1]. Requiring that ci ≥ 0 we must have that ci ≥∑nj=1 ej = ne for

all i = 1, . . . , n. Hence, ci is O(n). Similarly, as in the previous sections, the first-order conditions

56See Cabrales et al. [2010] for the case of strategic complements and linking strengths proportional to socializa-tion effort.

57Blundell et al. [1995] argued that because the main source of unobserved heterogeneity in models of innovationlies in the different knowledge stocks with which firms enter a sample, a variable that approximates the build-upof firm knowledge at the time of entering the sample is a particularly good control for unobserved heterogeneity.

5

Page 69: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

0 1 2 3 4 50

0.005

0.01

0.015

0.02

0.025

q

t

q1q2

0 2 4 6 8 100

0.01

0.02

0.03

0.04

q

t

q1q2

0 20 40 60 80 1000

0.1

0.2

0.3

0.4

0.5

0.6

q

t

q1q2

0 20 40 60 80 1000

0.5

1

1.5

2x 10

6

qt

q1q2

Figure D.1: A star network with n = 3, λmax(A) =√2 = 1.41 and no linking costs, i.e. ζ = 0. The top left panel

shows the evolution q(t) for ϕ = 1/3 < λmax(A)−1 = 0.70, the top right panel for ϕ = λmax(A)

−1 = 0.70, thebottom left panel for ϕ = 4/3 > λmax(A)

−1 = 0.70, and the bottom right panel for ϕ = 3/2 > λmax(A)−1 = 0.70.

The dashed lines indicate the solutions from Equation (81).

for efforts imply that ei = max α2γ qi, ei. The non-negativity of marginal cost in the case of an

interior equilibrium then requires that qi ≤ 2γαei for all i = 1, . . . , n. We further assume that

a > max1≤i≤nci. We denote by λ = α2γ . Profits of firm i from Equation (7) then become

πi = (a− ci)︸ ︷︷ ︸ηi

qi − bn∑

j 6=i

qiqj + αqiei + βn∑

j=1

aijqiej − γe2i − ζdi. (86)

Using the fact that in the interior equilibrium ei = λqi, we can write firm i’s profit as

πi = ηiqi − νq2i − bqi∑

j 6=i

qj + ρqi

n∑

j=1

aijqj − ζdi. (87)

We first compute the equilibrium quantities for a given network G. The FOC can be written as

∂πi∂qi

= ηi − 2νqi − b∑

j 6=i

qj + ρ

n∑

j=1

aijqj = ηi − (2ν − b)qi − b‖q‖+ ρ

n∑

j=1

aijqj = 0. (88)

Let η ≡ (η1, . . . , ηn)⊤. Then in vector-matrix notation this is

η = ((2ν − b)In − ρq)q + bu‖q‖.

Denoting by µ ≡ 12ν−bη, φ ≡ ρ

2ν−b and κ ≡ b2ν−b this is

q = (I− φq)−1(µ− κ‖q‖)u.

6

Page 70: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Following Calvo-Armengol et al. [2009] we define the µ-weighted Bonacich centrality as

bµ(G,φ) = (In − φq)−1µ =

∞∑

k=0

φkqkµ, (89)

where suitable conditions have to be imposed on the vector µ, the parameter φ and the eigenvalueλPF(G). The Bonacich centrality is then simply given by b(G,φ) = bu(G,φ). Then we can write

q = bµ(G,φ) − κ‖q‖bu(G,φ).

Multiplying from the left with u⊤ gives

u⊤q = ‖q‖ = ‖bµ(G,φ)‖ − ν‖q‖‖bu(G,φ)‖.

from which we get

‖q‖ =‖bµ(G,φ)‖

1 + κ‖bu(G,φ)‖.

It follows that equilibrium quantity can be written as

q = bµ(G,φ) −κ‖bµ(G,φ)‖

1 + κ‖bu(G,φ)‖bu(G,φ).

Note that the profit function of Equation (87) admits a potential function Φ: Rn+ × Gn → R

given by

Φ(q, G) =

n∑

i=1

(ηiqi − νq2i )−b

2

n∑

i=1

j 6=i

qiqj +ρ

2

n∑

i=1

n∑

j=1

aijqiqj − ζm, (90)

where m is the number of links in G. A similar equilibrium characterization using a Gibbsmeasure as in the previous sections can thus be obtained. Note further that the Hamiltonian inthe partition function Zϑ =

∑q∈Qn eϑH (q) in the case of heterogeneous marginal costs is given by

H (q) =n∑

i=1

ηiqi − νq2i +

j>i

(1

ϑln(1 + eϑ(ρqiqj−ζ)

)− bqiqj

) .

When ϑ→ ∞ we then can write

limϑ→∞

H (q) =

n∑

i=1

ηiqi − νq2i +

j>i

(ρqiqj − ζ)1ρqiqj>ζ − bqiqj

.

From the maximization of this expression we find that if the capacity constraints qi are bind-ing, then the stochastically stable state will be a threshold graph (nested split graph) in whicha link ij is present if and only if q(i)q(j) >

ζρand quantities are given by the ordered vector

(q(1), q(2), . . . , q(k), 0, . . . , 0) where k = max1 ≤ j ≤ n : q(1)q(j) >ζρ. In the case of finite ϑ we obtain

a generalized threshold graph as they have been studied in Diaconis et al. [2008]; Ide et al. [2010,2009].

Further, observe that for any network G, the Nash equilibrium output levels from Equation(88) can be written as q =

(In − b

2νB+ ρ2νA

)−1η where B is a matrix of ones with zero diagonal.

Denoting by M =(In − b

2νB+ ρ2νA

)−1with elements M = (mij)1≤i,j≤n we can write this as qi =∑n

j=1mijηj . Hence, the output of firm i can be expressed as a weighted sum of the ηj . Then ifηi follows a stable distribution with a power-law tail then also the weighted sum

∑nj=1mijηj will

follow a stable distribution with a power law tail according to a generalized CLT.58 An illustration

58Following the notation in Nolan [2014], let S(α, β, γ, δ; 0) be the class of stable distributions, with an indexof stability or characteristic exponent α ∈ (0, 2], a skewness parameter β ∈ [−1, 1], the scale parameter γ and thelocation parameter δ. A Cauchy(γ, δ) distribution is stable with α = 1, β = 0 and a Levy(γ, δ) distribution is

7

Page 71: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

100

101

10−3

10−2

10−1

100

η

P(η

)

100

101

10−3

10−2

10−1

q

P(q)

Figure E.1: (Left panel) The distribution P (η) of η following a Pareto distribution with exponent 2. (Right panel)The output distribution P (q).

can be seen in Figure E.1.Further, note that in the limit of ϑ→ ∞ the linking probability between two firms with output

levels q and q′, respectively, is given by

limϑ→∞

pϑ(q, q′) = limϑ→∞

eϑ(ρqq′−ζ)

1 + eϑ(ρqq′−ζ)= 1ρqq′>ζ = 1log q+log q′>log( ζ

ρ ). (91)

When the output levels are power-law distributed, with density f(x) = γc

(cx

)γ+1for x > c, where

c > 0 is a lower-cut-off and γ > 0 is a positive parameter, then the log-ouptut levels, ln q, followan exponential distribution with density f(y) = γcγe−γy.59 The linking probability in Equation(91) then induces an inhomogenous random graph identical to the one analyzed in Boguna andPastor-Satorras [2003]. In particular, the authors show that this random graph is characterized bya power-law degree distribution, a negative clustering degree correlation and a decaying averagenearest neighbor degree distribution indicating a dissortative network. In Section ?? we showhow such network characteristics can also be obtained when firms are heterogeneous in terms oftheir marginal collaboration costs.

E.2. Heterogeneous Marginal Collaboration Costs

In the following we assume that the marginal cost of collaboration between firms i and j can bewritten as ζi(G) =

∑nj=1 aijψij , where the marginal cost of a collaboration between firms i and j

is additively separableψij = si + sj ,

and we assume that the cost si is proportional to the inverse of the firm’s productivity, that issi =

1φi, where φi > 0 is the productivity (or efficiency) of firm i [see e.g. Melitz, 2003; Melitz

et al., 2008]. This implies that firms with higher productivity incur lower collaboration costs.

stable with α = 1/2, β = 1. Moreover, let X ∼ S(α, β, γ, δ; 0) with 0 < α < 2, −1 < β ≤ 1. Then as x → ∞,P(X > x) ∼ γαcα(1 + β)x−α, where cα is a function of α. Hence, stable distributions have power law tails,given certain restrictions on α. Now, if Xj ∼ S(αj , βj , γj , δj ; k), j = 1, . . . , n, independent and arbitrary weightsw1, . . . , wn, then the sum w1X1 +w2X2+ . . .+wnXn ∼ S(α, β, γ, δ; 0) where α, β, γ, δ are functions of αj , βj , γj , δj[Nolan, 2014, Equation (1.8)].

59For a monotone transformation Y = g(X) of a random variable X with density fX(x) the density of Y is given

by fY (y) = fX(g−1(y))∣∣∣dg

−1(y)dy

∣∣∣.

8

Page 72: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

The probability of a link between firms i and j is then given by [cf. Graham, 2014, 2015]60

pϑ(qi, si, qj , sj) =eϑ(ρqiqj−si−sj)

1 + eϑ(ρqiqj−si−sj).

Next, we assume that the firms’ productivities are Pareto distributed [cf. e.g. Gabaix, 2009;Konig et al., 2012; Melitz et al., 2008], with density f(x) = γ

c

(cx

)γ+1with x > c, where c > 0 is

a lower-cut-off and γ > 0 is a positive parameter. The complementary distribution function isthen given by F (x) = 1 −

(cx

)γ. It follows that the cost s = 1

φhas the density f(s) = γcγsγ−1 for

s ∈(0, 1

c

), and the cumulative distribution function F (s) = (cs)

γ .The generating function of the degree d1(G) is given by

E

(xd1(G)1

)= E

(E

(xd1(G)1

∣∣∣ q1, s1))

= E

((E

(1 + p(q1, s1, q2, s2)x11 + p(q1, s1, q2, s2)

∣∣∣∣ q1, s1))n−1

),

With the cost distributed as f(s) = γcγsγ−1 for s ∈(0, 1

c

), we can write

E

(xd1(G)1

∣∣∣ q1 = q, s1 = s)= E

(1 + p(q, s, q2, s2)x11 + p(q, s, q2, s2)

∣∣∣∣ q, s)n−1

= (1 + (x1 − 1)E (p(q, s, q2, s2)| q1 = q, s1 = s))n−1

=

(1 + (x1 − 1)

Q

dq′µϑ(q′)

∫ds′γcγ(s′)γ−1p(q, s, q′, s′)

)n−1

=

(1 + (x1 − 1)

Q

dq′µϑ(q′)

∫ds′γcγ(s′)γ−1 eϑ(ρqq

′−s−s′)

1 + eϑ(ρqq′−s−s′)

)n−1

.

In the limit of ϑ→ ∞ we have that

limϑ→∞

pϑ(q, s, q′, s′) = limϑ→∞

eϑ(ρqq′−s−s′)

1 + eϑ(ρqq′−s−s′)= 1ρqq′>s+s′,

so that we can write

E

(xd1(G)1

∣∣∣ q1 = q, s1 = s)=

(1 + (x1 − 1)

Q

dq′µϑ(q′)

∫ds′γcγ(s′)γ−1

1ρqq′>s+s′

)n−1

=

(1 + (x1 − 1)γcγ

∫ ρqq∗−s

0

ds′(s′)γ−1

)n−1

= (1 + (x1 − 1)cγ (ρqq∗ − s)γ)n−1

= e(x1−1)(n−1)cγ(ρqq∗−s)γ .

This is the generating function of a Poisson random variable with expectation and variance givenby d(q, s) ≡ (n− 1)cγ (ρqq∗ − s)

γ . When the cut-off c is small, the variance becomes small, and wecan approximate the Poisson random variable with a constant random variable at the expectedvalue. Making further a continuum approximation,61 where we treat the degree as a continuous

60In particular, Fafchamps and Gubert [2007]; Graham [2014] consider an econometric network formation model

in which the probability of a link between agents i and j is given by P(aij = 1) = eXi+Xj+Z⊤

ijβ

1+eXi+Xj+Z⊤

ijβ

where Xi is an

agent specific fixed effect and Zij is a vector of pair specific covariates. Similarly, Chatterjee et al. [2011] analyze

a network formation model with linking probability P(aij = 1) = eXi+Xj

1+eXi+Xj

.61This is an approximation which has shown to be accurate in various network formation models as the network

size becomes large [Dorogovtsev and Mendes, 2013, pp. 117].

9

Page 73: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

variable, we can write

P (d1(G) = k| q1 = q, s1 = s) = δ(k − d(q, s)

)= δ (k − (n− 1)cγ (ρqq∗ − s)

γ) .

Note that under the continuum approximation there exists a one-to-one mapping from the degreek to the cost s, where for a given k and output q, the cost s is given by

s = ρqq∗ −(

k

(n− 1)cγ

) 1γ

.

Using the fact that62

δ(k − (n− 1)cγ

(ρ(q∗)2 − s

)γ)= δ

(s−

(ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

))1

γk

(k

(n− 1)cγ

) 1γ

. (92)

and assuming that the output distribution concentrates on q∗, the degree distribution is given by

P (k) =

∫dsP (d1(G) = k| q1 = q∗, s1 = s) f(s)

= γcγ∫dsδ

(s− (n− 1)cγ

(ρ(q∗)2 − s

)γ)sγ−1

= γcγ∫dsδ

(k −

(ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

))1

γk

(k

(n− 1)cγ

) 1γ

sγ−1

=cγ

k

(ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

)γ−1(k

(n− 1)cγ

) 1γ

= O(k−

γ−1γ

), (93)

Hence, we obtain a power-law degree distribution with parameter γ−1γ, consistent with previous

empirical studies which have found power-law degree distributions in R&D alliance networks [e.g.Powell et al., 2005]. An illustration can be seen in Figure E.2 for the case of γ = 2 and n = 200

firms.Next we compute the average nearest neighbor degree distribution

knn(k) = 1 +1

P (k)

∫ds

Q

dqf(s)µϑ(q)g(k|q, s)knn(q, s),

where

knn(q, s) =

∫ds′∫

Q

dq′p(q′, s′|q, s)d(q′, s′),

g(k|q, s) = P (d1(G) = k| q1 = q, s1 = s) = δ(k − d(q, s)

),

d(q, s) = (n− 1)cγ (ρqq∗ − s)γ

p(q′, s′|q, s) = (n− 1)p(q, s, q′, s′)f(s′)µϑ(q′)

d(q, s),

limϑ→∞

pϑ(q, s, q′, s′) = limϑ→∞

eϑ(ρqq′−s−s′)

1 + eϑ(ρqq′−s−s′)= 1ρqq′>s+s′

µϑ(q) = δ(q − q∗)

f(s) = γcγsγ−1.

62When g(x) is a continuously differentiable function in R it holds that δ(g(x)) =∑m

i=1δ(x−xi)|g′(xi)|

where the m

roots xi satisfy g(xi) = 0 for all i = 1, . . . ,m.

10

Page 74: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

10−1

100

101

102

10−6

10−4

10−2

100

s

P(s)

100

101

102

103

10−4

10−3

10−2

10−1

100

d

P(d)

100

101

102

103

101

102

d

knn(d)

100

101

102

103

10−1

100

d

C(d)

Figure E.2: (Top left panel) The empirical and the theoretical cumulative cost distribution F (s) = (cs)γ withγ = 2 and c = 0.02. The empirical distribution is indicated with circles and the theoretical distribution with adashed line. (Top right panel) The degree distribution P (d). The dashed line indicates the theoretical predictionof Equation (93). (Bottom left panel) The average nearest neighbor degree distribution knn(d), decreasing withincreasing degrees d and thus indicating a dissortative network. The dashed line indicates the theoretical predictionof Equation (94). (Bottom right panel) The clustering degree distribution C(d), decreasing with increasing degreed. The parameters used are b = 0.75, ν = 1 and ρ = 1. The distributions are computed across 10 independentsimulation runs with n = 200 firms.

11

Page 75: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

It then follows that

p(q′, s′|q, s) = (n− 1)1ρqq′>s+s′γcγ(s′)γ−1δ(q′ − q∗)

d(q, s),

and therefore

knn(q, s) =n− 1

d(q, s)

∫ds′f(s′)

∫dq′δ(q − q∗)1ρqq′>s+s′d(q

′, s′)

=n− 1

d(q, s)

∫ds′f(s′)1ρqq∗>s+s′d(q

∗, s′)

=n− 1

d(q, s)

∫ ρqq∗−s

0

ds′f(s′)d(q∗, s′)

= c−γ(ρqq∗ − s)−γ∫ ρqq∗−s

0

ds′γcγ(s′)γ−1(n− 1)cγ(ρ(q∗)2 − s′)−γ

=(n− 1)γcγJ(q, s)

(ρqq∗ − s)γ,

where we have denoted by

J(q, s) ≡∫ ρqq∗−s

0

ds′(ρ(q∗)2 − s′)γ(s′)γ−1.

We then get

knn(k) = 1 +1

P (k)

∫ds

Q

dqf(s)δ(q − q∗)δ(k − d(q, s)

)knn(q, s)

= 1 +1

P (k)

∫dsγcγsγ−1δ

(k − d(q∗, s)

)knn(q

∗, s)

= 1 +1

P (k)

∫dsf(s)δ

(k − d(q∗, s)

) (n− 1)γcγJ(q∗, s)

(ρqq∗ − s)γ.

Using Equation (92) we can write this as

knn(k) = 1 +1

P (k)γcγ

(ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

)γ−11

γk

(k

(n− 1)cγ

) 1γ

× (n− 1)cγγJ

(q∗, ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

)

= 1 +(n− 1)2γc2γ

kJ

(q∗, ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

)

= 1 +(n− 1)2γc2γ

k

∫ ( k(n−1)cγ )

0

ds′(s′)γ−1(ρ(q∗)2 − s′)γ . (94)

Figure E.2 shows the results from numerical simulations compared with the theoretical predictionof Equation (94).

Next we analyze the clustering coefficient of a firm with degree k, which can be written as

C(k) =1

P (k)

∫ds

Q

dqf(s)δ(q − q∗)g(k|q, s)C(q, s)

=1

P (k)

∫dsf(s)g(k|q∗, s)C(q∗, s),

where

C(q∗, s) =

∫ds′∫ds′′

Q

dq′∫

Q

dq′′p(q′, s′, q′′, s′′)p(q′, s′|q∗, s)p(q′′, s′′|q∗, s).

12

Page 76: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

This can further be written as follows

C(q∗, s) =

∫ds′∫ds′′

Q

dq′∫

Q

dq′′1ρq′q′′>s′+s′′

× (n− 1)1ρq∗q′>s+s′f(s′)δ(q′ − q∗)

d(q∗, s)

(n− 1)1ρq∗q′′>s+s′′f(s′′)δ(q′′ − q∗)

d(q∗, s)

=(n− 1)2

d(q∗, s)2

∫ds′f(s′)

∫ds′′f(s′′)1ρ(q∗)2>s′+s′′1ρ(q∗)2>s+s′1ρ(q∗)2>s+s′′

=(n− 1)2

d(q∗, s)2

(1

s>ρ(q∗)2

2

∫ ρ(q∗)2−s

0

ds′f(s′)

∫ ρ(q∗)2−s

0

ds′′f(s′′)

+1s<

ρ(q∗)2

2

(∫ s

0

ds′f(s′)

∫ ρ(q∗)2−s

0

ds′′f(s′′) +

∫ ρ(q∗)2−s

s

ds′f(s′)

∫ ρ(q∗)2−s′

0

ds′′f(s′′)

)).

We then get (see also Figure E.3)

C(q∗, s) =γ2c2γ(n− 1)2

d(q∗, s)

(1

s<ρ(q∗)2

2

∫ ρ(q∗)2−s

0

ds′(s′)γ−1

∫ ρ(q∗)2−s

0

ds′′(s′′)γ−1

+1s>

ρ(q∗)2

2

(∫ s

0

ds′(s′)γ−1

∫ ρ(q∗)2−s

0

ds′′(s′′)γ−1 +

∫ ρ(q∗)2−s

s

ds′(s′)γ−1

∫ ρ(q∗)2−s′

0

ds′′(s′′)γ−1

))

=γc2γ(n− 1)2

d(q∗, s)

(1

s<ρ(q∗)2

2

∫ ρ(q∗)2−s

0

ds′(s′)γ−1(ρ(q∗)2 − s

+1s>

ρ(q∗)2

2

(∫ s

0

ds′(s′)γ−1(ρ(q∗)2 − s

)γ+

∫ ρ(q∗)2−s

s

ds′(s′)γ−1(ρ(q∗)2 − s′

)γ))

=γc2γ(n− 1)2

d(q∗, s)

(1

s<ρ(q∗)2

2

1

γ

(ρ(q∗)2 − s

)2γ

+1s>

ρ(q∗)2

2

(1

γsγ(ρ(q∗)2 − s

)γ+

∫ ρ(q∗)2−s

s

ds′(s′)γ−1(ρ(q∗)2 − s′

)γ))

=c2γ(n− 1)2

d(q∗, s)

(1

s<ρ(q∗)2

2

(ρ(q∗)2 − s

)2γ+ 1

s>ρ(q∗)2

2

(sγ(ρ(q∗)2 − s

)γ+ γJ(s)

)), (95)

where we have denoted by

J(s) ≡∫ ρ(q∗)2−s

s

ds′(s′)γ−1(ρ(q∗)2 − s′

)γ.

Using the fact that d(q, s) ≡ (n− 1)cγ (ρqq∗ − s)γ this can be written as

C(q∗, s) =1

(ρ(q∗)2 − s)2γ

(1

s<ρ(q∗)2

2

(ρ(q∗)2 − s

)2γ+ 1

s>ρ(q∗)2

2

(sγ(ρ(q∗)2 − s

)γ+ γJ(s)

)). (96)

Hence we get

C(k) =1

P (k)

∫dsf(s)δ

(k − d(q∗, s)

)C(q∗, s)

=1

P (k)

∫dsγcγsγ−1δ

(s−

(ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

))1

γk

(k

(n− 1)cγ

) 1γ

C(q∗, s)

=1

P (k)γcγ

(ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

)γ−11

γk

(k

(n− 1)cγ

) 1γ

C

(q∗, ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

)

= C

(q∗, ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

).

13

Page 77: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

s′

s′′

sρ(q∗)2 − s

ρ(q∗)2 − s

ρ(q∗)2 = s′ + s′′

1/c

1/c

s′

s′′

s ρ(q∗)2 − s

ρ(q∗)2 − s

ρ(q∗)2 = s′ + s′′

1/c

1/c

Figure E.3: The area of integration in Equation (95) for the case of s < ρ(q∗)2

2(left panel) and for the case of

s > ρ(q∗)2

2(right panel).

Inserting Equation (96) this gives

C(k) = 1k<(n−1)

(ρ(q∗)2c

2

)γ + 1k>(n−1)

(ρ(q∗)2c

2

((n− 1)cγ

k+ γ

((n− 1)cγ

k

)2

J

(ρ(q∗)2 −

(k

(n− 1)cγ

) 1γ

))

= 1k<(n−1)

(ρ(q∗)2c

2

)γ + 1k>(n−1)

(ρ(q∗)2c

2

)γ((n− 1)cγ

k

+ γ

((n− 1)cγ

k

)2 ∫ ( k(n−1)cγ )

ρ(q∗)2−( k(n−1)cγ )

ds′(s′)γ−1(ρ(q∗)2 − s′

)γ (97)

Figure E.2 shows the results from numerical simulations compared with the theoretical predictionof Equation (97). The figure further illustrates that the model can generate two-vertex and three-vertex degree correlations, such as a decreasing average nearest neighbor connectivity, knn(d),indicating a dissortative network, as well as a decreasing clustering degree distribution, C(d),with the degree d.

E.3. Heterogeneous Technology Spillovers

In this section we allow for heterogeneity among firms in terms of their technological abilities [cf.Cohen and Levinthal, 1990; Griffith et al., 2003]. We assume that the knowledge embodied in afirm i ∈ N = 1, . . . , n can be represented as an N -dimensional vector hi in the knowledge spaceHN = 0, 1N , which consists of all binary sequences with elements in 0, 1 of length N . Thenumber of such sequences is 2N . The knowledge vector hi, with components hik ∈ 0, 1, indicateswhether firm i knows idea k ∈ 1, . . . , N or not. We introduce a spillover function f : HN×HN → R

capturing the potential technology transfer between any pairs of firms. A possible specificationis one in which f(hi,hj) = 1〈hi,hj〉>s, where 〈·, ·〉 denotes the usual scalar product in R

n, so that〈hi,hj〉 counts the number of technologies known to both i and j, and f(hi,hj) is one iff i and j

have at least s > 0 technologies in common. This is an instance of a random intersection graph [cf.Bloznelis, 2013; Deijfen and Kets, 2009; Newman, 2003; Singer-Cohen, 1995].63

63There is a variety of other functional forms that can be incorporated in our model. For example, a simple choicefor the function f could be f(hi,hj) = a|hi ∩ hj |, where a ∈ R+ and |hi ∩ hj | = h⊤

i hj =∑N

k=1 hikhjk denotes thecommon knowledge of i and j. Alternative specifications for similarity can be found in Liben-Nowell and Kleinberg[2007] and Bloom et al. [2007]; Jaffe [1989]. Alternatively, following Berliant and Fujita [2008, 2009], a possible

parametric specification for f would be f(hi,hj) = |hi ∩ hj |κd(hi,hj)1−κ2 for some κ ∈ (0, 1). The distance is the

product of the total number of ideas known by agent i but not by j times the total number of ideas known by j

14

Page 78: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Given the spillover function f(hi,hj), the marginal cost of production of a firm i becomes

ci = c− αei − β

n∑

j=1

aijf(hi,hj)ej ,

and profits of firm i are given by

πi = (a− c)qi − q2i − bqi∑

j 6=i

qj + αqiei + βqi

n∑

j=1

aijf(hi,hj)ej − γe2i − ζdi.

The FOC with respect to effort ei is given by

∂πi∂ei

= αqi − 2γei = 0,

from which it follows thatei =

α

2γqi = λqi.

Inserting into profits yields

πi = (a− c)qi − (1 − λα+ λ2γ)q2i − bqi∑

j 6=i

qj + λβqi

n∑

j=1

aijf(hi,hj)qj − ζdi

= ηqi − νq2i − bqi∑

j 6=i

qj + ρqi

n∑

j=1

aijf(hi,hj)qj − ζdi.

We can then obtain a potential function given by

Φ(q, G,h) =

n∑

i=1

((a− c)qi − νq2i )−b

2

n∑

i=1

qi∑

j 6=i

qj +

n∑

i=1

qi

n∑

j=1

aijf(hi,hj)qj − ζm.

The stationary distribution is given by

µϑ(q, G,h) =eϑΦ(q,G,h)

∑h′∈HN

∑G′∈Gn

∫Qn eϑΦ(s,G′,h′)ds

.

The probability of observing a network G ∈ Gn, given an output distribution q ∈ [0, q]n andtechnology portfolios h ∈ HN is determined by the conditional distribution

µϑ(G|q,h) =∏

i<j

eϑaij(ρf(hi,hj)qiqj−ζ)

1 + eϑ(ρf(hi,hj)qiqj−ζ), (98)

which is equivalent to the probability of observing an inhomogeneous random graph with linkprobability

pϑ(qi,hi, qj ,hj) ≡eϑ(ρf(hi,hj)qiqj−ζ)

1 + eϑ(ρf(hi,hj)qiqj−ζ). (99)

In the following we consider a particularly simple specification in which each firm i is assigned atechnology k ∈ 1, . . . , N uniformly at random so that hik = 1 and hil = 0 for all l 6= k. Moreover,let f(hi,hj) = 1〈hi,hj〉≥1, that is, firms i an j can only benefit from a collaboration if they havea technology in common. If technologies are assigned uniformly at random then

P ( 〈hi,hj〉 ≥ 1| qi = q, qj = q′) =1

N1ρqq′>ζ.

but not by i, i.e. d(hi,hj) = |hi\hj |× |hj\hi| = |hi ∩hcj |× |hc

i ∩hj | =∑N

k=1 hik(1−hjk)∑N

k=1(1−hik)hjk, whereu = (1, . . . , 1)⊤ and hc

i = u − hi. Other functional forms have been suggested in the literature [see e.g. Baumet al., 2009; Nooteboom et al., 2007], such as f(hi,hj) = a1|hi ∩ hj | − a2|hi ∩ hj |2, with constants a1, a2 ≥ 0.

15

Page 79: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Due to symmetry the firms quantities in the stationary state when ϑ → ∞ are identical and givenby q∗. In the case of ρ(q∗)2 > ζ > 0 we then we have that

P (aij = 1) =1

N,

and the degree distribution is given by

P (k) = P (d1(G) = k) =

(n

k

)(1

N

)k (1− 1

N

)n−k.

The average nearest neighbor degree distribution is then given by

knn(k) =

n∑

k′=1

k′P (d2(G) = k′ − 1| a12 = 1, d1(G) = k)

where

P (d2(G) = k′ − 1|a12 = 1, d1(G) = k) =P (d2(G) = k′ − 1, d1(G) = k| a12 = 1)

P (k)

=1

P (k)

(1

N

)k′−1(1− 1

N

)n−k′+1(1

N

)k−1 (1− 1

N

)n−k+1

=P (k′ − 1)P (k − 1)

P (k).

We then get

knn(k) =

n∑

k′=1

k′P (d2(G) = k′ − 1| a12 = 1, d1(G) = k)

=

n∑

k′=1

k′P (k′ − 1)P (k − 1)

P (k)

=k(1 − 1

N)(1 + n 1

N− (n+ 1)

(1N

)n)

1N(1 + n− k)

= O(k), (100)

as n→ ∞. That is, the average nearest neighbor degree knn(k) is asymptotically linearly increasingwith the degree k, and thus we have an assortative network. An illustration can be seen in FigureE.4.

The clustering coefficient is simply given by C(k) = P (a23 = 1| a12 = 1, a23 = 1, d1(G) = k) = 1.This is because if firm 1 is connected to firm 2 then they must have the same technology. Similarly,if firm 1 is connected to firm 3 then they also must have the same technology. Due to transitivity,firms 2 and 3 then must have the same technology, and thus must be connected.

F. Details of the Estimation Algorithm

F.1. Implementation of the DMH Algorithm

In the endogenous network case, unknown parameters are extended from θexo = (b, ρ, δ⊤) toθendo = (b, ρ, δ⊤, γ⊤) by having the additional parameter γ⊤. We specify the prior distribution forγ as,

γ ∼ Nr(γ0,Γ0),

where γ0 = 0 and Γ0 = 104Ir. Based on the assignment of prior distributions and the likelihoodof Equation (15), we can derive the posterior distribution and then apply the Gibbs sampler toupdate the following set of conditional posterior distributions:

(i) P (γ|q, G, θendo\γ), where θendo\γ stands for θendo with γ excluded.

16

Page 80: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

i

j

20 40 60 80 100

20

40

60

80

10010

010

110

210

0

101

102

103

knn(

)d

d

Figure E.4: (Left panel) Illustration of the matrix with elements 1〈hi,hj〉>0 for n = 100 firms and N = 10technologies. (Right panel) The average nearest neighbor degree distribution, knn(d), for the same parameters.The dashed line represents the solution from Equation (100) while the circles correspond to a numerical simulation.

By applying Bayes’ theorem, we have

P (γ|q, G, θ\γ) ∝ φr+1(γ; γ0,Γ0) · µ(q, G|θendo), (101)

where φ denotes the normal density function.

(ii) P (b, ρ|q, G, θendo\(b, ρ)).By applying Bayes’ theorem, we have

P (b, ρ|q, G, θendo\(b, ρ)) ∝ µ(q, G|θendo)I((b, ρ) ∈ O). (102)

(iii) P (δ|q, G, θendo\δ).By applying Bayes’ theorem, we have

P (δ|q, G, θendo\δ) ∝ φl(δ; δ0,∆0) · µ(q|G, θendo). (103)

Among these conditional posterior distributions, we apply the DMH algorithm to simulatedraws from (i) and (ii) because their distributions are not available in a closed form. At each tth

MCMC iteration, the implementation steps are illustrated as follows:

Step I. Simulate γ(t) from P(γ∣∣q, G,Υ(t−1)

)by the DMH algorithm, where Υ(t−1) denotes the

rest of parameters evaluated at the (t− 1)th iteration.

(a) propose γ from a random walk proposal density T1(γ|γ(t−1))

(b) simulate an auxiliary data q′ and G′ by M runs of the MH algorithm based on

µ(q, G|γ,Υ(t−1)) =exp

(Φ(q, G, γ,Υ(t−1)

))∑

g∈Gn

∫Qn dq exp

(Φ(q, g, γ,Υ(t−1)

))

starting from G.64 The details are as follows:First, let the initial auxiliary graph G(0) equal to observed G, i.e., A(0) = A. At each mth

run, m = 1, · · · ,M , we propose G from G(m−1) by selecting node i with probability 1nand

node j 6= i with a probability 1n−1 randomly and change the value a

(m−1)ij and a

(m−1)ji in

matrix A(m−1) to 1− a(m−1)ij and 1− a

(m−1)ji and propose it as A. With a certain probability,

we flip all elements in the A(m−1) from 0 to 1 (or 1 to 0) and propose it as A. Denote M =

64This step mimics the MH sampler in Snijders [2002b] and Mele [2010b]. In practice, we set M = 2n(n − 1)where n is the size of the network k.

17

Page 81: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

In+b(t−1)S−ρ(t−1)(AF) and calculate q∗ =

(M)−1

Xδ(t−1). Simulate q from N(q∗, (M)−1

).

With the acceptance probability

α(G|G(m−1)) = min

exp(Φ(G, , q)

)

exp(Φ(G(m−1),q(m−1))

) , 1

,

set G(m) to G and q(m) = q, otherwise, set G(m) = G(m−1) and q(m) = q(m−1). After M runs,collect G′

k = G(M) and q′ = q(M).

(d) With the acceptance probability equal to

α(γ|γ(t−1), G′,q′)

= min

φr+1(γ|γ0,Γ0) · µ (G,q, γ)φr+1(γ(t−1)|γ0,Γ0) · µ (G,q, γ(t−1))

·µ(G′,q′, γ(t−1)

)

µ (G′,q′, γ), 1

= min

φr+1(γ|γ0,Γ0) · exp (Φ(G,q, γ))

φr+1(γ(t−1)|γ0,Γ0) · exp (Φ(G,q, γ(t−1)))·exp

(Φ(G′,q′, γ(t−1))

)

exp (Φ(G′,q′, γ)), 1

,

set γ(t) = γ. Otherwise, set γ(t) = γ(t−1).

Step II. Simulate (b(t), ρ(t)) from P(b, ρ∣∣q, G, γ(t),Υ(t−1)

)by the DMH algorithm.

(a) propose (b, ρ) from a random walk proposal density T1(b, ρ|b(t−1), ρ(t−1))

(b) simulate an auxiliary data q′ and G′ by M runs of the MH algorithm based on

µ(q, G|γ(t), b, ρ,Υ(t−1)) =exp

(Φ(q, G, γ(t), b, ρ,Υ(t−1)

))

∑g∈Gn

∫Qn dq exp

(Φ(q, g, γ(t−1), b, ρ,Υ(t−1)

))

starting from G. After M runs, collect G′ = G(M) and q′ = q(M).

(d) With the acceptance probability equal to

α(b, ρ|b(t−1), ρ(t−1), G′,q′)

= min

I((b, ρ) ∈ O) · µ

(q, G, γ(t), b, ρ

)

I((bl(t−1), bs(t−1), ρ(t−1)) ∈ O) · µ (q, G, γ(t), bl(t−1), bs(t−1), ρ(t−1))·

µ(q′, G′, γ(t), b(t−1), ρ(t−1)

)

µ(q′, G′, γ(t), b, ρ

) , 1

= min

I((b, ρ) ∈ O) · exp(Φ(q, G, γ(t), b, ρ)

)

I((bl(t−1), bs(t−1), ρ(t−1)) ∈ O) · exp (Φ(q, G, γ(t), b(t−1), ρ(t−1)))·

exp(Φ(q′, G′, γ(t), b(t−1), ρ(t−1))

)

exp(Φ(q′, G′, γ(t), b, ρ)

) , 1

,

set (b(t), ρ(t)) = (b, ρ). Otherwise, set (b(t), ρ(t)) = (b(t−1), ρ(t−1)).

Step III. Simulate δ(t) from P (δ|q, G, γ(t), bl(t), bs(t), ρ(t),Υ(t−1)). From the conditional posteriordistribution of δ in Equation (103),

P (δ|q, G, γ(t), bl(t), bs(t), ρ(t),Υ(t−1)) ∝ φl(δ; δ0,∆0) ·mu(q|G, bl(t), bs(t), ρ(t), δ,Υ(t−1)),

18

Page 82: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

since both φl(δ; δ0,∆0) and f(q|G, bl(t), bs(t), ρ(t), δ,Υ(t−1)) are normal density functions, we can drawδ from

P (δ|q, G, γ(t), bl(t), bs(t), ρ(t),Υ(t−1)) ∝ φl

(δ; δ,∆

),

δ = ∆(∆−1

0 δ0 + ϑ(t−1)X⊤q),

∆ =(∆−1

0 + ϑ(t−1)X⊤(M(t))−1X)−1

.

F.2. Implementation of the AEX Algorithm

We first define mathematical notations used in the AEX algorithm. The AEX algorithm consistsof two Markov chains running in parallel. Subscript t denotes the tth iteration of chains. In thefirst chain, we simulate auxiliary network sample Gt from µ(G|θt), where θt is sampled from theset of pre-specified parameter points (θ(1), · · · , θ(m)). H(θt) denotes the order index of θt, i.e.,H(θt) = i if θt = θ(i). Let p = (p1, · · · , pm) be the desired sampling frequencies from the respectivedistributions µ(G|θ(1)), · · · , f(G|θ(m)), where 0 < pi < 1 and

∑mi=1 pi = 1. We follow Liang et al.

[2015] to set p1 = · · · = pm = 1m. As required by the SAMC algorithm, we specify a gain factor

sequence, at, which is a positive, nonincreasing sequence satisfying the following condition:

(A1) limt→∞ at = 0,∑∞

t=1 at = ∞,∑∞t=1 a

kt <∞ for some k ∈ (1, 2].

Following Liang et al. [2015], we set at =t0

max(t0,t)with a known constant t0 > 1. Since a large

value of t0 will help the sampler to reach each distribution f(G|θ(i)) quickly, we set t0 = 20, 000

in this paper. Also, we let w(i)t denote an abundance factor for each distribution f(G|θ(i)) and

wt = (w(1)t , · · · , w(m)

t ). We set the initial values w(1)0 = · · · = w

(m)0 = 1 in our simulation. Finally, let

Ωt = (G1, · · · , Gt, θ1, · · · , θt, H(θ1), · · · , H(θt), w1, · · · , wt) denote the information in the auxiliarychain that we collect up to iteration t. In the second chain, we draw θt by the exchangealgorithm from the target posterior f(θ|q,G,S).

At each tth iteration, the AEX is implemented by Part I and Part II described as follows:

Part I. Auxiliary network simulation

1. Choose to update θt or Gt with pre-specified probabilities, i.e., 0.75 for updating θt and0.25 for updating Gt.

(a) Update θt: Select θ′ from the set (θ(1), · · · , θ(m)) according to a proposal distribu-tion T2(·|θt−1). With the probability

min

1,

wH(θt−1)t−1 exp(Φ(Gt−1, ǫ(θt−1), θ

′))T2(θt−1|θ′)

wH(θ′)t−1 exp(Φ(Gt−1, ǫ(θt−1), θt−1))T2(θ′|θt−1)

, (104)

update (θt, Gt) = (θ′, Gt−1), where ǫ(θt−1) is defined in Equation (??) with therespective parameter θt−1. Otherwise, Set (θt, Gt) = (θt−1, Gt−1).

(b) Update Gt: simulate Gt from f(G|θt−1) via few MH updates starting from Gt−1.Then set θt = θt−1.

2. For i = 1, · · · ,m, update the abundance factor w(i)t by

log(w(i)t ) = log(w

(i)t−1) + at(et,i − pi),

where et,i = 1 if θt = θ(i) and 0 otherwise.

3. Append (θt, Gt, H(θt), wt) to the collection Ωt−1 to form Ωt.

Part II. Adaptive Exchange algorithm for target parameter

4. Propose a candidate θ′ from the proposal distribution T1(θ′|θt−1).

19

Page 83: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

5. Resample an auxiliary network data G′ from the collection St via an importance sam-pling procedure. With probability

P (G′ = Gi) =

∑t

j=1

w

H(θj)

j

exp(Φ(Gj ,ǫ(θ′),θ′))

exp(Φ(Gj ,ǫ(θj),θj))I(Gj = Gi)

∑t

j=1

w

H(θj)

j

exp(Φ(Gj ,ǫ(θ′),θ′))

exp(Φ(Gj ,ǫ(θj),θj))

, (105)

choose G′ = Gi.

6. Implement the exchange algorithm. With the probability

α(θ′, G′|θt−1, G)

= min

1,

π(θ′)|M(θ′)|P (ǫ|θ′) exp(Φ(G, ǫ(θ′),θ′))

π(θt−1)|M(θt−1)|P (ǫ|θt−1) exp(Φ(G, ǫ(θt−1),θt−1))

T1(θt−1|θ′)

T1(θ′|θt−1)

exp(Φ(G′, ǫ(θt−1),θt−1))

exp(Φ(G′, ǫ(θ′),θ′))

,

(106)

set θt = θ′. Otherwise, set θt = θt−1.

F.3. Convergence of the AEX Algorithm

In this appendix, we outline the ideas and main results behind the convergence proof of the AEXalgorithm. Interested readers are referred to Jin et al. [2013] and the supplementary material ofLiang et al. [2015] for the completed version of proof. From Part II of the AEX algorithm, onecan see that auxiliary network sample G′ is drawn via a dynamic importance sampling procedurein Equation (105), which implies that the underlying proposal distribution for auxiliary networkschanges from iteration to iteration. Therefore, AEX falls into the class of adaptive MCMC algo-rithms with varying stationary distributions and requires an unconventional convergence theory[Fort et al., 2011; Roberts and Rosenthal, 2007].

Liang et al. [2015] provided an ergodicity theorem for adaptive Markov chains with varyingstationary distributions, which can be used to prove ergodicity of the AEX algorithm. Considera state space, (X,B), where Xt ∈ X denotes the state of the Markov chain at iteration t andB denotes the Borel set defined on X. Let γt denote realization of the adoption index Γt ∈ Y

and Ft = σ(X0, · · · , Xt,Γ0, · · · ,Γt) be the filtration generated by Xi,Γiti=0. Let Pγt denote thetransition kernel at iteration t and thus,

P (Xt+1 ∈ B|Xt = x,Γt = γt,Ft−1) = Pγt(x,B), x ∈ X, γt ∈ Y, B ∈ B.

Let P hγ (x,B) = Pγ(Xh ∈ B|X0 = x) denote the h-step transition probability for the Markov chainwith the fixed transition kernel Pγ and the initial condition X0 = x. Also let P h((x, γ), B) =

P (Xh ∈ B|X0 = x,Γ0 = γ,Fh−1), B ∈ B, denote the h-step transition probability for the adaptiveMarkov chain with the initial conditions X0 = x and Γ0 = γ. An adaptive Markov chain is calledergodic if limt→∞ V (x, γ, t) = 0 for all x ∈ X and γ ∈ Y, where

V (x, γ, t) =‖ P t((x, γ), ·) − π(·) ‖= supB∈B

‖ P t((x, γ), B)− π(B) ‖

denotes the total variation distance between the distribution of the adaptive chain at iteration t

and the target distribution π(·).

Theorem 1. Consider an adaptive Markov chain defined on the state space (X,B) with the adoption indexΓt ∈ Y. The adoptive Markov chain is ergodic if the following conditions are satisfied:

(a) (Stationarity) There exist a stationary distribution πγt(·) for each transition kernel Pγt .

(b) (Asymptotic Simultaneous Uniform Ergodicity) For any u > 0, there exist a measurable set E(u) inthe probability space such that Pr(E(u)) ≥ 1 − u and on this set E(u), for any ǫ > 0, there existt(ǫ) ∈ N and n(ǫ) ∈ N such that

supx∈X

‖ Pnγt(x, ·)− π(·) ‖≤ ǫ,

for all t > t(ǫ) and n > n(ǫ).

20

Page 84: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

(c) (Diminishing Adoption) let Dt = supx∈X‖ PΓt+1(x, ·)− PΓt

(x, ·) ‖ and limt→∞Dt = 0 in probability.

From conditions (a) and (b) of Theorem 1, it is implied that on the set E(u), for any ǫ,

‖ πγt(·)− π(·) ‖=‖ πγt(·)P hγt − π(·) ‖≤ supx∈X

‖ Pnγt(x, ·)− π(·) ‖≤ ǫ,

for t > t(ǫ) and n > n(ǫ). Furthermore, by the triangular inequality,

‖ Pnγt(x, ·)− πγt(·) ‖<‖ Pnγt(x, ·)− π(·) ‖ + ‖ πγt(·)− π(·) ‖< 2ǫ,

for t > t(ǫ) and n > n(ǫ). Thus, the simultaneous uniform ergodicity of Pγt on the set E(u) isestablished. Given condition (c) of Theorem 1 and the simultaneous uniform ergodicity of Pγt,Liang et al. [2015] showed the weak law of large numbers for an adaptive Markov chain in thefollowing theorem:

Theorem 2. Consider an adaptive Markov chain defined on the state space (X,B). Let g(·) be a bounded

measurable function. Suppose that conditions (a), (b), and (c) of Theorem 1 hold, then 1T

∑Tt=1 g(Xt) →

π(g), in probability as T → ∞, where π(g) =∫Xg(x)π(dx).

The rest of procedure is to show the AEX algorithm satisfies three conditions of Theorem 1and thus it is ergodic and the weak law of large number holds for the average of sample pathθt. To begin with, we assume that in the auxiliary chain, the Markov transition kernel, Pw,for updating the state variable Xt = (Gt, θt) ∈ X via the SAMC algorithm satisfies the followingDoeblin condition:

(A2) For any given w ∈ , the Markov transition kernel Pw is irreducible and aperiodic. Inaddition, there exist an integer h, a real number 0 < δ < 1, and a probability measure νsuch that for any compact subset K ⊂ ,

infw∈K

P hw(x,B) ≥ δν(B), ∀x ∈ X, ∀B ∈ B,

where B denotes the Borel set of X.

The condition (A2) will be assured by Theorem 2.2 of Roberts and Tweedie [1996] if X is com-pact, the potential function Φ(G′, ǫ(θ), θ) is bounded away from 0 and ∞ on X, and the proposaldistribution T (y|x) satisfies the local positive condition:

(local positive condition) For every x ∈ X, there exist ǫ1 > 0, and ǫ2 > 0 such that |x− y| ≤ ǫ1 ⇒T (y|x) ≥ ǫ2.

The following Lemma 1 shows that draws of G′ from the dynamic importance sampler ofEquation (105) converges to the distribution of f(·|θ′) almost surely when the number of iterationsgoes to infinity.

Lemma 1. Assume (i) conditions (A1) and (A2) are satisfied; (ii) X is compact; (iii) f(G|θ) is boundedaway from 0 and ∞ on X × Θ; Given a total of N iterations, let g1, · · · , gn be n distinct samples in

G1, · · · , GN. Re-sample a random network G′ from G1, · · · , GN such that

P (G′ = gk|θ′) =

∑Nt=1

∑mi=1

w

(i)t

exp(Φ(Gt,ǫ(θ′),θ′))

exp(Φ(Gt,ǫ(θ(i)),θ(i)))I(H(θt) = i and Gt = gk)

∑Nt=1

∑mi=1

w

(i)t

exp(Φ(Gt,ǫ(θ′),θ′))

exp(Φ(Gt,ǫ(θ(i)),θ(i)))I(H(θt) = i)

, (107)

for k = 1, · · · , n, then the distribution of G′ converges to f(G′|θ′) almost surely as N → ∞.

Notice that θt from the AEX algorithm form an adaptive Markov chain with the transitionkernel given by

Pl(θ, dθ′) =

X

α(θ′, G′|θ, G)T1(dθ′|θ)νl(G′|θ′)dG′+δθ(dθ′)

[1−

Θ×X

α(ϑ′, G′|θ, G)T1(dϑ′|θ)νl(G′|ϑ′)dG′

],

21

Page 85: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

where α(θ′, G′|θ, G) is defined in Equation (106), δθ(dθ′) = 1 if θ ∈ dθ′ and 0 otherwise, T1(dθ′|θ)is the proposal distribution, l denotes the cardinality of the set Ωl, i.e., l = |Ωl|, and νl(G

′|θ′)

denotes the distribution of G′ resampled from Ωl. Different from the transition kernel of AEX,the transition kernel of exchange algorithm depends on the perfect draw of G′ from f(·|ϑ′), whichis given by

P (θ, dθ′) =

X

α(θ′, G′|θ, G)T1(dθ′|θ)f(G′|θ′)dG′+δθ(dθ′)

[1−

Θ×X

α(ϑ′, G′|θ, G)T1(dϑ′|θ)f(G′|ϑ′)dG′

],

where α(θ′, G′|θ, G) is defined in Equation (41)

Lemma 2. Assume that (i) conditions (A1) and (A2) are satisfied; (ii) both Θ and X are compact; (iii)the potential function Φ(G′, ǫ(θ), θ) is continuously differentiable in θ for all G′ ∈ X and bounded awayfrom 0 and ∞ on X × Θ; (iv) π(θ) and q(θ′|θ) are continuously differentiable in θ and θ′. Define

Dl = supθ∈Θ ‖ Pl(θ, ·)− P (θ, ·) ‖, then Dl → 0 almost surely as l → ∞.

Define Dl = supθ∈Θ ‖ Pl+1(θ, ·) − Pl(θ, ·) ‖. Since Dl ≤ supθ∈Θ ‖ Pl+1(θ, ·)− P (θ, ·) ‖ +supθ∈Θ ‖Pl(θ, ·)− P (θ, ·) ‖, we have liml→∞Dl = 0 almost surely by Lemma 2. Thus, Pl satisfies condition(c) of Theorem D1. It is known that the transition kernel of the exchange algorithm, P (θ, dθ′),is irreducible and aperiodic and admits an invariant limit distribution. Lemma 3 shows that thetransition kernel of the AEX algorithm, Pl(θ, dθ′), also has these properties and thus satisfiescondition (a) of Theorem 1.

Lemma 3. Assume (i) the conditions of Lemma 2 are satisfied; (ii) P is irreducible and aperiodic and

admits an invariant distribution, then Pl is irreducible and aperiodic, and hence exists a stationary distri-bution πl(θ|G′) such that for any θ0 ∈ Θ,

limh→∞

‖ P hl (θ0, ·)− πl(·|G) ‖= 0.

Lemma 4 establishes the asymptotic simultaneous uniform ergodicity of the kernel Pl’s.

Lemma 4. Assume the conditions of Lemma 3 are satisfied. If the proposal q(·, ·) satisfies the localpositive condition, then for any e > 0, there exist a measurable set E(e) in the probability space such thatP (E(e)) > 1− e and on this set E(e), for any ǫ > 0, there exist n(ǫ) ∈ N and l(ǫ) ∈ N such that for anyn > n(ǫ) and l > l(ǫ),

‖ Pnl (θ0, ·)− π(·|G) ‖≤ ǫ, for all θ0 ∈ Θ.

Putting Lemmas 2, 3, and 4 altogether, we have the following theorem in regard to ergodicityand the weak law of large number for AEX.

Theorem 3. Assume the conditions of Lemma 4 hold. If the proposal q(·, ·) satisfies the local positivecondition and the potential function Φ(G′, ǫ(θ), θ) is upper semi-continuous in θ for all G′ ∈ X, then theadaptive exchange algorithm is ergodic and for any bounded measurable function g,

1

T

T∑

i=1

g(θi) → π(g|G), in probability,

as T → ∞, where π(g|G) =∫Θg(θ)π(θ|G)dθ.

F.4. Matrix Perturbation

For the computational implementation of the MCMC algorithm we will have to evaluate thepotential function many times after a link has been added or removed in the network G. To dothis in an efficient way, the following lemma will be helpful.

22

Page 86: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Lemma 5. Let ei and ej be the i-th and j-th unit basis vectors in Rn. The matrix eie

⊤j has a one in the

ij-th position and zeros everywhere else. Define the matrix

Bij ≡A−1eie

⊤j A

−1

1 + αe⊤j A−1ei

.

Adding a perturbation α to the matrix A in the ij-th and the ji-th position can be written as A+αeie⊤j +

ρeje⊤i .

(i) The inverse of the perturbed matrix can be written as

(A+ αeie⊤j + αeje

⊤i )

−1 = A−1 − αBij − α

(A−1 − αBij

)eje

⊤i

(A−1 − αBij

)

1 + αe⊤i (A−1 − αBij) ej. (108)

(ii) The determinant of the perturbed matrix can be written as

det(A+ αeie⊤j + αeje

⊤i ) =

(1 + αe⊤i

(A−1 − αBij

)ej) (

1 + αe⊤j A−1ei

)det (A) . (109)

Note that due to Lemma 5 the determinant and the inverse of the perturbed matrix A+αeie⊤j +

αeje⊤i can be efficiently computed if the determinant, det (A), and the inverse, A−1, of A are

already known.

G. Proofs

Proof of Lemma 1. By the assumption that X is compact and f(G|θ) is bounded away from 0 and ∞,it follows from the convergence and strong law of large numbers of SAMC [Liang et al., 2015] that

1

N

N∑

t=1

m∑

i=1

w

(i)t

exp(Φ(Gt, ǫ(θ′), θ′))

exp(Φ(Gt, ǫ(θ(i)), θ(i)))I(H(θt) = i)

→m∑

i=1

X

κ(θ(i), ǫ(θ(i)))

pi

exp(Φ(Gt, ǫ(θ′), θ′))

exp(Φ(Gt, ǫ(θ(i)), θ(i)))pif(G|θ(i), ǫ(θ(i)))dG

= mκ(θ′), a.s., (110)

where f(G|θ(i), ǫ(θ(i))) = exp(Φ(G,ǫ(θ(i)),θ(i)))κ(θ(i),ǫ(θ(i)))

and κ(θ(i), ǫ(θ(i))) =∫Xexp(Φ(G, ǫ(θ(i)), θ(i))). Similarly,

for any Borel set A ∈ X,

1

N

N∑

t=1

m∑

i=1

w

(i)t

exp(Φ(Gt, ǫ(θ′), θ′))

exp(Φ(Gt, ǫ(θ(i)), θ(i)))I(H(θt) = i and Gt ∈ A)

→m∑

i=1

A

κ(θ(i), ǫ(θ(i)))

pi

exp(Φ(Gt, ǫ(θ′), θ′))

exp(Φ(Gt, ǫ(θ(i)), θ(i)))pif(G|θ(i))dG

= m

A

exp(Φ(G, ǫ(θ′), θ′))dG. (111)

Putting Equations (110) and (111) together, as N → ∞, we have

P (G′ ∈ A|G1, θ1, w1; · · · , GN , θN , wN ; θ′) (112)

=

∑Nt=1

∑mi=1

w

(i)t

exp(Φ(Gt,ǫ(θ′),θ′))

exp(Φ(Gt,ǫ(θ(i)),θ(i)))I(H(θt) = i and Gt ∈ A)

∑Nt=1

∑mi=1

w

(i)t

exp(Φ(Gt,ǫ(θ′),θ′))

exp(Φ(Gt,ǫ(θ(i)),θ(i)))I(H(θt) = i)

→∫

A

f(G|θ′, ǫ(θ′))dG, a.s. (113)

23

Page 87: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

Finally, by Lebesque’s dominated convergence theorem,

P (G′ ∈ A|θ′) = E[P (G′ ∈ A|G1, θ1, w1; · · · , GN , θN , wN ; θ′)] →∫

A

f(G|θ′, ǫ(θ′))dG. (114)

Proof of Proposition 12. The potential φ(q, G) of Equation (90) has the property that for any q′i 6=qi ∈ [0, q] we have that

φ(q′i,q−i, G)− φ(qi,q−i, G) = η(q′i − qi)− ν(q′2i − q2i )− b(q′i − qi)∑

j 6=i

qj + ρ(q′i − qi)∑

j∈Ni

qj

= πi(q′i,q−i, G)− πi(qi,q−i, G).

Proof of Proposition 13. Equation (21) can be written as

qi −ρ

2ν − b

n∑

j=1

aijqj =η

2ν − b− b

2ν − b‖q‖

where ‖q‖ = u⊤q. Let us denote by ϕ = ρ2ν−b , A = η

2ν−b and B = b2ν−b . Then, in vector-matrix notation,

the above equation can be written as

(In − ϕq)q = (A−B‖q‖)u.

If ϕ < 1λmax

(q) then the matrix In − ϕq is invertible, and we obtain

q = (A−B‖q‖) (In − ϕq)−1

u.

Noting that(In − ϕq)

−1u = b(G,ϕ),

where b(G,ϕ) is the vector of Bonacich centralities with parameter ϕ [Bonacich, 1987], we obtain

q = (A−B‖q‖)b(G,ϕ).

With‖q‖ = (A−B‖q‖)‖b(G,ϕ)‖

we obtain

‖q‖ =A‖b(G,ϕ)‖

1 +B‖b(G,ϕ)‖)and it follows that

q =A

1 +B‖b(G,ϕ)‖)b(G,ϕ) =η

2ν + b(‖b(G,ϕ)‖ − 1)b(G,ϕ).

Next, we compute equilibrium profits. Let us denote by C = η2ν+b(‖b(G,ϕ)‖−1) , so that qi = Abi(G,ϕ).

Profit of firm i from Equation (8) can then be written as

πi = ηCbi(G,ϕ) − (ν − b)C2bi(G,ϕ)2 − bC2‖b(G,ϕ)‖bi(G,ϕ) + ρC2bi(G,ϕ)

n∑

j=1

aijbj(G,ϕ) − ζdi.

Using the fact that

bi(G,ϕ) = 1 +ρ

2ν − b

n∑

j=1

aijbj(G,ϕ)

we obtain

πi = ηCbi(G,ϕ)− (ν − b)C2bi(G,ϕ)2 − bC2‖b(G,ϕ)‖bi(G,ϕ)

+ ρC2(2ν − b)bi(G,ϕ)(bi(G,ϕ)− 1)− ζdi.

= νC2bi(G,ϕ)2 − ζdi.

24

Page 88: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

which gives

πi =n2η2ν

(2ν + b(‖b(G,ϕ)‖ − 1))2b2i (G,ϕ)− ζdi.

Proof of Lemma 5. We first give a proof of part (i) of the lemma. The Sherman–Morrison formulastates that [cf. Meyer, 2000, page 124]

(A+ αuv⊤)−1 = A−1 − αA−1uv⊤A−1

1 + αv⊤A−1u.

Let c = ei and d = ej , where ei and ej are the i-th and j-th unit basis vectors, respectively. The matrixcd⊤ then has a one in the (i, j)-position and zeros elsewhere, so that adding a one to the matrix A in the(i, j)-position and the (j, i)-position (resulting from adding a link ij to the adjacency matrix A) yields aperturbed matrix B which can be written as

B = A+ αeie⊤j + αeje

⊤i = C+ αeje

⊤i ,

where we have denoted by C ≡ A+ αeie⊤j . Using the Sherman–Morrison formula we then can write

B−1 = (C+ αeje⊤i )

−1 = C−1 − αC−1eje

⊤i C

−1

1 + αe⊤i C−1ej

, (115)

while applying Sherman–Morrison to C−1 gives

C−1 = (A+ αeie⊤j )

−1 = A−1 − αA−1eie

⊤j A

−1

1 + αe⊤j A−1ei

.

Inserting C−1 into Equation (115) yields

B−1 = A−1 − αA−1eie

⊤j A

−1

1 + αe⊤j A−1ei

− α

(A−1 − α

A−1eie⊤j A−1

1+αe⊤j A−1ei

)eje

⊤i

(A−1 − α

A−1eie⊤j A−1

1+αe⊤j A−1ei

)

1 + αe⊤i

(A−1 − α

A−1eie⊤j A−1

1+αe⊤j A−1ei

)ej

.

We next give a proof of part (ii) of the lemma. We have that det(A+αeie⊤j +αeje

⊤i ) = det(C+αeje

⊤i ),

where we have denoted by C ≡ A+αeie⊤j . The matrix determinant lemma states that [Horn and Johnson,

1990]

det(A+ uv⊤) = (1 + v⊤A−1u) det(A).

It then follows that

det(A+ αeie⊤j + αeje

⊤i ) = det(C+ αeje

⊤i ) = (1 + αe⊤i C

−1ej) det (C) ,

Similarly, from the matrix determinant lemma, we have that

det (C) = det(A+ αeie

⊤j

)= (1 + αe⊤j A

−1ei) det (A) .

Further, the Sherman–Morrison formula states that [cf. Meyer, 2000, page 124]

(A+ αcd⊤)−1 = A−1 − αA−1uv⊤A−1

1 + αv⊤A−1u.

It follows that

C−1 = (A+ αeie⊤j )

−1 = A−1 − αA−1eie

⊤j A

−1

1 + αe⊤j A−1ei

.

Putting the above results together, we find that

det(A+ αeie⊤j + αeje

⊤i ) = (1 + αe⊤i C

−1ej) det (C)

=

(1 + αe⊤i

(A−1 − α

A−1eie⊤j A

−1

1 + αe⊤j A−1ei

)ej

)(1 + αe⊤j A

−1ei)det (A) .

25

Page 89: f21d58f0-ce18-4ff3-8630-d7e092cb607e/p… · NetworkFormationwithLocalComplementsandGlobalSubstitutes: The CaseofR&DNetworks Chih-Sheng Hsieha, Michael D. Ko¨nigb, Xiaodong Liuc

26