

  • Measuring the Coworker Effects on Wages∗

    Jianhong Xin†

    Saturday 9th January, 2021

    [Click for Latest Version]

    Abstract

    The fast-growing literature studying the impact of co-workers on individuals’ wages has recently made significant progress by developing techniques that allow it to move from small and idiosyncratic case studies to more generalizable studies based on large labor markets. However, I show that the empirical methodology underlying this shift delivers a large positive or negative bias in measured co-worker effects in realistic settings. I combine insights from assortative matching theory with recent computer science advances in graph embedding techniques to develop a machine learning method that allows researchers to obtain efficient and unbiased estimates in those settings. The proposed method non-parametrically measures the potentially heterogeneous impact of different co-workers on individuals’ wages. I am currently using the proposed method to measure co-worker effects in matched employer-employee panel data covering the entire population of Denmark.

    Keywords: Coworker Effects, Two-Sided Unobserved Heterogeneity, Assortative Matching, Machine Learning, Graph Embedding, Matrix Completion.

    ∗I am deeply indebted to Iourii Manovskii, Marcus Hagedorn, Dirk Krueger for their invaluable guidance and support. I would like to thank Harold Cole, José-Víctor Ríos-Rull, Andrew Postlewaite, Xu Cheng, Guillermo Ordonez, Hanming Fang and all other seminar participants at University of Pennsylvania and 2020 Joint Statistical Meetings.

    †University of Pennsylvania, Department of Economics, The Ronald O. Perelman Center for Political Science and Economics, 133 South 36th Street, Philadelphia, PA 19104. Email: [email protected].

    https://economics.sas.upenn.edu/people/jianhong-xin

  • 1 Introduction

    How does the wage of a worker depend on where she works and whom she works with? How

    can we disentangle the contributions to wages of the unobserved components of a worker’s individual

    productivity, her firm’s productivity and her coworkers’ productivities? What is the magnitude

    of the complementarity between the productivity of the worker and the firm? What is the

    magnitude of the complementarity between the productivities of coworkers? How can we predict

    the potential wage and output of any worker relocated to any firm she never worked at and

    with coworkers she might have never encountered? Measuring coworker effects on wages at

    the scale of a local labor market provides the key to answering these questions. The empirics of

    coworker effects also pave the way for subsequent research, for instance, to investigate the

    efficiency of the labor market allocation: a method that can predict wages and outputs for

    a counterfactual match of any workers at any firm can project the optimal

    assignment of workers in the presence of search frictions and can shed light on the design of policies,

    such as taxation and unemployment insurance systems, that aim to achieve the optimal assignment.

    The central empirical challenge in estimating coworker effects is the well-known selection

    problem (Manski (1993), Brock and Durlauf (2001), Angrist (2014), Bramoulle et al. (2020)).

    The problem is rooted in sorting in the labor market: workers may be endogenously

    matched with peers across firms and occupations. The sorting may be based on similar

    productive attributes or common external causes, which would potentially confound the

    peer effects. Moreover, a large proportion of these attributes may not be observed in the data,

    considering that observable worker and firm characteristics account for only a small fraction

    of the observed variation in wages. 1

    The conventional approach to circumvent the selection problem is to exploit

    exogenous sources of variation in exposure to coworker influences. Researchers can randomize

    peer composition or treatments affecting outcomes through field experiments (Sacerdote

    (2001), Duflo et al. (2011), Banerjee et al. (2015), Eckles et al. (2016) and Feld and Zölitz

    (2017)) or must rely on a variety of strong and context-specific exogeneity assumptions

    for observational data. 2 Perhaps not surprisingly, this empirical research has been limited

    1 A large number of studies have found evidence that workers with similar unobservable productive attributes systematically self-select across firms and become peers. To name just a few, Andrews et al. (2012) find a positive correlation between worker and firm contributions to wages in German data. Hagedorn et al. (2017) identify sorting based on unobserved characteristics of both workers and firms and find positive assortative matching (PAM), which reinforces the previous finding. Abowd et al. (2018) find that unobservable differences in worker productivity are strongly positively correlated with unobservable differences in firm productivity across sectors. Crane (2014) find that productive, fast-growing firms tend to hire more productive workers, using U.S. census data. Mendes et al. (2007) document PAM in Portuguese data. Borovikova and Shimer (2017) find that high-wage workers work for high-wage firms in Austria.

    2 For example, models studying peer effects in classrooms typically assume idiosyncratic variations in peer

  • to small and idiosyncratic case studies. While it is difficult to apply their methods to

    investigate coworker effects at a broader scale, the empirical findings in these studies also exhibit

    substantial heterogeneity, making it difficult to generalize their results to the whole

    market. 3

    Yet there is growing interest in the literature in moving beyond these case studies

    to investigate coworker effects in large labor markets. In a recent advance, Cornelissen et al.

    (2017) (henceforth CDS) are the first to estimate the average coworker effects on wages for the

    whole labor market in Germany. They propose an empirical method to measure the average

    worker’s wage response to a change of her coworker quality with a linear-in-means model.

    Building on the worker and firm fixed effects model pioneered by Abowd et al. (1999) (AKM

    hereafter), the method accounts for selection into peer groups by including additive worker

    and firm fixed effects. Taking the approach of Arcidiacono et al. (2012), the method estimates

    coworker effects emanating from unobservable characteristics by iteratively estimating the

    focal worker’s ability, the firm ability, and the spillover coefficient mapping her own wage

    response to her coworker quality measured by the average of their fixed effects. Similar

    strategies based on including two-way additive fixed effects are also adopted by Hanushek

    et al. (2003), Betts and Zau (2004), Lavy et al. (2012), Burke and Sass (2013) and Sanz de

    Galdeano (2020).

    However, this method faces two challenges. First, by assuming linearly

    additive fixed effects, the model cannot capture potential interactions between unobserved

    heterogeneity of agents on both sides of the market. Particularly, wages that are monotone

    in worker and firm productivity are inconsistent with standard sorting models where the

    complementarity between worker and firm productivities plays a key role (e.g. Becker (1973),

    Eeckhout and Kircher (2011), Gautier and Teulings (2006), Lopes de Melo (2013), Lentz

    (2010), Lise et al. (2016), Hagedorn et al. (2014)). The estimated worker and firm fixed effects

    therefore may fail to correctly reflect the true worker abilities and firm productivities when

    these attributes are truly complementary in the underlying production process. 4 Based on

    components across cohorts (Hoxby (2000)). Models studying peer effects on networks often abstract from correlated effects based on unobservable characteristics. The typical assumption is that the network and observables are exogenous to the outcome (Bramoulle et al. (2009)), or that the endogenous formation of the network is conditional only on observed characteristics (Bramoulle et al. (2020)). These assumptions abstract from correlated effects based on unobserved heterogeneity.

    3 Mas and Moretti (2009), for instance, find strong evidence of positive productivity spillovers conditional on the physical presence of coworkers in a supermarket chain, while in a different scenario, Bloom et al. (2014) conclude quite the opposite. Waldinger (2009) investigates spillovers among university researchers and finds no evidence for localized peer effects, in sharp contrast to the findings of Herbst and Mas (2015).

    4 From a theoretical point of view, Gautier and Teulings (2006) and Eeckhout and Kircher (2011) show the role of the complementarity in determining the sign and magnitude of sorting and its implication for reallocation. From an empirical point of view, Hagedorn et al. (2017) non-parametrically estimate the wage and output profile and find overall positive assortative matching between workers and firms and positive


  • these estimates, the CDS method can induce a sizable misspecification bias in the measured

    coworker effects: I show that the bias can be large, with its sign being either positive or

    negative, depending on the strength and sign of the wage complementarity.5

    Second, the adoption of the linear-in-means model only captures average coworker effects.

    Despite its tractability, the linear-in-means model cannot reconcile the vast heterogeneity

    in coworker effects documented in the empirical literature: across various occupations and

    industries, different workers are found to be affected differently by their coworkers. 6 The second

    challenge stems from the fact that coworker influences are heterogeneous and these heterogeneities

    may not be observed in the data. However, there has been little discussion

    in the literature on estimating heterogeneous coworker effects based on unobservables.

    In this paper, I propose a new semi-parametric methodology to measure the effects of

    coworkers on wages. Combining economic theory and recent advances in machine learning,

    the method considers the dependence of wages on the heterogeneous productive attributes of

    both workers and firms that are either observed or unobserved. First, the method delivers

    efficient and robust estimates in the presence of potential complementarities between

    these attributes on both sides of the market. Second, beyond linear-in-means models, the

    method is also the first to capture the heterogeneous coworker effects based on unobserved

    heterogeneity.

    The main idea for non-parametrically identifying wages that are non-linear in the complementarities

    and the coworker effects is to partition the set of workers into clusters such that

    workers within a cluster are similar to each other in their productivities and coworker influences.

    Then, at the cluster level, I estimate the complementarities between each worker cluster

    and the firms, as well as the coworker effects between worker clusters.

    The identification consists of two stages. In the first stage, I identify similar workers and

    cluster workers based on their similarity. To achieve this goal, I develop a method in the

    spirit of Hagedorn et al. (2020) that takes into account the potential time-varying impact

    of the evolving set of coworkers. The key underlying assumption is that wages are driven

    by the unobserved “types” of the worker, the firm she is matched with and the “types” of

    her coworkers. Here, “types” are interpreted as a time-invariant unobserved membership

    of groups with certain productive worker attributes that govern the worker’s productivity

    complementarity between their productivities in output, in sharp contrast to what an additive firm and worker fixed effect model would imply.

    5 Specifically, “wage complementarity” refers to the wage component that is specific to the combination of the productivity type of the worker and the firm in the match.

    6 The literature notes two other major problems with the linear-in-means model. First, the model constrains the net effect of regrouping peers, so the estimates have only limited policy implications. Second, the linear-in-means model is often misspecified and not supported by numerous empirical studies (Hoxby and Weingarth (2005), Sacerdote (2011)).


  • and her peer influence on coworkers. The vision is that there could be a large number of

    unobserved types of workers and their coworkers that meaningfully determine the observed

    wages in the data, and the core of the method is to partition workers of a large number of

    types into a relatively small number of clusters so that each cluster is populated with workers

    of similar types. I leverage the insight featured by most sorting models in the literature: two

    workers with similar attributes and similar working histories must earn similar wages in the

    same firm influenced by the same set of coworkers. These restrictions allow me to identify

    pairs of workers with similar unobserved types by comparing their wages observed in the

    same firm at each point in time.
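    The within-firm comparison described above can be sketched as follows. This is a minimal illustration rather than the paper's implementation: the function name `pairwise_wage_gaps`, the record layout and the use of the mean absolute log-wage gap as the similarity measure are assumptions, and the changing composition of coworkers is ignored.

```python
from collections import defaultdict
from itertools import combinations


def pairwise_wage_gaps(records):
    """Mean absolute log-wage gap for every pair of workers who shared a firm-year.

    records: iterable of (worker, firm, year, log_wage) tuples (hypothetical layout).
    Returns {(worker_a, worker_b): mean gap}, with worker_a < worker_b.
    """
    # Group observations by firm-year cell, since wages are only compared within a firm.
    by_cell = defaultdict(list)
    for worker, firm, year, wage in records:
        by_cell[(firm, year)].append((worker, wage))

    gaps = defaultdict(list)
    for members in by_cell.values():
        for (a, wage_a), (b, wage_b) in combinations(sorted(members), 2):
            gaps[(a, b)].append(abs(wage_a - wage_b))
    return {pair: sum(values) / len(values) for pair, values in gaps.items()}


records = [
    ("ann", "f1", 2001, 3.00), ("bob", "f1", 2001, 3.02), ("cat", "f1", 2001, 3.60),
    ("ann", "f2", 2002, 3.40), ("bob", "f2", 2002, 3.38),
]
gaps = pairwise_wage_gaps(records)
closest_pair = min(gaps, key=gaps.get)  # the pair judged most similar
```

    In this toy data the two workers who earn nearly identical wages in both firms they share are flagged as the most similar pair.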

    Based on a worker partition, the method can counterfactually predict the “out-of-sample”

    wage of any worker relocated to any firm she never worked at and with coworkers

    she might have never encountered, using the wages of other workers belonging to her “similar

    worker cluster” in the target firm. The clustering-based estimation can be

    viewed as a “matrix completion” process that measures

    wages (and other potential economic outcomes) for every worker at every firm with

    any combination of coworkers after a reallocation. Methodologically, completing the wage

    matrix is also important for two reasons. First, it allows further evaluation of the similarity

    of workers who have never worked together in the same firm by comparing

    their “completed wages” based on the wages of other workers in their cluster. Second, it makes it possible

    to validate the quality of the clustering by testing the accuracy of the out-of-sample

    predictions.
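    A minimal sketch of the matrix-completion idea, under simplifying assumptions: wages are arranged in a worker-by-firm array with missing entries, the cluster labels are taken as given, and a missing wage is filled with the mean observed wage of the worker's cluster in that firm. The function name `complete_wage_matrix` is hypothetical, and the dependence on the coworker set is left out.

```python
import numpy as np


def complete_wage_matrix(wages, cluster_of):
    """Fill unobserved worker-firm wages with the cluster mean in that firm.

    wages: (n_workers, n_firms) array, np.nan where the worker was never seen in the firm.
    cluster_of: length n_workers array of integer cluster labels.
    Entries stay np.nan when no worker of the cluster was observed in the firm.
    """
    wages = np.asarray(wages, dtype=float)
    cluster_of = np.asarray(cluster_of)
    completed = wages.copy()
    for label in np.unique(cluster_of):
        rows = cluster_of == label
        block = wages[rows]
        counts = np.sum(~np.isnan(block), axis=0)
        sums = np.nansum(block, axis=0)
        means = np.full(wages.shape[1], np.nan)
        means[counts > 0] = sums[counts > 0] / counts[counts > 0]
        # Keep observed wages, fill the gaps with the cluster mean for that firm.
        completed[rows] = np.where(np.isnan(block), means, block)
    return completed


nan = np.nan
wages = [[3.0, nan, nan],
         [nan, 3.4, nan],
         [3.6, nan, 4.0]]
completed = complete_wage_matrix(wages, cluster_of=[0, 0, 1])
```

    Here workers 0 and 1 share a cluster, so each inherits the other's observed wage in the firm where she was never employed, while the third firm remains unfilled for both.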

    In the data, however, most individual workers have worked for only a few firms

    with a limited number of coworkers, and not many similar workers have worked together

    in the same firm. This poses a sparsity problem for the proposed method since, according to the theory,

    wages are only comparable within firms. To account for the sparsity problem, I adopt

    a hierarchical clustering approach to group workers in an iterative manner. I start with the

    most restrictive criterion, joining only pairs of workers with the minimal average wage difference

    throughout their coworkership. Despite the limited number of initial merges due to the

    strictness of the criterion, joining these workers into bigger clusters can mitigate the sparsity

    problem by expanding the connectivity of the coworker network and the set of available

    comparisons. This is because I can now make comparisons not only between a worker and

    her immediate coworkers in the same firm, but also between her and the coworkers that another

    individual from her cluster encountered at different firms, based on her wages “completed”

    by the cluster average. I continue to merge similar workers until no more workers are left

    to be merged under the current criterion. Then, I iteratively move through a sequence of progressively

    relaxed similarity criteria, allowing workers to be merged within an increasingly large cutoff

  • for their wage differences. The process delivers a hierarchical sequence of worker clusterings.
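    The progressive relaxation of the merging criterion can be illustrated with a simplified sketch. It assumes the pairwise wage gaps are already computed and fixed, and it merges with a union-find structure; the step in which wages are re-completed by cluster averages after each merge is deliberately omitted, so this is not the full procedure. The function name `hierarchical_partitions` and the cutoff values are hypothetical.

```python
def hierarchical_partitions(workers, gaps, cutoffs):
    """Return one partition (worker -> cluster label) per cutoff, from strictest to loosest.

    gaps: {(worker_a, worker_b): average wage gap} for comparable pairs only.
    Simplification: gaps are not updated after merges.
    """
    parent = {w: w for w in workers}

    def find(w):
        # Follow parent links to the cluster representative, compressing the path.
        while parent[w] != w:
            parent[w] = parent[parent[w]]
            w = parent[w]
        return w

    partitions = []
    for cutoff in sorted(cutoffs):
        for (a, b), gap in gaps.items():
            if gap <= cutoff:
                parent[find(a)] = find(b)
        partitions.append({w: find(w) for w in workers})
    return partitions


workers = ["ann", "bob", "cat", "dan"]
gaps = {("ann", "bob"): 0.02, ("bob", "cat"): 0.30, ("cat", "dan"): 0.05}
partitions = hierarchical_partitions(workers, gaps, cutoffs=[0.03, 0.10, 0.50])
cluster_counts = [len(set(p.values())) for p in partitions]
```

    Each relaxation of the cutoff can only coarsen the partition, which is what produces a nested, hierarchical sequence.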

    There is an apparent trade-off in deciding when to terminate the algorithm: as the procedure iterates

    forward, joining more individuals into larger clusters, the coworker

    network becomes increasingly dense. The method is therefore able to make predictions for

    a wider range of workers as larger sets of comparable workers become available.

    Meanwhile, however, the accuracy of the prediction gradually deteriorates, subject to

    a greater “approximation error”, since each cluster becomes increasingly contaminated

    by workers with lower similarity. To best balance the trade-off, I

    resort to machine learning: I split the data into three random subsets, namely the training set, the

    validation set and the test set. I estimate the hierarchical sequence of worker clusterings

    based only on observations in the training set. Then the optimal worker partition is chosen from the

    sequence of worker clusterings as the one that best predicts observations in the validation set.

    For evaluation purposes, the performance of the estimated worker partition is assessed and

    reported using the test set.
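    The selection step can be sketched as below. Everything here is an illustrative assumption: observations are (worker, firm, wage) triples, a validation wage is predicted by the training mean of the worker's cluster in that firm, the overall training mean is used as a fallback when that cell is empty, and mean squared error is the criterion. The split shares are also hypothetical.

```python
import random
from collections import defaultdict


def split_observations(observations, seed=0, shares=(0.6, 0.2, 0.2)):
    """Randomly split observations into training, validation and test sets."""
    shuffled = list(observations)
    random.Random(seed).shuffle(shuffled)
    n_train = int(shares[0] * len(shuffled))
    n_val = int(shares[1] * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:n_train + n_val], shuffled[n_train + n_val:]


def validation_mse(partition, train, val):
    """Mean squared error of cluster-by-firm mean predictions on the validation set."""
    cell_wages = defaultdict(list)
    for worker, firm, wage in train:
        cell_wages[(partition[worker], firm)].append(wage)
    fallback = sum(wage for _, _, wage in train) / len(train)
    squared_errors = []
    for worker, firm, wage in val:
        peers = cell_wages.get((partition[worker], firm))
        prediction = sum(peers) / len(peers) if peers else fallback
        squared_errors.append((wage - prediction) ** 2)
    return sum(squared_errors) / len(squared_errors)


def select_partition(partitions, train, val):
    """Index of the candidate partition with the lowest validation error, plus all scores."""
    scores = [validation_mse(p, train, val) for p in partitions]
    return min(range(len(partitions)), key=scores.__getitem__), scores


# Toy example: workers a, b are one type (wage 4.0), workers c, d another (wage 3.0).
train = [("a", "f1", 4.0), ("c", "f1", 3.0)]
val = [("b", "f1", 4.0), ("d", "f1", 3.0)]
candidates = [
    {"a": 0, "b": 1, "c": 2, "d": 3},  # too fine: no comparable peers
    {"a": 0, "b": 0, "c": 1, "d": 1},  # matches the true types
    {"a": 0, "b": 0, "c": 0, "d": 0},  # too coarse: types are pooled
]
best, scores = select_partition(candidates, train, val)
```

    The toy example reproduces both sides of the trade-off: the finest partition cannot find comparable peers, the coarsest pools dissimilar workers, and the intermediate one predicts the validation wages exactly.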

    In the second stage of this approach, I estimate the wage complementarities and coworker

    effects given the worker clustering obtained from the first stage. The key to accounting for

    the potential self-selection problem in estimating coworker effects is to control for the wage

    complementarity that governs the sorting of workers across firms based on their unobserved

    productivity types. To control for the wage complementarity empirically, I include two-

    dimensional “match fixed effects” cross-indexed by the identity of the individual’s worker

    cluster and the identity of the individual firm in the match. Moving beyond the AKM

    approach relying on restrictive worker and firm fixed effects decompositions, the proposed

    method non-parametrically identifies wage complementarity reflecting non-linear interactions

    during each type of match.

    Conditional on the wage complementarity, I non-parametrically identify coworker effects

    utilizing the focal worker’s wage response to variations in her coworker productivity distri-

    bution driven by their job mobility. Here, the important identification assumption is that

    conditional on the productivity of the worker, the productivity of the firm and the pro-

    ductivity of all her coworkers, the wages of the worker are exogenous to her coworker job

    mobility. Empirically, I approximate the coworker productivity distribution for each indi-

    vidual with discrete “coworker components”, the mass function depicting the measure of

    the coworkers assigned to each worker bin. Then I estimate the coworker effects function by

    projecting the wages of the worker on her coworker components conditional on the match

    effects. Each coefficient captures the coworker influence, the elasticity of the wage response

    to the coworker share of each productivity type. Beyond the linear-in-means model, the

    proposed method non-parametrically estimates heterogeneous coworker influences, as these


  • coefficients are unrestricted by functional form assumptions. In an important extension of

    the method, I further generalize the framework to measure complementarities

    between coworkers. I estimate a two-dimensional coworker effect function that allows for

    asymmetric coworker influence from one cluster to another. In the general framework, I

    show that the identification of the worker clustering follows the same machine learning algorithm,

    and that the two-dimensional coworker effects can be estimated separately for the focal

    workers in each worker cluster.
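    As a rough illustration of the second stage, the sketch below projects wages on coworker shares while absorbing match effects by demeaning within each (worker cluster, firm) cell. It is a simplified stand-in, not the paper's estimator: the function name, the linear projection on shares, the choice of cluster 0 as the omitted reference category (needed because shares sum to one), and the noise-free simulated data are all assumptions.

```python
import numpy as np


def estimate_coworker_effects(wages, match_cell, shares):
    """Within-cell least squares of wages on coworker shares.

    wages: (n_obs,) log wages.
    match_cell: (n_obs,) integer id of the (worker cluster, firm) cell.
    shares: (n_obs, K) share of coworkers in each of K clusters, rows summing to one.
    Returns K-1 coefficients, measured relative to the omitted cluster 0.
    """
    y = np.array(wages, dtype=float)
    X = np.array(shares, dtype=float)[:, 1:]  # drop the reference cluster
    match_cell = np.asarray(match_cell)
    for cell in np.unique(match_cell):
        rows = match_cell == cell
        # Demeaning within the cell removes the match fixed effect.
        y[rows] -= y[rows].mean()
        X[rows] -= X[rows].mean(axis=0)
    coefficients, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefficients


rng = np.random.default_rng(0)
n_obs, n_cells = 200, 5
match_cell = rng.integers(0, n_cells, size=n_obs)
raw = rng.random((n_obs, 3))
shares = raw / raw.sum(axis=1, keepdims=True)
match_effect = np.array([1.0, 1.5, 2.0, 2.5, 3.0])
true_coefficients = np.array([0.2, -0.1])
wages = match_effect[match_cell] + shares[:, 1:] @ true_coefficients
estimated = estimate_coworker_effects(wages, match_cell, shares)
```

    Because the simulated wages contain no noise, the within-cell regression recovers the coefficients used to generate them; with real data the estimates would of course carry sampling error.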

    I am now taking the algorithm to the administrative matched employer-employee data

    of Denmark, which covers a population of 3 million workers over a span of 20 years. Despite

    its high accuracy, the proposed hierarchical clustering algorithm is relatively slow for large

    datasets: its computational complexity is quadratic in space and cubic in time. To accommodate

    the demand for scalability, I integrate the baseline method with Graph Convolutional

    Networks (GCN), a computer science advance in graph embedding techniques (Kipf and

    Welling (2016), Hamilton et al. (2017)). In recent years, GCN-based graph embedding techniques

    have enjoyed tremendous success in applications across disciplines, ranging from natural

    language processing and knowledge graphs to online recommender systems (Xu et al. (2018), Cai

    et al. (2018), Zhou et al. (2018), Wu et al. (2019)). The fundamental idea is to represent each

    node of the graph by a vector in a low-dimensional space such that similar nodes

    are embedded close to each other in that space. However, while these graph embedding methods exhibit

    high computational performance, I find that the accuracy of these methods is limited when

    tested with simulated data.

    Following Hagedorn et al. (2020), I integrate the baseline hierarchical clustering algorithm

    with the graph embedding techniques through a divide-and-conquer strategy. The “dividing”

    step computes worker embeddings using GraphSAGE, groups closely embedded workers

    and divides them into separate subsets. The “conquering” step applies the baseline hierarchical

    clustering algorithm only within each local subset: when the dividing step is relatively accurate,

    only similar workers are assigned into each subset. Therefore, the divide-and-conquer

    strategy can significantly reduce the dimension of the problem by eliminating voluminous redundant

    comparisons without any compromise of accuracy.
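    The scale of the saving can be seen with simple arithmetic on the number of pairwise comparisons. The sketch below only counts pairs; it does not run any clustering. The figure of 3 million workers comes from the text, while the split into 1,000 equally sized subsets is a purely hypothetical choice for illustration.

```python
def n_pairs(n):
    """Number of unordered pairs among n workers."""
    return n * (n - 1) // 2


def comparison_counts(subset_sizes):
    """Pairs compared globally versus only within each subset."""
    total_workers = sum(subset_sizes)
    global_pairs = n_pairs(total_workers)
    local_pairs = sum(n_pairs(size) for size in subset_sizes)
    return global_pairs, local_pairs


# 3 million workers divided into 1,000 hypothetical subsets of 3,000 workers each.
global_pairs, local_pairs = comparison_counts([3_000] * 1_000)
reduction_factor = global_pairs / local_pairs
# global_pairs = 4,499,998,500,000 and local_pairs = 4,498,500,000
```

    Under this hypothetical split the number of candidate comparisons falls by a factor of roughly one thousand, in line with the number of subsets.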

    This paper is related to multiple strands of the literature. The first contribution is to

    the fast-moving research on peer effects using large matched employer-employee data. I develop

    a parsimonious machine-learning-based approach that delivers reliable and testable

    results. My framework extends Cornelissen et al. (2017) by allowing wages to reflect flexible

    worker-firm complementarities and to capture heterogeneous peer effects across unobserved

    worker productivities. Moving beyond homogeneous peer effects, Sanz de Galdeano (2020)

    find heterogeneous peer effects across observed characteristics in matched employer-employee (MEE) data for Brazil, taking

  • a similar approach to Arcidiacono et al. (2012) and Cornelissen et al. (2017). In parallel

    with the literature focusing on contemporaneous peer effects, Herkenhoff et al. (2018) and

    Jarosch et al. (2019) find an asymmetric flow of knowledge spillovers from high-wage workers

    to low-wage workers over the years. As for evaluating the efficacy of the estimates, Eckles and

    Bakshy (2020) conduct a constructed observational study by comparing the prediction of an

    observational estimator of peer behavior based on a nonexperimental control group to a

    randomized experiment. My method takes a machine learning approach and evaluates out-of-sample

    prediction on the test set. Second, the paper contributes to the literature on identifying

    labor market sorting based on unobserved heterogeneity. Bonhomme et al. (2019) and Lentz

    et al. (2018) propose random-effect-based approaches to bicluster workers and firms based on

    the wage distribution. Complementary to these existing methods, my approach delivers, in finite

    samples, precise and accurate counterfactual predictions for any individual worker if allocated

    to any firm, conditional on the set of coworkers. The third contribution of the paper

    is to the literature on team production: the method identifies worker-firm sorting and wage complementarities

    separately from the wage effects of coworkers. With additional assumptions on the bargaining

    protocol and the market structure, my method also makes it possible to non-parametrically estimate

    worker production in teams given observed wages, which are an equilibrium object and can be

    non-parametrically inverted for outputs. In contrast, Bonhomme (2020) quantify individual

    contributions given observed team output. Finally, the paper contributes to the literature on search frictions

    and assortative matching. To my knowledge, this paper is the first to jointly estimate

    the complementarity between worker and firm productivity while taking into account the

    coworker effects.

    The rest of the paper is organized as follows. Section 2 sets up the environment and illustrates

    the extent of the misspecification bias in measured coworker effects if the standard

    assumption that wages are linear in worker and firm effects is violated in the data. Section

    3 proposes a novel machine-learning-based approach and applies it to an extended framework

    of Cornelissen et al. (2017). The extension allows for worker-firm complementarity and

    captures heterogeneous peer effects. Section 4 presents simulation results to show the efficiency

    and efficacy of the algorithm. Section 5 integrates my baseline method with recent

    graph embedding techniques to achieve scalability. Section 6 concludes.

    2 Background

    This section introduces the framework of Cornelissen et al. (2017), the first leading empirical

    methodology that attempts to estimate peer effects for a large local labor market. Then I

    show that the method is not robust in the presence of wage complementarity between workers


  • and firms, which could induce a sizable misspecification bias in the measured coworker effects.

    2.1 Environment

    In the matched employer-employee data for workers $I = \{1, \ldots, N\}$ and firms $J = \{1, \ldots, M\}$, we can keep track of the wages of each individual worker $i \in I$ when he is matched with firm $j \in J$ in year $t \in T$. For every observed match $(i, j, t)$ in each period, denote the log wage by $w_{ijt}$ and set the match indicator $m_{ijt} = 1$; otherwise $m_{ijt} = 0$. Each

    worker can hold at most one job in each year. The set of workers at firm $j$ in

    period $t$ is referred to as the peer group $P_{jt} = \{i \in I \mid m_{ijt} = 1\}$. Denote $N_{jt} = |P_{jt}| - 1$. Denote the coworker set of a reference worker $i$ by $P_{\sim i,jt} = \{i' \in I,\ i' \neq i \mid m_{i'jt} = 1\}$.

    2.2 The Method of Cornelissen et al. (2017)

    Cornelissen et al. (2017) is the leading empirical method to measure peer effects based on

    matched employer-employee data. To account for the selection problem, i.e. the endogenous

    sorting of high-productivity workers into high-productivity peer groups or high-productivity

    firms based on unobserved attributes, CDS extend the worker and firm fixed effects model

    of Abowd et al. (1999) by including control variables and multiple fixed effects. The goal is

    to estimate the following wage equation: 7

    $$w_{ijt} = \alpha_i + \phi_{jt} + \gamma \bar{\alpha}_{\sim i,jt} + \varepsilon_{ijt}. \tag{1}$$

    $\alpha_i$ is a worker fixed effect for worker $i$ that captures permanent worker ability, and $\phi_{jt}$ is

    a time-varying firm fixed effect for firm $j$ at time $t$. The model captures peer effects in a

    “linear-in-means” setup: the term $\bar{\alpha}_{\sim i,jt}$ is peer productivity, defined as the average worker

    fixed effect in the peer group, computed by excluding individual $i$:

    $$\bar{\alpha}_{\sim i,jt} = \frac{1}{N_{jt}} \sum_{i' \in P_{\sim i,jt}} \alpha_{i'}.$$

    The spillover coefficient $\gamma$, which measures the coworkers’ impact on wages, is the key parameter

    of interest.
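    To make the linear-in-means specification concrete, the sketch below computes the leave-one-out peer mean and the systematic part of the wage in Equation (1) for a toy peer structure. The function name `leave_one_out_mean`, the numerical values and the omission of the error term are illustrative assumptions; each peer group is assumed to contain at least two workers.

```python
import numpy as np


def leave_one_out_mean(alpha, group):
    """Average worker effect among each worker's peers, excluding the worker herself.

    alpha: (n,) worker fixed effects.
    group: (n,) non-negative integer id of the firm-period peer group of each worker.
    """
    alpha = np.asarray(alpha, dtype=float)
    group = np.asarray(group)
    group_sum = np.bincount(group, weights=alpha)
    group_size = np.bincount(group)
    n_peers = group_size[group] - 1  # N_jt, the number of coworkers
    return (group_sum[group] - alpha) / n_peers


alpha = np.array([1.0, 2.0, 3.0, 0.5, 1.5])   # worker effects (hypothetical values)
group = np.array([0, 0, 0, 1, 1])             # two peer groups of sizes 3 and 2
phi = np.array([0.2, 0.4])                    # firm-period effects (hypothetical values)
gamma = 0.1                                   # spillover coefficient (hypothetical value)

peer_mean = leave_one_out_mean(alpha, group)
wages = alpha + phi[group] + gamma * peer_mean  # Equation (1) without the error term
```

    Note that two coworkers in the same group face different peer means because each excludes her own effect, which is why their wage gap is slightly compressed relative to the gap in their worker effects.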

    7 For expositional clarity, this wage equation is simplified relative to CDS by abstracting from occupation fixed effects and from the influence of other observable time-varying characteristics, as they do not affect the conclusion of this section.


  • 2.2.1 Homogeneous peer effects

    Two underlying assumptions in CDS are restrictive. The first is that the peer effect function

    is homogeneous. This is inconsistent with the heterogeneity found in empirical studies.

    2.2.2 Worker-firm complementarity

    The second important assumption, inherited from AKM, is that the worker-firm wage component is

    additively separable into a worker fixed effect and a time-varying firm fixed effect:

    $\alpha_i + \phi_{jt}$.

    The first implication of this assumption is that the wage gap between two coworkers

    is the same regardless of firm productivity: if worker $i$ is more productive and earns a higher wage than his coworker $i'$ at

    one firm $j$, he is expected to do so at every other firm $j'$ in the economy, irrespective of the

    productivity of the firm:

    $$E(w_{ijt} - w_{i'jt}) = (\alpha_i - \alpha_{i'})\left(1 - \frac{\gamma}{N_{jt}}\right), \quad \forall j \in J.$$
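    This expression follows from Equation (1) in two steps, assuming the idiosyncratic errors have mean zero. The symbol $S_{jt}$, the sum of worker effects over the whole peer group $P_{jt}$, is introduced here only for the derivation. The firm effect $\phi_{jt}$ is common to both coworkers and cancels.

```latex
\begin{align*}
\bar{\alpha}_{\sim i,jt} - \bar{\alpha}_{\sim i',jt}
  &= \frac{S_{jt} - \alpha_i}{N_{jt}} - \frac{S_{jt} - \alpha_{i'}}{N_{jt}}
   = -\frac{\alpha_i - \alpha_{i'}}{N_{jt}}, \\
E(w_{ijt} - w_{i'jt})
  &= (\alpha_i - \alpha_{i'})
     + \gamma \left( \bar{\alpha}_{\sim i,jt} - \bar{\alpha}_{\sim i',jt} \right)
   = (\alpha_i - \alpha_{i'}) \left( 1 - \frac{\gamma}{N_{jt}} \right).
\end{align*}
```

    The firm enters the gap only through the number of coworkers $N_{jt}$, never through its productivity.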

    Second, a high-wage firm always pays the same wage premium: for two firms $j$ and $j'$

    with equal peer productivity $\bar{\alpha}_{\sim i,jt} = \bar{\alpha}_{\sim i,j't}$, the expected wage difference is independent of

    the productivity of the worker employed:

    $$E(w_{ijt} - w_{ij't} \mid \bar{\alpha}_{\sim i,jt} = \bar{\alpha}_{\sim i,j't}) = \phi_{jt} - \phi_{j't}, \quad \forall i \in I.$$

    Together, these two implications rule out the interdependence of worker and firm

    productivity in wages, so that there is no gain for firms from selecting the right job applicants,

    nor any gain for job seekers from selecting the best job. These implications are

    inconsistent with most structural models in the assortative matching literature where the

    production function is such that it is optimal to sort workers to firms where joint output is

    maximized.

    In more realistic settings, however, wages can reflect the effect of complementarities

    between worker and firm productivity. Worker ability could be complementary to (or substitutable

    for) firm productivity, such that a high-ability worker becomes more (or less)

    productive when moving from a low-productivity firm to a high-productivity firm, compared with a

    low-ability worker. Consequently, the interdependence of worker and firm productivity gives

    rise to positive (or negative) assortative matching, i.e. high-productivity workers sorting into

    high-productivity (or low-productivity) firms with other high-productivity colleagues in the

  • equilibrium outcome. Hagedorn et al. (2014), for instance, find positive assortative matching

    in German administrative data, in line with theories that attribute sorting to

    worker-firm complementarity.

    The complementarity has two important implications. First, it induces sorting between
    workers and firms, which poses the well-known empirical challenge of the “selection problem”
    in the estimation of coworker effects. The selection problem arises when groups of coworkers
    are not formed at random. Second, the interdependence of worker and firm productivity
    implies that wages cannot be decomposed into additively separable worker and firm fixed
    effects, which is in sharp contrast to the specification of CDS.
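The second implication can be checked numerically. The sketch below is a minimal illustration, assuming a CES wage of the form used in Equation (2) and a hypothetical 10-by-10 grid of worker and firm types; the function names `ces_wage` and `additive_residuals` are illustrative and not part of the paper. It fits the best additive worker-plus-firm decomposition on the full grid and reports how much of the wage is left unexplained.

```python
import numpy as np

def ces_wage(alpha, phi, rho):
    """CES wage (alpha^rho + phi^rho)^(1/rho), the form used in Equation (2)."""
    return (alpha ** rho + phi ** rho) ** (1.0 / rho)

def additive_residuals(wages):
    """Residuals from the least-squares additive fit a_i + b_f.

    On a complete (balanced) worker-by-firm grid the two-way fixed-effects
    fit is obtained by double demeaning.
    """
    row_mean = wages.mean(axis=1, keepdims=True)
    col_mean = wages.mean(axis=0, keepdims=True)
    return wages - row_mean - col_mean + wages.mean()

# Hypothetical type grid, kept away from zero so that any rho is well defined.
alpha = np.linspace(0.1, 1.0, 10)[:, None]   # worker types (rows)
phi = np.linspace(0.1, 1.0, 10)[None, :]     # firm types (columns)

resid_additive = additive_residuals(ces_wage(alpha, phi, rho=1.0))
resid_complement = additive_residuals(ces_wage(alpha, phi, rho=0.5))

print("max |residual|, rho = 1.0:", np.abs(resid_additive).max())
print("max |residual|, rho = 0.5:", np.abs(resid_complement).max())
```

With ρ = 1 the wage is exactly additive and the residuals vanish up to floating-point error; with ρ = 0.5 a worker-firm interaction term remains that no pair of fixed effects can absorb.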

    2.2.3 Misspecification bias

    The CDS method cannot correctly account for the selection problem induced by the
    complementarities between workers and firms. The misspecification bias can be sizable, and
    its sign can be either positive or negative, depending on the magnitude and sign of the
    complementarity.

    Data generating process. To illustrate the misspecification bias, I present the performance
    of the CDS estimator applied to wages simulated from a simple alternative data generating
    process. In contrast to Equation (1), the wage reflects the complementarities between
    workers and firms:

    $$w_{ift} = w(\alpha_i, \phi_f) = \left(\alpha_i^{\rho} + \phi_f^{\rho}\right)^{1/\rho}. \tag{2}$$

    Here, each worker $i \in \mathcal{I}$ is endowed with a permanent latent productivity $\alpha_i$, and each firm
    $f \in \mathcal{J}$ with a permanent latent productivity $\phi_f$, on entering the market. Both $\alpha_i$ and $\phi_f$ are
    independently drawn from the standard uniform distribution and cannot be observed in the
    data. Importantly, I focus on the case where wages incorporate no coworker effect: wages
    are determined solely by the productivity of the worker $\alpha_i$ and the firm $\phi_f$ in the match,
    not by coworkers. The substitution parameter ρ controls the curvature of the wage function w
    and represents the magnitude of the complementarity. In particular, when ρ = 1, Equation
    (2) reduces to Equation (1) with the corresponding γ = 0. To generate realistic peer
    selection and positive assortative matching, assume that a worker and a firm match if and
    only if

    $$|\alpha_i - \phi_f| < 0.1.$$
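As a rough illustration of this data generating process, the following sketch draws types, applies the matching band and evaluates the wage in Equation (2). It is a simplification under stated assumptions: it enumerates every acceptable worker-firm pair instead of simulating search and mobility, and the function name `simulate_matches` and its arguments are hypothetical.

```python
import numpy as np

def simulate_matches(n_workers, n_firms, rho, band=0.1, seed=0):
    """Draw types and return wages for all acceptable worker-firm pairs.

    Assumption: every pair with |alpha_i - phi_f| < band is treated as a
    match; the search-and-matching dynamics of the paper are not modeled.
    """
    rng = np.random.default_rng(seed)
    alpha = rng.uniform(size=n_workers)   # latent worker productivity
    phi = rng.uniform(size=n_firms)       # latent firm productivity
    gap = np.abs(alpha[:, None] - phi[None, :])
    worker_idx, firm_idx = np.nonzero(gap < band)
    # CES wage of Equation (2); it contains no coworker effect by construction.
    wages = (alpha[worker_idx] ** rho + phi[firm_idx] ** rho) ** (1.0 / rho)
    return alpha, phi, worker_idx, firm_idx, wages

alpha, phi, worker_idx, firm_idx, wages = simulate_matches(50, 20, rho=1.0)
print(len(wages), "acceptable matches")
```

With `rho=1.0` the simulated wage equals `alpha + phi`, the additive benchmark; other values of `rho` introduce the complementarity discussed in the text.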

    The rest of the model follows a basic search and matching paradigm. The model uses a
    standard calibration to moments of labor market mobility. The details are deferred to
    Section 4.1.

    Bias in finite sample. The simulation results for the estimator γ̂ from Equation (1) are
    shown in Figure 2. The 95% bootstrap confidence interval is constructed using B = 100
    bootstrap samples.

    Note that when ρ = 1, worker and firm productivity enter the wage additively. In this case,
    the estimator γ̂ correctly recovers the true magnitude of the peer effect, i.e. the true value
    γ = 0. As alternatives, I simulate data for four different values ρ ∈ {−3, −1, 0.5, 1.5}.
    In each case, the estimate γ̂ is subject to misspecification bias. The size and sign of the
    bias depend on the strength of the complementarity, measured by |ρ − 1|: when ρ > 1, the
    bias is upward (γ̂ > 0); when ρ < 1, it is downward (γ̂ < 0).

    Figure 1: $E\hat\gamma \neq 0$ when $\rho \neq 1$.

    Asymptotic performance. The CDS estimator γ̂ does not converge to the true γ = 0: the
    misspecification bias does not vanish as the length of the simulation T approaches infinity.

    Figure 2: γ̂ is inconsistent.

    The CDS method detects a negative spillover parameter in a simulation where the true one
    is zero. The intuition for the bias can be illustrated by contradiction: restricting the
    spillover parameter to zero in (1) amounts to fitting an AKM regression with worker and
    time-varying firm effects only,

    $$w_{ift} = \alpha_i + \phi_{ft} + u_{ift},$$

    in which case a negative correlation can be found between the peer quality $\bar\alpha_{-i,ft}$ and the
    regression residual $u_{ift}$. The negative correlation implies that, once the restriction is
    lifted, the spillover parameter can be lowered to a negative number to better fit the data.
    That is, in the case of (1):

    $$\gamma = \frac{\operatorname{cov}(\bar\alpha_{-i,ft},\, u_{ift})}{\operatorname{var}(\bar\alpha_{-i,ft})} < 0.$$

    The correlation is negative because, in the presence of complementarity, the estimated
    constant worker fixed effect tends to underpredict the wage of a high-productivity worker
    and overpredict the wage of a low-productivity worker. Focusing only on the within-peer-group
    variation that the identification uses, the regression residual is systematically higher for
    better workers, who are by construction paired with worse coworkers within the same peer
    group.
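The mechanics of this argument can be sketched for a single peer group. The example below is a hedged illustration: the productivities are made up, and the residuals are a hypothetical stand-in that simply rises with own productivity, which is the pattern the text describes; `leave_one_out_mean` and `implied_spillover` are illustrative names.

```python
import numpy as np

def leave_one_out_mean(values):
    """Peer quality: mean of everyone in the group except the worker herself."""
    values = np.asarray(values, dtype=float)
    return (values.sum() - values) / (values.size - 1)

def implied_spillover(peer_quality, residuals):
    """Slope cov(peer quality, residual) / var(peer quality)."""
    peer_quality = np.asarray(peer_quality, dtype=float)
    residuals = np.asarray(residuals, dtype=float)
    cov = np.cov(peer_quality, residuals, ddof=0)[0, 1]
    return cov / np.var(peer_quality)

# Hypothetical peer group of three workers.
alpha = np.array([0.2, 0.5, 0.9])
peer_quality = leave_one_out_mean(alpha)     # better workers face worse peers
stand_in_residuals = alpha - alpha.mean()    # assumed: residual rises with own type

print(peer_quality)
print(implied_spillover(peer_quality, stand_in_residuals))
```

Because the leave-one-out mean falls as own productivity rises, any residual that increases with own productivity is negatively related to peer quality within the group, so the implied slope is negative even though no coworker effect was built in.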

    The example implies that estimating the match component w(x, y) is vital for estimating
    coworker effects. It controls for the variation in wages due to unobserved worker and firm
    characteristics that may be endogenously correlated with unobserved coworker attributes.
    If the complementarity in w(x, y) is correctly measured, the estimation of coworker effects
    is free from the selection problem. This is the main focus of the paper and is addressed in
    the next section.

    3 Machine-learning based Approach

    In this section, I propose an economic-theory-based machine-learning approach to estimate
    wage peer effects in a more general framework, extending Cornelissen et al. (2017) in two
    important directions. First, the framework allows wages to reflect a flexible
    interdependence of worker and firm productivities, so that complementarities or
    substitutabilities between them can be captured. In accordance with assortative matching
    theories, these complementarities can potentially account for endogenous peer selection.
    Second, the framework allows for heterogeneous coworker effects across workers with
    different productivity levels. By doing so, the model can reconcile the mixed empirical
    findings in the peer effect literature, where different workers may exhibit heterogeneous
    impacts on the wages of their coworkers in various scenarios.

    In addition to these two relaxations, my method makes it possible to predict the
    counterfactual wage of an individual who is reallocated to any firm in the economy with a
    different set of coworkers. From a macroeconomic perspective, the wage is an equilibrium
    object of a structural model and can be non-parametrically inverted to recover other
    equilibrium outcomes such as output and productivity. Thus, by empirically estimating the
    complementarities and coworker effects in wages, researchers can make further inference
    about complementarities and coworker effects in these outcomes as well. This opens the door
    to substantive questions: for instance, how to assess labor market efficiency and how
    to design policies that achieve the efficient assignment of coworkers. Of course, answering
    such questions requires additional assumptions about the labor market, regarding the market
    structure, the bargaining protocol, etc.

    3.1 The Framework

    The goal is to jointly estimate worker-firm complementarities and heterogeneous coworker

    effects in wages in the following framework:

    $$w_{ift} = \underbrace{w(x_i, y_f)}_{\text{match effects}} + \underbrace{\frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_j)}_{\text{coworker effects}} + \nu_{ift}, \tag{3}$$

    where $w_{ift}$ are observed wages in a large matched employer-employee dataset, $x_i$ is the
    unobserved productivity of worker i and $y_f$ is the latent productivity of firm f. The
    productivities $x_i$ and $y_f$ are drawn at the birth of the worker and the vacancy from
    exogenous distributions whose supports can be normalized to the closed unit interval,

    $$x_i \in \mathcal{X} = [0, 1], \qquad y_f \in \mathcal{Y} = [0, 1],$$

    and remain constant over time. The match effect $w(x_i, y_f)$ is the component that captures
    the complementarity between worker productivity $x_i$ and firm productivity $y_f$ in wages.
    $P_{-i,ft}$ denotes the set of coworkers of worker i (excluding i herself) in her peer group. A
    peer group is the set of workers at workplace (firm) f at time t and can therefore be
    indexed by (f, t). $N_{ft} = |P_{-i,ft}|$ denotes the number of coworkers. The disturbance $\nu_{ift}$
    captures the variation due to all other factors and satisfies $E[\nu_{ift} \mid x_i, y_f] = 0$.

    The key objects of interest are the match effect $w(x_i, y_f)$ and the coworker effect
    $a(x_j)$. The first key assumption is that, conditional on the productivities $x_i$, $y_f$ and
    $x_j$, wages do not depend on the identity of the worker i, the firm f or the coworker j. The
    second key assumption is that both w and a are finite, continuous mappings defined on
    compact sets:

    $$w : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}, \qquad a : \mathcal{X} \to \mathbb{R}, \qquad \text{both continuous.}$$

    I start with the baseline model (3), as it is an immediate generalization of the CDS method
    and thus a good benchmark.[8] Importantly, I do not impose restrictive functional form
    assumptions on the match and coworker effect components w(x, y) and a(x), so that w(x, y)
    can flexibly reflect arbitrary interactions between worker and firm productivity, and a(x)
    can capture heterogeneous peer effects between coworkers.

    The example in Section 2 implies that estimating the match component w(x, y) is vital
    for estimating the coworker effects a(x), as it controls for the variation in wages due to
    unobserved worker and firm characteristics that may be endogenously correlated with
    unobserved coworker attributes. Once w(x, y) is correctly measured, the estimation of the
    coworker effect a(x) is free from the selection problem.
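To make Equation (3) concrete, the sketch below evaluates noise-free wages for every member of one peer group given a match function and a coworker-effect function. It is only an illustration under assumptions: the function name `peer_group_wages`, the example types and the value γ = 0.5 are hypothetical, and the linear functions used in the example correspond to the CDS special case.

```python
import numpy as np

def peer_group_wages(x, y_f, match_fn, coworker_fn):
    """Noise-free wages of all workers in one peer group under Equation (3).

    x           : productivities of the workers employed at firm f at time t
    y_f         : productivity of the firm
    match_fn    : w(x, y), the match effect
    coworker_fn : a(x), the effect a coworker of type x exerts on others
    """
    x = np.asarray(x, dtype=float)
    n_coworkers = x.size - 1                  # N_ft = |P_{-i,ft}|
    effects = coworker_fn(x)
    # Average of a(x_j) over coworkers j, excluding the worker herself.
    coworker_term = (effects.sum() - effects) / n_coworkers
    return match_fn(x, y_f) + coworker_term

# Linear special case: w(x, y) = x + y and a(x) = gamma * x.
gamma = 0.5
wages = peer_group_wages(
    x=[0.2, 0.4, 0.9],
    y_f=0.5,
    match_fn=lambda x, y: x + y,
    coworker_fn=lambda x: gamma * x,
)
print(wages)
```

Any other pair of functions, for example a CES match effect or a non-monotone `coworker_fn`, can be passed in without changing the code, which is the flexibility the framework is meant to provide.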

    [8] Specifically, when the match and spillover components are both linear,

    $$w(x, y) = x + y, \qquad a(x) = \gamma x,$$

    Equation (3) reduces to the CDS specification (1). Similar to CDS, my framework abstracts
    from endogenous peer effects: own wage is independent of peer wages conditional on the
    productivity of the worker, her firm and her coworkers. The method therefore bypasses the
    reflection problem, as well as the challenge of distinguishing between endogenous and
    exogenous peer effects, highlighted in the literature since Manski (1993).

    3.1.1 Identification

    When the types of workers $\{x_i\}_{i \in \mathcal{I}}$ and firms $\{y_f\}_{f \in \mathcal{F}}$ are observable, the coworker effect
    function a(x) in Equation (3) is identified under the following assumption:

    Assumption 3.1. Identification: for every worker i and her coworkers $j \in P_{-i,ft}$,

    $$\nu_{ift} \perp\!\!\!\perp h_{-i,ft}(k) \;\Big|\; x_i,\, y_f,\, \{x_j\}_{j \in P_{-i,ft}},$$

    where $h_{-i,ft}(k)$ is the fraction of her coworkers j with productivity type $x_j = k$:

    $$h_{-i,ft}(k) = \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} 1\{x_j = k\}.$$

    Assumption 3.1 states that, conditional on the types of the worker, the firm and her
    coworkers, the wage disturbances $\nu_{ift}$ are exogenous to the mobility of her coworkers.
    Importantly, identification holds under this assumption for a general wage function
    incorporating contemporaneous coworker effects:

    $$w_{ift} = g_n(x_i, y_f, \{x_j\}_{j \in P_{-i,ft}}) + \nu_{ift}, \tag{4}$$

    where n is the size of the peer group and $\{x_j\}_{j \in P_{-i,ft}}$ is the collection of coworker
    productivities.

    Theorem 3.1. Under Assumption 3.1, the general wage equation (4) is identified.

    The baseline (3) is identified because it is a special case of Equation (4).[9] Identification
    holds when the types of both workers and firms, x and y, are observed. The greater challenge
    is how to measure the match and spillover functions when these types are not observed. This
    is discussed in the next section.

    [9] One concern regarding Assumption 3.1 is that there could be peer-group-level shocks that
    simultaneously affect wages and coworker composition. In that case Assumption 3.1 is violated
    and estimating (3) may lead to biased results. The issue can be addressed with a “within
    estimator” of w and a, i.e. by estimating (3) conditional on time-varying peer-group fixed
    effects:

    $$w_{ift} = w(x_i, y_f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_j) + Z_{ft} + \epsilon_{ift}. \tag{5}$$

    Equation (5) is identified when

    $$\epsilon_{ift} \perp\!\!\!\perp h_{-i,ft}(x_j) \;\Big|\; x_i,\, y_f,\, \{x_j\}_{j \in P_{-i,ft}}.$$

    Identification then uses only the movements of wages and coworker composition within peer
    groups.

    3.2 The Method

    I develop an economic-theory-based semi-parametric approach to estimate the wage function
    (3). The method can be viewed as an extension of the non-parametric method proposed by
    Hagedorn, Manovskii and Xin (2020) that allows for the impact of coworkers on wages. The
    main takeaway from their work is that the wage function can be non-parametrically estimated
    by grouping workers with similar unobserved productivities using a hierarchical clustering
    approach. From a machine learning perspective, non-parametrically estimating $w(x_i, y_f)$ and
    $a(x_j)$ can be viewed as a matrix completion problem: the goal is to best predict the wage
    that would result from counterfactually matching any worker $i \in \mathcal{I}$ to any firm $f \in \mathcal{F}$, based
    on the wages in observed matches $\{i, f, t, w_{ift}\}$ in the matched employer-employee dataset,
    under the constraint of (3).

    Once coworker effects are taken into account, the wages of the same worker matched to the
    same firm can move in response to the evolution of peer composition. To identify similar
    workers, I leverage the insight that workers with similar matching histories would earn
    similar wages when working in the same firm at the same point in time. Thus, the similarity
    between two coworkers can be measured by their average wage distance during their
    coworkership. Notice that, as implied by both (3) and (4), any pair of similar coworkers i
    and $i' \in P_{ft}$ at the same firm in the same period should receive similar coworker influence
    on their wages, as the two workers share almost identical coworker sets $P_{-i,ft}$ and $P_{-i',ft}$.
    To my knowledge, this feature is in line with most assortative matching models in the
    literature, including Hagedorn et al. (2014), Gautier and Teulings (2012), Eeckhout and
    Kircher (2011), Lise et al. (2016).

    The identification of unobserved worker productivities, wage complementarities and
    coworker effects can therefore be conducted in two consecutive stages: “clustering” and
    “estimation”. In the “clustering” stage, the target is to identify groups of workers in the
    data that are similar in latent productivity, assign them to the same group, and predict
    the wage of a worker at firms she did not work at, based on what workers assigned to the
    same group earn at that firm in the same year. The worker clustering takes an agglomerative
    hierarchical approach: the algorithm starts with each worker initialized as a single-point
    cluster, iteratively merges the most similar pair of “child” clusters at the current stage
    into a new “parent” cluster, and updates the similarity between the new cluster and the
    rest of the existing ones. In the “estimation” stage, I estimate the match and coworker
    effect functions w(x, y) and a(x) given the assigned worker clustering.

    I adopt a cross validation method to decide the number of clusters. The optimal clustering

    is chosen to make the best out-of-sample prediction of wages.

    3.3 The “Clustering” Stage

    This section explains how the algorithm works in the “clustering” stage.

    3.3.1 Notations

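    A clustering $\mathcal{C}$ of a set of workers $\mathcal{I}$ is a partition of $\mathcal{I}$:

    $$\mathcal{C} = \{C_1, C_2, \ldots, C_K\}, \qquad C_k = \{i \in \mathcal{I} \mid c_i = k\}.$$

    The assignment function c maps individuals to their clusters, $c : \mathcal{I} \to \mathcal{K}$. Clusters are
    indexed by integers from the cluster-label set $\mathcal{K} \equiv \{1, \ldots, K\}$, so the number of clusters
    in $\mathcal{C}$ is K. Cluster $C_k$ and firm f match at t if some worker $i \in C_k$ works at firm f at
    time t. Denote the matching set:

    $$C_{kft} \equiv \{i \in C_k : i \text{ works at firm } f \text{ at time } t\}.$$

    Denote the matching indicator between worker clusters and firms in each period, defined on
    the set $\mathcal{C} \times \mathcal{F} \times \mathcal{T}$:

    $$\Omega_{k,f,t} = \begin{cases} 1 & \text{if some worker } i \in C_k \text{ works at firm } f \text{ at time } t, \\ 0 & \text{otherwise.} \end{cases}$$

    When worker cluster $C_k$ matches firm f in year t, the cluster mean $\mu_{kft}$ can be evaluated:

    $$\mu_{kft} = \frac{1}{|C_{kft}|} \sum_{k' \in C_{kft}} w_{k'ft}, \quad \text{if } \Omega_{kft} = 1.$$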

    Dissimilarity. Within firm f at time t, the wage distance between individual workers j and
    k is

    $$D_{jkft} = w_{jft} - w_{kft} = w(x_j, y_f) - w(x_k, y_f) + \frac{a(x_k) - a(x_j)}{N_{ft}}. \tag{6}$$

    Note that when workers j and k are similar, $x_j \approx x_k$, we have $D_{jkft} \approx 0$ because both w and
    a are continuous. The average wage distance between workers j and k is taken over all firms
    f and periods t at which both are observed:

    $$D_{jk} = \operatorname*{mean}_{\{f,t \,\mid\, \Omega_{jft}=1,\ \Omega_{kft}=1\}} D_{jkft}. \tag{7}$$

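    Worker similarity between each pair of workers is measured by the average wage distance
    during their coworkership:

    $$d(j, k) = \begin{cases} |D_{jk}| & \text{if } \sum_{t \in \mathcal{T}} \Omega_{jft}\,\Omega_{kft} > 0, \\ \infty & \text{otherwise.} \end{cases}$$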
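The distance d(j, k) can be computed directly from wage records. The following is a minimal sketch, assuming the data arrive as `(worker, firm, period, wage)` tuples with sortable worker identifiers; the function names and the toy records are hypothetical.

```python
import math
from collections import defaultdict

def average_wage_distances(records):
    """Return {(j, k): |D_jk|} for every pair of workers observed as coworkers.

    records: iterable of (worker, firm, period, wage) tuples.
    D_jk averages w_jft - w_kft over all (firm, period) cells shared by j and k.
    """
    groups = defaultdict(list)
    for worker, firm, period, wage in records:
        groups[(firm, period)].append((worker, wage))

    diffs = defaultdict(list)
    for members in groups.values():
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                (j, wj), (k, wk) = members[a], members[b]
                if j > k:                      # store each pair once, as (low, high)
                    j, k, wj, wk = k, j, wk, wj
                diffs[(j, k)].append(wj - wk)
    return {pair: abs(sum(v) / len(v)) for pair, v in diffs.items()}

def distance(distances, j, k):
    """d(j, k): infinite if j and k were never coworkers."""
    return distances.get((min(j, k), max(j, k)), math.inf)

# Toy records: workers 1 and 2 share firm "A" in two periods; worker 3 is elsewhere.
records = [
    (1, "A", 0, 10.0), (2, "A", 0, 12.0),
    (1, "A", 1, 11.0), (2, "A", 1, 11.0),
    (3, "B", 0, 9.0),
]
d = average_wage_distances(records)
print(distance(d, 1, 2), distance(d, 1, 3))
```

Note that the absolute value is taken after averaging, as in the definition of d(j, k), so wage differences of opposite sign in different periods can offset each other.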

    In particular, worker j is similar to k at cutoff κ if d(j, k) < κ; otherwise worker j is
    dissimilar to k at κ. The hierarchical clustering algorithm sequentially merges individual
    workers into worker clusters, so I also define the wage distance and the similarity measure
    between pairs of worker clusters. Given a clustering $\mathcal{C}$, the wage distance between worker
    clusters $C_j$ and $C_k$ within firm f at t is

    $$D_{jkft} = \mu_{jft} - \mu_{kft}. \tag{8}$$

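    The average wage distance and the dissimilarity between $C_j$ and $C_k$ are

    $$D_{jk} = \operatorname*{mean}_{\{f,t \,\mid\, \Omega_{jft}=1,\ \Omega_{kft}=1\}} D_{jkft}, \tag{9}$$

    $$d(C_j, C_k) = \begin{cases} |D_{jk}| & \text{if } \sum_{t \in \mathcal{T}} \Omega_{jft}\,\Omega_{kft} > 0, \\ \infty & \text{otherwise.} \end{cases} \tag{10}$$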

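    Worker cluster $C_j$ is similar to $C_k$ at cutoff κ if $d(C_j, C_k) < \kappa$. Likewise, worker cluster
    $C_j$ is dissimilar to $C_k$ at cutoff κ if $d(C_j, C_k) > \kappa$. These relations are summarized by the
    affinity graph $A \in S^K$ of $\mathcal{C}$. Vertex $j \in \mathcal{K}$ of A represents cluster $C_j \in \mathcal{C}$, and edge (j, k)
    represents the similarity between $C_j$ and $C_k$ at κ:

    $$A_{j,k}(\kappa) = \begin{cases} 1 & \text{if } d(C_j, C_k) < \kappa, \\ -1 & \text{if } d(C_j, C_k) > \kappa, \\ 0 & \text{if } C_j \text{ and } C_k \text{ do not match.} \end{cases} \tag{11}$$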
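Equation (11) translates into a few lines of array code. The sketch below assumes the pairwise cluster distances are already stored in a symmetric matrix with `inf` marking pairs that never match; `affinity_matrix` is an illustrative name, and setting the diagonal to zero is a convention chosen here.

```python
import numpy as np

def affinity_matrix(dist, kappa):
    """Affinity graph A(kappa) of Equation (11) from a symmetric distance matrix.

    dist[j, k] = d(C_j, C_k), with np.inf where C_j and C_k never match.
    """
    dist = np.asarray(dist, dtype=float)
    affinity = np.zeros(dist.shape, dtype=int)
    observed = np.isfinite(dist)
    affinity[observed & (dist < kappa)] = 1      # similar at kappa
    affinity[observed & (dist > kappa)] = -1     # dissimilar at kappa
    np.fill_diagonal(affinity, 0)                # no self-edges (convention assumed here)
    return affinity

dist = np.array([
    [0.0, 0.05, np.inf],
    [0.05, 0.0, 0.4],
    [np.inf, 0.4, 0.0],
])
A = affinity_matrix(dist, kappa=0.1)
print(A)
```

Raising `kappa` turns −1 entries into +1 entries one by one, which is how the cutoff controls how aggressively clusters are merged.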

    Components. Denote by $\Pi \in P(\mathcal{C})$ a partition of the worker clustering $\mathcal{C}$ into L subsets.
    The l-th subset of Π consists of $K_l$ member clusters, $S_l = \{C_{1_l}, C_{2_l}, \ldots, C_{K_l}\}$:

    $$\Pi = \{\{C_{1_1}, \ldots, C_{K_1}\}, \ldots, \{C_{1_L}, \ldots, C_{K_L}\}\} = \{S_1, \ldots, S_L\}, \qquad \sum_{l=1}^{L} K_l = K.$$

    A subset S is a path-similar component at cutoff κ if its member clusters form a connected
    component of the affinity graph A. That is, for any pair of member clusters $C_k, C_{k'} \in S$
    there exists a path on S, $\{C_k \to C_{p(1)} \to \cdots \to C_{p(N)} \to C_{k'}\}$, such that any two consecutive
    clusters on the path are similar at κ: $A_{p(n),\,p(n+1)}(\kappa) = 1$ for all $n \in \{0, \ldots, N\}$, with
    $p(0) = k$ and $p(N+1) = k'$. A partition Π is a path-similar partition of $\mathcal{C}$ at κ if all its
    subsets are path-similar components at κ. Denote it by $\Pi^{*}(\mathcal{C})$.

    A subset S is a disagreement-free component at cutoff κ if it contains no dissimilar pair
    of member clusters at κ, i.e. $A_{k,k'}(\kappa) > -1$ for all $C_k, C_{k'} \in S$. A partition Π is a
    disagreement-free partition of $\mathcal{C}$ at κ if all its subsets are disagreement-free components
    at κ. Denote it by $\Pi^{**}(\mathcal{C})$.
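Both definitions can be checked mechanically once an affinity matrix with entries in {1, −1, 0} is available. The sketch below is one possible implementation, using a plain depth-first search; the function names and the example matrix are hypothetical.

```python
def path_similar_components(affinity):
    """Connected components of the graph whose edges are entries equal to 1."""
    n = len(affinity)
    seen = [False] * n
    components = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        stack, component = [start], []
        while stack:                       # depth-first search along similar pairs
            j = stack.pop()
            component.append(j)
            for k in range(n):
                if affinity[j][k] == 1 and not seen[k]:
                    seen[k] = True
                    stack.append(k)
        components.append(sorted(component))
    return components

def is_disagreement_free(affinity, component):
    """True if no pair of clusters inside the component is dissimilar (-1)."""
    return all(affinity[j][k] > -1 for j in component for k in component)

# Clusters 0-1 and 1-2 are similar, but 0 and 2 are dissimilar; 3 never matches.
A = [
    [0, 1, -1, 0],
    [1, 0, 1, 0],
    [-1, 1, 0, 0],
    [0, 0, 0, 0],
]
components = path_similar_components(A)
print(components)
print([is_disagreement_free(A, c) for c in components])
```

The example shows why the two notions differ: clusters 0, 1 and 2 form one path-similar component through cluster 1, yet the component is not disagreement-free because clusters 0 and 2 are directly dissimilar.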

    3.3.2 The Algorithm

    The input of the algorithm is the wage information $w_{ift}$ for all worker-firm pairs (i, f) that
    match at time t, i.e. $\Omega_{i,f,t} = 1$. The output of the algorithm is a worker clustering $\mathcal{C}$.

    The algorithm goes through a number of iterations $\iota \in \{0, 1, 2, \ldots, \bar\iota\}$, each associated with
    a regularization parameter $\kappa \in \{0, \epsilon, 2\epsilon, \ldots, \bar\kappa \equiv \bar\iota\epsilon\}$, where ε is a small number. At the
    first iteration ι = 0, the worker clustering $\mathcal{C}^0$ is initialized as the set of single-worker
    clusters, i.e. N worker clusters each containing one individual worker,
    $\mathcal{C}^0 = \{C^0_1 = \{1\}, C^0_2 = \{2\}, \ldots, C^0_N = \{N\}\}$, and the corresponding initial assignment function is
    $c^0_i = i$ for all $i \in \mathcal{I}$. The outcome of each iteration ι is a worker clustering $\mathcal{C}^\iota$.

    Workers are clustered in a hierarchical order of their similarities: at each iteration ι, the
    algorithm decides whether or not to merge the current worker clusters $C^\iota_j, C^\iota_k \in \mathcal{C}^\iota$ based on
    the current cutoff κ. If $C^\iota_j$ and $C^\iota_k$ are similar at κ, they are merged into the same worker
    cluster:

    $$\text{merge } C^\iota_j \text{ and } C^\iota_k \text{ if } A_{j,k}(\kappa) = 1.$$

    The algorithm can be summarized as follows:

    Algorithm 1 Worker Clustering (Baseline)

    function WorkerClustering
        Initialize clustering: one worker = one cluster
        Distance d(Cj, Ck) = mean over f ∈ F, t ∈ T of | µjft − µkft |
        for κ ∈ [0, ..., κ̄] do
            repeat
                if d(Cj, Ck) < κ then
                    Merge (Cj, Ck) and update cluster wage ŵ, â
                end if
            until d(Cj, Ck) ≥ κ for all pairs (Cj, Ck)
        end for
    end function
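Algorithm 1 can be prototyped in a few dozen lines. The sketch below is a naive illustration under several assumptions: records are `(worker, firm, period, wage)` tuples with sortable worker identifiers, the cluster distance follows (9) and (10) (absolute value of the mean difference in cluster means), the closest eligible pair is merged first, and nothing is optimized for large data. The function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations

def cluster_means(records, assignment):
    """Mean wage mu_kft of each cluster k in each peer group (f, t)."""
    sums = defaultdict(lambda: [0.0, 0])
    for worker, firm, period, wage in records:
        cell = sums[(assignment[worker], firm, period)]
        cell[0] += wage
        cell[1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

def cluster_distance(means, j, k):
    """d(C_j, C_k) as in (9)-(10); infinite if the clusters never share (f, t)."""
    shared = [(f, t) for (c, f, t) in means if c == j and (k, f, t) in means]
    if not shared:
        return math.inf
    gaps = [means[(j, f, t)] - means[(k, f, t)] for f, t in shared]
    return abs(sum(gaps) / len(gaps))

def worker_clustering(records, kappas):
    """Return the assignment (worker -> cluster label) reached at each cutoff."""
    assignment = {worker: worker for worker, _, _, _ in records}
    path = []
    for kappa in kappas:
        while True:
            means = cluster_means(records, assignment)
            labels = sorted(set(assignment.values()))
            close = [(cluster_distance(means, j, k), j, k)
                     for j, k in combinations(labels, 2)]
            close = [pair for pair in close if pair[0] < kappa]
            if not close:
                break                       # all remaining distances are >= kappa
            _, j, k = min(close)            # merge the most similar pair first
            for worker, label in list(assignment.items()):
                if label == k:
                    assignment[worker] = j
        path.append(dict(assignment))
    return path

records = [(1, "A", 0, 10.0), (2, "A", 0, 10.05), (3, "A", 0, 12.0)]
path = worker_clustering(records, kappas=[0.0, 0.1, 3.0])
print(path)
```

The returned list holds one clustering per cutoff, which is the sequence over which a model-selection criterion can later be evaluated.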

    3.3.3 Theoretical Properties

    This subsection derives theoretical properties of the worker clustering produced by
    Algorithm 1 under additional assumptions.

    If κ increases slowly, i.e. the step increment ε is small at each iteration, at most one
    pair of workers is merged into the same cluster in the next iteration's clustering $\mathcal{C}^{\iota+1}$, and
    Algorithm 1 can accurately group similar workers.

    Theorem 3.2. When κ increases slowly, each iteration ι with associated cutoff κ delivers a
    clustering $\mathcal{C}^\iota$ that is a path-similar partition of workers $\mathcal{I}$ at κ. That is, for each pair of
    individuals $j, k \in \mathcal{I}$ in the same cluster, $j, k \in C$, $C \in \mathcal{C}^\iota$, there exists a path connecting j and
    k on C such that each pair of adjacent workers on the path is similar at κ.

    Proof. By induction. Theorem 3.2 obviously holds for ι = 1. Assume it holds for ι = n with
    cutoff κ = ιε. Without loss of generality, suppose that $C_j, C_k \in \mathcal{C}$, with $j \in C_j$ and $k \in C_k$,
    are two clusters to be agglomerated at this iteration. Each of $C_j$ and $C_k$ is path-similar at
    the previous cutoff κ′ = (n − 1)ε < κ, and is thus also path-similar at κ. Consider the closest
    pair of workers across the two clusters, $j^* \in C_j$ and $k^* \in C_k$: $j^*$ and $k^*$ are similar at κ, since
    $d(j^*, k^*) \le d(j^*, C_k) \le d(C_j, C_k) < \kappa$. The newly formed cluster $C_{j,k}$ is therefore also
    path-similar, since for any two members $j, k \in C_{j,k}$ one can find a path $j \to j^* \to k^* \to k$ such
    that all adjacent workers are similar at κ. Therefore, the theorem also holds for ι = n + 1.

    For computational efficiency, ε is set to a reasonably large number in practice, so that
    multiple worker clusters in the same path-similar component (i.e. in the same connected
    component of the affinity graph A(κ)) can be merged in each iteration ι. Ruling out
    disagreement within each path-similar component requires additional constraints. This
    discussion is deferred to Section 3.7.

    Homophily. A worker of productivity $x \in \mathcal{X}$ exhibits local homophily in the neighborhood
    $B_r(x) \equiv \{x' \in \mathcal{X} : |x' - x| < r\}$ if any pair of workers with similar types, $k, k' \in B_r(x)$, have
    worked as coworkers with independent probability

    $$p(k, k') > \frac{\log\left(\mu_r(x) N\right)}{\mu_r(x) N},$$

    where $\mu_r(x)$ is the fraction of workers whose type lies in $B_r(x)$, $\mu_r(x) = \int_{B_r(x)} \phi(x')\,dx'$, and
    φ is the probability density of worker types. Local homophily assumes that workers with
    similar latent productivities have a higher tendency to become coworkers. It guarantees
    sufficient local matching density for workers that are close in productivity space to meet
    in the same workplace. Algorithm 1 can detect and group workers with similar unobserved
    productivity where local homophily holds.

    Single-crossing. A worker of productivity $x_j \in \mathcal{X}$ satisfies the single-crossing condition
    if the average wage distance between worker j and any other worker k, $D_{jk}(x_j)$, is
    monotonically increasing in the productivity $x_j$, for all $x_k \in \mathcal{X}$ and $y_f \in \mathcal{Y}$. Recall that by
    (6) and (7):

    $$D_{jk}(x_j) = \operatorname*{mean}_{\{f,t \,\mid\, \Omega_{jft}=1,\ \Omega_{kft}=1\}} \left[ w(x_j, y_f) - w(x_k, y_f) + \frac{a(x_k) - a(x_j)}{N_{ft}} \right].$$

    Note that $D_{jk}$ is a finite function defined on the compact set $\mathcal{X}$. The single-crossing
    condition assumes that a worker of higher productivity always earns a higher wage. Under
    this condition, $D_{jk}(x_j)$ crosses zero only once, at $x_j = x_k$. As a consequence, small wage
    distances between workers can be mapped into proximity of their productivities. A sufficient
    condition for single-crossing is that the wage w(x, y) is increasing in x and the size of the
    peer group $N_{ft}$ is large, so that $|a(x_k) - a(x_j)| / N_{ft}$ is always dominated by
    $|w(x_j, y_f) - w(x_k, y_f)|$.

    Theorem 3.3. (No global split) Suppose $x \in \mathcal{X}$ exhibits local homophily in $B_r(x)$. Then for any
    pair of workers j and k with similar productivity, $x_j, x_k \in B_r(x)$, there exists κ > 0 at which
    Algorithm 1 terminates and delivers a path-similar clustering $\mathcal{C}^*$ at κ that assigns both
    workers to the same cluster with high probability:

    $$\lim_{N \to \infty} p(c^*_k = c^*_j) = 1.$$

    Lemma 3.1. (Erdős and Rényi, 1960) Let G(N, p) denote the random graph with N vertices in
    which the edge between any pair of vertices forms independently with probability

    $$p = \frac{p_0 \log N}{N}.$$

    The graph G(N, p) is connected with high probability if and only if $p_0 > 1$.

    Proof. As N → ∞, it follows immediately from Lemma 3.1 that the sub-graph $G(\mu_r(x) N, p)$ is
    connected with high probability, i.e. there exists a path connecting all workers in $B_r(x)$ such
    that any adjacent workers j, k on the path are coworkers. It remains to show that there
    exists κ > 0 at which all adjacent j and k on the path are also similar. Since $j, k \in B_r(x)$,
    $|x_j - x_k| < r$. Because the wage distance function (6) is continuous, the average distance is
    bounded: $\exists\, \delta_{jkft} : |D_{jk}(x_j) - D_{jk}(x_k)| < \delta_{jkft}$. Therefore, one can choose $\kappa = \max_{j,k,f,t} \delta_{jkft}$
    such that all adjacent workers j and k are similar at κ. That is, $B_r(x)$ constitutes a
    path-similar component and all its members are assigned to the same cluster.

    Theorem 3.4. (No local contamination) Assume that all $x \in \mathcal{X}$ satisfy the single-crossing
    condition. Then there exists κ > 0 at which Algorithm 1 terminates and delivers a
    path-similar clustering $\mathcal{C}^*$ such that

    $$\forall k : |x_j - x_k| > r, \qquad p(c^*_k = c^*_j) = 0.$$

    Proof. Theorem 3.4 follows immediately from the continuity and monotonicity of the wage
    distance $D_{jk}(x_j)$: without loss of generality, assume that $x_j > x_k + r$, so that
    $\kappa_0 \equiv D_{jk}(x_j) - D_{jk}(x_k) > 0$. For any small $\kappa < \kappa_0$, workers j and k are dissimilar and are not
    grouped into the same cluster.

    Theorem 3.3 states that if Algorithm 1 is terminated at a relatively large cutoff κ, the
    resulting clustering $\mathcal{C}$ can detect and group all similar workers in a densely connected
    neighborhood. On the other hand, Theorem 3.4 states that if Algorithm 1 is terminated at a
    relatively small cutoff κ′, the corresponding clustering $\mathcal{C}'$ can distinguish any dissimilar
    pair of workers j and k. The optimal stopping criterion for κ must balance the tradeoff
    between “splitting similar workers into multiple clusters” and “contaminating a cluster by
    introducing dissimilar workers”. The choice of the optimal κ is discussed in Section 3.5.

    3.4 The “Estimation” Stage

    Based on the worker clustering $\mathcal{C}$ and the assignment function $c_i$, the “estimation” stage
    recovers the complementarity and coworker effect functions $w(x_i, y_f)$ and $a(x_j)$ at the
    worker-cluster level. The idea is that when workers assigned to the same worker cluster are
    close in productivity space, I can recover the coworker influences for any individual from
    their cluster average:

    $$E\,w_{ift} = w(x_i, y_f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_j) \approx \hat w(c_i, f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} \hat a(c_j), \tag{12}$$

    where $\hat w : \mathcal{C} \times \mathcal{F} \to \mathbb{R}$ and $\hat a : \mathcal{C} \to \mathbb{R}$. The estimation recovers the unobserved heterogeneous
    characteristics of workers and firms that determine wages. Although $x_i$ and $y_f$ are latent,
    the cluster membership $c_i$ and the firm identity f are now observable from the “clustering”
    stage. Conditional on observed $c_i$ and f, Equation (13) is identified under Assumption 3.1:

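    $$\begin{aligned} w_{ift} &= \hat w(c_i, f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} \hat a(c_j) + \tilde\nu_{ift} \\ &= \hat w(c_i, f) + \frac{1}{N_{ft}} \sum_{k \in \mathcal{K}} h_{-i,ft}(k)\, \hat a(k) + \tilde\nu_{ift}. \end{aligned} \tag{13}$$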

    Here the coworker component $h_{-i,ft}(k) = \sum_{j \in P_{-i,ft}} 1\{c_j = k\}$ counts the number of coworkers in
    $P_{-i,ft}$ assigned to cluster k. The match component $\hat w(c_i, f)$ is captured by a joint
    worker-cluster-by-firm fixed effect, and the coworker effect $\hat a(c_j)$ is estimated as the
    coefficient of the wage response to changes in the coworker components.
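Equation (13) is a linear regression once cluster labels are known, so it can be sketched with ordinary least squares. The example below is an illustration under assumptions rather than the paper's implementation: records are `(worker, firm, period, wage)` tuples, the regressors are cluster-by-firm dummies plus coworker cluster shares (the counts h divided by N), and the toy data are noise-free with made-up parameter values. Because the shares sum to one, the level of â can be shifted into ŵ, so in this sketch only differences in â across clusters are pinned down.

```python
import numpy as np
from collections import defaultdict

def estimate_stage(records, assignment):
    """Least-squares fit of Equation (13); returns (w_hat, a_hat, fitted wages)."""
    groups = defaultdict(list)
    for worker, firm, period, _ in records:
        groups[(firm, period)].append(worker)
    cells = sorted({(assignment[w], f) for w, f, _, _ in records})
    labels = sorted(set(assignment.values()))
    cell_col = {cell: n for n, cell in enumerate(cells)}
    label_col = {lab: len(cells) + n for n, lab in enumerate(labels)}

    X = np.zeros((len(records), len(cells) + len(labels)))
    y = np.array([wage for _, _, _, wage in records], dtype=float)
    for row, (worker, firm, period, _) in enumerate(records):
        X[row, cell_col[(assignment[worker], firm)]] = 1.0       # match effect dummy
        coworkers = [j for j in groups[(firm, period)] if j != worker]
        for j in coworkers:                                      # coworker shares
            X[row, label_col[assignment[j]]] += 1.0 / len(coworkers)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)                 # minimum-norm solution
    w_hat = {cell: beta[col] for cell, col in cell_col.items()}
    a_hat = {lab: beta[col] for lab, col in label_col.items()}
    return w_hat, a_hat, X @ beta

# Toy data generated from w_hat(0,A)=1.0, w_hat(1,A)=2.0, a_hat(0)=0.0, a_hat(1)=0.6.
assignment = {1: 0, 2: 0, 3: 1, 4: 1}
records = [
    (1, "A", 0, 1.3), (2, "A", 0, 1.3), (3, "A", 0, 2.0),
    (1, "A", 1, 1.6), (3, "A", 1, 2.3), (4, "A", 1, 2.3),
    (1, "A", 2, 1.0), (2, "A", 2, 1.0),
    (3, "A", 3, 2.6), (4, "A", 3, 2.6),
]
w_hat, a_hat, fitted = estimate_stage(records, assignment)
print(a_hat[1] - a_hat[0])
```

In applied work one would typically normalize one cluster's coworker effect to zero instead of relying on the minimum-norm solution; the sketch keeps all columns to stay close to the notation of Equation (13).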

    3.5 Regularization

    I use a regularization criterion to select, among the sequence of clusterings $\{\mathcal{C}^\iota\}$ over all
    iterations, the worker clustering that minimizes the generalization error of the machine
    learning algorithm. In particular, I penalize cluster sizes that are too small, and select
    the clustering $\mathcal{C}^\iota$ (and the $\hat w$ and $\hat a$ it implies) that minimizes the RMSE of the out-of-sample
    forecast.

    To evaluate the criterion, I randomly split the data into three components: the training
    set (80%), the validation set (10%), and the test set (10%).[10] Using only observations
    in the training set, I estimate the sequence of worker clusterings and functions
    $\{\mathcal{C}^\iota, \hat w^\iota, \hat a^\iota\}$ for each iteration ι. Based on these, I can make out-of-sample predictions for
    each observation $w_{i,f',t'}$ in the validation set and the test set. If $C^\iota_i$ matches f′ at t′ in the
    training set, i.e. the algorithm can find one or more workers assigned to the same cluster
    $C^\iota_i$ who worked at firm f′ at t′, the optimal prediction is the average wage in that cluster:

    $$\tilde w_{if't'} = \hat w(c_i, f') + \frac{1}{N_{f't'}} \sum_{j \in P_{-i,f't'}} \hat a(c_j) = \mu_{c_i, f't'}.$$

    If no such worker exists, predictions cannot be based on a similar reference worker. The
    best predictor is then worker i’s average wage in the training sample, conditional on all
    available information. The predicted wage of worker i at a new firm f′ at t′ is therefore

    $$\hat w^\iota_{if't'} = \begin{cases} \tilde w_{if't'} & \text{if } C^\iota_i \text{ and } f' \text{ match at } t', \\ \text{worker } i\text{'s training-sample average} & \text{otherwise.} \end{cases}$$

    [10] Since wages are interdependent within each peer group, the random split is made at the
    peer-group level. For example, if peer group (f, t) is drawn and assigned to the training set,
    all workers in $P_{ft}$ are assigned to the training set.

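    The criterion function is the RMSE evaluated on the validation sample,

    $$Q(w, \hat w^\iota) = \left( \operatorname*{mean}_{(i,f,t) \in \{\text{validation set}\}} \left(\hat w^\iota_{ift} - w_{ift}\right)^2 \right)^{1/2}, \tag{14}$$

    and the optimal clustering is selected as

    $$\iota^* = \arg\min_{\iota} \{Q(w, \hat w^\iota)\}. \tag{15}$$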
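The selection step can be sketched with three small helpers: a split that keeps each peer group intact, the RMSE criterion of (14), and the arg-min of (15). This is a minimal illustration assuming `(worker, firm, period, wage)` records; the function names, the seed and the example criterion values are hypothetical.

```python
import math
import random

def split_peer_groups(records, shares=(0.8, 0.1, 0.1), seed=0):
    """Randomly assign whole peer groups (f, t) to train / validation / test."""
    groups = sorted({(firm, period) for _, firm, period, _ in records})
    random.Random(seed).shuffle(groups)
    n_train = round(shares[0] * len(groups))
    n_val = round(shares[1] * len(groups))
    fold = {}
    for n, group in enumerate(groups):
        if n < n_train:
            fold[group] = "train"
        elif n < n_train + n_val:
            fold[group] = "validation"
        else:
            fold[group] = "test"
    split = {"train": [], "validation": [], "test": []}
    for record in records:
        split[fold[(record[1], record[2])]].append(record)
    return split

def rmse(predicted, observed):
    """Criterion (14): root mean squared prediction error."""
    squared = [(p - o) ** 2 for p, o in zip(predicted, observed)]
    return math.sqrt(sum(squared) / len(squared))

def select_iteration(criterion_by_iteration):
    """Criterion (15): the iteration with the smallest validation RMSE."""
    return min(criterion_by_iteration, key=criterion_by_iteration.get)

# Ten hypothetical peer groups with two workers each.
records = [(2 * g + m, "A", g, 10.0 + g) for g in range(10) for m in range(2)]
split = split_peer_groups(records)
print({name: len(part) for name, part in split.items()})
print(select_iteration({0: 0.9, 1: 0.4, 2: 0.7}))
```

Splitting by peer group rather than by observation keeps all coworkers of a given (f, t) cell in the same fold, so the coworker composition of a held-out wage is never partially visible in training.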

    At the initial iteration ι = 0, each worker forms its own cluster. At this iteration, the
    average of each single-worker cluster accurately reflects the wages of the individual.
    However, out-of-sample predictions cannot be based on “similar reference workers” and must
    rely on the worker’s personal average. Therefore, criterion (14) is large at the initial
    iteration. As the algorithm proceeds, the wage cutoff κ = ιε gradually increases, more
    workers are grouped as similar, and the average cluster size grows. Criterion (14) first
    decreases, as more wages can be predicted with the cluster average instead of the personal
    average. It increases again when the average cluster size becomes too large to predict
    individual wages accurately. In the limit where the wage cutoff goes to infinity, all
    workers are assigned to one single big cluster, and the algorithm predicts the same average
    wage for all workers, which cannot be accurate. The lowest point of the U-shaped
    regularization curve represents the optimal trade-off.

    Figure 3: Criterion Q(w, ŵ) on the validation set.

    3.6 Measuring coworker complementarities

    In this section, I show how to further generalize framework (3) to allow for complementarity
    between coworker productivities in the coworker effects. Allowing for coworker
    complementarities, in addition to the complementarity between the worker and the firm,
    has important implications for efficiency and for the reallocation of workers across firms.

    3.6.1 Framework

    In a more relaxed framework, wages can be determined as[11]

    $$w_{ift} = \underbrace{w(x_i, y_f)}_{\text{match effects}} + \underbrace{\frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_i, x_j)}_{\text{general coworker effects}} + \nu_{ift}. \tag{17}$$

    Here, I adopt the two-dimensional continuous spillover function aX 2 → R to capturethe wage complementarity between worker i and her coworkers j ∈ P−i,ft. Particularly,a(xi, xj) reflect the coworker influence exerted by coworker j on the focal worker i. Note that

    conditional on the set of productivity xi, yf , {xj}j∈P−i,ft being observed, the identificationholds for (16) as it takes a specific form of the general wage function (4).
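    As a quick check on how the general framework relates to the simpler one, suppose (purely for illustration) that the spillover does not depend on the focal worker's type, so that $a(x_i, x_j) = a(x_j)$. Substituting this restriction into the general wage equation gives:

```latex
w_{ift}
  = w(x_i, y_f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_i, x_j) + \nu_{ift}
  = w(x_i, y_f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_j) + \nu_{ift},
  \qquad \text{when } a(x_i, x_j) = a(x_j).
```

    The right-hand side has the same form as the wage equation used for the simulations in Section 4.1, so the one-dimensional coworker effect is a special case of the two-dimensional one.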

    ¹¹ Similarly to (3), one can include time-varying fixed effects to account for potential endogeneity due to common shocks (e.g. technology shocks at the firm or other cohort level):

    $$w_{ift} = w(x_i, y_f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_i, x_j) + Z_{ft}(x_i) + \epsilon_{ift}. \tag{16}$$

    This equation is identified under the exogeneity assumption

    $$\epsilon_{ift} \perp\!\!\!\perp h_{-i,ft}(x_j) \;\Big|\; x_i, y_f, \{x_j\}_{j \in P_{-i,ft}},$$

    and empirically requires no multicollinearity among the realizations of

    $$X = \big\{ \mathbb{1}\{c_i = l, f\}, \{h_{-i,ft}(c_j)\}, \mathbb{1}\{c_i = l, f, t\} \big\}.$$

    3.6.2 The “clustering” stage

    Importantly, the identification of unobserved worker productivity is identical to that in the baseline equation (6), i.e. worker similarity can be measured with the wage distance between individual workers $j$ and $k$ within firm $f$ at $t$:

    $$D_{jkft} = w_{jft} - w_{kft} = w(x_j, y_f) - w(x_k, y_f) + \frac{a(x_i, x_k) - a(x_i, x_j)}{N_{ft}}. \tag{18}$$

    As similarity can be measured pairwise with an identical distance function, the worker clustering obtained in the “clustering” stage identifies the types $\{x_i\}_{i \in I}$ for the extended model.

    3.6.3 The “estimation” stage

    Given the worker clustering $C$ obtained in the clustering stage, I estimate the coworker effect function $\hat{a}(c_i = l, c_j = k)$, defined for each pair of interacting worker clusters $C_l$ and $C_k$. Estimation and identification proceed as in (13), but separately for each group of workers in the same cluster $c_i = l$:

    $$\begin{aligned} w_{ift} &= \hat{w}(c_i, f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} \hat{a}(c_i, c_j) + \tilde{\nu}_{ift} \\ &= \hat{w}(l, f) + \frac{1}{N_{ft}} \sum_{k \in K} h_{-i,ft}(k)\, \hat{a}(l, k) + \tilde{\nu}_{ift}. \end{aligned} \tag{19}$$

    The coworker component $h_{-i,ft}(k) = \sum_{j \in P_{-i,ft}} \mathbb{1}\{c_j = k\}$ counts the number of coworkers in $P_{-i,ft}$ assigned to cluster $k$.
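    To make the second line of (19) concrete, the sketch below computes the coworker counts $h_{-i,ft}(k)$ for one worker and evaluates the fitted wage for given estimates. It is an illustration only: the function names, the dictionary layout of `w_hat` and `a_hat`, and the reading of $N_{ft}$ as the number of coworkers of $i$ are assumptions not taken from the text.

```python
from collections import Counter

def coworker_counts(peer_group, cluster_of, i):
    """h_{-i,ft}(k): number of coworkers of i assigned to each cluster k."""
    return Counter(cluster_of[j] for j in peer_group if j != i)

def fitted_wage(i, firm, peer_group, cluster_of, w_hat, a_hat):
    """Evaluate w_hat(l, f) + (1/N_ft) * sum_k h(k) * a_hat(l, k)."""
    h = coworker_counts(peer_group, cluster_of, i)
    n_ft = sum(h.values())  # assumption: N_ft is the number of coworkers
    l = cluster_of[i]
    if n_ft == 0:
        return w_hat[(l, firm)]  # no coworkers, no spillover term
    spillover = sum(count * a_hat[(l, k)] for k, count in h.items()) / n_ft
    return w_hat[(l, firm)] + spillover

# Hypothetical peer group with one coworker in cluster 0 and two in cluster 1.
peer_group = ["i", "j1", "j2", "j3"]
cluster_of = {"i": 0, "j1": 0, "j2": 1, "j3": 1}
w_hat = {(0, "f"): 10.0}
a_hat = {(0, 0): 0.3, (0, 1): 0.6}
h = coworker_counts(peer_group, cluster_of, "i")
wage = fitted_wage("i", "f", peer_group, cluster_of, w_hat, a_hat)
```

    Because the counts enter linearly, stacking them across observations of workers in the same cluster yields regressors for a linear estimation of $\hat{a}(l, k)$.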

    3.7 Disagreement-free Partition

    When the cutoff step $\epsilon$ is small, at most one pair of workers is merged into the same cluster in the next clustering $C^{\iota+1}$. While this is accurate, the algorithm is slow. In practice, I use a reasonably large cutoff step $\epsilon$ and merge multiple worker clusters in the same path-similar component in each iteration $\iota$. This is more efficient. The problem is that there can be disagreement within a path-similar component. For example, a component $\{C_i, C_j, C_k\}$ can be path-similar at a given $\kappa$ if worker cluster $C_i$ is similar to cluster $C_j$, and cluster $C_j$ is similar to cluster $C_k$. It can nevertheless contain a “disagreement” if worker clusters $C_i$ and $C_k$ are observably dissimilar at a workplace where $C_j$ is absent. Ruling out disagreement within each path-similar component requires additional constraints.

    The idea is to partition each path-similar component into a finer collection of disagreement-free components:

    $$\{S^{**}_{l(1)}, \ldots, S^{**}_{l(M)}\} = \text{split disagreement}(S^{*}_{l}).$$

    The subroutine split disagreement is a fine-tuning device developed to rule out dissimilar workers in a path-similar component. It takes each path-similar component $S^{*}_{l} \in \Pi^{*}$ as input and, once a disagreement is found, recursively finds the minimum cut that splits the component into disconnected “child components”. Finding the minimum cut of a component means removing the set of edges with the smallest total similarity weight such that the component is split into disconnected child components. If both child components are disagreement-free, the algorithm stops. Otherwise, the procedure is repeated until no child component contains any dissimilar pair of worker clusters at the cutoff $\kappa$. The procedure can be viewed as a tree traversal with depth-first search.

    Algorithm 2 Split Disagreement

    function Split Disagreement    ▷ Input: $S^{*}$. Output: $S^{**}$
        Initialize partition list $S^{**} = \{\}$
        if component $S^{*}$ is disagreement-free then
            Add $S^{*}$ to partition list $S^{**}$
        else
            Find the most distant pair of clusters: $(C_j, C_k) = \arg\max_{C_j, C_k \in S^{*}} d(C_j, C_k)$
            Find the minimum cut: $\{S^{*}_{l}, S^{*}_{r}\} = \text{maxflow}(S^{*}, C_j, C_k)$
            Repeat for the component containing $C_j$: $\{S^{**}_{l(1)}, \ldots, S^{**}_{l(M)}\} = \text{split disagreement}(S^{*}_{l})$
            Repeat for the component containing $C_k$: $\{S^{**}_{r(1)}, \ldots, S^{**}_{r(M')}\} = \text{split disagreement}(S^{*}_{r})$
            Add $\{S^{**}_{l(1)}, \ldots, S^{**}_{l(M)}, S^{**}_{r(1)}, \ldots, S^{**}_{r(M')}\}$ to partition list $S^{**}$
        end if
    end function
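    A runnable sketch of the recursion in Algorithm 2 is given below. It is a simplified illustration under several assumptions that are not in the text: the affinity graph is passed as a dictionary of undirected edge weights `similarity`, observed distances are stored in `distance` keyed by unordered pairs, a disagreement is any observed pair with distance above `kappa`, and the minimum cut is found with a small Edmonds-Karp max-flow routine written for this example.

```python
from collections import deque
from itertools import combinations

def min_cut_side(nodes, similarity, s, t):
    """Nodes on the s side of a minimum s-t cut (Edmonds-Karp on an undirected graph)."""
    cap = {u: {} for u in nodes}
    for (u, v), w in similarity.items():
        if u in cap and v in cap:
            cap[u][v] = cap[u].get(v, 0.0) + w
            cap[v][u] = cap[v].get(u, 0.0) + w
    while True:
        parent = {s: None}  # breadth-first search in the residual graph
        queue = deque([s])
        while queue and t not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return set(parent)  # t unreachable: reached nodes form the s side
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow

def split_disagreement(component, similarity, distance, kappa):
    """Recursively split a component until no observed pair is farther apart than kappa."""
    observed = [(distance[frozenset(p)], p)
                for p in combinations(sorted(component), 2)
                if frozenset(p) in distance]
    worst_distance, worst_pair = max(observed, default=(0.0, None))
    if worst_distance <= kappa:
        return [set(component)]  # disagreement-free
    s, t = worst_pair  # most distant pair of clusters
    left = min_cut_side(component, similarity, s, t)
    right = set(component) - left
    return (split_disagreement(left, similarity, distance, kappa)
            + split_disagreement(right, similarity, distance, kappa))

# Hypothetical component: A~B and B~C are similar, but A and C disagree.
similarity = {("A", "B"): 2.0, ("B", "C"): 1.0}
distance = {frozenset(("A", "B")): 0.1, frozenset(("B", "C")): 0.2,
            frozenset(("A", "C")): 0.9}
parts = split_disagreement({"A", "B", "C"}, similarity, distance, kappa=0.3)
```

    In the toy input the weakest link between the disagreeing clusters A and C is the edge B-C, so the cut removes it and leaves the components {A, B} and {C}.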


    Incorporating this modification, I propose the following efficient version of the worker clustering algorithm:

    Algorithm 3 Worker Clustering (Efficient)

    function Worker Clustering
        Initialize clustering: one worker = one cluster
        Distance: $d(C_j, C_k) = \operatorname{mean}_{f \in F, t \in T} |\mu_{jft} - \mu_{kft}|$
        for $\iota \in [0, \ldots, \bar{\iota}]$ do
            Current clustering $C^{\iota}$, cutoff $\kappa = \iota\epsilon$
            Evaluate affinity graph $A_{j,k}(\kappa)$
            Find its connected components $\Pi^{*}(C^{\iota}) = \{S^{*}_{1}, \ldots, S^{*}_{L}\}$
            for $l \in [1, \ldots, L]$ do
                $S^{**} \equiv \{S^{**}_{l(1)}, \ldots, S^{**}_{l(M)}\} = \text{split disagreement}(S^{*}_{l})$
            end for
            Disagreement-free partition: $\Pi^{**} = \{S^{**}_{1(1)}, \ldots, S^{**}_{1(M')}, \ldots, S^{**}_{L(1)}, \ldots, S^{**}_{L(M'')}\}$
            Merge the clusters within each component of $\Pi^{**}$ into a single cluster
            The output is the clustering for the next iteration, $C^{\iota+1}$
        end for
    end function
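    The outer loop of Algorithm 3 can be sketched as follows. This is a deliberately simplified illustration, not the paper's implementation: it omits the split disagreement step and merges every connected component of the affinity graph directly, it treats $\mu_{jft}$ as the mean wage of cluster $j$ at workplace $(f, t)$, and it draws an affinity edge whenever the distance is at most $\kappa$. These choices, and all function names, are assumptions made for the example.

```python
def cluster_means(wages, label):
    """Mean wage of each cluster at each workplace (f, t)."""
    totals = {}
    for (i, f, t), w in wages.items():
        key = (label[i], f, t)
        s, n = totals.get(key, (0.0, 0))
        totals[key] = (s + w, n + 1)
    return {key: s / n for key, (s, n) in totals.items()}

def distance(mu, a, b):
    """Mean absolute gap in cluster means over workplaces shared by a and b."""
    shared = [(f, t) for (c, f, t) in mu if c == a and (b, f, t) in mu]
    if not shared:
        return None  # never observed at the same workplace
    return sum(abs(mu[(a, f, t)] - mu[(b, f, t)]) for f, t in shared) / len(shared)

def worker_clustering(wages, step, n_iter):
    """Return the list of clusterings C^0, C^1, ... as worker -> label maps."""
    label = {i: i for (i, _, _) in wages}  # one worker = one cluster
    history = [dict(label)]
    for it in range(n_iter + 1):
        kappa = it * step
        mu = cluster_means(wages, label)
        clusters = sorted(set(label.values()))
        parent = {c: c for c in clusters}  # union-find over clusters

        def find(c):
            while parent[c] != c:
                parent[c] = parent[parent[c]]
                c = parent[c]
            return c

        for idx, a in enumerate(clusters):
            for b in clusters[idx + 1:]:
                d = distance(mu, a, b)
                if d is not None and d <= kappa:  # affinity edge
                    parent[find(a)] = find(b)
        label = {i: find(c) for i, c in label.items()}
        history.append(dict(label))
    return history

# Hypothetical wages of three coworkers at one workplace.
wages = {("a", "f", 1): 10.0, ("b", "f", 1): 10.2, ("c", "f", 1): 15.0}
history = worker_clustering(wages, step=0.5, n_iter=10)
```

    With a cutoff step of 0.5, workers "a" and "b" merge as soon as the cutoff reaches 0.5, while "c" joins only once the cutoff reaches 5.0.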


    Illustration of Worker Clustering

    [Figure panels omitted. Left panels: wage $W_{i,f,t}$ (y-axis) against worker $i$ (x-axis) for a given firm and time. Right panels: “Worker hierarchy”, with dissimilarity $|D_{i,i'}|$ (y-axis) against worker $i$ (x-axis).]

    1. The input of the algorithm is the wages $w_{ift}$ observed in the matched employer-employee data (y-axis) for all workers $i$ (x-axis) at the same workplace, i.e. in firm $f$ in the same period $t$. Individuals exert influence on their coworkers’ wages.

    2. Starting from the first iteration, associated with the minimum $\kappa$ (y-axis of the right panel), the algorithm assigns the most similar workers (“1” and “2”) to the same cluster.

    3. Continuing to the next iteration with a larger $\kappa$, the next most similar pair of workers (“3” and “4”) is grouped.

    4. As the cutoff $\kappa$ increases further, a larger fraction of single-worker clusters is agglomerated into bigger clusters. Note that merges take place not only between individual workers but also between worker clusters (e.g. between cluster “12” and individual “3”).

    5. As the size of clusters increases, the algorithm can compare and group a wider range of workers. For example, workers “4” and “6” have never been coworkers. Nonetheless, they are identified as similar and clustered via the intermediate worker “5”. Note that “4” and “5” earn similar wages in firm $f'$ at $t'$, while “5” and “6” earn similar wages in firm $f$ at $t$.

    6. When $\kappa$ increases to infinity, the algorithm ultimately agglomerates all workers. The output is a “worker hierarchy”: for every individual worker, a group of similar workers given cutoff $\kappa$.


    Illustration of Regularization

    [Figure panels omitted. Left panel: wage $W_{i,f,t}$ against worker $i$ in firm $f$ at time $t$ (train). Middle panel: “Worker hierarchy”, with dissimilarity $|D_{i,i'}|$ against worker $i$ and the cutoff $\kappa$ marked. Right panel: wage against worker $i$ in firm $f'$ at time $t'$ (predict, cross-validate, error).]

    1. Each level of the cutoff $\kappa$ corresponds to a unique worker clustering, based on which the algorithm predicts out-of-sample wages in the validation set. With $\kappa$ fixed at the level shown in the middle panel, the corresponding clustering is $C = \{\{1, 2, 3\}, \{4, 5, 6\}\}$. The algorithm uses the average wage of workers 2 and 3 in firm $f'$ at $t'$ to predict the wage of worker 1 (the dashed square) and compares it to the actual wage (the solid square) in the validation set to evaluate the RMSE.

    2. Search for the $\kappa$ that minimizes the prediction error. Note that in this case the RMSE decreases when moving to a lower $\kappa$, corresponding to the clustering $C = \{\{1, 2\}, \{3\}, \{4, 5, 6\}\}$, and the counterfactual wage for worker 1 is predicted with worker 2’s wage only.


    4 Simulation Results

    To evaluate the accuracy of the estimator in finite samples, I run Monte Carlo simulations. I simulate wages from a data generating process with a standard calibration.

    4.1 Simulation

    This section shows that Algorithm 1 effectively estimates the worker clustering, the wage complementarity and the coworker effects function. Each worker $i \in I$ has productivity $x_i$, $x : I \to \mathbb{R}$, and each firm $f \in F$ has productivity $y_f$, $y : F \to \mathbb{R}$. In each year $t \in \{1, \ldots, T\}$, unemployed workers randomly search and apply for job vacancies in firms. The average monthly job finding rate $\bar{\lambda}$ is calibrated to 40%. To generate realistic positive assortative matching, i.e. high-productivity workers work in high-productivity firms, assume that an offer is accepted if and only if

    $$|x_i - y_f| < 0.1.$$

    For each match $(x_i, y_f)$, worker $i$ leaves firm $f$ at an exogenous job separation rate $\delta = 3\%$. Importantly, the match effect component for $(x_i, y_f)$ is

    $$w(x_i, y_f) = \left( x_i^{\rho} + y_f^{\rho} \right)^{1/\rho},$$

    and the coworker effect exerted by a coworker $j$ of type $x_j$ is $a(x_j)$. Wages are determined by Equation (3) given worker productivity $x_i$, firm productivity $y_f$ and all the coworker productivities $\{x_j\}_{j \in P_{-i,ft}}$:

    $$w_{ift} = w(x_i, y_f) + \frac{1}{N_{ft}} \sum_{j \in P_{-i,ft}} a(x_j) + \nu_{ift}.$$

    The parameters are summarized in Table I:


    Workers               $N = 10{,}000$
    Firms                 $M = 200$
    Years                 $T = 20$
    Worker types          $x_i \sim U[0, 1]$
    Firm types            $y_f \sim U[0, 1]$
    Job finding rate      $\lambda = 40\%$
    Job separation rate   $\delta = 3\%$
    Meeting               random
    Matching set          $\{(x, y) : \|x - y\| < 0.1\}$

    Table I: Data Generating Process
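    The data generating process summarized in Table I can be sketched as a small simulation. The code below is a hedged illustration rather than the simulation code behind the reported results: the function names, the Gaussian wage noise with standard deviation `sigma`, the application of the finding and separation rates once per period, and the reading of $N_{ft}$ as the number of coworkers are all assumptions. The default sizes are kept small so that the example runs quickly.

```python
import random

def ces(x, y, rho=0.5):
    """CES match effect w(x, y) = (x^rho + y^rho)^(1/rho)."""
    return (x ** rho + y ** rho) ** (1.0 / rho)

def simulate(n_workers=200, n_firms=10, n_years=5, lam=0.4, delta=0.03,
             a=lambda x: 0.05 * x, sigma=0.01, seed=0):
    """Return worker types, firm types and wage records (i, f, t, wage)."""
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n_workers)]  # worker types ~ U[0, 1]
    y = [rng.random() for _ in range(n_firms)]    # firm types ~ U[0, 1]
    employer = [None] * n_workers
    records = []
    for t in range(n_years):
        for i in range(n_workers):
            if employer[i] is not None and rng.random() < delta:
                employer[i] = None  # exogenous separation
            if employer[i] is None and rng.random() < lam:
                f = rng.randrange(n_firms)  # random meeting
                if abs(x[i] - y[f]) < 0.1:  # matching set
                    employer[i] = f
        groups = {}
        for i, f in enumerate(employer):
            if f is not None:
                groups.setdefault(f, []).append(i)
        for f, members in groups.items():
            for i in members:
                peers = [j for j in members if j != i]
                spill = sum(a(x[j]) for j in peers) / len(peers) if peers else 0.0
                wage = ces(x[i], y[f]) + spill + rng.gauss(0.0, sigma)
                records.append((i, f, t, wage))
    return x, y, records

x, y, records = simulate()
```

    Setting `a=lambda x: 0.0` switches off coworker effects, while the default `a` uses a linear spillover in the coworker's type.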

    4.1.1 Data Generating Process #1

    In the first DGP, the match effect takes the CES functional form and there are no peer effects:

    $$w(x, y) = \left( x^{1/2} + y^{1/2} \right)^{2}, \qquad a(x_i) = 0.$$

    Clustering. The resulting cluster assignment $c_i$ for all workers is displayed in Figure 12. For each point, the x-axis shows the true worker productivity $x_i$ and the y-axis shows the label of the cluster $c_i$ assigned to the same worker $i$. The clustering $C$ displayed in Figure 12 is highly accurate: workers that are close in $\mathcal{X}$ are assigned to the same or adjacent clusters, and the assignment function $c_i$ lies on the 45-degree line.

    Figure 12: Worker cluster assignment $c_i$. Note that cluster labels are identified up to a permutation. In the figure, cluster labels are ranked by the average true worker productivity.


    Coworker effects. The coworker effect function is accurately estimated. As Figure 13 shows, when there is no coworker effect in the DGP, the algorithm correctly detects zero: $\hat{a}(x) \approx 0$.

    Figure 13: Estimated coworker effects â(cj) and the ground truth a(xj)

    To evaluate the accuracy of the estimator $\hat{a}$, I use the relative risk, defined as the ratio of the root mean squared error of the estimator $\hat{a}$ on $\mathcal{X}$ to the root mean square of the true coworker effects in the population:

    $$RR = \left( \frac{\int_{0}^{1} (\hat{a}(x) - a(x))^{2} \varphi(x)\,dx}{\int_{0}^{1} a(x)^{2} \varphi(x)\,dx} \right)^{1/2} \times 100\%,$$

    where $\varphi(x)$ is the probability density function of workers of type $x$. To create a comparable benchmark, I show the performance of the implied CDS estimator of coworker influence for each individual $i$:

    $$\hat{a}^{CDS}_{i} = \hat{\gamma} \cdot \hat{\psi}_{i}.$$

    The results are summarised in Table II.

                     Baseline estimator $\hat{a}$    CDS estimator $\hat{a}^{CDS}$
    Relative Risk    6.87%                           inf

    Table II: Relative Risk for both estimators
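    The relative risk defined above can be approximated on a sample of worker types by replacing the integrals with sample sums. The sketch below is illustrative only: the function name `relative_risk`, the use of a finite sample of types in place of the density, and the convention of returning infinity when the denominator is zero are assumptions made for this example.

```python
import math

def relative_risk(a_hat, a_true, types):
    """Sample analogue of RR, in percent, over a list of worker types."""
    num = sum((a_hat(x) - a_true(x)) ** 2 for x in types)
    den = sum(a_true(x) ** 2 for x in types)
    if den == 0.0:
        # With no true coworker effect the ratio is unbounded (or undefined).
        return math.inf if num > 0.0 else math.nan
    return 100.0 * math.sqrt(num / den)

# Hypothetical check: the estimate overstates a linear effect by 20 percent.
types = [0.0, 0.5, 1.0]
rr_linear = relative_risk(lambda x: 0.06 * x, lambda x: 0.05 * x, types)
rr_zero_truth = relative_risk(lambda x: 0.01, lambda x: 0.0, types)
```

    Note that when the true effect is identically zero the denominator vanishes, so any estimate that is not exactly zero has an unbounded relative risk under this definition.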


    4.1.2 Data Generating Process #2

    In the second DGP, I simulate wages with the same match effect function, but now with a positive coworker effect function:

    $$w(x_i, y_f) = \left( x_i^{1/2} + y_f^{1/2} \right)^{2}, \qquad a(x_j) = 0.05\,x_j.$$

    Coworker effects. Again, the coworker effect function is accurately estimated. In Figure 14, the estimated and true coworker effect functions lie on top of each other: $\hat{a}(x) \approx a(x)$.

    Figure 14: Estimated coworker effects $\hat{a}(c_i)$ and the ground truth $a(x_i)$


    General coworker effects. The results for the general estimator between types,

    $$\hat{a}(x = l, x' = k),$$

    are displayed in Figure 15.

    Figure 15: Estimated coworker effects $\hat{a}(c_i, c_j)$ and the ground truth $a(x_i, x_j)$

    The performance of the baseline estimator $\hat{a}(l)$, the two-dimensional estimator $\hat{a}(l, k)$ and the benchmark estimator $\hat{a}^{CDS}$ is summarised in Table III.

                     Baseline est. $\hat{a}(k)$    General est. $\hat{a}(l, k)$    CDS est. $\hat{a}^{CDS}$
    Relative Risk    3.42%                         5.68%                           44.81%

    Table III: Relative Risk for the three estimators


    5 “Divide and Conquer”: Achieving Scalability

    While the baseline hierarchical clustering algorithm is accurate in identifying unobserved worker productivities and estimating coworker effects, one impediment to applying it to large administrative data is its computational complexity in both time and space. The time complexity of an algorithm measures how the number of operations scales with the size of the data, and the space complexity measures its memory usage. For a typical hierarchical clustering, the time complexity is $O(N^{2} \log N)$ (cubic, $O(N^{3})$, in a naive implementation) and the space requirement is quadratic, $O(N^{2})$, where $N$ is the number of workers in the data. This is a formidable volume of computation and memory at the scale of the matched employer-employee data of Denmark, where $N = 3$ million workers.
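    A back-of-the-envelope calculation makes the quadratic space requirement tangible. The snippet assumes, purely for illustration, that each pairwise distance is stored as an 8-byte floating-point number; the variable names are hypothetical.

```python
n_workers = 3_000_000
bytes_per_distance = 8  # assumption: one 64-bit float per pairwise distance

# Full N x N distance matrix.
full_matrix_bytes = n_workers ** 2 * bytes_per_distance

# Storing each unordered pair once roughly halves the requirement.
condensed_bytes = n_workers * (n_workers - 1) // 2 * bytes_per_distance

full_terabytes = full_matrix_bytes / 1e12
condensed_terabytes = condensed_bytes / 1e12
```

    Under this assumption the full matrix needs about 72 terabytes and the condensed form about 36 terabytes, far beyond the memory of a typical workstation.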

    5.1 Graph Embedding Techniques

    To accommodate the demand for scalability, I combine the baseline Algorithm 1 with recent advances in graph neural network (GNN) based graph embedding techniques.

    Graph embedding is a widely applied graph analysis method that has achieved groundbreaking success in recent applications in multiple domains, ranging from recommender systems to pharmaceutics (see Cai et al. (2018), Wu et al. (2019) and Zhou et al. (2018) for detailed reviews of these recent developments and applications). The target is to map a graph into a low-dimensional space where the graph information is best preserved. In my application, the focus is to find embeddings for each worker $i$ and each peer group (time-varying firm $f$ by $t$) given the wage matrix $w = (w_{i,ft}) \in \mathbb{R}^{N \times FT}$. Here, $w$ is viewed as a bipartite graph between worker nodes $i \in I$ and time-varying firm nodes $p \in F \times T$, with the weight on edge $(i, ft)$ being the observed wage $w_{i,ft}$. The embedding is a low-dimensional vector representation of an individual worker or peer group, $h_i \in \mathbb{R}^{v}$ and $h_{ft} \in \mathbb{R}^{v}$, that best preserves the wage information in the data.

    Graph embeddings can be computed efficiently with GNN techniques, pioneered by Kipf and Welling (2016). A GNN follows a neighborhood aggregation scheme: the embedding of a worker node is computed by recursively aggregating and transforming the embeddings of its neighboring coworkers (Xu et al. (2018)). The scheme therefore enables flexible interactions between coworkers, reflected in the embeddings of workers and time-varying firms.
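    The neighborhood aggregation scheme can be illustrated with a stripped-down example on the bipartite worker-by-peer-group graph. The sketch below is not a GNN: it replaces the learned aggregation function with an unweighted mean, has no trainable parameters and ignores edge weights. The function names and the input layout are assumptions made for this illustration.

```python
def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    return [sum(col) / len(vectors) for col in zip(*vectors)]

def aggregate(edges, h_worker, rounds=2):
    """Alternate mean aggregation between worker and peer-group nodes.

    edges: list of (worker, peer_group) pairs of the bipartite graph.
    h_worker: initial worker embeddings, worker -> list of floats.
    """
    groups_of, workers_in = {}, {}
    for worker, group in edges:
        groups_of.setdefault(worker, []).append(group)
        workers_in.setdefault(group, []).append(worker)
    h_group = {}
    for _ in range(rounds):
        # Peer-group embedding: aggregate the embeddings of its workers.
        h_group = {g: mean_vec([h_worker[i] for i in members])
                   for g, members in workers_in.items()}
        # Worker embedding: aggregate the embeddings of her peer groups.
        h_worker = {i: mean_vec([h_group[g] for g in groups])
                    for i, groups in groups_of.items()}
    return h_worker, h_group

# Hypothetical graph: workers i and j share peer group "ft"; j is also in "f't'".
edges = [("i", "ft"), ("j", "ft"), ("j", "f't'")]
h_worker, h_group = aggregate(edges, {"i": [1.0], "j": [3.0]}, rounds=1)
```

    After one round, worker i's embedding already carries information about her coworker j; further rounds propagate information from coworkers of coworkers.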

    Illustration of GNN-based Graph Embedding

    [Figure omitted. Panels: “Wage Matrix” and “Graph Convolutional Network”. Recoverable annotations: worker nodes $i$, $j$ with embeddings $h_i$, $h_j$; time-varying firm nodes $ft$, $f't'$ with embeddings $h_{ft}$, $h_{f't'}$; edge weights $W_{ift}$; the update $h^{k}_{ft} = g_{\theta}(\{h^{k-1}_{i}, h^{k-1}_{j}\})$; and a loss $L(H; W)$ that sums, over $(i, ft)$, the discrepancy between $W_{ift}$ and $w_{\theta}(h_{ft}, h_i)$.]

    Figure 16: The implementation of GraphSAGE. Consider second-order neighborhood aggregations. Each circle in both panels represents a worker node (“$i$”), each square a time-varying firm node (“$ft$”), and each edge the wage for the match (“$w_{i,ft}$”). All grey squares in the right panel represent the same graph neural network function $g_{\theta}$ (parameterized by $\theta$), which aggregates the embeddings of neighbor nodes (“$i$” and “$j$”) to update the embedding of the current node (“$ft$”). Once the embeddings are computed, wages can be recovered by the neural network function $w_{\theta}$. The objective is to find embeddings $H = \{\{h_i\}, \{h_{ft}\}\}$, $w_{\theta}$, and $g_{\theta}$ that minimize the loss function $L(H, W)$.

    The worker embeddings can be computed efficiently with a GNN: the time and space complexity is only linear, $O(N)$. In addition, the computation and optimization of neural networks is highly modularized and parallelizable, and can easily be distributed on GPUs.

    5.2 Worker embeddings

    Figure 17 displays the worker embeddings for wages simulated by the data generating process in Section 4.1. The coordinates of a node are the two-dimensional t-SNE representation of the embedding for a worker. Each node is colored by the true productivity type $x$. The GNN algorithm distinguishes workers well globally, as workers in similar colors tend to appear at similar locations, but the result is subject to plenty of local mistakes and is less accurate than the baseline hierarchical clustering Algorithm 1.

    $$\{\text{Worker-by-time-varying-firm wage matrix}\} \xrightarrow{\ \text{Graph Embedding}\ } \mathbb{R}^{v} \xrightarrow{\ k\text{-means}\ } \text{subsets}$$

    Figure 17: t-SNE representation of Worker Embedding

    5.3 “Divide and Conquer”

    The baseline hierarchical clustering is accurate but very costly in both computation and memory usage in big-data applications. Despite being less accurate, the GNN-based graph embedding can be implemented with high performance and efficiency. This section proposes to integrate the baseline algorithm with the GNN approach through a divide-and-conquer strategy. The integration is closely related to the proposal by Hagedorn et al. (2020).

    In the “division” step, I compute worker embeddings using the GNN, group closely embedded workers, and divide them into separate subsets. In the “conquering” step, I apply hierarchical clustering within each local subset only. On the premise that only similar workers are assigned to the same cluster, this step significantly reduces the dimension of the problem by eliminating a large number of redundant comparisons without compromising accuracy. In case the GNN mistakenly splits similar workers into different subsets, I reshuffle the subsets and repeat the procedure.
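    The saving from the “conquering” step can be quantified by counting pairwise comparisons. The numbers below are hypothetical: the snippet assumes 100,000 workers divided into 100 equal-size subsets, which need not match the subsets produced by the embedding step in practice.

```python
def n_pairs(n):
    """Number of unordered worker pairs among n workers."""
    return n * (n - 1) // 2

n_workers = 100_000
subset_sizes = [1_000] * 100  # assumption: 100 equal-size subsets

pairs_full = n_pairs(n_workers)                       # all workers compared jointly
pairs_divided = sum(n_pairs(s) for s in subset_sizes)  # comparisons within subsets only
reduction = pairs_full / pairs_divided
```

    With these sizes the number of comparisons falls from about 5 billion to about 50 million, a reduction by a factor of roughly 100, and the subsets can be processed independently.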


    Compared with the random-walk-based “node2vec” embedding employed by Hagedorn, Manovskii and Xin (2020), GNN-based embedding has a number of advantages. First, it is well suited to studying coworker effects, as it explicitly models interactions between neighboring coworkers in a flexible manner. Second, the GNN approach allows node feature information to be incorporated. Moreover, the computation for a GNN can easily be parallelized and carried out efficiently on GPUs.

    5.4 Simulation Results

    To illustrate the efficiency of the divide-and-conquer strategy, I simulate Data Generating Process #2 in Section 4.1 with a large number of workers, $N = 100{,}000$.

    Clustering. The resulting cluster assignment $c_i$ for all workers under the divide-and-conquer algorithm is displayed in Figure 18. The clustering $C$ displayed is accurate: workers that are close in $\mathcal{X}$ are assigned to the same or adjacent clusters, and the assignment function $c_i$ lies on the 45-degree line.

    Figure 18: Worker cluster assignment $c_i$ (divide-and-conquer).

    Coworker effects. Figure 19 indicates that the coworker effect function is accurately estimated.


    Figure 19: Estimated coworker effects $\hat{a}(c_i)$ and the ground truth $a(x_i)$

    For the estimator $\hat{a}$ on $\mathcal{X}$:

    $$RMSE = \left( \int_{0}^{1} (\hat{a}(x) - a(x))^{2} \varphi(x)\,dx \right)^{1/2} = 0.16\%$$


    6 Conclusion

    In this paper I have developed a new empirical methodology for studying peer effects. I show that the leading empirical methodology is biased under worker-firm complementarity. I develop a semi-parametric approach to jointly estimate wage complementarities and coworker effects. The method can also capture heterogeneous coworker effects, which helps to reconcile the diverging results in the microeconomic literature. The approach combines recent advances in machine learning with economic theory. To meet the demand for computational efficiency, I integrate the baseline algorithm with GNN-based graph embedding techniques. I am currently using the proposed method to measure coworker effects in the matched employer-employee panel data covering the entire population of Denmark.


    References

    Abowd, J. M., F. Kramarz, and D. N. Margolis (1999): “High Wage Workers and High Wage Firms,” Econometrica, 67, 251–334.

    Abowd, J. M., F. Kramarz, S. Pérez-Duarte, and I. M. Schmutte (2018): “Sorting Between and Within Industries: A Testable Model of Assortative Matching,” Annals of Economics and Statistics, 1–32.

    Andrews, M. J., L. Gill, T. Schank, and R. Upward (2012): “High Wage Workers Match with High Wage Firms: Clear Evidence of the Effects of Limited Mobility Bias,” Economics Letters, 117, 824–827.

    Angrist, J. (2014): “The perils of peer effects,” Labour Economics, 30, 98–108.

    Arcidiacono, P., G. Foster, N. Goodpaster, and J. Kinsler (2012): “Estimating spillovers using panel data, with an application to the classroom,” Quantitative Economics, 3, 421–470.

    Banerjee, A., E. Duflo, R. Glennerster, and C. Kinnan (2015): “The Miracle of Microfinance? Evidence from a Randomized Evaluation,” American Economic Journal: Applied Economics, 7, 22–53.

    Becker, G. (1973): “A Theory of Marriage: Part I,” Journal of Political Economy, 81, 813–846.

    Betts, J. and A. Zau (2004): “Peer groups and academic achievement: Panel evidence from administrative data.”

    Bloom, N., J. Liang, J. Roberts, and Z. J. Ying (2014): “Does Working from Home Work? Evidence from a Chinese Experiment,” The Quarterly Journal of Economics, 130, 165–218.

    Bonhomme, S. (2020): “Heterogeneity, Sorting and Complementarity,” Working paper, National Bureau of Economic Research.

    Bonhomme, S., T. Lamadon, and E. Manresa (2019): “A Distributional F