Transcript
Page 1: Alignment at Work: Using Language to Distinguish the

IRLE WORKING PAPER#102-17

April 2017

Gabriel Doyle, Sameer B. Srivastava, Amir Goldberg, and Michael C. Frank

Alignment at Work: Using Language to Distinguish the Internalization and Self-Regulation Components of Cultural Fit in Organizations

Cite as: Gabriel Doyle, Sameer B. Srivastava, Amir Goldberg, and Michael C. Frank. (2017). “Alignment atWork: Using Language to Distinguish the Internalization and Self-Regulation Components of Cultural Fit in Organizations”. IRLE Working Paper No. 102-17. http://irle.berkeley.edu/files/2017/Alignment-At-Work.pdf

http://irle.berkeley.edu/working-papers

Page 2: Alignment at Work: Using Language to Distinguish the

This paper is under review and may change in the final version. Please contact the authors before citing.

Alignment at Work: Using Language to Distinguish the Internalizationand Self-Regulation Components of Cultural Fit in Organizations

Gabriel DoyleDepartment of Psychology

Stanford [email protected]

Amir GoldbergGraduate School of Business

Stanford [email protected]

Sameer B. SrivastavaHaas School of Business

UC [email protected]

Michael C. FrankDepartment of Psychology

Stanford [email protected]

Abstract

Cultural fit is widely believed to affect thesuccess of individuals and the groups towhich they belong. Yet it remains an elu-sive, poorly measured construct. Recentresearch draws on computational linguis-tics to measure cultural fit but overlooksasymmetries in cultural adaptation. Bycontrast, we develop a directed, dynamicmeasure of cultural fit based on linguis-tic alignment, which estimates the influ-ence of one person’s word use on another’sand distinguishes between two encultura-tion mechanisms: internalization and self-regulation. We use this measure to traceemployees’ enculturation trajectories overa large, multi-year corpus of corporateemails and find that patterns of alignmentin the first six months of employment arepredictive of individuals downstream out-comes, especially involuntary exit. Fur-ther predictive analyses suggest referentialalignment plays an overlooked role in lin-guistic alignment.

1 Introduction

Entering a new group is rarely easy. Adjusting tounfamiliar behavioral norms and donning a newidentity can be cognitively and emotionally taxing,and failure to do so can lead to exclusion. But suc-cessful enculturation to the group often yields sig-nificant rewards, especially in organizational con-texts. Fitting in has been tied to positive careeroutcomes such as faster time-to-promotion, higherperformance ratings, and reduced risk of beingfired (O’Reilly et al., 1991; Goldberg et al., 2016).

A major challenge for enculturation researchis distinguishing between internalization and self-

regulation. Inward-looking internalization in-volves identifying as a group member and ac-cepting group norms, while outward-looking self-regulation entails deciphering the group’s norma-tive code and adjusting one’s behavior to comply.Existing approaches, which generally rely on self-reports, are subject to self-report bias and typicallyyield only static snapshots of this process. Recentcomputational approaches that use language as abehavioral signature of group integration uncoverdynamic traces of enculturation but cannot distin-guish internalization and self-regulation.

To overcome these limitations, we introduce adynamic measure of directed linguistic accommo-dation between a newcomer and existing groupmembers. Our approach differentiates betweenan individual’s (1) base rate of word use and (2)linguistic alignment to interlocutors. The formercorresponds to internalization of the group’s lin-guistic norms, whereas the latter reflects the ca-pacity to regulate one’s language in response topeers’ language use. We apply this languagemodel to a corpus of internal email communica-tions and personnel records, spanning a seven-yearperiod, from a mid-sized technology firm. Weshow that changes in base rates and alignment,especially with respect to pronoun use, are con-sistent with successful assimilation into a groupand can predict eventual employment outcomes—continued employment, involuntary exit, or volun-tary exit—at levels above chance. We use this pre-dictive problem to investigate the nature of linguis-tic alignment. Our results suggest that the com-mon formulation of alignment as a lexical-levelphenomenon is incomplete.

2 Linguistic Alignment and Group Fit

Linguistic alignment Linguistic alignment isthe tendency to use the same or similar words

Page 3: Alignment at Work: Using Language to Distinguish the

This paper is under review and may change in the final version. Please contact the authors before citing.

as one’s conversational partner. Alignment isan instance of a widespread and socially impor-tant human behavior: communication accommo-dation, the tendency of two interacting peopleto nonconsciously adopt similar behaviors. Ev-idence of accommodation appears in many be-havioral dimensions, including gestures, postures,speech rate, self-disclosure, and language or di-alect choice (see Giles et al. (1991) for a review).More accommodating people are rated by theirinterlocutors as more intelligible, attractive, andcooperative (Feldman, 1968; Ireland et al., 2011;Triandis, 1960). These perceptions have mate-rial consequences—for example, high accommo-dation requests are more likely to be fulfilled, andpairs who accommodate more in how they ex-press uncertainty perform better in lab-based tasks(Buller and Aune, 1988; Fusaroli et al., 2012).

Although accommodation is ubiquitous, indi-viduals vary in their levels of accommodation inways that are socially informative. Notably, morepowerful people are accommodated more stronglyin many settings, including trials (Gnisci, 2005),online forums (Danescu-Niculescu-Mizil et al.,2012), and Twitter (Doyle et al., 2016). Most rel-evant for this work, speakers may increase theiraccommodation to signal camaraderie or decreaseit to differentiate from the group. For example,Bourhis and Giles (1977) found that Welsh En-glish speakers increased their use of the Welshaccent and language in response to an Englishspeaker who dismissed it.

Person-group fit and linguistic alignmentThese findings suggest that linguistic alignment isa useful avenue for studying how people assimi-late into a group. Whereas traditional approachesto studying person-group fit rely on self-reportsthat are subject to self-report bias and cannot fea-sibly be collected with high granularity acrossmany points in time, recent studies have proposedlanguage-based measures as a means to tracingthe dynamics of person-group fit without havingto rely on self-reports. Building on Danescu-Niculescu-Mizil et al. (2013)’s research into lan-guage use similarities as a proxy for social dis-tance between individuals, Srivastava et al. (forth-coming) and Goldberg et al. (2016) developed ameasure of cultural fit based on the similarity inlinguistic style between individuals and their col-leagues in an organization. Their time-varyingmeasure highlights linguistic compatibility as an

important facet of cultural fit and reveals distincttrajectories of enculturation for employees withdifferent career outcomes.

While this approach can help uncover the dy-namics and consequences of an individual’s fitwith her colleagues in an organization, it cannotdisentangle the underlying reasons for this align-ment. For two primary reasons, it cannot distin-guish between fit that arises from internalizationand fit produced by self-regulation. First, Gold-berg et al. (2016) and Srivastava et al. (forthcom-ing) define fit using a symmetric measure, theJensen-Shannon divergence, which does not takeinto account the direction of alignment. Yet thedistinction between an individual adapting to peersversus peers adapting to the individual would ap-pear to be consequential. Second, this prior workconsiders fit across a wide range of linguistic cate-gories but does not interrogate the role of particu-lar categories, such as pronouns, that can be espe-cially informative about enculturation. For exam-ple, a person’s base rate use of the first-person sin-gular (I) or plural (we) might indicate the degreeof group identity internalization, whereas adjust-ment to we usage in response to others’ use of thepronoun might reveal the degree of self-regulationto the group’s normative expectations.

Modeling fit with WHAM To address theselimitations, we build upon and extend the WHAMalignment framework (Doyle and Frank, 2016) toanalyze the dynamics of internalization and self-regulation using the complete corpus of emailcommunications and personnel records from amid-sized technology company over a seven-yearperiod. WHAM uses a conditional measure ofalignment, separating overall homophily (uncon-ditional similarity in people’s language use, drivenby internalized similarity) from in-the-momentadaptation (adjusting to another’s usage, corre-sponding to self-regulation). WHAM also pro-vides a directed measure of alignment, in that itestimates a replier’s adaptation to the other con-versational participant separately from the partici-pant’s adaptation to the replier.

Level(s) of alignment The convention withinlinguistic alignment research, dating back to earlywork on Linguistic Style Matching (Niederhofferand Pennebaker, 2002), is to look at lexical align-ment: the repetition of the same or similar wordsacross conversation participants. From a commu-

Page 4: Alignment at Work: Using Language to Distinguish the

This paper is under review and may change in the final version. Please contact the authors before citing.

nication accommodation standpoint, this is justi-fied by assuming that one’s choice of words repre-sents a stylistic signal that is partially independentof the meaning one intends to express—similar tothe accommodation on paralinguistic signals dis-cussed above. The success of previous linguisticalignment research shows that this is valid.

However, words are difficult to divorce fromtheir meanings, and sometimes repeating a wordconflicts with repeating its referent. In particular,pronouns often refer to different people dependingon who uses the pronoun. While there is evidencethat one person using a first-person singular pro-noun increases the likelihood that her conversa-tion partner will as well (Chung and Pennebaker,2007), we may also expect that one person us-ing first-person singular pronouns may cause theother to use more second-person pronouns, sothat both people are referring to the same per-son. This is especially important under the In-teractive Alignment Model view (Pickering andGarrod, 2004), where conversants align their en-tire mental representations, which predicts bothlexical and referential alignment behaviors willbe observed. Discourse-strategic explanations foralignment also predict alignment at multiple levels(Doyle and Frank, 2016).

Since we have access to a high-quality corpuswith meaningful outcome measures, we can inves-tigate the relative importance of these two typesof alignment. We will show that referential align-ment is more predictive of employment outcomesthan is lexical alignment, suggesting a need foralignment research to consider both levels ratherthan just the latter.

3 Data: Corporate Email Corpus

We use the complete corpus of email between full-time employees at a mid-sized technology com-pany between 2009 to 2014 (Srivastava et al.,forthcoming). To protect privacy and confiden-tiality, all identifying information was removed,and each email was summarized as only a countof word categories in its text. These categories area subset of the Linguistic Information and WordCount system (Pennebaker et al., 2007). The cat-egories were chosen because they are likely to beindicative of one’s standing/role within a group.1

1Six pronoun categories (first singular (I), first plural (we),second (you), third singular personal (he, she), third sin-gular impersonal (it, this), and third plural (they)) and five

We divided email chains into message-replypairs to investigate conditional alignment betweena message and its reply. To limit these pairs tocases where the reply was likely related to the pre-ceding message, we removed all emails with morethan one sender or recipient (including CC/BCC),identical sender and recipient, or where the senderor recipient was an automatic notification systemor any other mailbox that was not specific to a sin-gle employee. We also excluded emails with nobody text or more than 500 words in the body text,and pairs with more than a week’s latency betweenmessage and reply. Finally, because our analy-ses involve enculturation dynamics over the firstsix months of employment, we excluded repliessent by an employee whose overall tenure wasless than six months. This resulted in a collec-tion of 407,779 message-reply pairs, with 485 dis-tinct replying employees. We combined this withmonthly updates of employees joining and leavingthe company and whether they left voluntarily orinvoluntarily. Of the 485, 66 left voluntarily, 90left involuntarily, and 329 remained employed atthe end of the observation period.

4 Model: An Extended WHAMFramework

To assess alignment, we use the Word-Based Hi-erarchical Alignment Model (WHAM) framework(Doyle and Frank, 2016). The core principle ofWHAM is that alignment is a change, usually anincrease, in the frequency of using a word categoryin a reply when the word category was used in thepreceding message. For instance, a reply to themessage What will we discuss at the meeting?, islikely to have more instances of future tense thana reply to the message What did we discuss at themeeting? Under this definition, alignment is thelog-odds shift from the baseline reply frequency,the frequency of the word in a reply when the pre-ceding message did not contain the word.

WHAM is a hierarchical generative modelingframework, so it uses information from related ob-servations (e.g., multiple repliers with similar de-mographics) to improve its robustness on sparsedata (Doyle et al., 2016). There are two key pa-rameters, shown in Figure 2: ηbase, the log-odds ofa given word category c when the preceding mes-sage did not contain c, and ηalign, the increase in

time/certainty categories (past tense, present tense, futuretense, certainty, and tentativity).

Page 5: Alignment at Work: Using Language to Distinguish the

This paper is under review and may change in the final version. Please contact the authors before citing.

the log-odds of c when the preceding message didcontain c.

A dynamic extension To understand encultura-tion, we need to track changes in both the align-ment and baseline over time. We add a month-by-month change term to WHAM, yielding a piece-wise linear model of these factors over the courseof an employee’s tenure. Each employee’s tenureis broken into two or three segments: their first sixmonths after being hired, their last six months be-fore leaving (if they leave), and the rest of theirtenure.2 The linear segments for their alignmentare fit as an intercept term ηalign, based at theirfirst month (for the initial period) or their lastmonth (for the final period), and per-month slopesα. Baseline segments are fit similarly, with pa-rameters ηbase and β.3 To visualize the align-ment behaviors and the parameter values, we cre-ate “sawhorse” plots, with an example in Figure1.

In our present work, we are focused on changesin cultural fit during the transitions into or out ofthe group, so we collapse observations outside thefirst/last six months into a stable point estimate,constraining their slopes to be zero. This simplifi-cation also circumvents the issue of different em-ployees having different middle-period lengths.4

Model structure The graphical model for ourinstantiation of WHAM is shown in Figure 2.For each word category c, WHAM’s generativemodel represents each reply as a series of token-by-token independent draws from a binomial dis-tribution. The binomial probability µ is dependenton whether the preceding message did (µalign)or did not (µbase) contain a word from categoryc, and the inferred alignment value is the differ-ence between these probabilities in log-odds space(ηalign).

2Within each segment, the employee’s alignment modelis similar to that of Yurovsky et al. (2016), who introduceda constant by-month slope parameter to model changes inparent-child alignment during early linguistic development.

3Informal investigations into the change in baseline usageover time showed roughly linear changes over the first/lastsix months, but our linearity assumption may mask interest-ing variation in the enculturation trajectories that may rewardfurther investigation.

4As shown in Figure 1, the pieces do not need to de-fine a continuous function. Alignment behaviors continueto change in the middle of an employee’s tenure (Srivastavaet al., forthcoming), so alignment six months in to the job isunlikely to be equal to alignment six months from leaving, orthe average alignment over the middle tenure.

alignbase

0.1

0.2

0.3

0.4

0.5

−2.7

−2.6

−2.5

−2.4

Time

Log−

odds

par

amet

er e

stim

ates

ηalignstart

ηalignmid

ηalignend

αend

α start

β start

βend

ηbasemid

ηbasestart

ηbaseend

Figure 1: Sample sawhorse plot with key variableslabelled. The η point parameters (first month,last month, and middle average) and α (or β) by-month slope (start, end) parameters are estimatedby WHAM for each word category and employeegroup.

The specific values of these variables dependon three hierarchical features: the word categoryc, the group g that a given employee falls into,and the time period t (a piece of the piece-wiselinear function: beginning, middle, or end). Notethat the hierarchical ordering is different for the ηchains and the α/β chains; c is above g and t forthe η chains, but below them for the α/β chains.This is because we expect the static (η) values fora given word category to be relatively consistentacross different groups and at different times, butwe expect the values to be independent across thedifferent word categories. Conversely, we expectthat the enculturation trajectories across word cat-egories (α/β) will be similar, while the trajecto-ries may vary substantially across different groupsand different times. Lastly, the month m in whicha reply is written (measured from the start of thetime period t) has a linear effect on the η value, asdescribed below.

To estimate alignment, we first divide thereplies up by group, time period, and calendarmonth. We separate the replies into two sets basedon whether the preceding message contained thecategory c (the “alignment” set) or not (the “base-line” set). All replies within a set are then aggre-gated in a single bag-of-words representation, withcategory token counts Calign

c,g,t,m and Cbasec,g,t,m, and

total token counts N basec,g,t,m and N base

c,g,t,m compris-ing the observed variables on the far right of themodel. Moving from right to left, these counts areassumed to come from binomial draws with prob-ability µalignc,g,t,m or µbasec,g,t,m. The µ values are then

Page 6: Alignment at Work: Using Language to Distinguish the

This paper is under review and may change in the final version. Please contact the authors before citing.

C

N

N

N αg,t αc,g,t

ηalignc ηalignc,g,t ηalignc,g,t,m µalignc,g,t,m Calignc,g,t,m

ηbasec ηbasec,g,t ηbasec,g,t,m µbasec,g,t,m Cbasec,g,t,m

βg,t βc,g,tN

N

N N logit−1

Binom

N N

logit−1

Binom

Nbasec,g,t,m

N alignc,g,t,m

m

month

group, time

category

Figure 2: The Word-Based Hierarchical Alignment Model (WHAM). Hierarchical chains of normaldistributions capture relationships between word categories, individuals, outcome groups, and time, andgenerate linear predictors η, which are converted into probabilities µ for binomial draws of the words inreplies.

in turn generated from η values in log-odds spaceby an inverse-logit transform, similar to linear pre-dictors in logistic regression.

The ηbase variables are representations of thebaseline frequency of a marker in log-odds space,and µbase is simply a conversion of ηbase to proba-bility space, the equivalent of an intercept term ina logistic regression. ηalign is an additive value,with µalign = logit−1(ηbase + ηalign), the equiv-alent of a binary feature coefficient in a logisticregression. The specific month’s η variables arecalculated as a linear function: ηalignc,g,t,m = ηalignc,g,t +mαc,g,t, and similarly with β for the baseline.

The remainder of the model is a hierarchy ofnormal distributions that integrate social structureinto the analysis. In the present work, we havethree levels in the hierarchy: category, group, andtime period. In Analysis 1, employees are groupedby their employment outcome (stay, leave volun-tarily, leave involuntarily); in Analyses 2 & 3,where we predict the employment outcomes, eachgroup is a single employee. The normal distri-butions that connect these levels have identicalstandard deviations σ2 = .25.5 The hierarchiesare headed by a normal distribution centered at

5The deviation is not a theoretically motivated choice, andwas chosen as a good empirical balance between reasonableparameter convergence (improved by smaller σ2) and goodmodel log-probability (improved by larger σ2).

0, except for the ηbase hierarchy, which has aCauchy(0, 2.5) distribution.6

Message and reply length can affect alignmentestimates; the WHAM model was developed inpart to reduce this effect. As different employeeshad different email length distributions, we furtheraccounted for length by dividing all replies intofive quintile length bins, and treated each bin asseparate observations for each employee. This de-sign choice adds an additional control factor, butresults were qualitatively similar without it. Allof our analyses are based on parameter estimatesfrom RStan fits of WHAM with 500 iterations overfour chains.

While previous research on cultural fit has em-phasized either its internalization (O’Reilly et al.,1991) or self-regulation (Goldberg et al., 2016)components, our extension to the WHAM frame-work helps disentangle them by estimating themas separate baseline and alignment trajectories.For example, we can distinguish between anarchetypal individual who initially aligns to hercolleagues and then internalizes this style of com-

6As ηbase is the log-odds of each word in a reply being apart of the category c, it is expected to be substantially nega-tive. For example, second person pronouns (you), are around2% of the words in replies, approximately −4 in log-oddsspace. We follow Gelman et al. (2008)’s recommendation ofthe Cauchy prior as appropriate for parameter estimation inlogistic regression.

Page 7: Alignment at Work: Using Language to Distinguish the

This paper is under review and may change in the final version. Please contact the authors before citing.

munication such that her baseline use also shiftsand another archetypal person who aligns to hercolleagues but does not change her baseline usage.The former exhibits high correspondence betweeninternalization and self-regulation, whereas thelatter demonstrates an ability to decouple them.

5 Analyses

We perform three analyses on this data. First,we examine the qualitative behaviors of pro-noun alignment and how they map onto employeeoutcomes in the data. Second, we show thatthese qualitative differences in early encultura-tion are meaningful, with alignment behaviorspredicting employment outcome above chance.Lastly, we consider lexical versus referential lev-els of alignment and show that predictions are im-proved under the referential formulation, suggest-ing that alignment is not limited to low-level word-repetition effects.

5.1 Analysis 1: Dynamic Qualitative Changes

We begin with descriptive analyses of the behav-ior of pronouns, which are likely to reflect incor-poration into the company. In particular, we lookat first-person singular (I), first-person plural (we),and second-person pronouns (you). We expect thatincreases in we usage will occur as the employeeis integrated into the group, while I and you us-age will decrease, and want to understand whetherthese changes manifest on baseline usage (i.e., in-ternalization), alignment (i.e., self-regulation), orboth.

Design We divided each employee’s emails bycalendar month, and separated them into the em-ployee’s first six months, their last six months (ifan employee left the company within the observa-tion period), and the middle of their tenure. Em-ployees with fewer than twelve months at the com-pany were excluded from this analysis, so thattheir first and last months did not overlap. Wefit two WHAM models in this analysis. The firstaggregated all employees, regardless of employ-ment outcome, to minimize noise; the second sep-arated them by outcome to analyze cultural fit dif-ferences.

Outcome-aggregated model We start with theaggregated behavior of all employees, shown inFigure 3. For baselines, we see decreased use of Iand you over the first six months, with we usage

I You We

alignbase

0.0

0.2

0.4

0.6

−4.5

−4.0

−3.5

Time

Log−

odds

par

amet

er e

stim

ates

Figure 3: Sawhorse plots showing the dynam-ics of pronoun alignment behavior across employ-ees. Vertical axis shows log-odds for baseline andalignment. Top row shows estimated alignment,highest for we and smallest for you. Bottom rowshows baseline dynamics, with employees shiftingtoward the average usage as they enculturate. Theshaded region is one standard deviation over pa-rameter samples.

increasing over the same period, confirming theexpected result that incorporating into the groupis accompanied by more inclusive pronoun usage.Despite the baseline changes, alignment is fairlystable through the first six months. Alignment onfirst-person singular and second-person pronounsis lower than first-person plural pronouns, likelydue to the fact that I or you have different referentswhen used by the two conversants, while both con-versants could use we to refer to the same group.We will consider this referential alignment in moredetail in Analysis 3. Since employees with dif-ferent outcomes have much different experiencesover their last six months, we will not discuss themin aggregate, aside from noting the sharp declinein we alignment near the end of the employees’tenures.

Outcome-separated model Figure 4 showsoutcome-specific trajectories, with green linesshowing involuntary leavers (i.e., those who arefired or downsized), blue showing voluntaryleavers, and orange showing employees who re-mained at the company through the final month ofthe data. The use of I and you is similar to theaggregates in Figure 3, regardless of group. Thelast six months of I usage show an interesting dif-ference, where involuntary leavers align more on Ibut retain a stable baseline while voluntary leaversretain a stable alignment but increase I overall,

Page 8: Alignment at Work: Using Language to Distinguish the

This paper is under review and may change in the final version. Please contact the authors before citing.

I You We

alignbase

−0.2

0.0

0.2

0.4

0.6

0.8

−5.0

−4.5

−4.0

−3.5

Time

Log−

odds

par

amet

er e

stim

ates

Outcome: invol stay vol

Figure 4: Sawhorse plots split by employment out-come. Mid-tenure points are jittered for improvedreadability.

which is consistent with group separation.The most compelling result we see here, though,

is the changes in we usage by different groups ofemployees. Employees who eventually leave thecompany involuntarily show signs of more self-regulation than internalization over the first sixmonths, increasing their alignment while decreas-ing their baseline use (though they return to moresimilar levels as other employees later in theirtenure). Employees who stay at the company, aswell as those who later leave voluntarily, showsigns of internalization, increasing their baselineusage to the company average, as well as adaptingtheir alignment levels to the mean. This findingsuggests that how quickly the employees internal-ize culturally-standard language use predicts theireventual employment outcome, even if they even-tually end up near the average.

5.2 Analysis 2: Predicting Outcomes

This analysis tests the hypothesis that there aremeaningful differences in employees’ initial en-culturation, captured by alignment behaviors. Weexamine the first six months of communicationsand attempt to predict whether the employee willleave the company. We find that, even with a sim-ple classifier, alignment behaviors are predictiveof employment outcome.

Design We fit the WHAM model to only the firstsix months of email correspondence for all em-ployees who had at least six months of email. Themodel estimated the initial level of baseline use(ηbase) and alignment (ηalign) for each employee,as well as the slope (α, β) for baseline and align-

ment over those first six months, over all 11 wordcategories mentioned in Section 3.

We then created logistic regression classifiers,using the parameter estimates to predict whetheran employee would leave the company. We fitseparate classifiers for leaving voluntarily or in-voluntarily. Our results show that early alignmentbehaviors are better at identifying employees whowill leave involuntarily than voluntarily, consistentwith (Srivastava et al., forthcoming)’s findings thatvoluntary leavers are similar to stayers until late intheir tenure. We fit separate classifiers using thealignment parameters and the baseline parametersto investigate their relative informativity.

For each model, we report the area underthe curve (AUC). This value is estimated fromthe receiver operating characteristic (ROC) curve,which plots the true positive rate against the falsepositive rate over different classification thresh-olds. An AUC of 0.5 represents chance per-formance. We use balanced, stratified cross-validation to reduce AUC misestimation due tounbalanced outcome frequencies and high noise(Parker et al., 2007).

Results The left column of Figure 5 shows theresults over 10 runs of 10-fold balanced logis-tic classifiers with stratified cross-validation. Thealignment-based classifiers are both above chanceat predicting that an employee will leave the com-pany, whether involuntarily or voluntarily. Thebaseline-based classifiers are above chance for in-voluntary leavers but are non-predictive for vol-untary leavers. This finding is consistent with theidea that voluntary leavers resemble stayers (whoform the bulk of the employees) until late in theirtenure when their cultural fit declines.

We fit a model using both alignment and base-line parameters, but this model yielded an AUCvalue below the alignment-only classifier. Thissuggests that where alignment and baseline be-haviors are both predictive, they do not providesubstantially different predictive power and leadto overfitting. A more sophisticated classifier mayovercome these challenges; our goal here was notto achieve maximal classification performance butto test whether alignment provided any useful in-formation about employment outcomes.

5.3 Analysis 3: Types of Alignment

Our final analysis investigates the nature of lin-guistic alignment: specifically, whether there is an

Page 9: Alignment at Work: Using Language to Distinguish the

This paper is under review and may change in the final version. Please contact the authors before citing.

lexical referential

involvol

0.50

0.54

0.58

0.62

0.50

0.54

0.58

0.62

Parameter set

Cla

ssifi

er A

UC

Figure 5: AUC values for 10 runs of 10-fold cross-validated logistic classifiers. Both lexical (left col-umn) and referential (right column) alignment pa-rameters lead to above chance classifier perfor-mance, but referential alignment outperforms lex-ical alignment at predicting both voluntary and in-voluntary departures.

effect of referential alignment beyond that of themore commonly used lexical alignment.

Testing this hypothesis requires a small changeto the alignment calculations. Lexical alignment isbased on the conditional probability of the replierusing a word category c given that the precedingmessage used that same category c. For referentialalignment, we examine the conditional probabilityof the replier using a word category cj given thatthe preceding message used the category ci, whereci and cj are likely to be referentially linked. Wealso consider cases where ci is likely to transitionto cj throughout the course of the conversation,such as present tense verbs turning into past tenseas the event being described recedes into the past.The pairs of categories that are likely to be refer-entially or transitionally linked are: (you, I); (we,I); (you, we); (past, present); (present, future); and(certainty, tentativity). We include both directionsof these pairs, so this provides approximately thesame number of predictor variables for both situa-tions to maximize comparability (12 for the refer-ential alignments, 11 for the lexical). This modifi-cation does not change the structure of the WHAMmodel, but rather changes its C and N counts byreclassifying replies between the baseline or align-ment pathways.

Results Figure 5 plots the differences in predic-tive model performance using lexical versus ref-erential alignment parameters. We find that thesemantic parameters provide more accurate clas-

sification than the lexical both for voluntarily andinvoluntarily-leaving employees. This suggeststhat while previous work looking at lexical align-ment successfully captures social structure, refer-ential alignment may reflect a deeper and moreaccurate representation of the social structure. Itis unclear if this behavior holds in less formalsituations or with weaker organizational structureand shared goals, but these results suggest that thetraditional alignment approach of only measuringlexical alignment should be augmented with ref-erential alignment measures for a more completeanalysis.

6 Conclusions

This paper described an effort to use directedlinguistic alignment as a measure of cultural fitwithin an organization. We adapted a hierarchicalalignment model from previous work to estimatefit within corporate email communications, focus-ing on changes in language during employees’ en-try to and exit from the company. Our resultsshowed substantial changes in the use of pronouns,with pronoun patterns varying by employees’ out-comes within the company.The use of the first-person plural “we” during an employee’s first sixmonths is particularly instructive. Whereas stay-ers exhibited increased baseline use, indicating in-ternalization, those eventually departing involun-tarily were on the one hand decreasingly likelyto introduce “we” into conversation, but increas-ingly responsive to interlocutors’ use of the pro-noun. While not internalizing a shared identitywith their peers, involuntarily departed employeeswere overly self-regulating in response to its invo-cation by others.

Quantitatively, rates of usage and alignment inthe first six months of employment carried infor-mation about whether employees left involuntar-ily, pointing towards fit within the company cul-ture early on as an indicator of eventual employ-ment outcomes. Finally, we saw ways in whichthe application of alignment to cultural fit mighthelp to refine ideas about alignment itself: prelimi-nary analysis suggested that referential, rather thanlexical, alignment was more predictive of employ-ment outcomes. More broadly, these results sug-gest ways that quantitative methods can be used tomake precise application of concepts like “culturalfit” at scale.

Page 10: Alignment at Work: Using Language to Distinguish the

This paper is under review and may change in the final version. Please contact the authors before citing.

ReferencesRichard Y. Bourhis and Howard Giles. 1977. The lan-

guage of intergroup distinctiveness. In H Giles, ed-itor, Language, Ethnicity, and Intergroup Relations,Academic Press, London, pages 119–135.

David B. Buller and R. Kelly Aune. 1988. The effectsof vocalics and nonverbal sensitivity on compliance:A speech accommodation theory explanation. Hu-man Communication Research 14:301–32.

Cindy Chung and James W. Pennebaker. 2007. Thepsychological functions of function words. InK Fiedler, editor, Social communication, Psychol-ogy Press, New York, chapter 12, pages 343–359.

Cristian Danescu-Niculescu-Mizil, Lillian Lee,Bo Pang, and Jon Kleinberg. 2012. Echoes ofpower: Language effects and power differencesin social interaction. In Proceedings of the 21stinternational conference on World Wide Web. page699. https://doi.org/10.1145/2187836.2187931.

Cristian Danescu-Niculescu-Mizil, Robert West, DanJurafsky, Jure Leskovec, and Christopher Potts.2013. No country for old members: user lifecy-cle and linguistic change in online communities. InProceedings of the 22nd International Conferenceon World Wide Web. pages 307–318.

Gabriel Doyle and Michael C. Frank. 2016. Investigat-ing the sources of linguistic alignment in conversa-tion. In Proceedings of ACL.

Gabriel Doyle, Dan Yurovsky, and Michael C. Frank.2016. A robust framework for estimating linguisticalignment in Twitter conversations. In Proceeedingsof WWW.

R. E. Feldman. 1968. Response to compatriots andforiegners who seek assistance. Journal of Person-ality and Social Psychology 10:202–14.

Riccardo Fusaroli, Bahador Bahrami, Karsten Olsen,Andreas Roepstorff, Geraint Rees, Chris Frith,and Kristian Tylen. 2012. Coming to Terms:Quantifying the Benefits of Linguistic Coordi-nation. Psychological Science 23(8):931–939.https://doi.org/10.1177/0956797612436816.

Andrew Gelman, Aleks Jakulin, Maria Grazia Pittau,and Yu-Sung Su. 2008. A weakly informative de-fault prior distribution for logistic and other regres-sion models. The Annals of Applied Statistics .

Howard Giles, Nikolas Coupland, and Justine Coup-land. 1991. Accommodation theory: Communica-tion, context, and consequences. In Howard Giles,Justine Coupland, and Nikolas Coupland, editors,Contexts of accommodation: Developments in ap-plied sociolinguistics, Cambridge University Press,Cambridge.

Augusto Gnisci. 2005. Sequential strategies of accom-modation: A new method in courtroom. BritishJournal of Social Psychology 44(4):621–643.

Amir Goldberg, Sameer B. Srivastava, V. Govind Ma-nian, and Christopher Potts. 2016. Fitting in orstanding out? The tradeoffs of structural and cul-tural embeddedness. American Sociological Review.

Molly E. Ireland, Richard B. Slatcher, Paul W.Eastwick, Lauren E. Scissors, Eli J. Finkel,and James W. Pennebaker. 2011. Languagestyle matching predicts relationship initiationand stability. Psychological Science 22:39–44.https://doi.org/10.1177/0956797610392928.

Kate G. Niederhoffer and James W. Pennebaker. 2002.Linguistic style matching in social interaction. Jour-nal of Language and Social Psychology 21(4):337–360. http://jls.sagepub.com/content/21/4/337.short.

Charles A. O’Reilly, Jennifer Chatman, and David F.Caldwell. 1991. People and organizational culture:a profile comparison approach to assessing person-organization fit. Academy of Management Journal34(3):487–516.

Brian J Parker, Simon Gunter, and Justin Bedo. 2007.Stratification bias in low signal microarray studies.BMC bioinformatics 8(1):1.

James W. Pennebaker, Cindy K. Chung, Molly Ireland,Amy Gonzalez, and Roger J. Booth. 2007. The de-velopment and psychometric properties of liwc2007.Technical report, LIWC.net.

Martin J. Pickering and Simon Garrod. 2004. To-ward a mechanistic psychology of dialogue.Behavioral and brain sciences 27(2):169–190.https://doi.org/10.1017/S0140525X04000056.

Sameer B. Srivastava, Amir Goldberg, V. Govind Ma-nian, and Christopher Potts. forthcoming. Encul-turation trajectories: Language, cultural adaptation,and individual outcomes in organizations. Manage-ment Science .

Harry C. Triandis. 1960. Cognitive similarity and com-munication in a dyad. Human Relations 13:175–183.

Dan Yurovsky, Gabriel Doyle, and Michael C. Frank.2016. Linguistic input is tuned to children’s devel-opmental level. In Proceedings of the 38th AnnualMeeting of the Cognitive Science Society.


Recommended