Analyzing Framing through the Casts of Characters in the News

Dallas Card1 Justin H. Gross2 Amber E. Boydstun3 Noah A. Smith4

1 School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
2 Department of Political Science, University of Massachusetts, Amherst, MA 01003, USA
3 Department of Political Science, University of California, Davis, CA 95616, USA
4 Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA

[email protected] [email protected] [email protected]

[email protected]

Abstract

We present an unsupervised model for the discovery and clustering of latent “personas” (characterizations of entities). Our model simultaneously clusters documents featuring similar collections of personas. We evaluate this model on a collection of news articles about immigration, showing that personas help predict the coarse-grained framing annotations in the Media Frames Corpus. We also introduce automated model selection as a fair and robust form of feature evaluation.

1 Introduction

Social science tells us that communication almost inescapably involves framing—choosing “a few elements of perceived reality and assembling a narrative that highlights connections among them to promote a particular interpretation” (Entman, 2007). Memorable examples include loaded phrases (death tax, war on terror), but the literature attests a much wider range of linguistic means toward this end (Pan and Kosicki, 1993; Greene and Resnik, 2009; Choi et al., 2012; Baumer et al., 2015).

Framing is associated with several phenomena to which NLP has been applied, including ideology (Lin et al., 2006; Hardisty et al., 2010; Iyyer et al., 2014), sentiment (Pang and Lee, 2008; Feldman, 2013), and stance (Walker et al., 2012; Hasan and Ng, 2013). Although such author attributes are interesting, framing scholarship is concerned with persistent patterns of representation of particular issues—without necessarily tying these to the states or intentions of authors—and the effects that such patterns may have on public opinion and policy. We also note that NLP has often been used in large-scale studies of news and its relation to other social phenomena (Leskovec et al., 2009; Gentzkow and Shapiro, 2010; Smith et al., 2013; Niculae et al., 2015).

Can framing be automatically recognized? If so, social-scientific studies of framing will be enabled by new measurements, and new applications might bring framing effects to the consciousness of everyday readers. Several recent studies have begun to explore unsupervised framing analysis of political text using autoregressive and hierarchical topic models (Nguyen et al., 2013; Nguyen et al., 2015; Tsur et al., 2015), but most of these conceptualize framing along a single dimension. Rather than trying to place individual articles on a continuum from liberal to conservative or positive to negative, we are interested in discovering broad-based patterns in the ways in which the media communicate about issues.

Here, our focus is on the narratives found in news stories, specifically the participants in those stories. Insofar as journalists make use of archetypal narratives (e.g., the struggle of an individual against a more powerful adversary), we expect to see recurring representations of characters in these narratives (Schneider and Ingram, 1993; Van Gorp, 2010). A classic example is the contrast between “worthy” and “unworthy” victims (Herman and Chomsky, 1988). More recently, Glenn Greenwald has pointed out how he was repeatedly characterized as an activist or blogger, rather than a journalist, during his reporting on the NSA (Greenwald, 2014).

Our model builds on the “Dirichlet persona model” (DPM) introduced by Bamman et al. (2013) for the unsupervised discovery of what they called “personas” in short film summaries (e.g., the “dark hero”). As in the DPM, we operationalize personas as mixtures of textually-expressed characteristics: what they do, what is done to them, and their descriptive attributes. We begin by providing a description of our full model, after which we highlight the differences from the DPM.

This paper’s main contributions are:

• We strengthen the DPM’s assumptions about the combinations of personas found in documents, applying a Dirichlet process prior to infer patterns of cooccurrence (§3). The result is a clustering of documents based on the collections of personas they use, discovered simultaneously with those personas.

• Going beyond named characters, we allow Bamman-style personas to account for entities like institutions, objects, and concepts (§5).

• We find that our model produces interpretable clusters that provide insight into our corpus of immigration news articles (§6).

• We propose a new kind of evaluation based on Bayesian optimization. Given a supervised learning problem, we treat the inclusion of a candidate feature set (here, personas) as a hyperparameter to be optimized alongside other hyperparameters (§7).

• In the case of U.S. news stories about immigration, we find that personas are, in many cases, helpful for automatically inferring the coarse-grained framing and tone employed in a piece of text, as defined in the Media Frames Corpus (Card et al., 2015) (§7).

2 Model Description

The plate diagram for the new model is shown in Figure 1 (right), with the original DPM (Bamman et al., 2013) shown on the left.

Figure 1: Plate diagrams for the DPM (left), and for the new model (right).

As evidence, the model considers tuples ⟨w, r, e, i⟩, where w is a word token and r is the category of syntactic relation¹ it bears to an entity with index e mentioned in the document with index i. The model’s generative story explains this evidence as follows:

1. Let there be K topics as in LDA (Blei et al., 2003). Each topic φ_k ∼ Dir(γ) is a multinomial over the V words in the vocabulary, drawn from a Dirichlet parameterized by γ.

2. For each of P personas p, and for each syntactic relation type r, define a multinomial ψ_{p,r} over the K topics, each drawn from a Dirichlet parameterized by β.

3. Assume an infinite set of distributions over personas drawn from a base distribution H. Each of these θ_j ∼ Dir(α) is a multinomial over the P personas, with an associated probability of being selected π_j, drawn from the stick-breaking process with hyperparameter λ.

4. For each document i:
   (a) Draw a cluster assignment s_i ∼ π, with corresponding multinomial distribution over personas θ_{s_i}.
   (b) For each entity e participating in i:
       i. Draw e’s persona p_e ∼ θ_{s_i}.
       ii. For every ⟨r, w⟩ tuple associated with e in i, draw z ∼ ψ_{p_e,r}, then w ∼ φ_z.

The DPM (Figure 1, left) has a similar generative story, except that each document has a unique distribution over personas. As such, step 4(a) is replaced with a draw from a symmetric Dirichlet distribution, θ_i ∼ Dir(α).

¹ We adopt the terminology from Bamman et al. (2013) of “agent”, “patient”, and “attribute”, even though these categories of relations are defined in terms of syntactic dependencies.
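To make this generative story concrete, the following minimal Python sketch samples synthetic evidence from the model, truncating the stick-breaking process at T components. All dimensions and hyperparameter values here are illustrative assumptions, not settings from the paper, and relation types (which are observed rather than generated in the actual model) are drawn uniformly just to produce tuples.

```python
import numpy as np

rng = np.random.default_rng(0)

K, P, V, R = 100, 50, 5000, 3   # topics, personas, vocabulary, relation types
alpha, beta, gamma, lam = 0.1, 0.1, 0.1, 1.0
T = 200                          # truncation of the infinite cluster set

# Step 1: topics phi_k ~ Dir(gamma), each a distribution over the vocabulary.
phi = rng.dirichlet(np.full(V, gamma), size=K)

# Step 2: persona-relation distributions over topics, psi_{p,r} ~ Dir(beta).
psi = rng.dirichlet(np.full(K, beta), size=(P, R))

# Step 3: truncated stick-breaking weights pi, and one distribution over
# personas theta_j ~ Dir(alpha) per cluster.
sticks = rng.beta(1.0, lam, size=T)
pi = sticks * np.concatenate(([1.0], np.cumprod(1.0 - sticks)[:-1]))
pi /= pi.sum()                   # renormalize the truncated weights
theta = rng.dirichlet(np.full(P, alpha), size=T)

# Step 4: one document with four entities and three tuples per entity.
s_i = rng.choice(T, p=pi)                  # 4(a): cluster assignment
for e in range(4):
    p_e = rng.choice(P, p=theta[s_i])      # 4(b)i: the entity's persona
    for r in rng.integers(0, R, size=3):   # relation types, drawn for illustration
        z = rng.choice(K, p=psi[p_e, r])   # 4(b)ii: topic for this tuple...
        w = rng.choice(V, p=phi[z])        # ...then the word token
```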


3 Clustering Stories

The DPM assumes that each document has a unique distribution (θ_i) from which its personas are drawn. However, for entities mentioned in news articles (as well as for the dramatis personae of films), we would expect certain types of personas to occur together frequently, as in articles about lawmakers and laws. Thus we would like to cluster documents based on their “casts” of personas. To do this, we have added a Dirichlet process (DP) prior on the document-specific distribution over personas (step 3), which allows the number of clusters to adapt to the size and complexity of the corpus (Antoniak, 1974; Escobar and West, 1994).

Although the model admits an unbounded number of distributions over personas, the properties of DPs are such that the number used by D documents will tend to be much less than D. As a result, inference under this model provides topics φ (distributions over words) interpretable as textual descriptors of entities, personas ψ (distributions over reusable topics), and clusters of articles s with associated distributions over personas θ.

Following Bamman et al. (2013), we perform inference using collapsed Gibbs sampling, collapsing out the distributions over words (φ), topics (ψ), and personas (θ), as well as π. On each iteration, we first sample a cluster for each document, followed by a persona for each entity, followed by a topic for each tuple. Because we assume a conjugate base measure, sampling can be done efficiently using the Chinese restaurant process (Aldous, 1985) for story types, personas, and topics, with slice sampling for the hyperparameters (α, β, γ, λ). Because such algorithms are well known to NLP readers, we have relegated details to the supplementary material.

During sampling, we discard samples from the first 10,000 iterations, and collect one sample from every tenth iteration for the following 1,000 iterations. We sample hyperparameters every 20 iterations for the first 500 iterations, and every 100 iterations thereafter.
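A minimal sketch of this schedule, assuming a hypothetical `state` object that exposes the resampling steps described above:

```python
def run_sampler(state, burn_in=10_000, collect=1_000, thin=10):
    """Gibbs sampling schedule: discard burn_in iterations, then keep one
    sample every `thin` iterations for `collect` further iterations."""
    samples = []
    for it in range(burn_in + collect):
        state.resample_assignments()   # clusters, then personas, then topics
        # Slice-sample hyperparameters every 20 iterations for the first
        # 500 iterations, and every 100 iterations thereafter.
        if it % (20 if it < 500 else 100) == 0:
            state.resample_hyperparameters()
        if it >= burn_in and (it - burn_in) % thin == 0:
            samples.append(state.snapshot())
    return samples
```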

4 Dataset

The Media Frames Corpus (MFC; Card et al., 2015) consists of annotations for approximately 4,200 articles about immigration taken from 13 U.S. newspapers over the years 1980–2012. The annotations for these articles are in terms of a set of 15 general-purpose “framing dimensions” (such as Politics and Legality), developed to be broadly applicable to a variety of issues, and to be recognizable in text (by trained annotators). Each article has been annotated with a “primary frame” (the overall dominant aspect of immigration being emphasized), as well as an overall “tone” (pro, neutral, or anti), which is the extent to which a pro-immigration advocate would like to see the article in print, without implying any stance taken by the author.² The MFC contains at least two independent annotations for each article; agreement on the primary frame and tone was established through discussion in cases of initial disagreement. A complete list of these framing dimensions is given in the supplementary material.

² The MFC also contains more fine-grained annotations of spans of text which cue each of the framing dimensions, but we do not make use of those here.

In order to train our model on a larger collection of articles, we use the original corpus of articles from which the annotated articles in the MFC were drawn. This produces a corpus of approximately 37,000 articles about immigration; we train the persona model on this larger dataset, only using the smaller set for evaluation on a secondary task. Note that the MFC annotations are not used by our model; rather, we hypothesize that the personas it discovers may serve as features to help predict framing—this serves as one of our evaluations (§7).

5 Identifying Entities

The original focus of the DPM was on named characters in movies, which could be identified using named entity recognition and pronominal coreference (Bamman et al., 2013), or name matching for pre-defined characters (Bamman et al., 2014). Here, we are interested in applying our model to entities about which we assume no specific prior knowledge.

In order to include a broader set of entities, we preprocess the corpus and apply a series of filters. First, we obtain lemmas, part-of-speech tags, dependencies, coreference resolution, and named entities from the Stanford CoreNLP pipeline (Manning et al., 2014), as well as supersense tags from the AMALGrAM tagger (Schneider and Smith, 2015). For each document, we consider all tokens with an NN* or PRP part of speech as possible entities, partially clustered by coreference. We then merge all clusters (including singletons) within each document that share a non-pronominal mention word.

Next, we exclude all clusters lacking at least one mention classified as a person, organization, location, group, object, artifact, process, or act (by CoreNLP or AMALGrAM). From these, we extract ⟨w, r, e, i⟩ tuples using extraction patterns lightly adapted from Bamman et al. (2013). (The complete set of patterns is given in the supplementary material.) To further restrict the set of entities to those that have sufficient evidence, we construct a vocabulary for each of the three relations, and exclude words that appear fewer than three times in the corresponding vocabulary.³ We then apply one last filter to exclude entities that have fewer than three qualifying tuples across all mentions. From the dataset described in §4, we extract 128,655 entities, mentioned using 11,262 different mention words, with 575,910 tuples and 11,104 distinct ⟨r, w⟩ pairs.

³ We also exclude the lemma “say” as a stopword, as it is the most common verb in the corpus by an order of magnitude.
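The two count-based filters can be expressed compactly; a sketch assuming tuples are already available as (word, relation, entity_id, doc_id) and that the coreference-based entity clustering has been done:

```python
from collections import Counter

def filter_tuples(tuples, min_count=3):
    """Drop relation words seen fewer than min_count times in their
    relation's vocabulary, then drop entities left with fewer than
    min_count qualifying tuples across all mentions."""
    vocab = Counter((w, r) for (w, r, e, i) in tuples)
    kept = [(w, r, e, i) for (w, r, e, i) in tuples
            if vocab[(w, r)] >= min_count]
    per_entity = Counter((e, i) for (_, _, e, i) in kept)
    return [(w, r, e, i) for (w, r, e, i) in kept
            if per_entity[(e, i)] >= min_count]
```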

6 Exploratory Analysis

Here we discuss our model, as estimated on the corpus of 37,000 articles discussed in §4 with 50 personas and 100 topics; these values were not tuned. A cursory examination of topics shows that each tends to be a group of either verbs or attributes. Personas, on the other hand, blend topics to include all three relation types. The estimated Dirichlet hyperparameters are all ≪ 1, giving sparse (and hence easily scanned) distributions over personas, topics, and words.

Table 1 shows all 50 personas. For each persona p, we show (i) the mention words most strongly associated with p, and (ii) ⟨r, w⟩ pairs associated with the persona. (To save space, “I” denotes immigrant.) Recall that, like the Dirichlet persona model, our model says nothing about the mention words; they are not included as evidence during inference.⁴ Nonetheless, each persona is strongly associated with a sparse handful of mention words, and we find that labeling each persona by its most strongly associated mention word (excluding immigrant) is often sensible (these are capitalized in Table 1), though in some cases the relation words differentiate strongly (e.g., the group personas, IDs 17 and 18 in Table 1).

⁴ We did explore adding mention words as evidence, but they tended to dominate the relation tuples. Because our interest is in a richer set of framing devices than simply the words used to refer to people (and other entities), we consider here only the model based on the surrounding context.

The model finds expected participants (such as workers, political candidates, and refugees), but also more conceptual entities, such as laws, bills (IDs 3, 37), and the U.S.-Mexican border (ID 5), which looms large in the immigration debate. Some interesting distinctions are discovered, such as two of the worker personas, one high-skilled and residing legally (ID 48), the other illegal (ID 49).

Using the original publication dates of the articles, we can estimate the frequency of appearance of each persona within immigration coverage by summing the posterior distribution over personas for each entity mention, and plotting these frequencies across time. (Note that time metadata is not given to the model as evidence.) We find immediately that personas can signal events. Figure 2 shows these temporal trajectories for a small, selected set of personas. Although bills and laws are conceptually similar, and have similar trajectories from 1980 to 2005, they are strongly divergent in 2006 and 2010. These are particularly notable years for immigration policy, corresponding to the failed Comprehensive Immigration Reform Act of 2006 (Senate bill S.2611) and Arizona’s controversial anti-immigration laws from 2010.⁵ Refugees, by contrast, show a marked spike around the year 2000. Inspection showed this persona to be strongly tied to the case of Elian Gonzalez, which received a great deal of media attention in that year.
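A sketch of the aggregation behind these trajectories, assuming per-mention posterior vectors over personas grouped by publication year (this data layout is hypothetical):

```python
import numpy as np
from collections import defaultdict

def persona_trajectories(docs, n_personas):
    """Sum each entity mention's posterior distribution over personas
    by year, then normalize to per-year persona frequencies.
    `docs` iterates over (year, list-of-posterior-vectors) pairs."""
    totals = defaultdict(lambda: np.zeros(n_personas))
    for year, entity_posteriors in docs:
        for q in entity_posteriors:
            totals[year] += q
    years = sorted(totals)
    freq = np.stack([totals[y] / totals[y].sum() for y in years])
    return years, freq   # freq[t, p]: share of persona p in year years[t]
```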

The main advantage of the extended model over the DPM is being able to cluster articles by “casts.” During sampling, thousands of clusters are created (and mostly destroyed). Ultimately, our inference procedure settled on approximately 110 clusters, and we consider two examples. Figure 3 shows the temporal trajectories of the two clusters with the greatest representation of the refugee persona. Both show the characteristic spike around the year 2000. The top personas for these two clusters are given in Table 2.

⁵ Other notable events which appear to be represented include the Illegal Immigration Reform and Immigrant Responsibility Act of 1996, and the Secure Fence Act of 2006.


ID | Mention words | Relations
1 | AGENT police official authority | federal_m tell_p find_a arrest_a local_m tell_a
2 | ASYLUM crime refugee asylum seeker | political_m seek_p grant_p commit_p serious_m deny_p
3 | BILL law immigration reform measure | comprehensive_m pass_a pass_p make_a have_a support_p
4 | BOAT van crime document | criminal_m other_m have_p use_a use_p be_a
5 | BORDER border patrol border agent | mexican_m cross_p secure_p southern_m u.s.-mexico_m close_p
6 | BUSH official mcnary people I | have_a tell_a want_a tell_p former_m call_a
7 | CANDIDATE bush romney leader | republican_m presidential_m democratic_m have_a call_a support_a
8 | CARD document visa status | green_m new_m get_p temporary_m fake_m permanent_m
9 | CARD visa state document | consular_m federal_m have_a mexican_m receive_p get_p
10 | COMPANY country I state nation | have_a regional_m global_m rural_m take_a require_p
11 | COUNTRY people I citizen united states | american_m other_m enter_p have_a leave_p central_m
12 | COUPLE marriage people I class | gay_m bilingual_m same-sex_m have_a prime_m seasonal_m
13 | COURT lawsuit suit ruling | federal_m file_p rule_a civil_m file_a have_a
14 | EMPLOYER company people business | hire_a have_a many_m require_p employ_a local_m
15 | FENCE amendment law wall | real_m 14th_m virtual_m build_p be_a have_a
16 | GOVERNMENT court judge official | federal_m local_m have_a rule_a ask_p other_m
17 | GROUP deportation attack country | terrorist_m civil_m face_p armed_m islamic_m muslim_m
18 | GROUP I voter people bush | hispanic_m immigrant_m local_m many_m want_a have_a
19 | I ALIEN immigration people worker | illegal_m allow_p have_a legal_m undocumented_m live_a
20 | I ALIEN people criminal inmate | illegal_m criminal_m deport_p immigrant_m detain_p release_p
21 | I ALIEN worker immigration employer | illegal_m hire_p undocumented_m employ_p legal_m hire_a
22 | I ALIEN worker people immigration | illegal_m arrest_p undocumented_m arrest_a charge_p transport_p
23 | I CHILD worker people student | immigrant_m foreign-born_m have_a many_m come_a new_m
24 | I GROUP people population business | new_m immigrant_m other_m many_m asian_m have_a
25 | I GROUP program center city | new_m have_a first_m be_a other_m make_a
26 | I IMMIGRATION alien worker | illegal_m legal_m hire_p have_a allow_p undocumented_m
27 | I IMMIGRATION alien worker people | illegal_m legal_m have_a be_a come_a immigrant_m
28 | I JEWS refugee israel child | soviet_m jewish_m russian_m have_a vietnamese_m israeli_m
29 | I MAN alien refugee people | illegal_m chinese_m cuban_m arrest_p haitian_m find_p
30 | I PEOPLE child student worker | many_m young_m have_a illegal_m come_a be_a
31 | I PEOPLE country woman man | black_m muslim_m african_m have_a come_a korean_m
32 | I WORKER people citizen job | american_m new_m have_a mexican_m illegal_m many_m
33 | I WORKER resident student people | legal_m foreign_m permanent_m have_a allow_p skilled_m
34 | I WORKER student people child | undocumented_m illegal_m immigrant_m have_a allow_p live_a
35 | JOB I people immigration law | have_p have_a be_a take_p good_m make_a
36 | JOB study survey I labor | find_a new_m find_p show_a fill_p take_p
37 | LAW immigration law bill measure | new_m federal_m enforce_p require_a pass_p allow_a
38 | MAN I woman people haitians | deport_p have_a arrest_p hold_p release_p face_a
39 | MAN people agent official I | arrest_p charge_p other_m former_m have_a face_a
40 | MAN woman I people girl | tell_a kill_p have_a other_m young_m take_p
41 | PEOPLE I child man woman | have_a come_a live_a go_a tell_p work_a
42 | PROFILING violence abuse discrimination | racial_m domestic_m safe_m physical_m be_a affordable_m
43 | PROGRAM system law agency | new_m national_m federal_m create_p use_p special_m
44 | REFUGEE I boy people elian | cuban_m haitian_m chinese_m have_a allow_p return_p
45 | SCHOOL people I family english | have_a high_m see_a come_a go_a be_a
46 | SERVICE school care college | public_m medical_m provide_p deny_p receive_p attend_p
47 | TRAFFICKING rights group flight | human_m international_m commercial_m be_a have_a take_a
48 | WORKER I immigration student company | foreign_m legal_m skilled_m hire_p american_m have_a
49 | WORKER I people woman man | mexican_m immigrant_m undocumented_m migrant_m illegal_m
50 | YEAR program month income | fiscal_m last_m end_a next_m previous_m begin_a

Table 1: Personas with their associated mention words and relation tuples (a = agent, p = patient, m = modifier/attribute); “I” denotes “immigrant.”


Figure 2: Temporal patterns of the mentions of selected personas.

Figure 3: Temporal patterns of the two clusters with the greatest overall representation of the refugee persona.

Type A, which includes a story with the headline “Protesters vow to keep Elian in U.S.,” emphasizes political aspects, while type B (e.g., “Court says no to rights for refugees”) emphasizes legal aspects. Note that Political and Legality are two of the framing dimensions used in the MFC.

Do these persona-cast clusters relate to frames? For the five most common story clusters (which have no overlap with the two refugee story types), Figure 4 shows the number of annotated articles with each of the primary frames if we assign each article to its most likely cluster. The second and fifth clusters correlate particularly well with primary frames (Political and Crime, respectively). This is further reinforced by looking at the most frequent persona for each of these story clusters: candidate (ID 7) for the second, and immigrant (ID 22), characterized by illegal_m and arrest_p, for the fifth.

Refugee story cluster A
Frequency | Persona | ID
0.49 | REFUGEE immigrant boy | 44
0.10 | BUSH official mcnary | 6
0.06 | IMMIGRANT man alien | 29
0.05 | ASYLUM crime refugee | 2

Refugee story cluster B
Frequency | Persona | ID
0.29 | MAN immigrant woman | 38
0.23 | REFUGEE immigrant boy | 44
0.12 | COURT lawsuit suit | 13
0.10 | GOVERNMENT court judge | 16

Table 2: Truncated distribution over personas for the two clusters depicted in Figure 3. IDs index into Table 1.

Figure 4: Number of annotated articles in each of the five most frequent clusters, with colors showing the proportion of articles annotated with each primary frame.

7 Experiments: Personas and Framing

We evaluate personas as features for automatic analysis of framing and tone, as defined in the MFC (§4). Specifically, we build multi-class text classifiers (separately) for the primary frame and the tone of a news article, for which there are 15 and 3 classes, respectively. Because there are only a few thousand annotated articles, we applied 10-fold cross-validation to estimate performance.

Features are derived from our model by considering each persona and each story cluster as a potential feature. A document’s feature values for story types are the proportion of samples in which it was assigned to each cluster. Persona feature values are similarly derived from the proportion of samples in which each entity was assigned to each persona, with the persona values for each entity in each document summed into a single set of persona values per document.


Primary frame
Features: | MF | (W) | (W,P1) | (W,P2) | (W,P2,S)
Accuracy: | 0.174 | 0.529 | 0.537 | *0.540 | 0.537
# Features: | 0 | 3.9k | 3.5k | 3.5k | 2.8k

Tone
Accuracy: | 0.497 | 0.628 | 0.631 | 0.628 | 0.630
# Features: | 0 | 5.0k | 5.0k | 5.0k | 4.0k

Table 3: Evaluation using a direct comparison to a simple baseline. Each model uses the union of listed features. (W = unigrams and bigrams, P1 = personas from the DPM, P2 = personas from our model, S = story clusters; MF = always predict most frequent class.) * indicates a statistically significant difference compared to the (W) baseline (p < 0.05).

We did not use the topics (z) discovered by our model as features.
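A sketch of this feature construction for one document, assuming its posterior samples are available as (cluster assignment, per-entity persona assignments) pairs:

```python
import numpy as np

def document_features(doc_samples, n_clusters, n_personas):
    """Story-cluster features: proportion of samples assigning the document
    to each cluster. Persona features: per-entity assignment proportions,
    summed over the document's entities."""
    n = len(doc_samples)
    cluster_feats = np.zeros(n_clusters)
    persona_feats = np.zeros(n_personas)
    for cluster_id, entity_personas in doc_samples:
        cluster_feats[cluster_id] += 1.0 / n
        for p in entity_personas:
            persona_feats[p] += 1.0 / n
    return np.concatenate([cluster_feats, persona_feats])
```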

7.1 Experiment 1: Direct Comparison

For the first experiment, we train independent multi-class logistic regression classifiers for predicting primary frame and tone. We consider adding persona and/or story cluster features to baseline classifiers based only on unigrams and bigrams with binarized counts, a simple but robust baseline (Wang and Manning, 2012).⁶ In all cases, we use L1 regularization and use 5-fold cross-validation within each split’s training set to determine the strength of regularization. We then repeat this for each of the 10 folds, thereby producing one prediction (of primary frame and tone) for every annotated article. The results of this experiment are given in Table 3; for predicting the primary frame, classifiers that used persona and/or story cluster features achieve higher accuracy than the bag-of-words baseline (W); the classifier using personas from our model but not story clusters is significantly better than the baseline.⁷ The enhanced models are also more compact, on average, using fewer effective features. A benefit to predicting tone is also observed, but it did not reach statistical significance.

⁶ We also binarized the persona feature values.
⁷ Two-tailed McNemar’s test (p < 0.05).
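A minimal scikit-learn sketch of this setup; the feature matrix X and label vector y are assumed given, and the regularization grid is illustrative:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, cross_val_predict

def predict_labels(X, y):
    """One prediction per article via 10-fold cross-validation, with the
    L1 regularization strength tuned by 5-fold cross-validation inside
    each training split."""
    inner = GridSearchCV(
        LogisticRegression(penalty="l1", solver="liblinear"),
        param_grid={"C": np.logspace(-2, 2, 9)},   # illustrative grid
        cv=5,
    )
    return cross_val_predict(inner, X, y, cv=10)
```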

7.2 Experiment 2: Automatic Evaluation

Although bag-of-n-grams models are known to be a strong baseline for text classification, researchers familiar with the extensive catalogue of features offered by NLP will potentially see them as a straw man. We propose a new and more rigorous method of comparison, in which a wide range of features are offered to an automatic model selection algorithm for each of the prediction tasks, with the features to be evaluated withheld from the baseline.

Because no single combination of features and regularization strength is best for all situations, it is an empirical question which features are best for each task. We therefore make use of Bayesian optimization (Bayesopt) to make as many modeling decisions as possible (Pelikan, 2005; Snoek et al., 2012; Bergstra et al., 2015; Yogatama et al., 2015).

In particular, let F be the set of features that might be used as input to any text classification algorithm. Let f be a new feature that is being proposed. Allow the inclusion or exclusion of each feature in the feature set to be a hyperparameter to be optimized, along with any additional decisions such as input transformations (e.g., lowercasing) and feature transformations (e.g., normalization). Using an automatic model selection algorithm such as Bayesian optimization, allow the performance on the validation set to guide choices about all of these hyperparameters on each iteration, and set up two independent experiments.

For the first condition, A1, allow the algorithm access to all features in F. For the second, A2, allow the algorithm access to all features in F ∪ {f}. After R iterations of each, choose the best model or the best set of models from each of A1 and A2 (M1 and M2, respectively), based on performance on the validation set. Finally, compare the selected models in terms of performance on the test set (using an appropriate metric such as F1), and examine the features included in each of the best models. If f is a helpful feature, we should expect to see that (a) F1(M2) > F1(M1), and (b) f is included in the best model(s) found by A2.

If F1(M2) > F1(M1) but f is not included in the best models from A2, this suggests that the performance improvement may simply be a matter of chance, and there is no evidence that f is helpful. By contrast, if f is included in the best models, but F1(M2) is not significantly better than F1(M1), this suggests that f is offering some value, perhaps in a more compressed form of the useful signal from other features, but does not actually offer better performance.


Features: | (B) | (B,P1) | (B,P2) | (B,P2,S)
Primary frame | 0.566 | 0.568 | 0.568 | 0.567
Tone | 0.667 | 0.671 | 0.667 | 0.671

Table 4: Mean accuracy of the best three iterations from Bayesian optimization (chosen based on validation accuracy). (B = features from many NLP tools, P1 = personas from the DPM, P2 = personas from our model, S = story clusters.)

For this experiment, we use the tree-structured Parzen estimator for Bayesian optimization (Bergstra et al., 2015), with L1-regularized logistic regression as the underlying classifier, and set R = 40. In addition to the personas and story clusters identified by these models, we allow these classifiers access to a large set of features, including unigrams, bigrams, parts of speech, named entities, dependency tuples, ordinal sentiment values (Manning et al., 2014), multi-word expressions (Justeson and Katz, 1995), supersense tags (Schneider and Smith, 2015), Brown clusters (Brown et al., 1992), frame semantic features (Das et al., 2010), and topics produced by standard LDA (Blei et al., 2003). The inclusion or exclusion of each feature is determined automatically on each iteration, along with feature transformations (removal of rare words, lowercasing, and binary or normalized counts).
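A sketch of this protocol using the hyperopt library’s TPE implementation; `build_features` and `train_and_validate` are hypothetical helpers, and the search space shown is a small illustrative subset of the features listed above:

```python
from hyperopt import Trials, fmin, hp, tpe

# Feature-set inclusion and transformations are themselves hyperparameters.
space = {
    "use_unigrams": hp.choice("use_unigrams", [False, True]),
    "use_personas": hp.choice("use_personas", [False, True]),
    "use_story_clusters": hp.choice("use_story_clusters", [False, True]),
    "lowercase": hp.choice("lowercase", [False, True]),
    "reg_strength": hp.loguniform("reg_strength", -4, 4),
}

def objective(params):
    # Hypothetical helpers: assemble the chosen feature groups and return
    # validation-set error for an L1-regularized logistic regression.
    X_train, X_valid = build_features(params)
    return train_and_validate(X_train, X_valid, params["reg_strength"])

trials = Trials()
best = fmin(objective, space, algo=tpe.suggest, max_evals=40, trials=trials)
```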

The baseline, denoted “B,” offers all features except personas and story clusters to Bayesopt; we consider adding DPM personas, our model’s personas, and our model’s personas and story clusters. Table 4 shows test-set accuracy for each setup, averaged across the three best models returned by Bayesopt.

Using this more rigorous form of evaluation, approximately the same accuracy is obtained in all experimental conditions. However, we can still gain insight into which features are useful by examining those selected by the best models in each condition. For primary frame prediction, both personas and story clusters are included by the best models in every case where they have been offered as possible features, as are unigrams, dependency tuples, and semantic frames. Other commonly-selected features include bigrams and part-of-speech tags. For predicting tone, personas are only included by half of the best models, with the most common features being unigrams, bigrams, semantic frames, and Brown clusters. As expected, the best models in each condition obtain better performance than the models from experiment 1, thanks to the inclusion of additional features and transformations.

This secondary evaluation suggests that for this task, persona features are useful in predicting the primary frame, but are unable to offer improved performance over existing features, such as semantic frames. However, the fact that both personas and story clusters are included by all the best models for predicting the primary frame suggests that they are competitive with other features, and perhaps offer useful information in a more compact form.

8 Qualitative Evaluation

Prior to exposure to any output of our model, one of the co-authors on this paper (Gross, who has expertise in both framing and the immigration issue) prepared a list of personas he expected to frequently occur in American news coverage of immigration. Given the example of the “skilled immigrant,” he listed 22 additional named personas, along with a few examples of things they do, things done to them, and attributes.

The list he prepared includes several different characterizations of immigrants (low-skilled, unauthorized, legal, citizen children, undocumented children, refugees, naturalized citizens), non-immigrant personas (U.S. workers, smugglers, politicians, officials, border patrol, vigilantes), related pairs (pro/anti advocacy groups, employers/guest workers, criminals/victims), and a few more conceptual entities (the border, bills, executive actions). Of these, almost all are arguably represented in the personas we have discovered. However, there is rarely a perfect one-to-one mapping: predefined personas are sometimes merged (e.g., “the border” and “border patrols”) or split (e.g., legislation, employers, and various categories of immigrants). Personas which don’t emerge from our model include smugglers, guest workers, vigilantes, and victims of immigrant criminals. On the other hand, our model proposes far more non-person entities, such as ID cards, courts, companies, jobs, and programs.

These partial matchings between predefined personas and the results of our model are generally identifiable by comparing the names given to the predefined personas to the most commonly occurring mention words and attributes of our discovered personas. The attributes and action words given to the predefined personas are harder to evaluate, as many of them are rare (e.g., politicians “vacillate”) or compound phrases (e.g., low-skilled immigrants “do jobs Americans won’t do”) that tend to miss the more obvious properties captured by our model. For example, the employer persona captured by our model engages in actions like hire, employ, and pay. By contrast, the terms given for the predefined “business owners” persona are “lobby” and “rely on immigrant labor.” Our unsupervised discovery of this persona can clearly be matched to the predefined persona in this case, but doesn’t provide such fine-grained insight into how they might be characterized.

The best match between predefined and discovered personas is the U.S.-Mexican border. Of the words given for the predefined persona, almost all are more frequently associated with border than with any other discovered persona (“Mexican-U.S.,” “lawless,” “porous,” “unprotected,” “guarded,” and “militarized”). The most commonly associated words discovered by our model that are missing from the predefined description include crossed, secured, southern, and closed.

While this qualitative evaluation helps to demonstrate the face validity of our model, it would be better to have a more comprehensive set of predefined personas, based on input from additional experts. Moreover, it also illustrates the challenge of trying to match the output of an unsupervised model to expected results. Not only is some merging and splitting of categories inevitable, but there was also a mismatch in this case in the types of entities to be described (people as opposed to more abstract entities), and in the ways of describing them (rare but specific words as opposed to more generic but potentially obvious terms).

9 Related Work

Much NLP has focused on identifying entities or events (Ratinov and Roth, 2009; Ritter et al., 2012), analyzing schemas or narrative events in terms of characters (Chambers and Jurafsky, 2009), inferring the relationships between entities (O’Connor et al., 2013; Iyyer et al., 2016), and predicting personality types from text (Flekova and Gurevych, 2015). Variants of the DPM have also been applied to characters in novels (Bamman et al., 2014).

Previous work on sentiment, stance, and opinion mining has focused on recognizing stance or political sentiment in online ideological debates (Somasundaran and Wiebe, 2010; Hasan and Ng, 2014; Sridhar et al., 2015) and other forms of social media (O’Connor et al., 2010; Agarwal et al., 2011), and recently through the lens of connotation frames (Rashkin et al., 2016). Opinion mining and sentiment analysis are the subject of ongoing research in NLP and have long served as test platforms for new methodologies (Socher et al., 2013; Irsoy and Cardie, 2014; Tai et al., 2015).

Framing is arguably one of the most important concepts in the social sciences, with roots in sociology, psychology, and mass communication (Gitlin, 1980; Benford and Snow, 2000; D’Angelo and Kuypers, 2010); the scope and relevance of framing are widely debated (Rees et al., 2001), with many authors applying the concept of framing to analyzing documents on particular issues (Baumgartner et al., 2008; Berinsky and Kinder, 2006).

10 Conclusion

We have extended models for discovering latent personas to simultaneously cluster documents by their “casts” of personas. Our exploration of the model’s inferences and their incorporation into a challenging text analysis task—characterizing coarse-grained framing in news articles—demonstrate that personas are a useful abstraction when applying NLP to social-scientific inquiry. Finally, we introduced a Bayesian optimization approach to rigorously assess the usefulness of new features in machine learning tasks.

Acknowledgments

The authors thank members of the ARK group and anonymous reviewers for helpful feedback on this work. This research was made possible by a Natural Sciences and Engineering Research Council of Canada Postgraduate Scholarship (to D.C.), a Bloomberg Data Science Research Grant (to J.H.G., A.E.B., and N.A.S.), and a University of Washington Innovation Award (to N.A.S.).


References

Apoorv Agarwal, Boyi Xie, Ilia Vovsha, Owen Rambow, and Rebecca J. Passonneau. 2011. Sentiment analysis of Twitter data. In Proc. of the Workshop on Languages in Social Media.

D. Aldous. 1985. Exchangeability and related topics. In École d’Été St Flour 1983, pages 1–198. Springer-Verlag.

Charles E. Antoniak. 1974. Mixtures of Dirichlet processes with applications to Bayesian nonparametric problems. Annals of Statistics, 2(6), November.

David Bamman, Brendan O’Connor, and Noah A. Smith. 2013. Learning latent personas of film characters. In Proc. of ACL.

David Bamman, Ted Underwood, and Noah A. Smith. 2014. A Bayesian mixed effects model of literary character. In Proc. of ACL.

Eric Baumer, Elisha Elovic, Ying Qin, Francesca Polletta, and Geri Gay. 2015. Testing and comparing computational approaches for identifying the language of framing in political news. In Proc. of NAACL.

Frank R. Baumgartner, Suzanna L. De Boef, and Amber E. Boydstun. 2008. The Decline of the Death Penalty and the Discovery of Innocence. Cambridge University Press.

Robert D. Benford and David A. Snow. 2000. Framing processes and social movements: An overview and assessment. Annual Review of Sociology, 26:611–639.

James Bergstra, Brent Komer, Chris Eliasmith, Dan Yamins, and David D. Cox. 2015. Hyperopt: A Python library for model selection and hyperparameter optimization. Computational Science and Discovery, 8(1).

Adam J. Berinsky and Donald R. Kinder. 2006. Making sense of issues through media frames: Understanding the Kosovo crisis. Journal of Politics, 68(3):640–656.

David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2003. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022.

Peter F. Brown, Peter V. deSouza, Robert L. Mercer, Vincent J. Della Pietra, and Jenifer C. Lai. 1992. Class-based n-gram models of natural language. Computational Linguistics, 18(4):467–479.

Dallas Card, Amber E. Boydstun, Justin H. Gross, Philip Resnik, and Noah A. Smith. 2015. The Media Frames Corpus: Annotations of frames across issues. In Proc. of ACL.

Nathanael Chambers and Dan Jurafsky. 2009. Unsupervised learning of narrative schemas and their participants. In Proc. of ACL.

Eunsol Choi, Chenhao Tan, Lillian Lee, Cristian Danescu-Niculescu-Mizil, and Jennifer Spindel. 2012. Hedge detection as a lens on framing in the GMO debates: A position paper. In Proc. of the Workshop on Extra-Propositional Aspects of Meaning in Computational Linguistics, pages 70–79.

Paul D’Angelo and Jim A. Kuypers. 2010. Doing News Framing Analysis. Routledge.

Dipanjan Das, Nathan Schneider, Desai Chen, and Noah A. Smith. 2010. Probabilistic frame-semantic parsing. In Proc. of NAACL.

Robert M. Entman. 2007. Framing bias: Media in the distribution of power. Journal of Communication, 57(1):163–173.

Michael D. Escobar and Mike West. 1994. Bayesian density estimation and inference using mixtures. Journal of the American Statistical Association, 90:577–588.

Ronen Feldman. 2013. Techniques and applications for sentiment analysis. Communications of the ACM, 56(4):82–89.

Lucie Flekova and Iryna Gurevych. 2015. Personality profiling of fictional characters using sense-level links between lexical resources. In Proc. of EMNLP.

Matthew Gentzkow and Jesse M. Shapiro. 2010. What drives media slant? Evidence from U.S. daily newspapers. Econometrica, 78(1):35–71.

Todd Gitlin. 1980. The Whole World Is Watching. University of California Press.

Stephan Greene and Philip Resnik. 2009. More than words: Syntactic packaging and implicit sentiment. In Proc. of ACL.

Glenn Greenwald. 2014. No Place to Hide. Picador.

Eric Hardisty, Jordan L. Boyd-Graber, and Philip Resnik. 2010. Modeling perspective using adaptor grammars. In Proc. of EMNLP.

Kazi Saidul Hasan and Vincent Ng. 2013. Stance classification of ideological debates: Data, models, features, and constraints. In Proc. of IJCNLP.

Kazi Saidul Hasan and Vincent Ng. 2014. Why are you taking this stance? Identifying and classifying reasons in ideological debates. In Proc. of EMNLP.

Edward S. Herman and Noam Chomsky. 1988. Manufacturing Consent. Vintage.

Ozan Irsoy and Claire Cardie. 2014. Opinion mining with deep recurrent neural networks. In Proc. of EMNLP.

Mohit Iyyer, Peter Enns, Jordan L. Boyd-Graber, and Philip Resnik. 2014. Political ideology detection using recursive neural networks. In Proc. of ACL.

Mohit Iyyer, Anupam Guha, Snigdha Chaturvedi, Jordan Boyd-Graber, and Hal Daume III. 2016. Feuding families and former friends: Unsupervised learning for dynamic fictional relationships. In Proc. of NAACL.

J. Justeson and S. Katz. 1995. Technical terminology: Some linguistic properties and an algorithm for identification in text. Natural Language Engineering.

Jure Leskovec, Lars Backstrom, and Jon Kleinberg. 2009. Meme-tracking and the dynamics of the news cycle. In Proc. of KDD.

Wei-Hao Lin, Theresa Wilson, Janyce Wiebe, and Alexander Hauptmann. 2006. Which side are you on? Identifying perspectives at the document and sentence levels. In Proc. of CoNLL.

Christopher D. Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proc. of ACL.

Viet-An Nguyen, Jordan Boyd-Graber, and Philip Resnik. 2013. Lexical and hierarchical topic regression. In Proc. of NIPS.

Viet-An Nguyen, Jordan Boyd-Graber, Philip Resnik, and Kristina Miler. 2015. Tea Party in the House: A hierarchical ideal point topic model and its application to Republican legislators in the 112th Congress. In Proc. of ACL.

Vlad Niculae, Caroline Suen, Justine Zhang, Cristian Danescu-Niculescu-Mizil, and Jure Leskovec. 2015. QUOTUS: The structure of political media coverage as revealed by quoting patterns. In Proc. of WWW.

Brendan T. O’Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From tweets to polls: Linking text sentiment to public opinion time series. In Proc. of ICWSM.

Brendan O’Connor, Brandon M. Stewart, and Noah A. Smith. 2013. Learning to extract international relations from political context. In Proc. of ACL.

Zhongdang Pan and Gerald M. Kosicki. 1993. Framing analysis: An approach to news discourse. Political Communication, 10(1):55–75.

Bo Pang and Lillian Lee. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval, 2(1-2).

M. Pelikan. 2005. Bayesian optimization algorithm. In Hierarchical Bayesian Optimization Algorithm, pages 31–48. Springer.

Hannah Rashkin, Sameer Singh, and Yejin Choi. 2016. Connotation frames: A data-driven investigation. In Proc. of ACL.

Lev Ratinov and Dan Roth. 2009. Design challenges and misconceptions in named entity recognition. In Proc. of CoNLL.

Stephen D. Rees, Oscar H. Gandy Jr., and August E. Grant, editors. 2001. Framing Public Life. Routledge.

Alan Ritter, Mausam, Oren Etzioni, and Sam Clark. 2012. Open domain event extraction from Twitter. In Proc. of KDD.

Anne Schneider and Helen Ingram. 1993. Social construction of target populations: Implications for politics and policy. The American Political Science Review, 87(2):334–347.

Nathan Schneider and Noah A. Smith. 2015. A corpus and model integrating multiword expressions and supersenses. In Proc. of ACL.

Sameer Singh, Amarnag Subramanya, Fernando Pereira, and Andrew McCallum. 2012. Wikilinks: A large-scale cross-document coreference corpus labeled via links to Wikipedia. Technical Report UM-CS-2012-015, University of Massachusetts, Amherst.

David A. Smith, Ryan Cordell, and Elizabeth Maddock Dillon. 2013. Infectious texts: Modeling text reuse in nineteenth-century newspapers. In Proc. of the IEEE International Conference on Big Data.

Jasper Snoek, Hugo Larochelle, and Ryan P. Adams. 2012. Practical Bayesian optimization of machine learning algorithms. In Proc. of NIPS.

Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Ng, and Christopher Potts. 2013. Recursive deep models for semantic compositionality over a sentiment treebank. In Proc. of EMNLP.

Swapna Somasundaran and Janyce Wiebe. 2010. Recognizing stances in ideological on-line debates. In Proc. of the Workshop on Computational Approaches to Analysis and Generation of Emotion in Text.

Dhanya Sridhar, James Foulds, Bert Huang, Lise Getoor, and Marilyn Walker. 2015. Joint models of disagreement and stance in online debate. In Proc. of ACL.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. In Proc. of ACL.

Oren Tsur, Dan Calacci, and David Lazer. 2015. Frame of mind: Using statistical models for detection of framing and agenda setting campaigns. In Proc. of ACL.

Baldwin Van Gorp. 2010. Strategies to take subjectivity out of framing analysis. In Paul D’Angelo and Jim A. Kuypers, editors, Doing News Framing Analysis, chapter 4, pages 84–109. Routledge.

Marilyn A. Walker, Pranav Anand, Robert Abbott, and Ricky Grant. 2012. Stance classification using dialogic properties of persuasion. In Proc. of NAACL.

Sida Wang and Christopher D. Manning. 2012. Baselines and bigrams: Simple, good sentiment and topic classification. In Proc. of ACL.

Dani Yogatama, Lingpeng Kong, and Noah A. Smith. 2015. Bayesian optimization of text representations. In Proc. of EMNLP.

Supplement to “Analyzing Framing through the Casts of Characters in the News”

Dallas Card1 Justin H. Gross2 Amber E. Boydstun3 Noah A. Smith4

1 School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, USA
2 Department of Political Science, University of Massachusetts, Amherst, MA 01003, USA
3 Department of Political Science, University of California, Davis, CA 95616, USA
4 Computer Science & Engineering, University of Washington, Seattle, WA 98195, USA

[email protected] [email protected] [email protected] [email protected]

A Framing Dimensions

The 15 framing dimensions defined in the Media Frames Corpus are reproduced in Figure 1.

Economic: costs, benefits, or other financial implications
Capacity and resources: availability of physical, human, or financial resources, and capacity of current systems
Morality: religious or ethical implications
Fairness and equality: balance or distribution of rights, responsibilities, and resources
Legality, constitutionality and jurisprudence: rights, freedoms, and authority of individuals, corporations, and government
Policy prescription and evaluation: discussion of specific policies aimed at addressing problems
Crime and punishment: effectiveness and implications of laws and their enforcement
Security and defense: threats to the welfare of the individual, community, or nation
Health and safety: health care, sanitation, public safety
Quality of life: threats and opportunities for the individual’s wealth, happiness, and well-being
Cultural identity: traditions, customs, or values of a social group in relation to a policy issue
Public opinion: attitudes and opinions of the general public, including polling and demographics
Political: considerations related to politics and politicians, including lobbying, elections, and attempts to sway voters
External regulation and reputation: international reputation or foreign policy of the U.S.
Other: any coherent group of frames not covered by the above categories

Figure 1: Framing dimensions used in the MFC.

B Model Details

B.1 Dirichlet Process Mixture Model

There are a number of equivalent formulations of Dirichlet process mixture models. Here we present the formulation based on the stick-breaking process. According to this perspective, each mixture component is drawn from an (infinite) set of mixture components (equivalently, clusters), each of which is drawn from a base measure, H. The conditional probability of a cluster assignment is distributed according to an (infinite) multinomial distribution, generated according to the stick-breaking process, with hyperparameter λ. In particular,

    π′_k ∼ Beta(1, λ),  k = 1, 2, …                          (1)
    π_k = π′_k ∏_{l=1}^{k−1} (1 − π′_l)                       (2)
    s_i ∼ π                                                   (3)
    θ_{s_i} ∼ H                                               (4)

In our model, we take H to be a symmetric Dirichlet distribution with hyperparameter α. Given a cluster assignment for the ith document, each entity’s persona is then drawn according to p_e ∼ θ_{s_i}, where s_i indexes the cluster assignment of the ith document.

B.2 Collapsed Gibbs Sampling

The probability of a document being assigned to an existing cluster (mixture component) is proportional to the number of documents already assigned to that cluster times the likelihood of the document’s current personas being generated from that cluster’s distribution over personas (θ_{s_i}). The probability of the document being assigned to a new cluster is proportional to λ times the likelihood of the document’s personas being generated from a new draw from the base distribution. Integrating out θ and π gives:

    p(s_i = s′ | s_{−i}, p, α, λ) ∝ n^{(−i)}_{s′,∗} × f(s′)                   (5)

    p(s_i = s_new | s_{−i}, p, α, λ) ∝ λ × f(s_new)                           (6)

    f(s) = ∏_{j=1}^{J} [α + n^{(−i)}_{s,p_j} + Σ_{j′=1}^{j−1} I{p_{j′} = p_j}] / [Pα + n^{(−i)}_{s,∗} + (j − 1)]      (7)

Here, s′ is an existing cluster, j ranges over the J entities in document i, and p_j is the persona of the jth entity. n^{(−i)}_{s,p_j} is the number of entities in documents of type s with persona p_j, excluding those in i, n^{(−i)}_{s,∗} is the total number of entities in this set, and I{·} is the indicator function.
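A sketch of the cluster-resampling step implied by Eqs. 5–7, assuming count arrays maintained by the sampler with the current document’s contributions already removed; separate names are used for the document counts in Eq. 5 and the entity counts in f(s):

```python
import numpy as np

def cluster_log_weights(doc_personas, docs_per_cluster, n_sp, n_s,
                        alpha, lam, P):
    """Unnormalized log-weights for reassigning one document's cluster.
    docs_per_cluster[s]: documents currently in cluster s;
    n_sp[s, p]: entities with persona p in cluster-s documents;
    n_s[s]: total entities in cluster-s documents."""
    S = len(docs_per_cluster)
    log_w = np.empty(S + 1)                    # last slot: a brand-new cluster
    for s in range(S + 1):
        is_new = (s == S)
        log_w[s] = np.log(lam if is_new else docs_per_cluster[s])
        seen = np.zeros(P)                     # personas placed so far in this doc
        for j, p in enumerate(doc_personas):   # sequential terms of f(s)
            num = alpha + (0.0 if is_new else n_sp[s, p]) + seen[p]
            den = P * alpha + (0.0 if is_new else n_s[s]) + j
            log_w[s] += np.log(num) - np.log(den)
            seen[p] += 1
    return log_w   # exponentiate and normalize to sample s_i
```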

The equation for sampling personas is similar, and can be shown to be

    p(p_e = p | p_{−e}, z, s_e, α, β) ∝ (α + n^{(−e)}_{s_e,p}) × ∏_{r=1}^{R} ∏_{t=1}^{T_{e,r}} [β + n^{(−e)}_{p,r,k_t} + Σ_{t′=1}^{t−1} I{k_{t′} = k_t}] / [Kβ + n^{(−e)}_{p,r,∗} + (t − 1)]      (8)

where n^{(−e)}_{s_e,p} is the number of entities with persona p in documents with cluster s_e, excluding entity e. R is the number of categories of relations (agent, patient, attribute), T_{e,r} is the number of tuples for entity e with relation r, k_t is the topic of the tth tuple in T_{e,r}, n^{(−e)}_{p,r,k_t} is the number of tuples with relation r and topic k_t for entities with persona p, excluding entity e, and n^{(−e)}_{p,r,∗} are these counts summed over topics.

The equation for sampling the topic of a tuple attached to entity e is:

    p(z_t = k | z_{−t}, p, w, r, β, γ) ∝ (β + n^{(−t)}_{p_e,r_t,k}) × (γ + n^{(−t)}_{k,w_t}) / (Vγ + n^{(−t)}_{k,∗})      (9)

where V is the size of the vocabulary, n^{(−t)}_{p_e,r_t,k} is the number of tuples with relation r_t and topic k attached to entities with persona p_e, excluding tuple t, n^{(−t)}_{k,w_t} is the number of tuples with w_t assigned to topic k, excluding t, and n^{(−t)}_{k,∗} is the sum of these counts across topics.
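The topic update of Eq. 9 vectorizes naturally over the K topics; a sketch, again assuming count arrays with the tuple’s own contribution already decremented:

```python
import numpy as np

def topic_weights(p_e, r_t, w_t, n_prk, n_kw, n_k, beta, gamma, V):
    """Unnormalized weights over topics for resampling one tuple's topic.
    n_prk[p, r, k]: tuples with relation r and topic k under persona p;
    n_kw[k, w]: tuples with word w assigned to topic k; n_k[k]: row sums."""
    return (beta + n_prk[p_e, r_t, :]) * (gamma + n_kw[:, w_t]) / (V * gamma + n_k)
```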

C Extraction Patterns

The extraction patterns used to extract ⟨w, r⟩ tuples from dependency trees are given in Table 1.

Relation type | Neighbor | Arc type | POS
Attribute | parent | nsubj | JJ
Attribute | parent | nsubj | NN*
Attribute | child | amod | JJ
Agent | parent | agent | VB*
Agent | parent | nsubj | VB*
Agent | child | acl | VB*
Patient | parent | dobj | VB*
Patient | parent | nsubjpass | VB*
Patient | parent | iobj | VB*

Table 1: Extraction patterns.
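A sketch of applying these patterns to a single dependency edge; the direction/deprel/POS encoding is an assumption about how the parse is represented, and a trailing '*' in a pattern's POS is treated as a prefix wildcard:

```python
PATTERNS = [
    # (relation type, neighbor, arc type, POS)
    ("attribute", "parent", "nsubj", "JJ"),
    ("attribute", "parent", "nsubj", "NN*"),
    ("attribute", "child", "amod", "JJ"),
    ("agent", "parent", "agent", "VB*"),
    ("agent", "parent", "nsubj", "VB*"),
    ("agent", "child", "acl", "VB*"),
    ("patient", "parent", "dobj", "VB*"),
    ("patient", "parent", "nsubjpass", "VB*"),
    ("patient", "parent", "iobj", "VB*"),
]

def relation_type(neighbor, deprel, neighbor_pos):
    """Return the relation type for one dependency edge, or None if no
    pattern fires. `neighbor` is "parent" or "child" relative to the
    entity mention; `neighbor_pos` is the neighbor's POS tag."""
    for rtype, nbr, rel, pos in PATTERNS:
        if nbr != neighbor or rel != deprel:
            continue
        if pos.endswith("*"):
            if neighbor_pos.startswith(pos[:-1]):
                return rtype
        elif neighbor_pos == pos:
            return rtype
    return None
```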