Syntactic, Semantic, and Topics: The Cognitive Framework of Fake News
Leah C. Windsor, Research Assistant Professor, Institute for Intelligent Systems, The University of Memphis
Zhiqiang “Carl” Cai, Research Assistant Professor, Institute for Intelligent Systems, The University of Memphis
James Grayson Cupit, Junior Software Developer, Institute for Intelligent Systems, The University of Memphis
Abstract
The problem with detecting fake news from categories such as bias, conspiracy, hate, fake science, satire, state-run media, and bullshit is that these types of information may appear similar to information coming from reputable news sources. Further, current computational approaches often do no better than chance or human ratings at distinguishing fake from real news, in the aggregate. Some researchers have suggested that fake news erodes the foundations of democracy by undermining the role of legitimate journalists reporting accurate information, causing citizens to form erroneous conclusions based on inaccurate information about important scientific, social, and political issues. Using a computational linguistics approach to analyzing news, we can identify features of syntax, sentiment, and topical variation that distinguish fake from real news. The benefit of doing this lies in potential downstream automated applications, such as browser extensions that alert users to potentially disreputable or questionable sources or articles, as well as tools that track emerging trends in fake news. Our paper identifies the syntactic, sentiment, and topical differences between fake and real news using a Kaggle fake news corpus and a proprietary verified news corpus. In this study, we analyzed a corpus of 12,999 posts downloaded from 244 websites identified as “bullshit” by BS Detector and compared them with 6,079 real news articles downloaded from six high-reputation news agency websites.
Wikileaks Exposes Clinton Satanic Ritual, FBI Calls Hillary the Antichrist (Fake News Headline)
State Finds 30 Deleted Clinton Emails On Benghazi (Real News Headline)
Introduction
The term “fake news” refers to multiple phenomena, including the deliberate spread of false
information, satire, outdated/revived content, hoaxes, clickbait, propaganda, and disinformation.
The key problem with fake news is determining truth from fiction, a form of deception detection;
readers interact with both headlines and the body of text to discern the veracity of the source. At
first glance, fake news may appear similar to news from reputable outlets, both stylistically and
linguistically. In fact, many computational approaches do no better than chance or human ratings
at distinguishing fake from real news, in the aggregate (1–3). Yet, as this paper demonstrates,
syntactic, sentiment, and topical features
provide theoretically meaningful distinctions between fake and real news, namely that fake news
aligns with cognitive frameworks for shallow information processing. Extant research has
examined fake news from various angles, including social media engagement from sources like
Facebook and Twitter (4). We demonstrate that a computational linguistics approach helps
identify language features about the syntax, sentiment, and topical variation that distinguish fake
from real news using both headlines and full article text, providing useful information about how
audiences process news sources in the modern media era.
Our paper identifies the syntactic, sentiment, and topical differences between fake and real
news headlines and full articles using a Kaggle fake news corpus and a proprietary verified news
corpus. We analyze a corpus of 12,999 posts downloaded from 244 websites identified as “bullshit”
by BS Detector and compare them with 6,079 real news articles downloaded from six high-reputation
news agency websites, including the Wall Street Journal, CNN, Fox News, Reuters, The Economist,
MSNBC, and the Washington Post. We examine the linguistic features of fake and real news using
syntactic, sentiment, and topic modeling methods as well as a long short-term memory (LSTM)
neural network approach, and we conduct our analysis in four parts: fake news headlines; fake
news stories; real news headlines; and real news stories. There are three key takeaways from our
findings: first, fake and real news use vastly different syntactic patterns that illuminate the
cognitive framework behind the route to persuasion to which susceptible media consumers respond;
second, fake and real news differ substantially in their sentiment categories; and third, topics
shift over time and between types of fake news, with the largest month-to-month change occurring
in the month of the 2016 election.
Our paper proceeds as follows: first we briefly discuss some of the relevant literature and
theoretical implications; we then describe our data generating process (DGP) and methods; next,
we present the results of our empirical models for each area (syntax, sentiment, topics); finally,
we discuss and provide context for our findings as well as future applications for this workflow.
Real Approaches to Analyzing Fake News
Fake News Typologies
Recent studies have demonstrated the social and political hazards related to the rise and spread of
fake news. The term ‘fake news’ refers to multiple phenomena, including the deliberate spread of
false information, satire, outdated/revived content, hoaxes, clickbait, propaganda, and
disinformation. Volkova et al. use several linguistic measures to distinguish between these
types of suspicious news items (5), finding that adding syntax and grammar features does not
improve the predictive value of their models, but that linguistic and social interaction features do
improve classification among the four types of suspicious news stories they investigate
(satire, hoaxes, clickbait, and propaganda). Fake news is more viral than real news, and it
presents more novel information, piquing the curiosity of readers (6). In related work, Rashkin et al.
use LIWC (Linguistic Inquiry and Word Count) dictionaries and subjective lexicons to classify
news items as propaganda, satire, or hoax (2). Their results demonstrate that exaggerating terms
such as superlatives, subjectives, and adjectives all appear more frequently in fake news items.
These studies concur that the linguistic features of fake and real news differ in substantive
ways. Amid rising political polarization and partisanship fueled by the mainstream media,
fake news sources reach a receptive audience that diffuses their content farther and faster than
messages from reputable media (6).
Recently, scholars have pioneered automated fake news detection systems that map
the diffusion pattern of ‘likes’ and ‘shares’ across social media platforms, while social media
networks such as Facebook have crowd-sourced the problem of identifying fake news (7,8). The
key problem with fake news is determining truth from fiction, a task akin to deception detection
(Hauch, Sporer, Michael, and Meissner, 2014; Mihalcea and Strapparava, 2009; Rubin and
Conroy, 2011). Readers interact with both headlines and the body of text to discern the veracity
of the source, taking cues from both the content and stylistic elements, such as punctuation,
concrete nouns, and emotional tone. Scholars are exploring many paths to help distinguish
between fake and real news, including automatic hoax detection systems (7). Some fake news
detectors, such as BS Detector, Fake News Alert, and Politifact, rely on human raters, whereas FiB
and Stop-the-Bullshit (9) use automated tools built for social media that are not available for
the Internet at large (7,10). Facebook’s crowd-sourced solutions to the problem of identifying
fake news include verified news sites (akin to Twitter’s user verification strategy), separating
‘shares’ from personal information, time delays on ‘reshares’, a partnership with Snopes, and
headline and content analysis (11,12).
Permutations of fake news are found in media worldwide and tap into a longstanding
political tradition of distracting domestic audiences using propaganda.¹ Recent research on the
“50c army” in China shows that social media posts serve to distract and redirect public narrative
during times of crisis or negative publicity around socially significant events, which may also be
the goal of fake news propagators more broadly (13). Crisis propaganda also takes the form of the
“rally ‘round the flag” effect (14–16). Crisis propaganda utilizes the media to foster support for
the leader and to facilitate a sense of unity. In China during times of crisis, the Internet is flooded
with an array of positive and distracting messages, a phenomenon that runs contrary to the standard
narrative of media censorship (13). Rather, the contrived social media posts draw attention away
from an undesirable event or even undermine the credibility of democratic processes, such as
human rights protests in China or the 2016 American presidential election (17). Similar to
propaganda, populism is resurgent across Latin America (18) and throughout Europe (19–21).
Yet broadly speaking, even in democracies citizens have winnowed their news sources, in
part because the consolidation of media markets has reduced diversity, and in part because of the
rise of the Internet and increasingly individualized and personalized news consumption (22). Recent
scholarship found that cable media accounts for a substantial portion of recent partisan
polarization in the United States (23). Citizens have comparatively less exposure to a variety of
perspectives than previous generations, who relied on common news sources read or viewed by
people across the political spectrum (24); they trust the media and government less overall (25),
and they increasingly engage only with news items that reinforce their existing beliefs, a cocktail of
self-selection bias and cognitive dissonance (26).
¹ Table 3 in the Appendix provides the definitions used by Sieradski 2016 (9). We use these classifications in our analysis.
Hardwired for Soft News?
Cognitive Framework
Humans have a demonstrably difficult time distinguishing between fake and real news, which
compromises their ability to make informed decisions about a wide array of issues, like
candidates and elections (8), as well as issues that straddle the public-private spheres such as
vaccinations (27). Fuzzy trace theory helps to account for humans’ unreliability in distinguishing
between real and fake news. As Broniatowski notes, when audiences retrieve rote, or verbatim,
information, their reasoning is inhibited compared with when they retrieve gist information, which
encourages reasoning. Reyna (2012) writes: “Verbatim memory is memory for surface form,
for example, memory representations of exact words, numbers, and pictures. Verbatim memory
is a symbolic, mental representation of the stimulus, not the stimulus itself. Gist memory is
memory for essential meaning, the ‘substance’ of information irrespective of exact words,
numbers, or pictures. Hence, gist is a symbolic, mental representation of the stimulus that
captures meaning” (28). Fake news tends to rely more on messages that evoke gist and use
emotional cues rather than facts to convey information. Additionally, gist representations often
correspond to the peripheral route to persuasion that appeals more to emotion than logic, whereas
verbatim representations tend toward the central route that relies more on facts and complex
explanations or descriptions (29,30).
Citizens use these heuristic shortcuts and emotional connections to make decisions about
leaders and issues (31,32), and the attention-grabbing headlines can vary in content from the
articles they summarize (33). Research has demonstrated that some voters disregard “hard”
media sources in favor of informal news sources, “including infotainment”; reliance on news
from outside the mainstream media, coupled with the castigation of this genre, follows the logic of
low-information rationality (24,34,35). In other words, voters are cognitive misers: they seek
information from familiar and easy sources, and they integrate information selectively into their
worldview. Worldview itself appears to be hardwired, as evidenced by findings on support for
authoritarianism in the American National Election Studies (36,37). As Lakoff notes, partisans
conceptualize the nature of problems broadly speaking, and specific political problems
themselves, in vastly different ways (38,39). The notion of thought shaping language, and
language shaping thought has implications for how citizens classify, integrate, and/or disregard
information (40–42), especially in the era of ubiquitous fake news.
Syntax and Political Language
The syntactic, sentiment, and topical characteristics of text influence an audience’s receptivity to
an idea. Some audiences receive information best when senders use straightforward,
uncomplicated syntax with abstract concepts and repetitive key words, within a standard
narrative framework. Others receive information best when presented with complex phrasing and
concrete concepts delivered within a more expository, cohesive framework. These concepts are
operationalized along five syntactic constructs that shape how audiences process and send
information: syntactic simplicity; word concreteness; narrativity; deep cohesion; and referential
cohesion (43). Syntactic simplicity refers to the complexity or simplicity of sentence phrasing.
Left-embedded sentences, i.e., those with dependent clauses before the main subject and verb
(like this sentence) are syntactically complex, require greater focus, and demand a greater
cognitive workload for the listener. Simple syntax generally follows the SVO (subject-verb-object)
framework and is easier on the receiver.
Word concreteness refers to how concrete or abstract the text base is. Concrete words are nouns
that have real-world referents, such as car, boat, or chair. Abstract words include concepts like
hope, fear, and security. Simple syntax and abstract concepts often trend together, especially in
persuasive populist rhetoric. Narrativity refers to the narrative arc, where the information is
presented in a storylike fashion with an introduction, rising action, and resolution. The opposite
is expository presentation whereby the sender conveys a litany of information, often organized
thematically or conceptually. A text has deep cohesion when the components of the discourse are
connected by underlying concepts. On the other hand, referential cohesion refers to more
localized co-referents, often sentence-to-sentence or through repetition of a key term. Referential
and deep cohesion are often inversely related; whereas the former is more locally cohesive, the
latter is more globally cohesive. A speaker with high referential cohesion is often more
“quotable,” producing useful soundbites for short media highlights; discourse with greater deep
cohesion often requires summarizing to convey the main point.
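To make referential cohesion concrete, the sketch below computes a toy proxy: the share of adjacent sentence pairs that repeat at least one content word. It is a hypothetical illustration in the spirit of the adjacent-sentence overlap indices found in tools such as Coh-Metrix, not the measure used in this paper. The naive punctuation-based sentence splitter, the small stopword list, and the function names are all assumptions.

```python
import re

# Hypothetical stopword list; a real analysis would use a fuller list or a POS tagger.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "on", "is", "was", "it"}

def content_words(sentence: str) -> set[str]:
    """Lowercased word tokens minus stopwords (a crude stand-in for nouns)."""
    return {w for w in re.findall(r"[a-z']+", sentence.lower()) if w not in STOPWORDS}

def adjacent_overlap(text: str) -> float:
    """Share of adjacent sentence pairs that repeat at least one content word."""
    # Naive splitter: break on ., !, or ? followed by whitespace or end of text.
    sentences = [s for s in re.split(r"[.!?]+(?:\s+|$)", text) if s.strip()]
    pairs = list(zip(sentences, sentences[1:]))
    if not pairs:
        return 0.0
    linked = sum(1 for a, b in pairs if content_words(a) & content_words(b))
    return linked / len(pairs)

score = adjacent_overlap("The senator spoke. The senator left. Rain fell.")  # → 0.5
```

Higher values indicate more local, sentence-to-sentence repetition of key terms; this proxy ignores pronoun co-reference and says nothing about deep cohesion.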
Collectively, these five syntactic components provide clues to the route to persuasion
used by a source (29,30,44). The peripheral route often relies on simple syntax and abstract
concepts presented narratively with low deep cohesion and high referential cohesion. This
approach makes information easy to parse and is cognitively un-demanding. The central route to
persuasion, on the other hand, likely has more syntactic complexity and more concrete terms
presented in an expository fashion, having greater deep cohesion and less referential cohesion.
This approach is more cognitively demanding and often appeals to a more discerning
audience. Given this discussion, we propose the following expectations about real and fake
news:
Expectation 1: Real news will have more syntactic complexity, more concrete words, less
narrativity, more deep cohesion, and less referential cohesion.
Expectation 2: Fake news will have more syntactic simplicity, more abstract words, more
narrativity, less deep cohesion, and more referential cohesion.
Syntax Corpus
Fake news websites may cover any topic that real news websites cover. In this study, we analyzed
a corpus of 12,999 posts downloaded from 244 websites identified as “bullshit” by BS Detector
and compared them with 6,079 real news articles downloaded from six high-reputation news
agency websites. The fake news corpus was downloaded from Kaggle (45). Each item contains the
publication date, headline, text, source URL, and other information (see Table 4 in the Appendix);
some items have empty text. All fake news posts were published between Oct. 26, 2016 and
Nov. 25, 2016, the month surrounding the US presidential election, and were collected from 244
websites identified as “bullshit” by the BS Detector Chrome Extension created by Daniel
Sieradski (9). The real news corpus was downloaded from six websites and comprises n = 6,079
articles published between June 3, 2016 and June 3, 2017. Of these, 391 were from sub-domains
of the six websites, such as nytlive.nytimes.com or stream.aljazeera.com.
Methods
Welch’s t-tests (t-tests with unequal variances) show that all variables except narrativity differ
significantly between fake and real news, with the differences in the expected directions (see
Figure 1).
Figure 1. T-test with unequal variance for syntactic principal components for fake and real news
Presented differently, Figure 7 shows box plots for the values of each variable for both
fake and real news (where fake news is 0 and real news is 1). Syntactic simplicity refers to the
grammatical complexity of an utterance, including features such as dependent clauses,
conjunctions, and left-embedded phrases that make parsing more taxing for the reader or listener.
Real news is more syntactically complex than fake news, meaning that it is more cognitively
demanding. Word concreteness refers to how concrete (nouns that refer to
tangible people, places, or things) or how abstract (concepts such as love, fear, loyalty, or
patriotism) the text base is. Real news uses more concrete terms, whereas fake news uses more
abstract concepts; this aligns with our theoretical expectations about routes to persuasion.
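The comparisons above rely on t-tests with unequal variances, conventionally known as Welch’s t-test. As a minimal standard-library sketch (the function name and the toy scores are hypothetical, not the authors’ code or data), the statistic and its Welch-Satterthwaite degrees of freedom can be computed as follows.

```python
import math
from statistics import mean, variance

def welch_t(a: list[float], b: list[float]) -> tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of group a (sample variance)
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df

# Toy component scores for fake and real news (hypothetical values).
fake = [1.0, 2.0, 3.0, 4.0, 5.0]
real = [2.0, 4.0, 6.0, 8.0, 10.0]
t, df = welch_t(fake, real)
```

Unlike Student’s t-test, this version does not pool the two variances, which matters when one corpus is much more heterogeneous than the other. If SciPy is available, `scipy.stats.ttest_ind(fake, real, equal_var=False)` returns the same statistic together with a two-sided p-value.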
To demonstrate how language aligns conceptually with political partisanship, we
compare the same five syntactic categories using a corpus of speeches given by all major
Republican and Democratic presidential candidates in the 2016 election (for a full list, see Table
5 in the Appendix). Figure 2 shows that the Democratic candidates had more syntactic
complexity, whereas Republican candidates used more straightforward syntactic constructions.
Democratic candidates used more concrete words than Republican candidates, and Republicans’
language was demonstrably more narrative than Democrats’. Finally, Democratic candidates
used less referential cohesion than the Republican candidates, but more deep cohesion. There are
striking parallels between the syntactic structures of fake and real news and the ways in which
partisans use language; these parallels may have implications for audiences’ susceptibility to
being ensnared by fake news. We discuss this in the conclusions.
Figure 2. T-test with unequal variance for syntactic principal components for Republican and Democratic presidential candidates, 2016
Getting Sentimental About Fake and Real News
Automated sentiment analysis also provides clues about how listeners conceive of the world
around them; this includes how discerning they are about the quality of news information they
consume. Pennebaker and others have demonstrated that sentiment analysis can be utilized to
deduce truthfulness and deception, as well as personality and well-being (46–50). Given that
fake news is, by design, untruthful in a multitude of ways, we generate the following
expectations:
Expectation 3: Fake news text will display features of deceptive language.
Expectation 4: Real news will display features of honest language.
We use the Kaggle “Getting Real about Fake News” dataset as ground truth for fake
news stories, consisting of 12,999 posts from 244 fake news websites. We remove non-English
entries and entries that lack a headline or full article text, resulting in a usable dataset
(n = 11,568). For each headline in the fake news corpus, we use the Buzzsumo service to
measure the social media engagement of the story (the sum of all shares across popular social
network platforms). For the sentiment analysis, we assembled news stories published by reputable news
outlets (CNN, Fox News, MSNBC, New York Times, Reuters, and Al Jazeera) to establish a
comparison corpus. We used the Buzzsumo service to search for English language articles from
any web domains associated with those sites (51). This process yields a dataset of headlines
along with social media share counts and URLs. To facilitate a balanced distribution of news
sources, we randomly select 1,833 headlines from each source (n = 10,998). We used an HTML
scraper implemented in Python to retrieve article texts from these URLs. After discarding entries
that have no usable article texts (e.g. videos), we generated a usable corpus of real news articles
(n = 6,081).
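The cleaning and balancing steps above can be sketched as follows. This is a hypothetical illustration: the record fields (`source`, `headline`, `text`, `lang`) and the function names are assumptions rather than the authors’ schema, and the language filter assumes each record already carries a language tag.

```python
import random
from collections import defaultdict

def usable(records):
    """Keep English records that have both a headline and article text."""
    return [r for r in records
            if r.get("lang") == "en" and r.get("headline") and r.get("text")]

def balanced_sample(records, per_source, seed=0):
    """Draw the same number of records, at random, from every source.

    Raises ValueError if a source has fewer than per_source records.
    """
    by_source = defaultdict(list)
    for r in records:
        by_source[r["source"]].append(r)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    sample = []
    for source in sorted(by_source):
        sample.extend(rng.sample(by_source[source], per_source))
    return sample
```

With six sources and `per_source=1833`, this design yields 6 × 1,833 = 10,998 headlines, the count reported above; sampling per source keeps any single outlet from dominating the comparison corpus.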
We generate four corpora from these datasets: real news headlines, fake news headlines,
real news articles, and fake news articles (see Table 6 in the Appendix). We analyze the
headlines to provide a comparison to other research performed on short utterances, such as
Twitter (5), given that this type of discourse differs from full-text articles. For each document in each corpus, we analyzed
the document using Linguistic Inquiry and Word Count (LIWC) 2015, resulting in 93 measures
per document describing the cognitive, affective, and grammatical processes of the text.
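The paper does not describe its HTML scraper in detail. As a hypothetical sketch using only Python’s standard library (`html.parser.HTMLParser`), the function below keeps the text inside `<p>` elements and returns an empty string when a page has none (for example, a video page), which is one simple way to flag entries with no usable article text. Fetching the URLs is omitted, and the class and function names are illustrative.

```python
from html.parser import HTMLParser

class ParagraphText(HTMLParser):
    """Collect whitespace-normalized text found inside <p> elements."""

    def __init__(self):
        super().__init__()
        self.in_p = False       # True while inside a <p> element
        self.paragraphs = []    # finished paragraph strings
        self._buf = []          # text fragments of the current paragraph

    def end_paragraph(self):
        text = " ".join("".join(self._buf).split())
        if text:
            self.paragraphs.append(text)
        self._buf = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.end_paragraph()  # a new <p> implicitly closes an open one
            self.in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self.end_paragraph()
            self.in_p = False

    def handle_data(self, data):
        if self.in_p:
            self._buf.append(data)

def article_text(html: str) -> str:
    """Return paragraph text joined by newlines, or '' if the page has none."""
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    parser.end_paragraph()  # keep a trailing paragraph whose </p> was omitted
    return "\n".join(parser.paragraphs)
```

Restricting extraction to `<p>` text drops navigation menus and headings; real news pages may need site-specific rules, which this sketch does not attempt.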
Headlines
For both real and fake news headline corpora, we analyzed document headlines using
Linguistic Inquiry and Word Count (LIWC) 2015, resulting in 93 measures per document
describing the cognitive, affective, and grammatical processes of the text (46). We used a
truncated singular value decomposition (SVD) to compress each data point to the top 70 singular
values, preserving 97.7% of the variance. From this, we performed a t-distributed Stochastic Neighbor
Embedding (t-SNE) to assess the separability of the data (52). We then performed a two-tailed,
independent samples t-test between LIWC features of fake and real headlines and find that 68 of
93 measures are significantly different. Fake news headlines use significantly more quotation
marks, function words, and conjunctions, and less male language (e.g. boy, his, dad) on average
than do real news headlines.
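The compression step can be illustrated with NumPy. This sketch assumes a documents × 93 matrix of LIWC scores and a column-centering step (the paper does not say whether features were centered or scaled). It reports the share of variance captured by the top k singular values, which is how a figure such as “70 components preserve 97.7% of the variance” would be computed. The random matrix is a placeholder, not the paper’s data, and the function name is hypothetical.

```python
import numpy as np

def truncated_svd(X: np.ndarray, k: int):
    """Project rows of X onto the top-k right singular vectors.

    Returns the k-dimensional embedding and the fraction of total
    variance (sum of squared singular values) that it retains.
    """
    Xc = X - X.mean(axis=0)                      # center each feature (assumption)
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    retained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return Xc @ Vt[:k].T, retained

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 93))                   # placeholder for LIWC scores
Z, retained = truncated_svd(X, k=70)
```

The reduced matrix `Z` could then be passed to a t-SNE implementation, for example `sklearn.manifold.TSNE(n_components=2)`, to produce a two-dimensional map like Figure 3.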
Full-text articles
We apply an identical methodology to the corpus of full text articles. To first assess the
separability of the data, we perform a truncated singular value decomposition (SVD) to compress
each data point to the top 70 singular values, preserving 97.7% of the variance of all headlines
and 98.7% of the variance for all articles. As shown in Figure 3, we use a t-SNE algorithm to
embed the 70 dimensions into two, and we observe a greater degree of separability for articles than for headlines (52). This inspires
confidence in the ability of traditional classification algorithms to perform well on the dataset.
We again performed a two-tailed, independent samples t-test on linguistic features and 80 out of
93 LIWC measures are significantly different. We also found larger effect sizes when
considering the article bodies (see Table 9 in the Appendix). We observed that the full article
text exhibits more intrinsic separability in linguistic space than the headline text.
Figure 3. Results of t-SNE for headlines (left) and full-text articles (right)
The results of the t-SNE process suggest that the linguistic properties of real and fake news
differ. To verify this, we perform a two-tailed,
independent samples t-test between the raw LIWC features of fake and real headlines as well as
fake and real articles. For headlines, we observe 68 of the 93 measures to exhibit significance at
the p<0.05 level. In the case of articles, we observe 80 of 93 measures to be statistically
significant. Table 1 lists samples of significant variables along with
effect sizes, calculated using Cohen’s d (53,54) as shown in Equation 1.
Equation 1. Effect size calculation
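The equation itself did not survive in this version of the text. A common form of Cohen’s d, assumed here, divides the difference in group means by the pooled standard deviation:

$$d = \frac{\bar{x}_{\text{fake}} - \bar{x}_{\text{real}}}{s_p}, \qquad s_p = \sqrt{\frac{(n_{\text{fake}}-1)s_{\text{fake}}^2 + (n_{\text{real}}-1)s_{\text{real}}^2}{n_{\text{fake}} + n_{\text{real}} - 2}}$$

Taking fake minus real matches the signs in Table 1 (fake articles score lower on focuspast, giving a negative d). The sketch below computes this form; the exact variant in (53,54) may differ, and the toy values are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(fake: list[float], real: list[float]) -> float:
    """Cohen's d with pooled SD; positive when fake news scores higher."""
    n1, n2 = len(fake), len(real)
    pooled_var = ((n1 - 1) * variance(fake) + (n2 - 1) * variance(real)) / (n1 + n2 - 2)
    return (mean(fake) - mean(real)) / math.sqrt(pooled_var)

d = cohens_d([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])  # → -1.2
```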
From Table 1 we see that fake news headlines on average have a higher word count; use
more quotations, exclamations, and swear words; and use language that is less analytic and more
certain. When we examine the full article text, we see that fake news articles are much more
focused on the present, much less focused on the past, and are more likely to use personal
pronouns than real articles. Several of these concepts are correlated with honest and deceptive
communication: increased incidence of pronoun usage is associated with truthfulness; our
findings show that fake news has more pronouns than real news. We caution that this should not
be interpreted as fake news conveying truthful intentions; rather, we speculate that real news
coming from mainstream journalists uses fewer pronouns by design to convey a higher register
and more professional, less colloquial, tone. Similar to others’ findings, we see that fake news
headlines have more exaggerated punctuation. Both fake headlines and full-text articles are less
analytic than are real ones as well.
Table 1. LIWC variables showing significant differences between fake and real news
LIWC variable     p       d
Headlines
WC                0.00    0.52
Quote             0.00    0.32
Exclam            0.00    0.20
certain           0.00    0.18
Analytic          0.00   -0.14
swear             0.00    0.11
Full-text articles
focuspast         0.00   -0.75
focuspresent      0.00    0.64
Analytic          0.00   -0.43
you               0.00    0.37
we                0.00    0.36
they              0.00    0.13

Topics in Fake and Real News
We constructed a 50-topic model on the combined fake and real news data using the Mallet topic modeling tool, and labeled and sorted the topics by their average proportion scores. Using the tags from the BS Detector Chrome Extension created by Daniel Sieradski (9), we grouped the fake news into 8 categories (see Table 3 in the Appendix). The combined corpus of fake and real news items contains 18,244 documents. We compared the topic proportion distributions of the real news websites with those of each category of fake news websites.
Table 2 shows the top ten topics by news category. State media – news disseminated by official
ministries of information or media – tends to focus on international phenomena. Junk science
captures current controversies in health and nature, while hate media focuses both on concrete
political topics as well as more esoteric concepts. Satirical news spans the topics of social media
and public policy, as well as familiar targets of parody such as ancient aliens. Conspiratorial
news sources blend mainstream political topics with more marginal themes found in other topics
such as hate and junksci. On the surface, BS and biased news sources appear centrist, although
they, too, blend their stories with marginal themes such as Wikileaks. Finally, real news covers
both domestic and foreign policy issues, as well as blanket topics with generic nouns such as
“person, place, or thing (ppt) potpourri”.
Table 2. Top 10 topics by news category
Media Type    Top 10 Topics
State         syria; US defense; brexit; russia; tur-egy-venez1; tur-egy-venez2; intl trade; 2016 election candidates; deals with iran; eurasia-asia
Junksci       medical research; infowars nutrition; zika; public policy; ppt contractions; amgov; time and place; climate change; ppt potpourri; taxes
Hate          public policy; ppt contractions; amgov; hillary emails; social media; 2016 election candidates; enlightenment; nsfw family; world religions; wikileaks clintons
Satire        ppt contractions; time and place; nsfw family; infowars nutrition; tur-egy-venez; enlightenment; social media; public policy; trump presidency; ancient aliens
Fake          trump presidency; code 2; voting; 2016 election candidates; social media; intl trade; wikileaks clintons; ppt contractions; infowars nutrition; public policy
Conspiracy    wikileaks clintons; voting; 2016 election candidates; infowars nutrition; social media; syria; ppt contractions; intelligence; public policy; hillary emails
BS            ppt contractions; public policy; social media; amgov; 2016 election candidates; voting; hillary emails; wikileaks clintons; syria; syr-lib-irq
Bias          2016 election candidates; voting; hillary emails; social media; ppt contractions; public policy; nsfw family; wikileaks clintons; police; trump presidency
Real          trump presidency; 2016 election candidates; tur-egy-venez1; ppt potpourri; intelligence; syria; oil market; time and place; company business; public policy
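A ranking like Table 2 can be sketched as averaging each document’s topic proportions within a news category and sorting topics by that mean. This is an illustrative assumption about the procedure (the paper used Mallet’s output but does not describe its aggregation code); the function name, the document-topic matrix, and the labels below are toy values.

```python
import numpy as np

def top_topics(doc_topics: np.ndarray, categories: list[str],
               names: list[str], k: int = 10) -> dict[str, list[str]]:
    """Map each category to its k topics with the highest mean proportion."""
    labels = np.array(categories)
    result = {}
    for cat in sorted(set(categories)):
        avg = doc_topics[labels == cat].mean(axis=0)  # average topic proportions
        order = np.argsort(avg)[::-1][:k]              # descending by mean proportion
        result[cat] = [names[i] for i in order]
    return result

# Rows: documents; columns: topic proportions (each row sums to 1).
doc_topics = np.array([[0.6, 0.3, 0.1],
                       [0.5, 0.1, 0.4],
                       [0.1, 0.2, 0.7]])
ranking = top_topics(doc_topics, ["fake", "fake", "real"],
                     ["wikileaks clintons", "voting", "syria"], k=2)
```

Averaging within a category gives each document equal weight, so categories with many short posts are not favored over those with fewer, longer articles.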
Figure 4 graphs these categories for all fifty topics by their average topic proportions.
Given that the Kaggle fake news corpus only covers roughly one month around the 2016 presidential
election, we isolate a smaller subset of the real news corpus corresponding to the Kaggle corpus
dates to keep the comparison like-for-like.
Figure 4. Average topic proportions on each category Oct. 25-Nov. 26, 2016
Topic Changes
While Figure 4 collapses all observations into a topic mean, we also model topic changes over
time. To address the shifting content of news stories, and the magnitude of change, we examined
all topic patterns over time, aggregated by month. The topic proportions of each month are
plotted as stacked areas in Figure 5. Unsurprisingly, the two major topics in the real corpus were
Topics 1 and 2, related to Trump and to the election more generally. Topic 14, related to the
police, drops off, while Topics 16 and 12, related to health care and intelligence respectively,
increase. The largest shift in these two major topics occurred in November 2016, when Trump won
the election. The topics are ordered from bottom to top by their average proportions over the
whole dataset (18,244 documents).
Figure 5. 50 topics over time (real and fake news combined)
Pre-election topical feeding frenzy
We also examine topic changes from month to month by computing the correlation $R_i$ between the
topic proportions of month $i$ and the topic proportions of the previous month $i-1$. We use
$C_i = 1 - R_i^2$ to measure the topic change between the two consecutive months. Since $R^2$ is
often interpreted as the “proportion of explained variance”, we use $C_i$ to measure the proportion
of topic change. Interestingly, the highest topic change (23%) occurred between October and
November 2016, exactly the election month (see Figure 6). The topic changes after the election
month range from 4% to 12%, much lower than the 11% to 23% observed in the months before the
election. As Figure 6 shows, the political issue space experiences increased entropy as both fake
and real news sources cover more topics. This finding is interesting in that it suggests the media
are not simply recycling and rehashing familiar information in the weeks and months prior to the
election; rather, they are introducing new topics for voters to sort, integrate, and/or discard
before casting their votes. It may also represent increased linkages, or chaining, of seemingly
unrelated issues such as belief in the link between vaccinations and autism, belief in deleterious
consequences of fluoridated water, and belief that President Obama is a Muslim (55–57). This
pattern corroborates what Vosoughi et al. found in their study on social engagement and fake
news: fake news has more novelty than real news, and as such is likely to garner more likes and
shares that disperse it more widely (6).
Figure 6. Changes in topics between June 2016 and April 2017
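The month-to-month change measure $C_i = 1 - R_i^2$ defined above can be computed directly. This sketch assumes monthly topic proportions have already been obtained (for example, by averaging document-topic proportions within each month); it computes the Pearson correlation $R_i$ between consecutive months and returns $C_i$ for each month after the first. The function names and the three-topic vectors are hypothetical toy values, not the paper’s 50-topic data.

```python
import math

def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation of two equal-length vectors."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def topic_changes(monthly: list[list[float]]) -> list[float]:
    """C_i = 1 - R_i**2 for each month i compared with month i - 1."""
    return [1 - pearson(prev, cur) ** 2 for prev, cur in zip(monthly, monthly[1:])]

months = [[0.5, 0.3, 0.2],   # month 1 topic proportions (toy values)
          [0.5, 0.3, 0.2],   # month 2: identical mix, so C is ~0
          [0.2, 0.3, 0.5]]   # month 3: topic mix reversed
changes = topic_changes(months)
```

Because $R_i$ is squared, a strong negative correlation between consecutive months also registers as little change; that property follows directly from the definition in the text. Multiplying by 100 gives percentages comparable to the 4% to 23% range reported above.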
Conclusions
In this paper we have presented three complementary approaches to analyzing fake and real news for syntax, sentiment, and topics. We extend existing studies of fake and real news by analyzing the full text of articles. We find that the syntactic properties of fake and real news vary substantially, as do the sentiment categories. We also observe curious patterns of topic change over time and between types of fake news. As Pennycook and others suggest, audiences who prefer the peripheral route to persuasion, marked by simpler syntax and abstract terms that evoke, stoke, and validate feelings and beliefs, may be more susceptible to ideas presented by fake news (58,59). It is also possible that, given the variation in language use by partisan candidates, individuals are inclined to receive information presented through either central or peripheral cues. This raises a chicken-and-egg question: is fake news compelling and virulent because of its intrinsic linguistic properties, or are some voters more susceptible to this type of news because it conforms to a linguistic and cognitive framework with which they are already familiar? Does fake news introduce worldviews, or does it crowdsurf through pre-existing ones? We hope to continue disentangling these complex relationships using computational linguistics methodology.
The larger implications of fake news bear mentioning: humans have a difficult time distinguishing between fake and real news, which challenges the role of the "Fourth Estate," i.e., a free and independent media. Democratic systems of governance rely on expository journalism to inform citizens about leaders and politics. When information from news sources is unreliable, the foundations of democratic checks and balances are called into question. Some have suggested that fake news is more than a nuisance or passing phenomenon; inasmuch as fake news fosters political polarization, it undermines moderation while leading citizens to form erroneous conclusions based on inaccurate information about important scientific, social, and political issues (60). Some scholars have expressed concern about fake news undermining governance and affecting freedom of the press (61,62), while others have warned that some countries are in danger of democratic backsliding, citing media literacy and challenges to an independent press as indicators (63). Political propaganda is widely and reliably used to persuade citizens, and its effectiveness is especially noteworthy in non-democracies, where a competitive, free media is often repressed and the flow of information is tightly controlled by the ruling elite (64–67). In such settings, citizens of autocratic regimes like North Korea are susceptible to propaganda because they lack access to counter-perspectives.
Evidence of foreign influence in the 2016 American election continues to unfold, with the issue of fake news remaining front and center (68). It may be, as some have suggested, that audiences evaluating the veracity of news sources are motivated less by partisan cues than by inertia (69). We see at least three benefits of our full-text approach. First, the full text of articles provides enough information for linguistic style matching (LSM) and semantic similarity measures to group similar styles, an approach used in plagiarism detection (70,71) and authorship identification (72,73) that may help isolate the intellectual entrepreneurs of fake news content. Second, it enables downstream automated applications, such as browser extensions that alert audiences to potentially disreputable or questionable sources or articles, and tools that track emerging trends in fake news. Finally, by understanding the cognitive heuristics that characterize fake and real news, we may be able to better calibrate inoculation strategies to counteract the deleterious effects of fake news on society and politics.
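Linguistic style matching is commonly computed from function-word category percentages (such as LIWC output): each category scores 1 - |a - b| / (a + b + 0.0001), and the scores are averaged across categories. The sketch below assumes that formulation and the nine LIWC2015 function-word categories listed in the code; it illustrates the general technique and is not a procedure from this paper. The function name and the profile values in the example are invented.

```python
# Nine LIWC2015 function-word categories commonly used for LSM
# (labels as they appear in Table 8).
LSM_CATEGORIES = (
    "ppron", "ipron", "article", "conj", "prep",
    "auxverb", "adverb", "negate", "quant",
)


def lsm(profile_a, profile_b, categories=LSM_CATEGORIES, eps=0.0001):
    """Average per-category style match between two texts, in (0, 1].

    profile_a, profile_b: dicts mapping category -> percent of words.
    eps keeps each ratio defined when both texts score 0 on a category.
    """
    scores = [
        1 - abs(profile_a[c] - profile_b[c]) / (profile_a[c] + profile_b[c] + eps)
        for c in categories
    ]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    # Invented category percentages for two hypothetical articles.
    article_1 = dict(zip(LSM_CATEGORIES, [9.1, 5.2, 6.8, 5.9, 13.4, 8.0, 4.7, 1.2, 2.3]))
    article_2 = dict(zip(LSM_CATEGORIES, [4.6, 4.9, 8.7, 5.1, 14.0, 6.5, 3.1, 0.6, 2.0]))
    print(f"LSM = {lsm(article_1, article_2):.3f}")
```

Identical profiles score exactly 1, and the score falls toward 0 as the two texts diverge on each category. Pairwise LSM scores of this kind could then feed a clustering or nearest-neighbor step to group articles with similar styles.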
Works Cited
1. Kucharski A. Post-truth: Study epidemiology of fake news. Nature. 2016;540(7634):525.
2. Rashkin H, Choi E, Jang JY, Volkova S, Choi Y. Truth of Varying Shades: Analyzing Language in Fake News and Political Fact-Checking. In 2017. p. 2921–2927.
3. Horne BD, Adali S. This Just In: Fake News Packs a Lot in Title, Uses Simpler, Repetitive Content in Text Body, More Similar to Satire than Real News. arXiv preprint arXiv:170309398. 2017;
4. Narayanan V, Barash V, Kelly J, Kollanyi B, Neudert L-M, Howard PN. Polarization, Partisanship and Junk News Consumption over Social Media in the US [Internet]. Computational Propaganda Research Project: University of Oxford; 2018. Report No.: 2018.1. Available from: http://comprop.oii.ox.ac.uk/research/polarization-partisanship-and-junk-news/
5. Volkova S, Shaffer K, Jang JY, Hodas N. Separating Facts from Fiction: Linguistic Models to Classify Suspicious and Trusted News Posts on Twitter. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers) [Internet]. 2017 [cited 2017 Oct 14]. p. 647–653. Available from: http://www.aclweb.org/anthology/P17-2102
6. Vosoughi S, Roy D, Aral S. The spread of true and false news online. Science. 2018 Mar 9;359(6380):1146–51.
7. Tacchini E, Ballarin G, Della Vedova ML, Moret S, de Alfaro L. Some Like it Hoax: Automated Fake News Detection in Social Networks. arXiv:170407506 [cs] [Internet]. 2017 Apr 24 [cited 2017 Jun 7]; Available from: http://arxiv.org/abs/1704.07506
8. Allcott H, Gentzkow M. Social media and fake news in the 2016 election [Internet]. National Bureau of Economic Research; 2017 [cited 2017 Jun 7]. Available from: http://www.nber.org/papers/w23089
9. Sieradski D. B.S. Detector [Internet]. 2016. Available from: http://bsdetector.tech/
10. Mele N, Lazer D, Baum M, Grinberg N, Friedland L, Joseph K, et al. Combating Fake News: An Agenda for Research and Action. 2017 [cited 2017 Jun 7]; Available from: https://shorensteincenter.org/wp-content/uploads/2017/05/Combating-Fake-News-Agenda-for-Research-1.pdf
11. Funke D. It’s been a year since Facebook partnered with fact-checkers. How’s it going? [Internet]. Poynter Institute. 2017. Available from: https://www.poynter.org/news/its-been-year-facebook-partnered-fact-checkers-hows-it-going
12. Woolf N. How to solve Facebook’s fake news problem: experts pitch their ideas. The Guardian [Internet]. 2016 Nov 29; Available from: https://www.theguardian.com/technology/2016/nov/29/facebook-fake-news-problem-experts-pitch-ideas-algorithms
13. King G, Pan J, Roberts ME. How the Chinese Government Fabricates Social Media Posts for Strategic Distraction, Not Engaged Argument. American Political Science Review. 2017 Aug;111(3):484–501.
14. Baker WD, Oneal JR. Patriotism or opinion leadership? The nature and origins of the “rally’round the flag” effect. Journal of conflict resolution. 2001;45(5):661–687.
15. Groeling T, Baum MA. Crossing the water’s edge: Elite rhetoric, media coverage, and the rally-round-the-flag phenomenon. The Journal of Politics. 2008;70(4):1065–1085.
16. Sobek D. Rallying around the Podesta: Testing diversionary theory across time. Journal of Peace Research. 2007;44(1):29–45.
17. Isaac M, Wakabayashi D. Russian Influence Reached 126 Million Through Facebook Alone. The New York Times [Internet]. 2017 Oct 30 [cited 2017 Oct 31]; Available from: https://www.nytimes.com/2017/10/30/technology/facebook-google-russia.html
18. Love G, Windsor L. Populism and Popular Support: Vertical Accountability, Exogenous Events, and Leader Discourse in Venezuela. Political Research Quarterly. 2017 Oct 25;
19. Alduy C, Wahnich S. Marine Le Pen prise aux mots. Décryptage du nouveau discours frontiste, Paris, Seuil. 2015;94–98.
20. Mudde C. Europe’s Populist Surge: A Long Time in the Making. Foreign Aff. 2016;95:25.
21. Mudde C, Kaltwasser CR. Populism in Europe and the Americas: Threat Or Corrective for Democracy? Cambridge University Press; 2012. 275 p.
22. Bakshy E, Messing S, Adamic LA. Exposure to ideologically diverse news and opinion on Facebook. Science. 2015 Jun 5;348(6239):1130–2.
23. Martin GJ, Yurukoglu A. Bias in Cable News: Persuasion and Polarization. American Economic Review. 2017 Sep;107(9):2565–99.
24. Baum MA, Jamison AS. The Oprah effect: How soft news helps inattentive citizens vote consistently. The Journal of Politics. 2006;68(4):946–959.
25. Keele L. Social capital and the dynamics of trust in government. American Journal of Political Science. 2007;51(2):241–254.
26. Gupta A, Kumaraguru P, Castillo C, Meier P. TweetCred: Real-Time Credibility Assessment of Content on Twitter. In: Aiello LM, McFarland D, editors. Social Informatics [Internet]. Cham: Springer International Publishing; 2014 [cited 2017 Oct 31]. p. 228–43. Available from: http://link.springer.com/10.1007/978-3-319-13734-6_16
27. Broniatowski DA, Hilyard KM, Dredze M. Effective vaccine communication during the Disneyland measles outbreak. Vaccine. 2016;34(28):3225–3228.
28. Reyna VF. Risk perception and communication in vaccination decisions: A fuzzy-trace theory approach. Vaccine. 2012;30(25):3790–3797.
29. Petty RE, Cacioppo JT. The effects of involvement on responses to argument quantity and quality: Central and peripheral routes to persuasion. Journal of personality and social psychology. 1984;46(1):69.
30. Petty RE, Cacioppo JT, Strathman AJ, Priester JR. To Think or Not to Think: Exploring Two Routes to Persuasion. In: Brock TC, Green MC, editors. Persuasion: Psychological insights and perspectives, 2nd ed. Thousand Oaks, CA, US: Sage Publications, Inc; 2005. p. 81–116.
31. Popkin SL. Information shortcuts and the reasoning voter. Information, participation and choice: An economic theory of democracy in perspective. 1995;17–35.
32. Groenendyk E. Competing Motives in the Partisan Mind: How Loyalty and Responsiveness Shape Party Identification and Democracy. OUP USA; 2013. 218 p.
33. Andrew BC. Media-generated shortcuts: Do newspaper headlines present another roadblock for low-information rationality? Harvard International Journal of Press/Politics. 2007;12(2):24–43.
34. Baum MA. Sex, Lies, and War: How Soft News Brings Foreign Policy to the Inattentive Public. American Political Science Review. 2002 Mar;96(1):91–109.
35. Reinemann C, Stanyer J, Scherr S, Legnante G. Hard and soft news: A review of concepts, operationalizations and key findings. Journalism. 2012 Feb 1;13(2):221–39.
36. MacWilliams MC. Who decides when the party doesn’t? Authoritarian voters and the rise of Donald Trump. PS: Political Science & Politics. 2016;49(4):716–721.
37. Hetherington M, Suhay E. Authoritarianism, Threat, and Americans’ Support for the War on Terror. American Journal of Political Science. 2011 Jul 1;55(3):546–60.
38. Lakoff G. Moral politics: How liberals and conservatives think [Internet]. University of Chicago Press; 2002 [cited 2012 Nov 23]. Available from: http://books.google.com/books?hl=en&lr=&id=R-4YBCYx6YsC&oi=fnd&pg=PR9&dq=george+lakoff&ots=WMji9KEiUP&sig=bq1V4Ky-K9WvHjJLe3Z8sOVe9BY
39. Lakoff G. Simple Framing: An introduction to framing and its uses in politics. Retrieved September. 2005;20:2005.
40. Lucy JA. Sapir-Whorf Hypothesis. 2015;
41. Staff PO. Correction: The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from the Domain of Color. PloS one. 2016;11(8):e0161521.
42. Boroditsky L. How language shapes thought. Scientific American. 2011;304(2):62–65.
43. McNamara DS, Graesser AC, McCarthy PM, Cai Z. Automated evaluation of text and discourse with Coh-Metrix. Cambridge University Press; 2014.
44. Windsor L. The Predictive Power of Political Discourse. Washington D.C.: National Academy of Sciences; 2017 Feb. (Social and Behavioral Sciences Decadal Survey).
45. Risdal M. Getting Real about Fake News [Internet]. [cited 2017 Jun 9]. Available from: https://www.kaggle.com/mrisdal/fake-news
46. Pennebaker JW, Boyd RL, Jordan K, Blackburn K. The development and psychometric properties of LIWC2015. UT Faculty/Researcher Works [Internet]. 2015 [cited 2016 Dec 9]; Available from: https://utexas-ir.tdl.org/handle/2152/31333
47. Pennebaker JW. The secret life of pronouns: How our words reflect who we are. New York, NY: Bloomsbury. 2011;
48. Hancock JT, Curry LE, Goorha S, Woodworth M. On Lying and Being Lied To: A Linguistic Analysis of Deception in Computer-Mediated Communication. Discourse Processes. 2007 Dec 17;45(1):1–23.
49. Slatcher RB, Chung CK, Pennebaker JW, Stone LD. Winning words: Individual differences in linguistic style among US presidential and vice presidential candidates. J Res Pers. 2007 Feb;41(1):63–75.
50. Chung C, Pennebaker J. The Psychological Functions of Function Words.
51. BuzzSumo: Find the Most Shared Content and Key Influencers [Internet]. BuzzSumo. [cited 2018 Jul 15]. Available from: http://buzzsumo.com/
52. Maaten L van der, Hinton G. Visualizing data using t-SNE. Journal of Machine Learning Research. 2008;9(Nov):2579–2605.
53. Sawilowsky SS. New effect size rules of thumb. 2009 [cited 2017 Oct 13]; Available from: http://digitalcommons.wayne.edu/coe_tbf/4/
54. Cohen J. A power primer. Psychological bulletin. 1992;112(1):155.
55. Barzilay R, Elhadad M. Using lexical chains for text summarization. Advances in automatic text summarization. 1999;111–121.
56. Rapp DN, Braasch JL. Accurate and inaccurate knowledge acquisition. Processing inaccurate information: Theoretical and applied perspectives from cognitive science and the educational sciences. 2014;1–10.
57. Lewandowsky S, Ecker UK, Seifert CM, Schwarz N, Cook J. Misinformation and its correction: Continued influence and successful debiasing. Psychological Science in the Public Interest. 2012;13(3):106–131.
58. Pennycook G, Rand DG. Assessing the Effect of “Disputed” Warnings and Source Salience on Perceptions of Fake News Accuracy [Internet]. Rochester, NY: Social Science Research Network; 2017 Sep [cited 2017 Sep 13]. Report No.: ID 3035384. Available from: https://papers.ssrn.com/abstract=3035384
59. Lazer DMJ, Baum MA, Benkler Y, Berinsky AJ, Greenhill KM, Menczer F, et al. The science of fake news. Science. 2018 Mar 9;359(6380):1094–6.
60. Levitsky S, Ziblatt D. Opinion | How Wobbly Is Our Democracy? The New York Times [Internet]. 2018 Apr 13 [cited 2018 Jul 13]; Available from: https://www.nytimes.com/2018/01/27/opinion/sunday/democracy-polarization.html
61. Wandhöfer T, Taylor S, Walland P, Geana R, Weichselbaum R, Fernandez M, et al. Determining citizens’ opinions about stories in the news media: analysing Google, Facebook and Twitter. eJournal of eDemocracy & Open Government (JeDEM). 2012;4(2):198–221.
62. Stier S, Bleier A, Lietz H, Strohmaier M. Election Campaigning on Social Media: Politicians, Audiences, and the Mediation of Political Communication on Facebook and Twitter. Political Communication. 2018 Jan 2;35(1):50–74.
63. Mickey R, Levitsky S, Way LA. Is America Still Safe for Democracy: Why the United States Is in Danger of Backsliding. Foreign Aff. 2017;96:20.
64. Huang H. Propaganda as signaling. Comparative Politics. 2015;47(4):419–444.
65. Lasswell HD. The Theory of Political Propaganda. The American Political Science Review. 1927;21(3):627–31.
66. Weyland K. Clarifying a Contested Concept: Populism in the Study of Latin American Politics. Comparative Politics. 2001 Oct 1;34(1):1–22.
67. Weyland K. The Threat from the Populist Left. Journal of Democracy. 2013;24(3):18–32.
68. Chen A. The Agency. The New York Times [Internet]. 2015 Jun 2 [cited 2018 Jul 16]; Available from: https://www.nytimes.com/2015/06/07/magazine/the-agency.html
69. Pennycook G, Rand DG. Lazy, not biased: Susceptibility to partisan fake news is better explained by lack of reasoning than by motivated reasoning. Cognition [Internet]. 2018 Jun 20 [cited 2018 Jul 16]; Available from: http://www.sciencedirect.com/science/article/pii/S001002771830163X
70. Abdi A, Idris N, Alguliyev RM, Aliguliyev RM. PDLK: Plagiarism detection using linguistic knowledge. Expert Systems with Applications. 2015;42(22):8936–8946.
71. Luo L, Ming J, Wu D, Liu P, Zhu S. Semantics-based obfuscation-resilient binary code similarity comparison with applications to software plagiarism detection. In: Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering. ACM; 2014. p. 389–400.
72. Hoover DL. Word frequency, statistical stylistics and authorship attribution. In: What’s in a Word-list? Routledge; 2016. p. 55–72.
73. Sapkota U, Bethard S, Montes M, Solorio T. Not all character n-grams are created equal: A study in authorship attribution. In: Proceedings of the 2015 conference of the North American chapter of the association for computational linguistics: Human language technologies. 2015. p. 93–102.
Appendix
Table 3. Types of fake news
Fake News Type | Description
Bias | Sources that traffic in political propaganda and gross distortions of fact
Fake | Sources that fabricate stories out of whole cloth with the intent of pranking the public
Junksci | Sources that promote pseudoscience, metaphysics, naturalistic fallacies, and other scientifically dubious claims
Satire | Sources that provide humorous commentary on current events in the form of fake news
State | Media sources in repressive states operating under government sanction
BS | Bullshit sources that were not identified as belonging to any of the previous categories
Conspiracy | Sources that are well-known promoters of kooky conspiracy theories
Hate | Sources that actively promote racism, misogyny, homophobia, and other forms of discrimination
Table 4. Source summary
Website | N
www.aljazeera.com | 887
www.cnn.com | 827
www.foxnews.com | 871
www.msnbc.com | 456
www.nytimes.com | 1565
www.reuters.com | 1082
sub-urls | 391
Total | 6079
Figure 7. Syntax principal components for fake and real news (0=Fake, 1=Real)
Table 5. Corpus of speeches given by major presidential candidates, 2016
Candidate | N
Ben Carson | 33
Bernie Sanders | 49
Carly Fiorina | 12
Chris Christie | 6
Donald Trump | 141
Hillary Clinton | 190
Jeb Bush | 19
Jim Gilmore | 3
John Kasich | 7
Marco Rubio | 21
Martin O'Malley | 15
Mike Huckabee | 19
Rand Paul | 5
Rick Santorum | 22
Ted Cruz | 20
Figure 8. Syntax principal components for 2016 presidential candidates by party (0=Republican; 1=Democrat)
Table 6. Fake and real corpora overview
Source | Content | N
Fake | Headlines | 11,568
Fake | Articles | 11,568
Real | Headlines | 10,998
Real | Articles | 6,081
Table 7. Topic keys (full list)
Topic ID Label Keys
T_1 trump presidency
trump president obama house white administration u.s trump’s trump's washington donald campaign told office secretary united president-elect policy national friday
T_2 2016 election candidates
trump clinton donald hillary campaign trump’s republican presidential party president election democratic candidate supporters mrs sanders voters nominee political support
T_3 oil market percent u.s oil year market prices growth million rate sales billion data week price month rose average fell reporting bank
T_4 syria
syrian syria forces aleppo saudi islamic isis mosul city state government killed civilians military attack iraq iraqi army attacks fighters
T_5 arts and entertainment
show film music art game years series york movie world made team fans play museum work year american played won
T_6 ppt potpourri people it's told don't city that's i'm didn't officials fire york plane cnn flight we're wednesday can't day there's he's
T_7 nsfw family
women children family men woman child young years life mother father told man home parents sexual son daughter wife sex
T_8 time and place back time day night home left house days hours room made long began place set side early turned years hand
T_9 company business
company companies technology business billion year data million project system industry google apple executive cars chief internet car customers software
T_10 public policy public fact time article question it’s political point times make simply case policy important long power made view clear idea
T_11 courts and law
court law federal case judge rights justice legal state supreme attorney states prison u.s order government laws trial lawsuit filed
T_12 intelligence
intelligence government information officials security report national investigation agency u.s public documents department agencies committee evidence state official law office
T_13 voting
election trump vote voters voting clinton hillary states votes donald state polls poll voter presidential win elections percent electoral democrats
T_14 police
police officers man officer gun people violence shot shooting video killed attack black death car law murder arrested told crime
T_15 taxes
money tax million government pay federal jobs financial business billion economic years percent economy workers year income dollars people taxes
T_16 health care
health republicans house bill republican senate care democrats congress insurance obamacare people law act state congressional legislation senator states vote
T_17 race and gender
black school students university white people schools college community education student public group professor transgender percent state racial rights campus
T_18 climate change
water climate change energy power years people area city california environmental sea year world coal local land animals ocean warming
T_19 hillary emails
fbi clinton hillary investigation emails comey email director election server clinton’s department justice information james letter case weiner abedin huma
T_20 ppt contractions it’s people don’t that’s i’m good make time things back thing can’t we’re you’re doesn’t lot didn’t they’re there’s country
T_21 social media
news media facebook video twitter october social fox post fake posted november share network times story show article daily press
T_22 tur-egy-venez
turkey president coup turkish erdogan government opposition minister people venezuela country party political canadian authorities jazeera egypt state attempt maduro
T_23 intl trade
china united states trade chinese countries international south global world nations foreign u.s africa agreement asia economic philippines duterte deal
T_24 dapl
pipeline dakota protesters standing water rock north police access land native protest camp protests people oil sioux protectors law construction
T_25 brexit
european french british europe france britain minister germany party german brexit london union parliament prime paris migrants country vote government
T_26 amgov
people american america political government power media world americans war country system party democracy president obama control class election state
T_27 eurasia-asia
north nuclear korea south india korean missile indian kim test pakistan weapons minister government korea's modi prime sanctions park japan
T_28 US defense
military u.s defense air force army aircraft navy forces retired general troops war sea veterans nato pentagon soldiers missile missiles
T_29 enlightenment
life world people human time energy love mind feel consciousness power work things experience body reality light person earth live
T_30 wikileaks clintons
clinton hillary campaign wikileaks podesta email foundation emails clinton’s bill john assange state clintons president hillary’s secretary money project dnc
T_31 medical research
health cancer study drug disease research drugs body found medical blood marijuana heart people brain risk studies researchers effects cells
T_32 border walls
israel immigration immigrants border jewish illegal united israeli states mexico refugees palestinian country jews wall state security palestinians countries american
T_33 infowars nutrition
food water brain foods force eat organic infowars milk make eating meat sugar oil life add halloween products good store
T_34 world religions
god muslim church christian religious muslims world christians islam religion jesus faith catholic people christ jews king pope islamic bible
T_35 russia
russia russian putin moscow ukraine russia’s nato vladimir russians president relations western foreign countries ukrainian soviet states kremlin united europe
T_36 zika medical health doctors vaccine zika vaccines birth women baby hospital children cdc virus abortion year people patients babies cases pregnant
T_37 currency
gold bank world silver financial market banks currency dollar money central debt markets price economy global u.s reserve stock news
T_38 ancient aliens
earth space years scientists light ancient found world planet dna moon source ufo science universe time human researchers alien sun
T_39 code 1
return var function args danacallmethod(arguments url path udn clsid case danaurl(url break;case true qbu dsid false document.cookie dssigninurl dsivs arguments
T_40 deals with iran
iran president iranian bush deal nixon october white kennedy house george war johnson american john history reagan evidence tehran u.s
T_41 syr-lib-irq
war syria u.s world russia military states iraq united nuclear foreign policy president obama government american clinton weapons libya hillary
T_42 spanish fxn los del por las para con una como más pero sus años este sin fue sobre nos país está todo
T_43 global elites
world elite global free order elites read license information author western economic click globalization market deep creative permission empire commons
T_44 obama
obama white house president biden obamacare state administration hillary study obama's government putin found free america bad women american news
T_45 code 2
text results comment link strong block code span anonymous automatically version appears page quoting reply click write leave spam content
T_46 stahl potpourri les des dans pour est par stahl lesley une qui sur myanmar pas του son avec aux της ont russie
T_47 german fxn der die und von das den mit auf ist sich ein nicht dem sie für als des dass hat ich
T_48 lx potpourri della so إلى cookies til den malik che علىm brics obama italiano français det español today voltaire del meyssan المتحدة
T_49 code 3
ars eio kfw cfk this.mka.krc rhn string\")kfw bir kfw;if(typeof(obj.classid fvr mus fvr.length,eio dc.substring(ars fvr);var document.cookie.indexof(\";\",ars);if(eio dc.indexof(fvr);if(ars xjn class nfg div
T_50 russian potpourri что это как для сша так все россии или том этом если чтобы его они того будет уже которые также
Table 8. LIWC variables and social media engagement
Variable | Real News Headlines: Coef. | Std.Err. | Fake News Headlines: Coef. | Std.Err.
WC | .0640187*** | 0.0097186 | -.042696*** | 0.0074348
Analytic | 0.0005938 | 0.0010568 | .0082824*** | 0.001193
Clout | -0.0010847 | 0.0010206 | -.0051772*** | 0.0013725
Authentic | -0.0001996 | 0.0006655 | .0120055*** | 0.0008505
Tone | 0.0013858 | 0.0007135 | -.0024636*** | 0.0006565
WPS | -0.0044437 | 0.0095892 | .0159532* | 0.0074823
Sixltr | 0.0011476 | 0.0006819 | -.0079045*** | 0.000746
Dic | -0.0000858 | 0.0011904 | -.0060922*** | 0.0016036
function | -.0079928* | 0.0033928 | .0200604*** | 0.0043461
pronoun | 24.69597 | 12.7837 | -1.039355*** | 0.1676805
ppron | -29.48177 | 20.60199 | 7.011872 | 17.06494
i | 4.798495 | 16.17087 | -6.005202 | 17.06489
we | 4.809455 | 16.17077 | -5.953962 | 17.06388
you | 4.800346 | 16.17131 | -5.970136 | 17.06418
shehe | 4.796225 | 16.1709 | -5.916199 | 17.06377
they | 4.833024 | 16.17056 | -5.984098 | 17.06329
ipron | -24.68099 | 12.78372 | 1.017526*** | 0.1677335
article | .0209349*** | 0.0037344 | -.0315364*** | 0.004977
prep | 0.0003896 | 0.0033993 | -.023869*** | 0.0042865
auxverb | .0128715** | 0.0039514 | -.0262593*** | 0.0050342
adverb | .0195259*** | 0.0038272 | 0.0070755 | 0.0045715
conj | .0105816** | 0.0033712 | 0.0029846 | 0.0042337
negate | .0143713** | 0.0049033 | -0.006491 | 0.0062807
verb | 0.0035293 | 0.0022638 | .0190768*** | 0.0031853
adj | 0.0008811 | 0.0020035 | -.0208492*** | 0.0024286
compare | -.0123973*** | 0.002977 | .0212813*** | 0.0039305
interrog | -0.0002613 | 0.0040747 | 0.0022834 | 0.0043945
number | 0.0014362 | 0.0019745 | -.0195588*** | 0.0021674
quant | .0115201** | 0.0039321 | -.0416987*** | 0.0049313
affect | -0.0209977 | 0.0131224 | -.2230515*** | 0.0137691
posemo | 0.0127607 | 0.0135027 | .2200027*** | 0.0140709
negemo | 0.0203526 | 0.0134973 | .2295589*** | 0.0145474
anx | -0.005451 | 0.0042468 | -.0522511*** | 0.004178
anger | -0.0029351 | 0.0035171 | -0.0069607 | 0.0037463
sad | -0.0014471 | 0.0046188 | -.0521828*** | 0.0057002
social | 0.0042265 | 0.002761 | 0.0057571 | 0.0037995
family | .0105787* | 0.0048157 | .0487895*** | 0.0065746
friend | 0.0087381 | 0.0071193 | .0747922*** | 0.0128039
female | .0226741*** | 0.0046127 | .0191195** | 0.0059638
male | 0.0085761 | 0.0048387 | -.0269458*** | 0.0068081
cogproc | -.016117** | 0.0049467 | -.1256831*** | 0.0082276
insight | 0.0098056 | 0.0052642 | .1341627*** | 0.0086231
cause | 0.0034556 | 0.0052552 | .1101238*** | 0.0085712
discrep | .0173179** | 0.0053203 | .0696121*** | 0.008984
tentat | .0097899* | 0.0046006 | .1858073*** | 0.0074878
certain | 0.002751 | 0.0062843 | .135058*** | 0.0091673
differ | 0.0024554 | 0.0059753 | .1174194*** | 0.008724
percept | -.0272827*** | 0.0081033 | -.0303831* | 0.0126618
see | .0352585*** | 0.0085104 | .0646144*** | 0.0129584
hear | .0307377*** | 0.0087164 | .0562629*** | 0.0137172
feel | .0222932* | 0.0092992 | -.030687* | 0.0139769
bio | 0.0086582 | 0.0055078 | 0.0025263 | 0.0106383
body | 0.000264 | 0.0059195 | -.0296429** | 0.0109483
health | -0.0045535 | 0.0055066 | 0.0153486 | 0.0107786
sexual | -0.0033246 | 0.0073344 | -0.0135726 | 0.011121
ingest | 0.0002411 | 0.0059686 | -0.0089422 | 0.0112157
drives | 0.0056137 | 0.0032717 | .0131307*** | 0.0037611
affiliation | -.0151336*** | 0.0042824 | -.0365609*** | 0.0052571
achieve | -.0091991*** | 0.0027434 | 0.0061564 | 0.0033424
power | -0.0053726 | 0.0031678 | -.0193282*** | 0.0036455
reward | 0.0024529 | 0.0037286 | 0.0050962 | 0.004294
risk | -0.0047376 | 0.0038955 | -.0118605* | 0.0049855
focuspast | 0.0057575 | 0.0031627 | -0.0074285 | 0.0039119
focuspresent | -0.0042186 | 0.0023164 | -.0135255*** | 0.0031509
focusfuture | -.0194358*** | 0.0034337 | -.0333173*** | 0.0045084
relativ | 0.0016665 | 0.0055109 | .0171509* | 0.0068865
motion | 0.0043946 | 0.0052699 | -.0374807*** | 0.0062222
space | -0.0071926 | 0.0052553 | -.0484052*** | 0.006289
time | 0.0038156 | 0.0053199 | -.0445057*** | 0.0066833
work | 0.0008674 | 0.00166 | .0164242*** | 0.0021887
leisure | .0071782** | 0.0023915 | .0130314*** | 0.003427
home | 0.006434 | 0.0043519 | -.0453921*** | 0.0066161
money | -0.0030922 | 0.0024292 | -0.0051391 | 0.0032548
relig | .0210733*** | 0.0034952 | -0.0046581 | 0.0041011
death | 0.0008639 | 0.0030127 | -.0318445*** | 0.0035731
informal | -.0425716*** | 0.0118284 | .0930984*** | 0.0239291
swear | .0420372* | 0.0194132 | -.0880546*** | 0.0254977
netspeak | .0723359*** | 0.0127824 | -.1232127*** | 0.0237879
assent | .0372384* | 0.0160917 | -.1311895*** | 0.024141
nonflu | .0494158* | 0.0245061 | -.2068267*** | 0.0323786
filler | 0.0267815 | 0.0456328 | -.3130882** | 0.1183724
AllPunc | 7.739408* | 3.741554 | -6.304725 | 3.328843
Period | -7.744548* | 3.741533 | 6.318225 | 3.328779
Comma | -7.732728* | 3.741566 | 6.297792 | 3.329286
Colon | -7.754333* | 3.741462 | 6.297283 | 3.328693
SemiC | -7.737446* | 3.741341 | 6.477362 | 3.329162
QMark | -7.758828* | 3.741413 | 6.324913 | 3.328906
Exclam | -7.702702* | 3.741124 | 6.324987 | 3.32881
Dash | -7.746779* | 3.741618 | 6.26629 | 3.328999
Quote | -7.734568* | 3.741543 | 6.325731 | 3.329029
Apostro | -7.738896* | 3.741546 | 6.278157 | 3.328438
Parenth | -7.729559* | 3.741628 | 6.254026 | 3.329028
OtherP | -7.750184* | 3.74168 | 6.49741 | 3.328957
Constant | 8.640831*** | 0.1170498 | 9.698888*** | 0.1404026
N. of cases | 10995 | | 7736 |
* p<0.05, ** p<0.01, *** p<0.001
Table 9. Social media engagement for select LIWC variables for fake and real news
Honesty
Variable | Real | Std.Err. | Fake | Std.Err.
you | -0.0005054 | 0.0074804 | 0.0054393 | 0.0174533
shehe | .0203399** | 0.0071053 | -0.0117565 | 0.0185307
they | .0337357* | 0.0132893 | -0.06615 | 0.0344992
ipron | 0.0060396 | 0.0047287 | 0.0035028 | 0.0125475
posemo | -0.0049288 | 0.003041 | 0.0058447 | 0.0089931
social | .0063965** | 0.0024381 | 0.002645 | 0.0093069
verb | 0.003576 | 0.0024343 | 0.0077503 | 0.0127022
auxverb | 0.0026501 | 0.003865 | -0.0207691 | 0.0162472
discrep | 0.0049818 | 0.0070465 | 0.0121224 | 0.0194406
Constant | 9.140952*** | 0.0261888 | 10.03645*** | 0.1295322
lnalpha Constant | .5257337*** | 0.012438 | 1.763488*** | 0.0167405
N. of cases | 10995 | | 7736 |
Deception
Variable | Real | Std.Err. | Fake | Std.Err.
ppron | .0116342*** | 0.003335 | -0.0173479 | 0.0114271
WPS | -0.0134798 | 0.0117394 | -0.0230576 | 0.0329879
WC | .0694662*** | 0.0126702 | -0.0260575 | 0.0324413
conj | 0.0078287 | 0.005736 | .060497*** | 0.013284
time | 0.0059873 | 0.0031954 | -.0207025* | 0.0088924
space | -.008772*** | 0.00243 | 0.0022985 | 0.0109538
motion | 0.0042458 | 0.0040071 | 0.0133804 | 0.0170899
number | -0.0028672 | 0.0028443 | 0.0011111 | 0.015145
quant | 0.0007692 | 0.0052364 | -0.0407268 | 0.0291835
Constant | 8.712321*** | 0.0709611 | 10.41131*** | 0.2087039
lnalpha Constant | .5167651*** | 0.0132244 | 1.740095*** | 0.0162624
N. of cases | 10995 | | 7736 |
Composite
Variable | Real | Std.Err. | Fake | Std.Err.
Analytic | -.0022279** | 0.0008144 | -0.0017287 | 0.0027825
Authentic | -0.0005524 | 0.0005296 | .0050372* | 0.0023647
Tone | 0.0003695 | 0.0005288 | 0.0003993 | 0.0024906
Clout | .0025064** | 0.0009017 | -.0071897* | 0.0034598
power | -0.0040897 | 0.0022899 | -0.0187298 | 0.0106787
affiliation | -0.0074498 | 0.0044913 | -0.0127407 | 0.0156502
honesty | .020406** | 0.0076531 | 0.0209639 | 0.0289207
Constant | 9.315875*** | 0.0794741 | 10.55753*** | 0.298632
lnalpha Constant | .5263228*** | 0.0125837 | 1.751881*** | 0.0173387
N. of cases | 10995 | | 7736 |
* p<0.05, ** p<0.01, *** p<0.001
Table 10. Topics (T) and Proportions (P) for Fake and Real News
Order  State       Junksci     Hate        Satire      Fake        Conspiracy  BS          Bias        Real
       T     P     T     P     T     P     T     P     T     P     T     P     T     P     T     P     T     P
1      T_4   0.22  T_31  0.31  T_10  0.10  T_20  0.25  T_1   0.10  T_30  0.08  T_20  0.06  T_2   0.14  T_1   0.08
2      T_28  0.05  T_33  0.17  T_20  0.09  T_8   0.14  T_45  0.09  T_13  0.07  T_10  0.05  T_13  0.08  T_2   0.04
3      T_25  0.05  T_36  0.05  T_26  0.07  T_7   0.09  T_13  0.08  T_2   0.05  T_21  0.05  T_19  0.07  T_5   0.04
4      T_35  0.05  T_10  0.05  T_19  0.06  T_33  0.07  T_2   0.07  T_33  0.05  T_26  0.04  T_21  0.07  T_6   0.04
5      T_22  0.04  T_20  0.04  T_21  0.06  T_5   0.05  T_21  0.06  T_21  0.05  T_2   0.04  T_20  0.07  T_12  0.04
6      T_5   0.04  T_26  0.03  T_2   0.05  T_29  0.03  T_23  0.05  T_4   0.05  T_13  0.04  T_10  0.03  T_4   0.04
7      T_2…  0.0…  T_8…  0.0…  T_2…  0.0…  T_2…  0.0…  T_3…  0.0…  T_20  0.04  T_1…  0.0…  T_7…  0.0…  T_3…  0.0…
8      T_2   0.03  T_18  0.03  T_7   0.05  T_10  0.02  T_20  0.05  T_12  0.04  T_30  0.04  T_30  0.03  T_8   0.04
9      T_40  0.03  T_6   0.02  T_34  0.04  T_1   0.02  T_33  0.04  T_10  0.04  T_4   0.03  T_14  0.03  T_9   0.04
10     T_27  0.03  T_15  0.02  T_30  0.03  T_38  0.02  T_10  0.04  T_19  0.03  T_41  0.03  T_1   0.03  T_10  0.04
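Table 10 ranks, for each source category, the ten topics with the largest proportions. The sketch below shows one way such a ranking could be produced; it is not the authors' documented procedure. It assumes that each article has a topic-proportion vector, for example from a topic model such as LDA. It also assumes that a category's proportion for a topic is the mean of that topic's share across the category's articles. The function `top_topics`, the averaging rule, the 1-based `T_i` labels, and the toy data are all hypothetical.

```python
from collections import defaultdict


def top_topics(docs, k=10):
    """Rank topics per category by mean topic proportion (hypothetical rule).

    docs: iterable of (category, proportions), where proportions[i] is the
    share of topic i in one article and each vector sums to 1.
    Returns {category: [(label, mean_proportion), ...]}, largest first.
    """
    sums = {}
    counts = defaultdict(int)
    for category, props in docs:
        if category not in sums:
            sums[category] = [0.0] * len(props)
        for i, p in enumerate(props):
            sums[category][i] += p
        counts[category] += 1

    ranking = {}
    for category, totals in sums.items():
        means = [t / counts[category] for t in totals]
        order = sorted(range(len(means)), key=lambda i: means[i], reverse=True)
        # 1-based labels, assumed to mirror the T_1, T_2, ... naming in Table 10
        ranking[category] = [(f"T_{i + 1}", round(means[i], 2)) for i in order[:k]]
    return ranking


# Toy input with four topics; the values are invented for illustration
docs = [
    ("Satire", [0.10, 0.60, 0.30, 0.00]),
    ("Satire", [0.30, 0.40, 0.20, 0.10]),
    ("Real", [0.50, 0.10, 0.10, 0.30]),
]
print(top_topics(docs, k=2))
```

Averaging per category weights every article equally. A different weighting, for example by article length, would change the proportions and possibly the order.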
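The "lnalpha Constant" rows in Table 9 match the layout Stata uses for negative binomial count models, where alpha is the overdispersion parameter. The engagement models therefore appear to be negative binomial regressions. This is an inference from the output format; the tables do not state it. Assuming the common NB2 form:

- alpha = exp(lnalpha);
- the conditional variance is mu + alpha * mu^2;
- exp(b) is the multiplicative change in expected engagement per one-unit increase in a predictor.

The sketch below is minimal and uses values from the Honesty model. The helper names are hypothetical.

```python
import math


def alpha_from_lnalpha(lnalpha: float) -> float:
    """Overdispersion parameter recovered from its logged estimate."""
    return math.exp(lnalpha)


def nb2_variance(mu: float, alpha: float) -> float:
    """NB2 conditional variance mu + alpha * mu**2; alpha = 0 gives the Poisson case."""
    return mu + alpha * mu ** 2


def rate_ratio(coef: float) -> float:
    """exp(b): multiplicative change in the expected count per one-unit increase in x."""
    return math.exp(coef)


# lnalpha constants from the Honesty model in Table 9 (real, fake)
alpha_real = alpha_from_lnalpha(0.5257337)
alpha_fake = alpha_from_lnalpha(1.763488)
print(f"alpha: real={alpha_real:.3f} fake={alpha_fake:.3f}")

# Coefficient on 'shehe' for real news in the same model
print(f"rate ratio for shehe (real): {rate_ratio(0.0203399):.4f}")

# A hypothetical mean of 100 engagements, to compare the implied spread
for label, alpha in (("real", alpha_real), ("fake", alpha_fake)):
    print(label, round(nb2_variance(100.0, alpha)))
```

Under this reading, the larger alpha in the fake-news columns implies more widely dispersed engagement counts at any given mean than in the real-news columns.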