ARTICLE

Received 6 Jul 2015 | Accepted 2 Nov 2015 | Published 8 Dec 2015

The evolving perception of controversial movies

Luca Amendola1, Valerio Marra2 and Miguel Quartin3

ABSTRACT Polarization of opinion is an important feature of public debate on political, social and cultural topics. The availability of large internet databases of users’ ratings has permitted quantitative analysis of polarization trends; for instance, previous studies have included analyses of controversial topics on Wikipedia, as well as the relationship between online reviews and a product’s perceived quality. Here, we study the dynamics of polarization in the movie ratings collected by the Internet Movie Database (IMDb) website in relation to films produced over the period 1915–2015. We define two statistical indexes, dubbed hard and soft controversiality, which quantify polarized and uniform rating distributions, respectively. We find that controversy decreases with popularity and that hard controversy is relatively rare. Our findings also suggest that more recent movies are more controversial than older ones and we detect a trend of “convergence to the mainstream” with a time scale of roughly 40–50 years. This phenomenon appears qualitatively different from trends observed in both online reviews of commercial products and in political debate, and we speculate that it may be connected with the absence of long-lived “echo chambers” in the cultural domain. This hypothesis can and should be tested by extending our analysis to other forms of cultural expression and/or to databases with different demographic user bases.

DOI: 10.1057/palcomms.2015.38 OPEN

1 Institut für Theoretische Physik, Universität Heidelberg, Heidelberg, Germany
2 Departamento de Física, Universidade Federal do Espírito Santo, Vitória, Brazil
3 Instituto de Física, Universidade Federal do Rio de Janeiro, Rio de Janeiro, Brazil
Correspondence: (email: [email protected])

PALGRAVE COMMUNICATIONS | 1:15038 |DOI: 10.1057/palcomms.2015.38 |www.palgrave-journals.com/palcomms 1


Introduction

Polarization in public opinion has been the subject of many analyses in the past, due to its important implications in politics, social sciences, economy and marketing. The availability of large digital datasets has allowed in the last decade a quantitative real-world analysis that produced many new insights into how polarization begins and evolves, as well as on how cultural or demographic differences affect its phenomenology. This new area of research has also been accompanied by sophisticated mathematical models that simulate the exchange of opinions and reveal the statistical mechanisms that underlie opinion dynamics, as for instance in Castellano et al. (2009).

Most analyses have so far been devoted to polarization in regard to political themes, as a way of better understanding the origin and trends of possible conflicts among the citizens. Another area of research, of particular interest to the field of marketing, has developed around polarization of consumers’ opinions about commercial items, from restaurants to home goods. Automated searches and characterizations of controversial topics, such as those raised on social media, could allow politicians or marketers to effectively identify and address complaints or concerns regarding policies or products.

A less substantial body of research has been directed towards the quantitative assessment of polarizations in taste in relation to cultural products (such as films and literature) and their evolution with time. While the question of whether modern democratic societies are becoming more or less polarized on political issues, such as gun control or abortion, has been repeatedly addressed and hotly debated (DiMaggio et al., 1996; McCarty et al., 2006; Baldassarri and Bearman, 2007; Fiorina and Abrams, 2008; Fischer and Mattson, 2009; Garcia et al., 2015; Koutra et al., 2015), much less effort has been devoted to similar questions about cultural expression, as we discuss below. Controversiality can arise as a consequence of artistic innovation, or because a work of art explores sensitive topics such as religion, politics or ethics, or even simply because it appeals to a particular demographic section of society and not others. Kostelanetz (2000) claims in A Dictionary of the Avant-Gardes: “My basic measures of avant-garde work are esthetic innovation and initial unacceptability”. A high degree of controversy can then be seen as a sign of an innovative work of art, although of course it is by no means a sufficient condition.

This paper presents a contribution to this particular area by studying the evolution of controversy in the public perception of movies and its dependence on demographic factors. We employ the data collected by the Internet Movie Database (IMDb) website, where viewers rate movies from 1 (worst) to 10 (best). The IMDb database is a large and continuously updated catalogue that also includes demographic details related to age, gender and geographic origin. Using the data available we are able to formulate a mathematical rationalization of trends and, at the same time, offer perspectives and terminology that could inform future research. We expect that our methodology could be applied to other forms of expression, such as music and literature, and that the results of our enquiry could provide a suitable basis for psychological, sociological and philosophical evaluations of the issues at hand.

Terminology. In this study we identify two different kinds of controversiality: one in which the debate is polarized at the two extremes of the spectrum, and another in which opinions distribute evenly across it. Here, we will use the term “controversy” to refer to both these possibilities, and the term “polarization” only for the first kind of controversiality.

Literature review

The issue of controversy in the public debate has been most often analysed on an episodic basis. A casual search of the keyword “controversy” or related terminology in any journal database retrieves hundreds of papers dealing with particular controversies in any field of sciences or humanities. Some attempts at identifying common patterns in controversies arising in science or philosophy have been presented in the literature, notably starting with the famous work on the scientific revolutions by Kuhn (1962). Although limited to analyses of a few cases in specific fields, these works have nevertheless supported the idea that controversies might have universal features.

The advent of the internet, of digital databases and of social media has allowed researchers to analyse for the first time huge quantities of data related to all fields of human expression. Social media like Twitter and Wikipedia offer a fascinating opportunity for extensive research on automated identification of controversial topics, on their classification and on their common features. Akoglu (2014) and Mejova et al. (2014) analyse US online news outlets and political databases in search of reliable indicators of controversiality and of language features. Yasseri et al. (2014) investigate the multicultural aspects of Wikipedia by searching for the most controversial topics, identified as those that were subjected to “edit wars” between editors with different views. In Garimella et al. (2015) the authors focus on measuring the degree of controversy in Twitter conversations on particular topics, such as news items, with the aim of automating the task of finding and comparing controversial news. In Koutra et al. (2015) the controversial topic of gun control was selected and studied across websites in order to track and analyse reactions to a shocking news event. Additional studies on Twitter and Wikipedia controversies include those by Conover et al. (2011), Yardi and Boyd (2010), Rad and Barbosa (2012) and Pennacchiotti and Popescu (2010).

All these works deal with social media and, mostly, with political controversies or other topics related to news or issues of popular interest, such as sport. Clearly such opinions are heavily influenced by demography and by events like elections, sport results, conflicts or economic developments (McCarty et al., 2006; Fiorina and Abrams, 2008; Yardi and Boyd, 2010; Garcia et al., 2015). One expects controversies in art and, specifically, in movies to be less affected by such short-term events and, in particular, to feature different temporal trends. Controversy in cultural debate has been the subject of only limited quantitative studies. In Hu et al. (2009), the average distribution of ratings of books and DVDs on Amazon was found to follow a universal J-shaped distribution. The same pattern is implicit in the findings of Cai et al. (2013) among raters of musical items. Similar research in an online collaborative platform for scholarly projects instead found a rating distribution peaked at around 75% of the maximum score (Bell and Ippolito, 2011). While the study of the average distribution of ratings is not our prime focus, here we compare our results to these findings. Godes and Silva (2012) find a decrease of online ratings for books after initial reviews, attributed to different types of users who post reviews at different stages: fans first, casual readers later on; they report that the decrease seems to stabilize roughly one year after the first reviews. In Sun (2012a), the average and variance of book ratings are correlated to sales, and a positive correlation between variance and sales is observed when the average is below a certain threshold, perhaps because the rating scatter attracts the attention of potential readers.

Ratings have also been analysed in other contexts. Zhang et al. (2014) considered the temporal evolution of the average and the variance of online restaurant ratings over a period of 10 years. They find that while the average rating steadily increases over time, the variance correspondingly decreases. In Moe and Schweidel (2012) the authors find that frequent raters of home products tend to post worse and more varied ratings than casual raters, thereby increasing overall variance over time. These trends of mean and variances bear some similarities, but also important differences, with respect to our findings, as we will discuss in the next sections.

A few works performed a statistical study of film ratings, although not connected to controversiality. In what can be considered a pioneer work on social media ante litteram, Wanderer (1970) tests whether film ratings assigned by professional critics agree with those assigned by normal viewers, finding only marginal differences. Holbrook (2005) employs IMDb data to assess the same question, finding a similarly high correlation among ratings, but also detecting low correlation between movies recommended by professional critics and movie popularity (that is, being frequently reviewed by ordinary users). In Moon et al. (2010), the authors find significant correlations between film ratings in internet databases, both from ordinary viewers and from professional critics, and box-office revenues. Liu (2006) finds that movie audiences are more critical just after a movie is released, and that online activity correlates well with box-office revenues.

The database we employ in the current work, IMDb, has been previously analysed from different perspectives. Kostakos (2009) compares IMDb with other sources of ratings and discusses the rating bias induced by the website design. Hoßfeld et al. (2011) analysed 2 million movie ratings and argued that substantial information could be gathered by studying the standard deviations of these ratings (in contrast to just the means). Koh et al. (2010) find that online ratings in IMDb and in a similar Chinese website, douban.com, represent the underlying perceived quality of movies in a way that differs among cultures (United States, China and Singapore). US citizens are more likely to underreport and, therefore, more likely to produce an average score that is different from the perceived average. US citizens also seem more likely to give ratings that are closer to the minimum or maximum vote than previous ones, while Chinese users do the opposite. They find their result to be in agreement with expectations based on the “individualism-collectivism” and “long-term orientation” axes of Hofstede’s cultural dimensions theory (Hofstede, 2001). Otterbacher (2013) finds a clear linguistic difference in film reviews written by males and females and also in their degree of perceived usefulness to other users. To conclude, the two main issues investigated in the present study, controversiality in the movie rating distributions and the long-term time trends of ratings assigned to the same movies, appear not to have been investigated in the past.

Methods

It is not obvious how to characterize controversiality on a quantitative basis. Here, we propose two basic ways. One is when a movie gets many very positive and many very negative ratings, a so-called “love-hate movie”. We term this “hard” controversiality. The other occurs when a movie splits opinions across a broader spectrum, generating discord among voters. In the extreme case, this leads to a roughly equal distribution of votes across all possible ratings. We term this phenomenon “soft” controversiality.1

We define the indexes H and S as normalized measures of hard and soft controversiality, respectively: H is unity only for a completely polarized rating distribution of 1s and 10s, while S is unity only when each rating is assigned by 1/10 of the users. Although both H and S vanish for unanimous distributions, we identify the least controversial movies with the movies with lowest H. Indeed, as discussed below, H is built upon the concept of variance, and unanimity is connected with the state of lowest variance. Low H, high S and high H characterize the three broad classes into which distributions defined in a small range of values (ratings 1–10) will typically fall: peaked (i.e. a single, prominent peak), flat or polarized (i.e. two well separated peaks), respectively.

Mathematically, we define H as a normalized standard deviation:

\[ H = \frac{1}{c_H} \left[ \sum_{i=1}^{10} p_i \,(r_i - \bar{r})^2 \right]^{1/2} \qquad (1) \]

where $p_i = v_i/N$, $v_i$ is the number of votes of the rating $r_i = i$, $N$ is the total number of votes,

\[ \bar{r} = \sum_i p_i r_i \qquad (2) \]

is the average rating, and $c_H^2$ is the highest possible variance. The largest value $c_H = 4.5$ is obtained if half the ratings are 1 and half are 10 (a completely polarized distribution). The use of the variance to characterize polarization has been advocated in the past by several authors (DiMaggio et al., 1996; Baldassarri and Bearman, 2007; Hu et al., 2009; Hoßfeld et al., 2011; Zhang et al., 2014). Note that, although high H can indicate bimodality, that is, two peaks not necessarily located at the boundaries of the distribution, H is not optimized for this role (for a comparison with bimodality estimators see Supplementary Information: comparison with estimators of bimodality).

Soft controversiality S in turn can be defined as

\[ S = 1 - \frac{1}{c_S} \left[ \sum_{i=1}^{10} (p_i - 0.1)^2 \right]^{1/2} \qquad (3) \]

which is built from the square root of a χ² statistic relative to the flat distribution with $p_i = 0.1$, normalized with $c_S = \sqrt{0.9}$ in such a way that it vanishes if all votes are given to a single rating $r_i$. To our knowledge, no estimator similar to S has been introduced before in the context of rating distributions.
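As an illustration of equations (1) to (3), the following minimal Python sketch computes H and S from the ten vote counts of a single movie. The function name `controversiality` and the example vote counts are assumptions introduced here for illustration; they are not taken from the IMDb data.

```python
import math

RATINGS = range(1, 11)     # r_i = i for i = 1..10
C_H = 4.5                  # largest possible standard deviation (half 1s, half 10s)
C_S = math.sqrt(0.9)       # makes S vanish when all votes go to a single rating


def controversiality(votes):
    """Return (H, S) for ten vote counts v_1..v_10 (hypothetical helper)."""
    total = sum(votes)
    p = [v / total for v in votes]
    mean = sum(pi * ri for pi, ri in zip(p, RATINGS))                 # eq. (2)
    variance = sum(pi * (ri - mean) ** 2 for pi, ri in zip(p, RATINGS))
    hard = math.sqrt(variance) / C_H                                  # eq. (1)
    soft = 1.0 - math.sqrt(sum((pi - 0.1) ** 2 for pi in p)) / C_S    # eq. (3)
    return hard, soft


# Made-up vote counts for the three limiting cases discussed in the text.
polarized = [500, 0, 0, 0, 0, 0, 0, 0, 0, 500]
flat = [100] * 10
unanimous = [0, 0, 0, 0, 0, 0, 1000, 0, 0, 0]

for name, votes in [("polarized", polarized), ("flat", flat), ("unanimous", unanimous)]:
    h, s = controversiality(votes)
    print(f"{name:10s} H = {h:.2f}  S = {s:.2f}")
```

Up to rounding, the polarized case gives H = 1, the flat case gives S = 1 with H of about 0.64, and the unanimous case gives H = S = 0, in line with the normalizations described above.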

Results

We then evaluate H and S for each of the feature movies listed in IMDb, from 1915 to 2014 (our data was collected in January 2015), with a breakdown in a few demographic categories. To avoid fluctuations arising from small-number statistics, we consider only movies with at least N = 1000 ratings.2 There are 19,017 feature movies with this many ratings, with almost 440 million individual ratings. This represents, for instance, a factor of four more ratings than the data used in Kostakos (2009) and two orders of magnitude more than in Hoßfeld et al. (2011). Figure 1 shows the number of movies as a function of the release year, while the average rating distribution for all movies and various demographic subgroups is given in Fig. 2 (for a comparison of this distribution with other online review distributions see Supplementary Information: the average rating distribution). The average rating is approximately 7, independent of the demographic subgroup considered.

We show in Fig. 3 the distributions of H (top panel) and S (bottom panel). In each plot, we show separately the distributions of movies with votes in the range 1,000 ≤ N < 50,000 (1–50k case) and N ≥ 50,000 (50k+ case); see Supplementary Fig. 2 for the overall distributions. The average H for the 1–50k case is H = 0.45 and for the 50k+ case is H = 0.40. The average S for the 1–50k case is S = 0.72 and for the 50k+ case is S = 0.69. These averages already show that controversy decreases with the number of votes N, which we take as an estimator of the “popularity” of a movie. The actual trends of H and S with respect to N are shown in Supplementary Fig. 2.

Figure 1 | Number of movies as a function of film release year. Note: Only the 19,017 IMDb movies with more than 1000 ratings (as of January 2015) are considered. Axes: number of movies versus year.






The distributions of votes for the twenty most controversial movies are shown in Fig. 4 (top and middle panels). As expected, the rating distribution of movies with a high value of H is strongly polarized, while movies with a high value of S have a very flat distribution and all ratings are similarly represented. The distributions of the 20 least controversial movies (lowest H) are shown in the bottom panel. Note that these low-controversy movies are not just unanimously bad or good: the votes are instead mostly peaked around 7 or 8, with an approximately Gaussian distribution. Therefore, it appears as if the “concordance vote” is around 7 or 8: people do not seem to be able to agree on, say, a rating of 3. A peak in the ratings around 75% of the highest score has also been reported in Bell and Ippolito (2011) and, on a more episodic basis, in Sun (2012b). We find not only that both the mean and the mode of the aggregate rating distribution are around 7–8 (see Fig. 2) but also that movies with this average rating are those with the tightest consensus. The correlation between H, S and average rating $\bar{r}$ is shown in Fig. 5. Good and bad movies are clustered in adjacent regions. Movies with an intermediate average rating (those that, as explained below, can have highest H and S) are clustered along a boomerang-like region of the parameter space. The fact that this region defines a non-invertible relation between H and S shows the complementarity of the two indexes of controversiality.

A flat distribution may be interpreted as a very broad-peaked distribution. This situation of “no-consensus” corresponds to $H_{\rm flat} = 0.64$ (and of course S = 1). Therefore, peaked distributions can only produce controversy indexes in the range $0 \le H \le H_{\rm flat}$ and we adopt $H_{\rm flat}$ as the threshold for polarized distributions. The first interesting result we find is that high H controversiality is rare. There are 606 movies (3.6%) with $H > H_{\rm flat}$ for the 1–50k case and only 5 movies (0.24%) for the 50k+ case. To make a comparison, we consider the symmetric value of H around the middle value of 1/2: $H_{\rm low} \equiv 1 - H_{\rm flat} = 0.36$. We find 1,765 movies (10%) with $H < H_{\rm low}$ for the 1–50k case and 605 movies (29%) for the 50k+ case.
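The two thresholds quoted above follow directly from the definition of H. The short Python sketch below recomputes them and applies them to a value of H; the helper names and the three labels are hypothetical and only serve the illustration.

```python
import math

C_H = 4.5  # largest possible standard deviation of ratings on a 1-10 scale


def hard_controversiality(p):
    """H of equation (1) for a probability vector p over ratings 1..10."""
    ratings = range(1, 11)
    mean = sum(pi * ri for pi, ri in zip(p, ratings))
    variance = sum(pi * (ri - mean) ** 2 for pi, ri in zip(p, ratings))
    return math.sqrt(variance) / C_H


H_FLAT = hard_controversiality([0.1] * 10)  # flat distribution, about 0.64
H_LOW = 1.0 - H_FLAT                        # symmetric value around 1/2, about 0.36


def label(h):
    """Hypothetical three-way labelling based on the two thresholds."""
    if h > H_FLAT:
        return "polarized"
    if h < H_LOW:
        return "low controversy"
    return "intermediate"


print(f"H_flat = {H_FLAT:.2f}, H_low = {H_LOW:.2f}")
print(label(0.80), "|", label(0.50), "|", label(0.20))
```

The flat distribution has variance 8.25, so its standard deviation is about 2.87 and dividing by 4.5 gives the quoted value of 0.64.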

Figure 2 | Average rating distribution for all movies. Note: Black: all movies. Dashed grey: all movies, binned according to a 5-star system. Red: only United States of America users. Dark blue: only males. Green: only females. Light blue: only age group 18–29. For the 5-star system, we divide the ratings by 2 and round up, and we also show half the probability for better comparison with the other distributions. Axes: frequency versus rating ($r_i$).

Figure 3 | Frequency histogram of controversiality. Note: Top: histogram of frequencies of H. The red vertical line marks the threshold of high controversiality $H_{\rm flat}$. Bottom: histogram of frequencies of S. Legend: 1–50k, 50k+.

Figure 4 | High- and low-controversy rating distributions. Note: Top: Distribution of ratings for the 20 movies with highest hard controversy. In Fig. 5 these movies correspond to the first 20 points from the right. Middle: Same for soft controversy. In Fig. 5 these movies correspond to the first 20 points from the top. Bottom: Same for the 20 least controversial movies (lowest H). In Fig. 5 these movies correspond to the first 20 points from the left.

Figure 5 | Correlation of the hard controversy index H against the soft controversy index S. Note: Movies are represented by dots which are coloured according to their average rating $\bar{r}$.

It is interesting to note that if one assumes that each rating is composed of a linear combination of independent subcategories (the quality of the actors, screenplay, photography and so on), each with a particular distribution, then the Central Limit Theorem would imply that the final rating distribution should be well approximated by a Gaussian. Under such an assumption, polarized (high H) movies are characterized by a failure of this scenario, with a few subcategories dominating the overall vote or strongly correlated with each other. This issue could be investigated with more detailed data as part of future analysis.
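The Central Limit argument above can be illustrated with a toy calculation. The sketch below is not part of the analysis in this paper: it assumes, purely for illustration, that a rating is the equal-weight mean of k independent and identically distributed sub-scores on the 1–10 scale, rounded half up to an integer rating, and it deliberately starts from a fully polarized sub-score distribution. All function names are hypothetical.

```python
import math

C_H = 4.5


def convolve(a, b):
    """Discrete convolution of two probability lists."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def rating_pmf_from_subscores(sub_pmf, k):
    """PMF over ratings 1..10 of the rounded mean of k i.i.d. sub-scores."""
    total = [1.0]
    for _ in range(k):
        total = convolve(total, sub_pmf)        # total[j] = P(sum of sub-scores = k + j)
    pmf = [0.0] * 10
    for j, prob in enumerate(total):
        rating = math.floor((k + j) / k + 0.5)  # round half up to an integer rating
        pmf[rating - 1] += prob
    return pmf


def hard_controversiality(p):
    ratings = range(1, 11)
    mean = sum(pi * ri for pi, ri in zip(p, ratings))
    variance = sum(pi * (ri - mean) ** 2 for pi, ri in zip(p, ratings))
    return math.sqrt(variance) / C_H


# Extreme assumption: every sub-score is either 1 or 10 with equal probability.
polarized_sub = [0.5] + [0.0] * 8 + [0.5]
h_values = [hard_controversiality(rating_pmf_from_subscores(polarized_sub, k))
            for k in (1, 2, 6)]
for k, h in zip((1, 2, 6), h_values):
    print(f"k = {k}: H = {h:.2f}")
```

Under these assumptions H falls from 1 for a single sub-score to about 0.71 for two and about 0.41 for six, that is, below the flat-distribution threshold of 0.64. A movie that stays polarized would then require a few dominant or strongly correlated subcategories, as argued in the text.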

Release year trends. As indicated earlier, both H and S decrease with growing N. This seems to tell us that either controversial movies do not become very popular or that they lose their controversiality as they do so. The opposite possibility, that controversiality induces a widespread debate that in turn leads to increased popularity (of the kind apparently detected in Sun, 2012a, for low-rated books) is not supported by our findings; perhaps this effect, if present at all, is lost among the other factors of popularity, such as advertisement, language, genre and so on. For more details and a summary plot of the relevant correlations among parameters see Supplementary Information: correlations.

Our database spans a full century of movies and can be further partitioned into various demographic categories according to age group, gender and nationality (see Table 1). Figure 6 shows how the statistics $\bar{r}$, H and S vary with movie release year for the various categories. Several conclusions can be drawn from these results. First, older movies have a substantially higher average rating than more recent movies (see Fig. 6, top panel). A straightforward explanation is that old movies are watched again and rated only if they are good. It is interesting to note (perhaps surprisingly in view of previous studies that showed marked cultural or gender-related differences; Koh et al., 2010; Otterbacher, 2013) that this trend is robust against age, gender and geographic origin of the voters: the slope does not change significantly when restricting the analysis to a given demographic subset. The effect of popularity is instead well visible in all plots and implies a marked correlation with N (for more details see Supplementary Information: correlations).

In regard to trends of the controversiality indexes H and S, as already mentioned, a movie can have a high H or S only if $\bar{r}$ is not far from 5.5 (otherwise the distribution can be neither polarized nor flat). Therefore, we consider a bin $4 < \bar{r} < 7$ centred around 5.5 in order to remove any correlation of H and S with $\bar{r}$. The trend of H and S shows that controversy increases with release year. This result also remains true after organizing the data according to gender, age group or geographic origin of the votes, and is stronger for popular movies when analysing separately the 1–50k and 50k+ cases. All these trends are shown in the last two panels of Fig. 6.

We also find that, compared to the case “all” of Fig. 6, if one considers separately movies produced before 1990 there is little change in the trends relative to $\bar{r}$ and H, and only a moderate increase in slope of the S trend. Therefore, the ever-increasing rate of movie production (see Fig. 1) does not have a significant impact on the release year trends studied in this section.

We note that a source of bias can potentially come from users who try to rig the votes in some way. The IMDb employs a non-disclosed algorithm to minimize the impact of these unrepresentative votes, and the final rating of a movie is obtained from a weighted mean, in which suspicious votes count for less (IMDb, 2015). Therefore, large values of the estimator D = |weighted mean − arithmetic mean| should signal movies with rigged rating distributions. We tested the effect of discarding all movies with D ≥ 1 and concluded it has little impact on the trend of H and S with respect to release year, as can be seen in Fig. 6.
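A minimal Python sketch of this quality cut is given below. Because the IMDb weighting algorithm is not disclosed, the weighted mean is treated as an input (the rating published on the site); the function names and the example numbers are hypothetical.

```python
def arithmetic_mean(votes):
    """Plain average rating from ten vote counts v_1..v_10."""
    total = sum(votes)
    return sum(rating * count for rating, count in enumerate(votes, start=1)) / total


def rigging_indicator(weighted_mean, votes):
    """D = |weighted mean - arithmetic mean|."""
    return abs(weighted_mean - arithmetic_mean(votes))


def passes_quality_cut(weighted_mean, votes, threshold=1.0):
    """Keep a movie unless D >= threshold (movies with D >= 1 are discarded)."""
    return rigging_indicator(weighted_mean, votes) < threshold


# Made-up examples: a suspicious pile-up of 1s and 10s, and an ordinary movie.
suspicious = [600, 0, 0, 0, 0, 0, 0, 0, 0, 400]
ordinary = [0, 0, 0, 0, 0, 100, 300, 400, 150, 50]

print(rigging_indicator(6.1, suspicious), passes_quality_cut(6.1, suspicious))
print(rigging_indicator(7.7, ordinary), passes_quality_cut(7.7, ordinary))
```

In the first example the arithmetic mean is 4.6 against an assumed published rating of 6.1, so D is about 1.5 and the movie would be discarded; in the second D is about 0.05 and the movie is kept.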

In summary, we find that the trends of $\bar{r}$, H and S with release year are robust against different binning strategies and against demographic and quality cuts. All the linear regressions shown in Fig. 6 feature a slope that differs from zero at more than 20σ confidence level, which corresponds to a P-value of $10^{-88}$. The only exception is the case 50k+, which, due to the smaller sample, has a slope that differs from zero at 7σ confidence level (P-value of $10^{-12}$). We take our reference sample to be composed of the 9,566 movies with $4 \le \bar{r} \le 7$, 1–50k number of votes, quality cut D ≤ 1 and votes from all demographics together. The corresponding linear regressions are:

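\[
\begin{aligned}
\bar{r} &= 6.5 - 0.014\,(\mathrm{year} - 2000) \\
H &= 0.48 + 0.0013\,(\mathrm{year} - 2000) \\
S &= 0.77 + 0.00087\,(\mathrm{year} - 2000)
\end{aligned}
\qquad (4)
\]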
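For a sense of scale, the regressions of equation (4) can be evaluated at a few release years. The sketch below simply codes the three fitted lines; the function name `release_year_trend` is hypothetical, and the lines should only be read within the fitted range of release years (1915–2014) and for the reference sample.

```python
def release_year_trend(year):
    """Evaluate the linear regressions of equation (4) at a given release year."""
    dt = year - 2000
    return {
        "r_mean": 6.5 - 0.014 * dt,   # average rating
        "H": 0.48 + 0.0013 * dt,      # hard controversiality
        "S": 0.77 + 0.00087 * dt,     # soft controversiality
    }


for year in (1950, 1980, 2010):
    trend = release_year_trend(year)
    print(year, {name: round(value, 3) for name, value in trend.items()})
```

According to these fits, a movie released in 1950 has on average a rating about 0.7 higher and an H about 0.065 lower than a movie released in 2000.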

Table 1 | Number of movies and total number of ratings (in millions) of the various analyses and subcategories of Fig. 6

                                    All      The United States   Males    18–29    1–50k    50k+     D ≤ 1
Analysis of $\bar{r}$    films      19,017   19,017              19,017   19,017   16,924   2,093    18,855
                         ratings    437M     93M                 303M     173M     123M     314M     436M
Analyses of H and S      films      10,540   10,753              11,035   9,927    9,651    889      10,455
                         ratings    161M     34M                 116M     55M      70M      92M      161M

Figure 6 | Release year trends. Note: Linear regression trends of $\bar{r}$ (top panel), H (middle) and S (bottom) as a function of film release year. In the case of H and S only movies with $4 < \bar{r} < 7$ have been considered. Black: all movies (with votes from all demographics), including error bars representing the error of the mean. Red: only US users. Dark blue: only males. Light blue: only age group 18–29. Green: only movies with less than 50,000 votes. Purple: only movies with more than 50,000 votes. Dotted orange: only movies satisfying the quality cut D < 1. See Table 1 for the number of movies and the total number of ratings of the various subcategories.






Evolution with respect to time of observation. In the previous section we analysed how average ratings and hard and soft controversiality depend on the movie release year p.3 Here we consider how the parameters $\bar{r}$, H and S depend on the time t of observation, that is, of data collection. In order to perform a robust test of these behaviours, we need two surveys well separated in time. We thus make use of preliminary data collected in January 2013, almost exactly two years before the main data set used in this work. These preliminary data were limited to movies produced between 1950 and 2012 and with more than 5,000 votes, for a total of 6,030 movies, with no breakdown in demographic subgroups.

This analysis is particularly important as far as our subject matter is concerned for it is not obvious how to interpret the fact that old movies appear to be less controversial than recent ones. This might be either because the film industry is producing more controversial movies (the sophistication scenario)4 or because movie-goers are likely to find a film more controversial when it is first produced, but their strong love/hate feelings fade over time (the convergence scenario). A combination of both scenarios is of course also possible. This reflects the eternal debate about “modern art” (Gans, 1974; Siegel, 1982; Levine, 1988; Kuspit, 1991; Newman, 2009): is art becoming more and more abstruse and, therefore, controversial, or are people more likely to accept an artistic avant-garde when they are temporally distanced from it? In the first scenario, the H index obtained in a survey performed at the time t relative to movies produced in year p (we call this quantity $H_p(t)$) remains constant when changing t. In the second scenario, $H_p(t)$ decreases when increasing t. The same should apply to S. As we show below, there is evidence that the convergence scenario is favored over the sophistication scenario.

In Fig. 7 we show the change of the parameters $\bar{r}$, H and S as a function of the parameter itself for each of the 6,030 movies mentioned above. A clear correlation is evident, which suggests a simple linear model for the average evolution in t of the parameters:

\[ \frac{dX_p}{dt} = -\frac{1}{\tau_X}\,(X_p - X^{*}) \qquad (5) \]

where X represents the value of $\bar{r}$, H or S at time t as due to all votes cast before t. In other words, we are dealing with the temporal evolution of integrated quantities. A positive value of $\tau_X$ means that the evolution converges to $X^{*}$. The characteristic time $\tau_X$ represents the time scale for such evolution.
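One simple way to estimate $\tau_X$ and $X^{*}$ from two surveys is to replace the derivative in equation (5) with a finite difference and fit a straight line to the rate of change against X: the slope is then $-1/\tau_X$ and the intercept is $X^{*}/\tau_X$. The Python sketch below follows this route with an ordinary least-squares fit. It is an assumption about how such a fit could be done, not necessarily the procedure used for Table 2, and the synthetic numbers are illustrative only.

```python
def fit_relaxation(x_old, x_new, dt):
    """Fit (x_new - x_old)/dt = -(x_old - x_star)/tau; return (tau, x_star)."""
    n = len(x_old)
    rates = [(new - old) / dt for old, new in zip(x_old, x_new)]
    mean_x = sum(x_old) / n
    mean_rate = sum(rates) / n
    sxx = sum((x - mean_x) ** 2 for x in x_old)
    sxy = sum((x - mean_x) * (r - mean_rate) for x, r in zip(x_old, rates))
    slope = sxy / sxx                         # equals -1/tau
    intercept = mean_rate - slope * mean_x    # equals x_star/tau
    tau = -1.0 / slope
    x_star = intercept * tau
    return tau, x_star


# Synthetic, noise-free example with made-up parameters (not those of Table 2).
TRUE_TAU, TRUE_X_STAR, DT = 45.0, 0.35, 2.0
h_first = [0.30, 0.40, 0.50, 0.60, 0.70]                            # first survey
h_second = [h - DT * (h - TRUE_X_STAR) / TRUE_TAU for h in h_first]  # two years later

tau, x_star = fit_relaxation(h_first, h_second, DT)
print(f"tau = {tau:.1f} years, X* = {x_star:.2f}")
```

With real data the scatter visible in Fig. 7 would translate into uncertainties on both parameters, which this sketch does not attempt to estimate.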

A fit to the data gives the values listed in Table 2, from which one concludes that hard and soft controversiality of movies do indeed converge to low non-controversial values (compare $H^{*}$ and $S^{*}$ with the distributions of Fig. 3) with very similar characteristic times of approximately 40–50 years.5 After this period, a controversial movie is likely to be classified as fully mainstream. The average rating converges towards 5 with a significantly longer characteristic time. Interestingly, this makes the evolution of H and S become independent of $\bar{r}$ for, as discussed before, the controversiality indexes are uncorrelated with the average rating if the latter takes central values. The statistical significance of the characteristic times is always very high, as one can infer from the errors quoted in Table 2.

Three important temporal factors could affect the voting trends in recent decades. First, movies released more recently have had a shorter voting time than earlier movies. Furthermore, since IMDb launched in 1990, only movies released after this date had a chance of being voted for when first screened, when one expects movies to be discussed more, except for famous classic films. Finally, internet usage has increased dramatically and continuously since IMDb was launched. Within our data this is demonstrated by the fact that the average number of ratings has grown significantly with respect to the release year (see Supplementary Information: Correlations). In order to study how important the consequences of these trends are, we show in Table 2 the characteristic times for movies produced before and after 1990. We obtain similar parameters regarding S and $\bar{r}$ but 20% different characteristic times as far as H is concerned. A more thorough assessment of these effects would involve analysing the evolution of non-integrated estimators. This can form the basis of future research.

Then we studied how the characteristic times depend on the number of votes N. Table 2 shows the characteristic times of $\bar{r}$, H and S for two non-overlapping bins in popularity and Fig. 8 shows the characteristic times for movies with number of votes greater than a given number in the x-axis. A clear trend is evident: the controversiality of popular movies decreases faster than that of less popular movies. This is in agreement with the findings in Fig. 6, where very popular movies are seen to show a faster rise in controversiality with release year than less popular ones. This could imply that movies that are more “discussed” reach consensus more rapidly, that is, converge faster to a peaked distribution. This seems to support a scenario of “convergence through interaction”, as often studied in social dynamics (Baldassarri and Bearman, 2007; Castellano et al., 2009), in contrast to the opposite pattern of the so-called “echo chamber” effect (Jamieson and Cappella, 2010), that is, strong interaction only within like-minded communities. The evolution of soft controversy seems to be more sensitive to popularity than hard controversy. Finally, the same analysis relative to $\bar{r}$ shows that the characteristic time increases, suggesting that popular movies have a more stable average rating.

Figure 7 | Evolution of average rating and controversiality. Note: Change of $\bar{r}$, H and S between 2013 and 2015 (2 years) as a function of $\bar{r}$, H and S, respectively, for the 6,030 movies produced between 1950 and 2012 with more than 5,000 ratings. The red dashed line represents a linear regression to the data. The bins in the plots are color coded blue-to-yellow according to how many movies they contain.

As we have seen, the fitted parameters depend on the movie popularity, which in turn depends on t. Nonetheless, as the dependence is not very strong we can treat τ_X as constant and integrate the previous equation to obtain the following approximate exponential behaviour:

$$X_p(t) - X_* = (X_0 - X_*)\, e^{-(t - t_0)/\tau_X} \qquad (6)$$

where X_0 = X_p(t_0).

The results outlined here support the idea that while all movies are created on average equal (i.e. the distribution of the initial amount of controversiality is almost independent of release year), the subsequent evolution de-polarizes the audiences. As a consequence, today older movies appear less controversial than modern ones. It is interesting to note that previous works on online rating trends reported either an increase in mean and a decrease in variance (Zhang et al., 2014), or vice versa (Moe and Schweidel, 2012). Our findings, that the mean converges to the central value of the spectrum and at the same time the variance decreases (though with a different characteristic time), a phenomenon that we term “convergence to mainstream”, appear to be novel.
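As a rough numerical illustration of equation (6), the sketch below evaluates the exponential relaxation for hard controversiality using the all-movie fit values from Table 2 (τ_H = 45 years, H* = 0.21). The starting value H_0 = 0.60 and the function name `relax` are hypothetical, chosen only to show how the gap to the asymptotic value shrinks by a factor e every τ_H years.

```python
import math

def relax(x0, x_star, tau, elapsed):
    """Equation (6): value after `elapsed` years, starting from x0 at t0."""
    return x_star + (x0 - x_star) * math.exp(-elapsed / tau)

TAU_H, H_STAR = 45.0, 0.21   # Table 2, column "all"
H0 = 0.60                    # hypothetical initial hard controversiality

for years in (0, 10, 45, 90):
    print(years, round(relax(H0, H_STAR, TAU_H, years), 3))

# Time for the gap (H - H*) to halve, which follows from equation (6).
half_life = TAU_H * math.log(2)
```

With these inputs the half-life of the gap is about 31 years, which is the sense in which the convergence takes place over several decades.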

Future work

One of our main results is that controversial films slowly lose their controversiality as time passes, converging to mainstream on a time scale of 4–5 decades in a way described by an exponential decay. Although this simple model fits the data well, it might be the case that the true governing law is different. Since we have only used two datasets, collected in January 2013 and January 2015, we cannot elaborate further on this issue. One would need data collected at further epochs to get a more refined understanding of the convergence. Another limitation is that since our 2013 survey did not include demographic data we cannot estimate the dependence of the characteristic times on different user categories.

Further future research could involve obtaining data that includes the epoch of each vote. This would allow us to study the evolution of instantaneous, rather than integrated, parameters (H, S and r), which would prove helpful in order to understand the underlying dynamics of patterns of controversiality in movies.

Our results, like all those based on online ratings or reviews, may also be limited by underreporting biases (Hu et al., 2009) and other unknown biases inherent in every survey (Schuman and Scott, 1987). It must be clear that here we can only discuss the controversiality among the IMDb raters and not of the general public at large; still, invoking the same argument as in Schuman and Scott (1987), we might expect that the changes, rather than the absolute values, within the same group are a robust indicator of public opinion.

As Koh et al. (2010) identified, US users in IMDb strongly underreport their opinions, distinctively more so than Chinese users of douban.com. Since we find no strong difference between US users and non-US users in our IMDb data, this may imply that all IMDb users have similar behaviour (which in itself merits further investigation). Koh et al. (2010) also claim that other websites based on a recommendation system (like MovieLens) and not purely on online reviewing are less subject to underreporting bias. Therefore, it would be valuable to conduct a similar study on data from other movie review sites.

Table 2 | Fits of the parameters of equation (5) for all movies, and for two bins in popularity and release year

| Parameter | All | 5·10^3 ≤ N < 5·10^4 | 5·10^4 ≤ N < 10^6 | 1950 ≤ yr < 1990 | 1990 ≤ yr < 2013 |
|---|---|---|---|---|---|
| Films | 6,030 | 4,627 | 1,403 | 1,439 | 4,591 |
| τ_r (years) | 120 ± 5 | 113 ± 5 | 170 ± 20 | 110 ± 7 | 109 ± 5 |
| r* | 4.7 ± 0.5 | 4.8 ± 0.5 | 4 ± 1 | 5.6 ± 0.8 | 4.7 ± 0.5 |
| τ_H (years) | 45 ± 1 | 42 ± 1 | 41 ± 2 | 37 ± 2 | 48 ± 1 |
| H* | 0.21 ± 0.02 | 0.24 ± 0.02 | 0.19 ± 0.03 | 0.25 ± 0.03 | 0.20 ± 0.02 |
| τ_S (years) | 42 ± 2 | 42 ± 2 | 33 ± 2 | 41 ± 2 | 43 ± 2 |
| S* | 0.60 ± 0.05 | 0.61 ± 0.05 | 0.59 ± 0.07 | 0.60 ± 0.07 | 0.59 ± 0.06 |

Figure 8 | Dependence of characteristic time on number of votes N. Note: Characteristic times (in years) for the set of movies with number of votes greater than the number in the x-axis (log10 N). Note that for large values of N the slope of the correlation relative to r shown in Fig. 7 is consistent with zero; therefore, the corresponding very large values of the characteristic times are not shown in the plot.

Conclusions

In the present work we address the issue of controversiality of films from a quantitative and statistical point of view. We find that old movies are significantly better rated than more recent ones, and popular movies are significantly better rated than less popular ones. In addition, we find that controversy decreases with popularity and that hard controversy is relatively rare: only 3.6% of the movies are above the value that separates peaked distributions from polarized distributions. Furthermore, we find that modern movies are judged to be more controversial than old ones and, at the same time, we find a trend of “convergence to mainstream” with a time scale of roughly 40–50 years. A decrease in the characteristic time scale with popularity seems to support a scenario of convergence through interaction. We also find that the average rating converges to the central value of the spectrum with a longer time scale of 100–150 years. These results have been found to be robust with respect to demographic breakdown (whenever we could perform such breakdown). Finally, we observe that the overall distribution of votes does not follow a simple J or U-shape (see Supplementary Information: the average rating distribution), as other online review systems do (Hu et al., 2009; Koh et al., 2010). Instead, it exhibits a double edge-peaked trimodal distribution, resembling a JV shape, which is also universal among demographic subgroups. The causes behind this unusual shape merit further analysis.

The history of art, and of culture in general, has witnessed the phenomenon in which avant-gardes initially considered controversial have over time become accepted as part of the mainstream (unless they disappear without leaving significant traces). It is worth noting the many derisive epithets initially employed against avant-gardes (for example, impressionism, fauvism, constructivism, cubism) that are today the mainstay of the most important museums of modern art. Similar processes also seem to run through scientific debate, where consensus ultimately often emerges after years or decades of heated discussion, as in the famous controversy over the foundations of quantum mechanics between, among others, Einstein and Bohr, or the debate of early twentieth century astronomy on the nature of nebulae.6 Our results, although limited to a particular form of expression, movies, might be seen to be a confirmation of these phenomena. In contrast, political or ethical debate is often characterized by increasing polarization (or ideological radicalization, as is often denoted in this context) (Sunstein, 2002; Garcia et al., 2015; Koutra et al., 2015), although the extent of this process has been debated in, for instance, DiMaggio et al. (1996), Fiorina and Abrams (2008) and Fischer and Mattson (2009). When radicalization occurs, it has sometimes been attributed to the “echo chamber” effect, a reinforcing of previous opinion through interaction only within close and homogeneous communities. One can then hypothesise that echo chambers are harder to build or to maintain in the artistic or cultural realm than in the political or ethical one, where organized groups might have a great and direct advantage in polarizing their audiences. In view of a broader discussion of such speculative arguments, the extension of the quantitative analysis of controversiality to other online databases with different demographic user bases and other forms of cultural expression such as musical and literary compositions appears to be a pressing task.

Notes

1 Similar controversy patterns are found in other areas (Dascal, 1998; Tarrow, 2008).

2 This threshold refers to the full dataset; demographic subgroups can have fewer votes.

3 In the previous sections we used “year” to denote p. We will use “year” and p interchangeably.

4 As pointed out by Douglas Adams in The Hitchhiker’s Guide to the Galaxy: The Restaurant at the End of the Universe, the third phase of every major galactic civilization is sophistication.

5 “Au théâtre, comme dans tous les arts, il y a les gens qui voient et les aveugles-nés. Il faut toute une vie à ces derniers pour s’habituer aux grandes choses et ce n’est qu’après avoir entendu rabâcher pendant cinquante ans: ceci est beau, qu’ils se rendent au jugement des autres” (“In the theatre, as in all the arts, there are people who see and people who are born blind. The latter need a whole lifetime to get used to great things, and only after hearing it repeated for fifty years that this is beautiful do they yield to the judgement of others”). A 1923 quote from French theatre director Charles Dullin, cited in Bishop (1964).

6 Many other scientific case-studies are discussed in, for example, Machamer et al. (2000).

References

Akoglu L (2014) Quantifying political polarity based on bipartite opinion networks. In International AAAI Conference on Web and Social Media, North America, May, http://www.aaai.org/ocs/index.php/ICWSM/ICWSM14/paper/view/8073.

Baldassarri D and Bearman P (2007) Dynamics of political polarization. American Sociological Review; 72 (5): 784–811.

Bell J and Ippolito J (2011) When the rich don’t get richer: Equalizing tendencies of creative networks. Leonardo; 44 (3): 260–261.

Bishop T (1964) Changing concepts of avant-garde in XXth century literature. The French Review; 38 (1): 34–41.

Cai T, Cai H J, Zhang Y, Huang K and Xu Z (2013) Polarized score distributions in music ratings and the emergence of popular artists. In Science and Information Conference (SAI), 7–9 October, pp 472–476, http://ieeexplore.ieee.org/xpl/abstractAuthors.jsp?arnumber=6661781.

Castellano C, Fortunato S and Loreto V (2009) Statistical physics of social dynamics. Reviews of Modern Physics; 81 (2): 591–646.

Conover M D, Ratkiewicz J, Francisco M, Goncalves B, Menczer F and Flammini A (2011) Political polarization on Twitter. In International AAAI Conference on Web and Social Media, North America, July, https://www.aaai.org/ocs/index.php/ICWSM/ICWSM11/paper/view/2847.

Dascal M (1998) Types of polemics and types of polemical moves. In Cmejrkova JH, Mullerova O and Svetla J (eds). ‘S’, Max Niemeyer, Dialogue Analysis VI (Proceedings of the 6th Conference, Prague 1996), Vol. 1. Tubingen, pp 15–33.

DiMaggio P, Evans J and Bryson B (1996) Have Americans’ social attitudes become more polarized? American Journal of Sociology; 102 (3): 690–755.

Fiorina M P and Abrams S J (2008) Political polarization in the American public. Annual Review of Political Science; 11 (1): 563–588.

Fischer C S and Mattson G (2009) Is America fragmenting? Annual Review of Sociology; 35, 435–455.

Gans H (1974) Popular Culture and High Culture. An Analysis and Evaluation of Taste. Basic Books: New York.

Garcia D, Abisheva A, Schweighofer S, Serdült U and Schweitzer F (2015) Ideological and temporal components of network polarization in online political participatory media. Policy & Internet; 7 (1): 46–79.

Garimella K, De Francisci Morales G, Gionis A and Mathioudakis M (2015) Quantifying controversy in social media. ArXiv e-prints 1507.05224, http://arxiv.org/abs/1507.05224.

Godes D and Silva J C (2012) Sequential and temporal dynamics of online opinion. Marketing Science; 31 (3): 448–473.

Hofstede G (2001) Culture’s Consequences: Comparing Values, Behaviors, Institutions, and Organizations Across Nations, second edition, Sage Publications: CA.

Holbrook M (2005) The role of ordinary evaluations in the market for popular culture: Do consumers have “good taste”? Marketing Letters; 16 (2): 75–86.

Hoßfeld T, Schatz R and Egger S (2011) SOS: The MOS is not enough! In Quality of Multimedia Experience (QoMEX), 2011 Third International Workshop, IEEE, 7–9 September, pp 131–136, http://ieeexplore.ieee.org/xpl/abstractAuthors.jsp?arnumber=6065690.

Hu N, Zhang J and Pavlou P A (2009) Overcoming the J-shaped distribution of product reviews. Communications of the ACM; 52 (10): 144–147.

IMDb. (2015) http://www.imdb.com/help/showleaf?votes.

Jamieson K H and Cappella J N (2010) Echo Chamber. Rush Limbaugh and the Conservative Media Establishment. Oxford University Press: New York.

Koh N S, Hu N and Clemons E K (2010) Do online reviews reflect a product’s true perceived quality? An investigation of online movie reviews across cultures. Electronic Commerce Research and Applications; 9 (5): 374–385, Special Section on Strategy, Economics and Electronic Commerce.

Kostakos V (2009) Is the crowd’s wisdom biased? A quantitative analysis of three online communities. In Computational Science and Engineering. CSE ‘09. International Conference, Vol. 4, 29–31 August 2009, pp 251–255.

Kostelanetz R (2000) A Dictionary of the Avant-Gardes. Psychology Press: Routledge, London.

Koutra D, Bennett P N and Horvitz E (2015) Events and controversies: Influences of a shocking news event on information seeking. In Proceedings of the 24th International Conference on World Wide Web (WWW). International World Wide Web Conferences Steering Committee, Republic and Canton of Geneva, Switzerland, pp 614–624, http://dl.acm.org/citation.cfm?id=2736277.2741099.

Kuhn T S (1962) The Structure of Scientific Revolutions. University of Chicago Press: Chicago.

Kuspit D (1991) The appropriation of marginal art in the 1980s. American Art; 5 (1): 132–141.

Levine L (1988) Highbrow/lowbrow: The Emergence of Cultural Hierarchy in America. Harvard University Press: Cambridge, MA.

Liu Y (2006) Word of mouth for movies: Its dynamics and impact on box office revenue. Journal of Marketing; 70 (3): 74–89.

Machamer P, Pera M and Baltas A (2000) Scientific Controversies: Philosophical and Historical Perspectives. 1st edn. Oxford University Press: New York.

McCarty N, Poole K T and Rosenthal H (2006) Polarized America: The Dance of Ideology and Unequal Riches, Walras-Pareto Lectures, MIT Press.

Mejova Y, Zhang A X, Diakopoulos N and Castillo C (2014) Controversy and sentiment in online news. Computation and Journalism Symposium 2014. 24–25 October. Columbia University, New York, NY.







Moe W W and Schweidel D A (2012) Online product opinions: Incidence, evaluation, and evolution. Marketing Science; 31 (3): 372–386.

Moon S, Bergey P K and Iacobucci D (2010) Dynamic effects among movie ratings, movie revenues, and viewer satisfaction. Journal of Marketing; 74 (1): 108–121.

Newman M Z (2009) Indie culture: In pursuit of the authentic autonomous alternative. Cinema Journal; 48 (3): 16–34.

Otterbacher J (2013) Gender, writing and ranking in review forums: A case study of the IMDb. Knowledge and Information Systems; 35 (3): 645–664.

Pennacchiotti M and Popescu A-M (2010) Detecting controversies in Twitter: A first study. In Proceedings of the NAACL HLT 2010 Workshop on Computational Linguistics in a World of Social Media, WSA ‘10, Association for Computational Linguistics, Stroudsburg, PA, pp 31–32, http://dl.acm.org/citation.cfm?id=1860667.1860683.

Rad H S and Barbosa D (2012) Identifying controversial articles in Wikipedia: A comparative study. In Proceedings of the Eighth International Symposium on Wikis and Open Collaboration, 27–29 August. WikiSym: Linz, Austria.

Schuman H and Scott J (1987) Problems in the use of survey questions to measure public opinion. Science; 236 (4804): 957–959.

Siegel M B (1982) Vanguard meets the mainstream. The Hudson Review; 35 (1): 99–104.

Sun M (2012a) How does the variance of product ratings matter? Management Science; 58 (4): 696–707.

Sun S (2012b) Why reader review metascores are meaningless, http://stephsun.com/metascores.html, accessed November 2015.

Sunstein C R (2002) The law of group polarization. Journal of Political Philosophy; 10 (2): 175–195.

Tarrow S (2008) Polarization and convergence in academic controversies. Theory and Society; 37 (6): 513–536.

Wanderer J J (1970) In defense of popular taste: Film ratings among professionals and lay audiences. American Journal of Sociology; 76 (2): 262–272.

Yardi S and Boyd D (2010) Dynamic debates: An analysis of group polarization over time on Twitter. Bulletin of Science, Technology & Society; 30 (5): 316–327.

Yasseri T, Spoerri A, Graham M and Kertész J (2014) The most controversial topics in Wikipedia: A multilingual and geographical analysis. In: Fichman P and Hara N (eds) Global Wikipedia: International and Cross-Cultural Issues in Online Collaboration. Rowman & Littlefield Education, pp 25–48.

Zhang Y, Lappas T, Crovella M and Kolaczyk E D (2014) Online ratings: Convergence towards a positive perspective? In Acoustics, Speech and Signal Processing (ICASSP), 2014 IEEE International Conference, 4–9 May, pp 4788–4792, http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=6854511.

Data Availability

The datasets analysed during the current study are not publicly available due to IMDb conditions of use but are available from the corresponding author on reasonable request.

Author Contributions

Luca Amendola, Valerio Marra and Miguel Quartin contributed equally to this work.

Acknowledgements

Datasets were courtesy of IMDb (http://www.imdb.com) and were used with permission. The authors have no affiliation with IMDb; nor is IMDb involved in any way in our study. We thank Fabiana V. Campos, Giulio Marra and Ana Luisa Santos for useful comments. In Fig. 5 we have adopted the color palette optimized for colour-blind people developed by Paul Tol (available at personal.sron.nl/pault). MQ is grateful to Brazilian research agency CNPq for support and to the University of Heidelberg for hospitality.

Additional Information

Supplementary Information: accompanies this paper at http://www.palgrave-journals.com/palcomms

Competing interests: The authors declare no competing financial interests.

Reprints and permission information is available at http://www.palgrave-journals.com/pal/authors/rights_and_permissions.html

How to cite this article: Amendola L, Marra V and Quartin M (2015) The evolving perception of controversial movies. Palgrave Communications. 1:15038 doi: 10.1057/palcomms.2015.38.

This work is licensed under a Creative Commons Attribution 3.0 International License. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in the credit line; if the material is not included under the Creative Commons license, users will need to obtain permission from the license holder to reproduce the material. To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/







The evolving perception of controversial movies
SUPPLEMENTARY INFORMATION

Luca Amendola,1 Valerio Marra,2 and Miguel Quartin3

1 Institut für Theoretische Physik, Universität Heidelberg, Philosophenweg 16, 69120 Heidelberg, Germany
2 Departamento de Física, Universidade Federal do Espírito Santo, Vitória, ES, Brazil
3 Instituto de Física, Universidade Federal do Rio de Janeiro, 21941-972, Rio de Janeiro, RJ, Brazil

THE AVERAGE RATING DISTRIBUTION

Hu et al. (2009) (see also Cai et al. (2013)) find that the ratings of books, DVDs and videos on Amazon.com follow a universal J-shaped distribution (i.e. peaked on the right), with many very positive ratings (five stars), some negative ratings (one star) and relatively few ratings of two, three or four stars. The large number of positive reviews is attributed to a “purchase bias”, which encourages buyers to buy products that they expect to be good. Furthermore, the over-representation of extreme ratings is traced to the psychological tendency of expressing an opinion preferentially when it is a strong one.

Although we confirm that IMDb movies also have an abundance of positive ratings and a smaller peak of negative ratings, the distribution does not follow a simple J-shaped distribution. As shown in Figure 2 (main text) the distribution is universally trimodal, irrespective of demographic group. There is a peak around 7–8 and peaks at both edges, so that the distribution more closely resembles a JV shape. We depict in Figure 2 (main text), besides demographic subgroups, also the distribution binned according to only 5 ratings (akin to a 5-star system). As a result of the lower resolution, the peak at the rating of 10 disappears.
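The effect of the coarser binning can be illustrated with a small sketch. The vote counts below are invented for illustration (they are not IMDb data) and merely mimic a trimodal shape with peaks at 1, 8 and 10; the helper names `rebin_pairs` and `peak_positions` are likewise hypothetical. Merging adjacent ratings into five bins removes the edge peak at the top, in line with the behaviour described above.

```python
def rebin_pairs(counts):
    """Merge a 10-bin histogram (ratings 1..10) into 5 bins of two ratings each."""
    return [counts[i] + counts[i + 1] for i in range(0, len(counts), 2)]

def peak_positions(counts):
    """Return 1-based positions of local maxima, treating both edges as candidates."""
    peaks = []
    for i, c in enumerate(counts):
        left = counts[i - 1] if i > 0 else float("-inf")
        right = counts[i + 1] if i < len(counts) - 1 else float("-inf")
        if c > left and c > right:
            peaks.append(i + 1)
    return peaks

# Hypothetical votes per 100 ratings for ratings 1..10 (illustrative only).
votes_10 = [6, 2, 3, 4, 7, 13, 22, 24, 8, 11]
votes_5 = rebin_pairs(votes_10)

print(peak_positions(votes_10))  # [1, 8, 10]
print(peak_positions(votes_5))   # [1, 4]
```

In this toy case the top bin of the 5-bin histogram absorbs the low count at rating 9, so it falls below the neighbouring bin that contains the central peak.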

The difference compared to other online review distributions could be due to a variety of reasons. An average user probably has watched many dozens or hundreds of movies and is able to compare them with each other more accurately than products purchased online. Moreover, it seems plausible that there is a lower “purchase bias” for films as arguably they are watched more casually than online purchases are made. We nevertheless leave a careful study of this phenomenon to future work.

CORRELATIONS

Here we will present additional details regarding the correlations in the data we use. In Figure 1 we show the distribution of number of votes for separate bins of 25 years. One can see how recent popular movies are more popular than older popular movies.

In Figure 2 we show the complete array of correlations among the relevant parameters. The dashed black line represents a linear regression. In particular, the fact that both hard and soft controversiality decrease with popularity is shown.

FIG. 1. Supplementary Figure: Distribution of number of votes N for separate bins of 25 years. The four curves correspond to 1915<year<1940, 1940<year<1965, 1965<year<1990 and 1990<year<2015.

COMPARISON WITH ESTIMATORS OF BIMODALITY

It has been advocated by DiMaggio et al. (1996) that the kurtosis κ is a good estimator of bimodality, which is a feature of hard controversiality. Here we will, therefore, compare the performance of H with that of κ and also with that of Sarle’s bimodality coefficient b. These estimators are defined as:

$$\kappa = \frac{\mu_4}{\mu_2^2}, \qquad b = \frac{\mu_3^2/\mu_2^3 + 1}{\mu_4/\mu_2^2},$$

where the n-th moment about the mean µn is:

$$\mu_n = \sum_{i=1}^{10} p_i\,(r_i - \bar{r})^n .$$

FIG. 2. Supplementary Figure: Correlations between the parameters characterizing the movies of our dataset. The bins in the plots are color coded blue-to-yellow according to how many movies they contain.

The kurtosis is bounded from below, κ ≥ 1 (Shohat 1929). Unity is obtained if all the votes are equally split between r_i and r_j, independently of i and j. Clearly this is not a welcome property as far as the subject of this paper is concerned, as a movie with ratings split at 1 and 10 would be degenerate with a movie with ratings split at 9 and 10. Sarle’s bimodality coefficient is always less than or equal to unity, b ≤ 1 (Shohat 1929). This estimator suffers from the same problem as κ. Indeed, if all the votes are equally split between r_i and r_j then the third moment (proportional to the skewness) is zero and b = 1/κ. Moreover, for a Dirac distribution with p_i = 1 one also has, after taking appropriately the limit, b = 1 (and κ = ∞ and H = 0). Summarizing, movies with high H have b and κ close to unity. However, the opposite is not necessarily true. On a theoretical basis one concludes, therefore, that b and κ are not good estimators for hard controversy.
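The degeneracy described above is easy to check numerically. The following sketch implements only the definitions of μ_n, κ and b given in this section (H is defined in the main text and is not reproduced here); the function names are hypothetical. It evaluates two idealised distributions, one with votes split equally between 1 and 10 and one split equally between 9 and 10.

```python
def central_moment(p, n):
    """n-th moment about the mean for probabilities p[0..9] of ratings 1..10."""
    ratings = range(1, len(p) + 1)
    mean = sum(pi * ri for pi, ri in zip(p, ratings))
    return sum(pi * (ri - mean) ** n for pi, ri in zip(p, ratings))

def kurtosis(p):
    """kappa = mu_4 / mu_2^2."""
    return central_moment(p, 4) / central_moment(p, 2) ** 2

def sarle_b(p):
    """b = (mu_3^2 / mu_2^3 + 1) / kappa."""
    mu2, mu3 = central_moment(p, 2), central_moment(p, 3)
    return (mu3 ** 2 / mu2 ** 3 + 1) / kurtosis(p)

def two_point(i, j):
    """Idealised distribution: half of the votes at rating i, half at rating j (i != j)."""
    p = [0.0] * 10
    p[i - 1] = p[j - 1] = 0.5
    return p

for i, j in [(1, 10), (9, 10)]:
    p = two_point(i, j)
    print((i, j), kurtosis(p), sarle_b(p))
```

Both cases give κ = 1 and b = 1, so neither estimator distinguishes a movie polarised at the two extremes from one whose votes are all high.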

Let us now see how b and κ behave when applied to our movie dataset. Regarding b, as shown in Figure 3 (red line) one encounters in the dataset examples of distributions with b but not H close to unity. This is entirely due to the presence of the third moment in the definition of b. One can conclude that the limitations of b are manifest within our dataset. Regarding κ, distributions with κ close to unity do tend to have also high H, as can be seen in Figure 3 (blue line). This is the case because our dataset does not contain examples of distributions polarized at values different from 1 and 10. We conclude that the kurtosis is an adequate alternative to H as far as strong polarization is concerned (though very different results are obtained for movies with low H). We think that H is preferable firstly because it does not suffer from the degeneracy problems discussed above, and secondly because lower moments are to be preferred as more robust than higher ones.

FIG. 3. Supplementary Figure: Rating distributions with Sarle’s bimodality coefficient b and kurtosis close to unity. Shown in red is a rating distribution with H = 0.59 and b = 0.99, and in blue a distribution with H = 0.873 and κ = 1.19. See Appendix for details.

CONSISTENCY TEST REGARDING THE FUTURE TREND

In order to check the robustness of our findings regarding the evolution of the controversiality of movies with the survey year, we have carried out the following consistency test. We binned the data according to various choices of number of votes and average rating, and calculated the corresponding characteristic times. We divided the data used in each iteration into two or more bins according to the movies’ release year. We then computed the mean of H and S for the latter bins using separately the datasets obtained in 2013 and 2015. We remind the reader that the characteristic times of the exponential decay are computed from the change in controversiality of the same movies between the two catalogs. Finally, using the 2013 mean as initial conditions for the exponential decay we predicted the 2015 values, which turned out to agree very well with the actual 2015 values.

LISTS OF MOST AND LEAST CONTROVERSIAL MOVIES

Here we list most and least controversial movies. The following tables are provided in order to show a possible application of the controversy indexes we have introduced. They are not part of the statistical analysis of the main text which deals with trends in the population and not with specific movies.

In order to remove less well-known movies we restrict the following tables to the top 25% voted movies of each year. We also impose the quality cut D ≤ 1 in order to remove cases where there is suspicion of vote rigging. We show below the following three sets of tables: (i) according to the decade the film was released; (ii) the most and least controversial movies of all time; (iii) the most and least controversial movies for different demographic categories.
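The two selection steps above can be sketched as follows. This is an illustrative reading, not the authors' code: the quantity D is defined in the main text and is treated here as a precomputed field, the order of the two cuts (rank by votes first, then apply D ≤ 1) is an assumption, and the function name, the record layout and the sample records are all hypothetical.

```python
from collections import defaultdict

def select_for_tables(movies, top_fraction=0.25, max_d=1.0):
    """Keep the most-voted fraction of each release year, then apply D <= max_d.

    `movies` is a list of dicts with keys "title", "year", "votes" and "D";
    "D" is assumed to be precomputed as in the main text.
    """
    by_year = defaultdict(list)
    for movie in movies:
        by_year[movie["year"]].append(movie)

    selected = []
    for year in sorted(by_year):
        ranked = sorted(by_year[year], key=lambda m: m["votes"], reverse=True)
        # Assumption: keep at least one movie per year before the quality cut.
        keep = max(1, int(len(ranked) * top_fraction))
        selected.extend(m for m in ranked[:keep] if m["D"] <= max_d)
    return selected

# Hypothetical records, for illustration only.
catalogue = [
    {"title": "A", "year": 2001, "votes": 90_000, "D": 0.4},
    {"title": "B", "year": 2001, "votes": 60_000, "D": 0.2},
    {"title": "C", "year": 2001, "votes": 8_000, "D": 0.1},
    {"title": "D", "year": 2001, "votes": 6_000, "D": 0.3},
    {"title": "E", "year": 2002, "votes": 50_000, "D": 1.7},
    {"title": "F", "year": 2002, "votes": 7_000, "D": 0.2},
]
print([m["title"] for m in select_for_tables(catalogue)])
```

Ranking within each release year, rather than over the whole catalogue, keeps older decades represented even though recent movies collect far more votes.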

Controversial movies by decade

Highest Hard Controversy

| Decade | Movie | Votes | r̄ | H |
|---|---|---|---|---|
| 1910s | Shoulder Arms (1918) | 4k | 6.4 | 0.63 |
| 1920s | Greed (1924) | 6k | 7.7 | 0.58 |
| 1930s | Reefer Madness (1936) | 6k | 4.1 | 0.60 |
| 1940s | Ivan the Terrible, Part I (1945) | 6k | 7.3 | 0.59 |
| 1950s | Plan 9 from Outer Space (1959) | 28k | 4.3 | 0.71 |
| 1960s | Hercules in New York (1969) | 13k | 3.8 | 0.66 |
| 1970s | Pink Flamingos (1972) | 15k | 6.4 | 0.64 |
| 1980s | Grease 2 (1982) | 23k | 4.8 | 0.64 |
| 1990s | Troll 2 (1990) | 22k | 3.0 | 0.70 |
| 2000s | High School Musical 3: Seni[...] (2008) | 38k | 4.9 | 0.74 |
| 2010s | In the Land of Blood and Honey (2011) | 30k | 3.8 | 0.84 |

Highest Soft Controversy

| Decade | Movie | Votes | r̄ | S |
|---|---|---|---|---|
| 1910s | The Birth of a Nation (1915) | 15k | 6.7 | 0.81 |
| 1920s | The Broadway Melody (1929) | 4k | 6.7 | 0.75 |
| 1930s | Reefer Madness (1936) | 6k | 4.1 | 0.82 |
| 1940s | The Fountainhead (1949) | 6k | 7.1 | 0.77 |
| 1950s | This Island Earth (1955) | 6k | 5.9 | 0.81 |
| 1960s | The Green Berets (1968) | 9k | 5.7 | 0.86 |
| 1970s | Caligula (1979) | 25k | 5.5 | 0.89 |
| 1980s | Grease 2 (1982) | 23k | 4.8 | 0.88 |
| 1990s | Showgirls (1995) | 45k | 4.8 | 0.88 |
| 2000s | Southland Tales (2006) | 31k | 5.8 | 0.92 |
| 2010s | Spring Breakers (2012) | 92k | 5.2 | 0.90 |

Lowest Hard Controversy

| Decade | Movie | Votes | r̄ | H |
|---|---|---|---|---|
| 1910s | Broken Blossoms or The Yel[...] (1919) | 6k | 7.7 | 0.42 |
| 1920s | Blackmail (1929) | 6k | 7.1 | 0.32 |
| 1930s | Sabotage (1936) | 10k | 7.2 | 0.31 |
| 1940s | The Stranger (1946) | 12k | 7.5 | 0.31 |
| 1950s | To Catch a Thief (1955) | 39k | 7.6 | 0.30 |
| 1960s | A Fistful of Dollars (1964) | 111k | 8.1 | 0.30 |
| 1970s | Escape from Alcatraz (1979) | 75k | 7.7 | 0.27 |
| 1980s | Mississippi Burning (1988) | 58k | 7.9 | 0.30 |
| 1990s | Donnie Brasco (1997) | 199k | 7.9 | 0.28 |
| 2000s | The Last King of Scotland (2006) | 127k | 7.8 | 0.28 |
| 2010s | Dallas Buyers Club (2013) | 241k | 8.1 | 0.27 |


Controversial movies of all time

Highest Hard controversy

| Movie | Votes | r̄ | H |
|---|---|---|---|
| In the Land of Blood and Honey (2011) | 30k | 3.8 | 0.84 |
| Happy New Year (2014) | 17k | 6.8 | 0.80 |
| LOL (2012) | 39k | 5.0 | 0.75 |
| High School Musical 3: Senior[...] (2008) | 38k | 4.9 | 0.74 |
| Ek Tha Tiger (2012) | 16k | 5.5 | 0.73 |
| Hannah Montana: The Movie (2009) | 28k | 4.6 | 0.73 |
| You Got Served (2004) | 23k | 3.7 | 0.72 |
| Freddy Got Fingered (2001) | 37k | 5.0 | 0.72 |
| Chennai Express (2013) | 28k | 6.6 | 0.72 |
| Dhoom: 3 (2013) | 28k | 6.0 | 0.71 |
| Plan 9 from Outer Space (1959) | 28k | 4.3 | 0.71 |
| Troll 2 (1990) | 22k | 3.0 | 0.70 |
| Spice World (1997) | 27k | 4.1 | 0.70 |
| The Twilight Saga: B.D. - Part 1 (2011) | 158k | 5.6 | 0.70 |
| Get Rich or Die Tryin’ (2005) | 30k | 5.1 | 0.68 |

Highest Soft controversy

| Movie | Votes | r̄ | S |
|---|---|---|---|
| Southland Tales (2006) | 31k | 5.8 | 0.92 |
| Spring Breakers (2012) | 92k | 5.2 | 0.90 |
| Cosmopolis (2012) | 33k | 5.1 | 0.89 |
| Caligula (1979) | 25k | 5.5 | 0.89 |
| The Wiz (1978) | 10k | 5.8 | 0.89 |
| Showgirls (1995) | 45k | 4.8 | 0.88 |
| Grease 2 (1982) | 23k | 4.8 | 0.88 |
| Nothing But Trouble (1991) | 16k | 5.0 | 0.88 |
| The Lords of Salem (2012) | 17k | 5.2 | 0.88 |
| Halloween III: Season of the Witch (1982) | 25k | 4.7 | 0.88 |
| Howard the Duck (1986) | 30k | 4.8 | 0.88 |
| Bio-Dome (1996) | 20k | 4.8 | 0.88 |
| Halloween II (2009) | 31k | 5.1 | 0.88 |
| The Three Stooges (2012) | 23k | 5.5 | 0.87 |
| Queen of the Damned (2002) | 39k | 5.6 | 0.87 |

Lowest Hard controversy

| Movie | Votes | r̄ | H |
|---|---|---|---|
| Escape from Alcatraz (1979) | 75k | 7.7 | 0.27 |
| Dallas Buyers Club (2013) | 241k | 8.1 | 0.27 |
| The Last King of Scotland (2006) | 127k | 7.8 | 0.28 |
| Donnie Brasco (1997) | 199k | 7.9 | 0.28 |
| Serpico (1973) | 65k | 7.8 | 0.28 |
| The Lincoln Lawyer (2011) | 153k | 7.5 | 0.28 |
| The Fighter (2010) | 232k | 8.0 | 0.28 |
| The Damned United (2009) | 28k | 7.7 | 0.29 |
| The Bourne Identity (2002) | 339k | 8.0 | 0.29 |
| The Counterfeiters (2007) | 33k | 7.7 | 0.29 |
| Catch Me If You Can (2002) | 398k | 8.1 | 0.29 |
| Rush (2013) | 222k | 8.3 | 0.29 |
| The Intouchables (2011) | 369k | 8.7 | 0.29 |
| Philomena (2013) | 59k | 7.8 | 0.29 |
| Misery (1990) | 116k | 7.9 | 0.29 |

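Highest Hard controversy (demographics)

| Group | Movie | Votes | r̄ | H |
|---|---|---|---|---|
| Males | The Room (2003) | 17k | 4.2 | 0.85 |
| Males | God’s Not Dead (2014) | 12k | 5.3 | 0.85 |
| Males | In the Land of Blood and Honey (2011) | 19k | 3.5 | 0.80 |
| Males | Happy New Year (2014) | 13k | 6.9 | 0.79 |
| Males | Ek Tha Tiger (2012) | 12k | 5.4 | 0.72 |
| Females | In the Land of Blood and Honey (2011) | 11k | 4.4 | 0.89 |
| Females | You Got Served (2004) | 8k | 4.2 | 0.76 |
| Females | Dhoom: 3 (2013) | 7k | 6.3 | 0.74 |
| Females | Chennai Express (2013) | 7k | 6.9 | 0.73 |
| Females | Shoulder Arms (1918) | 1k | 5.8 | 0.73 |
| US users | The Room (2003) | 8k | 4.3 | 0.86 |
| US users | Napoleon (1927) | 2k | 5.7 | 0.85 |
| US users | God’s Not Dead (2014) | 6k | 6.0 | 0.83 |
| US users | An American Carol (2008) | 4k | 5.8 | 0.82 |
| US users | High School Musical 3: Senior[...] (2008) | 10k | 4.9 | 0.81 |
| Non-US users | In the Land of Blood and Honey (2011) | 28k | 3.7 | 0.84 |
| Non-US users | Happy New Year (2014) | 17k | 6.8 | 0.80 |
| Non-US users | LOL (2012) | 34k | 5.1 | 0.75 |
| Non-US users | Ek Tha Tiger (2012) | 15k | 5.5 | 0.74 |
| Non-US users | Chennai Express (2013) | 27k | 6.7 | 0.72 |
| 18–29 years old | The Room (2003) | 13k | 4.3 | 0.85 |
| 18–29 years old | In the Land of Blood and Honey (2011) | 14k | 3.6 | 0.83 |
| 18–29 years old | Happy New Year (2014) | 10k | 6.6 | 0.80 |
| 18–29 years old | You Got Served (2004) | 10k | 4.5 | 0.75 |
| 18–29 years old | High School Musical 3: Senior[...] (2008) | 23k | 4.9 | 0.74 |
| 30–44 years old | In the Land of Blood and Honey (2011) | 9k | 3.9 | 0.83 |
| 30–44 years old | Freddy Got Fingered (2001) | 15k | 4.9 | 0.72 |
| 30–44 years old | Postal (2007) | 8k | 5.0 | 0.72 |
| 30–44 years old | Plan 9 from Outer Space (1959) | 10k | 4.3 | 0.69 |
| 30–44 years old | Manos: The Hands of Fate (1966) | 10k | 2.5 | 0.69 |
| 45+ years old | In the Land of Blood and Honey (2011) | 7k | 4.0 | 0.86 |
| 45+ years old | Happy New Year (2014) | 5k | 6.6 | 0.81 |
| 45+ years old | Napoleon (1927) | 2k | 6.6 | 0.78 |
| 45+ years old | Ek Tha Tiger (2012) | 4k | 5.9 | 0.77 |
| 45+ years old | Chennai Express (2013) | 8k | 6.8 | 0.74 |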


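Highest Soft controversy (demographics)

| Group | Movie | Votes | r̄ | S |
|---|---|---|---|---|
| Males | Southland Tales (2006) | 24k | 5.8 | 0.91 |
| Males | Halloween III: Season of the Witch (1982) | 18k | 4.8 | 0.89 |
| Males | Spring Breakers (2012) | 60k | 5.3 | 0.88 |
| Males | Caligula (1979) | 18k | 5.5 | 0.88 |
| Males | Nothing But Trouble (1991) | 11k | 5.0 | 0.88 |
| Females | Southland Tales (2006) | 7k | 5.8 | 0.91 |
| Females | Caligula (1979) | 7k | 5.4 | 0.91 |
| Females | Cosmopolis (2012) | 10k | 5.2 | 0.90 |
| Females | An American Haunting (2005) | 8k | 5.5 | 0.90 |
| Females | The Three Stooges (2012) | 5k | 5.6 | 0.89 |
| US users | Southland Tales (2006) | 10k | 5.9 | 0.91 |
| US users | An American Haunting (2005) | 7k | 5.4 | 0.90 |
| US users | Spring Breakers (2012) | 16k | 5.5 | 0.90 |
| US users | Cosmopolis (2012) | 5k | 5.2 | 0.90 |
| US users | Xanadu (1980) | 4k | 5.8 | 0.90 |
| Non-US users | Southland Tales (2006) | 21k | 5.8 | 0.91 |
| Non-US users | Spring Breakers (2012) | 76k | 5.2 | 0.89 |
| Non-US users | Cosmopolis (2012) | 29k | 5.1 | 0.89 |
| Non-US users | Showgirls (1995) | 32k | 4.9 | 0.88 |
| Non-US users | Caligula (1979) | 19k | 5.6 | 0.88 |
| 18–29 years old | Spring Breakers (2012) | 48k | 5.2 | 0.89 |
| 18–29 years old | Cosmopolis (2012) | 14k | 5.3 | 0.89 |
| 18–29 years old | Southland Tales (2006) | 10k | 6.2 | 0.89 |
| 18–29 years old | Attack of the Killer Tomatoes! (1978) | 2k | 5.2 | 0.89 |
| 18–29 years old | Grease 2 (1982) | 6k | 4.9 | 0.88 |
| 30–44 years old | Southland Tales (2006) | 14k | 5.7 | 0.92 |
| 30–44 years old | Halloween III: Season of the Witch (1982) | 10k | 4.8 | 0.90 |
| 30–44 years old | Grease 2 (1982) | 9k | 4.8 | 0.89 |
| 30–44 years old | White Chicks (2004) | 21k | 5.3 | 0.89 |
| 30–44 years old | Nothing But Trouble (1991) | 8k | 5.1 | 0.88 |
| 45+ years old | Southland Tales (2006) | 7k | 5.6 | 0.93 |
| 45+ years old | Xanadu (1980) | 5k | 5.8 | 0.91 |
| 45+ years old | Caligula (1979) | 9k | 5.4 | 0.90 |
| 45+ years old | Cosmopolis (2012) | 7k | 5.0 | 0.90 |
| 45+ years old | The Wiz (1978) | 4k | 5.6 | 0.90 |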

Lowest Hard controversy (demographics)      votes  r̄     H

Males
Dallas Buyers Club (2013)                   163k   8.0   0.27
Escape from Alcatraz (1979)                 59k    7.7   0.27
Donnie Brasco (1997)                        151k   7.9   0.27
Catch Me If You Can (2002)                  270k   8.1   0.28
The Last King of Scotland (2006)            94k    7.8   0.28

Females
The Intouchables (2011)                     126k   8.8   0.28
Escape from Alcatraz (1979)                 16k    7.7   0.29
Dallas Buyers Club (2013)                   78k    8.2   0.29
The Last King of Scotland (2006)            33k    7.9   0.29
Philomena (2013)                            22k    7.9   0.29

US users
Edge of Tomorrow (2014)                     33k    8.1   0.29
Escape from Alcatraz (1979)                 14k    7.7   0.30
The Bourne Identity (2002)                  66k    8.1   0.30
Brute Force (1947)                          2k     7.8   0.30
Serpico (1973)                              14k    7.8   0.30

Non-US users
Escape from Alcatraz (1979)                 60k    7.7   0.27
Dallas Buyers Club (2013)                   209k   8.1   0.27
The Last King of Scotland (2006)            105k   7.8   0.27
The Damned United (2009)                    25k    7.7   0.28
Donnie Brasco (1997)                        163k   7.9   0.28

18–29 years old age group
Dallas Buyers Club (2013)                   117k   8.1   0.27
Catch Me If You Can (2002)                  185k   8.3   0.27
Donnie Brasco (1997)                        76k    8.0   0.27
The Last King of Scotland (2006)            51k    7.9   0.27
Escape from Alcatraz (1979)                 26k    7.8   0.27

30–44 years old age group
Escape from Alcatraz (1979)                 28k    7.7   0.27
Dallas Buyers Club (2013)                   64k    7.9   0.27
Donnie Brasco (1997)                        75k    7.8   0.28
The Last King of Scotland (2006)            48k    7.8   0.28
Serpico (1973)                              24k    7.8   0.28

45+ years old age group
Escape from Alcatraz (1979)                 21k    7.6   0.28
The Lincoln Lawyer (2011)                   32k    7.4   0.29
Dallas Buyers Club (2013)                   60k    8.1   0.29
Serpico (1973)                              18k    7.8   0.29
The Bourne Identity (2002)                  83k    7.9   0.30

Cai, T., Cai, H., Zhang, Y., Huang, K. and Xu, Z. (2013), ‘Polarized score distributions in music ratings and the emergence of popular artists’, in ‘Science and Information Conference (SAI), 2013’, pp. 472–476.

DiMaggio, P., Evans, J. and Bryson, B. (1996), ‘Have Americans’ social attitudes become more polarized?’, American Journal of Sociology 102(3), pp. 690–755.
URL: http://www.jstor.org/stable/2782461

Hu, N., Zhang, J. and Pavlou, P. A. (2009), ‘Overcoming the J-shaped distribution of product reviews’, Commun. ACM 52(10), 144–147.
URL: http://doi.acm.org/10.1145/1562764.1562800

Shohat, J. (1929), ‘Inequalities for moments of frequency functions and for various statistical constants’, 21(1/4), 361–375.
URL: http://www.jstor.org/stable/2332566
