Predicting News Popularity by Mining Online Discussions

Georgios Rizos, Symeon Papadopoulos and Yiannis Kompatsiaris

Centre for Research and Technology Hellas (CERTH) – Information Technologies Institute (ITI)

SNOW/WWW 2016, April 12, 2016, Montréal, Québec, Canada.

Popularity

Operational definition ($$$)
• Number of views
• Number of comments
• Number of users commenting
• Number of shares
• Number of thumbs up, thumbs down…
• Bonus: Controversiality

#2

#3

Overview

#4

Online discussions → Comment trees/user graphs → Tree/graph features → Predictive model

Comment Trees and User Graphs

#5

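Example discussion:
• User X: I blah blah…
• User Y: But bla bla…
• User X: No!
• User W: Are you kiddin?
• User Z: Wat?

[Figure: the comment tree (one node per comment) and the user graph (one node per user: A, X, Y, W, Z) built from this discussion. User A is the story poster.]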
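The two structures on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the reply structure (who answers whom), the comment ids and the function names are all assumptions, since the slide text does not record them.

```python
from collections import defaultdict

# Hypothetical discussion: (comment_id, author, parent_comment_id).
# parent None means a direct reply to the story, which was posted by user "A".
# The reply structure below is assumed for illustration only.
comments = [
    ("c1", "X", None),
    ("c2", "Y", "c1"),
    ("c3", "X", "c2"),
    ("c4", "W", "c1"),
    ("c5", "Z", "c4"),
]


def build_comment_tree(comments, root="post"):
    """Return {node: [direct replies]}, with one node per comment plus the root post."""
    children = defaultdict(list)
    for comment_id, _author, parent in comments:
        children[parent if parent is not None else root].append(comment_id)
    return dict(children)


def build_user_graph(comments, poster="A"):
    """Return a set of directed (replier, replied_to) edges, with one node per user."""
    author_of = {cid: author for cid, author, _parent in comments}
    edges = set()
    for _cid, author, parent in comments:
        target = poster if parent is None else author_of[parent]
        if author != target:  # self-replies add no edge
            edges.add((author, target))
    return edges


tree = build_comment_tree(comments)
user_edges = build_user_graph(comments)
```

The comment tree keeps every comment as its own node, while the user graph collapses all comments by the same user into one node, so a user who keeps returning (X above) shows up as repeated edges rather than repeated nodes.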

#6

Sony Hack #1 VS Sony Hack #2

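[Figure sequence: snapshots of the two discussions as they grow. Values shown at each elapsed time:]

• 10 mins: 2, 2
• 30 mins: 4, 4
• 1 hour: 7, 13
• 1 ½ hours: 57, 10
• 2 ¼ hours: 14358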

#13

Elements of an Engaging Discussion

[Figure labels: Degree, Depth]

Related Work: Post Popularity

News story discussion size prediction:
• Modelling the comment timestamp time-series (Tatar et al., 2014)
• Feature set relevant to time of posting (hour, day), entities mentioned, etc. (Tsagkias et al., 2009)
These do not leverage the graph structure!

Online forum thread analysis:
• Prediction of thread overall quality. Also includes rudimentary graph features (Lee et al., 2014)
Only very simple comment tree features.

#14

Related Work: Post Diffusion

Twitter hashtag popularity prediction:
• Prediction based on adoption graph properties and communities (Weng et al., 2014)
• Prediction based on geolocation and adoption graph conductance (Bora et al., 2015)

Facebook post share count prediction:
• Prediction based on share graph, author and temporal property features (Cheng et al., 2014)
Setting different than online discussion mining.

#15

Related Work: Discussion Mining

Uncovering other graph-based qualities:
• Comment tree h-index hypothesized to be a proxy for discussion controversiality (Gomez et al., 2008)

• A user-comment h-index variation hypothesized to be a proxy for political discussion deliberation (Gonzalez-Bailon et al., 2010)

• Share graph Wiener index shown to be a proxy of quality/interestingness of the initial post via an SIS diffusion simulation (Goel et al., 2015)

Indices not applied for popularity prediction!

#16

Comment Tree Features

#17

• Quantification of the depth, width and bushiness of the comment tree structure, of the existence of multiple long threads, and of its branching complexity.
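A few of these quantities can be computed with one breadth-first pass over the tree. The sketch below is an assumption about what such features might look like, not the authors' feature set: the function name and the returned keys are hypothetical, and the h-index follows one reading of the idea in Gomez et al. (2008), namely the largest level h that holds at least h comments.

```python
from collections import Counter, deque


def comment_tree_features(children, root="post"):
    """Compute simple structural features of a comment tree.

    `children` maps each node to the list of its direct replies.
    """
    # Breadth-first traversal assigns a depth to every comment.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children.get(node, []):
            depth[child] = depth[node] + 1
            queue.append(child)

    # Number of comments on each level (the root post is not a comment).
    per_level = Counter(d for node, d in depth.items() if node != root)

    return {
        "comment_count": sum(per_level.values()),
        "max_depth": max(per_level, default=0),
        "max_width": max(per_level.values(), default=0),
        # Assumed reading of the comment tree h-index: largest h such that
        # level h contains at least h comments.
        "h_index": max((h for h, n in per_level.items() if n >= h), default=0),
    }


# Hypothetical tree: one top-level comment, two replies, one reply to each.
example_tree = {"post": ["c1"], "c1": ["c2", "c4"], "c2": ["c3"], "c4": ["c5"]}
tree_features = comment_tree_features(example_tree)
```

A deep, narrow tree (one long back-and-forth) and a shallow, wide one (many unrelated top-level comments) can have the same comment count, which is why depth and width are tracked separately.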

User Graph Features

#18

• Quantification of user recurrence and branching complexity of user graph structure.
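User recurrence can be illustrated with simple counts over the reply edges between users. The following is a hedged sketch only: the input format, the function name and the chosen features are assumptions for illustration, and the slides do not list the exact user graph features used.

```python
from collections import Counter


def user_graph_features(comments, poster="A"):
    """Compute simple user graph features from (comment_id, author, parent_id) tuples.

    parent_id None means a direct reply to the story posted by `poster`.
    """
    author_of = {cid: author for cid, author, _parent in comments}
    comments_per_user = Counter(author for _cid, author, _parent in comments)

    # One directed edge per distinct (replier, replied_to) pair.
    edges = set()
    for _cid, author, parent in comments:
        target = poster if parent is None else author_of[parent]
        if author != target:
            edges.add((author, target))
    in_degree = Counter(target for _source, target in edges)

    user_count = len(comments_per_user)  # commenting users only
    return {
        "user_count": user_count,
        "edge_count": len(edges),
        "max_in_degree": max(in_degree.values(), default=0),
        # Users who commented more than once, i.e. who came back.
        "recurring_users": sum(1 for n in comments_per_user.values() if n > 1),
        "comments_per_user": len(comments) / user_count if user_count else 0.0,
    }


# Hypothetical discussion; the reply structure is assumed.
example_comments = [
    ("c1", "X", None),
    ("c2", "Y", "c1"),
    ("c3", "X", "c2"),
    ("c4", "W", "c1"),
    ("c5", "Z", "c4"),
]
graph_features = user_graph_features(example_comments)
```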

Temporal Features

#19

• Quantification of the growth rate of the discussion using simple measures borrowed from Cheng et al. (2014).
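Growth-rate measures of this kind reduce to arithmetic on comment timestamps. The sketch below is only loosely in the spirit of Cheng et al. (2014); the specific measures, the function name and the use of seconds as the time unit are assumptions, not the definitions used in the talk.

```python
def temporal_features(post_time, comment_times):
    """Compute simple growth-rate features from comment timestamps (in seconds)."""
    times = sorted(comment_times)
    if not times:
        return {
            "comment_count": 0,
            "time_to_first": None,
            "mean_gap": None,
            "comments_per_hour": 0.0,
        }

    # Gaps between consecutive events, starting from the story post itself.
    previous = [post_time] + times[:-1]
    gaps = [later - earlier for earlier, later in zip(previous, times)]
    elapsed = times[-1] - post_time

    return {
        "comment_count": len(times),
        "time_to_first": times[0] - post_time,
        "mean_gap": sum(gaps) / len(gaps),
        "comments_per_hour": len(times) / (elapsed / 3600.0) if elapsed > 0 else 0.0,
    }


# Hypothetical timestamps: comments 10, 20 and 60 minutes after posting.
growth = temporal_features(0, [600, 1200, 3600])
```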

Datasets

#20

• We used three datasets of news story posts and online discussions.

• RedditNews dataset: Sample of posts in news-based subreddits made in 2014 (thanks derp.institute!)

Evaluation: Model building

• Random Forest regression to handle inhomogeneous features

• Prediction targets:
– Comment count: all datasets
– User count: eponymous users only; all datasets
– Score: #upvotes - #downvotes; RedditNews only
– Controversiality: #disagreements; RedditNews only

• Score and Controversiality are penalized for a small number of votes

• Different models built for different timepoints in the discussion evolution, corresponding to 1%-14% of the story's lifetime (1% ~ 10 minutes)
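The per-timepoint setup can be sketched as follows: for each percentage of the lifetime, only the comments seen so far are kept, and the final count is the target. This is an illustrative assumption about the data preparation, not the authors' pipeline. The function names are hypothetical, and the lifetime of 1000 minutes is inferred from "1% ~ 10 minutes". In a full pipeline, features computed on each snapshot would feed one regressor per timepoint.

```python
def snapshot_at(post_time, comment_times, lifetime, percent):
    """Return the comment timestamps observed within the first `percent` of the lifetime."""
    cutoff = post_time + lifetime * percent / 100
    return [t for t in comment_times if t <= cutoff]


def training_rows(post_time, comment_times, lifetime, percents=range(1, 15)):
    """Yield (percent, observed_count, final_count) triples, one per timepoint."""
    final_count = len(comment_times)  # prediction target: final comment count
    for percent in percents:
        observed = snapshot_at(post_time, comment_times, lifetime, percent)
        yield percent, len(observed), final_count


# Assumed lifetime: 1% ~ 10 minutes implies roughly 1000 minutes, here in seconds.
LIFETIME = 1000 * 60

# Hypothetical comment timestamps in seconds after posting.
times = [300, 900, 4000, 9000, 50000]
rows = list(training_rows(0, times, LIFETIME))
```

Building a separate row (and model) per timepoint keeps each model's inputs comparable: a model trained on 1% snapshots never sees the richer structure available at 14%.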

#21

Evaluation: VS Simple Graph Features

#22

• The proposed graph features lead to better prediction than rudimentary graph features.

• Large improvement when target is controversiality.

Evaluation: Feature Type Comparison

#24

• Comment tree features work well for comment count prediction, and user graph features for user count prediction.

• Integration of all feature types yields best results, except for controversiality where all_graph is best.

Evaluation: Top-100 Controversial

#26

• Which stories will be the most controversial?
• We report the Jaccard coefficient (x100) between the true top-100 and the top-100 predicted by the prediction framework using the three feature sets.
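The metric itself is the size of the intersection of the two top-k sets divided by the size of their union, scaled by 100. A minimal sketch, assuming scores are held in dictionaries keyed by story id (the function name and data layout are hypothetical):

```python
def top_k_jaccard(true_scores, predicted_scores, k=100):
    """Jaccard coefficient (x100) between the true and the predicted top-k stories."""

    def top_k(scores):
        # Story ids sorted by descending score, truncated to k.
        return set(sorted(scores, key=scores.get, reverse=True)[:k])

    true_top = top_k(true_scores)
    predicted_top = top_k(predicted_scores)
    union = true_top | predicted_top
    if not union:
        return 0.0
    return 100.0 * len(true_top & predicted_top) / len(union)


# Hypothetical controversiality scores for four stories.
true_scores = {"a": 9, "b": 7, "c": 5, "d": 1}
predicted_scores = {"a": 8, "b": 2, "c": 6, "d": 4}
overlap_top2 = top_k_jaccard(true_scores, predicted_scores, k=2)
overlap_top4 = top_k_jaccard(true_scores, predicted_scores, k=4)
```

With k=2 the true top set is {a, b} and the predicted one is {a, c}, so one shared story out of three distinct ones gives about 33.3. A value of 100 means the two top-k sets are identical, regardless of the order inside them.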

Conclusion

• Key contributions
– Improved popularity prediction using lightweight graph-based features
– Post controversiality prediction showed significant improvement

• Future Work
– Leveraging other information modalities, such as text
– Investigate dependence on the topic category of a story or the type of the post (e.g., text post or multimedia).

#27

Thank you!

• Resources:
– Code: https://github.com/MKLab-ITI/news-popularity-prediction
– Online demo: http://reveal-mklab.iti.gr/reveal/popularity

• Get in touch:
– @sympap / papadop@iti.gr
– @georgios_rizos / georgerizos@iti.gr

#28

References (1/2)

• A. Tatar, P. Antoniadis, M. D. De Amorim, and S. Fdida. From popularity prediction to ranking online news. Social Network Analysis and Mining, 4(1):1–12, 2014.

• M. Tsagkias, W. Weerkamp, and M. De Rijke. Predicting the volume of comments on online news stories. In Proceedings of the 18th ACM Conference on Information and Knowledge Management, pages 1765–1768. ACM, 2009.

• J. Lee, M. Yang, and H. Rim. Discovering high-quality threaded discussions in online forums. Journal of Computer Science and Technology, 29(3):519–531, 2014.

• L. Weng, F. Menczer, and Y.-Y. Ahn. Predicting successful memes using network and community structure. arXiv preprint arXiv:1403.6199, 2014.

• S. Bora, H. Singh, A. Sen, A. Bagchi, and P. Singla. On the role of conductance, geography and topology in predicting hashtag virality. arXiv preprint arXiv:1504.05351, 2015.

#29

References (2/2)

• J. Cheng, L. Adamic, P. A. Dow, J. M. Kleinberg, and J. Leskovec. Can cascades be predicted? In Proceedings of the 23rd international conference on World Wide Web, pages 925–936, 2014.

• V. Gómez, A. Kaltenbrunner, and V. López. Statistical analysis of the social network and discussion threads in Slashdot. In Proceedings of the 17th international conference on World Wide Web, pages 645–654. ACM, 2008.

• S. Gonzalez-Bailon, A. Kaltenbrunner, and R. E. Banchs. The structure of political discussion networks: a model for the analysis of online deliberation. Journal of Information Technology, 25(2):230–243, 2010.

• S. Goel, A. Anderson, J. Hofman, and D. J. Watts. The structural virality of online diffusion. Management Science, 62(1):180–196, 2015.

#30
