SocInfo14 - On the Feasibility of Predicting News Popularity at Cold Start

On the Feasibility of Predicting News Popularity at Cold Start Ioannis Arapakis, B. Barla Cambazoglu, Mounia Lalmas Yahoo Labs, Barcelona

Background Information

§  Until now, news popularity prediction has relied for the most part on:
•  early-stage measurements
•  user-generated content

§  Cold-start prediction has been investigated mostly in the context of recommender systems*

*R. Bandari, A. Sitaram, and B. A. Huberman. The pulse of news in social media: Forecasting popularity. In Proc. 6th Int’l Conf. Weblogs and Social Media, 2012.

Scope

§  We follow the same experimental setting and reproduce the performance results reported in Bandari et al.

§  We improve the methodology and integrate the right performance metrics in a step-by-step fashion

§  We introduce a large number of new features which may further help predict future article popularity

§  In addition to tweet counts, we also use the view counts of article pages

News Dataset

§  News corpus of 13,319 news articles from Yahoo News, crawled over a period of two weeks

§  To quantify the popularity of news we considered two metrics:
•  number of times an article was posted/shared on Twitter (tweets)
•  number of times an article was viewed by users (page views)

§  For each crawled article we sampled these metric values every 30 minutes over a period of one week after the article’s publication

§  337 observations per article
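
The figure of 337 observations is consistent with the sampling schedule if one sample is taken at publication time and one every 30 minutes for the following seven days. The time-zero sample is an assumption made here so that the arithmetic matches; the slides do not spell it out.

```python
SAMPLING_INTERVAL_MIN = 30  # from the slide: one sample every 30 minutes
TRACKING_DAYS = 7           # from the slide: one week after publication


def observations_per_article(interval_min=SAMPLING_INTERVAL_MIN, days=TRACKING_DAYS):
    """Count samples, assuming one at publication time plus one per interval."""
    samples_after_publication = days * 24 * 60 // interval_min
    return samples_after_publication + 1  # +1 for the assumed time-zero sample


print(observations_per_article())
```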

[Figures: number of tweets (log scale) vs. rank of the article (log scale); number of tweets (normalized) vs. time (in days); number of tweets vs. time (in days)]

Fig. 1: Tweet counts of articles. Fig. 2: Tweet counts over time.

Feature Engineering

§  Time
§  News source
§  Genre
§  Length
§  NLP
§  Sentiment analysis
§  Entity extraction
§  Wikipedia
§  Twitter
§  Web search

Experiments

§  We start by reproducing the classification results presented in Bandari et al. for Tweets

§  We split two weeks of articles into three classes based on their tweet counts:
•  A (low popularity): [1, 20]
•  B (medium popularity): (20, 100]
•  C (high popularity): (100, ∞)

§  We experiment with the same classifiers (NB, Bagging, J48, SVM) and include a baseline (majority class)

§  We make predictions for one hour, one day, and one week after an article is published
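
A minimal sketch of the class boundaries and of a majority-class baseline, under stated assumptions: the function names and the toy tweet counts are hypothetical, and because the slides do not say how zero-popularity articles are labeled, the sketch simply returns None for them.

```python
from collections import Counter


def popularity_class(tweet_count):
    """Map a tweet count to class A, B, or C using the intervals on the slide."""
    if tweet_count < 1:
        return None  # zero-popularity article; handling is not specified in the slides
    if tweet_count <= 20:
        return "A"   # low popularity: [1, 20]
    if tweet_count <= 100:
        return "B"   # medium popularity: (20, 100]
    return "C"       # high popularity: (100, inf)


def majority_baseline_accuracy(labels):
    """Accuracy of always predicting the most frequent class."""
    counts = Counter(labels)
    return counts.most_common(1)[0][1] / len(labels)


toy_counts = [3, 15, 20, 21, 100, 101, 7, 2]  # hypothetical tweet counts
labels = [popularity_class(c) for c in toy_counts]
print(labels, majority_baseline_accuracy(labels))
```

With a skewed class distribution, this baseline is already hard to beat on accuracy alone.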

Results

Classifier   Tweets
             Hour   Day    Week
Baseline     .840   .710   .703
NB           .693   .581   .574
Bagging      .858   .749   .741
J48          .856   .781   .775
SVM          .859   .802   .797

Table 1: Accuracy (ten-fold cross validation, without zero-popularity articles)

Classifier   Tweets
             Hour   Day    Week
NB           .735   .589   .584
Bagging      .858   .737   .740
J48          .852   .779   .774
SVM          .861   .803   .798

Table 2: Accuracy (training/test split, without zero-popularity articles)

Results

Classifier   Tweets
             Hour   Day    Week
NB           .772   .642   .633
Bagging      .886   .780   .769
J48          .883   .805   .804
SVM          .890   .829   .825

Table 3: Accuracy (training/test split, with zero-popularity articles)

Class   Tweets
        Hour   Day    Week
A       .871   .746   .740
B       .125   .227   .231
C       .004   .027   .029

Table 4: Fraction of instances in each of the three popularity classes

Results

Actual   Predicted
         A       B     C
A        4,698   247   0
B        728     812   0
C        98      96    0

Table 5: The confusion matrix for (Tweets, Week)
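
The numbers in Table 5 can be checked with a short sketch that computes overall accuracy and per-class recall, reading rows as actual classes and columns as predicted classes. The function names are illustrative, not from the slides.

```python
CLASSES = ["A", "B", "C"]

# Rows are actual classes, columns are predicted classes (values from Table 5).
CONFUSION = [
    [4698, 247, 0],
    [728, 812, 0],
    [98, 96, 0],
]


def accuracy(matrix):
    """Share of all instances that lie on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total


def per_class_recall(matrix):
    """For each actual class, the share of its instances predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]


print(round(accuracy(CONFUSION), 3))
for name, recall in zip(CLASSES, per_class_recall(CONFUSION)):
    print(name, round(recall, 3))
```

The overall accuracy comes out at about .825 while recall for class C is zero: no highly popular article is ever predicted as such, even though accuracy looks high.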

Model       Tweets
            Hour    Day     Week
BaselineR   1.701   1.931   1.950
LR          1.132   1.270   1.305
KNNR        1.537   1.720   1.753
SVM         1.135   1.278   1.315

Table 6: Root mean squared error (training/test split, with zero-popularity articles)
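
For reference, a minimal sketch of the root mean squared error used in Table 6. The slides do not define BaselineR or state whether the targets were transformed; the mean predictor below is only an assumed example of a trivial regression baseline, and all values are hypothetical.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


def mean_predictor(train_targets, n_test):
    """Assumed trivial baseline: predict the training mean for every test item."""
    mean = sum(train_targets) / len(train_targets)
    return [mean] * n_test


train = [1.0, 2.0, 3.0]   # hypothetical training targets
test = [2.0, 4.0]         # hypothetical test targets
print(rmse(test, mean_predictor(train, len(test))))
```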

Results

Table 7: Performance in terms of the Kendall tau and recall@k metrics

          Tweets                 Pageviews
          Hour   Day    Week     Hour   Day    Week
R@10      .000   .000   .000     .000   .000   .000
R@100     .240   .110   .090     .010   .020   .060
R@1000    .578   .557   .548     .212   .173   .245
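
A sketch of the two ranking metrics, under stated assumptions: recall@k is taken here to be the fraction of the truly top-k articles that also appear in the predicted top-k, and Kendall tau is computed in its simplest form (tau-a, no tie correction). The slides do not give the exact definitions used, and the toy scores are hypothetical.

```python
def recall_at_k(actual_scores, predicted_scores, k):
    """Fraction of the actual top-k items that appear in the predicted top-k."""
    def top_k(scores):
        order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
        return set(order[:k])
    return len(top_k(actual_scores) & top_k(predicted_scores)) / k


def kendall_tau_a(actual_scores, predicted_scores):
    """(concordant - discordant) pairs over all pairs; tied pairs count as neither."""
    n = len(actual_scores)
    concordant = discordant = 0
    for i in range(n):
        for j in range(i + 1, n):
            sign = ((actual_scores[i] - actual_scores[j])
                    * (predicted_scores[i] - predicted_scores[j]))
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


actual = [50, 5, 120, 0, 30]      # hypothetical true popularity values
predicted = [40, 10, 20, 1, 35]   # hypothetical model outputs
print(recall_at_k(actual, predicted, 2), kendall_tau_a(actual, predicted))
```

Under this reading, R@10 = .000 means that none of the ten most popular articles were ranked among the model's top ten.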

Conclusions

§  Predicting news popularity at cold start is not a solved problem

§  Classifiers are biased toward unpopular articles due to the imbalanced class distribution

§  Highly popular articles could not be accurately detected, rendering the predictions not useful in most practical scenarios

§  News popularity may be more accurately predicted if early-stage popularity measurements are incorporated into the prediction models as features

§  Increasing the duration of such measurements will increase the accuracy of predictions but decrease their importance, leading to an interesting trade-off

Questions?

This work was supported by the MULTISENSOR project, partially funded by the European Commission under contract number FP7-610411


iarapakis

http://www.slideshare.net/iarapakis/
