The CUbRIK research on mining, analyzing, and exploiting community feedback on the Web, presented by Sergiu Chelaru (L3S Research Center, Hannover).
CUbRIK Summer School 2014
Mining, Analyzing and Exploiting Community Feedback on the Web
Sergiu Chelaru
L3S Research Center, Hannover
2-4/07/2014 CUbRIK Summer School 1
Community Feedback on the Web
Comments: a way to communicate with users and/or communities
Outline
Comment-Centric Feedback
Comment Ratings
Polarized Content
Controversial Comments
Trolls
Social Feedback
Query Result Characteristics
Social Features
Learning to Rank using Social Features
Community Sentiment in Web Queries
Analysis of Sentiment in Web Queries
Detecting Query Sentiment
Two Application Scenarios
Summary and Contributions
Comment-Centric Feedback
YouTube dataset: 756 Google Zeitgeist keywords; per keyword, 50 videos with metadata and 500 comments; 67k videos, 6 million comments in total
Yahoo! News dataset: Yahoo! RSS feed, Sept-Dec 2011; 27k news stories; 5.4 million comments
Descriptive statistics for the YouTube and Yahoo! News corpora.
Comment-Centric Feedback
Distribution of the number of comments for videos in YouTube and news stories in Yahoo! News.
Comment Ratings
Distribution of comment ratings for (a) YouTube and (b) Yahoo! News.
Term Analysis of Rated Comments
Top-50 terms according to their MI values for accepted comments (with high comment ratings) vs. not accepted comments (with low comment ratings).
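The MI-based term ranking behind these tables can be sketched as follows; the toy corpora and whitespace tokenization are illustrative assumptions, not the study's data:

```python
from collections import Counter
from math import log2

# Toy corpora standing in for highly vs. lowly rated comments (assumptions).
accepted = ["great video thanks", "very informative thanks", "love this song"]
rejected = ["spam spam click here", "first", "click here now"]

def term_mi(pos_docs, neg_docs):
    """Mutual information I(term; class) over binary term presence."""
    n_pos, n_neg = len(pos_docs), len(neg_docs)
    n = n_pos + n_neg
    df_pos = Counter(t for d in pos_docs for t in set(d.split()))
    df_neg = Counter(t for d in neg_docs for t in set(d.split()))
    scores = {}
    for term in set(df_pos) | set(df_neg):
        p_t = (df_pos[term] + df_neg[term]) / n    # P(term present)
        mi = 0.0
        for n_tc, n_c in ((df_pos[term], n_pos), (df_neg[term], n_neg)):
            p_c = n_c / n
            # contributions for term present / term absent in class c
            for joint, p_term in ((n_tc / n, p_t), ((n_c - n_tc) / n, 1 - p_t)):
                if joint > 0 and p_term > 0:
                    mi += joint * log2(joint / (p_term * p_c))
        scores[term] = mi
    return scores

scores = term_mi(accepted, rejected)
top_terms = sorted(scores, key=scores.get, reverse=True)
```

Terms concentrated in one class (here "thanks" or "click") receive high MI, which is exactly what makes them useful for contrasting accepted vs. not accepted comments.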
Term Analysis of Rated Comments
Examples of comments belonging to the category "accepted".
Term Analysis of Rated Comments
Examples of comments belonging to the category "unaccepted".
Sentiment Analysis of Rated Comments
Do the language and sentiment used by the community influence comment ratings?
Three disjoint partitions:
5Neg: comments with rating score r <= -5
0Dist: comments with rating score r = 0
5Pos: comments with rating score r >= 5
Comparison of mean senti-values for comments with different kinds of community ratings in (a) YouTube and (b) Yahoo! News.
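The partitioning above can be sketched in a few lines; the (rating, senti-value) pairs are toy assumptions, with sentiment assumed to be a precomputed score in [-1, 1]:

```python
from statistics import mean

# Hypothetical (rating, senti_value) pairs; sentiment values are assumed
# precomputed scores in [-1, 1], not real data from the study.
comments = [(-8, -0.6), (-5, -0.2), (0, 0.1), (0, -0.1), (5, 0.4), (9, 0.5)]

# The three disjoint partitions from the slide.
partitions = {
    "5Neg":  [s for r, s in comments if r <= -5],
    "0Dist": [s for r, s in comments if r == 0],
    "5Pos":  [s for r, s in comments if r >= 5],
}
mean_senti = {name: mean(vals) for name, vals in partitions.items()}
```

Comparing the three means per platform is then a direct read-off from `mean_senti`.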
Ratings and Polarized Content
Variance of Comment Ratings as an Indicator for Polarizing Videos
Ratings and Polarized Content
Variance of Comment Ratings as an Indicator for Polarizing Topics
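The variance indicator is straightforward to compute per video (or per topic); the rating lists below are toy assumptions:

```python
from statistics import pvariance

# Hypothetical per-video comment ratings; high rating variance is the
# slide's indicator of polarizing content.
ratings_by_video = {
    "cute_cats":     [3, 4, 2, 3, 4],      # broadly liked, low variance
    "election_clip": [-9, 8, -7, 9, -8],   # community split, high variance
}

polarization = {vid: pvariance(r) for vid, r in ratings_by_video.items()}
most_polarizing = max(polarization, key=polarization.get)
```

Note the distinction this captures: a uniformly disliked video has low variance, while a polarizing one attracts both strongly positive and strongly negative ratings.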
Predicting Comment Ratings
Classify comments into those accepted by the community and those not accepted
Training data: labeled comments (c1, l1), ..., (cn, ln)
Rating thresholds for "accepted" vs. "not accepted": AC_POS, AC_NEG, THRESH-0
Text processing: stopword removal, stemming
Different training set sizes T
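A minimal sketch of this setup with scikit-learn (a stand-in, not the original implementation); the toy comments, ratings, and TF-IDF weighting are assumptions:

```python
# Accepted-vs-not-accepted comment classification sketch (assumed data).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC

texts = ["thanks great upload", "awesome song love it",
         "spam click my channel", "first lol subscribe me"]
ratings = [6, 4, -7, -5]

threshold = 0                                 # THRESH-0: accepted iff rating > 0
labels = [int(r > threshold) for r in ratings]

# Stopword removal stands in for the slide's "stopwords removal, stemming".
X = TfidfVectorizer(stop_words="english").fit_transform(texts)
clf = LinearSVC().fit(X, labels)
pred = clf.predict(X)                         # sanity check on the training set
```

Swapping the threshold (AC_POS, AC_NEG, THRESH-0) changes only the label construction line, which is why the slide can compare BEPs across thresholds with one pipeline.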
Predicting Comment Ratings
Comment rating classification: BEPs for different training set sizes T and different rating thresholds.
Predicting Comment Ratings
Precision-recall curves for comment rating prediction.
Controversial Comments
Found on many platforms
“For some reason, a lot of you thing that rich people pay
NO taxes? They pay taxes even though 50% of Americans
do not. What Obama wants to do is RAISE their taxes.
That’s not fair. Let’s make sure everyone pays taxes and
politicians use tax money in a sensible way before we
raise taxes on a few.”
comment_rating = #likes - #dislikes
Controversial Comments
Examples of comments belonging to the categories "controversial" and "non-controversial".
Term Analysis of Controversial Comments
bank: banks were criticized for their role in the financial crisis; such comments are approved by a large majority of the users.
Top-20 terms according to their MI values for controversial vs. non-controversial comments.
Analysis of likes and dislikes
Comment Approval Ratio
Φ(c) = l_c / (l_c + d_c), where l_c (d_c) is the number of likes (dislikes) for a comment c
Controversy interval: 0.5 − δC ≤ Φ(c) ≤ 0.5 + δC, with δC = 0.1
Non-controversy interval: Φ(c) outside [0.5 − δNC, 0.5 + δNC], with δNC ∊ {0.1, 0.2, 0.3}
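The approval ratio Φ(c) = l_c / (l_c + d_c) and the controversy test translate directly into code; the minimum-ratings filter below is an added assumption, since Φ is unreliable for comments with very few ratings:

```python
def approval_ratio(likes, dislikes):
    """Comment approval ratio: Phi(c) = l_c / (l_c + d_c)."""
    return likes / (likes + dislikes)

def is_controversial(likes, dislikes, delta=0.1, min_ratings=10):
    """Phi(c) within [0.5 - delta, 0.5 + delta] marks a comment controversial.

    min_ratings is an assumed filter, not a value from the slides.
    """
    if likes + dislikes < min_ratings:
        return False
    return abs(approval_ratio(likes, dislikes) - 0.5) <= delta
```

For example, a comment with 10 likes and 15 dislikes has Φ = 0.4, on the edge of the δC = 0.1 controversy interval.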
Analysis of likes and dislikes
(a) Distribution of the number of comments per comment approval interval, for distinct thresholds on the number of received ratings. (b) Controversy interval vs. accepted (positive) and not accepted (negative) intervals.
Predicting Controversial Comments
BEPs for controversial comment prediction.
Note that:
• BEPs are relatively low
• Results are nevertheless usable in practice
• Trading recall for precision leads to applicable results: P = 0.859 for R = 0.1
Precision-recall curve for the classification of controversial comments for δNC = 0.4
Trolls on the Social Web
Trolls: “posting disruptive, false or offensive comments to fool and provoke other users”
Study comment rating feedback for troll vs. non-troll users
Study methods for automatically detecting the presence of trolls
Slashdot "No More Trolls" dataset: 200 trolls, 200 non-trolls, 24 comments per user
YouTube dataset
Trolls on the Social Web
Johny1
Mexican, Puerto Rican, Cuban ... whocares?
I love that this Negro says/ sings: "If I WERE a boy."
I would feel awful about admitting being a Republican.
I hope Britney Slut will die of Swine flu.
I love that this Negro says/ sings: "If I WERE a boy."
All I want is that she doesn't rape valuable classical songs. Even a diva like this Beyoncé doesn't have the right to commit such a crime.
Johny2
you obviously have no idea what you are talking about.
Shut up you douchebag.
Moron.If the religious groups did not subject their will on to everyone, there would not even need to be an atheist title. No one would care.
Perhaps people with speak issues should be euthanized.
Kinda the point there, dipshit.
You are quite the ignorant fuckwit. They do look like crap, you have no idea what you're talking about. Most likely don't have the device either.Moron.
Examples of troll users in YouTube (Johny1) and Slashdot (Johny2).
Term Analysis of Troll Comments
Top-20 terms according to their MI values for troll vs. non-troll comments.
Trolls and Community Ratings
Comment rating distribution for comments from troll users and non-troll users in (a) YouTube and (b) Slashdot.
Content-based Troll Prediction
Linear SVM, 2-fold cross validation
BEP: 0.68 for YouTube, 0.74 for Slashdot
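The linear SVM with 2-fold cross validation from the slide can be sketched with scikit-learn; the comments and labels are toy assumptions:

```python
# Content-based troll prediction sketch: linear SVM, 2-fold cross validation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy comments (assumptions); the study used per-user comment sets from
# YouTube and Slashdot.
texts = ["you are a moron shut up", "you have no idea you troll",
         "thanks for the detailed answer", "great explanation very helpful"] * 5
labels = [1, 1, 0, 0] * 5                     # 1 = troll, 0 = non-troll

model = make_pipeline(TfidfVectorizer(), LinearSVC())
scores = cross_val_score(model, texts, labels, cv=2)  # 2-fold CV accuracy
```

On real, noisier data the accuracies land near the reported BEPs rather than at the trivially separable scores this toy set produces.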
Social Feedback
Contribution
What are the characteristics of the YouTube query results with respect to the social features?
How effective is each individual feature for ranking the videos for a given query?
Can social features help improve the video retrieval performance in a learning to rank (LETOR) framework?
Data Collection
Query Sets
1.4k popular queries (𝑄𝑝)
1.3k tail queries (𝑄𝑡)
Video Sets
𝑉𝑝: 132k videos retrieved for 𝑄𝑝
𝑉𝑡: 63k videos retrieved for 𝑄𝑡
Query Result Characteristics
Category distribution of (a) popular and (b) tail queries
Query Result Characteristics
Number of results (reported by YouTube) for (a) popular and (b) tail queries
Query Result Characteristics
Avg. number of (a) views, (b) likes, (c) dislikes, and (d) comments vs. video rank in the query results
Data Annotation
100 queries, 100 videos/query ⇒ 10k videos
Basic and Social Features
The list of all the basic and social features (F) employed in our work.
Effectiveness of Features
Fraction of queries for which a given feature yields the ranking with the highest NDCG@10 for (a) popular and (b) tail queries
Video Retrieval Framework
7 LETOR algorithms
Feature selection strategies: GAS, MMR
Query-video pairs represented as (q, F, r) tuples
5-fold cross validation, evaluated with NDCG@10 and NDCG@5
Pipeline, for k ∊ {1, ..., #features}: top-k feature selection → build k-dimensional query-video pairs from the train and test queries+videos → train 7 LETOR models → run the prediction models → compute NDCG
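The evaluation metric in this pipeline, NDCG@k, can be computed in a few lines; the graded relevance labels below are hypothetical, and this sketch uses the linear-gain DCG variant (some LETOR tools use the 2^gain − 1 variant instead):

```python
from math import log2

def dcg_at_k(gains, k):
    """DCG@k over graded relevance gains (linear-gain variant)."""
    return sum(g / log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(gains, k):
    """Normalize by the DCG of the ideal (descending) ordering."""
    ideal = dcg_at_k(sorted(gains, reverse=True), k)
    return dcg_at_k(gains, k) / ideal if ideal > 0 else 0.0

# Graded relevance labels of a hypothetical ranked result list.
ranked = [3, 2, 3, 0, 1]
score = ndcg_at_k(ranked, 10)
```

A ranking already sorted by relevance scores NDCG = 1; swapping highly relevant videos down the list lowers the score, which is what the best-k feature comparison measures.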
LETOR Results for Popular+Tail
Average NDCG@10 scores for LETOR algorithms using the basic and best-k features obtained with the GAS and MMR strategies for the popular and tail query sets (for bold cases, differences from the baseline are statistically significant). For GAS and MMR, the number of selected features (k) is also denoted in parentheses.
Community Sentiment in Web Queries
Contribution
Analysis of sentiment in Web queries
Study the applicability of state-of-the-art sentiment analysis methods for detecting the sentiment of the queries
Employ query sentiment detectors in two use cases, query recommendation and controversial topic discovery
What is Sentiment Analysis?
Examples of positive (top) and negative (bottom) opinionated reviews for the movie Madagascar 3: Europe's Most Wanted.
Data Collection
50 controversial topics from procon.org and Wikipedia (e.g., abortion, iphone, marijuana)
AOL query log
31,053 queries
7,651 annotated queries
Templates for gathering queries (along with the number of manually annotated queries per template)
Sentiment in Web Queries
Queries and sentiment categories for the topic “George Bush”.
Sentiment in Query Results
Traces of bias in top-k query results
60 queries, 600 titles, 600 snippets
Sentiment distribution of (a) query result titles and (b) query result snippets for the queries from each sentiment class.
Post-Retrieval Analysis
Post-retrieval behaviour of the users
MSN log, 5 topics, 1.5k queries, 79 opinionated queries, 222 clicked pages
Sentiment distribution of the clicked results for (a) positive queries and (b) negative queries.
Detecting Query Sentiment
Study state-of-the-art methods to detect the sentiment class of a query
Feature vectors:
query text, top-10 result titles and snippets
TF-IDF weights, stemming, stopword removal, negation handling
Classification approaches:
simple logistic regression (SLR)
multinomial Naive Bayes (mNB)
3 SVM types
3 types of one-vs-all (binary) classifiers
50/50 split for training/testing
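The feature construction can be sketched as follows: each query is concatenated with result titles and snippets (the QAll-style representation), TF-IDF weighted, and fed to simple logistic regression. All example strings are assumptions:

```python
# Query sentiment detection sketch with toy annotated queries (assumptions).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

data = [
    ("george bush is great",        "supporters praise bush legacy", 1),
    ("george bush worst president", "critics slam bush record",      0),
] * 10
# Concatenate query text with its result titles/snippets.
texts = [f"{query} {titles_snippets}" for query, titles_snippets, _ in data]
labels = [y for _, _, y in data]

X = TfidfVectorizer(stop_words="english").fit_transform(texts)
clf = LogisticRegression().fit(X, labels)
```

Expanding short queries with their result text is what makes the sparse query representation classifiable at all; the slide's comparison of QText vs. QAll quantifies exactly that gain.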
Detecting Query Sentiment
Classification accuracy and AUC for the subjective vs. all classifiers trained with four different representations of the queries (QAll stands for QTextTitleSnippet).
Detecting Query Sentiment
Precision-recall curves and BEPs for (a) subjective vs. all, (b) positive vs. all, and (c) negative vs. all classifiers.
Recommender Methods
Improvement of recommendations by analyzing the sentiment of the suggested query
Our approach: opinionated suggestions
For a query q, generate query suggestions having the same sentiment class as q
Baseline: search engine suggestions
Issue q to a SE (Nov-2011), collect suggested and related queries
Evaluation: compare the opinionated suggestions vs. the SE suggestions
Recommender Methods
User study
Suggested query labeled as relevant / irrelevant / undecided
15 topics, 30 seed queries, 600 annotated suggestions
CS researchers, AMT workers
Query recommendation performance based on (a) in-house annotations and (b) AMT annotations.
Recommender Methods
Search engine's suggestions (provided as "related queries" and "auto-completions", the latter shown in italics) vs. opinionated suggestions for the query "economy is really bad".
Controversial Topic Discovery
Classify sentiment in queries, infer controversial topics
A toy example illustrating controversial topic detection: the procedure outputs only "zen" as controversial, since its queries yield very high variance in sentiment scores, and filters out "zendaya", whose queries have less variance.
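The variance filter from the toy example can be sketched directly; the per-query sentiment scores and the variance cutoff are assumptions for illustration:

```python
from statistics import pvariance

# Hypothetical per-query sentiment scores in [-1, 1], grouped by topic.
topic_query_sentiment = {
    "zen":     [-0.9, 0.8, -0.7, 0.9],   # strongly mixed opinions
    "zendaya": [0.3, 0.4, 0.2, 0.35],    # uniformly mild opinions
}

def controversial_topics(topics, min_variance=0.25):
    """Keep topics whose query sentiment variance exceeds an assumed cutoff."""
    return [t for t, s in topics.items() if pvariance(s) >= min_variance]
```

As in the "zen" vs. "zendaya" example, only topics whose queries pull in both sentiment directions survive the filter.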
Controversial Topic Discovery
Topics ranked with respect to the variance in the sentiment scores of their queries.
Wicca: a modern pagan religion
cult, good, right
fake, evil, stupid
Summary and Contributions
Comment-Centric Feedback
In-depth analysis of 11 million comments
Studied dependencies between comment ratings and textual content
Explored the applicability of ML techniques to detect accepted and controversial comments
Studied users exhibiting offensive behaviour
Social Feedback
Analysed query/query result characteristics for popular and tail queries
Effectiveness of individual social features for LETOR
Learning to Rank using Social Features
Summary and Contributions
Community Sentiment in Web Queries
Studied sentiment in Web search queries
Methods able to detect the sentiment class of a query
Application 1: Query recommendation method
Application 2: Controversial topic discovery method
Publications
Chelaru, S., Altingovde, I. S., Siersdorfer, S., and Nejdl, W. Analyzing, detecting, and exploiting sentiment in web queries. ACM Transactions on the Web 8, 1 (Dec. 2013), 6:1–6:28
Chelaru, S., Altingovde, I. S., and Siersdorfer, S. Analyzing the polarity of opinionated queries. In ECIR ’12, Springer-Verlag, pp. 463–467
Siersdorfer, S., Chelaru, S., Nejdl, W., and San Pedro, J. How useful are your comments?: analyzing and predicting youtube comments and comment ratings. In WWW ’10, ACM, pp. 891–90
Siersdorfer, S., Chelaru, S., San Pedro, J., Altingovde, I. S., and Nejdl,W. Analyzing and mining comments and comment ratings on the social web. ACM Transactions on the Web 8, (June 2014), 17:1-17:39
Chelaru, S., Orellana-Rodriguez, C., and Altingovde, I. S. Can social features help learning to rank youtube videos? WISE ’12, Springer-Verlag, pp. 552–566
Chelaru, S., Orellana-Rodriguez, C., and Altingovde, I. How useful is social feedback for learning to rank youtube videos? World Wide Web Journal (2013), 1–29
Chelaru, S., Herder, E., Djafari Naini, K., and Siehndel, P. Recognizing skill networks and their specific communication and connection practices. In HT ’14 (Accepted Paper), ACM
Demartini, G., Siersdorfer, S., Chelaru, S., and Nejdl, W. Analyzing political trends in the blogosphere. In ICWSM ’11.
Thanks
Questions?