Case study on Yelp spamming

On the temporal dynamics of opinion spamming: case studies on Yelp

Santosh K C, Arjun Mukherjee, WWW 2016

-Harshitha Chidananda

Introduction
• Online business boom
• Online reviews are important!
  • Enhance/defame products
  • Influence the buyer
• Online reviews dominance
• Spammers

• The problem of opinion spam has been widespread and has attracted a lot of research attention

Problem

Service        Fraud
Credit card    0.2%
Fake reviews   20%

• Deliberate attempts
• Promote/demote
• Target products/services
• Fake reviews
• Fake profiles

Related Work
• Notable works include:
  • Detecting individual spammers
  • Group spammers
  • Detecting rating behaviors
  • Unexpected association rules
  • Linguistic approaches
  • Semi-supervised methods

Challenges
• Temporal dynamics not clearly understood

How does spamming operate on a daily basis?

What are the dominant spamming policies?

How do spam injection rates vary with the popularity of entities?

What factors are temporally correlated with opinion spamming?

How effectively can we predict the long-term popularity and average rating of an entity in the presence of deception?

How accurately can future deception be predicted?

Are there specific spamming policies that spammers employ?

What kinds of changes occur in the dynamics of truthful ratings on entities?

How does buffered spamming operate for entities that need spamming to retain threshold popularity, and reduced spamming for entities that are already more successful?

Review on Yelp

Contributions
• Spamming Policies
• Causal Modeling of Deceptive Ratings
• Predicting Deceptive Ratings
• Predicting Truthful Popularity and Rating

Analyze the time-series of fake ratings

Similar pattern observed

Indicate presence of spamming policies used by spammers.

Process

Overview
• Reveals two interesting spamming trends
  • Buffered spamming
  • Reduced spamming

Two types of restaurants:
• More successful and consequently in less need of spamming
• Need spamming to retain threshold popularity

Dataset
• Truthful and fake (spam) reviews
• 70 popular restaurants
• Chicago
• 5 year time span, from the date of first review to 5 years from the start
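To make the 5 year window concrete, here is a minimal sketch of how review dates could be bucketed into a weekly time series that starts at the first review. It is an illustration only, not the paper's preprocessing: the function name `weekly_counts`, the weekly granularity, the approximation of 5 years as 5 × 365 days, and the sample dates are all assumptions.

```python
from collections import Counter
from datetime import date, timedelta


def weekly_counts(review_dates, span_years=5):
    """Count reviews per week, with week 0 starting at the first review date.

    Assumption: "5 years" is approximated as span_years * 365 days.
    Reviews dated on or after the end of the window are ignored.
    """
    start = min(review_dates)
    end = start + timedelta(days=365 * span_years)
    counts = Counter()
    for d in review_dates:
        if d < end:
            counts[(d - start).days // 7] += 1
    n_weeks = (end - start).days // 7 + 1
    return [counts.get(week, 0) for week in range(n_weeks)]


# Hypothetical review dates for one restaurant (not taken from the dataset).
sample_dates = [
    date(2010, 1, 4),
    date(2010, 1, 6),
    date(2010, 1, 20),
    date(2016, 3, 1),  # falls outside the 5 year window
]
series = weekly_counts(sample_dates)
print(series[:4], len(series))
```

Running the same function separately on truthful and on filtered (fake) reviews would give the two series that the later slides compare.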

Yelp as a reference Dataset
• Implements review filtering
• Maintained by dedicated anti-fraud team
• Research on Yelp’s filtering methods shows reliability

Studies revealed that the majority (~75%) of spam is focused on promotion as opposed to demotion. Hence the paper focuses on promotion spamming.

Dynamics of Spamming Policies
• 10 Modalities
• 3 Spamming policies

Spamming Policies

Early
Wait for truthful reviews for the initial period of 5 months, then start spamming

Mid
Don’t exhibit spam injection until the 14th month

Late
Start promotion spamming only after the 30th month

Causal Modeling of Deceptive Ratings

• 3 dominant trends of spam injection
  • Early
  • Mid
  • Late
• Characterize based on
  • Truthful like ratings
  • Truthful dislike ratings
  • Truthful review count
• Time-series comparison of truthful and deceptive reviews
  • Buffered Spamming
  • Reduced Spamming

Rating dynamics of truthful reviews can potentially determine the future deception rates for each restaurant

Buffered Spamming
How do restaurants deal with their waning popularity and growth of dislike ratings?
• Proactively inject deceptive reviews?
• Deceptive like/average ratings increase as truthful like/average ratings decrease
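One simple way to check for the inverse relationship described on this slide is to correlate the two weekly series. The sketch below computes a plain Pearson correlation; it is not the causal analysis used in the paper, and the series values are made up for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


# Hypothetical weekly series for one restaurant (illustrative values only).
truthful_avg_rating = [4.5, 4.3, 4.0, 3.8, 3.5, 3.4]   # truthful rating declines
deceptive_like_count = [0, 1, 1, 2, 3, 4]              # deceptive likes rise

r = pearson(truthful_avg_rating, deceptive_like_count)
print(f"correlation: {r:.2f}")
```

A strongly negative value would be consistent with buffered spamming, although correlation alone does not show that the drop in truthful ratings caused the injection.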

Buffered Spamming
• A buffer action at work which adjusts the spamming rate by injecting deceptive like reviews

[Figure: truthful ratings vs. deceptive like ratings]

Reduced Spamming
• Case: when a restaurant maintains decent popularity and rating
• Is there a reduction in the spam injection rate, as they already have a better standing?
• Shows a pattern where spam injection rates are reduced when the truthful reviews are in favor

[Figure: truthful ratings vs. deceptive like ratings]

Predicting Dynamics of Deceptive Ratings
• Truthful ratings are harbingers
• Vector Auto Regression (VAR) model used to predict next week’s deceptive like rating
  • Lag of 1 week
  • Lag of 2 weeks
• Prediction for window sizes of
  • 10 weeks
  • 20 weeks
  • 30 weeks
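As a rough illustration of the idea, the sketch below fits a small VAR by ordinary least squares with NumPy and forecasts the next week's deceptive like rating from a 30 week window, once with lag 1 and once with lag 2. This is a hand-rolled fit under stated assumptions, not the authors' implementation: the two-variable setup, the helper names `fit_var` and `predict_next`, and the synthetic data are all hypothetical.

```python
import numpy as np


def fit_var(series, lags):
    """Least-squares fit of y_t = c + A_1 y_{t-1} + ... + A_p y_{t-p}.

    series: array of shape (T, k), one row per week.
    Returns coefficients of shape (1 + k * lags, k).
    """
    T, _ = series.shape
    rows = [
        np.concatenate([[1.0]] + [series[t - l] for l in range(1, lags + 1)])
        for t in range(lags, T)
    ]
    X = np.array(rows)
    Y = series[lags:]
    coef, *_ = np.linalg.lstsq(X, Y, rcond=None)
    return coef


def predict_next(series, coef, lags):
    """One-step-ahead forecast from the last `lags` rows of series."""
    x = np.concatenate([[1.0]] + [series[-l] for l in range(1, lags + 1)])
    return x @ coef


# Synthetic weekly data (assumption): deceptive ratings react to last week's
# truthful ratings, mimicking "truthful ratings are harbingers".
rng = np.random.default_rng(0)
weeks = 60
truthful = 4.0 + 0.3 * np.sin(np.arange(weeks) / 5.0) + 0.05 * rng.standard_normal(weeks)
deceptive = np.empty(weeks)
deceptive[0] = 2.0
for t in range(1, weeks):
    deceptive[t] = 6.0 - truthful[t - 1] + 0.05 * rng.standard_normal()
data = np.column_stack([truthful, deceptive])

window = 30
train = data[-window - 1:-1]  # the 30 weeks before the held-out final week
for lags in (1, 2):
    coef = fit_var(train, lags)
    forecast = predict_next(train, coef, lags)
    print(f"lags={lags} predicted={forecast[1]:.3f} actual={data[-1, 1]:.3f}")
```

Changing `window` to 10 or 20 reproduces the other window sizes listed above. A library implementation such as the VAR class in statsmodels would normally be preferable to this manual fit.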

• Spamming Policies
  • Buffered
  • Reduced

• Early spamming is harder to predict

• The error rate for buffered spamming is high since it uses a complicated algorithm


Imminent Truthful Popularity

Predicting Truthful Popularity and Rating
• Do deceptive reviews affect a restaurant’s popularity and average ratings?
• Training: 10 weeks
• Prediction: 6 months
• Only truthful reviews used
• 4 feature families used

Prediction Results - Popularity
• Popularity refers to the total number of reviews in a time period

Mean Absolute Error
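For reference, mean absolute error is the average of the absolute differences between predicted and actual values. The sketch below applies it to popularity, measured as review counts per period; the numbers are invented for illustration and are not results from the paper.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical review counts per period (popularity) for one restaurant.
actual_popularity = [12, 15, 9, 20]
predicted_popularity = [10, 15, 12, 18]

mae = mean_absolute_error(actual_popularity, predicted_popularity)
print(mae)  # (2 + 0 + 3 + 2) / 4 = 1.75
```

A lower value means the predictions are closer to the observed popularity or rating.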

Prediction Results - Rating
• Model performs better as features Opinion Lexicon, N-Grams and Aspect Sentiment Lexicon are added in both popularity and rating prediction
• Natural language signals are helpful

Mean Absolute Error

How reliable are Yelp’s filtered reviews?
• Significant increase in mean absolute error (MAE) upon adding reviews filtered by Yelp, across all policies
• Reviews filtered by Yelp
  • Imparted noise
  • Harmful to popularity/rating prediction
  • Not representative of truthful experiences
• Yelp’s filter, although it may not be perfect, is reasonably reliable

Strengths and Weaknesses

Strengths
• New approach
• Good results
• Used large set of reviews
• Good features selected

Weaknesses
• Lack of term explanation
• Graphs less explained

Open Issues
• Applicable to other popular review websites?
• Demographic impact?
• Deeper into NLP

My thoughts and Conclusion

My thoughts
• In-depth analysis of temporal dynamics of opinion spamming
• Used to check the influence of deceptive ratings on true ratings
• Validate Yelp’s filtering process

Conclusion
• Time series analysis
• Deceptive and true ratings well correlated
• 3 dominant spamming policies
  • Early
  • Mid
  • Late
• 2 spamming policies
  • Reduced
  • Buffered

Thank You!
Questions?
