On the Temporal Dynamics of Opinion Spamming: Case Studies on Yelp
Santosh K C, Arjun Mukherjee (WWW 2016)
Presented by Harshitha Chidananda
Introduction
• Online business is booming
• Online reviews are important:
  • They can enhance or defame products
  • They influence buyers
• Online reviews are dominant, which attracts spammers
• The problem of opinion spam is widespread and has attracted a lot of research attention
Problem
• Fraud rates by service:
  • Credit card: 0.2%
  • Fake reviews: 20%
• Opinion spam consists of deliberate attempts to promote or demote target products/services
• Spammers use fake reviews and fake profiles
Related Work
• Notable works include:
  • Detecting individual spammers
  • Detecting group spammers
  • Detecting rating behaviors
  • Unexpected association rules
  • Linguistic approaches
  • Semi-supervised methods
Challenges
• The temporal dynamics of opinion spamming are not clearly understood:
  • How does spamming operate on a daily basis?
  • What are the dominant spamming policies?
  • How do spam injection rates vary as the popularity of entities changes?
  • What factors are temporally correlated with opinion spamming?
  • How effectively can we predict the long-term future popularity and average rating of an entity in the presence of deception?
  • How accurately can future deception be predicted?
  • Are there specific spamming policies that spammers employ?
  • What changes occur in the dynamics of truthful ratings on entities?
  • How does buffered spamming operate for entities that need spamming to retain threshold popularity, and how does reduced spamming operate for entities that are more successful?
Review on Yelp
Contributions
• Spamming Policies
• Causal Modeling of Deceptive Ratings
• Predicting Deceptive Ratings
• Predicting Truthful Popularity and Rating
Analyzing the time series of fake ratings
• Similar patterns are observed across restaurants
• This indicates the presence of spamming policies used by spammers
Process
Overview
• Reveals two interesting spamming trends:
  • Buffered spamming: restaurants that need spamming to retain threshold popularity
  • Reduced spamming: restaurants that are more successful and consequently in less need of spamming
Dataset
• Truthful and fake (spam) reviews
• 70 popular restaurants in Chicago
• 5-year time span, starting from the date of the first review
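The analyses that follow work on time series of truthful and fake reviews over this 5-year span. A minimal sketch of how such series could be built, assuming a hypothetical `Review` record whose `filtered` flag marks reviews that Yelp's filter treated as fake, and approximating 5 years as 260 weeks; neither the record nor the weekly bucketing is taken from the paper:

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Tuple

# Assumption: the 5-year span is approximated as 260 weeks.
WEEKS_IN_SPAN = 5 * 52


@dataclass
class Review:
    """Hypothetical review record; field names are illustrative, not the paper's schema."""
    posted: date
    rating: int     # star rating, 1-5
    filtered: bool  # True if Yelp's filter flagged the review as fake


def weekly_counts(reviews: List[Review]) -> Tuple[List[int], List[int]]:
    """Count truthful and fake reviews per week, starting at the first review's date.

    Reviews posted after the 5-year span are ignored. `reviews` must be non-empty.
    """
    start = min(r.posted for r in reviews)
    truthful = [0] * WEEKS_IN_SPAN
    fake = [0] * WEEKS_IN_SPAN
    for r in reviews:
        week = (r.posted - start).days // 7  # weeks since the first review
        if week >= WEEKS_IN_SPAN:
            continue  # outside the 5-year window
        if r.filtered:
            fake[week] += 1
        else:
            truthful[week] += 1
    return truthful, fake
```

Weekly buckets are one possible granularity; the prediction slides below also work at a weekly resolution.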
Yelp as a Reference Dataset
• Implements review filtering
• The filter is maintained by a dedicated anti-fraud team
• Research on Yelp’s filtering methods shows that it is reliable
Studies have revealed that the majority (~75%) of spam focuses on promotion rather than demotion. Hence, the paper focuses on promotion spamming.
Dynamics of Spamming Policies
• 10 modalities
• 3 spamming policies
Spamming Policies
• Early: wait for truthful reviews for an initial period of 5 months, then start spamming
• Mid: do not exhibit spam injection until the 14th month
• Late: start promotion spamming only after the 30th month
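The three policies differ mainly in when spam injection begins. The sketch below labels a restaurant by the month of its first fake review, picking the policy whose onset month from the slide (5, 14, or 30) is closest. This nearest-onset rule and the names `first_spam_month` and `classify_policy` are hypothetical; the slides do not describe how the paper derived the policies from its 10 modalities.

```python
from typing import Dict, List, Optional

# Onset months taken from the slide: Early ~5, Mid 14, Late 30.
POLICY_ONSETS: Dict[str, int] = {"Early": 5, "Mid": 14, "Late": 30}


def first_spam_month(monthly_fake: List[int]) -> Optional[int]:
    """Return the 1-based month of the first fake review, or None if there is none."""
    for month, count in enumerate(monthly_fake, start=1):
        if count > 0:
            return month
    return None


def classify_policy(monthly_fake: List[int]) -> Optional[str]:
    """Hypothetical heuristic: pick the policy whose onset month is closest to the observed onset."""
    onset = first_spam_month(monthly_fake)
    if onset is None:
        return None  # no spam injection observed
    return min(POLICY_ONSETS, key=lambda policy: abs(POLICY_ONSETS[policy] - onset))
```

For example, a restaurant whose first fake review appears in month 6 is labeled "Early", and one whose first fake review appears in month 32 is labeled "Late".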
Causal Modeling of Deceptive Ratings
• 3 dominant trends of spam injection: early, mid, and late
• Characterized based on:
  • Truthful like ratings
  • Truthful dislike ratings
  • Truthful review count
• Time-series comparison of truthful and deceptive reviews:
  • Buffered spamming
  • Reduced spamming
Rating dynamics of truthful reviews can potentially determine the future deception rates for each restaurant
Buffered Spamming
• How do restaurants deal with their waning popularity and growing dislike ratings?
• Do they proactively inject deceptive reviews?
• Deceptive like/average ratings increase as truthful like/average ratings decrease
Buffered Spamming
• A buffer action is at work, adjusting the spamming rate by injecting deceptive like reviews
[Figure: truthful ratings vs. deceptive like ratings]
Reduced Spamming
• Case: restaurants that maintain decent popularity and rating
• Is the spam injection rate reduced because they already have a better standing?
• The data show a pattern in which spam injection rates are reduced when truthful reviews are favorable
[Figure: truthful ratings vs. deceptive like ratings]
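Both buffered and reduced spamming describe an inverse relationship: deceptive like ratings rise when truthful ratings fall, and fall when truthful ratings are favorable. One simple way to check for that relationship is the Pearson correlation between the two weekly series, optionally with the truthful series leading by a week. This is an illustrative sketch with hypothetical function names, not the causal model used in the paper:

```python
from math import sqrt
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length series with at least 2 points")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)


def lagged_pearson(truthful: Sequence[float], deceptive: Sequence[float], lag: int = 1) -> float:
    """Correlate truthful ratings in week t with deceptive ratings in week t + lag."""
    if lag < 1:
        raise ValueError("lag must be at least 1 week")
    return pearson(truthful[:-lag], deceptive[lag:])
```

A clearly negative value is consistent with the buffered or reduced pattern, although correlation alone does not establish causation.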
Predicting Dynamics of Deceptive Ratings
• Truthful ratings are harbingers of deceptive ratings
• A Vector Auto Regression (VAR) model is used to predict next week’s deceptive like rating:
  • Lags: 1 week and 2 weeks
  • Window sizes: 10, 20, and 30 weeks
• Evaluated for the spamming policies and for buffered and reduced spamming
• Early spamming is harder to predict
• Buffered spamming has a high error rate, since it follows a more complicated spamming pattern
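The prediction step can be sketched as follows. In a VAR model each variable is regressed on lags of all variables, and the coefficients can be estimated equation by equation with ordinary least squares. Since only next week's deceptive like rating is forecast here, the sketch fits just that equation. It assumes two weekly series (average truthful like rating and average deceptive like rating) and uses a hand-written normal-equation solver so it runs with the Python standard library alone. The function names are hypothetical, and this is not the authors' implementation.

```python
from typing import List, Sequence


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve the linear system a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: try a longer window or fewer lags")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def lag_row(truthful: Sequence[float], deceptive: Sequence[float], t: int, p: int) -> List[float]:
    """Regressors for week t: an intercept plus p weekly lags of both series."""
    row = [1.0]
    for i in range(1, p + 1):
        row += [truthful[t - i], deceptive[t - i]]
    return row


def fit_deceptive_equation(truthful: Sequence[float], deceptive: Sequence[float], p: int) -> List[float]:
    """OLS estimate of the deceptive-rating equation of a two-variable VAR(p)."""
    xs = [lag_row(truthful, deceptive, t, p) for t in range(p, len(deceptive))]
    ys = [deceptive[t] for t in range(p, len(deceptive))]
    k = len(xs[0])
    # Normal equations: (X^T X) beta = X^T y
    xtx = [[sum(row[i] * row[j] for row in xs) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * y for row, y in zip(xs, ys)) for i in range(k)]
    return solve(xtx, xty)


def predict_next_week(truthful: Sequence[float], deceptive: Sequence[float], p: int = 1, window: int = 10) -> float:
    """Fit on the most recent `window` weeks, then forecast next week's deceptive like rating."""
    t_win, d_win = list(truthful[-window:]), list(deceptive[-window:])
    coef = fit_deceptive_equation(t_win, d_win, p)
    row = lag_row(t_win, d_win, len(d_win), p)  # lags ending at the latest observed week
    return sum(c * x for c, x in zip(coef, row))
```

Here `p` plays the role of the 1- or 2-week lag and `window` the 10-, 20-, or 30-week window size. With `p=2`, a 10-week window leaves only 8 observations for 5 coefficients, so short windows can become ill-conditioned.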
Imminent Truthful Popularity
Predicting Truthful Popularity and Rating
• Do deceptive reviews affect a restaurant’s popularity and average ratings?
• Training: 10 weeks
• Prediction: 6 months
• Only truthful reviews are used
• 4 feature families are used
Prediction Results - Popularity
• Popularity refers to the total number of reviews in a time period
[Figure: mean absolute error of popularity prediction]
Prediction Results - Rating
• The model performs better as the Opinion Lexicon, N-Gram, and Aspect Sentiment Lexicon features are added, in both popularity and rating prediction
• Natural language signals are helpful
[Figure: mean absolute error of rating prediction]
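Popularity, as defined above, and the reported error metric are both simple to compute. A minimal sketch; the function names are hypothetical, and periods are assumed to be measured in weeks:

```python
from typing import Sequence


def popularity(weekly_counts: Sequence[int], start: int, length: int) -> int:
    """Popularity: total number of reviews in weeks [start, start + length)."""
    return sum(weekly_counts[start:start + length])


def mean_absolute_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """MAE = (1/n) * sum_i |actual_i - predicted_i|; lower is better."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)
```

Comparing the MAE of a model trained with and without Yelp-filtered reviews is the kind of comparison behind the reliability check on filtered reviews.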
How reliable are Yelp’s filtered reviews?
• Mean absolute error (MAE) increases significantly across all policies when reviews filtered by Yelp are added
• Reviews filtered by Yelp:
  • Impart noise
  • Are harmful to popularity/rating prediction
  • Are not representative of truthful experiences
• Yelp’s filter, although it may not be perfect, is reasonably reliable
Strengths and Weaknesses
Strengths
• New approach
• Good results
• Uses a large set of reviews
• Well-chosen features
Weaknesses
• Terms are not explained
• Graphs are not well explained
Open Issues
• Is the approach applicable to other popular review websites?
• What is the demographic impact?
• Going deeper into NLP
My Thoughts and Conclusion
My thoughts
• In-depth analysis of the temporal dynamics of opinion spamming
• Can be used to check the influence of deceptive ratings on true ratings
• Validates Yelp’s filtering process
Conclusion
• Time-series analysis
• Deceptive and true ratings are well correlated
• 3 dominant spamming policies: early, mid, and late
• 2 spamming trends: reduced and buffered
Thank You!
Questions?