Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer...


This talk is based on the journal paper "Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics".


1

“Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics”

Authors: Anindya Ghose, Panagiotis G. Ipeirotis, Member, IEEE

Course: Topics in Data Mining
Presenter: Nobal Niraula

December 8, 2010 @ UOM

2

Introduction
Gathering variables (attributes)
Explanatory study using econometric regression
◦ Hypothesis for sales
◦ Hypothesis for perceived usefulness
Prediction
◦ Helpfulness
◦ Impact on sales
Conclusion

Outline

3

Product related word-of-mouth conversations in online markets

Reviewers contribute time and energy
Volume of reviews could be high
Benefits
◦ Customers: usefulness / helpfulness
  Average star rating: bimodal
  Peer review: biased
  Helpfulness = helpful votes / total votes
  “Spotlight Review” in Amazon.com
◦ Manufacturers: influence on sales
  Helpful reviews are not necessarily the ones that lead to increases in sales!
  Reviews that affect sales most should be presented first to manufacturers

Introduction (1)

4

The paper is unique in looking at how subjectivity level, readability and spelling errors in the text of reviews affect product sales and the perceived helpfulness of these reviews.

Introduction (2)

5

Two-level study
◦ Explanatory econometric analysis
  Identify aspects of a review and a reviewer
◦ Prediction model using “Random Forests”
  How peer consumers are going to rate a review
  How sales will be affected by the posted review

Predicting Helpfulness and Importance

6

Product Reviews

7

Sample Review

8

Reviewer’s Profile

9

Product Rank

10

Variables Collection

Products
◦ Audio and video players (144 products)
◦ Digital cameras (109 products)
◦ DVDs (158 products)

Product and Sales Data: Retail Price, Sales Rank, Average Rating, Number of Reviews, Elapsed Date

Reviewer History: Number of Past Reviews, Reviewer History Macro, Reviewer History Micro, Past Helpful Votes, Past Total Votes

Reviewer Characteristics: Reviewer Rank, Top-10 Reviewer, Top-50 Reviewer, Top-100 Reviewer, Top-500 Reviewer, Real Name, Nick Name, Hobbies, Birthday, Location, Web Page, Interests, Snippet, Any Disclosure

Individual Review: Moderate Review, Helpful Votes, Total Votes, Helpfulness

Review Readability: Length (Chars), Length (Words), Length (Sentences), Spelling Errors, ARI, Gunning Index, Coleman–Liau Index, Flesch Reading Ease, Flesch–Kincaid Grade Level, SMOG

Review Subjectivity: AvgProb, DevProb

11

Readability Analysis
◦ Automated Readability Index
◦ Coleman-Liau Index
◦ Flesch-Kincaid Grade Level
◦ Gunning fog index
◦ SMOG
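As a rough illustration of how two of these indices are computed, the sketch below applies the standard published formulas for the Automated Readability Index and the Coleman-Liau Index. The tokenization rules (regex word matching, splitting sentences on ".", "!" and "?") and the function names are simplifying assumptions made here, not the procedure used in the paper.

```python
import re

def text_counts(text):
    """Return (characters, words, sentences) using deliberately naive rules.

    Assumption: words are runs of letters/digits (with an optional
    apostrophe suffix) and sentences end at '.', '!' or '?'.
    """
    words = re.findall(r"[A-Za-z0-9]+(?:'[A-Za-z]+)?", text)
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    chars = sum(ch.isalnum() for w in words for ch in w)
    if not words or not sentences:
        raise ValueError("text must contain at least one word and one sentence")
    return chars, len(words), len(sentences)

def automated_readability_index(text):
    """ARI = 4.71 * (chars / words) + 0.5 * (words / sentences) - 21.43."""
    chars, words, sentences = text_counts(text)
    return 4.71 * chars / words + 0.5 * words / sentences - 21.43

def coleman_liau_index(text):
    """CLI = 0.0588 * L - 0.296 * S - 15.8 (L, S are per-100-word averages)."""
    chars, words, sentences = text_counts(text)
    letters_per_100_words = 100.0 * chars / words
    sentences_per_100_words = 100.0 * sentences / words
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8

if __name__ == "__main__":
    review = "The camera is great. It takes sharp photos."
    print(automated_readability_index(review))
    print(coleman_liau_index(review))
```

Both indices approximate a school grade level, so short sentences built from short words score low.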

Subjectivity Analysis
◦ Stylistic choices: “subjective” vs “objective”
◦ Each document gets a “subjectivity score”
  AvgProb(r): high value → many subjective sentences
  DevProb(r): high value → mixed (subjective + objective) sentences
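The two subjectivity features can be sketched as summary statistics over per-sentence probabilities. The snippet assumes that some classifier (not shown, and hypothetical here) has already assigned each sentence a probability of being subjective. It takes AvgProb as the mean and DevProb as the population standard deviation of those probabilities; whether the paper uses the population or the sample form is not stated on the slide.

```python
from statistics import mean, pstdev

def subjectivity_scores(sentence_probs):
    """Return (AvgProb, DevProb) for one review.

    sentence_probs: per-sentence probabilities of being subjective,
    assumed to come from an external classifier.
    """
    if not sentence_probs:
        raise ValueError("a review needs at least one sentence")
    avg_prob = mean(sentence_probs)    # high -> many subjective sentences
    dev_prob = pstdev(sentence_probs)  # high -> mix of subjective and objective
    return avg_prob, dev_prob

if __name__ == "__main__":
    print(subjectivity_scores([0.9, 0.9, 0.9]))  # uniformly subjective review
    print(subjectivity_scores([0.1, 0.9]))       # mixed review
```

A uniformly subjective review gets a high AvgProb and a DevProb near zero, while a review that alternates objective and subjective sentences gets a high DevProb.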

Text of a Review Matters !

12

Hypothesis 1a:
◦ All else equal, a change in the subjectivity level and mixture of objective and subjective statements in reviews will be associated with a change in sales.

Hypothesis 1b:
◦ All else equal, a change in the readability score of reviews will be associated with a change in sales.

Hypothesis 1c:
◦ All else equal, a decrease in the proportion of spelling errors in reviews will be positively related to sales.

Hypothesis for Sales

13

ln(D) = a + b * ln(S)
◦ D is the unobserved product demand
◦ S is its observed sales rank
◦ Pareto distribution
◦ High sales rank → low demand

Key Observation:
◦ “Sales rank” in Amazon.com can be taken as a PROXY for demand!
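To make the rank-to-demand relation concrete, the sketch below evaluates ln(D) = a + b * ln(S) for a few ranks. The coefficient values a = 10 and b = -0.8 are made up for illustration and are not estimates from the paper. The only property relied on is b < 0, which is what makes a high (numerically large) sales rank correspond to low demand.

```python
import math

def demand_from_rank(sales_rank, a, b):
    """Invert ln(D) = a + b * ln(S) to get demand D from sales rank S."""
    if sales_rank <= 0:
        raise ValueError("sales rank must be positive")
    return math.exp(a + b * math.log(sales_rank))

if __name__ == "__main__":
    a, b = 10.0, -0.8  # hypothetical coefficients, chosen only for illustration
    for rank in (1, 10, 100, 1000):
        print(rank, round(demand_from_rank(rank, a, b), 1))
```

Because the relation is linear in logs, a change in ln(S) maps to a proportional change in ln(D), which is why regressions on log sales rank can stand in for regressions on demand.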

Effect on Product Sales

14

Descriptive Statistics for Econometric Analysis

15

Model to Test Hypothesis 1

μ_k is a product fixed effect that accounts for unobserved heterogeneity across products and ε_kt is the error term
Control Variables: Retail Price, Avg. Numeric Rating, Elapsed Date, Number of Reviews
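One standard way to handle a product fixed effect such as μ_k is the within transformation: subtract each product's own mean from its observations, which removes any product-specific constant before the slope is estimated. The sketch below shows this for a single regressor. It is a generic illustration with hypothetical names and toy data, not the estimation procedure or the specification used in the paper.

```python
from collections import defaultdict

def within_slope(rows):
    """Fixed-effects (within) slope for y = mu_k + beta * x + error.

    rows: iterable of (product_id, x, y) observations.
    Demeaning x and y inside each product cancels the product effect mu_k.
    """
    groups = defaultdict(list)
    for product_id, x, y in rows:
        groups[product_id].append((x, y))

    sxy = 0.0
    sxx = 0.0
    for observations in groups.values():
        mean_x = sum(x for x, _ in observations) / len(observations)
        mean_y = sum(y for _, y in observations) / len(observations)
        for x, y in observations:
            sxy += (x - mean_x) * (y - mean_y)
            sxx += (x - mean_x) ** 2

    if sxx == 0:
        raise ValueError("x does not vary within any product")
    return sxy / sxx

if __name__ == "__main__":
    # Toy data: y = mu_k + 2 * x, with mu_A = 5 and mu_B = -3.
    rows = [("A", 1, 7), ("A", 2, 9), ("A", 3, 11), ("B", 1, -1), ("B", 4, 5)]
    print(within_slope(rows))
```

The two products in the toy data have very different baselines, yet the estimated slope depends only on variation within each product.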

16

Empirical Results for Product Sales

Note:
1. (-ve) → decrease in sales rank → increase in sales
2. Variables that increase sales: AvgProb, Readability, Spelling Errors
3. Variables that decrease sales: Retail Price, DevProb
Also: reviews with rating <= 2 are associated with increased sales

17

Hypothesis 1a:
◦ Highly subjective sentences increase sales
◦ A mixture of subjective and objective sentences is negatively associated with product sales compared to highly subjective or highly objective sentences.

Hypothesis 1b:
◦ Higher readability scores are associated with higher sales

Hypothesis 1c:
◦ An increase in the proportion of spelling mistakes decreases product sales for some “experience products” like DVDs; however, the proportion of spelling errors doesn’t have a significant impact on sales for “search products”

Reviews that rate products negatively can be associated with increased product sales when the review text is informative and detailed!

Conclusion (1)

18

Hypothesis 2a:
◦ All else equal, a change in the subjectivity level and mixture of objective and subjective statements in a review will be associated with a change in the perceived helpfulness of that review.

Hypothesis 2b:
◦ All else equal, a change in the readability of a review will be associated with a change in the perceived helpfulness of that review.

Hypothesis 2c:
◦ All else equal, a decrease in the proportion of spelling errors in a review will be positively related to perceived helpfulness of that review.

Hypothesis 2d:
◦ All else equal, an increase in the average helpfulness of a reviewer’s historical reviews will be positively related to perceived helpfulness of a review posted by that reviewer.

Hypothesis for Helpfulness

19

Effect on Helpfulness

μ_k is a product fixed effect that controls for differences in the average helpfulness of reviews across products and ε_kt is the error term

20

Empirical Results for Helpfulness

Note:

(-ve) → lower helpfulness

Negative relations: AvgProb, Spelling Error, Moderate

Positive relations: DevProb **, Disclosure, Readability, Reviewer History Macro, Number of Reviews

21

Hypothesis 2a:
◦ In general, a mixture of subjective and objective elements is considered more informative (helpful) by the users.
◦ For feature-based goods, users prefer reviews having more objective information and fewer subjective sentences
◦ For experience goods, e.g. DVDs, users expect few objective sentences but more subjective sentences

Hypothesis 2b – 2d:
◦ An increase in the readability of reviews has a positive and statistically significant impact on review helpfulness
◦ An increase in the proportion of spelling errors has a negative and statistically significant impact on review helpfulness for audio-video products and DVDs.
◦ Past historical information about reviewers has a statistically significant effect on the perceived helpfulness of reviews

Conclusion (2)

22

Main goal
◦ Is the review informative or not?
◦ Does the review have an impact on sales or not?

Question: given the helpfulness value of a review, decide whether it is useful or not
◦ Helpfulness = (helpful votes / total votes)
◦ Continuous to binary conversion
◦ Threshold found is 60%

Classification
◦ Regression model can be used
◦ Binary classification
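The continuous-to-binary conversion described above can be sketched as follows. The 0.6 threshold comes from the slide. The function names are illustrative, and treating a review that sits exactly at the threshold as helpful is an assumption made here.

```python
def helpfulness(helpful_votes, total_votes):
    """Fraction of voters who found the review helpful."""
    if total_votes <= 0:
        raise ValueError("helpfulness is undefined without votes")
    return helpful_votes / total_votes

def is_helpful(helpful_votes, total_votes, threshold=0.6):
    """Binary label for classification; '>=' at the boundary is an assumption."""
    return helpfulness(helpful_votes, total_votes) >= threshold

if __name__ == "__main__":
    print(helpfulness(3, 4))   # 3 of 4 voters found the review helpful
    print(is_helpful(7, 10))   # above the 60% threshold
    print(is_helpful(5, 10))   # below the 60% threshold
```

Reviews with no votes have no defined helpfulness, so they would need to be excluded before labels are built.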

Predictive Modeling

23

Classifiers
◦ SVM vs Random Forest

SVM consistently performed worse than Random Forest, contrary to what earlier reports suggest

Training time for SVM was significantly higher than that of Random Forest

Predicting Helpfulness (1)

24

Predicting Helpfulness (2)

25

Examining whether the difference SalesRank_{t(r)+T} − SalesRank_{t(r)}, where t(r) is the time the review is posted, is positive or negative.
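A minimal sketch of how this difference could be turned into a class label is shown below. The function names and label strings are hypothetical. The interpretation follows the note on the empirical results slide: a decrease in sales rank corresponds to an increase in sales.

```python
def sales_rank_change(rank_at_post, rank_after):
    """SalesRank_{t(r)+T} - SalesRank_{t(r)} for one review."""
    return rank_after - rank_at_post

def sales_impact_label(rank_at_post, rank_after):
    """Map the sign of the rank difference to a label.

    A negative difference means the rank number went down,
    which corresponds to higher sales.
    """
    diff = sales_rank_change(rank_at_post, rank_after)
    if diff < 0:
        return "sales increased"
    if diff > 0:
        return "sales decreased"
    return "no change"

if __name__ == "__main__":
    print(sales_impact_label(500, 350))  # rank improved after the review
    print(sales_impact_label(500, 800))  # rank worsened after the review
```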

Predicting Impact on Sales

26

Random Forest based prediction
◦ For experience goods such as DVDs, the classifier has lower performance
◦ Observed high correlation of “classification error” with “distribution of review ratings”
◦ Products that have received widely fluctuating ratings also have reviews with widely fluctuating helpfulness votes.
◦ Highly detailed and readable reviews can have low helpfulness votes
◦ “Reviewer-related”, “review subjectivity” and “review readability” feature sets are interchangeable!

Conclusion (3)

27

Subjectivity level, readability and spelling errors in the text of reviews affect product sales and the perceived helpfulness

Overall Conclusion

28

Anindya Ghose and Panagiotis G. Ipeirotis, "Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Text and Reviewer Characteristics," IEEE Transactions on Knowledge and Data Engineering, vol. 99, no. PrePrints, 2010.

References

29

Thank You !
