Opinion Mining on Hotel Reviews

  • View
    431

  • Download
    2

Embed Size (px)

Transcript

PowerPoint Presentation

Opinion Mining on Hotel Reviews CSE-4250: PROJECT AND THESIS-I

Presented ByMd. Rafeedul Bar Chowdhury ID:11.02.04.088Zahidul Haque ID:11.02.04.086Mahmud Hossain ID:11.02.04.082Soumik Das Bibon ID:11.02.04.003

1

Course Teacher MIR TAFSEER NAYEEM

Lecturer, Dept. of Computer Science & EngineeringAhsanullah University Of Science & Technology

18-Jun-152

2

What is Opinion?Opinions are thoughts influencing decision making.

Which schools should I apply to?Which professor to work for?Whom should I vote for?Which hotel should I book?

18-Jun-153

Whom shall I ask for opinions?Pre WebFriends and relativesAcquaintances

Post WebBlogs (google blogs, livejournal)E-commerce sites (amazon, ebay)Review sites (CNET, PC Magazine)Discussion forums (forums.craigslist.org, forums.macrumors.com)

18-Jun-154

Have enough opinions!

Now that I have got enough opinions, I can take decisions

Is it really enough?

18-Jun-155

Having opinions is not enoughSearching for reviews is quite difficultSearching for opinions is not as convenient as general web search Huge amounts of information available on one topicDifficult and quite impossible to analyze each and every review separatelyExpression of reviews are different in many ways overall, this hotel is my first choice at Coxs Bazar the facilities are good but the service isnt so best in Coxs Bazar by quality and value disappointing

18-Jun-156

What is Opinion Mining?Computational study of opinions, sentiments and emotions expressed in text.

Detects the contextual polarity of text (positive or, neutral or, negative)

Derives the opinion, or the attitude of an opinion holder.

18-Jun-157

Mining opinions

Its about finding out what people think

18-Jun-158

Topic of our researchMining traveler experiences/reviews expressed on services provided by hotels in Bangladesh

18-Jun-159

Our research domainWe are mining reviews from the travelling guide, TripAdvisor www.tripadvisor.com/Bangladesh

We are starting our work from the district that attracts tourists the most which is Coxs Bazar

Primarily we are mining reviews of travelers visiting various hotels in this tourist gathering area

18-Jun-1510

TripAdvisor

23118-Jun-1511

TripAdvisor(Cont.)Searching panel for hotels, situated on different locations/arease.g. Search Coxs Bazar, Bangladesh, Asia for Long Beach Hotel

Panel for booking hotels based on prices

Searching Panel for a sorted list of hotels based on particular preferencese.g. types of hotel, most rated hotels etc.

18-Jun-1512

TripAdvisor(Cont.)

1218-Jun-1513

TripAdvisor(Cont.)List of hotels sorted by traveler ratings

Panel where travelers checks vacancy and books hotels e.g. Check-In Date/Check-Out Date of travelers18-Jun-1514

TripAdvisor(Cont.)

12318-Jun-1515

TripAdvisor(Cont.)Hotel rating in stars(out of 5) based on type of reviews (positive or, negative) given by travelers e.g. Rating: 4.5 out of 5 stars; 95 Reviews

Position of the hotel among other hotels in an area depending on positive feedbacks from the travelerse.g. No.1 of 24 hotels in Coxs Bazar

Special services provided by the hotele.g. Free parking, Free breakfast etc.18-Jun-1516

Reviewers of TripAdvisor

1352418-Jun-1517

Reviewers of TripAdvisor(Cont.)Name and residence of the reviewing traveler e.g. MUHassan, Dhaka City, Bangladesh

Information about the profile of the reviewer on TripAdvisor site e.g. Top Contributor, 50 Reviews, 19 helpful votes etc.

Rating provided by the reviewer and the time of the reviewe.g. 5 out of 5 stars, Reviwed 12,June,2015 etc.

The review of the traveler in a large opinioned paragraph Voting panel for other users to vote a review if it is helpful or not!18-Jun-1518

Reviewers of TripAdvisor(Cont.)

1218-Jun-1519

Reviewers of TripAdvisor(Cont.)

All the reviews provided by a certain reviewer on the site

Feedback given by a hotel manager upon reviewing of a traveler 18-Jun-1520

Problem domain of our researchIntegrity of the reviews can not be ensured

Due to competition among the hotels high possibility of false reviews

Amount of reviews highly varies which sometimes results in inaccurate ratings of hotels

18-Jun-1521

Solution to these problems Only the reviews given by verified travelers will be prioritized first e.g. Top Contributor, Contributor etc.For the rating system to work there has to be at least 10 or more reviews

18-Jun-1522

Goal of our research Rating the hotels according to reviews based on potential opinion holders

Finding out the hotels which impacts the most upon the traveling tax

Mining each individual reviews then summarizing them according to polarity to get accurate opinion about a hotel18-Jun-1523

Works we have done so far Primarily we have parsed data from popular hotels in Coxs Bazar We have mined and stored data of 15 hotels initially This data will work as our data set to further our research which contains reviews and reviewers information We have collected another data set from National Board of Revenue(NBR) about the revenue on traveling tax18-Jun-1524

Tools for Parsing dataThe language we used for parsing data,-python(2.7.9) https://www.python.org/

We used a library for extracting data from HTML and XML files,-beautifulsoup (vers. 4)

18-Jun-1525

Flowchart for parsing data

18-Jun-1526

Code for parsing data

18-Jun-1527

Code for parsing data(Cont.) head : this variable stores headline of the reviews date : time when the review was given member_info : name and residence of the reviewermember_badge : Information about reviewers profile e.g. Top Contributor, 50 Reviews etc. mngr_com : Feedbacks on reviews from hotel managers review : The whole review will be stored in this variable

12345618-Jun-1528

Data set created from the parsed data

1235418-Jun-1529

Data set on traveling taxThis is our data set which will be used to calculate the impact of hotels on traveling tax

It contains data on collected revenue from the tourism sectortill fiscal year 13 - 14 to 14 - 15 18-Jun-1530

Data set on traveling tax(Cont.)

18-Jun-1531

Why Prioritize Opinions?Opinion prioritizing is a must in order to distinguish mature and professional reviewers opinion from an amateur reviewers opinion.

A professionals point of view and interest will definitely make a difference in case of ranking a hotel.

There will always be some integrity/accuracy issues when it comes to opinions as someone stating a feature as satisfactory while other will state otherwise ( Whose opinion to choose? )

Prioritizing Opinions(Cont.)In TripAdvisor, there is already a priority rating system available for the reviewers!

Reviewer: (3 5) reviews Senior Reviewer: (6 10) reviews Contributor: (11 20) reviews Senior Contributor: (21 49) reviews Top Contributor: (50+) reviews

Prioritizing Opinions(Cont.)For the prioritizing purpose, we gave a potential opinion holders value for each class of reviewers.

Reviewer: 2Senior Reviewer: 4 Contributor: 5 Senior Contributor: 7 Top Contributor: 10

This value indicates which reviewer gets more priority than others.

Opinion SubjectivityOpinion Subjectivity is categorizing reviews as positive, negative or objective. Methods we are using for calculating opinion subjectivity:

-SentiWordNet -Semi Supervised Learning Approach -Naive Bayes Method

SentiWordNetIt is a lexical resource that is open publically for opinion mining

It scores every word in the database as positive(+) negative(-) or objective/neutral values.

Semi Supervised Learning ApproachIn the beginning synsets is labeled manually.i.e SentiWordNet in our case

Labeling will be expanded by machine learning approach. i.e we are using Naive Bayes Method

This process is efficient and helps avoiding labeling errors.

Naive Bayes MethodThis method is used to predict the values of words that are not in the database or synsets.The main formula is:

Where,P(c) = Probability of Unlabeled data(c)P(d) = Probability of Labeled data(d)

Final Calculation

For instance:Lets assume:

Hotel 1: Long Beach Hotel Hotel 2: Resort Labiba BilashPotential Opinion Holders value: 7Potential Opinion Holders Value: 2Opinion Subjectivity: +15.7 Opinion Subjectivity: +12.3For, Hotel 1: Z = 7 * (+15.7) For, Hotel 2: Z = 2 * (+12.3)= 109.9 = 24.6So, Hotel 1 will be ranked above Hotel 2. (assuming there is only one review in each hotels)

References[1] B. Liu, Sentiment Analysis and Subjectivity. A Chapter in Handbook of Natural Language Processing, 2nd Edition, 2010. http://www.cs.uic.edu/~liub/

[2] Kavita Ganesan , Hyun Duk kim, Opinion Mining-A Short Tutorial.

[3] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan, Thumbs up? sentiment classification using machine learning techniques 2002.

[4] http://sentiwordnet.isti.cnr.it/

[5] https://en.wikipedia.org/18-Jun-1541

18-Jun-1542