
The Stanford Natural Language Processing Group


Automated Summarization of Restaurant Reviews

Manoj Pawar, Deepak Mallya

{mpawar,dmallya}@stanford.edu

June 4th, 2010

Introduction

Restaurant-review websites such as Yelp.com and we8there.com now carry large numbers of user reviews, and the count grows daily as Internet access spreads; a typical restaurant has more than 50 reviews. Sites like Yelp.com pair the reviews with a five-star rating, but that rating rarely tells a user what is great and what is bad about a particular restaurant, and reading every review is practically impossible. Many people want to make a quick decision about where to dine; wading through all the reviews is not only cumbersome, it can leave them confused about the overall positive or negative sentiment toward specific features.

Objective

In this project we provide, for each restaurant, a set of features distilled from all available reviews. These features range from general ones like "food" and "ambience" to very specific ones like "Caesar salad". We also classify each feature as Good or Bad based on the positive or negative sentiment expressed in the reviews.

Overview

We view the solution to this problem as three steps:

1) Feature Extraction
2) Sentiment Analysis
3) Classification

Our first task is to extract the important features of a restaurant. Pulling out features like "food", "ambience", and "staff" is a key part of the whole summarization process. Once we retrieve these features from the reviews, we analyze each sentence in which they are discussed; extracting the important sentiment from a sentence and assigning it a positive or negative class is the second step. Once we have each feature and its sentiments, we summarize over them: classification helps users understand the overall sentiment on a particular feature.

[Figure: processing pipeline, Feature Extraction → Sentiment Analysis → Sentiment Classification, illustrated on the feature "Food"]

Related Work


Many ideas have recently been proposed for the automatic summarization of reviews. Hu and Liu [1] discuss extracting frequent features and bootstrapping techniques for analyzing sentiment. Popescu and Etzioni [2] describe relaxation labeling for product-feature semantics and introduce the OPINE system for extracting features and associating opinions with them. Pang and Lee [3] survey various classification and sentiment-analysis methods in opinion mining. We take the simple approach of finding frequent nouns and adjectives in the reviews, which gives a very good picture of the features and of the sentiment about them; the co-occurrence counts of noun-adjective pairs help us prune unwanted features. Bootstrapping, WordNet expansion of the positive and negative sentiment words, and the part-of-speech structure of each sentence drive our sentiment classification.

Data Collection

We used reviews provided by Prof. Andrew Ng's lab for our experiments, covering 196 restaurants with a total of 99,693 reviews. The reviews come from we8there.com and have the following form.

Example:

<Restaurant>
  <id> 595 </id>
  <Review>
    <overallRating> 4.0 </overallRating>
    <foodRating> 5.0 </foodRating>
    <ambianceRating> 3.0 </ambianceRating>
    <serviceRating> 4.0 </serviceRating>
    <noiseRating> 2.0 </noiseRating>
    <text> Food and service is always top notch at Vinny's - We both had brunch and it was very well prepared and the service is always attentive, but not overbearing - Best value in the Windward area in our book </text>
  </Review>
  ...
</Restaurant>

We wrote an initial script to retrieve the review text for each restaurant and extract sentences from it; these sentences were then used for feature extraction and analysis. We found that some review texts lack sentence boundaries, so we also treat commas, punctuation marks, and hyphens as delimiters when extracting sentences for further analysis.
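The original splitter was a script whose code is not shown; the following Python stand-in illustrates the delimiter choice (sentence punctuation plus commas and free-standing hyphens). The function name and regular expression are ours, not the project's.

```python
import re

def extract_sentences(review_text):
    """Split review text into rough sentences. Some reviews lack
    sentence boundaries, so commas and free-standing hyphens are
    treated as delimiters too -- a deliberately coarse heuristic."""
    parts = re.split(r"[.!?;,]|\s-\s", review_text)
    return [p.strip() for p in parts if p.strip()]

sentences = extract_sentences(
    "Food and service is always top notch at Vinny's - We both had brunch, "
    "and it was very well prepared"
)
# Each clause becomes its own unit for later per-feature analysis.
```

Splitting on commas and hyphens over-segments well-punctuated text, but it recovers opinion-bearing clauses from the boundary-free reviews described above.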


As the example shows, every review carries an overall rating, a food rating, and so on, but we did not use these ratings; we analyzed only the text of each review.

Feature Extraction

We extracted features from the 99,693 available reviews. We ran the Stanford POS tagger over all of them to tag each word with its part of speech. The following is an example of the tagged text for a sentence.

Example sentence: Food and service is always top notch at Vinny's.

POS-tagged sentence: Food_NNP and_CC service_NN is_VBZ always_RB top_JJ notch_NN at_IN Vinny_NNP ._.

Our intuition is that most restaurant features will be nouns (NN and NNS). We collected counts for each noun over all 99,693 reviews and sorted them by number of occurrences. This worked quite well: it surfaced the most talked-about restaurant features.

We wrote a Perl script to parse this file and generate a count of all features for every possible POS tag. Here are two lines of this file, with counts for NN and JJ for a few features; each line has the form POStag-> word,count:

NN-> food,51179 service,32587 restaurant,20863 time,16012 experience,15294 menu,13775 dinner,11099 place,11097 table,10279 wine,10229 meal,8866 staff,8565 dining,6943 waiter,6738 night,6154 atmosphere,6053 server,5932 everything,5042 dessert,5000 bit,4879 evening,4860 steak,4763 reservation,4698 bar,4530 lunch,4363 salad,4228 ambiance,4091 list,3887 price,3763 ...

JJ-> great,31836 good,30915 excellent,17761 wonderful,9861 nice,9808 special,8322 delicious,7846 outstanding,6443 friendly,5998 attentive,5906 first,5797 little,5678 other,5537 amazing,4391 fantastic,4294 small,4160 perfect,3958 ...

We extracted the top 200 features, i.e. the most frequent words with POS tags "NN" and "NNS". As the counts show, reviewers talk a great deal about general features such as "food", "service", "restaurant", "time", and "experience". Note that "restaurant" is itself a feature, which tells us that people often comment on the restaurant as a whole. The remaining features are more specific aspects of "food", "service", and "experience" that people commented on.

Bigram Feature Extraction

Some features span multiple words, which the approach in the previous section cannot handle. We therefore extract bigram features separately from the POS-tagged reviews: we look for consecutive NN/NNS tokens and keep the bigrams that occur together frequently. This let us extract features like "table service", "wine menu", "caesar salad", and "dining experience". We combine these bigram features with the unigram features from the previous section to build a rich feature set for each restaurant.
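The bigram collection can be sketched as follows; the frequency cutoff of 2 is an illustrative choice, not the project's actual threshold.

```python
from collections import Counter

def frequent_noun_bigrams(tagged_sentences, min_count=2):
    """Collect consecutive NN/NNS word pairs and keep those that
    co-occur at least min_count times, as candidate bigram features."""
    counts = Counter()
    for sentence in tagged_sentences:
        tokens = [t.rpartition("_") for t in sentence.split()]
        for (w1, _, t1), (w2, _, t2) in zip(tokens, tokens[1:]):
            if t1 in ("NN", "NNS") and t2 in ("NN", "NNS"):
                counts[(w1.lower(), w2.lower())] += 1
    return [" ".join(bg) for bg, c in counts.items() if c >= min_count]

tagged = [
    "great_JJ wine_NN list_NN here_RB",
    "the_DT wine_NN list_NN was_VBD short_JJ",
]
bigrams = frequent_noun_bigrams(tagged)
# "wine list" occurs twice, so it survives the frequency cut.
```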

Automatic Feature Pruning

Because we collect frequent nouns, we pick up some high-frequency nouns like "wife" and "husband" that are not real features. Our approach handles this: during sentiment analysis we check for the adjectives associated with each feature, and these spurious nouns rarely have adjectives attached to them, so their adjective-to-mention ratio is very low. That indicates they are not being opinioned about in the reviews, and they are pruned automatically at this later stage.

Example:

Feature "wife": "my wife gave it her greatest compliment."

nsubj(gave, wife)

Since "gave" will never appear on our top-adjective list, the feature "wife" is dropped at the sentiment-analysis phase.
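The ratio-based pruning can be sketched as below; the 0.1 threshold and both counter tables are illustrative assumptions, not values from the project.

```python
def prune_features(mention_counts, opinionated_counts, threshold=0.1):
    """Drop candidate features whose mentions rarely carry an adjective
    opinion (e.g. 'wife', 'husband'). Both arguments map a feature to a
    count; the 0.1 threshold is an illustrative choice."""
    kept = {}
    for feature, total in mention_counts.items():
        with_adjective = opinionated_counts.get(feature, 0)
        if total and with_adjective / total >= threshold:
            kept[feature] = total
    return kept

kept = prune_features({"food": 100, "wife": 40}, {"food": 60, "wife": 1})
# "wife" falls below the adjective ratio and is pruned automatically.
```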

Sentence Pruning

Once we have the most frequent restaurant features, we prune the review sentences, keeping only those that contain at least one feature from our list of frequent noun features. The motivation is that reviews contain many sentences that do not express an opinion about any feature of the restaurant. We discard those and keep only the sentences that mention a feature from our feature-extraction step, storing each kept sentence together with its feature labels before passing it to the type-dependency parser.

Example:

Actual review: I have forgotten his first name but his last name is Frankel. Our food was delicious. The restaurant became noisier as the evening progressed but because we were in one of the intimate side sections, the noise never became overwhelming.

Data after sentence pruning:
food => Our food was delicious.
restaurant => The restaurant became noisier as the evening progressed but because we were in one of the intimate side sections, the noise never became overwhelming.

As you can see, the first sentence has been pruned because it does not contain any extracted feature.
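This pruning step can be sketched in a few lines; the function name and simple word-set matching are our assumptions.

```python
def prune_sentences(sentences, features):
    """Keep only sentences that mention at least one extracted feature,
    each paired with the feature label it mentions."""
    kept = []
    for sentence in sentences:
        words = set(sentence.lower().split())
        for feature in features:
            if feature in words:
                kept.append((feature, sentence))
    return kept

review = ["I have forgotten his first name but his last name is Frankel",
          "Our food was delicious"]
pruned = prune_sentences(review, ["food", "restaurant"])
# Only the second sentence survives, labeled with the feature "food".
```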

Type Dependencies on Pruned Set of Sentences

We take the pruned set of sentences for each restaurant from the sentence-pruning step and run a type-dependency parser over them. We used the Stanford Parser to output the typed-dependency format for each sentence. The Stanford Parser treats the dot, exclamation mark, and hyphen as sentence boundaries, and we used the same boundaries when extracting sentences from the reviews. We store the type dependencies for every sentence.

Example:

The type dependencies for the sentences below are generated by running the LexicalizedParser and outputting only the dependencies:

java -mx1200m -cp "$scriptdir/stanford-parser.jar:" edu.stanford.nlp.parser.lexparser.LexicalizedParser -outputFormat "typedDependencies" $scriptdir/englishPCFG.ser.gz $*


Feature => sentence pairs:
staff => staff was super attentive.
food => the food was brilliant.
ambience => the ambiance and decor excellent.

Output of the parser:
nsubj(attentive-4, staff-1)
cop(attentive-4, was-2)
amod(attentive-4, super-3)
det(food-7, the-6)
nsubj(brilliant-9, food-7)
cop(brilliant-9, was-8)
ccomp(attentive-4, brilliant-9)
det(ambiance-12, the-11)
dep(excellent-15, ambiance-12)
conj_and(ambiance-12, decor-14)
dep(excellent-15, decor-14)
ccomp(attentive-4, excellent-15)
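Downstream steps only need the relation name and the two words; a small parser for these lines might look like the sketch below (the regular expression and names are ours).

```python
import re

# One Stanford typed-dependency line looks like: nsubj(attentive-4, staff-1)
DEP_RE = re.compile(r"(\w+)\((\S+)-(\d+), (\S+)-(\d+)\)")

def parse_dependencies(lines):
    """Turn typed-dependency lines into (relation, governor, dependent)
    triples, dropping the word indices."""
    deps = []
    for line in lines:
        m = DEP_RE.match(line.strip())
        if m:
            rel, gov, _, dep, _ = m.groups()
            deps.append((rel, gov, dep))
    return deps

deps = parse_dependencies([
    "nsubj(attentive-4, staff-1)",
    "amod(attentive-4, super-3)",
    "conj_and(ambiance-12, decor-14)",
])
```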

We generate a separate file for each restaurant (196 files) containing every feature-bearing sentence followed by its type dependencies in the format above; we call this the restaurant's dependencies file. Once we have these files, we extract the opinions about each feature by running our SentenceAnalyser script.

Opinion Extraction

We implemented a Perl script that performs opinion extraction from each restaurant's dependencies file, and automated the task with a single driver script that calls our SentenceAnalyser on each of them. The script takes three input files and produces one output file: the first input file holds the restaurant's review sentences, the second the feature for each sentence, and the third the type dependencies (the dependencies file) for each sentence. The input files correspond line by line:

Restaurant1FeaturesFile line 1 -> Restaurant1ReviewSentencesFile line 1
Restaurant1FeaturesFile line 2 -> Restaurant1ReviewSentencesFile line 2
. . .
Restaurant1FeaturesFile line N -> Restaurant1ReviewSentencesFile line N

We use three important type dependencies when extracting the opinion words for a given feature.

1) nsubj: nominal subject. A nominal subject is a noun phrase that is the syntactic subject of a clause. The governor of this relation is not always a verb: when the verb is copular, the root of the clause is the complement of the copular verb.
Example: "Clinton defeated Dole"  nsubj(defeated, Clinton); "The baby is cute"  nsubj(cute, baby)

2) amod: adjectival modifier. An adjectival modifier of an NP is any adjectival phrase that serves to modify the meaning of the NP.
Example: "Sam eats red meat"  amod(meat, red)

3) neg: negation modifier. The negation modifier is the relation between a negation word and the word it modifies.
Example: "Bill is not a scientist"  neg(scientist, not); "Bill doesn't drive"  neg(drive, n't)

The SentenceAnalyser reads the dependencies file, extracts the "nsubj", "amod", and "neg" relations, and interprets them to obtain opinions about each feature. Consider the sentence:

"staff was super attentive, the food was brilliant, the ambiance and decor excellent."

We have already extracted the features "staff", "food", and "ambiance" in the feature-extraction step. We then look for these features in the nsubj, amod, and neg relations to obtain opinions. In the dependencies above, only the "nsubj" and "amod" relations identify opinions about the sentence:

nsubj(attentive-4, staff-1)
amod(attentive-4, super-3)
nsubj(brilliant-9, food-7)

We report opinions for each sentence as follows and store the results in a summaryFile per restaurant:

"staff" -> "super","attentive"
"food" -> "brilliant"

Even though we identified "staff", "food", "ambiance", and "decor" as features, the dependencies give us opinion information only about staff and food. In this example we identified both "attentive" and "super" as opinion words for "staff": whenever a feature has an nsubj relation, we take the opinion word it points to and check whether any other adjective modifies that opinion adjective; here "super" modifies "attentive". Although "super" is really intensifying the opinion "attentive", we currently report it as an opinion in its own right. We also identify negations of opinion words, which invert an opinion's polarity.

Example"we both felt that the clam chowder broth was really thin and not as creamy and thick as previous trips"nsubj(felt-3, we-1)nsubj(thin-11, broth-8)nsubj(creamy-15, broth-8)advmod(creamy-15, as-14)ccomp(felt-3, creamy-15)amod(trips-20, previous-19)prep_as(thin-11, trips-20)

So we extract "broth" and "chowder" as features and identify the opinion words from the type dependencies belownsubj(thin-11, broth-8)neg(creamy-15, not-13)We identify the opinion words for "broth" as "~creamy" and "thin" as shown below. We have put a tilde "~" before the opinion word to signify a negation in polarity. "broth"->~creamy,thinWe did not get any opinion about "chowder" from the type dependencies.
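These extraction rules can be sketched in Python over (relation, governor, dependent) triples; the function name and data layout are our assumptions, not the project's Perl implementation.

```python
def extract_opinions(features, deps):
    """Sketch of the opinion rules described above. `deps` holds
    (relation, governor, dependent) triples and `features` the feature
    words. nsubj(O, F) makes O an opinion for F, amod(O, I) adds the
    intensifier I as a further opinion, and neg(O, ...) flips O's
    polarity, marked with '~'."""
    opinions = {}
    negated = {gov for rel, gov, dep in deps if rel == "neg"}
    for rel, gov, dep in deps:
        if rel == "nsubj" and dep in features:
            word = ("~" if gov in negated else "") + gov
            opinions.setdefault(dep, []).append(word)
            # Adjectives modifying the opinion word, e.g. "super attentive".
            for rel2, gov2, dep2 in deps:
                if rel2 == "amod" and gov2 == gov:
                    opinions[dep].append(dep2)
    return opinions

staff_ops = extract_opinions({"staff"}, [("nsubj", "attentive", "staff"),
                                         ("amod", "attentive", "super")])
broth_ops = extract_opinions({"broth"}, [("nsubj", "creamy", "broth"),
                                         ("neg", "creamy", "not")])
```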


The algorithm for the SentenceAnalyser using type dependencies is as follows (O = opinion word, F = feature word, I = intensifier, S = sentence). The original flowchart reads approximately:

For each feature F in sentence S:
    if nsubj(O, F) holds for some word O:
        if amod(O, I) holds for some I: also report the intensifier I as an opinion
        if neg(O) holds: print(~O)
        else: print(O)
    else if amod(O, F) holds: apply the same negation check to the modifier

Sentiment Analyzer

Bootstrapping and Expansion using WordNet

We extracted the frequent adjectives (JJ) produced by the POS tagger over all 99,693 reviews; our intuition is that most opinion words are adjectives. We hand-labeled 200 words from this frequent-adjective list as good and 200 as bad. Since these words do not cover all opinion words, we expanded the lists automatically using WordNet through its Java API (JAWS). We generated the synsets for the good and bad words, kept only those of type AdjectiveSatellite, extracted all of their word forms, and so produced the expanded sets of good and bad words.
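The expansion was done with JAWS in Java; the Python sketch below mirrors the idea with a hand-written stand-in for the WordNet lookup (the toy synset table and function name are our assumptions; real entries would come from WordNet via JAWS or NLTK).

```python
def expand_opinion_words(seed_words, satellite_synsets):
    """Expand seed opinion words through adjective-satellite synsets,
    mirroring the JAWS/WordNet expansion described above. Here
    `satellite_synsets` maps a word to lists of synset members."""
    expanded = set(seed_words)
    for word in seed_words:
        for members in satellite_synsets.get(word, []):
            expanded.update(members)
    return expanded

# Toy table standing in for a real WordNet AdjectiveSatellite lookup.
SATELLITE_SYNSETS = {
    "excellent": [["excellent", "first-class", "fantabulous", "splendid"]],
}
good_words = expand_opinion_words({"excellent"}, SATELLITE_SYNSETS)
# The seed word and all synset members land in the expanded good list.
```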

Here are examples of good and bad word expansions:

For opinion word "excellent": excellent, first-class, fantabulous, splendid
For opinion word "elegant": elegant, graceful, refined
For opinion word "disappointed": disappointed, defeated, discomfited, foiled, frustrated, thwarted
For opinion word "worst": worst, bad, big, tough, spoiled, spoilt, uncollectible, risky, high-risk, speculative, unfit, unsound, forged, defective

Once we have these lists of good and bad opinion words, deciding on a feature given an opinion word is trivial.

Example: "staff" -> "super","attentive"

Since "super" and "attentive" are both in our good-opinion list, we increase the good count for staff: staff(good=2, bad=0).

Sentiment Classifier

We maintain good and bad adjective counts for each feature, and in the end assign each feature a class depending on whether its good opinions outnumber its bad ones.


For a given restaurant and feature F:

if opinions_good(F) > opinions_bad(F): class(F) = GOOD
else: class(F) = BAD

Example: Restaurant id "595"

food(good:8, bad:0)
service(good:3, bad:0)
breakfast(good:0, bad:1)

So in this example "food" and "service" receive an overall sentiment of good across all the reviews, while "breakfast" falls under bad:

GOOD: Food, Service
BAD: Breakfast
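The classification step can be sketched as a tally; the toy lexicons below are illustrative (the real lists come from the labeled adjectives plus WordNet expansion), and ties fall to BAD here, matching the else-branch above.

```python
# Toy opinion lexicons standing in for the expanded good/bad lists.
GOOD_WORDS = {"super", "attentive", "delicious", "brilliant", "creamy"}
BAD_WORDS = {"thin", "horrible"}

def classify_features(opinions):
    """Tally good vs. bad opinions per feature; a leading '~' inverts
    an opinion word's polarity. The majority class wins."""
    results = {}
    for feature, words in opinions.items():
        good = bad = 0
        for w in words:
            negated = w.startswith("~")
            base = w.lstrip("~")
            if base in GOOD_WORDS:
                bad += negated
                good += not negated
            elif base in BAD_WORDS:
                good += negated
                bad += not negated
        results[feature] = "GOOD" if good > bad else "BAD"
    return results

labels = classify_features({"staff": ["super", "attentive"],
                            "broth": ["~creamy", "thin"]})
# "~creamy" counts against "broth" because the negation flips a good word.
```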

Error Analysis

We found that not every noun can be taken as a feature: nouns like "something" and "nothing" also came out of feature extraction. Keeping only the top features with extremely high frequency pruned some of these non-features, but some unwanted nouns that occur frequently still remain.

Here are a few sentences with a "neg" type dependency, which inverts polarity; as described earlier, the "~" symbol marks an inverted opinion word. The format is "feature" -> "opinion words" -> "sentence". These are correct examples:

1) waiter -> delightful,~attentive -> waiter was delightful but not especially attentive.
2) food -> ~disappointment -> every time i come here the presentation and food are never a disappointment.
3) wine -> ~good -> the wine was n't good and the "wine expert" lead us astray.
4) atmosphere -> ~stuffy,first,class -> the building is historic and the restaurant's atmosphere is first class but not stuffy.


5) cobbler -> ~best -> my favorite dessert was the creme brulee but the mississippi mud pie and the apple cobbler were not the best.
6) grill -> ~disappoint -> as per usual water grill did not disappoint.
7) food -> ~fabulous -> service was very good - food was not as fabulous as I remember.

We found the following errors at the type-dependency step.

1) service -> ~attentive -> service was attentive but not over-bearing.

Here we wrongly conclude that the service is not attentive. The cause is the neg type dependency produced for this sentence:

nsubj(attentive-3, service-1)
neg(attentive-3, not-5)

Some other sentences with this problem:

2) service -> ~prompt -> our service was prompt but not professional.
3) place -> ~crowded -> the place was pretty crowded but not so overwhelmingly packed so we still had privacy.
4) service -> wonderful,unpretentious,~intrusive -> our service was wonderful - attentive, not intrusive, and unpretentious.

In example 4 we err on "unpretentious", since its polarity is not inverted. This case is difficult because there is no "not" directly before "unpretentious", yet the reviewer actually means "not intrusive" and "not unpretentious".

Evaluation for Opinion Extraction for Features

We evaluated the extracted opinions for the two main restaurant features, "food" and "service", by judging the sentences manually and comparing with our system's output.

Feature "Food"

Number of Sentences containing "Food" feature = 42.Number of Sentences identifying correct opinion for "Food" feature = 32Percentage Correct = 32/42 = 76.19%

Feature "service"Number of Sentences containing "service" feature = 47.Number of Sentences identifying correct opinion for "service" feature = 36Percentage Correct = 36/47 = 76.59%


Evaluation for Feature Extraction

We evaluated feature extraction by manually inspecting a sample of 50 sentences from one restaurant and determining how many features we missed.

Number of sentences = 50
Total features automatically extracted correctly = 68
Total features automatically extracted incorrectly = 27
Total features not extracted (found by manual check) = 10

Precision = 68/95 = 71.58%
Recall = 68/78 = 87.18%
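The evaluation arithmetic can be checked directly from the counts above:

```python
# Feature-extraction evaluation arithmetic from the counts above.
correct = 68     # features extracted correctly
incorrect = 27   # features extracted incorrectly
missed = 10      # features found only by manual inspection

precision = correct / (correct + incorrect)  # 68 / 95
recall = correct / (correct + missed)        # 68 / 78

print(f"Precision = {precision:.2%}")
print(f"Recall = {recall:.2%}")
```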

Results:

Restaurant Id: 10381

GOOD: amazing food:11.0, marvelous service:6.0, great lunch:2.0, great place:2.0, friendly ambiance:2.0, great wine list:2.0, unbelievable experience:2.0, superb wait staff:1.0
BAD: dark bit:4.0, small beef hash:2.0, other restaurant:1.0, horrible seating:1.0

Restaurant Id: 10636

GOOD: amazing food:30.0, decent service:20.0, great experience:8.0, amazing steak:5.0, satisfied dining experience:4.0, solid wine list:4.0, great time:3.0, professional reservation process:2.0, refreshing ambiance:2.0, impressive atmosphere:2.0
BAD: cold lobster soup:1.0, slow kitchen:1.0, small table:1.0

Conclusions and Future Work

We found that combining NN/NNS nouns with the adjectives in a sentence gives more accurate feature and opinion mining, and the WordNet expansion of opinion words improved our precision and recall. Since our problem definition differs somewhat from most review-summarization approaches, we found that our approach needs long reviews with complete, natural sentences to perform well.

As future work, if fewer features are desired, the extracted features could be grouped into broader feature classes: "food", "drink", and "salad" could fall under a larger "Food" class, and "waiter", "waitress", and "server" under a "Staff" class; this amounts to a feature-classification step. We also plan to use a machine-learning classifier for sentiment classification, and we would like to extend this strategy to a summarizer for digital-product reviews.

Contributions

Initially, Manoj worked on converting the data from the protocol-buffer storage format to XML. Deepak wrote the scripts for running the Stanford POS tagger and the type-dependency parser, and worked on feature extraction, opinion extraction, and WordNet expansion. Manoj worked on sentence pruning, the sentiment analyzer, and classification. We both worked on the error analysis, on generating the results, and on writing this document.

References

1. Minqing Hu and Bing Liu. Mining Opinion Features in Customer Reviews.

2. Ana-Maria Popescu and Oren Etzioni. Extracting Product Features and Opinions from Reviews.

3. Bo Pang and Lillian Lee. Opinion Mining and Sentiment Analysis.

4. Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. Thumbs up? Sentiment Classification Using Machine Learning Techniques.

5. Bo Pang and Lillian Lee. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts.

6. Stanford POS tagger: http://nlp.stanford.edu/software/tagger.shtml

7. WordNet, a lexical database for the English language: http://wordnet.princeton.edu

8. We8there.com

9. Yelp.com