Collecting Evaluative Expressions for Opinion Extraction
Nozomi Kobayashi, Kentaro Inui, Yuji Matsumoto (Nara Institute)
Kenji Tateishi, Toshikazu Fukushima
(NEC Internet System Lab)
IJCNLP 2004
Lun Wei Ku, 2005/04/21
What are they going to do?
The seats are very comfortable and supportive. But the back seat room is tight.
– <Product_X, seats, comfortable>
– <Product_X, seats, supportive>
– <Product_X, back seat room, tight>
Related Work
• Classify reviews into recommended or not recommended.
• Classify sentences as positive or negative.
• Acquiring subjective words – adjectives, nouns, verbs and adverbs.
• Pattern-based approaches.
1. Bing Liu, Minqing Hu and Junsheng Cheng. "Opinion Observer: Analyzing and Comparing Opinions on the Web." To appear in Proceedings of the 14th International World Wide Web Conference (WWW-2005), May 10-14, 2005, Chiba, Japan.
2. "Mining and Summarizing Customer Reviews." Proceedings of ACM SIGKDD 2004.
Attribute and Value
• <Subject, Attribute, Value>
• Take orientation as a special type of Value
(e.g., I like the leather seats of Product_X)
• <Attribute> of <Subject> is <Value>
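To make the pattern concrete, here is a minimal Python sketch of matching "<Attribute> of <Subject> is <Value>" against a single sentence and keeping the match only when every slot is a known dictionary entry. The toy dictionaries, the regular expression, and the `extract` function are hypothetical illustrations, not the authors' implementation; accepting "are" as well as "is" is an added assumption so that plural attributes such as "seats" also match.

```python
import re
from typing import NamedTuple, Optional


class Opinion(NamedTuple):
    """One <Subject, Attribute, Value> triplet."""
    subject: str
    attribute: str
    value: str


# Hypothetical toy dictionaries; in the approach described here they are grown iteratively.
SUBJECTS = {"Product_X"}
ATTRIBUTES = {"seats", "back seat room"}
VALUES = {"comfortable", "supportive", "tight"}

# "<Attribute> of <Subject> is <Value>"; also accepting "are" is an assumption.
PATTERN = re.compile(
    r"(?:the\s+)?(?P<attr>[\w ]+?) of (?P<subj>\w+) (?:is|are) (?P<val>\w+)",
    re.IGNORECASE,
)


def extract(sentence: str) -> Optional[Opinion]:
    """Return a triplet if the pattern matches and every slot is a dictionary entry."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    triple = Opinion(m["subj"], m["attr"].lower(), m["val"].lower())
    if (
        triple.subject in SUBJECTS
        and triple.attribute in ATTRIBUTES
        and triple.value in VALUES
    ):
        return triple
    return None  # pattern matched, but some slot is not a known entry


print(extract("The back seat room of Product_X is tight."))
# Opinion(subject='Product_X', attribute='back seat room', value='tight')
```

A sentence like "I like the leather seats of Product_X" does not fit this pattern at all, which is one reason several patterns are needed.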
Collecting Expressions
Iterate the following two steps:
• Candidate generation:
– Web documents
– Cooccurrence patterns
– Subject/attribute/value dictionaries
– Cooccurrence
• Candidate selection:
– Human judge
– Update dictionaries
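The two-step loop can be sketched as follows, assuming a single toy cooccurrence pattern ("<X> is <Y>" on short sentences), a callback `judge` that stands in for the human judge, and a `top_k` cut-off on how many candidates are shown per round. All names (`generate`, `collect`, `judge`, `top_k`, `max_iter`) are hypothetical; this is an illustration of the iteration, not the authors' code.

```python
from collections import Counter
from typing import Callable, Dict, List, Set, Tuple

# A candidate is (dictionary name, expression), e.g. ("value", "great").
Candidate = Tuple[str, str]


def generate(sentences: List[str], dicts: Dict[str, Set[str]]) -> Counter:
    """Toy candidate generation: in '<X> is <Y>' sentences, propose the unknown
    side whenever the other side is already in a dictionary (cooccurrence)."""
    counts: Counter = Counter()
    for s in sentences:
        left, sep, right = s.lower().partition(" is ")
        if not sep:
            continue
        attr, val = left.strip(), right.strip(" .")
        if attr.startswith("the "):
            attr = attr[4:]
        if val in dicts["value"] and attr not in dicts["attribute"]:
            counts[("attribute", attr)] += 1
        if attr in dicts["attribute"] and val not in dicts["value"]:
            counts[("value", val)] += 1
    return counts


def collect(
    sentences: List[str],
    dicts: Dict[str, Set[str]],
    judge: Callable[[Candidate], bool],
    top_k: int = 5,
    max_iter: int = 10,
) -> Dict[str, Set[str]]:
    """Alternate candidate generation and candidate selection until the judge
    accepts nothing new; accepted candidates are added to the dictionaries."""
    rejected: Set[Candidate] = set()
    for _ in range(max_iter):
        ranked = [c for c, _ in generate(sentences, dicts).most_common() if c not in rejected]
        accepted_any = False
        for kind, expr in ranked[:top_k]:  # only top-ranked candidates are judged
            if judge((kind, expr)):
                dicts[kind].add(expr)  # update the dictionary
                accepted_any = True
            else:
                rejected.add((kind, expr))  # do not ask about it again
        if not accepted_any:
            break
    return dicts
```

Each accepted entry feeds the next round: once "great" is accepted as a value, "gas mileage" in "The gas mileage is great" becomes an attribute candidate.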
Collecting Expressions -- Example
• Pattern: <Attribute> is <Value>
• Sentences:
– …<the handling> is <excellent> and …
– …<the gas mileage> is <great>…
Provide only highly ranked candidates to the human judge.
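A minimal sketch of this step, assuming the attribute slot holds one or two words and the value slot one word: match "<Attribute> is <Value>" in running text, count attribute fillers that cooccur with a known value, and pass only the `top_k` most frequent to the judge. `ATTR_IS_VALUE` and `rank_attribute_candidates` are hypothetical names, and ranking by raw frequency is an assumption, since the slides do not say how candidates are scored.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# '<Attribute> is <Value>': optional "the", a one- or two-word attribute slot,
# "is", then a one-word value slot. The slot widths are assumptions.
ATTR_IS_VALUE = re.compile(r"\b(?:the\s+)?(\w+(?:\s+\w+)?)\s+is\s+(\w+)", re.IGNORECASE)


def rank_attribute_candidates(
    texts: Iterable[str], known_values: Set[str], top_k: int = 3
) -> List[str]:
    """Count attribute-slot fillers whose value slot is a known value and
    return the top_k most frequent ones for the human judge."""
    counts: Counter = Counter()
    for text in texts:
        for attr, value in ATTR_IS_VALUE.findall(text):
            if value.lower() in known_values:
                counts[attr.lower()] += 1
    return [attr for attr, _ in counts.most_common(top_k)]


texts = [
    "…the handling is excellent and …",
    "…the gas mileage is great…",
    "The handling is great.",
    "The weather is great today.",
]
print(rank_attribute_candidates(texts, {"excellent", "great"}, top_k=2))
# ['handling', 'gas mileage']
```

With `top_k=3` the list also includes "weather" (from "The weather is great today."), which is the kind of noise the human judge is there to reject.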
Experiment Resources
• Domain: cars and video games
• 15,000 reviews (230,000 sentences) for cars and 9,700 reviews (90,000 sentences) for games.
• Dictionaries:
– Subject: 389 for cars (“BMW”, “TOYOTA”) and 660 for games (“Dark Chronicle”, “Seaman”)
– Attribute: 7 for both domains (cost/price/service/performance/function/support/design)
– Value: 247 entries, mostly adjectives, built using a thesaurus (good/beautiful/bright/like/favorite/high)
– Patterns: 8 patterns were selected; the pattern to apply is chosen according to POS, and each pattern is assigned a score.
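The slides do not list the 8 patterns or their scores, so the following is only one possible reading of "choose the pattern by POS and score it": each hypothetical pattern is meant for candidates of one POS, and a candidate's score is the sum of the scores of the applicable patterns it matched. The pattern table, the weights, and `score_candidates` are placeholders, and the POS tags are assumed to come from an external tagger.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical pattern table: pattern -> (POS of the candidates it collects, score).
# The real 8 patterns and their scores are not given on the slide.
PATTERNS: Dict[str, Tuple[str, float]] = {
    "<Attribute> is <Value>": ("ADJ", 1.0),
    "<Value> <Attribute>": ("ADJ", 0.5),
    "<Attribute> of <Subject>": ("NOUN", 0.8),
}


def score_candidates(matches: List[Tuple[str, str, str]]) -> List[Tuple[str, float]]:
    """matches holds (candidate, POS tag, pattern name) triples.
    A match counts only if the pattern is meant for the candidate's POS;
    each counted match adds that pattern's score. Returns candidates by score."""
    scores: Dict[str, float] = defaultdict(float)
    for cand, pos, pattern in matches:
        target_pos, weight = PATTERNS[pattern]
        if pos == target_pos:
            scores[cand] += weight
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
```

Filtering by POS keeps, for example, a noun that happens to sit in a value-collecting pattern from being scored as a value candidate.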
Results
Discussions
• No convergence, due to compound expressions
• Coverage: 45% (cars), 35% (games)
• Value patterns outperformed attribute patterns.
– Values cooccur not only with attributes but also with named entities and general nouns.
– Deciding the scope of an attribute is problematic, e.g., Character vs. Face character vs. Motion character.
Conclusions
• A semi-automatic method based on cooccurrence patterns of subjects, attributes and values.
• More efficient than manual collection.
• Cooccurrence patterns work well across different domains.
• Future work: directly extract <Subject, Attribute, Value> triplets from the Web.