On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter


Abstract

Sentiment classification over Twitter is usually affected by the noisy nature (abbreviations, irregular forms) of tweet data. A popular procedure for reducing the noise of textual data is to remove stopwords, either by using pre-compiled stopword lists or by applying more sophisticated methods for dynamic stopword identification. However, the effectiveness of removing stopwords in the context of Twitter sentiment classification has been debated in recent years. In this paper we investigate whether removing stopwords helps or hampers the effectiveness of Twitter sentiment classification methods. To this end, we apply six different stopword identification methods to Twitter data from six different datasets and observe how removing stopwords affects two well-known supervised sentiment classification methods. We assess the impact of removing stopwords by observing fluctuations in the level of data sparsity, the size of the classifier's feature space and its classification performance. Our results show that using pre-compiled lists of stopwords negatively impacts the performance of Twitter sentiment classification approaches. On the other hand, dynamically generating stopword lists by removing infrequent terms that appear only once in the corpus appears to be the optimal method: it maintains high classification performance while reducing data sparsity and substantially shrinking the feature space.


Hassan Saif, Miriam Fernandez, Yulan He and Harith Alani

Knowledge Media Institute, The Open University,

Milton Keynes, United Kingdom

LREC 2014: The 9th Language Resources and Evaluation Conference, Reykjavik, Iceland

Outline

• Sentiment Analysis

• Twitter

• Stopwords Removal Methods

• Comparative Study

• Conclusion

Sentiment Analysis

“Sentiment analysis is the task of identifying positive and negative opinions, emotions and evaluations in text”

• “The main dish was delicious” → Opinion

• “It is a Syrian dish” → Fact

• “The main dish was salty and horrible” → Opinion

Stopwords Removal

Stopwords Removal in Twitter Sentiment Analysis

Is removing stopwords useful? Prior work is split between YES and NO:

- Kouloumpis et al., 2011

- Pak & Paroubek, 2010

- Asiaee et al., 2012

- Bollen et al., 2011

- Bifet and Frank, 2010

- Speriosu et al., 2011

- Zhang & Yuan, 2013

- Gokulakrishnan et al., 2012

- Saif et al., 2012

- Hu et al., 2013

- Camara et al., 2013

Classic Stopword Lists

• Pre-compiled

• Very popular

• Outdated

• Domain-independent

Automatic Stopword Generation Methods

• Unsupervised Methods

– Term Frequency

– Term-based Random Sampling

• Supervised Methods

– Term Entropy Measures

– Maximum Likelihood Estimation

Stopwords Removal for Twitter Sentiment Analysis

Stopword Analysis Set-Up (1)

Datasets

Dataset    Positive   Negative
OMD             393        688
HCR             397        957
STS             632       1402
SemEval        3781       1590
WAB            2915       2580
GASP           1050       5235

Stopword Analysis Set-Up (2)

Stopwords Removal Methods

1. The Baseline Method

– No removal of stopwords

2. The Classic Method

– Removes stopwords obtained from pre-compiled lists

– Van stoplist
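The classic method amounts to filtering tokens against a fixed list. A minimal sketch — the short list below is an illustrative stand-in for a pre-compiled stoplist such as the Van stoplist, not the actual list:

```python
# Illustrative subset of a pre-compiled stoplist (NOT the real Van stoplist).
PRECOMPILED_STOPLIST = {"the", "a", "an", "is", "was", "and", "it", "of"}

def remove_stopwords(tokens, stoplist=PRECOMPILED_STOPLIST):
    """Return only the tokens that are not in the stoplist."""
    return [t for t in tokens if t.lower() not in stoplist]

tweet = "The main dish was salty and horrible".split()
print(remove_stopwords(tweet))  # ['main', 'dish', 'salty', 'horrible']
```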

Stopword Analysis Set-Up (3)

Stopwords Removal Methods

3. Methods based on Zipf’s Law

– TF-High Method: removing the most frequent words

– TF1 Method: removing singleton words (i.e., words that occur only once in the corpus)

– IDF Method: removing words with low inverse document frequency (IDF)
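All three Zipf's-law-based stoplists can be derived directly from corpus statistics. A minimal sketch, assuming tweets are already tokenised; `high_k` and `idf_threshold` are illustrative cut-offs, not the paper's actual parameters:

```python
import math
from collections import Counter

def zipf_stoplists(tweets, high_k=10, idf_threshold=1.0):
    """Build TF1, TF-High and low-IDF stoplists from a tokenised corpus.

    Cut-off parameters are illustrative assumptions, not the paper's values.
    """
    tf = Counter(t for tweet in tweets for t in tweet)          # term frequency
    df = Counter(t for tweet in tweets for t in set(tweet))     # document frequency
    n_docs = len(tweets)

    tf1 = {t for t, c in tf.items() if c == 1}                  # singletons
    tf_high = {t for t, _ in tf.most_common(high_k)}            # most frequent words
    idf_low = {t for t, d in df.items()
               if math.log(n_docs / d) < idf_threshold}         # low-IDF words
    return tf1, tf_high, idf_low
```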

Stopword Analysis Set-Up (4)

Stopwords Removal Methods

4. Term-based Random Sampling (TBRS)

5. The Mutual Information Method (MI)
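The supervised MI method scores each term by the mutual information between the term's presence and the sentiment label; scores near zero mark candidate stopwords. A simplified, unsmoothed sketch — the paper's exact estimator is not reproduced here:

```python
import math
from collections import Counter

def term_mi(docs, labels):
    """I(term presence; class) per term, from raw counts (no smoothing)."""
    n = len(docs)
    classes = set(labels)
    df_tc = Counter((t, c) for d, c in zip(docs, labels) for t in set(d))
    df_t = Counter(t for d in docs for t in set(d))
    n_c = Counter(labels)

    mi = {}
    for t in df_t:
        score = 0.0
        for c in classes:
            for present in (True, False):
                n_tc = df_tc[(t, c)] if present else n_c[c] - df_tc[(t, c)]
                if n_tc == 0:
                    continue  # 0 * log(...) contributes 0
                p_tc = n_tc / n
                p_t = (df_t[t] if present else n - df_t[t]) / n
                p_c = n_c[c] / n
                score += p_tc * math.log(p_tc / (p_t * p_c))
        mi[t] = score
    return mi
```

A term appearing uniformly in every tweet (a stopword-like "the") gets MI ≈ 0, while a class-discriminative term gets a strictly positive score.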

Stopword Analysis Set-Up (5)

Twitter Sentiment Classifiers

– Two Supervised Classifiers:

• Maximum Entropy (MaxEnt)

• Naïve Bayes (NB)

– Performance measured in Accuracy and F1

– 10-fold cross-validation
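A bag-of-words classifier like the NB baseline can be sketched from scratch. This minimal version uses Laplace smoothing and simply skips unseen tokens at prediction time; it is an illustration, not the paper's implementation:

```python
import math
from collections import Counter

class NaiveBayes:
    """Minimal multinomial Naive Bayes over tokenised tweets (sketch)."""

    def fit(self, docs, labels):
        self.classes = sorted(set(labels))
        vocab = {t for d in docs for t in d}
        n = len(labels)
        self.prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        counts = {c: Counter() for c in self.classes}
        for d, c in zip(docs, labels):
            counts[c].update(d)
        self.loglik = {}
        for c in self.classes:
            total = sum(counts[c].values()) + len(vocab)  # Laplace smoothing
            self.loglik[c] = {t: math.log((counts[c][t] + 1) / total)
                              for t in vocab}
        return self

    def predict(self, doc):
        # Unseen tokens are skipped rather than smoothed, for brevity.
        return max(self.classes,
                   key=lambda c: self.prior[c]
                   + sum(self.loglik[c].get(t, 0.0) for t in doc))
```

Filtering a stoplist out of `docs` before `fit` is exactly where the removal methods above plug into the pipeline.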

Experimental Results

Assess the impact of removing stopwords by observing fluctuations in:

- Classification Performance

- Feature space

- Data Sparsity
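Data sparsity can be quantified as the fraction of zero entries in the tweet-term matrix. The sketch below uses that common definition, which may differ in detail from the paper's formula:

```python
def sparsity_degree(docs):
    """Fraction of zero cells in the tweet-term (document-term) matrix."""
    vocab = {t for d in docs for t in d}
    if not docs or not vocab:
        return 0.0
    nonzero = sum(len(set(d)) for d in docs)   # distinct terms per tweet
    return 1.0 - nonzero / (len(docs) * len(vocab))
```

Removing terms that occur in very few tweets (e.g. singletons) shrinks the vocabulary faster than it removes non-zero cells, which is why a method like TF1 can lower the sparsity degree.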

Experimental Results (1)

1. Classification Performance

[Figure: Baseline classification performance in Accuracy (left) and F-measure (right) of the MaxEnt and NB classifiers across all datasets (OMD, HCR, STS-Gold, SemEval, WAB, GASP)]

Experimental Results (2)

1. Classification Performance

[Figure: Average Accuracy (left) and F-measure (right) of the MaxEnt and NB classifiers using the different stoplists (Baseline, Classic, TF1, TF-High, IDF, TBRS, MI)]

Experimental Results (3)

2. Feature Space

Reduction rate (%) on the feature space of the various stoplists:

Baseline   Classic   TF1     TF-High   IDF     TBRS   MI
0.00       5.50      65.24   0.82      11.22   6.06   19.34

[Figure: The proportion of singleton words (TF=1) to non-singleton words (TF>1) in all datasets (OMD, HCR, STS-Gold, SemEval, WAB, GASP)]
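The reduction rate is simply the relative shrinkage of the vocabulary after filtering. A one-line sketch with a hypothetical helper name:

```python
def reduction_rate(baseline_vocab, filtered_vocab):
    """Percentage reduction of the feature space after stopword removal."""
    return 100.0 * (1 - len(filtered_vocab) / len(baseline_vocab))
```

Since singletons dominate the vocabulary of these datasets, removing them (TF1) yields the largest reduction, the 65.24% shown above.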

Experimental Results (4)

3. Data Sparsity

[Figure: Stoplist impact on the sparsity degree of all datasets (OMD, HCR, STS-Gold, SemEval, WAB, GASP); sparsity degrees range from roughly 0.988 to 1.000 across the stoplists (Baseline, Classic, TF1, TF-High, IDF, TBRS, MI)]

The Ideal Stoplist (1)

• The ideal stopword removal method is the one which:

– Helps maintain a high classification performance,

– Leads to shrinking the classifier’s feature space

– Reduces the data sparseness

– Has low runtime and storage complexity

– Has minimal human supervision

The Ideal Stoplist (2)

Average accuracy, F1, reduction rate on the feature space and data sparsity of the six stoplist methods. Positive sparsity values refer to an increase in the sparsity degree, while negative values refer to a decrease.

Overall Analysis Results

Conclusion

• We studied how six different stopword removal methods affect the sentiment polarity classification on Twitter.

• Using a pre-compiled (classic) stoplist has a negative impact on classification performance.

• The TF1 stopword removal method obtains the best trade-off:

– Reducing the feature space by nearly 65%,

– Decreasing the data sparsity degree by up to 0.37%, and

– Maintaining a high classification performance.
