Sentiment Analysis
What: Sentiment analysis identifies and extracts subjective information from source materials.
Why: By applying sentiment analysis, researchers try to identify the sentiment behind a text. The input of a sentiment algorithm is a written text, while the output is a classification.
How: Statistical Natural Language Processing, Keyword Spotting, Lexical Affinity, Hand-Crafted Models
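To make the "text in, classification out" idea concrete, here is a minimal sketch of keyword spotting, one of the approaches listed above. The word lists and the function name `classify` are hypothetical; the slides do not give the lexicon or the method actually used.

```python
# Hypothetical keyword lists, chosen only for illustration.
POSITIVE = {"gain", "bullish", "up", "beat"}
NEGATIVE = {"loss", "bearish", "down", "miss"}


def classify(text: str) -> str:
    """Classify a text as positive, negative or neutral by counting keywords."""
    # Lowercase each word and strip surrounding punctuation.
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    if pos > neg:
        return "positive"
    if neg > pos:
        return "negative"
    return "neutral"
```

For example, `classify("Stocks gain after bullish report")` counts two positive keywords and no negative ones, so the text is labeled positive.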
Overview
Data Mined
Twitter: a social networking platform where users interact through messages of up to 140 characters.
Stock index: considers the 100 largest companies in the NASDAQ Composite's list.
Framework
30 days, 120 keywords, ~1.2 GB of data, ~1M tweets
Data Output
User, Favorites, Number of Tweets, Retweets, Adjusted Closing Price
Process Flow Overview
[Process flow diagram; surviving label: Sentiment Analysis]
Clustering
Once all users were ranked by the number of tweets they posted during the month, we studied the trend of the aggregated number of retweets and favorites each user received.
This graph was the basis for the clustering of the users in our dataset.
[Graph: aggregated retweets and favorites per user, ranked by number of tweets; surviving labels: 41%, 19%, 19%, 21%, Bots]
Sentiment
Bots
A fairly large percentage of user accounts are fake accounts managed by bots. They can be created for different reasons: to tweet specific content, to retweet certain kinds of tweets, or to follow particular users.
Number of tweets: between 226 and 6662
Occasional Users
These are users who do not usually tweet about market-related content. They happened to tweet about these topics during the month, and their content has almost no engagement.
Number of tweets: between 1 and 6
Sentiment
Magazines
Unlike bots, behind every tweet there is a person, or a group of people, who professionally manages the Twitter account of a company, journal, magazine, etc. They tweet quite often, and their content has high engagement.
Number of tweets: between 46 and 225
Influencers
These are individuals who work in the financial market in different roles. They tweet personally, a few times per day or per month. Their tweets usually have very high engagement.
Number of tweets: between 7 and 45
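The tweet-count ranges reported for the four clusters can be read as a simple lookup. The sketch below assumes the ranges are used directly as thresholds; the slides derive the clusters from the engagement graph, so this is an approximation, and the function name `assign_cluster` is hypothetical.

```python
def assign_cluster(n_tweets: int) -> str:
    """Map a user's monthly tweet count to one of the four reported clusters."""
    if n_tweets < 1:
        raise ValueError("a user in the dataset has at least one tweet")
    if n_tweets <= 6:
        return "Occasional Users"  # 1 to 6 tweets
    if n_tweets <= 45:
        return "Influencers"       # 7 to 45 tweets
    if n_tweets <= 225:
        return "Magazines"         # 46 to 225 tweets
    # 226 and above; 6662 is the largest count reported in the slides.
    return "Bots"
```

Counts above 6662 fall into the Bots cluster here, which is an assumption, since the slides report that value only as the observed maximum.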
Sentiment
Sentiment Sign
The difference between the positive and negative overall sentiment generated by each tweet, day by day. Generated by the following function:
$X = \sum_i \left[ Y_i^{(+)} \times RP(Y_i) - Y_i^{(-)} \times RP(Y_i) \right]$
Sentiment Volume
The sum of the positive and negative overall sentiment generated by each tweet, day by day. Generated by the following function:
$X = \sum_i \left[ Y_i^{(+)} \times RP(Y_i) + Y_i^{(-)} \times RP(Y_i) \right]$
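The two functions above can be sketched as a per-day aggregation. The slides do not define RP(Yi), so it is treated here as a generic per-tweet weight; the tuple layout and the name `daily_sign_and_volume` are assumptions made for illustration.

```python
from collections import defaultdict


def daily_sign_and_volume(tweets):
    """Aggregate Sentiment Sign and Sentiment Volume per day.

    tweets: iterable of (day, positive, negative, rp) tuples, where
    positive and negative are the tweet's sentiment scores and rp is
    the weight RP(Yi) (its meaning is not given in the slides).
    """
    sign = defaultdict(float)
    volume = defaultdict(float)
    for day, positive, negative, rp in tweets:
        sign[day] += positive * rp - negative * rp    # difference of weighted scores
        volume[day] += positive * rp + negative * rp  # sum of weighted scores
    return dict(sign), dict(volume)
```

A day dominated by negative tweets gets a negative sign but can still have a large volume, which is why the two measures are tracked separately.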
Correlation Overview
The graph compares the Adj. Close trend with the trends of the two clusters considered, plus their aggregated trend. It also marks the days when the stock market was closed (areas grayed out).
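The slides do not state how the Adj. Close and sentiment trends were compared numerically. One possible measure, assumed here purely for illustration, is the share of day-to-day moves in which both series go in the same direction. The function name `direction_agreement` and its inputs are hypothetical.

```python
def direction_agreement(prices, sentiment):
    """Fraction of consecutive-day moves where both series rise together
    or both fail to rise. Both inputs are equally long daily sequences."""
    if len(prices) != len(sentiment):
        raise ValueError("series must have the same length")
    if len(prices) < 2:
        raise ValueError("need at least two days to compare moves")
    matches = 0
    for i in range(1, len(prices)):
        price_up = prices[i] - prices[i - 1] > 0
        sentiment_up = sentiment[i] - sentiment[i - 1] > 0
        if price_up == sentiment_up:
            matches += 1
    return matches / (len(prices) - 1)
```

Days when the market was closed would have to be dropped from both series before calling the function, since no closing price exists for them.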
CONCLUSIONS
Adj. Close and Influencers: 47.4%
Adj. Close and Magazines: 68.4%
Adj. Close and Aggregated: 63.2%
Further Research
Extend the period under consideration (to at least 3 months).
Expand the analysis to the other two clusters, or focus it on our best-performing cluster: Magazines.
Try different approaches to sentiment analysis (e.g. Keyword Spotting).