Web Communication: Twitter Sentiment Analysis

DESCRIPTION

This essay investigates sentiment analysis: its biases, spamming issues, methods and techniques, and its everyday application through the Datumbox API software and the UNIMI side project 'Voices from the Blogs'. Written with student Margherita Zucchini.

Page 1: Web Communication: Twitter Sentiment Analysis

TWITTER SENTIMENT ANALYSIS

Margherita Zucchini - Giulia Sosio

Web Communication Course 2014/2015

Summary

• Introduction

• 1 Sentiment Analysis: contents

◦ What is Sentiment Analysis

◦ Different Level of Sentiment Analysis

◦ Opinion, Emotion and Spam Issue

• 2 Sentiment Analysis: methodologies

◦ How to evaluate a sentiment opinion

◦ How to create an efficient lexicon

◦ Summarization

◦ An example of Sentiment Analysis system: Datumbox API

• 3 Twitter

◦ An Italian case study: Voices from the Blogs and Ihappy index

▪ Ihappy index

• Conclusion

• Bibliography

Twitter Sentiment Analysis 1 Sosio Giulia e Zucchini Margherita

Introduction

Opinions and related concepts such as sentiments, evaluations, attitudes and emotions are the subject of study of Sentiment Analysis and Opinion Mining. The rapid growth of the field coincides with that of social media on the Web: reviews, forum discussions, blogs, microblogs, Twitter and other social networks. In this paper we focus on Twitter, the most popular microblogging platform, as a source for sentiment analysis.

With the growth of social media, individuals and organizations increasingly use the content of these platforms for decision making. Someone who wants to buy a consumer product is no longer limited to asking friends, because many user reviews and discussions of the product are available in public forums. A firm may no longer need to conduct surveys, opinion polls and focus groups. However, finding and monitoring opinion sites on the Web and distilling the information they contain remains a formidable task because of the proliferation of diverse sources. Each site typically contains a huge volume of opinion text that is not always easily deciphered in long blogs and forum posts.

Another important task in this kind of analysis is teaching a computer to understand figurative language and attitudes such as irony. How can a computer tell whether a writer is joking? The problem is not a small one. Suppose we are investigating the Web for a firm that needs to understand the impact of a new product on the market, or that is studying a new sector to decide whether to invest in it. Simply counting how many times one or more linguistic markers appear may not be enough, because each language has its own symbolic and sub-symbolic markers. Even when different groups use the same language and the same word, the meaning of that word changes from community to community, and it is not possible to write a program for every way of speaking around the world. The task also matters for analyzing our society and the changes within it, especially during large mass events: Twitter and Facebook, for example, were important media during the Arab Spring.

In this paper we investigate the answers that information and communication technology offers to these problems and how they are applied in the real world.

1. Sentiment Analysis: contents

What is Sentiment Analysis

Sentiment analysis is a mathematical approach to investigating language, not just on Twitter or Facebook but in any text written by humans. Its aim is to understand the meaning of a word or a sentence using statistical and logical information. It is used not only to learn what people think or feel about a topic but also to infer more information about the writer, such as education or social group. To work well, this approach needs a good dictionary and lexicon and a solid structure for deciding which words are important, and which are not, for a specific language or grammar.

The problem with this kind of analysis lies in the size of dictionaries and lexicons, but also in the ability to recognize the right word to analyze: each language has different rules about where a word must be placed, and identifying its role in a sentence is not simple. On the other hand, supplying more and more information about grammar rules and meanings is not a solution either, because not everyone who speaks a language uses correct grammar, and speech changes more often than grammar books and dictionaries do.

Researchers have found a possible solution to these problems: investigating several levels at the same time. We discuss it in the next section.

Different Level of Sentiment Analysis

In general, sentiment analysis has been investigated at three main levels.

Document level: the task at this level is to classify whether a whole opinion document expresses a positive or negative sentiment. This level assumes that each document expresses opinions on a single entity, not on several.

Sentence level: the task is to determine whether each sentence expresses a positive, negative or neutral opinion. This level of analysis is closely related to subjectivity classification, which distinguishes sentences that express factual information from sentences that express subjective views and opinions.

Entity and aspect level: neither document-level nor sentence-level analysis discovers what exactly people like and do not like. Aspect-level analysis is finer-grained. Instead of looking at language constructs, it looks directly at the opinion itself. It is based on the idea that an opinion consists of a sentiment (positive or negative) and a target (of the opinion). An opinion whose target is not identified is of limited use.

To improve the lexicon, researchers have created algorithms that let a computer determine at least whether one adjective has the same polarity as another. They use grammatical indicators such as “and” or “but” to group words into clusters of similar or dissimilar words and to decide whether the link between them is positive. In this way it is possible to build a good lexicon that knows the relation between two or more words and their polarity. By the polarity of a word we mean its grade, that is, whether it is positive, negative or neutral, and the operation of giving it a score that differs from the score of other words, or from the score of the same word with or without a modifier. Polarity is calculated by this formula:
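The formula itself did not survive in this copy of the essay. As a sketch only, assuming the authors follow the entity-polarity definition of Godbole, Srinivasaiah and Skiena (listed in the bibliography), the polarity for day i would be the share of positive references among all sentiment-bearing references to the entity on that day. The function name and the sample counts below are hypothetical.

```python
def entity_polarity(positive_refs: int, negative_refs: int) -> float:
    """Share of positive references among all sentiment references for one day.

    Assumed definition (after Godbole et al.), not confirmed by the essay:
        polarity_i = positive_i / (positive_i + negative_i)
    """
    total = positive_refs + negative_refs
    if total == 0:
        raise ValueError("no sentiment references on this day")
    return positive_refs / total


# Hypothetical daily counts of (positive, negative) references to one entity.
daily_counts = [(12, 4), (7, 7), (3, 9)]
daily_polarity = [entity_polarity(p, n) for p, n in daily_counts]
```

Under this reading, a value above 0.5 means that positive references prevail on that day, and collecting more days simply gives more values of i to compare.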

As we can see, polarity is computed in positive terms, and “i” represents each day of the examination period. More texts over more days therefore mean better precision about the polarity of a word and about its relationship with other words. The level at which to stop and consider the entity polarity defined is, of course, fixed by the researcher, and at the end the lexicon appears to a computer this way:

Using this method we can grow the lexicon starting from just a few simple but useful words and organize it in clusters of words with the same or similar meaning and polarity. In this way sentiment analysis can become more accurate and specific each time. Some researchers, however, hold that an algorithm cannot really understand human languages, because they work not only at the level of grammar but also at the level of meaning, and at several different levels of language.

What about a word that is not in the official dictionary but belongs to slang or to a dialect? Sentiment analysis is relatively simple if we analyze articles or blogs written to be followed by everyone, but it is not so easy when we start analyzing the opinions of ordinary people. For example, to talk about rubbish a citizen of Bologna will use the word “rusco”, which is used only in Bologna and comes from the name of a rubbish tax. The meaning of this word is unknown to the majority of people in Italy, but everyone in Bologna uses it, even as a topic for comments and opinions, and adjectives and adverbs derived from it are used to talk not just about rubbish but also about people. So a sentiment analysis algorithm needs to be improved with the ability to recognize whether a sentence is an opinion, and what its topic and polarity are. This is discussed in the next section.

Opinion, Emotion and Spam Issue

In the sentiment analysis field, an opinion is represented by a quintuple $(e_i, a_{ij}, s_{ijkl}, h_k, t_l)$, where $e_i$ is the name of an entity, $a_{ij}$ is an aspect of $e_i$, $s_{ijkl}$ is the sentiment on aspect $a_{ij}$ of entity $e_i$, $h_k$ is the opinion holder, and $t_l$ is the time when the opinion is expressed by $h_k$. The sentiment $s_{ijkl}$ is positive, negative, or neutral, or expressed with different strength/intensity levels.
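The quintuple above maps naturally onto a small record type. The following is a minimal sketch; the class name, the integer encoding of the sentiment and the sample values are our own assumptions, not part of the original definition.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Opinion:
    entity: str      # e_i: the entity being discussed
    aspect: str      # a_ij: an aspect of that entity
    sentiment: int   # s_ijkl: here +1 positive, 0 neutral, -1 negative (assumed encoding)
    holder: str      # h_k: who expresses the opinion
    time: date       # t_l: when the opinion was expressed


# Hypothetical example, not taken from the essay's data.
example = Opinion("phone", "battery life", +1, "user_42", date(2015, 1, 20))
```

Storing opinions in this form makes it easy to group them later by entity, aspect, holder or day.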

We can distinguish different types of opinions:

• regular opinion, often referred to simply as an opinion. It has two main subtypes, direct opinion and indirect opinion (the latter often occurs in the medical domain and is harder to deal with);

• comparative opinion, which expresses a relation of similarity or difference between two or more entities and/or a preference of the opinion holder based on some shared aspects of the entities; a comparison can be gradable (it expresses an ordering of the entities being compared), equative (a relation of the type "equal to"), superlative ("greater or less than all others"), or non-gradable (a relation between two or more entities that does not grade them);

• explicit opinion, a subjective statement that gives a regular or comparative opinion;

• implicit opinion, an objective statement that implies a regular or comparative opinion.

We define an emotion to be "a personal positive or negative feeling." Here are some examples:

Emotions, our subjective feelings and thoughts, have been studied in multiple fields, such as psychology, philosophy and sociology. According to Parrott (2001), people have six primary emotions: love, joy, surprise, anger, sadness and fear, which can be subdivided into many secondary and tertiary emotions.

According to consumer behavior research, evaluations can be broadly categorized

into two types: rational evaluations and emotional evaluations. Rational evaluation is

based on tangible beliefs and utilitarian attitudes. Emotional evaluation, instead, goes

deep into people's state of mind. To make use of these types of evaluations in

practice, sentiment ratings can be designed as emotional negative (-2), rational

negative (-1), neutral (0) , rational positive (+1), and emotional positive (+2). In

practice, neutral often means no opinion or sentiment expressed.
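The five-point scale above can be written down as an enumeration. This is only an illustrative sketch; the names SentimentRating and is_emotional are hypothetical.

```python
from enum import IntEnum


class SentimentRating(IntEnum):
    EMOTIONAL_NEGATIVE = -2
    RATIONAL_NEGATIVE = -1
    NEUTRAL = 0
    RATIONAL_POSITIVE = 1
    EMOTIONAL_POSITIVE = 2


def is_emotional(rating: SentimentRating) -> bool:
    """Emotional evaluations sit at the two ends of the scale."""
    return abs(rating) == 2
```

Because the ratings are integers, they can be summed or averaged directly when several evaluations of the same product are combined.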

The most important indicators of sentiment are the so-called opinion words. These are words that are commonly used to express positive or negative sentiments. For example, good, wonderful and amazing are positive sentiment words, and bad, poor and terrible are negative sentiment words. There are also phrases and idioms, of course. A list of such constructs is called a sentiment lexicon (or opinion lexicon).

Sentiment lexicons, however, have some problems:

• a positive or negative sentiment word may have opposite orientations in different application domains (for example, "this camera sucks" is negative, while "this vacuum cleaner really sucks" can be positive);

• a sentence containing sentiment words may not express any sentiment at all;

• sarcastic sentences, with or without sentiment words, are hard to deal with;

• many sentences without sentiment words can still imply opinions.

Opinion spamming has become a major issue. Social media enable anyone, anywhere in the world, to freely express views and opinions without disclosing their true identity. This allows people with hidden agendas or malicious intent to game the system: they give the impression of being independent members of the public and post fake opinions to promote or discredit target products, services, organizations, or individuals without disclosing their true intentions. We can identify three types of spam: fake reviews (untruthful reviews that are not based on the reviewer's genuine experience of using the product or service but are written with hidden motives); reviews about brands only (which do not comment on the specific product or service but only on the brand or the manufacturer); and non-reviews (advertisements and other irrelevant text containing no opinions).

Another problem in opinion mining is that some sources, like Twitter, present opinions in a particular form. Tweets are short (at most 140 characters), informal, and full of internet slang and emoticons. Because of the length limit, Twitter postings are actually easier to analyze than forums, articles or Facebook posts, and it is often easier to achieve high sentiment analysis accuracy on them. Reviews are also easier because they are highly focused and contain little irrelevant information. To get a better idea of what we mean, look at the table below [1]:

[1] These ratings were obtained by Namrata Godbole, Manjunath Srinivasaiah and Steven Skiena using the Lydia sentiment analysis system.

We can see that the ratings for the same topic differ considerably between newspapers and blogs, and sometimes they clash. So whom should we believe if we want to know what people think about a topic? The difference is due to the nature of blogs and of microblogs such as Twitter, but also to the need for larger lexicons that include slang words and abbreviations.

2. Sentiment Analysis: methodologies

How to evaluate a sentiment opinion

1. Mark sentiment words and phrases: this step marks all sentiment words and phrases in the sentence. Each positive word is assigned a sentiment score of +1 and each negative word a sentiment score of -1. For example, take the sentence “The voice quality of this phone is not good, but the battery life is long.” After this step, the sentence becomes “The voice quality of this phone is not good [+1], but the battery life is long”, because “good” is a positive sentiment word (the aspects in the sentence are “voice quality” and “battery life”).

2. Apply sentiment shifters: sentiment shifters are words and phrases that can change sentiment orientations. There are several types of such shifters. Negation words like not, never, none, nobody, nowhere, neither, and cannot are the most common type. In the example, the negation “not” turns “good [+1]” into “good [-1]”.

3. Handle but-clauses: words or phrases that indicate contrast need special handling because they often change sentiment orientations too. The most commonly used contrast word in English is “but”. A sentence containing such a word or phrase is handled with the following rule: the sentiment orientations before and after the contrast word (e.g. “but”) are opposite to each other if the opinion on one side cannot be determined.

4. Aggregate opinions: this step applies an opinion aggregation function to the resulting sentiment scores to determine the final orientation of the sentiment on each aspect in the sentence.
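The four steps above can be sketched in a few lines. This is a simplified illustration under our own assumptions: the tiny lexicon is hypothetical, a negation only flips the word that immediately follows it, and the aggregation is a plain sum per clause rather than a per-aspect function.

```python
import re

# Tiny hypothetical lexicon; a real system would load a full sentiment lexicon.
LEXICON = {"good": 1, "great": 1, "bad": -1, "poor": -1, "terrible": -1}
NEGATIONS = {"not", "never", "none", "nobody", "nowhere", "neither", "cannot"}


def score_clause(tokens):
    """Steps 1-2: mark sentiment words, then flip a word preceded by a negation."""
    total = 0
    for i, tok in enumerate(tokens):
        if tok in LEXICON:
            score = LEXICON[tok]
            if i > 0 and tokens[i - 1] in NEGATIONS:
                score = -score
            total += score
    return total


def sign(x):
    """Return -1, 0 or +1 according to the sign of x."""
    return (x > 0) - (x < 0)


def clause_orientations(sentence):
    """Steps 3-4: split on 'but', aggregate per clause, apply the but-rule."""
    parts = re.split(r"\bbut\b", sentence.lower())
    clauses = [re.findall(r"[a-z']+", part) for part in parts]
    orientations = [sign(score_clause(clause)) for clause in clauses]
    if len(orientations) == 2:
        before, after = orientations
        if before == 0 and after != 0:      # undetermined side gets the opposite sign
            orientations[0] = -after
        elif after == 0 and before != 0:
            orientations[1] = -before
    return orientations


example = clause_orientations(
    "The voice quality of this phone is not good, but the battery life is long."
)
```

For the example sentence, the first clause is negative because of “not good”, and the second clause, which contains no lexicon word, receives the opposite orientation through the but-rule.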

How to create an efficient lexicon

It is also crucial to create a sentiment lexicon that is accurately matched to the subject under analysis. Researchers have proposed many approaches to compiling sentiment words. The three main approaches are the manual approach, the dictionary-based approach, and the corpus-based approach.

• The manual approach is labor intensive and time consuming, so it is usually not used alone but combined with automated approaches as a final check, because automated methods make mistakes.

• The dictionary-based approach works as follows. A small set of sentiment words (seeds) with known positive or negative orientations is first collected manually, which is very easy. The algorithm then grows this set by searching WordNet or another online dictionary for their synonyms and antonyms. The newly found words are added to the seed list, and the next iteration begins. The iterative process ends when no more new words can be found. After the process completes, a manual inspection step is used to clean up the list.

• The corpus-based approach has been applied to two main scenarios: first, given a seed list of known (often general-purpose) sentiment words, discovering other sentiment words and their orientations from a domain corpus; second, adapting a general-purpose sentiment lexicon to a new one using a domain corpus, for sentiment analysis applications in that domain. Although the corpus-based approach may also be used to build a general-purpose sentiment lexicon if a very large and very diverse corpus is available, the dictionary-based approach is usually more effective for that purpose because a dictionary contains all words.
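The dictionary-based approach described above is essentially an iterative graph expansion. The sketch below shows the idea with a hypothetical miniature thesaurus standing in for WordNet; the word lists and the function name expand_lexicon are assumptions made for illustration only.

```python
# Hypothetical miniature thesaurus standing in for WordNet.
SYNONYMS = {
    "good": {"fine", "nice"},
    "nice": {"pleasant"},
    "bad": {"poor"},
    "poor": {"awful"},
}
ANTONYMS = {"good": {"bad"}, "nice": {"nasty"}}


def expand_lexicon(seeds):
    """Grow a {word: +1/-1} lexicon until no new word is found."""
    lexicon = dict(seeds)
    frontier = list(seeds)
    while frontier:
        word = frontier.pop()
        polarity = lexicon[word]
        # Synonyms keep the polarity of the word, antonyms get the opposite one.
        candidates = [(s, polarity) for s in SYNONYMS.get(word, ())]
        candidates += [(a, -polarity) for a in ANTONYMS.get(word, ())]
        for new_word, new_polarity in candidates:
            if new_word not in lexicon:  # the first assignment wins
                lexicon[new_word] = new_polarity
                frontier.append(new_word)
    return lexicon


lexicon = expand_lexicon({"good": 1})
```

Starting from the single seed “good”, the loop reaches every word in the toy thesaurus. In practice the result would still need the manual inspection step, since synonym chains can drift in meaning.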

Thanks to the contributions of many researchers, several general-purpose subjectivity, sentiment, and emotion lexicons have been constructed, and some of them are publicly available:

• General Inquirer lexicon (Stone, 1968): http://www.wjh.harvard.edu/~inquirer/spreadsheet_guide.htm

• Sentiment lexicon (Hu and Liu, 2004): http://www.cs.uic.edu/~liub/FBS/sentiment-analysis.html

• MPQA subjectivity lexicon (Wilson, Wiebe and Hoffmann, 2005): http://www.cs.pitt.edu/mpqa/subj_lexicon.html

• SentiWordNet (Esuli and Sebastiani, 2006): http://sentiwordnet.isti.cnr.it/

• Emotion lexicon (Mohammad and Turney, 2010): http://www.purl.org/net/emolex

Summarization

Opinion summarization is still an active research area. Most opinion

summarization methods which produce a short text summary have not focused on

the quantitative side (proportions of positive and negative opinions). Future research

can deal with this problem while also producing human readable texts. We should

note that the opinion summarization research cannot progress alone because it

critically depends on results and techniques from other areas of research in

sentiment analysis, e.g., aspect or topic extraction and sentiment classification. All

these research directions will need to go hand-in-hand.

An example of Sentiment Analysis system: Datumbox API

Social media monitoring is one of the hottest topics nowadays. As more and more companies use social media marketing to promote their brands, it has become necessary for them to be able to evaluate the effectiveness of their campaigns.

Evaluating opinions requires performing sentiment analysis, which is the task of automatically identifying the polarity, the subjectivity and the emotional states of a particular document or sentence. It requires Machine Learning and Natural Language Processing techniques, and this is where most developers hit a wall when they try to build their own tools.

To create an efficient tool, we can use a free and open-source package, Datumbox API 1.0v. A tool based on Datumbox is built from at least two modules: one that evaluates how many people are influenced by a certain campaign and one that finds out what people think about a particular topic. To work with Twitter, the tool needs at least two things: the ability to connect to Twitter and a way to evaluate the polarity of tweets based on their words.

The first main steps are: log in to Twitter with your credentials, click the “Create new Application” button and fill in the form to register a new app. Once it is created, select the application, go to the “Details” tab (the first tab) and click the “Create my access token” button at the bottom of the page. Then go to the “OAuth tool” tab and note down the values of Consumer Key, Consumer secret, Access token and Access token secret.

Here is an example, just a few rows, of code that uses the Twitter and Datumbox API credentials together to create a sentiment analysis tool.

<?php

class TwitterSentimentAnalysis {

    protected $datumbox_api_key; //Your Datumbox API Key.
    protected $consumer_key; //Your Twitter Consumer Key.
    protected $consumer_secret; //Your Twitter Consumer Secret.
    protected $access_key; //Your Twitter Access Key.
    protected $access_secret; //Your Twitter Access Secret.

    /**
    * The constructor of the class
    *
    * @param string $datumbox_api_key Your Datumbox API Key
    * @param string $consumer_key Your Twitter Consumer Key
    * @param string $consumer_secret Your Twitter Consumer Secret
    * @param string $access_key Your Twitter Access Key
    * @param string $access_secret Your Twitter Access Secret
    *
    * @return TwitterSentimentAnalysis
    */
    public function __construct($datumbox_api_key, $consumer_key, $consumer_secret, $access_key, $access_secret){
        $this->datumbox_api_key=$datumbox_api_key;
        $this->consumer_key=$consumer_key;
        $this->consumer_secret=$consumer_secret;
        $this->access_key=$access_key;
        $this->access_secret=$access_secret;
    }

    [...]

            $sentiment=$DatumboxAPI->TwitterSentimentAnalysis($tweet['text']); //call Datumbox service to get the sentiment

            if($sentiment!=false) { //if the sentiment is not false, the API call was successful.
                $results[]=array( //add the tweet message in the results
                    'id'=>$tweet['id_str'],
                    'user'=>$tweet['user']['name'],
                    'text'=>$tweet['text'],
                    'url'=>'https://twitter.com/'.$tweet['user']['name'].'/status/'.$tweet['id_str'],
                    'sentiment'=>$sentiment,
                );
            }

    [...]

To detect the sentiment of tweets, Datumbox uses its Machine Learning framework to build a classifier capable of detecting positive, negative and neutral tweets. The training set consisted of 1.2 million tweets evenly distributed across the three categories. The tweets were tokenized by extracting their bigrams and by taking into account URLs, hashtags, usernames and emoticons.
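To make the tokenization step more concrete, here is a minimal sketch of a tweet tokenizer that keeps hashtags and emoticons and extracts bigrams. It is not Datumbox's implementation: the regular expression, the placeholder tokens <URL> and <USER>, and the function names are our own assumptions.

```python
import re

TOKEN_RE = re.compile(
    r"""
    https?://\S+          # URLs
  | @\w+                  # usernames
  | \#\w+                 # hashtags
  | [:;=][-']?[)(DPp]     # a few common emoticons
  | \w+(?:'\w+)?          # ordinary words
    """,
    re.VERBOSE,
)


def tokenize(tweet):
    """Split a tweet into tokens, normalizing URLs and usernames."""
    tokens = []
    for tok in TOKEN_RE.findall(tweet):
        if tok.startswith("http"):
            tokens.append("<URL>")
        elif tok.startswith("@"):
            tokens.append("<USER>")
        elif tok[0] in ":;=":
            tokens.append(tok)          # keep emoticons unchanged
        else:
            tokens.append(tok.lower())  # words and hashtags are lowercased
    return tokens


def bigrams(tokens):
    """Return the list of adjacent token pairs."""
    return list(zip(tokens, tokens[1:]))


tokens = tokenize("@anna I love the new #iPhone :) http://t.co/abc")
```

Replacing every URL and username with a placeholder is one possible design choice: it keeps the information that a link or a mention is present without creating a separate feature for each address or account.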

To select the best features, the Datumbox developers tried several different algorithms and in the end chose Mutual Information. According to Datumbox, the results were evaluated with 10-fold cross-validation, and the best-performing classifier achieved an accuracy of 83.26%.
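Mutual Information measures how much knowing that a feature is present tells us about the class of a tweet. The sketch below computes it for a single feature on a toy data set; it illustrates the general measure, not Datumbox's code, and the documents, labels and function name are hypothetical.

```python
import math
from collections import Counter


def mutual_information(docs, labels, feature):
    """Mutual information (in bits) between presence of `feature` and the label."""
    n = len(docs)
    joint = Counter((feature in doc, label) for doc, label in zip(docs, labels))
    feature_counts = Counter(feature in doc for doc in docs)
    label_counts = Counter(labels)
    mi = 0.0
    for (present, label), count in joint.items():
        p_joint = count / n
        p_feature = feature_counts[present] / n
        p_label = label_counts[label] / n
        mi += p_joint * math.log2(p_joint / (p_feature * p_label))
    return mi


# Toy data: each document is the set of its tokens.
docs = [{"good", "phone"}, {"good", "day"}, {"bad", "phone"}, {"bad", "day"}]
labels = ["pos", "pos", "neg", "neg"]

mi_good = mutual_information(docs, labels, "good")    # perfectly predictive
mi_phone = mutual_information(docs, labels, "phone")  # independent of the label
```

Features are then ranked by this score and only the highest-ranked ones are kept for training the classifier.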

3. Twitter

In the past few years, there has been a huge growth in the use of micro blogging

platforms such as Twitter. Spurred by that growth, companies and media

organizations are increasingly seeking ways to mine Twitter for information about

what people think and feel about their products and services. Companies such as

Twitratr (twitrratr.com), tweetfeel (www.tweetfeel.com), and Social Mention

(www.socialmention.com) are just a few who advertise Twitter sentiment analysis as

one of their services.

Twitter messages have many unique attributes, which are worth underlining before discussing sentiment analysis methods for them.

1. Length. The maximum length of a Twitter message is 140 characters. In the training set of Go, Huang and Bhayani, the average length of a tweet is 14 words, and the average length of a sentence is 78 characters.

2. Language model. Twitter users post messages from many different media, including their cell phones. The frequency of misspellings and slang in tweets is much higher than in other domains.

3. Twitter is used by different people to express their opinions about different topics, so it is a valuable source of people’s opinions.

4. Twitter contains an enormous number of text posts, and it grows every day. The collected corpus can be arbitrarily large.

5. Twitter’s audience ranges from regular users to celebrities, company representatives, politicians, and even country presidents. It is therefore possible to collect text posts from users of different social and interest groups.

Investigating Twitter and applying sentiment analysis to it is therefore becoming more important every day. Through Twitter people connect with one another, creating a social web in which it is possible to organize meetings or, for example, flash mobs, show them to the rest of the world and receive feedback on a scale that was impossible in any earlier historical period.

In the fourth year of my high school, a history teacher asked himself and us what would have become of the French Revolution if politicians had had mobile phones, Facebook, or at least a TV. Would it have happened or not? The same question now arises for everybody in connection with the Arab Spring. Those social movements were organized on social platforms, particularly on Twitter, and the instant feedback from other users gave participants a strength that was impossible in events before social networks. The problem is that we have no evidence of this: these events were born on the internet, thanks to social networks, but nobody has a real instrument for measuring the power of this new place. As in all research of this kind, there is also a dark side: understanding these media is also a way to understand how to control them. Firms and governments already have accounts, official or not, and strategies to promote themselves, to inform and also to control people; but everything on these social networks is public, and anyone can use it if they ask, and often pay. From this point of view, sentiment analysis of social networks and blogs has some interesting dark shadows.

What would happen if the police could know where an opinion started to spread around the web? Or if a firm with an image problem could know what to write to correct it and prevent problems the following year? During political campaigns, for example, sentiment analysis is used to understand what the public thinks about the candidates' statements and how the campaign is going. Unfortunately, sentiment analysis methods are not yet efficient enough, and even in the USA not everybody is on Twitter, so a computer alone cannot give an accurate idea of what is happening on the web. From the algorithmic point of view, the problem is to create a program that can understand the semantic content of a text and relate how many times a topic appears to the contexts in which it appears. In the next section we present Voices from the Blogs, which proposes an interesting solution to these problems.

An Italian case study: Voices from the Blogs and the Ihappy index

Voices from the Blogs (VFB) is a start-up that aims to study Twitter and other social media with a new sentiment analysis approach that integrates algorithms and human work. The idea is that, to eliminate misclassification errors, the work of the computer must be checked and supplemented, where needed, by humans. Using this method, VFB enlarges its lexicon and classifies better than other systems; combined with the data and statistical analysis methods illustrated above, this gives a more specific and accurate report of what the web feels about the topics under discussion. VFB has also created an index called Ihappy, which measures happiness in each city of a country. VFB collaborates with some important Italian newspapers, for example Il Sole 24 Ore and Il Corriere della Sera, and it has also created an index for Wired.com that tracks the desire for innovation and creativity in Italy.

It is important to observe that this index, and VFB's surveys in general, give a better picture of what the web is feeling and how people are talking about important topics, especially the much-discussed ones; as noted earlier, these are the real problem for sentiment analysis systems that use only statistical and logical methods and do not involve humans. This method is very useful because it does not leave all the work to a computer with a small or large lexicon: a person checks the computer and teaches it how to translate human language into machine language, and this is possible for every language in the world that is spoken by a researcher. Voices from the Blogs has also created an app that shows its results every day.

Ihappy index

Ihappy is an index created by Voices from the Blogs to analyze the mood of Twitter users day by day. It is not just a temporal index but also a geographic one, thanks to Twitter's geolocation system. So for each day of the year we can know where people are happier and, thanks to tags, even why. The analysis starts from a small sample of happy and unhappy tweets that Twitter makes available to everyone every day. Using text analysis methods together with human supervision, as VFB already does for its ordinary text analysis of the web, it is possible to understand the feeling of Twitter. The level of happiness in a particular city and on Twitter is calculated by this formula:

Ihappy = (number of happy posts / number of happy and unhappy posts) × 100%
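As a quick illustration, the formula can be written as a small function. The function name and the sample counts are hypothetical; only the ratio itself comes from the text above.

```python
def ihappy(happy_posts: int, unhappy_posts: int) -> float:
    """Percentage of happy posts among all posts classified as happy or unhappy."""
    total = happy_posts + unhappy_posts
    if total == 0:
        raise ValueError("no classified posts")
    return 100.0 * happy_posts / total


# Hypothetical counts for one city on one day.
score = ihappy(60, 40)
```

With 60 happy and 40 unhappy posts the index is 60%; posts that are neither happy nor unhappy do not enter the calculation.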

It is important to understand that VFB does not just look at words or particular topics: the integration of human and computer also permits a semantic analysis of tweets. In this way the Ihappy index also takes into account whether somebody is talking about the birth of a child or about a sunny day after a week of rain and fog.

VFB's analysis of Twitter shows that some variables are dynamic and others are static. The dynamic ones are connected with events or particular days of the year:

• persistence: the happiness of one day carries over to the following days;

• the day being analyzed: a holiday or a celebration, for example Mother's Day, will be a happier day than, for example, the day on which the clocks change;

• the facts of the day: whether the day brought good news or not changes the level of happiness.

The second kind of variable is fixed and does not change from day to day:

• geographic: where we live affects our happiness, which decreases as altitude increases but rises again if the district has a coastline; for example, if Milan were on the sea, the happiness of its citizens would be 1.3 points higher;

• institutional and political: this is not about the political color of the city administration but about the quality of life and of social services;

• demographic: living in a more populated city makes people happier, perhaps because a city offers more opportunities than a small town.

The 2013 Ihappy ebook shows a happier country than that of the year before, with 310 happy days. We can also see that the variable that really changes the color of a day is the one connected with collective events, such as a holiday or a success in an important field. The Ihappy index also gives us an idea of how important communication and social networks are becoming for understanding our society and, in some cases, for controlling it.

Conclusion

We first described the basic aspects of Sentiment Analysis, from the key points of this type of research (what an opinion actually is, how it is expressed, what the different levels of analysis are and what characterizes different emotions) to the technical and arithmetical methodologies. We then focused on the Twitter phenomenon (the real “search field” for this kind of analysis) and on a business built on this topic, Voices from the Blogs.

This overview is important because it allows us to draw our conclusion. In our opinion, Sentiment Analysis as it stands today is not yet fully capable of catching the real moods and trends that appear in social media. It still misses the key point: the countless forms of communication that human beings use beyond a standard lexicon, such as slang, dialects, inside jokes among friends, sarcasm and so on. It is not about misspellings or graphic emoticons, which, as we have seen, are well integrated in a sentiment analysis lexicon. It has to do with the singularity of each user and the specific ways in which he or she alone communicates with the digital world.

Of course we are talking about a growing field, which is refining its filters and structures in order to keep up with trends. We hope that in time it will be possible to see an integrated structure in which computers are fed with human thinking and natural procedures. It is the same kind of integration we have seen in Artificial Neural Networks, a field that is still growing and expanding because its possibilities are unlimited.

Bibliography

• International Sentiment Analysis for News and Blogs by Mikhail Bautin, Lohit Vijayarenu and Steven Skiena

• Large-Scale Sentiment Analysis for News and Blogs by Namrata Godbole, Manjunath Srinivasaiah and Steven Skiena

• Sentiment Analysis and Opinion Mining by Bing Liu, Morgan & Claypool Publishers, May 2012

• Twitter as a Corpus for Sentiment Analysis and Opinion Mining by Alexander Pak and Patrick Paroubek

• Twitter Sentiment Analysis by Alec Go, Lei Huang and Richa Bhayani

• IHappy 2013 by Voices from the Blogs: Andrea Ceron, Luigi Curini and Stefano M. Iacus

• Voices from the Blogs, http://voicesfromtheblogs.com
