Doina Balahur (FEBA), Adrian Iftene (FCS)
LiSS Conference, 3-5 May, Iasi
“Al. I. Cuza” University of Iasi, Romania
Faculty of Computer Science (FCS)
Faculty of Economics and Business Administration (FEBA)
Our activity on the Internet is carefully monitored (links visited, keywords used in queries, the computers from which we connect to the Internet, etc.)
Some of this monitoring is meant to make our lives easier; some can report anti-social behaviour or content harmful to society (e.g., the Wikipedia Vandalism task from CLEF 2010)
It is driven by invocations of national security (after 11 September) and by the interests of big companies
There are resources available from Yahoo (the CriES task from CLEF 2010) or from Wikipedia
Yahoo! Questions and Answers (Surdeanu M., M. Ciaramita, H. Zaragoza, ACL 2008): the aim is to identify human experts in a field
A study on the use of the social networks StudiVZ, Facebook and MySpace: Social Networking Sites and the Surveillance Society (C. Fuchs, 2009)
We want to find out users’ opinions on various products or events
◦ I want to buy a certain product (e.g., an iPhone). What are its strengths and weaknesses? What do people who have already used it say?
◦ The “A/H1N1” flu phenomenon. Should I get the flu vaccine or not? What are the benefits of the vaccine? What are its unintended side effects?
Our system is based on three main components:
◦ Information extraction
◦ Indexing with Lucene/Nutch
◦ Searching texts and classifying them according to the feelings and opinions found in them
[Figure: system architecture. Labels: Internet surveillance; opinion identification; user query; local index; emotion triggers; negation, emphasis and diminishing words; consumer opinions (paragraphs +/-, named entities); Google API, Yahoo API; social networks (MySpace, Facebook, Twitter).]
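The indexing component relies on Lucene/Nutch; the sketch below does not use those libraries. As a rough illustration of what a local index does, here is a toy in-memory inverted index with boolean AND retrieval. The class name `LocalIndex`, the tokenizer and the sample paragraphs are hypothetical.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase and split on non-word characters (Unicode-aware, so diacritics survive)."""
    return [t for t in re.split(r"\W+", text.lower()) if t]


class LocalIndex:
    """Toy inverted index: term -> set of paragraph ids (stand-in for Lucene)."""

    def __init__(self) -> None:
        self.postings: dict[str, set[int]] = defaultdict(set)
        self.paragraphs: list[str] = []

    def add(self, paragraph: str) -> int:
        pid = len(self.paragraphs)
        self.paragraphs.append(paragraph)
        for term in tokenize(paragraph):
            self.postings[term].add(pid)
        return pid

    def search(self, query: str) -> list[str]:
        """Return the paragraphs that contain every query term."""
        terms = tokenize(query)
        if not terms:
            return []
        ids = set.intersection(*(self.postings.get(t, set()) for t in terms))
        return [self.paragraphs[i] for i in sorted(ids)]


# Usage with invented paragraphs
index = LocalIndex()
index.add("The SRI can intercept SMS and e-mail.")
index.add("Interception of phone calls is a privacy problem.")
```

A real deployment would add stemming, ranking and Romanian stop words, which Lucene provides and this sketch omits.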
We initially had performance issues (because Wikipedia pages were preferred in the results)
Solution: we added criteria that favour blogs, forums, or pages that allow users to add personal comments and opinions
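The slides do not say how the extra criteria were implemented. One possible reading, sketched here under that assumption, is a re-ranking step that boosts URLs that look like blogs or forums and penalises Wikipedia. The hint lists, the penalty of 2 and the function names are all hypothetical.

```python
from urllib.parse import urlparse

# Hypothetical hints suggesting user-generated, opinion-bearing content.
OPINION_HINTS = ("blog", "forum", "comment", "wordpress", "blogspot")
# Hypothetical list of encyclopedic hosts, which rarely carry personal opinions.
ENCYCLOPEDIC_HOSTS = ("wikipedia.org",)


def opinion_score(url: str) -> int:
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    text = host + parsed.path.lower()
    score = sum(1 for hint in OPINION_HINTS if hint in text)
    if any(host.endswith(h) for h in ENCYCLOPEDIC_HOSTS):
        score -= 2  # hypothetical penalty
    return score


def rerank(urls: list[str]) -> list[str]:
    """Higher opinion score first; sorted() is stable, so ties keep the engine's order."""
    return sorted(urls, key=opinion_score, reverse=True)
```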
Romanian researchers working on English, Spanish and Romanian (Dan Tufiş, Rada Mihalcea, Alexandra Balahur)
Other researchers (Carlo Strapparava, A. Montoyo)
The techniques used include disambiguation, the use of the SentiWordNet resource, identification of emotion triggers, etc.
Starting from the English-language resources (A. Balahur and A. Montoyo, 2008), we built resources specific to the Romanian language
Using the Romanian WordNet (Tufiş et al., 2004) we extended this resource
We completed and adapted these resources (Iftene and Rotaru, 2010) with specific terms used in social networks and blogs
Specific terms: “successfully”, “super”, “fine”, “good” and emoticons such as ":)" (smiley face), ":(" (sad face), etc.
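A minimal sketch, assuming a flat word-to-polarity dictionary, of how such social-media terms and emoticons could be matched in text. The English words stand in for the Romanian entries, and the polarity values and names (`SOCIAL_LEXICON`, `lexicon_hits`) are hypothetical. The regular expression tries the two emoticons before word tokens so they are not discarded as punctuation.

```python
import re

# Hypothetical lexicon: English stand-ins for Romanian entries, illustrative values.
SOCIAL_LEXICON = {
    "successfully": 1.0,
    "super": 1.0,
    "fine": 0.5,
    "good": 1.0,
    ":)": 1.0,
    ":(": -1.0,
}

# Emoticons first, then ordinary word tokens.
TOKEN_RE = re.compile(r":\)|:\(|\w+")


def tokens(text: str) -> list[str]:
    return TOKEN_RE.findall(text.lower())


def lexicon_hits(text: str) -> list[tuple[str, float]]:
    """Return (token, polarity) for every token found in the lexicon, in text order."""
    return [(t, SOCIAL_LEXICON[t]) for t in tokens(text) if t in SOCIAL_LEXICON]
```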
We considered a set of principal terms representing positive triggers (pride, family, home, freedom, esteem, etc.) and another set of principal terms representing negative triggers (murder, treason, cowardice, etc.)
Synonyms and hyponyms of a principal term were added to the same set as that term
Antonyms of a principal term were added to the opposite set
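The expansion rule above can be sketched as a single pass over the seed terms. The `THESAURUS` dictionary is a hypothetical stand-in for Romanian WordNet lookups (its entries are illustrative, not RoWordNet data), and the sketch expands only one step, not transitively.

```python
# Hypothetical stand-in for WordNet relations; entries are illustrative only.
THESAURUS = {
    "freedom": {"synonyms": {"liberty"}, "hyponyms": {"free speech"}, "antonyms": {"captivity"}},
    "treason": {"synonyms": {"betrayal"}, "hyponyms": set(), "antonyms": {"loyalty"}},
}


def expand(positive_seeds, negative_seeds, thesaurus):
    """Synonyms/hyponyms join the seed's own set; antonyms join the opposite set."""
    pos, neg = set(positive_seeds), set(negative_seeds)
    for seeds, same, other in ((positive_seeds, pos, neg), (negative_seeds, neg, pos)):
        for term in seeds:  # iterate over the original seeds, not the growing sets
            rel = thesaurus.get(term, {})
            same |= rel.get("synonyms", set()) | rel.get("hyponyms", set())
            other |= rel.get("antonyms", set())
    return pos, neg


pos, neg = expand(["freedom", "pride"], ["treason"], THESAURUS)
```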
Modifiers: terms that change the valence of another term they accompany
Negations: radically change the valence (“no”, “never”)
Amplifiers: emphasize the positive or negative aspect of a trigger (adjectives: “high”, “more”, “better”, “profound”, “exceptional”; adverbs: “definitely”, “sure”, “certainly”, “ultimately”)
Diminishers: reduce the positive or negative valence of a trigger, moving it towards a neutral valence (adjectives: “small”, “less”, “worse”, “rather”; modal verbs: “can”, “possible”, “must”, “want”; adverbs: “probably”)
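The three modifier classes can be sketched as word sets plus a rule for adjusting a trigger's valence. The sign flip for negations follows the description above; the multiplicative factors (1.5 for amplifiers, 0.5 for diminishers) are hypothetical values chosen only to move a score away from or towards neutral. “not” and “very” are included because the examples in this presentation use them.

```python
from typing import Optional

NEGATIONS = {"no", "not", "never"}
AMPLIFIERS = {"very", "high", "more", "better", "profound", "exceptional",
              "definitely", "sure", "certainly", "ultimately"}
DIMINISHERS = {"small", "less", "worse", "rather",
               "can", "possible", "must", "want", "probably"}

# Hypothetical scaling factors (not given in the slides).
AMPLIFY, DIMINISH = 1.5, 0.5


def modifier_kind(word: str) -> Optional[str]:
    """Return the modifier class of a word, or None if it is not a modifier."""
    w = word.lower()
    if w in NEGATIONS:
        return "negation"
    if w in AMPLIFIERS:
        return "amplifier"
    if w in DIMINISHERS:
        return "diminisher"
    return None


def apply_modifier(valence: float, word: str) -> float:
    kind = modifier_kind(word)
    if kind == "negation":
        return -valence            # radical change: flip the polarity
    if kind == "amplifier":
        return valence * AMPLIFY   # move further from neutral (0)
    if kind == "diminisher":
        return valence * DIMINISH  # move towards neutral (0)
    return valence                 # not a modifier: unchanged
```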
1. Alex is a good person. “good” is a positive trigger
2. Ben is a very good person. “good” is a positive trigger, but we also have the amplifier “very”
3. John is the best. “best” is a positive trigger with the strongest intensity
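A minimal scoring sketch for the three sentences above, assuming numeric intensities (hypothetically 1.0 for “good” and 2.0 for “best”) and a multiplier of 1.5 for “very”. Only the ordering good < very good < best comes from the slides; the numbers are illustrative.

```python
import re

# Hypothetical intensities: "best" is the strongest positive trigger.
TRIGGERS = {"good": 1.0, "best": 2.0}
# Hypothetical amplifier factor.
AMPLIFIERS = {"very": 1.5}


def score(sentence: str) -> float:
    """Sum trigger intensities, scaling each by any amplifiers that precede it."""
    total, factor = 0.0, 1.0
    for word in re.findall(r"\w+", sentence.lower()):
        if word in AMPLIFIERS:
            factor *= AMPLIFIERS[word]
        elif word in TRIGGERS:
            total += TRIGGERS[word] * factor
            factor = 1.0  # an amplifier only affects the next trigger
    return total
```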
A1. Alex is a good person. A2. Ben is not a good person.
We have the antonym relation: good ≠ not good (= bad)
B1. John is the best. B2. Thomas is not the best.
We consider “not the best” to be somewhere between “good” and “the best”
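The two negation cases can be sketched as follows, assuming hypothetical intensities bad = -1, good = 1 and best = 2: negating an ordinary trigger maps it to its antonym's value, while negating a superlative places it halfway between the base form and the superlative. The midpoint is an assumption; the slides only say “somewhere between”.

```python
import re

# Hypothetical intensities on a single scale.
TRIGGERS = {"bad": -1.0, "good": 1.0, "best": 2.0}
SUPERLATIVES = {"best": "good"}  # superlative -> its base form
NEGATIONS = {"not", "no", "never"}


def score(sentence: str) -> float:
    total, negated = 0.0, False
    for word in re.findall(r"\w+", sentence.lower()):
        if word in NEGATIONS:
            negated = True  # applies to the next trigger
        elif word in TRIGGERS:
            value = TRIGGERS[word]
            if negated:
                if word in SUPERLATIVES:
                    # "not the best": between the base form and the superlative
                    value = (TRIGGERS[SUPERLATIVES[word]] + value) / 2
                else:
                    # "not good" behaves like its antonym ("bad")
                    value = -value
            total += value
            negated = False
    return total
```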
We are interested in finding out what Romanian people think about the wiretapping and interception of their communications
We generate queries in Romanian using basic words:
◦ SMS, e-mail, mobile/fixed phone, chat
and additional words representing Romanian organizations:
◦ SRI - Romanian Intelligence Service
◦ DNA - National Anticorruption
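Query generation can be sketched as combining each basic word with each organization. The English words stand in for the Romanian queries, and the topic word `interception` and the exact combination scheme are assumptions; this toy version yields 15 queries, not the 30+ reported in the study.

```python
from itertools import product

# Basic words and organizations from the slide (English stand-ins for Romanian).
BASIC = ["SMS", "e-mail", "mobile phone", "fixed phone", "chat"]
ORGS = ["SRI", "DNA"]
# Hypothetical topic word added to every query.
TOPIC = "interception"


def build_queries(basic: list[str], orgs: list[str], topic: str) -> list[str]:
    """One query per basic word, plus one per (basic word, organization) pair."""
    queries = [f"{topic} {b}" for b in basic]
    queries += [f"{topic} {b} {o}" for b, o in product(basic, orgs)]
    return queries


queries = build_queries(BASIC, ORGS, TOPIC)
```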
In total, we obtained over 30 queries and around 100 links returned by searches with these queries (after the filtering phase)
Positive opinions (10%)
Negative opinions (30%)
Neutral opinions (60%)
Positive opinions:
◦ interception can catch corrupt politicians and businessmen involved in illegal business
◦ it can stop terrorist acts in time
Negative opinions:
◦ we have the right to privacy
◦ interceptions are used for extortion and intimidation, or to eliminate business competition
Neutral opinions:
◦ users who say they have nothing to hide and are not interested in what other people are doing
◦ in many cases, users make ironic comments about the SRI's ability to intercept virtually any communication
“Law no. 298/2008 related to retention of data generated or processed by providers of electronic communications ... ” issued on 18 November 2008 by the Romanian Parliament
Negative opinions:
◦ In many cases, users see the law as a return to communist times, when the Securitate used wiretaps to oppress the population
◦ The other comments are critical of the television stations and newspapers that reported on the introduction of this law
Positive opinions:
◦ One of them is a technical comment explaining how data transfer protocols, which track the route of network packets across a network, can report errors and correct them if they know who sent each packet and to whom
◦ Other messages state that this law will be beneficial for tracking crime groups
On six subjects, three people evaluated over 100 paragraphs => ~44% classification accuracy
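The slides do not describe how the three evaluators' judgements were combined. A sketch, assuming the gold label for each paragraph is the strict majority of the annotators and that paragraphs without a majority are skipped; the function names are hypothetical and any labels passed to them are for illustration, not data from the study.

```python
from collections import Counter
from typing import Optional


def majority(labels: list[str]) -> Optional[str]:
    """Label chosen by a strict majority of annotators, else None."""
    label, count = Counter(labels).most_common(1)[0]
    return label if count > len(labels) / 2 else None


def accuracy(system: list[str], annotations: list[list[str]]) -> float:
    """Share of paragraphs whose system label equals the annotators' majority label.

    Paragraphs on which the annotators have no strict majority are skipped.
    """
    pairs = [(s, majority(a)) for s, a in zip(system, annotations)]
    judged = [(s, g) for s, g in pairs if g is not None]
    if not judged:
        return 0.0
    return sum(s == g for s, g in judged) / len(judged)
```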
For Romanian, there are major differences between the number of search results obtained using Google or Yahoo and the number obtained by searching social networks
For English, searches return enough results with both methods
The information available on the Internet is very useful
The case study on wiretapping and on the monitoring of Internet activities (chat, e-mail, etc.) showed that there are more negative opinions than positive ones
We also found that negative views focus on the adopted law and on the SRI
THANK YOU
Q/A